Improving the performance of Relight, a Kotlin/JS application

I recently got around to improving the performance of Relight, a simulator for a smartphone game. It includes both an interactive gameplay simulator and a Monte Carlo bulk simulator, which runs thousands of simulations and outputs statistics for a given configuration. The application is largely written with Kotlin Multiplatform and features a Kotlin/JS frontend client plus Kotlin/JVM and Kotlin/JS backends that contain the actual logic. In this article, I will be covering improvements to the Kotlin/JS portions.

Profiling

All of the improvements I made were based on profiling, using browser profilers (Chrome and Firefox) to identify what specifically affected the JS backend's performance and the IntelliJ profiler to identify hot paths in general. Profiling was essential both to keep the focus on the areas with the most impact and to reveal non-obvious sources of overhead, especially those specific to Kotlin/JS.

General Kotlin/JS Improvements

In this section, I list the optimizations that apply to Kotlin/JS applications in general, not just Relight. These changes also primarily benefit bulk simulations rather than interactive simulations, whose performance overhead comes from elsewhere.

Use HashMaps and HashSets backed by JavaScript Map and Set

Relight makes heavy use of HashMap and HashSet objects, particularly for the implementation of buffs, which need to be queried by the type of effect they have and support removal in arbitrary order. After profiling bulk simulations, it became clear that a lot of time was spent on HashMap- and HashSet-related operations (get(), set(), contains(), etc.), making them clear targets for optimization.

The HashMap and HashSet implementations on Kotlin/JS do some relatively heavy lifting so that they work identically to implementations for other platforms. However, being compiled down to JavaScript makes them quite a bit slower than their JVM counterparts.

JavaScript has built-in Map and Set types, though they do not support configurable equality comparators: String keys are compared by content, and Object keys by identity. That suffices for the purpose of storing and managing buffs. As natively supported JavaScript types, they are much faster than the Kotlin/JS implementations. To incorporate the native JavaScript Map and Set into Kotlin/JS code, I wrote wrappers around them that implement the respective Kotlin interfaces.
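
The pattern looks roughly like the following. This is a minimal sketch: the external declaration and class names are illustrative rather than Relight's actual code, and only the members on the hot paths are shown.

```kotlin
// External declaration exposing the native JavaScript Map under a
// Kotlin name that does not clash with kotlin.collections.Map.
@JsName("Map")
external class JsMap<K, V> {
    val size: Int
    fun get(key: K): V?
    fun has(key: K): Boolean
    fun set(key: K, value: V): JsMap<K, V>
    fun delete(key: K): Boolean
}

// Wrapper implementing the Kotlin interface on top of the native Map.
// Remember that key equality follows JS semantics: content for strings,
// identity for objects.
class NativeBackedMap<K, V> : AbstractMutableMap<K, V>() {
    private val backing = JsMap<K, V>()

    override val size: Int get() = backing.size
    override fun containsKey(key: K): Boolean = backing.has(key)
    override fun get(key: K): V? = backing.get(key)

    override fun put(key: K, value: V): V? {
        val previous = backing.get(key)
        backing.set(key, value)
        return previous
    }

    override fun remove(key: K): V? {
        val previous = backing.get(key)
        backing.delete(key)
        return previous
    }

    // Omitted for brevity: a full implementation adapts the native Map's
    // entries() iterator into Kotlin's MutableSet of MutableEntry.
    override val entries: MutableSet<MutableMap.MutableEntry<K, V>>
        get() = TODO("Adapt the native entries() iterator")
}
```

A Set wrapper follows the same shape on top of the native Set's add()/has()/delete().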

Use Doubles where possible

For some operations, particularly damage calculation, there was a risk of integer overflow, so I had previously switched from Int to Long for calculations in several places. On the JVM this was not a big deal and the performance impact was minimal. However, it turns out that Long values have a fairly noticeable performance impact in Kotlin/JS.

In JavaScript, the two built-in numeric types are Number, a 64-bit (double-precision) floating-point type, and BigInt, an arbitrary-precision integer type. The 32-bit Kotlin Int fits in a JavaScript Number, so its implementation is relatively simple. The 64-bit Kotlin Long does not, so it is implemented as its own class backed by two Number instances, and its more complex arithmetic operations carry a greater performance cost than Int's.

In the case of damage calculations, the requirements were that the numeric type be large enough to hold intermediate values and that intermediate results have integer values. The Kotlin Long type would be a good fit if not for the aforementioned performance issues. The Kotlin Double type, a 64-bit floating-point type, is also easily large enough for any intermediate results while performing much better, because it corresponds directly to the JavaScript Number type and incurs essentially no overhead. Flooring after each operation sufficed to keep intermediate results integral, and I wrote an infix extension function for floor division for notational convenience.
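
The helper is a one-liner along these lines (a sketch; the function and example names are illustrative, not the actual Relight code):

```kotlin
import kotlin.math.floor

// Floor division as an infix extension for notational convenience.
infix fun Double.floorDiv(divisor: Double): Double = floor(this / divisor)

// Example: a percentage-based damage step whose result stays integer-valued.
fun applyPercentage(value: Double, percent: Double): Double =
    (value * percent) floorDiv 100.0
```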

Combined with the Map and Set changes, bulk simulation performance ended up at about 3-5x where it had started (completing an equal number of simulation iterations took about 1/3rd to 1/5th as much time as before).

Avoid Kotlin Serialization

The official Kotlin Serialization (kotlinx.serialization) library is pretty great in general, but it does have a non-negligible performance impact on Kotlin/JS. The native JavaScript JSON.parse() is much faster at the cost of feature richness. To make use of it, rather than defining @Serializable classes, I used external declarations for deserialization, with extension functions on them to convert to proper Kotlin classes. Serialization means defining functions that build plain JavaScript objects and calling JSON.stringify() on the result.
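
The pattern looks something like this sketch, with a hypothetical payload (the field names are illustrative, not Relight's actual wire format):

```kotlin
// Typed view over the plain object produced by JSON.parse().
external interface SimulationResultJs {
    val totalDamage: Double
    val turns: Int
}

data class SimulationResult(val totalDamage: Double, val turns: Int)

fun SimulationResultJs.toSimulationResult() =
    SimulationResult(totalDamage, turns)

// Deserialization: parse natively, then convert to a proper Kotlin class.
fun parseResult(json: String): SimulationResult =
    JSON.parse<SimulationResultJs>(json).toSimulationResult()

// Serialization: build a plain JavaScript object and stringify it.
fun SimulationResult.toJson(): String {
    val obj: dynamic = js("({})")
    obj.totalDamage = totalDamage
    obj.turns = turns
    return JSON.stringify(obj)
}
```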

Using the native JSON methods instead of Kotlin Serialization is much faster, and the difference is particularly noticeable during the initial invocations of deserialization methods, before the JavaScript JIT has warmed up. Bulk simulations involve a fair amount of (de)serialization because JSON strings are used to communicate between the main browser thread and web workers, and since bulk simulations spawn several web workers each time they run, this warm-up period is particularly significant. Switching away from Kotlin Serialization allowed workers to start up in noticeably less time.

Relight Specific Improvements

In addition to the preceding optimizations, which apply to Kotlin/JS applications in general, I also made some improvements specific to Relight's functionality. Coincidentally, these all applied to the interactive simulation mode.

Cache formatted strings

Even after excluding the overhead of any client-side display functionality, interactive simulations ran much slower than bulk simulations: typical devices can run hundreds of individual simulations per second in bulk, while interactive simulations ran orders of magnitude more slowly. Profiling revealed why: log strings. Formatting log strings, which bulk simulations skip for all but one iteration, was by far the biggest time sink in interactive simulations.

Interactive simulations work by storing the sequence of actions (commands) previously taken and re-running the simulation each time a new action is taken. Storing just the sequence of actions makes implementing save-state and undo functionality quite straightforward and lightweight. However, it also meant redoing the expensive work of formatting log strings on every new action.

The solution was to cache log strings and skip computing them when unnecessary. Since taking another action only appends entries to the end of the existing log, this was as simple as skipping as many log calls as the previous log contained and reusing the existing entries. Adding undo support just meant recording the length of the log at each point in the history and truncating the log back to the recorded length whenever an undo happened.
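
A minimal sketch of the scheme, with illustrative names rather than Relight's actual logger:

```kotlin
class CachedLogger {
    private val entries = mutableListOf<String>()
    private val lengthHistory = mutableListOf<Int>()
    private var cursor = 0 // log calls seen so far in the current re-run

    // Called at the start of each re-run of the simulation.
    fun beginRun() {
        cursor = 0
    }

    // Takes a lambda so that formatting is skipped entirely for entries
    // already cached from a previous run.
    fun log(format: () -> String) {
        if (cursor >= entries.size) entries.add(format())
        cursor++
    }

    // Called after each new action; records the log length for undo.
    fun commit() {
        lengthHistory.add(entries.size)
    }

    // Truncates the log back to its length before the last action.
    fun undo() {
        if (lengthHistory.isEmpty()) return
        lengthHistory.removeAt(lengthHistory.lastIndex)
        val target = lengthHistory.lastOrNull() ?: 0
        while (entries.size > target) entries.removeAt(entries.lastIndex)
    }
}
```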

Avoid recreating identical HTML elements

In the client webpage, interactive simulations display a log of past events, which basically appears as a bunch of <div> elements. Originally, all of these elements were cleared and recreated whenever the log was updated, which is obviously inefficient. Retaining the old log entries that did not change across updates substantially reduced the overhead of updating the displayed log.
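
Since entries only ever appear at the end of the log (or disappear from the end on undo), the update reduces to trimming and appending. A sketch, with the container lookup and entry rendering simplified:

```kotlin
import kotlinx.browser.document
import org.w3c.dom.Element

// Updates the displayed log in place; only the changed tail is touched.
fun updateLogDisplay(container: Element, logEntries: List<String>) {
    // After an undo the log may have shrunk; drop any extra elements first.
    while (container.childElementCount > logEntries.size) {
        container.lastElementChild?.let { container.removeChild(it) }
    }
    // Append only the entries that are not yet in the DOM.
    for (i in container.childElementCount until logEntries.size) {
        val div = document.createElement("div")
        div.textContent = logEntries[i]
        container.appendChild(div)
    }
}
```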