Performance testing Fuzzier


The time has finally come. I want to create some performance metrics for Fuzzier.

Fuzzier is a file finder plugin for JetBrains editors inspired by Telescope for Nvim that I've been developing for a little over a year now.

First, I need a plan. Fuzzier is written in Kotlin and I'm a Java developer, so I'll be leaning into the tools that I know.

Hindsight

In this article I end up setting up JMH and focusing on a single function, which I then measure and improve. I faced some challenges while setting up JMH with Kotlin and IntelliJ plugin dependencies and have documented them here.

The Plan

Hindsight

While the initial idea was to benchmark the actual search, and I had some thoughts on this, I ended up choosing a more contained problem to focus on. I think it was a good choice, as now I have a better idea of how to approach performance testing Fuzzier and a working setup for the future.

Here are some questions that need to be answered first.

- What do I want to measure?
- What can I use the results for?
- Will the same setup work in the future?
- What tools should I use?

Let's go over each of these one by one.

What do I want to measure?

Primarily, I want to measure how long Fuzzier takes to process IntelliJ's file index and produce a result. The secondary goal is to ensure that the memory footprint is kept under control, though I don't expect this to be a problem.

The measurements should account for the settings used in the processing, whether they are set through the IDE or through a separate integration.

What can I use the results for?

The goal is to create a baseline performance metric, which I can compare future results to.

I cannot account for everything, so these initial results are not perfect and will likely need to be tweaked and measured again later.

Will the same setup work in the future?

I'd like to be able to use the same setup/frame in the future to measure comparable benchmarks.

One large issue is that I do not have static hardware to benchmark on. My personal PC is the primary machine that I will be doing these tests on, but that is likely to change in the future, which would instantly invalidate the previous measurements.

What tools should I use?

The two candidates I'm considering are JMeter and JMH. I'm not experienced in using either of them, or in creating performance metrics, so I'll have to do some exploring. Here are my initial thoughts on each tool.

JMeter

I could create a testing endpoint, which could be used to launch full tests against a running IDE. This would allow me to use JMeter to start the tests and measure the actual time taken.

JMH

With JMH I could test smaller parts of the code with more precision, but it would require a lot of work to capture a large part of the program. I would write the tests in code and run them with JMH.

The hard part (setting up JMH)

Hindsight

The hardest part was getting JMH, Kotlin and IntelliJ to work together. Here I also realized that I have a good function to use, and that the initial test can be copy-pasted from my existing unit tests. Here's a link to the pull request, which includes the changes to the build.gradle file.

This took me some time to figure out, because the missing libraries (the IntelliJ platform dependencies that the plugin relies on) are not available as a public dependency. I had to manually add the jars to the jmh task dependencies, because I did not find a solution that would simply reuse all of the dependencies and plugins from the main Gradle project.

My solution was to copy the dependencies to the project and use a file dependency to add them to the jmh task. This way I could avoid adding an absolute path to the build file.
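As a rough sketch of what this looks like in the build file (the plugin version and the jar path here are placeholders, the real changes are in the pull request):

plugins {
  // The JMH Gradle plugin; the version here is just an example
  id("me.champeau.jmh") version "0.7.2"
}

dependencies {
  // IntelliJ platform jars copied into the project, added to the jmh
  // configuration as file dependencies (paths are placeholders)
  jmh(files("libs/intellij-platform.jar"))
}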

The default source set location is src/jmh, so my tests live in src/jmh/kotlin/com/mituuz/fuzzier/performance.

Configuration can be done through the build.gradle file. Here's an example:

jmh {
  fork = 1
  warmupIterations = 1
  iterations = 1
  timeOnIteration = "1s"
  resultFormat = "JSON"
  jvmArgs = listOf("-Xms2G", "-Xmx2G")
  jmhTimeout = "30s"
}

Running JMH

Note to self

Do not use prints when testing JMH. The terminal can't keep up and it froze my system twice before I realized what was going on.

Friendly reminder for us Linux folk: Raising Elephants Is So Utterly Boring

After solving the print issue, I was able to run the JMH task from the terminal and through IntelliJ. The jmh task wasn't compatible with Gradle's configuration cache, so I ended up disabling the cache when running the task: ./gradlew jmh --no-configuration-cache

Crafting tests to use with JMH

While setting this up, I realized that I have a perfect function to target for some actual improvements.

I created an ugly highlighting function that appends several strings together to form an HTML string, where certain letters get a different background color (<font style='background-color: $colorAsHex'). I knew that it wasn't the best way to do it and always had the intention of returning to this.

Creating the test

To keep this simple, I just want to measure the highlight function for a single string with a varying number of characters to highlight. I can just copy my existing test and change some parameters to create variance.
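For illustration, here is a simplified, self-contained sketch of what the benchmark class can look like. The file name and highlight indexes are made up, and the highlight function here is only a stand-in so that the sketch compiles on its own; the real tests call the plugin's own code.

import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State

@State(Scope.Benchmark)
open class PerformanceTests {
    // Arbitrary input string, standing in for a file name from the index
    private val filename = "FuzzierSettingsService.kt"

    // Stand-in for the plugin's highlight function (shown later in this
    // article), wrapping each highlighted character in a tag
    private fun highlight(source: String, hlIndexes: List<Int>): String {
        val tagged = StringBuilder(source)
        var offset = 0
        for (i in hlIndexes.sorted()) {
            if (i < source.length) {
                tagged.insert(i + offset, "<b>")
                offset += 3
                tagged.insert(i + offset + 1, "</b>")
                offset += 4
            }
        }
        return tagged.toString()
    }

    @Benchmark
    fun oneMatchInFilename(): String = highlight(filename, listOf(0))

    @Benchmark
    fun sevenMatchesInFilename(): String = highlight(filename, listOf(0, 2, 4, 6, 8, 10, 12))
}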

It's hard to say anything about benchmarking results when you don't have anything to compare to, but here is the initial run.

Benchmark Mode Cnt Score Error Units
PerformanceTests.noMatchesInFilename thrpt 36214160.146 ops/s
PerformanceTests.oneMatchInFilename thrpt 17464086.561 ops/s
PerformanceTests.threeMatchesInFilename thrpt 6670398.128 ops/s
PerformanceTests.sevenMatchesInFilename thrpt 2880108.760 ops/s

But before doing any improvements it's time to think about how I can get consistent performance results.

Setting up test environment

Hindsight

I did not end up with a consistent test environment and do not need one at this point, but here is my failed attempt to benchmark on two devices using Docker with resource constraints.

Okay, so I dug around a bit and decided to go with Docker resource constraints: first building the image from a Dockerfile and then defining the constraints on the docker run command.

Not sure if this is the best way, but you gotta start somewhere. (Using docker-compose and defining the resource limits in the file would be an alternative way to do this.)
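To give an idea of the setup, a minimal Dockerfile could look something like this; the base image and build steps are assumptions, the real file may differ.

FROM eclipse-temurin:17-jdk
WORKDIR /fuzzier
COPY . .
CMD ["./gradlew", "--no-daemon", "--no-configuration-cache", "jmh"]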

Here's my initial run configuration and the results (I decided to increase the measurement time a bit).

jmh {
  fork = 1
  warmupIterations = 2
  iterations = 2
  timeOnIteration = "2s"
  resultFormat = "JSON"
  jvmArgs = listOf("-Xms2G", "-Xmx2G")
  jmhTimeout = "30s"
}
docker build -t fuzzier-perf .
docker run --memory="3G" --cpus="2.0" -d fuzzier-perf

Initial results

Laptop

Benchmark Mode Cnt Score Error Units
PerformanceTests.noMatchesInFilename thrpt 20833517.400 ops/s
PerformanceTests.oneMatchInFilename thrpt 7015488.826 ops/s
PerformanceTests.threeMatchesInFilename thrpt 3328829.228 ops/s
PerformanceTests.sevenMatchesInFilename thrpt 1342837.360 ops/s

Desktop

Benchmark Mode Cnt Score Error Units
PerformanceTests.noMatchesInFilename thrpt 44547130.894 ops/s
PerformanceTests.oneMatchInFilename thrpt 13409151.110 ops/s
PerformanceTests.threeMatchesInFilename thrpt 6207958.712 ops/s
PerformanceTests.sevenMatchesInFilename thrpt 2635322.828 ops/s

No luck

So, because the cores aren't equal between the devices, the results aren't either. In hindsight, this makes perfect sense.

I think I'll use a smaller measure -> change -> measure cycle instead of trying to create permanent, comparable results.

This means that my performance testing will be limited to the scope of a single change and will only be valid for a short time. This is fine, because my goal shifted from creating general measurements to improving a single function.

Pivot

With the new cycle defined I can start working on the improvements.

measure -> change -> measure

I'll do all of the work on a single device to ensure that the measurements are valid.

Baseline

First, I'll create the baseline measurements that I can compare against after doing some changes.

Because I don't need Docker anymore, I'll be running the JMH task without it for convenience and speed.

Now that I am ready to start measuring, I should take a closer look at which mode I will be using and what I am actually measuring.

First, I need to select the JMH mode that I want to use.

JMH modes

JMH offers several modes, which measure different things: Throughput, AverageTime, SampleTime and SingleShotTime.

Because the function in question is run once for each file on every search, I could use either average time or throughput. The function is so fast that throughput feels easiest to read: the higher the score, the better.
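For reference, the mode can be set with standard JMH annotations on the benchmark method (or on the whole class). This is plain JMH usage rather than code from the plugin:

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.BenchmarkMode
import org.openjdk.jmh.annotations.Mode
import org.openjdk.jmh.annotations.OutputTimeUnit

open class ModeExample {
    // Reported as operations per second, like the results in this article
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    fun throughputExample(): Int = (1..100).sum() // placeholder workload
}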

Results

So, here are the initial results showing how many operations complete in a second. There is quite a noticeable decrease in the score when the match count increases, which is to be expected, because more matches require more operations.

Benchmark Mode Cnt Score Error Units
PerformanceTests.noMatchesInFilename thrpt 2 44580990.574 ops/s
PerformanceTests.oneMatchInFilename thrpt 2 15314862.488 ops/s
PerformanceTests.threeMatchesInFilename thrpt 2 6661081.448 ops/s
PerformanceTests.sevenMatchesInFilename thrpt 2 2652866.574 ops/s

Optimization time

Here is the function that I would like to optimize. It works by inserting tags into an HTML string, which then highlights the correct characters.

The tag used is a font tag with a style that sets the background color. Swing components do not support every HTML tag and I remember trying a lot of different things before ending up with this.

var startStyleTag = createStartStyleTag()
const val END_STYLE_TAG: String = "</font>"

private fun createStartStyleTag(): String {
    val color = JBColor.YELLOW
    return "<font style='background-color: ${colorAsHex(color)};'>"
}

fun highlight(source: String): String {
    val stringBuilder: StringBuilder = StringBuilder(source)
    var offset = 0
    val hlIndexes = score.highlightCharacters.sorted()
    for (i in hlIndexes) {
        if (i < source.length) {
            // Wrap each highlighted character in its own start and end tag,
            // tracking the offset that the already inserted tags cause
            stringBuilder.insert(i + offset, startStyleTag)
            offset += startStyleTag.length
            stringBuilder.insert(i + offset + 1, END_STYLE_TAG)
            offset += END_STYLE_TAG.length
        }
    }
    return stringBuilder.toString()
}

Going for Copilot

Before even taking a look at what I could do, let's see what Copilot thinks.

I simply asked it to optimize the function, copy-pasted the result and measured again.

fun highlight(source: String): String {
    val hlIndexes = score.highlightCharacters.sorted()
    val parts = mutableListOf<String>()
    var lastIndex = 0

    for (i in hlIndexes) {
        if (i < source.length) {
            parts.add(source.substring(lastIndex, i))
            parts.add(startStyleTag)
            parts.add(source[i].toString())
            parts.add(END_STYLE_TAG)
            lastIndex = i + 1
        }
    }
    parts.add(source.substring(lastIndex))

    return parts.joinToString("")
}

Benchmark Mode Cnt Score Error Units
PerformanceTests.noMatchesInFilename thrpt 2 21140629.430 ops/s
PerformanceTests.oneMatchInFilename thrpt 2 12419843.488 ops/s
PerformanceTests.threeMatchesInFilename thrpt 2 3177763.124 ops/s
PerformanceTests.sevenMatchesInFilename thrpt 2 1366734.188 ops/s

Here's the explanation that Copilot gave me.

To optimize the highlight function, we can use a StringBuilder more efficiently by collecting all the parts of the string and then joining them at the end. This avoids multiple insertions which can be costly.

This made everything slower. Instead of using a StringBuilder and appending the parts, Copilot wanted to store all of the individual strings in a list. That does not sound like it would be faster or more memory efficient: instead of using a StringBuilder to build a single String object, it creates and stores multiple distinct String objects and only joins them at the end.

I'll do it myself

Okay, so I know what I can look for. Instead of adding tags for each character, I could group sequential characters and add the start and end tags once per group. This of course means that I want to add another test with the same number of characters to tag, but with a different grouping.
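Continuing the earlier benchmark sketch, the two three-match variants differ only in which indexes are highlighted (the indexes are again made up):

@Benchmark
fun threeMatchesInFilenameInARow(): String = highlight(filename, listOf(0, 1, 2))

@Benchmark
fun threeMatchesInFilename(): String = highlight(filename, listOf(0, 4, 8))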

I decided to create separate threeMatchesInFilenameInARow and threeMatchesInFilename benchmarks. Here is a new set of initial benchmark results, where it is clear that no matter the grouping, three matches take the same time.

I'll also drop the no matches case, because every time this function is called the string must have at least one match.

Benchmark Mode Cnt Score Error Units
PerformanceTests.oneMatchInFilename thrpt 2 15147366.761 ops/s
PerformanceTests.threeMatchesInFilenameInARow thrpt 2 6757701.170 ops/s
PerformanceTests.threeMatchesInFilename thrpt 2 6751191.472 ops/s
PerformanceTests.sevenMatchesInFilename thrpt 2 2703818.535 ops/s

Grouping characters

Here's a comparison of what the output looks like with and without grouping. I think it is pretty clear why this would be an improvement.

Without grouping:
<font style='background-color: #ffff00;'>H</font><font style='background-color: #ffff00;'>e</font>llo

With grouping:
<font style='background-color: #ffff00;'>He</font>llo
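I won't paste the final plugin code here, but a rough sketch of the grouping idea looks something like this. It reuses startStyleTag and END_STYLE_TAG from above and takes the highlight indexes as a parameter to keep the sketch self-contained; the actual implementation in the plugin may differ in its details.

fun highlightGrouped(source: String, highlightCharacters: Collection<Int>): String {
    val hlIndexes = highlightCharacters.filter { it < source.length }.sorted()
    if (hlIndexes.isEmpty()) return source

    val stringBuilder = StringBuilder()
    var previous = -2 // Index of the previously highlighted character
    var lastCopied = 0 // Everything before this index is already in the builder

    for (i in hlIndexes) {
        if (i != previous + 1) {
            if (previous >= 0) {
                // Close the previous group: copy its characters and the end tag
                stringBuilder.append(source, lastCopied, previous + 1)
                stringBuilder.append(END_STYLE_TAG)
                lastCopied = previous + 1
            }
            // Copy the unhighlighted gap and open a new group
            stringBuilder.append(source, lastCopied, i)
            stringBuilder.append(startStyleTag)
            lastCopied = i
        }
        previous = i
    }
    // Close the final group and copy the rest of the string
    stringBuilder.append(source, lastCopied, previous + 1)
    stringBuilder.append(END_STYLE_TAG)
    stringBuilder.append(source, previous + 1, source.length)
    return stringBuilder.toString()
}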

The grouping had the exact effect that I was looking for: by reducing the number of tags inserted, sequential character processing gets faster. It does slow down when there are no sequential matches, but I believe that some sequences are to be expected in the most common cases.

Benchmark Mode Cnt Score Error Units
PerformanceTests.oneMatchInFilename thrpt 2 15262031.008 ops/s
PerformanceTests.threeMatchesInFilenameInARow thrpt 2 10133267.856 ops/s
PerformanceTests.threeMatchesInFilename thrpt 2 5662795.456 ops/s
PerformanceTests.sevenMatchesInFilename thrpt 2 3672124.419 ops/s

Here is a closer look at the changes between the two implementations. For such a small change, I think it is a good one.

Test Name Initial ops/s Optimized ops/s Change in ops/s % Change
PerformanceTests.oneMatchInFilename 15147366.76 15262031.01 114664.25 0.76%
PerformanceTests.threeMatchesInFilenameInARow 6757701.17 10133267.86 3375566.69 49.95%
PerformanceTests.threeMatchesInFilename 6751191.47 5662795.46 -1088396.02 -16.12%
PerformanceTests.sevenMatchesInFilename 2703818.54 3672124.42 968305.88 35.81%

I think this has gone on for long enough; the article is getting a bit too long. This is not the only issue I had with the highlighting and I think it can still be improved, but it is a start.

Final thoughts

Setting up JMH was a considerable amount of work, but honestly I feel like it was a good investment. Fuzzier is a file finder plugin and it needs to be fast to feel good. Having the framework ready to keep iterating, and being able to see the results in a readable form, will help me in the future.

I didn't even get to JMeter here, but maybe at some point in the future I could look at how to integrate it to measure the scoring and matching.

If you have any tips, questions or anything else feel free to contact me at [email protected].
