The time has finally come. I want to create some performance metrics for Fuzzier.
Fuzzier is a file finder plugin for JetBrains editors inspired by Telescope for Nvim that I've been developing for a little over a year now.
First, I need a plan. Fuzzier is written in Kotlin and I'm a Java developer, so I'll be leaning into the tools that I know.
In this article I set up JMH and focus on a single function, which I measure and improve. I faced some challenges while setting up JMH with Kotlin and IntelliJ plugin dependencies and have documented them here.
While the initial idea was to benchmark the actual search, and I had some thoughts on this, I ended up choosing a more contained problem to focus on. I think it was a good choice, as I now have a better idea of how to approach performance testing Fuzzier, and a working setup for the future.
There are some questions that need to be answered first; let's go over them one by one.
Primarily, I want to measure how long Fuzzier takes to process IntelliJ's file index and produce a result. The secondary goal is to ensure that the memory footprint is kept under control, though I don't expect this to be a problem.
The measurements should account for the settings used during processing, applied either through the IDE or through a separate integration.
The goal is to create a baseline performance metric, which I can compare future results to.
I cannot account for everything, so these initial results are not perfect and will likely need to be tweaked and measured again later.
I'd like to be able to use the same setup/framework in the future to run comparable benchmarks.
One large issue is that I do not have static hardware to benchmark on. My personal PC is where I will primarily run these tests, but that is likely to change in the future, which would instantly invalidate the previous measurements.
I'm not experienced in using these tools (JMH and JMeter), or in creating performance metrics, so I'll have to do some exploring. Here are my initial thoughts on the two.
I could create a testing endpoint, which could be used to launch full tests using a running IDE. This would allow me to use JMeter to start and measure the actual time taken.
With JMH I could test smaller parts of the code with more precision, but this would require a lot of work to capture a large part of the program. I would write the tests in code and run them with JMH.
The hardest part was getting JMH, Kotlin and IntelliJ to work together. This is also where I realized that I have a good function to use, and that the initial test can be copy-pasted from my existing unit tests.
Here's a link to the pull request, which includes changes to the `build.gradle` file.
The `jmh` task doesn't have the same dependencies available as the main project, which caused me some headache. While trying to run the task I got an error where `com.intellij.ui.JBColor` could not be found.
This took me some time to figure out, because the missing libraries were not in a public dependency. I had to manually add the jars as a dependency of the `jmh` task, because I did not find a solution that would simply reuse all of the dependencies and plugins from the main Gradle project.
My solution was to copy the dependencies into the project and use a file dependency to add them to the `jmh` task. This way I could avoid adding an absolute path to the build file.
id("me.champeau.jmh") version "0.7.2"
implementation("org.openjdk.jmh:jmh-core:1.37")
annotationProcessor("org.openjdk.jmh:jmh-generator-annprocess:1.37")
The missing classes live in the `idea:ideaIC:2024.3` dependency, so I copied each `.jar` file from my `.gradle` directory into the project:

```shell
# run from inside the cached ideaIC-2024.3/lib directory
cp *.jar ~/IdeaProjects/fuzzier/libs/
```

The copied jars are then added to the `jmh` task as a file dependency:

```kotlin
jmh(files("libs/*"))
```
The default source file location is `src/jmh`, so my tests live in `src/jmh/kotlin/com/mituuz/fuzzier/performance`. Configuration can be done through the `build.gradle` file. Here's an example:
```kotlin
jmh {
    fork = 1
    warmupIterations = 1
    iterations = 1
    timeOnIteration = "1s"
    resultFormat = "JSON"
    jvmArgs = listOf("-Xms2G", "-Xmx2G")
    jmhTimeout = "30s"
}
```
Do not use prints when testing with JMH. The terminal can't keep up, and it froze my system twice before I realized what was going on.
Friendly reminder for us Linux folk: Raising Elephants Is So Utterly Boring
After solving the print issue, I was able to run the JMH task from the terminal and through IntelliJ.
The `jmh` task wasn't compatible with Gradle's configuration cache, so I ended up disabling it when running the task:

```shell
./gradlew jmh --no-configuration-cache
```
While setting this up, I realized that I have a perfect function for making some actual improvements. I had created an ugly highlighting function that appends several strings together to form an HTML string where certain letters get a different background color (`<font style='background-color: $colorAsHex'>`).
I knew that it wasn't the best way and always had the intention to return to this.
To keep this simple, I just want to measure the highlight function for a single string with a varying number of characters to highlight. I can just copy my existing test and change some parameters to create variance.
It's hard to say anything about benchmarking results when you don't have anything to compare to, but here is the initial run.
Benchmark | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|
PerformanceTests.noMatchesInFilename | thrpt | | 36214160.146 | | ops/s |
PerformanceTests.oneMatchInFilename | thrpt | | 17464086.561 | | ops/s |
PerformanceTests.threeMatchesInFilename | thrpt | | 6670398.128 | | ops/s |
PerformanceTests.sevenMatchesInFilename | thrpt | | 2880108.760 | | ops/s |
But before doing any improvements it's time to think about how I can get consistent performance results.
I did not end up with a consistent test environment and do not need it at this point, but here is my failed attempt to benchmark with two devices using docker with constraints.
Okay, so I dug around a bit and decided to go with docker constraints: first building the image from a Dockerfile and then defining constraints on the `docker run` command.
Not sure if this is the best way, but you gotta start somewhere. (Using docker-compose and defining the resource limits in the file would be an alternative way to do this.)
Here's my initial run configuration and the results (I decided to increase the measurement time a bit).
```kotlin
jmh {
    fork = 1
    warmupIterations = 2
    iterations = 2
    timeOnIteration = "2s"
    resultFormat = "JSON"
    jvmArgs = listOf("-Xms2G", "-Xmx2G")
    jmhTimeout = "30s"
}
```
```shell
docker build -t fuzzier-perf .
docker run --memory="3G" --cpus="2.0" -d fuzzier-perf
```
Results from the first device:

Benchmark | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|
PerformanceTests.noMatchesInFilename | thrpt | | 20833517.400 | | ops/s |
PerformanceTests.oneMatchInFilename | thrpt | | 7015488.826 | | ops/s |
PerformanceTests.threeMatchesInFilename | thrpt | | 3328829.228 | | ops/s |
PerformanceTests.sevenMatchesInFilename | thrpt | | 1342837.360 | | ops/s |
And the results from the second device:

Benchmark | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|
PerformanceTests.noMatchesInFilename | thrpt | | 44547130.894 | | ops/s |
PerformanceTests.oneMatchInFilename | thrpt | | 13409151.110 | | ops/s |
PerformanceTests.threeMatchesInFilename | thrpt | | 6207958.712 | | ops/s |
PerformanceTests.sevenMatchesInFilename | thrpt | | 2635322.828 | | ops/s |
So, because the cores aren't equal between the devices, the results aren't either. In hindsight this makes perfect sense.
I think I'll use a smaller measure -> change -> measure cycle instead of trying to create permanent, comparable results.
This means that my performance testing will be limited to the scope of a single change and will only be valid for a short time. This is fine, because my goal shifted from creating general measurements to improving a single function.
With the new cycle defined I can start working on the improvements.
measure -> change -> measure
I'll do all of the work on a single device to ensure that the measurements are valid.
First, I'll create the baseline measurements that I can compare against after doing some changes.
Because I don't need docker anymore, I'll be running the JMH task without it for convenience and speed.
Now that I am ready to start measuring, I should take a closer look at which mode I will be using and what I am actually measuring.
First, I need to select the JMH mode that I want to use.
JMH offers several modes, which measure different things.
Because the function in question is run for x number of files with each search, I could use either average time or throughput. The function is so fast that throughput feels easiest to read: the higher the score, the better.
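As a rough illustration of what such a throughput benchmark can look like, here is a simplified, self-contained sketch; the class name, input string and highlight indexes are made up for the example, and the real tests call Fuzzier's actual highlight function instead of the stand-in below.

```kotlin
package com.mituuz.fuzzier.performance

import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.BenchmarkMode
import org.openjdk.jmh.annotations.Mode
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State

// JMH generates subclasses of @State classes, so in Kotlin the class must be open
@State(Scope.Benchmark)
open class HighlightBenchmarkSketch {
    private val startTag = "<font style='background-color: #ffff00;'>"
    private val endTag = "</font>"
    private val filename = "KotlinIsFun.kt"        // made-up example input
    private val highlightIndexes = listOf(0, 6, 8) // made-up highlight indexes

    // Stand-in for the highlight function under test
    private fun highlight(source: String, indexes: List<Int>): String {
        val sb = StringBuilder(source)
        var offset = 0
        for (i in indexes.sorted()) {
            if (i < source.length) {
                sb.insert(i + offset, startTag)
                offset += startTag.length
                sb.insert(i + offset + 1, endTag)
                offset += endTag.length
            }
        }
        return sb.toString()
    }

    // Throughput mode reports operations per second: the higher the score, the better.
    // Returning the result keeps the JVM from optimizing the work away.
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    fun threeMatchesInFilename(): String = highlight(filename, highlightIndexes)
}
```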
So, here are the initial results showing how many operations complete in a second. There is quite a noticeable decrease in the score as the match count increases, which is to be expected because more matches require more operations.
Benchmark | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|
PerformanceTests.noMatchesInFilename | thrpt | 2 | 44580990.574 | | ops/s |
PerformanceTests.oneMatchInFilename | thrpt | 2 | 15314862.488 | | ops/s |
PerformanceTests.threeMatchesInFilename | thrpt | 2 | 6661081.448 | | ops/s |
PerformanceTests.sevenMatchesInFilename | thrpt | 2 | 2652866.574 | | ops/s |
Here is the function that I would like to optimize. It works by inserting tags into an HTML string, which then highlights the correct characters.
The tag used is a font style which sets the background color. Swing components do not support every HTML tag and I remember trying a lot of different things before ending up with this.
```kotlin
var startStyleTag = createStartStyleTag()
const val END_STYLE_TAG: String = "</font>"

private fun createStartStyleTag(): String {
    val color = JBColor.YELLOW
    return "<font style='background-color: ${colorAsHex(color)};'>"
}
```
```kotlin
fun highlight(source: String): String {
    val stringBuilder: StringBuilder = StringBuilder(source)
    var offset = 0
    val hlIndexes = score.highlightCharacters.sorted()
    for (i in hlIndexes) {
        if (i < source.length) {
            // Insert the start tag before the character and the end tag right after it,
            // tracking the offset created by the tags inserted so far
            stringBuilder.insert(i + offset, startStyleTag)
            offset += startStyleTag.length
            stringBuilder.insert(i + offset + 1, END_STYLE_TAG)
            offset += END_STYLE_TAG.length
        }
    }
    return stringBuilder.toString()
}
```
Before even taking a look at what I could do, let's see what Copilot thinks. I simply asked it to optimize the function, copy-pasted the result and measured again.
```kotlin
fun highlight(source: String): String {
    val hlIndexes = score.highlightCharacters.sorted()
    val parts = mutableListOf<String>()
    var lastIndex = 0
    for (i in hlIndexes) {
        if (i < source.length) {
            parts.add(source.substring(lastIndex, i))
            parts.add(startStyleTag)
            parts.add(source[i].toString())
            parts.add(END_STYLE_TAG)
            lastIndex = i + 1
        }
    }
    parts.add(source.substring(lastIndex))
    return parts.joinToString("")
}
```
Benchmark | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|
PerformanceTests.noMatchesInFilename | thrpt | 2 | 21140629.430 | | ops/s |
PerformanceTests.oneMatchInFilename | thrpt | 2 | 12419843.488 | | ops/s |
PerformanceTests.threeMatchesInFilename | thrpt | 2 | 3177763.124 | | ops/s |
PerformanceTests.sevenMatchesInFilename | thrpt | 2 | 1366734.188 | | ops/s |
Here's the explanation that Copilot gave me:

> To optimize the highlight function, we can use a StringBuilder more efficiently by collecting all the parts of the string and then joining them at the end. This avoids multiple insertions which can be costly.
This made everything slower. Instead of using a StringBuilder and appending the parts, Copilot wanted to store all of the individual strings in a list. That does not sound like it would be faster or more memory efficient: instead of using a StringBuilder to build a single String object, it creates and stores multiple distinct String objects.
Okay, so now I know what I can look for. Instead of adding tags for each character, I could group them and add the start and end tags once per group. This of course means that I want to add another test where we have the same number of characters to tag, but the grouping is different.
I decided to create separate `threeMatchesInFilenameInARow` and `threeMatchesInFilename` tests.
So, a new set of baseline benchmarks, where it is clear that no matter the grouping, three matches take about the same time. I'll also drop the no-matches case, because every time this function is called the string must have at least one match.
Benchmark | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|
PerformanceTests.oneMatchInFilename | thrpt | 2 | 15147366.761 | | ops/s |
PerformanceTests.threeMatchesInFilenameInARow | thrpt | 2 | 6757701.170 | | ops/s |
PerformanceTests.threeMatchesInFilename | thrpt | 2 | 6751191.472 | | ops/s |
PerformanceTests.sevenMatchesInFilename | thrpt | 2 | 2703818.535 | | ops/s |
Here's a comparison of what the results look like with and without grouping. I think it is pretty clear why this would be an improvement.

Without grouping:

```html
<font style='background-color: #ffff00;'>H</font><font style='background-color: #ffff00;'>e</font>llo
```

With grouping:

```html
<font style='background-color: #ffff00;'>He</font>llo
```
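The actual change lives in the Fuzzier repository, but as an illustration, grouping consecutive highlight indexes could look roughly like the sketch below. It reuses `score.highlightCharacters`, `startStyleTag` and `END_STYLE_TAG` from the original function, so treat it as my reconstruction rather than the exact code that was merged.

```kotlin
fun highlight(source: String): String {
    val stringBuilder = StringBuilder(source)
    var offset = 0
    val hlIndexes = score.highlightCharacters.sorted().filter { it < source.length }
    var i = 0
    while (i < hlIndexes.size) {
        val start = hlIndexes[i]
        var end = start
        // Extend the group as long as the next highlight index is consecutive
        while (i + 1 < hlIndexes.size && hlIndexes[i + 1] == end + 1) {
            i++
            end = hlIndexes[i]
        }
        // Insert a single start tag before the group and a single end tag after it
        stringBuilder.insert(start + offset, startStyleTag)
        offset += startStyleTag.length
        stringBuilder.insert(end + offset + 1, END_STYLE_TAG)
        offset += END_STYLE_TAG.length
        i++
    }
    return stringBuilder.toString()
}
```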
This had the exact effect that I was looking for: by reducing the number of tags inserted, we make processing of sequential characters faster. It does slow down when there are no sequential matches, but I believe that some sequences are to be expected in the most common cases.
Benchmark | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|
PerformanceTests.oneMatchInFilename | thrpt | 2 | 15262031.008 | | ops/s |
PerformanceTests.threeMatchesInFilenameInARow | thrpt | 2 | 10133267.856 | | ops/s |
PerformanceTests.threeMatchesInFilename | thrpt | 2 | 5662795.456 | | ops/s |
PerformanceTests.sevenMatchesInFilename | thrpt | 2 | 3672124.419 | | ops/s |
Here is a better look at the changes between the two implementations. For such a small change, I think it is a good one.
Test Name | Initial ops/s | Optimized ops/s | Change in ops/s | % Change |
---|---|---|---|---|
PerformanceTests.oneMatchInFilename | 15147366.76 | 15262031.01 | 114664.25 | 0.76% |
PerformanceTests.threeMatchesInFilenameInARow | 6757701.17 | 10133267.86 | 3375566.69 | 49.95% |
PerformanceTests.threeMatchesInFilename | 6751191.47 | 5662795.46 | -1088396.02 | -16.12% |
PerformanceTests.sevenMatchesInFilename | 2703818.54 | 3672124.42 | 968305.88 | 35.81% |
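For reference, the % change column is simply the relative difference between the two runs. A quick sketch of the calculation, using the `threeMatchesInFilenameInARow` numbers from the table above:

```kotlin
fun percentChange(initial: Double, optimized: Double): Double =
    (optimized - initial) / initial * 100.0

fun main() {
    // 6757701.17 -> 10133267.86 ops/s
    println(percentChange(6757701.17, 10133267.86)) // prints roughly 49.95
}
```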
I think this has gone on for long enough and this article is getting a bit too long. This is not the only issue that I had with the highlighting and I think it can still be improved, but it is a start.
Setting up JMH was a considerable amount of work, but honestly I feel like it was a good investment. Fuzzier is a file finder plugin and it needs to be fast to feel good. Having the framework ready for further iteration and being able to see the results in a readable form will help me in the future.
I didn't even get to JMeter here, but maybe at some point in the future I could look at how to integrate it to measure the scoring and matching.
If you have any tips, questions or anything else, feel free to contact me at [email protected].