Maybe somewhat unrelated. I ran into an issue while optimizing my game: some piece of code runs super fast in a micro-benchmark, and I think to myself:
Congratulations, Usnul, you are adequate.
But when I run the same code in the context of the game, together with all of the other messy parts - it suddenly does not behave the same way.
Being the amazing software god that I am, I realize that there is a CPU cache, and that my effort spent optimizing a piece of code in isolation translates to complete garbage when it runs together with other stuff that competes for cache space.
I’ve had a lot of cases in the past where I would optimize the heck out of something in a benchmark, only to realize that a super-naive approach with no fancy data structures or algorithms would run faster given a broader dataset or a competitive context (other code contending for the same resources).
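To make that concrete, here's a rough TypeScript sketch of the effect - not my game code; every name in it (`tableSin`, `thrashCache`, `bench`) is invented, and it assumes Node.js for `performance.now()`. The table lookups are fast in isolation, and slower once something else keeps evicting the table from the cache:

```typescript
// Minimal sketch (invented names): a lookup-table "optimization" that looks
// great in isolation and worse under cache pressure.
import { performance } from "node:perf_hooks";

const TABLE_SIZE = 1 << 20; // 2^20 doubles = 8 MB, larger than most L2 caches
const sinTable = new Float64Array(TABLE_SIZE);
for (let i = 0; i < TABLE_SIZE; i++) {
  sinTable[i] = Math.sin((i / TABLE_SIZE) * Math.PI * 2);
}

// The "fast" version: replaces Math.sin with a table lookup.
function tableSin(x: number): number {
  const index = ((x / (Math.PI * 2)) * TABLE_SIZE) & (TABLE_SIZE - 1);
  return sinTable[index];
}

// Stand-in for "all of the other messy parts": walks a big buffer one
// cache line at a time, evicting the sine table from the cache.
const noise = new Float64Array(1 << 20);
function thrashCache(): void {
  for (let i = 0; i < noise.length; i += 8) noise[i] += 1;
}

// Times only the lookup batches, so the thrashing itself is not counted.
function bench(label: string, contended: boolean): void {
  let sum = 0;
  let lookupMs = 0;
  for (let batch = 0; batch < 100; batch++) {
    if (contended) thrashCache();
    const start = performance.now();
    for (let i = 0; i < 10_000; i++) {
      sum += tableSin((batch * 10_000 + i) * 0.001);
    }
    lookupMs += performance.now() - start;
  }
  console.log(label, lookupMs.toFixed(2), "ms in lookups", sum);
}

bench("isolated ", false); // the flattering micro-benchmark
bench("contended", true); // closer to the real game loop
```

Same function, same inputs - the only thing that changed is what else touched the cache in between. Exact numbers will vary wildly by machine.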
So… profiling. You know that something runs slow, but often have no clue as to why. For example, I have had 20-30% performance gains in some scenarios by simply removing tiny memory allocations that were unrelated to the code that was actually showing up as slow in the profiler. This resulted in lower memory fragmentation, less cache thrashing, smaller GC overhead or… something? The point is - it was faster.
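For flavor, here's a hedged sketch of the kind of change I mean - made-up names (`Vec3`, `addInto`, `updateFrame`), not the actual code. The hot math stays identical; only the per-frame garbage goes away:

```typescript
interface Vec3 { x: number; y: number; z: number; }

// Before: allocates a fresh object on every call. A few thousand of these
// per frame add up to GC pressure and a heap full of short-lived objects.
function addAllocating(a: Vec3, b: Vec3): Vec3 {
  return { x: a.x + b.x, y: a.y + b.y, z: a.z + b.z };
}

// After: the caller passes a reusable output object, so the steady state
// allocates nothing.
function addInto(a: Vec3, b: Vec3, out: Vec3): Vec3 {
  out.x = a.x + b.x;
  out.y = a.y + b.y;
  out.z = a.z + b.z;
  return out;
}

// One long-lived scratch object, reused every frame instead of reallocated.
const scratch: Vec3 = { x: 0, y: 0, z: 0 };

function updateFrame(position: Vec3, velocity: Vec3): void {
  // The allocating version: one throwaway object per call, per frame.
  // const next = addAllocating(position, velocity);

  // The allocation-free version: same math, zero garbage.
  const next = addInto(position, velocity, scratch);
  position.x = next.x;
  position.y = next.y;
  position.z = next.z;
}
```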
Another thing about profiling: it’s the observer effect.
When your profiler gazes into the code, the code gazes back at the profiler (c)
You sometimes get incredible (literally) results with and without the profiler attached. This has gotten better over the years, but it still holds true. For example, I run a profiler for 1 minute on a piece of code that calls function X 1000 times per frame, and after that minute the profiler tells me that X had 0ms of total runtime.
Ok, Profiler. Whatever you say, Profiler.
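These days, when the profiler tells me something that absurd, I cross-check it with a dumb manual timer. A rough sketch - assuming `performance.now()` is available, as in browsers and Node; `timed` and `totals` are names I just made up:

```typescript
// Crude cross-check for suspicious profiler numbers: wrap the hot function
// and accumulate wall-clock time by hand. Coarse, but no sampling blind spots.
const totals = new Map<string, { calls: number; ms: number }>();

function timed<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => R
): (...args: A) => R {
  return (...args: A): R => {
    const start = performance.now();
    try {
      return fn(...args);
    } finally {
      const entry = totals.get(name) ?? { calls: 0, ms: 0 };
      entry.calls += 1;
      entry.ms += performance.now() - start;
      totals.set(name, entry);
    }
  };
}

// Usage: wrap the suspect function once, run for a minute, dump the totals.
// const x = timed("X", originalX);
// console.table(Object.fromEntries(totals));
```

The wrapper itself perturbs the measurement, of course - the code gazing back again - but at least it fails in ways I understand.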