Measure Measure Measure

Consider the following real-life problem: you are assigned the task of writing a program that keeps a collection of objects, sorted according to some criterion, and at any given time must be able to produce a list of the top N objects in that collection. Your program will receive events that modify the objects (and therefore trigger re-ordering of the list), and the processing of these events is time-critical. How would you implement it? What algorithms and data structures would you use?

My instinct, and I think that most programmers share this tendency with me, is to start thinking in Computer Science terms when we hear this question. We iterate over all the data structures we know, mentally compare their performance, try to think about sorting algorithms, weigh the options and…

Wait! Is this the right approach for this problem?

I’d argue that it certainly isn’t. All of the facts we know about our data structures and algorithms are true only under certain conditions. Take sorting performance, for instance. Everyone knows that insertion sort has horrible worst-case performance (O(n^2)), so we shouldn’t use it, right? But if our array is small enough, and the data is already mostly sorted, it is superior to other algorithms, so maybe for this problem it is better than any other option? What about data structures? Different implementations of the same data structure differ significantly in performance. And how does your preferred solution behave on the production machine?
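To make the insertion-sort point concrete, here is a minimal sketch (the class and method names are my own, not from any library): the inner loop only runs when an element is out of place, so on an already-sorted array the whole pass is O(n), which is exactly why it can win on small, nearly-sorted inputs.

```java
import java.util.Arrays;

public class InsertionSortDemo {
    // Classic insertion sort: O(n^2) in the worst case, but close to O(n)
    // when the input is already (or almost) sorted, because the inner
    // while-loop body almost never executes.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            // Shift larger elements one slot to the right.
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 9, 1, 5, 6};
        insertionSort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 5, 5, 6, 9]
    }
}
```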

This is why I think the best way to approach this problem is to write a small benchmark application that compares several approaches on typical data and sample sizes, using different data structure implementations and algorithms, and to run it, if possible, on the machine that will eventually host the final product.

Programmers (and especially those of us who work in the field of performance measurement) tend to think that they know where the performance issues in an application will be, and that they can prepare for them in advance.

Well, they are wrong.

I’ve had the pleasure of doing performance testing and improvement several times in my career, and quite ironically, I’ve achieved the biggest improvements by undoing the damage that other programmers did when they thought they were writing some really clever, lightning-fast code[1].

Now, don’t get me wrong – I’m not saying that knowing your stuff isn’t important. It is. But real-life application behavior is just too complicated to be evaluated on a whiteboard, and your performance issues will always be where you least expect them.

Stay humble, and measure measure measure.

[1] Seriously, people? Writing your own (incorrect) versions of data structures? Re-implementing String.equals, which is one of the most optimized pieces of code in the JVM? What were you thinking?!


2 thoughts on “Measure Measure Measure”

  1. So here’s a question for you:
    Recently was asked to help design for a new feature. It was clear we have insufficient infrastructures to support such feature (that was a given). The question is what new infrastructure do we want, given that performance is crucial? Well, that depends on the algorithm we choose (among other things). But we should also decide if we are going with shared memory or a messaging system. Are we using a token ring, one-to-many communication or are we setting “leader” to over-see the system?

    Well, obviously, we can’t implement all of these options and “measure, measure, measure” (actually, I’m not sure we even have the resources to implement one of these options – it depends on prioritization).

    So the question is (at last): at which point do we start making changes and measuring their effects?

    1. Nice question.

      The ideal option would be to write little mock-ups of the architectures you want to evaluate, and check how they behave under the expected conditions. If you think that performance is the top priority, that this decision will have a great impact on it, and that you will not get a chance to iterate on the design, I believe that spending a little more time on testing will save trouble in the long run.

      But what normally happens, as far as I can tell, is that people just choose something that seems reasonable, and then change the architecture later, after the problems arise (which is also a way to measure LOL). If the program is designed properly (encapsulation, separation of concerns, etc.), this is usually possible, but of course, in that case you end up writing the infrastructure twice.

      The exception is when you already have prior information on how different architectures behave. For instance, in my company we know that certain parts of the system create CPU load, others consume memory, and several trade off between the two, and we know the pitfalls we may encounter if we apply them (because we have had several architectures and moved things around, BTW). In this case you can make a more informed decision without actually testing, but let me tell you a little secret – we still make mistakes and have to fix them post-mortem, in the performance improvement stage.
