Consider the following real-life problem: you are tasked with writing a program that keeps a collection of objects, sorted according to some criteria, and at any given time you should be able to provide a list of the top N objects in that collection. Your program will receive events that modify the objects (and therefore trigger re-ordering of the list), and the processing of these events is time-critical. How would you implement it? What algorithms and data structures would you use?
My instinct, and I think that most programmers share this tendency with me, is to start thinking in Computer Science terms when we hear this question. We iterate over all the data structures we know, mentally compare their performance, try to think about sorting algorithms, weigh the options and…
Wait! Is this the right approach for this problem?
I’d argue that it certainly isn’t. In this case, all of the facts we know about our data structures and algorithms are only true under certain conditions. Take sorting performance, for instance. Everyone knows that insertion sort has horrible worst-case performance (O(n^2)), so we shouldn’t use it, right? But if our array is small enough, and the data is already mostly sorted, it beats the asymptotically superior algorithms, so maybe for this problem it is better than any other option. What about data structures? Different implementations of the same data structure differ significantly in performance. And how does your preferred solution behave on the production machine?
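To make the insertion sort point concrete, here is a minimal sketch of the classic algorithm. The key property is in the inner loop: on input that is already sorted (or nearly so), the while condition fails almost immediately, so the whole pass is close to O(n) with very small constant factors.

```java
import java.util.Arrays;

public class InsertionSortDemo {
    // Classic insertion sort: O(n^2) in the worst case, but close to O(n)
    // on already-sorted input, with very low constant factors.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            // Shift larger elements right; on sorted input this loop
            // body almost never executes.
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        int[] nearlySorted = {1, 2, 4, 3, 5, 6};
        insertionSort(nearlySorted);
        System.out.println(Arrays.toString(nearlySorted)); // [1, 2, 3, 4, 5, 6]
    }
}
```

This is, incidentally, why many real-world sort implementations (including the JDK's dual-pivot quicksort) fall back to insertion sort for small sub-arrays.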
This is why I think the best way to tackle this problem is to write a small benchmark application that compares several approaches, with typical data and sample sizes, using different data structure implementations and algorithms, and to run it, if possible, on the machine that will eventually host the final product.
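Such a benchmark might start as small as the sketch below, which compares two candidate approaches to the top-N problem: a bounded min-heap versus sorting everything and taking the tail. The class and method names here are my own invention for illustration, the timing loop is deliberately crude, and for serious measurement you would want a proper harness such as JMH that handles warm-up and JIT effects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

public class TopNBenchmark {
    // Approach 1: bounded min-heap keeping only the n largest values seen.
    static List<Integer> topNWithHeap(int[] values, int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : values) {
            heap.offer(v);
            if (heap.size() > n) heap.poll(); // evict the current minimum
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Comparator.reverseOrder());
        return result;
    }

    // Approach 2: the naive alternative — sort everything, take the tail.
    static List<Integer> topNWithFullSort(int[] values, int n) {
        int[] copy = values.clone();
        Arrays.sort(copy);
        List<Integer> result = new ArrayList<>(n);
        for (int i = copy.length - 1; i >= copy.length - n; i--) {
            result.add(copy[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sanity check on a tiny fixed input: both should print [9, 7, 5].
        int[] sample = {5, 1, 9, 3, 7};
        System.out.println(topNWithHeap(sample, 3));
        System.out.println(topNWithFullSort(sample, 3));

        // Crude timing on data that should resemble production input.
        int[] data = new Random(42).ints(1_000_000).toArray();
        long t0 = System.nanoTime();
        topNWithHeap(data, 10);
        long t1 = System.nanoTime();
        topNWithFullSort(data, 10);
        long t2 = System.nanoTime();
        System.out.printf("heap: %d ms, full sort: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

The point is not which approach wins here, but that the answer depends on N, the event rate, and the hardware, which is exactly why you measure instead of guessing.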
Programmers (and especially those of us who work in the field of performance measurement) tend to think that they know where the performance issues in an application will be, and that they can prepare for them in advance.
Well, they are wrong.
I’ve had the pleasure of doing performance testing and improvement several times in my career, and quite ironically, I’ve achieved the biggest improvements by undoing the damage that other programmers did when they thought they were writing some really clever, lightning fast, code.
Now, don’t get me wrong – I’m not saying that knowing your stuff isn’t important. It is. But real-life application behavior is just too complicated to be evaluated on a whiteboard, and your performance issues will always be where you least expect them to be.
Stay humble, and measure measure measure.
Seriously, people? Writing your own (incorrect) versions of data structures? Re-implementing String.equals, which is one of the most optimized pieces of code in the JVM? What were you thinking?!