A while ago I was asked by a colleague the works on a different part of our product, what is bytecode instrumentation and how it is done. Bytecode instrumentation is a technique for changing the code of compiled Java applications, either before running them – on the disk, or “on the fly” as they are loaded into memory.
Why would you want to do something like that? The company I work for is an APM vendor. An APM product will help you assess your application’s performance by showing you, for each business transaction, how it flows across application tiers, how much time it spends in each one, and hopefully, by providing some insight as for why it behaves the way it does. So in essence, we make a “thin and wide” profiler for production time – thin because it has less features for each technology than a real profiler, and wide because it covers a broad set of technologies, and the interfaces between them. Bytecode instrumentation is one of the methods used to extract profiling information from applications, so naturally – we have much use for it.
We use it to add monitoring capabilities to applications without having to modify the monitored code manually. Code coverage tools may use this technique to inject tracing calls into the tested code, and then analyze the collected data in order to generate testing coverage reports. Frameworks may use instrumentation to implement some of their services.
It may seem like a wild idea at first – an application that modifies itself during runtime? Sounds like a script for a sci-fi movie! but it makes a lot of sense if you consider the way the JVM works.
Java is a dynamically loaded language. Every class is loaded lazily when needed using a component called a class loader. The class loader looks for the class code, reads it, and passes it to the JVM which links it with the rest of the code that is already loaded. This behavior makes Java very flexible – selection of concrete classes is deferred to the last possible moment, and the code itself can be brought from exotic places of all sorts.
Classes are stored on the disk in .class files, which are just a binary dumps of the memory structure that describes the class’s properties, and the code that it implements. The compiled code itself is kept in bytecode format – an intermediate language, which resembles assembly language. You can see the bytecode of a compiled class, as well as some of the other data stored in the compiled file, using the disassembly tool that comes with the JDK: javap (be sure to use the “-v” option in order to see the bytecode):
Normally, the class loader just reads the class file from wherever it is stored and passes it to the JVM as an array of bytes, but since it is just a structure in a well-defined and rather simple format, we can reason about the code and do some sophisticated things with it prior to that. In our case, the code is modified at that stage, right before the class is defined by the JVM.
There are several ways to “hook” this instrumentation mechanism into the JVM. You could implement your own special-purpose class loader and load parts of the code with it, you could instrument the JRE’s class loader on the disk, and have it call into your own code, and in Java 1.5 and up – there is a special instrumentation API that gives you access to the bytecode of every class right before it is defined. For the bytecode modification itself, you can use open source frameworks, like ASM and BCEL (which is somewhat outdated) that take care of some of the more annoying technical details involved in this process.
And this is bytecode instrumentation in a nutshell – seems magical at first, but really simple once you realize how it is done.
 And .Net applications, although the process differs from what I describe in this post
 Application Performance Management
 I like to call it pointing fingers scientifically
 Network locations, dubious coding sweatshops, whatever…