Tuesday, April 19, 2016

Why, when and how you should avoid using agent mode



So I thought I would write a short ironic post today.

Java agents are great for bytecode instrumentation, but as intrusive as they are, they still sometimes fall short of their goal. They also come with a certain overhead in resources and configuration maintenance, since they require modifications and updates on the monitored JVM's side. I could summarize this thought by saying that they cause a higher "TCO" than the simple, risk-free, collector-mode sampling approach.

Or I could illustrate it in a more explicit way. Depending on your context, using instrumentation might be like owning a Ferrari. Even if you liked sports cars and had enough money to buy one, renting a Ferrari for a day or two when you want to race or take a drive down the coast might make more sense than owning one, with all the ramifications that implies. I feel like I'm still not nailing this analogy entirely, though, since most Ferrari customers are probably not the most pragmatic and rational buyers, and they probably don't care about efficiency when it comes to their sports car.

But the reason this post is slightly ironic is that I just uploaded a YouTube tutorial today showing how you can use djigger in agent mode for instrumentation purposes.

Obviously, in many cases you do need instrumentation. And in certain cases you'll want it on at all times, for instance if you end up building your business insights on top of it. But is that always a requirement, or is there a way you could use sampling data to come up with the same, or "good-enough", information to understand and solve your problem?

I've partially covered this topic in my simulated "Q&A" session, but I felt I needed to explain myself a little more and illustrate my point with a recent example.

Here's an upcoming feature in djigger (which should be published some time this week in R 1.5.1) that will allow you to answer the classic question: "is method X being called very frequently, or do just a few calls take place, each lasting a long time?"

This is the question you'll ask (or have asked) yourself almost every time after sampling a certain chunk of runtime behavior. In theory, the nature of the sampling approach and the logic behind stack trace aggregation (explained here) leave us blind to that information: we "lose" it.

However, there is a way to extract a very similar piece of information out of the stacktrace samples. Here's how.

When sampling at a given frequency, say every 50 ms, method calls lasting less than 50 ms can sometimes be "invisible" to the sampler. However, every time a stack trace changes compared to the previous snapshot (i.e., a sub-method is called, the method call itself finishes, or a different code line number is on the stack), you know for sure that if you find that method or code line number again in one of the next snapshots, the call count must have increased by at least one.

This is what we call computing a min bound for method call counts. And we're very excited about releasing this feature, as answering that question is one of the primary reasons people reach for instrumentation.
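To make the idea concrete, here is a minimal sketch in plain Java of the simplest sound variant of this technique. This is illustrative only, not djigger's actual code, and the class and method names are made up: it samples all threads via ThreadMXBean and increments a counter every time the target method shows up on a stack where it was absent in the previous snapshot, which proves that at least one new call has started.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch only (not djigger's actual implementation).
 * Derives a lower bound ("min bound") on the number of calls to a target
 * method from periodic stack trace samples: whenever the method is seen on
 * a thread's stack after a snapshot in which it was absent, at least one
 * new call must have begun, so the counter is incremented.
 */
public class MinCallCountEstimator {

    private final ThreadMXBean threadMx = ManagementFactory.getThreadMXBean();
    // Whether the target method was on each thread's stack in the previous sample
    private final Map<Long, Boolean> wasOnStack = new HashMap<>();
    private long minCallCount = 0;

    private final String targetClass;  // hypothetical, e.g. "com.example.MyService"
    private final String targetMethod; // hypothetical, e.g. "processRequest"

    public MinCallCountEstimator(String targetClass, String targetMethod) {
        this.targetClass = targetClass;
        this.targetMethod = targetMethod;
    }

    /** Invoke once per sampling interval, e.g. every 50 ms. */
    public synchronized void sample() {
        for (ThreadInfo info : threadMx.dumpAllThreads(false, false)) {
            boolean onStack = containsTarget(info.getStackTrace());
            boolean previously = wasOnStack.getOrDefault(info.getThreadId(), false);
            // Absent in the previous snapshot, present now: a new call must have begun.
            if (onStack && !previously) {
                minCallCount++;
            }
            wasOnStack.put(info.getThreadId(), onStack);
        }
    }

    private boolean containsTarget(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().equals(targetClass)
                    && frame.getMethodName().equals(targetMethod)) {
                return true;
            }
        }
        return false;
    }

    public synchronized long getMinCallCount() {
        return minCallCount;
    }
}
```

The counter can only undershoot: calls that start and finish entirely between two snapshots, or back-to-back calls that keep the method continuously on the stack, go undetected, which is exactly why it's a min bound rather than an exact count. The heuristic described above also exploits code line number changes to catch some of those cases; I've left that out of the sketch for simplicity.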

Again, to be clear: we have nothing against instrumentation, and we offer instrumentation capability ourselves through our own Java agent. However, there are numerous reasons (simplicity, overhead, risk, speed of analysis, etc.) why we love being able to refine our "data mining" logic at the level of the sampling results.

My next YouTube tutorial will either provide in-depth coverage of that functionality or cover collector mode. Either way, I can't wait to show you more of the benefits of djigger. I will also try to wrap up part 2 of my silly benchmark tomorrow, so I can tell for sure what the impact of the network link looked like (see my previous entry on this blog).

Until then, I'm signing off from Le Bonhomme.
