OpenSolaris

Profile: The Team Behind DTrace

by chromatic (chromatic@oreilly.com)

June 10, 2005

One of the most powerful and eagerly anticipated features of Solaris 10 is DTrace. It's useful for administrators and developers to profile and debug applications. The DTrace team is Bryan Cantrill, Adam Leventhal, and Mike Shapiro. Recently, OpenSolaris.org interviewed the team about DTrace, its history, and its future.

Where did the idea of DTrace come from? Profiling is an old idea and instrumenting code isn't new, but I find your work plenty amazing.

Bryan Cantrill: Actually, dynamically instrumenting production systems is new with DTrace: no other system for dynamic instrumentation views safety as an absolute, non-negotiable constraint. For a more specific review of prior work, see our USENIX paper (on DTrace).

The origins of DTrace reach back nearly a decade, when Mike and I were both undergraduates at Brown. In a conversation with my advisor in 1996, I asked him why systems didn't support dynamic instrumentation--and I sketched out for him some of the very early ideas for what would become DTrace. His response was that if such a thing were possible, it would have been done already--and that the techniques I was suggesting must not work for some reason. I couldn't see why the ideas wouldn't work, but I didn't question that there was something I was missing. Later that year, when interviewing with the Solaris Kernel Development group at Sun, I posed the same question to Solaris engineer Jeff Bonwick (now the lead engineer on ZFS). Jeff responded that he couldn't see why it wouldn't work, and that it seemed like a good idea. That was the moment that I knew I wanted to come to Sun: there was (and is) an idea that nothing was impossible simply because it hadn't been done before.

The next year, Mike (who had stayed at Brown for a master's degree) joined Solaris Kernel Development, and we began to talk actively about DTrace. In talking about it, we realized that there were more pressing problems in Solaris; at least for the moment, DTrace was shelved. But as the years went on, Mike and I continued to think about DTrace, and we began to have a clearer and clearer vision for what we wanted DTrace to become. This vision came purely out of experience: when we muddled through a tough system performance problem, or tried to debug some nasty transient condition, we envisioned a framework that would let us understand the problem in minutes instead of hours. It got to the point where one of us would say to the other "damn, I really needed DTrace today"--which is a little absurd given that we didn't have so much as a line of code for this supposed miracle tool! (Worse, when others were complaining about the difficulty in tracking down some problem or another, one of us would chime in with "you know, DTrace solves that problem ...") By 1999, we knew we had to start formally thinking about DTrace. Various crises intervened, however, and it wasn't until the fall of 2001 that we were able to finally start working on it. By the spring of 2002, we had a prototype that was exciting enough to prove that our ideas had merit, but we also realized that we had bitten off more than we might be able to chew. We asked Adam--a promising young engineer who had joined Sun in 2001 (also, as it happens, from Brown)--to join the team as the third engineer, and the three of us have been working more or less full-time on it ever since.

What can't you do with DTrace right now? (It's okay to say "It's really hard to run it on itself, but everything else is fair game.")

BC: DTrace can't currently instrument languages that have dynamic program text: Java, Python, Perl, PHP, etc. This is not an easy problem to solve: these languages have not been designed or implemented with dynamic instrumentation in mind, and the techniques required to instrument them are often very specific to a particular language and its run-time environment. Solving this problem still lies in the indefinite future for us--the next year or so will be spent evolving our existing functionality--but we very much intend to solve it.

In what sense is this difficult?

BC: In what sense is this not difficult?

I'm curious if you mean that instrumenting JITted code from the JVM or Parrot is difficult, or if the granularity of ops in Perl, Python, or PHP is too high above processor instructions to do much good.

BC: Both. Instrumenting JITted code is difficult because the virtual machine needs to know what you're up to at some level--even if only enough to know to leave your instrumentation alone. Once the code is instrumented, the granularity of operations makes it difficult to tie that instrumentation back to something meaningful. To give you a simple, concrete example: Perl 5 doesn't associate line numbers with parse tree nodes unless it is explicitly started in a debugging mode. By discarding that information, Perl makes it very difficult to [connect] Perl-induced system activity (I/O, CPU utilization, network activity, etc.) to the specific body of Perl that is inducing it. Programs written in C and C++ generally discard their debugging information too, but these languages are so much closer to the operating system and the underlying microprocessor that we can generally still draw meaningful inferences. Not so for Perl, Java, Python, PHP, etc.

What do you have in mind to solve it?

BC: We're still getting a grip on the problem. We know that the instrumentation techniques will be VM-specific, and that any mechanism for getting at higher-level data will also be VM-specific. Given the high level of VM specificity, Parrot is clearly attractive to us: because Parrot can run multiple languages, we'll get quite a bit of leverage by extending just Parrot to become a DTrace provider.

So even in the case where Parrot, for example, supports both high-granularity behavior thanks to its opcodes hiding a lot of complexity under the hood and JITting to platform-specific code, there's still a lot of distance between the high-level language and the code the processor accesses and that's what makes this difficult?

Mike Shapiro: There are basically two categories of issues, as Bryan mentioned: one is that in a VM environment program instructions are either dynamically created and destroyed (i.e., a JIT) or they are simply data that the VM itself acts on (i.e., an interpreter). Some VMs do a mixture of both. So for issue #1, we can't just blithely and generically hot-instrument code from DTrace (as we do for kernel and user processes at the ABI level); we must explicitly interface with a VM and have it act as our proxy when we wish to instrument something.

The second area of challenge is relating the language-specific details of the VM (e.g., a Perl line number, or a Java class file, line number, and/or method name, etc.) to the user of DTrace in an appropriate fashion. As every VM is implementing some language with differing semantics and constructs, we need to both be able to retrieve relevant information (e.g., the line number example that Bryan gives) and also present some unified model for those semantics to users of DTrace. That is a hard design problem. To put it another way, DTrace can't export the union of all known languages: it has to provide some underlying simplified abstractions that don't require DTrace to be revised every time someone implements a new language on top of Parrot.

Does it help that Parrot works on pcode (or bytecode, colloquially) rather than optrees, or is the hard work not on where to put the instrumentation?

MS: The VM needs to either put in the instrumentation itself or tell us where to do so, as we discussed above. But that requires the ability for a user of DTrace to specify a semantic location in a way that is meaningful to someone using the language that Parrot is executing.

BC: But as I said earlier, we've got plenty of work to do just fleshing out what we've already got; it's going to be another nine months before we start thinking about this in earnest.

How has the rest of the Solaris team reacted to DTrace? Is there good uptake?

BC: Oh yes. When you develop a tool that allows people to do their jobs orders of magnitude faster, you find that you have to do very little in the way of explicit evangelizing. By the time we integrated DTrace into Solaris, we had hundreds of users inside of Sun using our prototype as part of their day-to-day work--and our prototype was even being used to run portions of Sun's production infrastructure! In fact, some projects viewed DTrace as so essential that they based their project gate off of our project gate--which is to say that their project was not a child of Solaris, but rather a child of DTrace (which was itself a child of Solaris). It's hard to imagine a stronger vote of confidence from one's peers.

Do you find yourself giving informal tutorials in the hallways and in conference rooms?

BC: We found that the best tutorials were the bug reports in which we had used DTrace to solve an actual problem. This showed people how to use DTrace--and made it clear that DTrace could be used to solve real problems. From this, a core group of users formed, and as the number of users increased, these savvier and more experienced users helped the new ones. In fact, the first piece of comprehensive DTrace documentation was not written by us (we were too busy implementing new features and fixing older ones!) but rather by one of our users.

What's the inspiration for the user-level syntax? It looks a bit like XPath to me.

BC: I suppose the probe description syntax looks a little like XPath (if only because we use the colon as a delimiter), but the inspiration is much more old school: C and awk.
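
For readers who have not seen the syntax, here is a minimal sketch (in Python, not D) of how a colon-delimited probe description breaks down. It assumes the documented four-field form provider:module:function:name, where an empty field matches anything and, when fewer than four fields are given, the fields are read from right to left. The names ProbeDescription and parse_probe are hypothetical helpers made up for this illustration; they are not part of DTrace, and the sketch ignores wildcard patterns and everything else the real D compiler handles.

```python
from typing import NamedTuple


class ProbeDescription(NamedTuple):
    """The four colon-delimited fields of a probe description."""
    provider: str
    module: str
    function: str
    name: str


def parse_probe(description: str) -> ProbeDescription:
    """Split a probe description into its four fields.

    Hypothetical helper for illustration only. An empty string stands
    for an unspecified field, which matches any value.
    """
    fields = description.split(":")
    if len(fields) > 4:
        raise ValueError(f"too many fields in probe description: {description!r}")
    # Omitted fields are the leading ones, so pad on the left:
    # "read:entry" means function "read", name "entry".
    padded = [""] * (4 - len(fields)) + fields
    return ProbeDescription(*padded)


if __name__ == "__main__":
    for text in ("syscall::read:entry", "io:::start", "read:entry"):
        print(text, "->", parse_probe(text))
```

Padding on the left is the one design choice worth noting: it is what makes a short description such as read:entry name a function and a probe name rather than a provider and a module.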

One of the problems I've seen in VMs is that JIT is running into operating systems that wisely mark some pages as read-but-do-not-execute. Have you encountered this?

MS: That is either a bug in an OS or a bug in a JIT. On Solaris, if a process wishes to dynamically generate machine code and execute it, it needs to call the mmap(3C) or mprotect(3C) interfaces and specify PROT_EXEC to tell the kernel that executable instructions will be put there. Some old UNIX programs assumed implicitly that the stack or data segments were executable, which has conflicted with recent security trends like making stacks non-executable to limit buffer overflow attacks. In any case, these issues don't affect DTrace and are just artifacts of certain JIT implementations.

What's next for DTrace?

BC: Currently, to use DTrace most effectively, one must often have at least passing familiarity with the application being instrumented. We have addressed this problem in the kernel by introducing providers with stable semantics: the io provider makes available probes relating to I/O, the sched provider makes available probes relating to CPU scheduling, and so on. So the most immediate future direction for DTrace is the mechanism that will allow applications to export their own providers with stable semantics. This will allow the system to be instrumented in ways that reflect the system's semantics, allowing one to tie together activity in an application (say, the start of a database transaction) with activity elsewhere in the system (say, the I/O induced by that transaction) without having to know the implementation of either the database or the operating system.

In the more indefinite future, we want to expand DTrace to those parts of the system that are not currently instrumentable with DTrace. As long as there's an answer to your question about "what can't you do with DTrace," we're not done.

What responses and feedback have you received after the January release of the DTrace source code?

BC: I don't think too many have read the DTrace source per se--I think people (rightly) view it primarily as an expression of the seriousness of our intention to open source Solaris.

Do you have other ideas in mind after DTrace, or is this your current long-term project for the time being?

BC: Yes. ;)

What do you have in mind after DTrace, if anything?

BC: My answer was actually accurate, if flip: we definitely have other ideas in mind after DTrace (we are always harboring ideas--small and large--on how to improve the system), but for the time being our current project is DTrace. There is enough left to do in DTrace to keep us busy for at least another couple of years; after that ... we'll keep you posted.

chromatic is the technical editor of the O'Reilly Network.