Intel’s VTune has a long-standing reputation as one of the better
tools for application analysis — at least for applications headed
for Intel-based systems. I hadn’t touched it since version 3.5,
and I was curious to see what had improved in version 6. I wasn’t
disappointed.
I had actually used the Xbox version of VTune only a few months earlier,
and the install process had been more than a little painful, so I was
a bit nervous. However, apart from a few redundant reboots and a small
problem recognizing the installed version of Flash, everything went smoothly.
I created a quick wizard project, turned on call graphing, and off it
ran. After 20 seconds, the project stopped, reran, and then did it again.
Was this a bug? On reading the tutorial further, I discovered that the
first pass was a calibration pass, the second pass was the actual samples,
and the final pass was call graphing. Simple enough, but why did each
run last only 20 seconds? That turns out to be the default execution time,
which won’t work well if those 20 seconds are spent loading; fortunately,
you can modify the activity and increase the time. The calibration pass
is needed even without call graphing, so I manually aborted it after
a minute or two and then left the main sampling pass running for a few minutes.
There is also a simple option for the project to launch an application with
sampling paused and have the user resume it manually. You can also add
hooks to your own code that send resume/pause messages through a
vtuneperformance.dll. This came in very handy for isolating samples to
specific areas (for example, if the frame rate drops, call a resume function
to start logging samples).
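
The frame-rate example above can be sketched roughly as follows. This is a minimal illustration, not VTune's actual API: `profiler_resume` and `profiler_pause` are hypothetical stand-ins for whatever resume/pause entry points the DLL exposes, and `SamplingTrigger`, the frame budget, and the "calm frames" count are assumptions made for the example.

```cpp
// Hypothetical stand-ins for the resume/pause entry points. In a real
// build these would forward to whatever the profiler DLL exports; here
// they only count calls so the sketch is self-contained.
static int g_resume_calls = 0;
static int g_pause_calls = 0;
void profiler_resume() { ++g_resume_calls; }
void profiler_pause()  { ++g_pause_calls; }

// Resumes sampling when a frame runs over budget, and pauses again once
// the frame time has stayed within budget for `calm_frames` frames in a row.
class SamplingTrigger {
public:
    SamplingTrigger(double budget_ms, int calm_frames)
        : budget_ms_(budget_ms), calm_frames_(calm_frames) {}

    // Call once per frame with the measured frame time in milliseconds.
    void on_frame(double frame_ms) {
        if (frame_ms > budget_ms_) {
            calm_run_ = 0;
            if (!sampling_) {
                profiler_resume();
                sampling_ = true;
            }
        } else if (sampling_ && ++calm_run_ >= calm_frames_) {
            profiler_pause();
            sampling_ = false;
            calm_run_ = 0;
        }
    }

    bool sampling() const { return sampling_; }

private:
    double budget_ms_;
    int calm_frames_;
    int calm_run_ = 0;
    bool sampling_ = false;
};
```

In a game loop, one would construct something like `SamplingTrigger trigger(33.3, 30)` and call `trigger.on_frame(frame_ms)` each frame. Requiring a run of calm frames before pausing is a design choice to keep the sampler from toggling on every borderline frame.
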
I made numerous attempts with sampling, trying to get a good representation,
and it didn’t take long for me to remove the call graphing. I think
this is a feature to turn on only in trouble spots, as it’s very
intrusive in execution timing and slows a game down to the point where
it’s not really useful unless you know what you’re looking
for, or unless you’re smart and have set up some prerecorded
joystick presses that can walk through a game perfectly every time (I’m
not that smart).
The results I got without the call graph were great starting points.
I recorded about five minutes of samples and counter events. After sampling,
the data is displayed in graph form, covering CPU percent time,
privileged CPU percent, page misses, thread queuing, and more. Intel
has supplied a lot of ways to view this data, including graphing
as splines, blocks, solid, or wire form, and it’s all very customizable.
I chose a spline form, though I recommend playing with the display a bit,
as it does affect how you perceive the data. Using another icon,
I selected a time range in order to investigate a peculiar spike in
processor-privileged time. I highlighted a small, one-second range, hit the
drill-down icon, and was rewarded with a much more detailed breakdown.
At this point I had numerous modules (DLLs and also the exe) I wished
to look at, but I couldn’t merge all the modules together. It’s
a minor annoyance, but one I can live with until version 7.
Now it was a simple case of double-clicking on the desired module to
bring up a detailed source breakdown, though it did ask me to specify
the .dsp (project or makefile). This led to my second problem: the project
I was debugging has numerous DLLs as well as the main exe, but the system
would accept only a single project file. I wanted it to ask for my
Visual C++ workspace (the .dsw), as it is an extremely complex source
base. The good news for Java and other non-C++ people is that VTune does
allow you to specify a multitude of project types. I was very happy to
see Java, .NET, and even FORTRAN supported (does anyone really still use
that?).
By now I had screamed a few times in my head, and once out loud,
as my eyes gravitated to the top 10 functions that stall out the execution.
Intel’s terms for these measurements are CPU Clockticks (non-sleep) and
Instructions Retired (sounds like a CIA euphemism for a shagged compiler).
One particular surprise VTune found was a bit of code that looked harmless
enough, but when I viewed the source with disassembly, it showed that the
NEG assembler instruction was kicking the function’s teeth in, and doing
so twice. After a quick fix, the function dropped off the top-10 list
completely.
I tried another of my top-10 items, and rather than trying to find a
solution myself, I just right-clicked on the source and selected “VTune
Assistant, This Function” (you can also do this to a selection or
an entire file). Now this is where I started to get impressed. The assistant
flagged about 20 problems, marking each line concerned with a nice little
light bulb. It also placed a light bulb at the end of the
function, with suggestions on a general problem and solution. I was
very pleased to see the comment “Logical AND/OR statement conditional,”
which came with a very informative description of what the assistant
believed could be done.
VTune’s assistant is one I would gladly hire, or at least get writing
PS2 code. It also caught the loop-invariant case, where you resolve a
pointer to a pointer inside a loop, or write a for statement whose count
is read through a pointer to a class (nasty stuff). But my favorite feature,
and the one I am most interested in doing more with, is vectorization, which
crops up a lot with virtuals and templates. If you’re a progressive C++
engineer who likes to template array handling, then VTune’s going
to ring your bell, because one of its recommendations is the new
Intel C/C++ compiler with its newly supported vectorization pass,
meaning that the compiler will use SSE instructions
for some loop operations. I saw this compiler at GDC this year, and it
was very impressive (at least for this feature). Beyond that, the assistant
recommends restructuring the code to allow for better vectorizing.
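
To make the loop-invariant case concrete, here is a minimal sketch of the pattern and its usual fix. It is an illustration only, not output from the assistant: the `Mesh` struct and both function names are hypothetical. The first version resolves the pointer to a pointer, and reads the loop count through it, on every iteration; the second hoists those loads into locals once.

```cpp
#include <cstddef>

// Hypothetical data type for the example.
struct Mesh {
    float* heights;      // contiguous array of samples
    std::size_t count;   // number of samples in `heights`
};

// Before: each iteration goes through *mesh_ptr to reach both the count and
// the array. A compiler that cannot prove the stores leave those values
// unchanged may reload them every time around the loop.
void scale_slow(Mesh** mesh_ptr, float factor) {
    for (std::size_t i = 0; i < (*mesh_ptr)->count; ++i) {
        (*mesh_ptr)->heights[i] *= factor;
    }
}

// After: the invariant loads are hoisted into locals once, leaving a plain
// counted loop over contiguous floats.
void scale_fast(Mesh** mesh_ptr, float factor) {
    Mesh* mesh = *mesh_ptr;
    float* heights = mesh->heights;
    const std::size_t count = mesh->count;
    for (std::size_t i = 0; i < count; ++i) {
        heights[i] *= factor;
    }
}
```

Both functions produce the same result. The hoisted form is also the kind of simple counted loop over contiguous data that vectorizing compilers generally handle more readily, though whether a given compiler actually emits SSE code for it is something to confirm in the disassembly.
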
I’m generally very happy with the assistant, though it lacks one
critical feature, and that became really frustrating.
The assistant is in its own window, and it displays the line number, but
I can’t click on the line number and have the source window scroll to the
correct line, so I had to scroll down manually to the line number.
This became annoying, especially when I started to do class-level rather
than function-level analysis and wanted large blocks of code to be analyzed
by the assistant so that I could just scroll through the trivial fixes. Another
problem was that I could not jump to the code in the editor. For those who have
used SN Systems’ debugger, it has a hot key, Ctrl+E, that jumps
to code in Visual C++. I sorely missed this when VTuning my data.
All in all, I thought VTune 6 was extremely good, very stable, and the
tutorial was insightful and valuable, making my life significantly easier.
The in-depth help, and the fact that hitting F1 on any item brought up
the correct context help menu, were invaluable. I had only one crash during
a marathon six-hour session. Even then, once I was back up
and running, my project was intact, so it was just a minor inconvenience.
VTune 6 is a must for developers, and you don’t need to be an assembler
wiz to use it (though it does help).

VTune 6 Performance Analyzer
Intel
Santa Clara, CA
Price: $699
Requirements:
Hardware: At least an Intel Pentium III processor-based
system and 128MB of RAM. An Intel Celeron, Intel Pentium II,
Pentium II Xeon, Pentium III, Pentium III Xeon, Pentium 4,
Intel Xeon, or Intel Itanium processor-based system for
event-based sampling. Note: Event-based sampling is not supported
on mobile processors.
Software: Microsoft Windows 98 (SE), Windows ME, Windows NT 4.0 with
Service Pack 4 or later, Windows 2000 Build 2195 or later,
or Windows XP Build 2475 or later. Microsoft Internet Explorer
5.0 or later (5.5 or newer recommended).
Pros:
1. Excellent tutorial.
2. Great VTune assistant.
3. Easy to get into and get real info out of.
Cons:
1. Could have better integration back into Visual Studio
for editing source.
2. Windows could be better sized and positioned, as it gets
a little cluttered.
3. Calibration phase was frustrating.