
Wyatt's World

Processor Detection and a Pentium III Update
How do I get a textual name for the processor?
If the processor type needs
to be reported to the user, care must be taken not to cause confusion
by reporting the wrong type. It is best to only report the basic model
such as ‘Intel Pentium’, ‘Intel Pentium II’, and so on. There is often
little need to report such details as ‘Intel Pentium III Xeon, 1024K Level
2 Cache’.
Many new non-Intel processors have a series of extended CPUID commands
(0x80000002 to 0x80000004) that return the name of the processor as a
48-character string. Using this feature is as simple as calling CPUID
several times and storing the resulting registers, as shown below:
mov eax,0x80000000
CPUID
cmp eax,0x80000004
jb EXIT_NAME
// functions up to 0x80000004 must be present
mov eax,0x80000002
CPUID
mov DWORD PTR [ProcessorName+0],eax
mov DWORD PTR [ProcessorName+4],ebx
mov DWORD PTR [ProcessorName+8],ecx
mov DWORD PTR [ProcessorName+12],edx
mov eax,0x80000003
CPUID
mov DWORD PTR [ProcessorName+16],eax
mov DWORD PTR [ProcessorName+20],ebx
mov DWORD PTR [ProcessorName+24],ecx
mov DWORD PTR [ProcessorName+28],edx
mov eax,0x80000004
CPUID
mov DWORD PTR [ProcessorName+32],eax
mov DWORD PTR [ProcessorName+36],ebx
mov DWORD PTR [ProcessorName+40],ecx
mov DWORD PTR [ProcessorName+44],edx
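If you would rather collect the register values first and build the string in C, the byte order matters: the lowest byte of each register holds the earliest character. The sketch below shows one way to do that packing. The helper name build_processor_name() and the idea of passing the twelve register values in an array are my own assumptions for illustration; they are not part of the class supplied with this article.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the article's source code).
 * regs holds the values returned by CPUID functions 0x80000002,
 * 0x80000003 and 0x80000004, stored as EAX, EBX, ECX, EDX for each
 * call in turn. out receives the 48 name bytes plus a terminator. */
static void build_processor_name(const uint32_t regs[12], char out[49])
{
    assert(regs != NULL && out != NULL);
    for (int i = 0; i < 12; ++i) {
        /* The low byte of each register is the earliest character. */
        out[i * 4 + 0] = (char)(regs[i] & 0xFF);
        out[i * 4 + 1] = (char)((regs[i] >> 8) & 0xFF);
        out[i * 4 + 2] = (char)((regs[i] >> 16) & 0xFF);
        out[i * 4 + 3] = (char)((regs[i] >> 24) & 0xFF);
    }
    /* Guarantee termination even if all 48 bytes were used. */
    out[48] = '\0';
}
```

Extracting the bytes with shifts rather than copying the registers with memcpy() keeps the result independent of the byte order of the machine that runs the helper.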
Intel does not provide the extended CPUID name string, but it does publish
a list of all the models and their signatures. However, even with this
information, a reliable name can be difficult to create, because the model
and stepping information does not distinguish all models. For example, an
Intel family 6, model 5 could be a Pentium II, a Celeron or a Pentium II
Xeon. The only remaining clue is the cache size, so you cannot distinguish
between a Pentium II and a Pentium II Xeon when both have 512K of level 2
cache. The same problem arises with the Pentium III and Pentium III Xeon
processors.
Fortunately, the class provided
with this article has a get_processor_name() member that does all the
bit twiddling and provides a very reliable processor name. Look at the
code and comments for the exact detection details.
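The family, model and stepping values mentioned above come from the processor signature, which CPUID function 1 returns in EAX. As a rough sketch of the first step of any name lookup, the fields can be pulled apart as shown below. The cpu_signature type and decode_signature() name are hypothetical and are not the get_processor_name() implementation; the sketch also ignores the cache information that the article says is needed to go any further.

```c
#include <stdint.h>

/* Fields of the processor signature returned in EAX by CPUID function 1.
 * Hypothetical type for illustration only. */
typedef struct {
    unsigned stepping; /* bits 0-3   */
    unsigned model;    /* bits 4-7   */
    unsigned family;   /* bits 8-11  */
    unsigned type;     /* bits 12-13 */
} cpu_signature;

/* Split a raw signature value into its four fields. */
static cpu_signature decode_signature(uint32_t eax)
{
    cpu_signature s;
    s.stepping = eax & 0xF;
    s.model    = (eax >> 4) & 0xF;
    s.family   = (eax >> 8) & 0xF;
    s.type     = (eax >> 12) & 0x3;
    return s;
}
```

For example, a signature of 0x652 decodes to family 6, model 5, stepping 2, which is exactly the ambiguous family 6, model 5 case described above.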
How do I measure the processor clock speed?
Measuring the speed of a processor that has the RDTSC instruction is
significantly easier than measuring one that does not. You need access to
a second high-performance timer with a known frequency, and you run one
timer against the other for a given amount of time. From this the frequency
of the RDTSC timer can be obtained. In Windows, the source of the second
timer is the high-performance counter API (the second timer does not have
to be all that fast, but it needs to be reliable, so the standard Windows
ticker cannot be used). The high-performance counter API provides two
functions, QueryPerformanceCounter() and QueryPerformanceFrequency().
The high-performance counter by default has a frequency of 1.19318 MHz,
which is the frequency of the system timers (remember the DOS days?).
Although the API says nothing about the default frequency of this timer,
I have yet to see it change. However, that’s no excuse for not using the
frequency returned from QueryPerformanceFrequency(); in fact, the
documentation states that the frequency of this timer is OEM specific.
The following pseudo code will run for at least 1000 ticks of the Windows
high-performance counter, and from the two elapsed counts the speed of the
processor clock can be calculated. All the times used are in actual elapsed
time, so context switches and interrupts are accounted for (unless they
occur just between the initial or final timing pairs). To get the most
accurate timings you may want to run this loop a number of times and keep
averaging the results until the change in the average is within some
tolerance.
hp_start = hp_end = QueryPerformanceCounter();
RDTSC_start = ReadRDTSCTimer();
while (hp_end - hp_start < 1000)
{
    hp_end = QueryPerformanceCounter();
}
RDTSC_end = ReadRDTSCTimer();
RDTSC_elapsed = RDTSC_end - RDTSC_start;
hp_elapsed = hp_end - hp_start;
hp_frequency = QueryPerformanceFrequency();
The clock speed is then derived by the following equations:
Real time = (1 / performance_timer_frequency) * performance_timer_elapsed
Clock speed = RDTSC_elapsed / Real time
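As a minimal sketch of that arithmetic, the two equations can be folded into a single integer expression, since dividing by (hp_elapsed / hp_frequency) is the same as multiplying by hp_frequency and dividing by hp_elapsed. The function name clock_speed_hz() is hypothetical, and this is not the code from the article's source; it simply assumes the three values gathered by the pseudo code above have been widened to 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Clock speed in Hz = RDTSC_elapsed / real time
 *                   = RDTSC_elapsed * hp_frequency / hp_elapsed.
 * Hypothetical helper; not the article's implementation. */
static uint64_t clock_speed_hz(uint64_t rdtsc_elapsed,
                               uint64_t hp_elapsed,
                               uint64_t hp_frequency)
{
    assert(hp_elapsed > 0);
    /* Multiply before dividing so no precision is lost to integer
     * division. For the short windows used here (about 1000 ticks)
     * the product stays well inside 64 bits. */
    return rdtsc_elapsed * hp_frequency / hp_elapsed;
}
```

For instance, 500,000 RDTSC counts measured over 1000 ticks of a 1 MHz counter works out to 500,000,000 Hz. Working in integers avoids one source of rounding error, although it does nothing about the jitter in the measurements themselves.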
The full code can be seen in the calculate_processor_frequency_rdtsc()
private function within the source code. There is a little more
to it than what is shown here, due to numerical inaccuracies.
As a backup for processors without the RDTSC instruction, or on machines
with the RDTSC instruction disabled, you have to resort to timing a loop of
instructions that take a known number of clock cycles to execute. This will
usually be needed only on 386 and 486 processors, because more modern
processors have the RDTSC instruction (and Windows does not disable it).
For the best timing results, you want a long instruction that takes a known
and constant number of cycles to execute and cannot be interrupted. The Bit
Scan Forward (BSF) instruction takes a known number of cycles on all of the
386, 486 and Pentium processors, so executing this instruction multiple
times within a loop should take a known amount of time from which the
processor clock speed can be derived. The loop is shown below:
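mov eax, 80000000h    // only bit 31 is set
mov ebx, 10000        // number of iterations
BSF_LOOP:             // renamed from LOOP, which is an instruction mnemonic
bsf ecx,eax
dec ebx
jnz BSF_LOOP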
Each iteration of the above loop takes 115, 47 and 43 cycles on the Intel
386, 486 and Pentium processors, respectively (including the loop
overhead). On the Pentium Pro family of processors, the reliability of this
algorithm is at the mercy of the architecture due to its dynamic execution
and resource allocation, although it appears to take about 3.3 cycles per
iteration. If a reliable timing cannot be made, the number of iterations of
the loop needs to be increased. Once a reliable timing is made, it is
trivial to calculate the processor clock frequency from the frequency of
the high-performance counter.
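To make that last step concrete, here is a small sketch of the calculation, assuming the loop was timed with the high-performance counter. The function name and the enum constants are hypothetical; only the cycle counts themselves come from the figures quoted above, and this is not the code from calculate_processor_frequency_loop().

```c
#include <assert.h>
#include <stdint.h>

/* Cycles per iteration of the BSF loop, as quoted in the article. */
enum {
    CYCLES_386     = 115,
    CYCLES_486     = 47,
    CYCLES_PENTIUM = 43
};

/* Clock speed in Hz = total cycles executed / elapsed real time
 *   = iterations * cycles_per_iteration * hp_frequency / hp_elapsed.
 * Hypothetical helper for illustration only. */
static uint64_t loop_clock_speed_hz(uint64_t iterations,
                                    uint64_t cycles_per_iteration,
                                    uint64_t hp_elapsed,
                                    uint64_t hp_frequency)
{
    assert(hp_elapsed > 0);
    return iterations * cycles_per_iteration * hp_frequency / hp_elapsed;
}
```

As an example, 10,000 iterations at 47 cycles each that take 4700 ticks of a 1 MHz counter give 100,000,000 Hz, which is what you would expect from a 100 MHz 486. Picking the wrong cycle count for the processor scales the answer by the same ratio, which is why misidentifying the processor produces the wrong speed.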
The code for this method of determining the processor frequency can be
found in the calculate_processor_frequency_loop() private function. In this
form, it will only work for the Intel 386 and 486; early non-Intel
processors that do not have the RDTSC instruction may report the wrong
speed due to the inability to distinguish them from their Intel brethren.
This is an ideal example of why relying on loop timings is a bad practice,
but in this case we have no choice. See the source code comments for more
specific information.
How do I detect multiple processors?
Depending on your needs, there are a few ways to detect multiple
processors. First of all, if the Advanced Programmable Interrupt Controller
(APIC) feature bit is set in the CPUID feature flags, the machine may
contain more than one processor, since multi-processor operation depends on
the APIC. Unfortunately, without access to the processor control address
ranges, it is impossible to directly detect the number of processors within
the machine. An easier approach is the Win32 GetSystemInfo() function. This
function takes a pointer to a SYSTEM_INFO structure, which it fills. The
member of interest is dwNumberOfProcessors. After calling this system
function, you know the actual number of processors in the machine and the
basic model (such as 80386, 80486 or Pentium), and if this is enough
information, you’re all set. From the data returned in the SYSTEM_INFO
structure you can determine if you are running on an Intel-compatible,
Alpha, PowerPC or MIPS processor, but you cannot distinguish between Intel
and AMD (AMD will have multi-processor support beginning with its K7
processor).
If you need to know specific information about each processor, threads
need to be created with a processor affinity, forcing each thread to execute
on a single processor. These threads then need to interrogate their processor
for its make, model and features. This is what the detect.dll code within
this article does. The same approach was used in my last column to read
all of the serial numbers in a multi-processor Pentium III machine.
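The per-processor approach needs one single-bit affinity mask for each available processor. The sketch below shows only that bookkeeping step in portable C; the function name split_affinity_mask() is hypothetical and this is not the detect.dll code. In a real Windows program each resulting mask would be handed to SetThreadAffinityMask() for the thread that interrogates that processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split a 32-bit affinity mask into single-bit
 * masks, one per processor whose bit is set. Returns the number of
 * masks written to out (at most 32), lowest processor first. */
static size_t split_affinity_mask(uint32_t mask, uint32_t out[32])
{
    size_t count = 0;
    assert(out != NULL);
    for (unsigned bit = 0; bit < 32; ++bit) {
        uint32_t single = (uint32_t)1 << bit;
        if (mask & single) {
            out[count++] = single; /* pin one worker thread to this bit */
        }
    }
    return count;
}
```

A mask of 0x0B, for example, yields the three masks 0x1, 0x2 and 0x8, so three threads would be created, one for each of those processors.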
A word on multi-processor machines.
Just because a machine has
more than one processor does not mean that these processors are all available
for a given process to use. Each process has a bit vector of the processors
available for it to use. This bit vector, called the "process affinity
mask", is available to applications by calling GetProcessAffinityMask().
This mask is a 32-bit value where each bit represents a possible processor.
Bit 0 is set if processor 1 is present, bit 1 is set if processor 2 is
present, and so on. Each thread within a process also has an affinity
mask called the "thread affinity mask", and this can be obtained
by calling GetThreadAffinityMask().
A thread can also have an ideal processor. This is the processor on which
the system will try to schedule the thread if possible. The ideal processor
can be set on a thread-by-thread basis by calling SetThreadIdealProcessor().
If no ideal processor is set for a thread, the system will try to schedule
the thread on the same processor as it ran on previously. On Windows 95/98,
all processor masks have a value of 1. The table below describes the various
processor masks and the functions which manipulate them.
| Processor Mask | Description | Manipulation Functions |
|---|---|---|
| Active Processor Mask | Master set of processors available within the machine. | Obtained by calling GetSystemInfo() and reading the dwActiveProcessorMask element of the SYSTEM_INFO structure. The Active Processor Mask cannot be set. |
| System Affinity Mask | The system affinity mask is the set of processors available for use to the system. This should be the same as the Active Processor Mask. | The system affinity mask is obtained by calling GetProcessAffinityMask(). The system affinity mask cannot be set. |
| Process Affinity Mask | The process affinity mask is the set of processors available to the calling process. This mask may be different for each process in the system. | The process affinity mask is obtained by calling GetProcessAffinityMask() and it can be set by calling SetProcessAffinityMask(). A process affinity mask must be a subset of the system affinity mask. |
| Thread Affinity Mask | Set of processors available to the calling thread. This mask may be different for each thread within a process. | The thread affinity mask is obtained by calling GetThreadAffinityMask() and it can be set by calling SetThreadAffinityMask(). A thread affinity mask must be a subset of its containing process affinity mask. |
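The subset rules in the table are easy to check with a couple of bit operations before a mask is ever handed to the system. The following is a minimal sketch in portable C, assuming 32-bit masks as described above; the function names are hypothetical and none of this comes from the article's source code.

```c
#include <stdint.h>

/* Returns 1 when every processor in inner is also present in outer. */
static int mask_is_subset(uint32_t inner, uint32_t outer)
{
    return (inner & ~outer) == 0;
}

/* Checks the hierarchy from the table: the thread mask must sit inside
 * the process mask, which must sit inside the system mask. */
static int masks_are_consistent(uint32_t system_mask,
                                uint32_t process_mask,
                                uint32_t thread_mask)
{
    return mask_is_subset(process_mask, system_mask) &&
           mask_is_subset(thread_mask, process_mask);
}

/* Counts the processors (set bits) in a mask. */
static unsigned count_processors(uint32_t mask)
{
    unsigned count = 0;
    while (mask != 0) {
        mask &= mask - 1; /* clear the lowest set bit */
        ++count;
    }
    return count;
}
```

On a four-processor machine with a system mask of 0xF, a process mask of 0x3 and a thread mask of 0x1 are consistent, whereas a thread mask of 0x4 is not, because processor 3 is outside the process mask.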