By Rob Wyatt
Gamasutra
July 9, 1999
Vol. 3: Issue 27

Wyatt's World

Processor Detection and a Pentium III Update

How do I get a textual name for the processor?

If the processor type needs to be reported to the user, care must be taken not to cause confusion by reporting the wrong type. It is best to only report the basic model such as ‘Intel Pentium’, ‘Intel Pentium II’, and so on. There is often little need to report such details as ‘Intel Pentium III Xeon, 1024K Level 2 Cache’.

Many new non-Intel processors support a series of extended CPUID functions (0x80000002 to 0x80000004) that return the name of the processor as a 48-character string. Using this feature is as simple as calling CPUID three times and storing the resulting registers, as shown below:

mov eax,0x80000000 // ask for the highest supported extended function
CPUID
cmp eax,0x80000004
jb EXIT_NAME // functions up to 0x80000004 must be present

mov eax,0x80000002 // characters 0 to 15 of the name
CPUID

mov DWORD PTR [ProcessorName+0],eax
mov DWORD PTR [ProcessorName+4],ebx
mov DWORD PTR [ProcessorName+8],ecx
mov DWORD PTR [ProcessorName+12],edx

mov eax,0x80000003 // characters 16 to 31 of the name
CPUID

mov DWORD PTR [ProcessorName+16],eax
mov DWORD PTR [ProcessorName+20],ebx
mov DWORD PTR [ProcessorName+24],ecx
mov DWORD PTR [ProcessorName+28],edx

mov eax,0x80000004 // characters 32 to 47 of the name
CPUID

mov DWORD PTR [ProcessorName+32],eax
mov DWORD PTR [ProcessorName+36],ebx
mov DWORD PTR [ProcessorName+40],ecx
mov DWORD PTR [ProcessorName+44],edx

EXIT_NAME: // reached directly when the name functions are not supported
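The byte order produced by those stores can be easier to see in a higher-level form. The following is a minimal sketch, not the code from the accompanying class: the CpuidRegs type and the brand_string_from_regs() function are hypothetical names, and the function assumes the caller has already executed CPUID for functions 0x80000002 to 0x80000004 and saved the four registers from each call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Register values returned by one CPUID call (hypothetical holder type).
struct CpuidRegs {
    std::uint32_t eax, ebx, ecx, edx;
};

// Builds the processor name from the results of CPUID functions
// 0x80000002, 0x80000003 and 0x80000004, in that order.
std::string brand_string_from_regs(const CpuidRegs (&leaves)[3]) {
    std::string name;
    for (const CpuidRegs& leaf : leaves) {
        const std::uint32_t regs[4] = {leaf.eax, leaf.ebx, leaf.ecx, leaf.edx};
        for (std::uint32_t reg : regs) {
            // The lowest byte of each register holds the earliest character.
            for (int byte = 0; byte < 4; ++byte) {
                name.push_back(static_cast<char>((reg >> (8 * byte)) & 0xFF));
            }
        }
    }
    assert(name.size() == 48);  // 3 calls x 4 registers x 4 bytes

    // Names shorter than 48 characters end in a NUL; drop it and any padding.
    const std::size_t terminator = name.find('\0');
    if (terminator != std::string::npos) {
        name.erase(terminator);
    }
    return name;
}
```

Extracting the bytes with shifts rather than copying memory keeps the sketch independent of the byte order of the machine it is compiled on, which only matters if the register values are decoded somewhere other than the x86 machine that produced them.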

Intel does not provide this feature, but it does publish a list of all the models and their signatures. Even with this information, a reliable name can be difficult to create, because the family, model and stepping information does not distinguish all models. For example, an Intel family 6, model 5 processor could be a Pentium II, a Celeron or a Pentium II Xeon. The distinction is based on the cache size, so you cannot distinguish between a Pentium II and a Pentium II Xeon with 512K of level 2 cache. The same problem arises with the Pentium III and Pentium III Xeon processors.

Fortunately, the class provided with this article has a get_processor_name() member that does all the bit twiddling and provides a very reliable processor name. Look at the code and comments for the exact detection details.
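For readers who want to see what the signature contains before reading the class, here is a small sketch of splitting it into fields. It assumes the signature is the value returned in EAX by CPUID function 1, with the stepping in bits 0-3, the model in bits 4-7, the family in bits 8-11 and the type in bits 12-13. The CpuSignature type and both function names are hypothetical and are not taken from the accompanying source.

```cpp
#include <cassert>
#include <cstdint>

// Fields of the processor signature (hypothetical holder type).
struct CpuSignature {
    unsigned type;
    unsigned family;
    unsigned model;
    unsigned stepping;
};

// Splits the EAX value from CPUID function 1 into its four fields.
CpuSignature decode_signature(std::uint32_t eax) {
    CpuSignature sig;
    sig.stepping = eax & 0xF;
    sig.model = (eax >> 4) & 0xF;
    sig.family = (eax >> 8) & 0xF;
    sig.type = (eax >> 12) & 0x3;
    // The four fields together account for the low 14 bits of EAX.
    assert(((sig.type << 12) | (sig.family << 8) | (sig.model << 4) |
            sig.stepping) == (eax & 0x3FFFu));
    return sig;
}

// True for the family 6, model 5 case described above, where the signature
// alone cannot name the processor and the cache size has to be consulted.
bool needs_cache_check(const CpuSignature& sig) {
    return sig.family == 6 && sig.model == 5;
}
```

A signature of 0x0652, for example, decodes to family 6, model 5, stepping 2, which is exactly the ambiguous case that forces a look at the cache configuration.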

How do I measure the processor clock speed?

Measuring the speed of a processor that has the RDTSC instruction is significantly easier than measuring one that does not. You need access to a second high-performance timer with a known frequency, and you run one timer against the other for a given amount of time. From this the frequency of the RDTSC timer can be obtained. The source of the second timer in Windows is the high-performance counter API (the second timer does not have to be all that fast, but it needs to be reliable, so the standard Windows ticker cannot be used). The high-performance counter API provides two functions called QueryPerformanceCounter() and QueryPerformanceFrequency(). The high-performance counter by default has a frequency of 1.1932 MHz, which is the frequency of the system timers (remember the DOS days?). Although the API says nothing about the default frequency of this timer, I have yet to see it change. However, that’s no excuse for not using the frequency returned from QueryPerformanceFrequency(); in fact, the documentation states that the frequency of this timer is OEM specific.

The following pseudo code will run for at least 1000 ticks of the Windows high-performance counter, and from the counter it will calculate the speed of the processor clock. All the times used are in actual elapsed time, so context switches and interrupts are accounted for (unless they occur just between the initial or final timing pairs). To get the most accurate timings you may want to run this loop a number of times and keep averaging the results until the change in the average is within some tolerance.

hp_start = hp_end = QueryPerformanceCounter();
RDTSC_start = ReadRDTSCTimer();

// spin until at least 1000 high-performance counter ticks have elapsed
while (hp_end - hp_start < 1000)
{
    hp_end = QueryPerformanceCounter();
}

RDTSC_end = ReadRDTSCTimer();
RDTSC_elapsed = RDTSC_end - RDTSC_start;
hp_elapsed = hp_end - hp_start;
hp_frequency = QueryPerformanceFrequency();

The clock speed is then derived by the following equation:

Real time = hp_elapsed / hp_frequency
Clock speed = RDTSC_elapsed / Real time
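As a worked form of the equation, the following sketch turns the two elapsed counts into a clock speed in Hz. The function name clock_speed_hz() is hypothetical, and the arithmetic is done in double precision for simplicity, which is an assumption of this sketch rather than a description of the accompanying source.

```cpp
#include <cassert>
#include <cstdint>

// Converts the elapsed RDTSC count and the elapsed high-performance counter
// ticks into a processor clock speed in Hz.
double clock_speed_hz(std::uint64_t rdtsc_elapsed,
                      std::uint64_t hp_elapsed,
                      std::uint64_t hp_frequency) {
    assert(hp_elapsed > 0 && hp_frequency > 0);
    // Real time in seconds covered by the measurement.
    const double real_time =
        static_cast<double>(hp_elapsed) / static_cast<double>(hp_frequency);
    // Processor clocks per second of real time.
    return static_cast<double>(rdtsc_elapsed) / real_time;
}
```

If the high-performance counter runs at 1,193,182 Hz and the measurement spans exactly that many ticks (one second), an RDTSC difference of 450,000,000 gives a clock speed of 450 MHz.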

The full code can be seen in the calculate_processor_frequency_rdtsc() private function within the source code. There is a little more to it than what is shown here, due to numerical inaccuracies.
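The earlier suggestion to repeat the measurement and keep averaging until the average settles could be sketched as follows. This is one possible reading of that advice, not the method used in the accompanying source: the function name, the relative tolerance and the max_samples cap are all assumptions, and take_sample stands in for one complete run of the timing loop.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Keeps a running average of the values returned by take_sample until one
// more sample changes the average by no more than the given relative
// tolerance, or until max_samples samples have been taken.
double average_until_stable(const std::function<double()>& take_sample,
                            double tolerance, int max_samples) {
    assert(tolerance > 0.0 && max_samples >= 1);
    double average = take_sample();
    for (int n = 2; n <= max_samples; ++n) {
        // Incremental mean: fold the n-th sample into the average.
        const double updated = average + (take_sample() - average) / n;
        const bool stable =
            std::fabs(updated - average) <= tolerance * std::fabs(updated);
        average = updated;
        if (stable) {
            break;
        }
    }
    return average;
}
```

The cap on the number of samples is there so that a noisy machine cannot keep the loop running indefinitely; a caller would pick the tolerance to match how precisely the speed needs to be reported.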

As a backup for processors without the RDTSC instruction, or on machines with the RDTSC instruction disabled, you have to resort to timing a loop of instructions that takes a known number of clock cycles to execute. This will usually be needed only on 386 and 486 processors, because more modern processors have the RDTSC instruction (and Windows does not disable it). For the best timing results, you want a long instruction that takes a known and constant number of cycles to execute and cannot be interrupted. The Bit Scan Forward (BSF) instruction takes a known number of cycles on all of the 386, 486 and Pentium processors, so executing this instruction multiple times within a loop should take a known amount of time from which the processor clock speed can be derived. The loop is shown below:

mov eax, 80000000h // only bit 31 is set, so BSF has to scan the whole register
mov ebx, 10000 // number of iterations

BSF_LOOP: bsf ecx,eax
dec ebx
jnz BSF_LOOP

Each iteration of the above loop takes 115, 47 and 43 cycles on the Intel 386, 486 and Pentium processors, respectively (including the loop overhead). On the Pentium Pro family of processors, the reliability of this algorithm is at the mercy of the architecture due to its dynamic execution and resource allocation, although it appears to take about 3.3 cycles per iteration. If a reliable timing cannot be made, the number of iterations of the loop needs to be increased. Once a reliable timing is made, it is trivial to calculate the processor clock frequency from the elapsed count and the frequency of the high-performance counter.
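The final calculation can be sketched like this, using the per-iteration cycle counts quoted above. The LoopCpu enumeration and both function names are hypothetical, and the sketch assumes the loop was timed with the high-performance counter in the same way as the RDTSC method.

```cpp
#include <cassert>
#include <cstdint>

// Processors for which the loop has a known cost (hypothetical enumeration).
enum class LoopCpu { Intel386, Intel486, Pentium };

// Cycles per iteration of the BSF loop, including the loop overhead.
unsigned cycles_per_iteration(LoopCpu cpu) {
    switch (cpu) {
        case LoopCpu::Intel386: return 115;
        case LoopCpu::Intel486: return 47;
        case LoopCpu::Pentium:  return 43;
    }
    return 0;  // not reached for the enumerators above
}

// Derives the clock speed in Hz from the time the whole loop took, measured
// in high-performance counter ticks.
double loop_clock_speed_hz(LoopCpu cpu, std::uint32_t iterations,
                           std::uint64_t hp_elapsed,
                           std::uint64_t hp_frequency) {
    assert(hp_elapsed > 0 && hp_frequency > 0);
    const double total_cycles =
        static_cast<double>(cycles_per_iteration(cpu)) * iterations;
    const double real_time =
        static_cast<double>(hp_elapsed) / static_cast<double>(hp_frequency);
    return total_cycles / real_time;
}
```

On a 486, 10,000 iterations cost 470,000 cycles; if the loop takes 16,994 ticks of a 1,193,182 Hz counter, the result comes out at roughly 33 MHz.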

The code for this method of determining the processor frequency can be found in the calculate_processor_frequency_loop() private function. In this form, it will only work for the Intel 386 and 486; early non-Intel processors that do not have the RDTSC instruction may report the wrong speed due to the inability to distinguish them from their Intel brethren. This is an ideal example of why relying on loop timings is a bad practice, but in this case we have no choice. See the source code comments for more specific information.

How do I detect multiple processors?

Depending on your needs, there are a few ways to detect multiple processors. First of all, if the Advanced Programmable Interrupt Controller (APIC) feature bit is set in the CPUID feature flags, then more than one processor may be present. Unfortunately, without access to the processor control address ranges, it is impossible to directly detect the number of processors within the machine.

An easier approach is the Win32 GetSystemInfo() function. This function takes a pointer to a SYSTEM_INFO structure, which it fills. The member of interest is dwNumberOfProcessors. After calling this function, you know the actual number of processors in the machine and the basic model (such as 80386, 80486 or Pentium), and if this is enough information, you’re all set. From the data returned in the SYSTEM_INFO structure you can determine whether you are running on an Intel-compatible, Alpha, PowerPC or MIPS processor, but you cannot distinguish between Intel and AMD (AMD will have multi-processor support beginning with its K7 processor).

If you need to know specific information about each processor, threads need to be created with a processor affinity, forcing each thread to execute on a single processor. These threads then need to interrogate their processor for its make, model and features. This is what the detect.dll code within this article does. The same approach was used in my last column to read all of the serial numbers in a multi-processor Pentium III machine.
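Pinning one thread to each processor needs one single-bit mask per processor. The sketch below shows only that bookkeeping step and leaves out the thread creation itself. It assumes the input is a 32-bit mask such as the dwActiveProcessorMask member of SYSTEM_INFO, and the function name single_processor_masks() is hypothetical rather than taken from detect.dll.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Splits a processor mask into one single-bit mask per processor, in
// ascending bit order.
std::vector<std::uint32_t> single_processor_masks(std::uint32_t active_mask) {
    // At least the processor running this code must be present.
    assert(active_mask != 0);
    std::vector<std::uint32_t> masks;
    for (int bit = 0; bit < 32; ++bit) {
        const std::uint32_t one = std::uint32_t{1} << bit;
        if (active_mask & one) {
            masks.push_back(one);
        }
    }
    return masks;
}
```

Each element is the kind of value one would pass to SetThreadAffinityMask() for the thread that interrogates that processor, and the size of the returned vector is the number of processors represented in the mask.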

A word on multi-processor machines.

Just because a machine has more than one processor does not mean that these processors are all available for a given process to use. Each process has a bit vector of the processors available for it to use. This bit vector, called the "process affinity mask", is available to applications by calling GetProcessAffinityMask(). This mask is a 32-bit value where each bit represents a possible processor. Bit 0 is set if processor 1 is present, bit 1 is set if processor 2 is present, and so on. Each thread within a process also has an affinity mask called the "thread affinity mask". Win32 has no function that simply reads it, but SetThreadAffinityMask() returns the thread's previous mask, so the current value can be obtained by setting a mask and then restoring the one that was returned.

A thread can also have an ideal processor. This is the processor on which the system will try to schedule the thread if possible. The ideal processor can be set on a thread-by-thread basis by calling SetThreadIdealProcessor(). If no ideal processor is set for a thread, the system will try to schedule the thread on the same processor as it ran on previously. On Windows 95/98, all processor masks have a value of 1. The table below describes the various processor masks and the functions which manipulate them.

| Processor Mask | Description | Manipulation Functions |
| --- | --- | --- |
| Active Processor Mask | Master set of processors available within the machine. | Obtained by calling GetSystemInfo() and reading the dwActiveProcessorMask member of the SYSTEM_INFO structure. The active processor mask cannot be set. |
| System Affinity Mask | Set of processors available for use by the system. This should be the same as the active processor mask. | Obtained by calling GetProcessAffinityMask(). The system affinity mask cannot be set. |
| Process Affinity Mask | Set of processors available to the calling process. This mask may be different for each process in the system. | Obtained by calling GetProcessAffinityMask() and set by calling SetProcessAffinityMask(). A process affinity mask must be a subset of the system affinity mask. |
| Thread Affinity Mask | Set of processors available to the calling thread. This mask may be different for each thread within a process. | Set by calling SetThreadAffinityMask(), which returns the previous mask; Win32 has no separate function to read it. A thread affinity mask must be a subset of its containing process affinity mask. |
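The subset rules in the table reduce to simple bit tests. The following is a minimal sketch of those tests on 32-bit masks; the function names are hypothetical, and nothing here calls the Win32 API.

```cpp
#include <cassert>
#include <cstdint>

// True if every processor in candidate is also in parent.
bool is_subset(std::uint32_t candidate, std::uint32_t parent) {
    return (candidate & ~parent) == 0;
}

// True if the process mask fits inside the system mask and the thread mask
// fits inside the process mask.
bool masks_are_consistent(std::uint32_t system_mask,
                          std::uint32_t process_mask,
                          std::uint32_t thread_mask) {
    return is_subset(process_mask, system_mask) &&
           is_subset(thread_mask, process_mask);
}

// Drops any processors from requested that the parent mask does not allow.
std::uint32_t clamp_to_parent(std::uint32_t requested, std::uint32_t parent) {
    assert(parent != 0);  // a parent mask always contains a processor
    return requested & parent;
}
```

Checking a requested mask with is_subset() before handing it to SetProcessAffinityMask() or SetThreadAffinityMask() makes it possible to report a bad mask in the application's own terms rather than relying on the API call failing.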





Copyright © 2000 CMP Media Inc. All rights reserved.