Programming the Cell Broadband Engine
 
 
by Alex Chunghen Chow [Programming]
July 21, 2006
 

With nine processor cores, a single Cell processor chip (called a Cell Broadband Engine or CBE) often performs an order of magnitude more work than a traditional single-core chip at the same clock rate. Cell's parallel configuration and performance are seldom seen in traditional CPU architectures for any market, much less in the cost-sensitive consumer electronics business.

The CBE, jointly created by Sony, Toshiba, and IBM, distributes its huge computational capacity over two different kinds of processor cores, so its development environment is quite different from that of comparatively conventional homogeneous multiprocessor architectures. Cell programmers need special facilities to help them harness these computational resources effectively.


In this article, I'll introduce a new object format called CBE Embedded SPE Object Format (CESOF). Programmers in the Sony, Toshiba, and IBM Cell Broadband Engine Design Center (STIDC) created the CESOF specification to help Cell programmers integrate the interacting programs for these two different types of processor cores. I'll also introduce the design concept, the structure, and a simple usage sample of CESOF.

Multiple Cores in the Cell Processor

Using heterogeneous (that is, different kinds of) processor cores in a multi-core system has become a popular practice in the embedded systems space. For a particular algorithm or application with expected regularity, a specialized and highly optimized circuit usually provides better performance within a smaller chip area and with lower power consumption than general-purpose cores. In fact, embedded systems designers recognize that many of their applications or workloads can benefit from specialized cores such as a single instruction, multiple data (SIMD) engine, a floating-point accelerator, or a direct memory access (DMA) controller.

High-performance computation workloads, modern media-rich applications, and many algorithms in other domains all exhibit a lot of regularity in their tasks. Replacing one or more of the generic processor cores with specialized circuits will likely give a better performance/cost ratio for these applications.

Cell's chip design, shown in Figure 1, strikes a balance by using one generic Power Processor Element (PPE) and eight Synergistic Processor Elements (SPEs) to provide a better performance/cost ratio (in terms of chip area and power consumption), particularly for high-performance computing and media processing. The eight SPEs are specialized SIMD cores, each with its own private local memory. The performance/cost ratio is particularly impressive when an algorithm can be distributed over all eight engines at the same time with properly staged data traffic.
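As a rough illustration of distributing an algorithm over all eight engines, the sketch below splits a workload of `total` elements into eight contiguous ranges, one per SPE. The `range` type and the `spe_range` function are hypothetical names chosen for this example. They are not part of any Cell SDK, and real code would also have to stage the data traffic for each range.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SPES 8 /* eight SPEs, as described in the article */

/* Half-open element range [begin, end) assigned to one SPE. */
typedef struct { size_t begin, end; } range;

/* Split `total` elements as evenly as possible over NUM_SPES workers.
   The first (total % NUM_SPES) workers each receive one extra element. */
static range spe_range(size_t total, unsigned spe)
{
    assert(spe < NUM_SPES);
    size_t base = total / NUM_SPES;
    size_t extra = total % NUM_SPES;
    range r;
    r.begin = spe * base + (spe < extra ? spe : extra);
    r.end = r.begin + base + (spe < extra ? 1 : 0);
    return r;
}
```

With 100 elements, for instance, the first four ranges hold 13 elements each and the remaining four hold 12.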

 

Each SPE can run independently from the others. Its instruction set is designed to execute SIMD instructions efficiently. All SIMD instructions handle 128-bit vector data in different element configurations: byte, half word, word, and quad-word sizes. For example, one SIMD instruction can perform 16 character operations at the same time.
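To make the "16 character operations at the same time" idea concrete, here is a minimal portable C model of a 128-bit vector treated as 16 byte lanes. The type `vec128_u8` and the functions `vec_cmpeq_u8`, `vec_splat_u8`, and `count_char` are hypothetical names for illustration only. The inner loops stand in for work that SIMD hardware would perform across all lanes in one operation; this is not actual SPE code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 16 /* 128 bits / 8 bits per character */

/* Hypothetical model of a 128-bit register holding 16 characters. */
typedef struct { uint8_t lane[LANES]; } vec128_u8;

/* Lane-wise equality test: 0xFF where a and b match, 0x00 elsewhere.
   The loop models what SIMD hardware does across all lanes at once. */
static vec128_u8 vec_cmpeq_u8(vec128_u8 a, vec128_u8 b)
{
    vec128_u8 r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = (a.lane[i] == b.lane[i]) ? 0xFF : 0x00;
    return r;
}

/* Fill all 16 lanes with the same character. */
static vec128_u8 vec_splat_u8(uint8_t c)
{
    vec128_u8 r;
    memset(r.lane, c, LANES);
    return r;
}

/* Count occurrences of c in buf, one 16-character block per step.
   n must be a multiple of 16, mirroring the fixed vector width. */
static size_t count_char(const char *buf, size_t n, char c)
{
    assert(n % LANES == 0);
    vec128_u8 needle = vec_splat_u8((uint8_t)c);
    size_t hits = 0;
    for (size_t off = 0; off < n; off += LANES) {
        vec128_u8 block;
        memcpy(block.lane, buf + off, LANES);
        vec128_u8 mask = vec_cmpeq_u8(block, needle);
        for (int i = 0; i < LANES; i++)
            hits += mask.lane[i] & 1; /* 0xFF contributes 1, 0x00 contributes 0 */
    }
    return hits;
}
```

The point of the model is the shape of the loop: the buffer is consumed in fixed 16-character steps, so a 32-character buffer needs only two compare operations.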

Another important design aspect is the use of the on-chip local memory located next to each SPE. This closeness reduces the distance, and thus the latency, from a processor core to its execution memory space. The address space of the SPE instructions spans only its own local memory; the SPE fetches instructions from the local store, loads data from the local store, and stores data to the local store. SPE instructions cannot "see" the rest of the chip's (or the system's) address space.

The simplicity of this local memory design improves memory-access time, memory bandwidth, chip area, and power consumption, but it does require extra steps for an SPE program to bring external data into the local store. An SPE can't load data from or store data to system memory directly. Instead, it uses a DMA operation to transfer data between system memory and its local store. This is quite different from the general-purpose PPE core, whose load and store instructions access data directly at effective addresses backed by off-chip physical system memory.
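The extra steps can be sketched as a get, compute, put sequence. The model below is plain C, not Cell SDK code: `spe_model`, `dma_get`, `dma_put`, `spe_invert`, and `spe_process` are hypothetical names, `memcpy` stands in for the DMA engine, and the 256 KB local store size is an assumption that the article does not state. The model also ignores the size and alignment rules that real DMA transfers must follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LS_SIZE (256u * 1024u) /* assumed local store size, not from the article */

/* Hypothetical SPE model: SPE-side code may touch only local_store. */
typedef struct { uint8_t local_store[LS_SIZE]; } spe_model;

/* Modeled DMA "get": system memory -> local store. */
static void dma_get(spe_model *spe, size_t ls_off, const uint8_t *sys_addr, size_t size)
{
    assert(ls_off <= LS_SIZE && size <= LS_SIZE - ls_off);
    memcpy(spe->local_store + ls_off, sys_addr, size);
}

/* Modeled DMA "put": local store -> system memory. */
static void dma_put(const spe_model *spe, size_t ls_off, uint8_t *sys_addr, size_t size)
{
    assert(ls_off <= LS_SIZE && size <= LS_SIZE - ls_off);
    memcpy(sys_addr, spe->local_store + ls_off, size);
}

/* SPE-side kernel: addresses data by local store offset only,
   never by system memory address. Inverts each byte in place. */
static void spe_invert(spe_model *spe, size_t ls_off, size_t size)
{
    for (size_t i = 0; i < size; i++)
        spe->local_store[ls_off + i] = (uint8_t)(255 - spe->local_store[ls_off + i]);
}

/* The three steps: bring data in, work on it locally, send results back. */
static void spe_process(spe_model *spe, uint8_t *sys_data, size_t size)
{
    dma_get(spe, 0, sys_data, size);
    spe_invert(spe, 0, size);
    dma_put(spe, 0, sys_data, size);
}
```

A PPE-style version of the same computation would simply loop over `sys_data` in place; the SPE-style version never dereferences a system memory pointer inside its kernel.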

As a side note, the internal core of an SPE, without the DMA engine, is called a Synergistic Processor Unit (SPU). In the naming conventions of software code, SPU and SPE are sometimes used interchangeably where the distinction is not important.

Connecting these nine cores (one PPE and eight SPEs) with the physical memory is a high-speed bus called the Element Interconnect Bus (EIB). Through this bus, an SPE DMA engine (not the SPE load and store instructions) transfers data between the system memory and its local store memory.

Developing and combining the code modules for cores with different instruction sets and memory spaces presents a big challenge to conventional programming tools. Programmers need an additional facility, such as CESOF, to glue these heterogeneous code modules together. In the remainder of this article I'll introduce the design concept, the structure, and a simple usage example of CESOF.

 