Gamasutra is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Gamasutra: The Art & Business of Making Games
Procedural Rendering on Playstation 2


September 26, 2001 (Page 3 of 7)
 

Ten Things Nobody Told You About PS2

Before we start translating this Lifeform algorithm into a PS2-friendly design, I’d like to cover a few more details about the PS2. Later we’ll use these insights to reorder and tweak the algorithm to get some speed out of the machine.

1. You must design before coding.
Lots of people have said this about PS2 – you cannot just sit down and code away and expect high speed programs as a result. You have to plan and design your code around the hardware and that requires insight into how the machine works. Where do you get this insight from? This is where this paper comes in. The aim later is to present some of the boilerplate designs you can use as a starting point.

2. The compiler doesn’t make things easy for you.
Many of the problems with programming PS2 come from the limitations of the compiler. Each General Purpose Register in the Emotion Engine is 128 bits wide, but the compiler supports 128-bit values only as a special-case type. Much of the data you need to pass around the machine comes in 128-bit packets (GIF tags, 4D float vectors, etc.), so you will spend a lot of time casting between different representations of the same data, paying special attention to alignment. A lot of this confusion can be removed if you have access to well-designed Packet and 3D Vector classes.
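As a rough illustration of the kind of Packet type meant here, the sketch below shows a 16-byte-aligned qword that can be read as two 64-bit halves, four 32-bit words or a 4D float vector. The name `Qword` and its methods are hypothetical, not the author's classes. It is written in modern C++ for clarity; the ee-gcc of the era would need `__attribute__((aligned(16)))` instead of `alignas`. The accessors copy through `memcpy` so they do not rely on type punning through unions or pointer casts.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 128-bit qword: one aligned storage type, several views.
struct alignas(16) Qword {
    std::uint64_t u64[2];  // e.g. the two halves of a GIF tag

    // Build a qword from a 4D float vector (x, y, z, w).
    static Qword from_floats(float x, float y, float z, float w) {
        Qword q{};
        const float f[4] = {x, y, z, w};
        std::memcpy(q.u64, f, sizeof f);  // copy bits, no pointer casts
        return q;
    }

    // View element i as a float.
    float as_float(int i) const {
        assert(i >= 0 && i < 4);
        float f[4];
        std::memcpy(f, u64, sizeof f);
        return f[i];
    }

    // View element i as a 32-bit unsigned integer.
    std::uint32_t as_u32(int i) const {
        assert(i >= 0 && i < 4);
        std::uint32_t w[4];
        std::memcpy(w, u64, sizeof w);
        return w[i];
    }
};

static_assert(sizeof(Qword) == 16, "a qword is exactly 128 bits");
static_assert(alignof(Qword) == 16, "qwords must be 16-byte aligned");
```

Keeping one storage type and converting through explicit accessors puts all the casting in one place, which is where the alignment bugs tend to hide.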

Additionally, the inline assembler doesn’t have enough information to make good decisions about uploading and downloading VU0 Macro Mode registers or generating broadcast instructions on vector data types. There is a patch for the ee-gcc compiler by Tyler Daniel and Dylan Cuthbert that updates the inline assembler to add register naming, access to broadcast fields and new register types, which are used to good effect in our C++ Vector classes. It’s by no means perfect, as you’re still limited to only 10 input and output registers, but it’s a significant advance.

3. All the hardware is memory mapped.
Nearly all of the basic tutorials I have seen for PS2 start by telling you that, in order to get anything rendered on screen, you have to learn all about DMA tags, VIF tags and GIF tags, alignment, casting and enormous frustration before your “Hello World” program will work. The tutorials always seem to imply that the only way to access outboard hardware is through painstakingly structured DMA packets. This is not true, and believing it greatly complicates the process of learning PS2. In my opinion this is one of the reasons the PS2 is dismissed as “hard to program”.

Much of this confusion comes from the lack of a detailed memory map of the PS2 in the documentation. Understandably, the designers were reluctant to provide one: the machine was in flux at the time of writing (the memory layout is completely reconfigurable by the kernel at boot time) and they were afraid of giving programmers bad information. Let’s change this.

All outboard registers and memory areas are freely accessible at fixed addresses. Digging through the headers, you will come across one called eeregs.h that holds the key: it contains the hard-coded addresses of most of the internals of the machine. First, a note about future-proofing your programs. Accessing these registers directly in final production code is not advisable, as it’s entirely possible that the memory map could change with future versions of the PS2. These techniques are outlined here only for tinkering and learning the system, so you can prove to yourself there’s no magic involved. Once you have grokked how the PS2 and the standard library functions work, it’s safest to stick to the libraries.

Let’s take a look at a few of the values in the header and see what they mean:

#define VU1_MICRO ((volatile u_long *)(0xNNNNNNNN))
#define VU1_MEM ((volatile u_long128 *)(0xNNNNNNNN))

These two addresses are the start of VU1 program memory and data memory respectively, and can be accessed while VU1 is not running. Most tutorials paint VU1 as “far away”, a hands-off device that’s unforgiving if you get a single instruction wrong and consequently hard to debug. Sure, the memory is unavailable while VU1 is running a program, but using these addresses you can dump its contents before and after running VU programs. Couple this knowledge with the DMA Disassembler and VCL, the vector code compiler, and VU programming without expensive proprietary tools and debuggers is not quite as scary as it seems.
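Dumping VU1 data memory before and after a run amounts to copying a window of quadwords out and comparing the two copies. The sketch below uses a hypothetical `Quad` type standing in for the SDK’s `u_long128`, and the helper names are illustrative only. Reading through the pointer is only meaningful while VU1 is stopped. On an ordinary host it runs against a plain array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for u_long128: four 32-bit words, 16-byte aligned.
struct alignas(16) Quad {
    std::uint32_t w[4];
};

// Copy `count` quadwords out of a memory-mapped window (e.g. VU1 data
// memory while VU1 is idle) into ordinary host memory.
std::vector<Quad> snapshot(const volatile Quad* base, std::size_t count) {
    std::vector<Quad> out(count);
    for (std::size_t i = 0; i < count; ++i) {
        for (int j = 0; j < 4; ++j) {
            out[i].w[j] = base[i].w[j];  // volatile read, one word at a time
        }
    }
    return out;
}

// Return the indices of quadwords that differ between two snapshots.
std::vector<std::size_t> diff(const std::vector<Quad>& before,
                              const std::vector<Quad>& after) {
    assert(before.size() == after.size());
    std::vector<std::size_t> changed;
    for (std::size_t i = 0; i < before.size(); ++i) {
        for (int j = 0; j < 4; ++j) {
            if (before[i].w[j] != after[i].w[j]) {
                changed.push_back(i);
                break;  // one mismatch is enough to flag this quadword
            }
        }
    }
    return changed;
}
```

On the PS2 you would pass the `VU1_MEM` address cast to `volatile Quad*` with a count of 1024, since VU1’s 16 KB of data memory holds 1024 quadwords. Take one snapshot before starting the microprogram and another after it finishes, then `diff` them.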

#define D2_CHCR
#define D2_MADR
#define D2_QWC
#define D2_TADR
#define D2_ASR0
#define D2_ASR1

#define D3_CHCR
#define D3_MADR
#define D3_QWC
((volatile u_int *)(0xNNNNNNNN))
((volatile u_int *)(0xNNNNNNNN))
((volatile u_int *)(0xNNNNNNNN))
((volatile u_int *)(0xNNNNNNNN))
((volatile u_int *)(0xNNNNNNNN))
((volatile u_int *)(0xNNNNNNNN))

((volatile u_int *)(0xNNNNNNNN))
((volatile u_int *)(0xNNNNNNNN))
((volatile u_int *)(0xNNNNNNNN))

If you have only read the SCE libraries, you may be under the impression that “Getting a DMA Channel” is an arcane and complicated process requiring a whole function call. Far from it. The DMA channels are not genuine programming abstractions; in reality they’re just a bank of memory-mapped registers. The entries in the structure sceDmaChan map directly onto these addresses like a cookie cutter.
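To make the cookie-cutter idea concrete, here is a sketch of a structure that overlays one channel’s register bank. The struct `DmaChannelRegs` and the helper `dma_channel_at` are hypothetical and are not the SCE definitions. The layout assumes each 32-bit register sits at the start of its own 16-byte slot, in the same order as the `D2_` macros above. Check the real offsets in eeregs.h rather than trusting this one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical overlay of one DMA channel's registers. ASSUMPTION: each
// register is 32 bits wide and starts its own 16-byte slot, hence the padding.
struct DmaChannelRegs {
    std::uint32_t chcr; std::uint32_t pad0[3];  // channel control
    std::uint32_t madr; std::uint32_t pad1[3];  // memory address
    std::uint32_t qwc;  std::uint32_t pad2[3];  // quadword count
    std::uint32_t tadr; std::uint32_t pad3[3];  // tag address
    std::uint32_t asr0; std::uint32_t pad4[3];  // address stack 0
    std::uint32_t asr1; std::uint32_t pad5[3];  // address stack 1
};

static_assert(offsetof(DmaChannelRegs, madr) == 0x10, "16-byte stride");
static_assert(offsetof(DmaChannelRegs, asr1) == 0x50, "16-byte stride");

// "Getting" a channel is nothing more than treating its base address
// (e.g. the value behind D2_CHCR) as a pointer to the structure above.
inline volatile DmaChannelRegs* dma_channel_at(std::uintptr_t base) {
    assert(base % 16 == 0);
    return reinterpret_cast<volatile DmaChannelRegs*>(base);
}
```

Because the pointer is to `volatile`, every field access compiles to a real load or store, which is what memory-mapped registers need.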

#define GIF_FIFO ((volatile u_long128 *)(0xNNNNNNNN))

The GIF FIFO is the doorway into the Graphics Synthesizer. You push qwords in here one after another and the GS generates polygons; simple as that. There’s no need to use DMA to get your first program working: just program up a GIF tag with some data and stuff it into this address.
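Here is a sketch of building a GIF tag and pushing qwords into a FIFO address. The bit positions in `make_giftag` follow our reading of the GS documentation and should be checked against it. The type and function names are hypothetical. On the EE, each qword must reach `GIF_FIFO` as a single 128-bit store. The sketch writes two 64-bit halves only so that it compiles and runs on an ordinary host, where the "FIFO" is just a variable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 128-bit value split into two 64-bit halves.
struct alignas(16) Qword128 {
    std::uint64_t lo, hi;
};

// Pack a GIF tag. ASSUMED layout: NLOOP[14:0], EOP[15], PRE[46],
// PRIM[57:47], FLG[59:58], NREG[63:60]; the upper 64 bits hold up to
// sixteen 4-bit register descriptors.
Qword128 make_giftag(std::uint32_t nloop, bool eop, bool pre,
                     std::uint32_t prim, std::uint32_t flg,
                     std::uint32_t nreg, std::uint64_t regs) {
    assert(nloop < (1u << 15) && prim < (1u << 11));
    assert(flg < 4 && nreg < 16);
    Qword128 tag;
    tag.lo = static_cast<std::uint64_t>(nloop)
           | (static_cast<std::uint64_t>(eop) << 15)
           | (static_cast<std::uint64_t>(pre) << 46)
           | (static_cast<std::uint64_t>(prim) << 47)
           | (static_cast<std::uint64_t>(flg) << 58)
           | (static_cast<std::uint64_t>(nreg) << 60);
    tag.hi = regs;
    return tag;
}

// Push qwords one after another into a FIFO register. Every store goes to
// the same address; the hardware behind it consumes them in order.
// (Host-only simplification: two 64-bit stores instead of one 128-bit store.)
void push_qwords(volatile Qword128* fifo, const Qword128* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        fifo->lo = data[i].lo;
        fifo->hi = data[i].hi;
    }
}
```

For example, a tag with NLOOP=1, NREG=1 and a single register descriptor of 0xE (A+D) announces one qword of register data to follow.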

This leads me to my favorite insight into the PS2…

4. The DMAC is just a Pigeon Hole Stuffer.
The DMA Controller (DMAC) is a very simple beast. In essence, all it does is read a qword from a source address, write it to a destination address, increment one or both of these addresses, decrement a counter and loop. When you’re DMAing data from memory to the GIF, all that’s happening is that the DMA chip reads from the source address and pushes the qwords through the GIF_FIFO we mentioned earlier, because that DMA channel has a hard-wired destination address.
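The loop just described is small enough to model in a few lines. This is a host-side sketch of the simplest transfer only, with no DMA tags or chaining. The names are hypothetical. The two flags model a channel whose source or destination is a fixed FIFO address rather than incrementing memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 128-bit quadword.
struct alignas(16) QuadWord {
    std::uint32_t w[4];
};

// Model of the DMAC's inner loop: read a qword, write it, bump one or both
// addresses, decrement the counter, repeat. Returns the number moved.
std::size_t dma_transfer(const QuadWord* src, QuadWord* dst, std::size_t qwc,
                         bool increment_src, bool increment_dst) {
    assert(src != nullptr && dst != nullptr);
    std::size_t moved = 0;
    while (qwc != 0) {
        *dst = *src;               // read one qword, write one qword
        if (increment_src) ++src;  // a FIFO source stays put
        if (increment_dst) ++dst;  // a FIFO destination (e.g. GIF) stays put
        --qwc;                     // count down the quadwords remaining
        ++moved;
    }
    return moved;
}
```

With `increment_dst` set to false, every qword lands on the same destination, which is exactly what a memory-to-GIF transfer does to the GIF FIFO.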

5. Myth: VU code is hard.
VU code isn’t hard. Fast VU code is hard, but there are now some tools to help you get 80 percent of the way there for a lot less effort.

VCL (Vector Command Line, as opposed to the interactive graphical version) is a tool that preprocesses a single stream of VU code (no paired instructions necessary), analyzes it for loop blocks and control flow, pairs and rearranges instructions, unrolls loops and interleaves the result to give pretty efficient code. For example, take this simplest of VU programs, which takes a block of vectors, multiplies each one in place by a fixed matrix, divides by W and converts the result to fixed point:

; test.vcl
; simplest vcl program ever
.init_vf_all
.init_vi_all
--enter
--endenter
.name start_here
start_here:
ilw.x srce_ptr, 0(vi00)           ; address of the first vector
ilw.x counter, 1(vi00)            ; number of vectors
iadd counter, counter, srce_ptr   ; counter now holds the end address
lq v_transf0, 2(vi00)             ; load the four matrix rows
lq v_transf1, 3(vi00)
lq v_transf2, 4(vi00)
lq v_transf3, 5(vi00)
loop:
--LoopCS 6, 1
lq vec, 0(srce_ptr)
mulax.xyzw ACC, v_transf0, vecx   ; ACC  = row0 * x
madday.xyzw ACC, v_transf1, vecy  ; ACC += row1 * y
maddaz.xyzw ACC, v_transf2, vecz  ; ACC += row2 * z
maddw.xyzw vec, v_transf3, vf00w  ; vec  = ACC + row3 * 1.0
div Q, vf00w, vecw                ; Q = 1 / w
mulq.xyzw vec, vec, Q             ; divide through by w
ftoi4.xyzw vec, vec               ; convert to fixed point
sq vec, 0(srce_ptr)               ; write back in place
iaddiu srce_ptr, srce_ptr, 1
ibne srce_ptr, counter, loop
--exit
--endexit
. . .

VCL takes the source code, pairs the instructions and interleaves successive iterations of the loop to produce this seven-line inner loop, where each line is an upper/lower instruction pair (entry and exit blocks not shown):

loop__MAIN_LOOP:
; [0,7) size=7 nU=6 nL=7 ic=13 [lin=7 lp=7]
maddw  VF09,VF04,VF00w      lq.xyz  VF08,0(VI01)
nop                         sq      VF07,(0)-(5*(1))(VI01)
ftoi4  VF07,VF06            iaddiu  VI01,VI01,1
mulq   VF06,VF05,Q          move    VF05,VF10
mulax  ACC,VF01,VF08x       div     Q,VF00w,VF09w
madday ACC,VF02,VF08y       ibne    VI01,VI02,loop__MAIN_LOOP
maddaz ACC,VF03,VF08z       move    VF10,VF09
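When you check VU output, for instance against a memory dump, it helps to have a scalar reference for what the program computes. The sketch below is a hypothetical host-side model and is not part of VCL. It applies the same multiply-accumulate sequence, divides by w, and scales by 16 for ftoi4’s four fractional bits. It assumes ftoi4 truncates toward zero and ignores the VU’s non-IEEE float behavior, so bit-exact agreement is not guaranteed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // rows v_transf0 .. v_transf3

// Scalar reference for the VU loop above (hypothetical helper).
std::array<std::int32_t, 4> transform_reference(const Mat4& t, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) {
        // mulax/madday/maddaz/maddw: row0*x + row1*y + row2*z + row3*1.0
        // (vf00.w is always 1.0, so the input's own w is not used)
        r[i] = t[0][i] * v[0] + t[1][i] * v[1] + t[2][i] * v[2] + t[3][i];
    }
    assert(r[3] != 0.0f);
    const float q = 1.0f / r[3];  // div Q: reciprocal of the transformed w
    std::array<std::int32_t, 4> out{};
    for (int i = 0; i < 4; ++i) {
        // mulq, then ftoi4: scale by 16 (four fractional bits);
        // the cast truncates toward zero (ASSUMED to match the VU)
        out[i] = static_cast<std::int32_t>(r[i] * q * 16.0f);
    }
    return out;
}
```

Running the same input block through this function and through VU1, then comparing the results, quickly shows whether the microprogram (or the data you uploaded) is wrong.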

6. Myth: Synchronization is complicated.
The problem with synchronization is that much of it is built into the hardware and the documentation isn’t clear about what’s happening and when. Synchronization points are described variously as “stall states” or hidden behind descriptions of queues and scattered all over the documentation. Nowhere is there a single list of “How to force a wait for X” techniques.

The first point to make is that, however complicated general-purpose synchronization may be, rendering to screen is a more limited problem: you only need to bring things back into sync once a frame. All your automatic processes can kick off and fight for resources during a frame, but by the time you reach the end of rendering the frame, everything must be finished. You are only dealing with short bursts of synchronization.

The PS2 has three main systems for synchronization:

  • synchronization within the EE Core
  • synchronization between the EE Core and external devices
  • synchronization between external devices

This whole area is worthy of a paper in itself, as much of this information is spread around the documentation. Breaking the problem down into these three areas makes the whole system much easier to grok. Briefly summarizing:

Within the EE Core we have the sync.l and sync.e instructions, which guarantee that earlier operations have finished before execution continues.

Between the EE Core and external devices (VIF, GIF, DMAC, etc.) we have a variety of tools. Many events can generate interrupts on completion. The VIF has a mark instruction that sets the value of a register the EE Core can read, which lets the EE Core know that a certain point has been reached in a DMA stream. Finally, there are the memory-mapped registers, whose status bits can be polled.
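Polling a status bit is the simplest of these tools. This sketch spins on a channel’s CHCR register, such as the one behind `D2_CHCR`, until its STR (transfer in progress) bit clears. The placement of STR at bit 8 is our assumption and should be checked against the hardware manual. The function name and poll limit are hypothetical. On a host machine the "register" is just an ordinary variable.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMED position of the STR (transfer in progress) bit in CHCR.
constexpr std::uint32_t kChcrStr = 1u << 8;

// Busy-wait until the channel's STR bit clears, giving up after
// `max_polls` reads. Returns true if the transfer finished.
bool wait_for_dma(const volatile std::uint32_t* chcr, std::uint32_t max_polls) {
    assert(chcr != nullptr);
    for (std::uint32_t i = 0; i < max_polls; ++i) {
        if ((*chcr & kChcrStr) == 0) {
            return true;  // hardware cleared STR: transfer complete
        }
    }
    return false;  // still busy; the caller decides what to do next
}
```

A bounded poll like this is handy while learning, because a hung transfer returns `false` instead of freezing the machine.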

Between external devices, a well-defined set of priorities determines the order of execution. The VIF can also be forced to wait using the flush, flushe and flusha instructions. These are the main ones we’ll be using in this tutorial.

