Meet My Dog, Patches
There's an old joke that goes something like this:
Patient: "Doctor, it hurts when I do this."
Doctor: "Then stop doing it."
Funny, but are these also wise words when applied to the right situation? Consider the load of pain I found myself in when working on a conversion of a 3D third person shooter from the PC to the original PlayStation.
Now, the PS1 has no support for floating point numbers, so we were doing the conversion by basically recompiling the PC code and overloading all floats with fixed point. That actually worked fairly well, but where it fell apart was in collision detection.
The level geometry that was supplied to us worked reasonably well in the PC version of the game, but when converted to fixed point, all kinds of seams, T-Junctions and other problems were nudged into existence by the microscopic differences in values between fixed and floats. This problem would manifest itself by the main character (called "Damp") simply falling through those tiny holes, into the abyss below the level.
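To make the "microscopic differences" concrete, here is a minimal sketch of how a T-junction can open up after conversion. It assumes a 16.16 fixed-point format and a truncating float-to-fixed conversion; the article does not say which format or rounding the port actually used, and the names fixed_t, to_fixed, gap_float and gap_fixed are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical 16.16 fixed-point type: 16 integer bits, 16 fractional bits.
using fixed_t = std::int32_t;
constexpr float kOne = 65536.0f;

// Truncating conversion (assumption: a plain cast, no rounding).
inline fixed_t to_fixed(float v) {
    assert(std::fabs(v) < 32768.0f);  // must fit in the 16 integer bits
    return static_cast<fixed_t>(v * kOne);
}

inline float to_float(fixed_t v) { return static_cast<float>(v) / kOne; }

// Distance between a T-junction vertex and the midpoint of the edge it is
// supposed to sit on, measured in floating point.
inline float gap_float(float a, float b, float vertex) {
    return std::fabs((a + b) * 0.5f - vertex);
}

// The same measurement after every coordinate has been converted to fixed
// point independently, which is where the seam appears.
inline float gap_fixed(float a, float b, float vertex) {
    const fixed_t mid = (to_fixed(a) + to_fixed(b)) / 2;
    const fixed_t diff = mid - to_fixed(vertex);
    return to_float(diff < 0 ? -diff : diff);
}
```

For an edge from 0.3 to 0.6 with a vertex at 0.45, gap_float stays below one ten-millionth of a unit, while gap_fixed comes out as one whole fixed-point step (1/65536). Under these assumptions the crack is hundreds of times wider than it was on the PC, which is the kind of gap a character can slip through.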
We patched the holes we found, tweaking the geometry until Damp no longer fell through. But then the game went into test at the publisher, and suddenly a flood of "falling off the world" bugs was delivered to us. Every day a fresh batch of locations was found with these little holes that Damp could slip through. We would fix that bit of geometry, then the next day they would send us ten more. This went on for several days. The publisher's test department employed one guy whose only job was to jump around the world for ten hours a day, looking for places he could fall through.
The problem here was that the geometry was bad. It was not tight and seamless geometry. It worked on the PC, but not on the PS1, where the fixed point math vastly magnified the problems. The ideal solution was to fix the geometry to make it seamless.
However, this was a vast task, impossible to do in the time available with our limited resources, so we were relying on the test department to find the problem areas for us.
The problem with that approach was that they never stopped finding them. Every day brought more pain. Every day new variants of the same bug. It seemed like it would never end.
Eventually the penny dropped. The real problem was not that the geometry had small holes in it. The problem was that Damp fell through those holes. With that in mind, I was able to code a very quick and simple fix that looked something like:
IF (Damp will fall through a hole()) THEN
Don't do it
The actual code was not really much more complex than that (see Listing 2).
Listing 2: Meet My Dog, Patches
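damp_old = damp_loc;        // remember the last known-good position
move_damp();
if (NoCollision())          // no collision found: Damp would fall through a hole
{
    damp_loc = damp_old;    // so put him back where he was
}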
In one swoop a thousand bugs were fixed. Now instead of falling off the level, Damp would just shudder a bit when he walked over the holes. We found what was hurting us, and we stopped doing it. The publisher laid off their "jump around" tester, and the game shipped.
Well, it shipped eventually. Spurred on by the success of "if A==bad then NOT A", I used this tool to patch several more bugs -- which nearly all had to do with the collision code. Near the end of development the bugs became more and more specific, and the fixes became more and more "Don't do this precise and exact thing" (see Listing 3, actual shipped code).
Listing 3: Meet My Dog, Patches
if (damp_aliencoll != old_aliencoll &&
strcmpi("X4DOOR",damp_aliencoll->enemy->ename)==0 &&
StartArena == 6 && damp_loc.y<13370)
{
damp_loc.y = damp_old.y; // don't let damp ever touch the door.. (move away in the x and y)
damp_loc.x = damp_old.x;
damp_aliencoll = NULL; // and say thusly!!!
}
What does that code do? Well, basically there was a problem with Damp touching a particular type of door in a particular level in a particular location. Rather than fix the root cause of the problem, I simply made it so that if Damp ever touched the door, I'd move him away and pretend it never happened. Problem solved.
Looking back I find this code quite horrifying. It was patching bugs and not fixing them. Unfortunately the real fix would have been to go and rework the entire game's geometry and collision system specifically with the PS1 fixed point limitations in mind. The schedule was initially aggressive, so we always seemed close to finishing, so the quick patch always won over the comprehensive, expensive fix.
But it did not go well. Hundreds of patches were needed, and then the patches themselves started causing problems, so more patches were added to turn off the patches in hyper-specific circumstances. The bugs kept coming, and I kept beating them back with patches. Eventually I won, but at a cost of shipping several months behind schedule, and working 14 hour days for all of those several months.
That experience soured me against "the patch." Now I always try to dig right down to the root cause of a bug, even if a simple, and seemingly safe, patch is available. I want my code to be healthy. If you go to the doctor and tell him "it hurts when I do this," then you expect him to find out why it hurts, and to fix that. Your pain and your code's bugs might be symptoms of something far more serious. The moral: Treat your code like you would want a doctor to treat you.
- Mick West
en.wikipedia.org/wiki/Checksum
I was just about to tell the team the good news when the producer pulled me aside, and mimed a "shh!" gesture.
The team continued to crunch, none the wiser, until we reached the now fake deadline. After that, the producer acted disappointed in us, and told us to keep going.
Boy, that producer really knew how to get work out of people who were demoralized and exhausted already! HAH HAH!
I feel strangely more human all of a sudden!
As for "The Programming Antihero" - Well on that one I actually thought *everyone* did that. The real trick when coding solo is to make yourself forget about that array until the very last minute of the project.. ;)
There is a special place in hell for producers that give resources fake deadlines. It's a trick bad producers use to make themselves look good, at the expense of everyone else. In this case, your team suffered so your bad producer could be the hero to the client (and upper management).
Way to cover up for him/her!
Did you like seeing your other team members get extra demoralized and extra exhausted? I guess letting them also be "super relieved" was not your problem. They must have been amazed at your ability to handle the stress of the "unrealistically aggressive" fake deadline. Hopefully that "can-do" attitude got you a promotion.
Good producers work honestly with their team, and they get far more accomplished as a result.
The fake deadline trick tends to only work once (it always gets discovered).
Ever notice that teams miss deadlines for some producers more than others?
Ever wonder why that is? It's all about trust and respect.
HAH HAH!
As some have already mentioned, this is done fairly often with deadlines in quite a few fields. I've not ever seen it done with memory constraints, though!
I can see I'll be looking forward to more in the comments, too. ;)
As far as that goes, we were recently having problems where the player avatar would mysteriously disappear and become unresponsive due to the screen clipping against a non-square map. We found the fix readily enough, but the performance cost was so high, that we ended up just forcing the player to render if they were alive with the comment:
// HACK
// Players should ALWAYS be on screen if they exist.
There were also some related to how we had to implement our own renderer to get half-decent performance, but the real ugly stuff was in the grid optimisation. We lost about two solid weeks ironing out the bugs in that (which, considering our deadline, was actually about a quarter of the total development time; madness), and we still ended up with another avatar-specific hack so players wouldn't be dropped from the game state if they crossed a grid boundary. The code for this is still enclosed in
// HACK HACK HACK LOUSY HACK
and
// End of lousy hack.
But I love the trick in "The Programming Antihero"; I think I'm going to start using it. :D
Employees is a little vague though. Need a way of distinguishing between management, account managers, and senior leads (technical or creative). Unless the producer is insane, those employees aren't given fake deadlines to motivate them. It's only the people underneath the producer, the creative and technical resources, that get abused. Not sure how else to group them.
What could be more obvious? Only render every second letter! But alternate, so that on even numbered frames, you render even-numbered letters, and on odd-numbered frames, you render odd-numbered letters!
This leaves you a flickery, but fast and usable, debug menu.
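The alternating-letter idea can be sketched as below. This is only an illustration of the selection logic, not the original debug menu code; the function name visible_letters is hypothetical, and returning a string with blanks stands in for whatever the real renderer did per glyph.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns what would be drawn on the given frame: even-indexed letters on
// even frames, odd-indexed letters on odd frames. Skipped letters become
// spaces so the remaining glyphs keep their positions on screen.
inline std::string visible_letters(const std::string& text, unsigned frame) {
    std::string out(text.size(), ' ');
    for (std::size_t i = frame % 2; i < text.size(); i += 2) {
        out[i] = text[i];
    }
    assert(out.size() == text.size());  // the layout must not shift
    return out;
}
```

With the text "DEBUG", frame 0 yields "D B G" and frame 1 yields " E U ". A real renderer would skip the draw call for the blanked letters instead of building a string, which is where the roughly halved text cost would come from.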
Stuff like this is a great read. Not only is it amusing but it also teaches you tricks you can implement when in a crisis, or more importantly what kind of problem might blow up in your face and how you can avoid them to begin with.
Sir, do not worry! You may have missed my sarcastic tones, and also I missed a key plot point: I did the exact opposite of covering for him.
I was as shocked as you, and told my comrades the story behind the producer's back, because I'm a LOOSE CANNON. It was still a hard push, but they were able to pace themselves.
"static char buffer[1024*1024*2];"
That's just priceless.
I really enjoyed every one. I would love to see the second edition of this, or maybe more!
Profuse apologies, I TOTALLY did not realize you were joking.
Tone totally changes everything.
I applaud your approach.
I deem thee "Sir" Loose Cannon.
@Brandon
Forgot to mention I loved the article.
Great reading. I have gained wisdom.
Major kudos to Noel Llopis' old hand coder for that memory trick!
At university there was a team (not related to me, but these guys are the perfect example :P) that made a FPS flash game...
For some bizarre reason, instead of checking whether you were colliding with a wall and stopping you from going there, the programmer did the inverse: he checked whether there was a wall, and allowed you to move parallel to it...
This sparked a bizarre bug: In crossings, you could not actually cross, only turn to the passage on your left or right.
The deadline was closing, and they had no idea on how to fix it...
Then the team writer fixed the issue! He told the artist to draw an animation of hands touching the walls, and then he wrote in the story that the protagonist was blind and needed to touch the walls to know where he was going.
//@hack
//@remove
//@fixme!
And, well... they almost never get changed :)
Turned out the flash init code was dead and the carts could not save games properly!!! The studio went into meltdown trying to figure out how to ship 250'000 broken carts. Suggestions such as having production lines add extra resistors and other hacks to every cart were tried and failed, then someone figured out that if you played some games in a weird order the flash memory would sort of work. So an extra leaflet was added to every box explaining how to use this "feature".
Job done
Chris Kirby
That's the best thing I've ever heard!
Yes, of course...if you are in need of many megabytes of memory, inspecting the code-size is definitely not the most obvious thing to do ;) but, at least for console/handheld-projects, it is somewhat of a standard procedure.
I have found similar (but unintended) stuff in earlier projects while inspecting the map-file once in a while.
------
Don't know how many remember Force 21, but it was an early 3D RTS which used a follow cam to observe your current platoon. Towards the end of the project we had a strange bug where the camera would stop following the platoon -- it would just stay where it was while your platoon moved on and nothing would budge it. The apparent cause was random because we couldn't find a decent repro case. Until, finally, one of the testers noticed that it happened more often when an air strike occurred near your vehicles. Using that info I was able to track it down.
Because the camera was using velocity and acceleration and was collidable, I derived it from our PhysicalObject class, which had those characteristics. It also had another characteristic: PhysicalObjects could take damage. The air strikes did enough damage in a large enough radius that they were quite literally "killing" the camera.
I did fix the bug by ensuring that cameras couldn't take damage, but just to be sure, I boosted their armor and hit points to ridiculous levels. I believe I can safely say we had the toughest camera in any game.
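A minimal sketch of how that bug can arise, and of the first half of the fix. Only the PhysicalObject name comes from the story; Vehicle, BuggyCamera, Camera, take_damage and air_strike are hypothetical stand-ins, and the real Force 21 class layout is not known.

```cpp
#include <cassert>
#include <vector>

// Base class with the traits the camera wanted (movement, collision) plus
// one it did not: anything derived from it can be damaged and killed.
struct PhysicalObject {
    float hit_points = 100.0f;
    bool alive = true;
    virtual ~PhysicalObject() = default;
    virtual void take_damage(float amount) {
        hit_points -= amount;
        if (hit_points <= 0.0f) alive = false;
    }
};

struct Vehicle : PhysicalObject {};

// The bug: the camera inherits take_damage() along with everything else.
struct BuggyCamera : PhysicalObject {};

// The fix described above: cameras simply ignore damage.
struct Camera : PhysicalObject {
    void take_damage(float) override {}
};

// Area damage applied to every object in range (range test omitted here).
inline void air_strike(const std::vector<PhysicalObject*>& in_radius, float damage) {
    for (PhysicalObject* obj : in_radius) {
        assert(obj != nullptr);
        obj->take_damage(damage);
    }
}
```

After an air_strike that hits a Vehicle, a BuggyCamera and a Camera, only the Camera is still alive. The design lesson is the usual one about inheriting for convenience: the camera needed two behaviours from the base class and silently received a third.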
------
When I was a lead on R6 Lockdown PS2, I also used the "The Programming Antihero" trick -- whenever an engineer found a significant memory savings I told them to release enough to get the game under budget for the time being, but hold some back for the eventual future overrun. In the end we had a 1-2K buffer (IIRC) that we released just before ship. Most of the team never knew.
Well, this is basically what Microsoft uses as their WPARAM and LPARAM in WndProc, so it wasn't that bad =)
@Evan Bell: I am a programmer in the healthcare industry and hit that lottery once some years ago. I had to load a text file with over 100k lines, one per medicine, and two lines gave the same CRC32. Fortunately the update routine was run on our server and the clash was easily found. For two files with the same CRC32, check http://www.allegro.cc/forums/thread/585925
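How unlucky is a clash among 100k lines? A rough estimate comes from the birthday approximation, sketched below. It assumes the checksum behaves like a uniform random 32-bit value, which CRC32 only approximates, and the function name collision_probability is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that at least two of n distinct inputs share a
// hash value, assuming outputs are uniform over 2^bits values:
//   p ~= 1 - exp(-n * (n - 1) / (2 * 2^bits))
inline double collision_probability(double n, int bits) {
    assert(n >= 0.0 && bits > 0);
    const double buckets = std::ldexp(1.0, bits);  // 2^bits
    return -std::expm1(-n * (n - 1.0) / (2.0 * buckets));
}
```

Under that assumption, collision_probability(100000, 32) comes out at about 0.69, so with 100k lines a clash is more likely than not. At 1,000 lines the same formula gives roughly 0.0001, which is why 32-bit checksums feel safe on small data sets and stop being safe as they grow.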
Although not game derived, I got a couple to share. Around 5 years ago we had to change the L&F of the application. That included switching menus over, changing background images, reshaping controls, etc. It went pretty well (at 16 hours per day, took us just a week). However, testers complained that the application started to randomly crash after some time of use, with an out of memory error. Just 4 hours before shipping we discovered the error: the new rounded buttons didn't free the normal/pushed bitmaps correctly, eating memory every time they appeared on screen. So we had to remove the new buttons, and shipped a Mac-like rounded application with gray square buttons.
Regarding fake deadlines, we once decided to fulfill it. Indeed, we reached it, had the product uploaded to our servers, fully tested and waiting for the approval of the CEO to be released. But he decided not to launch it. The thing was that he had put a very impossible goal so that we would be forced to delay the package, giving him time to suggest more features before the real launch date arrived.
Also, whenever the physician chose a drug, it would be stored into the database, linked to the patient and a prescription within a visit. However, one of the programmers really messed up and, instead of storing the NDC (national drug code as set by the FDA), he stored the drug index in the drug database. The drug database was usually modified once per month (removing drugs that had expired, adding new drugs alphabetically, etc) so that, when I checked, the IDs in our database were usually pointing to drugs that were not the chosen ones. Even though the contraindication check used names instead of NDC (which made the routine work still), it was too risky to have random numbers as medicine codes, so we had to clean them up. None of them was salvageable, and I had to wipe out the entire column, but there was no way to explain that to the users without making them think they were at great risk. So, I placed a progress bar that lasted around 7 minutes with something like "Upgrading drug information" which did nothing at all, and notified the user that he would be forced to relink some (all!) of the medicines in the program database with the ones from the medicine database as he starts reusing them. None of the doctors complained, though, and everything has been going smoothly since then.
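The failure mode is easy to reproduce in miniature. The sketch below assumes the drug table is kept sorted by name, as the monthly updates in the story suggest; the Drug struct, the helper names and the codes "0001" to "0003" are made up for illustration and are not real NDC values.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Drug {
    std::string code;  // stands in for the stable NDC
    std::string name;
};

inline bool by_name(const Drug& a, const Drug& b) { return a.name < b.name; }

// Monthly update: insert a new drug in alphabetical position. Every drug
// after the insertion point moves to a new index.
inline void add_drug(std::vector<Drug>& db, Drug d) {
    assert(std::is_sorted(db.begin(), db.end(), by_name));
    auto pos = std::lower_bound(db.begin(), db.end(), d, by_name);
    db.insert(pos, std::move(d));
}

// Lookup by stable code, which is unaffected by how the table is ordered.
inline const Drug* find_by_code(const std::vector<Drug>& db, const std::string& code) {
    auto it = std::find_if(db.begin(), db.end(),
                           [&](const Drug& d) { return d.code == code; });
    return it == db.end() ? nullptr : &*it;
}
```

Start with Aspirin at index 0 and Ibuprofen at index 1, and record a prescription for Ibuprofen both as index 1 and as code "0002". After add_drug inserts Codeine, index 1 points at Codeine, while find_by_code(db, "0002") still returns Ibuprofen. A stored position is only valid for the table version it was read from; a stored key survives updates.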
Chris Kirby's one reminds me of a couple. A programmer had just installed a spell checker into our program, with the ability to add your own custom words, and he started testing with words like a**, f*ck, and similar. Unfortunately, he didn't clear the table and we got angry calls from one doctor who was showing his secretary the new spell checking functionality when f*ck popped up. So, always clear your databases before shipping the product!
Great feature! I look forward to "Dirty Coding Tricks II"!
One of the "impossible" errors was: "OH MY GOD, THE GAME IS TOTALLY FUCKED AND THE WORLD WILL EXPLODE BECAUSE SOME IMPOSSIBLE SHIT HAPPENED AND ACTIVATED THE IMPROBABILITY DRIVE CAUSING A TIME PARADOX THAT WILL SCREW THE UNIVERSE IN ALL FUCKING SEXUAL POSITIONS POSSIBLE"
After I typed that, I plainly forgot it. It was embedded deeply in some initialization routines of the API, code that once done I would never peek at. It was also the only case where I used profanity or that sort of stuff (I am a person that is not much into profanity, but that day was a bad day, so I ranted in that error message).
Unfortunately, one of the testers actually made that error happen... He called me all confused, and I was confused too (since I didn't remember it) and I thought that he was joking, but I searched and actually found it...
Then I had two questions on my head:
How did that error happen? (answer: a typo that I introduced in the latest build caused a nasty chain reaction of bugs that made the execution stack go awry, making the impossible actually possible).
Why did I write so much profanity? (answer: In fact I am still wondering about the answer to that...)
So this is a way to access private/protected members if there are no getter/setter.
Now I'm not saying I'd do that but in the case you're working on a program with an API and you only have access to the .h file, you could get around it this way. =)
I don't know why this ever worked but I had read that if you shift the mouse position left and then right it solved the problem. So I added that and it worked.
Later a programmer looked at this and thought it was just self cancelling and commented out the code and you could no longer click the button.
I explained the real problem was that we were not counting "Mickeys" and should move to a relative system. He asked if I knew how and that is how I moved from being a tester to a programmer.