Friday, April 09, 2010

Quiz...

Okay, so perhaps I should have been a little clearer, but the general rule of question one sticks throughout. So here we go...

1) Memory access. This is pretty much it, especially on today's hardware, but it has a similar (if less profound) effect on older machines as well. If someone is reading/writing gigs of data every frame, it's gonna suck, not just because they have a huge loop in there, but because in modern computing (and we'll stick with this for now), memory is the number 1 enemy.

In the past, the CPU ran at around the same speed as memory, so reading from memory was only a little slower, usually just a few cycles. These days, memory is incredibly slow compared to the CPU. Register operations will (in real terms) take less than a cycle, while uncached memory access can take thousands of cycles (once you remember page faults and the rest).

This is crazy, so the less memory you can touch the better. Now... this may mean reducing the number of passes over your data and doing everything in one pass (while doing a little more register-based work), or simply removing data access and tables altogether if you can replace them with a simple calculation. In the past, we used to have multiply and divide tables, but these days a table lookup can be so expensive that you're far better off just using the ASM instruction, which only takes a few cycles.
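To make the "one pass instead of two" idea concrete, here's a minimal C sketch (the function names are made up for illustration). Both versions scale an array in place and return the sum. The two-pass version streams the whole array through the cache twice; the fused version loads each element once and does the extra add while the value is already sitting in a register.

```c
#include <assert.h>
#include <stddef.h>

/* Two passes: the array is streamed through memory twice. */
float scale_then_sum_two_pass(float *data, size_t n, float scale)
{
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        data[i] *= scale;          /* pass 1: read + write every element */

    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += data[i];            /* pass 2: read every element again */
    return sum;
}

/* One pass: each element is loaded and stored once, and the sum is
   accumulated while the scaled value is still in a register. */
float scale_then_sum_one_pass(float *data, size_t n, float scale)
{
    assert(data != NULL || n == 0);
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float v = data[i] * scale; /* single load */
        data[i] = v;               /* single store */
        sum += v;
    }
    return sum;
}
```

Whether the fused loop actually wins depends on the array being bigger than the cache; for a handful of elements both versions live in L1 and the difference disappears.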

So, here's a real-world example: particles. If you're doing particles on the main CPU (we'll ignore the GPU for now), then the smaller you can make your particle structure, the more you'll be able to render; not because you can draw more, but simply because the CPU only has a limited memory bandwidth, and using less of it means you can do more particles, or better yet, do other things, like gameplay.
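Here's a rough sketch of what "smaller particle structure" might look like in C. Both layouts are hypothetical: the fat one uses doubles everywhere, stores colour as four floats and carries a debug pointer the update loop never reads; the compact one uses floats, a packed 32-bit colour and 16-bit frame counters. On a typical 64-bit compiler that's 88 bytes versus 32, so a 64-byte cache line holds two compact particles instead of less than one fat one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "fat" particle: every byte here is dragged through the
   cache on each update, including fields the update never touches. */
typedef struct {
    double x, y, z;
    double vx, vy, vz;
    float  r, g, b, a;
    double age, lifetime;
    const char *debug_name;
} FatParticle;

/* Hypothetical compact particle: only what the update loop needs. */
typedef struct {
    float    x, y, z;
    float    vx, vy, vz;
    uint32_t colour;    /* RGBA packed 8:8:8:8 */
    uint16_t age;       /* frames alive so far */
    uint16_t lifetime;  /* frames before the particle dies */
} CompactParticle;

/* Move every particle and return how many are still alive. A dead
   particle is overwritten by the last one in the array, so the live
   particles stay packed together at the front. */
size_t update_particles(CompactParticle *p, size_t count, float dt)
{
    assert(p != NULL || count == 0);
    size_t i = 0;
    while (i < count) {
        CompactParticle *q = &p[i];
        q->x += q->vx * dt;
        q->y += q->vy * dt;
        q->z += q->vz * dt;
        if (++q->age >= q->lifetime) {
            p[i] = p[--count];  /* swap the last particle into this slot */
            continue;           /* and update the one we just swapped in */
        }
        ++i;
    }
    return count;
}
```

Exact sizes depend on the compiler and ABI, so it's worth checking `sizeof` on your own target rather than trusting the numbers above.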

I've seen it over and over again. People continually loop over large amounts of data, wondering why things are slow when it's not that much data they're processing. Remember that in this day of the multitasking OS, a 1MB cache is not yours alone. Your data will be continually kicked out by other processes, so even if you only have 64KB of data, you'll be surprised how little time it spends in the cache. The answer is to prefetch the next block, and do a little more processing on the current one, thereby reducing the number of iterations you have to do. After all, if you're talking 400 cycles (say) to read a cache line (around 64 bytes last time I checked), then why not spend those 400 cycles doing something useful instead of waiting on memory coming into the cache?
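A minimal sketch of "prefetch the next block, work on the current one" in C. It assumes GCC or Clang (`__builtin_prefetch` is a compiler builtin, not standard C; MSVC has `_mm_prefetch` instead) and 64-byte cache lines holding 16 floats. The function name and `FLOATS_PER_LINE` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumes 64-byte cache lines and 4-byte floats. */
#define FLOATS_PER_LINE 16

float sum_of_squares(const float *data, size_t n)
{
    assert(data != NULL || n == 0);
    float sum = 0.0f;
    for (size_t block = 0; block < n; block += FLOATS_PER_LINE) {
        size_t next = block + FLOATS_PER_LINE;

        /* Ask for the next line now, so it's on its way in while we
           chew on the current one. 0 = read, 3 = keep it cached. */
        if (next < n)
            __builtin_prefetch(&data[next], 0, 3);

        size_t end = next < n ? next : n;
        for (size_t i = block; i < end; ++i)
            sum += data[i] * data[i];
    }
    return sum;
}
```

Sixteen multiply-adds won't come close to hiding a 400-cycle fetch, so in practice you'd prefetch several lines ahead or do more work per block; the right distance is something to tune with a profiler, not guess.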

2) This actually has nothing to do with optimisation - my bad. It's a simple two-part question...

2.a) Release it. No game or application is any good if you never release it. No matter how shiny, awe-inspiring, or groundbreaking it is, who cares if it never sees the light of day? So rule 1 of any program development: make sure you get something out, or it's just wasted effort.

2.b) Make it fun. In games, it's easy to release something with lots of features and levels, but if it's not fun, no one's gonna play it. It's that simple. I can name several games that appear to have been developed by idiots: games that were all gloss and no gameplay. Some teams fixate on making things as pretty as possible, but that's really not the most important thing. You have to enjoy being IN the game, or, like 2.a, it's pointless and a waste of time and effort. You'd be amazed how often this rule is ignored, or how often something particularly frustrating removes all the fun that SHOULD be there.


So there you go... Yeah, not the best-phrased questions, but I bet looking at these you're either nodding in agreement, or shouting at the monitor something like "Rubbish! Algorithms are FAR more important!!". Well, that's true... but given even a reasonable algorithm, you can then apply the memory rule and speed it up further. The less memory you touch, the quicker your code will be; it's that simple.

Oh... and no smart answers about sitting in a calculation loop with no memory access for a second; we're assuming you're not a moron. :)

Saturday, April 03, 2010

Performance...

Here's a little quiz for you. Both of these issues have come up in conversation over the past week or so, and I thought it would be interesting to pose them, and give you the chance to prove you're smarter than most games developers...

Question 1
What's the main thing to tackle inside a program that's virtually guaranteed to speed it up?

Question 2
When writing a game, what are the two most important things to make sure you do? (In order, please)


The answers to both these questions should be obvious, but let's see how you do. I'll answer them in a couple of days...

Now... if you cast your mind back a few years, you'll remember me doing a fancy little routine for XeO3's bullet allocation. This was a simple "stack allocator" that sped up my code quite a bit. Well, you can now find this article in the new Game Programming Gems 8 book. Yes, about a year ago I got the article accepted, and it's now out in print. This is the first time I've tried to get an article published, but I thought it was about time. It takes the general concept a little further, applies it to standard programming (rather than 6502), and comes with a few little examples.

The only reason I mention this (aside from being chuffed that I finally got something published) is that this is one of the main reasons I still code old machines. Without writing XeO3, this would never have occurred to me. Old machines place unique limits on what you can do, limits that simply don't appear to be around anymore, and as such they force you to think outside the box. They are still a valuable learning tool, and can lead to better code on a day-to-day basis. I've now used the Stack Allocator in several applications I've written since, and found it much simpler and quicker to implement than linked lists, not to mention easier to follow. I love how retro coding really can teach an old dog new tricks...
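For anyone who hasn't seen the XeO3 version, here's a minimal C sketch of the general stack-allocator idea: a fixed pool of objects, plus a stack of free slot indices. Allocating pops an index, freeing pushes it back, both O(1) with no list pointers to maintain. This isn't the code from the book; the `Bullet` type, `BulletPool`, `pool_alloc`/`pool_free` and the pool size are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BULLETS 64   /* hypothetical pool size */

typedef struct {
    float x, y, dx, dy;
} Bullet;

typedef struct {
    Bullet bullets[MAX_BULLETS];
    int    free_stack[MAX_BULLETS];  /* indices of unused bullets */
    int    free_top;                 /* number of entries on the free stack */
} BulletPool;

void pool_init(BulletPool *pool)
{
    /* Every slot starts free. Pushed in reverse so slot 0 pops first. */
    for (int i = 0; i < MAX_BULLETS; ++i)
        pool->free_stack[i] = MAX_BULLETS - 1 - i;
    pool->free_top = MAX_BULLETS;
}

/* Pop a free slot index; returns -1 when the pool is exhausted. */
int pool_alloc(BulletPool *pool)
{
    if (pool->free_top == 0)
        return -1;
    return pool->free_stack[--pool->free_top];
}

/* Push a slot index back onto the free stack. */
void pool_free(BulletPool *pool, int index)
{
    assert(index >= 0 && index < MAX_BULLETS);
    assert(pool->free_top < MAX_BULLETS);  /* more frees than allocations */
    pool->free_stack[pool->free_top++] = index;
}
```

A freed slot is the next one handed out, which also tends to keep recently used bullets warm in the cache.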