Friday, April 09, 2010


Okay, so perhaps I should have been a little clearer, but the general rule of question one sticks throughout. So here we go...

1) Memory Access. This is pretty much it, especially on today's hardware, but it has a similar effect (if not as profound) on older machines as well. If someone is reading/writing gigs of data every frame, it's gonna suck - not just because they have a huge loop in there, but because in modern computing (and we'll stick with this for now), memory is the number 1 enemy.

In the past, CPU access was around the same speed as memory, so it would be only a little slower to read from memory - usually just a few cycles. These days, memory is incredibly slow compared to the CPU. Register operations will (in real terms) take less than a cycle, while un-cached memory access can take thousands of cycles (once you factor in page faults and the rest).

This is crazy, so the less memory you touch, the better. Now... this may mean reducing multiple passes over data to a single pass (while doing a little more register-based work), or simply removing data access and tables entirely if a simple calculation can replace them. In the past, we used to have multiply and divide tables, but these days such a table can be so expensive that you're far better off just using the ASM instruction, which only takes a few cycles.
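To make the table-vs-instruction point concrete, here's a minimal sketch (names are made up, not from any real codebase) of the old-school approach next to the modern one. Every table lookup is a potential cache miss; the direct multiply never leaves the registers.

```cpp
#include <cstdint>

// The old way: a precomputed multiply-by-9 table. On 8-bit machines this
// was a win; today each lookup is a memory touch that may cost hundreds
// of cycles if the table isn't in cache.
static uint16_t mulBy9Table[256];

void initTable() {
    for (int i = 0; i < 256; ++i)
        mulBy9Table[i] = static_cast<uint16_t>(i * 9);
}

uint16_t mulBy9_table(uint8_t x)  { return mulBy9Table[x]; } // touches memory
uint16_t mulBy9_direct(uint8_t x) { return x * 9; }          // a few cycles, no memory
```

Same answer either way - the difference is purely whether you burn memory bandwidth to get it.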

So, here's a real-world example - particles. If you're doing particles on the main CPU (we'll ignore the GPU for now), then the smaller you can make your particle structure, the more you'll be able to render; not because you can draw more, but simply because the CPU only has limited memory bandwidth, and reducing that means you can do more - or better yet, do other things, like gameplay.
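A rough sketch of what "smaller structure" means in practice (both structs are hypothetical, just for illustration). Packing colour into 32 bits and dropping anything the renderer can derive shrinks 48 bytes down to 32, so more particles fit in each 64-byte cacheline:

```cpp
#include <cstdint>

// "Kitchen sink" particle: 12 floats = 48 bytes each.
struct FatParticle {
    float x, y, z;
    float vx, vy, vz;
    float r, g, b, a;
    float life, size;
};

// Trimmed particle: colour packed to 8:8:8:8, size derived elsewhere.
// 32 bytes - two per cacheline instead of one and a bit.
struct SlimParticle {
    float    x, y, z;
    float    vx, vy, vz;
    uint32_t rgba;   // packed colour
    float    life;
};

static_assert(sizeof(SlimParticle) < sizeof(FatParticle),
              "slim particle should be smaller");
```

Updating a million slim particles moves a third less memory per frame than the fat version - for free.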

I've seen it over and over again. People continually looping over large amounts of data, wondering why things are slow when it's not that much data they're processing. Remember, in this day of the multitasking OS, a 1Mb cache is not yours alone. Your data will be continually kicked out by other processes, so even if you only have 64K of data, you'll be surprised how little time it spends in the cache. The answer is to prefetch the next block, and do a little more processing on the current one, thereby reducing the number of iterations you have to do. After all, if you're talking 400 cycles (say) to read a cacheline (around 64 bytes last time I checked), then why not use the 400 cycles doing something instead of 400 cycles waiting on memory coming into the cache?
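Here's a minimal sketch of the "prefetch the next block while working on the current one" idea. `__builtin_prefetch` is a GCC/Clang builtin (MSVC uses `_mm_prefetch` instead), and the 64-byte stride is an assumption about the cacheline size - tune both for your target:

```cpp
#include <cstddef>

// Sum an array while hinting the next cacheline into cache ahead of use.
// The prefetch is only a hint: prefetching past the end of the array is
// harmless (it never faults), so no bounds check is needed on the hint.
float sumWithPrefetch(const float* data, size_t count) {
    float total = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        // 16 floats = 64 bytes = roughly one cacheline ahead.
        __builtin_prefetch(&data[i + 16]);
        total += data[i];
    }
    return total;
}
```

In a real loop you'd also do more work per element (as the post says) so the arithmetic genuinely overlaps the memory latency, rather than the loop being pure loads.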

2) This actually has nothing to do with optimisation - my bad. It's a simple 2-part question...

2.a) Release it. No game or application is any good if you never release it. No matter how shiny, awe-inspiring, or ground-breaking - who cares if it never sees the light of day? So rule 1 of any program development: make sure you get something out, or it's just wasted effort.

2.b) Make it fun. In games, it's easy to release something with lots of features and levels, but if it's not fun, no one's gonna play it. It's that simple. I can name several games that appear to have been developed by idiots - games that were all gloss and no gameplay. Some teams fixate on making things as pretty as possible, but that's really not the most important thing. You have to enjoy being IN the game, or like 2.a, it's pointless and a waste of time and effort. You'd be amazed how often this rule is ignored, or how often something particularly frustrating removes all the fun that SHOULD be there.

So there you go... Yeah, not the best-phrased questions, but I bet looking at these you're either nodding in agreement, or shouting at the monitor something like "Rubbish! Algorithms are FAR more important!!". Well, this is true... but given even a reasonable algorithm, you can then apply the memory rule and speed it up further. The less memory you touch, the quicker your code will be - it's that simple.

oh... and no smart answers about being in a calculation loop with no memory access for a second - we're assuming you're not a moron. :)


Dave Shadoff said...

Yeah, those resonate.

I got half of #2 right when I said (facetiously) "get paid", and (seriously) "control scope" - in other words, release it.

Kind of overlooked the obvious on making it fun, but I guess that's because 90% of games in the past 10 years haven't been fun....

For #1, I am a little surprised that modern games traverse memory quite *that* much, but if I think about it, it's plausible. On the other hand, I work with business programs, and I thought less specifically when you used the generic term "program". It would be pretty hard (and basically insanely stupid) to write a business app that traverses that much data without being Extract-Translate-Load (ETL database), or some sort of technical app which used to be relegated to supercomputers.

Mike said...

Actually, you'd be surprised... if anyone uses STL, or makes their own "dynamic" arrays or lists, then this can seriously screw with memory. Even if you're not looping over it a lot, you may well be memcpy()-ing shed loads of stuff in the background without knowing it, and this will whack your memory bandwidth just as hard...

Dave Shadoff said...

If you're talking about C++ usage, I'm not surprised at all - it's a terror for wastage for the average programmer.

I'm always finding crap like unnecessary instantiation/destruction of a non-primitive data type in an inner loop, causing massive CPU waste and memory fragmentation.
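A minimal sketch of the pattern described above (function and variable names are invented for illustration): constructing a non-trivial object inside an inner loop pays for an allocation and a destruction on every iteration, where hoisting it out pays once and reuses the capacity.

```cpp
#include <cstddef>
#include <string>

// The wasteful version: a std::string is constructed and destroyed
// (heap alloc + free) on every pass through the loop.
size_t countNaive(const char* items[], int n) {
    size_t total = 0;
    for (int i = 0; i < n; ++i) {
        std::string s(items[i]);   // allocation every iteration
        total += s.size();
    }
    return total;
}

// Hoisted version: one object, reused; assignment can recycle the
// existing buffer instead of hitting the allocator each time.
size_t countHoisted(const char* items[], int n) {
    size_t total = 0;
    std::string s;                 // constructed once
    for (int i = 0; i < n; ++i) {
        s = items[i];              // may reuse capacity
        total += s.size();
    }
    return total;
}
```

Same result either way; the hoisted version just stops hammering the allocator and fragmenting the heap.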

It's why I prefer to stay away from C++. Well, that and the fact that debugging even 'c = a + b' can be stupidly complex. (i.e. you have to understand the entire inheritance hierarchy of 'a' to determine whether '+' has been overloaded for 'b' types, to know whether it has a non-standard meaning) ...but let's not go any further into my gripes with C++. I have too many.

Anonymous said...

"if anyone uses STL ... then this can seriously screw with memory"

Can you explain what you mean here?

Mike said...

Oops.. missed this comment.

STL containers like vector can dynamically shrink and grow, but to do so they allocate new arrays and copy stuff back and forth. This means when doing a simple operation like adding elements to a list, you could end up allocating and reallocating lots of memory. This is bad.

You'll notice I said could. Well, you can pre-allocate space, but many don't, as they don't really understand what STL is doing in the background. Still, on top of this, STL can also do other unexpected things, like going in/out of critical sections for thread safety, which slows everything down as well.
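The pre-allocation point is worth a quick sketch. With `std::vector`, growing one `push_back` at a time triggers periodic reallocate-and-copy cycles behind your back; a single `reserve()` up front does one allocation and avoids all of them:

```cpp
#include <vector>

// Naive growth: capacity expands geometrically, and each reallocation
// copies the entire array to the new block.
std::vector<int> buildNaive(int n) {
    std::vector<int> v;
    for (int i = 0; i < n; ++i)
        v.push_back(i);            // may reallocate several times
    return v;
}

// Pre-allocated: reserve() makes one allocation, so every push_back
// is just a write - no hidden copies, no extra memory traffic.
std::vector<int> buildReserved(int n) {
    std::vector<int> v;
    v.reserve(n);                  // one allocation up front
    for (int i = 0; i < n; ++i)
        v.push_back(i);
    return v;
}
```

The contents are identical either way; the only difference is how many times the background allocate-and-memcpy happened while you weren't looking.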

I'm not a huge fan of STL - or templates in general. I've seen cases where the program has doubled in size because some noob programmer just doesn't know what's going on with it. The BOOST library is a case in point, and bloats your code+data with utter crap most of the time.

Of course, like anything in programming, you have to KNOW what you're getting yourself into with any lib - step through it and understand it. If you don't, you're gonna get stung.