Okay, so perhaps I should have been a little clearer, but the general rule of question one sticks throughout. So here we go...
1) Memory Access. This is pretty much it, especially on today's hardware, but it has a similar (if not as profound) effect on older machines as well. If someone is reading/writing gigs of data every frame, it's gonna suck - not just because they have a huge loop in there, but because in modern computing (and we'll stick with that for now), memory is the number 1 enemy.
In the past, the CPU ran at around the same speed as memory, so reading from memory was only a little slower - usually just a few cycles. These days, memory is incredibly slow compared to the CPU. Register operations will (in real terms) take less than a cycle, while un-cached memory access can take thousands of cycles (once you factor in page faults and the rest).
This is crazy, so the less memory you touch the better. Now... this may mean reducing passes over the data and doing it all in one pass (while doing a little more register-based work), or simply removing data accesses and tables if you can replace them with a simple calculation. In the past, we used to have multiply and divide tables, but these days such a table can be so expensive that you're far better off just using the ASM instruction, which only takes a few cycles.
So, here's a real-world example - particles. If you're doing particles on the main CPU (we'll ignore the GPU for now), then the smaller you can make your particle structure, the more you'll be able to render; not because you can draw more, but simply because the CPU has only a limited memory bandwidth, and reducing what you use of it means you can do more - or better yet, do other things, like gameplay.
I've seen it over and over again. People continually looping over large amounts of data, wondering why things are slow when it's not that much data they're processing. Remember, in this day of the multitasking OS, a 1Mb cache is not yours alone. Your data will be continually kicked out by other processes, so even if you only have 64K of data, you'd be surprised how little time it spends in the cache. The answer is to prefetch the next block and do a little more processing on the current one, thereby reducing the number of passes you have to make. After all, if you're talking 400 cycles (say) to read a cacheline (around 64 bytes last time I checked), then why not spend those 400 cycles doing something instead of 400 cycles waiting on memory coming into the cache?
2) This actually has nothing to do with optimisation - my bad. It's a simple 2-part question...
2.a) Release it. No game or application is any good if you never release it. No matter how shiny, awe-inspiring, or ground-breaking - who cares if it never sees the light of day? So rule 1 of any program development: make sure you get something out, or it's just wasted effort.
2.b) Make it fun. In games, it's easy to release something with lots of features and levels, but if it's not fun, no one's gonna play it. It's that simple. I can name several games that appear to have been developed by idiots - games that were all gloss and no gameplay. Some teams fixate on making things as pretty as possible, but that's really not the most important thing. You have to enjoy being IN the game, or like 2.a it's pointless and a waste of time and effort. You'd be amazed how often this rule is ignored, or how often something particularly frustrating removes all the fun that SHOULD be there.
So there you go... Yeah, not the best-phrased questions, but I bet looking at these you're either nodding in agreement, or shouting at the monitor something like "Rubbish! Algorithms are FAR more important!!". Well, that's true... but given even a reasonable algorithm, you can then apply the memory rule and speed it up further. The less memory you touch, the quicker your code will be - it's that simple.
oh... and no smart answers about being in a calculation loop with no memory access for a second - we're assuming you're not a moron. :)