The life of a Games Programmer

Saturday, July 05, 2008

C64 raster splits

C64 sprites are great and all, but they really screw with your raster splits. Depending on how many sprites are over a raster line and which character row we're on, you may end up with virtually NO CPU cycles left at all! Looking up some timing info on the VIC (HERE) shows that a bad line plus all sprites leaves you with less than 10 CPU cycles per scanline, and only 4 write cycles at the start.
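A back-of-envelope model of that budget, using commonly quoted PAL figures (63 cycles per line, 40 stolen on a bad line, roughly 2 per sprite) and deliberately ignoring the few extra cycles lost while the VIC asserts BA before its fetches. It's a sketch for reasoning about splits, not cycle-exact timing.

```python
# Rough per-raster-line CPU budget on a PAL C64 (simplified model).
# Assumptions: 63 cycles per line (PAL), a bad line steals 40 cycles, and
# each sprite on the line steals 2 cycles. BA setup overhead is ignored.
CYCLES_PER_LINE = 63
BAD_LINE_STEAL = 40
CYCLES_PER_SPRITE = 2

def free_cycles(sprites_on_line: int, bad_line: bool) -> int:
    """Approximate CPU cycles left on one raster line."""
    if not 0 <= sprites_on_line <= 8:
        raise ValueError("the VIC-II has 8 hardware sprites")
    stolen = sprites_on_line * CYCLES_PER_SPRITE
    if bad_line:
        stolen += BAD_LINE_STEAL
    return CYCLES_PER_LINE - stolen

for bad_line in (False, True):
    for sprites in (0, 8):
        print(bad_line, sprites, free_cycles(sprites, bad_line))
```

With a bad line and all 8 sprites the model leaves 7 cycles, in line with the "less than 10" figure above.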

There are a few ways to get a cleaner split, but most use too much memory. However, I've got a reasonably stable raster for now, and although it flickers a little now and then (when LOTS of sprites cross the panel raster), it's mostly okay.

I guess the best way would be to count the number of sprites that cross the split, then do some kind of variable delay depending on the count. That's faffy stuff, and I could spend LOTS of time trying to get a clean raster split, so I'm happy enough just now with a basic flicker... I might fix it one day...
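The "count, then delay" idea can be sketched in a few lines. Everything here is a simplified, hypothetical model: sprites are 21 lines tall (42 when Y-expanded), each one on the line costs about 2 cycles, and `BASE_DELAY` is a made-up padding value. `sprite_ys` stands for whatever Y positions are currently in the 8 hardware sprites.

```python
# Hypothetical sketch of "count the sprites crossing the split, then delay".
SPRITE_HEIGHT = 21     # unexpanded sprite height in raster lines
CYCLES_PER_SPRITE = 2  # simplified DMA cost per sprite on a line
BASE_DELAY = 20        # hypothetical padding (cycles) when no sprites cross

def sprites_crossing(sprite_ys, split_line, y_expanded=None):
    """Count hardware sprites whose vertical extent covers split_line."""
    if y_expanded is None:
        y_expanded = [False] * len(sprite_ys)
    count = 0
    for y, expanded in zip(sprite_ys, y_expanded):
        height = SPRITE_HEIGHT * 2 if expanded else SPRITE_HEIGHT
        if y <= split_line < y + height:
            count += 1
    return count

def split_delay(sprite_ys, split_line, y_expanded=None):
    """Cycles to burn before the register write, so that (in this model)
    the write lands at the same point whatever the sprite count."""
    stolen = sprites_crossing(sprite_ys, split_line, y_expanded) * CYCLES_PER_SPRITE
    return max(0, BASE_DELAY - stolen)
```

More sprites on the line means less padding, so the total time to the write stays constant.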

Actually... if I lose the top pixel row of the panel, the flicker won't be visible; that's probably the best solution.

EDIT: Well, removing the top pixel row of the chars at the top of the panel actually works pretty well. I now have a solid raster split no matter what crosses the raster. Cool!

Friday, July 04, 2008

XeO3: C64...

I've spent this evening plugging the new sprite multiplexor sort into XeO3, which has given it a little speed boost as well. It's nice when all these things come together like this: a bit of code here, a function there. I've also sped up some of the other functions in the IRQ, although these are all pretty minor. Still, faster IS faster...

I've had to start freeing up some zero page though, as I've started to run out. Right at the start I allocated 30 bytes for TEMP usage, and that was a bit silly, as it now means that temp+?? is used everywhere and it's going to be very hard to reduce that count, even though I suspect that fewer than 15 bytes are actually used. Damn.

I've also shuffled some variable usage around so that most of the sort now accesses zero page, which helps with speed a little too. However, the multiplexor IRQ doesn't, which means it's just a little slower than it could be, but zero page is severely limited really, so...

Sprite expansion....

You know, I don't think I realised just how expensive X and Y sprite expansion was for the multiplexor. It's used twice (that I can remember): the reverse-control radio transmitter and the giant grab claws. And while it's nice, every sprite in the multiplexor has to store and process these bits! If I remove this feature, the copy section of the sort is reduced by another 6 scanlines. SIX!!! That means each sprite pays around 1/3 of a scanline for it in the copy, which isn't much, but the more sprites you have, the more you pay.
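The per-sprite figure is simple arithmetic, assuming the 16 sprites handled by the sort and a PAL line of 63 cycles:

```python
CYCLES_PER_LINE = 63   # PAL C64
SPRITES = 16           # sprites handled by the multiplexor sort
LINES_SAVED = 6        # copy-loop saving when expansion support is removed

per_sprite_lines = LINES_SAVED / SPRITES
per_sprite_cycles = per_sprite_lines * CYCLES_PER_LINE
print(per_sprite_lines, per_sprite_cycles)  # 0.375 23.625
```

So "around 1/3 of a scanline" works out to roughly 24 cycles per sprite.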

On top of that, the display IRQs also have to pay the cost, all for only a couple of sprites in the whole game! Now I'm wondering if it's worth the expense, or should I just drop those baddies altogether? Actually... only the CrabClaws need to be dropped (or reworked), since the tower would only require 1 extra sprite, which is hardly expensive!
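To show where that display-side cost comes from: whenever a multiplexed sprite lands in a hardware sprite, its bit in both expansion registers has to be set or cleared, whether or not that sprite is ever expanded. `$D01D` (X expand) and `$D017` (Y expand) are the real VIC-II registers; the shadow-register dict and `place_sprite()` are hypothetical.

```python
# Shadow-register update the display IRQ has to do for every multiplexed sprite.
VIC_Y_EXPAND = 0xD017
VIC_X_EXPAND = 0xD01D

def place_sprite(regs, slot, x_expand, y_expand):
    """Set or clear bit `slot` (0-7) of both expansion registers for the
    sprite being placed in that hardware sprite. Every sprite pays for
    this, even though only a couple ever set either flag."""
    bit = 1 << slot
    for reg, flag in ((VIC_X_EXPAND, x_expand), (VIC_Y_EXPAND, y_expand)):
        if flag:
            regs[reg] |= bit
        else:
            regs[reg] &= ~bit & 0xFF
    return regs
```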

Anyway, I know when I was coding them I thought it was cool to allow expansion in a multiplexor as no one else did, and it allowed for HUGE baddies (oh... come to think of it, the level 2 BOSS used it as well), but is it really worth the lost CPU time?

Quick tip....

So... in the old days when I was trying to speed up a function, I'd change the border colour, mark in pencil where it was on the side of the monitor, make changes, and try to see if it was any faster. Not very accurate, I'm sure you'll agree. I had to make sure my head was in the same place (which usually meant marking the TOP and bottom of the bar), and with other functions marked in the same way, the monitor got a bit messy.

Now with emulators it's even harder: they start up in a window, you can't draw on a TFT like you could a CRT, and I really don't want to draw on the side of it. I also used to use Post-its, but again, it's not very accurate. So what I do now is grab a screenshot of the first slow version and paste it into Paint Shop Pro (or Paint.NET), then speed it up, then grab it again. Now I paste that in as a new layer and apply a little transparency to it so I can see the original through it, and hey presto! I can see instantly how much faster (or slower) it actually is!

The image on the right shows timings for the new sort over the old one. The white+blue at the bottom is the new sort+copy, while the dark gray at the very bottom is the extra time the OLD sort+copy used to take. If you copy the gray bit out, you'll notice it's 9 raster lines faster (remember the image is double-sized, so 2 pixels = 1 raster line). Quite a saving, and VERY easy to see. I actually have an image with several old versions layered on, so I can go back and forward through them easily, which is pretty neat.
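Turning a measured strip into lines and cycles is just a divide and a multiply. This hypothetical helper assumes the 2x capture described above and a PAL line of 63 cycles; the pixel values in the example are made up to match the 9-line result.

```python
# Hypothetical helper: bar heights on a 2x-scaled screenshot -> lines and cycles.
SCALE = 2             # 2 screenshot pixels = 1 raster line (double-sized capture)
CYCLES_PER_LINE = 63  # PAL C64

def bar_saving(old_px, new_px, scale=SCALE):
    """Return (raster_lines, cycles) saved between two bar heights in pixels."""
    lines = (old_px - new_px) / scale
    return lines, lines * CYCLES_PER_LINE

# e.g. an 18-pixel gray strip between the old and new bars
print(bar_saving(100, 82))  # (9.0, 567.0)
```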

Thursday, July 03, 2008

Minor increase.... But size isn't important...

Well, I've managed to shave 8 raster lines off the sort using the new system, and it's all nice and stable now, so I'm gonna stick with this new one. The good news is I'll be able to use this in the C64 version of XeO3 as well, so it's been a good couple of days of playing. However, it's not the huge boost I was wanting, but it's better than nothing! The new sort is shown below...

              ;---------------------------------
              ; NEW multiplexor sort
              ; SPNext+0     = first sprite in list (pFirst)
              ; SPNext+1+n   = sprite after sprite n ($FF = end of list)
              ;---------------------------------
              ldx   #15
!FindFirst:   lda   yy,x
              bne   !FoundFirst
!BackHere:    dex
              bpl   !FindFirst
              lda   #0
              sta   yc
              rts                            ; NO sprites on!

!FoundFirst   lda   Anim_Current,x           ; if shape >= 200 then DON'T multiplex
              cmp   #200
              bcs   !BackHere

              ;
              ; Found 1st sprite
              ;
              lda   #-1                      ; set first active as last in list ($FF = end)
              sta   SPNext+1,x
              stx   SPNext                   ; and set first active as FIRST
              dex                            ; and move on one - we don't need to do the 1st one
              bmi   !AllDone
!SortAll
              ldy   yy,x
              beq   !DoNext                  ; Y = 0: sprite is off
              lda   Anim_Current,x           ; if shape >= 200 then DON'T multiplex
              cmp   #200
              bcs   !DoNext
              tya                            ; A = Y to insert, kept for the whole search

              stx   xcount                   ; Sprite number we're inserting
              ldx   #-1                      ; pLast
              ldy   SPNext                   ; pCurrent
              ;
              ; X and Y take turns at being pCurrent and pLast...
              ; The first iteration X=pLast, Y=pCurrent. The second is reversed.
              ;
!FindSpace:
              cmp   yy,y
              bcc   !InsertHere              ; new Y < pCurrent's Y: insert between X and Y
              ldx   SPNext+1,y               ; get next
              bmi   !InsertHere2             ; End of list... append after Y
              cmp   yy,x
              bcc   !InsertHere2             ; new Y < pCurrent's Y: insert between Y and X
              ldy   SPNext+1,x               ; get next
              bpl   !FindSpace               ; Not end of list... so keep going!
!InsertHere
              inx                            ; allow for -1 when head of list
              lda   xcount                   ; Set last->pNext to be this one
              sta   SPNext,x
              tax                            ; Now move to the new entry
              sty   SPNext+1,x               ; new->pNext = pCurrent
              ldx   xcount                   ; Sprite number we're inserting
              dex
              bpl   !SortAll
              bmi   !AllDone

!InsertHere2
              iny                            ; +1 offset into SPNext for sprite Y
              lda   xcount                   ; Set last->pNext to be this one
              sta   SPNext,y
              tay                            ; Now move to the new entry
              stx   SPNext+1,y               ; new->pNext = pCurrent
              ldx   xcount                   ; Sprite number we're inserting
!DoNext
              dex
              bpl   !SortAll
!AllDone

So the idea here is that SPNext is a table of 17 bytes: entry 0 holds the first sprite (pFirst) of the 1D indexed linked list, and entry n+1 holds the sprite that follows sprite n, with -1 ($FF) marking the end. That way, setting pFirst rather than a sprite's next link is no different, thanks to the inx/iny at the start of the insert code.
The only slight downer here is that I need a separate loop up front to find the 1st allocated entry to start from, rather than letting the main loop do it itself. Still, it's not really a slowdown, just another bit of code to run.

The other trick is to run a paired inner loop. X and Y take turns at being pCurrent and pLast, which lets me keep a LAST value without having to transfer it via A on every step, and that in turn lets me load A up with the Y value to compare at the start and never have to reload it. It works pretty well, although depending on the actual order of the sprites, timing may vary by a good few scanlines.
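To sanity-check the list logic away from the 6502, here's a Python model of the sort above. It mirrors the SPNext layout (slot 0 = pFirst, slot n+1 = next of sprite n, -1 standing in for $FF) and the skip rules (Y of 0 = off, shape >= 200 = not multiplexed). The X/Y register swapping has no real Python equivalent; the `last, cur = cur, ...` step plays that role. It models the logic only, not the cycle counts.

```python
END = -1  # stands in for $FF

def sort_sprites(yy, anim):
    """Build the SPNext-style linked list in ascending Y order."""
    n = len(yy)
    spnext = [END] * (n + 1)
    head_found = False
    for x in range(n - 1, -1, -1):          # same order as the dex loop
        if yy[x] == 0 or anim[x] >= 200:
            continue
        if not head_found:
            spnext[0] = x                   # first active sprite is the whole list
            spnext[x + 1] = END
            head_found = True
            continue
        last, cur = END, spnext[0]          # pLast / pCurrent
        while cur != END and yy[x] >= yy[cur]:
            last, cur = cur, spnext[cur + 1]
        # Because the head lives in slot 0, "insert after last" also works
        # when last == END: slot last + 1 == 0 is pFirst.
        spnext[last + 1] = x
        spnext[x + 1] = cur
    return spnext

def walk(spnext):
    """Follow the links from pFirst and return sprite numbers in order."""
    out, cur = [], spnext[0]
    while cur != END:
        out.append(cur)
        cur = spnext[cur + 1]
    return out
```

As in the assembly, a sprite with the same Y as one already in the list goes after it, since the search only stops on a strictly smaller Y.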

Multiplexor fun!

I've been playing with trying to speed up the multiplexor sorting, and I thought I had a much quicker way of doing it too. The old method was dumb, really dumb. I'd run through all 16 sprites, find the one with the smallest Y value, store it, set its Y to $FF, and then loop again until there were none left. VERY simple stuff. However, that means in the worst case it's doing a 16*16 loop, with a preloop of 16 to set up some values.
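For comparison, a Python model of the old sort. It assumes the preloop copies the Y values and counts the active sprites (Y != 0), so the outer loop runs once per active sprite; the text only says the preloop sets up "some values", so that part is a guess. `inner_steps` counts inner-loop iterations, which gives the 16*16 = 256 worst case.

```python
USED = 0xFF  # marker written over a Y value once it has been output

def old_sort(yy):
    """Repeatedly pick the smallest Y, output it, then mark it used."""
    ys = list(yy)
    # Assumed preloop: count the active sprites.
    active = sum(1 for y in ys if y != 0)
    order, inner_steps = [], 0
    for _ in range(active):
        best = None
        for i, y in enumerate(ys):          # full scan of every sprite, every pass
            inner_steps += 1
            if y != 0 and y != USED and (best is None or y < ys[best]):
                best = i
        order.append(best)
        ys[best] = USED
    return order, inner_steps
```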

So, I thought, "How about an insertion linked list?" Simply add each sprite index to a 1D indexed linked list at the correct point. Now, the worst case shouldn't be as bad as 16*16, since you only ever check against what's in the list already. For the 1st few, that means there are only a couple of entries there, while the last few will obviously check up to n-1(ish). Now this sounds all very good, and I'd hoped that it would double the speed... however, that was not to be. Would you believe that in the worst case, it's only about 1 scanline faster! Damn!
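The compare counts behind that hope, as plain arithmetic. These are steps, not cycles: since the measured worst case barely moved, each linked-list step presumably costs noticeably more than a step of the old scan, though that is an inference rather than a measurement.

```python
N = 16
old_worst = N * N               # scan all 16 sprites for each of 16 outputs
insert_worst = sum(range(N))    # the i-th insert compares against i entries
print(old_worst, insert_worst)  # 256 120
```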

It's still not working 100%, but it's mostly there, and while the code is around 72 bytes smaller, it's much harder to follow (since linked lists on the 6502 are tricky anyway), so I'm wondering which version I should take...

After the sort I need to copy all the sprite data into the IRQ buffer for display, so that's another N loop as well. On the plus side, when there's not as much on screen it's quite a bit quicker, and it only gets bad once a load of things come on. I guess this means that when I'm displaying lots of character sprites and only a couple of H/W sprites I'd win out overall; still, the worst case is usually the one to watch for... *sigh*
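A sketch of that copy pass: follow the links from pFirst (slot 0 = first, slot n+1 = next of sprite n, -1 = end) and write each sprite's data, in Y order, into flat buffers the display IRQ can step through with one index. The buffer layout and fields (x, y, shape) are hypothetical.

```python
END = -1  # stands in for $FF

def copy_to_irq_buffer(spnext, xx, yy, shape):
    """Copy sprite data into flat, Y-ordered buffers for the display IRQ."""
    buf_x, buf_y, buf_shape = [], [], []
    cur = spnext[0]                  # pFirst
    while cur != END:
        buf_x.append(xx[cur])
        buf_y.append(yy[cur])
        buf_shape.append(shape[cur])
        cur = spnext[cur + 1]        # follow the link
    return buf_x, buf_y, buf_shape
```

The loop runs once per active sprite, so it's cheap when the screen is quiet and only grows as more sprites come on.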

Tuesday, July 01, 2008

Optimising the IRQs...

Last post of today - honest!

I was watching the timing bars on the IRQs and again had to shake my head in appreciation of Dan's multiplexor: his trick of setting up the next sprite straight away if its hardware sprite has already been displayed, rather than kicking off a new IRQ, is top notch. I've added this to Blood Money and you can see the result here. (Remember I only multiplex 6 sprites, so there are more IRQs needed than normal.)

The bars on the LEFT are the old IRQs, while the RED bars on the right are the new ones. You can see it's almost halved the number, and it's happily packing them together thanks to about 6 lines of code. The multiplexor code is pretty slow really: it was never unrolled, so it has to keep track of VIC indexes etc., and that's not good. However, there are only 200 bytes free just now, so I'd need to free up a lot more memory to be able to plug in the new one I've written for XeO3.

The safety limit I've got in Blood Money is also 22 raster lines; that's how many lines above a sprite I kick off a new raster interrupt, while XeO3's magic number is 10! That means XeO3 can get sprites 12 scanlines closer than Blood Money, which is a huge improvement.
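The batching trick and the safety margin can be explored with a deliberately simplified Python model. The 6 multiplexed hardware sprites and the 22/10-line margins come from the text; everything else (round-robin assignment, 21-line sprites, an IRQ firing `margin` lines above each sprite, the first 6 sprites set up before the frame) is an assumption, and this is not the code from either game.

```python
SPRITE_HEIGHT = 21  # unexpanded sprite height in raster lines

def count_irqs(ys, slots=6, margin=22, batch=True):
    """Count raster IRQs needed for sprites at Y positions `ys` (sorted).
    Assumptions: sprite i uses hardware sprite i % slots, the first `slots`
    sprites are set up before the frame, and sprite i's IRQ fires `margin`
    lines above ys[i]. A hardware sprite is free once its previous 21-line
    sprite is done."""
    irqs = 0
    i = slots
    while i < len(ys):
        raster = ys[i] - margin          # where this IRQ fires
        irqs += 1
        i += 1
        if batch:
            # Set up following sprites now if their hardware sprite is already free.
            while i < len(ys) and ys[i - slots] + SPRITE_HEIGHT <= raster:
                i += 1
    return irqs

def min_reuse_gap(margin):
    """Closest two sprites sharing one hardware sprite can be (top to top)."""
    return SPRITE_HEIGHT + margin

ys = [0, 10, 20, 30, 40, 50, 100, 105, 110, 115, 120, 125]
print(count_irqs(ys, batch=False), count_irqs(ys))  # 6 1
print(min_reuse_gap(22) - min_reuse_gap(10))         # 12
```

Under this model the reuse gap is sprite height plus margin (43 vs 31 lines), so the 12-line difference falls straight out of 22 - 10.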

Anyhow, before I do anything major I need to free up memory... I think the front end takes up way too much, and all the fancy dissolving characters (which were a fad at the time) also eat up space. There are also lots of variables. I have space for 30 bullets here (and there's no difference between player and alien bullets!), so that should probably be cut down, and 12 character sprites, which again I think could be reduced.

I also need to update the character sprite system, as it's not only slow but embarrassing! The clipping is simply a loop to count how many chars are off screen, rather than just subtracting the thing! Stupid... Still, I was young; I did that routine when I was 17 or 18!
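The difference between the two approaches, sketched for the left edge. The names and units are assumptions: `x` is the character sprite's column (negative when it hangs off the left edge) and `width` its width in chars.

```python
def chars_off_left_loop(x, width):
    """The 'count them one at a time' approach."""
    count = 0
    while count < width and x + count < 0:
        count += 1
    return count

def chars_off_left_sub(x, width):
    """The same answer with one subtraction and a clamp."""
    return min(width, max(0, -x))
```

Both return the same value; the loop just takes time proportional to how far off screen the sprite is.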

Mmmm.... How thick was I!!

I do sometimes wonder what the hell I was thinking back when I was doing C64 stuff... Now that it's all running I've been looking at the code a little, and some of it is shocking! Now, I know, commercial pressures and all, but this was my dream game, and I remember thinking I was doing things so well... But I really was young (only 19!) and I was doing whole games on my own. Still, I was a twat sometimes.

The fastest code you can write is code you can delete and never run, simple as that. The turrets in Blood Money are a case in point. It takes AGES to process and draw them; with only four on screen, it's taking up to 32 scan lines to process them! That's ridiculous!! XeO3's turrets are a nightmare, it has to be said. The scrolling is slower because of them, and when you hit them, it's horrible! BUT! They're quick... I never have to print them or really think very much about them until they shoot, or you shoot them. Here, though, I draw them, do collision with them (bounding boxes), and then blow them up... horrible.

Turrets should really be part of the landscape and drawn in there, like XeO3's; then you just don't have to worry about them. It would be tricky in Blood Money because of the multi-directional scrolling, and because the screen is drawn in 2 blits (there are actually 3 screens: 2 being drawn as it scrolls between them, and a 3rd being built), but it is possible. Failing all that... I'm pretty sure you could at least speed them up!

More Blood Money from a stone....

I've managed to get everything back up and running properly now, well... except shops. For some reason they aren't being printed correctly at all. It turns out the multiply routine had a bug, probably a character removed by accident one of the times I was browsing through the source. Anyway, fixing that fixed the character sprites and the circle script function (which was the other crash). So now the whole of level 1 is playing fine and I'm wondering what I can change first. I suspect I'll back up what I have working here, as it's taken quite a while to get it running again, but after that I'd like to try and put in a static starfield.

For now I'm trying to replace the xply routine I had there, as the new one I use in XeO3 is a third quicker, taking only 119 cycles (worst case) as opposed to 164. That will save around 32 scanlines in itself when all 16 sprites are in use doing the circle command (which is when the speed drops to a crawl). However, it's not working, and I'm not really sure why. It looks like the low byte of the result is wrong; very odd.
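When a multiply routine's low byte looks wrong, it helps to have a reference to check every input pair against. This is a classic shift-and-add 8x8 to 16-bit unsigned multiply modelled in Python; it is not claimed to be either game's xply routine. `find_mismatches` is a hypothetical harness that takes any routine returning a (hi, lo) byte pair.

```python
def mul8x8(a, b):
    """Shift-and-add unsigned 8x8 -> 16-bit multiply."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    result = 0
    for bit in range(8):
        if b & (1 << bit):
            result += a << bit   # add the shifted multiplicand for each set bit
    return result & 0xFFFF

def find_mismatches(routine):
    """Return every (a, b) where routine(a, b) != (hi, lo) of a * b."""
    bad = []
    for a in range(256):
        for b in range(256):
            want = a * b
            if routine(a, b) != (want >> 8, want & 0xFF):
                bad.append((a, b))
    return bad
```

Feeding an emulated or transcribed routine through `find_mismatches` shows straight away whether only the low byte is off and for which inputs.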

Blood Money Lives Again!!

I've been playing around with getting my old Blood Money source to recompile off and on for a while now, and it's finally rebuilding and running! Well, more or less. It still has some issues, to be sure; character sprites appear to be killing it stone dead, and if I switch them off, I get a little further and then the game dies.

It's good to see it actually building and running though. One thing that's always bugged me was the crappy starfield I had on level 1, and I'd love to take some time and tune it up a bit. I'd also love to make a proper MMC64 version, one that loads directly from an MMC card, which would be cool.

There are lots of things I did wrong back then. Shops weren't part of the background but drawn, as were turrets. Pretty stupid really, as they only follow the screen, something the background map does pretty well without any extra CPU time. I'd also take away the 2 player mode and give that extra sprite over to the multiplexor. (Although... a 2 player internet version would be interesting!) There are a few areas where things glitch, and it would be nice to smooth that out. Talking of multiplexors, I would love to replace the one there and put my new one in, as the IRQ is about twice the speed (I think).

I've also sped up the character sprites a bit, and my multiply routine (used for all the circle paths) is also about twice the speed. I suspect I could get it under 2 frames easily enough. Yes, believe it or not, Blood Money ran in 3 frames, with the main player ship running in the vertical blank.

So yes... there are lots of things I'd like to take a crack at fixing, although it's gonna be tough as memory is down to a few bytes! I've no idea how many sprites I used, but while I was aiming to get them all in, I think a redux version could lose a few to save memory. They were stored compressed, then blitted down when needed. I might be able to speed that up or remove it completely.

Oh... and get rid of that stupid front-end screen... it's pathetic... Not to mention the flickery raster on the game.

And don't worry.... I'm not setting aside XeO3 to do this, it's a project I'm doing in the background as I always wanted a console version of Blood Money, with nice dual playfields etc. and this is the first step.