Wednesday, July 09, 2008

C64 Framework

I'm currently cleaning up my new C64 framework, and right now it's pretty simple: a basic multiplexor with "N" sprites multiplexed (currently 16) and rasters going. I was wondering what else I should add?

Got any ideas as to what the basic framework should contain? It should be fairly broad in scope, as there's no point in specialising stuff since it would get ripped out pretty quickly by anyone actually using it. But if you can think of some common stuff you always use, like keyboard routines, joystick functions and the like... then I'll probably add them in too.

So let me know what you think would be handy....

Sunday, July 06, 2008

Back to work tomorrow....

I've had a reasonably fun week or so off, but it'll be back to the usual grind tomorrow, which will mean fewer updates and less playing around. Oh well, it's been fun :)

I'm playing with lots of different things just now, flicking back and forth as the mood takes me. Currently I've got my SuperCPU hooked up as I was playing with a true dual-playfield system. It's got 2 bitmaps and masks one onto the other, if you can believe it! I've said it before, but I don't think Metal Dust used the power very well; it does look like just a nice C64 game. To my mind the new Enforcer 2 actually looks just as good but is on a stock machine. I think you need to do something really different - something that would be impossible on a stock machine - and I'm still playing around with what that could be.

The problem with TRUE dual playfields is that you can't make best use of colour; in fact you're pretty much back to 3 colour displays or you end up with huge colour clash again, and that's not very SUPER. Also, to make it simpler, you really need to manually smooth scroll the front layer, and that means you can't make it up from a software character map, so you can't have it all animating away, which would also be a shame. Still, I'm working away on it...

On the C64 XeO3 front I've obviously been plugging in my new multiplexor and speeding that up, although there is one more major speed-up needed here. When I do the sprite copy to the IRQ buffers, I also copy in the first seven sprites, but these then immediately get copied into the hardware. So an obvious speed-up is to skip that and copy the first seven sprites into the hardware directly. This will slow down the copy a little, but not having to duplicate it will be a big speed-up overall - I hope.

PC Engine.... I've almost got my assembler going, although I need to add a few more addressing modes to deal with TEST #??,blah types. After that I can start to play properly. However, my PC Engine has started to act up and it looks like it's dying, so I'll need to find my TG16 in the loft at some point.

Lastly... XeO3 on the Plus4... After playing Blood Money a little again, I'm starting to sway towards the player dying but the action carrying on. You then come back on flickering or something to give you a little bit of a shield, and carry on from there. This means you won't be thrown back to the start of the level, but you WILL lose some weapon power (not all). The reason for my sudden reversal is that after playing and watching Blood Money I realise that it's gonna get hard later on anyway, so even if you don't get thrown back to the start, you're not gonna finish it easily anyhow - so what the hell. It will hopefully mean most folk won't get too frustrated early on, but it does mean it'll get pretty hard later on, which was to be expected anyway I guess. I'll try this out and get a few people's feedback, or we might even release another test, although I think that unlikely for now.

Oh... and a quickie. With the C64 codebase getting pretty stable, I think I'll be releasing a framework sooner rather than later. So it won't be the full XeO3 source yet, but it will be a nice framework for starting a game in, complete with a production quality multiplexor. People can then strip it to bits and bend it into their own idea of a framework and make some toys with it..... we can hope.

And that's where things stand for now! Fun, fun, fun!! Any questions/suggestions/gripes/feedback, please just post a comment.

Saturday, July 05, 2008

C64 raster splits

C64 sprites are great and all, but they really screw with your raster splits. Depending on how many sprites are over a raster and what character line we're on, you may end up with virtually NO CPU cycles left at all! Looking up some timing info on the VIC (HERE) shows that bad lines plus all sprites leave you with fewer than 10 CPU cycles a scanline, and only 4 write cycles at the start.

There are a few ways to get a cleaner split, but most use too much memory. However, I've got a reasonably stable raster for now, and although it flickers a little now and then (when LOTS of sprites cross the panel raster) it's mostly okay.

I guess the best way would be to count the number of sprites that cross, then do some kind of variable delay depending on the count. That's faffy stuff, and I could spend LOTS of time trying to get a clean raster split, so I'm happy enough just now with a basic flicker... I might fix it one day....

Actually.... If I lose the 1st pixel of the panel, the flicker won't be visible. That's probably the best solution.

EDIT: Well, removing the 1st pixel of the chars at the top of the panel actually works pretty well. I now have a solid raster split no matter what crosses the raster - cool.

Friday, July 04, 2008

XeO3: C64...

I've spent this evening plugging the new sprite multiplexor sort into XeO3, which has given it a little speed boost as well. It's nice when all these things come together like this, bit of code here... function there. I've also sped up some of the other functions in the IRQ, although these are all pretty minor. Still, faster IS faster...

I've had to start freeing up some zero-page though, as I've started to run out. Right at the start I allocated 30 bytes for TEMP usage, and that was a bit silly, as it now means that temp+?? is used everywhere and it's going to be very hard to reduce that count - even though I suspect that fewer than 15 bytes are actually used. Damn.

I've also shuffled some variable usage around so that most of the sort now accesses zero-page, which helps with speed a little too. However, the multiplexor IRQ doesn't, and that means it's just a little slower than is possible, but zero-page is severely limited really so....

Sprite expansion....

You know, I don't think I realised just how expensive X and Y sprite expansion was to the multiplexor. It's used twice (that I can remember), for the reverse control radio transmitter and the giant grab claws, and while it's nice, every sprite in the multiplexor has to store and process these bits! If I remove this feature, the copy section of the sort is reduced by another 6 scanlines - SIX!!! That means each sprite pays around 1/3 of a scanline for it in the copy, which isn't much, but the more you have the more you pay.
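To put a rough number on that per-sprite cost, here is a small back-of-the-envelope sketch in Python (not 6502). It assumes 16 multiplexed sprites and a PAL machine with 63 CPU cycles per raster line, and it ignores any cycles the VIC steals, so treat the result as a ballpark only. The function name `per_sprite_cost` is made up for the illustration.

```python
CYCLES_PER_LINE = 63  # assumption: PAL C64, ignoring cycles stolen by the VIC


def per_sprite_cost(lines_saved, sprites, cycles_per_line=CYCLES_PER_LINE):
    """Return (scanlines, cycles) that each sprite pays for the expansion bits."""
    lines_each = lines_saved / sprites
    return lines_each, lines_each * cycles_per_line


# 6 scanlines saved (from the post) spread over an assumed 16 sprites
lines_each, cycles_each = per_sprite_cost(6, 16)
print(f"{lines_each:.3f} scanlines, roughly {cycles_each:.0f} cycles per sprite")
```

With those assumptions it comes out a touch over a third of a scanline per sprite, which fits the estimate above.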

On top of that we also have the display IRQs, which again have to pay the cost for only a couple of sprites in the whole game! Now I'm wondering if it's worth the expense, or should I just drop those baddies altogether? Actually... only the CrabClaws need to be dropped (or reworked), since the tower would only require 1 extra sprite, which is hardly expensive!

Anyway, I know when I was coding them I thought it was cool to allow expansion in a multiplexor as no one else did, and it allowed for HUGE baddies (oh..come to think of it, the level2 BOSS used it as well), but is it really worth the lost CPU time?

Quick tip....

So.... in the old days when I was trying to speed up a function, I'd change the border colour, mark in pencil on the side of the monitor where it was, make changes, and try to see if it was any faster - not very accurate, I'm sure you'll agree. I had to make sure my head was in the same place (which usually meant marking the TOP and bottom of the bar), and with other functions marked in the same way, the monitor got a bit messy.

Now with emulators it's even harder: they start up in a window and you can't draw on a TFT like you could a CRT, and I really don't want to draw on the side of it. I also used to use Post-its but again, it's not very accurate. So what I do now is grab a screenshot of the first slow version and paste it into Paint Shop Pro (or Paint.NET), then speed the code up, then grab it again. Now I paste that as a new layer and apply a little transparency to it so I can see the original through it - hey presto! I can see instantly how much faster (or slower) it actually is!

The image on the right shows timings for the new sort over the old one. The white+blue at the bottom is the new sort+copy, while the dark gray at the very bottom is the extra time the OLD sort+copy used to take. And if you copy the gray bit out, you'll notice it's 9 raster lines (remember it's double sized so 2 pixels = 1 raster) faster. Quite a saving, and VERY easy to see. I actually have an image with several old versions layered on so I can go back and forward through them easily, which is pretty neat.
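If you'd rather count than eyeball, the same comparison can be done numerically. This is a different technique from the layered screenshots, sketched in Python under some loud assumptions: each screenshot has already been reduced to a single column of border pixels, the colour names are placeholders, and the two columns below are made-up data purely to show the arithmetic. `bar_rasters` is a hypothetical helper name.

```python
def bar_rasters(column, bar_colours, pixels_per_raster=2):
    """Count how many raster lines a timing bar covers in one screenshot column."""
    bar_pixels = sum(1 for colour in column if colour in bar_colours)
    return bar_pixels / pixels_per_raster


# Made-up columns of border pixels (top to bottom) from a double-sized screenshot
old_column = ["black"] * 10 + ["white"] * 60 + ["black"] * 10
new_column = ["black"] * 10 + ["white"] * 42 + ["black"] * 28

saving = bar_rasters(old_column, {"white"}) - bar_rasters(new_column, {"white"})
print(f"new version is {saving:g} raster lines faster")
```

The `pixels_per_raster=2` default mirrors the double-sized screenshot mentioned above; a 1:1 grab would use 1.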

Thursday, July 03, 2008

Minor increase.... But size isn't important...

Well, I've managed to shave 8 raster lines off the sort using the new system and it's all nice and stable now, so I'm gonna stick with this new one. The good news is I'll be able to use this in the C64 version of XeO3 as well, so it's been a good couple of days of playing. However, it's not the huge boost I was wanting, but it's better than nothing! The new sort is shown below...

;---------------------------------
; NEW multiplexor sort
;---------------------------------
ldx #15
!FindFirst: lda yy,x
bne !FoundFirst
!BackHere: dex
bpl !FindFirst
lda #0
sta yc
rts ; NO sprites on!


!FoundFirst lda Anim_Current,x ; if shape >= 200 then DONT multiplex
cmp #200
bcs !BackHere

;
; Found 1st sprite
;
lda #-1 ; set first active as last in list
sta SPNext+1,x
stx SPNext ; and set first active as FIRST
dex ; and move on one - we dont need to do the 1st one
bmi !AllDone
!SortAll
ldy yy,x
beq !DoNext
lda Anim_Current,x ; if shape >= 200 then DONT multiplex
cmp #200
bcs !DoNext
tya

stx xcount ; Sprite number we're inserting
ldx #-1 ; pLast
ldy SPNext ; pCurrent
;
; X and Y take turns about at being pCurrent and pLast...
; The first iteration X=pLast, Y=pCurrent. The second is reversed.
;
!FindSpace:
cmp yy,y
bcc !InsertHere
ldx SPNext+1,y ; get next
bmi !InsertHere2 ; End of list... so insert at the tail
cmp yy,x
bcc !InsertHere2
ldy SPNext+1,x ; get next
bpl !FindSpace ; Not end of list... so keep going!
!InsertHere
inx ; allow for -1 when head of list
lda xcount ; Set last->pNext to be this one
sta SPNext,x
tax ; Now move to the new entry
sty SPNext+1,x
ldx xcount ; Sprite number we're inserting
dex
bpl !SortAll
bmi !AllDone

!InsertHere2
iny ; allow for -1 when head of list
lda xcount ; Set last->pNext to be this one
sta SPNext,y
tay ; Now move to the new entry
stx SPNext+1,y
ldx xcount ; Sprite number we're inserting
!DoNext
dex
bpl !SortAll
!AllDone


So the idea here is that SPNext is a list of 17 bytes, with entry 0 being the first (head) pointer of the 1D indexed linked list and entries 1 to 16 holding the next pointer for sprites 0 to 15. That way setting pFirst is no different from setting any other next pointer, thanks to the inx/iny at the start of the insert code.
The only slight downer here is that I need a separate loop up front to find the 1st allocated entry to start from, rather than letting the main loop do it itself. Still, it's not really a slowdown, just another bit of code to run.

The other trick is to run a paired inner loop. This lets me keep a LAST value without having to transfer it via A every frame, and that in turn lets me load A up with the Y value to compare at the start, and then I never have to reload it. It works pretty well although depending on the actual order of the sprites, timing may vary by a good few scanlines.
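For anyone who finds the 6502 hard to follow, here is a rough Python model of the same indexed linked list. It is a sketch, not a translation: it uses a single `last`/`current` pair instead of the paired X/Y inner loop, and it lets the empty-list case handle the first sprite instead of a separate find-first loop. The names `build_sprite_list` and `sp_next` are made up for the illustration; the "Y of 0 means off" and "shape >= 200 is not multiplexed" rules are taken from the listing.

```python
END = -1  # end-of-list marker, like the lda #-1 in the assembly


def build_sprite_list(yy, shapes, first_fixed_shape=200):
    """Return sprite indices in ascending Y order via an indexed linked list.

    sp_next[0] is the head; sp_next[i + 1] is the entry that follows sprite i.
    """
    count = len(yy)
    sp_next = [END] * (count + 1)
    for sprite in range(count - 1, -1, -1):      # highest index down to 0
        if yy[sprite] == 0 or shapes[sprite] >= first_fixed_shape:
            continue                             # off, or not multiplexed
        last = END                               # last + 1 == 0 addresses the head
        current = sp_next[0]
        # Walk the list until an entry with a larger Y turns up
        while current != END and yy[sprite] >= yy[current]:
            last, current = current, sp_next[current + 1]
        sp_next[last + 1] = sprite               # last->next = new sprite
        sp_next[sprite + 1] = current            # new->next = current
    order = []
    node = sp_next[0]
    while node != END:
        order.append(node)
        node = sp_next[node + 1]
    return order


print(build_sprite_list([100, 0, 50, 75], [0, 0, 0, 0]))
```

The `last + 1` indexing is the same trick as the inx/iny: with `last` at -1 the write lands on entry 0, so the head needs no special case.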

Multiplexor fun!

I've been playing with trying to speed up the multiplexor sorting, and I thought I had a much quicker way of doing it too. The old method was dumb, really dumb. I'd run through all 16 sprites, find the one with the smallest Y value, store it, set it to $FF and then loop again until there were none left. VERY simple stuff. However, that means for the worst case it's doing a 16*16 loop, with a preloop of 16 to set up some values.
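As a rough Python model of that old method (a sketch of the idea, not the original 6502): it assumes $FF doubles as the "already taken" marker, and it stands in for the preloop with a simple copy of the Y values. `old_sort` is a made-up name.

```python
TAKEN = 0xFF  # assumption: $FF marks a sprite that has already been picked


def old_sort(yy):
    """Repeatedly pick the smallest remaining Y. Returns (order, comparisons)."""
    work = list(yy)              # stand-in for the preloop that sets up values
    order = []
    comparisons = 0
    for _ in range(len(work)):
        best = None
        for index, y in enumerate(work):
            comparisons += 1
            if y != TAKEN and (best is None or y < work[best]):
                best = index
        if best is None:
            break                # none left
        order.append(best)
        work[best] = TAKEN
    return order, comparisons


order, comparisons = old_sort(list(range(100, 116)))  # 16 active sprites
print(comparisons)
```

With 16 active sprites every pass scans all 16 entries, which is where the 16*16 comes from.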

So, I thought, "How about an insertion linked list?" Simply add a sprite index to a 1D indexed linked list at the correct point. Now, worst case shouldn't be as bad as 16*16, since you will only ever check against what's in the list already. For the 1st few, that means there's only a couple of entries there, while the last few will obviously check to n-1(ish). Now this all sounds very good, and I'd hoped that it would double the speed.... however, that was not to be. Would you believe, for the worst case it's only about 1 scanline faster! Damn!
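Counting just the Y comparisons shows why the idea looked so promising on paper. This is a Python sketch using a plain list in place of the linked list, and it only counts comparisons; it says nothing about how many cycles each step costs on a 6502, where chasing the next pointers is presumably the expensive part. `insertion_comparisons` is a hypothetical name.

```python
def insertion_comparisons(yy):
    """Count Y comparisons for a head-to-tail insertion into a sorted list."""
    ordered = []
    comparisons = 0
    for y in yy:
        position = 0
        while position < len(ordered):
            comparisons += 1
            if y < ordered[position]:
                break            # found the first larger entry: insert before it
            position += 1
        ordered.insert(position, y)
    return comparisons


worst = insertion_comparisons(list(range(16)))          # every sprite lands at the tail
best = insertion_comparisons(list(range(15, -1, -1)))   # every sprite lands at the head
print(worst, best)
```

That gives 120 comparisons in the worst case and 15 in the best, against 16*16 = 256 for the old method, so the comparison count roughly halves even though the measured time barely moved.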

It's still not working 100% but it's mostly there, and while the code is around 72 bytes smaller, it's much harder to follow (since linked lists under 6502 are tricky anyway), so I'm wondering which version I should take....

After the sort I need to copy all the sprite data into the IRQ buffer for displaying, so that's another N loop as well. On the plus side, when there's not as much on screen it's quite a bit quicker, and only gets bad once a load of things come on. I guess this means that when I'm displaying lots of character sprites and only a couple of H/W sprites I'd win out overall; still, the worst case is usually the one to watch for.... *sigh*

Tuesday, July 01, 2008

Optimising the IRQ's...

Last post of today - honest!

I was watching the timing bars on the IRQs and had to again shake my head in appreciation of Dan's multiplexor. His trick of setting up a sprite straight away if it's already been displayed, rather than kicking off a new IRQ, is top notch. I've added this to Blood Money and you can see the result here. (Remember I only multiplex 6 sprites, so there are more IRQs needed than normal.)

The bars on the LEFT are the old IRQs, while the RED bars on the right are the new ones. You can see it's almost halved the number and is happily packing them together thanks to about 6 lines of code. The multiplexor code is pretty slow really, and it was never unrolled; this means it has to keep track of VIC indexes etc. and that's not good. However, there's only 200 bytes free just now, so I'd need to free up a lot more memory to be able to plug in the new one I've written for XeO3.
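The packing trick can be modelled very roughly in Python. This is a sketch of the idea as described above, not Dan's code or the Blood Money routine, and it makes some big simplifying assumptions: an IRQ fires on the line where a hardware sprite becomes free (its previous occupant's Y plus the 21-line sprite height), the handler takes no time, and safety margins are ignored. `count_irqs` is a made-up name.

```python
SPRITE_HEIGHT = 21  # C64 hardware sprites are 21 lines tall


def count_irqs(sorted_y, hw_sprites=6, pack=True):
    """Count raster IRQs needed to set up the sprites that reuse hardware slots.

    sorted_y holds sprite Y positions in ascending order. With pack=True a
    sprite whose hardware slot is already free when the current IRQ runs is
    set up on the spot instead of getting an IRQ of its own.
    """
    irqs = 0
    current_line = None                      # raster line of the IRQ in progress
    for index in range(hw_sprites, len(sorted_y)):
        free_at = sorted_y[index - hw_sprites] + SPRITE_HEIGHT
        if pack and current_line is not None and free_at <= current_line:
            continue                         # slot already free: no new IRQ
        irqs += 1
        current_line = free_at
    return irqs


two_rows = [50] * 6 + [80] * 6               # two rows of six sprites
print(count_irqs(two_rows, pack=False), count_irqs(two_rows, pack=True))
```

In this toy case six separate IRQs collapse into one. A real screen has sprites scattered about, so the gain will be smaller, which fits with the "almost halved" seen in the timing bars.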

The safety limit I've got in Blood Money is also 22 raster lines; that's the number of lines above where I kick off a new raster interrupt, while XeO3's magic number is 10! That means XeO3 can get sprites 12 scanlines closer than Blood Money, which is a huge improvement.

Anyhow, before I do anything major I need to free up memory... I think the front end takes up way too much, and all the fancy dissolving characters (which was a fad at the time) also eat up space. There's also lots of variables. I have space for 30 bullets here - and there's no difference between player and alien bullets! - so that should probably be cut down, and 12 character sprites, which again I think could be reduced.

I also need to update the character sprite system as it's not only slow, but embarrassing! The clipping is simply a loop to count how many chars are off screen, rather than just subtracting the thing! Stupid... Still, I was young; I did that routine when I was 17 or 18!
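For what "just subtracting the thing" might look like, here is a small Python sketch of clipping one row of a character sprite. It assumes the standard 40-column screen and a sprite described by its left column and width in chars; the name `clip_columns` and that layout are assumptions for the illustration, not how the Blood Money routine actually stores things.

```python
SCREEN_COLUMNS = 40  # standard C64 text screen width


def clip_columns(x, width, screen_columns=SCREEN_COLUMNS):
    """Return (chars_to_skip, chars_to_draw) for a row of `width` chars at column x."""
    skip_left = min(width, max(0, -x))                          # off the left edge
    skip_right = min(width, max(0, x + width - screen_columns)) # off the right edge
    visible = max(0, width - skip_left - skip_right)
    return skip_left, visible


print(clip_columns(-2, 3))   # two chars hang off the left edge
print(clip_columns(38, 3))   # one char hangs off the right edge
```

No loop is needed: a subtract and a clamp per edge gives the same count the old loop arrived at one char at a time.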

Mmmm.... How thick was I!!

I do sometimes wonder what the hell I was thinking back when I was doing C64 stuff..... Now that it's all running I've been looking at the code a little, and some of it's shocking! Now, I know commercial pressures and all, but this was my dream game, and I remember thinking I was doing things so well.... But I really was young (only 19!) and I was doing whole games on my own - still, I was a twat sometimes.

The fastest code you can write is code you can delete and never run, simple as that. Now the turrets in Blood Money are a case in point. It takes AGES to process and draw them, so with only four on screen it's taking up to 32 scan lines to process them! That's ridiculous!! XeO3's turrets are a nightmare, it has to be said. The scrolling is slower because of them, and when you hit them, it's horrible! BUT! They're quick.... I never have to print them or really think very much about them until they shoot, or you shoot them. Here though, I draw them, do collision with them (bounding boxes) and then blow them up.... horrible.

Turrets should really be part of the landscape and drawn in there - like XeO3's - then you just don't have to worry about it. It would be tricky in Blood Money because of the multi-directional scrolling, and because the screen is drawn in 2 blits (there's actually 3 screens: 2 being drawn as they scroll between them, and a 3rd being built), but it is possible. Failing all that.... I'm pretty sure you could at least speed them up!