Friday, July 04, 2008

Quick tip....

So.... in the old days when I was trying to speed up a function, I'd change the border colour, mark in pencil on the side of the monitor where it was, make changes, and try to see if it was any faster - not very accurate, I'm sure you'll agree. I had to make sure my head was in the same place (which usually meant marking the TOP and bottom of the bar), and with other functions marked in the same way, the monitor got a bit messy.

Now with emulators it's even harder: they start up in a window and you can't draw on a TFT like you could a CRT, and I really don't want to draw on the side of it. I also used to use Post-its but again, it's not very accurate. So what I do now is grab a screenshot of the first slow version and paste it into Paint Shop Pro (or Paint.NET), then speed the code up, then grab it again. Now I paste that as a new layer and apply a little transparency to it so I can see the original through it - hey presto! I can see instantly how much faster (or slower) it actually is!

The image on the right shows timings for the new sort over the old one. The white+blue at the bottom is the new sort+copy, while the dark gray at the very bottom is the extra time the OLD sort+copy used to take. And if you copy the gray bit out, you'll notice it's 9 raster lines (remember it's double sized so 2 pixels = 1 raster) faster. Quite a saving, and VERY easy to see. I actually have an image with several old versions layered on so I can go back and forward through them easily, which is pretty neat.
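The pixel-to-raster conversion is simple enough to do in your head, but here is a tiny Python sketch of it anyway. The function names and the example bar heights are made up for illustration; only the "2 pixels = 1 raster" scale comes from the post.

```python
def pixels_to_raster_lines(pixel_height, scale=2):
    """Convert a height measured on the screenshot into raster lines.

    scale is the emulator window's zoom factor (2 = double sized).
    """
    return pixel_height / scale


def raster_lines_saved(old_bar_px, new_bar_px, scale=2):
    """Difference between two timing bars, expressed in raster lines."""
    return pixels_to_raster_lines(old_bar_px - new_bar_px, scale)


# Hypothetical measurements: the old bar is 18 pixels taller than the new one.
print(raster_lines_saved(58, 40))
```

With an 18 pixel gap on a double-sized window, that works out to the 9 raster lines quoted above.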

Thursday, July 03, 2008

Minor increase.... But size isn't important...

Well, I've managed to shave 8 raster lines off the sort using the new system and it's all nice and stable now so I'm gonna stick with this new one. The good news is I'll be able to use this in the C64 version of XeO3 as well, so it's been a good couple of days of playing. However, it's not the huge boost I was wanting, but it's better than nothing! The new sort is shown below...

;---------------------------------
; NEW multiplexor sort
;---------------------------------
ldx #15
!FindFirst: lda yy,x
bne !FoundFirst
!BackHere: dex
bpl !FindFirst
lda #0
sta yc
rts ; NO sprites on!


!FoundFirst lda Anim_Current,x ; if shape >= 200 then DONT multiplex
cmp #200
bcs !BackHere

;
; Found 1st sprite
;
lda #-1 ; set first active as last in list
sta SPNext+1,x
stx SPNext ; and set first active as FIRST
dex ; and move on one - we don't need to do the 1st one
bmi !AllDone
!SortAll
ldy yy,x
beq !DoNext
lda Anim_Current,x ; if shape >= 200 then DONT multiplex
cmp #200
bcs !DoNext
tya

stx xcount ; Sprite number we're inserting
ldx #-1 ; pLast
ldy SPNext ; pCurrent
;
; X and Y take turns about at being pCurrent and pLast...
; The first iteration X=pLast, Y=pCurrent. The second is reversed.
;
!FindSpace:
cmp yy,y
bcc !InsertHere
ldx SPNext+1,y ; get next
bmi !InsertHere2 ; End of list... so insert at the tail
cmp yy,x
bcc !InsertHere2
ldy SPNext+1,x ; get next
bpl !FindSpace ; Not end of list... so keep going!
!InsertHere
inx ; allow for -1 when head of list
lda xcount ; Set last->pNext to be this one
sta SPNext,x
tax ; Now move to the new entry
sty SPNext+1,x
ldx xcount ; Sprite number we're inserting
dex
bpl !SortAll
bmi !AllDone

!InsertHere2
iny ; allow for -1 when head of list
lda xcount ; Set last->pNext to be this one
sta SPNext,y
tay ; Now move to the new entry
stx SPNext+1,y
ldx xcount ; Sprite number we're inserting
!DoNext
dex
bpl !SortAll
!AllDone


So the idea here is that SPNext is a list of 17 bytes, with entry 0 being the First (head) pointer of the 1D indexed-linked list and entries 1-16 holding the next index for sprites 0-15. That way setting pFirst rather than an index is no different, thanks to the inx/iny being used at the start of the insert code: a pLast of -1 becomes 0, which is the head entry.
The only slight downer here is I need to run a separate loop up front to find the 1st allocated entry to start from, rather than let the main loop do it itself. Still, it's not really a slowdown, just another bit of code to run.

The other trick is to run a paired inner loop. This lets me keep a LAST value without having to transfer it via A every frame, and that in turn lets me load A up with the Y value to compare at the start, and then I never have to reload it. It works pretty well although depending on the actual order of the sprites, timing may vary by a good few scanlines.
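If the 6502 is hard to follow, here is a rough Python model of the same index-linked insertion. It is a sketch, not a translation: the names `multiplex_sort`, `sp_next` and `no_multiplex_from` are invented for illustration, and the paired X/Y inner loop is left out because it only exists to save register shuffling on the 6502.

```python
END = -1  # end-of-list marker, same value the listing uses


def multiplex_sort(yy, anim, no_multiplex_from=200):
    """Return active sprite indices in ascending Y order.

    sp_next has len(yy) + 1 slots: slot 0 is the list head and slot
    i + 1 holds the successor of sprite i, mirroring SPNext / SPNext+1.
    """
    n = len(yy)
    sp_next = [END] * (n + 1)

    def active(i):
        # Y of 0 means "off"; shapes >= 200 are not multiplexed.
        return yy[i] != 0 and anim[i] < no_multiplex_from

    for new in range(n - 1, -1, -1):       # the listing walks X from 15 down
        if not active(new):
            continue
        last = END                          # END + 1 == 0, i.e. the head slot
        cur = sp_next[0]
        while cur != END and yy[new] >= yy[cur]:
            last, cur = cur, sp_next[cur + 1]
        sp_next[last + 1] = new             # last->next = new (or head = new)
        sp_next[new + 1] = cur              # new->next = cur

    order = []
    cur = sp_next[0]
    while cur != END:
        order.append(cur)
        cur = sp_next[cur + 1]
    return order


print(multiplex_sort([50, 0, 30, 30, 90, 10], [0, 0, 0, 0, 250, 0]))
```

The `last + 1` indexing is the Python equivalent of the inx/iny at the top of the insert code: a `last` of -1 lands on slot 0, so updating the head needs no special case.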

Multiplexor fun!

I've been playing with trying to speed up the multiplexor sorting, and I thought I had a much quicker way of doing it too. The old method was dumb, really dumb. I'd run through all 16 sprites, find the one with the smallest Y value, store it, set it to $FF and then loop again until there were none left. VERY simple stuff. However, that means for the worst case it's doing a 16*16 loop, with a preloop of 16 to set up some values.
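As a rough Python model of that old method (a sketch only: the function name is invented, the setup preloop is left out, and it assumes inactive sprites already hold $FF):

```python
USED = 0xFF  # marker for "already taken" (assumed to also mean "inactive")


def old_multiplex_sort(yy):
    """Repeatedly pick the smallest remaining Y, then mark it as used."""
    ys = list(yy)            # work on a copy so the caller's data survives
    order = []
    for _ in range(len(ys)):
        best = 0
        for i in range(len(ys)):     # full scan of every sprite, every pass
            if ys[i] < ys[best]:
                best = i
        if ys[best] == USED:
            break                    # nothing left to place
        order.append(best)
        ys[best] = USED
    return order


print(old_multiplex_sort([50, 0xFF, 30, 10]))
```

Every pass scans the whole table whether or not most of it has already been placed, which is where the 16*16 worst case comes from.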

So, I thought, "How about an insertion linked list?" Simply add a sprite index to a 1D indexed-linked-list at the correct point. Now, worst case shouldn't be as bad as 16*16 since you will only ever check against what's in the list already. For the 1st few, that means there's only a couple of entries there, while the last few will obviously check to n-1(ish). Now this sounds all very good, and I'd hoped that it would double the speed.... however, that was not to be. Would you believe for the worst case, it's only about 1 scanline faster! Damn!
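The hope of a 2x speed-up is easy to see if you only count comparisons. A small sketch (function names invented here), which deliberately ignores how many 6502 cycles each step costs and so says nothing about the measured raster time:

```python
def selection_compares(n):
    """Old method: n passes, each scanning all n sprites."""
    return n * n


def insertion_worst_compares(n):
    """Linked-list insert, worst case: sprite k walks past the k
    entries already in the list, so 0 + 1 + ... + (n - 1) in total."""
    return n * (n - 1) // 2


print(selection_compares(16), insertion_worst_compares(16))
```

For 16 sprites that is 256 comparisons against 120, so on paper a bit better than half.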

It's still not working 100% but it's mostly there, and while the code is around 72 bytes smaller, it's much harder to follow (since linked lists under 6502 are tricky anyway) so I'm wondering which version I should take....

After the sort I need to copy all the sprite data into the IRQ buffer for displaying, so that's another N loop as well. On the plus side, when there's not as much on screen it's quite a bit quicker, and only gets bad once a load of things come on. I guess this means that when I'm displaying lots of character sprites and only a couple of H/W sprites I'd win out overall; still, worst case is usually the one to watch for.... *sigh*

Tuesday, July 01, 2008

Optimising the IRQ's...

Last post of today - honest!

I was watching the timing bars on the IRQs and had to again shake my head in appreciation of Dan's multiplexor; his trick of setting up a sprite if it's already been displayed rather than kicking off a new IRQ is top notch. I've added this to Blood Money and you can see the result here. (Remember I only multiplex 6 sprites, so there's more IRQs needed than normal)

The bars on the LEFT are the old IRQs, while the RED bars on the right are the new ones. You can see it's almost halved the number and is happily packing them together thanks to about 6 lines of code. The multiplexor code is pretty slow really, and it was never unrolled; this means it has to keep track of VIC indexes etc. and that's not good. However, there's only 200 bytes free just now so I'd need to free up a lot more memory to be able to plug in the new one I've written for XeO3.
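To get a feel for why packing cuts the IRQ count, here is a toy Python model. It is one reading of the trick, not Dan's actual code: it assumes each IRQ fires a fixed margin above the sprite it is for (never before that hardware sprite is free), and that while inside the IRQ any following sprite whose hardware sprite has already finished displaying gets set up on the spot. All names and the example positions are hypothetical.

```python
SPRITE_HEIGHT = 21  # C64 hardware sprites are 21 raster lines tall


def irqs_one_per_sprite(sorted_ys, hw_sprites=6):
    """Baseline: every reused hardware sprite gets its own raster IRQ."""
    return max(0, len(sorted_ys) - hw_sprites)


def irqs_packed(sorted_ys, hw_sprites=6, margin=22):
    """Count IRQs when already-free sprites are set up in the same IRQ."""
    irqs = 0
    k = hw_sprites                      # first sprite that needs a reuse
    while k < len(sorted_ys):
        free_at = sorted_ys[k - hw_sprites] + SPRITE_HEIGHT
        raster = max(sorted_ys[k] - margin, free_at)
        irqs += 1
        # Keep going while the next hardware sprite is already finished.
        while (k < len(sorted_ys)
               and sorted_ys[k - hw_sprites] + SPRITE_HEIGHT <= raster):
            k += 1
    return irqs


ys = [50, 60, 70, 80, 90, 100, 120, 121, 122, 123, 124, 125]
print(irqs_one_per_sprite(ys), irqs_packed(ys))
```

In this made-up layout the six reused sprites need 6 IRQs the old way and 4 when packed; how much you save in practice depends entirely on how the sprites are spread out.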

The safety limit I've got in Blood Money is also 22 raster lines; that's the number of lines above where I kick off a new raster interrupt, while XeO3's magic number is 10! That means XeO3 can get sprites 12 scanlines closer than Blood Money, which is a huge improvement.

Anyhow, before I do anything major I need to free up memory... I think the front end takes up way too much, and all the fancy dissolving characters (which was a fad at the time) also eat up space. There's also lots of variables. I have space for 30 bullets here - and there's no difference between player and alien bullets! - so that should probably be cut down, and 12 character sprites which again I think could be reduced.

I also need to update the character sprite system as it's not only slow, but embarrassing! The clipping is simply a loop to count how many chars are off screen, rather than just subtracting the thing! Stupid... Still I was young, I did that routine when I was 17 or 18!
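The difference between the two clipping approaches looks something like this in Python. This is a sketch under assumptions: a 40 column visible area, a character sprite no wider than the screen, and invented function names; the real routine's coordinates and widths aren't given in the post.

```python
SCREEN_COLS = 40  # assumed visible width in characters


def clipped_by_loop(x, width):
    """The slow way: test every column and count the ones off screen."""
    off = 0
    for col in range(x, x + width):
        if col < 0 or col >= SCREEN_COLS:
            off += 1
    return off


def clipped_by_subtraction(x, width):
    """The quick way: work out the overhang on each side directly."""
    left = min(width, max(0, -x))
    right = min(width, max(0, x + width - SCREEN_COLS))
    return left + right


print(clipped_by_loop(-3, 8), clipped_by_subtraction(-3, 8))
```

Both give the same count; the second one just never has to loop.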

Mmmm.... How thick was I!!

I do sometimes wonder what the hell I was thinking back when I was doing C64 stuff..... Now that it's all running I've been looking at the code a little, and some of it's shocking! Now, I know commercial pressures and all, but this was my dream game, and I remember thinking I was doing things so well.... But I really was young (only 19!) and I was doing whole games on my own - still, I was a twat sometimes.

The fastest code you can write is code you can delete and never run, simple as that. Now the turrets in Blood Money are a case in point. It takes AGES to process and draw them, so with only four on screen it's taking up to 32 scan lines to process them! That's ridiculous!! XeO3's turrets are a nightmare, it has to be said. The scrolling is slower because of them, and when you hit them, it's horrible! BUT! They're quick.... I never have to print them or really think very much about them until they shoot, or you shoot them. Here though, I draw them, do collision with them (bounding boxes) and then blow them up.... horrible.

Turrets should really be part of the landscape and drawn in there - like XeO3's, then you just don't have to worry about it. It would be tricky in Blood Money because of the multi-directional scrolling, and because the screen is drawn in 2 blits (there's actually 3 screens: 2 being drawn as they scroll between them, and a 3rd being built), but it is possible. Failing all that.... I'm pretty sure you could at least speed them up!

More Blood Money from a stone....

I've managed to get everything back up and running properly now, well... except shops. For some reason they aren't being printed correctly at all. Turns out the multiply routine had a bug, probably a character removed by accident one of the times I was browsing away at the source. Anyway, that fixed the character sprites, and the circle script function (which was the other crash). So now the whole of level 1 is playing away fine and I'm wondering what I can change first. I suspect I'll back up what I have working here as it's taken quite a while to get it running again, but after that I'd like to try and put in a static starfield.

For now I'm trying to replace the xply routine I had there, as the new one I use in XeO3 is a third quicker, taking only 119 cycles (worst case) as opposed to 164. That will save around 32 scanlines in itself when all 16 sprites are in use doing the circle command (which is when the speed drops to a crawl). However it's not working, and I'm not really sure why. Looks like the low byte of the result is wrong - very odd.

Blood Money Lives Again!!

I've been playing with trying to get my old Blood Money source to recompile off and on for a while now, and it's finally rebuilding and running! Well, more or less. It still has some issues to be sure; character sprites appear to be killing it stone dead, and if I switch them off, I get a little further and then the game dies.

It's good to see it actually building and running though. One thing that's always bugged me was the crappy starfield I had on level 1 and I'd love to take some time and tune it up a bit. I'd also love to make a proper MMC64 version, one that loads directly from an MMC card - which would be cool.

There's lots of things I did wrong back then. Shops weren't part of the background but drawn, as were turrets. Pretty stupid really as they only follow the screen - something the background map does pretty well without any extra CPU time. I'd also take away the 2 player mode and give that extra sprite over to the multiplexor. (Although.... a 2 player internet version would be interesting!) There's a few areas where things glitch, and it would be nice to smooth that out. Talking of multiplexors, I would love to replace the one there and put my new one in as the IRQ is about twice the speed (I think).

I've also sped up the character sprites a bit, and my multiply routine (used for all the circle paths) is also about twice the speed. I suspect I could get it under 2 frames easily enough. Yes, believe it or not Blood Money ran in 3 frames, with the main player ship running in the vertical blank.

So yes... there's lots of things I'd like to take a crack at fixing, although it's gonna be tough as memory is down to a few bytes left! I've no idea how many sprites I used, but while I was aiming to get them all in, I think a redux-version could lose a bit to save memory. They were stored compressed then blitted down when needed. I might be able to speed that up or remove it completely.

Oh.... and get rid of that stupid front end screen.... it's pathetic... Not to mention the flickery raster on the game.

And don't worry.... I'm not setting aside XeO3 to do this, it's a project I'm doing in the background as I always wanted a console version of Blood Money, with nice dual playfields etc. and this is the first step.

Monday, June 30, 2008

My CV....

I've been busy making an "about me" page which I've been meaning to do for years. The idea being that it'll eventually turn into some kind of life history I guess. Anyway, you can find the first version of it HERE, and although it's just a list of things I've done, it's a good start.

Sunday, June 29, 2008

TG16 Shadow of the Beast - ROM!!

I've managed to convert the old devkit Beast.BX file into a real BEAST.PCE file! The only reason I was able to do this easily was because years ago when I was doing SotB (Shadow of the Beast) I hated the official debugger so much, I wrote my own! If you want to see what I was dealing with, drop down into a command prompt and type DEBUG. This is what the official devkit was like - terrible! So I spent a couple of months after work hacking the debugger and working out how it talked to the ICE unit, then I wrote my own. The first thing I did of course was to upload the assembled .BX files, and to do that I had to reverse engineer the format of that as well as the symbol file.

Of course back then I was having so much fun doing this I stayed at the office and worked late into the night. It took a couple of weeks to get to a usable point, and around 3 months to actually get to a finished state, but it was a vast improvement. I remember being pissed off at Dave though as he refused to try and sell it back to NEC, even though they would have been sure to take it. It was partly because I wanted something out of it, and because it was going to be more complicated than just sell and forget, he couldn't be bothered.

However, it was good fun and taught me a lot, particularly the expression evaluation stuff that Brian Watson wrote for me. I had a really simple one in it, and it wasn't recursive. So he knocked a proper one up for me, and I've been using that method ever since. It's also funny to see the cheat mode is still in there, but unlike the final game, this one displays the Lemmings logo! No idea why that was removed, I was probably told to - but it's pretty funny seeing it pop up. For those that don't know, on the title page of SotB, you can press Fire1,Fire2,Fire2,Fire1 and the number of lives goes to 99. And you're into cheat mode. Easy as that!
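On the recursive expression evaluation mentioned above: the post doesn't show Brian's method, so the following is just the textbook recursive-descent approach, sketched in Python as an illustration of what "recursive" buys you (brackets and operator precedence fall out of the call structure). The grammar, the `$` hex prefix and all names here are assumptions, not the debugger's actual syntax.

```python
import re

# Decimal numbers, $-prefixed hex numbers, operators and brackets.
TOKEN = re.compile(r"\s*(\d+|\$[0-9A-Fa-f]+|[-+*/()])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():                      # number, unary minus, or ( expr )
        if peek() is None:
            raise ValueError("unexpected end of expression")
        token = take()
        if token == "(":
            value = expr()             # recursion handles nested brackets
            if peek() != ")":
                raise ValueError("missing )")
            take()
            return value
        if token == "-":
            return -factor()
        if token.startswith("$"):
            return int(token[1:], 16)
        if token.isdigit():
            return int(token)
        raise ValueError(f"unexpected token {token!r}")

    def term():                        # * and / bind tighter than + and -
        value = factor()
        while peek() in ("*", "/"):
            op = take()
            rhs = factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    def expr():
        value = term()
        while peek() in ("+", "-"):
            op = take()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return result


print(evaluate("(2+3)*4"), evaluate("$10 + -(1+1)"))
```

Division here is Python's floor division, which is an arbitrary choice for the sketch.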

This demo is only the top level, and the first multi-directional level; there's no castle yet and all the graphics are Amiga ones with the exception of the trees (which do look a little pants). After seeing this Psygnosis weren't happy with the look of it at all and so drafted in Martyn Chudley (who went on to own Bizarre Creations) who drew all the cool backgrounds and sprites. He was actually busy on his own game at Psygnosis at the time, Wiz 'n Liz on the Megadrive, which I almost ported to the SNES funnily enough.

I have to say this is interesting...but dull really. It's an interesting snapshot of development, but the game still sucks and it's only a partial game with just 2 levels. But it's funny seeing it nevertheless.

TG16 stuff...

Not been doing a lot the past few days, but I'm now off on holiday for a week so I'm going to push on with my TG16 assembler first. I did get a little shock though, as one of the manuals I have must have been a pretty early one since it doesn't document CSH or CSL, which are used to switch the PC Engine into high and low speed modes. Now, I knew about these commands since I have some old source, but the manual I was using didn't document them at all, and it's not like there were pages missing as the page numbers count up fine! They just weren't there! However, my other (slightly grubbier) manual does have them, so I can find out the opcodes and add them in.

It's probably going to take me a few days to get all this stuff in which is a pain, but then I should be good to go!

I also found a VERY old bit of Shadow of the Beast. I think this is pre-CDROM! In fact, I have a sneaking suspicion that it's the version that was in the Games-X mag back in 1991 where I just had the beast man running around on the 1st level - before all the pretty graphics went in! I'll need to try and work out the format of the ROM dump as it's a special output for the official kit, but if memory serves it's pretty simple. I'd love to see this again and get a bit of video of it for my DMA page.

I also found my SuperSystem2 card, and on the back it's been labeled No. 2. I wonder if we did get the second one ever made... or if it was the 2nd one given to developers or something - who knows. I do know we were one of the 1st to develop using the new SuperCDROM, so it might well be the second card ever made - which is cool.