Thursday, July 05, 2018

CSpect 1.13

CSpect changes
  • Kempston joystick emulation added using Direct Input. All controllers map to a single port, port $001F = 0FUDLR
  • Fixed CPU on WRITE and on READ breakpoints
  • -port1f command line removed
  • .NEX format now leaves IRQs enabled
  • Removed some of the old opcodes from the debugger
  • Removed MIRROR DE
  • Fixed memory contention disable (so now tested!)



Sunday, June 24, 2018

CSpect V1.12

CSpect changes
  • Added .NEX format loading. (see format below)
  • Added nextreg 8, bit 6 - set to disable memory contention. (untested)
  • Added Specdrum Nextreg mirror ($2D)
  • Removed OUTINB
  • Removed MIRROR DE
  • Added sprite global transparency register ($4B) - $e3 by default, same as global transparency ($14).
  • NextReg reg,val timing changed to 20T
  • NextReg reg,a timing changed to 17T
  • Added LDWS instruction (might change)


SNasm changes
  • Added LDWS opcode
  • Removed OUTINB instruction
  • Removed MIRROR DE instruction



Saturday, May 26, 2018

Making NEXT Lemmings: Part 4

I decided to get back to work and fix my sprite clipping for my object rendering. It turned out to be a simple fix and was just messing up when crossing a bank.



Once I got this done I went looking for more levels with lots of objects so I could test out the level rendering and get a good idea of overall performance. When I was doing levels, I used to love adding in lots of water for decoration - usually right across the level, so I went looking for one of them.
However, it looked like I wasn't converting the levels properly, as there was no water to be seen - or at least very little. I tried several of my old levels, but they were all the same, loads of water removed. Had I missed something?


After much investigation, it turns out that Windows Lemmings doesn't have all the water that the Amiga one did! What the hell!?!? I quizzed Russell Kay (who wrote Windows Lemmings), and he told me they'd removed a lot of the decorative items for performance reasons. Damn...

This was a mixed blessing as sure it wouldn't look 100% like the Amiga one, but at the same time, it meant I'd be able to keep performance up quite a bit. Oh well.... it wasn't like I could do anything about it.

Speaking of performance.... I'd been using the new ZXNext instructions a lot in my rendering code, so I suddenly started to wonder how I'd fare if I used only the original Z80 instruction set. I was in for a shock that's for sure, as the extra code required would push rendering times up massively.


You can see from the image above the huge speed boost the new instructions - in particular LDIX - give 256 colour (Layer 2) rendering code. The image on the left uses a very standard rendering loop: load a value from (HL) into A, test to see if it's 0, branch if it is, otherwise store it in (DE), then INC HL and INC DE. LDIX does this pretty much in one instruction, but has the added advantage that you can compare A to any value, not just 0.

There are several new instructions aimed at giving game devs more tools to speed up their code; some of them are real beauties.

Final new Z80 opcodes on the NEXT (V1.10.06 core)
======================================================================================
   swapnib           ED 23           8Ts   A bits 7-4 swap with A bits 3-0
   mul               ED 30           8Ts   multiply D*E = DE (no flags set)
   add  hl,a         ED 31           8Ts   Add A to HL (no flags set)
   add  de,a         ED 32           8Ts   Add A to DE (no flags set)
   add  bc,a         ED 33           8Ts   Add A to BC (no flags set)
   add  hl,$0000     ED 34 LO HI     16Ts  Add $0000 to HL (no flags set)
   add  de,$0000     ED 35 LO HI     16Ts  Add $0000 to DE (no flags set)
   add  bc,$0000     ED 36 LO HI     16Ts  Add $0000 to BC (no flags set)
   outinb            ED 90           16Ts  out (c),(hl), hl++
   ldix              ED A4           16Ts  As LDI,  but if byte==A does not copy
   ldirx             ED B4           21Ts  As LDIR, but if byte==A does not copy
   lddx              ED AC           16Ts  As LDD,  but if byte==A does not copy, and DE is incremented
   lddrx             ED BC           21Ts  As LDDR,  but if byte==A does not copy
   ldpirx            ED B7           16Ts  (de) = ( (hl&$fff8)+(E&7) ) when != A
   ldirscale         ED B6           21Ts  As LDIRX,  if(hl)!=A then (de)=(hl); HL_E'+=BC'; DE+=DE'; dec BC; Loop.
   mirror a          ED 24           8Ts   mirror the bits in A     
   mirror de         ED 26           8Ts   mirror the bits in DE     
   push $0000        ED 8A LO HI     19Ts  push 16bit immediate value
   nextreg reg,val   ED 91 reg,val   16Ts  Set a NEXT register (like doing out($243b),reg then out($253b),val )
   nextreg reg,a     ED 92 reg       12Ts  Set a NEXT register using A (like doing out($243b),reg then out($253b),A )
   pixeldn           ED 93           8Ts   Move down a line on the ULA screen
   pixelad           ED 94           8Ts   using D,E (as Y,X) calculate the ULA screen address and store in HL
   setae             ED 95           8Ts   Using the lower 3 bits of E (X coordinate), set the correct bit value in A
   test $00          ED 27           11Ts  And A with $XX and set all flags. A is not affected.

New instructions like MUL, MIRROR, PIXELAD and PIXELDN are ones lots of game devs would have killed for back in the day. With the Spectrum screen layout being so tricky, pixelad and pixeldn are a godsend for developers, taking away one of the major pains and slowdowns they had in rendering.
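
For anyone who hasn't fought with the ULA screen layout, here's a rough Python model of the sums that PIXELAD, PIXELDN and SETAE take off your hands. This is a sketch of the classic software routines based on the standard Spectrum screen layout (bitmap at $4000), not a statement of how the hardware implements them; the function names simply mirror the opcodes, and the SETAE behaviour is my reading of the table above.

```python
def pixelad(d, e):
    """ULA screen address of the byte holding pixel Y=d (0-191), X=e (0-255)."""
    return (0x4000
            | ((d & 0xC0) << 5)    # which third of the screen
            | ((d & 0x07) << 8)    # pixel row inside the character cell
            | ((d & 0x38) << 2)    # character row inside the third
            | (e >> 3))            # byte column


def pixeldn(hl):
    """Address one pixel line below hl (no check for leaving the screen)."""
    h, l = hl >> 8, hl & 0xFF
    h += 1
    if (h & 0x07) == 0:            # stepped out of the character cell
        l = (l + 0x20) & 0xFF      # move to the next character row
        if l >= 0x20:              # L did not wrap, so still in the same third
            h -= 8
    return (h << 8) | l


def setae(e):
    """Bit mask for pixel X=e inside its screen byte (leftmost pixel = bit 7)."""
    return 0x80 >> (e & 7)


print(hex(pixelad(0, 0)), hex(pixelad(191, 255)))   # 0x4000 0x57ff
print(hex(pixeldn(pixelad(7, 0))))                  # 0x4020
```

Doing the same thing in plain Z80 takes a good handful of shifts, masks and ORs per pixel, which is where the old rendering slowdown came from.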

So after getting a warm fuzzy feeling at my rendering speed, I decided to try and get the SID chip working. This was before we lost it obviously. I decided to use the reSID library and loaded the DLL on startup. But I just could not get it working....


This is an image of a single channel playing a pulse wave - so it should be a simple square layout, but as you can see, the waves are not only very thin, but have odd little bumps on the top, and that odd block missing. I fought with this for a while, quickly getting nowhere, so eventually gave up and decided to stick with my own SID code from my C64 emulator. It's not great, but does sound okay, and does work - which is always a plus.

All this was working towards a new major CSpect release, to try and get as close to the actual machine as I could. This would also include the new 3xAY chip, and DMA.


DMA (Direct Memory Access controller) was something I was really wanting, as it would speed up my Lemmings rendering code hugely. When I copy the screen each game cycle, it can take 2-3 frames just for that copy as it needs to copy 38K each game tick, which for a spectrum, is a hell of a lot. DMA runs at the same speed as the CPU clock, and at 4T-States per byte copied, is a massive boost in performance. But first, I needed to get it into CSpect, and that meant understanding how it worked - beyond what most coders would care about.
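
A quick back-of-envelope check of those numbers, as a Python sketch. The assumptions are mine: the CPU copy is a plain LDIR at the standard 21 T-States per byte, a scanline is the 896 T-States quoted for 14Mhz in Part 2, a frame is taken as 312 lines, and the slowdown over the screen area is ignored.

```python
BYTES_PER_COPY = 38 * 1024       # the 38K copied each game tick
T_PER_SCANLINE = 224 * 4         # 896 T-States per scanline at 14Mhz (border figure)
LINES_PER_FRAME = 312            # assumption: classic 48K-style frame height


def copy_cost(t_per_byte, nbytes=BYTES_PER_COPY):
    """Return (T-States, scanlines, frames) needed to copy nbytes."""
    tstates = t_per_byte * nbytes
    scanlines = tstates / T_PER_SCANLINE
    return tstates, scanlines, scanlines / LINES_PER_FRAME


for name, t_per_byte in (("LDIR", 21), ("DMA", 4)):
    tstates, scanlines, frames = copy_cost(t_per_byte)
    print(f"{name}: {tstates} T-States, {scanlines:.0f} scanlines, {frames:.2f} frames")
```

Under those assumptions the LDIR copy lands at just under 3 frames, which matches the 2-3 frames above, while the DMA copy fits in a bit over half a frame.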

I spent a while hunting for more info on the DMA chip, and finally found the datasheet for it, which you can find on an earlier blog post ( DMA Datasheet ). It's a little confusing, but with the help of Victor I stumbled through creating the state machine inside CSpect. The DMA is basically a set of registers that you set by doing a stack of OUTs, with the first byte of the instruction telling the DMA controller what registers follow. Once I had this in place, Victor gave my little DMA sample code a once over, testing it on the real hardware, and I was then able to also get something running locally.

DMA has a few modes, it can either increment, decrement or not move the source or destination, and it can go to RAM or a PORT. So I started off by trying to DMA a stack of data to the border and see what happens...


After a bit of fiddling around, I finally got the DMA working. I had to rearrange my CSpect processing loop as the DMA locks out the CPU, but I still needed the screen to render each scanline based on the number of T-States the DMA was taking from the machine overall. It's certainly not perfect, but it doesn't have to be. CSpect is all about making it easy to code for the Next, not about making it pixel perfect.

Next I wanted to do a memory to memory copy, so I grabbed a screen shot and DMA'd it up and got the image below - this was at 28Mhz...


It's a shame we've lost the 28Mhz, as it's ballistically quick. Here you can see I can copy a normal spectrum screen in about 16 scan lines - although this is probably without the old memory contention in there, but no matter what, it's still incredibly quick. That's not to say 14Mhz isn't quick as well mind, and the speed up it gives me for my Lemmings screen copy code is well worth the effort. Here's the little DMA program that copies the screen above (which is included in the CSpect archive)...

DMA     db $C3          ;R6-RESET DMA
        db $C7          ;R6-RESET PORT A Timing
        db $CB          ;R6-SET PORT B Timing same as PORT A

        db $7D          ;R0-Transfer mode, A -> B
        dw ScreenDump   ;R0-Port A, Start address    (source address)
        dw 6912         ;R0-Block length     (length in bytes)

        db $54          ;R1-Port A address incrementing, variable timing
        db 2            ;R1-Cycle length port A

        db $50          ;R2-Port B address incrementing, variable timing
        db $02          ;R2-Cycle length port B

        db $C0          ;R3-DMA Enabled, Interrupt disabled

        db $AD          ;R4-Continuous mode  (use this for block transfer)
        dw $4000        ;R4-Dest address     (destination address)

        db $82          ;R5-Stop on end of block, RDY active LOW

        db $CF          ;R6-Load
        db $B3          ;R6-Force Ready
        db $87          ;R6-Enable DMA
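
If you'd rather see that table as the raw byte stream that gets OUTed to the DMA, here's a small Python sketch that builds it. The `build_dma_program` helper and the `SCREEN_DUMP` address are hypothetical (the real `ScreenDump` label is wherever the assembler puts it); the byte values are the ones from the listing, with the `dw` entries stored low byte first. It also checks the "about 16 scan lines" figure, assuming 4 T-States per byte and a 28Mhz scanline of 224*8 T-States (scaled up from the 224*4 quoted for 14Mhz).

```python
import struct


def word(value):
    """16-bit little-endian value, as emitted by a 'dw'."""
    return struct.pack("<H", value)


def build_dma_program(source, length, dest):
    """Byte stream matching the DMA listing above."""
    return b"".join([
        bytes([0xC3, 0xC7, 0xCB]),                    # R6 reset commands
        bytes([0x7D]), word(source), word(length),    # R0: A -> B, source, length
        bytes([0x54, 0x02]),                          # R1: port A setup + cycle length
        bytes([0x50, 0x02]),                          # R2: port B setup + cycle length
        bytes([0xC0]),                                # R3
        bytes([0xAD]), word(dest),                    # R4: continuous mode, destination
        bytes([0x82]),                                # R5
        bytes([0xCF, 0xB3, 0x87]),                    # R6: load, force ready, enable
    ])


SCREEN_DUMP = 0x8000   # hypothetical address for the ScreenDump label
program = build_dma_program(SCREEN_DUMP, 6912, 0x4000)
print(len(program), program.hex(" "))

# Rough timing check for the 6912 byte screen copy at 28Mhz.
scanlines = (6912 * 4) / (224 * 8)
print(f"{scanlines:.1f} scanlines")   # 15.4 scanlines
```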

With the DMA now running in CSpect, I thought I'd give some of the old DMA demos a go, see how compatible I am.


It was pretty cool seeing these demos "just work", and showed my DMA code was working well.

I was about to take a break as I headed out to Orlando with the family, but that didn't stop me having a little fun on the plane as we headed out...


It would be a few months before I picked up any of this again as work got busy, and deadlines loomed...



Wednesday, May 16, 2018

CSpect V1.11

Minor update to CSpect

CSpect changes
  • Fixed Lowres window (right edge)
  • You can now use long filenames in RST $08 commands (as per NextOS), can be set back to 8.3 via command line
  • Layer 2 now defaults to banks 9 and 12 as per NextOS
  • Added command line option to return $FF from port $1f
  • Fixed a possible issue in loading 128K SNA files. Last entry in stack (SP) was being wiped - this may have been pointing to ROM....
  • Fixed mouse buttons return value (bit 0 for button 1, bit 1 for button 2)





Monday, May 14, 2018

Making NEXT Lemmings: Part 3

Now that I knew things were actually possible, it was time to start thinking about the levels themselves. I started reading up on the level format, a text file by someone called "rt"... (Thank you whoever that is!!)

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
          LEMMINGS .LVL FILE FORMAT
                    BY rt
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

document revision 0.0

thanks to TalveSturges for the original alt.lemmings posting which got me
started on decoding the .lvl format

if you liked lemmings you should give CLONES a try. go to www.tomkorp.com
for more information. 

this document will explain how to interpret the lemmings .lvl level
file for the windows version of lemmings (directly saved levels in LemEdit).

This document is also on my Github repository.

As it happens, I was flying off to LA for a holiday with my daughter, so an 11 hour flight was just the thing I needed to blitz this chunk of code...


I first converted and exported the level brush .SPR files into a large set of sprite banks, with a table at the start that held start addresses, banks, widths and heights, followed by bitmap data. Once I had this, I was able to start writing a "level bob" routine, something that would draw these into my 1600x160 level bitmap (which is the size of a level in Lemmings - 5x320x160 screens).

This immediately turned nasty, as working out the "Y" coordinate was horrible - this was before the MUL instruction appeared. I created a large "line" table that held the address of each line and the bank it started in, and started to use that to index down into the level.

This was "okay", but it was ugly... and the table wasn't exactly small either. It also had a problem that a single line could cross a bank, which wouldn't do at all. I could get round this by not filling the last 384 bytes of each 16K bank, but still....

So, I decided to scrap all this, and burn some of this lovely new memory the NEXT had. I changed the background level bitmap into a 2048x160 screen, burning an extra 70K+ of RAM. However... this now meant that the line address was just the Y coordinate shifted a couple of times and stuck in H (of HL), with the base of the bank ($C000) added on. It also meant that by banking in 16K at a time, if bit 6 of H reset to 0, I needed to swap bank. This was much nicer...
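
Here's the address arithmetic for both layouts as a Python sketch. The function names are made up, and the bank numbers are relative to the first bank of the level bitmap rather than real Next bank numbers; everything else comes from the figures above (16K banks paged in at $C000, 1600 or 2048 bytes per line, 160 lines).

```python
BANK_SIZE = 16 * 1024
BANK_BASE = 0xC000


def line_1600(y):
    """(bank, address) of line y when lines are packed 1600 bytes apart."""
    offset = y * 1600
    return offset // BANK_SIZE, BANK_BASE + offset % BANK_SIZE


def straddles_bank_1600(y):
    """True if line y of the 1600 wide layout runs over the end of its bank."""
    return (y * 1600) % BANK_SIZE + 1600 > BANK_SIZE


def line_2048(y):
    """(bank, address) of line y when each line is 2048 bytes: 8 lines per bank."""
    h = 0xC0 | ((y & 7) << 3)       # Y shifted into H, plus the $C000 base
    return y >> 3, h << 8


def next_line_2048(bank, h):
    """Step H down one line; bit 6 of H dropping to 0 means 'swap bank'."""
    h = (h + 8) & 0xFF
    if not h & 0x40:
        h |= 0xC0
        bank += 1
    return bank, h


print(BANK_SIZE % 1600)                     # 384 unused bytes per bank if lines can't straddle
print((2048 - 1600) * 160 // 1024)          # 70 (K of extra RAM for the wider bitmap)
print(line_2048(9), straddles_bank_1600(10))
```

With 1600 byte lines, line 10 starts 16000 bytes into the first bank and spills into the next one; with 2048 byte lines no line can ever straddle a bank, and no table is needed.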

Doing the basic draw routine was fairly simple, I just had to clip top and bottom. I decided (once again) to cheat like mad and not actually clip things. As this was just being used at level creation time, I simply looked at the Y coordinate, and if it was off screen I didn't draw it, and moved onto the next one. Ideally you'd work out how much you need to clip and only draw that, but for this case, performance wasn't the number one goal, so I thought "Sod it".

I also realised with my longer level, I didn't actually have to clip on X either, I could just start drawing 16 pixels further on. That was fortunate.

With a simple draw blob in place, I looped through all the brushes to draw, and built the screen.

For a first version, this was pretty good I thought. You can see what it's supposed to be, and it looked pretty close to the actual level. It wasn't perfect though. Lemmings background brushes can have a few special draw modes: Flip on Y, Behind and Remove. Each of these modes can be combined, giving 8 different draw modes in total.
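
As a rough illustration of how three flags give 8 combinations, here's a Python sketch of a brush draw with the modes as bit flags. Everything in it is an assumption for illustration: the flag values, the `draw_brush` name, treating 0 as "empty", reading Behind as "only fill empty pixels" and Remove as "erase wherever the brush has a pixel", and letting Remove win when it's combined with Behind. The real .LVL flag bits are in the format document.

```python
FLIP_Y, BEHIND, REMOVE = 1, 2, 4   # hypothetical flag values for this sketch


def draw_brush(level, brush, x, y, mode=0):
    """Draw brush into level (both lists of rows); 0 is an empty pixel."""
    rows = brush[::-1] if mode & FLIP_Y else brush
    for by, row in enumerate(rows):
        for bx, pixel in enumerate(row):
            if pixel == 0:
                continue                      # transparent brush pixel
            ty, tx = y + by, x + bx
            if not (0 <= ty < len(level) and 0 <= tx < len(level[0])):
                continue                      # off the bitmap, nothing to do
            if mode & REMOVE:
                level[ty][tx] = 0             # erase terrain under the brush
            elif mode & BEHIND:
                if level[ty][tx] == 0:
                    level[ty][tx] = pixel     # only fill the gaps
            else:
                level[ty][tx] = pixel         # normal draw


level = [[0, 0, 0], [0, 5, 0]]
draw_brush(level, [[1, 1], [2, 2]], 0, 0, BEHIND)
print(level)   # [[1, 1, 0], [2, 5, 0]]
```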

While on the Las Vegas leg of our trip I finished off the level loading and added the missing modes...




It took me a while to figure out what was wrong with the image that has the corruption, especially as it was the correct shape - very odd. But turns out the brush I was drawing from was overflowing the bank it was in, so I'd need to do a check on exporting to make sure each line of the sprite is inside the same bank. I can't make the whole sprite fit, as the brushes can get really quite large and won't fit. I've yet to fix this bug... but I'll get around to it one day.

As you can see, I'd gotten most levels now loading fine. After this, I set about actually finishing off the export of all the lemmings as code that I could call, and actually use.


This took a little longer than I expected. I had to make sure the generated code wasn't going to cross a bank, and make the offset table that went before it all. But once I finally got it all together, I was able to flick through the different graphics inside the game (above).

It was getting to the point I wanted to have the objects in the level. This was really the last major item missing, not just in terms of level components, but that would show me how fast things would run. Lots of the levels have a load of objects, and they would take up a lot of rendering time, so I really needed to see if it would still be fast enough once I added them.

I managed to get the sprites exported okay, but the colours just weren't right. I ended up having to load new palette files, one for each style, and convert them into the NEXT RRRGGGBB format. Once I had that, exporting the objects was just the same as the level brushes.
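
The palette conversion itself is tiny. Here's a Python sketch, assuming the source palette entries are 8 bits per channel and that the low bits are simply chopped off (the converter may well round instead); the function name is made up.

```python
def rgb888_to_rrrgggbb(r, g, b):
    """Pack 8-bit R, G, B into one byte: 3 bits red, 3 bits green, 2 bits blue."""
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)


palette = [(0, 0, 0), (255, 255, 255), (255, 0, 255), (0, 160, 0)]
print([hex(rgb888_to_rrrgggbb(*colour)) for colour in palette])
# ['0x0', '0xff', '0xe3', '0x14']
```

Note that bright magenta comes out as $E3, which is the default global transparency colour mentioned in the CSpect change list.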

Once I'd exported this, I suddenly remembered those arrows.... damn it. They appear "inside" the background, so this meant I wouldn't be able to use the LDIX instruction, but would have to do it long hand. Worse yet, there were always lots of these on a level, meaning I'd have to draw loads of these objects using very slow code.... Bugger.  Well, that was one I would look to address later.

With objects I needed "standard" sprite rendering code as the objects must be clipped right/left and top/bottom, and they have similar drawing modes to the background - normal, behind and inside.
In this case, I opted for a tower of LDIX instructions, and then jumped into the middle of them depending on how many pixels on X I was drawing. I then looped around this on Y, checking for bank swapping as I went. This gave me the image below....
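
The "jump into the tower" sum is simple enough to show as a Python sketch. The tower size of 16 and the helper names are hypothetical; the only facts used are that LDIX assembles to the two bytes ED A4 and costs 16 T-States, as listed in the opcode table.

```python
LDIX = bytes([0xED, 0xA4])      # opcode bytes for one LDIX
LDIX_TSTATES = 16


def build_tower(max_width):
    """max_width LDIX instructions back to back (loop/return code would follow)."""
    return LDIX * max_width


def entry_offset(max_width, width):
    """Byte offset into the tower so that exactly 'width' LDIXs are executed."""
    if not 0 <= width <= max_width:
        raise ValueError("width must be between 0 and max_width")
    return (max_width - width) * len(LDIX)


tower = build_tower(16)                      # hypothetical 16 pixel maximum
start = entry_offset(16, 5)
print(start, len(tower[start:]) // 2, 5 * LDIX_TSTATES)   # 22 5 80
```

So drawing 5 pixels means jumping 22 bytes into a 16 entry tower, and the copy part of that row costs 80 T-States before any loop or bank checking overhead.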


After how fast I got the Lemming rendering, I was a little disappointed at the speed of this, especially as I know there are levels with water all over the place. Damn. It wasn't the end of the world, but it may well be the beginning of the end.... Each sprite in this case takes about its own height to render, and if I have several on screen at once, that's easily going to chew up a frame or so.

Still, I knew I still had some tricks up my sleeves so I cracked on. To give me a little break and some thinking time, I decided to start drawing the Lemming font. Time away from actual code is important, as it gives your mind time to chew over some problems. Most coders I know will come up with solutions to things at the funniest times; In the shower, having a poo, out for a walk - all of them away from the computer.

I used the image editor in GameMaker Studio 2, as it's a great pixel editor, and very like D-Paint ( can't for the life of me think why! ).


The Amiga font was 16x8, but due to the smaller screen, I decided to reduce it down to just 8x8. This works pretty well, and as you can see I've "mostly" managed to keep the feel of the original.


I've still to do the numbers, but this gave me a welcome break and let me know I was on a good track for my screen layouts.









Sunday, May 06, 2018

Making NEXT Lemmings: Part 2

Before getting onto Plan B, I decided to get a nice smooth cursor. Reading the mouse every 3 or 5 frames will make the whole thing feel sluggish. Perception is everything in games, and if you feel like input is responsive, you'll ignore a lot of a game's flaws. So even though the game will only check clicks and positions every 3 to 5 frames, having the cursor move every frame makes it feel like the game's moving and playing much more smoothly. Oh... and hardware sprites are a godsend for this kind of thing, as they make it basically "free".
I used the same trick in Blood Money on the C64. The game ran in 3 frames as well as it was a slow scrolling game, but the player sprites moved every frame, and so it all felt quite responsive.
I started by moving the mouse reading into my interrupt routine, and it all promptly fell flat on its face. Well... that wasn't the plan. Turns out, I needed to "save" the current NEXT register (this was before the NextReg instruction appeared). So on entry into the IRQ, I now save the current register, restoring it on exit and suddenly everything was looking much nicer, in fact it was looking like a "proper" lemmings level, cursor and everything!


Also, now that I had managed to extract the .SPR format graphics, I was also able to extract the level brushes that are used to build the level. This was great, as it meant I could use the original level files, and not the 320K "bitmaps" I had been using.


Not having to convert the level files is a bonus to be sure. The level files are much smaller, and also well documented. They not only include the level bitmap data, but also objects and collision. So using the level format was going to be far more useful.

About this time, I also realised I'd need a debugger in my emulator.... bleh. More tools work, but again one that would pay dividends later. So yet again, I spent a few weeks writing a good debugger, one that is actually usable for development, not just hacking. This also involved loading the symbol file from SNasm so I could display labels. I've always had a very specific view of how assembly debuggers should work, and the "view" you should use. I've used stacks of truly terrible debuggers over the years, simple scrolling registers and "current" opcode ones stand high on my hate list...

The debugger has been updated over time like the rest of the tool, but it's pretty useful now. I've used the same layout for almost 30 years.... I wrote my first useful debugger for the PC Engine back in '91, and used almost exactly the same kind of layout. It was the one I used on the PDS (Personal Development System) kit I used to make C64 games in the late 80's, and it works incredibly well.


A free moving, scrolling disassembly window is vital, it just makes life MUCH simpler, and a static place for registers is just as important as your eyes will always drift to that fixed location when actually debugging. If it moves around, you'll waste time trying to find it. A second or two each "step" adds up.
The rest of the space just depends on the machine, and what hardware it has. On the PC Engine, I displayed VRam and some hardware info, on the Next I display the screen and some hardware registers - I'll show them all eventually.
The column of numbers on the right is my latest innovation.... a CPU execution list. This is a godsend when the machine crashes and resets. I can just flick back over this (massive!) list until it gets back into the game, and I can see where it went nuts. I've used this a few times already.

Anyway... back to Lemmings and Plan B! So when you have any kind of performance issue, there's 2 ways to approach it.
  1. Try and optimise the hell out of the slow function.
  2. Try a totally different method.
Now it was clear that unrolling loops, making towers of code etc. just wouldn't work in this case. Even taking the loops away would only remove a scanline or 2 at most. What I needed was the fastest way possible to stick pixels on the screen. So....what is that exactly?

Well, if you think about how a sprite is drawn, you see you have a source address - the sprite, the destination address - the screen, and you need to copy the data from one to the other. This normally consists of loading A with a value and sticking it into the destination. So how about.... we just write a sprite as a series of load/store instructions? Kind of like this...

LD (IX+0),COLOUR1      ; 19 T-States
LD (IX+1),COLOUR1
LD (IX+2),COLOUR2
LD (IX+3),COLOUR1      ; = 76 T-States

And so on... Now... IX would have been ideal, as you could point it to the top of the graphic, and it could offset along a line, but storing via IX takes 19 T-States. That's pretty damn slow - in fact, it's even slower than our LDIX which is just 16. However.... LD (HL),A is only 7.... that's a fair speed up. And Lemmings by their nature have "runs" of colour: green hair, white face, blue body and so on. That means we'd only need to reload A when the colour changes. This turns the above code into this...

LD A,COLOUR1           ; 7 T-States
LD (HL),A              ; 7
INC HL                 ; 6
LD (HL),A
INC HL
LD A,COLOUR2
LD (HL),A
INC HL
LD A,COLOUR1
LD (HL),A              ; = 67 TStates

Also.... LD (HL),$XX is only 10 T-States, so if we need to change "A" for only a pixel or so, we could just store the value directly, which would save reloading A.

LD A,COLOUR1          ; 7 T-States
LD (HL),A             ; 7
INC HL                ; 6
LD (HL),A             ; 7
INC HL                ; 6
LD (HL),COLOUR2       ; 10
INC HL                ; 6
LD (HL),A             ; 7 = 56 TStates

Getting there.... Now, as the Layer 2 screen is only 256 wide, we also know on a single line, we'll never cross a 256 byte boundary. So we don't need to increment HL, only L.

LD A,COLOUR1          ; 7 T-States
LD (HL),A             ; 7
INC L                 ; 4
LD (HL),A             ; 7
INC L                 ; 4
LD (HL),COLOUR2       ; 10
INC L                 ; 4
LD (HL),A             ; 7 = 50 TStates

Now we're talking! Lastly, not every pixel in a (say) 6x9 sprite is plotted, so unlike a load of LDIXs, which is basically 6*9*16 = 864 T-States just for the load/store parts, never mind the management and line changing, we can get by with "JUST" the pixels we need to draw. This might mean doing 2 INC Ls, or an ADD HL,$0000 (another new NEXT opcode) to skip pixels, but that's still faster than storing. A normal Lemming is probably half to three quarters actual pixels to plot, so that's a big saving.
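
If you want to check the T-State sums for the four versions above, here's a small Python tally. The timing table uses the figures quoted in the listings (plus 16 for LDIX from the opcode table); the `cost` helper and the instruction strings are just for this sketch.

```python
TSTATES = {
    "ld (ix+d),n": 19,
    "ld a,n": 7,
    "ld (hl),a": 7,
    "ld (hl),n": 10,
    "inc hl": 6,
    "inc l": 4,
    "ldix": 16,
}


def cost(ops):
    """Total T-States for a list of instruction names."""
    return sum(TSTATES[op] for op in ops)


via_ix = ["ld (ix+d),n"] * 4
via_a = ["ld a,n", "ld (hl),a", "inc hl", "ld (hl),a", "inc hl",
         "ld a,n", "ld (hl),a", "inc hl", "ld a,n", "ld (hl),a"]
direct = ["ld a,n", "ld (hl),a", "inc hl", "ld (hl),a", "inc hl",
          "ld (hl),n", "inc hl", "ld (hl),a"]
inc_l = [op.replace("inc hl", "inc l") for op in direct]

print(cost(via_ix), cost(via_a), cost(direct), cost(inc_l))   # 76 67 56 50
print(cost(["ldix"] * (6 * 9)))                               # 864
```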

I decided to just sit down and manually write a Z80 function for drawing a single lemming - which while it took a bit of time, meant I could hand tune this code to make it as fast as possible.

A good tip for when doing any kind of R&D like this, is that you should always use the machine to its maximum and then work back from there. Sure, you might manage to do something using 1Mb of code or tables, and that may not work in a game situation, but at least you know it IS possible. There's normally a middle ground where you lose a little speed, but maintain most of the benefits. I've done a lot of R&D code in my time that's turned into real code, and this has always been the best approach.


This is a snippet of the hand-built Lemming draw function. You can see I try and reduce reloading as much as possible, but you do have to do the "newline" section after each line, as you could be bank swapping. This did work out fairly well though, and I managed to reduce the Lemming drawing down to just over a scanline!

Here you can see the new Lemming speed (timing bar in white), and the old one (in red). This was definitely looking more promising. This means I could now draw a screen full in under a frame, which is certainly a requirement if I have any hope of achieving my target frame rate of 17fps (same as the Amiga version).

Now that I saw this was possible, it was time to take the next step.... That step, is to automatically generate Z80 code based on a graphic so I didn't have to hand build hundreds of Lemming graphics. So inside my Lemmings graphics converter, I started to scan each lemming sprite, and generate a large, optimised, Z80 function that would draw it.
As I progressed I came up with more rules and optimisations that would help. This was pretty cool, because each time I sped things up, I could just regenerate ALL my graphics and everything gets faster! How cool is that!
For example, I wasn't using the DE register pair, so I was able to pre-load D and E with the 2 most common colours. This meant I didn't have to continually reload them, in fact only if there was a run of 3 pixels of the same colour would I need to load A at all.
I use the special "write" only mode of Layer 2, in location $0000 to $3FFF, and this allows me to write some simple, and fast, bank swapping code.
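
To give a feel for the idea, here's a heavily cut-down Python sketch of a row generator. It is not the real converter: the names are made up, it only handles the "two most common colours live in D and E" rule, it never loads A for runs, and it leaves out the line change and bank swap code entirely. Pixels are plain colour numbers with 0 meaning "don't plot".

```python
from collections import Counter


def pick_common(sprite):
    """The two most frequent non-transparent colours, to be preloaded into D and E."""
    counts = Counter(p for row in sprite for p in row if p)
    common = [colour for colour, _ in counts.most_common(2)]
    return tuple((common + [0, 0])[:2])


def generate_row(row, d, e):
    """Return (asm lines, T-States) that plot one row starting at (HL)."""
    lines, tstates, cur = [], 0, 0
    for x, pixel in enumerate(row):
        if pixel == 0:
            continue                              # nothing to plot here
        lines += ["inc l"] * (x - cur)            # step L along to this pixel
        tstates += 4 * (x - cur)
        cur = x
        if pixel == d:
            lines.append("ld (hl),d"); tstates += 7
        elif pixel == e:
            lines.append("ld (hl),e"); tstates += 7
        else:
            lines.append(f"ld (hl),{pixel}"); tstates += 10
    return lines, tstates


sprite = [
    [255, 0, 79, 79],
    [255, 255, 79, 79],
    [0, 20, 79, 0],
]
d, e = pick_common(sprite)
print(d, e)                                # 79 255
print(generate_row(sprite[0], d, e))
```

For the first row this produces `ld (hl),e`, two `inc l`s, `ld (hl),d`, `inc l`, `ld (hl),d` for 33 T-States, which is the same shape as one of the rows in the generated listing at the end of this post.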


You can see the fruits of my labour above. As you can also see in the video, the next major issue I would have, is clipping. If you draw sprites normally, you can simply reduce the loop size to draw less, but with code drawing things, you can no longer do that. But, as the sprites are less than 16 pixels wide I decided to reduce the screen size by 8 pixels on either side, meaning I no longer had to clip left/right - sweet! However.... vertical clipping is a real issue as it has the potential to run off into memory, or come back in the bottom.
Clipping the top is by far the most difficult, as the panel at the bottom gives me a way to "replace" corrupted graphics. At the moment I simply copy the panel each frame, "fixing" the overdraw, but this is only a temporary solution. Ideally I'd use a raster IRQ and flip buffers so I can free up that copy - or a simple copper list (which would be better, but we didn't have that yet!).
The top of the screen however, is an issue.... how can you simply clip the top of the sprite? I could build a table of jumps and jump into the code at the right point, but that would be slow, and a nightmare to generate. What I decided to do was to draw the sprite "backwards". This means I could ignore the lower clipping as I planned, then as I draw upwards, detect going off the top of the screen (by looking to see the bank "loop"), and simply RET from the function. This turned out to work just fine...  here's a snippet of the auto generating code function...

{
   switch (bc)
   {
       case 0: break;
       case 1: sb.AppendLine("\t\tinc\tl"); tstates += 4; bytes += 1; buff.Write(0x2C, 1); break;
       case 2:
               sb.AppendLine("\t\tinc\tl"); tstates += 4; bytes += 1; buff.Write(0x2C, 1);
               sb.AppendLine("\t\tinc\tl"); tstates += 4; bytes += 1; buff.Write(0x2C, 1);
               break;
       case 3:
               sb.AppendLine("\t\tinc\tl"); tstates += 4; bytes += 1; buff.Write(0x2C, 1);
               sb.AppendLine("\t\tinc\tl"); tstates += 4; bytes += 1; buff.Write(0x2C, 1);
               sb.AppendLine("\t\tinc\tl"); tstates += 4; bytes += 1; buff.Write(0x2C, 1);
               break;
       default:
               // "HL" never crosses a 256 byte boundary here, so just add to L. 15 Tstates, 4 bytes (1 T-State faster than add hl,$0000)
               sb.AppendLine("\t\tld\ta," + (bc & 0xff).ToString()); tstates += 7; bytes += 2; buff.Write(0x3E | (((uint)(bc & 0xff)) << 8), 2);
               sb.AppendLine("\t\tadd\ta,l"); tstates += 4; bytes += 1; buff.Write(0x85, 1);
               sb.AppendLine("\t\tld\tl,a"); tstates += 4; bytes += 1; buff.Write(0x6f, 1);
               a = bc & 0xff;
               break;
   }
}

In this section, each time I move along the line to the next pixel, I check to see what way is fastest. For anything under 4 bytes, you just INC L = 12 T-States (max). For any more, you need to add using A and that makes 15 T-States, one faster than the new ADD HL,$0000 instruction. Lots of these little tricks help drive the count down and make the function faster. You'll also notice I generate ASM source and byte code. This was so I could not only debug it, but also view a complete function and see if there were any little tricks I was missing.
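
The same decision, boiled down to a few lines of Python so the costs are easy to compare. This is only a sketch of the rule described above (the `emit_skip` name is made up, and it returns text rather than writing byte code); the T-State figures are the ones from the C# snippet.

```python
ADD_HL_NN_TSTATES = 16      # cost of the NEXT "add hl,$0000" for comparison


def emit_skip(count):
    """Return (asm lines, T-States) for moving L along by 'count' pixels."""
    if count <= 3:
        return ["inc l"] * count, 4 * count          # up to 12 T-States
    # Longer skip: add to L via A (this trashes whatever colour was in A).
    return [f"ld a,{count & 0xFF}", "add a,l", "ld l,a"], 7 + 4 + 4


for n in (1, 3, 4, 5):
    lines, tstates = emit_skip(n)
    print(n, tstates, lines)
```

A skip of 5 comes out as `ld a,5` / `add a,l` / `ld l,a` at 15 T-States, one under `ADD_HL_NN_TSTATES`, and it's the same sequence that opens the generated sprite code below.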

So to round off this entry, here's an example of the auto-generated code in its final state for drawing the sprite below....


Remember, this draws from the BOTTOM-LEFT most pixel, across to the right, then jumps back up one line and back to the start of the line again...

At 14Mhz in the border we have 224*4 = 896 T-States (while the screen area is more complicated as it drops to 7Mhz over the screen itself). As you can see from below, a typical lemming is now drawing in well under a scanline. Sure, there is surrounding management code, but this is pretty good going, and means I can draw all my lemmings in well under a frame - probably around 150-200 scan lines (taking the CPU speed drop into account).

If you happen to spot a possible speed-up, please let me know. I do know I could use JP instead of JR for my next line test, but working out the address instead of being relative would be a major pain in the butt, so I've left it for now. I may go back and change it later if I'm feeling adventurous.
I could also move more out of the common code section, but for the rest, feel free to point anything out. (*both these have now been done)

Lastly.... remember there will only ever be one case where it could bank swap. No graphics are large enough to swap 2 banks, and usually it won't swap at all.

EDIT: Code below has been updated.


; Sprite Number 22
  ; Common code = 59 T-States (outside function)
  ; HL = screen address [y,x]
  ld de,20479  ;Most common 2 colours

  ld a,5
  add a,l
  ld l,a
  ld (hl),e
  add hl,-258   ;move back to start of line, and up one line

  ;New line
  bit 6,h
  jp z,@NoBankChange9
  ld a,h
  and a,$3f
  ld h,a
  ex af,af'
  sub $40
  ret m
  out (c),a
  ex af,af'
@NoBankChange9:

  ld (hl),d
  inc l
  ld (hl),e
  add hl,-259   ;move back to start of line, and up one line

  ;New line
  bit 6,h
  jp z,@NoBankChange8
  ld a,h
  and a,$3f
  ld h,a
  ex af,af'
  sub $40
  ret m
  out (c),a
  ex af,af'
@NoBankChange8:

  ld (hl),e
  inc l
  ld (hl),d
  inc l
  ld (hl),d
  add hl,-259   ;move back to start of line, and up one line

  ;New line
  bit 6,h
  jp z,@NoBankChange7
  ld a,h
  and a,$3f
  ld h,a
  ex af,af'
  sub $40
  ret m
  out (c),a
  ex af,af'
@NoBankChange7:

  ld (hl),e
  inc l
  inc l
  ld (hl),d
  inc l
  ld (hl),d
  add hl,-259   ;move back to start of line, and up one line

  ;New line
  bit 6,h
  jp z,@NoBankChange6
  ld a,h
  and a,$3f
  ld h,a
  ex af,af'
  sub $40
  ret m
  out (c),a
  ex af,af'
@NoBankChange6:

  ld (hl),e
  inc l
  inc l
  ld (hl),d
  inc l
  ld (hl),d
  inc l
  inc l
  ld (hl),e
  add hl,-260   ;move back to start of line, and up one line

  ;New line
  bit 6,h
  jp z,@NoBankChange5
  ld a,h
  and a,$3f
  ld h,a
  ex af,af'
  sub $40
  ret m
  out (c),a
  ex af,af'
@NoBankChange5:

  ld (hl),e
  inc l
  ld (hl),d
  inc l
  ld (hl),d
  inc l
  ld (hl),e
  add hl,-258   ;move back to start of line, and up one line

  ;New line
  bit 6,h
  jp z,@NoBankChange4
  ld a,h
  and a,$3f
  ld h,a
  ex af,af'
  sub $40
  ret m
  out (c),a
  ex af,af'
@NoBankChange4:

  ld (hl),d
  inc l
  ld (hl),d
  add hl,-258   ;move back to start of line, and up one line

  ;New line
  bit 6,h
  jp z,@NoBankChange3
  ld a,h
  and a,$3f
  ld h,a
  ex af,af'
  sub $40
  ret m
  out (c),a
  ex af,af'
@NoBankChange3:

  ld (hl),e
  inc l
  ld (hl),e
  inc l
  ld (hl),e
  inc l
  ld (hl),20
  add hl,-258   ;move back to start of line, and up one line

  ;New line
  bit 6,h
  jp z,@NoBankChange2
  ld a,h
  and a,$3f
  ld h,a
  ex af,af'
  sub $40
  ret m
  out (c),a
  ex af,af'
@NoBankChange2:

  ld (hl),e
  inc l
  ld (hl),20
  inc l
  ld (hl),20
  add hl,-258   ;move back to start of line, and up one line

  ;New line
  bit 6,h
  jp z,@NoBankChange1
  ld a,h
  and a,$3f
  ld h,a
  ex af,af'
  sub $40
  ret m
  out (c),a
  ex af,af'
@NoBankChange1:

  ld (hl),20
  inc l
  ld (hl),20

  ld a,2
  out (c),a
  ret
  ; T-States=714/762     bytes =246
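
Each "New line" block in the listing can be read as the register-level model below. This is a minimal Python sketch of what the Z80 instructions do, not the generator itself. The names `new_line` and `bank` are made up for illustration, and treating the shadow A register as the bank-select value that `out (c),a` writes is an assumption based on how the listing uses it.

```python
def new_line(hl, bank, step):
    """Model 'add hl,step' followed by one '@NoBankChange' check.

    Returns (hl, bank), or None where the Z80 code would take 'ret m'.
    """
    hl = (hl + step) & 0xFFFF            # add hl,-258 (16-bit wrap)
    if hl & 0x4000:                      # bit 6,h : stepped out of the 16K window
        hl &= 0x3FFF                     # ld a,h / and a,$3f / ld h,a
        bank = (bank - 0x40) & 0xFF      # ex af,af' / sub $40
        if bank & 0x80:                  # ret m : result went negative, stop
            return None
        # out (c),a would select the new bank at this point
    return hl, bank


print(new_line(0x0105, 0x43, -258))   # stays in the same bank
print(new_line(0x0005, 0x43, -258))   # wraps, so the bank value drops by $40
print(new_line(0x0005, 0x03, -258))   # no earlier bank, drawing stops
```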



Sunday, April 29, 2018

Making Lemmings on the ZX Spectrum Next

So..... I was thinking on the choices I've been making while writing Lemmings, to get where I have so far, and thought it might be nice to write it all down. I've been a huge fan of development diaries, ever since I was a teenager in fact. Reading Andrew Braybrook's development of Morpheus on the C64 in ZZap64 was the highlight of the month for me. I didn't just learn a lot from them, but I was inspired while reading them. So, here we go.....

When the ZX Spectrum Next was launched on Kickstarter I got really excited. I've always been a fan of the Spectrum, and the thought of a faster one with some extra "toys" got me unreasonably excited. I, like everyone else, watched the various videos Jim, Victor and Henrique posted and relished the possibilities that they presented. I really, really wanted to get in on the act as well. I tried to find a TBBlue board on eBay, only to find out I'd just missed one, and as they don't come up very often, I figured I'd need to take another tack. I looked out my old Spectrum emulator and got it all working again (it was a little old), and then started to add in the hardware that was available on the TBBlue. I was able to get hold of the sprite demo that was shown in one of the videos, and so decided to tackle that first. I decompiled it to examine it more easily, then realised I'd need an assembler...

Damn. Yet more tools work. I spent a few weeks adding a Z80 assembler to my SNasm assembler - which was mainly a 6502/65c02/65816 assembler. This required me to rejig things, and I spent a long time assembling a test file, and comparing it with the binary output of another assembler to validate the results. I could have used other tools/assemblers, but experience tells me it's worth investing in your own tools if you can afford to. Things usually end up going faster in the long run.

Once I had the sprite example in a state I could mess around with, I was able to get the sprite test running in my emulator.


After this I hacked a Layer 2 bitmap demo by Jim Bagley, and commented it so I could figure out how it all worked (you can see it here: http://dailly.blogspot.co.uk/2017/06/zx-spectrum-next-bitmap-example.html ). Once I'd figured that all out, I added Layer 2 into CSpect, along with scrolling. Now I had a good solid base from which to actually start playing.


This has obviously been an on-going task; as hardware is added, I've been adding it into the emulator. I released it a while back as several folk found themselves in the same position as I was, and this meant the whole "write a game" thing kept getting put back.

Once the 28Mhz mode appeared, I suddenly started to wonder if it could actually handle Lemmings. Not like it did before with only 20 lemmings at a horrible frame rate, but a full blown Amiga style one running at full speed.. With Layer 2 and 256 colour stuff, it was possible to look like it, but moving around 256 colour graphics takes a lot of grunt and 28Mhz might just manage it.

I started by loading in a 320K bitmap into memory, and drawing it to the screen.


So.... this was promising. Using just the CPU it was copying the screen over at a reasonable rate, but I'd need to try and get some lemmings on screen first, to see how I really stood.

Before being able to do this, I first needed to get some graphics.... Now, this is where things get a little complicated. If I ever wanted to release this, then I'd have to come up with a way to get the graphics without actually distributing them. Sony owns Lemmings. There's no getting away from that. They do occasionally seem a little lax in chasing down folk using the old levels and graphics, but that's not something I wish to test - they have far more money than I do! I've seen web clones that have been around for over a decade now, a clone on the Windows store, some iPhone versions, and a GameBoy DS homebrew version, but still.... if I can avoid that, I'd be much happier.

Still, all that said.... I did see someone who made a clone take a different approach. They provided a tool that converted Windows Lemmings assets into what they needed. This was ideal. It means people could buy the content legally, then convert it into what I needed, and I wouldn't ever need to ship with copyrighted content. Sweet.

Now, while I worked on the original, I had no idea how Windows Lemmings works, or what format its files use. Fortunately, Windows Lemmings has been out for a while, and it's been hacked to bits, and documented pretty well over the years.

After much Googling, I eventually managed to find the various documents, and although they were a little tricky to follow at times I was all set. The sprite format had a funny compressed method, basically a byte-run compression variant and that took a little bit of time to figure out even using the docs. The code below shows what I ended up with, along with some examples of the compression. As you can see, it can start with a "skip" which helps move over the blank data at the start of a sprite scanline.

            for (int y = 0; y < dataHeight; y++)
            {
                int x = 0;
                // Basic compression
                //
                //  0x84 0xAA 0xBB 0xCC 0xDD 0x80
                //  represents four bytes image data(AA, BB, CC, DD) starting at offset 0.
                //
                //  0x03 0x85 0xAA 0xBB 0xCC 0xDD 0xEE 0x80
                //  represents 5 bytes image data(AA, BB, CC, DD, EE) starting at offset 3.
                //
                //  Each line can have many "sub" lines
                //  05 84 30 2C 2C 2C ## 02 84 30 30 2C 2C 80
                //
                // 7F 0B 94 1A 0B 18...
                // The offset here is not 0x7f, but 0x7f + 0x0b, the following 0x94 is the start character for a data line of 0x14 elements.
                while (buffer[_offset] != 0x80)
                {
                    int len = buffer[_offset++];
                    // Skip bytes have the top bit clear. A skip of 0x7f can be
                    // followed by another skip byte, so keep accumulating.
                    while ((len & 0x80) == 0)
                    {
                        x += len;
                        len = buffer[_offset++];
                    }
                    int counter = len & 0x7f;
                    while (counter > 0)
                    {
                        spr[x, y] = pal[buffer[_offset++]];
                        counter--;
                        x++;
                    }
                }
                _offset++;
            }
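
To make the format easier to poke at, here is the same per-line decode as a minimal Python sketch that runs the examples from the comments above. It is not the converter itself: `decode_line` is a made-up name, it returns raw palette indices rather than looking them up in `pal`, and it assumes any number of skip bytes may precede a run (which is how the `7F 0B 94` example reads).

```python
def decode_line(data, pos=0):
    """Decode one sprite scanline.

    Returns ({x: palette_index}, position of the next line's first byte).
    """
    pixels = {}
    x = 0
    while data[pos] != 0x80:          # 0x80 ends the scanline
        b = data[pos]
        pos += 1
        if (b & 0x80) == 0:
            x += b                    # skip byte: move right over blank pixels
        else:
            for _ in range(b & 0x7F):  # run byte: low 7 bits = pixel count
                pixels[x] = data[pos]
                pos += 1
                x += 1
    return pixels, pos + 1            # step over the 0x80 terminator


line = bytes([0x03, 0x85, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0x80])
print(decode_line(line))
```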


Still, I eventually got all this sussed out and was able to save out all the Lemmings into a usable format, including a sprite pointer table at the start of it. Here's more or less what was saved out.


Once I did that... it was now time to actually draw a lemming. This was done using one of the new Z80 instructions that have been added to the Next: LDIX. This loads from (HL) and stores in (DE), but only if (HL) wasn't the same as what's in the A register. This new instruction is ideal for sprites, as it lets you drop out pixels on demand. As you see from above, I use Magenta/Pink as transparent, so I simply set A to be this colour, then I did a simple Z80 Y by X loop around one of these sprites and got the image below - the white bar is a timing bar, showing how fast (or slow) the function was.


So... this was disappointingly slow. A single Lemming was taking 5 scan lines to draw. Scale that up to 100, include some "loop" and processing time and we'd be looking at a minimum of 2 frames (probably 3) to draw 100 lemmings. Since the Amiga version ran in 3 frames, this wasn't good news. I decided I needed to scale this up to get a better idea of where I stood with this, so I did a 100 lemming test next.



As I suspected, this wasn't great news... so it was time for Plan B.


Wednesday, April 25, 2018

Brexit.... Lying, cheating and stealing a nation's future.


With Brexit looming, and brexiteers continually telling remain voters to shut up and accept it, I thought I'd look back at what happened, and why I think we shouldn't shut up. I believe we were lied to, and that they broke election laws on spending, and data protection laws when targeting their ads. This doesn't leave much room to argue that they didn't act illegally in their bid to win the vote, yet for some reason, the vote is still valid. How can that be? This sends a clear message that you can do whatever you like to win, and there will be no repercussions.

If vote leave broke the law, then those running it should be charged, and the vote should be declared invalid and run again. I keep hearing more and more that is just suspect about the vote, so I wanted to list the main points - more for myself than anything - so I can see just how bad/wrong/illegal the vote was, to list what exactly I thought was shady about the Brexit vote in general, and how those who now claim to be looking out for our best interests are lying through their teeth.

  • The Brexit bus. This is a well known one. The £350 million figure was a work of fiction. It was known at the time, yet they never removed it, it remained a key point in their campaign and many voters claimed it was one of the key things that swayed their vote.
    It should be asked... why is anyone allowed to blatantly lie in any vote, and yet the vote is allowed to stand? This is election fraud of the highest order - surely?
  • Immigrants.
    Another well known one. The claim that they are a constant drain, stealing people's jobs, is a myth - no matter what Nigel Farage and his band of merry racists claim. Not only are farmers now struggling to hire people to help farm, pick crops and pack produce, but the NHS is struggling to retain and recruit enough doctors and nurses to help save lives. Ironic since making the NHS better was a key claim of the leave campaign. At the end of the day, for someone to steal a job, it has to be one someone from the UK is willing to do, and it seems they aren't.
    There are now many EU companies getting ready to leave the UK so they can remain in the EU and benefit from EU trade deals.
  • Parliamentary Sovereignty
    This claim that the UK can't make its own rules is rabble-rousing-rubbish. The UK only takes a tiny fraction of laws from the EU, but it's a "vote winner", so it's front page for the vote.
    But let's be clear.... the Government only want Parliamentary Sovereignty when it works for them. They want to reduce workers' rights, remove the human rights act and allow surveillance on anyone or anything, and without the EU to counter that, they'll be able to push through whatever they like.
    As proof of their contempt for Parliamentary Sovereignty, the latest strikes on Syria were done while Parliament was in recess, so there wouldn't be a vote on it. Now, why would you do that if you wanted Parliament to be Sovereign? It's not about Sovereignty, it's about power, plain and simple.
  • Politicians that don't believe in Brexit.
    This is one that really annoys me. Those attempting to make Brexit happen don't even believe in it. The Prime Minister spent the EU Referendum saying it would be a disaster, that the economy and safety of the UK would suffer, and that the Northern Ireland border was an impossible problem if we left. She now tries to say the complete opposite. Has she changed her mind? Or is she just doing whatever she can to cling to power? After the £1 Billion deal with the DUP to stay in power, I'd argue for the latter.
    She's not the only one either. Boris Johnson was writing prior to the vote about why we should stay in, then decided that running the leave campaign, which he thought would lose, would be a great springboard for higher office - if he ran a good campaign. Then he won. Watching the press conferences afterwards, Johnson and Gove talked to the press like a pair of naughty schoolboys, clearly crapping themselves at what they had done. Johnson immediately launched a PM bid, with Gove stabbing him in the back the only reason he pulled out. Gove claimed that in the past couple of weeks he'd come to realise Johnson wasn't the person to lead. This from a man who had known Johnson for 30 years. The truth being Gove wanted the position, and decided to stab Johnson in the back to get it. These are just the kinds of people we want running Brexit, basically doing whatever they can for themselves, and no one else.
  • Deal would be "Easiest in human history"
    This was another laughable claim. Liam Fox claimed a post-Brexit free trade deal with the EU should be the “easiest in human history”. This was picked up by many politicians, claiming the EU couldn't do without our trade, and that any deal with them would be the simplest ever.
    This is of course a lie. It wasn't simply them being mistaken either, but an outright falsehood. With so many member states, no deal can ever be that simple. They ALL have to agree. On top of this, the EU pretty much holds all the cards. The UK government insisted that a trade deal must be worked on along with other issues, but the EU's insistence that the divorce bill and EU residents' status be discussed first meant the UK had no choice but to fold and give them pretty much whatever they wanted.
  • Having our cake and eating it.
    The UK prior to the vote was also claiming they could get pretty much whatever they wanted. Free trade without freedom of movement, being outside the EU but getting "special" deals for the motor and finance industries. All of this was never going to happen. Why would the EU give away any of this? From their perspective, any business that wanted to maintain these things would move from the UK to another member state, and the EU would benefit. Why give the UK a special deal to keep something they wanted for themselves? How could you possibly persuade 27 other member states to agree to a "give away" of epic proportions?
    You can't. Simple as that, and they never would have. The EU said from day one, this was never going to happen.
  • Northern Ireland. 
    Theresa May pre-Brexit claimed it would be impossible to have Brexit and any kind of free movement across the Irish border. Now that she's supposed to be running it all, it's really simple, and of course it could happen. This is all about her just clinging to power.
    The EU have said, if the UK leaves, then there will have to be some kind of border. Between 2 different types of rules and regulations, you have to have a border.
    The UK Government have flip-flopped back and forth from saying Northern Ireland can stay in the customs union, to they can simply maintain the same laws/rules on goods, to saying technology will solve all the world's problems. All of these are fantasy.
    As a Scot, what's particularly annoying is that the UK government has refused point blank to allow Scotland to remain in the customs union, yet to save their own hides, suggest N.Ireland can. Surely if they can, then so could Scotland? But they refuse to even discuss it.
  • Vote Leave broke spending limits
    So here's another one which is just outright illegal, and should also have invalidated the vote.
    An ex-employee of vote leave has said it "broke spending limits on industrial scale". These rules/laws are there for a reason, and breaking them - along with using companies like Cambridge Analytica, who use stolen data to target more ads at voters - means they reached more people than Remain. This makes for an unbalanced contest, and sure, if you have twice the money (say) you'll reach more people and you'll be able to get a better result.
    So surely because they did this, it means the vote is in itself, illegal? How can it not be?
  • Scottish Independence Referendum
    Another one which, as a Scot, really pisses me off.
    The No camp told voters continuously that it was safer to remain part of the UK, that Scotland would never be able to join the EU, and we would have to say No in order to remain in the EU.
    This promise has clearly been broken.
    We were also told if we left, we couldn't use the pound, and the value of our savings and pensions would suffer. Well, again, thanks to Brexit, the pound has been smashed, and savings and investments have suffered anyway. In fact, a study has placed UK pensions as the worst in the developed world.
    We were told we'd have no oil soon, so we'd need the rest of the UK to prop us up. Lots of new oil fields are being discovered, so again....bollocks.
    We were also told if we left, there would be no friction-less trade with the UK. Yet, this is precisely the argument the Brexit camp are using with the EU. In fact, I heard on an interview a UK minister say "Why wouldn't they want free trade with their nearest neighbour?", yet this was exactly what we were told would never happen if Scotland went independent.
    Huge numbers of new powers. They've given a few new powers over, but have kept some of the most significant - like immigration. In fact, The Minister of State for Immigration compared the whole of Scotland to an English county - to quote "I wouldn't grant any powers to the Scottish Government that I wouldn't grant to Lincolnshire county council", that's how little Westminster thinks of Scotland.
  • Biased press
    All you need to do is look at the BBC, and the mainstream media and see that they've all been backing Leave for some time. Nigel Farage is never off the BBC, despite having never won a UK election. Why is he on? He speaks for no one.
    Owners of papers (like the Sun, Mirror, Daily Express etc) have also been pushing hard on Brexit because they'll have more power. These owners are all multimillionaires, many of them not living in the UK, but using their powers to push their own agenda. Who can forget "The Enemies of the People" headline when a legal challenge on parliamentary sovereignty was questioned? These papers will have more chance to bend the laws to what they want, if they can more easily influence their creation, and that will only happen with Brexit.
  • People didn't understand what they were voting for
    This is a personal one. I've had discussions with some people I know who voted to leave, and a lot of the time, they've either listened to the lies and fictions they were sold, or just didn't understand. One person who voted Leave, didn't understand why everything now had to change, and why it couldn't just stay the same. I was speechless at this - "Well, because you voted to leave...perhaps?".
    I've also heard folk complain that they'll now have longer immigration queues, and why do we have to have this? Well again.... no freedom of movement means....
    I think a large number of folk simply don't understand the complexities involved, and the banner headlines of £350 million a week, immigrants stealing YOUR job! The EU making our laws! It'll be easy to get a great deal - we'll get everything we want. All this kind of thing makes it seem like nothing was going to be a problem, and we'd have everything we have now - and more. Which is all rubbish obviously.
  • Jacob Rees-Mogg and his millionaire friends
    Jacob Rees-Mogg is a hard Brexit fan, he wants to cut the ties and leave now. No looking back. But all you need to do is look at where his money is, and you see why. His investments are highly suspect, including some banned Russian banks, and if the UK were to have a hard brexit, he will personally make millions out of it.
    This is another thing to remember about most of these big players - politicians, newspapers etc. Every one of them stands to gain hugely from it, and they don't give a toss if the average person in the street suffers for it. If you doubt that, then just look at what Conservative policies are doing to the average person just now. Foodbank usage is soaring as people struggle to put food on the table, and all Jacob Rees-Mogg has to say is that it's "rather uplifting". For those of you wondering, the correct response is "That's horrible, what can we do to help and stop it". But it's very much a case of "I'm okay, so I don't care".
  • Even the Government says it'll be a disaster
    The Government had a study done - one it tried to hide, because it stated in every possible case, everyone will be worse off. It ranges from just "slightly worse", to monumentally worse.
    They tried hard to bury this, but it eventually got out, and it makes for some disturbing reading. The EU vote was advisory, and that means they should have said "Oh okay, the public would like us to look into leaving"....do the study....."okay, we looked. Turns out the UK will be much worse off if we do - here's the study, have a look. So, thanks for your input, but we've decided it's not in the UK's best interest".
    Instead we constantly get.... the people voted to leave, so we're leaving.
    I'm pretty sure the people didn't vote to make rich people richer, and everyone else poorer, while at the same time removing a whole heap of rights.
  • The witch-hunt of immigrants
    There seems to be a view that "taking back control" means "hunting down" anyone not born in the UK. There are cases of people who have been living here for decades, paying taxes, doing jobs we want them to do, and immigration is trying to get them deported. Not only that, but people who came here as very young kids, have lived here all their lives, and they are trying to deport them back to a country they may not even speak the language of.
    This is just evil. Can I have my loving, compassionate country back please?



One last interesting fact. The EU vote was "advisory", which means by definition many might have thought it didn't matter, and so may not have voted. This again calls into question how such a vote can be deemed to be "the people's choice". In fact, only around 30-odd percent (can't remember the exact number) of eligible voters, voted to leave. That's hardly the will of the people.
Also, part of the law stated that if "significant" changes in status or rights occurred, then a second vote would be required. This kind of means that by definition, any "final deal" needs a second vote. It's in the EU vote bill, and it hasn't been withdrawn, so it's still valid.

So.... I'm struggling to find something they said that IS true. Anyone?



Friday, April 20, 2018

Writing a ZX Spectrum Emulator in GameMaker Studio

So if you followed my last emulator series, you'll know that I built up a lot of caches of shapes (characters and sprites) on demand, and then drew them when required. This works great for old consoles, and computers with character map screens, because on the whole, games tend not to change character set images very often, just the actual character map screen, which referenced these images. Because these kinds of machines have pretty good hardware support, they don't have to resort to shifting bitmaps around, there are much easier ways of doing things.

On a ZX Spectrum however, we have a single bitmap screen, with no hardware support at all. This means as soon as a game scrolls, the whole screen changes, and you'd have to refresh the entire cache. Sure, there would be lots of games that worked just great - Manic Miner, Monty on the Run - single screen platformers for the most part, but nothing that scrolled.

Because of this it means you have to find a way of drawing the spectrum screen from scratch, every frame. A tall order. The spectrum has a resolution of 256x192, or 49,152 "dots". While it's a fair bet that there's more off than on, you would still have to check every pixel to see if you needed to plot anything. Another way of doing this would be to have 256 sprites, of 1x8 pixels in size, with the bits set correctly, then draw pixels 8 at a time. This means you'd be drawing 6,144 sprites - certainly doable, if it wasn't for the attribute map of course. For each 8x8 cell, the Spectrum can change the paper and ink colours (foreground and background), and that complicates things. While we could no doubt draw that number of sprites, it's an open question as to whether we could run the render loop fast enough - while emulating the machine at the same time.

So that as they say, was that. A bitmap screen means we can't do it the way I have been, so there was no real point in even trying.... Then I had a brainwave.... and it's one that I'm still considering the implications of on other emulators.

Sure, we can't render the screen pixel by pixel, but..............and how about this for radical.... let's not try and cache a screen that changes all the time, but let's put the WHOLE of Spectrum RAM onto a texture, and give the GPU access to everything - and it can convert the actual, raw screen memory on the fly!

I'll let this just sink in a little.......... While you're thinking about that though, here's what a snapshot of ManicMiner looks like as a 256x256 texture (the 48K spectrum having a 64K address space, and where 256x256 = 65536). A spectrum screen is easy to get hold of from an .SNA file, as they are just a pure memory dump and the current register values. So if we take one of these snapshots, and put it onto a texture, this is what it looks like:


As tiny as this is, it really is the WHOLE of the ZX Spectrum memory. You can see there are several bands to it: the top section with a thick white line under it is the ROM (which isn't part of the SNA file, but I've added it), after that the area with spaces and rectangles is the actual screen, and the rest is the game. The Spectrum's screen starts at 16384 ($4000 in hex) and is 6144 ($1800) bytes long - it cannot be moved. The attribute screen follows it.

So... now that we have a snapshot loaded into a texture (or rather a surface), all we need to do is keep it up to date. To do this, whenever the spectrum emulation does a POKE() into memory, we also execute a draw_pixel_colour(...) onto the surface image. We plot the value as a grayscale so it's easy to visualise, but we certainly don't have to, we only need a single value. A surface texture like this is actually 4 times the memory we need (ARGB channels, each holding 64K of data). So... if we're going to do this, just how many times a frame will a spectrum need to update the surface? Can we even handle that?

Well, turns out not much - only a few thousand times a frame - probably even less than the cache regeneration on a bitmap game on the C64! And actually, we can refine this even more. Because the screen is in a fixed location and a fixed size, we don't need to plot any point outside of the screen address range, and this cuts down the pixel requirements even more. First, let's look at how we get a snapshot onto the surface...

/// LoadSNA(filename)
var SNA = buffer_load(argument0);
var add=16384;
var count=0;

// RAM image starts at 27 bytes in....
for(var i=27;i<(49152+27);i++){
    var b = buffer_peek(SNA,i,buffer_u8);
    Poke( add++, b );
    count++;
}
buffer_delete(SNA);
This loads a Spectrum .SNA file, and then copies it into memory using our Poke() function, where Poke() is this....

You can see the address is broken up into an X,Y by using the lower 8 bits as X and the upper 8 bits as Y, and makes it very simple to access this "grid" of data. This now means as a game runs, the GPU "memory" will also be updated. Now comes the really fun part - how can the GPU use this data?
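
The address split described above boils down to the following. This is a minimal Python sketch rather than the GML itself: `address_to_xy` and `poke` are made-up names, and a plain 256x256 list of lists stands in for the surface.

```python
def address_to_xy(address):
    """Split a 16-bit address into texture coordinates: low byte = X, high byte = Y."""
    return address & 0xFF, (address >> 8) & 0xFF


def poke(surface, address, value):
    """Mirror one byte written to Spectrum memory onto the 256x256 'surface'."""
    x, y = address_to_xy(address)
    surface[y][x] = value & 0xFF


surface = [[0] * 256 for _ in range(256)]   # stand-in for the GPU surface
poke(surface, 16384, 0xFF)                  # first byte of the screen
print(address_to_xy(16384))                 # row 64 is where the screen band starts
```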

Before getting into decoding a spectrum screen, let's consider what the GPU has to work with. First, it'll get the two triangles we're drawing, and as part of this, the texture coordinates. These 0.0 to 1.0 UV coordinates tell us exactly where in the screen we are on U and V (or X and Y if you like). We then need to convert these 0.0 to 1.0 values into something we can use to access the screen memory. We know the screen is 256x192 pixels, so we take the 0.0 to 1.0 value on U and multiply it by 256.0, giving us the X coordinate, and then take the 0.0 to 1.0 value on V and multiply it by 192.0, giving us the Y coordinate. We'll then need to floor these, as they will have fractions and we want whole values so we can get the actual pixel coordinate. This might sound pretty complicated, but it's pretty simple....

const vec2 Size = vec2(256.0,192.0);

void main()
{
   vec2 pos = floor( v_vTexcoord * Size );
}
This has now converted our UV's out of 0.0 to 1.0 texture space, and into 0 to 255, and 0 to 191 coordinate space giving us proper X and Y coordinates - much better. Now we need to work out the spectrum memory address, that is the address on the screen the UV's are pointing to. This now gets much more complicated... The spectrum screen address requires us to shuffle bits around, and that's very tricky in floating point. To do so, you have to use floor(), mod() and subtraction to isolate the parts you want, and then extract them.


The diagram above shows how to work out a byte address on the spectrum screen, and you can see from this that while the X coordinate is simply the lower 5 bits (0 to 31), the Y coordinate is split up all over the place. The 1 at the top is the base address 16384 = %0100000000000000 in binary being added on.

So first, how do we extract the bits? Well to get the top two bits of Y, we simply shift them down by 6 bits, or rather since this is floating point maths, we divide by 64.0, then floor() the result. This moves Y7_Y6 down into the first two bits and the floor() removes the lower bits (which have now become fractions), where we can then scale them up to the correct location later. To get Y2_Y1_Y0, we use mod(yy,8.0), as this gives us the remainder of a divide by 8 (or an AND with 7 if it were integer). Lastly, to get Y5_Y4_Y3, we subtract off the bits we extracted for Y7_Y6, divide by 8.0 and then floor() to remove the lower Y2_Y1_Y0. Once this is all done, we have the bits in a state where we can now reorder them. All this complicated explanation looks like this in code....

float yy = pos.y;                               // pixel row, 0 to 191 (assumed to come from pos above)
float y7_y6 = floor(yy/64.0);                   // upper 2 bits
float y2_y0 = mod(yy,8.0);                      // keep lowest 3 bits
float y5_y3 = floor((yy-(y7_y6*64.0))/8.0);     // middle 3 bits
Which obviously looks much easier. Now we just have to use these to work out the index into the spectrum screen RAM, and then add on the base address, which we do like this....

float xx_byte = floor(pos.x/8.0); 
float address = 16384.0 + (xx_byte + (y7_y6*2048.0) + (y2_y0*256.0) + (y5_y3*32.0));    
The xx_byte gives us the byte index, rather than the pixel index, and we then simply add that on. But now, we have a value "address", which is the current address in the spectrum RAM we're interested in processing. Pretty sweet!
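A quick way to convince yourself the floating point version is right is to compare it against the usual integer bit-shuffle on the CPU. The following is only a Python sketch of the same arithmetic, not shader code, and the function names `screen_address_float` and `screen_address_bits` are made up for this example.

```python
import math

def screen_address_float(x, y):
    """Mirror the shader's floor()/mod() arithmetic for pixel (x, y)."""
    y7_y6 = math.floor(y / 64.0)                   # upper 2 bits of Y
    y2_y0 = math.fmod(y, 8.0)                      # lowest 3 bits of Y
    y5_y3 = math.floor((y - y7_y6 * 64.0) / 8.0)   # middle 3 bits of Y
    xx_byte = math.floor(x / 8.0)                  # byte column, 0 to 31
    return int(16384.0 + xx_byte + y7_y6 * 2048.0 + y2_y0 * 256.0 + y5_y3 * 32.0)

def screen_address_bits(x, y):
    """The same address built with integer masks and shifts."""
    return (0x4000
            | ((y & 0xC0) << 5)    # Y7_Y6 -> address bits 12..11
            | ((y & 0x07) << 8)    # Y2_Y0 -> address bits 10..8
            | ((y & 0x38) << 2)    # Y5_Y3 -> address bits 7..5
            | (x >> 3))            # byte column -> address bits 4..0
```

Both functions agree for every pixel on the 256x192 screen, which is a cheap check to run before debugging anything on the GPU itself.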

All we need to do now is write a Peek(address) function for the GPU to get the byte, and we do that by again splitting the X and Y values (as we did on the POKE() in GML), and re-scaling it all back into 0.0 to 1.0 space for a texture lookup.

const vec2 TextureSize = vec2(1.0/256.0,1.0/256.0);
float Peek(float _address)
{
    vec2 index = vec2( mod(_address,256.0), floor(_address/256.0 ) ) * TextureSize;
    return (texture2D( gm_BaseTexture, index )*255.0).r;
}
This will return us the byte of spectrum memory from the screen. If we just used this, we'd get a very blocky version of the screen - like this..


The reason it comes out blocky white rather than greyscale (as you'd expect) is that our Peek() routine returns a 0.0 to 255.0 number, and gl_FragColor expects 0.0 to 1.0 values, so anything above 1.0 is clamped to 1.0. If we divided the value by 255.0, then we'd get an odd greyscale version of this screen. However, this isn't what we're after so we'll move on....

Of course, once we have this, the next part is to extract the bit we require (since a single byte of RAM holds 8 pixels). If you remember, we removed the pixel index in favour of the byte index to calculate the address, but this time we want only the bit value (0 to 7), which is the X coordinate mod 8. Once we have this, we can extract the correct 0 or 1 from the byte of Spectrum RAM - exciting stuff!

//
//  given a byte and a bit number (0 = leftmost pixel, the top bit of the byte),
//  return 1.0 if the bit is set, or 0.0 if it's not
//
float GetBit( float _value, float _bit)
{
    float scaler = pow(2.0, 7.0-_bit);
    return mod(floor(_value/scaler), 2.0);
}
This will extract the bit for us, and now we just need to call it....

float mem = GetBit( Peek(address) ,bit);
gl_FragColor = vec4(mem,mem,mem,1.0);
And this will now give us a fully black and white version of the Spectrum screen - direct from its RAM.
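The bit extraction can be checked on the CPU too. This is a small Python sketch of the same pow()/floor()/mod() trick; `pixel_bit` is an assumed definition of the `bit` value (the pixel's position within its byte), since the shader snippets don't show where it is set.

```python
import math

def get_bit(value, bit):
    """Return 1.0 if pixel `bit` (0 = leftmost) of `value` is set, else 0.0."""
    scaler = math.pow(2.0, 7.0 - bit)             # 128 for bit 0, 1 for bit 7
    return math.fmod(math.floor(value / scaler), 2.0)

def pixel_bit(x):
    """Assumed source of `bit`: the X pixel coordinate mod 8."""
    return math.fmod(x, 8.0)
```

Dividing by the scaler and flooring acts as a shift right, and the mod by 2 then acts as an AND with 1, so the leftmost pixel of each byte comes from its top bit.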


How cool is that!! Now that we have this, it's a small step to get the proper colours - the hard part, as they say....is done. The attribute screen is much simpler, as it's just an X by Y grid of values - and no funny interleave. So this time, you just take the Y pixel position, divide it by 8, floor() it, then multiply it by 32 and add on the X byte position, and you have an index into the attribute screen. Add on the base address, and you've got another value to PEEK() with.
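As a rough sketch of that attribute lookup, here is the same arithmetic in Python. The base address of 22528 ($5800) is the standard location of the Spectrum attribute area, directly after the 6144 bytes of bitmap; the function name `attribute_address` is just for illustration.

```python
import math

ATTR_BASE = 22528.0   # $5800, the start of the 32x24 attribute grid

def attribute_address(x, y):
    """Address of the attribute byte covering pixel (x, y)."""
    row = math.floor(y / 8.0)    # character row, 0 to 23
    col = math.floor(x / 8.0)    # character column, 0 to 31
    return ATTR_BASE + row * 32.0 + col
```

Every pixel in the same 8x8 cell maps to the same attribute byte, which is exactly where the Spectrum's colour clash comes from.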

You'll then have to split this value into two - ink and paper (which are 0 to 7 values), and while you're at it - extract the flash (bit 7) and bright (bit 6) bits.
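Splitting the attribute byte uses the same divide, floor and mod tricks as before. Here is a minimal Python sketch, assuming the standard attribute layout of flash in bit 7, bright in bit 6, paper in bits 5 to 3 and ink in bits 2 to 0; `split_attribute` is a hypothetical helper name.

```python
import math

def split_attribute(attr):
    """Split an attribute byte into (ink, paper, bright, flash) as floats."""
    flash  = math.floor(attr / 128.0)                     # bit 7
    bright = math.fmod(math.floor(attr / 64.0), 2.0)      # bit 6
    paper  = math.fmod(math.floor(attr / 8.0), 8.0)       # bits 5..3
    ink    = math.fmod(attr, 8.0)                         # bits 2..0
    return ink, paper, bright, flash
```

For example, the power-on attribute of 56 (white paper, black ink) splits into ink 0, paper 7, with bright and flash both off.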

With this done, you can now look up the colours - just like we did in the C64 emulator to get real ARGB values, and then depending on whether we had a 0 or a 1 pixel, use the paper or ink colour.

if( mem!=0.0){
   mem = ink_col;
}else{
   mem = paper_col;
}
gl_FragColor=GetColour(mem);
With this done.... we finally have a real looking ZX Spectrum screen!


Now, there are a couple of extra bits to deal with, bright, flash and the border. We've already extracted the bright bit, so you can handle that easily enough, but flash needs an external input. The GPU has no way of doing "time", so the CPU will have to handle that, and pass in a 0 or 1 depending on the current flash state. You can either do this through constants, or you can pass in a value via a channel in the vertex colours. I opted to use the vertex colour because I also pass in the current border colour in this manner as well, so it works out pretty well.
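To make the flash handling concrete, here is a hedged Python sketch of both halves. On real hardware the flash state toggles every 16 frames, and a flashing cell simply swaps its ink and paper; `flash_state` and `apply_flash` are illustrative names, and how the state actually reaches the shader is whatever channel you chose above.

```python
def flash_state(frame_count):
    """CPU side: 0 or 1, toggling every 16 frames."""
    return (frame_count // 16) & 1

def apply_flash(ink, paper, flash_bit, state):
    """Shader side: swap ink and paper when the cell flashes and state is 1."""
    if flash_bit and state:
        return paper, ink
    return ink, paper
```

Cells without the flash bit set are unaffected, so the swap costs nothing for most of the screen.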

Speaking of the border.... Because we deal with the Spectrum screen in terms of 0 to 255 and 0 to 191, we can simply increase these values and do a screen size of 320x256. This gives us 32 pixels around the whole screen. We can easily detect this inside the shader once we've worked out the X and Y coordinate, and display the border colour when we're in that zone - like so...

// Top and bottom border?
if( yy<32.0 || yy>=224.0 )
{
    gl_FragColor = GetColour( v_vColour.r*255.0 );
}
else
{
    // Side borders?
    if( xx<32.0 || xx>=288.0 )
    {
        gl_FragColor = GetColour( v_vColour.r*255.0 );
    }
    else
    {
        // process screen...
    }
}
So unlike the C64 where I simply couldn't afford to draw the border, here the shader does everything, and it barely registers as a blip in the FPS. With the border added, we now have a fully functional ZX Spectrum screen, and after a frame of emulation, we can just draw it using a simple draw_surface(), surrounded by a shader.
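One detail the border test implies is that the "process screen" branch has to work in screen coordinates again. A minimal Python sketch of that classification follows, assuming the 32 pixel border is simply subtracted before the address maths; the name `classify` and the use of None for border pixels are inventions for this example.

```python
BORDER = 32.0

def classify(xx, yy):
    """Return None for a border pixel, else (x, y) in 0..255 / 0..191."""
    if yy < BORDER or yy >= 224.0:     # top and bottom border
        return None
    if xx < BORDER or xx >= 288.0:     # side borders
        return None
    return (xx - BORDER, yy - BORDER)  # assumed offset back to screen space
```

With this, a pixel at (32, 32) in the 320x256 output maps to (0, 0) on the Spectrum screen, and (287, 223) maps to (255, 191).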

Although... it doesn't quite end there..... Just as on the C64, Spectrum programmers were sneaky: they would update the screen while the raster was drawing it. This means that by the time the frame has finished, screen RAM probably doesn't look the same as it would have if we'd drawn it as we went. The game Cobra shows this pretty well...



The reason for the flicker is that the programmer drew things in an order that avoided flicker on real hardware without having to double buffer the screen, but in doing so, screen RAM at the end of the frame wasn't the final image displayed to the user. To get around this, I yet again draw the screen in chunks - 16 pixel high strips this time. I could do single line strips, but unless I'm doing Hires colour simulation (where the attributes are changed every line), I just don't need that, and games never did much of this because it consumed too much time. It should be noted that I could detect when the game has modified the attributes and flush out a line at a time from that point, which would allow me to "auto-swap" for Hires colour, but I'm not that fussed here.

So after every 16 scanlines emulated, I draw the next 16 lines on the screen. This works perfectly for my purposes, and makes Cobra look rock solid again.
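The strip-based loop is easy to sketch. This is only an outline in Python, not the emulator's real code: `emulate_scanlines` and `draw_strip` are hypothetical callbacks standing in for the CPU emulation and the shader draw, and the 256 visible lines assume the bordered 320x256 output.

```python
def run_frame(emulate_scanlines, draw_strip, total_lines=256, strip_height=16):
    """Interleave emulation and drawing so each strip shows RAM as it was
    when the raster reached it, rather than as it is at the end of the frame."""
    for top in range(0, total_lines, strip_height):
        emulate_scanlines(strip_height)   # run the CPU for this many scanlines
        draw_strip(top, strip_height)     # then draw just those lines
```

Shrinking `strip_height` towards 1 trades draw calls for accuracy, which is the cost that Hires colour tricks would bring.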



So, there you go.... a somewhat different approach to displaying an emulated screen, but one that works incredibly well, especially for dynamically changing bitmap screens. I suspect if you did an Atari ST emulator, you could render the screen in much the same way. Anything with hardware assistance is more complicated, but as long as the GPU has access to the hardware registers, this would still work - but the shader might get incredibly large. A C64 shader dealing with sprites, characters and bitmaps - and all the funny modes they can do, would be very cool, but incredibly complex.


Friday, February 23, 2018

CSpect 1.10

New version of CSpect, complete with a new demo!!

CSpect changes
  • -cur to map cursor keys to 6789 (l/r/d/u)
  • Fixed copper instruction order
  • Copper fixed - and confirmed working. (changes are only visible on next line)
  • Copper increased to 2K - 1024 instructions
  • Fixed a bug in the AY emulation (updated source included)
  • Fixed Lowres colour palette selection
  • Added new "Beast" demo+source to the package to show off the copper