Tuesday, October 02, 2018

Advanced programming of the ZX Spectrum Next

So while porting Super Crate Box to the NEXT I decided to break with the norm and use the hardware more fully. While doing so, I realised I'm probably the first to do it: although we all knew about this when the hardware features were being designed, I think I'm the first to take advantage of it. That being so, I thought I'd write it up so others could also take advantage.

One of the (many) things I loved about the C64 was the 64K of RAM and address space you had. You could fill that machine with your game, banking out ROMs, VIC, character sets, IO ports etc. and use every part of it! This was awesome. I used this when writing Blood Money on the C64, and it was very liberating. :P

When putting the Next together, the team extended the new 8K memory mapping to allow for the same thing - you can now have a full 64k of program space. This means you can now bank RAM into the lower 16k, and even move the screen location. This is incredible, and really helps you make full use of the machine.

So... how did I do it? Actually... it's fairly simple, but it's nice to see it all laid out. Using the new .NEX format file, I simply had to move code around a bit, and have a small "boot" loader somewhere. Here's the memory map I was aiming for...

$0000-$7FFF - Game Code
$8000-$BFFF - Game Data
$C000-$DFFF - Graphics
$E000-$FFFF - ULA/Timex screen

I will use $C000-$FFFF for other things on demand (like hardware sprite graphics etc.), but this is the main layout. As you can see, it's a much better layout, meaning you can have so much more code, and not fill the middle of memory with 16K of screens. The 8K banks also allow the paging of graphics AND the ULA/Timex screen.
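As a minimal sketch of that on-demand paging: the $C000 window is MMU slot 6 (Next register $56), so swapping it is a single NextReg each way. Sprite_Bank and Graphics_Bank are hypothetical 8K bank numbers used only for illustration, they are not part of the layout above.

```asm
                NextReg $56,Sprite_Bank     ; MMU slot 6: map sprite data at $C000-$DFFF
                ; ... read or upload the sprite graphics here ...
                NextReg $56,Graphics_Bank   ; put the usual graphics bank back
```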

Using the Timex screen means I can double buffer the ULA screen without using the ZX128 shadow screen, and it's much easier to control. It's also important to realise that the NEXT will ALWAYS use 16K bank 5 (8K banks 10 and 11) for the ULA and Timex screens, even if it's not banked in. This is very cool.

Using SNASM and the new segments I can lay this out easily, but I'll be more generic so you can use another assembler if you want.

First, you need to ORG at $8000 so we can write our boot loader. This is incredibly simple:

  NextReg $50,Code_Bank  ; $0000-$1FFF
  NextReg $51,Code_Bank+1  ; $2000-$3FFF
  NextReg $52,Code_Bank+2  ; $4000-$5FFF
  NextReg $53,Code_Bank+3  ; $6000-$7FFF
  rst $00

StackEnd:       ds      127
StackStart:     db      0

Technically, I only need to bank in the first bank, and then I could do the rest in the main code block, but for me this is fine. I've set the code to use 8K banks 12, 13, 14 and 15, which map to the ZX Spectrum 128's 16K banks 6 and 7. Bank 2 (8K banks 4 and 5) is my data, and bank 5 (8K banks 10 and 11) is the ULA/Timex screen.

I've yet to use bank 0 (8K banks 0 and 1), so it's free for something else - perhaps tables I need to bank in on demand. I can decide that later.

So next I need to select 16K bank 6 (8K bank 12) as the bank to assemble into (however your assembler does this), then set the target "ASSEMBLE TO" address to $0000. In SNASM you would do this....

                SEG  CODE_SEG,12:0,$0000    ; create segment (bank 12,offset 0. Assemble to location $0000)
                SEG  CODE_SEG               ; set segment

BootUp          di                          ; RST $00 jumps here....
                jp      StartCode

                ; RST $08

                ; RST $10

                ; RST $18

                ; RST $20

                ; RST $28

                ; RST $30

                ; RST $38
IRQ             ei

                ; Main game start up....

One interesting side effect of this is that you no longer have to use IM 2 for interrupts. With RAM at $0000 you have direct access to the hardware (IM 1) vector at $0038, which is handy.

Now when I (say) clear the screen, no matter which it is - Timex or ULA - I just bank in the correct bank (10 or 11) to $E000, and then clear, like this.

; ************************************************************************
; Clear the ULA Screen using DMA
; ************************************************************************
ClearULAScreen:
                ld      a,(ULABank)
                NextReg $57,a           ; bank screen into $E000
                ld      hl,$e000        ; start of the buffer we're drawing to
                ld      de,$e001
                ld      (hl),0          ; seed the first byte, then copy it forward
                ld      bc,6143         ; fill the rest of the 6144 byte pixel area
                jp      DMACopy

You'll notice I use DMA instead of LDIR, as it's much faster - but that's just an aside. As you can see, I bank in the current buffer, then always target $E000, it's as simple as that.

The only real gotcha is that the new opcode pixelad returns an address based on $4000, so we need to OR $A0 into the high byte (H) to get the proper $E000-based address. Fortunately, pixeldn works with any base address - which is cool.
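Here's a rough sketch of that fix-up in a pixel plot. PlotPixel is a hypothetical routine name, and I'm assuming your assembler spells the Z80N opcodes as pixelad and setae. pixelad returns H in the range $40-$57, and OR-ing in $A0 turns that into $E0-$F7.

```asm
; Plot a pixel on the buffer banked in at $E000
; In: D = Y (0-191), E = X (0-255)
PlotPixel:      pixelad                 ; HL = pixel address, $4000 based
                ld      a,h
                or      $A0             ; $40-$57 becomes $E0-$F7
                ld      h,a             ; HL now points into the $E000 buffer
                setae                   ; A = bit mask for pixel X (taken from E)
                or      (hl)
                ld      (hl),a          ; set the pixel
                ret
```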

Lastly...the main loop and double buffering is pretty simple, and looks like this...

; *****************************************************************************************************************************
; Flip Buffers
; *****************************************************************************************************************************
                ; Flip ULA/Timex screen (double buffer ULA screen)
                ld      a,(ULABank)             ; Get screen to display this frame
                cp      10
                jr      z,@DisplayTimex

                ld      b,10                    ; set target screen to ULA
                ld      a,1                     ; set CURRENT screen to TIMEX
                jp      @DisplayULA

@DisplayTimex:  ld      b,11                    ; set target screen to TIMEX
                xor     a                       ; set CURRENT screen to ULA
@DisplayULA:    out     ($ff),a                 ; Select Timex/ULA screen
                ld      a,b                     ; get bank to render to next frame
                ld      (ULABank),a             ; store...

                jp      ClearULAScreen          ; wipe ULA/Timex screen 

A little aside.... using RST $?? for common functions is very handy, especially as an RST is 1 byte where a normal CALL is 3, and if it's a tiny function, it's also faster than a CALL by 6 T-States (11 instead of 17). If interrupts are disabled, you can also use RST $38 for a larger function, giving you a much quicker call to a common routine.

So that's the basics. As you can see, it's not that complicated, but the new memory layout is awesome. Lots of code without having to bank in overlays, and being able to move the ULA screen AND have it double buffered is just brilliant.

Tuesday, September 25, 2018

Making NEXT Lemmings: Part 5

So between family holidays and work, it was a couple of months before I got time to get back to doing anything Next related, but it was for a good reason - the Next motherboard had finally arrived!

I was hyped, really...REALLY hyped!! I had the extra RAM for it already, and after doing an initial power on I installed it and booted it up.

I spent the evening playing around, trying demos and the like, and much fun was had by all - well, me.
A couple of weeks later, I figured it was time to try and get Lemmings running on it. Needless to say, it didn't just "run". I poked around for a bit, but in the end I had to cut out everything and get some very basic code to run. I then slowly started adding code back in.

If you don't know, this is by far the best debugging method on a system where you've no debugger. Comment stuff out, and if you can, draw to the visible screen or change the border/screen colour as you go, so that when it crashes, you can see how far it got and that'll give you a clue as to where it crashed. Then repeat, moving the colour changes into the new area where it crashed. This is obviously a painful process, but it's utterly reliable, and you will make (very slow) progress.

The image above is actually major progress. I had managed to go from a black screen to the game loop actually running. I'd skipped all loading/processing code, and all I was doing was copying the level bitmap to the screen, and copying the panel - or rather where the panel lives in memory.

Next was to get something loading. The panel was the simplest thing as it was just pure data/graphics content, so once that was loading I knew the loading code was good, and could start looking at other things. The next thing I wanted to do was get interrupts working again, and this is where I hit the issue. Just enabling them didn't work, so I started by creating a very simple handler that just changed the border colour, which worked fine. This was odd, as it meant setting up and running the IRQ was totally fine. So I spent a while trying to figure this out, then I twigged.... The new NEXTREG instruction wasn't working, so the replacement for it took several instructions to set a value. The IRQ could happen in the middle of that sequence, and my IRQ code was then changing the selected register.
So...I now had to remember the current Next register, and restore it later. Once I'd added this, my IRQs worked fine, and the main code carried on properly, and then I was able to slowly re-enable the rest of the code.
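A minimal sketch of that save/restore, assuming registers are written through the I/O ports (select on $243B, data on $253B) and that the select port reads back the currently selected register. IrqHandler is a hypothetical name, not the handler from the game.

```asm
IrqHandler:     push    af
                push    bc
                ld      bc,$243B        ; Next register select port
                in      a,(c)           ; A = register the main code had selected
                push    af              ; remember it

                ; ... IRQ work that selects and writes Next registers ...

                pop     af
                ld      bc,$243B
                out     (c),a           ; re-select the interrupted register
                pop     bc
                pop     af
                ei
                ret
```

If the main code was interrupted between its select and its data write, the data write now lands in the register it intended.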

All this took 2 or 3 days to get working, but now it was going, I started to think that now was the time to start actually doing proper Lemming processing.
I'd been putting this off for a bit because it's....well, boring to code. But a lot of game code is boring, especially when you've done it several times before. Still it had to be done, so I started building a simple state machine for the Lemmings, where each one would start out as a Faller, then when it hits the ground either splats, or turns into a walker.

In order to get to this point, I had to define a basic lemming structure like this

; lemmings structure
LemType             rb 1
LemX                rw 1
LemY                rb 1
LemDir              rb 1  ; left or right facing
LemFrameBase        rw 1  ; keep base,count and offset together
LemFrameCount       rb 1  ; so "SetAnim" function is quicker
LemFrameOffX        rb 1
LemFrameOffY        rb 1
LemFrame            rb 1
LemSkillMask        rb 1  ; skill mask
LemBombCounter      rb 1
LemBombCounter_Frac rb 1
LemSkillTemp        rb 9
LemStructSize       rb 0
LemDataSize         equ LemStructSize*MAX_LEM

Structure definition like this is a feature of my assembler SNasm, and means I can load the base of the lemming into IX, then offset using the values above, like this: LD A,(IX+LemY). Using IX is pretty slow, but in these cases, there's really no alternative. I'd have loved to find a faster way of doing it, but oh well. Once I had this defined, I loaded the base address into IX, and looped around 100 Lemmings - that still being my goal. I then set a basic counter and dropped a lemming out somewhere on screen in "Faller" mode, then hoped.
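The loop itself looks roughly like this. It's only a sketch: LemData (the start of the lemming array), ProcessLemming, and the idea that a LemType of 0 means "slot not in use" are all assumptions for illustration, and MAX_LEM has to fit in B.

```asm
ProcessAll:     ld      ix,LemData          ; first lemming structure
                ld      b,MAX_LEM           ; e.g. 100
@NextLem:       ld      a,(ix+LemType)
                and     a
                jr      z,@Skip             ; assumed: type 0 = unused slot
                push    bc
                call    ProcessLemming      ; state machine for this lemming
                pop     bc
@Skip:          ld      de,LemStructSize
                add     ix,de               ; step to the next structure
                djnz    @NextLem
                ret
```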

This is where it started to get complicated. I'd originally planned to store a bitfield collision mask of the whole level - a bitfield mask being 1 bit per pixel, so packing the whole mask down to around 40K (2048x160 pixels / 8), so that moving up/down was very fast. However, this would mean that not only would I need to remove data from the level bitmap, but I'd also need an old school sprite masking routine so that I could add/remove pixels from the bitmask screen as well. I toyed with several ideas about how to lay this out in memory, and what would be fastest/simplest, but in the end I decided to do what the Amiga did, and just use the background screen.

This actually works fairly well, although it does still have the bank swapping issue that the rendering has. But aside from that, it's pretty good.

This has the added benefit that I just have to add or remove things from the level bitmap - which you have to do anyway - and then your Lemmings will just automatically walk over it. This keeps things very simple, and stops anything getting out of sync.

Once I had this working, I was able to churn out the 100 Lemmings and see how fast things were going....

This all worked pretty well, and while a little sluggish, I figured I'd get a lot back once I was able to use the copper to display the panel, and didn't have to copy the whole thing each frame. During this time however, the Next team managed to get the speed back up to 14MHz again (mostly), so this was a great free boost, and the above video shows these 100 Lemmings running at 14MHz.

The next biggest CPU hog - and the next coolest thing to do, would be to explode a Lemming. This introduces several important functions; panel selection, selecting a Lemming, removing the background, the explosion particle system.

First the panel selection was reasonably straightforward, especially as I opted to use a sprite for the selection box, which simplified things a lot. While doing the basic clicking on the panel, I also decided to implement the release rate properly. I had up until now just had a simple frame counter, but actually Lemmings doesn't do it this way. In fact, the Lemmings release rate is very odd, a fudge that just gives you a nice curve. Basically, it takes the 0-99 number, inverts it (so it's a 99-0 delay), and divides by 2 - the 0 to 99 range is only there because it was nicer to show than 0 to 49. Then comes the "magic". It negates this value, adds 53, negates again and adds 57. Yeah... I don't know why either. This has however stuck in my mind for years, as it basically sums up game coding in a nutshell; fudge it till it works. No matter what the code looks like or does, as long as it works the way you want, then go with it. You can see this function in my github repository in level.asm/ConvertToDelay.
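Written out literally, the steps above come to something like the sketch below. This is not the routine from the repository: DelayFromRate is a hypothetical name, and I'm assuming "inverts" means 99 minus the rate. If you follow the arithmetic through, the negate/add/negate/add dance nets out to simply adding 4, so the delay ends up between 4 and 53.

```asm
; In:  A = release rate (0-99)
; Out: A = delay (4-53)
DelayFromRate:  ld      b,a
                ld      a,99
                sub     b               ; invert: 99-0
                srl     a               ; divide by 2: 49-0
                neg                     ; -d
                add     a,53            ; 53-d
                neg                     ; d-53
                add     a,57            ; d+4
                ret
```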

Next, I had to be able to pick a Lemming. I started out by doing this simply, that being, just looping over all the Lemmings and finding the first one under the cursor. This however doesn't work very well: you end up picking a Lemming who is mostly out from under the cursor, when in fact you really want the one that's mostly central to it. This means I needed to measure the distance to the middle of the cursor and pick the Lemming closest to that. This takes a bit more oomph, but works much better, and feels much more natural.
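As a simplified sketch of the "closest to the middle" idea: this assumes the candidates have already been reduced to a table of screen-space (x,y) byte pairs, and it uses |dx|+|dy| as the distance rather than anything fancier. FindClosest, BestDist, BestIndex and CurIndex are all hypothetical names, and the real game works from the lemming structures instead.

```asm
; In:  D = cursor centre X, E = cursor centre Y
;      HL -> table of (x,y) byte pairs, B = number of entries (1-255)
; Out: A = index of the closest entry (first one wins a tie)
FindClosest:    xor     a
                ld      (BestIndex),a
                ld      (CurIndex),a
                dec     a
                ld      (BestDist),a    ; 255 = worst possible distance
@Loop:          ld      a,(hl)          ; candidate X
                inc     hl
                sub     d
                jr      nc,@PosX
                neg                     ; make dx positive
@PosX:          ld      c,a             ; C = |dx|
                ld      a,(hl)          ; candidate Y
                inc     hl
                sub     e
                jr      nc,@PosY
                neg                     ; make dy positive
@PosY:          add     a,c             ; A = |dx|+|dy|
                jr      nc,@NoSat
                ld      a,255           ; clamp on overflow
@NoSat:         ld      c,a             ; C = this candidate's distance
                ld      a,(BestDist)
                cp      c
                jr      c,@Next         ; best is still smaller
                jr      z,@Next         ; tie: keep the earlier one
                ld      a,c
                ld      (BestDist),a
                ld      a,(CurIndex)
                ld      (BestIndex),a
@Next:          ld      a,(CurIndex)
                inc     a
                ld      (CurIndex),a
                djnz    @Loop
                ld      a,(BestIndex)
                ret

BestDist:       db      0
BestIndex:      db      0
CurIndex:       db      0
```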

The next step was a big one, removing background pixels. This is one of the key elements of lemmings, so would be good to get this into place. This is basically an inverse sprite routine, removing pixels instead of drawing. Because this goes into the level bitmap (the 2048x160 bitmap) and not the screen (256x160 screen), I copied the bob draw code from the level creation code, and then whenever I was about to draw a pixel, I'd store a 0 (black) instead.

I then defined an egg shaped blob and attached it to the mouse, and hey-presto! We got this. The code was simpler as you only need to clip top and bottom, as the level has lots of space left/right, which is nice. You'll also notice that the Lemmings are walking through the areas I've cut out, due to the fact I use the level bitmap to collide with.

Interestingly, we changed this method in Lemmings 2 so we could have pretty backgrounds that you wouldn't walk over or dig away, as simple Black backgrounds are pretty dull really....

In the next post, I'll go through the explosion code and the nuke function...

Sunday, September 16, 2018

CSpect 1.18

Changes to the debugger. This allows you to set breakpoints using the physical address, so you can set breakpoints in code not yet paged in. This allows for simpler debugging of proper overlay code.
Also, if you're using another assembler, you can use the new -16bit command line option to use logical addresses only.

CSpect changes
  • added -16bit to use only the logical address of the MAP file
  • Execution Breakpoints are now set in physical address space. A HEX number or SHIFT+F9 will set a logical address breakpoint. This means you can now set breakpoints on code that is not banked in yet.
  • Next mode enabled when loading a .NEX file

Monday, September 10, 2018

CSpect 1.17.1

Minor update to SNasm. 48K SNAs weren't saving out correctly, and labels under $8000 were whacky. This is now fixed.

SNasm changes
  • Fixed labels
  • Fixed SNA saving

Sunday, September 09, 2018

CSpect V1.17

New version of CSpect and SNasm with some big changes for those using it to dev. You can now specify segments and save out NEX format packages, allowing you to build "large" packages all from within SNasm.
CSpect will now load the new symbol format and correctly display the proper symbol when that bank is paged in, allowing overlays to work more seamlessly.

CSpect changes
  • ULA colours updated to match new core colours. Bright Magenta no longer transparent by default. Now matches with $E7 (not $E3)
  • Fixed debugger bug where "0" is just left blank
  • New MAP format allowing overlays mapped in. Labels in the debugger are now based on physical addresses depending on the MMU
  • You can now specify a bank+offset in the debugger's memory view (M $00:$0000) to display physical addresses.
  • Numeric Keypad added for debugger use.
  • EQUates are no longer displayed in the debugger, only addresses. This makes the view much cleaner

SNasm changes
  • SEG command added for Segment control on ZX Spectrum Next machines. Banks are 8K in size.
        SEG  NAME,BANK:OFFSET,TARGET_PC     ; to create
        SEG  NAME                           ; to use
  • SAVENEX "name",StartPC,[StackSP] added.
  • New MAP format exported for CSpect
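
Putting SEG and SAVENEX together, a build might be laid out something like the sketch below. The segment names, labels and file name are made up for illustration, and it assumes 8K bank 4 sits at $8000 when the .NEX starts (the usual 128K layout); the optional StackSP argument is left off.

```asm
                SEG  BOOT_SEG,4:0,$8000     ; boot loader: 8K bank 4, assembled to $8000
                SEG  CODE_SEG,12:0,$0000    ; game code: 8K bank 12, assembled to $0000

                SEG  BOOT_SEG
Boot:           ; ... page the code banks in, then rst $00 ...

                SEG  CODE_SEG
BootUp:         ; ... game code ...

                SAVENEX "game.nex",Boot     ; start execution at Boot
```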

Saturday, August 25, 2018

CSpect V1.16

Okay, so the last version was actually a debug version and so some folk had problems running it. This is a proper release build. There are a couple of additions...

CSpect changes
  • Fixed sprite transparency to use the index before the palette and colour conversion is done
  • -r on the command line will now allow CSpect to remember the window location, so if you move it somewhere, it'll start up there again next time. Just delete the cspect.dat file to reset.

Friday, August 24, 2018

CSpect V1.15

There's been a few folk requesting an updated version so even though it's not a huge change, here it is...

CSpect changes
  • Fixed sprite transparency index. Now reads transparency colour from sprite palette.
  • Timex Hi-Colour mode fixed
  • M_GETDATE added to esxDOS emulation (MSDOS format)

Date
Bits    Description
0-4     Day of the month (1–31)
5-8     Month (1 = January, 2 = February, and so on)
9-15    Year offset from 1980 (add 1980 to get actual year)

Time
Bits    Description
0-4     Second divided by 2
5-10    Minute (0–59)
11-15   Hour (0–23 on a 24-hour clock)
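
As a sketch of unpacking the date word with the bit layout above: this assumes the packed date has already been moved into HL (check the esxDOS documentation for which registers the call actually returns it in), and DecodeDate, Day, Month and YearOffset are hypothetical names. For example, $4D18 decodes to day 24, month 8, year offset 38 (2018).

```asm
; In:  HL = MS-DOS packed date
; Out: Day, Month and YearOffset variables filled in
DecodeDate:     ld      a,l
                and     $1F
                ld      (Day),a         ; bits 0-4
                ld      a,l
                rlca
                rlca
                rlca                    ; bits 5-7 rotate round into bits 0-2
                and     $07
                ld      b,a
                ld      a,h
                and     $01             ; bit 8 of the word = bit 0 of H
                rlca
                rlca
                rlca                    ; move it up to bit 3
                or      b
                ld      (Month),a       ; bits 5-8
                ld      a,h
                srl     a               ; bits 9-15 down to bits 0-6
                ld      (YearOffset),a  ; add 1980 for the real year
                ret

Day:            db      0
Month:          db      0
YearOffset:     db      0
```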

Sunday, August 19, 2018

Z80 shifts...

Being a 6502 boy, I have to continually hunt for the shift opcode that I need. So after doing this for about a year now, I thought I'd knock up a little diagram to make it easy. So on the off chance someone else is in the same situation - here it is.

Saturday, July 21, 2018

CSpect 1.14

CSpect changes
  • -joy to disable DirectInput gamepads
  • Re-added OUTINB instruction (16Ts)
  • Fixed a bug with MMU1 mapping when reading, it was reading using MMU0 instead.
  • In ZXNext mode, if you have RAM paged in MMU0, then RST8 will be called, and esxDOS not simulated
  • Palette control registers are now readable (Hires Pawn demo now works)
  • Reg 0 now returns 10 as Machine ID
  • Reg 1 now returns 0x1A as core version (1.10)
  • Layer2 transparency now checks the final colour, not the index
  • Fixed sprites in the right border area (the 320x192 picture demo)
  • Timex Hires colours now correct (Pawn demo)

Thursday, July 05, 2018

CSpect 1.13

CSpect changes
  • Kempston joystick emulation added using Direct Input. All controllers map to a single port, port $001F = 0FUDLR
  • Fixed CPU on WRITE and on READ breakpoints
  • -port1f command line removed
  • .NEX format now leaves IRQs enabled
  • Removed some of the old opcodes from the debugger
  • Removed MIRROR DE
  • Fixed memory contention disable (so now tested!)