Tuesday, February 11, 2020

RayCasting engine on the ZXSpectrumNext - Part2

Now we had the prototype, the first thing to do was a direct port to Z80. To do this, I went through each line of the C# like this...

// calculate ray position and direction
    // Int16 cameraX = (Int16)(2 * ((xx << 8) / screen_width) - 0X100); //x-coordinate in camera space
    short cameraX = (short)(xx << 2);
    cameraX = (short)(cameraX - 0x100);
and wrote an untested Z80 version  - like this
ld a,0
     ld  (xx),a

     ; Int16 cameraX = (Int16)(2 * ((xx << 8) / screen_width) - 0X100);
     ; cameraX = (short)(xx << 2);
     ; cameraX = (short)(cameraX - 0x100);
     ld   l,a  
     ld   h,0
     add  hl,hl  ; x/128
     add  hl,hl  ; *2
     ld   de,$100
     xor  a
     sbc  hl,de
     ld   (cameraX),hl

I added the C# code in as comments in order to keep track - as there's a LOT of code to put in, it's also a great reference when doing the actual port and looking for bugs.
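The shortcut in the commented-out line works because the screen is 128 pixels wide: (xx << 8) / 128 is xx * 2, and doubling that gives xx << 2. Here is a minimal sketch of the two forms side by side, written in C rather than the prototype's C#. The function names are made up for illustration, and the width of 128 is taken from the "x/128" comment in the Z80 above.

```c
#include <stdint.h>

/* General form from the original line:
   2 * ((xx << 8) / screen_width) - 0x100
   The result is 8.8 fixed point, starting at -1.0 ($FF00) on the left edge. */
int16_t camera_x_general(int xx, int screen_width)
{
    return (int16_t)(2 * ((xx << 8) / screen_width) - 0x100);
}

/* Shortcut used in the port. Only valid when screen_width is 128. */
int16_t camera_x_fast(int xx)
{
    return (int16_t)((xx << 2) - 0x100);
}
```

For any column from 0 to 127 both functions should return the same value, which is why the Z80 version only needs two add hl,hl instructions and a subtract.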

The next thing I needed was a fast, signed 16bit x 16bit multiply. I got an unsigned one from the Z80 C library, and I then needed to make it a signed version. Signed multiplies are easy enough: you simply XOR the top (sign) bits of the 2 values, and remember if the result is a 1 or not. You then take the ABS() of these values and multiply them using the "unsigned" 16x16 multiply... then on exit, if the XOR answer from the start was 1, you negate the answer. Job done.
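As a rough illustration of that recipe, here is a sketch in C. It is not the Z80 routine itself: umul_16x16 is only a stand-in for the unsigned library multiply, and both function names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the unsigned 16x16 -> 32 multiply (assumption: the real
   one is the Z80 library routine, this just uses the C operator). */
uint32_t umul_16x16(uint16_t x, uint16_t y)
{
    return (uint32_t)x * (uint32_t)y;
}

/* Signed 16x16 -> 32 multiply built on top of the unsigned one. */
int32_t smul_16x16(int16_t x, int16_t y)
{
    /* XOR the two sign bits: 1 means the final answer is negative. */
    unsigned negate = (((uint16_t)x ^ (uint16_t)y) >> 15) & 1u;

    /* ABS() of each value, done in unsigned maths so -32768 is safe. */
    uint16_t ax = (x < 0) ? (uint16_t)(0u - (uint16_t)x) : (uint16_t)x;
    uint16_t ay = (y < 0) ? (uint16_t)(0u - (uint16_t)y) : (uint16_t)y;

    /* The magnitude is at most 32768 * 32768, so it fits in an int32_t. */
    int32_t product = (int32_t)umul_16x16(ax, ay);

    /* Negate on exit if the signs differed. */
    return negate ? -product : product;
}
```

Keeping the sign handling outside the multiply means the fast unsigned routine can be used unchanged.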

; ****************************************************************************************
; multiplication of two 16-bit numbers into a 32-bit product
; enter : de = 16-bit multiplicand = y
;         hl = 16-bit multiplicand = x
; exit  : hlde = 32-bit product
;         carry reset
; uses  : af, bc, de, hl
; ****************************************************************************************
     ld   b,l                  ; x0
     ld   c,e                  ; y0
     ld   e,l                  ; x0
     ld   l,d
     push hl                   ; x1 y1
     ld   l,c                  ; y0

     ; bc = x0 y0
     ; de = y1 x0
     ; hl = x1 y0
     ; stack = x1 y1

     mul                       ; y1*x0
     ex   de,hl
     mul                       ; x1*y0

     xor  a                    ; zero A
     add  hl,de                ; sum cross products p2 p1
     adc  a,a                  ; capture carry p3

     ld   e,c                  ; y0
     ld   d,b                  ; x0
     mul                       ; y0*x0

     ld   b,a                  ; carry from cross products
     ld   c,h                  ; LSB of MSW from cross products

     ld   a,d
     add  a,l
     ld   h,a
     ld   l,e                  ; LSW in HL p1 p0

     pop  de
     mul                       ; x1*y1

     ex   de,hl
     adc  hl,bc

With this done, I could now do the basic 16 bit maths I needed like this...

; var rayDirX = dirX + ((planeX * cameraX)>>8);
     ld   hl,(cameraX)
     ld   de,(planeX)
     call SMul_16x16           ; exit  : hlde = 32-bit product
     ld   h,l
     ld   l,d                  ;>>8
     ld   de,(dirX)
     add  hl,de
     ld   (rayDirX),hl

You can see that once it's fitted into 8.8 maths, a lot of the complexity falls away. Aside from the 16x16 multiply, the shift by 8 is actually just moving whole bytes from one register to another. This basic process is relatively quick, however you have to do hundreds of them - which is the real speed issue we'll need to tackle later.
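To make the "shift is just a byte move" point concrete, here is a small C sketch of the same calculation. The names mul_8_8 and ray_dir_x are hypothetical. The 32-bit product of two 8.8 values is 16.16, so the 8.8 result is simply its middle two bytes (L and D of the HLDE result in the Z80 version).

```c
#include <stdint.h>

/* Multiply two signed 8.8 values, returning an 8.8 result. */
int16_t mul_8_8(int16_t a, int16_t b)
{
    uint32_t product = (uint32_t)((int32_t)a * (int32_t)b);

    uint8_t low  = (uint8_t)(product >> 8);    /* D in the HLDE result */
    uint8_t high = (uint8_t)(product >> 16);   /* L in the HLDE result */

    /* Reassemble the two middle bytes; the cast back to signed assumes
       the usual two's complement wrap. */
    return (int16_t)(uint16_t)((high << 8) | low);
}

/* var rayDirX = dirX + ((planeX * cameraX) >> 8); */
int16_t ray_dir_x(int16_t dirX, int16_t planeX, int16_t cameraX)
{
    return (int16_t)(dirX + mul_8_8(planeX, cameraX));
}
```

For example, 1.5 * 2.0 is mul_8_8(0x0180, 0x0200), which gives 0x0300, or 3.0 in 8.8.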

There are a few of these "blocks" to convert, but the biggest target was the delta stepping. There are 3 different stepping functions: X axis, Y axis, and a general one that moves across both axes at once - this is the one that'll be hardest to optimise and keep the speed up with. It's important to get this one as fast as possible, because stepping across the map until you hit a block will be executed hundreds if not thousands of times, especially in large open rooms.

So here's the C# code I need to port.....
while (true)
{
     //jump to next map square, OR in x-direction, OR in y-direction
     if (sideDistX < sideDistY)
     {
          sideDistX += deltaDistX;
          mapX += stepX;
          side = 0;
     }
     else
     {
          sideDistY += deltaDistY;
          mapY += stepY;
          side = 1;
     }

     //Check if ray has hit a wall
     int map_index = (mapY * MAP_WIDTH) + mapX;
     last_tile = map.worldMap[map_index];
     if (last_tile != 0) break;
}
I spent some time fiddling with register layouts and the rest, trying to keep it all in registers as memory access is painful.
; --------------------------------- General ---------------------------
       ; while (true)
       ; jump to next map square, OR in x-direction, OR in y-direction
       ld   a,(mapX)
       ld   c,a
       ld   a,(mapY)
       ld   b,a
       ld   a,(stepX)
       ld   d,a
       ld   a,(stepY)
       ld   e,a
       ld   hl,(sideDistX)   ; 16
       ld   iy,(sideDistY)   ; 20
       ld   de,(deltaDistX)  ; 20
       ld   bc,(deltaDistY)  ; 20
       ld   ixl,$30          ; side
       xor  a                ; and at the end of the loop clears carry
       ; if (sideDistX < sideDistY)
       ld   a,l              ; 4
       sbc  a,iyl            ; 8
       ld   a,h              ; 4
       sbc  a,iyh            ; 8

       jr   nc,@ix_greaterthan 
       ; sideDistX += deltaDistX;
       add  hl,de            ; 11
       ;mapX += stepX;
       exx                   ; 4
       ld   a,c              ; get mapX
       add  a,d              ; add stepX
       ld   c,a

       ; side = 0;
       ld   ixl,$30          ; 9Ts  ($30 for $3000 base address)
       jp   @skip_branch

       ;sideDistY += deltaDistY;
       add  iy,bc            ; 15Ts
       ;mapY += stepY;
       ld   a,b              ; get mapY
       add  a,e              ; add stepY
       ld   b,a

       ; side = 1;
       ld   ixl,$20          ; 9Ts  ($20 for $2000 base address)
       ld   a,c              ; mapX

       ld   h,b              ; mapY
       ld   l,0
       srl  h                ; *64
       rr   l
       srl  h
       rr   l
       add  hl,Map           ; 16
       add  hl,a             ; A already mapX
       ld   a,(hl)           ; get map entry
       and  a
       jp   z,@KeepLooping

       ld   (lastblock),a  
       ld   a,ixl
       ld   (side),a
       ld   a,b
       ld   (mapY),a
       ld   a,c
       ld   (mapX),a
So you can see I've managed to keep it all in registers, even though I had to use the alt set, and ix and iy. But that's still much faster than saving values and reloading others from memory. The X and Y axis versions are similar, but have no branches and don't need as many registers.

The last part is working out how to draw the column and drawing it. This is simply a case of working out the screen address and plotting a vertical line, clipping to the top and bottom of the screen - simple compared to the rest of the stuff we've just done! Once that was done - and once I'd spent a day or so debugging it and getting it all working - I was left with this....

This was the first publicly shown version, and took about a month and a half of my spare time to get running (more or less).

Monday, February 03, 2020

RayCasting engine on the ZXSpectrumNext - Part1

So I've been making some good progress on my Ray Casting Engine - which is technically what a Wolfenstein engine is - and I thought it'd be fun to write up how I did it. It's a large, complex engine and didn't come about overnight. In fact, as I'd never written one of these before, I spent about a month (in the evenings) working out how to do it in the first place, before even starting on any Z80.
Believe it or not, I actually wrote 3 different versions before touching Z80, and then a further 2 to help debug it. But more on that later....

I did buy the Wolfenstein engine book, but actually, didn't really need it. What I ended up using was this website.

This was a great place to start, as it gives a workable demo, but I needed a framework to put it into. I went with GameMaker: Studio 2, as I'm intimately familiar with it, and it meant I could jump right in. This gave me a screen size of 640x480 (same as the demo).

Once I'd cut and pasted the example (pretty much) into GameMaker, I could start to try and figure out how it was working. The goal was to get the maths down to 8.8 fixed point, so that it would fit in the Z80's 16bit registers and I could handle the maths quickly. But before doing that, I needed to do a 16.16 fixed point version. Doing this meant I could verify that it worked, without getting too close to the maths limits, as 8.8 would be cutting pretty close. In fact these were both "signed", so really 15.16 and 7.8, as vectors etc would be in all directions, and so could be negative.

Converting to fixed point is pretty straightforward: for 16.16 you multiply every number by 65536, or use a <<16. So 15.453 would be 15.453<<16, which is 1,012,727 or $F73F7 in hex. I personally think of all numbers in hex, so $F73F7 is $000F_73F7, where $000F is the whole number (which is 15) and $73F7 is the fraction. This is perfect, because it means to get the whole number, you just have to take the upper 16 bits, usually by doing a >>16.

There's a heap of pages on fixed point maths, so I'll just say here that the basics are when you multiply a 16.16 number by another 16.16 number, you then >>16 to get the final answer. So...

var a = $F73F7;            // 15.453
var b = $83687;            // 8.213
var ans = (a*b)>>16;    // == $7EEA54  (126.9153)
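The conversion and multiply above can be sketched in C like this. It is only an illustration: the fix16 type and the function names are hypothetical, and it uses a 64-bit intermediate so that the full product survives before the >>16.

```c
#include <stdint.h>

typedef int32_t fix16;   /* 16.16 fixed point (hypothetical type name) */

/* Multiply by 65536, the same as a "<<16"; truncates towards zero. */
fix16 fix16_from_double(double v)
{
    return (fix16)(v * 65536.0);
}

/* Whole part is the upper 16 bits. Only meant for non-negative values
   here, as >> on a negative int is implementation-defined in C. */
int fix16_whole(fix16 v)
{
    return (int)(v >> 16);
}

/* 16.16 * 16.16 gives a 32.32 product, so shift back down by 16. */
fix16 fix16_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * (int64_t)b) >> 16);
}
```

With the numbers from the example, fix16_mul(0xF73F7, 0x83687) returns 0x7EEA54, matching the 126.9153 above.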

And that's basically it. So after converting to 16.16 (as shown below), I was happy to try and get it into 8.8

This simply meant copying the above code and doing shifts of 8 rather than 16. I also had to reduce the screen size to 128x76, down from the 640x480 that the demo used. The original Wolfie used 304x152, so we're a little under a quarter of the size. But since I can only hold a number from 0 to 127 in 7.8, and as I'm targeting the Spectrum Next's "lores" mode (128x96), this all fits pretty snugly.

Once I swapped everything over to use >>8 and <<8 type maths, I started noticing the odd missing line on the screen. This turned out to be when DX or DY was exactly 0 - stepping exactly along an axis. This is due to the maths no longer being accurate enough. There are times when you do a 1/X so that you can avoid lots of divisions (i.e. 10*0.5 is faster than 10/2, as 1/2 = 0.5). In 7.8 fixed point you really need to do $10000/X, but that's out of range, so I'm stuck with doing $7fff/X. This has knock-on effects, but you can cope with them later. Note: technically it's $100, but as you need to shift up by 8 before doing an actual divide in fixed point, that makes it $10000, which is what is out of range, so you're stuck with $7FFF.
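To see the size of that knock-on effect, here is a small C sketch comparing the two numerators. The function names are hypothetical and x must be non-zero; what the engine does when DX or DY is exactly 0 isn't covered here.

```c
#include <stdint.h>

/* What the maths really wants: 1.0 / x in 8.8 is ($100 << 8) / x.
   The $10000 numerator needs more than 16 bits, so this is only for
   comparison. x must be non-zero. */
int32_t recip_exact(int16_t x)
{
    return 0x10000 / x;
}

/* The 16-bit friendly version: $7FFF is the largest positive value a
   signed 16-bit register can hold. x must be non-zero. */
int16_t recip_7_8(int16_t x)
{
    return (int16_t)(0x7FFF / x);
}
```

For x = $0200 (2.0), recip_exact gives $80 (0.5) while recip_7_8 gives $3F (about 0.246), so the 16-bit version comes out at roughly half the true reciprocal and the rest of the maths has to allow for that.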

After porting to 7.8 I did hit another issue: rotating the player's view was going "nuts". This was because the original demo rotated a vector constantly, and while floating point could handle the accuracy, 7.8 just "drifted" and these vectors stretched and went bizarre. To combat this, I created a table of 256 angles using floating point, and then took it down into 7.8 fixed point for storing. This means the player now has an "angle" he's facing, and I simply look up a perfect vector for that angle.
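A table like that could be built along these lines. This is only a sketch under assumptions: the table and function names are made up, and it stores just a direction vector per angle. To keep it free of library dependencies it steps round the circle with a double precision rotation of 2*pi/256 per entry (doubles are accurate enough that this doesn't drift); calling sin() and cos() for each entry would do the same job.

```c
#include <stdint.h>

#define ANGLE_COUNT 256

int16_t dir_x_table[ANGLE_COUNT];   /* cos(angle) in 8.8 fixed point */
int16_t dir_y_table[ANGLE_COUNT];   /* sin(angle) in 8.8 fixed point */

/* Round a value in the range -1..1 to the nearest 8.8 number. */
int16_t to_fix_8_8(double v)
{
    double scaled = v * 256.0;
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

void build_direction_tables(void)
{
    /* cos and sin of one step, 2*pi/256 radians */
    const double step_cos = 0.99969881869620422;
    const double step_sin = 0.024541228522912288;
    double x = 1.0;   /* angle 0 faces along +X */
    double y = 0.0;

    for (int i = 0; i < ANGLE_COUNT; i++) {
        dir_x_table[i] = to_fix_8_8(x);
        dir_y_table[i] = to_fix_8_8(y);

        /* rotate (x, y) by one step, in floating point */
        double next_x = x * step_cos - y * step_sin;
        y = x * step_sin + y * step_cos;
        x = next_x;
    }
}
```

Turning the player then just means adding to or subtracting from an 8-bit angle, which wraps round for free, and reading a fresh vector from the table.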

Once I had these issues fixed up, I started to do a Z80 port, only to discover GameMaker doesn't really do the job of fixed point-ing properly. This is due to the typeless nature of GameMaker, and that many calculations are either done in doubles, or 64bit. This isn't useful for my needs, I need something very strongly typed, so that I can make sure it all "fits", before taking the leap into Z80.

So..... I needed a C/C++ or C# framework, where I could use Int16's directly. I decided to rip the guts out of my #CSpect framework, and use that to give me a basic bitmap and keyboard input. I then ported ALL the GameMaker code (both 8.8 and 16.16 versions!) over to C# so I had a good debugging framework.

One extra bit I needed to do in C# was to deal with signed >>8. C/C++ and C# don't do signed shifts the way Z80 does, so I wrote a small signed shift right function to use instead of a simple >>8. This will be replaced in Z80 with actual shifts if needed, although usually you just take the top 16 bits directly and need no shifts at all.

The signed shift function isn't fancy, it's just designed to work as I'd expect....
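The function itself isn't reproduced in this text, so here is a guess at the general shape, written in C. It assumes the behaviour wanted is an arithmetic shift that keeps the sign, like the Z80's SRA, and the name signed_shift_right is hypothetical.

```c
#include <stdint.h>

/* Sign-preserving shift right that doesn't rely on what the compiler
   does with >> on negative numbers. count must be less than 32. */
int32_t signed_shift_right(int32_t value, unsigned count)
{
    if (value >= 0)
        return value >> count;

    /* ~value is non-negative, so shifting it is well defined; inverting
       the result again fills the top bits with 1s. */
    return ~((~value) >> count);
}
```

With this, signed_shift_right(-256, 8) is -1 and signed_shift_right(-257, 8) is -2, so negative values round down in the same way a hardware arithmetic shift would.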

Once all THAT was done.... Actually, I want to just pause here. It may seem like I'm just throwing stuff together at a great rate of knots, saying "I'll just do this.... there, done that", but actually this all took some time. In all, I spent about a month reading about the engine and getting these prototypes up and running. It's important to know there is effort involved in things like this; no matter how experienced you are, you still need to do the grunt work. And this is all BEFORE doing any Z80 at all really!!

Now that all this "prep" work was done..... I'd figured out how the engine worked - mostly - and I had a prototype that I could step through alongside the Z80 one, so that I could see what answers the Z80 should be giving. This is invaluable on any complex bit of code (and if you don't fully understand the engine). My goal is an almost line for line port of the C#, which means the variables should hold exactly the same values in both. So stepping the Z80 and C# should give exactly the same answers, and if they don't, then the Z80 is wrong and I'll be able to figure out why.

Now comes the hard bit.... writing the Z80 port!

Monday, January 06, 2020

#CSpect V2.12.5

Fixed plugin loading. This was due to the new loading/reset system that was nuking loaded plugin mappings.
I've also added a "Tick" to the plugin interface which gets called once per emulated frame, along with a debugger call which allows you to tell CSpect to enter the debugger. This is handy if you need to debug the operation that "just" happened.
Lastly... I've included a very simple plugin example, along with the interface .CS files to look at.

#CSpect V2.12.5 changes
  • Border now comes from fallback colour if paper/ink mode is 255 (as per hardware)
  • Fixed Plugin loading. Was broken due to new system loading+resetting.
  • Added new "Tick" to iPlugin interface, called once per emulated frame
  • Added Debugger() call to CSpect interface, allowing you to enter the debugger from the emulator
  • Added Plugin example (and current interface)

Friday, January 03, 2020

#CSpect V2.12.4

Another minor update to fix a memory access bug, and a new MMC folder issue that appeared in the last couple of versions.

SNasm 2.0.21
  • Added “SLL” instruction

#CSpect V2.12.4 changes
  • Fixed memory reading of Layer2 mapping in $2000-$3fff
  • Fixed MMC path, it was being reset when loading SNA/NEX from the command line.

Wednesday, December 25, 2019

#CSpect V2.12.3

This is a minor update to fix Lowres and ULA scrolling. ULA X scrolling now uses the 2 new NextRegs - $26+$27.
It also fixes an annoying startup issue where if you didn't start in the EXE path, it just wouldn't start properly. This was due to the new plugin system, but should now be fixed (fingers crossed). I've also added the missing LDWS instruction to SNasm.

SNasm 2.0.20
  • Added “LDWS” instruction

#CSpect V2.12.3 changes
  • Lowres scrolling fixed.
  • Added ULA Scrolling registers $26 and $27.
  • Fixed HL being set to $10000 after reading a file at the end of memory...
  • Fixed a command line startup issue when not started in the EXE path

Sunday, December 22, 2019

#CSpect V2.12.2

This update is a little different, as it adds the ability for users to write their own "plugins". These plugins can take over memory read/write actions, port ins/outs and Next Register access - or all of the above!
They can also query the Next's memory, ports, Next registers and Z80 registers, making it pretty useful for folk to add new toys, or add things like logging or profiling etc.

#CSpect V2.12.2 changes
  • Changed 320x256 mode to use Y/X orientation (0,0)=$0000, (1,0)=$0100, (2,0)=$0200. (0,1)=$0001,(0,2)=$0002 etc...
  • Plugin system can now get/set Z80 registers
  • A better reset on load of an SNA/Nex file. Should be a complete system reset now...
  • 320x256 Layer 2 screen mode added
  • NextReg $70 added ($10=320x256 mode)
  • NextReg $71 added. MSB of Layer 2 XScroll added
  • Port 0x123B extended memory mapping mode added - using bit 4 to select (untested)
  • A new plugin system added, so that folk can add their own toys or support custom hardware. See CSpectReadme.txt

Friday, December 06, 2019

#CSpect V2.11.9

EDIT: Updated to V2.11.10 - Couple of new command line options for the next dev team

This update is really just to release fixes the Next team have had for a while. With all the changes to NextOS, and the hardware changes, I figured I should release this even though 60Hz mode isn't working right. So most things should work fine, but 60hz mode certainly has issues....

SNasm V2.0.19 changes
  • You can now reference a local label. i.e. LD HL,LABEL@LOCAL

#CSpect V2.11.10 changes
  • added -major and -minor to let you set the CORE version number
  • added -emu to force the setting of the emulation bit in nextreg 0
  • added debugger command "nextreg ," so you can set a next register from the debugger

#CSpect V2.11.9 changes
  • Updated audio to be 16bit so DAC isn't squished to oblivion. (should just sound better in general)

#CSpect V2.11.8 changes
  • Fixed "all RAM mode". 2 configurations weren't working right.
  • Core version number updated to 3.0
  • -60 now does proper 60hz, including reduced lines and proper 60hz audio *not working fully yet*
  • -com2=??? added. You can now have a second UART that can be used to send to a real Pi
  • port 0x153b (bit 6) added to switch UARTs from Wifi to Pi
  • Reg $7f now defaults to $FF on power up
  • When no UART2, now returns $FF
  • .ROM files (for OS loading) now come from the same path as the IMG file, rather than program folder
  • Mono "textmode" added
  • CP/M mode now works
  • Proper 60Hz Layer2 and Sprites now working
  • Coms with the Pi (and it not being there), no longer crashes CSpect.
  • null comport now returns 0, as per hardware.
  • Tilemap clip window right edge fix - I think.
  • 1 Bit Tilemaps should now work when tiles are pre-defined in bank 10/11

Saturday, September 14, 2019

#CSpect V2.11.1

This update is to add in (most) of the new hardware/registers that the latest Spectrum Next firmware has added - along with the new OS which uses some of it. There are some amazing advancements in this firmware. 14Mhz CPU speed all the time, 28Mhz copper and 14Mhz DMA, along with things like smooth scrolling the ULA screen. Brilliant update from the Next team.

#CSpect V2.11.1 changes
  • Fixed extra pixel in wrapping scroll of ULA
  • Fixed label on/off
  • Fixed lowres vertex clip "smear"

#CSpect V2.11.0 changes
  • Fixed -tv command line, and made it actually switch off all shader usage
  • if bit 6 in attrib 3 is not set, #CSpect now properly ignores attribute 4
  • Copper now runs at 28Mhz
  • The CPU no longer slows down over the screen
  • ULA can now scroll in 1/2 pixels
  • ULA Shadow screen now works with Layer 2
  • Reg 0x69 added (Layer 2 enable, ULA shadow mirror, and port bits 0-5 goto port 0xFF)
  • Copper Reg 0x63 added
  • DMA is now always 14Mhz...
  • 4 bit lowres mode added (reg 0x6A)
  • Added $123b Read mode
  • Added $123b 48K mapping mode
  • Reg 7 is now R/W, and bits 4/5 now set correctly
  • Window registers are now R/W

Friday, August 23, 2019

#CSpect V2.10.1

#CSpect now uses a shader to render, and I have added a TV shader which you can adjust using page up/down and home/end. This gives a nice retro look to games.

#CSpect V2.10.1 changes
  • Fixed a crash in relative sprites where the X coordinate could go negative
  • Added +3 "all ram" mode. (ZX81 games now work on the Next SD card)
  • ay8912.dll no longer obfuscated. Didn't realise this was still getting obfuscated. This means you can now compile the DLL yourself using the supplied source.
  • Fixed the fake UART and Wifi - was ignoring a basic AT command.
  • Added a couple of new AT commands to the UART. AT+CIPSTA? will now return your IP address.
  • AT+CIPDNS_CUR? will return googles DNS server - coz why not.
  • AT+GMR will return n00160901 - which is an older version of the wifi chip I believe - any other preference?
  • Fixed the default MMC path (when you don't specify one) on the Mac/Linux. Now uses .Net Path.PathSeparator...
  • Any unknown UART command is now sent to the LOG file
  • CTRL+F1 now enables/disables the TV shader (-tv on the command line to disable)
  • PageUp+PageDown now makes the TV lines less/more visible
  • Home+End now adjusts the TV shader "blur"
  • Fixed a shader crash on OSX (Probably Linux as well)

Monday, July 15, 2019

#CSpect V2.9.2

Minor update and fixes. I have also added the parallax scrolling demo I've written as a little demo...

#CSpect V2.9.2 changes
  • Added sprite clipping in over border mode (1 = enabled) (bit 5 of reg $15)
  • Composite/Unified mode visibility flag fixed
  • Parallax scrolling demo (and .BAT file to run it) added to CSpect distro.