The life of a Games Programmer

Thursday, August 10, 2017

Version 0.4 of CSpect

This is a new release of my CSpect emulator with yet more ZX Spectrum Next features.

You can now set sprite and screen order priorities, and the included Layer 2 scrolling demo+source shows how to do this.
Full memory banking has been added, so you can now use the full 1MB of RAM that will come with the next! (well....subject to change, and esxDOS pinching some). Again the Layer 2 demo shows this function in the utils.asm file. The comments also show the general layout of memory, so you should take a look there before using the paging.

Layer 2 double buffering! You can now have 2 Layer 2 buffers and page in either to $0000-$3ffff using the new bit layout in port $123b - see the included read me. for more details. To flip buffers, you just need to swap register 18 and register 19 values around.

There are also several new Z80 opcodes to play with! My included assembler will assemble these correctly for use in the emulator. There are a few more to come.....

   swapnib    ED 23    A bits 7-4 swap with A bits 3-0
   mul        ED 30    multiply HL*DE = HLDE (no flags set)
   add  hl,a  ED 31    Add A to HL (no flags set)
   add  de,a  ED 32    Add A to DE (no flags set)
   add  bc,a  ED 33    Add A to BC (no flags set)
   outinb     ED 90    out (c),(hl), hl++
   ldix       ED A4    As LDI,  but if byte==A does not copy
   ldirx      ED B4    As LDIR, but if byte==A does not copy
   lddx       ED AC    As LDD,  but if byte==A does not copy, and DE is incremented
   lddrx      ED BC    As LDDR,  but if byte==A does not copy

The file system will (should) also now accept '*' or '$' as setting the current drive. (untested)

I have also started to add some basic Audio, so the spectrum beeper now works.

Lastly.... the Sprite shape register has switched from port $55 to port $5B

Download: CSpect Emulator

Saturday, July 15, 2017

New CSpect ZX Spectrum Next emulator

UPDATE: I've now added register setting, and memory window set/moving. You can also set breakpoints from the input line, and push/pop values.

I've been working away on a new debugger for my CSpect emulator, and it's now ready to use, so here it is along with my assembler (SNasm) and a sample showing how to use it all. Please read the readme and be aware it may go pop at any time. Aside from that -have fun!!

Download: CSpect Emulator

Thursday, July 13, 2017

Z8410 DMA chip for the ZX Spectrum Next

So the ZX Spectrum Next team have added a Z80 DMA chip, the Z8410 chip to be more precise. This was popular add on in Europe ( more info here: http://velesoft.speccy.cz/data-gear.htm ), and has some very cool advantages for future games, especially because it's now a standard part. However information on it is a little thin on the ground, so after some major googling, I've found the info on them.

The DMA port can be 11 or 107, and it can transfer around 865k per second
From the web page above: Max. speed of data transfer on ZX128+ is...

17.3 kB(17727 bytes) / frame = 865.6 kB(886350 bytes) / second.
Or.... 1 byte every 4 T-States.

Now here is the really cool bit, because it's tied to the clock speed, when you speed up the NEXT (as it can go 7Mhz, 14Mhz and 28Mhz), the DMA will also speed up! This is amazing, as it means you'll be able to transfer 1.73Mb/s, 3.46Mb/s, and an astounding 6.92Mb/s! At 28Mhz you won't be using Layer 2, and that means you could copy the original Layer 1 screen in just 13.5 scanlines, which is incredible.

The registers are below, and here is the Datasheet

Saturday, July 01, 2017

esxDOS File access

So I was about to start adding some very basic "simulation" support for files into my emulator, and I'd lost track of the API details as they were listed on Facebook. Facebook isn't a great place to have technical discussions as you can't search later to find the stuff you need!

Fortunately I had it saved off, so before I (and everyone else) loses it, I thought I'd put it up here. I'll do a little ASM lib with all this later.
I'll also extend this if/when I discover more commands, as there appears to be very little info about the esxDOS API

;
; NOTE: File paths use the slash character (‘/’) as directory separator (UNIX style)
;

M_GETSETDRV  equ $89
F_OPEN       equ $9a
F_CLOSE      equ $9b
F_READ       equ $9d
F_WRITE      equ $9e
F_SEEK       equ $9f
F_GET_DIR    equ $a8
F_SET_DIR    equ $a9

FA_READ      equ $01
FA_APPEND    equ $06
FA_OVERWRITE equ $0C

; Function: Detect if unit is ready
; Out:      A = default drive (required for all file access)
;           Carry flag will be set if error.
GetSetDrive:
             xor  a               ; A=0, get the default drive
             rst  $08
             db   M_GETSETDRV             
             ld   (DefaultDrive),a
             ret
DefaultDrive db   0



; Function:  Open file
; In:        IX = filename
;            B  = open mode
;            A  = Drive
; Out:       A  = file handle
;            On error: Carry set
;                      A = 5   File not found
;                      A = 7   Name error - not 8.3?
;                      A = 11  Drive not found
;                      
fOpen:
             ld   a, (DefaultDrive)  ; get drive we're on
             ld   b, FA_READ         ; b = open mode
             ld   ix,FileName        ; ix = Pointer to file name (ASCIIZ)
             rst  $08
             db   F_OPEN             ; open read mode
             ret                     ; Returns a file handler in 'A' register.



; Function:  Read bytes from a file
; In:        A  = file handle
;            ix = address to load into
;            bc = number of bytes to read
; Out:       Carry flag is set if read fails.
fRead:
             ld   ix, 16384          ; ix = address where to store what is read
             ld   bc, 6912           ; bc = bytes to read
             ld   a, filehandle      ; a  = the file handler
             rst  $08
             db   F_READ             ; read file
             ret


; Function:  Write bytes to a file
; In:        A  = file handle
;            ix = address to save from 
;            bc = number of bytes to write
; Out:       Carry flag is set if write fails.
fWrite:
             ld   ix, 16384          ; ix = memory address to save from
             ld   bc, 6912           ; bc = bytes to write
             ld   a, handle          ; a  = file handler
             rst  $08
             db   F_WRITE            ; write file
             ret


; Function:  Write bytes to a file
; In:        A  = file handle
; Out:       Carry flag active if error when closing
fClose: 
             ld   a, handle           ; a  = file handler
             rst  $08
             db   F_CLOSE
             ret



; Function:  Seek into file
; In:        A    = file handle
;            L    = mode:  0 - from start of file
;                          1 - forward from current position
;                          2 - back from current position
;            BCDE = bytes to seek
; Out:       BCDE = Current file pointer. (*does not return this yet)
;            
fSeek:   
             ld   a,handle           ; file handle
             or   a                  ; is it zero?
             ret  z                  ; if so return
             ld   l,0
             ld   bc,0
             ld   de,0
             rst  $08           
             db   F_SEEK
             ret



; Function:  SetDirectory
; In:        A    = Drive
;            HL   = pointer to zero terminated path string ("path",0)
; Out:       carry set if error
;            
SetDir:
             ld   a,(DefaultDrive)   ; drive to change directory on
             ld   hl,Path            ; point to "path",0 to set
             rst  $08           
             db   F_SET_DIR  
             ret



; Function:  Get Directory
; In:        A    = Drive
;            HL   = pointer to where to STORE zero terminated path string
; Out:       carry set if error
;            
GetDir:     
             ld   a,(DefaultDrive)   ; drive to get current directory from
             ld   hl,Path            ; location to store path string in
             rst  $08           
             db   F_GET_DIR  
             ret

Sunday, June 04, 2017

ZX Spectrum Next - Bitmap example disassembly

I've been having a poke around in the ZX Spectrum Next Bitmap example. It's mostly clear, but one or two things are....odd.
This is what I have so far.

UPDATE: Banking registers now confirmed by Jim Bagley

; Disassembly of the file "bitmaps\BMPLOAD"
; 
; BMP Port $123b
;     bit 0    -    Write enable. (banks in 16K from $0000-$3fff)
;     bit 1    -    bitmap ON
;     bit 4    -    Layer 2 below spectrum screen
;     bit 6-7  -    Which 16K bank to page into $0000-$3ffff
;
; On entry, HL = arguments
;

;
; Since dot commands run at $2000, and the Layer 2 is paged into $0000-$3fff
; It needs to have some code above the lower 16K to actually copy into layer 2
; So copy up the "copy" routine, and it'll page in/out the layer 2 bank
;
2000 226221    ld      (HL_Save),hl     ; Store HL
2003 216421    ld      hl,UploadCode    ; Src  = 
2006 110060    ld      de,6000h         ; dest = $6000
2009 011f00    ld      bc,001fh         ; size = 31 bytes
200c edb0      ldir    

200e 2a6221    ld      hl,(HL_Save)     ; get HL back
2011 7c        ld      a,h
2012 b5        or      l                ; is hl 0?
2013 2008      jr      nz,201dh         ; if we have an argument, carry on

2015 213121    ld      hl,2131h         ; get message
2018 cd1d21    call    PrintText        ; print the filename
201b 1823      jr      Exit             ; exit

; Load file....?
CopyFilename:
201d 112421    ld      de,FileName      ; "filename.ext" text - space to store filename?
2020 060c      ld      b,0ch            ; b = $0c (max length of allowed filename - no path it seems)

; Copy, and validate characters in filename
2022 7e        ld      a,(hl)           ; first byte of filename
2023 fe3a      cp      3ah              ; is it a ":"?  End of filename
2025 2810      jr      z,EndFilename    ; if so end of filename  
2027 b7        or      a                ; 0?
2028 280d      jr      z,EndFilename    ; if so end of filename
202a fe0d      cp      0dh              ; newline?
202c 2809      jr      z,EndFilename    ; if so end of filename
202e cb7f      bit     7,a              ; over 127?
2030 2005      jr      nz,EndFilename   ; if so end of filename
2032 12        ld      (de),a           ; copy over to filename cache
2033 23        inc     hl               ; next src letter
2034 13        inc     de               ; next dest letter
2035 10eb      djnz    2022h            ; copy all
EndFilename
2037 af        xor     a                ; Mark end of filename
2038 12        ld      (de),a           ; store filename
2039 dd212421  ld      ix,FileName      ; get filename base address
203d cd6220    call    LoadFile:

Exit:
2040 af        xor     a                ; no error
2041 c9        ret                      ; exit


; esxDOS detect if unit is ready
DetectUnit:
2042 af        xor     a                ; Detect if unit is ready
2043 cf        rst     08h              ; call esxDOS
2044 89        db      $89              ; M_GETSETDRV

; Open
; IX= Filename (ASCIIZ)
; B = FA_READ       ($01)
;     FA_APPEND     ($06)
;     FA_OVERWRITE  ($0C)
;
OpenFile:
2045 dde5      push    ix               ; ix = filename
2047 e1        pop     hl               ; If a DOT command (rather than normal memory), uses HL not IX
2048 0601      ld      b,01h            ; b = FA_READ
204a 3e2a      ld      a,2ah            ; a = unknown  (a=0 for open)
204c cf        rst     08h              ; call esxDOS
204d 9a        db      $9a              ; F_Open
204e 325620    ld      (2056h),a        ; Store file handle  (self modify read from file code)
2051 c9        ret                      ;


; esxDOS command - Read from file - command $9d
; IX = address to load into
; BC = number of bytes to load
; A  = file handle
ReadBytes:
2052 dde5      push    ix               ; IX = where to to store data
2054 e1        pop     hl               ; get address to load into
2055 3e00      ld      a,00h            ; $00 is self modified
2057 cf        rst     08h              ; call esxDOS
2058 9d        db      $9d              ; F_Read
2059 c9        ret     

; esxDOS command - Close File - command $9b
; A = file handle
CloseFile:
205a 3a5620    ld      a,(2056h)         ; Get open file handle 
205d b7        or      a                 ; is it 0? (did it open)
205e c8        ret     z                 ; if file handle is 0, return
205f cf        rst     08h               ; Call esxDOS
2060 9b        db      $9b               ; F_Close
2061 c9        ret     

LoadFile:
2062 dde5      push    ix                ; remember filename
2064 cd4220    call    DetectUnit        ; detect unit and open...?!?!?
2067 dde1      pop     ix                ; get filename back
2069 cd4520    call    OpenFile          ; OpenFile - again??
206c dd21e720  ld      ix,BMPHeader      ; Read the BMP file header
2070 013600    ld      bc,0036h          ; read header ($36 bytes)
2073 cd5220    call    ReadBytes
2076 dd218321  ld      ix,BitMap         ; read block into $2183
207a 010004    ld      bc,0400h          ; read palette - 1024 bytes
207d cd5220    call    ReadBytes

;
; Convert the 24bit palette into a simple RRRGGGBB format
;
ConvertBMP:
2080 218321    ld      hl,BitMap         ; Get buffer address ($2183)
2083 11003f    ld      de,3f00h          ; Dest address of converted palette
2086 0600      ld      b,00h

ConvertionLoop:
2088 7e        ld      a,(hl)            ; get BLUE byte
2089 23        inc     hl                ; move on to green
208a c620      add     a,20h             ; brighten up a bit? (blue is always pretty dark??)
208c 3002      jr      nc,SkipSatB       ; overflow? 
208e 3eff      ld      a,0ffh            ; if overflow, then saturate to $FF

SkipSatB:
2090 1f        rra                       ; get top 2 bits only  RRRGGGBB
2091 1f        rra     
2092 1f        rra     
2093 1f        rra     
2094 1f        rra     
2095 1f        rra     
2096 e603      and     03h               ; and store them at the bottom
2098 4f        ld      c,a               ; c holds current byte

2099 7e        ld      a,(hl)            ; get GREEN
209a 23        inc     hl                ; move onto red
209b c610      add     a,10h             ; brighten up a bit as well
209d 3002      jr      nc,SkipSatG       ; if no overflow, skip saturate
209f 3eff      ld      a,0ffh

SkipSatG:
20a1 1f        rra                       ; get 3 bits of green into right place
20a2 1f        rra     
20a3 1f        rra     
20a4 e61c      and     1ch               ; mask off remaining lower bits
20a6 b1        or      c                 ; merge with output byte
20a7 4f        ld      c,a               ; store into output

20a8 7e        ld      a,(hl)            ; get RED
20a9 23        inc     hl                ; move to next byte of colour (assuming alpha)
20aa c610      add     a,10h             ; brighten up a bit
20ac 3002      jr      nc,SkipSatR       ; no overflow?
20ae 3eff      ld      a,0ffh            ; Saturate to $FF

SkipSatR
20b0 e6e0      and     0e0h              ; keep top 3 bits
20b2 b1        or      c                 ; merge with output pixel
20b3 12        ld      (de),a            ; store converted pixel
20b4 13        inc     de                ; move to next ouput pixel
20b5 23        inc     hl                ; move to next BGRA pixel
20b6 10d0      djnz    ConvertionLoop    ; c = pixel....?!?!?!

20b8 06c0      ld      b,0c0h            ; bc=$c000

ConvertUploadLoop:
20ba c5        push    bc
20bb dd21003e  ld      ix,3e00h          ; $3e00 = Destination address
20bf 010001    ld      bc,0100h          ; read 256 bytes of data
20c2 cd5220    call    ReadBytes         ; Read from file

;
; convert 256 value palette index into actual RGB byte pixel using palette lookup
;
20c5 2e00      ld      l,00h             ; l = xx (loop counter)

CopyLoop:
20c7 263e      ld      h,3eh             ; hl = $3exx
20c9 5e        ld      e,(hl)            ; e = Get palette index
20ca 163f      ld      d,3fh             ; d = $3f palette base address - 256 byte aligned
20cc 1a        ld      a,(de)            ; a = palette value (24bit converted downto 8bit)
20cd 265b      ld      h,5bh             ; hl = $5bxx converted 256 byte buffer
20cf 77        ld      (hl),a            ; ($5bxx) = converted colour pixel
20d0 263e      ld      h,3eh             ; hl = $3e00
20d2 2c        inc     l                 ; do 256 bytes of this....
20d3 20f2      jr      nz,CopyLoop

20d5 c1        pop     bc                ; bc = $c000
20d6 c5        push    bc
20d7 cd0060    call    6000h             ; block transfer 256 bytes of the bitmap
20da c1        pop     bc
20db 10dd      djnz    ConvertUploadLoop
20dd 013b12    ld      bc,123bh          ; Bitmap port
20e0 3e02      ld      a,02h             ; 2 = enable and visible
20e2 ed79      out     (c),a             ; switch bitmap layer 2 on
20e4 c35a20    jp      205ah

BMPHeader:     ds   54                   ; $20e7 to $211c

PrintText:
211d 7e        ld      a,(hl)            ; get character
211e 23        inc     hl                ; move to next one
211f b7        or      a                 ; is this 0? 
2120 c8        ret     z                 ; if so... end of string
2121 d7        rst     10h               ; outchr()
2122 18f9      jr      211dh             ; print all characters

;
; Looks like data of some kinds
;
FileName: db  "filename.ext",0
Message:  db  ".picload  to load image to background",$d,$00
HL_Save   dw  0

;
; Copied up to $6000
;
UploadCode:
2164 05        dec     b                ; b = upper byte of memory address
2165 78        ld      a,b              ; get 256 byte page into a
2166 e63f      and     3fh              ; ignore top bits
2168 57        ld      d,a              ; de=$0000-$3ffff 
2169 1e00      ld      e,00h            ; Page bitmap block into lower 16K
216b 78        ld      a,b
216c e6c0      and     0c0h             ; get the 16K bank to page
216e f601      or      01h              ; OR in bank active bit
2170 013b12    ld      bc,123bh         ; bitmap register
2173 ed79      out     (c),a            ; map bank into memory perhaps?
2175 21005b    ld      hl,5b00h         ; $5b00 src image chunk
2178 010001    ld      bc,0100h         ; 256 byte copy (one line)
217b edb0      ldir                     ; copy up  (de=dest)
217d 013b12    ld      bc,123bh         ; bitmap register
2180 ed69      out     (c),l            ; l=0, disable current bank and screen
2182 c9        ret     

BitMap:
2183 00        nop                      ; file is loaded into here....

Friday, January 01, 2016

Why create a GameMaker export to the Raspberry Pi?

Okay, these are very much personal opinions...

So I've been asked a couple of times why bother? It's a tiny machine with (probably) zero chance in making money from it - so why?

First, the Raspberry Pi is an amazing thing, Eben Upton and his various partners have done a phenomenal job getting it out in the first place, never mind the subsequent work of getting the Pi Zero - which is beyond amazing!

What I love about the Pi more than anything else, is that you no longer have to decide to take out a second mortgage on your home to give your kid a computer in their room! What's more, you no longer have to worry about them breaking it. At $25 you can just get another one. With the Pi Zero, this is and even simpler decision. Having a computer in my bedroom when I was growing up was life changing. Without that, I'd never be where I am now.

Of course.... there were other factors that helped. First - content. Without games to play, I'd have grown bored very quickly, and my ZX81 would have ended up in a drawer. What you need to inspire development is inspirational content. You want to play games, use programs and hardware and think - "That is SO cool! I want to do that!" This is what drove me. We used to play the simple games available, type in programs from magazines, and spend hours trying to do stuff ourselves.

Put simply, this is what I want to help achieve. With GameMaker, we have access to users who can provide these inspirational games, games that you want to play over and over, games that make you want to try and create stuff yourself. Sure, there are limits. This isn't a Quad core i7 with 16 Gigs of ram and enough HD space to store most of the internet on. But....games don't have to be full 3D ultra-realistic to be good. Vlambeer have some amazing games, Locomalito has made some astounding games - ALL FREE! All they need (in theory) is a port of the GameMaker runner to help make it happen.

So... that's my reason number one. It's how I started, and if I can help others do the same, I'm all for that.

Second - Education. Not so much for game creation itself, but for Electronics. Currently most folk seem to use Python to do this, and to be frank - python sucks. Even with simple GPIO commands I've been having great fun with it. Remote development is ideal for this kind of thing, because when things go wrong and the Pi crashes (as it inevitably will), you can simply reset while your entire dev environment is safe and sound on a separate machine. At some point I'll get the remote debugger working as well (It almost works now), and that means you'll be able to step through your code and development will be even easier.

On top of this there is an emerging home brew arcade scene. Folk like Locomalito take their games and stick them in PCs inside arcade machines - like a mame cabinet does, and then run their games like in the old days, complete with arcade buttons and joysticks. These days, they appear to be moving more towards using the Raspberry Pi - it's cheap, has lots of easy to use pins for interfacing and if it blows up - meh, they can buy another one. The runner port will allow games to be run on the machine, while the GPIO commands I've added will help them interface with buttons and joysticks easily.

Lastly... a word about the Pi Zero. I can't begin to say how much I'm impressed they made this happen. It's such an incredible break through, They even gave one away on the cover of a magazine! I'll let that sink in...

They gave away a whole computer on the cover of a magazine.

Wow.....It has the same GPIO pins as the PI, and while only a single core, it is a 1Ghz core. This means you could make a case for it with a 3D printer, get some very cheap SD cards, and then sell your game as a plugin console - probably for $20 or so. That astounds me. Even without the case etc. you could sell you game - complete with computer, for less than $10. You can buy 50 micro 4Gb micro SD cards for £1.36 each, add that to the price of the Pi ($5) and you're still going to make a profit at $10. Wow.

Before the Pi came out, we tried to get GameMaker 7 actually running ON the machine. We almost got there, but ran out of time. We were doing it for free because we believed in what they were doing, but commercial pressures reasserted and we had to move on. Long term, I'd love to see a GameMaker IDE of some kind running ON the Raspberry Pi. It would be an amazing thing to behold, and would make learning so much more fun.

..........one day perhaps.....one day.

Friday, October 17, 2014

Why aiming for 60fps or full 1080p is a pointless goal.

So yet again on the web I've seen posts by journalists that 1080p is worthwhile, and that developers should be spending all their time trying to make full 1080p/60fps games, and yet again, it gets my back up...

A few years back I did a post about the mass market, which was an article I did way back in 2004 ( A new direction in gaming), and it seems to me this is the same thing, coming up again. I mean, seriously, who do you want to buy your games, a handful of journalists, or millions of gamers?

Making a full 1080p/60fps game can be hard - depending on the game of course, but ask professional developers just how much time and effort they spend making a game run at 60fps, while keeping the detail level up, and they'll tell you it's incredibly tricky. You end up leaving a lot of performance on the table because you can't let busy sections - explosions, multiplayer, or just a load of baddies appearing at the wrong time, slow down the game even a fraction, because everyone - no matter who they are, will notice a stutter in movement and gameplay. And that's the crux of the problem. Even a simple driving game, where it's all pretty much on rails, so you know poly counts and the like, has to leave a lot on the table, otherwise if there is a pileup, and there are more cars, particles and debris spread around that you thought there would be, the it'll all slow down, and will look utterly horrible.

Take a simple example. You’re in a race with 5 other cars, cars are in single file, spread out over the track - as the developer expected it to be. You’re in the lead with a clear track. So let’s say the developer had accounted for this, and are using 90% of the machines power for most of the track, and on certain areas, they reduce this to account for more cars and effects - like the start line for example. But suddenly, you lose control, and spin, the car crashes and bits are spread everywhere along with lots of particles. Now the other cars catch up, hit you, and it gets messy. Suddenly, that 10% spare just wasn't enough. it needs a good 30-40% to account for particle mayhem, and damage, and the game slows down. As a gamer, it’s not dropped from 60 to 30 - or perhaps even lower depending on how many cars are in the race (like an F1 race for example). Now, 30fps isn't terrible, and even 20fps would be fine - probably, but the thing is.... the player has experienced the 60fps responsiveness and now its suddenly not handling the same way, and they notice.

The problem is people notice change, even if they don't understand the technical effects or reasons behind it. So going from 60 to 30 will be noticed by everyone, even when you compensate for it. It is much harder to notice going from 30 to 20 when there is frame rate correction going on, but many can still "feel it".

So, if I'm saying you shouldn't do 60fps or full 1080p (if the game struggles to handle it), what should you do? Well, what people really notice, isn't smooth graphics, but smooth input. Games have to be responsive, and it's the lag on controls that everyone really notices. If you move left, and it takes a few frames for that to register - everyone cares, not just hard core gamers. But if a game responds smoothly, then if you’re at 60 or 30 - or even 20! then gamers are much more forgiving. This is mainly because once you're in the thick of the action, they are concentrating on the game, not the pretty graphics or effects. I can prove this too, even to hard core developers/gamers. Years ago when projectors first came out they were made with reasonably low res LCD panels, and you got what was called the "Screen Door" effect. Pixels - when projected, didn't sit right next to each other, they were spaces between them. Now when you started a film, you'd see this and you would end up seeing nothing BUT the spaces. However as soon as you got into watching and more importantly, enjoying the film, that went away, you never saw the flaws in the display, because you were concentrating on the content.

The same is true of games, sure you power up a game and ooo...look, it’s super high res, and ever so smooth! But 10 minutes later, your deep in the game and couldn't give a crap about the extra pixels, or slightly smoother game, all you care about is the responsiveness of it all. If it moves when you tell it to, your happy.

So what about the 1080p aspect? Well years ago when 1080p started to become the norm, and shops had both 720p and 1080p large screen TVs in store, I happened to be in one, where they had the same model, but one 1080p, and on 720p hanging right next to each other, playing the same demo. I went right up to them, and could still hardly tell the difference. These were 2 50" displays, yet with my noise almost touching them, it was hard to tell where the extra pixels were. Now, if you have a 1080p projector, and are viewing on a 2 to 3-meter-wide screen, I'm pretty sure you'll notice, but on a TV? when your running/driving at high speed - no chance. Every article about resolution in games shows stills, and this is the worst thing to use, as that's not how you play games. It's also worth remembering, that at that moment in time your noting playing, so you’re not concentrating on the game, you’re doing nothing but search for extra pixels, so it's yet another con really.

So what’s the benefit to running slower, at a slightly reduced resolution? Well, 1080p at 30fps means you can draw at least twice as much. That's a LOT of extra graphics you can suddenly draw, and even if you don't need/want to draw that much more, if makes your coding life MUCH simpler. You no longer have to struggle to maintain the frame rate, or worry about when there is a sudden increase in particles or damage - or just more characters on screen.

What about resolution? Well 720p is still really high-res. There are still folk getting HD TV's and watching standard definition TV thinking it's so much sharper than it used to be! The mass market usually doesn't "get" what higher res is until they see it side by side, and once things are moving and they are concentrating on other things, they will neither know, or care.

At 720p, 30fps you can draw over 4 times as much 1080p, 60fps. That's a LOT of extra time and drawing. Imagine how much more detailed you could make a scene with that amount of extra geometry or pixel throughput. Even at 3 times, you could leave yourself a massive amount of spare bandwidth to handle effects and keep gameplay smooth.

2D games "probably" won't suffer too much from this problem, although on mobile web or consoles like the OUYA they will as the chips/tech themselves just isn't very quick.

So rather than spend all your time with tech and struggle to maintain a frame rate most gamers won't notice, shouldn't you spend all your time making a great game instead? If you're depending on pretty pixels to make your game enjoyable, it probably isn't a great game to start with, and you should really fix that.

Games with pretty pixels sell a few copies, truly great games sell far more, and while it's true that games with both will sell even more, the first rule of game development is that only games you release will sell anything at all, and while playing with tech is great fun, constant playing/tuning doesn't ship the game.

Wednesday, September 17, 2014

When Scotland changes forever.....

In a world of history, a truly historic event is rare, but that's what Scotland gets tomorrow. On the 18th of September 2014, we the people of Scotland, get to vote on Independence. We get to decide whether we break away from the United Kingdom, or whether we stay. But either way, things look like they'll change, for the worse or better is anyone's guess.

The Stay together campaign is promising loads of new powers from Westminster, and that the UK will stay strong, in both military and monetary might. They say everything a Yes vote promises is dangerous, and too big a risk to gamble on. Perhaps their right.... perhaps.

Or perhaps not. The Yes campaign obviously states the exact opposite, that staying will continue the downward spiral of Scotland, and more and more power and money going to Westminster. They will continue to take Scotland's Oil and squander it, while giving little or nothing back. They also say the so-called powers that are on offer, will never appear and not to trust them.

A couple of years ago, I started out firmly in the No camp, but over time came over to be a resounding Yes. So what changed my mind?

Well, at the start I was concerned with things like defence, and thought that being an island nation, we can defend our borders better together, and that leaving would make travel ultra complicated - at least for a few years.

But over time I realised that just like with other nations, partnerships could be done for defence - and probably will be, while travel disruption will actually be fairly minimal as I'm still on a UK passport - that isn't going to change.

The other thing I came to realise was that Scotland is just as capable of running her affairs as Westminster, after all there have been both Scottish Prime Ministers, and Scottish Chancellors. We're a smaller country with amazing natural resources, something Westminster really want to keep a hold of, and I do wonder if it wasn't for Oil and renewable energies (not to mention some place to keep their nuclear arsenal), would they care so much?

I also don't trust the current promise of new powers, because while the "current" leaders may indeed be sincere, there is no way they can promise. Any set of powers MUST go through current parliamentary procedures and be approved by both the House of Parliament, and probably the house of Lords. Back bencher's of all parties - not to mention some grass roots members, have expressed distress - and no small disgust, at additional powers being promised without proper discussion and review. So chances are, whatever comes - IF they come at all, won't be what they are promising, nor what No voters are expecting.

The ironic thing is, that the SNP wanted "Devo-Max" (which is what the offer of new powers is) was on the ballot paper as an offer, but Westminster refused, saying if MUST be a yes/no only. If it was on, I suspect it would have been an easy victory for No/DevoMax folk.

There's also the argument of Money, and what currency to use. I'm indifferent to this at best. We can of course use the English pound, or we could use the Euro or even just make our own. I don't really care either way. I suspect it'll be a few years uncertainty, and then stabilise. Scotland is, or would be, a wealthy nation. Oil rich, great renewable energy, and we export more than we import.

Why does the Better Together campaign think Scotland would struggle? What other Oil rich country struggles? And this isn't even considering the renewables. Scotland is currently on track to have 100% of it's power generated by renewable energy - we're already at 40%.

Not that I actually think we'll win to be honest. While there have been huge - HUGE Yes rally's, I also think those voting No are more the kind to just watch, stay at home... then vote No. I know this because many of my older relatives are no's, and they are exactly like this. On top of this there is also the undecideds, and they may well swing it either way.

But the thing I'm truly excited about, is the level of engagement on this vote. Unlike a general election where the public believes their vote counts for nothing - especially in Scotland, everyone here feels their single vote could swing things, so they HAVE to take part. It's amazing. There has been a 97% registration rate - unheard of in any election, and they are expecting 80-90% of these to actually vote. That's astounding.

So, yes.... I'm looking forward to tomorrow, to take the leap and cast my vote, and while I don't "think" we'll win - I'm certainly not sure, and I'm excited to hear the voice of Scotland - virtually ALL of Scotland.

So I'll vote Yes, and I'll hope everyone else wants to take that leap of faith with me, believe that Scotland is best suited to govern Scotland, because while it might be a tricky road ahead, nothing truly worthwhile is ever easy, and this is something worth taking that leap of faith for, not just for me, but my kids and my kids kids.

Wish us luck, no matter how the vote goes!!

Sunday, August 24, 2014

The beginners guide to Ray Tracing

So, I was asked recently on Twitter how Ray Tracing works, but it's not something that you can easily explain in 140 characters, so I thought I'd do a small post to explain it better.

So... the basic idea in rendering a scene in graphics, is to have a near and far place, and the corners of these two planes defines a "box" in 3D space, and that's what you're going to render - everything in the box". Orthographic projections use a straight "box", where the far plane is the same size as the near plane, whereas perspective uses a far plane that is bigger than the near one - meaning more appears in the distance than closer to you (as you would expect).

Doing Ray Tracing, is no different. First create a box aligned with the Z axis, so the planes run along the "X" axis. something like.... (values not real/worked out!)

Near plane -100,-100,0 (top left) 100,100,0 (bottom right)

Far plane -500,-500,200 (top left) 500,500,200 (bottom right)

This defines the perspective view we're going to render.

Now, how you "step" across the near plane, determines your resolution. So, let's say we're going to render a 320x200 view, then we'd step 320 times across the X axis, while we step 200 times down the Y.
This means we have 320 steps from -100,-100,0 to 100,-100,0 (a step of 0.625 on X). And you then do the same on Y (-100 to 100 giving a DeltaY of 1.0). With me so far?

So, this image shows the basics of what I'm talking about. The small rectangle at the front is the near plane, and the large one at the back; the far plane. Everything inside the white box in 3D is what we consider as part of our view. If we wanted to make a small 10x10 ray traced image, then we'd step along the top 10 times - as shown by the red lines. So we have the 8 steps and the edges, giving us 10 rays to cast.

So lets go back to our 320x200 image, and consider some actual code for this. (this code isn't tested, it's just pseudo code)

float fx,fy,fdx,fdy;
float nz = 0.0f;
float fz = 200.0f;
fy = -500.0f;
fdy = 2.0f;
for(float ny= -100.0; ny<100;nydata-blogger-escaped-a="" data-blogger-escaped-colour="CastRay(nx,ny,nz," data-blogger-escaped-f="" data-blogger-escaped-fdx="" data-blogger-escaped-fdy="" data-blogger-escaped-float="" data-blogger-escaped-for="" data-blogger-escaped-fx="" data-blogger-escaped-fy="" data-blogger-escaped-fz="" data-blogger-escaped-nbsp="" data-blogger-escaped-nx="-100.0;nx<100;nx" data-blogger-escaped-ny="" data-blogger-escaped-pixel.="" data-blogger-escaped-pre="" data-blogger-escaped-raytrace="" data-blogger-escaped-vec4="">

NOTE: Sorry - Blogspot's editor has utterly nuked the code. I'll fix it later....

As you can see, it's a pretty simple loop really. This gets a little more complicated once the view is rotated, as each X,Y,Z has it's own delta, but it's still much the same, with an outer loop of X and Y dictating the final resolution. In fact, at the point the outer X and Y loops are usually just from 0 to 320, and 0 to 200 (or whatever the resolution is).

Assuming we have a background colour, then the CastRay() would currently return only the background colour, as we have nothing in the scene! And when you get right down to it, this is the core of the what ray tracing is. Its not that complicated to define a view, and start to render it. The tricky stuff comes when you start trying to "hit" things with your rays.

Lets assume we have a single sphere in this scene, then the cast ray function will check a collision from the line nx,ny,nz to fx,fy,fz with our sphere (which is a position and radius). This is basic maths stuff now - Ray to Sphere collision. There's a reasonable example of ray to sphere HERE.

So once you have that code in place, and you decide you've hit your sphere, you'll return the colour of your sphere. Pretty easy so far. So what if we add another sphere into the scene? Well, then the ray cast will now simply loop through everything in the scene no matter what we add; spheres, planes, triangles, whatever.

Now we have a scene with lots of things being checked and hit, and we return a single colour for the CLOSEST collision point (giving is the nearest thing we've hit). This is basically our Z-Buffer! I means things can intersect without us having to do anything else. The closest distance down the ray to the intersection point, is the nearest thing to us, so that's the one we see first.

The next simplest thing we might try is reflections. In realtime graphics, this is pretty hard, but in Ray Tracing, it's really not. Lets say we have 2 spheres, and the ray we cast hits the 1st one, well all we now need to do is calculate the "bounce", that is the new vector that we'd get from calculating the incoming ray, and how it hits and bounces off the sphere. Once we have this, then all we do is call cast ray again, and it'll fly off an either hit nothing - or something. If it hits something, we return the colour and this is then merged into the initial collision and returned. Now, how you merge the colour, is basically how shiny the thing is! So if its VERY shiny, then a simple LERP with a 1.0 on the reflected colour would give a perfect reflection, where as if it's 0.5, then you would get a mix of the first and second spheres, and if its 1.0 on the original sphere, then you would only get the original sphere colour (and would probably never do the bounce in the 1st place!)

So the number of "bounces" you allow, is basically the level of recursion you do for reflections. Do you only want to "see" one thing in the reflection, then one bounce it is. DO you want to see a shiny object, in another shiny object? Then up the recursion count! This is one of the places Ray Tracing starts to hurt, as you can increase this number massively and doing so will also increase the number of rays being case, and the amount of CPU time required.

We should now have a shiny scene with a few spheres all reflecting off each other. What about Texture Mapping? Well, this is also pretty simple. When a collision occurs, it'll also give you a "point" of collision, and now all you need to do is project that into the texture space and then return the pixel colour at that point. Hay presto - texturing. This is very simple on say a plane, but gets harder as you start to do triangles, or spheres - or displacement mapping, or bezier patches!

For things like glass and water, you follow the same rule as reflection, except the "bounce" is a refract calculation. So instead of coming off, the ray bends a little as it carries on through the primitive you've hit. Aside from that, it's pretty much the same.

Now... lighting. Again, in realtime graphics, this is tricky, but in Ray Tracing it's again pretty easy. So lets say you've just hit something with your first ray, all you now do is take this point, and create a new ray to your light source, and then cast again. If this new ray hits anything, it means there's something in the way, and your pixel is in shadow. You'll do this for all light sources you want to consider, and merge the shadows together, blending with the surface pixel you hit in the first place, making it darker. And that's all lighting is (at a basic level). Area lights. spot lights and the rest get more interesting, but the basics are the same. If you can't see the light, your in shadow.

So how do you do all this in realtime? Well, now it gets fun! The simple view is to test with everything! And this will work just fine, but it'll be slow. The 1st optimisation is to only test with things in your view frustum. This will exclude a massive amount - and even a simple test to make sure something is even close to it, will help. After this, it gets tricky. A BSP style approach will help, and only test regions that the RAY itself intersects. This again will exclude lots, and help speed things up hugely.

(no idea whose image this is BTW)

After this...well, this is why we have large Graphics conferences like SigGraph, professionals from the likes of Pixar and other studios and universities spend long hours trying to make this stuff better and faster, so reading up on ideas from these people would be a good idea.

Lastly.... There is also some serious benefit to threading here. Parallelisation is a big win in Ray Tracing as each pixel you process is (usually) totally contained, meaning the more CPU cores you have, the faster you'll go! Or, if your really adventurous you can stick it all on the GPU and benefit from some serious parallelisation!!

Wednesday, January 01, 2014

Improving the runner

So, I thought it might be fun to look back at some of the changes we've made to the runner, why, and what impact it's had on performance. Back when we started, the main complaint with Game Maker was that it was too slow, and that for more complex games, it just didn't cut it. This is clearly not the case any more, so what's really changed? Well, this only really effects the c++ runner, but lets take a look see what we did...

WAD

When we started on the PSP port, the 1st thing we did was to remove all the compression from the output. The game was in fact doubly compressed (which is nuts BTW - it's not like it's going to get magically smaller again!), and on the PSP this lead to loading times of around 40 seconds to a minute is even very simple games. So, we changed the whole output, altering it into an IFF based WAD file, and this meant it could be loaded incredibly quickly, and was ready to go as soon as it loaded. This was a huge change code wise, and touched every element of Game Maker. However, this change alone meant games would load either instantly, on in a few seconds.

Virtual Machine

Game Maker was originally split into 2 parts, scripting, and D&D (Drag and Drop), and believe it or not, there were to totally separate code paths for each. On top of this, everything was compiled on load, meaning there was another noticeable pause when games loaded. So, first thing that was done, was we changed the script engine to use a virtual machine, this meant we could optimise small paths of code rather than huge chunks - twice. This gave a modest boost - not as much as we hoped, but this was due to the way GameMaker dealt with it's variables, and how things were cast, and we were stuck with that. The next thing we did was to remove the D&D path, and then compile all D&D actions into script so they could then be compiled in the same manner. Lastly, we pre-compiled everything. This removed the script from the final game - and finally removing the worry of simple decompilation, but it also sped the loading of a game up yet again. It was not at the point, where a windows game would load and run almost instantly.

Rendering

There's quite a number of changes here, so I'll break it down a little further. But basically, the old Delphi runner had some major CPU->GPU stalls, and so we had to look into fixing them.

Hardware T&L

This was introduced on GM8.1, and it's was the first real boost for performance. All later versions of Studio also use this "Hardware Transform and Lighting" mode. By using this, the GPU does all the transformations, and leaves the CPU free for running the game. Up until we switched it, after you submitted a sprite, DirectX then used the CPU to transform all the vertices into screen space before passing them onto the GPU - this places severe limits on how much you can push though the graphics pipeline, so it was changed on the first version of GM8.1 we released.

DirectX 9

The first thing we did was to upgrade to DX9, this meant we were on a version which Microsoft still maintained properly, and gave us access to more interesting tools down the line - like better shaders. While doing this, we also introduced a proper fullscreen mode, and while this may not seem like much, when in fullscreen mode, things do render slightly quicker. Also...DX9 is just a little faster than DX8, and the upgrade is a pretty simple process (takes time, but its simple enough).

Texture pages

One of the biggest problems with Game Maker, was that every sprite, every background was given it's own texture page, and while this might seem like a good idea as it means you can easily load things in, and throw them away, it utterly destroys any real performance. So, we set about putting all images on texture pages (TPages), but this was more complex as Game Maker allows non-power of 2 images.

Standard methods involved sub-dividing images down, splitting them in 2 each time until you allocate the sprite, and fill up the whole sheet. This is very fast, and very efficient. However... we needed something more. So what we did was create a multi-stream, virtual rectangle system which would clip to everything as you added any sprite of any size. I came up with this years ago, and it was incredibly powerful, and very fast - but it was recursive, and complex. Still, having written it before, I did it again and we had a very nice texture packer for any size of image.

Next we noticed that current Game Maker developers were incredibly wasteful of images. They would have huge sprites with nothing but space in them, so they could be aligned easily with screen elements. A cool fix for this, was to crop out anything which had a 0 alpha value. You basically search the sprite for a bounding box that isn't 0 alpha, and then crop it, and remember the all the offsets to enable you to draw it again as normal. This is a cool trick, that helps us pack even more onto a page,which then helps with batching as well.

Batching

Speaking of batching.... Since the old version changed texture all the time, it could never hope to batch more than a couple of triangles at once, and this meant the CPU was forever getting in the way. Modern graphics hardware wants you to send as much as you can in one shot, then to go away and leave it alone!

So, the first thing we did was go through and change all the primitive types. The old way (TriFan) is unbatchable, and so you would always be stuck with 2 triangles per batch, and that's useless. So we changed them to the most batchable prim we could use, TriList. I wrote a new vertex manager that you could allocate vertices from, and it detected if there was any changes, or breaking in batches, so all the engine had to do, was draw, and the underlying API would auto-batch it for us.

Couple this with the new texture pages, and if you were just drawing sprites, you had a good chance of actually drawing multiple sprites in one render call. In fact, many games (like They Need to be Fed) can render virtually the whole game in one render call depending on the size of the texture page.

There are still some things which can break the batching (obviously), but with careful set up, rendering sprites is no longer the limiting factor of your game.

Blend States

Blend states are always tricky, and in the past I've tended to using a layering system - something that might appear in later versions of Studio or more likely GM: Next, as it helps manage state changes much better. However, we got a shock early in the year when we discovered how GM users actually manage state, and so we worked hard to hide the pain of management from our users.

Now a user can simply set state and reset state as they see fit, and aside from the GML call overhead, we handle all the blend state batching behind the scenes. For example, if you set a blend state, draw something, reset it, then set it again, draw something and then reset it, we recognise this, and both items will be drawn together in a single batch, making the rendering code much more efficient without the developer ever having to care.

It'll also handle more extreme cases, where you set blends several times, then set a different one, and then set more blends several times. The engine will also recognise this, and will only submit 3 batches - or however many blend state batches are actually needed. This is incredibly powerful, and a major playing in rendering performance.

Static Models

In the past, whenever you created a D3D model, it didn't actually create anything, it just remembered the prims and commands you sent, then replayed them back dynamically. This meant 3D rendering was terrible, and you would struggle to get above a few thousand polys.

We actually introduce this new concept in GM8.1 although it was more of a pet project for me at the time. Now its a staple of how you should work. Studio will now let you build fully static models, meaning that when your finished creating them, you'll be left with a single vertex buffer (or multiple depending on lines and points), that will simply be submitted to the hardware when you want to draw something.

This is basically as fast as it's possible to draw on modern hardware, you create a buffer full of vertices and submit them but simply pointing to the buffer. This is what all modern games do, its what engines like Unity does. With this addition, you can quite literally draw models with hundreds of thousands of polys at hundreds of frames per second (depending on your graphics card of course!)

Shaders

Although a recent addition, this is a powerful performance tool. Up until this point, special effects would be done on surfaces, then processed by the CPU, most of the time, painfully so! But with all the previous optimisations in place, shaders were finally able to get their place and show you what the GPU was really able to do.

From effects such as shadows, to environment mapping, to more 2D effects such as outlines, colour remapping or even full shader tilemaps all meant that the CPU got more time to work on the game, not spend time trying to make things look pretty, especially when there was a much better way of doing it sitting there doing nothing.

string_execute()

Yeah... this old one. Well, this was a popular command, mainly because it let you be really lazy in the way you coded, and yeah.... it's not a terrible thing, but it was one of the worse performance hogs, and folk using Game Maker had come to rely on it, meaning it was pulling performance down for everyone. So why was it so bad? Well, unlike code when you executed it it had to compile the string every time before running it, and this was incredibly bad. Lets have a quick look at what's involved when compiling.

Lexical Analysis. First it had to scan the string, break it down into tokens allocate memory for it and store a parse tree. Even a very simple string like "room_goto(MenuScreen)" means we had to parse the string, looking at each character, and breaking it into several tokens. room_goto command token, find and verify the syntax of the command, and evaluate the contents of the brackets - in this case; MenuScreen, which is also stored as a token. These are then stored as a parse tree, so that equations and evaluations are calculated correctly. So if you gave a calculation like "A = instance3.x+((instance.B*12)/instance2.C)", it would be evaluated correctly when executed.

Compilation. This then transforms the parse tree into byte code. Previously, it was transfered into a stream of tokens, and the engine just run these, but with the virtual machine, we now compile into byte code. This memory also has to be allocated.

Running. Now the new script can be run. First it sets up a code environment (again, more code allocation), complete with the current instance so it can store things correctly, then the VM takes over and runs the code.

Freeing. This is the real crime. Once all this work is done, and the script has been run.... it's all thrown away. All the different memory blocks are freed, and when you try and run it again, it has to do this all again.

Knowing this... it's no wonder this command was such a hog. And file_execute was even worse, as it had to load the thing first, before doing any of this - and disk access is ALWAYS slow. So this is why this command was removed. You should never need it if you code properly anyway. It can be great for tools, but when everyone just sees it as a command they can use....well, they just blame Game Maker for not being quick enough so it had to go.

Conclusion

Now, there are obviously countless other optimisations in the runner, from ds_grid get/set being made much quicker, to the total rewrite of the ds_map, allowing for huge, lightning fast dictionary lookups, or from the way variables are now handled so that the VM can access them much quicker, or how YYC now translates everything into C++ so it can be natively compiled and thereby lifting the what little lid there was left.

And don't forget, we're a multi-platform engine, and each platform gets the same kind of treatment as the windows one does. Each platform has it's own quirks, it's own set of performance enhancing toolset, like for example the total rewrite of the HTML5 WebGL engine, that does things in a totally different way just so it can push things a little better for that platform, or the windows RT platform, that has a custom DX11 engine design and optimised just for RT devices. Each platform is taken on its own merits, and enhanced to best fit that device.

But no matter how you look at it, GameMaker: Studio has move well beyond the mere learning tool it was once designed to be, it's no longer only use by beginners, or bedroom coders looking for some fun, it's now used by professional developers around the world because it delivers, fast, cross platform development, with a truly fast engine, that's always being improved.