After a long discussion with Russell on Friday about the best way to single step, I decided to try out the new method. It basically involves having the STUB execute a single instruction inside the STUB itself. Currently I simply place a breakpoint one instruction later and then run the application; the program runs, hits a BRK, then stops again. The new way is friendlier to machines like the ZX Spectrum, which has issues (and bugs in the ROM), but it's tricky.
Imagine you've stopped the execution of a program, saved all the registers, flags etc. and now want to single step. What you would have to do is copy the instruction you want to trace into a 3-byte slot (that's prefilled with NOPs), restore all registers and flags (and the stack) and execute the command. You then save everything again and return to the debug comms loop. It sounds fairly simple, but it's not as easy as it appears.
;******************************************************************************
;
; Name:  ExecuteCommand
; Function: Given an address and a byte count, execute a sequence of bytes
;
;******************************************************************************
ExecuteCommand:
             jsr   GetByte             ; Get destination address
             sta   Dest
             jsr   GetByte
             sta   Dest+1
 
             jsr   GetByte             ; get opcode size
             tay
             dey
 
             ; Clear out exec buffer
             lda   #$ea                ; Clear the instruction space 
             sta   ExecBuffer+1        ; in case the instruction isn't 3 bytes long!
             sta   ExecBuffer+2
 
             ; copy 
!lp1         lda   (Dest),y            ; Copy the instruction to execute
             sta   ExecBuffer,y
             dey
             bpl   !lp1
 
             tsx                       ; Save current stack
             stx   StackStore
             ldx   RegSP               ; Restore application stack
             txs
 
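             lda   RegF                ; Get the flags ready
             pha                       ; keep a full copy on the stack
             and   #$4                 ; remember only the application's I flag
             sta   RegF                ; so it can be merged back in afterwards
             pla
             ora   #4                  ; We must keep interrupts OFF! (force I set)
             pha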
 
             lda   RegA                ; restore registers
             ldx   RegX
             ldy   RegY
             plp                       ; restore flags
ExecBuffer:
             nop                       ; Command goes here.
             nop
             nop
 
             php                       ; Save flags
             sta   RegA                ; Save registers
             stx   RegX
             sty   RegY
             pla                       ; get the flags and store
             and   #%11111011          ; drop the I flag we forced on
             ora   RegF                ; merge the application's own I flag back in
             sta   RegF
             tsx
             stx   RegSP               ; Store the application stack
 
             ldx   StackStore          ; get the debugger stack back
             txs                       ; and restore it.
             rts
StackStore   db    0
Now you'll also notice you have to do some jiggery-pokery with the flags so you don't re-enable interrupts by accident (since we're actually still IN an interrupt), but aside from that there's a lot of stuff involved. The bit I really don't like is that emulators simply wouldn't do this. They would simply set a breakpoint and run. I really don't want to have 2 ways to step through code; I'd like it if the CPU module simply didn't care if it was a real machine or an emulator. Now, the Spectrum can still do it with breakpoints, but since the Spectrum ROM has bugs in it, it's limited in exactly what it CAN do.
We currently think you would need to do breakpoints with CALLs, as the RST instruction needs ROM support and that's the part that's bugged. Now, a CALL takes up 3 bytes, which means you have limits as to where you can place a breakpoint. For example, if you were to branch over an XOR A (a single-byte instruction) but wanted to put a breakpoint on the XOR, then the breakpoint would overrun onto the next instruction, which is the branch target, and it might crash.
Now for single stepping, that shouldn't happen, particularly if the coder is aware of the limits. On other Z80 machines, it should be possible to use the RST instruction to do the breakpoints, so they would be fine. I've put breakpoint control on the STUB for a few reasons, but because of this it means each STUB can decide how it wants to do them.
So... after playing around a lot with it (and actually getting it to work), I've decided not to use it, but to go back to the breakpoint idea. I think it has the widest compatibility, and if you're really stuck you could still implement your own single step if you modify a CPU module and package it together with a dedicated COMMS module.
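For comparison, here is a rough sketch of what the STUB side of a breakpoint-based step might look like on the 6502. It is only an illustration, not the actual STUB code: it assumes GetByte and the zero-page pointer Dest behave as in ExecuteCommand above, that the host has already worked out the address of the next instruction and sends it low byte first, and that `db` reserves a byte as it does for StackStore. SetStepBreak, ClearStepBreak, BrkAddr and BrkSaved are made-up names.

```asm
; Hypothetical sketch: plant a temporary BRK at the next instruction.
SetStepBreak:
             jsr   GetByte             ; low byte of the next instruction's address
             sta   Dest
             sta   BrkAddr
             jsr   GetByte             ; high byte
             sta   Dest+1
             sta   BrkAddr+1
             ldy   #0
             lda   (Dest),y            ; remember the opcode we are about to overwrite
             sta   BrkSaved
             lda   #$00                ; $00 is the BRK opcode
             sta   (Dest),y
             rts

; Hypothetical sketch: put the original opcode back once the BRK has been hit.
ClearStepBreak:
             lda   BrkAddr
             sta   Dest
             lda   BrkAddr+1
             sta   Dest+1
             ldy   #0
             lda   BrkSaved
             sta   (Dest),y
             rts
BrkAddr      db    0                   ; address of the temporary breakpoint (low)
             db    0                   ; (high)
BrkSaved     db    0                   ; opcode that the BRK replaced
```

One thing to keep in mind with this approach is that BRK pushes the address of the BRK plus two, so the handler would need to subtract two from the stacked return address before reporting where execution stopped.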
I've also started to think about the TCP/IP module for use with emulators (and I'll do a UDP one for the RR-NET later), and I'm starting to get excited about the possibility of it being adopted by other emulator guys. If all goes well, then no one should ever have to write a built-in debugger again!