I've been wondering just how much faster the SuperCPU actually is than a stock C64. Aside from the x20 jump you get from the raw clock speed, the 65816's new instructions and 16-bit nature give you a big boost on top of that: more than another x1.5! Here's a little example...

The scrolling in XeO3 takes a long time. Every game cycle, I do *this*:

```asm
; Each pass copies column X into all 21 screen rows, from column 39 down to 0.
; The LDA operands are patched at runtime (self-modifying code) to address
; the back buffer's current location.
        ldx #39
ScrollLoop
        lda BackBuffer,x
        sta HWScreen2+$400+(40*00),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*01),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*02),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*03),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*04),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*05),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*06),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*07),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*08),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*09),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*10),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*11),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*12),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*13),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*14),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*15),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*16),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*17),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*18),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*19),x
        lda BackBuffer,x
        sta HWScreen2+$400+(40*20),x
        dex
        jpl ScrollLoop          ; long-branch macro: a plain BPL can't reach
        rts
```

This code is self-modifying, so it always addresses the back buffer's new location. I have to use jpl (a macro) because a normal BPL is just out of reach. The loop takes (40*21*9)+(40*7) = 7840 cycles. That figure is approximate, because there are also page-boundary crossings hidden in here, and each one adds a cycle to an LDA.
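Here's where those numbers come from, using the standard 6502 timings. LDA abs,X is 4 cycles (+1 on a page crossing), STA abs,X is 5 and DEX is 2. The 7-cycle loop overhead assumes jpl expands to a BMI that hops over a JMP (2 + 3 cycles while the loop continues). That matches the figure above, but it is a guess at the macro rather than its actual definition. The last line shows why a plain BPL can't be used. Every LDA/STA pair is 3 + 3 bytes, and a branch offset is counted from the byte after the branch, reaching back at most 128 bytes.

$$
\begin{aligned}
\text{per row:}\quad & \underbrace{4}_{\text{LDA abs,X}} + \underbrace{5}_{\text{STA abs,X}} = 9 \text{ cycles} \\
\text{per pass:}\quad & 21 \times 9 + \underbrace{2}_{\text{DEX}} + \underbrace{2 + 3}_{\text{BMI + JMP}} = 189 + 7 = 196 \text{ cycles} \\
\text{total:}\quad & 40 \times 196 = 7840 \text{ cycles} \\[4pt]
\text{branch distance:}\quad & \underbrace{21 \times (3 + 3)}_{\text{LDA/STA pairs}} + \underbrace{1}_{\text{DEX}} + \underbrace{2}_{\text{BPL}} = 129 \text{ bytes} > 128
\end{aligned}
$$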

Now in 65816, I can do exactly the same, but being 16-bit, the loop only has to run 20 times. Each LDA/STA pair costs a couple more cycles, but it's still much quicker. With two DEXs per pass to step X down by 2, the loop is now (20*21*11)+(20*9) = 4800 cycles.
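Setting the loop overhead aside, the copy work itself compares as shown below. This assumes native mode with a 16-bit accumulator (M flag cleared with REP). In that mode LDA abs,X takes 5 cycles (+1 on a page crossing) and STA abs,X takes 6. Each pair moves two bytes, so X steps down by 2 from 38.

$$
\begin{aligned}
\text{6502:}\quad & 40 \times 21 \times (4 + 5) = 7560 \text{ cycles} \;\Rightarrow\; 9 \text{ cycles/byte} \\
\text{65816, 16-bit A:}\quad & 20 \times 21 \times (5 + 6) = 4620 \text{ cycles} \;\Rightarrow\; 5.5 \text{ cycles/byte} \\
\text{speed-up:}\quad & 7560 / 4620 \approx 1.64
\end{aligned}
$$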

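And lastly, the 65816 has block-transfer instructions, **MVN** and **MVP**, which are like the Z80's **LDIR** instruction. The catch is that they cost 7 cycles per *byte* moved, not per 16-bit word. Even in the BEST case, the whole 840-byte copy is (40*21*7) = 5880 cycles. The block transfer would also need breaking up a little more (mainly to do it line by line), which adds the register setup for each move on top. So MVN is compact and convenient, but the unrolled 16-bit loop is still the fastest option here, at about 1.6 times the speed of the 6502 version. And we have the new 20MHz clock on top of that.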

...Bitmap blitting suddenly becomes *REALLY* interesting!!