The life of a Games Programmer: C#

Wednesday, March 18, 2009

C# JIT

I was speaking the other day about the C# JIT and how it sucks sometimes... well, here's some real code to prove the point...

            m_ArrayAccess[0].m_Pos.Y = 2.0f;
00000022  lea         edx,[ecx+8] 
00000025  cmp         byte ptr [edx],al 
00000027  mov         dword ptr [edx+4],40000000h 
            m_ArrayAccess[0].m_Pos.Z = 3.0f;
0000002e  lea         edx,[ecx+8] 
00000031  cmp         ecx,dword ptr [edx] 
00000033  cmp         byte ptr [edx],al 
00000035  mov         dword ptr [edx+8],40400000h 
            m_ArrayAccess[0].m_Pos.W = 4.25f;
0000003c  lea         edx,[ecx+8] 
0000003f  cmp         ecx,dword ptr [edx] 
00000041  cmp         byte ptr [edx],al 
00000043  mov         dword ptr [edx+0Ch],40880000h

As you can see, it not only continually reloads the base address of the array+type, but also appears to insert pointless CMP instructions all over the place. The only reason I can figure for this is that it's attempting to prefetch the destination; however, since it's about to access it on the very next instruction, this is pointless. Not only that, but since the accesses are sequential, chances are it's in the cache already! And even if by some miracle it DID matter, why the hell is it doing it TWICE!!!

Man I find that annoying....

Tuesday, March 17, 2009

More C# woes....

The more I look into C# the more I wonder how it manages to run as fast as it does. Even for the simplest things it appears to double up on code, put in random meaningless opcodes etc. For example, a simple series of load/store operations like the one below resolves itself into a reasonably ugly mess of asm...

   Vector.X = InVec.X;
   Vector.Y = InVec.Y;
   Vector.Z = InVec.Z;

The problem is that it simply can't track registers properly and that means it has to continually reload the base address of the object which ends up making the code twice the size.

It also insists on loading then storing floats even though it's just transferring bits. You could obviously just use integer registers to transfer the data, and this could pipeline much better than a series of FPU load/stores. It's frustrating, as if this was C++ I could just drop to ASM and do it myself, but in managed code you're at the mercy of the JIT.
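As a rough illustration of the "just transfer the bits" idea (not code from the project), here is a sketch that copies a vector as three 32-bit integers using unsafe pointers. `Vec3` and `BitCopy` are hypothetical names standing in for the real types, which aren't shown here, and there is no guarantee the JIT produces better code for this; it would need measuring.

```csharp
using System.Runtime.InteropServices;

// Hypothetical stand-in for the Vector type above: three packed floats.
[StructLayout(LayoutKind.Sequential)]
public struct Vec3
{
    public float X, Y, Z;
}

public static class BitCopy
{
    // Copies src into dst one 32-bit word at a time, treating the floats as
    // plain integers so the source never asks for a floating-point load/store.
    // Must be compiled with unsafe code enabled (/unsafe).
    public static unsafe void Copy(ref Vec3 dst, ref Vec3 src)
    {
        fixed (Vec3* pDst = &dst, pSrc = &src)
        {
            int* d = (int*)pDst;
            int* s = (int*)pSrc;
            d[0] = s[0];   // X
            d[1] = s[1];   // Y
            d[2] = s[2];   // Z
        }
    }
}
```

Called as `BitCopy.Copy(ref vector, ref inVec);`, this moves exactly the same 12 bytes as the three assignments above.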

Now the JIT gets better with each iteration, but it doesn't always appear to get better where it counts. For normal code this simply doesn't matter; you lose around 5-10% speed max. But for code that needs to be highly optimal, this can be a real issue.

Sunday, March 15, 2009

C# and Ants

Since it's been a little while I thought I'd tell you what I've been up to lately at home and work. I've been playing with ANTS 4.0, which is a profiling tool for C# (and .NET in general). The last one was pretty good but had some issues, while the new one is excellent. Profilers can always get better but the leap from the last version is pretty huge. For a start, profiling is almost realtime, which is very impressive; normally apps crawl when they're being profiled. You can also play with results while the application is running, which is damn impressive!

I bought ANTS 3.0 last year and it's pretty expensive, but I thought that since I do lots of C# at home I would get the use out of it... trouble is, the stuff I do at home simply doesn't need profiling! Now that V4.0 is here I don't really want to spend huge sums of cash on it again, but it's such a nice app I'm a little torn... Oh well.

Anyway, I've been profiling some of my work and doing some optimisation at home. It's actually been years since I've done any serious optimising and I'm having a bit of fun with it. C# is pretty cool in that you can do real managed code, or unsafe C++ style code if you really want to. When you're optimising you tend to fall back to unsafe code as it's still quicker.

We've also been getting some little shocks at work, as some of the C# collections just aren't as quick or optimal as we were expecting. This has meant we've started to write our own set just to make sure it's doing what we think it should be doing. Games programmers are funny like that; we hate slow code and will happily sacrifice readability for speed (within reason of course). Anyhooo.. I'm having a blast playing with peep-hole optimising. It's hard to say how this will affect the app overall just now, but I'm pretty happy with the main loop as it's very tight.

It reminds me that optimising is great fun, and that's why I got into games in the first place!!

Wednesday, January 14, 2009

Powering up.....

Because I was ill over Christmas, it's actually been a while since I did any real work at home. So I thought I'd make an effort to get over this and actually DO something!

To this end I've resumed the refactoring of RetroEdit and decided to get editing actually working and usable. So now that I think I've actually finished the refactor I was wanting, I've started to write the editing features. I've currently got hires sprites being edited, so I'm about to try MultiColour Mode ones. I'll then do some basic features like scrolling the whole thing around the window, flip etc. and then move on to colour editing.

I need to get the selected machine's palette drawn so I can pick colours and then I need to tackle saving. I'll need a PROJECT save, and a binary save. I'll also need to allow plugins so that folk like Russell can save formats he wants to deal with; although he may well end up doing his own editor, but others might need it so...

With a bit of luck, most of these shouldn't take very long, and I can take the core of the sprite editing and move it to the character editing tab. The core concept of this is a special RetroBitmap control. This allows you to deal with retro graphics directly without having to write huge chunks of code over and over - it's also a standard control so others could use it in their own projects if they wanted to.

Saturday, December 20, 2008

RetroEdit cleanup

For lack of any better ideas, I've been cleaning up RetroEdit into a more consistent state and I've now got the control editing the data in a native format. This means there's no real conversion required when lifting the data out and saving it off - or indeed loading data into it.

I still need to go through the application itself as I've changed my mind on several things. For example, while it's perfectly valid to allow you to edit HiRes and MultiColour mode sprites together, I'm not going to allow it. This means you don't have to run through a hundred sprites and switch them all to Hires or MCM, and in reality this would hardly ever happen anyway.

I can now draw a sprite okay so what I really need to do is add colour palette selection to let me pick which colour I want to paint with. Once I've done that I can add the features that make editing fun.

I'll need to extend the editing control to allow for character and bitmap mode because I want to use the same control to do all editing. That said, I'm still not sure what I'll be spending my time on over the Christmas break, but for now I'll carry on with this...

Thursday, December 18, 2008

RetroEdit

I was looking through the source of RetroEdit and wondering what I'd do to change it, and came to the conclusion that actually, it's not as bad as I thought. The only major change I want to make is to how the data is stored internally. You see, when I started this I decided to store the data in a basic int[] array, where each 0 or 1 was actually a BIT, and I would then process the data for editing and drawing. However, this is a bit yucky, and I now think it would be better to just store the data in a more native format. That's not to say I'll store it in boxes, or columns etc., but as a simple X by Y row of bytes. This makes plug-ins to save the data in a custom format easy.
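To make the difference between the two layouts concrete, here is a small sketch that converts the one-int-per-pixel form into rows of bytes. It is only an illustration: the `BitPacking` name is made up, and it assumes bit 7 is the leftmost pixel (the usual convention on these machines) and a width that is a multiple of 8.

```csharp
using System;

public static class BitPacking
{
    // Packs a width x height grid of 0/1 ints (one int per pixel, row by row)
    // into rows of bytes, 8 pixels per byte. Bit 7 is assumed to be the
    // leftmost pixel, and width is assumed to be a multiple of 8.
    public static byte[] PackRows(int[] bits, int width, int height)
    {
        if (width % 8 != 0) throw new ArgumentException("width must be a multiple of 8");
        if (bits.Length != width * height) throw new ArgumentException("bits has the wrong length");

        int bytesPerRow = width / 8;
        byte[] packed = new byte[bytesPerRow * height];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                if (bits[y * width + x] != 0)
                {
                    // Set the bit for this pixel inside its byte.
                    packed[y * bytesPerRow + x / 8] |= (byte)(0x80 >> (x % 8));
                }
            }
        }
        return packed;
    }
}
```

With the data already in this form, a save plug-in can write the byte array straight out, or reorder whole bytes for formats that want them in a different order.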

So I think I might have a little play and see if I can reorganise things so they are more to my liking, and this will (I hope) then allow me to use the custom control to actually do real graphics work.

The way I've gone about this is to have a custom control that deals with retro graphics. So you load it up with the data (in a simple row x column format), and you can then select colours and plot pixels. The idea is that once it's released (if ever!) you could use the control yourself outside of RetroEdit to do other things. I have no idea if this would ever happen, but it might.

So I've decided to give it a few days and see how I get on. If I feel like I'm getting somewhere, I'll carry on - if not, I'll bin it.

Wednesday, September 24, 2008

Debugging.

I've been discussing the debugger and trying to gauge how people develop over on lemon64 (HERE!), so if you're interested head on over there.

I still think this project has the chance of being invaluable to the retro community, particularly if I can get emulator writers to pick it up and implement the STUB (which I think/hope should be minimal work for them). This then gives a stable, consistent debugger across the board and should give developers a great code base for doing tools and other features, not to mention developing new hardware and games.

I'm actively looking for feedback and suggestions on how you develop and what you'd like to see in a debugger, so feel free to join in the discussion on lemon64.

Friday, September 19, 2008

Alive and kickin'

Well, I appear to be back in the world of the living again. I finally had to take a couple of days off to try and kick this cold and it appears to have worked, so after a reasonably lazy couple of days I'm finally sitting in front of my machine again looking back at the debugger.

When I started this I thought it would be a reasonably small job, and one I could quickly do and get out there so I could progress with more interesting things, but alas... this was not to be. It's turning into yet another mammoth task, and one that's eating up yet more of my limited free time. Sure, once it's all working things will be great! But that still looks like being a long way off for now. I may have to scale my initial release back a little so I can at least release something and not have this as yet another project that never seems to appear.

What I'll probably do is finish the Plus4 debugging so that it's pretty good to use, and then do the emulator (TCP/IP) stub for everyone else, then release that. I know I wanted to get the C64 version done first, but that will take ages, and since it requires a real machine hooked up, I can only do it in my work room at home. Once I get TCP/IP support for emulators, I can carry on doing debugging features using my laptop anywhere I like using Minus4 (or my C64/Spectrum emulator for that matter).

I gave Russell my Spectrum emulator and he's been rewriting it in C# with the idea of playing with Silverlight, but this means he's doing a Z80 debugger and will use the plug-in interface to support his emulator.

So....that's where we are just now; let's hope I can kick things up a gear and do some actual work! Of course...I go away on holiday to Florida in two weeks, so that's gonna get in the way, although I've bought an extra battery for my laptop, so I hope to do some stuff on the plane there.

Thursday, September 04, 2008

Go faster stripes....

I've been struggling with replacing the rich text box for a while and almost everything I've tried was much slower, even though I know I could plot those bloody pixels myself quicker than Windows ever could. I've tried several things but I've finally gotten a normal Windows form to draw without flickering like a bugger and fast enough to be - well, actually usable.

I'm now rendering the disassembly window using a large bitmap that I create each Paint (although I'll probably change that to be a little more optimal later). Still, at least I can do pretty much anything I want now without having to worry about it. The update is pretty good too - much quicker than the RichTextBox. So, I'll continue to progress with this and clean up the code as it's been hacked around quite a bit in the last week.
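For anyone wanting to try the same approach, the general shape is sketched below. This is not the debugger's actual code: `DisassemblyView` and its `Lines` field are invented for the example, and it only draws plain text lines. It paints everything into an off-screen bitmap and then copies the finished image to the screen in one go.

```csharp
using System.Drawing;
using System.Windows.Forms;

// Hypothetical control: draws text lines into an off-screen bitmap, then
// copies the finished bitmap to the screen in a single call.
public class DisassemblyView : Control
{
    public string[] Lines = new string[0];

    public DisassemblyView()
    {
        // Let the control do all of its own painting and skip the separate
        // background erase, which is a common cause of flicker.
        SetStyle(ControlStyles.UserPaint | ControlStyles.AllPaintingInWmPaint, true);
    }

    protected override void OnPaint(PaintEventArgs e)
    {
        if (ClientSize.Width <= 0 || ClientSize.Height <= 0) return;

        // A new bitmap on every paint; caching it between paints and only
        // recreating it on resize would be cheaper.
        using (Bitmap buffer = new Bitmap(ClientSize.Width, ClientSize.Height))
        {
            using (Graphics g = Graphics.FromImage(buffer))
            using (Brush brush = new SolidBrush(ForeColor))
            {
                g.Clear(BackColor);
                int lineHeight = Font.Height;
                for (int i = 0; i < Lines.Length; i++)
                {
                    g.DrawString(Lines[i], Font, brush, 0, i * lineHeight);
                }
            }
            e.Graphics.DrawImageUnscaled(buffer, 0, 0);
        }
    }
}
```

Setting `Lines` and calling `Invalidate()` on the control triggers a repaint through this path.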

After that I really need to address the registers as they have to be created, updated and processed by the plugin.

So...progress at LAST! I'm now into the middle third of the project, and this is where things seem to move at a crawl and will never get finished. However, I'm now getting things moving again and I hope it picks up pace and I can actually release it to everyone to play with. I really do hope it'll help change development and make emulators a better platform to write on, not to mention making it easier to test on real hardware.

Saturday, August 30, 2008

Status...

I've not done anything this past week as I've been fixing a friend's machine (and been knackered to boot!), so I've made no progress from what I reported last time.

However I was thinking.... I got a question a while ago about my tools being runnable on Mac/Linux and I said probably not, but that might not be true. You see, the thing about C# and .NET is that you can almost use it as a C/C++ compiler, as C# will happily do what's called unsafe code; that's basically where you use pointers again. Now what this means is I can port SNASM and possibly Minus4 into a C#/.NET environment and get it compiling, and it would then magically work on other platforms!

You see, the thing about .NET is that it's a virtual machine, so you compile your code for a mythical CPU. This is then JIT (Just In Time) compiled when the program runs - much like Java is. This means my plain ol' C++ program would run quite happily under Mono on Mac or Linux. If I get a chance, I'll give this a go and see how I get on. Once it's in a C# project I can then slowly port it properly to C# without all the pain.

(And yes...I know there's a managed C++ for .NET, but C/C++ sucks and I'd rather use C# these days...)

Saturday, August 23, 2008

More progress...

Disassembly view with added watch window.

As you can see from the screen shot I've now gotten the watch window up and running properly. I now need to implement breakpoints properly and come up with a better way to redraw the main disassembly window, rather than use the rich text box.

Oh, I also need to implement the register window properly as it's currently processed by the monitor app rather than the CPU module. Since each processor has different register requirements there's no way to have this done generically, so it must be handled by the CPU module. This is a pain, but necessary. Technically speaking, I could implement a simple generic one IF the CPU module doesn't, and this would mean you could get up and running quicker, but I'm not sure yet.....

Lastly, I suddenly realised that we could use RS232 on a 48K Spectrum using an Interface 1, as it has a serial port built in. This would be slow (2.4k a second) but would be quick enough for a remote debugger. I expect most work to be done in an emulator, but this would provide a cheap way to get access on a real machine. This will of course be up to Russell whenever he gets around to doing that.

Friday, August 22, 2008

Watch Window.

I've gotten most of the watch window running now, and although it's not finished it IS doing the client memory lookup as part of the expression. This involves giving the expression system access to the ICOMMS interface. This is fine as the monitor window owns both the expression and the comms classes.

The theory is you can have multiple monitor windows, each with their own ICPU, IComms and IEvaluation objects. This means watches are attached to the CPU window and so can fetch memory from the client using a different comms interface (or the same). This should provide the most flexibility for the future as you should be able to debug multiple CPU machines like the SNES or Mega Drive, which have a main CPU and a separate sound CPU.

So, for now the expression evaluation will do proper evaluations, and allow you to access memory on the client via the attached comms module like so: [TEMP+4,w] This looks up the label TEMP, adds 4, then reads the client memory at that address and downloads a word (the ,w bit). This could then be used as a further value in a more complex expression like this....
[SpriteData+[CurrentSprite,b]*2,w]. Obviously the more remote memory accesses you have the slower it'll go; BUT it will do it.
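As a sketch of how the `[address,size]` part can be resolved once the address has been evaluated, consider the following. The real ICOMMS methods aren't listed in this post, so the `ReadMemory` signature is an assumption, as are the `WatchLookup` and `FakeComms` names; words are assumed to be little-endian, as on the 6502 and Z80.

```csharp
using System;

// Hypothetical shape for the comms interface; the real ICOMMS members may differ.
public interface IComms
{
    byte[] ReadMemory(int address, int count);
}

public static class WatchLookup
{
    // Resolves a "[address,size]" lookup after the address part has been
    // evaluated. 'b' fetches a byte, 'w' a little-endian 16-bit word.
    public static int Fetch(IComms comms, int address, char size)
    {
        switch (size)
        {
            case 'b':
                return comms.ReadMemory(address, 1)[0];
            case 'w':
                byte[] data = comms.ReadMemory(address, 2);
                return data[0] | (data[1] << 8);
            default:
                throw new ArgumentException("unknown size suffix: " + size);
        }
    }
}

// Fake target used only to try the sketch without real hardware.
public class FakeComms : IComms
{
    private readonly byte[] memory = new byte[65536];

    public void Poke(int address, byte value) { memory[address & 0xFFFF] = value; }

    public byte[] ReadMemory(int address, int count)
    {
        byte[] result = new byte[count];
        for (int i = 0; i < count; i++) result[i] = memory[(address + i) & 0xFFFF];
        return result;
    }
}
```

The nested example then becomes two calls, one feeding the other: `WatchLookup.Fetch(comms, spriteData + WatchLookup.Fetch(comms, currentSprite, 'b') * 2, 'w')`, which is also why every extra pair of brackets costs another round trip to the target.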

So, it's looking good. I've decided to drop the memory count at the end for now as this means you can ALTER watch values via this window as well - which I wasn't planning. So all in all, it's going well.

I plan to release full source for the following modules:

The ISymbolFile module to load the SNASM symbol tables

The ICPU 6502 CPU module

The ICOMMS parallel port COMMS module

The 6502 STUB for parallel port version of the comms (C64 and Plus4)

The TCP/IP ICOMMS module

The TCP/IP C++ STUB for emulators

I will not be releasing source to the main Monitor program. I do hope to be doing a UDP version for the RR-NET, but that might not be in the first release. There will probably also be a Z80 ICPU module for use with my Spectrum emulator, which Russell is currently rewriting in C#. He's also really keen to write a remote stub for a real machine somehow, although we'll need an interface for that first. I do have a download cable for an Amstrad, so he should be able to write a stub for that in the future as well.

I'm not currently sure what the first version will look like, or what features will be in it, but I think it'll be the basic monitor with breakpoints etc., plus the watch and memory windows. If there's anything you think it shouldn't be released without, let me know - but remember most key systems are pluggable, so if there's a CPU that's not supported, you can write it yourself! I might add some tool modules so that the ICOMMS can do things like fetch sprite data etc. This would mean you could add a tool that views maps, sprites and all the rest at a later date.

Future additions will include a 65816 module for the SuperCPU, and probably a 65c02 module as well as a Hu7 module sometime thereafter. I know these will take time to appear, but feel free to add them yourself! In the far distant future, 68000 will appear as I want to play with the Amiga+ST again and this would be an ideal way....

Be warned though, part of the license agreement for this will state that any new modules WILL be included in the main application package, although I've yet to decide if it will require the source code to be released as well. So even if you don't send me your modules to get included, if I find them, they WILL be. I suspect this won't be an issue for anyone, but be warned. This is only for new ICPU, ICOMMS (and any stub that goes with it) and ISYMBOLTABLE modules, not for any program that uses them. If you have a problem with this, let me know now why that is.

Wednesday, August 20, 2008

Watch me now....

I've been fighting with .NET list views for the watch window and have finally managed to get a watch class being mapped and displayed via the .NET data mapper. The Data mapper is very cool, and lets you map a class directly into a control. It also lets you add new items and will allocate/edit/delete new entries without me lifting a finger! Very neat.
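A minimal sketch of this kind of mapping, assuming the "data mapper" here is Windows Forms data binding (that is a guess, as the exact mechanism isn't named above). The `Watch` class and its properties are hypothetical; `BindingList<T>` is the standard collection from `System.ComponentModel` that bound controls can add to and remove from.

```csharp
using System.ComponentModel;

// Hypothetical watch entry; the property names are assumptions.
public class Watch
{
    public string Expression { get; set; }
    public int Value { get; set; }
}

public static class WatchBinding
{
    // Builds a list that a grid-style control can bind to. With AllowNew set,
    // a bound control can append rows itself by calling AddNew(), and with
    // AllowRemove set it can delete them.
    public static BindingList<Watch> CreateList()
    {
        BindingList<Watch> watches = new BindingList<Watch>();
        watches.AllowNew = true;
        watches.AllowRemove = true;
        return watches;
    }
}
```

Assigning the list to a control's `DataSource` (a `DataGridView`, for example) gives one column per public property, with add, edit and delete handled by the control.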

I'll start on the actual watch processing tomorrow, and with any luck that'll be all that I'll need to do to implement basic watches! After that I'll need to get breakpoints working and I suspect the first version will closely follow afterwards.

Monday, August 18, 2008

Coming together....

I've been making steady progress and have now got the disassembly window scrolling around under user control (without tracing). So you can now use the cursor keys to scroll the window, and the page-up and page-down keys to move quicker. It's going pretty well but I'm not happy with the rich text box control. It feels slow, and there are a few keys you can't seem to override. This means although it gets rewritten every tick, you can upset things by pressing enter at the wrong time. I think I'm going to have to draw it myself somehow - perhaps even on a bitmap or something....

However, that aside, it's making good progress, and once I have user breakpoints in I can start looking at memory and watch windows. But the drawing of the display remains a concern....

Saturday, August 16, 2008

Single stepping.

After a long discussion with Russell on Friday about the best way to single step, I decided to try out the new method. It basically involves having the STUB execute a single command inside the STUB itself. Currently I simply place a breakpoint one instruction later, and then run the application; the program runs, hits a BRK then stops again. This new way is friendlier to machines like the ZX Spectrum, which has issues (and bugs in the ROM), but it's tricky.

Imagine you've stopped the execution of a program, saved all the registers, flags etc. and now want to single step. What you would have to do is copy the instruction you want to trace into a 3 byte slot (that's prefilled with NOPs), restore all registers and flags (also stack) and execute the command. You then resave everything and return back to the debug comms loop. It sounds fairly simple but it's not as easy as it appears.


;******************************************************************************
;
; Name:  ExecuteCommand
; Function: Given an address and a byte count, execute a sequence of bytes
;
;******************************************************************************
ExecuteCommand:
             jsr   GetByte             ; Get destination address
             sta   Dest
             jsr   GetByte
             sta   Dest+1
 
             jsr   GetByte             ; get opcode size
             tay
             dey
 
             ; Clear out exec buffer
             lda   #$ea                ; Clear the instruction space 
             sta   ExecBuffer+1        ; in case the instruction isn't 3 bytes long!
             sta   ExecBuffer+2
 
             ; copy 
!lp1         lda   (Dest),y            ; Copy the instruction to execute
             sta   ExecBuffer,y
             dey
             bpl   !lp1
 
             tsx                       ; Save current stack
             stx   StackStore
             ldx   RegSP               ; Restore application stack
             txs
 
             lda   RegF                ; Get the flags ready
             pha
             and   #$4                 ; Remember the application's own I flag
             sta   RegF
             pla
             ora   #4                  ; We must keep interrupts OFF!
             pha
 
             lda   RegA                ; restore registers
             ldx   RegX
             ldy   RegY
             plp                       ; restore flags
ExecBuffer:
             nop                       ; Command goes here.
             nop
             nop
 
             php                       ; Save flags
             sta   RegA                ; Save registers
             stx   RegX
             sty   RegY
             pla                       ; get the flags and store
             and   #%11111011
             ora   RegF
             sta   RegF
             tsx
             stx   RegSP               ; Store the application stack
 
             ldx   StackStore          ; get the debugger stack back
             txs                       ; and restore it.
             rts
StackStore   db    0

Now you'll also notice you have to do some jiggery-pokery with the flags so you don't re-enable interrupts by accident (since we're actually still IN an interrupt), but aside from that there's a lot of stuff involved. The bit I really don't like is that emulators simply wouldn't do this. They would simply set a breakpoint and run. I really don't want to have 2 ways to step through code; I'd like it if the CPU module simply didn't care if it was a real machine or an emulator. Now, the Spectrum can still do it with breakpoints, but since the Spectrum ROM has bugs in it, it's limited in exactly what it CAN do.

We currently think you would need to do breakpoints with CALLs, as the RST instruction needs ROM support and that's the part that's bugged. Now, CALLs take up a few bytes, which means you have limits as to where you can place a breakpoint. For example, if you were to branch over an XOR A but wanted to put a breakpoint on the XOR, then the breakpoint would overrun onto the next instruction, and it might crash.
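The overrun problem can be checked for mechanically if the debugger knows where branches land. The sketch below is purely illustrative (the `CallBreakpoint` name and the idea of passing in a list of known branch targets are assumptions, not something the debugger is known to do); it relies only on the fact that a Z80 CALL is three bytes long.

```csharp
using System.Collections.Generic;

public static class CallBreakpoint
{
    public const int CallSize = 3; // a Z80 CALL nn occupies three bytes

    // Returns true when a CALL written at 'address' stays clear of every
    // known branch target. Bytes beyond the replaced instruction are the
    // overrun; a branch landing on one of them would execute garbage.
    public static bool IsSafe(int address, int instructionLength, IEnumerable<int> branchTargets)
    {
        int overrunStart = address + instructionLength;
        int overrunEnd = address + CallSize; // exclusive
        foreach (int target in branchTargets)
        {
            if (target >= overrunStart && target < overrunEnd) return false;
        }
        return true;
    }
}
```

In the XOR A case, the instruction is one byte long and the branch lands on the byte right after it, so a CALL placed on the XOR would be reported as unsafe.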

Now for single stepping, that shouldn't happen, particularly if the coder is aware of the limits. On other Z80 machines, it should be possible to use the RST instruction to do the breakpoints so they would be fine. I've put breakpoint control on the STUB for a few reasons, but because of this it means each STUB can decide how it wants to do them.

So... after playing around a lot with it (and actually getting it to work), I've decided not to use it, but to go back to the breakpoint idea. I think it has the widest compatibility, and if you're really stuck you could still implement your own single step if you modify a CPU module and package it together with a dedicated COMMS module.
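For comparison, the breakpoint approach looks roughly like this from the debugger's side. It is a simplified sketch rather than the real module: `StepByBreakpoint` is a made-up name, target memory is modelled as a local 64K byte array instead of going through the comms link, and it only covers instructions that fall through to the next address (branches, jumps and returns would need the breakpoint placed at their destination).

```csharp
using System;

public class StepByBreakpoint
{
    public const byte Brk = 0x00; // 6502 BRK opcode

    private readonly byte[] memory; // stand-in for the target's 64K of memory
    private int savedAddress = -1;
    private byte savedByte;

    public StepByBreakpoint(byte[] targetMemory)
    {
        memory = targetMemory;
    }

    // Plants a BRK on the instruction that follows the current one.
    public void Arm(int pc, int instructionLength)
    {
        savedAddress = (pc + instructionLength) & 0xFFFF;
        savedByte = memory[savedAddress];
        memory[savedAddress] = Brk;
    }

    // Called once the target has stopped again: puts the original byte back.
    public void Disarm()
    {
        if (savedAddress < 0) return;
        memory[savedAddress] = savedByte;
        savedAddress = -1;
    }
}
```

Because nothing here depends on how the target executes the instruction, the same logic works whether the other end is a real machine or an emulator.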

I've also started to think about the TCP/IP module for using with emulators (and I'll do a UDP one for the RR-NET later), and I'm starting to get excited about the possibility of it being adopted by other emulator guys. If all goes well, then no one should ever have to write a built in debugger again!

Thursday, August 14, 2008

Lookin' good!

I've finally done enough to make some visible progress. The addition of the expression evaluation, symbol table lookup and command line processing means I can finally display symbols in the disassembly view. It's now starting to look like a proper debugger. I need to get left side labels in next (where the addresses are displayed) so you can see where you're branching to, then we'll be mostly done in that view.

I still have to add control of the view but that's next. The biggest unknown is the registers; I've no idea yet how to display a generic number of registers or how to lay them out. I'm tempted to let the CPU module display them as well as handle them, but I don't know yet.

Tuesday, August 12, 2008

Good progress!

I've gotten on pretty well tonight! I've managed to finish BOTH the lexical analysis class and the expression evaluation system. Not bad for a night's work. I can now evaluate things like...

((2<<(4*4)+8*4)&$ffff-MySymbol)*4

This means watches can contain calculations to be evaluated, which you can then apply the [...] modifier to so that it's a memory lookup on the target. I still have a way to go as there's currently no way to get the evaluation system to look at the target's memory OR registers. However that shouldn't be hard and then things will start to get very cool. Of course the initial reason for this is so there's an interface for the CPU module to look up symbols for the disassembly, but through the same interface you'll be able to execute actual calculations.

I also have the ability to add commands to this, like Lo() and Hi(). This means you could do calculations like...

Lo( 2<<(4*4)+8*4 )<<2

This is pretty neat, but it also opens things up for more complex functions like ...err... SQRT() or something, which is pretty cool too. The whole expression and lexical stuff is just 2 small C# files, so it's pretty portable if I want to do anything else, so this could be handy!
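One simple way to make functions like these pluggable is a lookup table from name to delegate, sketched below. This is an illustration, not the evaluator's actual design: `ExpressionFunctions` is a made-up name, and Lo()/Hi() are assumed to mean the low and high byte of a 16-bit value, as they do in most 6502 assemblers.

```csharp
using System;
using System.Collections.Generic;

public static class ExpressionFunctions
{
    // Function table keyed by name, so new entries can be added without
    // touching the evaluator itself. Lo/Hi are assumed to return the low
    // and high byte of a 16-bit value.
    public static readonly Dictionary<string, Func<int, int>> Table =
        new Dictionary<string, Func<int, int>>(StringComparer.OrdinalIgnoreCase)
        {
            { "Lo", v => v & 0xFF },
            { "Hi", v => (v >> 8) & 0xFF },
        };

    // Looks up a function by name and applies it to an evaluated argument.
    public static int Call(string name, int argument)
    {
        Func<int, int> function;
        if (!Table.TryGetValue(name, out function))
            throw new ArgumentException("unknown function: " + name);
        return function(argument);
    }
}
```

Adding something like SQRT() would then be one more line in the table, for example `{ "Sqrt", v => (int)Math.Sqrt(v) }`.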

Monday, August 11, 2008

Expressions and Lexical analysis

So I started to look around after yesterday's revamp of the plugins, and realised that to finish the initialisation system I need to pass in an expression evaluation interface. Oh well.... more code to port..... I started to port over SNasm's expression stuff but it soon became apparent that I needed a lexical processor first. Now the one in SNasm is pretty fully featured, but a bit much for a command interface; then I suddenly realised that I don't need very much at all. I need to be able to pick out symbols (like *,."$&[]() etc.) and labels. This means I only need a very simple one, as any text I don't understand will be a label (or I can do a quick dictionary lookup to get basic commands).

So I've almost got this basic one going, and it's really very simple but will do the job at hand. So I'll finish this tomorrow, and then port the expression stuff the night after and finally I'll be able to get the initialisation stuff done. It also means I can start to load symbol files and get a proper symbolic disassembly!
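A lexer along those lines really can be tiny. The sketch below is not the SNasm-derived code, just an illustration of the idea under a few assumptions: numbers are decimal or `$`-prefixed hex, `<<` and `>>` are kept as single tokens, every other punctuation character is a one-character symbol, and anything starting with a letter or underscore is a name (a label or a command).

```csharp
using System;
using System.Collections.Generic;

public enum TokenKind { Number, Name, Symbol }

public struct Token
{
    public TokenKind Kind;
    public string Text;
    public int Value; // only meaningful when Kind is Number

    public Token(TokenKind kind, string text, int value)
    {
        Kind = kind; Text = text; Value = value;
    }
}

public static class SimpleLexer
{
    private static bool IsDigit(char c) { return c >= '0' && c <= '9'; }
    private static bool IsNameChar(char c) { return char.IsLetterOrDigit(c) || c == '_'; }

    public static List<Token> Tokenize(string input)
    {
        List<Token> tokens = new List<Token>();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }

            int start = i;
            if (c == '$' && i + 1 < input.Length && Uri.IsHexDigit(input[i + 1]))
            {
                // Hex number such as $ffff
                i++;
                while (i < input.Length && Uri.IsHexDigit(input[i])) i++;
                string hex = input.Substring(start + 1, i - start - 1);
                tokens.Add(new Token(TokenKind.Number, input.Substring(start, i - start), Convert.ToInt32(hex, 16)));
            }
            else if (IsDigit(c))
            {
                while (i < input.Length && IsDigit(input[i])) i++;
                string text = input.Substring(start, i - start);
                tokens.Add(new Token(TokenKind.Number, text, int.Parse(text)));
            }
            else if (char.IsLetter(c) || c == '_')
            {
                // Anything that looks like a word is a label or a command.
                while (i < input.Length && IsNameChar(input[i])) i++;
                tokens.Add(new Token(TokenKind.Name, input.Substring(start, i - start), 0));
            }
            else
            {
                // << and >> stay together; everything else is one character.
                bool isShift = (c == '<' || c == '>') && i + 1 < input.Length && input[i + 1] == c;
                i += isShift ? 2 : 1;
                tokens.Add(new Token(TokenKind.Symbol, input.Substring(start, i - start), 0));
            }
        }
        return tokens;
    }
}
```

Deciding whether a name is a label or a command is left to the caller, for instance with a dictionary lookup on `Token.Text`.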

Slow going, but we're headed in the right direction.

Sunday, August 10, 2008

Slow going....

I've not been doing much on the debugger as I've been busy with real life, but I did get some time today to mess around. I've been updating the plugin system to make it more flexible and so that I can do some work on the command line arguments system. Currently I can specify which modules I want to use on the command line (along with symbol table loader and symbol file), and the next step is to spawn a monitor window using the selected modules.

I've rearranged all the code so that multiple monitor windows are possible. In theory this will allow debugging of secondary CPU modules as long as the debugging stub or interface supports it - although that's a long way off yet.
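For readers unfamiliar with how a plugin system like this can find its modules, a common approach in .NET is to scan an assembly for classes implementing a known interface. The sketch below shows that general technique only; it is not the debugger's loader, and the `ICpu` interface, its `Name` property and the `Cpu6502` class are all invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical plugin interface; the real ICPU members are not shown here.
public interface ICpu
{
    string Name { get; }
}

// Example module, present only so the sketch has something to find.
public class Cpu6502 : ICpu
{
    public string Name { get { return "6502"; } }
}

public static class PluginLoader
{
    // Creates an instance of every concrete class in the assembly that
    // implements TPlugin and has a public parameterless constructor.
    public static List<TPlugin> LoadFrom<TPlugin>(Assembly assembly) where TPlugin : class
    {
        List<TPlugin> plugins = new List<TPlugin>();
        foreach (Type type in assembly.GetTypes())
        {
            if (type.IsAbstract || type.IsInterface || type.ContainsGenericParameters) continue;
            if (!typeof(TPlugin).IsAssignableFrom(type)) continue;
            if (type.GetConstructor(Type.EmptyTypes) == null) continue;
            plugins.Add((TPlugin)Activator.CreateInstance(type));
        }
        return plugins;
    }
}
```

A module named on the command line could then be loaded with something like `PluginLoader.LoadFrom<ICpu>(Assembly.LoadFrom(path))`, where `path` is whatever DLL the user specified.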

Thursday, August 07, 2008

Future plans....

There was a question about using the RR-NET instead of the serial port, so I thought I'd answer in depth here.....

Currently I'm debugging via the user port on a Commodore Plus/4 and the parallel port on the PC, but the plan is to then do two C64 versions using both the user port AND the RR-NET (since the RR-NET is far easier to get hold of). However, the user port version would allow you to develop more hardware, as you can't always use RR-NET if you're trying to develop for a bit of hardware where there's no passthrough, or it's simply not available (the current cart disables the pass through when in use).

That said, for normal apps and games, I hope the MMC64/RR-NET (and the MMC Replay) will be invaluable for getting things working on real hardware. There's nothing quite like debugging on a real machine, particularly if you are using the MMC64 file loader stuff I did in the framework, as that's currently impossible to develop with via an emulator. Of course someone could also implement a SilverSurfer stub as well - nothing to stop them!

In between these versions (probably at the same time as the RR-NET version), I'll write my first emulator plugin and get Minus4 working using it. This will then be given away for everyone to use. I then hope that new modules will start popping up as all the hard work should be done, and emulators should just be able to plug it in. Once we have some simple CPU types (Z80, 6502 etc.) then all an emulator should have to do is implement the STUB and it'll all magically work.

The idea is to give away the source to the 6502 module, the COMMS modules, all the STUBS that I end up doing, and the Symbol table module. This should allow you to write whatever system you want, without having to implement all the crap around it that makes up the debugger - in theory. It also means you can easily adapt modules for your own use; so the 6502 one could be changed to be a 65816 one, or a 65c02 etc. allowing for new machines like the SNES or PC Engine to be added easily.

Lastly.... if there are ever any real problems with getting a particular machine to work using the interface provided, then all you need to do is put both modules in the same DLL (comms and cpu), and then you can add extra interfaces to expand or alter the behaviour. Let's say you couldn't implement single stepping in the way that's assumed you would; then you can add a new comms command that only YOUR cpu module knows about and call it instead.
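In C# terms, that escape hatch could look something like the sketch below: the comms class implements an extra interface, and the CPU module checks for it at run time. All of the names here (`ISingleStepComms`, `SpectrumComms`, `StepHelper`) and the `ReadMemory` signature are hypothetical, since the real interfaces aren't shown in this post.

```csharp
// Hypothetical base comms interface; the real ICOMMS members may differ.
public interface IComms
{
    byte[] ReadMemory(int address, int count);
}

// Extra, machine-specific interface that only a matching CPU module knows about.
public interface ISingleStepComms
{
    void SingleStep();
}

// Example comms module that offers the extra command alongside the standard ones.
public class SpectrumComms : IComms, ISingleStepComms
{
    public int StepsRequested;

    public byte[] ReadMemory(int address, int count) { return new byte[count]; }

    public void SingleStep() { StepsRequested++; }
}

public static class StepHelper
{
    // Returns true if the custom command was used, false if the caller
    // should fall back to the standard breakpoint-based step.
    public static bool TryCustomStep(IComms comms)
    {
        ISingleStepComms custom = comms as ISingleStepComms;
        if (custom == null) return false;
        custom.SingleStep();
        return true;
    }
}
```

The main monitor program only ever sees `IComms`, so it needs no changes; the extra command is a private arrangement between the two modules shipped in the same DLL.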

That's the theory......