Proposed Z80 DRC CPU core - CANCELLED
pullmoll
Without even owning a Propeller yet, I started to design and implement a dynamically recompiling (DRC) Z80 CPU core for the Prop. I thought others might be interested in what this means and how I hope to get around the problems. I myself still doubt that it will be possible to emulate all of the Z80 with this approach, so it's more of an exercise in squeezing code into a cog's limited RAM.
The idea of a DRC core is to have pieces of code in the host CPU's native language that emulate each and every opcode of the guest CPU's range. Let's call these code fragments snippets for now.
The kernel code of a DRC core consists of three parts: compiler, patcher, executor.
The compiler translates opcodes of the guest into host language by copying the pieces of code that are listed in a table of 256 longs (yes, there are more tables for prefixed opcodes, but we just ignore that for now).
In my attempt each entry of this table consists of a 23-bit snippet source address and 9 bits of patch flags. Each snippet of code is preceded by a long that holds the length of the snippet in longs. This is done like this:
some_opcode     long    :end - $ - 1
                op1     dst, src
                op2     dst, #1
                ...
:end
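To make the table layout concrete, here is a minimal Python sketch of packing and unpacking one table entry. The split (flags in the upper 9 bits, address in the lower 23) and the flag names FLAG_IMM8 and FLAG_TERMINATE are assumptions for illustration; the post only states that an entry holds a 23-bit address and 9 flag bits.

```python
ADDR_BITS = 23
ADDR_MASK = (1 << ADDR_BITS) - 1   # low 23 bits: hub address of the snippet
FLAG_MASK = (1 << 9) - 1           # 9 patch flag bits

# Hypothetical flag assignments, not taken from pm80.
FLAG_IMM8 = 0x001        # an 8-bit immediate follows the guest opcode
FLAG_TERMINATE = 0x100   # opcode may change the guest's flow of execution


def pack_entry(addr, flags):
    """Combine a snippet address and its patch flags into one 32-bit long."""
    if not 0 <= addr <= ADDR_MASK:
        raise ValueError("snippet address does not fit in 23 bits")
    if not 0 <= flags <= FLAG_MASK:
        raise ValueError("flags do not fit in 9 bits")
    return (flags << ADDR_BITS) | addr


def unpack_entry(entry):
    """Split a table long back into (snippet address, patch flags)."""
    return entry & ADDR_MASK, (entry >> ADDR_BITS) & FLAG_MASK
```

With this split the compiler needs one mask for the address and one shift for the flags per guest opcode.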
The first long stored at the compiler's output program counter is the guest's program counter. Since the guest PC fits in 16 bits, this long has 000000 in the instruction field and, more to the point, 0000 in the condition field, so the host never executes it: it is a NOP. Then the code snippet is copied verbatim to ascending output program counter locations. This is of course only done after verifying that the code fragment still fits in cog RAM.
Next the flag bits of the snippet's table long are examined to see if any patches / fixups are necessary in the code just compiled. These patches or fixups translate the immediate bytes and words following the guest CPU's opcode into the corresponding source fields of host CPU opcodes in the compiled snippet.
One flag has the special meaning "terminate". It is set on all opcodes that potentially change the order of execution of the guest code, i.e. all jumps, calls and returns. If this flag is found, compilation must stop and the instructions compiled up to this point can be executed.
Now the last host instruction is stored at the output program counter, and this is a jmp #compiler. It should now be safe to jump to the compiled piece of code and let it do its job, i.e. execute anywhere from a few to several dozen (as cog RAM permits) host opcodes in line.
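The compile sweep described above can be modelled roughly as follows. This is a Python sketch, not the PASM from pm80: the snippet table layout (body, flags, index of the long to patch), the flag values and the JMP_COMPILER placeholder are all hypothetical.

```python
FLAG_IMM8 = 0x001          # hypothetical: patch the byte after the opcode
FLAG_TERMINATE = 0x100     # hypothetical: opcode may change guest control flow
JMP_COMPILER = 0xFFFFFFFF  # placeholder value, NOT a real PASM encoding
SRC_MASK = 0x1FF           # a PASM source field is the low 9 bits of a long


def compile_block(mem, pc, snippets, cog_free):
    """Translate guest opcodes starting at pc into a list of host longs.

    mem      maps guest addresses to bytes
    snippets maps a guest opcode to (body longs, flags, index of long to patch)
    cog_free is the number of longs available for compiled code
    Returns (emitted longs, guest PC where compilation stopped).
    """
    out = []
    while True:
        body, flags, patch_at = snippets[mem[pc]]
        # Marker long + snippet + final "jmp #compiler" must still fit.
        if len(out) + 1 + len(body) + 1 > cog_free:
            break
        out.append(pc)             # guest PC marker, a NOP for the host
        start = len(out)
        out.extend(body)           # copy the snippet verbatim
        pc += 1
        if flags & FLAG_IMM8:      # fix up the immediate operand
            patched = (out[start + patch_at] & ~SRC_MASK) | mem[pc]
            out[start + patch_at] = patched
            pc += 1
        if flags & FLAG_TERMINATE:
            break
    out.append(JMP_COMPILER)       # hand control back to the kernel
    return out, pc
```

The returned guest PC is where the next sweep would start if the block simply runs off its end.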
The only jumps that do not need to terminate a compiler sweep are the ones that jump within the compiled block of code. For relative jumps this is detected by a) the jump being a backwards jump and b) the resulting PC being found in the compiled code. For absolute jumps (I am not yet at that point) it should work out similarly. The compiler has to calculate the jump target address and look for it in the host NOPs in the emitted code. If it finds the guest PC address, the host (cog RAM) address of this location is patched into the jump instruction's snippet. More details in the example below.
When the host code is executed later on, a non-zero jump address means "just jump there"; a zero address means "leave the execution, we must compile again".
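The target lookup for a relative jump could be sketched like this in Python. The function names are made up, cog addresses are counted in longs from a hypothetical origin, and a marker is assumed to be any long whose upper 16 bits are zero; the real code may recognise its NOPs differently.

```python
def jr_target(pc, offset_byte):
    """Guest address a Z80 JR at pc jumps to (offset is a signed byte)."""
    disp = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (pc + 2 + disp) & 0xFFFF   # relative to the following instruction


def find_guest_pc(cog, origin, target_pc):
    """Return the cog address of the marker for target_pc, or 0 if absent.

    cog is the list of longs emitted so far, origin the cog address of
    its first long. 0 doubles as "not compiled, go back to the compiler".
    """
    for index, word in enumerate(cog):
        is_marker = (word >> 16) == 0  # assumed: only markers look like this
        if is_marker and word == target_pc:
            return origin + index
    return 0
```

Returning 0 for a miss matches the convention above, which works as long as compiled code never starts at cog address 0.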
Also, when a memory write is detected that writes to an address between the first compiled opcode (called PC0 in my code) and the end of the compiled code (which is in the Z80 context's PC), execution has to stop and the compiler has to deal with the possibly changed opcodes. This means that self-modifying Z80 code will execute quite inefficiently, and there's nothing that can be done about it.
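The write check boils down to two compares. A minimal Python sketch, assuming the compiled range is PC0 inclusive to PC exclusive and ignoring 16-bit address wrap-around (both assumptions, as is the function name):

```python
def write_hits_compiled_code(addr, size, pc0, pc):
    """True if a write of size bytes at addr overlaps guest range [pc0, pc)."""
    return addr < pc and addr + size > pc0
```

A byte write passes size 1 and a word write passes size 2, so a word that straddles PC0 is caught as well.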
What is left for the kernel is to deal with those terminating guest opcodes. By the time it has to deal with them, all registers and flags are in a state that allows it to check the conditions for conditional jumps, calls and returns. The easy path is the one where the guest's program counter just increments past the instruction, i.e. the not-taken jumps. If a jump is taken, the guest's PC is changed to that address, the compiler output buffer is flushed and compiling starts over.
Let's look at a simple example of Z80 code and how it comes out in cog RAM. Assume the piece of code is some delay subroutine that loops 30h (48) times:
3600: 3e 30       ld      a, 30h
3602: 3d          dec     a
3603: 20 fd       jr      nz, $ - 1
3605: c9          ret
The output of the compilation in cog RAM at 0x100 (assuming this is the ORIGIN of compiled code after the kernel) could look like this:
0100:   long    $3600
0104:   ror     AF, #8
0108:   movs    AF, #$30        ' this source value was patched in, it's the immediate 8bit value following the guest opcode
010c:   rol     AF, #8
0110:   long    $3602
0114:   mov     alu, AF
0118:   shr     alu, #8
011c:   sub     alu, #1
0120:   call    #flags_szv_dec
0124:   shl     alu, #8
0128:   and     AF, #$ff
012c:   or      AF, alu
0130:   long    $3603
0134:   mov     ea, #$110       ' this source was patched in, found the calculated guest PC $3602 in the host code here
0138:   test    AF,#z_flag WZ
013c:   if_z    jmp #$148       ' this source was patched in, it's the start of the snippet plus its length
0140:   test    ea, ea WZ
0144:   if_z    jmp #compile
0148:   long    $3605
014c:   jmp     #handle_ret     ' this must be handled separately, because it will change the guest's PC
Now there are two obvious goals: a) keep the free cog RAM for compiled code as big as possible, and b) don't waste too much space in hub RAM. Unfortunately the two goals contradict each other. With every piece of code that is placed in the kernel, e.g. the handling of flags_szv_dec as a kernel subroutine, the amount of precious free cog RAM shrinks. On the other hand, if it is not in the kernel, this piece of code is repeated numerous times in the code snippets that emulate the Z80's DEC instructions. The 8-bit DEC instruction appears 10 times, for the operands B, C, D, E, H, L, (HL), A, (IX+offs) and (IY+offs), and so would waste hub RAM space 9 times for identical code. And this is only the DEC instruction. The same applies to INC, to the arithmetic operations ADD, ADC, SUB, SBC and CP, and to the logic operations AND, OR, XOR... all of them are repeated 10 times.
The only way out I see is to have a separate set of code snippets for these ALU operations with flag manipulations and somehow reference them from the code snippets that the compiler copies to cog RAM. The compiler would have to insert these ALU snippets in the middle of the opcode snippet by (quickly) detecting when to do it and from what source to get them.
Perhaps tainted NOPs inserted in the code snippets could be detected fast enough in the compiler's copy loop to switch over to another source and resume afterwards.
This is, for the moment, what I have thought about. I have not yet decided how to go on, or whether to go on at all.
Comments and suggestions are of course welcome. That's why I posted my ideas here.
Juergen
Edit:
I have checked in pm80 as a project in my own CVS pserver. To access it you have to do the following (command line CVS):
you@yourbox:~$ cvs -d :pserver:anoncvs@pmbits.ath.cx:/anoncvs login
Just hit enter when asked for the password
If this fails with an error message, do a 'touch ~/.cvspass' before trying to login
Next you can check out the pm80 project
you@yourbox:~$ cvs -d :pserver:anoncvs@pmbits.ath.cx:/anoncvs co pm80
And later on you can update to the most recent version by running
you@yourbox:~/pm80$ cvs up -dAP
There's also a CVSweb server running at http://pmbits.ath.cx/cgi-bin/cvsweb/pm80/
Occasionally I'll also update the attachment to this post.
From 0.3.0 on, the compiler actually works correctly.
Required to compile the source:
- BST: BradC's Spin compiler. I used BSTC.linux; you may have to adapt the path to it in the Makefile
- GCC: GNU compiler collection
- GNU make: to use the Makefile to build everything
- BISON and FLEX: replacements for yacc and lex, compiler-compilers
- GtkTerm on Linux or your preferred terminal program (modify the Makefile to fire it up)
He died at the console of hunger and thirst.
Next day he was buried. Face down, nine edge first.
Post Edited (pullmoll) : 3/18/2010 11:40:50 PM GMT
Comments
Development this fast promises good things!
Regards,
T o n y
Somewhere at the beginning of my Z80 emulation adventures I had an Intel 8080 emulated fully enough to be able to run CP/M. Everything fit in the COG except the instruction dispatch table and the dreaded DAA instruction.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Yeah, I think if the Z80 won't fit, I can still try the 8080/8085, as many CP/M programs are happy with this. Another candidate would be the 6502/6510, though I'm not aware of any simple hardware and OS that runs on 6502s. I guess aiming at the Atari800 or C64 would be mad.
By the way, did you know I have a half-baked 6809 emulation, MoCog, going on as well? That's a real real biggy and will really need a couple of COGs to get going at a decent rate. I'm not sure I have the stamina to persevere with it.
Yeah, I saw it on the Wiki links page somewhere. I didn't follow those links too far, though.
I read about this one, too. I know a little about the 6809 from MAME. Motorola has never been my kind of CPUs, though, so I'm no expert at this. I know the 6809 has a funny mnemonic: SEX
Back on topic: I just wrote down the code to paste a code snippet inside a code snippet, just needs about 8 longs, so I think this can save a lot of repeated code in the opcode snippets. And at the same time reduces the chance of bugs in the otherwise repeated copies.
Post Edited (pullmoll) : 3/6/2010 7:59:54 AM GMT
Re "Yeah, I think if the Z80 won't fit, I can still try the 8080/8085 as many CP/M programs are happy with this"
Well looking at what I'm using I think it is pretty much all 8080. BDS C, Wordstar, SBasic, Mbasic, Assembly (written in Z80 opcodes but pretty much all 8080 opcodes.). I will make a special mention of DJNZ though.
I'm intrigued what you are doing. As posted on the zicog site, maybe the memory is not such a problem? 2k in cog, 32k in hub but you also have 448k in external ram, and the pasm code to access that is already in the zicog.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.smarthome.viviti.com/propeller
Thanks a lot! My board didn't arrive yesterday, so I hope it will be here today. I want to play around a little :)
Yes, it's looking better again now that I have implemented the nested pasting of code snippets. Basically the opcode snippets can contain a long with the address of another snippet to paste. The compiler copy loop detects this and switches over to paste another section of PASM opcodes, then returns to finish the opcode snippet. This way I save a lot of hub RAM and still don't need precious cog RAM for subroutines.
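The nested pasting described here could look roughly like this in Python. It is only a model of the idea: hub RAM is a flat list of longs, each snippet is prefixed by its length, and a long whose upper 16 bits are zero is assumed to be a reference to another snippet (the post does not say how pm80 actually marks these). Only one level of nesting is handled.

```python
def is_reference(word):
    """Assumed marker: a long that cannot be an executable host instruction."""
    return (word >> 16) == 0


def paste_snippet(hub, addr, out):
    """Copy the length-prefixed snippet at hub[addr] into out.

    A reference long inside the snippet is replaced by the body of the
    snippet it points to, after which copying of the outer one resumes.
    """
    length = hub[addr]
    for word in hub[addr + 1 : addr + 1 + length]:
        if is_reference(word):
            sub_length = hub[word]
            out.extend(hub[word + 1 : word + 1 + sub_length])
        else:
            out.append(word)
    return out
```

The shared ALU code is stored once in hub RAM, yet ends up inline in cog RAM, so it costs no call overhead and no kernel space.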
FWIW I have checked in pm80 as a project in my own CVS pserver. To access it you have to do the following (command line CVS):
There's also a CVSweb server running at http://pmbits.ath.cx/cgi-bin/cvsweb/pm80/
Juergen
P.S.: Just when I thought that spring might come now...
Post Edited (pullmoll) : 3/6/2010 7:28:18 AM GMT
Here in Helsinki my balcony has been waist high in snow for three months, only just now is it thawing out. Just after I invested in a shovel.
Actually, I'm doing both right now. Hope my code stays legible!
I've once stated this quote: "I am undecided whether I am a coder with an alcohol problem, or an alcoholic with a coding problem."
Post Edited (pullmoll) : 3/6/2010 8:51:22 AM GMT
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Visit some of my articles at Propeller Wiki:
MATH on the propeller propeller.wikispaces.com/MATH
pPropQL: propeller.wikispaces.com/pPropQL
pPropQL020: propeller.wikispaces.com/pPropQL020
OMU for the pPropQL/020 propeller.wikispaces.com/OMU
pPropellerSim - A propeller simulator for ASM development sourceforge.net/projects/ppropellersim
Hmm... It seems I can't search for your posts. Just the 5 most recent ones are displayed, or is there a forum function to accomplish that?
Post Edited (pullmoll) : 3/6/2010 1:03:46 PM GMT
There was search.parallax.com but I can't connect to it any more.
If you or someone does not keep bookmarks to interesting threads/posts they are lost in a black hole somewhere.
Oh, that's sad. I really think there should be one or more backup resources on the net that don't rely on parallax.com. No offense, just a suggestion for the unexpected case that never happens.
Well, there is a wiki: http://propeller.wikispaces.com/. Feel free to add to it.
You can use Google to search. Add site:forums.parallax.com.
I've also had luck searching with AltaVista.
Now I have to find out what tool to use to upload code via my USB to serial adapter on Linux.
Hmm.. Loader.py just hangs there and can't even be killed. I should probably write my own uploader, huh?
Post Edited (pullmoll) : 3/6/2010 2:41:01 PM GMT
What, your board arrived? Great.
For coding and uploading on Linux I use BST the Prop Tool Clone:
http://forums.parallax.com/forums/default.aspx?f=25&m=298620&p=1
BST has some nice extras like #define, #ifdef and the @@@ operator. It also produces nice listing files.
Yup. And it blinks the LEDs up and down, making me nervous.
Hmm.. It barfs at my #ifdef, though:
pm80 - Error at (84,1) Expected Spin Method, Unique Name, Assembly Conditional, BYTE, WORD, LONG or Assembly Instruction
#ifdef DO_CCOUNT
^
Compiled 6846 Lines of Code in 0.212 Seconds
Until now I used the homespun.exe compiler that runs with Mono on Linux.
It looks like the Python script is dying deep in the kernel somewhere. Probably it's due to my nameless USB to serial converter.
Windows development with that thing is also impossible, because WinXP doesn't know a driver for it and I don't have one either.
So it all boils down to Juergen's going to take the soldering iron in his hands and make a serial cable... *sigh*
Have you enabled "Non Parallax compatible extensions" in the project and/or compiler options dialogues?
Not sure how you do it with the BSTC command line compiler.
But even if you cannot compile with BST you can download the binaries via USB/serial with it. Or the matching BSTL command line loader.
Yes, I've seen it now. I used the command line version, because I usually prefer my own editor (nedit) and work with a Makefile. The option is -Ox to enable the extensions.
Anyway, more serious is that I can't even connect to the Prop through a serial port. Or ... wait a minute ... I soldered a null-modem cable, is that necessary? Hmm... I have to do some reading.
AH! It responds when I use a normal 1:1 cable, at least. Fortunately the rxd/txd buffers survived the wrong connection.
So now it's time to get some objects going. Double-AH! The Debug_1pin_TV_demo.spin is working just fine.
Post Edited (pullmoll) : 3/6/2010 5:29:02 PM GMT
Yup, you are right. Bus 004 Device 003: ID 067b:2303 Prolific Technology, Inc. PL2303 Serial Port.
It doesn't work with either Loader.py or bst.linux for me. The latter hung the machine when trying to upload something to the Prop RAM.
My system is Ubuntu 9.10 with all the latest patches and kernel. If I wanted to go ambitious, I could try to nail down the following syslog excerpt:
But I just don't care as my serial port is working now.
Edit: I think there are problems with the latest Ubuntu releases.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
http://forums.parallax.com/showthread.php?p=882794
Thanks for the link! Well, I'm doing almost exactly what you proposed then. There isn't really a problem in distinguishing code from data, because the compilation always follows the Z80 PC. And if some Z80 code intends to jump into data, well, then this data will be compiled as instructions.
The only real showstopper is self-modifying code, as was pointed out in this thread. I solved this with two compares before every memory write. If the write is inside the code that is compiled in the cog RAM, then the compiler has to take over again after the byte or word was changed. This may be very inefficient, but only in the cases where code very close to the current PC is modified. Modifying the RAM, say, 100 bytes beyond the current PC and then jumping there or calling it is no problem. I know that the TRS-80 ROM uses this to do the INP() and OUT() commands. The port address is patched into RAM after the IN or OUT opcodes and then these are called.
At the moment I'm struggling with Spin basics to get my code actually going. I don't understand everything yet, e.g. how to pass the parameters of the pm80 start proc to the assembly code... well, I have to do some reading in other people's code.
Juergen
Seems I'm a bit too dim to see ways around them. It's a fascinating technique.
Perhaps you could have a look at Zog sometime. The virtual machine in Zog is much much smaller and simpler than a Z80 and it desperately needs all the performance enhancement it can get :)