ZiCog a Zilog Z80 emulator in 1 Cog
heater
Posts: 3,370
ZiCog a Z80 emulator for the Propeller.
Run CP/M 2 and 3, Microsoft BASIC etc on a Propeller.
Latest version is zicog_v0.10.zip
The demise of the 4 COG Z80 emulator project was precipitated by the realization that it is possible to create a full up Z80 emulator that uses only one COG. Further that this emulator would be just as fast as the 4 COG effort, would be a lot more elegant and understandable than the old PropAltair 8080 emulator and would be far more friendly to external RAM.
How is this possible?
The classic way of building an emulator is to have a look up table that is used to get from a given instruction opcode to the function that will emulate that instruction. This is fine if you have a lot of space to put those instruction handler functions. In the Prop COGs there is not enough space. We could try to use more COGs. This is slow and wasteful of COGs.
Another emulator approach is to try and decode the various bit fields of the opcode and determine what the instruction's data sources and destinations are and what operations should occur. In this way one can fold up the emulator into a twisted mess of code logic that will fit in a COG. This was how the first PropAltair 8080 emulator worked. It is slow.
A new approach: After looking at the emulation process long enough I came to the realization that most of the instructions boil down to three steps:
1) Get an operand from somewhere. The Source.
2) Do some operation with that operand and (mostly) the accumulator (The A register). The Operation.
3) Put the result somewhere. The Destination.
In the 8080 there are about 25 different sources and destinations: Registers A, B, C etc, memory via HL, memory via direct address, immediate data etc etc. There are about 30 different operations, ADD, SUB, ROT, INC, DEC etc etc.
All these sources, destinations and operations can be implemented in from 2 to 20 lines of PASM each and they will all easily fit in a COG. All we need now is a way to connect different combinations of source, operation and destination together to implement each instruction.
This is where Cluso comes in. Who should, from now on, be referred to as "The Great Cluso".
Cluso suggests that one can put three 9-bit COG addresses (vectors) in each LONG of a lookup table, plus 5 bits of other possibly useful info. For a long time I wondered what such a clever thing could be used for. Until the euro dropped.
The first vector jumps us to a procedure to fetch the source info. The second vector gets us to some code to perform the operation and third vector takes care of posting the result to the right destination. This technique can also be used to take care of conditional/unconditional jumps, calls and returns.
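The three-vector idea can be sketched in Python. This is only an illustration, not the ZiCog source: the bit layout (source in the low 9 bits, then operation, then destination, then 5 info bits), the handler names and the COG addresses are all made up for the example, and flags are ignored.

```python
# Hypothetical layout: three 9-bit COG addresses plus 5 spare bits in one 32-bit long.
VECTOR_MASK = 0x1FF  # a COG has 512 longs, so an address needs 9 bits


def pack_entry(source, operation, destination, info=0):
    """Pack three handler addresses and 5 info bits into one long."""
    for vector in (source, operation, destination):
        if not 0 <= vector <= VECTOR_MASK:
            raise ValueError("COG address out of range")
    if not 0 <= info < 32:
        raise ValueError("info must fit in 5 bits")
    return (info << 27) | (destination << 18) | (operation << 9) | source


def unpack_entry(entry):
    """Split a long back into (source, operation, destination, info)."""
    return (entry & VECTOR_MASK,
            (entry >> 9) & VECTOR_MASK,
            (entry >> 18) & VECTOR_MASK,
            (entry >> 27) & 0x1F)


regs = {"A": 0x10, "B": 0x22, "D": 0xFF}
state = {"operand": 0}  # value passed from source to operation to destination


def src_b(): state["operand"] = regs["B"]
def src_d(): state["operand"] = regs["D"]
def op_add(): state["operand"] = (regs["A"] + state["operand"]) & 0xFF
def op_inc(): state["operand"] = (state["operand"] + 1) & 0xFF
def dst_a(): regs["A"] = state["operand"]
def dst_d(): regs["D"] = state["operand"]


# Made-up COG addresses standing in for where each PASM routine would live.
cog = {0x010: src_b, 0x014: src_d,
       0x040: op_add, 0x048: op_inc,
       0x080: dst_a, 0x084: dst_d}

# One long per opcode, as in the HUB dispatch table.
dispatch = {
    0x80: pack_entry(0x010, 0x040, 0x080),  # ADD B: source B, add to A, store in A
    0x14: pack_entry(0x014, 0x048, 0x084),  # INR D: source D, increment, store in D
}


def execute(opcode):
    source, operation, destination, _info = unpack_entry(dispatch[opcode])
    cog[source]()
    cog[operation]()
    cog[destination]()


execute(0x80)  # A becomes 0x10 + 0x22 = 0x32
execute(0x14)  # D wraps from 0xFF to 0x00
```

Adding an instruction then costs one table entry, not a new handler, as long as its source, operation and destination routines already exist.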
Using this approach it is possible to fit a whole 8080 emulator in one COG with no LMM or other such junk and around 90 longs free!!
Turning to the Z80 we see that it has more than twice as many opcodes as the 8080 BUT looking closely we see there are only a handful more of those Source, Destination and Operation functions. Which can be coded as PASM in the free longs we have. The logic that ties them all together is in the dispatch tables in HUB.
Enough of the long lecture. My first pass at this idea is attached. I have not included the rest of the emulator files as this barely runs as it is but it does show the general outline. There is code in place for 90% of a Z80.
Early tests show that speed is nearly up to that of the 4 COG version, faster than the old PropAltair version and can be pushed a little more with some tweaks here and there.
It is early days yet but this file is intended to show what is coming for those who want to put CP/M on a Prop with external RAM or build some other emulator.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Post Edited (heater) : 8/19/2009 4:44:49 AM GMT
Comments
Can you give an estimate of the speed relative to a 4MHz Z80?
Cheers!
Paul Rowntree
This code has had only minimal testing so far, but a loop of INR A, OUT 0, JMP 0000 runs at 384KIPS, which happens to be the same as the 4 COG version and faster than the old PropAltair version at 348KIPS. I know I can push this to 400KIPS with a few tweaks, like putting some or all of the registers into COG space. I will probably not do that though; it complicates the code and makes the break, single step and trace debug features messier.
Bear in mind that this may slow down when we start using external RAM so pushing too hard for little optimizations won't make much sense.
That heartbeat is what I use to measure the speed using the FrequencyCounter object.
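For a rough comparison with a real 4MHz Z80, the same three-instruction loop can be costed from the published Z80 T-state counts. This is a back-of-envelope sketch that assumes no wait states, and it only holds for this particular loop; other instruction mixes will give other ratios.

```python
# T-states per instruction, taken from the standard Z80 timing tables.
T_STATES = {"INC A": 4, "OUT (n),A": 11, "JP nn": 10}

CLOCK_HZ = 4_000_000   # the 4MHz Z80 asked about
MEASURED_KIPS = 384    # figure quoted in the post above

loop_t_states = sum(T_STATES.values())               # T-states for one pass of the loop
loops_per_second = CLOCK_HZ / loop_t_states
z80_kips = loops_per_second * len(T_STATES) / 1000   # instructions per second, in thousands
ratio = MEASURED_KIPS / z80_kips

print(f"4MHz Z80: {z80_kips:.0f} KIPS, emulator at {ratio:.0%} of that")
```

On these assumptions the real chip manages 480 KIPS on this loop, so 384KIPS is about 80% of a 4MHz Z80.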
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
It's a shame the 4 cog version didn't provide the gains you expected. Nice to see you are continuing the 1 cog version. Pleased to see the multiple vector approach is working (used this method in my undebugged spin interpreter to great effect).
If you are not already using a 6.0MHz xtal, this will give you an instant 20%.
Hopefully, my TriBladeProp will be away to manufacture over the weekend - you will be surprised at the extras I've done (no time to explain till it is away). Just taking a break.
Keep up the excellent work
Postedit: PropZ80 sounds better - ZiCog is great but no-one will realise what it is.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
· Home of the MultiBladeProps (SixBladeProp)
· Prop Tools under Development or Completed (Index)
· Emulators (Micros eg Altair, and Terminals eg VT100) - index
· Search the Propeller forums (via Google)
My cruising website is: www.bluemagic.biz
As for the name, I like ZiCog. Somehow it seems to be in harmony with the "Propeller" spirit and way of doing things.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
JMH
Anyway it should be in every "Software Engineering 101" course to identify your primitive operations first and then design the code to use them.
It is fortuitous that the 8080/Z80 fits this scheme so well. We only have three vectors to play with in each table entry. With the 8080 you can get away with just "source", "operation" and "destination". As most operations work on the Accumulator (the A register), e.g. "ADD A, B", it does not need to be specified in the table. Operations like "INR D" can work on any register/memory location, but here we only have one source and dest and they are always the same place.
Had the 8080 been like some CPU architectures and had two sources and a destination, like "ADD dest, src1, src2", we would be out of luck.
James: "Pnut-Emu" What the..?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I was kind of hoping, since we hadn't heard from you for a bit, that you were deep in the world of coding. The new code is very elegant. Re the Z80 specific instructions, if you can do just LDIR I reckon it could do the N8VEM. How would the 64k memory interface work (and could it use the Cluso TriBlade)?
I have been deep in the world of getting myself work with a new start up, so my Propelling may have to slow down somewhat soon.
LDIR, no problem I hope.
For sure half the motivation for this "yet another total rewrite" is to accommodate external RAM interfaces. The old PropAltair version was squeezed to the limit, making it hard to find room in the COG for external RAM interface code. The four COG version made the space available but at the cost of speed and COGs and HUB RAM. Not to mention the hassle of ensuring that only one COG accesses the RAM at a time.
Cluso's TriBlade is the primary target platform just now and probably has the easiest requirements for adding external RAM driving code. Can't wait to get my hands on one.
Mike Green has a serial RAM chip solution that while slower would be amazingly quick and simple to construct hardware wise. That kind of serial access requires more COG code to shift the bits but will now fit I hope.
Leon has/had a board underway using an external RAM and a CPLD.
@Mike and Leon: How are those RAM solutions going ?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I'm sure CP/M with bank-switched memory can't handle that much RAM.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
E.G. There is "RLCA" which rotates the accumulator setting only the carry flag like an 8080. Then there is RLC A which sets zero, sign, parity and carry!!!
Things are going to get tight around here.
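The RLCA versus RLC A difference can be made concrete with a small sketch. It follows the documented Z80 flag behaviour (RLCA keeps S, Z and P/V and clears H and N; RLC A recomputes S, Z and parity) and ignores the undocumented flag bits 3 and 5. The function names are made up for the example.

```python
# Z80 flag register bit masks.
FLAG_C, FLAG_N, FLAG_PV, FLAG_H, FLAG_Z, FLAG_S = 0x01, 0x02, 0x04, 0x10, 0x40, 0x80


def rotate_left_circular(a):
    """Rotate an 8-bit value left; bit 7 goes to both bit 0 and the carry."""
    carry = (a >> 7) & 1
    return ((a << 1) | carry) & 0xFF, carry


def rlca(a, f):
    """One-byte opcode 07: only carry is computed; S, Z and P/V are preserved."""
    result, carry = rotate_left_circular(a)
    f = (f & (FLAG_S | FLAG_Z | FLAG_PV)) | carry  # H and N end up cleared
    return result, f


def rlc_a(a, f):
    """Prefixed opcode CB 07: same rotate, but S, Z and parity are recomputed."""
    result, carry = rotate_left_circular(a)
    f = carry  # H and N cleared, everything else rebuilt from the result
    if result & 0x80:
        f |= FLAG_S
    if result == 0:
        f |= FLAG_Z
    if bin(result).count("1") % 2 == 0:
        f |= FLAG_PV  # P/V set on even parity
    return result, f


# Same input, different flags: a stale Z flag survives RLCA but not RLC A.
print(rlca(0x80, FLAG_Z))   # result 0x01, flags Z and C
print(rlc_a(0x80, FLAG_Z))  # result 0x01, flags C only
```

So the same rotate routine can be shared, but the two opcodes need different flag-setting paths, which is where the extra COG space goes.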
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
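        PUSH PSW        ; save A and flags
        MOV A,C
        ORA A           ; is the low byte of the count zero?
        JZ LDIR_LOOP
        INR B           ; no: add one outer pass for the partial first block
LDIR_LOOP:
        MOV A,M         ; A = (HL)
        STAX D          ; (DE) = A
        INX D
        INX H
        DCR C
        JNZ LDIR_LOOP   ; inner loop on the low byte of BC
        DCR B
        JNZ LDIR_LOOP   ; outer loop on the high byte of BC
        POP PSW         ; restore A and flags
        RET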
That is fun - LDIR calls a list of 8080 instructions which calls a list of prop asm instructions. This would work though maybe there are some optimisations. LDIR is very useful for block moves eg getting a block of memory out of an eprom and into ram.
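For reference, what LDIR has to do can be written as a few lines of Python. This is a sketch of the documented register behaviour only (flags are left out), with memory modelled as a 64K bytearray.

```python
def ldir(memory, hl, de, bc):
    """Copy bytes from (HL) to (DE), incrementing both, until BC counts down to 0.

    As on the real Z80, the count is decremented after the first copy,
    so BC = 0 on entry moves 65536 bytes.
    Returns the final (HL, DE, BC).
    """
    while True:
        memory[de] = memory[hl]
        hl = (hl + 1) & 0xFFFF
        de = (de + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if bc == 0:
            break
    return hl, de, bc


memory = bytearray(65536)
memory[0:4] = b"\xAA\xBB\xCC\xDD"
print(ldir(memory, 0x0000, 0x0100, 4))  # (4, 260, 0)
```

Any 8080 or PASM version can be checked against this: same bytes moved, HL and DE advanced by the count, BC left at zero.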
Cluso, 512k eh? And all those mass storage options? Might have to look at rewriting some CP/M mass storage code. Keep it simple - if I sent you a number 0 to 1million can you send me a byte at that address? If yes (and I'm sure you can but I'm not an expert in how a prop might get a byte from a microSD), then CP/M can do the track-sector-to-address calculation. Then the microSD can become a disk drive within CP/M.
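The track-sector-to-address calculation mentioned above is a one-liner. The sketch below assumes the classic 8-inch single-density geometry (77 tracks of 26 sectors, 128 bytes each) and zero-based track and sector numbers; a real BIOS may number sectors from 1 and apply sector skew first, and the function name is made up.

```python
SECTOR_SIZE = 128        # CP/M logical sector size in bytes
SECTORS_PER_TRACK = 26   # assumption: 8-inch single-density geometry
TRACKS = 77


def sector_offset(track, sector):
    """Return the byte offset of a sector within a flat disk image."""
    if not 0 <= track < TRACKS:
        raise ValueError("track out of range")
    if not 0 <= sector < SECTORS_PER_TRACK:
        raise ValueError("sector out of range")
    return (track * SECTORS_PER_TRACK + sector) * SECTOR_SIZE


print(sector_offset(1, 0))    # 3328: first sector of track 1
print(sector_offset(76, 25))  # 256128: last sector of the disk
```

The storage side then only has to answer "give me 128 bytes at this offset", which is the simple interface asked for above.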
CP/M only needs 64k. The rest could be useful for the prop but CP/M doesn't need it. Do smaller ram chips fit on the triblade?
Post Edited (Dr_Acula (James Moxham)) : 2/28/2009 11:20:29 AM GMT
Sorry Heater - I didn't mean to hijack your thread.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
· Home of the MultiBladeProps (SixBladeProp)
· Prop Tools under Development or Completed (Index)
· Emulators (Micros eg Altair, and Terminals eg VT100) - index
· Search the Propeller forums (via Google)
My cruising website is: ·www.bluemagic.biz
I will try and implement LDIR and friends as high speed PASM.
I want those 2M RAMS. Good for a 1M RAM disk and possibly a smaller one as well. Latched decoding for a RAM disk is just fine.
Cluso: You are by no means hijacking.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
The 4 COG emulator only managed 414KIPS on that! I'm not depressed any more.
70 odd LONGs free. Squeezing the last of the Z80 stuff in without a little overlay cheating may be getting a bit tight.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
In the implementation of daa I think you could compare data_8 directly against 100 ($A0) without moving it to nibble and shifting it right 4 bits (line 517 of your v 0.0). The mux instructions sure were handy!
I don't think we'd be completely out of luck if 4 pointers were needed per op-code. In that case using four 8-bit pointers could work IF the missing bit could be implied (i.e. "banking" the code into low and high sections based on an op-code sub part, or having table pointers only go to even addresses). Otherwise, the 4th pointer could just be restricted to a 5-bit sub-section of a cog's address space. Could be a royal pain to get the compiler to do this automatically. (Haven't dived that deep into assembly yet.)
Lawson
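The "even addresses only" variant of this idea is easy to sketch: store each 9-bit COG address shifted right by one, so four of them fit in a long. This is purely illustrative and not something the emulator does; the function names are invented.

```python
def pack4(addresses):
    """Pack four even 9-bit COG addresses into one 32-bit long, 8 bits each."""
    if len(addresses) != 4:
        raise ValueError("exactly four addresses expected")
    entry = 0
    for i, address in enumerate(addresses):
        if not 0 <= address < 512 or address % 2:
            raise ValueError("address must be even and below 512")
        entry |= (address >> 1) << (8 * i)  # the dropped low bit is implied zero
    return entry


def unpack4(entry):
    """Recover the four addresses by restoring the implied zero bit."""
    return [((entry >> (8 * i)) & 0xFF) << 1 for i in range(4)]


entry = pack4([0x010, 0x1FE, 0x000, 0x0A2])
print(hex(entry), [hex(a) for a in unpack4(entry)])
```

The cost is that every handler must start on an even address, which may waste a long of padding here and there.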
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Lunch cures all problems! have you had lunch?
Hey, this thing gives me a big enough headache already :)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
When the demo is run it shows the Z80 registers and steps through a short Z80 program loop when any key is hit. You can see the Z80 program in zicog_demo.spin.
The main point here is to show that Z80 ops with prefix bytes (CB, DD, FD, ED) are handled with the appropriate dispatch tables being used.
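How a prefix byte picks a dispatch table can be sketched as follows. The table names are hypothetical and this is not how zicog.spin is organised internally; it only shows the standard Z80 encoding rules, including the DD CB / FD CB forms where the displacement byte comes before the final opcode.

```python
def select_table(code, pc):
    """Return (table_name, opcode, next_pc) for the instruction at code[pc].

    next_pc points just past the opcode byte itself; operands that
    follow the opcode are left for the instruction handler to fetch.
    """
    first = code[pc]
    if first == 0xCB:                      # bit, rotate and shift group
        return "cb", code[pc + 1], pc + 2
    if first == 0xED:                      # extended group (LDIR, CPIR, ...)
        return "ed", code[pc + 1], pc + 2
    if first in (0xDD, 0xFD):              # IX / IY indexed forms
        index = "ix" if first == 0xDD else "iy"
        if code[pc + 1] == 0xCB:
            # DD CB d op: the displacement precedes the final opcode byte
            return index + "_cb", code[pc + 3], pc + 4
        return index, code[pc + 1], pc + 2
    return "base", first, pc + 1           # unprefixed, 8080-compatible table


print(select_table(bytes([0xED, 0xB0]), 0))              # LDIR
print(select_table(bytes([0xCB, 0x07]), 0))              # RLC A
print(select_table(bytes([0xDD, 0xCB, 0x05, 0x06]), 0))  # RLC (IX+5)
```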
75% of the 8080 ops have been tested but only a handful of the Z80 ops. Almost all ops have some code in place but there is work to do on the Z80 flag settings.
Despite my optimism it looks as if we are going to be a few LONGs short of getting everything in the COG. There are only 66 LONGs left. So some things will have to be farmed out to LMM or overlay. However I will keep the Z80 string moves LDIR, LDDR and string searches CPIR, CPDR as high speed PASM as they are the only Z80 ops known to be used regularly.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
ZiCog V0.1 (attached) has a version of the overlay loader shoehorned in where it is serving up overlays to handle the seldom used but long winded DAA instruction emulation. It is also catering for the 16 Z80 string move, compare, input and output instructions. LDIR and friends.
Whilst I wanted to keep the string ops in resident PASM it turns out they become quite huge. Still this may be a happy compromise as the overhead of loading a short overlay for each string op is (hopefully) not so burdensome when moving/comparing long strings. Certainly much faster than using LMM.
I have made some modifications to the overlay handler:
1. The overlay parameters are stored in a table in HUB rather than in COG.
2. Rather than use two instructions to load and execute each overlay, I have arranged to have a single function do this, which is in turn called via vectors in the dispatch tables with a parameter indicating which overlay to execute from the HUB table.
With these two changes any number of overlays can be used without eating additional COG space.
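A toy model of that arrangement, to show why the COG cost stays constant: one resident routine takes an overlay number, looks up its parameters in a HUB table and copies the code into a fixed overlay area. The addresses, sizes and table contents below are invented for the sketch and do not come from the ZiCog source.

```python
OVERLAY_BASE = 0x1C0   # hypothetical start of the overlay area in COG
OVERLAY_SIZE = 32      # hypothetical size of that area, in longs

hub = list(range(1000, 1100))   # stand-in for HUB RAM holding overlay code (longs)
cog = [0] * 512                 # COG RAM: 512 longs

# Overlay parameters live in HUB: one (hub_index, length_in_longs) row per overlay.
overlay_table = [(0, 20), (20, 12), (32, 30)]


def run_overlay(index):
    """Load overlay `index` into the COG overlay area and return its entry address.

    This one routine is all the COG needs; adding overlays only grows
    the HUB table.
    """
    start, length = overlay_table[index]
    if length > OVERLAY_SIZE:
        raise ValueError("overlay does not fit in the overlay area")
    cog[OVERLAY_BASE:OVERLAY_BASE + length] = hub[start:start + length]
    return OVERLAY_BASE  # the emulator would jump here


entry = run_overlay(1)
print(hex(entry), cog[OVERLAY_BASE:OVERLAY_BASE + 3])
```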
The attached demo runs a program of misc Z80 ops. It nicely shows LDIR moving a code block from 0000 to 0100 where it is then executed.
As usual any suggestions are welcome.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I asked about RED and it was going to cost an extra $2 per board - ouch! Obviously it's a pain for them to change.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
· Home of the MultiBladeProps: SixBladeProp, TriBladeProp
· Prop Tools under Development or Completed (Index)
· Emulators (Micros eg Altair, and Terminals eg VT100) - index
· Search the Propeller forums (via Google)
My cruising website is: www.bluemagic.biz. MultiBladeProp is: www.bluemagic.biz/cluso.htm
You know, ZiCog is not supposed to be happening. I always insisted that I did not want to make a full up Z80 emulator. It has just insinuated itself into my mind without me noticing. Now I won't rest until the EXZ80 cpu diagnostic runs flawlessly.
Shame about the red.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
This is great. How far from 'working' is this version (as noted in zicog.spin)?
I don't think that you can beat ZiCog as a project name for this :)
BTW : The MIT terms of use blurb was not in the two spin files ...
Cheers!
Paul Rowntree
I think IBM did DB2 for CP/M...
(Need to check what's in all those floppies that came with some of my CP/M machines... )
BTW: You are now officially way beyond what should be possible to do with a single COG...
What's next?
DEC PDP-11/750 ?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't visit my new website...
I hope to have all 8080/8085 opcodes working and the T8080 CPU diagnostic passing within a week from now.
Then I'll have to upgrade PropAltair to use this new emulator and get CP/M up and running.
Then it's time to finish up the z80 ops and get the EXZ80 CPU diagnostic passing.
I know I'm going to have a fight with flag settings again......
There is a bunch of Z80 ops to do with interrupts that will probably never be implemented.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
PDP-11 may have been easier :) I have no idea, but the Z80 instruction set is total chaos. No wonder most of the instructions are not used by most software.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Off the top of my head I can mention these machines:
Osborne One
KayPro (HDD inside... may contain goodies... )
Epson HX-20
Commodore SX-64.
Unfortunately, my collection is a complete shambles, and a lot of the stuff is packed away in crates so it's not that easy to find out exactly what I have...
(I'm working on fixing a better arrangement in the attic, but this takes time. A lot of time... )
And no, PDP-11 is not easy.
The thing uses virtual memory...
The only assembler I ever tried on it used OCTAL, and Source/destination was switched around, compared to most other CPUs.
(I only did one assignment on it; building a simple 'OS' that could load and run a program from 8" floppies... I think they had us do work on the PDP so that we would appreciate more the more modern tools we used on other machines... )
The reason most Z80 instructions are so little used is that many programmers began on the 8080 or 8085, and never 'upgraded' their skills or tools for the Z80. In one class I attended, we programmed the Tiki 100(CP/M-clone) which had a Z80, but the tool we used was a Borland assembler which only knew 8085, and all Z80-specific commands had to be handled as DATA, and hand-coded.
The Z80 is incredibly efficient compared to the 8085, but only if the programmer KNOWS how to use the extended command-set.
I like to disassemble old code(ROMs from assorted computers) and have seen machines that NEVER EVER use the EXX commands, not even for interrupt handling.
I've seen manual block moves...
Manually built loops to do repeated IN commands...
I have yet to see a single computer using the special interrupt mode of the Z80.
(The mode where it waits for the interrupting device to place a byte on the data bus to be combined with the Interrupt register to form a pointer into a table to locate the interrupt routine. This could potentially handle 128 different interrupting devices... AAAAARGH! )
On the machines where there's ANY use of the 0066H (NMI) interrupt address, it's usually a remnant of the debugger used during the coding stages.
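That mode (interrupt mode 2) boils down to one table lookup, sketched here with memory as a 64K bytearray. Masking off bit 0 of the supplied byte follows the documented convention that vectors are even, which is where the limit of 128 devices comes from; the function name is made up.

```python
def im2_handler_address(memory, i_register, bus_byte):
    """Return the interrupt routine address for a Z80 mode 2 interrupt.

    The I register supplies the high byte of a pointer into the vector
    table and the interrupting device supplies the low byte; the table
    entry is a 16-bit little-endian routine address.
    """
    pointer = ((i_register << 8) | (bus_byte & 0xFE)) & 0xFFFF
    low = memory[pointer]
    high = memory[(pointer + 1) & 0xFFFF]
    return (high << 8) | low


memory = bytearray(65536)
memory[0x8010] = 0x34   # table entry at 8010H points to the routine at 1234H
memory[0x8011] = 0x12
print(hex(im2_handler_address(memory, 0x80, 0x10)))  # 0x1234
```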
There exists a 'Power saving' version of the Z80, known as NSC800, and while it has the exact same command-set, it handles interrupts slightly differently, the DATA-bus is multiplexed with the address-bus(like on the 8085, killing a major speed-advantage) and most importantly, it has a register placed in the I/O range at address BBH.
I found that one in a Philips Videowriter (word-processor and Printer combined into one ugly box)
As it was localized with Norwegian menus and keyboard, I assume it was once somewhat common.
Inside the code I found JP instructions using the IY-register, so those instructions were used by some people.