This topic is about how to use XBYTE to emulate a Z80 CPU.
Introductory spoiler: it can fit comfortably in a single cog without using any hub RAM (apart from the Z80 program of course) and all instructions can have accurate timing.
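For readers new to XBYTE, here is a rough Python model of the idea. Each bytecode indexes a lookup table, and the table entry supplies a skip pattern over a shared block of code (bit n set means step n is skipped), so several opcodes can share one routine. The opcodes (INC A = $3C, DEC A = $3D) are real Z80 encodings. The routine, the names and the flag handling (only Z is updated) are simplified illustrations, not the emulator's actual code.

```python
# Toy model of XBYTE-style dispatch: opcode -> skip pattern over a shared routine.
# Bit n of the pattern set = step n skipped (the SKIPF idea). Simplified: only
# the Z flag is modelled, unlike a real Z80 INC/DEC.

def inc_a(state):
    state["a"] = (state["a"] + 1) & 0xFF

def dec_a(state):
    state["a"] = (state["a"] - 1) & 0xFF

def set_z(state):
    state["z"] = state["a"] == 0

SHARED_ROUTINE = [inc_a, dec_a, set_z]   # steps 0, 1, 2

LUT = {
    0x3C: 0b010,   # INC A: skip step 1, so inc_a and set_z run
    0x3D: 0b001,   # DEC A: skip step 0, so dec_a and set_z run
}

def run(program, state):
    for opcode in program:                 # fetch the next bytecode
        pattern = LUT[opcode]              # look up its skip pattern
        for n, step in enumerate(SHARED_ROUTINE):
            if not (pattern >> n) & 1:     # execute only non-skipped steps
                step(state)
    return state

print(run([0x3C, 0x3C, 0x3D], {"a": 0, "z": False}))   # {'a': 1, 'z': False}
```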
This is fantastic. For years I have wanted to run CP/M on a Propeller, but sadly the CP/M thread for the P1 is huge and lost me at the external memory implementation.
@Cluso99 has it running on some of his P1 modules, but the last time I asked him he had none for sale because he was redoing his modules.
But I agree with you that with a decent emulation the P2 could rock and roll with CP/M, and with 512 KB of hub RAM, bank switching for MP/M is possible.
We have enough pins to connect a lot of serial terminals (these could be P1Briel terminals...), there is a quite active community around CP/M emulators, and tons of CP/M Z80 software is available.
I think the first C compiler to run on the Propeller ran under CP/M, and I even found a COBOL-85 compiler for Z80-based CP/M.
Both heater (ZiCog) and Pullmoll (?) had working Z80 emulations on the P1. I helped heater get CP/M running and built a board (RamBlade) that had 512 KB of SRAM. DrAcula also built boards. heater did not emulate all the extended Z80 opcodes, but Pullmoll did. The emulations ran at roughly the equivalent of a 4 MHz Z80, and WordStar, MBASIC, etc. were running.
I had eight 8 MB hard disks emulated on SD as FAT32 files and wrote extensions to my OS to transfer files between FAT32 and CP/M. You could call CP/M and return to my OS.
BTW there is a Z80 emulation validation program somewhere on the internet.
Once I complete my P2 spin interpreter my OS should work on P2. If you have any questions just ask.
Sounds fabulous @TonyB_ . Now when you say "all instructions can have accurate timing", do you really mean ALL, or "almost all"? Have you been through all of them to check?
I'm thinking accurate timing would be great for Z80 emulators running games whose timing was locked to Z80 instruction timing. That assumes the P2 is also set up to run at some nice, close multiple of the original Z80 clock, and that the emulator can add extra cycles to match it as closely as possible.
It would be great if there were enough cycles left in this Z80 emulation loop to compensate for any variable hub timing introduced by the emulator's hub memory accesses. Then I guess it could potentially be truly cycle accurate down to some minimum P2 clock frequency, assuming the emulator is fast enough.
Yes, I have looked at every instruction. There is a register T that holds the number of T-states per instruction. It is loaded with 4 (or more if M1 wait states are needed e.g. MSX) before each opcode fetch and values are added to it if the instruction is longer than 4T, e.g. +3 for memory cycles and +4 for I/O cycles (again these could have wait states.) Some instructions require special increments, such as +5 for relative jump or IX/IY+d calculations and these are catered for.
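To make the counting concrete, here is a small Python sketch that tallies T-states using the increments listed above: 4 per opcode fetch, +3 per memory cycle, +4 per I/O cycle, and +5 for a taken relative jump or an IX/IY+d calculation. The function and parameter names are made up for illustration. Counting a DD-prefixed instruction as two opcode fetches is an assumption about how the prefix would be handled. The results match the documented Z80 timings for these four instructions.

```python
# Sketch of the T-state tally described above (wait states other than the
# optional M1 wait are ignored).
M1_FETCH = 4      # T reloaded with 4 for each opcode fetch
MEM_CYCLE = 3     # +3 per memory read/write cycle
IO_CYCLE = 4      # +4 per I/O cycle
SPECIAL_5 = 5     # +5 for a taken JR or an IX/IY+d address calculation

def t_states(fetches=1, mem=0, io=0, special5=0, m1_wait=0):
    """Total T-states for one instruction built from the increments above."""
    return (fetches * (M1_FETCH + m1_wait)
            + mem * MEM_CYCLE
            + io * IO_CYCLE
            + special5 * SPECIAL_5)

print(t_states(mem=1))                          # LD A,(HL)    -> 7
print(t_states(mem=1, special5=1))              # JR e (taken) -> 12
print(t_states(mem=1, io=1))                    # OUT (n),A    -> 11
print(t_states(fetches=2, mem=2, special5=1))   # LD A,(IX+d)  -> 19
```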
At the end of each instruction, T is multiplied by the number of P2 cycles in 1T, which is stored in the T1_cycles register. If the latter is non-zero there is a system count wait but if zero the wait is skipped. The current limitation is that there must be an exact integer number of P2 cycles in 1T. I think it could be a good idea to use an output port to load T1_cycles, e.g. OUT FF, to set the CPU speed or disable exact timing.
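The end-of-instruction arithmetic can be sketched in Python as follows. Apart from T and T1_cycles, the names are hypothetical. Modelling the "system count wait" as an advancing absolute target is an assumption about how it would be coded. The integer check reproduces the current limitation: for example, a 3.579545 MHz MSX-style clock does not divide evenly into a 180 MHz P2 clock.

```python
def t1_cycles_for(p2_hz, z80_hz):
    """P2 clocks per Z80 T-state; must be an exact integer (current limitation)."""
    if p2_hz % z80_hz:
        raise ValueError(f"{p2_hz} Hz is not an integer multiple of {z80_hz} Hz")
    return p2_hz // z80_hz

def next_target(target, t, t1_cycles):
    """End-of-instruction step: advance the wait target by T * T1_cycles.
    Returns None when T1_cycles is 0, meaning the wait is skipped."""
    if t1_cycles == 0:
        return None
    return target + t * t1_cycles

T1_cycles = t1_cycles_for(160_000_000, 4_000_000)   # 40 clocks per T-state
print(next_target(0, 7, T1_cycles))                  # 7T instruction -> 280
```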
I think the smallest variable delay is 2 clocks?
It may be useful to have a conditional build that strips out the timing-match delay code entirely, to allow a 'fastest Z80' alternative?
I think both cycle-accurate and maximum-speed versions would be useful. On the cycle-accurate front I wonder if it would be possible to replicate a ZX80 video system. That would be hardcore proof.
There should be plenty of time to cope with hub timing and XBYTE FIFO reloading when emulating old Z80 machines. I have just finished the skip patterns and therefore the public release is imminent.
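This Python sketch shows why hub-timing jitter need not cost accuracy. If each instruction waits for an absolute target (advanced by T * T1_cycles) instead of a fixed delay, variable emulation time is absorbed, provided the work finishes before the target. The per-instruction costs below are invented numbers, not measurements of the emulator.

```python
def simulate(instructions, t1_cycles):
    """instructions: (t_states, emulation_cost_in_p2_clocks) pairs.
    Returns the P2 clock count at which each emulated instruction completes."""
    now = 0
    target = 0
    completions = []
    for t, cost in instructions:
        target += t * t1_cycles            # absolute deadline for this instruction
        now += cost                        # emulation work, including hub jitter
        if now > target:
            raise RuntimeError("emulation overran its T-state budget")
        now = target                       # wait for the system counter to reach it
        completions.append(now)
    return completions

# A 4T instruction costing 60 or 90 clocks still completes on the 160-clock grid.
print(simulate([(4, 60), (4, 90), (7, 150)], 40))   # [160, 320, 600]
```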
Thanks @TonyB_ , this is sounding better and better than I imagined. Can't wait to see your public release soon.
I've worked on Z80 systems in the past, including my old Microbee computer, which I recently fitted with my custom P1 board that could emulate the video graphics system. It now seems a P2 could soon emulate the entire thing pretty well, assuming I sort out some PIO emulation for its bit-bang sound / serial / cassette and joystick ports etc. I've already replaced the original floppy code in the BIOS with custom SD sector reads/writes, and CP/M works well (now with very fast seek times).
By the way TonyB_, thinking further ahead, it would be great if there were a simple, standardized way to hook into Z80 I/O port accesses. Additional cogs could then see the data read from or written to those ports, which would make I/O device emulation in other cogs a bit easier. On the P1 I sort of did this by mapping a portion of the Z80 I/O space into hub RAM that different cogs could share. However, it was somewhat limited, and the Z80 software had to poll ready/busy flags. It worked OK for my own disk controller but may not work for specific devices that need to act as soon as the read or write happens. Maybe the P2 has better support for this by syncing/waking other cogs. You may already be thinking of this too...
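One possible shape for such a hook, sketched in Python: the core calls a per-port handler on every IN or OUT, so a device reacts at the moment of the access instead of being polled. On a P2 the handler might instead be another cog signalled through a hub mailbox and COGATN/WAITATN. That mapping is an assumption for illustration, as are the class and method names, decoding only the low 8 bits of the port address, and returning $FF for unmapped ports.

```python
class PortBus:
    """Hypothetical dispatch of Z80 IN/OUT accesses to device handlers."""

    def __init__(self):
        self.readers = {}   # port -> callable returning a byte
        self.writers = {}   # port -> callable taking a byte

    def attach(self, port, on_in=None, on_out=None):
        if on_in is not None:
            self.readers[port & 0xFF] = on_in
        if on_out is not None:
            self.writers[port & 0xFF] = on_out

    def io_out(self, port, value):
        handler = self.writers.get(port & 0xFF)
        if handler is not None:
            handler(value & 0xFF)           # device sees the write immediately

    def io_in(self, port):
        handler = self.readers.get(port & 0xFF)
        if handler is None:
            return 0xFF                     # assumed value for an unmapped port
        return handler() & 0xFF

# Usage: a toy serial transmitter on port $01 and a status register on port $02.
sent = []
bus = PortBus()
bus.attach(0x01, on_out=sent.append)
bus.attach(0x02, on_in=lambda: 0x80)
bus.io_out(0x01, ord("A"))
print(sent, hex(bus.io_in(0x02)), hex(bus.io_in(0x10)))   # [65] 0x80 0xff
```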
Looking through it now. Very nice, you've done a good job! Wow, that is quite a lot of work to put together and it looks like it would have taken a while to sort out all these instruction details and suitable skip flags etc.
I wonder which Z80 instruction is the slowest to execute on the P2 relative to its T-state budget? Possibly DAA? I guess that would then determine the upper bound of the emulated Z80 clock frequency if you need cycle-accurate execution. Can (say) a 4 MHz Z80 run on (say) a 160 MHz P2 for all possible Z80 instructions? That should allow a fast 4 T-state Z80 opcode up to almost 80(!) P2 instructions, minus 3 for the XBYTE overhead, I'd guess. Your DAA implementation seems to be about 23 P2 instructions excluding other opcode overhead, so it fits my example timing budget easily. Maybe an 8-10 MHz Z80 or so is possible on a 160 MHz P2?
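The estimate above can be checked with a few lines of Python. The 2-clocks-per-instruction figure and the 3-instruction XBYTE overhead are the same rough assumptions as in the post, and the sketch ignores hub accesses and the timing wait itself. With those caveats, a 10 MHz Z80 on a 160 MHz P2 still leaves 29 instructions for a 4T opcode, which is more than the roughly 23 estimated for DAA.

```python
def budget(p2_hz, z80_hz, t_states, overhead_instrs=3):
    """Return (P2 clocks per Z80 instruction, P2 instructions left after overhead),
    assuming 2 clocks per P2 instruction (cog-RAM ALU ops; hub ops take longer)."""
    clocks = p2_hz * t_states // z80_hz
    return clocks, clocks // 2 - overhead_instrs

print(budget(160_000_000, 4_000_000, 4))    # (160, 77)
print(budget(160_000_000, 10_000_000, 4))   # (64, 29)
```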
UPDATE: by the way TonyB_, I did see some SKIPF patterns in your XBYTE LUT table that exceed their normal 22-bit range within D[31:10], if I've counted correctly (e.g. look at CP_). That seems to be a bug.
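A build-time check like the following Python sketch would catch that class of bug. It follows the post in placing the SKIPF pattern in D[31:10] (22 bits). Treating the low 10 bits as the routine address is an assumption about the LUT entry layout, and the function name is hypothetical.

```python
PATTERN_BITS = 22   # D[31:10]
ADDR_BITS = 10      # D[9:0], assumed routine address

def lut_entry(pattern, address):
    """Pack a skip pattern and routine address into one 32-bit LUT long,
    refusing values that would overflow their fields."""
    if not 0 <= pattern < (1 << PATTERN_BITS):
        raise ValueError(f"skip pattern {pattern:#b} does not fit in {PATTERN_BITS} bits")
    if not 0 <= address < (1 << ADDR_BITS):
        raise ValueError(f"address {address:#x} does not fit in {ADDR_BITS} bits")
    return (pattern << ADDR_BITS) | address

print(hex(lut_entry((1 << 22) - 1, 0x3FF)))   # 0xffffffff
```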
I can see there are a few places where you can save some longs if it gets to that, although I only had a quick look.
FWIW I will post my current P2 Spin interpreter on that thread. It's not fully debugged, although the basics are working. I am using the ROM to output serial debug info to PST.
I can see you have run into the same problem as me regarding how to document the skip patterns.
Cluso,
Please tell me wherever I can save code, sometime. I read the P1 Z80 thread from 2009 (long before I joined this forum) with great interest. My starting point was qz80.spin but the end result is pretty much a total rewrite.
IIRC qz80 was PullMoll's version and it was quite complete.
Yes, I found that trying to shoehorn the old interpreter in didn't really fit the SKIPF concept, even though I already had decoding vectors in place. The speed improvement should be dramatic.
I saw a few places where you could share more code and have the common ending apply to more of your routines, but I can't recall exactly where.
Here are my mathops. Note that I had to break them into 3 sections because of the 22-bit skip pattern limit.
'$F0=AND bool %00_0000_0000_0000_1000_0000
'$F2=OR bool %00_0000_0000_0000_0100_0000
math_F02 rdlong y,--ptrb 'popy bin %--_----_----_----_----_---0
rdlong x,--ptrb 'popx un+bin %--_----_----_----_----_--0-
CMP x,#0 wz ' %--_----_----_----_----_-0--
MUXNZ x,masklong ' %--_----_----_----_----_0---
CMP y,#0 wz ' %--_----_----_----_---0_----
MUXNZ y,masklong ' %--_----_----_----_--0-_----
AND x,y ' %--_----_----_----_-0--_----
OR x,y ' %--_----_----_----_0---_----
jmp #pushr 'push result
'------------------------------------------------------------------------------
math_E0 rdlong y,--ptrb 'popy bin %--_----_----_----_----_---0
rdlong x,--ptrb 'popx un+bin %--_----_----_----_----_--0-
ROR x,y '$E0=ROR %00_1111_1111_1111_1111_1000
ROL x,y '$E1=ROL %00_1111_1111_1111_1111_0100
SHR x,y '$E2=SHR %00_1111_1111_1111_1110_1100
SHL x,y '$E3=SHL %00_1111_1111_1111_1101_1100
FGES x,y '$E4=FGES %00_1111_1111_1111_1011_1100
FLES x,y '$E5=FLES %00_1111_1111_1111_0111_1100
NEG x '$E6=NEG un %00_1111_1111_1110_1111_1101
NOT x '$E7=NOT un %00_1111_1111_1101_1111_1101
AND x,y '$E8=AND %00_1111_1111_1011_1111_1100
ABS x '$E9=ABS un %00_1111_1111_0111_1111_1101
OR x,y '$EA=OR %00_1111_1110_1111_1111_1100
XOR x,y '$EB=XOR %00_1111_1101_1111_1111_1100
ADD x,y '$EC=ADD %00_1111_1011_1111_1111_1100
SUB x,y '$ED=SUB %00_1111_0111_1111_1111_1100
SAR x,y '$EE=SAR %00_1110_1111_1111_1111_1100
REV x '$EF=REV (un) %00_1101_1111_1111_1111_1100
ENCOD x '$F1=ENCOD un %00_1011_1111_1111_1111_1101
DECOD x '$F3=DECOD un %00_0111_1111_1111_1111_1101
jmp #pushr 'push result
'------------------------------------------------------------------------------
math_F4 rdlong y,--ptrb 'popy bin %--_----_----_----_----_---0
rdlong x,--ptrb 'popx un+bin %--_----_----_----_----_--0-
QMUL x,y '$F4=MPY %00_0111_1111_1111_1111_0000
GETQX x ' **
QMUL x,y '$F5=MPY_MSW %00_0111_1111_1111_1100_1100
GETQY x ' **
QDIV x,y '$F6=DIV %00_0111_1111_1111_0011_1100
GETQX x ' **
QDIV x,y '$F7=MOD %00_0111_1111_1100_1111_1100
GETQY x ' **
QSQRT x,#0 '$F8=SQRT un %00_0111_1111_0011_1111_1101
GETQX x ' **
'$F9..$FE=test signed LT/GT/NE/EQ/LE/GE '$F9..$FE %00_0010_0000_1111_1111_1100 'LT/GT/NE/EQ/LE/GE
CMPS x,y wcz ' * 'tests
if_z MOV x,#%100 ' * 'equal?
if_nz MOV x,#%010 ' * 'above?
if_c MOV x,#%001 ' * 'below?
ANDN x,a wz ' * 'set t/f
CMP x,#0 wz '$FF=NOT bool un %00_0001_1111_1111_1111_1101 '!t/f
{pushtf} MUXZ x,masklong ' * 'Z=true(-1)/false(0)
jmp #pushr 'push result
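The patterns in the listing can be generated instead of counted by hand. In this Python sketch (function name hypothetical), instructions are numbered from the section label, and bit n set means instruction n is skipped. Any instruction that would need a skip bit beyond bit 21 is rejected, which is why the mathops are split into three sections. The checks reproduce patterns from the listing, such as $E0 (ROR) and $F0 (AND bool).

```python
def skip_pattern(executed, section_length, width=22):
    """Skip pattern for one opcode: bit n = 1 skips instruction n of the section."""
    pattern = 0
    for n in range(section_length):
        if n not in executed:
            if n >= width:
                raise ValueError(f"instruction {n} needs a skip bit beyond bit {width - 1}")
            pattern |= 1 << n
    return pattern

# math_E0 has 21 instructions (0..20); $E0=ROR runs popy, popx, ROR and the final jmp.
print(hex(skip_pattern({0, 1, 2, 20}, 21)))   # 0xffff8
```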
I've got an arcade cab just waiting...
I want to work on a 6502 emulator. An example would be fantastic!
(Hey all. I have been using tools, writing some code, but not too active on the forums. That will change soon.)
Another example is the Zog ZPU emulator. The zog_p2.spin file uses XBYTE to run ZPU instructions on the P2.
Having a Z80 emulator would be very nice though; it's a much more widely used chip than the ZPU.
So please feel encouraged to do this.
Mike
And what about the instructions that need to do a read-modify-write of bit or byte contents in the 64 KB main memory map?
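As a rough illustration of such read-modify-write instructions, here is a Python sketch of INC (HL) and SET b,(HL) on a flat 64 KB memory, with flags omitted. The real timings are 11 and 15 T-states. Splitting them into the usual +3 per memory cycle plus a +1 internal cycle before the write is an assumption about how the emulator would tally them, and the function names are hypothetical.

```python
def inc_hl_indirect(mem, hl):
    """INC (HL): returns (new byte, T-states). Flags not modelled."""
    addr = hl & 0xFFFF
    t = 4                           # opcode fetch
    value = mem[addr]
    t += 3                          # memory read
    t += 1                          # internal cycle before the write
    value = (value + 1) & 0xFF
    mem[addr] = value
    t += 3                          # memory write
    return value, t

def set_bit_hl_indirect(mem, hl, bit):
    """SET b,(HL): returns (new byte, T-states)."""
    addr = hl & 0xFFFF
    t = 4 + 4                       # CB prefix fetch + opcode fetch
    value = mem[addr] | (1 << bit)
    t += 3 + 1                      # memory read + internal cycle
    mem[addr] = value
    t += 3                          # memory write
    return value, t

mem = bytearray(0x10000)            # 64 KB Z80 address space
mem[0x8000] = 0xFF
print(inc_hl_indirect(mem, 0x8000), set_bit_hl_indirect(mem, 0x8000, 7))
# (0, 11) (128, 15)
```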