Z80 CPU XBYTE interpreter

This topic is about how to use XBYTE to emulate a Z80 CPU.

Introductory spoiler: it can fit comfortably in a single cog without using any hub RAM (apart from the Z80 program of course) and all instructions can have accurate timing.
Formerly known as TonyB

Comments

  • Isn't that what we need for Mame?

    I've got an arcade cab just waiting...
    Prop Info and Apps: http://www.rayslogic.com/
  • TonyB_ wrote: »
    This topic is about how to use XBYTE to emulate a Z80 CPU.

    Introductory spoiler: it can fit comfortably in a single cog without using any hub RAM (apart from the Z80 program of course) and all instructions can have accurate timing.
    Are you implying that you've already done it?
  • Let's hope so!

    I want to work on 6502. An example would be fantastic!

    (Hey all. I have been using tools, writing some code, but not too active on the forums. That will change soon.)

    Do not taunt Happy Fun Ball! @opengeekorg ---> Be Excellent To One Another SKYPE = acuity_doug
    Parallax colors simplified: https://forums.parallax.com/discussion/123709/commented-graphics-demo-spin
  • TonyB_ Posts: 1,208
    edited 2019-03-30 - 02:49:44
    EDIT:
    Interpreter now written. As I cannot edit the first post, the latest version will be posted here instead.

    P2 Z80 CPU interpreter/emulator using XBYTE
    Version 1b attached

  • TonyB_ Posts: 1,208
    edited 2019-03-26 - 18:08:33
    I don't like ' as the comment character as I'm used to ;
    Could the tools support semi-colon too?
  • We have Z80 & CP/M 2.2 running on a P1, so it's definitely doable on P2 :smiley:
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • Here are a few instruction speeds, running flat out with no timing checks:
    Z80 effective clock
    is P2 sysclk divided by
    
    NOP			/3 
    INC/DEC rr		/4
    EX AF,AF'		/5
    LD r,r'			/6
    EX DE,HL		/6
    EXX			/6
    DAA			/15
    ADD/ADC/SUB/SBC A,r	/20
    INC/DEC r		/20
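
The table above gives each instruction's effective Z80 clock as P2 sysclk divided by a fixed number. As a rough illustration only, the following sketch turns those divisors into clock rates. The 160 MHz sysclk is an assumed example value and the function name is hypothetical; neither comes from the emulator source.

```python
# Divisors copied from the table above (running flat out, no timing checks).
DIVISORS = {
    "NOP": 3,
    "INC/DEC rr": 4,
    "EX AF,AF'": 5,
    "LD r,r'": 6,
    "EX DE,HL": 6,
    "EXX": 6,
    "DAA": 15,
    "ADD/ADC/SUB/SBC A,r": 20,
    "INC/DEC r": 20,
}


def effective_clock_hz(sysclk_hz: float, divisor: int) -> float:
    """Effective Z80 clock for one instruction type: P2 sysclk / divisor."""
    return sysclk_hz / divisor


# Assumed example sysclk, not a value stated in the table.
SYSCLK_HZ = 160_000_000

# The largest divisor bounds the speed that every listed instruction can sustain.
worst_case_hz = effective_clock_hz(SYSCLK_HZ, max(DIVISORS.values()))
```

Under that assumed sysclk, the slowest listed instructions (divisor 20) would sustain 8 MHz, while NOP alone would run far faster.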
    
  • potatohead wrote: »
    I want to work on 6502. An example would be fantastic!

    Another example is the Zog ZPU emulator. The zog_p2.spin file uses XBYTE to run ZPU instructions on the P2.

    Having a Z80 emulator would be very nice though, it's a much more widely used chip than the ZPU :).
  • @TonyB_,

    This is fantastic. For years I have wanted to run CP/M on a Propeller, but sadly the CP/M thread for the P1 is huge and lost me at the external memory implementation.

    @Cluso99 has it running on some of his P1 modules, but last time I asked him he had none for sale because of redoing his modules.

    But I agree with you that with a decent emulation the P2 could rock and roll with CP/M, and with 512 KB of hub RAM, bank switching for MP/M is possible.

    We have enough pins to connect a lot of serial terminals (could be P1Briel terminals...) and there is a quite active community around CP/M emulators and one can get tons of software for CP/M Z-80.

    I think the first C compiler running on the propeller was running under CP/M and I found even a COBOL-85 compiler for running on a Z-80 based CP/M.

    So please feel encouraged to do this.

    Mike
    I am just another Code Monkey.
    A determined coder can write COBOL programs in any language. -- Author unknown.
    Press any key to continue, any other key to quit

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.
  • TonyB_ Posts: 1,208
    edited 2019-03-27 - 19:23:45
    Thank you for the support. Not long to wait, I hope.
  • Cluso99 Posts: 15,084
    edited 2019-03-26 - 22:28:39
    Both heater (ZiCog) and Pullmoll (?) had working Z80 emulations on P1. I helped heater get CPM running and built a board (RamBlade) that had 512KB of SRAM. DrAcula also built boards. heater did not emulate all the Z80 extended opcodes but Pullmoll did. The emulations were running about equivalent of a 4MHz Z80. Wordstar, MBasic, etc were running.
    I had 8x 8MB HDD emulated on SD as FAT32 files and wrote extensions to my OS to transfer files between FAT32 and CPM. You could call CPM and return to my OS.

    BTW there is a Z80 emulation validation program somewhere on the internet.

    Once I complete my P2 spin interpreter my OS should work on P2. If you have any questions just ask.
  • TonyB_ wrote: »
    This topic is about how to use XBYTE to emulate a Z80 CPU.

    Introductory spoiler: it can fit comfortably in a single cog without using any hub RAM (apart from the Z80 program of course) and all instructions can have accurate timing.

    Sounds fabulous @TonyB_ . Now when you say "all instructions can have accurate timing", do you really mean ALL, or "almost all"? Have you been through all of them to check?

    I'm thinking accurate timing would be great for Z80 emulators running games whose timing requirements were locked to Z80 instruction timing, assuming the P2 is also set up to operate at some nice/close multiple of the original Z80 clock and the emulator can add extra cycles to match it as accurately as possible.
  • It would be great if there were sufficient cycles remaining in this Z80 emulation loop to be able to compensate for any variable hub timing introduced by the emulator during its hub memory accesses. Then I guess it could be potentially truly cycle accurate down to some minimum P2 clock frequency limit assuming this emulator is fast enough.
  • TonyB_ Posts: 1,208
    edited 2019-03-28 - 01:59:34
    rogloh wrote: »
    TonyB_ wrote: »
    This topic is about how to use XBYTE to emulate a Z80 CPU.

    Introductory spoiler: it can fit comfortably in a single cog without using any hub RAM (apart from the Z80 program of course) and all instructions can have accurate timing.

    Sounds fabulous @TonyB_ . Now when you say "all instructions can have accurate timing", do you really mean ALL, or "almost all"? Have you been through all of them to check?

    I'm thinking accurate timing would be great for Z80 emulators running games whose timing requirements were locked to Z80 instruction timing, assuming the P2 is also set up to operate at some nice/close multiple of the original Z80 clock and the emulator can add extra cycles to match it as accurately as possible.

    Yes, I have looked at every instruction. There is a register T that holds the number of T-states per instruction. It is loaded with 4 (or more if M1 wait states are needed e.g. MSX) before each opcode fetch and values are added to it if the instruction is longer than 4T, e.g. +3 for memory cycles and +4 for I/O cycles (again these could have wait states.) Some instructions require special increments, such as +5 for relative jump or IX/IY+d calculations and these are catered for.

    At the end of each instruction, T is multiplied by the number of P2 cycles in 1T, which is stored in the T1_cycles register. If the latter is non-zero there is a system count wait but if zero the wait is skipped. The current limitation is that there must be an exact integer number of P2 cycles in 1T. I think it could be a good idea to use an output port to load T1_cycles, e.g. OUT FF, to set the CPU speed or disable exact timing.
    rogloh wrote: »
    It would be great if there were sufficient cycles remaining in this Z80 emulation loop to be able to compensate for any variable hub timing introduced by the emulator during its hub memory accesses. Then I guess it could be potentially truly cycle accurate down to some minimum P2 clock frequency limit assuming this emulator is fast enough.

    There should be plenty of time to cope with hub timing and XBYTE FIFO reloading when emulating old Z80 machines. I have just finished the skip patterns and therefore the public release is imminent.
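
The T-state accounting described in this post can be modelled roughly as follows. This is only a sketch of the arithmetic, not the emulator's code: the function names and keyword parameters are hypothetical, the constants (4T opcode fetch, +3 per memory cycle, +4 per I/O cycle, special extras such as +5) are taken from the description above, and prefixed opcodes with a second M1 cycle are left out for simplicity.

```python
from typing import Optional

M1_T = 4   # T-states for the opcode fetch (more if M1 wait states are needed)
MEM_T = 3  # added per memory read/write cycle
IO_T = 4   # added per I/O cycle


def t_states(mem_cycles: int = 0, io_cycles: int = 0,
             extra: int = 0, m1_wait: int = 0) -> int:
    """Total T-states for one instruction under the accounting described above."""
    return M1_T + m1_wait + MEM_T * mem_cycles + IO_T * io_cycles + extra


def p2_wait_cycles(t: int, t1_cycles: int) -> Optional[int]:
    """P2 clocks the instruction should occupy, or None if timing is disabled.

    t1_cycles models the T1_cycles register: the (integer) number of P2
    clocks in one Z80 T-state. Zero means the system-counter wait is skipped.
    """
    if t1_cycles == 0:
        return None
    return t * t1_cycles


ld_a_hl = t_states(mem_cycles=1)            # LD A,(HL): fetch + one memory read
jr_taken = t_states(mem_cycles=1, extra=5)  # JR e (taken): fetch + offset read + 5
in_a_n = t_states(mem_cycles=1, io_cycles=1)  # IN A,(n): fetch + port read + I/O
```

With an assumed 160 MHz sysclk and a 4 MHz Z80, T1_cycles would be 40, so a 7T instruction would be padded out to 280 P2 clocks.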
  • jmg Posts: 13,614
    TonyB_ wrote: »
    Yes, I have looked at every instruction. There is a register T that holds the number of T-states per instruction. It is loaded with 4 (or more if M1 wait states are needed e.g. MSX) before each opcode fetch and values are added to it if the instruction is longer than 4T, e.g. +3 for memory cycles and +4 for I/O cycles (again these could have wait states.) Some instructions require special increments, such as +5 for relative jump or IX/IY+d calculations and these are catered for.

    At the end of each instruction, T is multiplied by the number of P2 cycles in 1T, which is stored in the T1_cycles register. If the latter is non-zero there is a system count wait but if zero the wait is skipped. The current limitation is that there must be an exact integer number of P2 cycles in 1T. I think it could be a good idea to use an output port to load T1_cycles, e.g. OUT FF, to set the CPU speed or disable exact timing.
    I think the smallest variable delay starts at 2?
    It may be useful to have a conditional build that strips out the delays-to-match-timing code entirely, to allow a 'fastest Z80' alternative.

  • I think both cycle-accurate and maximum-speed versions would be useful. On the cycle-accurate front I wonder if it would be possible to replicate a ZX80 video system. That would be hardcore proof.
  • Yeah, it would. ZX 80 / 81 are lean and mean.
  • TonyB_ wrote: »
    Yes, I have looked at every instruction.

    ...

    At the end of each instruction, T is multiplied by the number of P2 cycles in 1T, which is stored in the T1_cycles register. If the latter is non-zero there is a system count wait but if zero the wait is skipped. The current limitation is that there must be an exact integer number of P2 cycles in 1T. I think it could be a good idea to use an output port to load T1_cycles, e.g. OUT FF, to set the CPU speed or disable exact timing.

    There should be plenty of time to cope with hub timing and XBYTE FIFO reloading when emulating old Z80 machines. I have just finished the skip patterns and therefore the public release is imminent.

    Thanks @TonyB_ , this is sounding better and better than I imagined. Can't wait to see your public release soon.

    I've worked on Z80 systems in the past including my old Microbee computer I recently fitted out with my custom P1 board that could emulate the video graphics system. It now seems like soon a P2 could probably pretty well emulate the entire thing, assuming I sort out some PIO emulation for its bit bang sound / serial / cassette and joystick ports etc. I've already replaced the original floppy stuff in the BIOS with some custom SD sector read/writes and CP/M works well (now with very fast seek times).
  • rogloh Posts: 1,106
    edited 2019-03-28 - 04:51:38
    By the way TonyB_, thinking further out it would be great if there was some simple/standardized way to hook into Z80 IO port accesses so additional COGs could access data reads/writes to them, making IO device emulation in additional COGs a bit easier. On the P1 I sort of did this by mapping a portion of the Z80 IO space into P1 hub RAM that different COGs could share, but it was somewhat limited and needed a polling mechanism in Z80 software to work using ready/busy flags. It worked OK for my own disk controller but may not work out for specific devices that need to act as soon as the read or write happens. Maybe the P2 has better support for something like this by syncing/waking other COGs. You may already be thinking of this too...
  • TonyB_ Posts: 1,208
    edited 2019-03-30 - 02:34:12
    Thanks for the comments, to which I'll try to respond a little later.

    I've finished and attached version 1 of the P2 Z80 CPU interpreter/emulator using XBYTE. It is completely untested, apart from inside my head. There are about 130 longs spare. The address of the LUT code in hub RAM needs to be specified and there are probably syntax errors, so please report any mistakes. I hope the algorithms work once others have the code running.

    EDIT:
    Latest version at
    http://forums.parallax.com/discussion/comment/1467680/#Comment_1467680
  • rogloh Posts: 1,106
    edited 2019-03-29 - 21:56:52
    Looking through it now. Very nice, you've done a good job! Wow, that is quite a lot of work to put together and it looks like it would have taken a while to sort out all these instruction details and suitable skip flags etc.

    I wonder which is the slowest Z80 instruction to execute on the P2 in its T state budget? Possibly DAA? I guess that would then determine the upper bound of the emulated Z80 clock frequency if you need cycle accurate execution. Can (say) a 4MHz Z80 run on (say) a 160MHz P2 for all possible Z80 instructions? That should allow a fast 4 T state Z80 opcode almost up to 80(!) P2 instructions, minus 3 for the XBYTE overhead I'd guess. Your DAA implementation seems to be about 23 P2 instructions excluding other opcode overhead, so that seems to fit my example timing budget easily. Maybe an 8~10MHz Z80 or so is possible on a 160MHz P2?

    UPDATE: by the way TonyB_ I did see some SKIPF flags in your XBYTE LUT table that exceed their normal 22 bit range within D[31:10] if I've counted correctly (e.g. look at CP_). That seems to be a bug.
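
The budget estimate in this post can be worked through as below. It is a back-of-envelope sketch only: it assumes 2 P2 clocks per instruction and takes the guessed 3-instruction XBYTE overhead from the post at face value; the function names are hypothetical. The integer-ratio check reflects the stated limitation that 1T must be an exact integer number of P2 clocks.

```python
def p2_clocks_per_t(sysclk_hz: int, z80_hz: int) -> int:
    """P2 clocks in one Z80 T-state; must be an exact integer ratio."""
    quotient, remainder = divmod(sysclk_hz, z80_hz)
    if remainder:
        raise ValueError("sysclk must be an integer multiple of the Z80 clock")
    return quotient


def instruction_budget(sysclk_hz: int, z80_hz: int, t_states: int = 4,
                       clocks_per_instr: int = 2, overhead_instr: int = 3) -> int:
    """P2 instructions available to emulate one Z80 instruction of t_states.

    clocks_per_instr and overhead_instr are assumptions for illustration.
    """
    total_clocks = p2_clocks_per_t(sysclk_hz, z80_hz) * t_states
    return total_clocks // clocks_per_instr - overhead_instr


budget_4mhz = instruction_budget(160_000_000, 4_000_000)
budget_8mhz = instruction_budget(160_000_000, 8_000_000)
budget_10mhz = instruction_budget(160_000_000, 10_000_000)
```

Under these assumptions a 4T opcode gets 77 P2 instructions at 4 MHz, 37 at 8 MHz and 29 at 10 MHz, which is the comparison to make against the roughly 23 instructions estimated for DAA.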
  • Nice work Tony :smiley:

    I can see there are a few places where you can save some longs if it gets to that, although I only had a quick look.

    FWIW I will post my current P2 Spin Interpreter on that thread. It's not fully debugged although the basics are working. I am using the ROM to output serial debug info to PST.

    I can see you have come up with the same problem as me regarding how to specify the skip instruction documentation ;)
  • Looks good, TonyB. You have a P2 Eval board, right?
  • rogloh wrote: »
    UPDATE: by the way TonyB_ I did see some SKIPF flags in your XBYTE LUT table that exceed their normal 22 bit range within D[31:10] if I've counted correctly (e.g. look at CP_). That seems to be a bug.

    Yes, the main _ED_execf table has some errors, thanks for spotting that. There are 23 instructions that are skipped either from first to last but one or second to last. I'll correct it right away.

    P.S. I think slowest instruction is 8-bit arithmetic e.g. ADD A,r or INC/DEC r, as mentioned above:
    http://forums.parallax.com/discussion/comment/1467699/#Comment_1467699
  • Hi TonyB

    And what about the ones that need to do read-modify-write at some bit/byte contents, residing at the 64kB main memory map?
  • Yanomani wrote: »
    Hi TonyB

    And what about the ones that need to do read-modify-write at some bit/byte contents, residing at the 64kB main memory map?

    Instructions with source and destination in memory should work, e.g. INC (HL) or SET 5,(IX-7). These have an extra 1T because the modify cannot happen during the execute-fetch overlap with the next instruction.
  • TonyB_ Posts: 1,208
    edited 2019-03-30 - 02:37:06
    Bug fixed in BIT instructions. BIT_exec needs skipf #0 to clear skip bits. It does not return to skip sequence as BIT does not write to a destination.

    New version at
    http://forums.parallax.com/discussion/comment/1467680/#Comment_1467680
  • Cluso99 wrote: »
    Nice work Tony :smiley:

    I can see there are a few places where you can save some longs if it gets to that, although I only had a quick look.

    FWIW I will post my current P2 Spin Interpreter on that thread. It's not fully debugged although the basics are working. I am using the ROM to output serial debug info to PST.

    I can see you have come up with the same problem as me regarding how to specify the skip instruction documentation ;)

    Cluso,

    Please tell me wherever I can save code, sometime. I read the P1 Z80 thread from 2009 (long before I joined this forum) with great interest. My starting point was qz80.spin but the end result is pretty much a total rewrite.
  • TonyB_ wrote: »
    Cluso99 wrote: »
    Nice work Tony :smiley:

    I can see there are a few places where you can save some longs if it gets to that, although I only had a quick look.

    FWIW I will post my current P2 Spin Interpreter on that thread. It's not fully debugged although the basics are working. I am using the ROM to output serial debug info to PST.

    I can see you have come up with the same problem as me regarding how to specify the skip instruction documentation ;)

    Cluso,

    Please tell me wherever I can save code, sometime. I read the P1 Z80 thread from 2009 (long before I joined this forum) with great interest. My starting point was qz80.spin but the end result is pretty much a total rewrite.
    IIRC qz80 was PullMoll's version and it was quite complete.

    Yes, I found out that trying to shoehorn the old interpreter didn't really fit with the skipf concept even tho' I had decoding vectors already in place. The speed improvement should be dramatic :smiley:

    I saw where you can common up more code and have the common ending apply to more of your code. Cannot recall where I saw it a few times.
    Here is my mathops. Note I had to break it into 3 sections because of the 22 skip length limitation.
                                                            '$F0=AND bool    %00_0000_0000_0000_1000_0000
                                                            '$F2=OR  bool    %00_0000_0000_0000_0100_0000
    math_F02                rdlong  y,--ptrb                'popy       bin  %--_----_----_----_----_---0
                            rdlong  x,--ptrb                'popx    un+bin  %--_----_----_----_----_--0-
                            CMP     x,#0       wz           '                %--_----_----_----_----_-0--
                            MUXNZ   x,masklong              '                %--_----_----_----_----_0---
                            CMP     y,#0       wz           '                %--_----_----_----_---0_----
                            MUXNZ   y,masklong              '                %--_----_----_----_--0-_----
                            AND     x,y                     '                %--_----_----_----_-0--_----
                            OR      x,y                     '                %--_----_----_----_0---_----
                            jmp     #pushr                  'push result
    '------------------------------------------------------------------------------
    math_E0                 rdlong  y,--ptrb                'popy       bin  %--_----_----_----_----_---0
                            rdlong  x,--ptrb                'popx    un+bin  %--_----_----_----_----_--0-
                            ROR     x,y                     '$E0=ROR         %00_1111_1111_1111_1111_1000
                            ROL     x,y                     '$E1=ROL         %00_1111_1111_1111_1111_0100
                            SHR     x,y                     '$E2=SHR         %00_1111_1111_1111_1110_1100
                            SHL     x,y                     '$E3=SHL         %00_1111_1111_1111_1101_1100
                            FGES    x,y                     '$E4=FGES        %00_1111_1111_1111_1011_1100
                            FLES    x,y                     '$E5=FLES        %00_1111_1111_1111_0111_1100
                            NEG     x                       '$E6=NEG     un  %00_1111_1111_1110_1111_1101
                            NOT     x                       '$E7=NOT     un  %00_1111_1111_1101_1111_1101
                            AND     x,y                     '$E8=AND         %00_1111_1111_1011_1111_1100
                            ABS     x                       '$E9=ABS     un  %00_1111_1111_0111_1111_1101
                            OR      x,y                     '$EA=OR          %00_1111_1110_1111_1111_1100
                            XOR     x,y                     '$EB=XOR         %00_1111_1101_1111_1111_1100
                            ADD     x,y                     '$EC=ADD         %00_1111_1011_1111_1111_1100
                            SUB     x,y                     '$ED=SUB         %00_1111_0111_1111_1111_1100
                            SAR     x,y                     '$EE=SAR         %00_1110_1111_1111_1111_1100
                            REV     x                       '$EF=REV    (un) %00_1101_1111_1111_1111_1100
                            ENCOD   x                       '$F1=ENCOD   un  %00_1011_1111_1111_1111_1101
                            DECOD   x                       '$F3=DECOD   un  %00_0111_1111_1111_1111_1101
                            jmp     #pushr                  'push result
    '------------------------------------------------------------------------------
    math_F4                 rdlong  y,--ptrb                'popy       bin  %--_----_----_----_----_---0
                            rdlong  x,--ptrb                'popx    un+bin  %--_----_----_----_----_--0-
                            QMUL    x,y                     '$F4=MPY         %00_0111_1111_1111_1111_0000
                            GETQX   x                       '                                         **  
                            QMUL    x,y                     '$F5=MPY_MSW     %00_0111_1111_1111_1100_1100
                            GETQY   x                       '                                      **     
                            QDIV    x,y                     '$F6=DIV         %00_0111_1111_1111_0011_1100
                            GETQX   x                       '                                    **       
                            QDIV    x,y                     '$F7=MOD         %00_0111_1111_1100_1111_1100
                            GETQY   x                       '                                 **          
                            QSQRT   x,#0                    '$F8=SQRT    un  %00_0111_1111_0011_1111_1101     
                            GETQX   x                       '                               **            
    '$F9..$FE=test signed LT/GT/NE/EQ/LE/GE                 '$F9..$FE        %00_0010_0000_1111_1111_1100   'LT/GT/NE/EQ/LE/GE
                            CMPS    x,y     wcz             '                             *                 'tests               
            if_z            MOV     x,#%100                 '                            *                  'equal?              
            if_nz           MOV     x,#%010                 '                           *                   'above?              
            if_c            MOV     x,#%001                 '                          *                    'below?              
                            ANDN    x,a     wz              '                        *                      'set t/f             
                            CMP     y,#0    wz              '$FF=NOT bool un %00_0001_1111_1111_1111_1100   '!t/f      
    {pushtf}                MUXZ    x,masklong              '                      *                        'Z=true(-1)/false(0)
                            jmp     #pushr                  'push result
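
To make the skip-pattern comments in the listing above easier to read, here is a small sketch that decodes two of the math_E0 patterns. It assumes the usual SKIPF convention that bit 0 applies to the first instruction and a 1 bit means "skip"; the helper names are hypothetical and the instruction list is simply transcribed from the math_E0 block.

```python
# Instructions of the math_E0 block, in order, transcribed from the listing.
MATH_E0 = [
    "rdlong y,--ptrb", "rdlong x,--ptrb",
    "ROR x,y", "ROL x,y", "SHR x,y", "SHL x,y", "FGES x,y", "FLES x,y",
    "NEG x", "NOT x", "AND x,y", "ABS x", "OR x,y", "XOR x,y",
    "ADD x,y", "SUB x,y", "SAR x,y", "REV x", "ENCOD x", "DECOD x",
    "jmp #pushr",
]

SKIP_BITS = 22  # skip-pattern width available in an XBYTE LUT entry, D[31:10]


def fits_skip_field(pattern: int) -> bool:
    """True if the pattern fits in the 22-bit skip field."""
    return 0 <= pattern < (1 << SKIP_BITS)


def executed(pattern: int, instructions: list) -> list:
    """Instructions that run: bit i set means instruction i is skipped."""
    return [ins for i, ins in enumerate(instructions) if not (pattern >> i) & 1]


ROR_PATTERN = 0b00_1111_1111_1111_1111_1000  # $E0=ROR (binary operator)
NEG_PATTERN = 0b00_1111_1111_1110_1111_1101  # $E6=NEG (unary: y is not popped)

ror_path = executed(ROR_PATTERN, MATH_E0)
neg_path = executed(NEG_PATTERN, MATH_E0)
```

Read this way, the ROR pattern pops y and x, runs ROR and falls through to the jmp, while the NEG pattern skips the y pop. A block with more than 22 skippable instructions cannot be described by one pattern, which is why the listing is split into three sections.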
    