Z80 CPU XBYTE interpreter

Comments

  • And here is another piece where single instructions are executed, plus some other common bits:
                                                            '                  write     p000000-           %00_0000_0111_1111_1111_1110
                                                            '  +1:2 adr s=pops repeat    p0000s1-           (see below)
                                                            '      rnd fwd     ?var      p00010--           %00_0000_0101_1111_1111_1001
                                                            '      rnd rev     var?      p00011--           %00_0000_0101_1111_1111_0001
                                                            '      signx byte  ~var      p00100--           %00_0000_0101_1111_1110_1101
                                                            '      signx word  ~~var     p00101--           %00_0000_0101_1111_1101_1101
                                                            '      post-clear  var~      p00110--           %00_0000_0111_1111_1011_1101
                                                            '      post-set    var~~     p00111--           %00_0000_0111_1111_0111_1101
                                                            '      00=bit/adr  ++var     p0100ss-           %00_0000_010*_**10_1111_1101
                                                            '      01=byte     var++     p0101ss-           %00_0000_010*_**10_1111_1101
                                                            '      10=word     --var     p0110ss-           %00_0000_010*_**01_1111_1101
                                                            '      11=long     var--     p0111ss-           %00_0000_010*_**01_1111_1101
    assignx                                                                                                                                    
    .writep                 rdlong  x,--ptrb                'popx: write w/push                             %--_----_----_----_----_---0
    rdmem                   rdlong  x,adr                   'MEM read var            byte/word/long???      %--_----_----_----_----_--0-  ?????
    .rnd                    getrnd  x                       'get random value                               %--_----_----_----_----_-0--
                            rev     x                       'rev                                            %--_----_----_----_----_0---
    .sxcs                   signx   x,#7                    '~var  sign xtnd byte                           %--_----_----_----_---0_----
                            signx   x,#15                   '~~var sign xtnd word                           %--_----_----_----_--0-_----
                            mov     x,#0                    'var~                                           %--_----_----_----_-0--_---- 
                            neg     x,#1                    'var~~                                          %--_----_----_----_0---_---- 
    .incdec                 add     x,#1                    '++var/var++                                    %--_----_----_---0_----_----
                            sub     x,#1                    '--var/var--                                    %--_----_----_--0-_----_----
                            zerox   x,adr                   'mask result by size: adr   -1 ?????            %--_----_----_-0--_----_----  ?????
                            and     x,#$FF                  '                     byte                      %--_----_----_0---_----_----
                            and     x,maskword              '                     word                      %--_----_---0_----_----_----
    .stack                  wrlong  x,ptrb                  'update var on stack                            %--_----_--0-_----_----_----
                            add     ptrb,#4                 'keep var on stack                              %--_----_-0--_----_----_----  <<<<<
    .keep                   test    op2,#%10000000  wc      '                                               %--_----_0---_----_----_----
            if_c            add     ptrb,#4                 'keep var on stack                              %--_---0_----_----_----_----
    .restore                nop                             'restore pushret to #loop                       %--_--0-_----_----_----_----
    wrmem                   wrlong  x,adr                   'MEM write var           byte/word/long???      %--_-0--_----_----_----_----  ?????
                            jmp     #loop                   '                                               $0
    
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • I've found this method to be better for constructing the skip bitmap:
    '  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    '  % 9  $24   SPR[nibble]                       x                        push %                 j9_0 r  %--_----_----_1111_0100_0111
    '  %    $25   SPR[nibble]                       x                        pop  %                 j9_1 w  %--_----_----_1111_0100_0111
    '  %    $26   SPR[nibble]                       x                        asgn %       ?????     j9_2 a  %--_----_----_1111_0100_0111
    '  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%                                             
    '  % F  $3D   register[bit] op                  x                             %                 jF_1 b  %--_----_----_0000_1001_1001
    '  %    $3E   register[bit..bit] op             y,x                           %                 jF_2 r  %--_----_----_0000_1001_1100
    '  %    $3F   register op                       ---         push/pop/asgn/adr %       ?????     jF_3 o  %--_----_----_1000_1100_0111
    '  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    j9_012                                                  'x=reg $1Fx                       j9...  jF...
    jF_123                                                  'yx=[b..b] pcurr=op+reg           0 1 2  1 2 3
    
    {jF_2}                  rdlong  y,--ptrb                'popy:  bit                       - - -  - r -  %--_----_----_----_----_---0
    {jF_2 jF_1}             rdlong  x,--ptrb                'popx:  bit                       - - -  b r -  %--_----_----_----_----_--0-
         {jF_1}             mov     y,x                     'if single bit range              - - -  b - -  %--_----_----_----_----_-0--
    {j9_012 jF_3}           mov     lsb,#0                  '\ bits[31:0]                     r w a  - - o  %--_----_----_----_----_0---
                            neg     a,#1                    '/ ..mask=1's                     r w a  - - o  %--_----_----_----_---0_----
                            mov     skip_rev,#3             'preset no reverse                r w a  b r o  %--_----_----_----_--0-_----
    {jF_12}                 call    #.bit_range             'set a=bit-mask, lsb=lowest-bit   - - -  b r -  %--_----_----_----_-0--_----   
    
    {j9_012}                rdlong  r,--ptrb                'popx:  reg                       r w a  - - -  %--_----_----_----_0---_----
    {jF_123}                rdbyte  r,ptra++                'pcurr: reg+op                    - - -  b r o  %--_----_----_---0_----_----
                            mov     op,r                    '\ justify op                     - - -  b r o  %--_----_----_--0-_----_----
                            shr     op,#5                   '/   (sets type to register)      - - -  b r o  %--_----_----_-0--_----_----
    {jF_12}                 or      op,#%1100               'set bit mode                     - - -  b r -  %--_----_----_0---_----_----   ?????
    
              
                    call    #_debugreg1
                            
    {j9_012 jF_123}         call    #.reg_xlate             'translate to P2 regs             r w a  b r o  $0
              
                    call    #_debugreg2
                            
    

    I now use this for each possible variation
    x x x  x x x  %--_----_----_1111_0100_0111
    
    where "-" marks an unused bit (converted to 0 when inserted into the table) and, of course, 1 means skip and 0 means execute,
    and for each instruction in the group, this:
    r w a  b r o  %--_----_----_----_--0-_----
    
    I try to make the single alpha chars make some form of sense.
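    The column notation can be turned into skip masks mechanically. The Python sketch below is only a model and is not part of Cluso's tooling. `LINES`, `COLUMNS` and `skip_pattern` are hypothetical names. Each line of the j9/jF group carries one marker per variant: a letter means "execute" and "-" means "skip". Bit 0 of the mask is the first line, as in the listing above. Running it reproduces the %1111_0100_0111 (j9_x), %0000_1001_1001 (jF_1), %0000_1001_1100 (jF_2) and %1000_1100_0111 (jF_3) patterns shown there.

```python
# Model of the vertical-column skip notation (hypothetical helper, not the real build tool).
COLUMNS = ["j9_0", "j9_1", "j9_2", "jF_1", "jF_2", "jF_3"]

# (instruction, per-variant markers) in execution order; bit 0 = first line.
LINES = [
    ("rdlong y,--ptrb",  "- - - - r -"),
    ("rdlong x,--ptrb",  "- - - b r -"),
    ("mov y,x",          "- - - b - -"),
    ("mov lsb,#0",       "r w a - - o"),
    ("neg a,#1",         "r w a - - o"),
    ("mov skip_rev,#3",  "r w a b r o"),
    ("call #.bit_range", "- - - b r -"),
    ("rdlong r,--ptrb",  "r w a - - -"),
    ("rdbyte r,ptra++",  "- - - b r o"),
    ("mov op,r",         "- - - b r o"),
    ("shr op,#5",        "- - - b r o"),
    ("or op,#%1100",     "- - - b r -"),
]


def skip_pattern(variant: int) -> int:
    """Return the skip mask for one variant: 1 = skip, 0 = execute.
    Bits above the last listed line (the '-' fillers) stay 0."""
    mask = 0
    for bit, (_, markers) in enumerate(LINES):
        if markers.split()[variant] == "-":
            mask |= 1 << bit
    return mask


for i, name in enumerate(COLUMNS):
    print(f"{name}  %{skip_pattern(i):012b}")
```

    Keeping the markers next to each instruction means a change to one line only touches that line's markers. The masks are then regenerated rather than hand-edited.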
  • rogloh Posts: 1,293
    edited 2019-03-30 - 03:50:30
    TonyB_ wrote: »
    P.S. I think slowest instruction is 8-bit arithmetic e.g. ADD A,r or INC/DEC r, as mentioned above:
    http://forums.parallax.com/discussion/comment/1467699/#Comment_1467699

    Ok, thanks TonyB_. I see now that the Z80 flag bitwise computations required for 8-bit register ALU operations are sort of the limiting case for all those fast 4-T-state Z80 register arithmetic instructions, as they burn quite a few P2 instructions. So any optimizations found there are going to help speed things up. Most of the rest of the Z80 instruction set seems to give you plenty more T-states to play with.

    Note: I'm not saying it necessarily needs to be sped up by the way. It's going to be plenty fast for emulating a typical 4MHz Z80 IMHO. :smile:
  • TonyB_ Posts: 1,294
    edited 2019-03-30 - 04:28:53
    Cluso, I'll study your horizontal scheme for skip patterns. I don't like scrolling sideways, and my display resolution is small by modern standards. I borrowed Chip's vertical method, with a letter for each option, generally starting at "a" but sometimes something more appropriate, although "d" for DI and "e" for EI just happened that way.

    Re saving longs, some code that looks as though it could shrink by using skipping cannot do so because it is in routines that are called from skip sequences, as I mention briefly in the source. I could save one long by combining CB_ with the other prefixes, which would disappear in a version with no timing. DAA could jump elsewhere to write some flags, saving three longs perhaps, but this is scraping the bottom of the barrel. I'd be happy to be proved wrong, though.
    Formerly known as TonyB
  • rogloh wrote: »
    Note: I'm not saying it necessarily needs to be sped up by the way. It's going to be plenty fast for emulating a typical 4MHz Z80 IMHO. :smile:

    Going to need a lot of waits or slow the clock freq down to match a 4MHz Z80.
  • TonyB_ Posts: 1,294
    edited 2019-03-30 - 04:31:30
    Cluso99 wrote: »
    Going to need a lot of waits or slow the clock freq down to match a 4MHz Z80.

    Yes, 4 MHz should be easy to do. As things stand now, I'd say the minimum effective Z80 clock is 8 MHz @ 160 MHz sysclk and most instructions would run quicker. The half-carry and overflow flags take quite a long time to do for arithmetic instructions and how often would anyone test for the latter after INC or DEC?
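    This is what those 8-bit flag computations involve. The sketch is a Python model of the documented Z80 behaviour of ADD A,r / ADC A,r, including the undocumented X/Y bits that copy bits 3 and 5 of the result. It is not TonyB_'s P2 code; `add8` and the `FLAG_*` constants are illustrative names. Half-carry and overflow each need their own XOR/AND chain, which is why they are comparatively expensive to emulate.

```python
from typing import Tuple

# Z80 F register bits: S Z Y H X P/V N C
FLAG_S, FLAG_Z, FLAG_Y, FLAG_H = 0x80, 0x40, 0x20, 0x10
FLAG_X, FLAG_PV, FLAG_N, FLAG_C = 0x08, 0x04, 0x02, 0x01


def add8(a: int, b: int, carry_in: int = 0) -> Tuple[int, int]:
    """Model ADD A,r (carry_in=0) or ADC A,r (carry_in=1); return (A, F)."""
    full = a + b + carry_in
    res = full & 0xFF
    f = res & (FLAG_S | FLAG_Y | FLAG_X)   # S, Y, X copy result bits 7, 5, 3
    if res == 0:
        f |= FLAG_Z
    if (a ^ b ^ res) & 0x10:               # half-carry: carry out of bit 3
        f |= FLAG_H
    if (a ^ res) & (b ^ res) & 0x80:       # overflow: operands agree in sign, result differs
        f |= FLAG_PV
    if full > 0xFF:                        # carry out of bit 7
        f |= FLAG_C
    return res, f                          # N stays 0 for additions


a, f = add8(0x7F, 0x01)
print(f"A={a:02X} F={f:08b}")
```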
  • rogloh Posts: 1,293
    edited 2019-03-30 - 04:53:50
    Thinking more about your emulator @TonyB_ ...

    I see you have mostly left the IO instructions alone for now, with some placeholders in the code. In the future it would be nice to be able to access some Z80 IO ports, both emulated internally and potentially on real external HW devices on a bus. If there is enough space left in COG RAM, it would be good to have some mechanism to map IO ports to either another COG (or COGs) or an external bus, perhaps using some HUB memory to define port ranges and which COG to trigger, etc. I'm thinking that the external COG that emulates a given port range could be triggered by the Z80 IO reads/writes and then return a data result for reads, along with the number of I/O and/or wait cycles taken by the operation. The emulator could then wait for the result and accurately model its execution time and IO delay. Z80 peripherals such as the PIO might then be emulated internally within the P2, or real HW accessed over some P2 IO pins that emulate the Z80 IO bus. This could be rather good. Also, if the internal emulation of some peripheral ends up taking longer than real HW typically would, we could just pretend the slowdown was a result of wait states, keeping the emulator timing happy and the Z80 still functioning.

    Some internal INT signal might even be possible too...?
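    One way to picture the port-mapping idea is a table of port ranges. Each range points at a handler that returns the read data plus the extra wait states it consumed. The sketch is in Python purely to show the data flow. `PortRange`, `IoBus` and `Latch` are hypothetical, and the 8-bit port decode is an assumption. Returning $FF for unmapped ports is also an assumed convention, not something specified in the thread. On the P2, the table would presumably live in hub RAM, and the handler would be another cog signalled by the emulator.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# handler(port, data) -> (data_read, extra_wait_T); data is None for an IN (read)
Handler = Callable[[int, Optional[int]], Tuple[int, int]]


@dataclass
class PortRange:
    lo: int
    hi: int
    handler: Handler


class IoBus:
    """Hypothetical port-range dispatcher for Z80 IN/OUT emulation."""

    def __init__(self, ranges: List[PortRange]):
        self.ranges = ranges

    def access(self, port: int, data: Optional[int] = None) -> Tuple[int, int]:
        port &= 0xFF  # assumed 8-bit port decode
        for r in self.ranges:
            if r.lo <= port <= r.hi:
                return r.handler(port, data)
        return 0xFF, 0  # assumed: unmapped reads float high, no extra wait


class Latch:
    """Hypothetical peripheral: one byte register costing 1 extra wait state."""

    def __init__(self):
        self.value = 0

    def __call__(self, port: int, data: Optional[int]) -> Tuple[int, int]:
        if data is not None:
            self.value = data & 0xFF
        return self.value, 1


latch = Latch()
bus = IoBus([PortRange(0x50, 0x53, latch)])
bus.access(0x51, 0x12)   # like OUT (0x51),0x12
print(bus.access(0x50))  # IN from the same latch
print(bus.access(0x10))  # unmapped port
```

    The wait-state value returned with each access is what would let the emulator keep its T-state accounting accurate while a slower software peripheral runs.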
  • TonyB_ Posts: 1,294
    edited 2019-03-31 - 00:51:43
    rogloh,

    I haven't thought too much about I/O ports yet as I wanted to get a 'basic' emulator done (although 'basic' doesn't mean simple or trivial). Another cog could assert /RESET, /NMI or /INT by writing to 'internal' pins, probably.

    It is possible to reduce the time taken for 4T 8-bit arithmetic instructions by one-third, and they could be done in 27 P2 instructions maximum, including the XBYTE overhead. The extra for (HL) is 13 instructions worst-case (or 12 if the Z80 memory starts at hub address 0), which is less than 2T, whereas the real Z80 requires 3T. The flat-out speed table now looks like this:
    Z80 effective clock is P2 sysclk divided by:
    
    NOP                    /3
    INC/DEC rr             /4
    EX AF,AF'              /5
    LD r,r'                /6
    EX DE,HL               /6
    EXX                    /6
    DAA                    /13.5
    INC/DEC r              /13.5
    ADD/ADC/SUB/SBC A,r    /13.5
    AND/XOR/OR/CP r        /13.5
    

    Removing the timing code and speeding up the arithmetic increases the code by about 25 longs, but there are still more than 100 longs spare. The other way to go faster is to use a higher sysclk. 252 MHz (12 MHz x 21) is very close to the spec for 640x480 HDMI video and divides exactly by 3.5 and 4 (giving 72 MHz and 63 MHz respectively).
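    The divisors and the 252 MHz figure can be checked with exact arithmetic. This Python sketch assumes, as the post's accounting implies, that each P2 instruction (including the XBYTE overhead) costs 2 sysclk cycles. `effective_mhz` is an illustrative helper.

```python
from fractions import Fraction


def effective_mhz(sysclk_mhz, divisor):
    """Effective Z80 clock for a P2 sysclk and a cycles-per-T-state divisor."""
    return Fraction(sysclk_mhz) / Fraction(divisor)


SYSCLK_MHZ = 252  # 12 MHz x 21

# 27 P2 instructions x 2 clocks each, spread over a 4-T-state instruction
alu_divisor = Fraction(27 * 2, 4)

# Extra cost of an (HL) operand: 13 instructions x 2 clocks, in T-states at that divisor
hl_extra_t = Fraction(13 * 2) / alu_divisor

for divisor in (3, 4, Fraction(7, 2), 6, alu_divisor):
    print(f"/{float(divisor):<5} -> {float(effective_mhz(SYSCLK_MHZ, divisor)):6.2f} MHz")
print(f"(HL) overhead: {float(hl_extra_t):.2f} T")
```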
  • rogloh Posts: 1,293
    edited 2019-03-31 - 22:22:13
    I really like what you've done so far Tony.

    While some people would be interested in optimising this code to be as fast as possible even at the expense of true cycle accuracy, that may burn COG RAM and I do think it is still useful to keep sufficient spare memory around for further features you or others might like to add down the line such as adding IO ports/interrupt support/external memory buses/ DMA etc. Maybe two variants of the code will eventuate, one for highest performance, the other for highest compatibility with the Z80?

    Cheers,
    Roger.

    Update: also don't forget about the 24 longs used for registers, masks etc. that you have defined before the start of the COG executable code. I think they need to be factored into the COG RAM usage as well, if you hadn't already accounted for them in your number above, Tony. Right now they look like they are part of hub RAM but not COG RAM.
  • rogloh, yes, the registers and masks should be in cog RAM, and they have been included in the long count. I'm working on a separate new fast version, to go along with the existing timed version. The latter has mistakes in the block instructions (too many skip bits for EXECF, so they need rewriting using SKIPF), and I missed out a +1T for CALL and CALL cc (simply change one skip bit from 1 to 0).
  • TonyB_ Posts: 1,294
    edited 2019-04-10 - 12:23:12
    Below are some of the instruction timings that are possible for a fast Z80 interpreter.
    Instruction    T-states  P2 cycles          P2 cycles/
                             min   ave   max   T-states      Notes
    --------------------------------------------------------------
    LD r,r'           4       14          18     3.5
    LD r,n            7       18          20     2.6
    LD r,(HL)         7       33    37    40     5.2
    LD (HL),r         7       27    31    34     4.4
    LD (HL),n        10       35    39    42     3.9
    LD A,(BC)         7       25    29    32     4.1
    LD A,(DE)         7       25    29    32     4.1
    LD (BC),A         7       19    23    26     3.4
    LD (DE),A         7       19    23    26     3.4
    --------------------------------------------------------------
    LD rr,nn         10       14          16     1.4
    LD HL,(nn)       16       27    31    35     1.9
    LD rr,(nn)       20       43    47    51     2.4         rr = BC/DE/SP
    LD (nn),HL       16       21    25    29     1.6
    LD (nn),rr       20       37    41    45     2.1         rr = BC/DE/SP
    LD SP,HL          6       16          16     2.7
    PUSH rr          11       21    25    31     2.3
    POP rr           10       27    31    37     3.1
    --------------------------------------------------------------
    EX AF,AF'         4       12          12     3.0
    EX DE,HL          4       16          16     4.0
    EXX               4       16          16     4.0
    LDI              16       82    89    96     5.6
    LDIR             16       82    89    96     5.6         BC = 0 after dec
                     21       14    21    28     1.0         BC > 0 after dec
    --------------------------------------------------------------
    ADD A,r           4       46          48    11.5
    ADC A,r           4       48          50    12.0
    SUB A,r           4       46          48    11.5
    SBC A,r           4       48          48    11.5
    AND r             4       28          30     7.0
    XOR r             4       26          28     6.5
    OR r              4       26          28     6.5
    CP r              4       44          46    11.0
    ADD A,n           7       48          48     6.9
    ADC A,n           7       50          50     7.1
    SUB A,n           7       48          48     6.9
    SBC A,n           7       50          50     7.1
    AND n             7       30          30     4.3
    XOR n             7       28          28     4.0
    OR n              7       28          28     4.0
    CP n              7       46          46     6.6
    INC r             4       48          52    12.0
    DEC r             4       48          52    12.0
    --------------------------------------------------------------
    DAA               4       52          52    13.0
    CPL               4       18          18     4.5
    CCF               4       18          18     4.5
    SCF               4       14          14     3.5
    NOP               4       10          10     2.5
    DI                4       16          16     4.0
    EI                4       16          16     4.0
    --------------------------------------------------------------
    ADD HL,rr        11       38          40     3.5
    ADC HL,rr        15       62          62     4.1
    SBC HL,rr        15       62          62     4.1
    INC rr            6       12          14     2.0
    DEC rr            6       12          14     2.0
    --------------------------------------------------------------
    RLCA              4       22          22     5.5
    RRCA              4       22          22     5.5
    RLA               4       24          24     6.0
    RRA               4       24          24     6.0
    RLC r             8       66          66     8.3
    RRC r             8       66          66     8.3
    RL r              8       68          68     8.5
    RR r              8       68          68     8.5
    SLA r             8       68          68     8.5
    SRA r             8       68          68     8.5
    SLL r             8       68          68     8.5
    SRL r             8       64          64     8.0
    --------------------------------------------------------------
    BIT b,r           8       72          72     9.0
    RES b,r           8       58          58     7.3         +2 for RES b,B
    SET b,r           8       58          58     7.3         +2 for SET b,B
    --------------------------------------------------------------
    JP nn            10       28    32    35     3.2
    JP cc,nn         10       32    37    41     3.7
    JR n             12       34    38    41     3.1
    JR cc,n           7       20    21    22     3.0         not cc
                     12       40    45    49     3.8         cc
    JP (HL)           4       30    34    37     8.4
    DJNZ e            8       22          22     2.8         B = 0 after dec
                     13       42    46    49     3.5         B > 0 after dec
    --------------------------------------------------------------
    CALL nn          17       41    49    56     2.9
    CALL cc,nn       10       20    21    22     2.1         not cc
                     17       43    52    60     3.0         cc
    RET              10       41    49    56     4.9
    RET cc            5       16    17    18     3.4         not cc
                     11       45    54    62     4.9         cc
    --------------------------------------------------------------
    CB prefix         4       18          18     4.5         CB + 00-3F
                      4       20          20     5.0         CB + 40-FF
    DD prefix         4       20          20     5.0
    ED prefix         4       14          14     3.5         ED + 00-3F
                      4       18          18     4.5         ED + 40-7F
                      4       24          24     6.0         LDI/LDD/LDIR/LDDR
                      4       34          34     8.5         CPxx/INxx/OTxx
                      4       20          20     5.0         ED + 80-FF not block instruction
    FD prefix         4       20          20     5.0
    

    If register src/dest = H/L/HL, add two cycles (included in max values). If Z80 address zero = hub RAM address zero, deduct two cycles for each memory access, or eight for LDI/LDD/LDIR/LDDR. Each byte transfer in LDIR/LDDR (apart from the last one) can take on average 21 P2 cycles, the same number as Z80 T-states. If the overall P2 cycles/T-states ratio* is 10, then a P2 @ 250 MHz can emulate a Z80 @ 25 MHz, and as can be seen, overall performance should be better than that.

    * Based on average values when memory is accessed, otherwise minimum values
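    The last column is P2 cycles divided by T-states. Per the footnote, it uses the average where memory is accessed and the minimum otherwise. Below is a small Python sketch of that calculation, with a few rows copied from the table; `cycles_per_t` and `z80_equiv_mhz` are illustrative names.

```python
from fractions import Fraction


def cycles_per_t(t_states, p2_min, p2_ave=None):
    """P2 cycles per Z80 T-state: average if memory is accessed, else minimum."""
    cycles = p2_ave if p2_ave is not None else p2_min
    return Fraction(cycles, t_states)


def z80_equiv_mhz(sysclk_mhz, ratio):
    """Z80 clock that a P2 at sysclk_mhz can match for a given cycles/T ratio."""
    return Fraction(sysclk_mhz) / Fraction(ratio)


# (instruction, T-states, P2 min, P2 ave or None): a few rows from the table
ROWS = [
    ("LD r,r'", 4, 14, None),
    ("LD HL,(nn)", 16, 27, 31),
    ("ADD A,r", 4, 46, None),
    ("DAA", 4, 52, None),
    ("LDIR, BC > 0", 21, 14, 21),
]

for name, t, lo, ave in ROWS:
    print(f"{name:14s} {float(cycles_per_t(t, lo, ave)):5.2f}")

print(z80_equiv_mhz(250, 10), "MHz")
```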
  • Great work TonyB_, looks like you've pushed it even faster now. So DAA looks to be about the slowest emulated instruction in this fast Z80 emulator and divide by 13 is still plenty fast, although personally I'm still very interested in that original cycle accurate variant. Especially if other COGs can somehow synchronously emulate some IO Z80 peripherals such as the PIO in the future.
  • rogloh wrote: »
    Great work TonyB_, looks like you've pushed it even faster now. So DAA looks to be about the slowest emulated instruction in this fast Z80 emulator and divide by 13 is still plenty fast, although personally I'm still very interested in that original cycle accurate variant. Especially if other COGs can somehow synchronously emulate some IO Z80 peripherals such as the PIO in the future.

    Thanks for the support, rogloh. I think an overall divide by 8 might be realistic for the fast interpreter, intended for CP/M. The timed and fast versions have diverged somewhat, but their functionality is identical. I need to apply some new ideas from the latter to the former.

    The timed version has ~130 longs free and the fast version ~80, which might increase to ~150 and almost 100 respectively if the latter uses a fixed 64K Z80 memory starting at hub address zero.
  • rogloh Posts: 1,293
    edited 2019-04-10 - 00:25:54
    Yes I can see that keeping the Z80 memory space at hub address zero would be the fastest. I'd also expect it will still be handy to have a variant to enable you to address more than 64kB from other hub addresses as some later Z80 systems supported banked memory, once memory chips became cheap enough, though their bank selection approaches would have differed. I think for example in the case of a Z80 Microbee system it allowed the lower 32kB Z80 address space to be mapped from a pool of 128-256kB of DRAM by just writing a control byte to an IO mapped latch. Other Z80 memory bank expansion approaches might have worked slightly differently or over different address regions but achieved similar results. I guess some customization of the memory address reads/writes in your emulator will be required by people for supporting those systems, but it should still be doable at the expense of some small amount of speed. There's still plenty of performance headroom left there anyway for typical Z80 speeds of that era.
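    A bank-switched memory map like the Microbee one mostly comes down to an address-translation step in the emulator's read/write paths. The Python sketch below is hypothetical throughout. The 32 KB lower window, the hub base addresses and the way the latch value selects a bank are all assumptions for illustration, not the Microbee's actual latch layout.

```python
BANK_SIZE = 0x8000      # assumed: lower 32 KB of Z80 space is the banked window
POOL_BASE = 0x10000     # hypothetical hub address of the banked RAM pool
COMMON_BASE = 0x00000   # hypothetical hub address backing the fixed upper 32 KB


class BankedMemory:
    """Hypothetical Z80 -> hub address translation with one bank-select latch."""

    def __init__(self, pool_kb=256):
        self.num_banks = pool_kb * 1024 // BANK_SIZE
        self.bank = 0

    def write_latch(self, value):
        # Assumed behaviour: the latch value simply selects a bank number.
        self.bank = value % self.num_banks

    def hub_address(self, z80_addr):
        z80_addr &= 0xFFFF
        if z80_addr < BANK_SIZE:                # banked lower half
            return POOL_BASE + self.bank * BANK_SIZE + z80_addr
        return COMMON_BASE + z80_addr           # fixed upper half


mem = BankedMemory(pool_kb=256)
mem.write_latch(3)
print(hex(mem.hub_address(0x0100)), hex(mem.hub_address(0xF000)))
```

    Keeping the translation in one place means only the memory access paths change, at the cost of a compare and an add per access. That is the "small amount of speed" trade-off described above.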

    Actually, yesterday I was pondering a full Microbee Z80 implementation on the P2, eventually using a customized variant of this emulator that supports IO. I've already done a VGA-emulated version of the CRTC using the P1. I think if the P2, overclocked to 270MHz, can generate 576p over HDMI, that would work well with line-doubling. It would support all the scan-line video modes most people used with the 6545 CRTC on that system: 256 (16 rows x 16), 272 (17 rows x 16), 264 (24 rows x 11) and 275 (25 rows x 11). 270MHz is also a handy multiple of the original 13.5MHz crystal it used for its pixel clock and of its 3.375MHz CPU clock. Add a USB or PS/2 keyboard interface and my SD BIOS mods, and it's mostly done. A PIO, either real or ideally emulated, and interrupts would complete it fully. This would let the system talk to the outside world through its original serial/parallel/tape/squawker ports etc., or even directly through P2 pins for simplicity. Very little external HW would be needed.
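    The clock and scan-line figures in that paragraph can be checked quickly. This Python snippet only restates the numbers from the post. The 576-line check assumes that 576p means 576 active lines and that each source line is simply doubled.

```python
from fractions import Fraction

SYSCLK = Fraction(270)          # MHz, overclocked P2
PIXEL_CLOCK = Fraction(27, 2)   # 13.5 MHz original crystal
CPU_CLOCK = Fraction(27, 8)     # 3.375 MHz Z80 clock

print("sysclk / pixel clock =", SYSCLK / PIXEL_CLOCK)
print("sysclk / CPU clock   =", SYSCLK / CPU_CLOCK)

# 6545 CRTC modes: (character rows, scan lines per row)
MODES = [(16, 16), (17, 16), (24, 11), (25, 11)]
for rows, lines_per_row in MODES:
    total = rows * lines_per_row
    print(f"{rows} x {lines_per_row} = {total} lines, doubled {2 * total}, "
          f"fits 576: {2 * total <= 576}")
```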
  • If I remember correctly, MP/M used bank switching for multi-user CP/M. I think they switched the upper 32K, but it might be the lower 32K as @rogloh writes.

    I am following this thread with great interest; having CP/M and/or MP/M on a P2 would be absolutely cool.

    Enjoy!

    Mike
    I am just another Code Monkey.
    A determined coder can write COBOL programs in any language. -- Author unknown.
    Press any key to continue, any other key to quit

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.
  • Tor Posts: 1,981
    msrobots wrote: »
    If I remember correctly MP/M was using bank switching for multiple user CP/M. I think they switched the upper 32K, but it might be the lower 32K as @rogloh writes.
    Without checking, I think it must have been the lower 32K that was switched. I think MP/M works like CP/M 3 (CP/M Plus) in this respect: it needs some common OS code to be available at all times, and the CP/M (and MP/M) BDOS is in upper RAM.

  • msrobots Posts: 2,913
    edited 2019-08-05 - 02:56:02
    @TonyB_,

    I tried to compile P2_Z80_CPU_v1b.spin2 but FastSpin complains about your usage of
    @ in label names, like in
    T_INC_DEC_@HL	=	1
    
    @ is an operator; how do you compile this?

    PNut does not compile this either.

    Mike
  • msrobots Posts: 2,913
    edited 2019-08-05 - 04:10:21
    There are a lot of those @s; I'll handle them later.
    
    
    xbyte_setq	=	%100000000	'LUT index = bytecode[7:0], LUT base = $100 (no/ED prefix)
    xbyte_setq2	=	%011101010	'LUT index = bytecode[7:3], LUT base = $0E0 (CB prefix)
    
    It is % not # for binary numbers.
    
    Next, your block of registers needs to go after jmp #LUT_load:
    
    DAT
    		orgh 0
    		org 0
    Entry
    		jmp	#LUT_load			'*** follow this jump if a newcomer ***
    
    '-----------------------------------------------
    
    'Registers
    …
    '-----------------------------------------------
    
    M1T		long	T_M1				'T-states in M1 cycle: T_M1, not #T_M1 (no # in front of a long value)
    
    NOP is a reserved word and can't be a label - changed to XNOP
    CALL is a reserved word and can't be a label - changed to XCALL
    
    

    Now changing the @s, which will take some time.

    Mike
  • msrobots Posts: 2,913
    edited 2019-08-05 - 06:57:24
    OK, I am down to one line that does not compile:
    SBC_ADC_HL						'called, no return
    		add	T,#T_SBC_ADC_HL
    		testbn	opcode,#3		wz	'z = 0 for ADC, z = 1 for SBC
    		shr	opcode,#4			'01rrn010 -> 000001rr = rr + 100
    
    ->		alts	opcode,#BC-%100
    
    changing to
    
    		sub	opcode,#%100
    		alts	opcode,#BC
    
    does compile, but is that what was intended?
    
    
    		getword	data,0,#BC_word			'data = BC/DE/HL/SP
    		getword	acc,HL,#HL_word			'acc = HL
    		getbyte	temp,HL,#H_byte			'temp = old H
    		testb	AF,#C_bit		wc	'c = CF = carry in
    		sumz	acc,data			'acc + data if NF = 0, acc - data if NF = 1
    	if_c	sumz	acc,#1				'acc + CF   if NF = 0, acc - CF   if NF = 1
    		setword	acc,HL,#HL_word			'write new HL
    		ror	acc,#8				'acc = {new L, 15'b0, CF, H}
    		ror	data,#8				'data = {C/E/L/SPl, 16'b0, B/D/H/SPh}
    		call	#\add_sub_flags			'do flags, shared code with 8-bit ADC/ADD/SUB
    		jmp	#ED_done_pop
    
    C:/P2/test/P2_Z80_CPU_v1b.spin2(389) error: Source operand too big for alts
    Everything else compiles now with fastspin.

    Attached is the changed file P2_Z80_CPU_v1b.spin2, which still has this error.

    I do not really understand what this is supposed to do.

    Enjoy!

    Mike
  • TonyB_ Posts: 1,294
    edited 2019-08-05 - 16:09:40
    msrobots wrote: »
    Ok I am down to one line not compiling
    SBC_ADC_HL						'called, no return
    		add	T,#T_SBC_ADC_HL
    		testbn	opcode,#3		wz	'z = 0 for ADC, z = 1 for SBC
    		shr	opcode,#4			'01rrn010 -> 000001rr = rr + 100
    ->		alts	opcode,#BC-%100
    

    Mike, thanks for looking at and compiling the v1b source code. I have not done any P2 stuff at all since April, until today. I asked whether other people could test my code because I knew I would be totally preoccupied with demolishing and replacing parts of my house, and I have felt too tired in the evenings to do any programming.

    I've attached a modified source, v1b2. It has your corrections, except that the NOP label has been renamed NOP_ and some occurrences of '@' have been replaced by 'at_'. I think it's a pity that '@' cannot be used in labels when it is not the first character. Regarding the final error, BC-%100 = BC - 4 and therefore BC must be address 4 or higher to avoid an incorrect negative value. I have moved some registers so that it is.
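    The ALTS trick can be modelled in a few lines. As we understand the P2 ALTS instruction, it sets the S field of the next instruction to (D + S) & $1FF. An immediate #S must fit in 9 bits, which is presumably why a negative BC-%100 was rejected as "too big". The Python sketch below assumes BC, DE, HL and SP occupy consecutive cog registers starting at a hypothetical address `BC = 8`; the function names are illustrative. msrobots' two-instruction version (`sub opcode,#%100`, then `alts opcode,#BC`) selects the same register at run time, but costs one extra instruction.

```python
BC = 8  # hypothetical cog address of BC; DE, HL, SP assumed at BC+1..BC+3


def fits_alts_immediate(value):
    """An immediate #S operand must fit the 9-bit S field (0..511)."""
    return 0 <= value <= 0x1FF


def alts(d_value, s_imm):
    """Model ALTS D,#S: the next instruction's S field becomes (D + S) & $1FF."""
    if not fits_alts_immediate(s_imm):
        raise ValueError("Source operand too big for alts")
    return (d_value + s_imm) & 0x1FF


def decode_sbc_adc_hl(opcode, bc=BC):
    """ED-prefixed 01rrn010: n = 1 -> ADC HL,rr; n = 0 -> SBC HL,rr."""
    is_adc = bool(opcode & 0b1000)        # bit 3, as tested by testbn opcode,#3 wz
    reg = alts(opcode >> 4, bc - 0b100)   # 01rrn010 >> 4 = 000001rr = rr + 4
    return is_adc, reg


print(decode_sbc_adc_hl(0x4A))  # ADC HL,BC
print(decode_sbc_adc_hl(0x72))  # SBC HL,SP
```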

    I have done a fast Z80 emulator without any T-state counting that is far better than v1b when precise timing is not required, e.g. for CP/M. A lot of the code has been rewritten quite drastically and some of the concepts used could be applied to an improved timed version. Unfortunately, my main PC does not boot any more and I cannot access any files on it at the moment.

    For the time being, I'd be grateful if you and perhaps others too could try to get v1b2 running. I suggest CP/M as a possible first use, which would need a BIOS written but that could be very skeletal to begin with. When my Z80 emulator is running CP/M, I hereby declare that it shall be known as CP2M.
  • msrobots Posts: 2,913
    edited 2019-08-05 - 21:30:34
    It looks like you have not compiled this at all.

    But are you aware that stuff like this
    CB_76543_execf
    		long	RLC		|	%0_1111110			<< 10	'00-07 RLC
    		long	RRC		|	%0_111110_			<< 10	'08-0F RRC
    		long	RL		|	%0_11110__			<< 10	'10-17 RL
    		long	RR		|	%0_1110___			<< 10	'18-1F RR
    		long	SLA		|	%0_110____			<< 10	'20-27 SLA
    		long	SRA		|	%0_10_____			<< 10	'28-2F SRA
    		long	SLL		|	%0_110____			<< 10	'30-37 SLL or SL1
    		long	SRL		|	%0_0______			<< 10	'38-3F SRL
    
    is in fact THIS?
    CB_76543_execf
    		long	RLC		|	%0_1111110			<< 10	'00-07 RLC
    		long	RRC		|	%00_111110			<< 10	'08-0F RRC
    		long	RL		|	%000_11110		<< 10	'10-17 RL
    		long	RR		|	%0000_1110			<< 10	'18-1F RR
    		long	SLA		|	%00000_110			<< 10	'20-27 SLA
    		long	SRA		|	%000000_10			<< 10	'28-2F SRA
    		long	SLL		|	%00000_110			<< 10	'30-37 SLL or SL1
    		long	SRL		|	%0000000_0			<< 10	'38-3F SRL
    
    Underscores are not digits, just visual placeholders, so %111000 is the same as %__11_1_000____.
    So what you want is either
    CB_76543_execf
    		long	RLC		|	%0_1111110			<< 10	'00-07 RLC
    		long	RRC		|	%0_1111100			<< 10	'08-0F RRC
    		long	RL		|	%0_1111000			<< 10	'10-17 RL
    		long	RR		|	%0_1110000			<< 10	'18-1F RR
    		long	SLA		|	%0_1100000			<< 10	'20-27 SLA
    		long	SRA		|	%0_1000000			<< 10	'28-2F SRA
    		long	SLL		|	%0_1100000			<< 10	'30-37 SLL or SL1
    		long	SRL		|	%0_0000000			<< 10	'38-3F SRL
    
    or
    CB_76543_execf
    		long	RLC		|	%0_1111110			<< 10	'00-07 RLC
    		long	RRC		|	%0_1111101			<< 10	'08-0F RRC
    		long	RL		|	%0_1111011			<< 10	'10-17 RL
    		long	RR		|	%0_1110111			<< 10	'18-1F RR
    		long	SLA		|	%0_1101111			<< 10	'20-27 SLA
    		long	SRA		|	%0_1011111			<< 10	'28-2F SRA
    		long	SLL		|	%0_1101111			<< 10	'30-37 SLL or SL1
    		long	SRL		|	%0_0111111			<< 10	'38-3F SRL
    

    depending on whether you want to execute them.
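    A quick way to confirm that underscores in a Spin2/PASM2 binary literal are only visual separators is to strip them and compare values. The Python helper `spin_bin` below is hypothetical; it simply removes the underscores before parsing, and the pairs are rows from the first two listings above.

```python
def spin_bin(literal):
    """Parse a Spin2/PASM2-style binary literal such as '%0_111110_'.
    Underscores are visual separators only, so they are dropped before parsing."""
    if not literal.startswith("%"):
        raise ValueError("expected a % binary literal")
    return int(literal[1:].replace("_", ""), 2)


# Same rows written two ways (RRC, RL, RR, SLA, SRA, SRL): the values are identical.
pairs = [
    ("%0_111110_", "%00_111110"),
    ("%0_11110__", "%000_11110"),
    ("%0_1110___", "%0000_1110"),
    ("%0_110____", "%00000_110"),
    ("%0_10_____", "%000000_10"),
    ("%0_0______", "%0000000_0"),
]
for left, right in pairs:
    assert spin_bin(left) == spin_bin(right)
    print(f"{left:>12}  {right:>12}  value {spin_bin(left)}  << 10 = {spin_bin(left) << 10:#x}")

assert spin_bin("%111000") == spin_bin("%__11_1_000____")
```

    Because each pair is numerically identical, the rewrite changes nothing; as the follow-up post admits, the original listing was already fine.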

    Mike
    I am just another Code Monkey.
    A determined coder can write COBOL programs in any language. -- Author unknown.
    Press any key to continue, any other key to quit

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.
  • Ahh, as usual I was too stupid.

    What you wrote is OK; I just got confused by the format.

    At least I can run two instructions, 0 and 8, without problems.

    Now I need to test some Z80 hex in there, just something simple like a few NOPs and a loop, so I can see that PC stays where it should.

    What would be

    loop
    nop
    nop
    nop
    jr loop

    in hex?

    Mike
  • TonyB_ Posts: 1,294
    edited 2019-08-06 - 09:49:00

    Underscores are very helpful in skip patterns. The right alignment they offer means you can easily see which instructions are skipped in a group of bytecodes that share common code. I copied this idea from Chip.
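    Here is a Python sketch of the SKIPF rule that EXECF applies, run on the `CB_76543_execf` patterns posted earlier in the thread: bit 0 of the pattern covers the first instruction at the jump target, bit 1 the next, and so on, with 1 meaning skip; once the pattern bits run out (all zero), every following instruction executes. The layout is an assumption made for illustration: the shared shift code is modelled as consecutive instruction slots, and each entry label sits at the slot given by the number of trailing underscores in its pattern. SLL is left out because its label position cannot be read from the excerpt.

```python
def executed_slots(pattern, block_len=8):
    """Return the instruction slots (0-based, within a shared block) that run
    when EXECF jumps to this entry with the given skip pattern.

    Hypothetical layout: the entry label sits at slot p, where p is the number of
    trailing underscores (they stand for the slots of the entries before this one).
    SKIPF rule: bit i of the pattern covers slot p + i; 1 = skip, 0 = execute."""
    body = pattern.lstrip("%")
    entry = len(body) - len(body.rstrip("_"))      # trailing underscores = entry slot
    bits = int(body.replace("_", ""), 2)
    return [entry + i for i in range(block_len - entry) if ((bits >> i) & 1) == 0]


CB_PATTERNS = {
    "RLC": "%0_1111110",
    "RRC": "%0_111110_",
    "RL":  "%0_11110__",
    "RR":  "%0_1110___",
    "SLA": "%0_110____",
    "SRA": "%0_10_____",
    "SRL": "%0_0______",
}

for name, pat in CB_PATTERNS.items():
    print(f"{name:4}{pat:>12}  runs slots {executed_slots(pat)}")
```

    Under this model each entry runs its own instruction and then falls into the common code at slot 7 (in real code the shared tail would continue beyond that, since the remaining pattern bits are zero). The right-aligned columns make that convergence visible without any arithmetic.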

    Simple Z80 code:
    ;
    ;Mnemonic		 Hex
    ;
    LOOP:	NOP		;00
    	NOP		;00
    	NOP		;00
    	JR LOOP		;18 FB
    ;
    LOOP2:	JR LOOP2	;18 FE
    
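    The JR displacement is a signed byte counted from the address of the instruction after the JR, i.e. the JR's own address + 2, so JR LOOP at address 3 gives 0 - 5 = -5 = FB. The Python sketch below (helper names are illustrative) assembles the listing above and steps through it, which is one way to check that an interpreter's PC only ever visits 0 to 3 in the first loop. It handles only NOP (00) and JR (18).

```python
def jr(at, target):
    """Encode JR e (opcode 18h) placed at address `at` and jumping to `target`."""
    disp = target - (at + 2)            # relative to the address after the 2-byte JR
    if not -128 <= disp <= 127:
        raise ValueError("JR target out of range")
    return [0x18, disp & 0xFF]          # two's-complement displacement byte


# LOOP: NOP / NOP / NOP / JR LOOP, then LOOP2: JR LOOP2 at address 5
program = [0x00, 0x00, 0x00] + jr(3, 0) + jr(5, 5)


def trace(mem, pc, steps):
    """Minimal fetch/execute loop for NOP and JR only; returns the PC before each step."""
    seen = []
    for _ in range(steps):
        seen.append(pc)
        op = mem[pc]
        if op == 0x00:                  # NOP
            pc = (pc + 1) & 0xFFFF
        elif op == 0x18:                # JR e: sign-extend e, add to the next instruction's PC
            e = mem[(pc + 1) & 0xFFFF]
            e -= 256 if e & 0x80 else 0
            pc = (pc + 2 + e) & 0xFFFF
        else:
            raise NotImplementedError(f"opcode {op:02X}")
    return seen


print(" ".join(f"{b:02X}" for b in program))
print(trace(program, 0, 10))
```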
  • msrobots Posts: 2,913
    edited 2019-08-07 - 10:07:21
    Thank you, @TonyB_

    Sadly, I broke it yesterday while removing debug stuff, and of course I do not make backups at 3 in the morning while working furiously. In just 5 minutes I broke it, and I don't know where or why.

    It will take some more searching.

    Mike
  • msrobots,
    Do you have a recent backup?
    Notepad++ has a decent add-in that compares two files side by side.
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • Thanks for the testing, Mike.

    Are you doing stealth debugging or have you added your own debug stuff to my code? If the latter, there is an alternative: I could change the HALT instruction to copy the Z80 registers to hub RAM, so that another cog could see the entire Z80 state, registers and memory. A variable or constant in the source file could specify the hub register dump address, whichever is easier.

    Assume cog A is running the Z80 interpreter and cog B is debugging. When a HALT instruction is executed, cog A dumps the Z80 registers and sets a HALT flag in hub RAM. Cog B waits for this flag to be set, then clears it after displaying or outputting the registers. Cog A waits for the HALT flag to be cleared, then resumes program execution. Therefore, to examine the registers just insert a HALT instruction (hex 76). Note that the dumped PC would be the address of the instruction after the HALT but it could be decremented before writing to hub RAM to point to the HALT itself.
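    The HALT handshake described above can be sketched in Python, with two threads standing in for cog A and cog B and a `threading.Condition` standing in for polling a flag in hub RAM (on the P2 this would presumably be RDLONG/WRLONG polling instead). The `Hub` class, the register names and the tiny NOP/HALT program are hypothetical; HALT is repurposed as a breakpoint here rather than waiting for an interrupt as on a real Z80.

```python
import threading

HALT = 0x76  # Z80 HALT opcode, repurposed here as a debug breakpoint


class Hub:
    """Hypothetical stand-in for hub RAM: a register dump area plus a HALT flag."""
    def __init__(self):
        self.cond = threading.Condition()
        self.halt_flag = False
        self.dump = None


def cog_a(hub, mem, regs):
    """Interpreter cog: on HALT, dump the registers, set the flag, wait for it to clear."""
    while regs["PC"] < len(mem):
        op = mem[regs["PC"]]
        regs["PC"] += 1
        if op == HALT:
            with hub.cond:
                snapshot = dict(regs)
                snapshot["PC"] -= 1                 # make the dumped PC point at the HALT itself
                hub.dump = snapshot
                hub.halt_flag = True                # tell cog B a dump is ready
                hub.cond.notify_all()
                hub.cond.wait_for(lambda: not hub.halt_flag)  # resume once B clears it
        # any other opcode is treated as a NOP in this sketch


def cog_b(hub, halts, log):
    """Debug cog: wait for the HALT flag, record the dump, then clear the flag."""
    for _ in range(halts):
        with hub.cond:
            hub.cond.wait_for(lambda: hub.halt_flag)
            log.append(hub.dump)
            hub.halt_flag = False
            hub.cond.notify_all()


def run_demo():
    hub, log = Hub(), []
    mem = [0x00, HALT, 0x00, 0x00, HALT]
    regs = {"PC": 0, "AF": 0x1234, "BC": 0x5678}
    b = threading.Thread(target=cog_b, args=(hub, 2, log))
    a = threading.Thread(target=cog_a, args=(hub, mem, regs))
    b.start()
    a.start()
    a.join()
    b.join()
    return log


for d in run_demo():
    print(f"HALT at {d['PC']:04X}  AF={d['AF']:04X}  BC={d['BC']:04X}")
```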
  • Real programmers don't do backups. Heck, it was just some last-minute cleanup. Sorry that those two planes went down, but it was late and I had to go to my second job...

    No, actually I only have what I posted, and that was much earlier. I need to nail it down, and I might be able to do so now that you have posted that debug package on the other thread.

    Somehow I messed up the placement of the LUT tables: I get something like $4366 as cmd (in PA) when it reads 08 as the bytecode.

    I moved memory around and that broke something.

    Mike
  • Cluso99 wrote: »
    @pullmoll,
    You will appreciate the xbyte and skip/skip instructions for emulators, although from what I can see, it’s of limited use for z80 emulation.

    I disagree. A Z80 interpreter is the perfect application for XBYTE. In fact, the latter was designed carefully to make the former as quick and easy as possible. As soon as I read about XBYTE I knew it would be ideal for emulating 8-bit CPUs in general and the Z80 in particular, so I objected to any changes that would make that more difficult, and the final version is wonderful (thanks to Chip).

    How the Z80 was implemented on the P1 was very clever, but that is not how to do it on the P2. XBYTE is the way to go and no other method could be as fast or compact. I have proven on paper that all the interpreter code can fit in a single cog's register and LUT RAM, but due to lack of time I need somebody else to test and confirm it.

    A P2 CP/M BIOS would be useful, of course, no matter how the emulation is done.
    pullmoll wrote: »
    I already saw the xbyte Z80 emulator code and was impressed. I also read about skip and especially skipf in the docs, which can be utilized to dramatically reduce the COG space required to emulate orthogonal instruction sets like the 8080/8085 and Z80 are.

    Hello pullmoll,

    I based my first version of Z80 XBYTE on your qz80, thank you; however, the latest version with timing (v1b2) is almost a complete rewrite. Perhaps you could try it with your new P2 emulator? It is amazing how much code can be saved by using SKIPF/EXECF. I have written a fast version with no T-state counting, not yet published here, that is quite different from the timed version.

    I think I have found the optimum way of implementing the registers, i.e. BC & BC' in one long, DE & DE' in another, etc., as explained in the source file. There is a critical tradeoff between the size of the EXECF tables and the amount of register and LUT RAM remaining to implement the actual Z80 instructions, and different options are possible.
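    A Python sketch of one way to keep a register pair and its alternate in a single 32-bit long. Which half holds the active pair is an assumption here (low word = active, high word = primed), since the source file's exact layout is not reproduced in this thread, and the helper names are hypothetical. With that choice, swapping a pair with its alternate is just a 16-bit rotate of the long (on the P2, a `rol` by 16 would do it), and B and C are byte fields of the low word.

```python
MASK32 = 0xFFFF_FFFF


def pack(active, alternate):
    """Hypothetical layout: active pair in bits 15..0, alternate (primed) pair in bits 31..16."""
    return ((alternate & 0xFFFF) << 16) | (active & 0xFFFF)


def exx(long_reg):
    """Swap the active and alternate pairs: rotate the 32-bit long by 16 bits."""
    return ((long_reg << 16) | (long_reg >> 16)) & MASK32


def hi_byte(long_reg):   # e.g. B of the active BC
    return (long_reg >> 8) & 0xFF


def lo_byte(long_reg):   # e.g. C of the active BC
    return long_reg & 0xFF


bc = pack(0x1234, 0xABCD)          # BC = 1234h, BC' = ABCDh
print(f"{bc:08X}  B={hi_byte(bc):02X} C={lo_byte(bc):02X}")
bc = exx(bc)                       # now BC = ABCDh, BC' = 1234h
print(f"{bc:08X}  B={hi_byte(bc):02X} C={lo_byte(bc):02X}")
```

    The attraction of this packing is that EXX needs no memory traffic and no temporary register per pair; the cost is that 8-bit accesses must mask or shift within the low word.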
  • TonyB_ wrote: »
    How the Z80 was implemented on the P1 was very clever, but that is not how to do it on the P2. XBYTE is the way to go and no other method could be as fast or compact.
    XBYTE is a good balance of speed and simplicity for interpreters, but it's not always the fastest way to interpret another processor. In my experience JIT compilers outperform XBYTE by a significant amount, but at the cost of some latency and a more complicated interpreter.
  • @tonyb,
    I'm not so sure that XBYTE is best for everything on the Z80. Only time will tell, though.
    Some of the extended instructions would benefit, from what I've seen so far.

    We already have a CP/M BIOS, but that will have to wait another 2+ weeks until I am back in Oz (I'm on your home ground at the moment, in Tavistock, Devon, for the next 5 days). The BIOS is in the ZiCog thread somewhere, but that's a very long thread.

    The CP/M BIOS handles 8 x 8MB hard disks as FAT32 files in the root directory. Note that they are actually 32MB files with only the first 8MB accessible, because heater worked on the HDD table and I wasn't sure how to expand them to 32MB, if that is even possible. I never found the time to investigate this.

    BTW, I'm all for combining our efforts, so if there is anything I can help you with, just ask. I'm doing the Z80 just because it interests me. Maybe in the end it will make sense for you to do some parts and me to do others. Let's see :)