Edited.


Comments

  • And here is another piece where single instructions are executed plus other common bits
                                                            '                  write     p000000-           %00_0000_0111_1111_1111_1110
                                                            '  +1:2 adr s=pops repeat    p0000s1-           (see below)
                                                            '      rnd fwd     ?var      p00010--           %00_0000_0101_1111_1111_1001
                                                            '      rnd rev     var?      p00011--           %00_0000_0101_1111_1111_0001
                                                            '      signx byte  ~var      p00100--           %00_0000_0101_1111_1110_1101
                                                            '      signx word  ~~var     p00101--           %00_0000_0101_1111_1101_1101
                                                            '      post-clear  var~      p00110--           %00_0000_0111_1111_1011_1101
                                                            '      post-set    var~~     p00111--           %00_0000_0111_1111_0111_1101
                                                            '      00=bit/adr  ++var     p0100ss-           %00_0000_010*_**10_1111_1101
                                                            '      01=byte     var++     p0101ss-           %00_0000_010*_**10_1111_1101
                                                            '      10=word     --var     p0110ss-           %00_0000_010*_**01_1111_1101
                                                            '      11=long     var--     p0111ss-           %00_0000_010*_**01_1111_1101
    assignx                                                                                                                                    
    .writep                 rdlong  x,--ptrb                'popx: write w/push                             %--_----_----_----_----_---0
    rdmem                   rdlong  x,adr                   'MEM read var            byte/word/long???      %--_----_----_----_----_--0-  ?????
    .rnd                    getrnd  x                       'get random value                               %--_----_----_----_----_-0--
                            rev     x                       'rev                                            %--_----_----_----_----_0---
    .sxcs                   signx   x,#7                    '~var  sign xtnd byte                           %--_----_----_----_---0_----
                            signx   x,#15                   '~~var sign xtnd word                           %--_----_----_----_--0-_----
                            mov     x,#0                    'var~                                           %--_----_----_----_-0--_---- 
                            neg     x,#1                    'var~~                                          %--_----_----_----_0---_---- 
    .incdec                 add     x,#1                    '++var/var++                                    %--_----_----_---0_----_----
                            sub     x,#1                    '--var/var--                                    %--_----_----_--0-_----_----
                            zerox   x,adr                   'mask result by size: adr   -1 ?????            %--_----_----_-0--_----_----  ?????
                            and     x,#$FF                  '                     byte                      %--_----_----_0---_----_----
                            and     x,maskword              '                     word                      %--_----_---0_----_----_----
    .stack                  wrlong  x,ptrb                  'update var on stack                            %--_----_--0-_----_----_----
                            add     ptrb,#4                 'keep var on stack                              %--_----_-0--_----_----_----  <<<<<
    .keep                   test    op2,#%10000000  wc      '                                               %--_----_0---_----_----_----
            if_c            add     ptrb,#4                 'keep var on stack                              %--_---0_----_----_----_----
    .restore                nop                             'restore pushret to #loop                       %--_--0-_----_----_----_----
    wrmem                   wrlong  x,adr                   'MEM write var           byte/word/long???      %--_-0--_----_----_----_----  ?????
                            jmp     #loop                   '                                               $0
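This small Python model shows how one of these masks selects instructions. It assumes the usual P2 SKIPF rule: bit n of the mask governs the n-th instruction after SKIPF, and a 1 skips it. The instruction list is copied from the block above. The names `ASSIGN_BLOCK`, `parse_mask` and `executed` are made up for illustration, and timing is not modelled.

```python
# Sketch of SKIPF selection (assumed rule: bit n -> n-th instruction, 1 = skip).
ASSIGN_BLOCK = [
    "rdlong x,--ptrb",          # bit 0   popx
    "rdlong x,adr",             # bit 1   rdmem
    "getrnd x",                 # bit 2
    "rev x",                    # bit 3
    "signx x,#7",               # bit 4
    "signx x,#15",              # bit 5
    "mov x,#0",                 # bit 6
    "neg x,#1",                 # bit 7
    "add x,#1",                 # bit 8
    "sub x,#1",                 # bit 9
    "zerox x,adr",              # bit 10
    "and x,#$FF",               # bit 11
    "and x,maskword",           # bit 12
    "wrlong x,ptrb",            # bit 13
    "add ptrb,#4",              # bit 14
    "test op2,#%10000000 wc",   # bit 15
    "if_c add ptrb,#4",         # bit 16
    "nop",                      # bit 17
    "wrlong x,adr",             # bit 18  wrmem
    "jmp #loop",                # bit 19
]

def parse_mask(text):
    """Parse a PASM-style binary literal such as %00_0000_0111; '_' is ignored."""
    return int(text.lstrip("%").replace("_", ""), 2)

def executed(block, mask):
    """Return the instructions that run when the given mask is used with SKIPF."""
    return [ins for n, ins in enumerate(block) if not (mask >> n) & 1]

WRITE_MASK = parse_mask("%00_0000_0111_1111_1111_1110")   # the 'write' row above
for ins in executed(ASSIGN_BLOCK, WRITE_MASK):
    print(ins)
```

For the "write" mask, only popx, the keep test with its conditional `add ptrb,#4`, the `nop`, the `wrlong x,adr` and the final `jmp` survive.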
    
  • I've found this method to be better for constructing the skip bitmap
    '  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    '  % 9  $24   SPR[nibble]                       x                        push %                 j9_0 r  %--_----_----_1111_0100_0111
    '  %    $25   SPR[nibble]                       x                        pop  %                 j9_1 w  %--_----_----_1111_0100_0111
    '  %    $26   SPR[nibble]                       x                        asgn %       ?????     j9_2 a  %--_----_----_1111_0100_0111
    '  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%                                             
    '  % F  $3D   register[bit] op                  x                             %                 jF_1 b  %--_----_----_0000_1001_1001
    '  %    $3E   register[bit..bit] op             y,x                           %                 jF_2 r  %--_----_----_0000_1001_1100
    '  %    $3F   register op                       ---         push/pop/asgn/adr %       ?????     jF_3 o  %--_----_----_1000_1100_0111
    '  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    j9_012                                                  'x=reg $1Fx                       j9...  jF...
    jF_123                                                  'yx=[b..b] pcurr=op+reg           0 1 2  1 2 3
    
    {jF_2}                  rdlong  y,--ptrb                'popy:  bit                       - - -  - r -  %--_----_----_----_----_---0
    {jF_2 jF_1}             rdlong  x,--ptrb                'popx:  bit                       - - -  b r -  %--_----_----_----_----_--0-
         {jF_1}             mov     y,x                     'if single bit range              - - -  b - -  %--_----_----_----_----_-0--
    {j9_012 jF_3}           mov     lsb,#0                  '\ bits[31:0]                     r w a  - - o  %--_----_----_----_----_0---
                            neg     a,#1                    '/ ..mask=1's                     r w a  - - o  %--_----_----_----_---0_----
                            mov     skip_rev,#3             'preset no reverse                r w a  b r o  %--_----_----_----_--0-_----
    {jF_12}                 call    #.bit_range             'set a=bit-mask, lsb=lowest-bit   - - -  b r -  %--_----_----_----_-0--_----   
    
    {j9_012}                rdlong  r,--ptrb                'popx:  reg                       r w a  - - -  %--_----_----_----_0---_----
    {jF_123}                rdbyte  r,ptra++                'pcurr: reg+op                    - - -  b r o  %--_----_----_---0_----_----
                            mov     op,r                    '\ justify op                     - - -  b r o  %--_----_----_--0-_----_----
                            shr     op,#5                   '/   (sets type to register)      - - -  b r o  %--_----_----_-0--_----_----
    {jF_12}                 or      op,#%1100               'set bit mode                     - - -  b r -  %--_----_----_0---_----_----   ?????
    
              
                    call    #_debugreg1
                            
    {j9_012 jF_123}         call    #.reg_xlate             'translate to P2 regs             r w a  b r o  $0
              
                    call    #_debugreg2
                            
    

    I now use this for each possible variation
    x x x  x x x  %--_----_----_1111_0100_0111
    
    where "-" marks an unused bit (converted to 0 when inserted into the table) and, of course, 1 means skip and 0 means execute
    and for each instruction in the group, this
    r w a  b r o  %--_----_----_----_--0-_----
    
    I try to make the single-letter codes make some kind of sense.
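The column method above can be mechanised. Below is a minimal Python sketch under these assumptions: each row of the table is one instruction, in order; a letter in a column means "execute in this variant"; and "-" means skip. The letters are mnemonics only, so the column position is what counts (note that "r" appears in both the j9_0 and jF_2 columns). The helper names are hypothetical. The trailing `call #.reg_xlate` and the debug calls are left out because they run in every variant.

```python
# Rows copied from the j9/jF listing above: columns j9_0 j9_1 j9_2 | jF_1 jF_2 jF_3
ROWS = [
    "- - -  - r -",  # rdlong y,--ptrb
    "- - -  b r -",  # rdlong x,--ptrb
    "- - -  b - -",  # mov y,x
    "r w a  - - o",  # mov lsb,#0
    "r w a  - - o",  # neg a,#1
    "r w a  b r o",  # mov skip_rev,#3
    "- - -  b r -",  # call #.bit_range
    "r w a  - - -",  # rdlong r,--ptrb
    "- - -  b r o",  # rdbyte r,ptra++
    "- - -  b r o",  # mov op,r
    "- - -  b r o",  # shr op,#5
    "- - -  b r -",  # or op,#%1100
]
VARIANTS = ["j9_0", "j9_1", "j9_2", "jF_1", "jF_2", "jF_3"]

def skip_mask(rows, column):
    """Bit n = 1 (skip) unless row n has a letter in the given column."""
    mask = 0
    for n, row in enumerate(rows):
        if row.split()[column] == "-":
            mask |= 1 << n
    return mask

def as_pasm(mask, groups=3):
    """Format the low 4*groups bits as a PASM-style literal, e.g. %1111_0100_0111."""
    bits = format(mask, f"0{4 * groups}b")
    return "%" + "_".join(bits[i:i + 4] for i in range(0, len(bits), 4))

for col, name in enumerate(VARIANTS):
    print(name, as_pasm(skip_mask(ROWS, col)))
```

The generated masks match the hand-written ones in the header comments: `%1111_0100_0111` for j9, then `%0000_1001_1001`, `%0000_1001_1100` and `%1000_1100_0111` for jF_1..3. The unused high bits ("-") become 0.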
  • rogloh Posts: 2,240
    edited 2019-03-30 - 03:50:30
    TonyB_ wrote: »
    P.S. I think slowest instruction is 8-bit arithmetic e.g. ADD A,r or INC/DEC r, as mentioned above:
    http://forums.parallax.com/discussion/comment/1467699/#Comment_1467699

    Ok, thanks TonyB_. I see now that the Z80 flag bitwise computations required for 8-bit register ALU operations are sort of the limiting case for all those fast 4 T-state Z80 register arithmetic instructions, as they burn quite a few P2 instructions. So any optimizations that work there are going to help speed things up. Most of the rest of the Z80 instruction set seems to give you plenty more T-states to play with.

    Note: I'm not saying it necessarily needs to be sped up by the way. It's going to be plenty fast for emulating a typical 4MHz Z80 IMHO. :smile:
  • TonyB_ Posts: 1,483
    edited 2019-12-25 - 23:13:44
    Edited.
  • rogloh wrote: »
    TonyB_ wrote: »
    P.S. I think slowest instruction is 8-bit arithmetic e.g. ADD A,r or INC/DEC r, as mentioned above:
    http://forums.parallax.com/discussion/comment/1467699/#Comment_1467699

    Ok, thanks TonyB_. I see now that the Z80 flag bitwise computations required for 8-bit register ALU operations are sort of the limiting case for all those fast 4 T-state Z80 register arithmetic instructions, as they burn quite a few P2 instructions. So any optimizations that work there are going to help speed things up. Most of the rest of the Z80 instruction set seems to give you plenty more T-states to play with.

    Note: I'm not saying it necessarily needs to be sped up by the way. It's going to be plenty fast for emulating a typical 4MHz Z80 IMHO. :smile:

    Going to need a lot of waits or slow the clock freq down to match a 4MHz Z80.
  • TonyB_ Posts: 1,483
    edited 2019-12-25 - 23:13:54
    Edited.
  • rogloh Posts: 2,240
    edited 2019-03-30 - 04:53:50
    Thinking more about your emulator @TonyB_ ...

    I see you have mostly left the IO instructions alone for now, with some placeholders in the code. In the future it would be nice to be able to access some Z80 IO ports, both emulated internally and potentially on real external HW devices on a bus. If there is enough space left in COG RAM, it could be good to have some mechanism to map IO ports to either another COG (or COGs) or an external bus, perhaps using some HUB memory to define port ranges and which COG to trigger, etc. I'm thinking that the external COG that emulates a given port range could be triggered by the Z80 IO read/write operations and then return a data result for reads, along with the number of I/O and/or wait cycles taken by the operation, so the emulator could wait for the result and accurately model its execution time and IO delay. Then Z80 peripherals such as the PIO might be emulated internally within the P2, or real HW could even be accessed over some P2 IO pins that emulate the Z80 IO bus. This could be rather good. Also, if the internal emulation of some peripheral ends up taking longer than real HW typically would, we could just pretend the slowdown was the result of wait states, keeping the emulator timing happy and the Z80 still functioning.

    Some internal INT signal might even be possible too...?
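The port-mapping idea can be sketched as a plain Python model. Everything here is hypothetical: the `IoMap`/`PortRange` names, the 8-bit port decode and the rule that unmapped ports read $FF are assumptions for illustration, not part of the emulator. A table maps port ranges to a handler, which stands in for "another COG" or an external bus. Each access returns the data plus the extra wait states, so the emulator could add them to its T-state count.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# handler(port, value) -> (data, extra_wait_states); value is None for a read
Handler = Callable[[int, Optional[int]], Tuple[int, int]]

@dataclass
class PortRange:
    first: int
    last: int
    handler: Handler

class IoMap:
    """Hypothetical port-range table, like the one proposed to live in hub RAM."""
    def __init__(self, ranges):
        self.ranges = list(ranges)

    def access(self, port, value=None):
        port &= 0xFF                      # assume only A0-A7 are decoded
        for r in self.ranges:
            if r.first <= port <= r.last:
                return r.handler(port, value)
        return (0xFF, 0)                  # assumed: unmapped ports read $FF, no waits

class Latch:
    """Toy peripheral: remembers the last byte written; costs 1 wait state."""
    def __init__(self):
        self.value = 0

    def __call__(self, port, value):
        if value is not None:
            self.value = value & 0xFF
        return (self.value, 1)

latch = Latch()
io = IoMap([PortRange(0x00, 0x03, latch)])
io.access(0x01, 0x5A)        # like OUT (1),A with A = $5A
print(io.access(0x02))       # (90, 1): $5A with one wait state
print(io.access(0x80))       # (255, 0): unmapped
```

On a real P2 the handler call would instead be a mailbox in hub RAM plus a wait for the servicing COG. The model only shows the data and wait-state contract.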
  • TonyB_ Posts: 1,483
    edited 2019-12-25 - 23:14:05
    Edited.
  • rogloh Posts: 2,240
    edited 2019-03-31 - 22:22:13
    I really like what you've done so far Tony.

    While some people would be interested in optimising this code to be as fast as possible, even at the expense of true cycle accuracy, that may burn COG RAM, and I do think it is still useful to keep sufficient spare memory around for further features you or others might like to add down the line, such as IO ports/interrupt support/external memory buses/DMA etc. Maybe two variants of the code will eventuate, one for highest performance, the other for highest compatibility with the Z80?

    Cheers,
    Roger.

    Update: also don't forget about the 24 longs used for registers and masks etc. that you have defined before the start of the COG executable code. I would think they need to be factored into the COG RAM usage as well, if you hadn't already accounted for them in your number above, Tony. Right now they look like they are part of hub RAM but not COG RAM.
  • TonyB_ Posts: 1,483
    edited 2019-12-25 - 23:14:17
    Edited.
  • TonyB_ Posts: 1,483
    edited 2019-12-25 - 23:14:30
    Edited.
  • Great work TonyB_, looks like you've pushed it even faster now. So DAA looks to be about the slowest emulated instruction in this fast Z80 emulator and divide by 13 is still plenty fast, although personally I'm still very interested in that original cycle accurate variant. Especially if other COGs can somehow synchronously emulate some IO Z80 peripherals such as the PIO in the future.
  • TonyB_ Posts: 1,483
    edited 2019-12-25 - 23:14:39
    Edited.
  • rogloh Posts: 2,240
    edited 2019-04-10 - 00:25:54
    Yes I can see that keeping the Z80 memory space at hub address zero would be the fastest. I'd also expect it will still be handy to have a variant to enable you to address more than 64kB from other hub addresses as some later Z80 systems supported banked memory, once memory chips became cheap enough, though their bank selection approaches would have differed. I think for example in the case of a Z80 Microbee system it allowed the lower 32kB Z80 address space to be mapped from a pool of 128-256kB of DRAM by just writing a control byte to an IO mapped latch. Other Z80 memory bank expansion approaches might have worked slightly differently or over different address regions but achieved similar results. I guess some customization of the memory address reads/writes in your emulator will be required by people for supporting those systems, but it should still be doable at the expense of some small amount of speed. There's still plenty of performance headroom left there anyway for typical Z80 speeds of that era.
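Below is a minimal sketch of the kind of customization meant here, assuming a hypothetical scheme rather than the actual Microbee decoding. The lower 32kB of Z80 space is a window onto one of several 32kB banks in hub RAM, the bank is chosen by the low bits of a latch written with OUT, and the upper 32kB is common to all banks. The bank count, hub base addresses and class name are illustrative only.

```python
BANK_SIZE = 0x8000
NUM_BANKS = 8                                    # 8 x 32kB = 256kB pool (assumption)
POOL_BASE = 0x00000                              # hub address of bank 0 (assumption)
COMMON_BASE = POOL_BASE + NUM_BANKS * BANK_SIZE  # hub address of the common upper 32kB

class BankedMemory:
    """Hypothetical lower-32kB banking; hub RAM is modelled as a bytearray."""
    def __init__(self):
        self.hub = bytearray(COMMON_BASE + BANK_SIZE)
        self.bank = 0

    def write_latch(self, value):
        """OUT to the bank-select latch: keep only the bank-number bits."""
        self.bank = value & (NUM_BANKS - 1)      # NUM_BANKS must be a power of two

    def hub_address(self, addr):
        """The extra compare/add every memory access pays for banking."""
        addr &= 0xFFFF
        if addr < BANK_SIZE:
            return POOL_BASE + self.bank * BANK_SIZE + addr
        return COMMON_BASE + (addr - BANK_SIZE)

    def read(self, addr):
        return self.hub[self.hub_address(addr)]

    def write(self, addr, value):
        self.hub[self.hub_address(addr)] = value & 0xFF

mem = BankedMemory()
mem.write(0x1234, 0xAA)          # bank 0
mem.write_latch(3)
mem.write(0x1234, 0xBB)          # bank 3, same Z80 address
mem.write(0x9000, 0xCC)          # common area, visible from every bank
mem.write_latch(0)
print(hex(mem.read(0x1234)), hex(mem.read(0x9000)))   # 0xaa 0xcc
```

In the P2 emulator the equivalent would be a test on address bit 15 plus an add of the current bank base before each rdbyte/wrbyte. That is where the small speed cost would come from.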

    Actually, yesterday I was pondering a full Microbee Z80 implementation on the P2, eventually using a customized variant of this emulator that supports IO. I've already done a VGA emulated version of the CRTC using the P1, but I think if the P2 running overclocked at 270MHz can generate 576p over HDMI, that would work well with line-doubling and support all the 256 (16 rows x 16), 272 (17 rows x 16), 264 (24 rows x 11) and 275 (25 rows x 11) scan line video modes most people used with the 6545 CRTC on that system. 270MHz is also a handy multiple of the original 13.5MHz crystal it used for its pixel clock, and of the 3.375MHz CPU clock too. Add a USB or PS/2 keyboard interface and my SD BIOS mods and it's mostly done. A PIO, either real or ideally emulated, plus interrupts would complete it fully and would allow this system to talk to the outside world with its original serial/parallel/tape/squawker ports etc., even directly through P2 pins for simplicity. Very little external HW would be needed.
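The clock and scan-line figures above can be checked quickly. The only assumptions in this Python check are that 576p offers 576 active lines and that line-doubling simply shows each source line twice.

```python
SYSCLK = 270_000_000      # overclocked P2 system clock from the post
PIXEL_CLK = 13_500_000    # original 13.5MHz crystal / pixel clock
CPU_CLK = 3_375_000       # original 3.375MHz Z80 clock
ACTIVE_LINES = 576        # assumed active lines of 576p

# Exact integer multiples? divmod gives (quotient, remainder).
print(divmod(SYSCLK, PIXEL_CLK), divmod(SYSCLK, CPU_CLK))   # (20, 0) (80, 0)

# (character rows, scan lines per character row) for the 6545 CRTC modes listed
MODES = [(16, 16), (17, 16), (24, 11), (25, 11)]

for rows, per_row in MODES:
    lines = rows * per_row
    doubled = 2 * lines
    print(f"{rows:2} x {per_row:2} = {lines} lines, doubled {doubled}, fits: {doubled <= ACTIVE_LINES}")
```

270MHz is exactly 20x the pixel clock and 80x the CPU clock. The tallest mode (275 lines) doubles to 550, which still fits in 576.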
  • If I remember correctly MP/M was using bank switching for multiple user CP/M. I think they switched the upper 32K, but it might be the lower 32K as @rogloh writes.

    I am following this thread with great interest, having CP/M and or MP/M on a P2 would be absolutely cool.

    Enjoy!

    Mike
  • Tor Posts: 1,995
    msrobots wrote: »
    If I remember correctly MP/M was using bank switching for multiple user CP/M. I think they switched the upper 32K, but it might be the lower 32K as @rogloh writes.
    Without checking, I think it must have been the lower 32K that was switched. I think MP/M works as CP/M-3 (CP/M Plus) in this respect, and that one needs some common OS code to be available at all times and the CP/M (and MP/M) BDOS is in upper RAM.

  • msrobots Posts: 3,127
    edited 2019-08-05 - 02:56:02
    @TonyB_,

    I tried to compile P2_Z80_CPU_v1b.spin2 but FastSpin complains about your usage of
    @ in Label names like in
    T_INC_DEC_@HL	=	1
    
    @ is an operator; how do you compile this?
    
    

    PNut does not compile this either.

    Mike
  • msrobots Posts: 3,127
    edited 2019-08-05 - 04:10:21
    There are a lot of those @s; I'll handle them later.
    
    
    xbyte_setq	=	%100000000	'LUT index = bytecode[7:0], LUT base = $100 (no/ED prefix)
    xbyte_setq2	=	%011101010	'LUT index = bytecode[7:3], LUT base = $0E0 (CB prefix)
    
    it is % not # for binary numbers
    
    Next, your block of registers needs to go after jmp	#LUT_load:
    
    DAT
    		orgh 0
    		org 0
    Entry
    		jmp	#LUT_load			'*** follow this jump if a newcomer ***
    
    '-----------------------------------------------
    
    'Registers
    …
    '-----------------------------------------------
    
    M1T		long	T_M1				'T-states in M1 cycle  <- note: T_M1, not #T_M1 (no # in front of T_M1 here)
    
    NOP  is a reserved word and can't be a label - changed to XNOP
    CALL is a reserved word and can't be a label - CHANGED to XCALL
    
    

    Now changing the @s; that will take some time.

    Mike
  • msrobots Posts: 3,127
    edited 2019-08-05 - 06:57:24
    OK, I am down to one line that does not compile:
    SBC_ADC_HL						'called, no return
    		add	T,#T_SBC_ADC_HL
    		testbn	opcode,#3		wz	'z = 0 for ADC, z = 1 for SBC
    		shr	opcode,#4			'01rrn010 -> 000001rr = rr + 100
    
    ->		alts	opcode,#BC-%100
    
    changing to
    
    		sub	opcode,#%100
    		alts	opcode,#BC
    
    That does compile, but is that what was meant?
    
    
    		getword	data,0,#BC_word			'data = BC/DE/HL/SP
    		getword	acc,HL,#HL_word			'acc = HL
    		getbyte	temp,HL,#H_byte			'temp = old H
    		testb	AF,#C_bit		wc	'c = CF = carry in
    		sumz	acc,data			'acc + data if NF = 0, acc - data if NF = 1
    	if_c	sumz	acc,#1				'acc + CF   if NF = 0, acc - CF   if NF = 1
    		setword	acc,HL,#HL_word			'write new HL
    		ror	acc,#8				'acc = {new L, 15'b0, CF, H}
    		ror	data,#8				'data = {C/E/L/SPl, 16'b0, B/D/H/SPh}
    		call	#\add_sub_flags			'do flags, shared code with 8-bit ADC/ADD/SUB
    		jmp	#ED_done_pop
    
    C:/P2/test/P2_Z80_CPU_v1b.spin2(389) error: Source operand too big for alts
    Everything else compiles now with fastspin.

    Attached is the changed file P2_Z80_CPU_v1b.spin2 with the error.

    I do not really understand what this is supposed to do.
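This small Python model compares the two forms. It assumes that ALTS D,{#}S sets the next instruction's S field to (D + S) & $1FF, so the following `getword data,0,#BC_word` reads from register BC + rr. It also assumes BC/DE/HL/SP sit in consecutive longs. `BC_ADDR = 1` is a hypothetical register address. A value below 4 would explain the "too big" error, because #BC-%100 is then negative and cannot be encoded as a 9-bit immediate without wrapping.

```python
def alts(d_value, s_value):
    """Assumed ALTS behaviour: next S field = (D + S) & $1FF."""
    return (d_value + s_value) & 0x1FF

BC_ADDR = 1   # hypothetical COG address of the BC long (BC, DE, HL, SP consecutive)

def index_original(opcode_byte):
    opcode = opcode_byte >> 4                 # 01rrn010 -> 000001rr = rr + %100
    s_field = (BC_ADDR - 0b100) & 0x1FF       # #BC-%100 wrapped to 9 bits
    return alts(opcode, s_field)

def index_rewritten(opcode_byte):
    opcode = (opcode_byte >> 4) - 0b100       # sub opcode,#%100 -> rr
    return alts(opcode, BC_ADDR)

# ED 42/52/62/72 = SBC HL,BC/DE/HL/SP (n = 0); ED 4A/5A/6A/7A = ADC (n = 1)
for rr in range(4):
    for n in (0, 1):
        op = 0b01000010 | (rr << 4) | (n << 3)
        assert index_original(op) == index_rewritten(op) == BC_ADDR + rr

print([index_original(0x42 | (rr << 4)) for rr in range(4)])   # [1, 2, 3, 4]
```

Under these assumptions both versions select the same register. The sub/alts pair just costs one extra instruction and leaves `opcode` modified.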

    Enjoy!

    Mike
  • TonyB_ Posts: 1,483
    edited 2019-12-25 - 23:14:56
    Edited.
  • msrobots Posts: 3,127
    edited 2019-08-05 - 21:30:34
    It looks like you do not compile at all

    But are you aware that stuff like this
    CB_76543_execf
    		long	RLC		|	%0_1111110			<< 10	'00-07 RLC
    		long	RRC		|	%0_111110_			<< 10	'08-0F RRC
    		long	RL		|	%0_11110__			<< 10	'10-17 RL
    		long	RR		|	%0_1110___			<< 10	'18-1F RR
    		long	SLA		|	%0_110____			<< 10	'20-27 SLA
    		long	SRA		|	%0_10_____			<< 10	'28-2F SRA
    		long	SLL		|	%0_110____			<< 10	'30-37 SLL or SL1
    		long	SRL		|	%0_0______			<< 10	'38-3F SRL
    
    is in fact THIS?
    CB_76543_execf
    		long	RLC		|	%0_1111110			<< 10	'00-07 RLC
    		long	RRC		|	%00_111110			<< 10	'08-0F RRC
    		long	RL		|	%000_11110		<< 10	'10-17 RL
    		long	RR		|	%0000_1110			<< 10	'18-1F RR
    		long	SLA		|	%00000_110			<< 10	'20-27 SLA
    		long	SRA		|	%000000_10			<< 10	'28-2F SRA
    		long	SLL		|	%00000_110			<< 10	'30-37 SLL or SL1
    		long	SRL		|	%0000000_0			<< 10	'38-3F SRL
    
    Underscores are not digits, just visual placeholders, so %111000 is the same as %__11_1_000____
    So what you want is either
    CB_76543_execf
    		long	RLC		|	%0_1111110			<< 10	'00-07 RLC
    		long	RRC		|	%0_1111100			<< 10	'08-0F RRC
    		long	RL		|	%0_1111000			<< 10	'10-17 RL
    		long	RR		|	%0_1110000			<< 10	'18-1F RR
    		long	SLA		|	%0_1100000			<< 10	'20-27 SLA
    		long	SRA		|	%0_1000000			<< 10	'28-2F SRA
    		long	SLL		|	%0_1100000			<< 10	'30-37 SLL or SL1
    		long	SRL		|	%0_0000000			<< 10	'38-3F SRL
    
    or
    CB_76543_execf
    		long	RLC		|	%0_1111110			<< 10	'00-07 RLC
    		long	RRC		|	%0_1111101			<< 10	'08-0F RRC
    		long	RL		|	%0_1111011			<< 10	'10-17 RL
    		long	RR		|	%0_1110111			<< 10	'18-1F RR
    		long	SLA		|	%0_1101111			<< 10	'20-27 SLA
    		long	SRA		|	%0_1011111			<< 10	'28-2F SRA
    		long	SLL		|	%0_1101111			<< 10	'30-37 SLL or SL1
    		long	SRL		|	%0_0111111			<< 10	'38-3F SRL
    

    depending on whether you want to execute them.

    Mike
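The point about underscores can be checked mechanically. This Python helper (hypothetical name `pasm_bin`) parses a %-binary literal with underscores dropped, as described in the post. It prints the value each table entry actually has. Whether those values are the intended EXECF patterns is a separate design question.

```python
def pasm_bin(literal):
    """Value of a %-binary literal; '_' is only a visual separator and is dropped."""
    return int(literal.lstrip("%").replace("_", ""), 2)

TABLE = {
    "RLC": "%0_1111110", "RRC": "%0_111110_", "RL": "%0_11110__", "RR": "%0_1110___",
    "SLA": "%0_110____", "SRA": "%0_10_____", "SLL": "%0_110____", "SRL": "%0_0______",
}

for name, lit in TABLE.items():
    print(f"{name:4} {lit:12} = %{pasm_bin(lit):08b}")

print(pasm_bin("%111000") == pasm_bin("%__11_1_000____"))   # True
```

For example, RRC comes out as %00111110, which is Mike's second listing.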
  • Ahh, as usual I was too stupid.

    What you wrote is OK I just got confused by the format.

    At least I can run 2 instructions without problems, 0 and 8.

    Now I need to test some Z80 hex in there, just something simple like a few NOPs and a loop, so I can see the PC stays where it should.

    What would be

    loop
    nop
    nop
    nop
    jr loop

    in hex?

    Mike
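For reference, the encoding can be worked out by hand. NOP is $00, and JR e is $18 followed by a signed 8-bit displacement measured from the address after the 2-byte JR. Here is a minimal Python hand-assembler for exactly this loop, assuming it starts at address 0 (the function name is just for illustration).

```python
def assemble_loop(origin=0, nops=3):
    """Encode 'loop: nop x N / jr loop' starting at origin."""
    code = [0x00] * nops                     # loop: nop / nop / nop
    jr_addr = origin + len(code)             # address of the JR opcode
    disp = origin - (jr_addr + 2)            # target - PC after the 2-byte JR
    assert -128 <= disp <= 127, "JR target out of range"
    code += [0x18, disp & 0xFF]              # jr loop
    return bytes(code)

print(assemble_loop().hex(" ").upper())      # 00 00 00 18 FB
```

So the loop is 00 00 00 18 FB. The displacement is -5 ($FB) because the PC is already at 5 when the jump is taken back to 0.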
  • TonyB_ Posts: 1,483
    edited 2019-12-25 - 23:15:07
    Edited.
  • msrobots Posts: 3,127
    edited 2019-08-07 - 10:07:21
    Thank you, @TonyB_

    Sadly I broke it yesterday while removing debug stuff, and of course I don't make backups at 3 in the morning while working furiously. In just 5 minutes I broke it and don't know where or why.

    Will take some more searching

    Mike
  • msrobots,
    Do you have a recent backup?
    Notepad++ has a decent add-in that compares two files side by side.
  • TonyB_ Posts: 1,483
    edited 2019-12-25 - 23:15:22
    Edited.
  • Real programmers don't do backups. Heck, it was just some last-minute cleanup; sorry that those two planes went down, but it was late and I had to go to my second job...

    No, actually I just have what I posted, and that was way earlier. I need to nail it down, and I might be able to do so now since you posted that Debug Package on the other thread.

    Somehow I messed up the placement of the LUT tables; I get something like $4366 as cmd (in PA) when it reads an 08 as bytecode.

    I moved memory around and that broke something.

    Mike
  • TonyB_ Posts: 1,483
    edited 2019-12-25 - 23:15:36
    Edited.
  • TonyB_ wrote: »
    How the Z80 was implemented on the P1 was very clever, but that is not how to do it on the P2. XBYTE is the way to go and no other method could be as fast or compact.
    XBYTE is a good balance of speed and simplicity for interpreters, but it's not always the fastest way to interpret another processor. In my experience JIT compilers outperform XBYTE by a significant amount, but at the cost of some latency and a more complicated interpreter.
  • TonyB_ wrote: »
    Cluso99 wrote: »
    @pullmoll,
    You will appreciate the xbyte and skip/skipf instructions for emulators, although from what I can see, it's of limited use for z80 emulation.

    I disagree. A Z80 interpreter is the perfect application for XBYTE. In fact, the latter was designed carefully to make the former as quick and easy as possible. As soon as I read about XBYTE I knew it would be ideal for emulating 8-bit CPUs in general and the Z80 in particular, therefore I complained about any changes that would make that more difficult and the final version is wonderful (thanks to Chip).

    How the Z80 was implemented on the P1 was very clever, but that is not how to do it on the P2. XBYTE is the way to go and no other method could be as fast or compact. I have proven on paper that all the interpreter code can fit in a single cog's register and LUT RAM, but due to lack of time I need somebody else to test and confirm it.

    A P2 CP/M BIOS would be useful, of course, no matter how the emulation is done.
    pullmoll wrote: »
    I already saw the xbyte Z80 emulator code and was impressed. I also read about skip and especially skipf in the docs, which can be utilized to dramatically reduce the COG space required to emulate orthogonal instruction sets like the 8080/8085 and Z80 are.

    Hello pullmoll,

    I based my first version of Z80 XBYTE on your qz80, thank you, however the latest version with timing (v1b2) is almost a complete rewrite. Perhaps you could try this with your new P2 emulator? It is amazing how much code can be saved by using SKIPF/EXECF. I have done a fast version with no T-state counting, not yet published here, that is quite different from the timed version.

    I think I have found the optimum way of implementing the registers, i.e. BC & BC' in one long, DE & DE' in another, etc., as explained in the source file. There is a critical tradeoff between the size of EXECF tables vs. the amount of register and LUT RAM remaining to implement the actual Z80 instructions and different options are possible.
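This Python sketch illustrates the "pair and alternate in one long" idea. The layout is hypothetical. The real one, including which half holds the active pair (`BC_word` in the source), is defined in the source file. Here the active pair is assumed to be in the low word. `getword`/`getbyte` mimic the PASM2 GETWORD/GETBYTE field extraction, and EXX on a packed long becomes a 16-bit rotate.

```python
MASK32 = 0xFFFF_FFFF

def getword(reg, n):
    """Like PASM2 GETWORD: word n (0 or 1) of a 32-bit long."""
    return (reg >> (16 * n)) & 0xFFFF

def getbyte(reg, n):
    """Like PASM2 GETBYTE: byte n (0..3) of a 32-bit long."""
    return (reg >> (8 * n)) & 0xFF

def swap_halves(reg):
    """EXX (or EX AF,AF') on one packed long: rotate by 16 bits."""
    return ((reg << 16) | (reg >> 16)) & MASK32

BC = 0xBBCC_1234          # assumed layout: BC' = $BBCC (high), BC = $1234 (low)
print(hex(getword(BC, 0)), hex(getbyte(BC, 1)), hex(getbyte(BC, 0)))  # 0x1234 0x12 0x34
BC = swap_halves(BC)      # EXX
print(hex(getword(BC, 0)))                                           # 0xbbcc
```

With this kind of packing, EXX costs one rotate per long, and normal register access is a single field extraction.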
    @tonyb,
    I’m not so sure that everything for the z80 is best with xbyte. Only time will tell tho.
    Some of the extended instructions would benefit, from what I've seen so far.

    We already have a CP/M BIOS. That will have to wait another 2+ weeks until I am back in Oz (I'm on your home ground at the moment, in Tavistock, Devon, for the next 5 days). The BIOS will be in the ZiCog thread somewhere, but that's a very long thread.

    The CP/M BIOS handles 8 x 8MB HDDs as FAT32 files in the root directory. Note they are actually 32MB files with only the first 8MB accessible, as Heater worked on the HDD table and I wasn't sure how to expand them to 32MB, if that was possible. Never found the time to investigate this.

    BTW I'm all for combining the effort, so if there's anything I can help you with, just ask. I'm doing the Z80 just because it interests me. Maybe in the end it will make sense for you to do some parts and me others. Let's see :)