Console Emulation

TonyB_ · 2022-01-14 10:01

@Wuerfel_21 said:

This pic doesn't look quite right, at top of right palm tree trunk.

Will the 68K and Z80 emulators both contain PSRAM code? If so, a single PSRAM cog would appear to be better so that spare code in Z80 cog could use the streamer for video.

Wuerfel_21 · 2022-01-14 13:05

@TonyB_ said:

@Wuerfel_21 said:

This pic doesn't look quite right, at top of right palm tree trunk.

Not sure what you're talking about. The graphics in this scene are a bit janky as a still frame due to the heat effect.

Will the 68K and Z80 emulators both contain PSRAM code? If so, a single PSRAM cog would appear to be better so that spare code in Z80 cog could use the streamer for video.

Ye, the PSRAM code will be in both. It needs to be for lowest latency.

TonyB_ · 2022-01-14 17:04

@Wuerfel_21 said:

@TonyB_ said:

@Wuerfel_21 said:

This pic doesn't look quite right, at top of right palm tree trunk.

Not sure what you're talking about. The graphics in this scene are a bit janky as a still frame due to the heat effect.

Looks corrupted to right of "SONIC GOT" and "THROUGH".

Wuerfel_21 · 2022-01-14 17:33

That's just the "ACT 1" icon.

Wuerfel_21 · 2022-01-15 15:14

So, uhh, making some inroads on that Z80 kore.

Why are the block operations so insane?
Why did I decide to write it like that?
AAAAAAAAA

.blockop
              '' The undocumented flags on these are cursed.
              test zk_opcode,#%0100_1000 wz
        if_nz ret ' NOP
              rczr zk_opcode wcz ' get operation type into cz
        if_00 skipf ##%00000000_11101_01_1111110_0110_011_111111
        if_01 skipf ##%00000011_00000_10_0011110_0101_011_1111
        if_10 skipf ##%00001111_11101_01_1100011_0011_100_11
        if_11 skipf ##%00111111_11101_01_1111101_0011_011

              getbyte zk_ea,zk_debc,#0 ' IN
              call #\zk_portin ' IN
              call #\zk_read8hl ' everything else

              modc _clr wc  ' LD
              modc _set wc  ' CP
              testb zk_tmp8,#7 wc ' IN/OUT
              bitc zk_flags,#ZK_NMODE_BIT

              mov zk_optmp0,zk_accu ' LD CP
              getbyte zk_optmp0,zk_hl,#0 ' OUT
              getbyte zk_optmp0,zk_debc,#0 ' IN
              testb zk_opcode,#1 wc ' IN
              sumc zk_optmp0,#1 ' IN
              mov zk_optmp1,zk_tmp8 ' CP
              xor zk_optmp1,zk_accu ' CP

              sub zk_optmp0,zk_tmp8 ' CP only
              add zk_optmp0,zk_tmp8 ' LD,IN,OUT

              xor zk_optmp1,zk_optmp0 ' CP only
              and zk_optmp0,#255 wz
              bitz zk_flags,#ZK_ZERO_BIT ' CP
              testb zk_optmp0,#7 wc ' CP
              bitc  zk_optmp0,#ZK_SIGN_BIT ' CP

              skipf ##%000_00_000_0_1110100_1100_1111_0000_11111_111111
              skipf ##%000_00_111_0_1110100_1111_1111_0000_01010_1111
              skipf ##%000_00_111_0_0001001_1111_0000_1111_10101_11
              skipf ##%000_00_111_0_0001001_0011_0000_1111_10101

              testb zk_optmp1,#4 wc   ' CP
              cmpr zk_optmp0,#255 wc  ' IN/OUT
              bitc zk_flags,#ZK_HALF_BIT ' CP
              muxc zk_flags,#(1<<ZK_HALF_BIT)|(1<<ZK_CARRY_BIT) ' IN/OUT
        if_c  sub zk_optmp0,#1  ' CP

              testb zk_optmp0,#1 wc ' LD,CP
              bitc zk_flags,#3 'LD,CP (sets YF)
              testb zk_optmp0,#3 wc ' LD,CP
              bitc zk_flags,#5 'LD,CP (sets XF)

              getbyte zk_optmp1,zk_debc,#1 ' IN,OUT (get B)
              and zk_optmp0,#7 ' IN,OUT
              xor zk_optmp0,zk_optmp1 wc ' IN,OUT
              bitnc zk_flags,#ZK_OVER_BIT ' IN,OUT

              getword zk_ea,zk_debc,#1 ' LD
              call #\zk_write8 ' LD
              getbyte zk_ea,zk_debc,#0 ' OUT
              call #\zk_portout ' OUT

              getword zk_optmp1,zk_debc,#0 ' LD,CP
              decmod zk_optmp1,zk_ffffh wz ' all
              setbyte zk_debc,zk_ea,#1 ' IN/OUT
              bitnz zk_flags,#ZK_OVER_BIT ' LD,CP
              bitz zk_flags,#ZK_ZERO_BIT ' IN,OUT
              setq #ZK_SIGNXYMASK ' IN,OUT
              muxq zk_flags,zk_optmp1 ' IN,OUT

              testb zk_opcode,#1 wc ' Get direction (C = dec)

        if_nc incmod zk_ea,zk_ffffh    ' LD
        if_c  decmod zk_ea,zk_ffffh    ' LD
              setword zk_debc,zk_ea,#1 ' LD

        if_nc incmod zk_hl,zk_ffffh
        if_c  decmod zk_hl,zk_ffffh

              testb zk_opcode,#2 wc ' Is repeat code?
 if_c_and_nz  sub zk_pc,#2
              ret

The best part is that there's surely a bug in there. IDK, still too lazy to set up an assembler to run some code yet.

rogloh · 2022-01-15 23:45

LOL. Nested skipfs are so much fun to maintain. Just wait when you need to fit another line in.

rogloh · 2022-01-16 06:45

After looking at these CPU emulators you've been developing @Wuerfel_21 I now wonder how hard an AVR would be to emulate on a P2? It might be good for larger programs running from PSRAM. Especially given full GCC and C++ toolchain support for this microarchitecture. It wouldn't be as fast (maybe just a few MIPs instead of 16MIPs) but it could be useful for external memory program expansion and it might be a reasonable match with the performance of a PSRAM random read anyway.

Here's the ISA:

https://en.wikipedia.org/wiki/Atmel_AVR_instruction_set

After a quick look it seems like it should fit in a COG okay. I've started looking at the decoding and the flag stuff it might need. Idea is a LUT table holding EXECF vectors taking up 256 entries and indexed with RDLUT by upper byte of opcode. Regs stored in COGRAM and they could be indirectly indexed by setbyte/getbyte perhaps to maintain their 8 bit nature. ALU flags update will slow things a bit but many of the instructions would be simple. Nibble aligned register addresses helps a lot.

Wuerfel_21 · 2022-01-16 21:59

AVR is probably possible, but I don't see the point. It's not going to be fast enough to emulate existing code and why would you limit yourself to an 8 bit ISA otherwise...

I am however thinking that it is probably possible to squeeze an entire SNES audio board into a single cog. That's an SPC700 CPU (=6502 clone) at 1MHz, 64k of RAM and a moderately complex custom DSP that mixes 8 BRR-compressed samples with envelopes at 32kHz. The whole thing is effectively standalone, so you only need a 64k memory dump to play music (as opposed to a large register log).

In actual news, Z80 core is good enough to run some tests now:

rogloh · 2022-01-16 23:04

It would be slow yes, but it may possibly allow some simple Arduino or other AVR code to run...lack of I/O peripherals would be an issue of course.

I've hacked up the bulk of it and it will certainly fit in a COG. The various flags are a problem that can slow the ALU ops down. Other ops are not so bad. If we could get a couple of MIPs from it then it might still be useful for something.

Ideally we could run 32 bit P2 operations not 8 bit AVR operations from external memory but we don't have the tools needed yet.

Wuerfel_21 · 2022-01-16 23:15

Wonder how the speed compares when running GCC-compiled code via 68000 emulation vs AVR emulation.... 68000 ops are slow to decode, but can handle 32 bits in one go...

Megayume's 68000 kore seems to be quite a bit faster than an 8MHz 68000 and it could be a bit faster if you removed things like interrupt/trace support and simplified the memory decoding.

(Also, I just remembered I could/should move some more addressing modes into cog memory with EXECF, which would see a neat bit of speedup)

rogloh · 2022-01-16 23:31

Yeah it would be interesting and probably somewhat application dependent. The AVR may get a relative speedup with memory transfers and branches that normally take extra cycles but ALU would be slower with the flags. I've got the basic MOV instructions down to 29 clocks (cheating a bit with HUB RAM + fifo use for now). So a 300 MHz P2 could do about 10 AVR MIPs in MOV instructions, not too bad as a LOT of code is MOV (as well as load and store). It would be neat to tweak the AVR instructions to use 32 bits but no toolchain would like it.

loop
         rfword instr  '   2 ' need to wrap?
         getbyte instr, instr, #1 ' 2
         rdlut sequence, instr '3
         push #loop ' 2
         execf sequence ' 4
         getnib s, instr, #0 ' 2
         getnib d, instr, #1 ' 2
         altgb s, #regbase ' 2
         getbyte s, 0-0, #0 ' 2
         mov d1, s  ' 2 (done like this to share code)
         altsb d, #regbase ' 2
 _ret_   setbyte 0-0, d1, #0 ' 4

Wuerfel_21 · 2022-01-16 23:43

Hmm, not counting opcode fetch overhead, a MOVE.L Dn,Dn takes... 130 cycles? Probably some miscounting in there. So on the incredibly silly front of register bandwidth, your AVR thing wins ;3

mk_nextop

mk_ihook1     nop
mk_ihook2     nop

              call mk_getopf ' count as 8 cy for comparsions sake

              push #mk_nextop
              getnib mk_memtmp0,mk_opword,#3
              altd mk_memtmp0,#mk_nibble_impl_tbl
              jmp 0-0
mk_nibble_2 ' MOVE (long)
              callpb mk_opword,#mk_setup_operand32
mk_setup_operand32
              mov mk_shiftit,#0
              test pb,#%110_000 wz
        if_z  jmp #mk_setup_reg32
mk_setup_reg32
              and pb,#%1_111
              sets mk_rd32_reg,pb
              setd mk_wr32_reg,pb
              mov  mk_readf,#mk_rd32_reg
              mov  mk_writef,#mk_wr32_reg
              ret wcz

              call mk_readf
mk_rd32_reg
        _ret_ mov mk_memvalue,0-0

              callpa #31,#mk_move_shoot_the_Smile

mk_move_shoot_the_Smile ' handle all the common MOVE stuff
               ' Doing MOVEA?
              test mk_opword,#%110_000_000 wz
              testb mk_opword,#6 andz
        if_z  jmp #.movea
              signx mk_memvalue,pa wcz
              bitc mk_sr,#MK_NEG_BIT
              bitz mk_sr,#MK_ZERO_BIT
              andn mk_sr,#MK_OVER_MASK|MK_CARRY_MASK
.mangle_ea2   ' format for second operand is stupid
              setnib mk_opword,#0,#3
              mov pa,mk_opword
              and pa,#%111_000_000
              shl pa,#6
              or mk_opword,pa
        _ret_ shr mk_opword,#9


              callpb mk_opword,#mk_setup_operand32
mk_setup_operand32
              mov mk_shiftit,#0
              test pb,#%110_000 wz
        if_z  jmp #mk_setup_reg32
mk_setup_reg32
              and pb,#%1_111
              sets mk_rd32_reg,pb
              setd mk_wr32_reg,pb
              mov  mk_readf,#mk_rd32_reg
              mov  mk_writef,#mk_wr32_reg
              ret wcz

              call mk_readf
mk_rd32_reg
        _ret_ mov mk_memvalue,0-0

              jmp mk_writef

mk_wr32_reg
        _ret_ mov 0-0,mk_memvalue

I guess MOVE and a bunch of other ops don't really care about mk_shiftit that much, so that's 4 cycles that could be eliminated by jumping over that.

TonyB_ · 2022-01-17 11:06

@rogloh said:
Yeah it would be interesting and probably somewhat application dependent. The AVR may get a relative speedup with memory transfers and branches that normally take extra cycles but ALU would be slower with the flags. I've got the basic MOV instructions down to 29 clocks (cheating a bit with HUB RAM + fifo use for now). So a 300 MHz P2 could do about 10 AVR MIPs in MOV instructions, not too bad as a LOT of code is MOV (as well as load and store). It would be neat to tweak the AVR instructions to use 32 bits but no toolchain would like it.
loop
         rfword instr  '   2 ' need to wrap?
         getbyte instr, instr, #1 ' 2
         rdlut sequence, instr '3
         push #loop ' 2
         execf sequence ' 4
         getnib s, instr, #0 ' 2
         getnib d, instr, #1 ' 2
         altgb s, #regbase ' 2
         getbyte s, 0-0, #0 ' 2
         mov d1, s  ' 2 (done like this to share code)
         altsb d, #regbase ' 2
 _ret_   setbyte 0-0, d1, #0 ' 4

Will this work? Low byte of instr gets overwritten in line 2.

How about using XBYTE? The end of each instruction could read the low byte of the next opcode:

        _ret_   rfbyte  oplow

where TOS = $1FF and _ ret _ starts new XBYTE using high byte of opcode, copying it to pa. Most AVR instructions can be decoded using the high seven bits, halving LUT EXECF table to 128 longs. This means extra decoding for some instructions but adds only two cycles to MOVW:

                tjz     pa,#NOP

TonyB_ · 2022-01-17 11:13

@Wuerfel_21 said:
In actual news, Z80 core is good enough to run some tests now:

What test program are you using? Some mnemonics are not Z80.

Wuerfel_21 · 2022-01-17 11:25

@TonyB_ said:

@Wuerfel_21 said:
In actual news, Z80 core is good enough to run some tests now:

What test program are you using? Some mnemonics are not Z80.

You mean "ALO"? That's just a catch-all for all the opcodes $80..$BF

rogloh · 2022-01-17 13:12

@TonyB_ said:
Will this work? Low byte of instr gets overwritten in line 2.

Yeah I found that today and fixed it.

How about using XBYTE? The end of each instruction could read the low byte of the next opcode:
  _ret_   rfbyte  oplow

Haven't looked at that yet. Might try it once I've put all the other things together and got something going there. It will fit fine in the COG with a 256 entry table and this allows high/low register selection in the execf sequence (see regbase vs regbase+4 alternatives below).

The AVR ISA is reasonably simple to emulate but as expected the flags are a PITA and chew more P2 cycles. Here's a sample of what I've got (untested and I'm not even sure I have its logic coded right). AVR uses a half carry flag, and an overflow flag too in ALU operations which are annoying to compute, as well as the more typical C, Z, S, N flags which are okay. I think the ADD here alone takes about 50 P2 clocks plus the outer loop overhead time of about 14 P2 clocks (total 64 clocks, so a bit under 5MIPs @ 300MHz...but could still be usable). Thankfully most other instructions are much simpler/faster than these.

op_cpc
op_cp
op_sbc
op_sub
op_add
op_adc
op_cpi
op_sbci
op_subi
                getnib  d, instr, #1        ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                altgb   d, #regbase         '  l     l      l      l      l      l      l      l      l
                altgb   d, #regbase+4       '  h     h      h      h      h      l      l      l      l
                getbyte d1, 0-0, #0         ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI

                getnib  s, instr, #0        ' CPC    CP    SBC    SUB    ADD    ADC     |      |      |
                altgb   s, #regbase         '  l     l      l      l      l      l      |      |      | 
                altgb   s, #regbase+4       '  h     h      h      h      h      h      |      |      |
                getbyte s1, 0-0, #0         ' CPC    CP    SBC    SUB    ADD    ADC     |      |      |
                getnib  s1, instr, #2       '  |     |      |      |      |      |     CPI    SBCI   SUBI
                rolnib  s1, instr, #0       '  |     |      |      |      |      |     CPI    SBCI   SUBI

                getnib  d2, d1, #0          ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI ' save for flags
                testb   flags, #CFLAG wc    ' CPC    |     SBC     |      |     ADC     |     SBCI    |
                subx    d1, s1 wz           '  |     |     SBC     |      |      |      |     SBCI    |
                add     d1, s1              '  |     |      |      |     ADD     |      |      |      |
                sub     d1, s1 wz           '  |     |      |     SUB     |      |      |      |     SUBI
                addx    d1, s1              '  |     |      |      |      |     ADC     |      |      |
                cmp     d1, s1 wz           '  |     CP     |      |      |      |     CPI     |      |
                cmpx    d1, s1 wz           ' CPC    |      |      |      |      |     CPI     |      |

                altsb   d, #regbase         '  |     |      l      l      l      l      |      l      l
                altsb   d, #regbase+4       '  |     |      h      h      h      l      |      l      l
                setbyte 0-0, d1, #0         '  |     |     SBC    SUB    ADD    ADC     |     SBCI   SUBI

                mov     d, d1               ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                fles    d, #127 wc          ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
    if_nc       fges    d, minus128 wc      ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                bitc    flags, #VFLAG       ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                test    d1, #255 wz         '  |                         ADD    ADC                             
                testb   d1, #8 wc           '  |                         ADD    ADC
                bitc    flags, #CFLAG       '  |                         ADD    ADC
                testb   flags, #ZFLAG andz  ' CPC          SBC                                SBCI
                bitz    flags, #ZFLAG       ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                testb   d1, #7 wc           ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                bitc    flags, #NFLAG       ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                testb   flags, #VFLAG xorc  ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                bitc    flags, #SFLAG       ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                and     s1, #$f             ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                cmp     d2, s1 wc           '        CP           SUB     |      |     CPI           SUBI
                cmpx    d2, s1 wc           ' CPC          SBC            |      |            SBCI
                add     d2, s1 wc           '                            ADD     | 
                addx    d2, s1 wc           '                             |     ADC
    _ret_       bitc    flags, #HFLAG       ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI

TonyB_ · 2022-01-17 14:26

This is how I do H and V flags:

                mov     temp,dest
                add     dest,src            'arithmetic operation
                xor     temp,src
                xor     temp,dest           'temp =  dest ^ src ^ result

'temp[bit] = carry into result[bit]
'overflow if carry into result[msb] != carry out of result[msb]

                testb   dest,#8     wc
                bitc    flags,#CF
                testb   temp,#7     xorc
                bitc    flags,#VF
                testb   temp,#4
                bitc    flags,#HF

I notice that rather stupidly neither of the AVR sign and half-carry bits in SREG match the corresponding bits in the result (7 & 4), unlike flag reg in 8080/Z80.

Wuerfel_21 · 2022-01-17 15:33

Yeah, flags are a PITA. I use the same idea as Tony to compute C/H/V flags.

What about SuperH?
That ISA is 32 bit and only has one status bit (called T) that isn't even written by the basic ADD/SUB and all the opcode fields are pleasantly nibble aligned, unlike 68000. Though obnoxiously, it's the top and bottom nibbles of the 16 bit instruction that determine the operation and the middle two nibbles select the registers.

Or maybe try getting Eric's RISC-V recompiler to deal with external memory...

Wuerfel_21 · 2022-01-17 20:06

It is on... Me vs. the last damn instruction.

And of course it's one that's implemented by that mess I've shown earlier (which I've already fixed plenty of bugs in)

Circuitsoft · 2022-01-17 21:13

@rogloh said:
After looking at these CPU emulators you've been developing @Wuerfel_21 I now wonder how hard an AVR would be to emulate on a P2? It might be good for larger programs running from PSRAM. Especially given full GCC and C++ toolchain support for this microarchitecture. It wouldn't be as fast (maybe just a few MIPs instead of 16MIPs) but it could be useful for external memory program expansion and it might be a reasonable match with the performance of a PSRAM random read anyway.

Here's the ISA:

https://en.wikipedia.org/wiki/Atmel_AVR_instruction_set

After a quick look it seems like it should fit in a COG okay. I've started looking at the decoding and the flag stuff it might need. Idea is a LUT table holding EXECF vectors taking up 256 entries and indexed with RDLUT by upper byte of opcode. Regs stored in COGRAM and they could be indirectly indexed by setbyte/getbyte perhaps to maintain their 8 bit nature. ALU flags update will slow things a bit but many of the instructions would be simple. Nibble aligned register addresses helps a lot.

A better target may be the Zylin ZPU. It has a GCC target, and there's already a P1 implementation of it. It's made to be very small to implement in an FPGA, and not particularly fast.

Wuerfel_21 · 2022-01-18 00:06

Okay, Z80 seems to work "good enough" now. All the documented features are there and most of the undocumented ones, too. No interrupts yet. Computed cycle counts are off in some cases, but in the right ballpark, I think.

I've attached it and the compiled test program (source here (note that the newest version of sjasm is busted, use the previous one)), perhaps someone wants to compare it to @Cluso99 's Z80 core.

Now I'll need to fit it and the OPN into the emulator proper. That shouldn't be too much work, though the OPN needs a PASM I/O interface and timers implemented.

rogloh · 2022-01-18 04:22

@Wuerfel_21 said:
Now I'll need to fit it and the OPN into the emulator proper. That shouldn't be too much work, though the OPN needs a PASM I/O interface and timers implemented.

That's the fun part!

rogloh · 2022-01-18 04:26

@TonyB_ said:
This is how I do H and V flags:

Cool, will check this out, could save me some cycles, as well as the XBYTE thing. Maybe this AVR could run at a reasonable speed in the end (at least from HUB).

I've got this weird idea to try to somehow use FBLOCK and PSRAM reads into a cycling HUB region for the FIFO to use, tracking the emulated PC as we go and using the FIFO wrap interrupt too. Needs more thought but something like it may work ok for emulating from external RAM....

TonyB_ · 2022-01-18 12:32

@rogloh said:

@TonyB_ said:
This is how I do H and V flags:

Cool, will check this out, could save me some cycles, as well as the XBYTE thing. Maybe this AVR could run at a reasonable speed in the end (at least from HUB).

The double XOR trick should save a few cycles. Main advantage is no need to do extra 4-bit arithmetic operation to get half-carry. The best I can do Z80 ADD A/ADC A/SUB/SBC A flags is 22 cycles, which includes YF & XF for free. CP s takes 26 cycles as YF & XF come from s, not the result.

One thing is very clear. If a future P2+/P3 had a new FLAGS D instruction (perhaps involving SETQ) to handle legacy CPU flags (Z80 and 8086 at least) then the time savings would be huge. A new MOVBITS D,S that moves bits within low byte of D as given by S could convert supported flags to other CPUs and generally would be very handy.

MOVBYTS D,{#}S could change to MOVBYTS D,#S and MOVBITS D,S.

I've got this weird idea to try to somehow use FBLOCK and PSRAM reads into a cycling HUB region for the FIFO to use, tracking the emulated PC as we go and using the FIFO wrap interrupt too. Needs more thought but something like it may work ok for emulating from external RAM....

It would be great if you could get PSRAM and XBYTE working together. Automatic FIFO wrapping using RDFAST seems to be impossible for CPU emulators because the wrap address needs to be a fixed number of blocks from a base address (CPU address zero), not the start address (CPU PC). Perhaps FBLOCK could solve this?

rogloh · 2022-01-18 12:47

@TonyB_ 22-26 cycles for flags seems about right. I've got my ALU stuff for AVR down to this below (still untested, could have bugs)...

By the way if @Wuerfel_21 doesn't want these additional CPU emulator discussions cluttering up her console emulation thread, happy to start a new thread, although there are two CPU emulators, 68k and Z80 currently being worked on here, so dunno where it fits.

It would be great if you could PSRAM and XBYTE working together. Automatic FIFO wrapping using RDFAST seems to be impossible for CPU emulators because the wrap address needs to be a fixed number of blocks from a base address (CPU address zero), not the start address (CPU PC). Perhaps FBLOCK could solve this?

Hoping it might somehow, and if I can dynamically read into smaller 64 byte blocks and effectively create some type of I-cache... I wonder if that FIFO interrupt can be triggered at the end of reading the last byte/word/long from a block before we are about to wrap back to the beginning or it if happens AFTER you just wrapped, and have already read the first byte from the beginning of the block. If it's the former, we may have a way to read in another block on the fly when we need to before using an out of date instruction (TBD). Perhaps the block address could be re-allocated to one using another area of HUB RAM already read (forming some type of rudimentary I-cache). Those are my nebulous thoughts so far anyway.

op_cpc
op_cp
op_sbc
op_sub
op_add
op_adc
op_cpi
op_sbci
op_subi
                getnib  d, instr, #1        ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                altgb   d, #regbase         '  l     l      l      l      l      l      l      l      l
                altgb   d, #regbase+4       '  h     h      h      h      h      l      l      l      l
                getbyte d1, 0-0, #0         ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                getnib  s, instr, #0        ' CPC    CP    SBC    SUB    ADD    ADC     |      |      |
                altgb   s, #regbase         '  l     l      l      l      l      l      |      |      | 
                altgb   s, #regbase+4       '  h     h      h      h      h      h      |      |      |
                getbyte s1, 0-0, #0         ' CPC    CP    SBC    SUB    ADD    ADC     |      |      |
                getnib  s1, instr, #2       '  |     |      |      |      |      |     CPI    SBCI   SUBI
                rolnib  s1, instr, #0       '  |     |      |      |      |      |     CPI    SBCI   SUBI
                getbyte d2, d1, #0          ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                testb   flags, #CFLAG wc    ' CPC    |     SBC     |      |     ADC     |     SBCI    |
                subx    d1, s1 wz           ' CPC    |     SBC     |      |      |      |     SBCI    |
                add     d1, s1              '  |     |      |      |     ADD     |      |      |      |
                sub     d1, s1 wz           '  |     CP     |     SUB     |      |     CPI     |     SUBI
                addx    d1, s1              '  |     |      |      |      |     ADC     |      |      |
                altsb   d, #regbase         '  |     |      l      l      l      l      |      l      l
                altsb   d, #regbase+4       '  |     |      h      h      h      l      |      l      l
                setbyte 0-0, d1, #0         '  |     |     SBC    SUB    ADD    ADC     |     SBCI   SUBI
                test    d1, #$ff wz         '  |     |      |      |     ADD    ADC     |      |      |
                testb   flags, #ZFLAG andz  ' CPC    |     SBC     |      |      |      |     SBCI    |
                bitz    flags, #ZFLAG       ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                xor     d2, s1              ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI     
                xor     d2, d1              ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                testb   d1, #7 wc           ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                bitc    flags, #NFLAG       ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                testb   d1, #8 wc           ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                bitc    flags, #CFLAG       ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                testb   d2, #7 xorc         ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                bitc    flags, #VFLAG       ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                testb   flags, #NFLAG xorc  ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                bitc    flags, #SFLAG       ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
                testb   d2, #4 wc           ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI
    _ret_       bitc    flags, #HFLAG       ' CPC    CP    SBC    SUB    ADD    ADC    CPI    SBCI   SUBI

evanh · 2022-01-18 13:06

@rogloh said:
Hoping it might somehow, and if I can dynamically read into smaller 64 byte blocks and effectively create some type of I-cache... I wonder if that FIFO interrupt can be triggered at the end of reading the last byte/word/long from a block before we are about to wrap back to the beginning or it if happens AFTER you just wrapped, and have already read the first byte from the beginning of the block. If it's the former, we may have a way to read in another block on the fly when we need to before using an out of date instruction (TBD). Perhaps the block address could be re-allocated to one using another area of HUB RAM already read (forming some type of rudimentary I-cache). Those are my nebulous thoughts so far anyway.

Reading the docs ... The FBW event occurs after the contiguous blocks are completed and it has started the next lot. FBLOCK commands are buffered just like the streamer commands.

Therefore to make effective use of FBLOCK for a random block list, you have to load up two block commands before the first FBW event gets triggered. So start with the RDFAST for the first block (with block count of one). Then immediately issue FBLOCK for the second block (also with block count of one). Then enable interrupts, or in the main loop, upon FBW event, issue another FBLOCK (each one having a block count of one).

At time of FBW event, for counts of one, the issued FBLOCK is for a whole block in the future of where the emulator is decoding.

PS: Also, when using non-zero block counts, all start addresses have to be longword aligned.

TonyB_ · 2022-01-18 13:29

Although the wrap address could (and should) be on a 64-byte boundary, the RDFAST start address could be anywhere within a 64-byte block. This might be a problem as the block count starts from the RDFAST address.

If required, could the PSRAM code start the streamer before the PSRAM is ready and stream in rubbish data before the desired start address data? This would have to be done only for the RDFAST block.

evanh · 2022-01-18 13:47

@TonyB_ said:
Although the wrap address could (and should) be on a 64-byte boundary, ...

Only has to be longword aligned. But each block is 64 bytes in size. So, if not all 64 bytes is wanted you still have to read the remainder out of the FIFO before the new block appears.

The alternative is to start again with a fresh RDFAST ... Which maybe a suitable solution because by using a $8000_0000 for the block count it'll fetch in the background and if you've got other things to attend while the FIFO fills then bobs-your-uncle.

And what's more, $8000_0000 is treated as a zero block count. Which means byte alignment is okay to use.

TonyB_ · 2022-01-18 13:54

@evanh said:
PS: Also, when using non-zero block counts, all start addresses have to be longword aligned.

Is this another problem, that rules out automatic wrapping with RDFAST & FBLOCK? On branching, first byte in FIFO must be opcode at exact jump or call address, not up to 3 bytes lower.

evanh · 2022-01-18 13:58

@TonyB_ said:

@evanh said:
PS: Also, when using non-zero block counts, all start addresses have to be longword aligned.

Is this another problem, that rules out automatic wrapping with RDFAST & FBLOCK? On branching, first byte in FIFO must be opcode at exact jump or call address, not up to 3 bytes lower.

I figured, with fixed 64-byte blocks, wasted fetches all round was going to be order of things.

PS: Note, as above, zero count can be byte aligned.

Console Emulation

Comments