Console Emulation

pik33 · 2022-06-09 20:34

This is more weird than I thought.

Let me recall these experiments.

(1) I replaced reading from PSRAM with setting "gradient type" fields. The result was a lot of blinking; every third (!! why??) sprite was a gradient as expected, and the rest were completely wrong.
(2) I removed all PSRAM related code. What I got was all sprites "gradient" with a little blinking.
(3) I added several nops before setting the sprite. After the third nop was added, two out of every three gradient sprites disappeared.
(4) I restricted the lspc to make the sprite list for the first 4 lines only, expecting that they would repeat on the rest of the screen. They repeated, but with an accompaniment of random blinks, and they gradually disappeared toward the bottom of the screen.

The conclusion: something overwrites the sprite buffer. But there was nothing that could do this except the memory cog.

(5) I restricted the number of sprites to 6. Now I had only 6 of them, with all special effects unchanged.
(6) What does the rdlong ptrb[2] do? It checks for sentinels, which I don't need, as I have only 6 sprites. It reads the address, which I don't use as I fill the sprite data manually. Let's comment it out.

The result: no blinking at all, 6 good gradient sprites. So..

(7) let's make the sprite list at the full screen, not 4 lines
(8) let's restore the PSRAM reading. Blinks. Comment out the rdlong ptrb[2]. No blink. So this somehow causes the blink effect. Declare a helper variable and use it instead - bingo! A good part (still 6 sprites) of the screen!
(9) is it ptrb that has too little time to settle? (now I know this is a wrong conclusion, but...) Let's remove the variable and add the nop. Works. The problem is worked around.

Now I can explain several effects.

(1) I had one column with 2 lines of good pixels, then 2 lines of nothing. That was the first sprite. No djnz yet.
(2) after I changed the buffer address compute method I got a full column of good pixels. What changed? No conditional and add ptrb,reg instead of imm.

I cannot explain the effect of every third sprite good and 2 others bad when fed with constant data. I cannot explain why this effect appears when adding nops to the loop.

But then this effect, and the blinking, was caused by rdlong ptrb[2] with unused result (!!!)

This means rdlong ptrb[2] without a nop actually caused write to the hub (?!!??) Also, no amount of nops after adding #16 to ptrb and doing djnz helps, so it is djnz followed by rdlong, that caused the problem. A pipeline bug?

This is potentially a much more dangerous effect than a one nop delay.

I have to repeat several tests using NeoYume. Then check this on simple code, if it can be done using simple code at all. NeoYume is complex code with all 8 cogs running and complex interactions between them.

jmg · 2022-06-09 20:59

@pik33 said:

Also, no amount of nops after adding #16 to ptrb and doing djnz helps, so it is djnz followed by rdlong, that caused the problem. A pipeline bug?

DJNZ is a tighter opcode, because it has to change a value and then make a decision, but somehow fit that into the same time as a JMP.
If you change DJNZ to a two line form (decrement then conditional jump) does that resolve the situation ?

This means rdlong ptrb[2] without a nop actually caused write to the hub (?!!??)

That seems extreme, what if the effect is a bit more subtle ?
Suppose the delays inside DJNZ make the PC not stable enough ?

pik33 · 2022-06-09 21:04

DJNZ is a tighter opcode, because it has to change a value and then make a decision,

There are 2 of them in a row: rdlong ptrb[2] has to add #8 to ptrb, then do the read. So another check: add 24 to ptrb, then read without an index after djnz.

Wuerfel_21 · 2022-06-09 21:20

@pik33 said:
This is more weird than I thought.

This is potentially a much more dangerous effect than a one nop delay.

Truly a rabbit hole leading straight to hell.

I have to repeat several tests using NeoYume. Then check this on simple code, if it can be done using simple code at all. NeoYume is complex code with all 8 cogs running and complex interactions between them.

Yea, simple test case go. Though if you want to, you can just turn off the Z80/YM2610 cogs.

pik33 · 2022-06-10 09:10

If you change DJNZ to a two line form (decrement then conditional jump) does that resolve the situation ?

No.

But djnz, and then

add ptrb,#8
rdlong ma_mtmp1,ptrb wc
sub ptrb,8

works
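
As a minimal sketch of why the two forms should be interchangeable on a healthy cog: for a long access, a PTRx index is scaled by 4, so `rdlong d,ptrb[2]` reads from ptrb+8 and leaves ptrb unchanged, which is what the add/rdlong/sub sequence does by hand (assuming the intended `sub ptrb,#8`). The Python model below only illustrates the intended addressing. It does not reproduce the fault, and the hub contents and function names are hypothetical.

```python
LONG_SIZE = 4  # bytes per long; a PTRx index is scaled by the access size


def rdlong_indexed(hub, ptrb, index):
    """Model of `rdlong d,ptrb[index]`: read at ptrb + index*4, ptrb unchanged."""
    addr = ptrb + index * LONG_SIZE
    return hub[addr], ptrb


def rdlong_add_sub(hub, ptrb, offset):
    """Model of `add ptrb,#offset` / `rdlong d,ptrb` / `sub ptrb,#offset`."""
    ptrb += offset
    value = hub[ptrb]
    ptrb -= offset
    return value, ptrb


# Hypothetical sprite slot: 4 longs starting at hub address $1000.
hub = {0x1000 + LONG_SIZE * i: 0xA0 + i for i in range(4)}

indexed = rdlong_indexed(hub, 0x1000, 2)
manual = rdlong_add_sub(hub, 0x1000, 8)
print(indexed == manual)
```

Both calls return the third long of the slot and the original pointer value, so any difference in behaviour on the real chip comes from how the instruction is executed, not from what it is meant to compute.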

evanh · 2022-06-10 11:32

Typo? Meant to be sub ptrb,#8 I presume.

pik33 · 2022-06-10 13:14

Yes, of course, without # it will not work at all.

rogloh · 2022-06-10 13:21

Interesting you mentioned that your board gets very hot after running NeoYume pik33. I find after maybe 15mins or so mine is somewhat warm but it's certainly not hot. I have it in the JonnyMac board so maybe that transfers heat slightly differently to your setup, or perhaps the clock rate on your P2 is stressing it out.

Wuerfel_21 · 2022-06-10 13:34

Mine runs fairly hot, too (and as you could see, it crashes quite fast if heated beyond the thermal equilibrium it reaches during normal operation)

rogloh · 2022-06-11 01:45

It would be interesting to know at what frequency pik33's board comes good with code that normally fails. To be able to run it slower without dropping below the monitor's vertical refresh limit you might have to draw fewer total scan lines on the screen and feed via analog VGA to a multi-sync monitor. Hopefully that would be a small change that could keep the failure mode occurring while the frequency drops. E.g. does it come good at 250MHz or 200MHz etc, or does it remain bad regardless down to 160MHz, which would indicate a bad or damaged P2.

evanh · 2022-06-11 01:53

HDMI will happily go lower scan rates but obviously minimum sysclock is 250 MHz then.

rogloh · 2022-06-11 02:05

I'd just use VGA ideally with an analog monitor if you have one available. Drop to 350 active scanlines from 480 and it can look a bit like an EGA signal. Monitors should still accept that. Hopefully even LCDs will still recognize that old format.

evanh · 2022-06-11 02:52

I mean the same monitor/TV will go lower refresh when connected via HDMI. Unless you've got a CRT, LCD's are more fussy about scan rates over VGA.

PS: My limited experience of 640x350 is it is bound tightly to 70 Hz refresh on VGA inputs. 640x400 I think has more leeway. PPS: But the 30 kHz minimum hsync will likely get in the way.
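
The trade-off discussed above can be put into numbers with a rough sketch. Horizontal rate is pixel clock divided by total pixels per line, and vertical refresh is horizontal rate divided by total lines. The totals below are the standard VGA figures (800x525 for 640x480, 800x449 for 640x350, both at 25.175 MHz). The 0.75 scale factor is purely hypothetical, and it is an assumption that the pixel clock scales linearly with sysclock; the thread does not state NeoYume's actual video timing.

```python
def scan_rates(pixel_clock_hz, h_total, v_total):
    """Return (horizontal rate in Hz, vertical refresh in Hz)."""
    h_rate = pixel_clock_hz / h_total
    v_rate = h_rate / v_total
    return h_rate, v_rate


VGA_PIXEL_CLOCK = 25_175_000  # standard VGA dot clock in Hz

h480, v480 = scan_rates(VGA_PIXEL_CLOCK, 800, 525)  # 640x480 timing
h350, v350 = scan_rates(VGA_PIXEL_CLOCK, 800, 449)  # 640x350 timing

# Hypothetical: slow everything down to 75% of the original clock.
h_slow, v_slow = scan_rates(VGA_PIXEL_CLOCK * 0.75, 800, 449)

for name, h, v in (("640x480", h480, v480),
                   ("640x350", h350, v350),
                   ("640x350 at 75% clock", h_slow, v_slow)):
    print(f"{name}: hsync {h / 1000:.2f} kHz, refresh {v:.2f} Hz")
```

Cutting the number of lines raises the refresh rate but does nothing for the horizontal rate, which is why the roughly 30 kHz hsync floor becomes the limit once the clock is lowered.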

pik33 · 2022-06-11 10:23

Adding a nop after djnz partially fixes the problem. Partially, because I got only half of the lines. The second half is probably not displayed because of this if_c before rdlong. This effect can be fixed by using an alternate algorithm without ifs.

So it seems the pipeline clearing after jump or conditional is what is not 100% working in this cog.

rogloh · 2022-06-11 11:21

Yeah but the interesting thing would be if this bad COG issue persists even down to 160MHz or so.

Wuerfel_21 · 2022-06-11 12:19

Meanwhile, actual developments. Even though Metal Slug X really needs more than 32MB, it's still interesting to mess around with because it uses a unique copy protection scheme involving a dongle chip. Current MAME actually emulates the chip, but old MAME just does a find-replace for the code pattern that triggers the protection. There's a pre-patched ROM floating around (apparently these chips are failure-prone, so people need to install the patched ROM into genuine cartridges to fix them), but I think it'd be nicer to apply the patch to the original ROM on load.

So far, no dice.

Interestingly, this is the same message other games use when they detect a counterfeit MVS machine. Too lazy to write a new one?

Yanomani · 2022-06-11 12:25

@pik33 said:
Adding a nop after djnz partially fixes the problem. Partially, because I got only half of the lines. The second half is probably not displayed because of this if_c before rdlong. This effect can be fixed by using an alternate algorithm without ifs.

So it seems the pipeline clearing after jump or conditional is what is not 100% working in this cog.

I believe the following code excerpt (as posted by Ada, in another thread) is the one you've referred to, so I did a small mod on it, just to test an alternative way that uses slightly fewer resources from the Cog that runs it.

Could you please test it (and sure, feel free to insert/remove or exchange anything you find appropriate, including trashing it all), just to verify if it leads to any new useful clues?

@Wuerfel_21 said:
Pik's funny issue isn't really depending on hub alignment though. The issue seems to be caused by a certain instruction pattern, regardless of hub alignment, clock speed and surrounding code changes (and of course localized entirely to one particular cog on one particular chip)

Here's the entire graphics fetching loop from NeoYume:

              '' adding NOP here does not bypass bug
.slotlp
              '' NOP here does bypass bug
              rdlong ma_mtmp1,ptrb[2] wc
        if_c  jmp #ma_lineloop ' got sentinel

              shl     ma_mtmp1,#1 ' sprite lines are 2 longs
              add     ma_mtmp1,ma_char_base
              setbyte ma_mtmp1,#$EB,#3
              splitb  ma_mtmp1
              rev     ma_mtmp1
              movbyts ma_mtmp1, #%%0123
              mergeb  ma_mtmp1
              ''
              ''
              ''rep @.irqshield,#1
              ''
              STALLI
              ''
              drvh  #PSRAM_SELECT
              drvl  ma_psram_pinfield
              xinit ma_psram_addr_cmd,ma_mtmp1
              wypin #(8+PSRAM_WAIT+4)*2,#PSRAM_CLK
              setq ma_nco_fast
              xcont #PSRAM_WAIT*2+PSRAM_DELAY,#0
              wrfast ma_bit31,ptrb
              waitxmt
              fltl ma_psram_pinfield
              setq ma_nco_slow
              xcont ma_psram_readspr_cmd,#0
              waitxfi
              drvl #PSRAM_SELECT
.irqshield
              ''
              ALLOWI
              ''
              add ptrb,#4*4
              '' adding NOP here does not bypass bug
              djnz ma_slotleft,#.slotlp

As you may see, the only hub-aligning instruction is the one RDLONG and since adding a NOP at the bottom of the loop doesn't bypass the bug, it is most certainly not hub alignment.

Wuerfel_21 · 2022-06-11 12:50

STALLI/ALLOWI instead of REP? could be worth a try I guess.

Unrelatedly, figured out howmst into patch. Game runs now, but of course with 2/3 of the graphics missing. Most convoluted load script so far.

'' NOTE: TOO BIG
game_mslugx byte "mslugx",0,"Metal Slug X",0,"250-",0,GAME_ISSUE
byte SET_CHAR
byte LOAD_CROM,"C1",0,"C2",0,word $100_0
'byte LOAD_CROM,"C3",0,"C4",0,word $100_0
'byte LOAD_CROM,"C5",0,"C6",0,word $100_0
byte SET_PROG
byte LOAD_BSWAP,"P1",0,word $10_0
'' Copy protection patches
'{
byte PATCH_PATTERN,long -$10_0000,long $8_0000-3
    byte 2,6, $02,$43,$00,$01,$66,$00 ' Find pattern (ANDI.W #1,D3; BNE.W xxxx)
    byte 4,4, $4E,$71,$4E,$71         ' Replace pattern (replace BNE.W with NOPs)
'}
'{
byte PATCH_PATTERN,long -$10_0000,long $8_0000-3
    byte 2,6, $43,$F9,$00,$2F,$FF,$E0 ' Find pattern (LEA $2FFFE0,A1)
    byte 2,4, $00,$E0,$00,$00         ' Replace pattern (use $E00000)
'}
'{
byte PATCH_SINGLE,long $3BDC - $10_0000,6, $4E,$71,$4E,$71,$4E,$71
byte PATCH_SINGLE,long $3C0C - $10_0000,6, $4E,$71,$4E,$71,$4E,$71
byte PATCH_SINGLE,long $3C36 - $10_0000,4, $4E,$71,$4E,$71
'}
byte SET_PORT,SET_BANKSWITCH_4M
byte LOAD_BSWAP,"P2",0,word $40_0
byte SET_ADPCM_A,SET_ADPCM_B
byte LOAD_RAW,"V1",0,word $40_0
byte LOAD_RAW,"V2",0,word $40_0
byte LOAD_RAW,"V3",0,word $20_0
byte SET_FIX
byte LOAD_SROM,"S1",0,word $02_0
byte SET_Z80
byte LOAD_RAW,"M1",0,word $02_0
byte SET_BIOS,LOAD_BIOS
byte 0
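
For readers unfamiliar with the find-replace approach, here is a minimal Python sketch of what a PATCH_PATTERN entry appears to do. It assumes that the two bytes before the find pattern mean (search step, pattern length) and the two before the replace pattern mean (offset into the match, replacement length). That reading is inferred from the comments in the script, not taken from the loader source. The function name and test ROM are hypothetical, and the byte swapping implied by LOAD_BSWAP is ignored.

```python
def patch_pattern(rom, find, replace, replace_offset, step=2):
    """Overwrite part of every step-aligned occurrence of `find` in `rom`.

    rom is a bytearray and is modified in place.
    Returns the number of matches that were patched.
    """
    count = 0
    for pos in range(0, len(rom) - len(find) + 1, step):
        if rom[pos:pos + len(find)] == find:
            start = pos + replace_offset
            rom[start:start + len(replace)] = replace
            count += 1
    return count


# Hypothetical 68000 code: NOP; ANDI.W #1,D3; BNE.W +$10; RTS
rom = bytearray.fromhex("4e71 0243 0001 6600 0010 4e75")

# First patch from the script: find ANDI.W #1,D3 followed by the BNE.W opcode,
# then overwrite the 4-byte BNE.W (at offset 4 in the match) with two NOPs.
hits = patch_pattern(rom,
                     find=bytes.fromhex("0243 0001 6600"),
                     replace=bytes.fromhex("4e71 4e71"),
                     replace_offset=4)
print(hits, rom.hex(" ", 2))
```

Matching only the opcode word of BNE.W and then overwriting the opcode plus its displacement is what lets one pattern catch every branch, whatever its target.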

evanh · 2022-06-11 12:58

Here's the corrected auto-delays in psram.spin2 (Used in MegaYume) for use on Edge EC32MB card. First line I've retained as commented out original table. Second line is the updated table.

'delayTable  long    7,92_000000,150_000000,206_000000,258_000000,310_000000,333_000000,0  'Eval + Add-on
delayTable  long    7,130_000000,190_000000,240_000000,290_000000,340_000000  'Edge EC32MB
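
A rough sketch of how such a table is typically consumed, under the assumption that the first entry is the base delay and each following entry is a frequency threshold in Hz at which the delay grows by one, with 0 ending the list as in the commented-out original. This is a reading of the table layout, not the actual psram.spin2 lookup code; the function name and the 338 MHz sample clock are hypothetical.

```python
EVAL_TABLE = [7, 92_000_000, 150_000_000, 206_000_000,
              258_000_000, 310_000_000, 333_000_000, 0]
EDGE_TABLE = [7, 130_000_000, 190_000_000, 240_000_000,
              290_000_000, 340_000_000, 0]  # 0 terminator assumed here


def lookup_delay(table, freq_hz):
    """Start at the base delay and add one for every threshold reached."""
    delay = table[0]
    for threshold in table[1:]:
        if threshold == 0 or freq_hz < threshold:
            break
        delay += 1
    return delay


sample_clock = 338_000_000  # hypothetical sysclock for illustration
print(lookup_delay(EVAL_TABLE, sample_clock),
      lookup_delay(EDGE_TABLE, sample_clock))
```

Under these assumptions the same clock yields a smaller delay with the Edge table than with the Eval table, because the Edge thresholds sit higher.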

Wuerfel_21 · 2022-06-11 13:01

FYI, NeoYume also uses psram.spin2 for the loader stage

EDIT: wait, no, none of them use psram.spin2 right now

Yanomani · 2022-06-11 13:24

@Wuerfel_21 said:
STALLI/ALLOWI instead of REP? could be worth a try I guess.

Just to clarify things a bit: it doesn't need to be solely the STALLI/ALLOWI pair; in fact, it can be any of the available long-range interrupt-blocking mechanisms able to cover the whole code block that needs shielding. This lets you make use of any interval between the "starter instruction" of the interrupt-blocking action and the one that "completes" it, thus releasing the block. For example, as extracted from the docs (GETCT+WC doesn't generally apply, due to being too "short-sighted"):

● ALTxx / CRCNIB / SCA / SCAS / GETCT+WC / GETXACC / SETQ / SETQ2 / XORO32 / XBYTE must not be executing
● AUGS must not be executing or waiting for a S/# instruction
● AUGD must not be executing or waiting for a D/# instruction
● REP must not be executing or active
● STALLI must not be executing or active
● The cog must not be stalled in any WAITx instruction

evanh · 2022-06-11 16:26

Ah, I made Roger happy anyway ... Went and got me the newer MegaYume with SD reading - https://github.com/IRQsome ... and NeoYume too ... now Flexspin is complaining even more:

/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fopen.c:92: warning: Redefining function or subroutine fopen with an incompatible type
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fopen.c:96: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fseek.c:15: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fseek.c:28: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fwrite.c:7: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fwrite.c:24: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fwrite.c:31: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fputs.c:14: warning: Redefining function or subroutine fputs with an incompatible type
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fputs.c:31: warning: Redefining function or subroutine puts with an incompatible type

and NeoYume doesn't see any ROM files on the SD card ... It just gets Directory error: Unknown error... Do they have to be valid ROM images? MegaYume is working even though I get the same compile warnings.

EDIT: hang on, I didn't make the NEOYUME directory ... right, got it saying NO GAMES! now.

Wuerfel_21 · 2022-06-11 16:37

Yeah, it kinda splorps when using libc, but that's just warnings, it does actually work.

Cluso99 · 2022-06-12 00:46

Time to add a Raspberry Pi heatsink to P2. You can buy a set of 4 for RPi on eBay/ali cheap - the larger one fits nicely on P2.
You can also get a fan - IIRC 20x20x7 5VDC fits nicely on the heatsink but I’ve not tried how to fasten it.

Wuerfel_21 · 2022-06-13 21:26

Slide preview, part 2.

This one is as deranged as it needs to be.
Blabbing will commence in 2 days, mark your calendars if you want to hear unfunny funnis vaguely tangential to P2 programming.

Also, Reimu.

evanh · 2022-06-14 00:34

The Reimu prop.

pik33 · 2022-06-14 06:30

The nop can be even more magic. Add it after (under) djnz so it is never executed in the loop but it is still doing (partially: the error after if_c before the loop remains) its job

rogloh · 2022-06-14 08:21

What's the lowest sysclk you can currently reproduce your problem @pik33?

macca · 2022-06-14 08:26

@pik33 said:
The nop can be even more magic. Add it after (under) djnz so it is never executed in the loop but it is still doing (partially: the error after if_c before the loop remains) its job

This ptr bug is interesting (and also a bit scary...), has @cgracey been informed about it ?
And, maybe more important, can it be reproduced with a (lot) less code, to see if other chips/cogs may be affected ?

evanh · 2022-06-14 08:51

Isn't it exclusive to Pik's revB EC32MB? No one else has reproduced it, right?

EDIT: Actually, there's only three people with the requisite ROM files I think. I'm clueless on where to find them.
