Console Emulation - Page 29 — Parallax Forums

Console Emulation


Comments

  • pik33 Posts: 2,366
    edited 2022-06-09 20:56

    This is more weird than I thought.

    Let me recall these experiments.

    (1) I replaced reading from PSRAM with setting "gradient type" fields. The result was a lot of blinking; every third (!! why??) sprite was a gradient as expected, the rest were completely bad.
    (2) I removed all PSRAM-related code. What I got was all sprites "gradient" with a little blinking.
    (3) I added several nops before setting the sprite. After the third nop was added, 2 out of every 3 gradient sprites disappeared.
    (4) I restricted the lspc to make the sprite list for the first 4 lines only, expecting that they would repeat on the rest of the screen. They repeated, but accompanied by random blinks, and they gradually disappeared toward the bottom of the screen.

    The conclusion: something overwrites the sprite buffer. But there was nothing that could do this except the memory cog.

    (5) I restricted the sprite count to 6. Now I had only 6 of them, with all special effects unchanged.
    (6) What does the rdlong ptrb[2] do? It checks for sentinels, which I don't need, as I have only 6 sprites. It reads the address, which I don't use as I fill the sprite data manually. Let's comment it out.

    The result: no blinking at all, 6 good gradient sprites. So..

    (7) Let's make the sprite list for the full screen, not 4 lines.
    (8) Let's restore the PSRAM reading. Blinks. Comment out the rdlong ptrb[2]. No blinks. So this somehow causes the blink effect. Declare a helper variable and use it instead - bingo! A good part (still 6 sprites) of the screen!
    (9) Is it ptrb that has too little time to settle? (Now I know this is a bad conclusion, but..) Let's remove the variable and add the nop. Works. The problem is worked around.


    Now I can explain several effects.

    (1) I have one column of 2 line good pixels, 2 line nothing. That was the first sprite. No djnz yet.
    (2) after I changed the buffer address compute method I got a full column of good pixels. What changed? No conditional and add ptrb,reg instead of imm.

    I cannot explain the effect of every third sprite good and 2 others bad when fed with constant data. I cannot explain why this effect appears when adding nops to the loop.

    But then this effect, and the blinking, was caused by rdlong ptrb[2] with unused result (!!!)

    This means rdlong ptrb[2] without a nop actually caused a write to the hub (?!!??) Also, no amount of nops after adding #16 to ptrb and doing djnz helps, so it is djnz followed by rdlong that caused the problem. A pipeline bug?

    This is potentially a much more dangerous effect than a one-nop delay.

    I have to repeat several tests using NeoYume, then check whether this can be reproduced with simple code at all. NeoYume is complex code, with all 8 cogs running and complex interactions between them.

  • jmg Posts: 15,173

    @pik33 said:

    Also, no amount of nops after adding #16 to ptrb and doing djnz helps, so it is djnz followed by rdlong, that caused the problem. A pipeline bug?

    DJNZ is a tighter opcode, because it has to change a value and then make a decision, but somehow fit that into the same time as a JMP.
    If you change DJNZ to a two line form (decrement then conditional jump) does that resolve the situation ?

    This means rdlong ptrb[2] without a nop actually caused write to the hub (?!!??)

    That seems extreme, what if the effect is a bit more subtle ?
    Suppose the delays inside DJNZ make the PC not stable enough ?

  • pik33 Posts: 2,366

    DJNZ is a tighter opcode, because it has to change a value and then make a decision,

    There are 2 of them in a row: rdlong ptrb[2] has to add #8 to ptrb, then do the read. So another check: add 24 to ptrb, then read without an index after djnz.

  • @pik33 said:
    This is more weird than I thought.

    This is potentially much more dangerous effect than one nop delay.

    Truly a rabbit hole leading straight to hell.

    I have to repeat several tests using NeoYume, then check whether this can be reproduced with simple code at all. NeoYume is complex code, with all 8 cogs running and complex interactions between them.

    Yea, simple test case go. Though if you want to, you can just turn off the Z80/YM2610 cogs.

  • pik33 Posts: 2,366

    If you change DJNZ to a two line form (decrement then conditional jump) does that resolve the situation ?

    No.

    But djnz, and then

    add ptrb,#8
    rdlong ma_mtmp1,ptrb wc
    sub ptrb,8
    

    works

  • evanh Posts: 15,915

    Typo? Meant to be sub ptrb,#8 I presume.

  • pik33 Posts: 2,366

    Yes, of course, without # it will not work at all.

  • Interesting you mentioned that your board gets very hot after running NeoYume pik33. I find after maybe 15mins or so mine is somewhat warm but it's certainly not hot. I have it in the JonnyMac board so maybe that transfers heat slightly differently to your setup, or perhaps the clock rate on your P2 is stressing it out.

  • Mine runs fairly hot, too (and as you could see, it crashes quite fast if heated beyond the thermal equilibrium it reaches during normal operation)

  • It would be interesting to know at what frequency pik33's board comes good with code that normally fails. To be able to run it slower without dropping below the monitor's vertical refresh limit, you might have to draw fewer total scan lines on the screen and feed it via analog VGA to a multi-sync monitor. Hopefully that would be a small change that keeps the failure mode occurring while the frequency drops. E.g. does it come good at 250MHz or 200MHz, or does it remain bad all the way down to 160MHz, which would indicate a bad or damaged P2?

  • evanh Posts: 15,915

    HDMI will happily go lower scan rates but obviously minimum sysclock is 250 MHz then.

  • rogloh Posts: 5,786
    edited 2022-06-11 02:06

    I'd just use VGA ideally with an analog monitor if you have one available. Drop to 350 active scanlines from 480 and it can look a bit like an EGA signal. Monitors should still accept that. Hopefully even LCDs will still recognize that old format.

  • evanh Posts: 15,915
    edited 2022-06-11 03:15

    I mean the same monitor/TV will go to a lower refresh rate when connected via HDMI. Unless you've got a CRT, LCDs are more fussy about scan rates over VGA.

    PS: My limited experience of 640x350 is it is bound tightly to 70 Hz refresh on VGA inputs. 640x400 I think has more leeway. PPS: But the 30 kHz minimum hsync will likely get in the way.
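
    The scan-rate trade-off discussed in the last few posts can be put into rough numbers. This is only a sketch under stated assumptions: standard VGA totals (800 pixel clocks per line, 525 total lines for 640x480, 449 for 640x350) and a hypothetical fixed ratio of 10 sysclks per pixel, which is not taken from NeoYume's video driver.

```python
# Rough scan-rate arithmetic. All figures are illustrative assumptions:
# standard VGA line totals and a hypothetical fixed sysclk-to-pixel ratio.
H_TOTAL = 800            # pixel clocks per scan line, including blanking
SYSCLKS_PER_PIXEL = 10   # hypothetical ratio, not NeoYume's actual setting


def scan_rates(sysclk_hz, v_total):
    """Return (hsync_hz, vsync_hz) for a sysclk and a total line count."""
    pixel_clock = sysclk_hz / SYSCLKS_PER_PIXEL
    hsync_hz = pixel_clock / H_TOTAL
    vsync_hz = hsync_hz / v_total
    return hsync_hz, vsync_hz


for sysclk in (250_000_000, 200_000_000, 160_000_000):
    for v_total in (525, 449):  # 640x480 and 640x350 total lines
        h, v = scan_rates(sysclk, v_total)
        print(f"{sysclk / 1e6:.0f} MHz, {v_total} lines: "
              f"hsync {h / 1e3:.2f} kHz, vsync {v:.1f} Hz")
```

    Under these assumptions, fewer total lines raise the vertical refresh for a given sysclk, but the horizontal rate depends only on the pixel clock, so it falls below 30 kHz as soon as the sysclk drops much under 250 MHz.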

  • pik33 Posts: 2,366
    edited 2022-06-11 10:24

    Adding a nop after djnz partially fixes the problem. Partially, because I got only half of the lines. The second half is probably not displayed because of this if_c before rdlong. This effect can be fixed by using an alternate algorithm without ifs.

    So it seems the pipeline clearing after a jump or conditional is what is not 100% working in this cog.

  • Yeah but the interesting thing would be if this bad COG issue persists even down to 160MHz or so.

  • Meanwhile, actual developments. Even though Metal Slug X really needs more than 32MB, it's still interesting to mess around with because it uses a unique copy protection scheme involving a dongle chip. Current MAME actually emulates the chip, but old MAME just does a find-replace for the code pattern that triggers the protection. There's a pre-patched ROM floating around (apparently these chips are failure-prone, so people need to install the patched ROM into genuine cartridges to fix them), but I think it'd be nicer to apply the patch to the original ROM on load.

    So far, no dice.

    Interestingly, this is the same message other games use when they detect a counterfeit MVS machine. Too lazy to write a new one?

  • @pik33 said:
    Adding a nop after djnz partially fixes the problem. Partially, because I got only half of the lines. The second half is probably not displayed because of this if_c before rdlong. This effect can be fixed by using an alternate algorithm without ifs.

    So it seems the pipeline clearing after a jump or conditional is what is not 100% working in this cog.

    I believe the following code excerpt (as posted by Ada, in another thread) is the one you've referred to, so I made a small mod to it, just to test an alternative way that uses slightly fewer resources from the cog that runs it.

    Could you please test it (and sure, feel free to insert/remove or exchange anything you find appropriate, including trashing it all :smile: ), just to verify whether it leads to any useful new clues?

    @Wuerfel_21 said:
    Pik's funny issue isn't really depending on hub alignment though. The issue seems to be caused by a certain instruction pattern, regardless of hub alignment, clock speed and surrounding code changes (and of course localized entirely to one particular cog on one particular chip)

    Here's the entire graphics fetching loop from NeoYume:

                  '' adding NOP here does not bypass bug
    .slotlp
                  '' NOP here does bypass bug
                  rdlong ma_mtmp1,ptrb[2] wc
            if_c  jmp #ma_lineloop ' got sentinel
    
                  shl     ma_mtmp1,#1 ' sprite lines are 2 longs
                  add     ma_mtmp1,ma_char_base
                  setbyte ma_mtmp1,#$EB,#3
                  splitb  ma_mtmp1
                  rev     ma_mtmp1
                  movbyts ma_mtmp1, #%%0123
                  mergeb  ma_mtmp1
                  ''
                  ''
                  ''rep @.irqshield,#1
                  ''
                  STALLI
                  ''
                  drvh  #PSRAM_SELECT
                  drvl  ma_psram_pinfield
                  xinit ma_psram_addr_cmd,ma_mtmp1
                  wypin #(8+PSRAM_WAIT+4)*2,#PSRAM_CLK
                  setq ma_nco_fast
                  xcont #PSRAM_WAIT*2+PSRAM_DELAY,#0
                  wrfast ma_bit31,ptrb
                  waitxmt
                  fltl ma_psram_pinfield
                  setq ma_nco_slow
                  xcont ma_psram_readspr_cmd,#0
                  waitxfi
                  drvl #PSRAM_SELECT
    .irqshield
                  ''
                  ALLOWI
                  ''
                  add ptrb,#4*4
                  '' adding NOP here does not bypass bug
                  djnz ma_slotleft,#.slotlp
    

    As you may see, the only hub-aligning instruction is the one RDLONG and since adding a NOP at the bottom of the loop doesn't bypass the bug, it is most certainly not hub alignment.

  • STALLI/ALLOWI instead of REP? could be worth a try I guess.

    Unrelatedly, figured out how to patch it. Game runs now, but of course with 2/3 of the graphics missing. Most convoluted load script so far.

    '' NOTE: TOO BIG
    game_mslugx byte "mslugx",0,"Metal Slug X",0,"250-",0,GAME_ISSUE
    byte SET_CHAR
    byte LOAD_CROM,"C1",0,"C2",0,word $100_0
    'byte LOAD_CROM,"C3",0,"C4",0,word $100_0
    'byte LOAD_CROM,"C5",0,"C6",0,word $100_0
    byte SET_PROG
    byte LOAD_BSWAP,"P1",0,word $10_0
    '' Copy protection patches
    '{
    byte PATCH_PATTERN,long -$10_0000,long $8_0000-3
        byte 2,6, $02,$43,$00,$01,$66,$00 ' Find pattern (ANDI.W #1,D3; BNE.W xxxx)
        byte 4,4, $4E,$71,$4E,$71         ' Replace pattern (replace BNE.W with NOPs)
    '}
    '{
    byte PATCH_PATTERN,long -$10_0000,long $8_0000-3
        byte 2,6, $43,$F9,$00,$2F,$FF,$E0 ' Find pattern (LEA $2FFFE0,A1)
        byte 2,4, $00,$E0,$00,$00         ' Replace pattern (use $E00000)
    '}
    '{
    byte PATCH_SINGLE,long $3BDC - $10_0000,6, $4E,$71,$4E,$71,$4E,$71
    byte PATCH_SINGLE,long $3C0C - $10_0000,6, $4E,$71,$4E,$71,$4E,$71
    byte PATCH_SINGLE,long $3C36 - $10_0000,4, $4E,$71,$4E,$71
    '}
    byte SET_PORT,SET_BANKSWITCH_4M
    byte LOAD_BSWAP,"P2",0,word $40_0
    byte SET_ADPCM_A,SET_ADPCM_B
    byte LOAD_RAW,"V1",0,word $40_0
    byte LOAD_RAW,"V2",0,word $40_0
    byte LOAD_RAW,"V3",0,word $20_0
    byte SET_FIX
    byte LOAD_SROM,"S1",0,word $02_0
    byte SET_Z80
    byte LOAD_RAW,"M1",0,word $02_0
    byte SET_BIOS,LOAD_BIOS
    byte 0
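
    The find-and-replace idea behind the PATCH_PATTERN entries above can be modelled in a few lines. This is a minimal sketch, not NeoYume's loader: it assumes that the two numbers before the find bytes mean (search step, pattern length) and that the two before the replace bytes mean (offset into the match, replacement length). That reading is inferred from the comments in the script and is not confirmed. The function name and the tiny test ROM are hypothetical, and byte swapping and the address range arguments are not modelled.

```python
def patch_pattern(rom, find, replace, replace_offset, align=2):
    """Overwrite bytes at every aligned match of `find`; return the hit count.

    `rom` must be a mutable bytearray. `replace` is written starting
    `replace_offset` bytes into each match (assumed meaning of the fields).
    """
    hits = 0
    last_start = len(rom) - len(find)
    for pos in range(0, last_start + 1, align):
        if rom[pos:pos + len(find)] != find:
            continue
        start = pos + replace_offset
        if start + len(replace) > len(rom):
            continue  # replacement would run past the end of the image
        rom[start:start + len(replace)] = replace
        hits += 1
    return hits


# ANDI.W #1,D3 ; BNE.W xxxx -> keep the ANDI, turn the 4-byte BNE.W into two NOPs
FIND = bytes([0x02, 0x43, 0x00, 0x01, 0x66, 0x00])
NOPS = bytes([0x4E, 0x71, 0x4E, 0x71])

# Hypothetical 12-byte image: RTS, the protected sequence, a branch offset, RTS
rom = bytearray([0x4E, 0x75]) + bytearray(FIND) + bytearray([0x12, 0x34, 0x4E, 0x75])
hits = patch_pattern(rom, FIND, NOPS, replace_offset=4)
print(hits, rom.hex())
```

    Stepping by 2 matches the 68000's word-aligned instructions, which avoids false hits that straddle an instruction boundary at an odd address.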
    
  • evanh Posts: 15,915

    Here's the corrected auto-delays in psram.spin2 (Used in MegaYume) for use on Edge EC32MB card. First line I've retained as commented out original table. Second line is the updated table.

    'delayTable  long    7,92_000000,150_000000,206_000000,258_000000,310_000000,333_000000,0  'Eval + Add-on
    delayTable  long    7,130_000000,190_000000,240_000000,290_000000,340_000000  'Edge EC32MB
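
    For readers unfamiliar with this kind of table, here is a sketch of how it could be read. It assumes the first entry is a base delay and each following entry is a sysclock frequency at which the delay goes up by one, with a 0 entry ending the list. That interpretation, the handling of an exact match, and the function name are assumptions, not taken from psram.spin2.

```python
def lookup_delay(table, sysclk_hz):
    """Return base delay plus one step per frequency threshold reached.

    Assumed layout: table[0] is the base delay, the remaining entries are
    ascending frequency thresholds in Hz, optionally terminated by 0.
    """
    delay = table[0]
    for limit in table[1:]:
        if limit == 0 or sysclk_hz < limit:
            break
        delay += 1
    return delay


# Values copied from the two table lines above
EVAL_TABLE = [7, 92_000000, 150_000000, 206_000000, 258_000000,
              310_000000, 333_000000, 0]
EDGE_TABLE = [7, 130_000000, 190_000000, 240_000000, 290_000000, 340_000000]

for mhz in (160, 250, 338):
    hz = mhz * 1_000_000
    print(mhz, lookup_delay(EVAL_TABLE, hz), lookup_delay(EDGE_TABLE, hz))
```

    Under this reading, the Edge table's higher thresholds mean the same sysclock maps to an equal or smaller delay than with the Eval table.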
    
  • Wuerfel_21 Posts: 5,052
    edited 2022-06-11 13:02

    FYI, NeoYume also uses psram.spin2 for the loader stage

    EDIT: wait, no, none of them use psram.spin2 right now

  • @Wuerfel_21 said:
    STALLI/ALLOWI instead of REP? could be worth a try I guess.

    Just to clarify things a bit: it doesn't need to be solely the STALLI/ALLOWI pair; in fact, it can be any of the available long-range interrupt-blocking mechanisms able to cover the whole code block that needs shielding. This takes advantage of the interval between the "starter" instruction that begins the interrupt blocking and the one that "completes" it, thus releasing the block. E.g., as extracted from the docs (GETCT+WC doesn't generally apply, being too "short-sighted" :smile: ):

    ● ALTxx / CRCNIB / SCA / SCAS / GETCT+WC / GETXACC / SETQ / SETQ2 / XORO32 / XBYTE must not be executing
    ● AUGS must not be executing or waiting for a S/# instruction
    ● AUGD must not be executing or waiting for a D/# instruction
    ● REP must not be executing or active
    ● STALLI must not be executing or active
    ● The cog must not be stalled in any WAITx instruction

  • evanh Posts: 15,915
    edited 2022-06-11 16:37

    Ah, I made Roger happy anyway ... Went and got me the newer MegaYume with SD reading - https://github.com/IRQsome ... and NeoYume too ... now Flexspin is complaining even more:

    /home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fopen.c:92: warning: Redefining function or subroutine fopen with an incompatible type
    /home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fopen.c:96: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
    /home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fseek.c:15: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
    /home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fseek.c:28: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
    /home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fwrite.c:7: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
    /home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fwrite.c:24: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
    /home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fwrite.c:31: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
    /home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fputs.c:14: warning: Redefining function or subroutine fputs with an incompatible type
    /home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fputs.c:31: warning: Redefining function or subroutine puts with an incompatible type
    

    and NeoYume doesn't see any ROM files on the SD card ... It just gets Directory error: Unknown error... Do they have to be valid ROM images? MegaYume is working even though I get the same compile warnings.

    EDIT: hang on, I didn't make the NEOYUME directory ... right, got it saying NO GAMES! now. :)

  • Yeah, it kinda splorps when using libc, but that's just warnings, it does actually work.

  • Cluso99 Posts: 18,069

    Time to add a Raspberry Pi heatsink to P2. You can buy a set of 4 for RPi on eBay/ali cheap - the larger one fits nicely on P2.
    You can also get a fan - IIRC 20x20x7 5VDC fits nicely on the heatsink but I’ve not tried how to fasten it.

  • Slide preview, part 2.

    This one is as deranged as it needs to be.
    Blabbing will commence in 2 days, mark your calendars if you want to hear unfunny funnis vaguely tangential to P2 programming.

    Also, Reimu.

  • evanh Posts: 15,915

    The Reimu prop.

  • pik33 Posts: 2,366

    The nop can be even more magic. Add it after (under) the djnz, so it is never executed inside the loop, and it still does its job (partially: the error after the if_c before the loop remains).

  • What's the lowest sysclk you can currently reproduce your problem @pik33?

  • @pik33 said:
    The nop can be even more magic. Add it after (under) djnz so it is never executed in the loop but it is still doing (partially: the error after if_c before the loop remains) its job

    This ptr bug is interesting (and also a bit scary...), has @cgracey been informed about it ?
    And, maybe more important, can it be reproduced with a (lot) less code, to see if other chips/cogs may be affected ?

  • evanh Posts: 15,915
    edited 2022-06-14 08:55

    Isn't it exclusive to Pik's revB EC32MB? No one else has reproduced it, right?

    EDIT: Actually, there's only three people with the requisite ROM files I think. I'm clueless on where to find them.
