@Wuerfel_21 said:
Pik's funny issue isn't really depending on hub alignment though. The issue seems to be caused by a certain instruction pattern, regardless of hub alignment, clock speed and surrounding code changes (and of course localized entirely to one particular cog on one particular chip)
Here's the entire graphics fetching loop from NeoYume:
As you may see, the only hub-aligning instruction is the one RDLONG and since adding a NOP at the bottom of the loop doesn't bypass the bug, it is most certainly not hub alignment.
Here are the corrected auto-delays in psram.spin2 (used in MegaYume) for use on the Edge EC32MB card. The first line is the original table, retained but commented out. The second line is the updated table.
'delayTable long 7,92_000000,150_000000,206_000000,258_000000,310_000000,333_000000,0 'Eval + Add-on
delayTable long 7,130_000000,190_000000,240_000000,290_000000,340_000000 'Edge EC32MB
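For readers who don't know the table format, here is a rough Python model of how such a table might be consumed. It assumes (the post does not state this) that the first entry is the base delay, that each following entry is a frequency breakpoint in Hz which raises the delay by one once the clock reaches it, and that an optional 0 ends the list. `lookup_delay` is a hypothetical name, not a function taken from psram.spin2.

```python
# Hypothetical model of a PSRAM delay-table lookup.
# Assumed semantics: table[0] is the base delay, every later entry is a
# frequency breakpoint in Hz, and a 0 entry (if present) terminates the table.
EVAL_TABLE = [7, 92_000_000, 150_000_000, 206_000_000,
              258_000_000, 310_000_000, 333_000_000, 0]   # Eval + Add-on
EDGE_TABLE = [7, 130_000_000, 190_000_000, 240_000_000,
              290_000_000, 340_000_000]                   # Edge EC32MB

def lookup_delay(table, freq_hz):
    """Return the delay for freq_hz: base delay plus one per breakpoint reached."""
    delay = table[0]
    for breakpoint_hz in table[1:]:
        if breakpoint_hz == 0 or freq_hz < breakpoint_hz:
            break
        delay += 1
    return delay

if __name__ == "__main__":
    for mhz in (100, 200, 250, 338):
        freq = mhz * 1_000_000
        print(mhz, "MHz ->", lookup_delay(EVAL_TABLE, freq),
              "(Eval),", lookup_delay(EDGE_TABLE, freq), "(Edge)")
```

Under those assumptions the Edge table simply shifts every breakpoint upward, so a given clock gets the same or a smaller delay than with the Eval table.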
Ah, I made Roger happy anyway ... Went and got me the newer MegaYume with SD reading - https://github.com/IRQsome ... and NeoYume too ... now Flexspin is complaining even more:
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fopen.c:92: warning: Redefining function or subroutine fopen with an incompatible type
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fopen.c:96: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fseek.c:15: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fseek.c:28: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fwrite.c:7: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fwrite.c:24: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fwrite.c:31: warning: incompatible pointer types in parameter passing: expected pointer to _struct__vfs_file_t but got pointer to _struct__s_vfs_file_t
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fputs.c:14: warning: Redefining function or subroutine fputs with an incompatible type
/home/evanh/hoard/coding/prop2/testing/displays/NeoYume-master/include/libc/stdio/fputs.c:31: warning: Redefining function or subroutine puts with an incompatible type
and NeoYume doesn't see any ROM files on the SD card ... It just gets Directory error: Unknown error... Do they have to be valid ROM images? MegaYume is working even though I get the same compile warnings.
EDIT: hang on, I didn't make the NEOYUME directory ... right, got it saying NO GAMES! now.
Comments
This is weirder than I thought.
Let me recall these experiments.
(1) I replaced reading from PSRAM with setting "gradient type" fields. The result: a lot of blinking; every third (!! why??) sprite was a gradient as expected, the rest were completely wrong.
(2) I removed all PSRAM-related code. What I got was all sprites "gradient", with a little blinking.
(3) I added several nops before setting the sprite; after the third nop was added, 2 out of every 3 gradient sprites disappeared.
(4) I restricted the LSPC to build the sprite list for the first 4 lines only, expecting them to repeat on the rest of the screen. They repeated, but accompanied by random blinks, and they gradually disappeared toward the bottom of the screen.
The conclusion: something overwrites the sprite buffer. But there was nothing that could do this except the memory cog.
(5) I restricted the number of sprites to 6. Now I had only 6 of them, with all the special effects unchanged.
(6) What does the rdlong ptrb[2] do? It checks for sentinels, which I don't need, as I have only 6 sprites. It reads the address, which I don't use, as I fill the sprite data manually. Let's comment it out.
The result: no blinking at all, 6 good gradient sprites. So...
(7) Let's make the sprite list for the full screen, not 4 lines.
(8) Let's restore the PSRAM reading. Blinks. Comment out the rdlong ptrb[2]. No blinks. So this somehow causes the blink effect. Declare a helper variable and use it instead - bingo! A good part (still 6 sprites) of the screen!
(9) Is it ptrb that has too little time to settle? (Now I know this is a bad conclusion, but...) Let's remove the variable and add the nop. Works. The problem is worked around.
Now I can explain several effects.
(1) I had one column of 2 lines of good pixels, then 2 lines of nothing. That was the first sprite. No djnz yet.
(2) After I changed the buffer address computation method I got a full column of good pixels. What changed? No conditional, and add ptrb,reg instead of an immediate.
I cannot explain why every third sprite was good and the 2 others bad when fed with constant data. I cannot explain why this effect appears when adding nops to the loop.
But then this effect, and the blinking, were caused by rdlong ptrb[2] with an unused result (!!!)
This means rdlong ptrb[2] without a nop actually caused a write to the hub (?!!??) Also, no number of nops after adding #16 to ptrb and doing djnz helps, so it is djnz followed by rdlong that caused the problem. A pipeline bug?
This is potentially a much more dangerous effect than a one-nop delay.
I have to repeat several tests using NeoYume. Then check this with simple code, if it can be done with simple code at all. NeoYume is complex code, with all 8 cogs running and complex interactions between them.
DJNZ is a tighter opcode, because it has to change a value and then make a decision, but somehow fit that into the same time as a JMP.
If you change DJNZ to a two-line form (decrement, then conditional jump), does that resolve the situation?
That seems extreme; what if the effect is a bit more subtle?
Suppose the delays inside DJNZ make the PC not stable enough?
There are 2 of them in a row: rdlong ptrb[2] has to add #8 to ptrb and then do the read. So another check: add 24 to ptrb and read without an index after djnz.
Truly a rabbit hole leading straight to hell.
Yea, simple test case go. Though if you want to, you can just turn off the Z80/YM2610 cogs.
No.
But djnz, and then
works
Typo? Meant to be sub ptrb,#8 I presume.
Yes, of course, without # it will not work at all.
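To make the address arithmetic in this exchange concrete, here is a small Python model comparing the indexed read with an explicit add / read / sub sequence. The code that was actually posted is not preserved in this thread, so the exact shape of the workaround is an assumption; the base address `0x1000` and both function names are purely illustrative.

```python
LONG_BYTES = 4  # rdlong transfers a 32-bit long, so ptrb[2] means ptrb + 2*4 = ptrb + 8

def indexed_reads(ptrb, count, index=2, stride=16):
    """Addresses read by 'rdlong x, ptrb[index]' followed by 'add ptrb, #stride'."""
    addrs = []
    for _ in range(count):
        addrs.append(ptrb + index * LONG_BYTES)  # ptrb itself is left unchanged
        ptrb += stride
    return addrs

def explicit_reads(ptrb, count, offset=8, stride=16):
    """Addresses read by 'add ptrb, #offset' / 'rdlong x, ptrb' / 'sub ptrb, #offset'
    followed by 'add ptrb, #stride' (assumed form of the workaround)."""
    addrs = []
    for _ in range(count):
        ptrb += offset          # add ptrb, #8
        addrs.append(ptrb)      # rdlong x, ptrb (no index)
        ptrb -= offset          # sub ptrb, #8 (the '#' matters: it is an immediate)
        ptrb += stride
    return addrs

if __name__ == "__main__":
    base = 0x1000  # hypothetical sprite list address
    print([hex(a) for a in indexed_reads(base, 3)])
    print(indexed_reads(base, 3) == explicit_reads(base, 3))
```

Both forms touch the same hub addresses, so on a healthy cog they should be interchangeable; what differs is only which instruction directly follows the djnz.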
Interesting that you mentioned your board gets very hot after running NeoYume, pik33. I find that after maybe 15 minutes or so mine is somewhat warm, but it's certainly not hot. I have it in the JonnyMac board, so maybe that transfers heat slightly differently from your setup, or perhaps the clock rate on your P2 is stressing it out.
Mine runs fairly hot, too (and as you could see, it crashes quite fast if heated beyond the thermal equilibrium it reaches during normal operation)
It would be interesting to know at what frequency pik33's board comes good with code that normally fails. To be able to run it slower without dropping below the monitor's vertical refresh limit, you might have to draw fewer total scan lines on the screen and feed it via analog VGA to a multi-sync monitor. Hopefully that would be a small change that keeps the failure mode occurring while the frequency drops. E.g. does it come good at 250MHz or 200MHz etc., or does it remain bad all the way down to 160MHz, which would indicate a bad or damaged P2?
HDMI will happily go to lower scan rates, but obviously the minimum sysclock is 250 MHz then.
I'd just use VGA, ideally with an analog monitor if you have one available. Drop to 350 active scanlines from 480 and it can look a bit like an EGA signal. Monitors should still accept that. Hopefully even LCDs will still recognize that old format.
I mean the same monitor/TV will go to a lower refresh when connected via HDMI. Unless you've got a CRT, LCDs are more fussy about scan rates over VGA.
PS: My limited experience of 640x350 is that it is bound tightly to 70 Hz refresh on VGA inputs. 640x400 I think has more leeway. PPS: But the 30 kHz minimum hsync will likely get in the way.
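To put rough numbers on the 30 kHz concern, here is a small Python sketch of the timing arithmetic. The totals used (800 pixel clocks per line, 525 total lines for 640x480, 449 total lines for 640x350) are the commonly quoted VGA figures and are an assumption here, not values taken from the NeoYume video driver.

```python
# Rough VGA timing arithmetic with assumed standard totals (see note above).

def hsync_khz(total_lines, refresh_hz):
    """Horizontal rate: every line, visible or blanking, is drawn once per frame."""
    return total_lines * refresh_hz / 1000.0

def hsync_from_pixel_clock_khz(pixel_clock_mhz, h_total):
    """Horizontal rate from the dot clock and the total pixel clocks per line."""
    return pixel_clock_mhz * 1000.0 / h_total

def min_refresh_hz(total_lines, hsync_floor_khz=30.0):
    """Lowest refresh rate that keeps hsync at or above the given floor."""
    return hsync_floor_khz * 1000.0 / total_lines

if __name__ == "__main__":
    for name, lines, refresh in (("640x480", 525, 60.0), ("640x350", 449, 70.0)):
        print(name,
              "hsync kHz:", round(hsync_khz(lines, refresh), 2),
              "min refresh Hz:", round(min_refresh_hz(lines), 2))
    print("25.175 MHz / 800:", hsync_from_pixel_clock_khz(25.175, 800), "kHz")
```

If those totals hold, the standard modes sit only about 1.5 kHz above a 30 kHz floor, so slowing the dot clock with an unchanged line length would cross it quickly; keeping hsync up at a lower clock would need fewer pixel clocks per line as well as fewer lines.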
Adding a nop after djnz partially fixes the problem. Partially, because I got only half of the lines. The second half is probably not displayed because of this if_c before rdlong. This effect can be fixed by using an alternate algorithm without ifs.
So it seems the pipeline clearing after a jump or conditional is what is not 100% working in this cog.
Yeah but the interesting thing would be if this bad COG issue persists even down to 160MHz or so.
Meanwhile, actual developments. Even though Metal Slug X really needs more than 32MB, it's still interesting to mess around with because it uses a unique copy protection scheme involving a dongle chip. Current MAME actually emulates the chip, but old MAME just does a find-replace for the code pattern that triggers the protection. There's a pre-patched ROM floating around (apparently these chips are failure-prone, so people need to install the patched ROM into genuine cartridges to fix them), but I think it'd be nicer to apply the patch to the original ROM on load.
So far, no dice.
Interestingly, this is the same message other games use when they detect a counterfeit MVS machine. Too lazy to write a new one?
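As an aside on the find-replace approach mentioned above, the idea of patching the original ROM at load time can be sketched in Python as follows. This is only an illustration: the search pattern is a placeholder rather than the real protection sequence, `patch_rom` is a hypothetical helper, and the sketch assumes the image is already in native 68000 byte order. The replacement uses 0x4E71, the 68000 NOP opcode.

```python
# Illustrative find-and-replace patching of a ROM image at load time.
PLACEHOLDER_PATTERN = bytes.fromhex("deadbeef")   # NOT the real protection code
NOP_NOP = bytes.fromhex("4e714e71")               # two 68000 NOPs

def patch_rom(rom, pattern, replacement, align=2):
    """Replace aligned occurrences of pattern; return (patched_rom, hit_offsets)."""
    if len(pattern) != len(replacement):
        raise ValueError("replacement must keep the ROM size unchanged")
    patched = bytearray(rom)
    hits = []
    pos = patched.find(pattern)
    while pos != -1:
        if pos % align == 0:
            # 68000 instructions are word-aligned, so only patch even offsets.
            patched[pos:pos + len(pattern)] = replacement
            hits.append(pos)
            pos = patched.find(pattern, pos + len(pattern))
        else:
            pos = patched.find(pattern, pos + 1)
    return bytes(patched), hits

if __name__ == "__main__":
    rom = b"\x00\x11" + PLACEHOLDER_PATTERN + b"\x22" + PLACEHOLDER_PATTERN
    patched, hits = patch_rom(rom, PLACEHOLDER_PATTERN, NOP_NOP)
    print(hits, patched.hex())
```

Returning the hit offsets makes it easy for a loader to report "no dice" when the pattern is not found, instead of silently running an unpatched image.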
I believe the following code excerpt (as posted by Ada in another thread) is the one you've referred to, so I did a small mod on it, just to test an alternative way that uses slightly fewer resources from the Cog that runs it.
Could you please test it (and sure, feel free to insert/remove or exchange anything you find appropriate, including trashing it all), just to verify whether it leads to any new useful clues?
STALLI/ALLOWI instead of REP? could be worth a try I guess.
Unrelatedly, figured out how to patch. Game runs now, but of course with 2/3 of the graphics missing. Most convoluted load script so far.
Here are the corrected auto-delays in psram.spin2 (used in MegaYume) for use on the Edge EC32MB card. The first line is the original table, retained but commented out. The second line is the updated table.
FYI, NeoYume also uses psram.spin2 for the loader stage.
EDIT: wait, no, none of them use psram.spin2 right now.
Just to clarify things a bit: it doesn't need to be solely the STALLI/ALLOWI pair; in fact, it can be any of the available long-range interrupt-blocking mechanisms able to embrace the whole code block that needs shielding. This allows leveraging any interval between the "starter instruction" of the interrupt-blocking action and the one that "completes" it, thus releasing the interrupt block, e.g. (as extracted from the docs; GETCT+WC doesn't generally apply, due to being too "short-sighted"):
● ALTxx / CRCNIB / SCA / SCAS / GETCT+WC / GETXACC / SETQ / SETQ2 / XORO32 / XBYTE must not be executing
● AUGS must not be executing or waiting for a S/# instruction
● AUGD must not be executing or waiting for a D/# instruction
● REP must not be executing or active
● STALLI must not be executing or active
● The cog must not be stalled in any WAITx instruction
Ah, I made Roger happy anyway ... Went and got me the newer MegaYume with SD reading - https://github.com/IRQsome ... and NeoYume too ... now Flexspin is complaining even more:
and NeoYume doesn't see any ROM files on the SD card ... It just gets Directory error: Unknown error ... Do they have to be valid ROM images? MegaYume is working even though I get the same compile warnings.
EDIT: hang on, I didn't make the NEOYUME directory ... right, got it saying NO GAMES! now.
Yeah, it kinda splorps when using libc, but that's just warnings, it does actually work.
Time to add a Raspberry Pi heatsink to P2. You can buy a set of 4 for RPi on eBay/ali cheap - the larger one fits nicely on P2.
You can also get a fan - IIRC 20x20x7 5VDC fits nicely on the heatsink but I’ve not tried how to fasten it.
Slide preview, part 2.
This one is as deranged as it needs to be.
Blabbing will commence in 2 days, mark your calendars if you want to hear unfunny funnis vaguely tangential to P2 programming.
Also, Reimu.
The Reimu prop.
The nop can be even more magic. Add it after (under) the djnz, so that it is never executed in the loop, and it still does its job (partially: the error after the if_c before the loop remains).
What's the lowest sysclk at which you can currently reproduce your problem, @pik33?
This ptr bug is interesting (and also a bit scary...), has @cgracey been informed about it?
And, maybe more important, can it be reproduced with a (lot) less code, to see if other chips/cogs may be affected?
Isn't it exclusive to Pik's revB EC32MB? No one else has reproduced it, right?
EDIT: Actually, there's only three people with the requisite ROM files I think. I'm clueless on where to find them.