I have a lot of better things to do, so I'm looking into fixing MegaYume's sprite cache emulation to be properly accurate (should fix sprite glitches and broken effects in some games and just be the right thing to do). Minor problem: I wrote the sprite logic a long time ago and it's totally cryptic gobbledygook with multiple layers of indirection (somewhat necessary to get the correct masking behaviours).
So currently sprite cache is kept internally in renderer's cogRAM and recomputed on every blank line. This caches Y position, size and link field (cursed entity that is...) of the sprites. But to match real hardware, this should instead be recomputed when the sprite table is written to (so that job'll have to be moved to the CPU cog, just like CRAM format conversion is). But currently sprite cache is not formatted in a way that would be fast to update one sprite at a time. There's also dependency on current video mode (progressive vs. interlaced). Owie owie.
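A rough Python model of the "recompute on write" idea described above, not MegaYume's actual code. It assumes the commonly documented Mega Drive sprite attribute table layout (8 bytes per sprite, word 0 holding Y, word 1 holding size and link). The class name, the method names and the fixed 80-sprite table are hypothetical. The progressive/interlaced dependency is left out: the raw 10-bit Y field is stored as-is.

```python
SPRITE_COUNT = 80   # assumed H40 maximum
ENTRY_BYTES = 8     # assumed size of one sprite attribute table entry


class SpriteCache:
    """Hypothetical model: cache Y, size and link, refreshed on VRAM writes."""

    def __init__(self, table_base=0):
        self.table_base = table_base
        self.vram = bytearray(0x10000)
        # Per sprite: (y, width_in_tiles, height_in_tiles, link)
        self.cache = [(0, 1, 1, 0)] * SPRITE_COUNT

    def write_word(self, addr, value):
        """Model of a 16-bit VRAM write coming from the emulated CPU."""
        addr &= 0xFFFE
        self.vram[addr] = (value >> 8) & 0xFF
        self.vram[addr + 1] = value & 0xFF
        offset = addr - self.table_base
        in_table = 0 <= offset < SPRITE_COUNT * ENTRY_BYTES
        # Only the first two words of an entry (Y, size/link) are cached.
        if in_table and offset % ENTRY_BYTES < 4:
            self._refresh(offset // ENTRY_BYTES)

    def _refresh(self, index):
        """Recompute the cache entry of a single sprite."""
        base = self.table_base + index * ENTRY_BYTES
        word0 = (self.vram[base] << 8) | self.vram[base + 1]
        word1 = (self.vram[base + 2] << 8) | self.vram[base + 3]
        y = word0 & 0x3FF                  # raw Y field, mode handling omitted
        width = ((word1 >> 10) & 3) + 1    # horizontal size in tiles
        height = ((word1 >> 8) & 3) + 1    # vertical size in tiles
        link = word1 & 0x7F                # index of the next sprite
        self.cache[index] = (y, width, height, link)
```

The point of the layout is that a single write only touches one cache entry, so the per-write cost stays small and the renderer never has to rescan the whole table.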
Well, it was a success, but the internet died and now I can't upload it (without major hassle, anyways). Take this bad picture, wherein the rather clever snow effect in Stone Protectors now correctly displays, as proof.
Well, everything's okay again, so I pushed new MegaYume V1.3 beta code to GitHub.
I fixed the aforementioned sprite cache behaviour, which should fix glitches or missing sprites in a few games, including:
Stone Protectors (missing snow effect)
Gauntlet IV (missing sprites when there's a lot of enemies)
King of the Monsters (everything was broken)
Samurai Shodown (occasional glitches)
I also fixed an edge case (or rather, an edge case of an edge case) relating to sprite masking and the sprite tile limit, so now the test screen ROM shows "6. MASK S1 ON DOT OVERFLOW" as "PASS".
I also optimized the whole rendering code some more. In particular, I found a faster way to handle 4bpp -> 8bpp conversion. The comments make me remember @rogloh helped find out about the current method, so I guess I'll tag him (oops I already did).
Old code:
rdlong vdpr_tiledata,vdpr_tmp4
' rogloh's optimized variant...
' Prepare attributes
mov vdpr_tmp1, vdpr_tile
shr vdpr_tmp1, #13 ' Just pal+priority
testb vdpr_tile, #11 wc ' mirror bit
if_nc splitb vdpr_tiledata ' reverse nibbles in tile
if_nc rev vdpr_tiledata
if_nc movbyts vdpr_tiledata, #%%0123
if_nc mergeb vdpr_tiledata
' bit magic (space out nybbles)
getword vdpr_tilebuffer1, vdpr_tiledata, #1 ' first group
movbyts vdpr_tilebuffer1, #%%3120
mergew vdpr_tilebuffer1
movbyts vdpr_tilebuffer1, #%%3120
splitw vdpr_tilebuffer1
getword vdpr_tilebuffer2, vdpr_tiledata, #0 ' second group
movbyts vdpr_tilebuffer2, #%%3120
mergew vdpr_tilebuffer2
movbyts vdpr_tilebuffer2, #%%3120
splitw vdpr_tilebuffer2
' add attributes to non-transparent pixels
test vdpr_tilebuffer1,vdpr_pixnibtest+0 wz
if_nz setnib vdpr_tilebuffer1,vdpr_tmp1,#1
test vdpr_tilebuffer1,vdpr_pixnibtest+1 wz
if_nz setnib vdpr_tilebuffer1,vdpr_tmp1,#3
test vdpr_tilebuffer1,vdpr_pixnibtest+2 wz
if_nz setnib vdpr_tilebuffer1,vdpr_tmp1,#5
test vdpr_tilebuffer1,vdpr_pixnibtest+3 wz
if_nz setnib vdpr_tilebuffer1,vdpr_tmp1,#7
test vdpr_tilebuffer2,vdpr_pixnibtest+0 wz
if_nz setnib vdpr_tilebuffer2,vdpr_tmp1,#1
test vdpr_tilebuffer2,vdpr_pixnibtest+1 wz
if_nz setnib vdpr_tilebuffer2,vdpr_tmp1,#3
test vdpr_tilebuffer2,vdpr_pixnibtest+2 wz
if_nz setnib vdpr_tilebuffer2,vdpr_tmp1,#5
test vdpr_tilebuffer2,vdpr_pixnibtest+3 wz
if_nz setnib vdpr_tilebuffer2,vdpr_tmp1,#7
[...]
vdpr_pixnibtest long 15<<0,15<<8,15<<16,15<<24
New code:
rdlong vdpr_tilebuffer1,vdpr_tmp4
' rogloh's optimized variant...
' Prepare attributes
getword vdpr_tmp1, vdpr_tile,#0 '' Note: this changed to getword because vdpr_tile now has garbage in top half
shr vdpr_tmp1, #13 ' Just pal+priority
testb vdpr_tile, #11 wc ' mirror bit
splitb vdpr_tilebuffer1 ' reverse nibbles in tile
if_nc rev vdpr_tilebuffer1
if_nc movbyts vdpr_tilebuffer1, #%%0123
not vdpr_tmp2,vdpr_tilebuffer1
mergeb vdpr_tilebuffer1
' generate transparency mask
getword vdpr_tmp3,vdpr_tmp2,#1
and vdpr_tmp2,vdpr_tmp3
getbyte vdpr_tmp3,vdpr_tmp2,#1
and vdpr_tmp2,vdpr_tmp3
' bit magic (space out nybbles)
movbyts vdpr_tilebuffer1, #%%3120
mergew vdpr_tilebuffer1
movbyts vdpr_tilebuffer1, #%%3120
splitw vdpr_tilebuffer1
mov vdpr_tilebuffer2,vdpr_tilebuffer1
shr vdpr_tilebuffer1,#4
and vdpr_tilebuffer1,vdpr_lowernibbles
and vdpr_tilebuffer2,vdpr_lowernibbles
' add attributes to non-transparent pixels
skip vdpr_tmp2
setnib vdpr_tilebuffer2,vdpr_tmp1,#1
setnib vdpr_tilebuffer2,vdpr_tmp1,#3
setnib vdpr_tilebuffer2,vdpr_tmp1,#5
setnib vdpr_tilebuffer2,vdpr_tmp1,#7
setnib vdpr_tilebuffer1,vdpr_tmp1,#1
setnib vdpr_tilebuffer1,vdpr_tmp1,#3
setnib vdpr_tilebuffer1,vdpr_tmp1,#5
setnib vdpr_tilebuffer1,vdpr_tmp1,#7
[...]
vdpr_lowernibbles long $0F0F0F0F
It's 34 vs 30 instructions, so not a huge saving, but the methodology is quite different. (The more efficient spacing out also happens in a slightly different version with tile planes.) I feel like there might be an even more efficient way.
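For anyone trying to follow the transparency mask trick in the new code, here is a small Python model of just that part. It is a sketch under assumptions: bit i of each plane is taken to belong to pixel (nibble) i, which is how I understand SPLITB, and the function names are made up. The MOVBYTS/MERGEW nibble spacing and the mirroring are not modelled.

```python
def split_planes(tile):
    """Return four 8-bit planes; bit i of plane p is bit p of pixel i."""
    planes = [0, 0, 0, 0]
    for i in range(8):
        nib = (tile >> (4 * i)) & 0xF
        for p in range(4):
            planes[p] |= ((nib >> p) & 1) << i
    return planes


def transparency_mask(tile):
    """Bit i is 1 when pixel i is colour 0 (transparent)."""
    inverted = [~plane & 0xFF for plane in split_planes(tile)]
    # The PASM version folds this in two ANDs (word, then byte);
    # ANDing all four inverted planes gives the same 8-bit result.
    return inverted[0] & inverted[1] & inverted[2] & inverted[3]


def expand(tile, attr):
    """Expand eight 4bpp pixels to 8 bits, adding attr to opaque pixels only."""
    skip = transparency_mask(tile)
    out = []
    for i in range(8):
        pixel = (tile >> (4 * i)) & 0xF
        if not (skip >> i) & 1:      # a set bit models a skipped SETNIB
            pixel |= (attr & 0xF) << 4
        out.append(pixel)
    return out
```

One mask computed up front replaces the eight TEST/SETNIB pairs of the old code, which is where the instruction saving comes from.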
Yeah SKIP comes in very handy for a sequence of binary choices, especially when executing single instructions, although you can still use it for groups of 2,4,8,16 instructions as well if you do some initial bit replicating when creating the skip mask.
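The bit replication mentioned here could look roughly like this, written as a Python model rather than P2 code. The function name is hypothetical, and the 32-bit limit reflects my understanding that a SKIP pattern covers at most the next 32 instructions.

```python
def replicate_bits(mask, choices, group):
    """Widen each of `choices` decision bits to cover `group` instructions."""
    if choices * group > 32:
        raise ValueError("pattern would not fit an assumed 32-bit skip mask")
    out = 0
    for i in range(choices):
        if (mask >> i) & 1:
            # Set `group` consecutive bits so the whole group is skipped.
            out |= ((1 << group) - 1) << (i * group)
    return out
```

For example, three binary choices over groups of two instructions turn `0b101` into `0b110011`.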
Well, I thought it was working already. But then a friend came over and we actually played a few levels... Owie. I still am not 100% sure it works now. Weird if it wouldn't though. Notice the text in the status area has different colors for each player? They're actually the same color being changed by a raster interrupt. The bug I encountered is that at some threshold of active sprites, some sections of the screen will have no sprites drawn in them. This matches up with the player sections. I think the same raster IRQ is used to rewrite the sprite table mid-frame in these cases (but for some reason, this is only done when needed instead of always). With the old code, the Y positions would be stuck at the previous values.
@Rayman said:
Need to try gauntlet again…
Guessing multiplayer can work now?
The multiplayer has worked ever since I switched it to usbnew. Just have to hook up multiple game pads to a hub. I also added 4-player adapter support on the emulation side shortly after (which can be sort of wonky and doesn't quite work with every game that it ought to work with, but it works with Gauntlet).
Ada,
In MegaYume, after loading a ROM from the SD, does an unchanging solid magenta display have a meaning?
I'm struggling to work out what I've done wrong. I've downloaded, configured and compiled the latest emulator from https://github.com/IRQsome/MegaYume
The load menu comes up fine on the HDMI TV, the USB keyboard control allows me to select the ROM to load, it shows a brief loading bar zip across the display then it goes magenta and that's it. I can CTRL+ESC and try again with the same result. ROM name is RASTAN2.MD which is renamed from Rastan Saga II (USA).md. I've played it briefly in the past on an older Megayume without issue.
PS: Using an Eval Board, I've tried both a 4-bit PSRAM add-on (from Rayman) and Parallax's HyperRAM add-on. Both have worked fine for me in the past.
PPS: I suspect my prior success with Megayume was using Edge32MB only.
@Wuerfel_21 said:
... can happen on SD cards that are being subjected to experiments(TM).
I've reinit'd the FAT32 filesystem and it boots a _BOOT_P2.BIX bootfile no problem. Okay, I've done the RAM expansion thing to death, maybe I need to try a fresh ROM then. I only have the one ...
Aargh! No, it was the RAM. Don't know what I was doing wrong with Rayman's addon but it seems there's a bug in the handling of the HyperRAM addon. It doesn't work when using a base pin of 32.
@evanh said:
Aargh! No, it was the RAM. Don't know what I was doing wrong with Rayman's addon but it seems there's a bug in the handling of the HyperRAM addon. It doesn't work when using a base pin of 32.
Really? Have to dig one out to figure out the issue...
Okay, worked out the problem with Rayman's 4-bit PSRAM add-on with some trial and error testing: It needed PSRAM_WAIT = 6 instead of 5.
' Enable one of these to select the exmem type to use
'#define USE_PSRAM16
'#define USE_PSRAM8
#define USE_PSRAM4
'#define USE_HYPER
' Rayman's 24MB 4-bit PSRAM add-on
PSRAM_BASE = 40
PSRAM_CLK = 4+PSRAM_BASE
PSRAM_SELECT = 5+PSRAM_BASE
PSRAM_BANKS = 3 ' Only used to stop further banks from interfering
PSRAM_WAIT = 6
PSRAM_DELAY = 13
PSRAM_SYNC_CLOCK = true
PSRAM_SYNC_DATA = true
' Uncomment for slower memory clock
'#define USE_PSRAM_SLOW
No, you want to increase DELAY. WAIT is for the actual chip latency that needs to be clocked through. If you increase WAIT, there'll be too many clock pulses. Check the RAMCONFIG.MD, it has correct values for all sorts of setups.
Yep, I know. I've not gone looking at the datasheets yet but my guess is I ended up with a different brand of PSRAM than what you tested with. On that note ...
If you do WAIT=6 and DELAY=13, that results in the same read offset, but there'll be an extraneous clock pulse at the end of each transfer. That doesn't really matter, but it's just more correct that way.
Well isn't that nice. There's a bit of a catch to NeoYume reliability: the ROMs are neatly segmented into graphics, sound and code. Corruption in the former two is entirely transient and can never crash the system. The CPUs are completely isolated from the contents of Cx/Vx/S1 ROMs. They can only see M1 (Z80), Px and the BIOS (68000). Due to the (somewhat arbitrarily defined in neoyume_gamedb) load orders of games, the code ROMs end up in different banks for different games. So if you want to be sure, you actually need to test a few different-sized games.
It's the only option I have on hand. I can now finally move on to the purpose of this effort - Load-testing of something substantial other than my over-the-top synthetic high-power tests.
EDIT: I guess I could dig up a smaller game that fits in 16MB. But I'd be waiting for your hyperRAM fix first. I don't want to use my good Eval board for this and my damaged Eval board will only work with the RAM board at base pin P32.
Comments
Nice to get gauntlet working!
Well, I thought it was working already. But then a friend came over and we actually played a few levels... Owie. I still am not 100% sure it works now. Weird if it wouldn't though. Notice the text in the status area has different colors for each player? They're actually the same color being changed by a raster interrupt. The bug I encountered is that at some threshold of active sprites, some sections of the screen will have no sprites drawn in them. This matches up with the player sections. I think the same raster IRQ is used to rewrite the sprite table mid-frame in these cases (but for some reason, this is only done when needed instead of always). With the old code, the Y positions would be stuck at the previous values.
Need to try gauntlet again…
Guessing multiplayer can work now?
The multiplayer has worked ever since I switched it to usbnew. Just have to hook up multiple game pads to a hub. I also added 4-player adapter support on the emulation side shortly after (which can be sort of wonky and doesn't quite work with every game that it ought to work with, but it works with Gauntlet).
Ever tried the HDMI video modes in recent times and wondered why they don't work? That's because I broke them when adding LCD support. Fixed now.
Ada,
In MegaYume, after loading a ROM from the SD, does an unchanging solid magenta display have a meaning?
I'm struggling to work out what I've done wrong. I've downloaded, configured and compiled the latest emulator from https://github.com/IRQsome/MegaYume
The load menu comes up fine on the HDMI TV, the USB keyboard control allows me to select the ROM to load, it shows a brief loading bar zip across the display then it goes magenta and that's it. I can CTRL+ESC and try again with the same result. ROM name is RASTAN2.MD which is renamed from Rastan Saga II (USA).md. I've played it briefly in the past on an older Megayume without issue.
PS: Using an Eval Board, I've tried both a 4-bit PSRAM add-on (from Rayman) and Parallax's HyperRAM add-on. Both have worked fine for me in the past.
PPS: I suspect my prior success with Megayume was using Edge32MB only.
Solid magenta in megayume usually means that it's not reading the external RAM properly.
By the same token, could also be a corrupted ROM file, can happen on SD cards that are being subjected to experiments(TM).
I've reinit'd the FAT32 filesystem and it boots a _BOOT_P2.BIX bootfile no problem. Okay, I've done the RAM expansion thing to death, maybe I need to try a fresh ROM then. I only have the one ...
Aargh! No, it was the RAM. Don't know what I was doing wrong with Rayman's addon but it seems there's a bug in the handling of the HyperRAM addon. It doesn't work when using a base pin of 32.
Really? Have to dig one out to figure out the issue...
Okay, worked out the problem with Rayman's 4-bit PSRAM add-on with some trial and error testing: It needed PSRAM_WAIT = 6 instead of 5.
No, you want to increase DELAY. WAIT is for the actual chip latency that needs to be clocked through. If you increase WAIT, there'll be too many clock pulses. Check the RAMCONFIG.MD, it has correct values for all sorts of setups.
Yep, I know. I've not gone looking at the datasheets yet but my guess is I ended up with a different brand of PSRAM than what you tested with. On that note ...
No, you're just being thick. The indicated timing for the Rayslogic 24MB board is:
If you do WAIT=6 and DELAY=13, that results in the same read offset, but there'll be an extraneous clock pulse at the end of each transfer. That doesn't really matter, but it's just more correct that way.
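A back-of-the-envelope check of why two WAIT/DELAY pairs can land on the same read offset. This is only a sketch: it assumes each WAIT clock costs two system clocks and that DELAY is counted directly in system clocks. That weighting is inferred from the 5/15 and 6/13 pairs reported in this thread, not taken from the MegaYume source, and `read_offset` is a made-up helper.

```python
def read_offset(wait, delay, sysclocks_per_memclock=2):
    """Total system clocks before data is sampled, under the stated assumption."""
    return wait * sysclocks_per_memclock + delay


# The two settings discussed in the thread agree under this assumption.
assert read_offset(5, 15) == read_offset(6, 13)
```

Under that assumption the only difference between the two settings is the number of clock pulses sent to the chip, which matches the remark about an extraneous pulse with WAIT=6.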
I tried 5 and 15, it was a skewed magenta display at the load menu. I didn't bother loading the game.
Datasheet for the APS6404L PSRAM says 6 for both Quad and QPI
????
Okay now I'm confused. How did WAIT=5 ever work then?
Skewed? You have a visual of this? Sounds weird.
Argh! I can't reproduce it. That 5 + 15 now works! And so does 6 + 13.
So, does that mean the correct value for PSRAM_WAIT is one less than what the datasheet says?
I want to say yes?
Neoyume working on Eval board with Rayman's 96MB add-on using the following config:
The default registered pins started okay but didn't take long before it crashed.
Well isn't that nice. There's a bit of a catch to NeoYume reliability: the ROMs are neatly segmented into graphics, sound and code. Corruption in the former two is entirely transient and can never crash the system. The CPUs are completely isolated from the contents of Cx/Vx/S1 ROMs. They can only see M1 (Z80), Px and the BIOS (68000). Due to the (somewhat arbitrarily defined in neoyume_gamedb) load orders of games, the code ROMs end up in different banks for different games. So if you want to be sure, you actually need to test a few different-sized games.
I think there's also the slow mode right? That sometimes fixes things...
The 96MB board needs slow mode to begin with. Too chunky for the 170 MHz overclock.
It's the only option I have on hand. I can now finally move on to the purpose of this effort - Load-testing of something substantial other than my over-the-top synthetic high-power tests.
EDIT: I guess I could dig up a smaller game that fits in 16MB. But I'd be waiting for your hyperRAM fix first. I don't want to use my good Eval board for this and my damaged Eval board will only work with the RAM board at base pin P32.
The 96MB board should be fine. Mine is perfectly reliable as long as you don't take it outside on a hot day.
Will look into fixing the HyperRAM mode. Might be tomorrow if it isn't a super quick fix.