Here's a headscratcher: The address error (due to trying to write $401FFB (odd address)) is generated on the highlighted MOVEM. How can that ever happen when the previous MOVEM executed ok? Are you telling me there's some bug in this essential instruction that I've somehow completely missed?
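For reference, the 68000 raises an address error whenever a word or long access targets an odd address, while byte accesses may land anywhere. A minimal sketch of that check follows; the helper name is made up for illustration and this is not the emulator's actual code.

```python
def is_address_error(addr: int, size: int) -> bool:
    """Return True if a 68000 access of `size` bytes at `addr` would fault.

    Hypothetical helper for illustration: word (2) and long (4) accesses
    must be even-aligned, byte (1) accesses are always allowed.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    return size != 1 and (addr & 1) != 0


print(is_address_error(0x401FFB, 2))  # the odd address from the post
print(is_address_error(0x401FFA, 4))  # even address
```

Since MOVEM only transfers words or longs, any odd effective address it ends up with would trip this check.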
Well, it turns out it's very beneficial to put orgh before assembling hub code....
New funny: Spin2 upper code is corrupting memory (before 68k even boots). If I comment out the right lines it even crashes. Cool. Yeah. Amazing.
EDIT: Hmmm, if I disable flexspin's -Olocal-reuse, it stops corrupting the zeropage location I'm interested in, but starts visibly corrupting the screen palette instead. Smells like funny codegen to me.
EDIT2: Owie ouch, was trying to close an uninitialized FILE* whenever loading data that isn't split between two files (as sprite tiles are). Yeah that's no good.
Anyways, with that fixed, it clears the screen (as much as it can without an extant VRAM data port, anyways) and gets to an infinite loop like this:
Presumably waiting for an IRQ handler to set a flag.
Seems like progress again at least.
IT LIVES!!!!! [Lightning strike foley]
Indeed, if you squint your eyes very hard, you can see the NeoGeo BIOS screen and Samurai Shodown's SNK logo screen. The graphics being missing is expected, but IDK why everything is red.
EDIT: The red is apparently what it fills the first palette bank with, while it then goes on to only use the second bank. I forgot to fully implement palette bankswitching (i.e. the CPU access side, but not the GPU side)
With fixed palette banking, fix tile loading and the red video driver border disabled:
Well, that was easy....
Wow, Wuerfel!
Yeah, sure is looking good. Decent sized ROM files loading at the start. Quite the showcase for the EC32MB.
Amazing progress. I go back to sleep for a few hours after it seemed you were battling your way through some initial bugs and now you've already got awesome looking gameplay demos functioning!
Yeah, IDK, it seems everything just kinda fell into place. You can see there's a bug related to sprite wraparound... Eludes me yet, but tomorrow I'll investigate further. Oh, and hook up the controls. It currently is an attract-screen-only emulator.
Aside from the missing sound, there are also some other missing features. Most notably, the auto-animation speed is currently just hardcoded, lol. Also, the timer IRQ is completely missing.
Well, I've hooked up the controls and fixed one of the sprite bugs. It turns out the "glitchy NeoGeo logo" is a separate issue from the "Samurai Shodown stage wraparound".
So far I've tried:
- Metal Slug: Works great, no noticeable slowdown
- Twinkle Star Sprites: Works great, slowdown only when bombs are active (may be intentional)
- Samurai Shodown: Works great apart from aforementioned graphical issue
- Ironclad: Works, quite a bit of slowdown though
- Sprite experimenter ROM: Busted
How much extra PSRAM load do you expect to need going forward with ADPCM? Could that slow things down a lot more or do you imagine that the current code you have now is reasonably representative of how much external memory load there will be?
The ADPCM bandwidth kinda tops out at less than 3k per frame and it can be fetched in large blocks (probably going to do 16 byte blocks), so I don't think it'll be an issue. Should end up needing to fetch one block per scanline...(though it will of course fetch more given there's time left on a scanline). I also haven't really optimized the memory code yet, so there's still sleeping potential.
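As a rough sanity check of that estimate: take the 3k-per-frame upper bound and the 16-byte block size mentioned above, and assume 264 scanlines per frame (an assumption about NeoGeo video timing, not a figure from this thread). The average then comes out below one block per scanline.

```python
import math

ADPCM_BYTES_PER_FRAME = 3 * 1024   # upper bound quoted above ("less than 3k")
BLOCK_BYTES = 16                   # block size floated above
LINES_PER_FRAME = 264              # assumption, not taken from the thread

# Number of 16-byte fetches needed to cover one frame's worth of ADPCM data
blocks_per_frame = math.ceil(ADPCM_BYTES_PER_FRAME / BLOCK_BYTES)
# Average fetches per scanline if they are spread evenly over the frame
blocks_per_line = blocks_per_frame / LINES_PER_FRAME

print(blocks_per_frame)            # 192
print(round(blocks_per_line, 2))   # 0.73
```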
Good to hear. Was hoping it was not already fully maxxed out.
Polished it up, fixed the animation speed thing, so here it is: A workable alpha release.
Building and config is similar to MegaYume (and I've made it work with all the same SDTV, VGA and HDMI modes). Currently the game to load is hardcoded (by default, it tries to load Metal Slug). You need the game ROM files unzipped into a NEOYUME directory, like this: neoyume/mslug/201-c1.bin. You also need the BIOS file at neoyume/neogeo/neo-epo.bin.
Keyboard controls (assuming QWERTY) are:
Here's to another one:
- fixed(?) SDTV modes
- added support for NeoGeo mini USB controller (note: has a Type C plug, adapter needed)
Slight problem reared its head: My intent is to re-cycle the space used by the Spin2 upper code after the game launches to hold the Z80 ROM. This obviously doesn't work if the USB driver is living there. Guess I'll have to separate it out. Don't think moving it all into the lower ASM is a terrific idea, maybe I can compile it into its own upcode blob.
Also, speaking of that NeoGeo controller, it behaves really oddly with @macca 's USB test program. Like if it was being polled at a really low rate. Surprisingly it works fine with my hardcoded polling code, so uhhhhh idk.
Managed to somehow shunfle around the memory to make the input code into a separate module that can stay resident after I kick out the loader code. Also managed to fit in the Z80 and the YM2612 (which should be similar enough to the YM2610 to get something), but as of now, it doesn't really produce anything of value and after a while it seems the Z80 just dies (stops ACKing NMIs).
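And remember, children, always double check your SKIPF patterns or you'll be eaten by the Krampus.
Unknown, 1642
Wiser words have never been spoken.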
Anyways, there's some approximation of sound (as expected due to putting a YM2612 where a YM2610 is needed) and for some reason Twinkle Star Sprites now semi-randomly hangs on dialog screens? Ok cool. Here's a video to exactly that extent:
Will have to figure out if it's related to the memory shunfling or the Z80/YM addition, but as you can see, it doesn't happen every time, so, uh, owie.
EDIT: Confirmed that all games crash semi-randomly (though that TSS always does it on the dialog screen is odd). Don't think it's the Z80/YM actively going off the rails, because music continues to play. I'll try changing the load order, because now the PSRAM arbiter runs in a different cog and it may have bungled the timing or something.
EDIT 2: Wait no, the PSRAM is always on cog 0 (the Spin2 restarts itself into it), so that's not it. phew, not that particular fuzzy issue. Anyways, 2:30 AM. Slenp.
Good progress again. Seems you're almost short a COG now and have to work around it by dynamically starting/stopping COGs between games and shuffling code about in memory. In your custom menuing system do you need the Z80 to run? Is there any sound control needed at that point? Maybe you can start/stop that COG and/or the YM audio COG if needed if it helps you out with memory, although that might cause clicks/pops each time as the audio COG is initialized.
If SPIN2 could call some PASM and return back to SPIN2 that could be good. So maybe your PSRAM arbiter could be run as some SPIN2 inline PASM routine - it looked like it was quite small in size and might fit, worth a look perhaps. Although you'll still need something that writes into PSRAM and loads the game from SD, I guess that is where my driver comes in.
There is no returning to Spin2, the Spin blob is overwritten with Z80 code (and the loader tileset is overwritten with the game tileset). I guess I could store it away into PSRAM and load it back after the game is done, but I'm not sure if flexspin's code is re-entrant like that.
Still not sure what causes crashes. I've mucked about a bit (had to use a different USB to barrel cable to test HDMI, since the screen didn't like one of them. So I swapped that one onto the P2 board. Perhaps that cable is just bad. But I also mucked with the cog order) and then was able to almost-clear (skill issue) TSS without a crash and it has been running attract ever since, but IDK if it's actually fixed.
There's also a bug where the Z80 takes a long time to respond to the first couple NMIs (and thus the NeoGeo jingle plays incorrectly), but it only happens on some games (The logo screen is generated by the BIOS, but playing the jingle is delegated to the game's sound code, which is why it sounds a bit different between games and presumably why this bug only manifests in some)
But the next big bit is making a working YM2610 core. I guess I could take a detour to make a standalone YMF288 core (YM2608 without ADPCM) and then build the ADPCM bits inside NeoYume (since they have to be polled by the arbiter to do useful work)
EDIT: Figured out the NMI delay issue - forgot to init the Z80 timing stuff, so sometimes it'd wait a long time before executing anything. Presumably dependent on CT, thus it only happening on some games (some take longer to load).
Unrelatedly, look at what happens when running Ironclad with debugging enabled. Why is it constantly using TRAP #8? But it works so let's not think about it too hard.
Also: Nope, still crashing occasionally.
EDIT: Managed to catch one:
Not terribly enlightening.
But more interestingly, I've devised a low-effort way of triggering these crashes. Just load Metal Slug, set to easy and 5 lives, start the game and do absolutely nothing. It seems that it will crash eventually.
EDIT 2: aaand another one. Didn't even run the timer out once... Bad branch address?
That's probably worth checking carefully with an oscilloscope if you have one. Monitor the 5 Volt supply for dips during operation. It is possible*1 for very short brown-outs to be ignored by the Edge Board's Power-Good reset circuits - which have a nominal 32 us response.
*1 Given your code is likely pulsing to quite high loading: You're using all eight cogs, hammering a 16-bit external data bus, doing a lot of block transfers to and from hubRAM and operating up around 350 MHz sysclock, right?
Well, partial enlightenment:
The instruction before $028DD4 (-> $028DD2 = $4ED3) I'm pretty sure is a MOVE USP of some kind, which is most definitely not a branch (well, it can do a privilege trap, but we'd see that). So I guess that means the instructions are getting corrupted?
Interestingly, this one actually ran out the timer and returned to attract mode before crashing. So I guess leaving it on overnight will work as a stability test? I did it with an older version a few days ago and that was still going in the morning, so I guess that's proof that those versions did not crash.
I just realized that in the older versions, the arbiter was not loaded in Cog0. On P1 there is an issue to do with distance between different cogs and the pins, mayhaps this is the same thing? Well, will try shuffling the load order again. After I get another crash, but this time I'm logging the current opcode on suspicious branch and address error.
EDIT: Well, consistency is for losers, got this odd one. $1x_xxxx is in RAM, so likely another bad branch, presumably through an A register. Interestingly, the illegal instruction handler just reboots the machine (thus the $03 (-> play NeoGeo jingle) Z80 command).
EDIT 2: Wait, I'm just an idiot, $4ED3 is JMP (A3) and I'm just too stupid to read the decode table.
EDIT 3: So, got off my bum and disassembled that location and it's some sort of tabular jump. Will try printing A2 and D2 on address error and see what it brings.
EDIT 4: Of course now that I've got it by the metaphorical tail, it doesn't wanna do it. Classic. Cool. Good. [seethe noises]
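The $4ED3 mix-up comes down to the opcode bit fields: the JMP group is 0100 1110 11mm mrrr, while MOVE USP occupies 0100 1110 0110 drrr. Below is a small sketch of a decoder covering just those two groups. It is illustrative only (the function name and output strings are invented here) and is not the emulator's decode table.

```python
def decode(opcode: int) -> str:
    """Decode the two 68000 opcode groups discussed above.

    Illustrative sketch only; every other opcode is reported as unknown.
    """
    ea_names = {
        0b010: "(A{r})",
        0b101: "(d16,A{r})",
        0b110: "(d8,A{r},Xn)",
    }
    # MOVE USP: 0100 1110 0110 drrr (d=0: An -> USP, d=1: USP -> An)
    if opcode & 0xFFF0 == 0x4E60:
        reg = opcode & 7
        return f"MOVE USP,A{reg}" if opcode & 8 else f"MOVE A{reg},USP"
    # JMP <ea>: 0100 1110 11mm mrrr
    if opcode & 0xFFC0 == 0x4EC0:
        mode, reg = (opcode >> 3) & 7, opcode & 7
        if mode in ea_names:
            return "JMP " + ea_names[mode].format(r=reg)
        # mode 7 (absolute / PC-relative) or an encoding JMP does not allow
        return "JMP <other ea>"
    return "unknown"


print(decode(0x4ED3))  # JMP (A3)
print(decode(0x4E63))  # MOVE A3,USP
```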
If it's a random error each time or inconsistent it might indicate PSRAM timing issues creating bad instructions, or even thermal jitter. If you have a spare fan you might want to set it up to blow air on the hot P2 to see if it helps or changes things with these crashes.
I think in your earlier PSRAM read code you were not using the "half" delay steps like I do in my PSRAM driver when it's characterized by that delaytest utility I wrote, so you could already have coarser steps and potentially more marginal input delays to begin with, unless the P2 data sampling "eye" happens to be well centered between transitions at your operating frequency and thus more immune to jitter.
It seems to be bad instructions, but at oddly specific locations(?) Or maybe it's certain bit patterns that tickle it.
See, this corresponds to the code from the last post. Note how A2 is not the constant that should have just been loaded (and consequently A3 is garbage). Presumably the MOVEA got turned into something else, since the A2 value is a plausible RAM address that could have been in the register upon entering whatever this function is.
If it is PSRAM bus corruption, I know it must be the memory arbiter's issue and not your driver, since otherwise it wouldn't crash at random moments...
Another theory was that there's some freak case where the FIFO takes too long to dump the data, so the 68k reads stale data from the previous request, but that doesn't line up with the evidence that instructions in the middle of the execution stream get bungled.
Well, it seems enabling registered I/O for both clock and data and bumping the DELAY to 4 cyc boots, so I guess I'll let that one run overnight if it doesn't crash before I get to bed.
Anyways, appreciate that I have been staring at this for hours now. "Marco Rossi wins by doing nothing"?
Hope it helps with your bug.
Nope, crashed again. Will try setting only the bus, not the clock, to registered. That also boots, so uh well, gotta wait some minutes on a verdict.
Also be sure that you meet all PSRAM timing requirements between accesses in your arbiter. You probably will due to the overhead instructions but it might be worth double checking anyway.
UPDATE: And check that any of the page boundary crossings are ok too.
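One way to make boundary crossings explicit is to split every burst at the page boundary before issuing it. The sketch below assumes a 1024-byte page purely as a placeholder (the real figure depends on the PSRAM part and how the chips are ganged, so check the datasheet), and the function name is made up for illustration.

```python
def split_at_page_boundaries(addr: int, length: int, page_size: int = 1024):
    """Split a burst into (address, length) chunks that each stay in one page.

    page_size is an assumed placeholder and must be a power of two.
    """
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page_size must be a power of two")
    chunks = []
    while length > 0:
        room = page_size - (addr & (page_size - 1))  # bytes left in this page
        n = min(length, room)
        chunks.append((addr, n))
        addr += n
        length -= n
    return chunks


print(split_at_page_boundaries(0x3F8, 16))  # [(1016, 8), (1024, 8)]
```

A 16-byte fetch starting 8 bytes before the end of a page becomes two 8-byte accesses, which is the case worth testing in the arbiter.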