Comments
I wonder...
If there was a tool for NeoYume that could create a raw binary image after downloading/munging data from SD (e.g., write the raw file back to SD), then people with the HyperFlash breakout board on P2-EVAL might be able to flash this board directly and run a game sized up to 32MB, or even possibly 48MB if HyperFlash and HyperRAM could be run at the same time. It could enable very fast boot up to the last game as well...
I'm just thinking if this system can run off HyperRAM, it might also run (directly) off HyperFlash. HyperFlash just has different input latency per burst (it is a little slower, but it's quite good at sysclk/2 I think). Page wrapping is hopefully not an issue with your aligned accesses.
Tubular and I were actually chatting about this today... I think you can get 32-64MB of HyperFlash for around $11-$16 or so which could make a good "cartridge" emulation, though it's obviously not dirt cheap like SD + PSRAM. If the game could run directly out of HyperFlash it could be handy and save load time. Good if you are playing the same game a lot before switching - which would need a reflash unless you swap cartridges.
There's a huge difference in initial random read address latency between NOR Flash and NAND Flash. As in NAND is 1000 fold slower. For XIP without large caches, that's a death sentence.
Maybe but I'm talking about HyperFlash. It's very much faster.
You could totally hack the code to dump PSRAM contents to disk after loading and program some flash with that. The run-from-flash version would need to be hardcoded to that game and have its load mechanism dummied out (so it still ends up with the right pointer/bankswitch setup). That'd be enough for a proof-of-concept, anyways. If you wanted to do it properly, you'd write a processing tool that works without having to actually have a large PSRAM board (either native or on PC). That would also include a parameter block to contain the pointer stuff, so the actual executable doesn't need to have a gamedb.
But since when is the Hyper breakout 48MB? I thought it was 8MB RAM + 16MB Flash?
I think it is 32MB flash + 16MB RAM. Maybe if the games are well partitioned between code and data that could help, not sure.
I mean they kinda are by necessity? C/V/S files contain data of different types. M and P are code/data visible to the CPUs. Didn't I explain that already?
If they are NOR Flash then it's good.
You did, so I'm thinking some code could fit in HyperRAM and some graphics (probably larger) could fit in HyperFlash (for example).
This is somewhat pertinent right now as I'm rewriting some of my memory driver and HyperFlash stuff to try to decouple the outer memory object from HyperFlash-specific APIs.
Extract from Cypress data sheet mentions 96ns access time.
That sounds promising ... Looking around I'm seeing mention of MirrorBit for extended capacity without compromising performance, and presumably it applies to NOR Flash. Seems the 2002 tech has just come out of patent.
Another day, another TEST/TESTB bug: LD A,I doesn't set the correct parity flag due to this (IFF1 instead of IFF2). Doesn't seem to actually fix anything, but neat to know.
Yeah I've made that same mistake quite a few times. With test instead of testb..
Yeah, the testb mnemonic is a bit poorly chosen.
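For anyone following along, here is a rough Python sketch of the documented LD A,I flag behaviour on the Z80: P/V takes the value of IFF2, S and Z come from the loaded value, H and N are cleared, and carry is preserved. The function name and argument layout are made up for illustration and are not NeoYume's code; the undocumented flag bits 3 and 5 are ignored here.

```python
# Z80 flag bit masks in the F register
FLAG_C, FLAG_N, FLAG_PV, FLAG_H, FLAG_Z, FLAG_S = 0x01, 0x02, 0x04, 0x10, 0x40, 0x80

def ld_a_i(i_reg, f_old, iff1, iff2):
    """Return (A, F) after LD A,I.

    iff1 is accepted only to make the point that it is NOT the flip-flop
    that ends up in P/V; iff2 is. (Hypothetical helper, for illustration.)
    """
    a = i_reg & 0xFF
    f = f_old & FLAG_C      # carry is preserved, H and N are cleared
    if a & 0x80:
        f |= FLAG_S         # sign follows bit 7 of the loaded value
    if a == 0:
        f |= FLAG_Z         # zero flag follows the loaded value
    if iff2:
        f |= FLAG_PV        # P/V is a copy of IFF2, not IFF1
    return a, f
```

Testing the wrong flip-flop only shows up when IFF1 and IFF2 differ, for example inside an NMI handler, which may be why the fix changed nothing audible.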
Anyways, this is what my life looks like right now. This spaghett here is some 5 or so jumps removed from the actual register writes and is supposed to get a pointer to a sample info struct (containing start/end V ROM addresses and some other(?) stuff) into IX. I only know that C is the channel number from fortunate context of the call site, lol. Also ghidra really loves Z80 register pairs and totally doesn't break down in their presence at all (weird given that x86 also has them, though less prominent)
Interesting observations:
- MAME vs NeoYume get different calls to $03f5 (what?)
- The correct playback contains wrong sound triggers (this is audible if you solo channel 2 (or 3 if you index by 1 like in_vgm) on a vgm rip), but these are cancelled fast enough to not cause trouble (?)
Maybe this does have something to do with that accursed EOS flag again?
I've seen that the code clears the flag, so it's probably getting some use out of it.
Okay, I thought I already quintuple checked the actual ADPCM channel logic, but an off-by-two somehow slipped by. It turns out that a channel triggered with equal start/end pointer should play exactly two samples (or in general, the byte at the end address is part of the sample) and then end (setting EOS).
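A minimal sketch of that inclusive-end rule, assuming byte addresses with two 4-bit samples per byte and an equality compare against the last nibble of the end byte. The function and its max_samples cap are hypothetical and simplified (real hardware addresses are in larger units and wrap around), so take it as a model of the rule rather than the emulator's actual logic.

```python
def play_channel(start, end, max_samples=1 << 20):
    """Count samples output before EOS. Returns (samples_played, eos_set)."""
    pos = start * 2          # play position in nibbles (samples)
    end_pos = end * 2 + 1    # last nibble of the end byte, inclusive
    played = 0
    while played < max_samples:
        played += 1          # output the sample at the current position
        if pos == end_pos:   # the end byte is part of the sample
            return played, True
        pos += 1
    return played, False     # never reached the end: garbage playback
```

With start equal to end this outputs exactly two samples and then sets EOS. With start past end the compare never matches and the channel keeps running until the cap.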
That magically fixes it. Okay. Now have to check if this truly is TheFix for all issues.
EDIT: It is not TheFix. Nice try though ig.
EDIT 2: Still a reliably reproducible wrongsound in ss5 though. Just have to let the demo run through twice to have it happen very obviously.
EDIT 3: And as I say that it stops happening - sigh
Hmmm, now that I think about it... The issue is always only semi-reproducible. It happens the same way twice or thrice but when you try to chase it, it goes somewhere else.
Maybe something about uninitialized PSRAM contents? Shouldn't be (usually BIOS is the final file, so some sort of overread shouldn't affect sound), but who can say. Going to just fill all memory with something now.
Except that doesn't work at all? @rogloh I think there's a bug with filling large amounts of bytes that makes it wrap within a bank or smth.
Got one on trace!
Should have logged the key-ons, but look, it seems to forget to set the sample start LSB register (zk_memtmp1 = 0)?
So we end up with end address $9742 ($12E8401 in samples) and start $9785 ($12F0A00) - the $85 LSB remaining from the previous sound on ch4. Start > End, thus garbage sounds.
Interestingly I had to turn off 68000-side command logging to get it to happen, so maybe it's the "NMIs too close together" issue part 2, electric boogaloo. The 68000 side code has no appreciable delay loop, so maybe?
May also be part 2 of "wrong register write due to interrupt", but when I looked at it the register write code was solid. Perhaps I should log calls to the register set function, too.
EDIT: Obvious innovation discovered. If I kill everything at the first whiff of a position > end condition, I don't have to pay constant attention to the log/audio. Why didn't I think of this earlier?
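Roughly what that trap could look like, as a Python sketch. The numbers in the trace above are consistent with the 16-bit address registers being shifted left by 9 to get a sample index (256-byte units, two samples per byte); the end value gets an extra low bit in the post, which is just reproduced here without explanation. WrongSound, reg_to_sample and check_channel are invented names.

```python
class WrongSound(Exception):
    """Raised the moment a channel is found in an invalid state."""

def reg_to_sample(reg16):
    # 16-bit address register -> sample index (256-byte units, 2 samples/byte)
    return (reg16 & 0xFFFF) << 9

def check_channel(ch, position, end):
    # Stop everything at the first position > end instead of
    # listening for the wrongsound by ear.
    if position > end:
        raise WrongSound(f"ch{ch}: position ${position:X} > end ${end:X}")

start = reg_to_sample(0x9785)    # stale $85 LSB left over from the previous sound
end = reg_to_sample(0x9742) | 1  # end value as quoted in the trace
```

Calling check_channel(4, start, end) with these values raises right away, so the log stops at the offending write instead of scrolling past it.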
Another variant: In this case, it seems a dump command didn't make it through? (i.e. end address changed while sample was still playing). But there's another access ($0CDD - volume) between the dump ($0090) and the NMI, so uhhh.
Had the key-on/dump logging disabled, what a shame.
Oh, yeah, the driver code doesn't currently fill outside of the bank if you define each bank as being 16MB and had multiple of them joined together. So you'd need to define a larger memory array size by putting the overall aggregate size into each mapped bank's own size parameter at startup time.
The size needs to be a power of 2 though, so for a 96MB board like Rayman's you'd have to round this size parameter up to use S_128MB, and then I think you should be able to fill within this total range (assuming contiguous banks are defined) before wrapping.
That works for a single bus, but for a multiple bus and multiple bank scenario I plan and need to have the outer fill or copy API break it up into smaller fill or copy operations that don't exceed each driver's device sizes (I still need to do that). Until then there are probably still some ways to break things depending on how you configure it, and you are the first to really start pushing on some of the limits. Also the HyperRAM devices wrap internally on 8MB die boundaries as well...another complication, as well as any unmapped address portions.
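To illustrate the two ideas (rounding the aggregate size up to a power of two, and having an outer API split a fill so no single operation crosses a device boundary), here is a small Python sketch. round_up_pow2 and split_fill are hypothetical helpers, not part of the actual driver, and the sketch assumes equally sized, contiguous banks.

```python
MB = 1 << 20

def round_up_pow2(n):
    """Smallest power of two >= n (n must be positive)."""
    return 1 << (n - 1).bit_length()

def split_fill(addr, length, bank_size):
    """Split one fill into (address, length) chunks that never cross a bank boundary."""
    chunks = []
    while length > 0:
        room = bank_size - (addr % bank_size)  # bytes left in the current bank
        n = min(length, room)
        chunks.append((addr, n))
        addr += n
        length -= n
    return chunks
```

A 96MB board rounds up to 128MB this way, and a 2MB fill starting 1MB before a 16MB boundary turns into two 1MB fills. The 8MB die wrap on HyperRAM would just be a smaller bank_size in the same loop.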
Yea, that fixes it, thanks.
Though I'm convinced now the issue is timing related. It seems that something is occasionally preventing register writes from coming through. May not even be related to IRQ/NMI.
Okay, something really bad is afoot. Check this out - I'm logging ADPCM address writes, ADPCM key-on writes (high register 00), NMIs and also the value of DEBC when PC is a particular value (end of high register write function, where D = reg id and C = value)
As the annotations hopefully explain, the Z80 believes it wrote to the key-on register to dump channel 5, but it actually didn't, which causes a wrongsound as soon as it writes the new end pointer, because now the YM-internal play position is behind the end pointer (actually, this case of the key-off going missing doesn't because it would immediately trigger the sound after writing the new positions, getting the channel back into a valid state)
As you can see, there's no NMI happening and to even get to the code path where DEBC is logged requires that IRQs are disabled to begin with.
My best guess is that something is corrupting the register address very occasionally?
And immediately as I say that, it hits me that I only allocated a single byte to the opn_register, but it's actually a word, so _lspc_animctr overlaps the top half. _lspc_animctr is written once per frame only when the auto-anim timer rolls over. I am very smart.
EDIT To perhaps further explain: The top half of opn_register is either $00 or $01 to denote whether the low address port or the high address port was written. Writing address to low port and then data to high port (or vice versa) is a no-op and this is implemented as such. By corrupting this state, register writes are randomly dropped.
_lspc_animctr only updating when it actually changes explains why some games (and some stages within a game) suffer much harder from this issue.
Stomped
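A toy Python model of that overlap, for anyone who wants to see the mechanism. It assumes little-endian layout, puts the two variables at made-up offsets in a small byte buffer, and uses invented helper names; only opn_register and _lspc_animctr come from the post, and $90 stands in for a dump command.

```python
import struct

mem = bytearray(4)
OPN_REGISTER = 0   # only one byte reserved, but accessed as a 16-bit word
LSPC_ANIMCTR = 1   # so this lands on the high byte of that word

def write_address(port_high, reg):
    # High byte remembers which address port was used ($00 low, $01 high)
    struct.pack_into("<H", mem, OPN_REGISTER, (port_high << 8) | reg)

def write_data(port_high, value, writes):
    word, = struct.unpack_from("<H", mem, OPN_REGISTER)
    if (word >> 8) != port_high:
        return                       # address/data port mismatch: no-op
    writes.append((port_high, word & 0xFF, value))

def demo(stomp):
    mem[:] = bytes(len(mem))
    writes = []
    write_address(1, 0x00)           # select high register 00 (key-on/dump)
    if stomp:
        mem[LSPC_ANIMCTR] = 5        # animation counter update in between
    write_data(1, 0x90, writes)      # the Z80 believes this write happened
    return writes
```

demo(False) records the write and demo(True) silently drops it, which matches the "Z80 believes it wrote" symptom: the drop only happens when the counter update lands between the address write and the data write.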
Stomped indeed.
Freebie: the background animations in Blazing Star are now also fixed. I guess it is actually using the animation count register to do them. Still no sound in that game, but as previously exposited, the Z80 is running and acking IRQs, which is enough to disturb the animation counter.
Well, released beta 05. I've marked all the games that had mysterious(tm) sound issues of the sort as OK now, without really testing them much. I'm just tired of critically listening to game sound effects (that are often a tad sloppy to begin with).
Well done on tracking that one down, nasty. How many total hours (or days?) do you reckon you've spent on that one @Wuerfel_21 ?
Uhhh too many? The commit that introduces the issue is pretty much exactly a month old. I'm bad at remembering what I did when, but I'd think it's about 4 days worth of active mucking about (but that'd include all the other audio fixes, which I was really motivated to do in order to figure out this one). The bigger issue is that I can't sleep at night when I know my software has stupid heisenbugs like that in it.
Also, do not worry, there's plenty more bugs. There's an LSPC related one that's been there forever: The broken graphics in Neo Drift Out. I realized that it also happens in other games, so it's a general LSPC issue I think. HEIGHT 33 I CURSE YOU.
I know the feeling. Once you know something is definitely broken in your own code, it's very hard to let it remain.
So hmm, the HyperRAM board is actually 16MB?
Where did I get the idea that it's 8MB from then?
Anyways, might be interesting to add HyperRAM support to the neoyume memory arbiter. Most likely by just completely branching off a separate copy to avoid convoluting the code further.
Just re-did the YPbPr coefficients... colors should be a bit better now. There was some degree of red/blue confusion.
Also, interesting tidbit: hotplugging headphones seems to crash NeoYume now. I think I did this before and it didn't. Hmmm. Maybe it happens because it has to simultaneously sink the 120V AC floating ground from the TV into my PC (only device with 3 pin power...), lol.