That means there's 3 "broken sound" games left now (unless I go looking for more lmao). MS3 isn't fixable without major refactor, Blazing Star I have a feeling will just work now and I think I now remember that the samsho5 issue had something to do with ROM encryption.
Decent work I guess. Now I have to hook up the P2EVAL/96MB again to check if the EI issue was responsible for the wrong sound issue (the only games that reproduce it reliably enough to not go insane are samsho3/samsho4) and perhaps finally implement sysclk/3 in the memory arbiter (@rogloh, do you have drivers with some sort of slow mode support ready? The MSX dongle patch requires the loader to read back from RAM).
@Wuerfel_21 said:
rogloh, do you have drivers with some sort of slow mode support ready? MSX dongle patch requires the loader to read back from RAM.
Yes I do. The 8 bit code has already been modified with that same change, although I should test it out specifically on Rayman's board to make sure it is okay as well (I had actually only tested the concept on the HW with the 4 bit mode).
To enable sysclk/3 you only need to set a flag bit in the startup structure passed into the init code, but that bit is subject to change in a future official release if I decide to share its use across drivers or make it driver specific, as I haven't quite decided which approach is best yet.
@Wuerfel_21 I just tested my slower sysclk/3 enabled 8 bit driver on Rayman's 96MB board and it certainly improved things like the 4 bit version did previously so it may help improve your NeoYume performance.
Here is the driver file you need, inside the attached zip. When you launch the driver, you will need to set two bits in the flags value of the startupData parameters: SLOWCLK_BIT, which selects slower operation instead of regular speed (sysclk/2), and CLKSEL_BIT (a 1 selects sysclk/3, and a 0 selects sysclk/4).
So for enabling sysclk/3 you would do this:
' optional FLAGS for driver
FLAGS = 1<<driver.SLOWCLK_BIT | 1<<driver.CLKSEL_BIT
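As a rough model of how the two flag bits combine, here is a small Python sketch. The bit positions are placeholders picked for illustration (the real ones are defined by the driver as `driver.SLOWCLK_BIT` and `driver.CLKSEL_BIT` and may change in a later release), and it assumes CLKSEL_BIT is simply ignored when SLOWCLK_BIT is clear. Only the selection logic is taken from the description above.

```python
# Hypothetical bit positions, for illustration only; the driver defines the real ones.
SLOWCLK_BIT = 0
CLKSEL_BIT = 1


def clock_divider(flags: int) -> int:
    """Return the sysclk divider selected by a flags value."""
    if not (flags >> SLOWCLK_BIT) & 1:
        return 2  # regular speed, sysclk/2 (CLKSEL_BIT assumed to be ignored)
    return 3 if (flags >> CLKSEL_BIT) & 1 else 4


# Same expression as the Spin2 line above, with the placeholder bit numbers.
flags = 1 << SLOWCLK_BIT | 1 << CLKSEL_BIT
print(clock_divider(flags))  # 3
```

With both bits set the model picks sysclk/3, and with only SLOWCLK_BIT set it falls back to sysclk/4.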
I've also included a custom binary to test the memory's delays at sysclk/3, as these values might also need to be modified in your setup. I found I needed to increase the delay to 14 for good operation at 337MHz. Run the binary with the console port at 115200 bps; you can vary the pins for your board's location and set up the additional CE pins to drive high as you test each CE bank.
Enter the base pin number for your PSRAM (0,8,16...48) [32]: 32
Enter the chip enable pin number for your PSRAM [42]: 43
Enter the clock pin number for your PSRAM [104]: 104
Enter an additional CE/CLK P2 pin to drive high (0-55), or a higher value to exit [56]: 42
Enter an additional CE/CLK P2 pin to drive high (0-55), or a higher value to exit [56]: 44
Enter an additional CE/CLK P2 pin to drive high (0-55), or a higher value to exit [56]: 45
Enter an additional CE/CLK P2 pin to drive high (0-55), or a higher value to exit [56]: 46
Enter an additional CE/CLK P2 pin to drive high (0-55), or a higher value to exit [56]: 47
Enter an additional CE/CLK P2 pin to drive high (0-55), or a higher value to exit [56]: 56
Enter a starting frequency to test in MHz (100-350) : [320] 320
Enter the ending frequency to test in MHz (320-350) : [340] 340
Enter 1 to use the automatic delay value only, or 0 to test over the delay range : [0] 0
Enter 1 to display the first error encountered, or 0 to not display error details : [0] 0
Testing P2 from 320000000 - 340000000 Hz
Successful data reads from 100 block transfers of 8192 random bytes
Frequency   Delay    3    4    5    6    7    8    9   10   11   12   13   14
320000000    (11)   0%   0%   0%   0%   0%   0%   0%   3%  43%  96% 100% 100%
321000000    (11)   0%   0%   0%   0%   0%   0%   0%   2%  45%  96% 100% 100%
322000000    (11)   0%   0%   0%   0%   0%   0%   0%   2%  46%  95% 100% 100%
323000000    (11)   0%   0%   0%   0%   0%   0%   0%   1%  35%  98% 100% 100%
324000000    (11)   0%   0%   0%   0%   0%   0%   0%   2%  31%  99% 100% 100%
325000000    (11)   0%   0%   0%   0%   0%   0%   0%   0%  29%  96% 100% 100%
326000000    (11)   0%   0%   0%   0%   0%   0%   0%   1%  28%  96% 100% 100%
327000000    (11)   0%   0%   0%   0%   0%   0%   0%   0%  36%  91% 100% 100%
328000000    (11)   0%   0%   0%   0%   0%   0%   0%   0%  20%  86% 100% 100%
329000000    (11)   0%   0%   0%   0%   0%   0%   0%   1%  16%  81% 100% 100%
330000000    (11)   0%   0%   0%   0%   0%   0%   0%   0%  14%  83% 100% 100%
331000000    (11)   0%   0%   0%   0%   0%   0%   0%   0%   7%  86% 100% 100%
332000000    (11)   0%   0%   0%   0%   0%   0%   0%   0%  17%  83% 100% 100%
333000000    (11)   0%   0%   0%   0%   0%   0%   0%   0%  13%  78%  99% 100%
334000000    (11)   0%   0%   0%   0%   0%   0%   0%   0%   7%  75% 100% 100%
335000000    (11)   0%   0%   0%   0%   0%   0%   0%   0%   9%  66%  98% 100%
336000000    (11)   0%   0%   0%   0%   0%   0%   0%   0%  11%  66% 100% 100%
337000000    (11)   0%   0%   0%   0%   0%   0%   0%   0%   7%  72%  97% 100%
338000000    (11)   0%   0%   0%   0%   0%   0%   0%   0%   5%  63%  99% 100%
339000000    (11)   0%   0%   0%   0%   0%   0%   0%   0%   7%  60%  95% 100%
340000000    (12)   0%   0%   0%   0%   0%   0%   0%   0%   1%  42%  92% 100%
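One way to read a table like this is to look for delay values that score 100% at every frequency you care about. The Python sketch below does that for three rows copied from the output above (320, 337 and 340 MHz); the function name and data layout are made up for the example and are not part of the test binary.

```python
# Success percentages for delays 3..14, copied from three rows of the table.
DELAYS = list(range(3, 15))
RESULTS = {
    320_000_000: [0, 0, 0, 0, 0, 0, 0, 3, 43, 96, 100, 100],
    337_000_000: [0, 0, 0, 0, 0, 0, 0, 0, 7, 72, 97, 100],
    340_000_000: [0, 0, 0, 0, 0, 0, 0, 0, 1, 42, 92, 100],
}


def safe_delays(results, lo_hz, hi_hz):
    """Delays that scored 100% at every tested frequency within [lo_hz, hi_hz]."""
    rows = [row for freq, row in results.items() if lo_hz <= freq <= hi_hz]
    if not rows:
        return []
    return [d for i, d in enumerate(DELAYS) if all(row[i] == 100 for row in rows)]


print(safe_delays(RESULTS, 320_000_000, 340_000_000))  # [14]
print(safe_delays(RESULTS, 320_000_000, 320_000_000))  # [13, 14]
```

Over the whole 320-340 MHz range only delay 14 survives, which matches the value reported for 337MHz. The test stopped at 14, so this says nothing about how much margin there is above it.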
@Wuerfel_21 said:
Thanks. (Ideally I'd like a slow-capable version of the 16bit and 4bit-dualCE drivers, too, for symmetry. But it's anything but urgent)
I already have the 16 bit one, and a 4 bit one (whether the latter includes the dual-CE feature, I'll need to doublecheck as it might not have that included by default).
Here's all the bus size variants with sysclk/3 and sysclk/4 support, including the dual CE feature for 4 bit buses. It's still a special custom build for you and not part of an official release that is more extensively tested but you should be able to use it anyway. Let me know if you have problems...
If you want to reduce the total driver overhead, you'll likely be able to replace all this and just use evanh's SPIN2 based APIs to do the game init now. You'll just need to manage the 1/2/4 kB page crossings and handle any read-modify-write issues yourself in that case (there won't be any if writes are always aligned to longs). I imagine that's not too difficult during your block copies off the SD card.
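To make the page-crossing bookkeeping concrete, here is a minimal Python sketch that splits one transfer into chunks that never cross a page boundary, plus a check for the long-aligned case where no read-modify-write would be needed. The function names are invented for this example and have nothing to do with evanh's actual API; the page size is a parameter because the post only says it is 1, 2 or 4 kB depending on the setup.

```python
def split_at_pages(addr: int, length: int, page_size: int = 1024):
    """Yield (addr, length) chunks of a transfer, none of which crosses a page boundary."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page_size must be a power of two")
    while length > 0:
        room = page_size - (addr & (page_size - 1))  # bytes left in the current page
        chunk = min(room, length)
        yield addr, chunk
        addr += chunk
        length -= chunk


def is_long_aligned(addr: int, length: int) -> bool:
    """True when both ends of the transfer sit on 4-byte boundaries."""
    return addr % 4 == 0 and length % 4 == 0


print(list(split_at_pages(1000, 100)))  # [(1000, 24), (1024, 76)]
```

A block copy off the SD card would call something like this once per block and issue one burst per chunk.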
Good news: got the P2EVAL 96MB setup back onto the bench. This simple task somehow took me days.
Bad news: I don't think the wrong sound issue got fixed (at least not for samsho3 - though IDK if it ever manifested on the edge setup... something something timing? Then again, it does happen very rarely in most games)
Also, audio still busted in Blazing Star, so that's another tree I can bark up first. (oink up?)
@rogloh I think there's smth wrong with the 8 bit driver.
The mslugx pattern patching routine doesn't work on it. It does find some patterns, but it just boots to black screen afterwards (expected behaviour for if the patch isn't applied at all is to show the anti-piracy message). Perhaps an issue with unaligned writes? (though the patched instructions are of course always word-aligned). 4 bit driver on same config works, as do all other games on 8 bit. Also note I'm trying all of this at sysclk/4 still.
I think I've got the sysclk/3 mode working. It will replace the sysclk/4 mode because that sucked. Just need to do some testing.
Also, as said, the main hangup with the merge back into the master branch was the sound issue. I am somewhat convinced that the issue was always there, it's just that some games are more or less (or not at all?) affected by it and I never noticed it before trying samsho3/4, which are heavily susceptible to it. I haven't been able to reproduce a different "wrong sound" type issue (sporadic unintended sounds and, rarely, dropped notes in the lastblad intro), so I think that one was caused by the EI thing? Add to that all the stuff that is now fixed (continuous sound, broken sound in most games etc.), and I really think that the merge should happen now. Or maybe I've just exhausted my ability to critically listen to game sound effects for hours.
Relatedly, I looked into what's going on with Blazing Star. No IRQ oddities, just doesn't do anything after the NeoGeo jingle (except for acking the interrupt and setting the SSG volumes), Z80 logic issue?
Welp, sound issues part idek anymore: Was the sound of a soldier getting knifed in mslug4 weird and distorted before? (i.e. caused by sys/3 mode or just something I missed earlier). If the former, well, one slightly busted sfx doesn't outweigh everything not running like Smile anymore I don't think.
@Wuerfel_21 said:
I think I've got the sysclk/3 mode working. It will replace the sysclk/4 mode because that sucked. Just need to do some testing.
Ok so no write issues then from what you mentioned before...? The 8 bit version was tested with my delay test so writes should still work ok unless it is something related to the clock edge location with respect to the data on some HW or the duty cycle thing being violated.
So how much better is sysclk/3 v sysclk/4 with your testing, and what's your aim with sysclk/3? Do you leave it to the user to setup themselves as a build option and keep using sysclk/2 otherwise? Making it game dependent probably makes no sense, it's more a HW setup config that needs it when you parallel lots of PSRAMs.
@rogloh said:
Ok so no write issues then from what you mentioned before...? The 8 bit version was tested with my delay test so writes should still work ok unless it is something related to the clock edge location with respect to the data on some HW or the duty cycle thing being violated.
No, does have the issue, it just doesn't mess up the normal block loading.
@rogloh said:
So how much better is sysclk/3 v sysclk/4 with your testing, and what's your aim with sysclk/3? Do you leave it to the user to setup themselves as a build option and keep using sysclk/2 otherwise? Making it game dependent probably makes no sense, it's more a HW setup config that needs it when you parallel lots of PSRAMs.
Yeah, it's a config option.
@Wuerfel_21 said:
No, does have the issue, it just doesn't mess up the normal block loading.
Ok, maybe your patching requires the PSRAM read operation with slightly different delay timing, now that sysclk/3 is being used. I had found delay=14 worked, but maybe 13 or 15 can be tried as well if your boards are slightly different. The delay test program I included with the first sysclk/3 zip in post #1056 can be used to find optimal delay values for the banks at your operating frequency. With any luck there is a common delay value that works for all banks; if not, we will have to patch each bank with a different delay. I'm not sure there is an API for that yet in the simple wrapper driver but changes could be made there if needed.
I've tried that, that's not it. Same delay works for 4bit on both halves (I also removed data so code ends up in different banks etc bla bla) I'm very sure the issue is with the 8bit driver specifically.
When you patch are you using read bursts to read back or reading just a single element (byte/word/long), updating and writing back? Which particular API or mailbox request type are you using for your reads and writes?
Uhhh, just look at the code if you need more detail, but it reads back blocks at a time, searches for matches and then writes the replacement data for each.
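For readers who don't want to dig through the code, the read-back / search / write-replacement flow described here can be sketched roughly as below. This is a Python model, not the NeoYume routine: a `bytearray` stands in for external RAM, the block size is arbitrary, and the pattern bytes in the demo are dummies. The one detail worth showing is the overlap read, so that a match straddling two blocks is not missed.

```python
def patch_patterns(ram: bytearray, start: int, end: int, pattern: bytes,
                   replacement: bytes, block_size: int = 4096) -> list:
    """Scan ram[start:end] block by block and overwrite every match of pattern."""
    if len(pattern) != len(replacement) or not 0 < len(pattern) <= block_size:
        raise ValueError("pattern and replacement must be the same non-zero length")
    hits = []
    overlap = len(pattern) - 1
    pos = start
    while pos < end:
        # Read back one block plus a few extra bytes to catch boundary-straddling matches.
        block = bytes(ram[pos:min(pos + block_size + overlap, end)])
        i = block.find(pattern)
        while i != -1 and i < block_size:  # later starts belong to the next block
            hits.append(pos + i)
            i = block.find(pattern, i + 1)
        pos += block_size
    for addr in hits:  # write the replacement data for each match
        ram[addr:addr + len(replacement)] = replacement
    return hits


demo_ram = bytearray(10000)
demo_pattern = bytes([0x11, 0x22, 0x33, 0x44])   # dummy bytes, not a real game pattern
demo_patch = bytes([0xAA, 0xBB, 0xCC, 0xDD])
demo_ram[10:14] = demo_pattern
demo_ram[4094:4098] = demo_pattern               # straddles the first block boundary
print(patch_patterns(demo_ram, 0, len(demo_ram), demo_pattern, demo_patch))  # [10, 4094]
```

In the real loader the slice read and the slice assignment would be external memory requests, which is where an unaligned write problem in the driver would show up.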
Ok, will try to take a look when I get a chance. There were some changes in the driver code that may not be getting fully exercised by the delay test so there could be a bug somewhere.
I wonder if the 16 bit slow version is broken as well?
@rogloh said:
I wonder if the 16 bit slow version is broken as well?
Would have to test that on the P2EDGE. I think it isn't slow-specific though, I never really tried this on 8bit fast mode, so it probably didn't work to begin with.
Unrelatedly, just noticed there wasn't a credit screen yet. Not as comprehensive as the MegaYume one because we don't have the ShiftJIS renderer and I wanted to cram in the current memory config. Also only displays when you hit D in the menu since it'd be really annoying otherwise, given lack of quit-to-menu.
true means async, false means sync. The read is sync, the write is async (and probably takes less time than the sprintf call lmao). Unless I'm having a big dum.
Anyways, NeoYume Beta 04
is finally upon us on the master branch. No time for cool imagery right now. I encourage everyone to try it, as it probably won't work for you lol.
Current default RAM config should work for P2EDGE, but I'll have to put together a guide or smth on how to configure this nonsense.
@Wuerfel_21 said:
true means async, false means sync.
Oh yeah, you're right. Read it backwards. The ifnot condition must have confused me, or just some very poor sleep last night. Was maybe 10C warmer than normal overnight here in our winter! Time for coffee...
Ok, so I patched my video demo with the slow 8 bit PSRAM driver code and found it does have a write problem vs the original. The data right at the end of a write burst gets corrupted when the last address is not long aligned, showing up as corruption on the right side of my filled rectangles on screen, however all the single pixel writes seem okay.
Until I figure it out fully and fix this later today you might be able to work around it by writing individual (hopefully aligned) single longs instead of a burst when you patch your code. Use a variant of your exmem_fill function instead of exmem_write to fill with length of 1 long (use R_WRITELONG). The random length write burst or multiple fills seems to be the problem because that needs to figure out the ending condition masks. I know I had to change the driver code in that area to save space in the 16 bit driver which made its way into 8 bit as well, so there's certainly some sort of bug in there somewhere.
The fact that the delay test I ran passed wasn't enough of a test here, as it didn't exercise all the different starting/ending write address conditions that a random fill would.
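A test that does cover every start/end alignment could look something like the following Python sketch. It is only a model: `masked_long_write` is a made-up reference for a burst write done as whole longs with byte masks at both ends, and `buggy_write` is a stand-in that corrupts the last byte whenever the end address is not long aligned, loosely imitating the symptom described above rather than the actual driver bug.

```python
def masked_long_write(mem: bytearray, addr: int, data: bytes) -> None:
    """Model a burst write performed as whole longs with byte masks at both ends."""
    first = addr & ~3
    last = (addr + len(data) + 3) & ~3
    for base in range(first, last, 4):
        old = int.from_bytes(mem[base:base + 4], "little")
        new, mask = 0, 0
        for b in range(4):
            src = base + b - addr
            if 0 <= src < len(data):  # this byte lane carries payload
                new |= data[src] << (8 * b)
                mask |= 0xFF << (8 * b)
        mem[base:base + 4] = ((old & ~mask) | new).to_bytes(4, "little")


def buggy_write(mem: bytearray, addr: int, data: bytes) -> None:
    """Stand-in bug: the final byte is corrupted when the end is not long aligned."""
    masked_long_write(mem, addr, data)
    end = addr + len(data)
    if end & 3:
        mem[end - 1] ^= 0xFF


def check_all_alignments(write, max_len: int = 16) -> list:
    """Return every (start_offset, length) where `write` disagrees with a plain copy."""
    failures = []
    for offset in range(4):
        for length in range(1, max_len + 1):
            data = bytes((i * 37 + 11) & 0xFF for i in range(length))
            expected = bytearray(b"\xAA" * 32)
            expected[8 + offset:8 + offset + length] = data
            actual = bytearray(b"\xAA" * 32)
            write(actual, 8 + offset, data)
            if actual != expected:
                failures.append((offset, length))
    return failures


print(check_all_alignments(masked_long_write))  # []
print(len(check_all_alignments(buggy_write)))
```

A fixed-size block test only ever hits the aligned cases, which is why this kind of bug can slip past it.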
Found the bug - I still had some code using the old LUT table which I'd removed to save space. Here's the patch fix for both 8 bit and 16 bit slow drivers.
Replace these lines:
'we have 1-3 more aligned residual bytes left to send as the trailer
or d, #$1f0 'setup mux mask address
rdlut pa, d 'read mux mask for this length at offset 0
With this:
'we have 1-3 more aligned residual bytes left to send as the trailer
mul d, #8 'scale into bytes
sub d, #1 'bmask adds 1 to its argument so compensate here
bmask pa, d 'create mux mask
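To see which masks the replacement instructions produce, here is a small Python model. It assumes the documented behaviour of the P2 BMASK instruction as I understand it (an LSB-justified mask of S[4:0] + 1 one bits, which is what the "adds 1 to its argument" comment refers to); the function names are made up for the example, and how the driver then uses the mask is not modelled.

```python
def bmask(s: int) -> int:
    """Model of P2 BMASK: LSB-justified mask of (s[4:0] + 1) one bits."""
    return ((2 << (s & 31)) - 1) & 0xFFFFFFFF


def trailer_mask(residual_bytes: int) -> int:
    """Mirror the three patched instructions for 1-3 residual bytes."""
    d = residual_bytes * 8  # mul d, #8   scale into bits
    d -= 1                  # sub d, #1   compensate for bmask's implicit +1
    return bmask(d)         # bmask pa, d


for n in (1, 2, 3):
    print(n, hex(trailer_mask(n)))
# 1 0xff
# 2 0xffff
# 3 0xffffff
```

So the mask is computed on the fly from the residual byte count instead of being looked up in the LUT table that had been removed.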
Just implemented NEO-CMC50 Z80 ROM unscrambling - since it operates on 64K blocks, I can just do it on load. And since the M1 is small, anyways, it doesn't even take that obnoxiously long. So now you don't need to hunt down a pre-decrypted ROM (no, you can't even dump these from MAME easily) for stuff that needs it.
Comments
Unrelatedly, ported the LCD support to MegaYume, too. Like HDMI, less than ideal for H32 mode (256x224).
That's hardly a fix. The fact that it's outside the regular IRQ mechanisms makes it a flaw. NMIs are awful things, right at the architectural level.
PS: RESET is the closest thing to an NMI that should exist.
Thanks. (Ideally I'd like a slow-capable version of the 16bit and 4bit-dualCE drivers, too, for symmetry. But it's anything but urgent)
Ada,
My code has all that included. But it's only the bare tx/rx routines, not a manager.
Ok so no write issues then from what you mentioned before...? The 8 bit version was tested with my delay test so writes should still work ok unless it is something related to the clock edge location with respect to the data on some HW or the duty cycle thing being violated.
So how much better is sysclk/3 v sysclk/4 with your testing, and what's your aim with sysclk/3? Do you leave it to the user to setup themselves as a build option and keep using sysclk/2 otherwise? Making it game dependent probably makes no sense, it's more a HW setup config that needs it when you parallel lots of PSRAMs.
No, does have the issue, it just doesn't mess up the normal block loading.
Yeah, it's a config option.
I noticed a use of async mode for your exmem reads. You may want to try all sync only so you know the data is ready.
true means async, false means sync. The read is sync, the write is async (and probably takes less time than the sprintf call lmao). Unless I'm having a big dum.
Found the bug - I still had some code using the old LUT table which I'd removed to save space. Here's the patch fix for both 8 bit and 16 bit slow drivers.
Indeed, that resolves the issue.
Good.
Just implemented NEO-CMC50 Z80 ROM unscrambling - since it operates on 64K blocks, I can just do it on load. And since the M1 is small, anyways, it doesn't even take that obnoxiously long. So now you don't need to hunt down a pre-decrypted ROM (no, you can't even dump these from MAME easily) for stuff that needs it.
Slightly terrible code though.
Got sound working in samsho5 and oh, what is that, a "wrong sound" issue that reproduces immediately and reliably on boot?
(sorry for low recording level)
That was surely worth it.
Also, I can't help but notice that applying the blood color censorship (white->red) to those stains on the title screen is a bit unfortunate, lol.