@rogloh said:
Also be sure that you meet all PSRAM timing requirements between accesses in your arbiter. You probably will due to the overhead instructions but it might be worth double checking anyway.
UPDATE: And check that any of the page boundary crossings are ok too.
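To make the page-boundary concern concrete, here is a minimal sketch (not the megayume code) of splitting one burst so that no single transfer crosses a page. The function name and the 1024-byte default page size are assumptions for illustration; check the page size against the datasheet for the actual part.

```python
def split_at_page_boundaries(addr, length, page_size=1024):
    """Split a transfer into chunks that each stay inside one page.

    page_size=1024 is an assumed value, not taken from the thread.
    Returns a list of (start_address, byte_count) tuples.
    """
    chunks = []
    while length > 0:
        # Bytes left before the next page boundary.
        room = page_size - (addr % page_size)
        n = min(length, room)
        chunks.append((addr, n))
        addr += n
        length -= n
    return chunks


if __name__ == "__main__":
    # A 100-byte read starting 24 bytes before a boundary becomes two transfers.
    print(split_at_page_boundaries(1000, 100))
```

Each chunk would be issued as its own transfer, with CS raised in between, which is why the CS high time matters here as well.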
What is the timing requirement between accesses? I remember reading about it in the datasheet but it's like 1:30 AM, so I can't be trusted to read that now.
The boundary check/split code is 1:1 copied from megayume. Interestingly, I at some point realized that the page size was set too large, but fixing it changed literally nothing, so uhhhh. Code looks like this (is set as ATN interrupt. Yes, the graphics reads are appropriately shielded from being interrupted, I hope)
@Wuerfel_21 said:
What is the timing requirement between accesses? I remember reading about it in the datasheet but it's like 1:30 AM, so I can't be trusted to read that now.
I just read it from an APmemory data sheet and it lists 18ns min CS HIGH time between accesses. But I'm not sure if this is the right datasheet for the P2 Edge.
"Sync bus + Async clock" setup has not crashed so far (2 continues worth of letting Metal Slug time out), so maybe that really is the ticket.
Fingers crossed.
Final continue. If it survives a full game's worth of "Marco Rossi wins by doing nothing", it'll go on to the overnight attract mode test. It survived that before I started with the audio, so that's the benchmark.
@Wuerfel_21 said:
What is the timing requirement between accesses? I remember reading about it in the datasheet but it's like 1:30 AM, so I can't be trusted to read that now.
I just read it from an APmemory data sheet and it lists 18ns min CS HIGH time between accesses. But I'm not sure if this is the right datasheet for the P2 Edge.
2 AM mind says that that's ~12 cycles. I think the case of an interrupt hitting right before the non-interrupt code is about to enter the protected section might only do 10 cycles? Eh we're overclocking, anyways.
Well at 330MHz or so each P2 clock is ~3ns, so 18ns is 6 clock cycles, or 3 P2 instructions. You should be good as long as the next transfer delays by this number of instructions between CS going high and CS going low.
Update: Is it still running okay with the updated timing?
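The arithmetic above can be checked with a short sketch. It assumes the 18 ns CS high minimum quoted from the datasheet and 2 clocks per P2 instruction (as implied by "6 clock cycles, or 3 P2 instructions"); the function names are made up for illustration.

```python
def min_cs_high_cycles(t_cph_ns, sysclk_mhz):
    """Smallest whole number of system clocks that covers t_cph_ns.

    ns * MHz gives cycles * 1000, so this is a ceiling division
    done purely in integers to avoid float rounding.
    """
    return -(-(t_cph_ns * sysclk_mhz) // 1000)


def min_instructions(cycles, cycles_per_instruction=2):
    """Whole instructions needed to cover the given number of clocks
    (2 clocks per instruction is an assumption)."""
    return -(-cycles // cycles_per_instruction)


if __name__ == "__main__":
    cycles = min_cs_high_cycles(18, 330)
    print(cycles, min_instructions(cycles))  # prints "6 3"
```

At lower clock speeds the requirement shrinks in cycles, so a gap that is long enough when overclocked is long enough everywhere below that.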
@Wuerfel_21 said:
What is the timing requirement between accesses? I remember reading about it in the datasheet but it's like 1:30 AM, so I can't be trusted to read that now.
I just read it from an APmemory data sheet and it lists 18ns min CS HIGH time between accesses. But I'm not sure if this is the right datasheet for the P2 Edge.
"Sync bus + Async clock" setup has not crashed so far (2 continues worth of letting Metal Slug time out), so maybe that really is the ticket.
Fingers crossed.
If it's a P2-EC32MB Rev.A, its BOM states APS6404L-3SQR-ZR (ZR-code stands for USON-8L pkg).
APMEMORY website lists "APS6404L-3SQR QSPI PSRAM" - "APM SPI 3V PSRAM Datasheet.pdf - Rev. 2.4 Oct 08, 2021" as the current datasheet for that part:
Figure 19 shows tCPH as only defined for "input", so that limit seems to be valid just for Writes.
Item 8.6 "Command Termination" (page 10) gives more detail on how reads and writes are expected to come to an end, showing tCHD as the limiting factor, and also carries the following warning: "Not doing so will block internal refresh operations and cause memory failure."
Perhaps I'm wrong, but, as those timing diagrams suggest, it's worth noting that those PSRAMs seem to differ quite a bit from Hypers, in the sense that CLK appears to be held "LOW" while ~CE is "HIGH"; IIRC, Hypers explicitly allow continuous clocking while they're unselected.
Hope it helps...
Henrique
Addit: Convoluted as it appears to be, Table 14.5 (Page 22, DC Characteristics) shows the way Standby Current is measured ("3. Standby current is measured when CLK is in DC low state.").
Again, I could be wrong, but, patents being what they really are (... subdued narratives..., never completely descriptive...), Standby mode needs to be explicitly guaranteed between valid operations, or sh...t will happen...
@Yanomani said:
Item 8.6 "Command Termination" (page 10) gives more detail on how reads and writes are expected to come to an end, showing tCHD as the limiting factor, and also carries the following warning: "Not doing so will block internal refresh operations and cause memory failure."
Perhaps I'm wrong, but, as those timing diagrams suggest, it's worth noting that those PSRAMs seem to differ quite a bit from Hypers, in the sense that CLK appears to be held "LOW" while ~CE is "HIGH"; IIRC, Hypers explicitly allow continuous clocking while they're unselected.
Don't think there are any problems comparing/reusing there. All Hyper transactions are two edges, 16-bit, per cycle. The diagrams show these as a low-to-high-to-low sequence. And I've not seen any continuous clocking for any of these memories. The only code where I've seen continuous clocking used is for the Ethernet chip that ManAtWork did an impressive driver for, and that relies on a preambled data start sequence.
ewww, Roger, I'm just having a nosey at your driver doc, MemoryDriverDocumentation_v09b.pdf, and trying to work out how to use it and how Ada has used it ... not matching up very well so far. The first function Ada's code calls is "getDriverAddr()" but that isn't even a documented function!
Just found an interesting pickle: Apparently the YM2149 SSG (and thus presumably the SSG subcomponent of Yamaha FM chips?) has a 32-step envelope generator while the original AY-3-8910 only has 16 (matching the fixed volume levels). Did @Ahle2 ever realize this? Well, I think the tweak isn't going to be too difficult, but still, how odd.
Slight problem encountered: There may not actually be enough cog space for YM2608 Rhythm channels (let alone YM2610 ADPCM). FM section and its envelopes are too dang complex.
I guess some space could be saved by un-unrolling (rerolling?) the SSG code. That would violate all my principles that, as aforementioned, I discovered by transcending the boundary between time and space, but if it has to be done...
It has to be done tomorrow because holy shat where did the hours go?
Well, it didn't fit after all and I had to move the rhythm generator to hubexec... (The YM2610 ADPCM needs to be hubexec, anyways, it's a decent bit more complex and then there's the ADPCM-B channel to worry about (which YM2608 technically also has, but at least on PC-98 soundcards, the bus for it wasn't hooked up to anything (and in later YM2608 compatible chips, they just removed it), so I'd have to dig to even find a soundtrack that uses it)
So anyways, here's the mega-unpolished YM2608 (so unpolished I didn't even bother to change the file header) for personal enjoyment. Excuse the poor tune selection, too lazy to hunt for anything that doesn't immediately come to mind. Not that my example tune collections are ever of particularly notable quality.
Thoughts regarding ADPCM: There's a bit of a pickle when it comes to handling key-on: If the channel is already active, it will have already requested the next data block, so a key-on command can really only be handled when that request is acknowledged (which may take a sample or two (for ADPCM-A), since ADPCM uses the bandwidth left over from the sprite reads at the end of each scanline). That's gonna be a somewhat complicated state machine I reckon. (Also there may be weird phasing issues if there are samples that are supposed to be triggered simultaneously but end up out of alignment due to high bus usage. Not sure if that ever happens (two samples being triggered at once even. That wouldn't happen with a usual track-per-channel type music driver (I guess the ability to trigger more than one sample at a time is simply a holdover from the YM2608 rhythm generator))).
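One way to picture the deferred key-on described above is a tiny state machine. This is a hypothetical sketch, not the NeoYume implementation: the class, its fields and its method names are all invented for illustration, and it models only the "a key-on waits until the outstanding block request is acknowledged" rule.

```python
class AdpcmChannel:
    """Toy model: a key-on is latched and only applied once no
    memory request is outstanding (all names are hypothetical)."""

    def __init__(self):
        self.request_outstanding = False  # block request sent, not yet acknowledged
        self.pending_key_on = None        # start address waiting to be applied
        self.position = None              # None means the channel is idle

    def key_on(self, start_addr):
        # Latch the command; it may have to wait for the bus.
        self.pending_key_on = start_addr

    def acknowledge(self):
        # Called when the memory arbiter has delivered the requested block.
        self.request_outstanding = False

    def tick(self):
        """Run once per sample. Returns True if a new block request was issued."""
        if self.request_outstanding:
            return False  # stall: the previous request is still in flight
        if self.pending_key_on is not None:
            self.position = self.pending_key_on
            self.pending_key_on = None
            self.request_outstanding = True
            return True
        return False
```

In this model, a key-on that arrives while a request is in flight simply takes effect a tick or more later, which is also where the possible misalignment between simultaneously triggered samples would come from.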
I have a similar situation in the player: the main portion of the bandwidth is used by the video driver, so I may have to wait up to 20-25 us for a sample, which is way too long for the driver to work. That's why I implemented a simple cache system in the audio driver. On a hit, it loads the sample from hub; on a miss, it preloads the next 256 bytes in one PSRAM transfer. As it is way too simple, the driver may still have to wait these 30 us, but only on a cache miss, which can then be absorbed by a 512-sample buffer.
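The cache idea above can be sketched in a few lines. This is an illustrative model only, assuming a single 256-byte block that is refilled starting at the missed address; the class name, the `psram` stand-in (a plain bytes object) and the miss counter are made up for the example.

```python
class SampleCache:
    """Single-block read cache: hits come from the local copy,
    a miss fetches the next 256 bytes in one transfer."""

    BLOCK = 256

    def __init__(self, psram):
        self.psram = psram   # stand-in for external PSRAM contents
        self.base = None     # address of the first cached byte
        self.block = b""     # local ("hub") copy of the cached bytes
        self.misses = 0      # number of slow transfers performed

    def read_byte(self, addr):
        hit = self.base is not None and self.base <= addr < self.base + len(self.block)
        if not hit:
            # Miss: one long transfer instead of many short ones.
            self.misses += 1
            self.base = addr
            self.block = bytes(self.psram[addr:addr + self.BLOCK])
        return self.block[addr - self.base]


if __name__ == "__main__":
    memory = bytes(range(256)) * 4
    cache = SampleCache(memory)
    data = [cache.read_byte(a) for a in range(600)]
    print(cache.misses)  # 600 sequential reads cost 3 slow transfers
```

For sequential sample playback the slow path is taken only once every 256 bytes, which is what lets a downstream sample buffer absorb the occasional long wait.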
Another esoteric nop trap. I also had this kind of stuff: any nop anywhere before the particular point made the code crash and I still don't understand why and how.
So, uh, anyways, ADPCM. Kinda works after I hit my head against a stupid typo for a while. I think the levels (FM vs SSG vs ADPCM) are all wrong and the HMG sounds a bit odd and it's only the ADPCM-A channels (ADPCM-B uses a different codec and has a frequency register). IDK will fix tomorrow am tired.
Here's some 2AM smoothbrained Metal Slug for demonstration purposes:
Welp, got the ADPCM-B to kinda work (still freaks out occasionally) and also made the sample end register work (I think). Still so many audio issues. Anyways, I put together another ZIP because why not. Maybe @pik33 wants to have a go at it with his new edge or smth. (Perhaps interesting to know if the timing tweak is consistent across edge units)
Eitherhow, take a look at the horridness that is the ADPCM-B implementation (I'm pretty sure there's still a bug in this (or the register handler)):
opn_adpcm_b_run
              'ret
              loc ptra,#adpb_regbase
              setq #1
              rdlong opn_arg3,ptra[0] ' get repeat flag, pan flags, start/end address
              rdlong opn_tempValue3,#adpcm_pollbox+6*4 wz
              'modcz _clr,_nz wcz ' If outstanding memory rq, ignore commands
    {if_x0}   rdbyte opn_tempValue,ptra[8] wcz ' NZ-> got command. NC-> key-on, C-> dump
              'if_x0 drvh #38
    if_x0     wrbyte #0,ptra[8]
    if_x0     bitl adpcm_active,#7
    if_10     neg adpb_nextblock,#1
    if_00     getword adpb_nextblock,opn_arg3,#1
    if_00     shl adpb_nextblock,#4
              tjf adpb_nextblock,#.norq
              tjnz opn_tempValue3,#.nosample ' Stall if previous request isn't done
              add adpb_nextblock,adpcm_b_base
              debug(ubin_byte(adpcm_active),uhex_long(adpb_nextblock,adpcm_b_base,adpb_ptr,adpb_phase))
              wrlong adpb_nextblock,#adpcm_pollbox+6*4
              testbn adpcm_active,#7 wc ' C if key-on
    if_nc     neg adpb_nextblock,#1
    if_c      bith adpcm_active,#7
    if_c      jmp #.key_on
.norq
              rdword opn_tempValue,ptra[##9]
              'debug(uhex_long(adpb_phase,opn_tempValue))
              add adpb_phase,opn_tempValue
              cmpsub adpb_phase,##$1_0000 wc
    if_nc     jmp #.nosample
              mov adpb_oldsample,adpb_newsample
              '' Get sample and decode
              mov opn_tempValue,adpb_ptr
              and opn_tempValue,#63
              shr opn_tempValue,#1 wc ' odd/even nibble
              add opn_tempValue,##adpcm_buffers+6*32
              rdbyte opn_tempValue,opn_tempValue
    if_nc     shr opn_tempValue,#4
              bitl opn_tempValue,#3 addbits 7 wcz
              altgb opn_tempValue,#adpb_adaptionTable
              getbyte opn_tempValue3
              shl opn_tempValue,#1
              add opn_tempValue,#1
              mul opn_tempValue,adpb_step
              shr opn_tempValue,#3
              sumc adpb_newsample,opn_tempValue
              fges adpb_newsample,##-32768
              fles adpb_newsample,##+32767
              mul adpb_step,opn_tempValue3
              shr adpb_step,#6
              fge adpb_step,#127
              fle adpb_step,##24576
              '' Increment ptr and check end address
              add adpb_ptr,#1
              getword opn_tempValue,opn_arg4,#0
              shl opn_tempValue,#9
              sub opn_tempValue,adpb_ptr wz
              testbn opn_arg3,#4 wc ' Check repeat flag
    if_11     bitl adpcm_active,adpa_which
    if_11     wrbyte #0,ptra[6] ' set end flag
    if_not_01 jmp #.norep_retrigger
.key_on ' If we enter with NC, we repeat. If we enter with C, we key-on...
              wrbyte #1,ptra[6] ' clear end flag
              mov adpb_oldsample,#0
              mov adpb_newsample,#0
              mov adpb_phase,#0 ' Doing this for repeat feels wrong..
              getword adpb_ptr,opn_arg3,#1
              shl adpb_ptr,#9
              mov adpb_step,#127
              mov opn_tempValue,#0 ' for safety
.norep_retrigger
              testb adpcm_active,#7 wc
              ' For repeat, we actually need to check one block early for the load
              cmp opn_tempValue,#32 wz
    if_11     getword adpb_nextblock,opn_arg3,#1
    if_11     shl adpb_nextblock,#4
    if_11     skip #%1111
              test adpb_ptr,#31 wz
    if_11     mov adpb_nextblock,adpb_ptr
    if_11     shr adpb_nextblock,#5
    if_11     add adpb_nextblock,#1
.nosample
              ' Linear interpolation (slightly low quality for speed)
              getword opn_tempValue,adpb_phase,#0
              shr opn_tempValue,#2
              mov opn_tempValue2,adpb_oldsample
              scas adpb_newsample,opn_tempValue
              adds opn_tempValue2,0-0
              scas adpb_oldsample,opn_tempValue
              subs opn_tempValue2,0-0
              rdbyte opn_tempValue,ptra[$B]
              muls opn_tempValue2,opn_tempValue
              sar opn_tempValue2,#8+1
              muls opn_tempValue2,##OPN_VOLUME_MUL
              sar opn_tempValue2,#14
              testb adpcm_active,#7 wz
              testb opn_arg3,#7+8 wc
    if_11     add opn_outl,opn_tempValue2
              testb opn_arg3,#6+8 wc
    if_11     add opn_outr,opn_tempValue2
              ret
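For readers who don't speak PASM, the "Get sample and decode" part of the listing above can be modeled roughly as follows. This is a sketch, not a drop-in reference: the listing does not show the contents of adpb_adaptionTable, so the table below uses the step-adaption values commonly documented for Yamaha's delta-T ADPCM, which is an assumption. The clamp limits, shifts and high-nibble-first order are read off the code.

```python
# Assumed contents of the adaption table (indexed by the 3-bit magnitude).
ADAPTION = (57, 57, 57, 57, 77, 102, 128, 153)


def decode_nibble(sample, step, nibble):
    """Decode one 4-bit code; returns the new (sample, step) pair."""
    magnitude = nibble & 7        # low three bits
    negative = bool(nibble & 8)   # bit 3 is the sign
    delta = ((2 * magnitude + 1) * step) >> 3
    sample = sample - delta if negative else sample + delta
    sample = max(-32768, min(32767, sample))   # fges / fles
    step = (step * ADAPTION[magnitude]) >> 6
    step = max(127, min(24576, step))          # fge / fle
    return sample, step


def decode_bytes(data):
    """Decode a byte string, high nibble first, from the key-on state."""
    sample, step = 0, 127
    out = []
    for byte in data:
        for nibble in (byte >> 4, byte & 15):
            sample, step = decode_nibble(sample, step, nibble)
            out.append(sample)
    return out


if __name__ == "__main__":
    print(decode_bytes(b"\x0f"))
```

The model leaves out the phase accumulator, block requests and interpolation; it only covers turning nibbles into 16-bit samples.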
I compiled this and I have a picture (Neo Yume alpha 03... .... .... file1 fail: 4!) on the monitor: I have to stuff the SD card with this MSLUG thing now. (and find it somewhere in the darknet first)
You also need a NeoGeo AES BIOS file at /sd/neoyume/neogeo/neo-epo.bin. You'd usually find all the BIOS variants together in a neogeo.zip.
Also, side note: For reasons entirely unrelated to the continued nonexistence of a preprocessor in propeller tool, some configuration is duplicated between config and neoyume_lower. (In particular, audio pins)
@JonnyMac said:
I'm not a gamer -- retro or otherwise -- but if someone built a P2 version of the Hydra I would buy one just to play with the code you PASM wizards are writing. Neat stuff.
Like a P2 based DemoBoard? I would love something like that too, especially if it came from Parallax themselves so it becomes a standard of sorts.
Comments
What is the timing requirement between accesses? I remember reading about it in the datasheet but it's like 1:30 AM, so I can't be trusted to read that now.
The boundary check/split code is 1:1 copied from megayume. Interestingly, I at some point realized that the page size was set too large, but fixing it changed literally nothing, so uhhhh. Code looks like this (is set as ATN interrupt. Yes, the graphics reads are appropriately shielded from being interrupted, I hope)
"Sync bus + Async clock" setup has not crashed so far (2 continues worth of letting Metal Slug time out), so maybe that really is the ticket.
I just read it from an APmemory data sheet and it lists 18ns min CS HIGH time between accesses. But I'm not sure if this is the right datasheet for the P2 Edge.
Fingers crossed.
Final continue. If it survives a full game's worth of "Marco Rossi wins by doing nothing", it'll go on to the overnight attract mode test. It survived that before I started with the audio, so that's the benchmark.
2 AM mind says that that's ~12 cycles. I think the case of an interrupt hitting right before the non-interrupt code is about to enter the protected section might only do 10 cycles? Eh we're overclocking, anyways.
Well at 330MHz or so each P2 clock is ~3ns, so 18ns is 6 clock cycles, or 3 P2 instructions. You should be good as long as the next transfer delays by this number of instructions between CS going high and CS going low.
Update: Is it still running okay with the updated timing?
Oh owie, no idea how that doubled in my mind. Yeah there's certainly 6 cycles in any case.
And yep, still going. I think it works. Oh man, P_SYNC_IO, gotta be my new favorite semi-undocumented magic flag.
Just let it run while you hopefully get some sleep.
If it's a P2-EC32MB Rev.A, its BOM states APS6404L-3SQR-ZR (ZR-code stands for USON-8L pkg).
APMEMORY website lists "APS6404L-3SQR QSPI PSRAM" - "APM SPI 3V PSRAM Datasheet.pdf - Rev. 2.4 Oct 08, 2021" as the current datasheet for that part:
https://apmemory.com/wp-content/uploads/APM_PSRAM_E3_QSPI-APS6404L-3SQR-v2.4-KGD_PKG.pdf
Figure 19 shows tCPH as only defined for "input", so that limit seems to be valid just for Writes.
Item 8.6 "Command Termination" (page 10) gives more detail on how reads and writes are expected to come to an end, showing tCHD as the limiting factor, and also carries the following warning: "Not doing so will block internal refresh operations and cause memory failure."
Perhaps I'm wrong, but, as those timing diagrams suggest, it's worth noting that those PSRAMs seem to differ quite a bit from Hypers, in the sense that CLK appears to be held "LOW" while ~CE is "HIGH"; IIRC, Hypers explicitly allow continuous clocking while they're unselected.
Hope it helps...
Henrique
Addit: Convoluted as it appears to be, Table 14.5 (Page 22, DC Characteristics) shows the way Standby Current is measured ("3. Standby current is measured when CLK is in DC low state.").
Again, I could be wrong, but, patents being what they really are (... subdued narratives..., never completely descriptive...), Standby mode needs to be explicitly guaranteed between valid operations, or sh...t will happen...
Yeah that's the same one then. 18ns CE# high time is required between transfers.
Don't think there are any problems comparing/reusing there. All Hyper transactions are two edges, 16-bit, per cycle. The diagrams show these as a low-to-high-to-low sequence. And I've not seen any continuous clocking for any of these memories. The only code where I've seen continuous clocking used is for the Ethernet chip that ManAtWork did an impressive driver for, and that relies on a preambled data start sequence.
Well, survived the night. That solves that then.
Awesome. Glad we worked it out.
ewww, Roger, I'm just having a nosey at your driver doc, MemoryDriverDocumentation_v09b.pdf, and trying to work out how to use it and how Ada has used it ... not matching up very well so far. The first function Ada's code calls is "getDriverAddr()" but that isn't even a documented function!
Just found an interesting pickle: Apparently the YM2149 SSG (and thus presumably the SSG subcomponent of Yamaha FM chips?) has a 32-step envelope generator while the original AY-3-8910 only has 16 (matching the fixed volume levels). Did @Ahle2 ever realize this? Well, I think the tweak isn't going to be too difficult, but still, how odd.
Well, got SSG going (and removed the YM2612 bitcrush/distortion). Haven't messed with the envelope steps yet. Anyways, it can do this now.
Slight problem encountered: There may not actually be enough cog space for YM2608 Rhythm channels (let alone YM2610 ADPCM). FM section and its envelopes are too dang complex.
I guess some space could be saved by un-unrolling (rerolling?) the SSG code. That would violate all my principles that, as aforementioned, I discovered by transcending the boundary between time and space, but if it has to be done...
It has to be done tomorrow because holy shat where did the hours go?
Where's the fun if it is easy to get it to fit first time around.
Well, it didn't fit after all and I had to move the rhythm generator to hubexec... (The YM2610 ADPCM needs to be hubexec, anyways, it's a decent bit more complex and then there's the ADPCM-B channel to worry about (which YM2608 technically also has, but at least on PC-98 soundcards, the bus for it wasn't hooked up to anything (and in later YM2608 compatible chips, they just removed it), so I'd have to dig to even find a soundtrack that uses it)
So anyways, here's the mega-unpolished YM2608 (so unpolished I didn't even bother to change the file header) for personal enjoyment. Excuse the poor tune selection, too lazy to hunt for anything that doesn't immediately come to mind. Not that my example tune collections are ever of particularly notable quality.
Thoughts regarding ADPCM: There's a bit of a pickle when it comes to handling key-on: If the channel is already active, it will have already requested the next data block, so a key-on command can really only be handled when that request is acknowledged (which may take a sample or two (for ADPCM-A), since ADPCM uses the bandwidth left over from the sprite reads at the end of each scanline). That's gonna be a somewhat complicated state machine I reckon. (Also there may be weird phasing issues if there are samples that are supposed to be triggered simultaneously but end up out of alignment due to high bus usage. Not sure if that ever happens (two samples being triggered at once even. That wouldn't happen with a usual track-per-channel type music driver (I guess the ability to trigger more than one sample at a time is simply a holdover from the YM2608 rhythm generator))).
I have a similar situation in the player: the main portion of the bandwidth is used by the video driver, so I may have to wait up to 20-25 us for a sample, which is way too long for the driver to work. That's why I implemented a simple cache system in the audio driver. On a hit, it loads the sample from hub; on a miss, it preloads the next 256 bytes in one PSRAM transfer. As it is way too simple, the driver may still have to wait these 30 us, but only on a cache miss, which can then be absorbed by a 512-sample buffer.
And once again we're having fun:
Putting a NOP anywhere above this ALTR sequence seems to break the Z80 entirely. Below doesn't matter. I'm having a headache already....
Ah yes, the wonders of having no proper namespacing.
Another esoteric nop trap. I also had this kind of stuff: any nop anywhere before the particular point made the code crash and I still don't understand why and how.
So, uh, anyways, ADPCM. Kinda works after I hit my head against a stupid typo for a while. I think the levels (FM vs SSG vs ADPCM) are all wrong and the HMG sounds a bit odd and it's only the ADPCM-A channels (ADPCM-B uses a different codec and has a frequency register). IDK will fix tomorrow am tired.
Here's some 2AM smoothbrained Metal Slug for demonstration purposes:
Welp, got the ADPCM-B to kinda work (still freaks out occasionally) and also made the sample end register work (I think). Still so many audio issues. Anyways, I put together another ZIP because why not. Maybe @pik33 wants to have a go at it with his new edge or smth. (Perhaps interesting to know if the timing tweak is consistent across edge units)
Eitherhow, take a look at the horridness that is the ADPCM-B implementation (I'm pretty sure there's still a bug in this (or the register handler)):
I compiled this and I have a picture (Neo Yume alpha 03... .... .... file1 fail: 4!) on the monitor: I have to stuff the SD card with this MSLUG thing now. (and find it somewhere in the darknet first)
You also need a NeoGeo AES BIOS file at /sd/neoyume/neogeo/neo-epo.bin. You'd usually find all the BIOS variants together in a neogeo.zip.
Also, side note: For reasons entirely unrelated to the continued nonexistence of a preprocessor in propeller tool, some configuration is duplicated between config and neoyume_lower. (In particular, audio pins)
Yes, I now have the error, lack of neo-epo.bin
Like a P2 based DemoBoard? I would love something like that too, especially if it came from Parallax themselves so it becomes a standard of sorts.