Turns out HyperRAM support was actually semi-broken in MegaYume for a while now. Could load games, but anything relying on the loader reading back didn't work (i.e. SRAM support and 6-button issue mitigation). Ooop. I think I actually remember being like "I'll check that this works later" and then never did.
@Wuerfel_21 said:
Just re-did the YPbPr coefficients... colors should be a bit better now. There was some degree of red/blue confusion.
Also, interesting tidbit: hotplugging headphones seems to crash NeoYume now.
I've had this happen also when plugging the headphone jack cable on the A/V breakout into some powered PC speakers, resetting the P2-EVAL. I'm thinking it probably uses one of those nasty capacitively coupled cheapo power adapters instead of galvanic isolation. You know the sort that you feel the buzz when you touch the DC output. The grounding sleeve on TRRS must do something nasty with the P2 power as it makes connection. Spiky spike spike.
@Wuerfel_21 said:
So hmm, the HyperRAM board is actually 16MB?
Where did I get the idea that it's 8MB from then?
Anyways, might be interesting to add HyperRAM support to the neoyume memory arbiter. Most likely by just completely branching off a separate copy to avoid convoluting the code further.
Yeah it can do 16MB but there are two 8MB dies in the single device - burst accesses wrap inside each die - probably okay with your code. If you are rich you could even have two HyperRAM boards on a P2-EVAL to get you your 32MB, assuming you can setup the different CS and CLK pins per bank appropriately and use some other data bus pins.
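For anyone wanting to see what the two-die layout means for addressing, here is a minimal Python sketch. It only assumes what the post above says (one 16MB device made of two 8MB dies, with bursts staying inside a die); the function names are made up for illustration and are not from any driver.

```python
DIE_SIZE = 8 * 1024 * 1024  # 8 MB per die, two dies per 16 MB device (from the post above)

def die_and_offset(addr: int) -> tuple[int, int]:
    """Return (die index, byte offset inside that die) for a linear address."""
    return divmod(addr, DIE_SIZE)

def crosses_die(addr: int, length: int) -> bool:
    """True if a burst of `length` bytes starting at `addr` would span both dies."""
    return addr // DIE_SIZE != (addr + length - 1) // DIE_SIZE

print(die_and_offset(0x800000))   # first byte of the second die
print(crosses_die(0x7FFFF0, 32))  # burst running over the die boundary
```

A burst that reports `True` here would have to be split into two accesses, one per die.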
@Wuerfel_21 said:
Just re-did the YPbPr coefficients... colors should be a bit better now. There was some degree of red/blue confusion.
Also, interesting tidbit: hotplugging headphones seems to crash NeoYume now.
I've had this happen also when plugging the headphone jack cable on the A/V breakout into some powered PC speakers, resetting the P2-EVAL. I'm thinking it probably uses one of those nasty capacitively coupled cheapo power adapters instead of galvanic isolation. You know the sort that you feel the buzz when you touch the DC output. The grounding sleeve on TRRS must do something nasty with the P2 power as it makes connection. Spiky spike spike.
In my case it actually doesn't reset the P2, it just disturbs it and the memory expansion enough to crash the 68000.
And yep, imagine like 8 ungrounded power supplies having their grounds tied together and you get what comes out of my TV. (TV, DVD player (that can only play CDs anymore), VCR (that smells of burn when powered), Tape deck, Surround receiver, Subwoofer, Wii, SNES)
But not sure why that'd affect passive headphones...
It's also not the jostling. The PSRAM board is positively jostle-proof. ZX81 this ain't~
@rogloh said:
Yeah it can do 16MB but there are two 8MB dies in the single device - burst accesses wrap inside each die - probably okay with your code. If you are rich you could even have two HyperRAM boards on a P2-EVAL to get you your 32MB, assuming you can setup the different CS and CLK pins per bank appropriately and use some other data bus pins.
I'm not rich, I just harness the power of freeloading. Or maybe I am. It's a swabian's well-kept secret.
Eitherhow, I don't know what to blame my strange ability to part people of their computer hardware on. maybe it's those "feminine charms" they(tm) talk about, but idk i haven't been sexually harassed in the process yet, so that's probably not it.
Though dual HyperRAM boards, I possess not. It'd be far more reasonable to construct a single 32MB RAM one (or 64MB flash ig), which the footprint allows for. Would need an unpopulated board and BGA soldering skills that I don't possess either.
@Wuerfel_21 said:
Though dual HyperRAM boards, I possess not. It'd be far more reasonable to construct a single 32MB RAM one (or 64MB flash ig), which the footprint allows for. Would need an unpopulated board and BGA soldering skills that I don't possess either.
Yeah BGA is a problem for home brew work. I've not seen anything larger with HyperRAM than 16MB at 3.3V
This part from Winbond would be nice (esp at $4 for 32MB) if only it was 3.3V capable:
I found this octal memory from ISSI that can do 32MB and runs at 3.3V up to 200MHz DDR. I wonder if it could handle 3 chips in parallel, if so this could make a nice 96MB board for NeoYume and P2 EVAL...basically it looks a lot like HyperRAM. It could have excellent performance at sysclk/1 and the DQSM mask avoids all RMW requirements.
DQSM has to be a single Prop2 pin, same as each data pin. For tx at sysclock/1, its output skew on the Prop2 is just as critical as the data pins ... and trying to achieve equal track lengths would be good too. Then each signal equally has three loads.
That's an impressive part you've found Roger. Not only is it 3.0 Volts, it can do 200 MHz SCLK at 3.0 Volts! Better than an overclocked Prop2 can do even.
There's no mention of dual dies in that datasheet. Maybe it's a larger die being used, but given the improved speed rating it could also be a higher density fabrication node.
@evanh said:
That's an impressive part you've found Roger. Not only is it 3.0 Volts, it can do 200 MHz SCLK at 3.0 Volts! Better than an overclocked Prop2 can do even.
Yeah it looks decent and might be worth a try on a P2. If 2 or 3 chips could be paralleled it could make a fast+large enough memory board for P2 for applications like NeoYume etc. Even a single device would be handy with 32MB. If BGA is easy enough to do on my hotplate maybe I'll try to layout a test board sometime....might need a stencil though.
Interesting that the 3V part appears to default to 133MHz, whereas 1.8V to 200MHz. I wonder why they did that. Perhaps the timing diagrams need careful checking before assuming they'll work on par at both voltages. Some stock would also be nice
-I've been thinking it would be useful to bump the HyperRam accessory capacity at some point, and improve the clock routing at the same time. Something like that part could be just the ticket. Memory tech is slowly catching up with demands
Got NeoYume working with HyperRAM. Surprisingly not-painfully. Currently no multi-bank support, so it's limited to whatever fits in 16 MB, which is (yes i've been expanding the GameDB recently):
Art of Fighting
Blue's Journey
Captain Tomaday
Crossed Swords
Cyber-Lip
Fatal Fury
Karnov's Revenge
King of the Monsters
Last Resort
League Bowling
Mahjong Kyo Retsuden
Magical Drop 2
Magician Lord
Mutation Nation
NAM-1975
Neo Drift Out (buggy!)
Ninja Combat
Puzzle de Pon (R)
Sengoku
Sengoku 2
Sonic Wings 2
Street Hoop
Spinmaster (black screen!)
Thrash Rally (minor gfx issue)
Viewpoint
Windjammers
Zed Blade
That's actually quite a lot, quantitatively. Not necessarily qualitatively.
And I thought the HyperRAM'd be useless for this. I mean it is, it doesn't run Metal Slug lmao.
It also doesn't run any of the usual performance-sensitive ones, so IDK how well it's running. Should be quite a bit worse than 8 bit PSRAM at sysclk/2 due to high latency. Not sure how it compares to sysclk/3.
@VonSzarvas said:
Interesting that the 3V part appears to default to 133MHz, whereas 1.8V to 200MHz. I wonder why they did that. Perhaps the timing diagrams need careful checking before assuming they'll work on par at both voltages. Some stock would also be nice
The current part is only rated for 100 MHz at 3V. It sure doesn't mind being overclocked to 169 MHz.
@evanh said:
When did you use a ZX81, Ada? Or is that just its reputation you're referring to? Solder tinned edge connector is certifiably horribly bad for sure.
Yeah, just the reputation. Though I have seen a ZX81 with memory expander in person, though they didn't power it on.
@Wuerfel_21 said:
And I thought the HyperRAM'd be useless for this. I mean it is, it doesn't run Metal Slug lmao.
That's why I thought the extra 32MB of HyperFLASH could be so interesting...for single game setups anyway (or requiring re-flashing to change games, ok if you stick to the same game for quite a while).
It also doesn't run any of the usual performance-sensitive ones, so IDK how well it's running. Should be quite a bit worse than 8 bit PSRAM at sysclk/2 due to high latency. Not sure how it compares to sysclk/3.
Yeah latency will be higher for HyperRAM at sysclk/2, and the HyperFLASH is probably even more latency too.
I think sysclk/1 operation is out of the question at over 330MHz for the HyperRAM on this breakout but it might potentially work for HyperFLASH which would help with latency. The HyperFLASH core is rated to 166MHz DDR so it's hardly even overclocked with NeoYume's operating frequency. But this is with 1.8V IO, for 3.0V it is meant to be less, that's where the unknown lies.
yea. Idk if I want to do flash support, would need some big loader rework to do properly.
I kinda want it to be done now.
Maybe I'll look into NEO-PVC support (Metal Slug 5, KOF 2003 and SNK vs. Capcom need this special chip).
Really need to make that decryption tool - I think I'll just make it a ruby script that you drop into your NEOYUME directory and it will generate all decrypted files for you.
Then I'll have one last round of looking into the known bugs and then I'll mark both emulators V1.0. Then I'll grab a vanilla coke and do something else (unless something interesting comes up - new USB driver, new RAM boards, etc). Probably in time for next Live Forum, but that's an arbitrary deadline.
EDIT: Just checked mslug5 in mame debugger: PVC RAM doesn't seem to be used - MAME initializes it to funny values on boot (bug?) but if I clear it the only location that actually does something is the bankswitch register. Wonder if I can even get away with implementing it write-only (which would solve the main headaches)
@Wuerfel_21 said:
yea. Idk if I want to do flash support, would need some big loader rework to do properly.
That's fair enough, you won't want to waste the time unless it is known to work well. Once I grab your HyperRAM release I could possibly give it a try. I do have some flash write support in my main memory driver that I'm currently porting over to the hyperdrv.spin file as part of my driver refactoring. So if the applicable parts of this write code (register polling sequences etc) were put into neoyume_upper (assuming there is a little bit of space left for them) I could just make the flashing step part of the exmem_write code to get the chip programmed and then run, beginning from an erased state the first time. It could at least prove whether the concept works or not.
After it's flashed the first time, I could just comment out the exmem_write functionality in the code.
I think I may just remove the mslugx-specific patching mechanism, the only thing that uses small writes and reads (at all).
That should make flash support easier.
I haven't found any other use for it and I think I can just handle that in the decryption tool when it is done.
By the same token I might remove the runtime M1 decryption (since every game with M1 encryption also has graphics encryption, which is the real hard one)
Sounds good, yeah I was looking at that patching code before and wondering what might need it. I don't think you needed that read back stuff in earlier releases (I thought it was all write only).
Tried hacking in some terrible NEO-PVC support for mslug5, no dice. Either I'm getting something wrong, the game writes the bank value as individual words (why?) or it actually does need more than just a write-only bank register.
By the way. I think you read either 2 or 4 longs for graphics and audio but how many bytes/longs do you read for code? If these blocks are aligned on 32-byte (16-word) boundaries (or you don't cross them when you start part way into a page) the flash will be happy to burst read at full speed. It doesn't like crossing its (first) page boundary unless it starts on an address which is a multiple of 32 bytes (16 words). Only then will the data stream always remain continuous.
X's are bad cases. Numbers in the table are the word addresses accessed.
It reads some multiple of 4 bytes from an arbitrary 4-byte aligned address. So far the interface.
In practice it reads:
- large aligned blocks to init S1/M1 on startup
- 4..8 bytes from arbitrary address for 68k data read
- 32..36 bytes from arbitrary address for 68k program read
Only the last one is an issue. There is already a mechanism for block crossing avoidance, guess you can turn that down to 32 byte blocks. Other option is to set MK_ROMQUE_SIZE to 7, which should always generate an access short enough. Not sure which of these has better performance, but my bet is on the block splitter (since romque size also dictates the size of loops that can be executed without memory stalls)
EDIT: Oh, other, perhaps more performant option (assuming there are no block bounds beyond this) is to rewrite the entire 68k access path to insert an appropriate dummy streamer command to match the flash logic.
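The block-splitter idea can be sketched roughly like this in Python. This is not the NeoYume code, just an illustration under the assumption that the block size is a power of two (32 bytes, as suggested above); `split_at_boundaries` is a made-up name. It splits at every boundary, which is more conservative than the flash strictly needs if only the first page crossing is a problem.

```python
def split_at_boundaries(addr: int, length: int, block: int = 32) -> list[tuple[int, int]]:
    """Split a read so that no piece crosses a `block`-byte boundary.

    Returns a list of (start address, byte count) pieces covering the request.
    """
    if block <= 0 or block & (block - 1):
        raise ValueError("block must be a power of two")
    pieces = []
    while length > 0:
        room = block - (addr & (block - 1))  # bytes left before the next boundary
        chunk = min(room, length)
        pieces.append((addr, chunk))
        addr += chunk
        length -= chunk
    return pieces

# A 36-byte 68k program fetch starting 4 bytes before a 32-byte boundary
print(split_at_boundaries(0x1C, 36))
# An aligned 32-byte fetch stays in one piece
print(split_at_boundaries(0x20, 32))
```

The 4..8 byte data reads only get split when they happen to straddle a boundary, so the cost should land mostly on the 32..36 byte program reads.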
I'm not rich, I just harness the power of freeloading. Or maybe I am. It's a swabian's well-kept secret.
Eitherhow, I don't know what to blame my strange ability to part people of their computer hardware on.
It's probably because people are really impressed by what you have done with the P2. I know I am.
@evanh said:
That's an impressive part you've found Roger. Not only is it 3.0 Volts, it can do 200 MHz SCLK at 3.0 Volts! Better than an overclocked Prop2 can do even.
Is that because of DDR? So at 200MHz this part will peak at 400MB/s?
Turns out:
- yes, I forgot to actually enable the PVC bankswitch codepath at first
- yes, mslug5 writes individual words to the bank register
- yes, that's the only PVC feature the game uses (aside from the ROM scrambling ig)
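To illustrate what "writes individual words to a write-only bank register" can look like on the emulator side, here is a rough Python sketch. The register layout is entirely hypothetical (a plain 32-bit value with the high word first, as the big-endian 68000 would store a long); the real NEO-PVC mapping is not described in this thread and may well differ.

```python
class WriteOnlyBankRegister:
    """Hypothetical 32-bit bank register that is updated one 16-bit word at a time."""

    def __init__(self) -> None:
        self.value = 0

    def write_word(self, word_index: int, data: int) -> int:
        """Latch one 16-bit half; word 0 is the high half (68000 big-endian order)."""
        if word_index not in (0, 1):
            raise ValueError("word_index must be 0 or 1")
        shift = 16 * (1 - word_index)
        mask = 0xFFFF << shift
        self.value = (self.value & ~mask & 0xFFFFFFFF) | ((data & 0xFFFF) << shift)
        return self.value

reg = WriteOnlyBankRegister()
reg.write_word(0, 0x0012)            # high word arrives first
print(hex(reg.write_word(1, 0x3400)))  # low word completes the value
```

The point is only that each word write has to update its own half and keep the other, rather than assuming the whole value arrives in one long write.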
Well done for a working MS5. So it's just MS3 left with sound issues now from that entire series of Metal Slug games?
@Wuerfel_21 said:
It reads some multiple of 4 bytes from an arbitrary 4-byte aligned address. So far the interface.
In practice it reads:
- large aligned blocks to init S1/M1 on startup
- 4..8 bytes from arbitrary address for 68k data read
- 32..36 bytes from arbitrary address for 68k program read
Only the last one is an issue. There is already a mechanism for block crossing avoidance, guess you can turn that down to 32 byte blocks
Yes I thought the same this afternoon, when I read through more of your code. Could just leverage that existing splitting code with a new block size parameter.
EDIT: Oh, other, perhaps more performant option (assuming there are no block bounds beyond this) is to rewrite the entire 68k access path to insert an appropriate dummy streamer command to match the flash logic.
It varies with address offset which means more code to figure out the number of clocks and dummy streamer delays to put in but I guess it could be done as well. I'd start with the block crossing change first though.
@hinv said:
@evanh said:
That's an impressive part you've found Roger. Not only is it 3.0 Volts, it can do 200 MHz SCLK at 3.0 Volts! Better than an overclocked Prop2 can do even.
Is that because of DDR? So at 200MHz this part will peak at 400MB/s?
Yes 400MB/s, in theory, but only if the P2 and IO could clock that fast which it won't. Potentially it hits something about 350MHz or so if we are lucky, or at least what Ada needs for NeoYume.
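The 400MB/s figure is just the DDR arithmetic for an 8-bit bus, which a few lines of Python make explicit. This is a back-of-envelope sketch for the theoretical peak only; it ignores command/latency overhead and whatever the P2 can actually clock.

```python
def peak_mb_per_s(clock_mhz: float, bus_bits: int = 8, ddr: bool = True) -> float:
    """Theoretical peak transfer rate in MB/s for a given bus clock."""
    transfers_per_clock = 2 if ddr else 1  # DDR moves data on both clock edges
    return clock_mhz * transfers_per_clock * bus_bits / 8

print(peak_mb_per_s(200))             # 8-bit DDR at 200 MHz
print(peak_mb_per_s(200, ddr=False))  # same clock, single data rate
```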
Comments
This part from Winbond would be nice (esp at $4 for 32MB) if only it was 3.3V capable:
https://static6.arrow.com/aropdfconversion/462dda31aac4b45d60d30cc2f4866ae14de63864/w958d8nbya5i.pdf
I found this octal memory from ISSI that can do 32MB and runs at 3.3V up to 200MHz DDR. I wonder if it could handle 3 chips in parallel, if so this could make a nice 96MB board for NeoYume and P2 EVAL...basically it looks a lot like HyperRAM. It could have excellent performance at sysclk/1 and the DQSM mask avoids all RMW requirements.
https://au.mouser.com/datasheet/2/198/66_67WVO32M8DALL_BLL-2932836.pdf
Sample wiring for P2 double wide breakout:
Lower 8 pins:
Upper 8 pins:
DQSM has to be a single Prop2 pin, same as each data pin. For tx at sysclock/1, its output skew on the Prop2 is just as critical as the data pins ... and trying to achieve equal track lengths would be good too. Then each signal equally has three loads.
Yeah that's true, it would be critical as a mask. Thankfully that pin is tri-stated so it could be shared over multiple devices.
Ha, Mouser has half them listed as SRAM (incorrect), eg: https://nz.mouser.com/ProductDetail/ISSI/IS66WVO32M8DBLL-166BLI?qs=sGAEpiMZZMutXGli8Ay4kDJdwYep/v4xl%2Bau9pZlm6Y=
And the others correctly listed as DRAM, eg: https://nz.mouser.com/ProductDetail/ISSI/IS66WVO32M8DALL-200BLI-TR?qs=sGAEpiMZZMutXGli8Ay4kEIUxYYDpBry4MgonyKnat8=
The current part is only rated for 100 MHz at 3V. It sure doesn't mind being overclocked to 169 MHz.
hmm. Good to know. Thanks.
Tried hacking in some terrible NEO-PVC support for mslug5, no dice. Either I'm getting something wrong, the game writes the bank value as individual words (why?) or it actually does need more than just a write-only bank register.
Well, tomorrow...
Turns out:
- yes, I forgot to actually enable the PVC bankswitch codepath at first
- yes, mslug5 writes individual words to the bank register
- yes, that's the only PVC feature the game uses (aside from the ROM scrambling ig)
Thus behold: