Even if this direct execution from OctalFlash/HyperFlash thing doesn't work out for latency reasons it still might be useful for game storage, assuming you don't want lots of games on a machine. E.g. you can get 2Gbit flash chips now and two of them on a breakout board would have room for all 6 MetalSlug games for example. Then you could fast copy them into PSRAM/Octal/HyperRAM etc at ~170MB/s so games could load in less than a second vs the SD card loading (takes ~24s). It could make for a responsive system, albeit with fewer games.
@rogloh said:
Well done for a working MS5. So it's just MS3 left with sound issues now from that entire series of Metal Slug games?
Yeah, mslug3 is kinda an unsolvable issue since it actually uses more than 128K of Z80 code. It may in fact be the only game that does (some of the later games have larger ROMs, too, but if you look at them they just use the first couple K. MS5's ROM is 512K, but actual data (post-decryption) stops short of 32K, which makes sense since it uses streaming music).
Pity, since a lot of people seem to like MS3 the best. Interesting that it works at all though with some of the ROM missing.
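For anyone wanting to check how much of a Z80 (M1) ROM image is actually occupied, a minimal Ruby sketch might look like the following. It assumes unused space is padded with $00 or $FF, which is a guess and not something stated in this thread, and `used_length` / `fits_z80_limit?` are made-up helper names, not part of any NeoYume tool.

```ruby
# Hypothetical helpers for eyeballing Z80 ROM usage.
# Assumption: unused ROM space is filled with 0x00 or 0xFF padding.
Z80_LIMIT = 128 * 1024 # the 128K figure mentioned above

# Returns the number of bytes up to and including the last non-padding byte.
def used_length(bytes, padding: [0x00, 0xFF])
  last = bytes.rindex { |b| !padding.include?(b) }
  last ? last + 1 : 0
end

# True if the occupied part of the image fits in the 128K limit.
def fits_z80_limit?(bytes)
  used_length(bytes) <= Z80_LIMIT
end

# Usage on a real file (file name is only an example):
#   data = File.binread("268-m1d.bin").bytes
#   puts "#{used_length(data)} of #{data.size} bytes used"
```

Trailing padding is only a heuristic: a ROM whose real data happens to end in $00 or $FF bytes would be reported as slightly shorter than it is.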
@rogloh said:
Even if this direct execution from OctalFlash/HyperFlash thing doesn't work out for latency reasons it still might be useful for game storage, assuming you don't want lots of games on a machine. E.g. you can get 2Gbit flash chips now and two of them on a breakout board would have room for all 6 MetalSlug games for example. Then you could fast copy them into PSRAM/Octal/HyperRAM etc at ~170MB/s so games could load in less than a second vs the SD card loading (takes ~24s). It could make for a responsive system, albeit with fewer games.
I guess if you were building some sort of dedicated thing it'd be neat. Maybe you'd have a frontend that loads ROMs into RAM and then calls the appropriate emulator (MegaYume/NeoYume/whatever)?
Though direct XIP could allow emulating a real multislot MVS system. Would need some patching of the BIOS so it can run without NVRAM/RTC. And I guess also patching SLOTCHECK code out of the games (which relies on NVRAM and watchdog...). EDIT: Or I guess use a real slow external actual NVRAM... And implement actual watchdog.
Managed to implement the NEO-CMC algorithm for the decryption tool... No more messing with WinKawaks lmao
Just look at this insanity. I think my version here is very slightly more readable than MAME's (where I copped the logic from).
I'm fairly sure all this scrambling is just to stop someone from desoldering the ROM chips and cloning them individually. The decrypted data obviously must be present on the cart edge. I guess generating appropriate signalling is mildly difficult.
Nice thing is that the script can also take care of a lot of other annoyances, such as renaming files (e.g. removing "proto_" prefixes that break 8.3 naming convention, weird file extensions, etc.), deleting extraneous BIOS ROMs (as are often included in downloads), etc.
What all that might look like for MS5 (which has every encryption possible), including the --cleanup flag, which deletes any known-unnecessary files (encrypted originals and aforementioned BIOS nonsense):
$ time ruby ./decrypt.rb --cleanup
NeoYume ROMset fixer!
basedir is ./NEOYUME
Deleting ./NEOYUME/mslug5h/000-lo.lo
Deleting ./NEOYUME/mslug5h/asia-s3.rom
Deleting ./NEOYUME/mslug5h/japan-j3.bin
Deleting ./NEOYUME/mslug5h/sfix.sfix
Deleting ./NEOYUME/mslug5h/sm1.sm1
Deleting ./NEOYUME/mslug5h/sp-1v1_3db8c.bin
Deleting ./NEOYUME/mslug5h/sp-45.sp1
Deleting ./NEOYUME/mslug5h/sp-e.sp1
Deleting ./NEOYUME/mslug5h/sp-j2.sp1
Deleting ./NEOYUME/mslug5h/sp-s.sp1
Deleting ./NEOYUME/mslug5h/sp-s2.sp1
Deleting ./NEOYUME/mslug5h/sp-u2.sp1
Deleting ./NEOYUME/mslug5h/sp1.jipan.1024
Deleting ./NEOYUME/mslug5h/uni-bios_1_0.rom
Deleting ./NEOYUME/mslug5h/uni-bios_1_1.rom
Deleting ./NEOYUME/mslug5h/uni-bios_1_2.rom
Deleting ./NEOYUME/mslug5h/uni-bios_1_2o.rom
Deleting ./NEOYUME/mslug5h/uni-bios_1_3.rom
Deleting ./NEOYUME/mslug5h/uni-bios_2_0.rom
Deleting ./NEOYUME/mslug5h/uni-bios_2_1.rom
Deleting ./NEOYUME/mslug5h/uni-bios_2_2.rom
Deleting ./NEOYUME/mslug5h/uni-bios_2_3.rom
Deleting ./NEOYUME/mslug5h/uni-bios_2_3o.rom
Deleting ./NEOYUME/mslug5h/uni-bios_3_0.rom
Deleting ./NEOYUME/mslug5h/uni-bios_3_1.rom
Deleting ./NEOYUME/mslug5h/uni-bios_3_2.rom
Deleting ./NEOYUME/mslug5h/v2.bin
Deleting ./NEOYUME/mslug5h/vs-bios.rom
Renaming ./NEOYUME/mslug5h/268-c1c.c1 to ./NEOYUME/mslug5h/268-c1c.bin
Renaming ./NEOYUME/mslug5h/268-c2c.c2 to ./NEOYUME/mslug5h/268-c2c.bin
Renaming ./NEOYUME/mslug5h/268-c3c.c3 to ./NEOYUME/mslug5h/268-c3c.bin
Renaming ./NEOYUME/mslug5h/268-c4c.c4 to ./NEOYUME/mslug5h/268-c4c.bin
Renaming ./NEOYUME/mslug5h/268-c5c.c5 to ./NEOYUME/mslug5h/268-c5c.bin
Renaming ./NEOYUME/mslug5h/268-c6c.c6 to ./NEOYUME/mslug5h/268-c6c.bin
Renaming ./NEOYUME/mslug5h/268-c7c.c7 to ./NEOYUME/mslug5h/268-c7c.bin
Renaming ./NEOYUME/mslug5h/268-c8c.c8 to ./NEOYUME/mslug5h/268-c8c.bin
Renaming ./NEOYUME/mslug5h/268-m1.m1 to ./NEOYUME/mslug5h/268-m1.bin
Renaming ./NEOYUME/mslug5h/268-p1c.p1 to ./NEOYUME/mslug5h/268-p1c.bin
Renaming ./NEOYUME/mslug5h/268-p2c.p2 to ./NEOYUME/mslug5h/268-p2c.bin
Renaming ./NEOYUME/mslug5h/268-v1c.v1 to ./NEOYUME/mslug5h/268-v1c.bin
Renaming ./NEOYUME/mslug5h/268-v2c.v2 to ./NEOYUME/mslug5h/268-v2c.bin
Decrypting ./NEOYUME/mslug5h/268-m1.bin into ./NEOYUME/mslug5h/268-m1d.bin (CMC50 M1)
Computed key: EABC
Deleting ./NEOYUME/mslug5h/268-m1.bin
Decrypting 268-v1c,268-v2c into ./NEOYUME/mslug5h/268-vd.bin (PCM2 type 2)
Deleting ./NEOYUME/mslug5h/268-v1c.bin
Deleting ./NEOYUME/mslug5h/268-v2c.bin
Decrypting ./NEOYUME/mslug5h/268-p1c.bin and ./NEOYUME/mslug5h/268-p2c.bin into ./NEOYUME/mslug5h/268-pd.bin (NEO-PVC mslug5)
Deleting ./NEOYUME/mslug5h/268-p1c.bin
Deleting ./NEOYUME/mslug5h/268-p2c.bin
Decrypting CMC data in ./NEOYUME/mslug5h
Deleting ./NEOYUME/mslug5h/268-c1c.bin
Deleting ./NEOYUME/mslug5h/268-c2c.bin
Deleting ./NEOYUME/mslug5h/268-c3c.bin
Deleting ./NEOYUME/mslug5h/268-c4c.bin
Deleting ./NEOYUME/mslug5h/268-c5c.bin
Deleting ./NEOYUME/mslug5h/268-c6c.bin
Deleting ./NEOYUME/mslug5h/268-c7c.bin
Deleting ./NEOYUME/mslug5h/268-c8c.bin
real 0m43,964s
user 0m0,000s
sys 0m0,000s
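The renaming step visible in the log above (odd extensions such as .c1 or .m1 becoming .bin, and "proto_" prefixes being dropped) could be sketched roughly as below. This is not the actual code from decrypt.rb; `tidy_rom_name` and `fits_8_3?` are hypothetical names, and the rule "every kept ROM ends in .bin" is only inferred from the log.

```ruby
# Hypothetical sketch of the rename rules, inferred from the log output.
# Assumption: every ROM file that is kept gets a ".bin" extension.

# Strip a leading "proto_" and force the extension to ".bin".
def tidy_rom_name(name)
  base = File.basename(name, ".*").sub(/\Aproto_/, "")
  "#{base}.bin"
end

# Check whether a file name fits the DOS 8.3 convention.
def fits_8_3?(name)
  ext  = File.extname(name).delete_prefix(".")
  base = File.basename(name, ".*")
  base.length.between?(1, 8) && ext.length <= 3 && !base.include?(".")
end

%w[268-c1c.c1 268-m1.m1 proto_268-p1.p1].each do |old|
  new = tidy_rom_name(old)
  puts "Renaming #{old} to #{new}" unless old == new
end
```

Keeping the name logic separate from the actual `File.rename` call makes it easy to dry-run the script before letting it touch an SD card.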
Have yet to write docs...
Script should be placed in either SD root or NEOYUME directory (it will autodetect it) and will process all known game folders within. Run with --cleanup argument to delete unnecessary files.
Should probably add an option to run it on an arbitrary given directory.
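The autodetection described here, plus the not-yet-existing "arbitrary directory" option, might be sketched like this. It is an illustration under stated assumptions, not the script's real logic: `find_basedir` and its `override` parameter are invented for the example.

```ruby
# Hypothetical sketch of base directory detection.
# Assumptions: the script is started either from the SD root (which contains
# a NEOYUME folder) or from inside the NEOYUME folder itself; an explicit
# override directory, if given, always wins.
def find_basedir(start = Dir.pwd, override = nil)
  return override if override
  return start if File.basename(start).casecmp?("NEOYUME")

  candidate = File.join(start, "NEOYUME")
  Dir.exist?(candidate) ? candidate : nil
end

basedir = find_basedir
puts basedir ? "basedir is #{basedir}" : "no NEOYUME directory found"
```

Returning nil instead of raising leaves it to the caller to decide whether a missing directory is fatal.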
This sure is a way to implement the mslugx patch... using ruby text processing functions.
@rogloh said:
Even if this direct execution from OctalFlash/HyperFlash thing doesn't work out for latency reasons it still might be useful for game storage, assuming you don't want lots of games on a machine. E.g. you can get 2Gbit flash chips now and two of them on a breakout board would have room for all 6 MetalSlug games for example. Then you could fast copy them into PSRAM/Octal/HyperRAM etc at ~170MB/s so games could load in less than a second vs the SD card loading (takes ~24s). It could make for a responsive system, albeit with fewer games.
Sounds like a faster SD card driver using the 4-bit SD mode would be of use here, and for many other projects too. That could take down load times to perhaps ~6s, and give almost unlimited space for cheap in terms of pins used and media costs too.
@hinv said:
Sounds like a faster SD card driver using the 4-bit SD mode would be of use here, and for many other projects too. That could take down load times to perhaps ~6s, and give almost unlimited space for cheap in terms of pins used and media costs too.
Yeah wouldn't that be nice. Who wants to code it? Does it risk the wrath of SanDisk by somehow using their IP without paying royalties or licensing it etc?
IIRC "you need a license to use SD Bus" is just an urban myth. There's some theoretical DRM features that you need a full spec for, but those are dead basically, complete nonstarter.
I've got my eye on doing full SD ... but haven't done much after having butted against an earlier bug in flexspin compiler. I'm probably not going to get back to SD for a while though.
4-bit mode can't happen without actual hardware, btw. The existing Parallax SD sockets/adaptors are all only 1-bit capable.
I'm glad someone with the skill set to do it right is looking at it.
but haven't done much after having butted against an earlier bug in flexspin compiler. I'm probably not going to get back to SD for a while though.
4-bit mode can't happen without actual hardware, btw. The existing Parallax SD sockets/adaptors are all only 1-bit capable.
Yes, I understand that SPI only takes 4 I/Os. Does 4-bit mode take 6? IIRC, a microSD card only has 8 pins, so 2 of those have to be power and ground...
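As a quick way to count the I/Os, here is a small Ruby table of the standard microSD pin assignment in both modes. The pin names follow the usual SD physical layer pinout; the hash layout and `signal_count` helper are just for illustration.

```ruby
# Standard microSD pinout: signal name in SD (4-bit) mode and in SPI mode.
# nil means the pin is unused in that mode.
MICROSD_PINS = {
  1 => { sd: "DAT2",    spi: nil    },
  2 => { sd: "CD/DAT3", spi: "CS"   },
  3 => { sd: "CMD",     spi: "DI"   },
  4 => { sd: "VDD",     spi: "VDD"  },
  5 => { sd: "CLK",     spi: "SCLK" },
  6 => { sd: "VSS",     spi: "VSS"  },
  7 => { sd: "DAT0",    spi: "DO"   },
  8 => { sd: "DAT1",    spi: nil    }
}.freeze

POWER_PINS = %w[VDD VSS].freeze

# Number of signal pins (excluding power and ground) used in a given mode.
def signal_count(mode)
  MICROSD_PINS.values
              .map { |pin| pin[mode] }
              .compact
              .reject { |name| POWER_PINS.include?(name) }
              .size
end

puts "SPI mode uses #{signal_count(:spi)} I/Os"
puts "4-bit SD mode uses #{signal_count(:sd)} I/Os"
```

So the guess in the question holds: CLK, CMD and DAT0..DAT3 make six signals, with the remaining two pins being power and ground.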
Trying to fix that one graphics bug... I think bad data may be ending up in VRAM, but I can't confirm that because loadp2 is too [insert bad(tm) word of choice] to handle 64K worth of hex data, even if I reduce the baudrate to the point where it takes like 10 seconds to send it. Even 4K is too much.
Okay, 4K + slow works sometimes enough to get a good dump.
So...
NeoYume running: bad
MAME running: good
MAME VRAM dump + Ruby reference renderer: good
MAME VRAM dump + NeoYume LSPC/blitter: good, if I remember correctly (too lazy to get the ol' dump renderer working again)
MAME low VRAM dump + NeoYume high VRAM dump + Ruby reference renderer: bad (in exactly the same way as NeoYume itself)
Thus: something goes bad with high VRAM.
Here's what that's like: (note that these aren't the exact same frame ofc)
(lines with wrong palettes added for debugging (black -> line 256, blue/red -> line 384))
Ok, issue is that the background's Y position is off (in this case, by exactly 128?), which causes the seam between lines 255/256 to be in the visible area (quick primer in case you missed it: "valid" lines are 0..vshrink and (511-vshrink)..511. (vshrink being 255 for a full-size sprite. The dump images above use 191 to shrink by ~25%). Normally, invalid lines are filled with the last valid line (255 for lines <= 255, line 256 for lines >= 256) due to shrink lookup ROM containing $FF at those positions, but when height is set to 33 some logic makes it wrap back around to 0/511. So you get a nice wrapping tilemap as long as you keep that seam off the screen)
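The "valid line" rule from the primer above can be written down as a few lines of Ruby. This only models the two ranges as described (0..vshrink and (511-vshrink)..511); it does not model the shrink lookup ROM or the height-33 wraparound, and the function names are made up for this sketch.

```ruby
# Sketch of the valid-line rule only, as described in the primer.
# Not modelled: the shrink lookup ROM and the height == 33 wraparound.
SPRITE_LINES = 512

# A line is valid if it lies in 0..vshrink or (511 - vshrink)..511.
def valid_line?(line, vshrink)
  (0..vshrink).cover?(line) ||
    ((SPRITE_LINES - 1 - vshrink)..(SPRITE_LINES - 1)).cover?(line)
end

# All lines that fall into the gap between the two valid ranges.
def invalid_lines(vshrink)
  (0...SPRITE_LINES).reject { |line| valid_line?(line, vshrink) }
end

[255, 191].each do |vshrink|
  gap = invalid_lines(vshrink)
  if gap.empty?
    puts "vshrink=#{vshrink}: every line is valid"
  else
    puts "vshrink=#{vshrink}: lines #{gap.first}..#{gap.last} are invalid"
  end
end
```

With vshrink at 255 the two ranges meet exactly at the 255/256 seam, which is why that seam is the one that has to be kept off screen.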
Weird thing though: This oopsie gets written to VRAM like that. So it's actually a 68000 issue?
So why does a similar scaling seam issue crop up in different games (though less obnoxiously)? I guess Thrash Rally maybe shares some code with Neo Drift Out despite being by different companies. But I think some of the stage intros in samsho5 also have troubles. Do they all use some "adjust scroll position" snippet from some sort of developer documentation? Am I just going insane?
Unrelatedly, issue with Spinmaster is that it uses Timer IRQ (which is not emulated because lmao) to do... something. Oh well, at least no mystery there.
Also updated the MegaYume readme a bit and copied over the RAM config text. Odd that I decided to use a TXT file for its readme.
Also, minor 68000 optimizations (cutting 2 cycles in a lot of paths by skipping set of mk_shiftit, removing a bunch of ops in the CMPA path, replacing two TESTB with an RCZR in absolute/PC-relative address mode selection).
On track to release 1.0 for both emulators next week.
When I said rx there, I meant only the receiving data phase of the rx routine. The CA phase is controlled by PSRAM_SYNC_DATA still.
I know that's how, by automatically controlling pin registration, Roger gets the extra combinations from the compensating delay. That delay, PSRAM_DELAY, only applies for the receiving data phase. That is unless he's changed things in these newer revisions of the driver code.
Hi Ada,
perhaps you might by chance be willing to provide some help for people who want to use your Z80 and/or MC68000 emulators?
I am wondering whether I should dig into a CP/M-68k emulator for the P2. I have never used CP/M (and would have preferred OS9-68k, because it has multitasking, which would be interesting on the P2). But I have read that CP/M can be used freely now and that it comes with a C compiler. And there is µemacs.
At the moment, I have got an emulator working on a raspi and I have read some manuals.
I would use a P2 Kiss board with no additional RAM but VGA output @200MHz and PS/2 or USB keyboard input + SD card. It seems to make sense to me to aim for a system that has about 256k for CP/M, or a little bit more. (?)
I would need some starting point for the 68k emulator to get it to execute a program from hub RAM. A separate 68k emulator file would be rather helpful....
Documentation for CP/M-68k seems to be quite good, so writing a BIOS seems feasible, even for me. (???!!) https://www.ndr-nkc.de/compo/68000/cpm68k6.htm
Christof
Comments
Even if this direct execution from OctalFlash/HyperFlash thing doesn't work out for latency reasons it still might be useful for game storage, assuming you don't want lots of games on a machine. E.g. you can get 2Gbit flash chips now and two of them on a breakout board would have room for all 6 MetalSlug games for example. Then you could fast copy them into PSRAM/Octal/HyperRAM etc at ~170MB/s so games could load in less than a second vs the SD card loading (takes ~24s). It could make for a responsive system, albeit with fewer games.
Yeah, mslug3 is kinda an unsolvable issue since it actually uses more than 128K of Z80 code. It may in fact be the only game that does (some of the later games have larger ROMs, too, but if you look at them they just use the first couple K. MS5's ROM is 512K, but actual data (post-decryption) stops short of 32K, which makes sense since it uses streaming music).
Pity, since a lot of people seem to like MS3 the best. Interesting that it works at all though with some of the ROM missing.
I guess if you were building some sort of dedicated thing it'd be neat. Maybe you'd have a frontend that loads ROMs into RAM and then calls the appropriate emulator (MegaYume/NeoYume/whatever)?
Though direct XIP could allow emulating a real multislot MVS system. Would need some patching of the BIOS so it can run without NVRAM/RTC. And I guess also patching SLOTCHECK code out of the games (which relies on NVRAM and watchdog...). EDIT: Or I guess use a real slow external actual NVRAM... And implement actual watchdog.
Managed to implement the NEO-CMC algorithm for the decryption tool... No more messing with WinKawaks lmao
Just look at this insanity. I think my version here is very slightly more readable than MAME's (where I copped the logic from).
I'm fairly sure all this scrambling is just to stop someone from desoldering the ROM chips and cloning them individually. The decrypted data obviously must be present on the cart edge. I guess generating appropriate signalling is mildly difficult.
EDIT: This doesn't actually work right for the 3 bank case...
Yeah that looks nasty. I imagine it would slow down loading quite a bit if you put it into the P2 code too, so a script makes sense.
Nice thing is that the script can also take care of a lot of other annoyances, such as renaming files (e.g. removing "proto_" prefixes that break 8.3 naming convention, weird file extensions, etc.), deleting extraneous BIOS ROMs (as are often included in downloads), etc.
What all that might look like for MS5 (which has every encryption possible), including --cleanup flag which deletes any known-unnecessary files (encrypted originals and aforementioned BIOS nonsense)
@Wuerfel_21 sorry, a bit off topic, but may I ask what the software in the picture is?
https://ghidra-sre.org , but really old version I think (for no particular reason, just too lazy to update it)
@rogloh here's the branch with the fixup script added (renamed romfix.rb) and all the complex upper functions (patching/M1 decrypt) removed. https://github.com/IRQsome/NeoYume/tree/romfix
Have yet to write docs...
Script should be placed in either SD root or NEOYUME directory (it will autodetect it) and will process all known game folders within. Run with --cleanup argument to delete unnecessary files.
Should probably add an option to run it on an arbitrary given directory.
This sure is a way to implement the mslugx patch... using ruby text processing functions.
Sounds like a faster SD card driver using the 4-bit SD mode would be of use here, and for many other projects too. That could take down load times to perhaps ~6s, and give almost unlimited space for cheap in terms of pins used and media costs too.
Yeah wouldn't that be nice. Who wants to code it? Does it risk the wrath of SanDisk by somehow using their IP without paying royalties or licensing it etc?
IIRC "you need a license to use SD Bus" is just an urban myth. There's some theoretical DRM features that you need a full spec for, but those are dead basically, complete nonstarter.
I've got my eye on doing full SD ... but haven't done much after having butted against an earlier bug in flexspin compiler. I'm probably not going to get back to SD for a while though.
4-bit mode can't happen without actual hardware, btw. The existing Parallax SD sockets/adaptors are all only 1-bit capable.
Should be able to get near an eightfold speed-up in the end. So from 4 MB/s to 30 MB/s.
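To put the numbers from this thread side by side, here is a back-of-envelope Ruby sketch. The ROM set size is not stated anywhere above; it is inferred here from the ~24 s SD load time and the ~4 MB/s figure, so treat it as an assumption. The constant and method names are made up for the example.

```ruby
# Back-of-envelope load times using figures quoted in this thread.
SD_1BIT_MB_S  = 4.0    # current SD transfer rate mentioned above
SD_4BIT_MB_S  = 30.0   # hoped-for 4-bit SD rate
RAM_COPY_MB_S = 170.0  # flash-to-PSRAM/HyperRAM copy rate from the first post
SD_LOAD_S     = 24.0   # reported SD load time for a big game

# Assumption: the whole 24 s is spent transferring data at the 1-bit rate.
def romset_mb(load_seconds, rate_mb_s)
  load_seconds * rate_mb_s
end

def load_seconds(size_mb, rate_mb_s)
  size_mb / rate_mb_s
end

size = romset_mb(SD_LOAD_S, SD_1BIT_MB_S)
rates = { "SD 1-bit" => SD_1BIT_MB_S, "SD 4-bit" => SD_4BIT_MB_S, "flash copy" => RAM_COPY_MB_S }
rates.each do |name, rate|
  puts format("%-10s %6.2f s", name, load_seconds(size, rate))
end
puts format("speed-up from 4-bit mode: %.1fx", SD_4BIT_MB_S / SD_1BIT_MB_S)
```

Real load times would be longer than the raw transfer estimate, since decryption, filesystem overhead and card latency are ignored here; the ~6 s guess earlier in the thread is the more conservative figure.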
NeoYume shows that we need this yesterday.
I do have a cheap µSD breakout somewhere that I can wire to the P2EDGE breadboard. So if you made a 4bit VFS driver I'd be up for testing it.
I'm glad someone with the skill set to do it right is looking at it.
Yes, I understand that SPI only takes 4 I/Os. Does 4-bit mode take 6? IIRC, a microSD card only has 8 pins, so 2 of those have to be power and ground...
Yes, indeed! Ada's console emulations have been pushing the limits to the next bottleneck, so now the bottleneck is pushed to SDcard and USB drivers.
Load times are the least of Ada's worries.
Trying to fix that one graphics bug... I think bad data may be ending up in VRAM, but I can't confirm that because loadp2 is too [insert bad(tm) word of choice] to handle 64K worth of hex data, even if I reduce the baudrate to the point where it takes like 10 seconds to send it. Even 4K is too much.
Okay, 4K + slow works sometimes enough to get a good dump.
So...
Thus: something goes bad with high VRAM.
Here's what that's like: (note that these aren't the exact same frame ofc)
(lines with wrong palettes added for debugging (black -> line 256, blue/red -> line 384))
MAME full VRAM dump
MAME low VRAM dump + NeoYume high VRAM dump
Ok, issue is that the background's Y position is off (in this case, by exactly 128?), which causes the seam between lines 255/256 to be in the visible area (quick primer in case you missed it: "valid" lines are 0..vshrink and (511-vshrink)..511. (vshrink being 255 for a full-size sprite. The dump images above use 191 to shrink by ~25%). Normally, invalid lines are filled with the last valid line (255 for lines <= 255, line 256 for lines >= 256) due to shrink lookup ROM containing $FF at those positions, but when height is set to 33 some logic makes it wrap back around to 0/511. So you get a nice wrapping tilemap as long as you keep that seam off the screen)
Weird thing though: This oopsie gets written to VRAM like that. So it's actually a 68000 issue?
So why does a similar scaling seam issue crop up in different games (though less obnoxiously)? I guess Thrash Rally maybe shares some code with Neo Drift Out despite being by different companies. But I think some of the stage intros in samsho5 also have troubles. Do they all use some "adjust scroll position" snippet from some sort of developer documentation? Am I just going insane?
Unrelatedly, issue with Spinmaster is that it uses Timer IRQ (which is not emulated because lmao) to do... something. Oh well, at least no mystery there.
Wrote a page that tries to explain the configuration of PSRAM for NeoYume.
IDK how helpful it'd really be to someone who doesn't know what they're doing already, but oh well.
Good read. And good hints on use.
One detail: I think PSRAM_SYNC_DATA is tx data only. Since rx data registration is controlled automatically as part of the PSRAM_DELAY mechanism.
No, it isn't. Registration is the same for tx/rx and controlled entirely by PSRAM_SYNC_DATA.
Also updated the MegaYume readme a bit and copied over the RAM config text. Odd that I decided to use a TXT file for its readme.
Also, minor 68000 optimizations (cutting 2 cycles in a lot of paths by skipping set of mk_shiftit, removing a bunch of ops in the CMPA path, replacing two TESTB with an RCZR in absolute/PC-relative address mode selection).
On track to release 1.0 for both emulators next week.
When I said rx there, I meant only the receiving data phase of the rx routine. The CA phase is controlled by PSRAM_SYNC_DATA still.
I know that's how, by automatically controlling pin registration, Roger gets the extra combinations from the compensating delay. That delay, PSRAM_DELAY, only applies for the receiving data phase. That is unless he's changed things in these newer revisions of the driver code.
But I'm only using Roger's code for the load phase. The parameters are oriented around my own read code.
Oh, I see. Almost surprising you've used his driver at all then.