Console Emulation - Page 39 — Parallax Forums

Console Emulation


Comments

  • roglohrogloh Posts: 4,516

    Even if this direct execution from OctalFlash/HyperFlash thing doesn't work out for latency reasons it still might be useful for game storage, assuming you don't want lots of games on a machine. E.g. you can get 2Gbit flash chips now and two of them on a breakout board would have room for all 6 MetalSlug games for example. Then you could fast copy them into PSRAM/Octal/HyperRAM etc at ~170MB/s so games could load in less than a second vs the SD card loading (takes ~24s). It could make for a responsive system, albeit with fewer games.

  • Wuerfel_21Wuerfel_21 Posts: 3,137
    edited 2022-08-09 13:09

    @rogloh said:
    Well done for a working MS5. So it's just MS3 left with sound issues now from that entire series of Metal Slug games?

    Yeah, mslug3 is kinda an unsolvable issue, since it actually uses more than 128K of Z80 code. It may in fact be the only game that does. (Some of the later games have larger ROMs too, but if you look at them, they only use the first couple of K. MS5's ROM is 512K, but the actual data (post-decryption) stops short of 32K, which makes sense since it uses streaming music.)
    Pity, since a lot of people seem to like MS3 the best. Interesting that it works at all, though, with some of the ROM missing.

    @rogloh said:
    Even if this direct execution from OctalFlash/HyperFlash thing doesn't work out for latency reasons it still might be useful for game storage, assuming you don't want lots of games on a machine. E.g. you can get 2Gbit flash chips now and two of them on a breakout board would have room for all 6 MetalSlug games for example. Then you could fast copy them into PSRAM/Octal/HyperRAM etc at ~170MB/s so games could load in less than a second vs the SD card loading (takes ~24s). It could make for a responsive system, albeit with fewer games.

    I guess if you were building some sort of dedicated thing it'd be neat. Maybe you'd have a frontend that loads ROMs into RAM and then calls the appropriate emulator (MegaYume/NeoYume/whatever)?

    Though direct XIP could allow emulating a real multislot MVS system. It would need some patching of the BIOS so it can run without NVRAM/RTC, and I guess also patching the SLOTCHECK code out of the games (which relies on NVRAM and the watchdog...). EDIT: Or I guess use a real, slow, external NVRAM... and implement an actual watchdog.

  • Wuerfel_21Wuerfel_21 Posts: 3,137
    edited 2022-08-10 23:23

    Managed to implement the NEO-CMC algorithm for the decryption tool... No more messing with WinKawaks lmao

    Just look at this insanity. I think my version here is very slightly more readable than MAME's (where I copped the logic from).

    I'm fairly sure all this scrambling is just to stop someone from desoldering the ROM chips and cloning them individually. The decrypted data obviously must be present on the cart edge. I guess generating appropriate signalling is mildly difficult.

    class DecryptCTask < Struct.new(:newnamepat,:oldnamepat,:banks,:type,:unikey)
    
        # Imagine horrible tables here
    
        def cmc_decrypt(dataodd,dataeven)
            case type
            when :cmc42
                type0_t03 = CMC42_TYPE0_T03
                type0_t12 = CMC42_TYPE0_T12
                type1_t03 = CMC42_TYPE1_T03
                type1_t12 = CMC42_TYPE1_T12
                axor_low = CMC42_AXOR_LOW
                axor_mid1 = CMC42_AXOR_MID_1
                axor_mid2 = CMC42_AXOR_MID_2
                axor_high1 = CMC42_AXOR_HIGH_1
                axor_high2 = CMC42_AXOR_HIGH_2
            else raise
            end
            raise unless dataodd.size == dataeven.size
    
            longs = dataodd.size / 2
            # Special case for 3 banks (= 6 ROMs)
            if dataodd.size == 0x1800000
                dataodd.concat dataodd[0x1000000..0x17FFFFF]
                dataeven.concat dataeven[0x1000000..0x17FFFFF]
            end
            clampval = (dataodd.size/2)-1
            outodd = String.new
            outeven = String.new
            longs.times do |i|
                ilow = i&255
                imid = (i>>8)&255
                baser = i ^ unikey
                baser ^= axor_mid1[(baser>>16)&255]
                baser ^= axor_mid2[baser&255]
                rmid = (baser>>8)&255
                baser ^= axor_high1[baser&255]
                baser ^= axor_high2[rmid]
                rhigh = (baser>>16)&255
                baser ^= axor_low[rmid]
                rlow = baser&255
    
                clampr = baser & clampval
    
                bp0 = dataodd.getbyte(clampr*2  )
                bp1 = dataodd.getbyte(clampr*2+1)
                bp2 = dataeven.getbyte(clampr*2  )
                bp3 = dataeven.getbyte(clampr*2+1)
    
                bp0,bp3 = bp3,bp0 if baser[8]==1
                bp1,bp2 = bp2,bp1 if (baser^axor_high2[rmid])[16] == 1
    
                tmp = type1_t03[rlow ^ axor_low[rmid]]
                bp0 ^= type0_t03[rmid]&0xfe
                bp0 ^= tmp&0x01
                bp3 ^= tmp&0xfe
                bp3 ^= type0_t12[rmid]&0x01
    
                tmp = type1_t12[rlow ^ axor_low[rmid]]
                bp2 ^= type0_t12[rmid]&0xfe
                bp2 ^= tmp&0x01
                bp1 ^= tmp&0xfe
                bp1 ^= type0_t03[rmid]&0x01
    
                outodd << bp0
                outodd << bp1
                outeven << bp2
                outeven << bp3
            end
            # Re-split into banks
            return (0...longs/0x40_0000).flat_map{|i|[outodd.slice(i*0x80_0000,0x80_0000),outeven.slice(i*0x80_0000,0x80_0000)]}
        end
    end
    

    EDIT: This doesn't actually work right for the 3 bank case...
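The 3-bank special case in `cmc_decrypt` pads the data by repeating the last bank before `clampval` is computed. A plausible reason (my reading, not stated in the post) is that masking with `size - 1`, as in `baser & clampval`, only behaves like a wraparound when the size is a power of two. A tiny illustration with made-up sizes; it does not attempt to fix the 3-bank problem from the EDIT:

```ruby
# Illustration with made-up sizes (not real ROM sizes): `index & (size - 1)`,
# like `baser & clampval` in cmc_decrypt, only wraps cleanly when size is a
# power of two.

def mask_clamp(index, size)
  index & (size - 1)
end

# Three "banks" of four entries = 12 entries, which is not a power of two.
banks = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
flat = banks.flatten

# Mask 12 - 1 = 0b1011 always clears bit 2, so entries 4..7 are never selected.
reachable = (0...16).map { |i| mask_clamp(i, flat.size) }.uniq.sort

# Appending a copy of the last bank (as the 3-bank special case does) gives
# 16 entries and mask 0b1111, so every index is reachable again.
padded = flat + banks.last
reachable_padded = (0...16).map { |i| mask_clamp(i, padded.size) }.uniq.sort

p reachable # => [0, 1, 2, 3, 8, 9, 10, 11]
p reachable_padded
```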

  • roglohrogloh Posts: 4,516

    Yeah that looks nasty. I imagine it would slow down loading quite a bit if you put it into the P2 code too, so a script makes sense.

  • The nice thing is that the script can also take care of a lot of other annoyances, such as renaming files (e.g. removing "proto_" prefixes that break the 8.3 naming convention, fixing weird file extensions, etc.) and deleting extraneous BIOS ROMs (which are often included in downloads).

  • Here's what all that might look like for MS5 (which has every encryption possible), including the --cleanup flag, which deletes any known-unnecessary files (the encrypted originals and the aforementioned BIOS nonsense):

    $ time ruby ./decrypt.rb --cleanup
    NeoYume ROMset fixer!
    basedir is ./NEOYUME
    Deleting ./NEOYUME/mslug5h/000-lo.lo
    Deleting ./NEOYUME/mslug5h/asia-s3.rom
    Deleting ./NEOYUME/mslug5h/japan-j3.bin
    Deleting ./NEOYUME/mslug5h/sfix.sfix
    Deleting ./NEOYUME/mslug5h/sm1.sm1
    Deleting ./NEOYUME/mslug5h/sp-1v1_3db8c.bin
    Deleting ./NEOYUME/mslug5h/sp-45.sp1
    Deleting ./NEOYUME/mslug5h/sp-e.sp1
    Deleting ./NEOYUME/mslug5h/sp-j2.sp1
    Deleting ./NEOYUME/mslug5h/sp-s.sp1
    Deleting ./NEOYUME/mslug5h/sp-s2.sp1
    Deleting ./NEOYUME/mslug5h/sp-u2.sp1
    Deleting ./NEOYUME/mslug5h/sp1.jipan.1024
    Deleting ./NEOYUME/mslug5h/uni-bios_1_0.rom
    Deleting ./NEOYUME/mslug5h/uni-bios_1_1.rom
    Deleting ./NEOYUME/mslug5h/uni-bios_1_2.rom
    Deleting ./NEOYUME/mslug5h/uni-bios_1_2o.rom
    Deleting ./NEOYUME/mslug5h/uni-bios_1_3.rom
    Deleting ./NEOYUME/mslug5h/uni-bios_2_0.rom
    Deleting ./NEOYUME/mslug5h/uni-bios_2_1.rom
    Deleting ./NEOYUME/mslug5h/uni-bios_2_2.rom
    Deleting ./NEOYUME/mslug5h/uni-bios_2_3.rom
    Deleting ./NEOYUME/mslug5h/uni-bios_2_3o.rom
    Deleting ./NEOYUME/mslug5h/uni-bios_3_0.rom
    Deleting ./NEOYUME/mslug5h/uni-bios_3_1.rom
    Deleting ./NEOYUME/mslug5h/uni-bios_3_2.rom
    Deleting ./NEOYUME/mslug5h/v2.bin
    Deleting ./NEOYUME/mslug5h/vs-bios.rom
    Renaming ./NEOYUME/mslug5h/268-c1c.c1 to ./NEOYUME/mslug5h/268-c1c.bin
    Renaming ./NEOYUME/mslug5h/268-c2c.c2 to ./NEOYUME/mslug5h/268-c2c.bin
    Renaming ./NEOYUME/mslug5h/268-c3c.c3 to ./NEOYUME/mslug5h/268-c3c.bin
    Renaming ./NEOYUME/mslug5h/268-c4c.c4 to ./NEOYUME/mslug5h/268-c4c.bin
    Renaming ./NEOYUME/mslug5h/268-c5c.c5 to ./NEOYUME/mslug5h/268-c5c.bin
    Renaming ./NEOYUME/mslug5h/268-c6c.c6 to ./NEOYUME/mslug5h/268-c6c.bin
    Renaming ./NEOYUME/mslug5h/268-c7c.c7 to ./NEOYUME/mslug5h/268-c7c.bin
    Renaming ./NEOYUME/mslug5h/268-c8c.c8 to ./NEOYUME/mslug5h/268-c8c.bin
    Renaming ./NEOYUME/mslug5h/268-m1.m1 to ./NEOYUME/mslug5h/268-m1.bin
    Renaming ./NEOYUME/mslug5h/268-p1c.p1 to ./NEOYUME/mslug5h/268-p1c.bin
    Renaming ./NEOYUME/mslug5h/268-p2c.p2 to ./NEOYUME/mslug5h/268-p2c.bin
    Renaming ./NEOYUME/mslug5h/268-v1c.v1 to ./NEOYUME/mslug5h/268-v1c.bin
    Renaming ./NEOYUME/mslug5h/268-v2c.v2 to ./NEOYUME/mslug5h/268-v2c.bin
    Decrypting ./NEOYUME/mslug5h/268-m1.bin into ./NEOYUME/mslug5h/268-m1d.bin (CMC50 M1)
    Computed key: EABC
    Deleting ./NEOYUME/mslug5h/268-m1.bin
    Decrypting 268-v1c,268-v2c into ./NEOYUME/mslug5h/268-vd.bin (PCM2 type 2)
    Deleting ./NEOYUME/mslug5h/268-v1c.bin
    Deleting ./NEOYUME/mslug5h/268-v2c.bin
    Decrypting ./NEOYUME/mslug5h/268-p1c.bin and ./NEOYUME/mslug5h/268-p2c.bin into ./NEOYUME/mslug5h/268-pd.bin (NEO-PVC mslug5)
    Deleting ./NEOYUME/mslug5h/268-p1c.bin
    Deleting ./NEOYUME/mslug5h/268-p2c.bin
    Decrypting CMC data in ./NEOYUME/mslug5h
    Deleting ./NEOYUME/mslug5h/268-c1c.bin
    Deleting ./NEOYUME/mslug5h/268-c2c.bin
    Deleting ./NEOYUME/mslug5h/268-c3c.bin
    Deleting ./NEOYUME/mslug5h/268-c4c.bin
    Deleting ./NEOYUME/mslug5h/268-c5c.bin
    Deleting ./NEOYUME/mslug5h/268-c6c.bin
    Deleting ./NEOYUME/mslug5h/268-c7c.bin
    Deleting ./NEOYUME/mslug5h/268-c8c.bin
    
    real    0m43,964s
    user    0m0,000s
    sys     0m0,000s
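The renames in the log follow a simple pattern (odd extensions such as `.c1` or `.m1` become `.bin`), and the 8.3 concern is just about name length. A hypothetical sketch of helpers in that spirit; these are not taken from the actual romfix.rb, and `bin_name`, `strip_proto` and `fits_8_3?` are made-up names:

```ruby
# Hypothetical helpers, not the actual romfix.rb: they only mimic the kind of
# renames visible in the log above.

# "268-c1c.c1" -> "268-c1c.bin"
def bin_name(filename)
  File.basename(filename, File.extname(filename)) + ".bin"
end

# Drop a leading "proto_" so the name has a chance of fitting 8.3.
def strip_proto(filename)
  filename.delete_prefix("proto_")
end

# Length-only 8.3 check (ignores the allowed-character rules):
# base name of 1..8 chars with no extra dot, extension of at most 3 chars.
def fits_8_3?(filename)
  ext = File.extname(filename)
  base = File.basename(filename, ext)
  base.length.between?(1, 8) && ext.delete_prefix(".").length <= 3 && !base.include?(".")
end

%w[268-c1c.c1 268-m1.m1 sp1.jipan.1024].each do |name|
  puts "#{name} -> #{bin_name(name)} (fits 8.3: #{fits_8_3?(bin_name(name))})"
end
```

Files like `sp1.jipan.1024` fail the check either way, which is one reason a cleanup pass that simply deletes known-unneeded BIOS files is simpler than renaming them.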
    
  • @Wuerfel_21 said:

    @Wuerfel_21 sorry, a bit off topic, but may I ask what software that is in the picture?

  • https://ghidra-sre.org, but a really old version, I think (for no particular reason, I'm just too lazy to update it).

  • Wuerfel_21Wuerfel_21 Posts: 3,137
    edited 2022-08-11 18:29

    @rogloh here's the branch with the fixup script added (renamed romfix.rb) and all the complex upper functions (patching/M1 decrypt) removed.

    https://github.com/IRQsome/NeoYume/tree/romfix

    Have yet to write docs...
    The script should be placed in either the SD root or the NEOYUME directory (it will autodetect which) and will process all known game folders within. Run it with the --cleanup argument to delete unnecessary files.
    Should probably add an option to run it on an arbitrary given directory.
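A hypothetical sketch of the directory choice described here, including the suggested "arbitrary directory" option. The function name and the rule that the first non-flag argument is the path are assumptions, not how romfix.rb actually works:

```ruby
# Hypothetical sketch, not the actual romfix.rb: pick the base directory.
def detect_basedir(argv, cwd: Dir.pwd)
  # An explicit non-flag argument wins (e.g. `ruby romfix.rb --cleanup /media/sd`).
  explicit = argv.find { |arg| !arg.start_with?("--") }
  return explicit if explicit

  # Run from the SD root: use its NEOYUME subdirectory if present.
  candidate = File.join(cwd, "NEOYUME")
  return candidate if Dir.exist?(candidate)

  # Otherwise assume we are already inside the NEOYUME directory.
  cwd
end

puts "basedir is #{detect_basedir(ARGV)}"
```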


    This sure is a way to implement the mslugx patch... using Ruby text processing functions.
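For anyone curious how "text processing" applies to a ROM: a Ruby String in binary encoding is just a byte buffer, so `byteslice`, `index` and `[]=` work as a patch tool. A hypothetical sketch; the offsets and byte values are made up and this is not the actual mslugx patch:

```ruby
# Hypothetical sketch of byte-patching a ROM image with ordinary Ruby String
# methods. Offsets and byte values are made up; this is NOT the mslugx patch.
def patch_at!(rom, offset, expected, replacement)
  expected = expected.b
  replacement = replacement.b
  raise ArgumentError, "length mismatch" unless expected.bytesize == replacement.bytesize
  # Refuse to patch if the bytes aren't what we expect (wrong ROM revision etc.).
  actual = rom.byteslice(offset, expected.bytesize)
  raise "unexpected bytes at 0x#{offset.to_s(16)}" unless actual == expected
  rom[offset, replacement.bytesize] = replacement
  rom
end

# Same idea, but locate the bytes by searching instead of using a fixed offset.
def patch_pattern!(rom, pattern, replacement)
  offset = rom.index(pattern.b) or raise "pattern not found"
  patch_at!(rom, offset, pattern, replacement)
end

# rom = File.binread("some_rom.bin")          # hypothetical file name
rom = "\x4E\x71\x4E\x71\x60\x00".b           # 0x4E71 is the 68000 NOP opcode
patch_pattern!(rom, "\x60\x00", "\x4E\x71")
p rom.bytes.map { |b| format("%02X", b) }
```

Checking the original bytes before writing makes the patch fail loudly on an unexpected ROM revision instead of silently corrupting it.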

  • hinvhinv Posts: 1,190
    edited 2022-08-12 14:54

    @rogloh said:
    Even if this direct execution from OctalFlash/HyperFlash thing doesn't work out for latency reasons it still might be useful for game storage, assuming you don't want lots of games on a machine. E.g. you can get 2Gbit flash chips now and two of them on a breakout board would have room for all 6 MetalSlug games for example. Then you could fast copy them into PSRAM/Octal/HyperRAM etc at ~170MB/s so games could load in less than a second vs the SD card loading (takes ~24s). It could make for a responsive system, albeit with fewer games.

    Sounds like a faster SD card driver using the 4-bit SD mode would be of use here, and for many other projects too. That could take down load times to perhaps ~6s, and give almost unlimited space for cheap in terms of pins used and media costs too.

  • roglohrogloh Posts: 4,516

    @hinv said:
    Sounds like a faster SD card driver using the 4-bit SD mode would be of use here, and for many other projects too. That could take down load times to perhaps ~6s, and give almost unlimited space for cheap in terms of pins used and media costs too.

    Yeah, wouldn't that be nice? Who wants to code it? Does it risk the wrath of SanDisk by somehow using their IP without paying royalties or licensing it, etc.?

  • IIRC, "you need a license to use SD Bus" is just an urban myth. There are some theoretical DRM features that you need the full spec for, but those are basically dead, a complete nonstarter.

  • evanhevanh Posts: 13,608

    I've got my eye on doing full SD ... but haven't done much after having butted against an earlier bug in the flexspin compiler. I'm probably not going to get back to SD for a while though.

    4-bit mode can't happen without actual hardware, btw. The existing Parallax SD sockets/adaptors are all only 1-bit capable.

  • evanhevanh Posts: 13,608

    Should be able to get near an eightfold speed-up in the end. So from 4 MB/s to 30 MB/s.
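Rough load times for the figures floating around this thread. The image size is inferred (~24 s at ~4 MB/s, so roughly 96 MB) rather than measured, and the 16 MB/s row is an assumption: it is the rate that the "~6 s with 4-bit mode" guess implies.

```ruby
# Back-of-the-envelope load times. IMAGE_MB is inferred from this thread
# (~24 s to load from SD at ~4 MB/s); it is an estimate, not a measured ROM size.
def load_seconds(size_mb, rate_mb_s)
  size_mb / rate_mb_s.to_f
end

IMAGE_MB = 24 * 4.0

{
  "SD, current (~4 MB/s)" => 4.0,
  "SD 4-bit, ~4x (assumed)" => 16.0,
  "SD 4-bit, best case (~30 MB/s)" => 30.0,
  "Flash -> PSRAM copy (~170 MB/s)" => 170.0
}.each do |label, rate|
  puts format("%-32s %5.1f s", label, load_seconds(IMAGE_MB, rate))
end
```

So the ~6 s and "under a second" estimates are consistent with each other; the eightfold speed-up would land somewhere around 3 s for the same data.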

  • roglohrogloh Posts: 4,516

    @evanh said:
    Should be able to get near an eightfold speed-up in the end. So from 4 MB/s to 30 MB/s.

    NeoYume shows that we need this yesterday.

  • @evanh said:
    4-bit mode can't happen without actual hardware, btw. The existing Parallax SD sockets/adaptors are all only 1-bit capable.

    I do have a cheap µSD breakout somewhere that I can wire to the P2EDGE breadboard. So if you made a 4-bit VFS driver, I'd be up for testing it.

  • hinvhinv Posts: 1,190

    @evanh said:
    I've got my eye on doing full SD ...

    I'm glad someone with the skill set to do it right is on it.

    but haven't done much after having butted against an earlier bug in the flexspin compiler. I'm probably not going to get back to SD for a while though.

    4-bit mode can't happen without actual hardware, btw. The existing Parallax SD sockets/adaptors are all only 1-bit capable.

    Yes, I understand that SPI only takes 4 I/Os. Does 4-bit mode take 6? IIRC, a microSD card only has 8 pins, so 2 of those have to be power and ground...
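Counting it out against the standard microSD pinout (as commonly published; worth checking against the SD Association's simplified physical layer spec): SPI uses 4 signal pins, native 4-bit mode uses 6 (CLK, CMD, DAT0..DAT3), and the remaining 2 of the 8 pins are VDD and VSS.

```ruby
# microSD pin assignment (pins 1..8) in native SD mode and in SPI mode.
MICROSD_PINS = {
  1 => { sd: :DAT2,    spi: :reserved },
  2 => { sd: :CD_DAT3, spi: :CS },
  3 => { sd: :CMD,     spi: :DI },
  4 => { sd: :VDD,     spi: :VDD },
  5 => { sd: :CLK,     spi: :SCLK },
  6 => { sd: :VSS,     spi: :VSS },
  7 => { sd: :DAT0,    spi: :DO },
  8 => { sd: :DAT1,    spi: :reserved }
}.freeze

# Pins that are not host I/O: supply rails and pins unused in SPI mode.
NOT_IO = %i[VDD VSS reserved].freeze

# Signals the host has to drive or read in the given mode.
def io_pins(mode)
  MICROSD_PINS.values.map { |pin| pin.fetch(mode) }.reject { |sig| NOT_IO.include?(sig) }
end

puts "SPI:      #{io_pins(:spi).size} I/Os #{io_pins(:spi).inspect}"
puts "4-bit SD: #{io_pins(:sd).size} I/Os #{io_pins(:sd).inspect}"
```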

  • hinvhinv Posts: 1,190

    @rogloh said:

    @evanh said:
    Should be able to get near an eightfold speed-up in the end. So from 4 MB/s to 30 MB/s.

    NeoYume shows that we need this yesterday.

    Yes, indeed! Ada's console emulators have been pushing the limits, so now the bottleneck has moved to the SD card and USB drivers.

  • evanhevanh Posts: 13,608

    Load times are least of Ada's worries.

  • Wuerfel_21Wuerfel_21 Posts: 3,137
    edited 2022-08-12 16:59

    Trying to fix that one graphics bug... I think bad data may be ending up in VRAM, but I can't confirm that because loadp2 is too [insert bad(tm) word of choice] to handle 64K worth of hex data, even if I reduce the baudrate to the point where it takes like 10 seconds to send it. Even 4K is too much.

  • Okay, 4K at a slow baud rate works sometimes, which is enough to get a good dump.

    So...

    • NeoYume running: bad
    • MAME running: good
    • MAME VRAM dump + Ruby reference renderer: good
    • MAME VRAM dump + NeoYume LSPC/blitter: good, if I remember correctly (too lazy to get the ol' dump renderer working again)
    • MAME low VRAM dump + NeoYume high VRAM dump + Ruby reference renderer: bad (in exactly the same way as NeoYume itself)

    Thus: something goes bad with high VRAM.
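The mixing trick above (low part of one VRAM dump plus high part of another) is easy to script. A hypothetical sketch; the split offset is left as a parameter because the post doesn't say exactly where "low" and "high" VRAM were divided, and the file names are made up:

```ruby
# Hypothetical sketch of splicing two VRAM dumps at a chosen byte offset.

# Assumes a plain hex dump (hex byte pairs plus whitespace/newlines only; no
# address columns or "0x" prefixes, whose digits would be kept).
def hex_to_bin(text)
  [text.delete("^0-9A-Fa-f")].pack("H*")
end

# Bytes [0, split) from low_src, bytes [split, end) from high_src.
def splice_dumps(low_src, high_src, split)
  raise ArgumentError, "dump sizes differ" unless low_src.bytesize == high_src.bytesize
  raise ArgumentError, "split out of range" unless split.between?(0, low_src.bytesize)
  low_src.byteslice(0, split) + high_src.byteslice(split, high_src.bytesize - split)
end

# mame = File.binread("mame_vram.bin")
# yume = hex_to_bin(File.read("neoyume_vram.hex"))
# File.binwrite("mixed_vram.bin", splice_dumps(mame, yume, split_offset))
```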

    Here's what that's like: (note that these aren't the exact same frame ofc)
    (lines with wrong palettes added for debugging: black -> line 256, blue/red -> line 384)

    MAME full VRAM dump

    MAME low VRAM dump + NeoYume high VRAM dump

  • Wuerfel_21Wuerfel_21 Posts: 3,137
    edited 2022-08-12 18:52

    Ok, the issue is that the background's Y position is off (in this case by exactly 128?), which puts the seam between lines 255 and 256 in the visible area. Quick primer in case you missed it: "valid" lines are 0..vshrink and (511-vshrink)..511, with vshrink being 255 for a full-size sprite (the dump images above use 191 to shrink by ~25%). Normally, invalid lines are filled with the last valid line (255 for lines <= 255, line 256 for lines >= 256) because the shrink lookup ROM contains $FF at those positions, but when the height is set to 33, some logic makes it wrap back around to 0/511. So you get a nice wrapping tilemap as long as you keep that seam off the screen.
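The validity rule from that primer, restated as code. This only models which lines count as valid, as described in the post; it does not model the fill/wrap behaviour or the real LSPC logic. It also shows where the "~25%" comes from: with vshrink = 191, 384 of 512 lines are valid.

```ruby
# Valid lines for a sprite with vertical shrink value vshrink (0..255), per the
# description above: 0..vshrink and (511 - vshrink)..511.
def valid_line?(line, vshrink)
  raise ArgumentError, "line out of range" unless (0..511).cover?(line)
  raise ArgumentError, "vshrink out of range" unless (0..255).cover?(vshrink)
  line <= vshrink || line >= 511 - vshrink
end

def valid_line_count(vshrink)
  (0..511).count { |line| valid_line?(line, vshrink) }
end

[255, 191].each do |vs|
  n = valid_line_count(vs)
  puts format("vshrink %3d: %3d/512 lines valid (%.0f%%)", vs, n, 100.0 * n / 512)
end
```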

    Weird thing though: this oopsie gets written to VRAM like that. So it's actually a 68000 issue?
    So why does a similar scaling seam issue crop up in different games (though less obnoxiously)? I guess Thrash Rally maybe shares some code with Neo Drift Out despite being by different companies. But I think some of the stage intros in samsho5 also have troubles. Do they all use some "adjust scroll position" snippet from some sort of developer documentation? Am I just going insane?


    Unrelatedly, the issue with Spinmaster is that it uses the Timer IRQ (which isn't emulated, because lmao) to do... something. Oh well, at least there's no mystery there.

  • Wuerfel_21Wuerfel_21 Posts: 3,137
    edited 2022-08-13 15:45

    Wrote a page that tries to explain the configuration of PSRAM for NeoYume.

    IDK how helpful it'd really be to someone who doesn't know what they're doing already, but oh well.

  • evanhevanh Posts: 13,608

    Good read. And good hints on use.

    One detail: I think PSRAM_SYNC_DATA applies to tx data only, since rx data registration is controlled automatically as part of the PSRAM_DELAY mechanism.

  • No, it isn't. Registration is the same for tx/rx and is controlled entirely by PSRAM_SYNC_DATA.

  • Wuerfel_21Wuerfel_21 Posts: 3,137
    edited 2022-08-13 21:29

    Also updated the MegaYume readme a bit and copied over the RAM config text. Odd that I decided to use a TXT file for its readme.

    Also, minor 68000 optimizations (cutting 2 cycles in a lot of paths by skipping the set of mk_shiftit, removing a bunch of ops in the CMPA path, and replacing two TESTBs with an RCZR in the absolute/PC-relative addressing mode selection).
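Why one RCZR can replace two TESTBs: with WCZ it drops the two low bits of D into C and Z in a single instruction. A Ruby model following the semantics as I understand them from the P2 instruction list (D = {C, Z, D[31:2]}, C = D[1], Z = D[0]); it is an illustration, not MegaYume code, and treating the input as a mode/register field is only an example.

```ruby
# Ruby model of P2 RCZR on a 32-bit D: D = {C, Z, D[31:2]}, C = D[1], Z = D[0].
# c and z are the incoming flags (0 or 1); returns [new_d, new_c, new_z].
def rczr(d, c, z)
  new_d = (c << 31) | (z << 30) | ((d & 0xFFFF_FFFF) >> 2)
  [new_d, (d >> 1) & 1, d & 1]
end

# Model of TESTB D,#bit writing a flag: returns D[bit].
def testb(d, bit)
  (d >> bit) & 1
end

# One RCZR ... WCZ yields the same two flags as TESTB D,#1 WC + TESTB D,#0 WZ
# (at the cost of rotating D, fine if D is a scratch copy).
(0..7).each do |field|
  _, c, z = rczr(field, 0, 0)
  raise "mismatch for #{field}" unless c == testb(field, 1) && z == testb(field, 0)
end
puts "RCZR flags match two TESTBs for fields 0..7"
```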

    On track to release 1.0 for both emulators next week.

  • evanhevanh Posts: 13,608
    edited 2022-08-14 03:21

    When I said rx there, I meant only the receiving data phase of the rx routine. The CA phase is still controlled by PSRAM_SYNC_DATA.

    I know that automatically controlling pin registration is how Roger gets the extra combinations out of the compensating delay. That delay, PSRAM_DELAY, only applies to the receiving data phase, unless he's changed things in the newer revisions of the driver code.

  • But I'm only using Roger's code for the load phase. The parameters are oriented around my own read code.

  • evanhevanh Posts: 13,608

    Oh, I see. Almost surprising you've used his driver at all then.

  • Hi Ada,
    perhaps you might by chance be willing to provide some help for people who want to use your Z80 and/or MC68000 emulators?
    I am wondering whether I should dig into a CP/M-68k emulator for the P2. I have never used CP/M (and would have preferred OS9-68k, because it has multitasking, which would be interesting with the P2). But I have read that CP/M can be used freely now and that it comes with a C compiler. And there is µemacs.
    At the moment, I have an emulator working on a Raspberry Pi, and I have read some manuals.
    I would use a P2 Kiss board with no additional RAM, but with VGA output at 200 MHz, PS/2 or USB keyboard input, and an SD card. It seems sensible to me to aim for a system with about 256K for CP/M, or a little more (?).
    I would need some starting point for the 68k emulator to get it to execute a program from hub RAM. A separate 68k emulator file would be very helpful...
    The documentation for CP/M-68k seems to be quite good, so writing a BIOS seems feasible, even for me (???!!): https://www.ndr-nkc.de/compo/68000/cpm68k6.htm
    Christof
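One way to keep the BIOS small (an assumption about approach, not something either emulator is known to provide) is to trap BIOS calls in the host rather than writing them in 68k code. The sketch below assumes the CP/M-68K convention as I recall it (BIOS entered via TRAP #3, function number in D0.W, argument in D1, result in D0, functions 2/3/4 = console status/input/output); verify all of this against the System Guide linked above. `HostBios` and the register hash are hypothetical.

```ruby
# Hypothetical host-side BIOS for a CP/M-68K port: the emulator would call
# HostBios#call when the guest executes TRAP #3. Register convention and
# function numbers are assumptions to check against the CP/M-68K System Guide.
class HostBios
  attr_reader :output

  def initialize(input = [])
    @input = input.dup     # pending keyboard bytes
    @output = String.new   # everything written to the console (binary string)
  end

  # regs: Hash with :d0 / :d1 as integers; returns the value for D0.
  def call(regs)
    case regs.fetch(:d0) & 0xFFFF
    when 2 # console status: 0xFF if a character is waiting, else 0
      @input.empty? ? 0 : 0xFF
    when 3 # console input (a real BIOS would block; this sketch returns 0)
      @input.shift || 0
    when 4 # console output: low byte of D1
      @output << (regs.fetch(:d1) & 0xFF).chr
      0
    else
      raise NotImplementedError, "BIOS function #{regs[:d0]} not implemented in this sketch"
    end
  end
end

bios = HostBios.new([0x41])
bios.call(d0: 4, d1: 0x48)
bios.call(d0: 4, d1: 0x69)
puts bios.output
```

Disk functions (select disk, set track/sector/DMA, read/write sector) would slot into the same `case` and could map directly onto a disk image file on the SD card.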
