Okay, either I'm going mad (i.e. fixed something else by accident while hunting) or adding the scanline counter is the single most useful change ever (I don't even think it's quite correct!).
Fixes hangs and blackscreens in (some of these aren't in any public gamedb file yet):
Cyber-Lip (previously black screen, now broken sound (???))
Karnov's Revenge (previously black screen after eyecatcher, now works(?))
Metal Slug 3 (previously black screen, now broken sound (needs M1 refactor))
Metal Slug 4 (previously black screen, now broken sound (NEO-PCM2) and broken fix (???))
Matrimelee (previously black screen, now broken sound (NEO-PCM2) and probably broken fix (haven't decrypted it yet))
Samurai Shodown 3 (previously hang during gameplay, now works(?))
Sengoku 2 (previously hang during gameplay, now works(?))
Viewpoint (previously instant hang, now garbled gfx)
That's 4 of the 6 "broken" rated games in current master gamedb that can be changed to "issue" or "ok" rating. (Waku Waku 7 and Spinmaster remain blackscreening)
Will have to go over this again tomorrow to cleanly commit.
Also need to figure out what to do about the NEO-PCM2 nonsense (I think the scrambling is not as obnoxious as CMC scrambling, so could probably just be done automagically)...
' Loader loop: read a chunk from SD into MKRAM_BASE, transform it, write it to external RAM
repeat while pos < size
  case type
    LOAD_RAW: ' Only type that can load arbitrary lengths
      tmp := $10000 <# size-pos
      c.fread(MKRAM_BASE,1,tmp,f1)
      exmem_write(target+pos,MKRAM_BASE,tmp,false)
      pos+=tmp
    LOAD_BSWAP: ' Swap the two bytes of every 16-bit word
      c.fread(MKRAM_BASE,1,$10000,f1)
      repeat i from MKRAM_BASE to MKRAM_BASE+$FFFC step 4
        long[i] := __movbyts(long[i],%%2301)
      exmem_write(target+pos,MKRAM_BASE,$10000,false)
      pos+=$10000
    LOAD_CROM: ' Merge words from the two input files (f1, f2) with __mergeb
      c.fread(MKRAM_BASE+$0000,1,$4000,f1)
      c.fread(MKRAM_BASE+$4000,1,$4000,f2)
      tmp := MKRAM_BASE+$8000
      repeat i from 0 to 255
        k := MKRAM_BASE+i*64
        repeat 16
          long[tmp] := __mergeb(word[$0020+k]+word[$4020+k]<<16)
          tmp += 4
          long[tmp] := __mergeb(word[$0000+k]+word[$4000+k]<<16)
          tmp += 4
          k+=2
      exmem_write(target+pos,MKRAM_BASE+$8000,$8000,false)
      pos+=$8000
    LOAD_SROM: ' Reorder bytes in place by permuting the low 5 address bits
      c.fread(MKRAM_BASE,1,$10000,f1)
      repeat i from $ffff to $0000
        j := i
        repeat
          j.[4..0] := j.[4..2] + (j.[1..0]^2)<<3
        while i < j
        tmp := byte[MKRAM_BASE+i]
        byte[MKRAM_BASE+i] := byte[MKRAM_BASE+j]
        byte[MKRAM_BASE+j] := tmp
      exmem_write(target+pos,MKRAM_BASE,$10000,false)
      pos+=$10000
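The LOAD_SROM branch above reorders a buffer in place by following permutation cycles instead of copying into a second buffer. Here is a small Python model of that idea. It assumes my reading of the Spin2 expression is right (new low 5 address bits = old bits 4..2, plus old bits 1..0 XOR 2 moved up to bits 4..3); the function names are made up for illustration and are not from NeoYume.

```python
def srom_source_index(i):
    """Address permutation as read from the loader: only the low 5 bits change."""
    low = i & 0x1F
    new_low = (low >> 2) | (((low & 3) ^ 2) << 3)
    return (i & ~0x1F) | new_low

def permute_in_place(buf):
    """Cycle-following reorder, walking i from the top down like the loader does."""
    for i in range(len(buf) - 1, -1, -1):
        j = i
        while True:
            j = srom_source_index(j)   # chase the cycle...
            if not (i < j):            # ...until we land at or below i
                break
        buf[i], buf[j] = buf[j], buf[i]

def permute_gather(buf):
    """Reference version using a second buffer: out[i] = in[perm(i)]."""
    return bytearray(buf[srom_source_index(i)] for i in range(len(buf)))

if __name__ == "__main__":
    data = bytearray(range(64))        # length must be a multiple of 32
    expected = permute_gather(data)
    permute_in_place(data)
    print(data == expected)
```

The in-place version needs no scratch buffer, which matters when the 64K work area is already full, at the cost of a few extra address calculations per byte.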
Comments
Tried NeoYume with some of my self-made boards.... The ones that take a P2 Edge module seem to work. All my own with LDO regulators just don't cut it. One works for a while, then the screen starts shimmering, then it eventually goes off the rails... Probably need to look into switching regulators...
What kind of memory are you using on the non-edge units? You often just need to twiddle the DELAY up and down (the multibank branch from the other thread has some more fine controls over memory timing)
(Unless it like dies completely, no video at all. That does mean the P2 CPUs are having a good time)
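For anyone who would rather not twiddle DELAY by hand, one common approach is to sweep every setting, run a memory test at each, and pick the middle of the widest passing window. The sketch below only shows the selection step. It is not how NeoYume or the PSRAM driver choose their delay, and the pass/fail list is assumed to come from some memory test of your own.

```python
def pick_delay(results):
    """Return the centre of the longest run of passing delay settings.

    results: list of booleans, index = delay setting, True = memory test passed.
    Returns None when no setting passes.
    """
    best_start, best_len = None, 0
    run_start = None
    # A trailing False closes a run that reaches the last setting.
    for delay, ok in enumerate(list(results) + [False]):
        if ok and run_start is None:
            run_start = delay
        elif not ok and run_start is not None:
            run_len = delay - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2

if __name__ == "__main__":
    sweep = [False, False, True, True, True, False, True]
    print(pick_delay(sweep))
```

Picking the centre rather than the first passing value leaves margin on both sides for temperature and supply drift, which seems to be exactly what bites the LDO boards here.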
Even @rogloh 's video tester fails at 325 MHz (XGA) mode.
Put one of my boards in the freezer and that let it work for about 2 seconds before going haywire.
Not sure if it's the CPU overheating or the LDO power supplies, or both...
Just wondering, if Neo Geo pixel clock is ~ 6MHz then why is emulator sysclk ~ 338MHz?
Likely more for graphics performance...
338 ~= 24.1 MHz * 14 ~= 6 MHz * 56 ~= 6 MHz * 5.5 * 10
Plotting a pixel takes some ~10 cycles, factoring in all overheads. 96 Sprites can overdraw each pixel 4.5 times and then there's the fix layer. So 5.5x overdraw. Most lines have plenty time to spare.
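The arithmetic from the two posts above can be written out as a quick check. All numbers are the rounded figures quoted in the thread; the constant and function names are just for illustration.

```python
MASTER_CLOCK_MHZ = 24.1    # "24.1 MHz" as quoted above
PIXEL_CLOCK_MHZ = 6.0      # "~6 MHz" pixel clock
CYCLES_PER_PIXEL = 10      # "~10 cycles" to plot one pixel, overheads included
SPRITE_OVERDRAW = 4.5      # 96 sprites can cover each pixel 4.5 times
FIX_LAYER_OVERDRAW = 1.0   # the fix layer adds one more pass

def required_sysclk_mhz():
    """Worst-case clock needed to draw every pixel of a line in time."""
    overdraw = SPRITE_OVERDRAW + FIX_LAYER_OVERDRAW
    return PIXEL_CLOCK_MHZ * overdraw * CYCLES_PER_PIXEL

def chosen_sysclk_mhz(multiplier=14):
    """The clock actually used: a whole multiple of the quoted 24.1 MHz."""
    return MASTER_CLOCK_MHZ * multiplier

if __name__ == "__main__":
    print(f"needed ~{required_sysclk_mhz():.0f} MHz, chosen ~{chosen_sysclk_mhz():.1f} MHz")
```

So the worst case needs roughly 330 MHz, and 14 times the quoted 24.1 MHz lands a little above that.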
Was just on gog.com and looking at Metal Slug games...they have a bunch. Which ones will work properly on P2 NeoYume and which ones need the 64MB RAM or higher?
1 works on 32MB
2 works on 48MB
X works on 64MB (note: need to uncomment all the C roms in gamedb.)
3,4,5 need much more RAM and also have CMC encryption (which would take ages to undo on load (address XOR), so I've settled on requiring one to provide pre-decrypted sets, which I'm not entirely sure what the easiest way to generate them is. Could just write my own tool I guess.)
3 and 5 also have 512k M1, would need some refactoring to work.
Thanks for that. Now I have 64MB I can probably try more of these, the original is quite a good game.
So... Metal Slug 4 is blackscreening. After getting incredibly confused for a while, I think I got something:
That's checking the scanline counter! I never actually implemented that (or anything else in LSPC status register).
How odd that it does that.
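To show why a missing scanline counter turns into a black screen, here is a toy model of a game polling the LSPC status register for a particular line. The bit layout is an assumption from my memory of community hardware notes (9-bit line counter in bits 15..7 running from $0F8 to $1FF, bit 3 as the 50 Hz flag, auto-animation counter in bits 2..0), not something taken from the NeoYume source, and all names are made up.

```python
FIRST_LINE = 0x0F8       # assumed first value of the raster counter
LINES_PER_FRAME = 264    # assumed number of lines per frame

def lspc_status(line_in_frame, auto_anim=0, is_50hz=False):
    """Pack the assumed status word for a given line within the frame."""
    counter = FIRST_LINE + (line_in_frame % LINES_PER_FRAME)
    return ((counter & 0x1FF) << 7) | (int(is_50hz) << 3) | (auto_anim & 7)

class FakeLspc:
    """Stand-in for the emulated register; advances one line per read."""
    def __init__(self, implemented=True):
        self.implemented = implemented
        self.line = 0

    def read(self):
        if not self.implemented:
            return 0                   # unimplemented register: never changes
        value = lspc_status(self.line)
        self.line += 1
        return value

def wait_for_line(target_counter, read_status, max_polls=100_000):
    """What a game's busy-wait boils down to: poll until the counter matches."""
    for polls in range(max_polls):
        if (read_status() >> 7) == target_counter:
            return polls
    raise TimeoutError("counter never reached target (the 'black screen' case)")

if __name__ == "__main__":
    print(wait_for_line(0x100, FakeLspc().read))
```

With the register unimplemented the loop never exits, which on real game code means a hang rather than a tidy timeout.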
Good find!
Okay, following stuff is now on beegram branch:
- LSPC status register implemented
- fixed psram8drv
- embiggened gamedb
- games that exceed RAM size greyed out
- Press B in menu to toggle Japanese/international BIOS
Will be merged to master when I get the P2EDGE back on the desk to make sure I haven't bungled it.
Through the combined power of the scanline counter and loader improvements, I present, terrible 2AM Metal Slug 4 gameplay (yep, we skipped over 3 because that one has broken sound due to using 512K M1, as previously mentioned. It does work otherwise by the same token (well, it doesn't have PCM2))
Also a good one to demonstrate why sysclk/4 is suck
This is with the 96 MB board right? Glad you can at least test out big games.
Sounds like we need a 16-bit bus for best results...
Or 8-bit DDR (sysclock/1).
Will it ever be reliable enough though? Can anyone make a board that has fine enough clock control for delay tuning and use it with P2-EVAL? We couldn't really achieve it reliably over the full frequency range with HyperRAM and P2-EVAL as I recall. Maybe on a dedicated P2 board with really short wiring it's feasible, but the P2-EVAL itself introduced some timing skew etc. and some header positions were better than others. We'd need a way to change the phase of the clock that ideally has several more steps than what we can coarsely achieve with registered/unregistered clock and data pins, and some process to track errors and adjust it over time. That complicates the board design. In comparison, 16-bit PSRAM at sysclk/2 is much easier to manage vs 8-bit DDR with the same bandwidth but fewer I/O pins.
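A quick way to compare the bus options mentioned above is peak bytes per system clock. This sketch reads "sysclk/N" as one bus-wide transfer every N system clocks and ignores command, latency and refresh overhead, so treat the figures as upper bounds only. The 338 MHz clock is the figure quoted earlier in the thread.

```python
def peak_bytes_per_sysclk(bus_bits, sysclks_per_transfer):
    """Peak payload rate for a bus of bus_bits moving one word every N sysclks."""
    return (bus_bits / 8) / sysclks_per_transfer

def peak_mb_per_s(bus_bits, sysclks_per_transfer, sysclk_mhz=338.0):
    """Same figure scaled by the system clock, in MB/s (decimal)."""
    return peak_bytes_per_sysclk(bus_bits, sysclks_per_transfer) * sysclk_mhz

if __name__ == "__main__":
    options = {
        "8-bit @ sysclk/4": (8, 4),
        "8-bit @ sysclk/2": (8, 2),
        "16-bit @ sysclk/2": (16, 2),
        "8-bit DDR @ sysclk/1": (8, 1),
    }
    for name, (bits, div) in options.items():
        print(f"{name}: {peak_mb_per_s(bits, div):.1f} MB/s peak")
```

On these assumptions 16-bit at sysclk/2 and 8-bit DDR at sysclk/1 come out equal, and both are four times the 8-bit sysclk/4 case.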
Hey @Wuerfel_21 , is two player mode working right now with NeoYume?
I have one of these USB to arcade button breakout boards below and was thinking I could potentially use it somehow if I build something using arcade controls. It has barely enough buttons (4 joystick switches + 12 = 16 inputs) but it might work with something like that MetalSlug game with 7 total buttons per player plus two spare player start pins maybe?
https://www.altronics.com.au/p/s1148a-usb-interface-for-arcade-joystick-and-buttons/
Of course I could also direct wire switches to the P2 but it might be nice to have USB as I could use it for other things apart from the P2.
It can emulate two players no problem, but the USB driver can only read one device at a time. But I guess your encoder board shows up as a single device, so it would work. You'll be short D and select buttons though.
Probably not as an add-on.
That said, there is room for improvement over the Hyper add-on because I believe the SCLK capacitor isn't needed if data pins have lighter relative loading. The Hyper add-on had a SCLK for each memory IC. That meant SCLK had lighter loading than the data pins, which was unsuitable for needed data setup time when data pins were registered. And unregistered data pins had too wide a skew.
Thanks, yeah I just got it going with my single gamepad... I used the spare trigger switches for player 2 start and move right, and the D button for jump. Could just about drag along Player 2 with Player 1 to see how it works with 2 players. I just had to copy player 2 switch button states into the upper 16 bits of the pad_tmp variable in the USB driver, so I think it's easy enough to achieve if this controller board reports in a standard format... about to check that now.
Update: just tried it out with the controller board and it seems to respond just like my gamepad, but with 12 buttons plus two axes now instead of the 8+2 I had on the gamepad. So I should be able to make it work the same. Now I just need to obtain a second set of arcade controls and mount it on something...
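The pad_tmp trick described above amounts to packing two 16-bit button masks into one 32-bit value. A minimal sketch, assuming player 1 sits in the lower half and player 2 in the upper half as described; the function names and example button bits are hypothetical and not taken from the USB driver.

```python
def pack_pads(p1_buttons, p2_buttons):
    """Combine two 16-bit button masks: player 1 low half, player 2 high half."""
    return ((p2_buttons & 0xFFFF) << 16) | (p1_buttons & 0xFFFF)

def unpack_pads(pad_tmp):
    """Split the combined value back into (player 1, player 2) masks."""
    return pad_tmp & 0xFFFF, (pad_tmp >> 16) & 0xFFFF

if __name__ == "__main__":
    combined = pack_pads(0x0011, 0x8002)
    print(hex(combined), [hex(v) for v in unpack_pads(combined)])
```

With an encoder board that reports all switches as one device, the remaining work is deciding which of its inputs feed each half.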
If I have to see another KOF game I will snap (No, 2003 won't work, copy protection make brain hurt. Though Metal Slug 5 has the same nonsense, so maybe I'll have to figure it out somehow)
To make the wait time on these big games a bit shorter, I've added a load animation (actually, loading is single-threaded, so updating the display actually makes it slightly longer)
Each dot represents 256K, so one line is 8MB, exactly the capacity of one PSRAM chip.
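The dot and line sizes work out like this. The helper name is made up, and a partially loaded 256K block is counted as a full dot, which may or may not match what the menu actually draws.

```python
KIB = 1024
MIB = 1024 * KIB

DOT_BYTES = 256 * KIB     # one dot per 256K loaded
LINE_BYTES = 8 * MIB      # one line per 8MB PSRAM chip
DOTS_PER_LINE = LINE_BYTES // DOT_BYTES

def progress_layout(bytes_loaded):
    """Return (full lines, dots on the current partial line)."""
    dots = -(-bytes_loaded // DOT_BYTES)   # ceiling division
    return divmod(dots, DOTS_PER_LINE)

if __name__ == "__main__":
    print(DOTS_PER_LINE, progress_layout(96 * MIB), progress_layout(20 * MIB))
```

A full 96MB load therefore fills twelve lines of 32 dots.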
Might replace them font characters with the same little ROM icon from MegaYume.
(Yes, Samurai Shodown 5 is the largest game that kinda works now. Sound's busted though.)
Makes you realize just how slow SD is, doesn't it. For your 8-bit setup and P2 clock speed, that 96MB of PSRAM itself could be loaded in about 0.6 seconds by my driver with block copies if you had it in HUB RAM, but it takes another 24s or so to read it in and process it before writing to the PSRAM.
You might want to consider inlining some of the per long processing work in this code using asm or org blocks if that is contributing anything to the slowdown vs the SD transfer. I was able to speed up some of my own graphics code quite a lot with that approach in flexspin. Given this is done in a loop I'd go for asm block vs org block as org uses FCACHE and you don't want to suck in the inlined code zillions of times.
Of course, this is of diminishing return if the SD transfer forms the bulk of the delay... seems you are already getting at least ~30Mbps off the card if that is the majority of the work, not sure if you can go too much faster on a P2 unless you move to 4 bit SDIO transfer and that's not gonna be simple.
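Putting rough numbers on the load times quoted above: 96MB in about 24 s from SD versus about 0.6 s for the PSRAM write alone. This treats 96MB as 96 MiB and takes both times at face value, so it is only a ballpark; the names are illustrative.

```python
MIB = 1024 * 1024

ROM_BYTES = 96 * MIB
SD_SECONDS = 24.0       # "another 24s or so" to read and process
PSRAM_SECONDS = 0.6     # "about 0.6 seconds" for the PSRAM write alone

def rate_mib_per_s(size_bytes, seconds):
    """Average throughput in MiB/s."""
    return size_bytes / MIB / seconds

def rate_mbit_per_s(size_bytes, seconds):
    """Average throughput in Mbit/s (decimal megabits)."""
    return size_bytes * 8 / seconds / 1e6

if __name__ == "__main__":
    print(f"SD path:    {rate_mib_per_s(ROM_BYTES, SD_SECONDS):.1f} MiB/s "
          f"({rate_mbit_per_s(ROM_BYTES, SD_SECONDS):.1f} Mbit/s)")
    print(f"PSRAM only: {rate_mib_per_s(ROM_BYTES, PSRAM_SECONDS):.0f} MiB/s")
    print(f"ratio:      {SD_SECONDS / PSRAM_SECONDS:.0f}x")
```

That lines up with the "at least ~30Mbps" estimate for the card and shows the PSRAM side is around forty times faster than the SD path.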
It could possibly load at double speed again if that inline resistor wasn't present on the DO line.
Still faster than an Atari 8-bit cassette
God, I remember those days... 300bps cassette load times! Lucky I only had a 32kB machine to load then, and a floppy disk came only a couple of years after that. It was painful.
Maybe use the SD accessory instead, and short out that R? Has anyone tried that to prove the speed jump?
If it's significant I would think we could do another version that's wired without the R for booting; in fact, with all DAT lines included.