@Wuerfel_21 said:
Not sure if the diamond layout translates well from thumb buttons to an arcade panel.
Actual SNES arcade sticks tend to use this layout:
Relatedly, I have all the parts for the SNES accessory board, will have to solder it together later.
@ke4pjw said:
440lbs -- In the early 90's I worked in a retail electronics store. Loved watching Trinitrons. Hated taking them to customer cars.
Comments
Plasmas were a responsive alternative, but they're no more now either.
Oh thanks for that Ada, that would be much better.
The quick prototype I cut was just to get us started. I have some ideas to try out that I suspect are impractical, but I'm not going to let that stop me at this early stage.
After seeing Roger's hardware I went and picked up the same encoder board and some buttons. They had half-price yellow buttons, so we're going with mostly yellow...
Perhaps of interest: Actual NeoGeo arcade layouts
Note the attempt to ergonomically arrange the buttons to allow you to keep one finger dedicated to each one (mini layout aside)
I used to sell a lot of SGI-badged Trinitrons... and still have some, including a 24" behemoth. I hated those things because they were so heavy that, no matter how you packed them, the UPS and FedEx monkeys could figure out how to break the plastics.
@Wuerfel_21 What performance would you have to get from the SD card to not have to load the various ROMs into RAM in order to get full-speed play? Perhaps rather than throwing more memory at it, a better-performing SD card driver (using more pins than SPI) could be implemented to get somewhere around 70MB/sec, and lower latency too, of course.
Just for NeoGeo sprite streaming, on the order of 740 MB/s of fully random reads (15750 * 96 * 512). This is because SDs are block devices, so for each tile it'd actually have to read the entire containing sector. This could be cached to some extent, but that'd still brap on worst cases. In actuality each SD read command has multiple scanlines' worth of latency.
(Do note that SD streaming in general is viable, just not for emulation)
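For anyone who wants to poke at that estimate, here is a minimal sketch of the arithmetic. It reads the figure as 15750 scanlines per second times 96 sprite tiles per line times one 512-byte sector per tile; that reading and all the names are assumptions for illustration, not code from the emulator.

```python
# Rough worst-case estimate for streaming NeoGeo sprite tiles straight from SD.
# The three factors are read off the post above; treat them as assumptions.
LINES_PER_SECOND = 15750   # scanlines drawn per second
SPRITES_PER_LINE = 96      # sprite tiles fetched per scanline, worst case
SECTOR_BYTES = 512         # an SD card always returns a whole sector


def sd_streaming_bytes_per_second(lines=LINES_PER_SECOND,
                                  sprites=SPRITES_PER_LINE,
                                  sector=SECTOR_BYTES):
    """Bytes per second if every tile fetch costs one full sector read."""
    return lines * sprites * sector


required = sd_streaming_bytes_per_second()
suggested = 70 * 1000 * 1000   # the 70MB/sec driver speed floated above

print(required, "bytes/s")
print(round(required / 2**20), "MiB/s")
print(round(required / suggested, 1), "times the suggested SD speed")
```

So even a 70MB/sec SD interface would be roughly an order of magnitude short in the worst case, before counting per-command latency.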
Another option is to use 3x32MB HyperFlash chips and run code directly from them like a cartridge. That could work but it would be expensive. No slow SD load time though which is nice for huge games, apart from plugging in a cartridge.
If you wanted to build a real arcade machine with just a single game, that is always ready to run as soon as you power it on and price wasn't an issue, that could be one way to go. But for that it's even simpler to boot from SD and leave the P2 board powered on all the time, not quite so green though.
If SYSCLK/2 timing on PSRAM is about 338MB/second minus overhead, how is this accomplished then? Aren't there different types of ROMs in those huge cartridges that have big sizes but need less bandwidth, like sound/music data? My thinking is that the data with higher speed requirements be loaded into PSRAM, but the lower-speed data stay on the SD card (with a faster interface) so that more games will fit within the 32MB on the P2-EC32MB.
It's accomplished because the emulator only reads what is required at the time, not entire 512-byte sectors, which is how Ada calculated things above if I'm not mistaken. Admittedly there's a small cache for instructions, but it is not 512 bytes and it also reduces the number of times the memory requires reading. The ADPCM sample and sprite data reads transfer only about 2-4 longs each time in the code. So basically lots of very small transfers over a large address space (random access) for quite detailed graphics and/or lots of game levels. If you knew which levels needed which graphics you could cache from SD into the smaller PSRAM, but that really needs to be built into the game itself to work, and it would also slow down the game between levels to refill. NeoGeo just has a large address space available at all times.
The music could be streamed off the SD, couldn't it? How big is that in the larger games?
Streaming the ADPCM is possible in theory (since it always reads sequentially in blocks of 256 bytes), but not worth it since it's significantly smaller than the graphics, and having big latency would cause all sorts of edge case issues.
That's exactly what happens on NeoGeo CD. There's 4MB of video RAM that's (incredibly slowly) loaded from CD between levels. Not sure if the games are exactly identical to the cartridge versions, they may have cut some bits to make everything fit.
Behold the might of the ILI9342 320x240 LCD
No, it isn't quite as washed out in real life.
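This attaches using 12 pins: one 8 pin block has to connect to HSync, VSync and D0..D5 (in that order), then it also needs a dot clock and 3 pins for configuration somewhere. In actuality, you probably want to connect reset and backlight control, too.
Interesting things:
- The "horizontal scan direction" bit in MADCTL doesn't actually do anything (v scan dir does), so you cannot actually mirror the display (fair enough)
- If bypass mode is disabled in IFMODE (bit 7), all the RAM write order bits in MADCTL become functional, including row/column swap. This may allow it to work to some extent with the 240x320 controllers?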
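To make the 12-pin hookup concrete, here is a small sketch that builds a pin table from the description above. The function name, the signal labels (DOTCLK, CFG0..CFG2) and the example pin numbers are all hypothetical; only the ordering HSync, VSync, D0..D5 within the 8 pin block comes from the post.

```python
# Signal order within the 8-pin block, as described in the post.
BLOCK_SIGNALS = ["HSYNC", "VSYNC"] + [f"D{i}" for i in range(6)]


def lcd_pin_map(base_pin, dot_clock_pin, config_pins):
    """Return {signal name: pin number} for the 12-pin LCD hookup.

    base_pin is the first pin of the 8-pin block; dot_clock_pin and the
    three config_pins can sit anywhere else. All names are illustrative.
    """
    if len(config_pins) != 3:
        raise ValueError("expected exactly 3 configuration pins")
    pins = {name: base_pin + i for i, name in enumerate(BLOCK_SIGNALS)}
    pins["DOTCLK"] = dot_clock_pin
    for i, pin in enumerate(config_pins):
        pins[f"CFG{i}"] = pin
    if len(set(pins.values())) != len(pins):
        raise ValueError("a pin number was assigned to two signals")
    return pins


# Example with made-up pin numbers.
for name, pin in lcd_pin_map(0, 8, [9, 10, 11]).items():
    print(f"{name:7s} -> pin {pin}")
```

Reset and backlight control would be extra pins on top of these twelve.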
Well done. If I can only figure out that LCD controller interface type on my handheld maybe I can get that to work too. I didn't notice tearing so it might be a real 320x240 setup, not 240x320.
Hey, in the picture above, is the last row with the CREDIT counters meant to be slightly wider than all those lines above it? The rest of the screen sort of looks underscanned relative to that last line. Did the original Metal Slug arcade game on NeoGeo do this too, or is it a feature of your emulator? Maybe it was normally hidden by a CRT bezel or something?
Oh yeah, that's just something the game does. It tries to mask out those outermost columns with black tiles for some reason, but presumably forgets that when clearing the last line to draw the credit counters (the routine to do so is likely copy-pasted from another game...). An actual monitor overscans most of this, so I guess they never noticed it.
Okay, just committed the LCD support code (and SNES pad input, because I forgot to do it earlier). Still on that side branch though. The main thing holding up a merge to master is the audio nonsense. Maybe I need to look into all the games with the totally busted audio...
Also still haven't looked at @rogloh's sysclk/3 code, ouch.
Then I'll have a deserved break.
Yeah you should try out sysclk/3 when you can. With any luck it will let you run Rayman's 96MB board a little faster to reduce latency - every little bit helps.
Such a pity we can't get dual USB controllers going with this somehow. Needs hub support I guess or another COG.
It's not impossible. It just requires you to sell your soul to some sort of supernatural entity in exchange for being able to perceive the true form of the USB specification instead of being consumed by its memetic kill aura.
...
By which I mean I'm pretty sure one could rewrite it to support two ports, but it's a task no one wants to do.
The pin and SNES input drivers do allow 2 players.
LOL, yeah. It's a soul eater isn't it.
Okay, some much needed investigation into those "Broken Sound" games on neoyume is what I'm going to be doing now (in lieu of trying to chase the much harder to capture "random sounds" bug, hoping they share a root cause).
First insight: Sonic Wings 2 (which plays some sound but seems to die halfway through - hard to explain and I'm too lazy to hook the capture up right now) actually stops generating Z80 IRQs when it dies. Have to figure out how that happens (Z80-side IRQ inhibit or timers disabled).
EDIT: Through the incredibly obvious technique of "log timer IRQ pending state on NMI entry", I have discovered that it is indeed the timers being disabled (the minimum frequency for timers is > 10Hz, so it can't be a bung freq value) - time to go deeper and log individual register writes.
Indeed, garbo value goes to timer control register:
$C0 is a totally invalid value, too (disable both timers and IRQs, enable ch3 special, enable CSM (nonsense without Timer A and arguably nonsense in general)). Interesting that it happens after the correct $35 was written (which means: Timer A enabled with IRQ, Timer B disabled, clear both IRQ flags, disable special modes)
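A quick sketch of how those two values break down, assuming the usual bit layout for the timer control register: bits 0/1 run Timer A/B, bits 2/3 enable their IRQs, bits 4/5 clear the flags, and the top two bits select the special modes as read in the post. The field names are made up for illustration.

```python
def decode_timer_control(value):
    """Split a timer control register value into named bits (assumed layout)."""
    return {
        "timer_a_run":  bool(value & 0x01),
        "timer_b_run":  bool(value & 0x02),
        "timer_a_irq":  bool(value & 0x04),
        "timer_b_irq":  bool(value & 0x08),
        "clear_flag_a": bool(value & 0x10),
        "clear_flag_b": bool(value & 0x20),
        "ch3_special":  bool(value & 0x40),
        "csm":          bool(value & 0x80),
    }


for value in (0x35, 0xC0):
    active = [name for name, on in decode_timer_control(value).items() if on]
    print(f"${value:02X}: {', '.join(active)}")
```

Under that layout $35 sets exactly the Timer A run, Timer A IRQ and both flag-clear bits, while $C0 sets only the two mode bits and leaves both timers and their IRQs off.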
Okay, I think I'm onto something. This is a function that writes a garbage timer control value. Notice that it actually intends to write to register $28 (tctrl is $27, as it were; $28 is key-on/key-off). Notice that it runs EI before writing the data port, so I think what happens is that the IRQ hits between EI and OUT, and the IRQ handler clears the timer flags but, in doing so, leaves the timer control register selected, which is then clobbered by the OUT as soon as the IRQ returns.
I think a real Z80 cannot trigger an IRQ immediately after an EI.
Correct, /INT is ignored at end of EI or DI (but /NMI is accepted).
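Here is a toy model of that rule, just to show the ordering difference. It is not the emulator's Z80 core; the class, its fields and the string "opcodes" are all invented for the sketch. The only behaviour it models is that a maskable interrupt is not accepted in the slot right after EI.

```python
class TinyZ80:
    """Toy interrupt model: only EI/DI and the one-instruction EI delay."""

    def __init__(self, ei_delay=True):
        self.iff1 = False        # maskable interrupts enabled?
        self.ei_shadow = False   # True only for the instruction after EI
        self.ei_delay = ei_delay
        self.irq_line = False    # pending /INT request
        self.trace = []

    def step(self, op):
        # Interrupts are sampled before the next instruction is executed.
        if self.irq_line and self.iff1 and not self.ei_shadow:
            self.iff1 = False
            self.irq_line = False
            self.trace.append("IRQ")
        self.ei_shadow = False
        if op == "EI":
            self.iff1 = True
            self.ei_shadow = self.ei_delay
        elif op == "DI":
            self.iff1 = False
        self.trace.append(op)


def run(ei_delay):
    cpu = TinyZ80(ei_delay=ei_delay)
    cpu.irq_line = True          # timer IRQ already pending
    for op in ["EI", "OUT", "NOP"]:
        cpu.step(op)
    return cpu.trace


print("without delay:", run(False))
print("with delay:   ", run(True))
```

Without the delay the handler runs between EI and OUT, which is the clobbering sequence described above; with it, the OUT completes first.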
Yep, that was easy. Sonic Wings 2 and 3 have functioning sound now. Can't check all games right now (96MB board still upstairs and it's past midnight), but it doesn't seem to fix any of the totally-no-sound games (though I have an idea for some of those now - they may not have a NEO-ZMC on their cartridge, so we need to init the bankswitch to a linear map - I saw MAME does that).
Will have to see if the random sounds bug is also caused by this EI issue. The games really prone to it (samsho3 and 4) need the big memory, so can't test them right now. It would make sense though if the IRQ handler were to write an ADPCM register last and the main code had a reg write sequence like that. (though the samsho3 issue I think was triggering a sound with start address $0000, which seems unlikely as a result of a single reg corruption. Then again, maybe the driver clears the sample start register as part of what it's doing and then the corrupted write triggers the channel)
Indeed, the bankswitch pre-init thing does in fact fix Cyber-Lip and its stupid ZMC-less 64K M1 setup.
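For illustration, a sketch of what "init the bankswitch to a linear map" works out to. The window layout below (four switchable windows of 2K, 4K, 8K and 16K above a fixed 32K region) is my assumption about the Z80 banking scheme, not something taken from the neoyume source, and the function names are hypothetical.

```python
# Assumed switchable windows in the Z80 address space: (base address, size).
WINDOWS = [
    (0xF000, 0x0800),   # 2K window
    (0xE000, 0x1000),   # 4K window
    (0xC000, 0x2000),   # 8K window
    (0x8000, 0x4000),   # 16K window
]


def linear_bank_numbers(windows=WINDOWS):
    """Bank number per window that makes ROM offset equal Z80 address."""
    return [base // size for base, size in windows]


def rom_offset(addr, banks, windows=WINDOWS):
    """Translate a Z80 address to an M1 ROM offset under the given banks."""
    if addr < 0x8000:
        return addr                      # fixed, unbanked region
    for (base, size), bank in zip(windows, banks):
        if base <= addr < base + size:
            return bank * size + (addr - base)
    return None                          # not ROM (work RAM above the windows)


banks = linear_bank_numbers()
print([hex(b) for b in banks])
print(hex(rom_offset(0x9234, banks)))
```

With those bank numbers preloaded, a 64K M1 ROM on a cartridge without the bankswitch chip reads back as one flat image, since nothing ever rewrites the bank registers.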
Remaining "broken sound" games:
- Windjammers, Street Hoop, Magical Drop 2 (all made by Data East, likely same driver.)
- Blazing Star (can't test on P2EDGE)
- Samurai Shodown 5/Special (can't test on P2EDGE)
- Metal Slug 3 (512K M1 not supported)
Okay, magdrop2's issue is apparently a heisenbug - starts working if I enable DEBUG. Mmmh, timing edge cases.
Indeed, enforcing a delay on the 68000 side between writes to the mailbox seems to fix it, so presumably the issue is that the 68k is triggering an NMI before the previous one has been fully processed and the Z80 code is confused by this. However, the minimum delay seems to be on the order of 3000 P2 cycles, which is hundreds of 68000 machine cycles, so I don't think this is the correct fix... Unless they're relying on something like DIVU being awfully slow to create a delay between bytes.
No processor of any sort should have anything resembling an NMI. It's a fundamental design flaw in any processor.
In this particular case the NMI isn't hardwired, you can externally disable it if you want - speaking of, that may be it: what happens when a mailbox write occurs, but the NMI is currently disabled? Does it just go nowhere or does it get held until you enable it again? Not sure what the current implementation does even.
EDIT: It seems that the current implementation is correct: the NMI is held while disabled.
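A minimal model of that "held while disabled" behaviour, for anyone following along. The class and method names are invented for the sketch and do not mirror the actual implementation; it only shows a mailbox write being latched and the NMI firing once the enable comes back.

```python
class MailboxNmi:
    """Toy 68k-to-Z80 mailbox whose NMI is latched while disabled."""

    def __init__(self):
        self.enabled = False
        self.pending = False
        self.latch = 0
        self.delivered = []      # values the Z80 NMI handler got to see

    def write_from_68k(self, value):
        self.latch = value
        self.pending = True      # request is remembered even if disabled
        self._try_deliver()

    def set_enabled(self, enabled):
        self.enabled = enabled
        self._try_deliver()      # a held request fires on enable

    def _try_deliver(self):
        if self.enabled and self.pending:
            self.pending = False
            self.delivered.append(self.latch)


mbox = MailboxNmi()
mbox.write_from_68k(0x12)        # NMI disabled: nothing happens yet
print(mbox.delivered)
mbox.set_enabled(True)           # held NMI is taken now
print(mbox.delivered)
```

Note that in this model two writes while disabled still produce only one NMI, and the handler sees the later byte.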
Turns out my NOP instruction is too fast:
The loop is supposed to take 832 68k cycles, which would be some 23k P2 cycles. A single nop should be 112 P2 cycles. So it seems just adding a 50 cycle waitx inside the NOP impl is enough to correctly fix the issue. 50 being a totally arbitrary value. Should perhaps experiment a bit.
EDIT: The lowest value that works is waitx #16, so I think I'll go with 32 to be safe.
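The cycle numbers above hang together like this, as a small worked check. It assumes the standard 4-cycle timing for a 68000 NOP; the 112 and 832 figures are from the post, and the variable names are just for the sketch.

```python
P2_CYCLES_PER_NOP = 112      # target P2 cycles for one emulated NOP (from the post)
M68K_CYCLES_PER_NOP = 4      # 68000 NOP timing (assumed standard value)
LOOP_68K_CYCLES = 832        # length of the game's delay loop (from the post)

# P2 cycles that one emulated 68000 cycle should take.
p2_per_68k = P2_CYCLES_PER_NOP // M68K_CYCLES_PER_NOP

# What the whole delay loop should cost in P2 cycles.
loop_p2 = LOOP_68K_CYCLES * p2_per_68k

print(p2_per_68k, "P2 cycles per 68k cycle")
print(loop_p2, "P2 cycles for the loop")

# The padding values tried, as a share of the 112-cycle NOP budget.
for pad in (16, 32, 50):
    print(f"waitx #{pad}: about {pad / P2_CYCLES_PER_NOP:.0%} of a NOP")
```

That gives the "some 23k P2 cycles" for the loop, and shows that even the 32-cycle padding is well under a third of what a NOP is supposed to cost.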