@Wuerfel_21 said:
I think my previous bundled SDSD was really old, but a quick test seems to indicate everything still works with the new 1.7 dropped in as-is. I pushed that to both git remotes.
Cool. It was likely v1.0. Nothing of value to your emulators has been changed or added since then.
This all is basically an artifact of MAME being the most popular arcade emulator and its developers preferring raw ROM dumps (like you'd get from pulling the chips and reading them out), so that just became the de-facto format for NeoGeo ROMs.
A wise decision by the MAME team. Everyone knows where the data stands then, no confusion. I can see why it was an easy decision for them too: it's no problem to massage the ROMs at runtime when you've got GBytes of working RAM and a modern CPU.
@Wuerfel_21 said:
You can see that the big graphics data ("Type 6", LOAD_CROM) is much slower to load than the other data. This is because it has to read from 2 files and interleave the data. This means it's reading disjoint 16K blocks instead of a nice contiguous file.
I took a stab at measuring the speed change in that video using a reduced playback frame rate and a stopwatch ... came out to 9 secs for the dual file load and 6.5 secs for the single file load. So roughly a +40% boost in load speed, or a -28% reduction in load time.
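The two percentages describe the same pair of stopwatch readings from opposite directions (throughput gained versus time saved). A minimal sketch of that arithmetic, using only the 9 s and 6.5 s figures quoted above; the function names are made up for illustration:

```python
# Stopwatch readings quoted above, in seconds.
dual_file_secs = 9.0
single_file_secs = 6.5


def speedup_percent(old_secs, new_secs):
    """Throughput gain when the same data loads in new_secs instead of old_secs."""
    return (old_secs / new_secs - 1.0) * 100.0


def time_saved_percent(old_secs, new_secs):
    """Reduction in wall-clock load time."""
    return (1.0 - new_secs / old_secs) * 100.0


print(f"speed boost: {speedup_percent(dual_file_secs, single_file_secs):.1f}%")
print(f"time saved:  {time_saved_percent(dual_file_secs, single_file_secs):.1f}%")
```

Given the stopwatch method, the last digit should not be taken too seriously.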
@Rayman said:
@Wuerfel_21 Are your emulators going to work with this LCD?
IDK I don't have one or any specs. I think we talked about this once in some e-mails, but who knows what of that made it to the final product. (My influence did result in upgrading to 16 bit wide PSRAM)
This is the schematic used for the LCD panel shown in the video.
I opted for a parallel 24 bit RGB display driven by only 16 P2 pins, latching the RGB bus in groups of 8 bits.
In this way I was able to pick a good TFT panel (IPS) from a reliable supplier at a fraction of the cost of any HDMI LCD.
With this kind of interface it is easy to switch to other panels with different dimensions and resolutions. There are a lot of options available on the market, like 2.8", 3.5", 4.3", 5", 7" etc.
I found the parallel RGB interface faster and also easier to implement (no weird init routines on another control interface), just pump out the pixels and it works (with proper timings of course :-)
I didn't notice any issues with data hold timing, at least with a "bit bang" implementation that doesn't use the P2 streamer.
Three latches are present on the board but probably two would be enough.
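To make the latching idea concrete, here is a toy Python model of a shared 8-bit bus feeding three latches that together present 24 bits to the panel. It is only a sketch: the R, G, B write order, the latch names and the class itself are assumptions for illustration, not taken from the schematic, and no real timing is modelled.

```python
def latch_sequence(rgb24):
    """Split a 24-bit 0xRRGGBB pixel into three (latch_name, byte) bus writes.

    The R, G, B write order is an assumption for illustration.
    """
    if not 0 <= rgb24 <= 0xFFFFFF:
        raise ValueError("pixel must fit in 24 bits")
    return [
        ("R", (rgb24 >> 16) & 0xFF),
        ("G", (rgb24 >> 8) & 0xFF),
        ("B", rgb24 & 0xFF),
    ]


class LatchedPanelModel:
    """Toy model of three 8-bit latches feeding a 24-bit RGB panel input."""

    def __init__(self):
        self.latches = {"R": 0, "G": 0, "B": 0}

    def write(self, name, value):
        # Pulsing one latch enable captures the shared 8-bit bus into that latch.
        self.latches[name] = value & 0xFF

    def panel_input(self):
        # What the panel would see on its 24 data lines once all latches are loaded.
        return (self.latches["R"] << 16) | (self.latches["G"] << 8) | self.latches["B"]


panel = LatchedPanelModel()
for name, value in latch_sequence(0x12AB34):
    panel.write(name, value)
print(hex(panel.panel_input()))
```

With only two latches, one channel would have to be driven directly from the bus while the panel samples it, which is where hold timing would start to matter.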
Driving signals used on my LCD
I remember a kind of LCD6 driver in NeoVGA that should work in a similar way, but with only 6 bits at once instead of 8 bits.
@Wuerfel_21 In your opinion, would it be easy to adapt the LCD6 code to this latched 8 bit interface? Could you provide some hints on where to look to adapt the code?
Next step is trying a bigger 4.3" panel on the same interface.
Oh sorry, I've been busy over the weekend, that's why I didn't reply to your mail earlier. I did get the prototype, maybe will have time to investigate that later this week. Currently workshopping a raw 24 bit LCD interface with @rogloh. This latch stuff is more similar to the LCD816 thing from Ray, but there's N latches instead of N-1 (where 1 channel would be direct). I think that solves the hold issue mostly. DE not being latched may be trouble, but idk.
@MXX, looking at the NeoVGA code, this is how Ada's current 6 bit code is outputting pixels from what I can tell. She'll obviously be able to tell you far more, but you sort of need to find an equivalent sequence to output 2 pixels in a limited number of clock cycles. I think the total budget is sending 2 complete (double wide) RGB pixels out in 40 P2 clocks, but there is also some loop overhead on the calling side between calls to this subroutine code that needs to be accounted for too, so you don't get all those clocks to play with. I can imagine you would need to create a Smartpin pin transition sequence that generates the 3 latch enables at the correct time(s), which could get tricky to synchronize. As you don't have the streamer accessing the DE/HSYNC pins, you may need to prepend a DE signal generation instruction to this subroutine, call it for the active pixels of a line, and clear DE on the first blanking porch output at the end of a line. VSYNC handling has more time and can typically be bit banged where needed, and HSYNC has time to be handled separately as well but requires accurate syncing.
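DAT             ' LCD overlay code
                org     video_ovl_area
lcd_overlay
lcd6_mask       long    $FCFCFC00                 ' keep the top 6 bits of each R/G/B byte
scanfunc_lcd6_rgb24
                rflong  scantmp                   ' fetch first pixel from the FIFO
                and     scantmp,lcd6_mask         ' drop the low 2 bits per channel
                shr     scantmp,#8                ' shift RGB down into bits 23..0
                xcont   pixel_multiplier,scantmp  ' queue it on the streamer
                rflong  scantmp                   ' second pixel, same steps
                and     scantmp,lcd6_mask
                shr     scantmp,#8
        _ret_   xcont   pixel_multiplier,scantmp  ' queue it and return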
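For anyone adapting this, the bit manipulation in the listing can be checked off-target. The Python sketch below mirrors just the `and` / `shr #8` pair for one pixel. It assumes the input long is laid out as $RRGGBBxx with the low byte unused, which the mask suggests but the listing does not state; how the streamer then serializes the word depends on `pixel_multiplier`, which is not shown here.

```python
LCD6_MASK = 0xFCFCFC00  # same constant as lcd6_mask in the listing


def lcd6_word(pixel_rgbx):
    """Mirror 'and scantmp,lcd6_mask' followed by 'shr scantmp,#8' for one long."""
    return ((pixel_rgbx & 0xFFFFFFFF) & LCD6_MASK) >> 8


def six_bit_channels(word):
    """Return the (r, g, b) 6-bit values carried in the top bits of each byte."""
    return ((word >> 18) & 0x3F, (word >> 10) & 0x3F, (word >> 2) & 0x3F)


word = lcd6_word(0xFF80437F)
print(hex(word), six_bit_channels(word))
```

An 8 bit latched variant would presumably skip the masking step, since all 8 bits of each channel are wanted, but the output sequencing would still need to be worked out separately.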
@rogloh said:
@MXX , looking at the NeoVGA code this is how Ada's current 6 bit code is outputting pixels from what I can tell. She'll obviously be able to tell you far more but you sort of need to find an equivalent sequence to output 2 pixels in a limited number of clock cycles. I think the total budget is sending 2 complete (double wide) RGB pixels out in 40 P2 clocks but there is also some loop overhead on the calling side between calls to this subroutine code that needs to be accounted for too so you don't get all those clocks to play with.
Something like that. But the LCD he sent me is 320x240, I think, so it's actually a lot more relaxed.
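Why a smaller panel relaxes things: the number of P2 clocks available per pixel pair is just the ratio of system clock to pixel clock, times two. A rough sketch of that arithmetic follows. All frequencies in it are hypothetical placeholders picked so the numbers come out round; they are not measured values from NeoVGA or from this panel's datasheet.

```python
def clocks_per_pixel_pair(sysclk_hz, pixel_clock_hz):
    """P2 system clocks that elapse while two pixels are shifted out."""
    return 2 * sysclk_hz / pixel_clock_hz


# Hypothetical figures, for illustration only.
SYSCLK_HZ = 320_000_000
FAST_PIXEL_CLOCK_HZ = 16_000_000   # would give the 40-clock budget mentioned above
SLOW_PIXEL_CLOCK_HZ = 6_400_000    # a slower dot clock, as a small panel might use

print(clocks_per_pixel_pair(SYSCLK_HZ, FAST_PIXEL_CLOCK_HZ))
print(clocks_per_pixel_pair(SYSCLK_HZ, SLOW_PIXEL_CLOCK_HZ))
```

The loop overhead on the calling side still has to come out of whatever budget this gives.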
I can imagine you would need to create a Smartpin pin transition sequence that generates the 3 latch enables at the correct time(s), which could get tricky to synchronize.
I figured this out for Ray's board, so that isn't a problem (well, it is, but I know how to solve it, once the tasks have been worked down enough to set up the logic probe stuff on the desk)
As you don't have the streamer accessing the DE/HSYNC pins you may need to prepend a DE signal generation instruction to this subroutine and call it for the active pixels of a line and clear on first blanking porch output at the end of a line. VSYNC handling has more time and can typically be bit banged where needed and HSYNC has time to be handled separately as well but requires accurate syncing.
The streamer can actually handle this. You can have it in 4x8 mode during active scan (so all extra signals go low -> invert the output mode if need be) and in 1x32 mode during blanking and drive DE/HSYNC/VSYNC appropriately.
Comments
@Wuerfel_21 @MXX There's a newer version of loadp2 (0.077) that should fix the flashing issue.
Thanks for all your work and your prompt fix!
For reference, one dot on the loading visualizer is 256 KiB. One row has 32 of them, so each row is 8 MiB -> one standard PSRAM chip's worth.
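That makes it easy to estimate how much of the visualizer a given ROM set will fill. A small sketch, assuming a partially filled dot still counts as one dot (the rounding behaviour is a guess, not something stated above):

```python
DOT_BYTES = 256 * 1024            # one dot on the visualizer
DOTS_PER_ROW = 32
ROW_BYTES = DOT_BYTES * DOTS_PER_ROW


def dots_for(size_bytes):
    """Dots needed to cover size_bytes, rounding a partial dot up (assumed)."""
    return -(-size_bytes // DOT_BYTES)


def rows_and_dots(size_bytes):
    """Return (full rows, dots in the last partial row)."""
    return divmod(dots_for(size_bytes), DOTS_PER_ROW)


print(ROW_BYTES // (1024 * 1024), "MiB per row")
print(rows_and_dots(20 * 1024 * 1024))
```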