If you can find a P2 clock speed that achieves 960x540 video output, then as long as the P2 core : pixel ratio (i.e. the pixel divisor) is 3 or more you'll be able to do 16bpp colour.
I have 1024x576 HDMI timings @ 50 Hz, signaling 624 lines @ 354 MHz CPU clock at exactly the same frame rate as 8-bit Atari and maybe Amiga which have the same clock basics. The P2 RAM allows only 4 bpp graphics in this mode, so the external RAM is needed even for 8 bpp. The pixel ratio is of course 10:1
These PSRAM chips at least aren't BGA
Comments
As your pixel ratio is 10:1 you'll be able to get truecolour output with the external memory at that resolution assuming they can overclock from 144MHz up to 177MHz. I've had it to 175MHz before and it seemed to still work at room temp, but I didn't leave it running for long.
Yep, that is a real win. They will be far easier to add to your own boards, though you'll burn a lot of I/O pins and need 4 devices for the bandwidth equivalent to a single HyperRAM device at the same clock. It's 18 pins vs 11. The input/output timing is assisted, with two P2 clocks per bit period to play with.
That's unnecessarily overclocked. Certainly don't need 624 for vtotal. I take it you wanted the extra MIPS that 354 MHz gives.
Darn, can't test that resolution using the spiral demo. It's larger than 512 kpixels.
LOL, you'll need my external memory driver @evanh .
I'm putting something together right now as a demo. It's mostly self contained to do a single external graphics region at the requested resolution but the burst size calculations are problematic for the non-video COGs. You want to set it to the highest value for best performance but by setting it too high it can create read delay jitter for the video client. Thankfully if you go too high it doesn't affect the sync, just the image contents. I'd like it to be auto-computed so the user doesn't need to worry too much about figuring it out, but it's a bit messy.
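As a rough illustration of the trade-off described above, here is a minimal sketch of one way a burst limit could be estimated. It is not the driver's actual calculation. It assumes a simple model in which the video client fetches one scanline per line period and may have to wait for at most one non-video burst to finish first. The function name, the `overhead_s` parameter, and the bandwidth figure in the usage line are all hypothetical.

```python
def max_burst_bytes(line_period_s, video_line_bytes, bandwidth_bytes_per_s, overhead_s=0.0):
    """Largest non-video burst (in bytes) that still leaves time for the video
    line fetch within one scanline period, under the simple model above.

    overhead_s is a hypothetical fixed setup cost charged once to the video
    transfer and once to the competing burst.
    """
    video_time = video_line_bytes / bandwidth_bytes_per_s + overhead_s
    slack = line_period_s - video_time - overhead_s
    if slack <= 0:
        return 0  # the video fetch alone already fills the line period
    return int(slack * bandwidth_bytes_per_s)


# Example: 1140 total pixels per line at a 35.4 MHz pixel clock, 1024 bytes per
# line (8 bpp), and a made-up 100 MB/s of external memory bandwidth.
line_period = 1140 / 35.4e6
limit = max_burst_bytes(line_period, 1024, 100e6, overhead_s=1e-6)
```

A real driver would also have to account for refresh limits, page boundaries and multiple competing COGs, which this bound ignores.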
Yeah, maybe. Looking at the generator code now. I think it is a straightforward scan line building process. If fast enough it doesn't need a full screen buffer at all. Throw some more cogs at it ... tomorrow ...
What is it called when the picture is built just ahead of the scan line DMA, then discarded/overwritten?
I believe it's known as either "Racing the beam" or "Chasing the beam".
I do that in my plasma demo. In inline ASM, nonetheless.
This is 1140x624 signalled, 1024x576 visible, and this 354/357 clock is 100x retrocomputers PAL/NTSC, while 8-bit Atari has 114 color cycles for scanline. I can still have 576 lines @ 280 MHz, but only 912 horizontal pixels total (~848 visible)
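The numbers above can be checked with a little arithmetic. This is only a sketch: the 10:1 pixel ratio comes from earlier in the thread, and applying the same divisor and the same 624-line total to the 280 MHz case is an assumption, not something the post states.

```python
def line_rate_hz(sysclk_hz, pixel_divisor, htotal):
    """Scanlines per second for a given system clock and total line width."""
    return sysclk_hz / pixel_divisor / htotal


def refresh_hz(sysclk_hz, pixel_divisor, htotal, vtotal):
    """Frames per second for a given system clock and total (signalled) raster."""
    return line_rate_hz(sysclk_hz, pixel_divisor, htotal) / vtotal


# 354 MHz sysclk, 10 clocks per pixel, 1140x624 signalled
fast = refresh_hz(354e6, 10, 1140, 624)

# 280 MHz sysclk, same divisor and line count assumed, 912 pixels per line
slow = refresh_hz(280e6, 10, 912, 624)
```

Both cases come out a little under 50 Hz, which fits the PAL-style timing being aimed for.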
That's why I am writing a display-list driver. Then I can have 1024x576 with text (8-bit selectable background and foreground), or 1 to 8 (maybe more, I have bits left) bpp graphics, and blank lines mixed in. A blank line doesn't eat any buffer, and text lines eat a lot less buffer than graphics. In the current state (still alpha) the driver can display text, border, blank, or 256-color graphics. Every scanline has a display list entry which contains the line buffer start address, mode (graphics/text/blank/border), horizontal zoom (1-2-4-8), font line # for text modes, and color depth for graphics.
A display list can make this easy. Make a display list with 1, 2 or 4 (or more) repeated lines sharing the same buffer address, then chase the beam. This means I have to add some type of "display list interrupt" and/or make the current scan line # register available to the rest of the cogs.
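To make the idea concrete, here is a small sketch of what such a display list might look like. It is only a model: the field names, the `Mode` values, and the idea of rotating through a small ring of line buffers are assumptions for illustration, not the driver's actual entry layout.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    BLANK = 0
    BORDER = 1
    TEXT = 2
    GRAPHICS = 3


@dataclass(frozen=True)
class DisplayListEntry:
    buffer_addr: int      # start address of this scanline's line buffer
    mode: Mode
    hzoom: int = 1        # horizontal zoom: 1, 2, 4 or 8
    font_line: int = 0    # row within the glyph, used by text modes
    bpp: int = 8          # color depth, used by graphics modes

    def __post_init__(self):
        if self.hzoom not in (1, 2, 4, 8):
            raise ValueError("hzoom must be 1, 2, 4 or 8")


def beam_chasing_list(visible_lines, line_bytes, base_addr, num_buffers, repeat=1):
    """One entry per scanline. Each group of `repeat` consecutive scanlines
    shares a buffer address, and groups cycle through `num_buffers` buffers,
    so a renderer only has to stay a few lines ahead of the beam."""
    entries = []
    for line in range(visible_lines):
        slot = (line // repeat) % num_buffers
        entries.append(DisplayListEntry(base_addr + slot * line_bytes, Mode.GRAPHICS))
    return entries


# 576 visible lines, 1024-byte lines, two buffers, each buffer shown on 2 scanlines
dl = beam_chasing_list(576, 1024, 0x1000, num_buffers=2, repeat=2)
```

With a list like this, the rendering cogs would still need some way to know which scanline is currently being output, which is where the "display list interrupt" or scan line register comes in.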
This, with the PSRAM added... I need to buy these chips...
Okay it is the 100x clock you really wanted. But only cos you could! :P
Lol, but saving all that hubRAM means you don't need external RAM at all. At least that's how I'm seeing my aim with the spiral demo.
It just dawned on me that after all that talk I'd spilled on how LCDs have likely evolved, resulting in VRR feature for some, that we can make use of it ourselves without actually requiring VRR.
And ... lo and behold, easy-peasy, 480x270@60 while still on 250 MHz sysclock:
EDIT: Added 29.5 kHz comment to source
No need for pixel doubling or other scaling tricks. You can have whatever low-end resolution you like and let the monitor/TV do the scaling.
This is gonna work really well. Unless you are really wanting the high resolutions, VGA can be put to bed I feel.
Yeah my Dell likes that latest spiral demo output too @evanh, though I know it is really good at taking a huge amount of input variation and scaling nicely with its DVI input. Reports as 480x270 60Hz. Not sure how many TVs will accept it or how it would look on those devices but if it is universal that will be good for games on TVs via HUB RAM etc. You could probably squeeze in a static 24bpp image in HUB at that screen size for a very colourful image, albeit at low res.
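A quick way to sanity-check what fits in hub RAM is to compute the frame buffer size directly. This sketch assumes the full 512 KB of P2 hub RAM is available unless `reserved_bytes` (a hypothetical allowance for code and other data) says otherwise.

```python
HUB_RAM_BYTES = 512 * 1024  # total P2 hub RAM


def framebuffer_bytes(width, height, bpp):
    """Bytes needed for a packed frame buffer, rounded up to a whole byte."""
    return (width * height * bpp + 7) // 8


def fits_in_hub(width, height, bpp, reserved_bytes=0):
    """True if the frame buffer plus any reserved space fits in hub RAM."""
    return framebuffer_bytes(width, height, bpp) + reserved_bytes <= HUB_RAM_BYTES


for w, h, bpp in [(480, 270, 24), (1024, 576, 4), (1024, 576, 8)]:
    print(f"{w}x{h} @ {bpp}bpp: {framebuffer_bytes(w, h, bpp)} bytes, "
          f"fits in hub: {fits_in_hub(w, h, bpp)}")
```

A 480x270 image at 24bpp needs 388,800 bytes, so it fits with room to spare, while 1024x576 fits at 4bpp but not at 8bpp, matching what was said earlier in the thread.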
It's interesting to look at this demo at that lower resolution but getting scaled up. You get a general fuzziness/blurriness to the image from a distance, but up close you get an overlaid, almost Moiré type of pattern with a screen door effect on top when your eyes follow a spiral arm. At least that's what it looks like on my LCD.
I'm testing with a TV most of the time. I occasionally double check with an older DVI only monitor ... speaking of which, it doesn't handle this mode. Okay, maybe these features do rely on it being a newer display. Possibly HDMI opened up the timing limits further than DVI allowed for.
Plugged it into the DVI port on the newer 4k monitor and it works nicely.
HDMI port on old plasma TV gives nothing at all. Doesn't even say there's a signal. At least the old DVI monitor said it didn't like the mode.
So it seems newer is more flexible. Which coincides with my understanding of how they work too.
My big PC monitor says 480x270 and displays the spiral.
My small 15" TV says "Unsupported". It supports a lot of strange things (I am using it for testing the DL driver) but not this.
And the TV is a few years older?
For me. The two that don't work:
DVI monitor made June 2011. Thought it was a lot older than that!
Plasma TV made Oct 2012. Thought it was a year older.
The two that do work:
4k monitor made April 2015. Older than I expected. Possibly was old stock.
Small TV not dated but has serial number 15090... which seems about right for Sept 2015. The brand went belly-up early the following year.
This HyperRAM problem I encountered a couple of days back looks like signal integrity with sysclk/1 reads. If I drop it down to sysclk/2 the video data seems okay. Problem is that sysclk/2 is not fast enough for some higher resolution modes so I have to fall back to only have 4bpp @ 1080p with sysclk/2.
In comparison the PSRAM board is really cranking its data out cleanly at double this bandwidth, very reliably for video.
In my tests of sysclk/1 operation in the past with HyperRAM and graphics I mainly did flat shaded objects which looked okay (as the data is not changing a lot). Now I have text and fine lines on the screen I see problems with the signal quality at sysclk/1 and changing the input delay and using registered/unregistered data pins hasn't seemed to solve this. Am using P16-P31 for HyperRAM in this case.
It would be good to try the newer and faster v2 HyperRAM too to avoid overclocking. In general for the P2 to use this HyperRAM memory at sysclk/1 rates I think the P2 really needs more output clock phase selections.
I'm guessing it's a write timing issue. Although it does reduce the max usable frequency a little, 10 pF on the hyperClk signal, plus unregistered clk and registered data during writes, was enough to clean that up.
An aside: The aim for the Edge module is to not need the 10 pF cap at all, because a single HyperRAM by itself reduces loading on the data pins compared to the accessory board, and keeping the data tracks as short as possible with a longer track for the clock should help as well. On that note I need to make sure Vons knows this ...
He does now ! Thanks !
Doh! I just spent 45 minutes writing the other message.
EDIT: Well ... almost 5 AM, bed time I think. Just dropped out of daylight saving too so it would've been 6 AM.
Von,
There was a spreadsheet containing the 64 pin skew times a while back. I didn't entirely know what each column meant but I think, for the output case at least, they were unregistered propagation times from internal prop2 clock edge to each pin transitioning or something to that effect.
On that assumption, choosing a late skewed pin to be the hyper clock would be desirable. Looking at the sheet, of OUTB pins, P48 is pretty good. P57 very good but obviously a long way from the P32-P47 probable data pin group.
If you build a P2-edge with hyperram etc, those pins need to be as close as possible to the hyperram, and not go to the edge connector. So you end up with two pcb designs.
Correct, it was always going to be a second Edge module. If I'm not mistaken the idea is to only bring out 32 pins to the edge: P0-P31.
I had previously advocated for doing a swap of one pin group to keep the P28-P31 group away from the edge because of my concerns over their sensitivity to damage taking out the whole chip.
Great. I thought it was going to be an optional fit on the same pcb, and was part of the current redesign.
The accessory boards already fill that space. Tight integration with the hyperRAM is the key here.
@evanh, I think the writes should be still getting done at sysclk/2 in this setup, but I'll check that.
The weird thing about this problem is that if I fire up the HyperRAM memory driver AFTER the video driver (which changes the PLL) then I don't seem to see this corruption even at sysclk/1. Or if it's there, it is far less noticeable. I need to dig some more; there might be some timing calculation differences resulting in the wrong latency being programmed into the chip etc. That would probably cause something like this. Hopefully it is just a simple bug like that, because I've worked with this RAM quite a bit before and had thought it was okay at sysclk/1 reads in most cases below 300MHz.
I do actually wait 50ms after I fire up the HyperRAM COG before the PLL changes, but maybe this is not long enough...and the PLL may be getting changed in the middle of some transaction, corrupting it.
There was a topic started not long ago by someone that reckoned the higher numbered cogs were slower on the I/O. It surprised me a little because I'd thought the extra I/O stages would cover such differences. I never followed up on it.
Looks like writing those HyperRAM registers has come back to haunt me. Just read back the registers for the good and bad video output cases.
good case reads:
CR0 - 1F1F
CR1 - 0202
bad case reads:
CR0 - 9F9F
CR1 - 0202
The register value seems wrong. I think this must be the issue. It could explain the signal integrity issue and bit 15 of CR0 should not be a 1. Also bytes shouldn't be duplicated here. I do a read modify write on this register at startup, so garbage in garbage out.
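The two symptoms called out above (bit 15 set, and the high and low bytes being identical) are easy to check for automatically at startup. This is only a sketch of such a check: the function name is made up, and it flags just the two conditions mentioned in this post rather than decoding any CR0 fields from the datasheet.

```python
def check_cr0_readback(value):
    """Return a list of warnings for a 16-bit CR0 read-back value.

    Only checks the two symptoms described in the post; what the bits
    actually mean should be taken from the HyperRAM datasheet.
    """
    issues = []
    if value & 0x8000:
        issues.append("bit 15 is set")
    if (value >> 8) & 0xFF == value & 0xFF:
        issues.append("high and low bytes are identical")
    return issues


for label, cr0 in [("good", 0x1F1F), ("bad", 0x9F9F)]:
    print(f"{label} case CR0={cr0:04X}: {check_cr0_readback(cr0) or 'no warnings'}")
```

Note that both read-back values trip the duplicated-byte warning, while only the bad case has bit 15 set.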
Sounds right, I went through some effort in my testing to verify the register content when I started writing them. I had a list of diagnostic prints that were displayed on every run of the test code. Some values had both prior and post setting of registers.
Having not used that code in quite a while, I'd long forgotten about this.
I've been hunting for the post I mentioned above but haven't had any luck finding it. It was distinctive too, because he'd included the image of the Prop1 AND OR gates that merge all the outputs from all cogs together. ... Oh, maybe it was a Prop1 topic! I might have been looking in the wrong forum the whole time.
Fixed the issue, it was only a combination of some bad luck and stupidity. Some test code had been left lying about, and the PLL changing as I spawned the video COG was affecting register accesses because the latency was set up incorrectly at that time.
One thing I discovered in this investigation today is that my register code shares the normal data read code path so when sysclk/1 is activated the register's data phase is done at sysclk/1 as well. This is a problem for read-modify-writes if the PLL is too high for reliable reads, and it can cause bad writes. I've patched out the writing of CR0 latency for now. It would only be needed when V2 HyperRAM is there if we don't want its defaults.
My graphics display code is working OK now at VGA/SVGA/XGA/SXGA and 1080p with stable images using HyperRAM. UXGA (1600x1200) is a little bit too high for the HyperRAM, being clocked at 324MHz or so. I did some experiments with reduced blanking according to VESA CVT-RB and CVT-R2 to shrink this down, but my LCD didn't like it. With v1 blanking it would display the image but it was stretched/cropped (even though it reports as 1600x1200), and with CVT-R2 timing my driver would appear to lock up the output, so something there must have been too small.
This is the UXGA CVT-RB (v1) timing that didn't scale right on my LCD monitor, but at least it displayed something. @evanh do you want to try this one to see if it works for you? As far as I know it follows the CVT-R1 rules here (see section 3.4):
https://glenwing.github.io/docs/VESA-CVT-1.2.pdf
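For anyone wanting to reproduce the reduced blanking numbers, here is a simplified sketch of the CVT-RB (v1) calculation as I understand it from that document. It ignores margins and interlace, assumes the active width is already a multiple of 8, and takes the vertical sync width as a parameter (4 lines for a 4:3 mode). Treat it as a cross-check rather than a full implementation of the standard.

```python
import math

H_BLANK_PIXELS = 160       # fixed horizontal blanking for reduced blanking v1
MIN_V_BLANK_US = 460.0     # minimum vertical blanking time
V_FRONT_PORCH = 3          # lines
MIN_V_BACK_PORCH = 6       # lines
CLOCK_STEP_MHZ = 0.25      # pixel clock granularity


def cvt_rb_v1(h_active, v_active, refresh_hz, v_sync_lines):
    """Return (h_total, v_total, pixel_clock_mhz) for a CVT-RB v1 mode."""
    # Estimate the line period from the time left after minimum vertical blanking.
    h_period_est_us = (1e6 / refresh_hz - MIN_V_BLANK_US) / v_active
    # Number of blanking lines needed to cover the minimum blanking time.
    vbi_lines = math.floor(MIN_V_BLANK_US / h_period_est_us) + 1
    vbi_lines = max(vbi_lines, V_FRONT_PORCH + v_sync_lines + MIN_V_BACK_PORCH)
    v_total = v_active + vbi_lines
    h_total = h_active + H_BLANK_PIXELS
    # Round the pixel clock down to the nearest clock step.
    raw_mhz = refresh_hz * v_total * h_total / 1e6
    pixel_clock_mhz = math.floor(raw_mhz / CLOCK_STEP_MHZ) * CLOCK_STEP_MHZ
    return h_total, v_total, pixel_clock_mhz


h_total, v_total, pclk = cvt_rb_v1(1600, 1200, 60, v_sync_lines=4)
print(h_total, v_total, pclk)
```

For 1600x1200 at 60 Hz this gives a 1760x1235 total raster at 130.25 MHz. If the sysclk runs at two clocks per pixel, as the 324 MHz figure for standard UXGA suggests, that would put the P2 at about 260.5 MHz.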