PSRAM vs HyperRAM testing

rogloh · 2021-05-31 05:29

@evanh said:

@rogloh said:
... 1920x1200 with reduced blanking at 60Hz refresh needs a pixel clock around 154 MHz or thereabouts.

Oh, that's a big reduction from the VGA spec. That rate would fit in a single link, and therefore HDMI. Not that that helps a Prop2.

Yes it does fit in single link bandwidth, that's how these 1920x1200 monitors can be driven with simple (single-link) DVI/HDMI adapters.

Also I was looking at the EDID report from my dual link 30 inch monitor and it has these details...

Detailed mode: Clock 268.000 MHz, 646 mm x 406 mm
               2560 2608 2640 2720 hborder 0
               1600 1603 1609 1646 vborder 0
               +hsync +vsync 
               VertFreq: 59 Hz, HorFreq: 98529 Hz

Knowing this timing, I think I could try an experiment with two of those cheap VGA to DVI converters. I just need to wire up a board with a DVI connector's dual link pins fed from the two converters' HDMI TMDS pairs. As a simple test I could probably just duplicate the same pixels on both sets of converter inputs in software (2 VGAs), and I might end up with a 1280x1600 screen with wide pixels. If that worked as a proof of concept, I could try the next step of splitting the output from a common frame buffer. I wonder if the PLLs inside two separate converters would remain locked closely enough if their HSYNC and VSYNC inputs had transition edges happening at exactly the same time? I guess if the data is buffered internally all bets are off, but if it is synchronous maybe it could work. Seems a bit dodgy though.
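
As a quick sanity check on the EDID numbers above, here is a minimal Python sketch of the arithmetic. The timing figures are copied from the detailed mode; the 165 MHz single-link ceiling is an assumption taken from the usual DVI limit, and the function names are just for illustration.

```python
# Figures copied from the EDID detailed mode quoted above.
PIXEL_CLOCK_HZ = 268_000_000
H_ACTIVE, H_TOTAL = 2560, 2720
V_ACTIVE, V_TOTAL = 1600, 1646

# Assumption: the usual single-link DVI pixel clock limit of 165 MHz.
SINGLE_LINK_MAX_HZ = 165_000_000


def line_rate_hz(pixel_clock_hz, h_total):
    """Horizontal frequency: one line takes h_total pixel clocks."""
    return pixel_clock_hz / h_total


def refresh_hz(pixel_clock_hz, h_total, v_total):
    """Vertical frequency: one frame takes h_total * v_total pixel clocks."""
    return pixel_clock_hz / (h_total * v_total)


h_freq = line_rate_hz(PIXEL_CLOCK_HZ, H_TOTAL)
v_freq = refresh_hz(PIXEL_CLOCK_HZ, H_TOTAL, V_TOTAL)

# Dual link sends alternate pixels down each link, so each link
# carries half the pixels of a line at half the pixel clock.
per_link_clock_hz = PIXEL_CLOCK_HZ // 2
per_link_h_active = H_ACTIVE // 2

print(f"HorFreq  : {h_freq:.0f} Hz")
print(f"VertFreq : {v_freq:.2f} Hz")
print(f"Per link : {per_link_h_active}x{V_ACTIVE} at {per_link_clock_hz / 1e6:.0f} MHz")
print(f"Fits single link as a whole? {PIXEL_CLOCK_HZ <= SINGLE_LINK_MAX_HZ}")
print(f"Each half fits a single link? {per_link_clock_hz <= SINGLE_LINK_MAX_HZ}")
```

The per-link figures are what each of the two converters would have to handle in the experiment described above: a 1280x1600 image at half the full pixel clock.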

pik33 · 2021-05-31 06:26

The cheapest hi-res HDMI card for a P2 seems to be an RPi Zero. Powered by Ultibo and interfaced via SMI, it can do a lot of things, and it costs $5 (in reality closer to $12), which is still cheap.

rogloh · 2021-05-31 06:32

Yes, an RPi is the bang for buck champ. At one stage I was also hoping the new RPi could do some type of dual link via its twin HDMI outputs. It might be possible if you get low level access to the chip to control the resolution timing, and if it can be set up to interleave pixels on its outputs.

hinv · 2021-05-31 08:09

@rogloh said:

@evanh said:
Newer HDMI based TVs and monitors no longer appear to have the old minimum 30 kHz hsync hangover from the VGA days. So they can handle hugely varied mode setting parameters with the HDMI port.

But any VGA port on same display still only has a select set of known preset modes. CRT VGA monitors didn't do this, they were far more open ended but still had the minimum 30 kHz hsync.

I'm hopeful this applies to most small LCD panels too. Because of the Raspi effect those 800x480 HDMI panels are getting cheap/plentiful now, as are the 1024x600 ones, and they can save many pins vs parallel TTL RGB. I just don't know how slow they'll clock, and that 10x multiplier for DVI does put a higher burden on the P2. Maybe LVDS is another option, because in theory it only needs a 7x multiplier in the P2, but it does limit the colour modes possible - I'm hopeful that 256 colours from an 18/24 bit palette could be achieved in my own driver with some customizations for LVDS, but I've not dug fully into it.

Isn't LVDS what the panels speak natively? If so, this would definitely be a win. I happen to have a lot of nice SGI 1600SW monitors that have LVDS connectors. Would be nice to drive them with a P2.

hinv · 2021-05-31 08:21

@pik33 said:
The cheapest hi-res HDMI card for a P2 seems to be an RPi Zero. Powered by Ultibo and interfaced via SMI, it can do a lot of things, and it costs $5 (in reality closer to $12), which is still cheap.

Good point. They are cheap as dirt, and don't require much power either. What is the highest bandwidth/lowest latency we can get communicating with a raspi0? What bare-metal environment would you recommend for one?

rogloh · 2021-05-31 08:22

@hinv said:
Isn't LVDS what the panels speak natively? If so, this would definitely be a win. I happen to have a lot of nice SGI 1600SW monitors that have LVDS connectors. Would be nice to drive them with a P2.

A P2 can only bit-bang LVDS panels with pixel clocks < 50MHz or so (~350MHz P2) due to the 7x multiplier requirement. So basically only the smaller panels, or ones that can operate with low refresh rates. I doubt that includes the 1600SW.

Edit: just noticed that the 1600SW takes dual LVDS inputs, and without any blanking a 350MHz P2 could (just) drive out 800x1024 pixels @60Hz. Maybe operating at 50Hz gives enough headroom for some amount of reduced blanking, so if the monitor supports 50Hz refresh there could still be a chance....
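
The headroom estimate above can be checked with a few lines of Python. This is only a rough sketch under the assumptions stated in the post (a 350 MHz P2, 7 serial bits per LVDS pixel clock, 800x1024 per link); the function names are made up for illustration, and whether the 1600SW accepts 50 Hz is not something the code can answer.

```python
P2_CLOCK_HZ = 350_000_000     # assumed P2 system clock from the post
LVDS_BITS_PER_PIXEL = 7       # 7x serialisation per LVDS pixel clock

H_ACTIVE, V_ACTIVE = 800, 1024  # one half of the 1600SW's dual LVDS input


def max_lvds_pixel_clock(sysclk_hz):
    """Highest pixel clock the P2 can bit-bang at 7 bits per pixel clock."""
    return sysclk_hz // LVDS_BITS_PER_PIXEL


def active_pixel_rate(h_active, v_active, refresh_hz):
    """Pixel rate needed with no blanking at all."""
    return h_active * v_active * refresh_hz


def blanking_budget(max_clock_hz, h_active, v_active, refresh_hz):
    """Pixel clocks per frame left over for blanking at a given refresh."""
    clocks_per_frame = max_clock_hz // refresh_hz
    return clocks_per_frame - h_active * v_active


max_clock = max_lvds_pixel_clock(P2_CLOCK_HZ)
for refresh in (60, 50):
    need = active_pixel_rate(H_ACTIVE, V_ACTIVE, refresh)
    spare = blanking_budget(max_clock, H_ACTIVE, V_ACTIVE, refresh)
    print(f"{refresh} Hz: active pixels need {need / 1e6:.2f} MHz "
          f"of {max_clock / 1e6:.0f} MHz, {spare} clocks/frame left for blanking")
```

At 60 Hz the leftover is tiny compared with the active pixel count per frame, which matches the "(just)" above; dropping to 50 Hz frees a much larger share of each frame for blanking.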

pik33 · 2021-05-31 09:04

@hinv said:

@pik33 said:
The cheapest hi-res HDMI card for a P2 seems to be an RPi Zero. Powered by Ultibo and interfaced via SMI, it can do a lot of things, and it costs $5 (in reality closer to $12), which is still cheap.

Good point. They are cheap as dirt, and don't require much power either. What is the highest bandwidth/lowest latency we can get communicating with a raspi0? What bare-metal environment would you recommend for one?

As I wrote earlier, Ultibo, www.ultibo.org. While being a bare metal environment, it gives a good amount of GPU control, including OpenGL ES. I also wrote a camera unit for it, so it is possible to have a camera working on an OS-free RPi.

I didn't try SMI yet; up to 16-bit wide communication is possible, but I don't know at what speed.
What I did make is an RPi Zero keyboard and mouse interface for a P2, which talks to the P2 via UART at 1.92 Mbps.
In another project I tried I2S for communication between 2 RPis; up to 5 MHz was possible using single breadboard-type wires 10 cm (4 inches) long. Higher speeds may be possible using a proper cable, but I didn't try this; the theoretical maximum is PLLD/2 = 250 Mbps. I will try this when I get some free time (I am working on a project, a UVC robot for a hospital; of course the controller is P2 and RPi based, and it consumes all my resources now). I want to build a media player using an RPi (maybe a Zero, maybe a 3A) and a P2, where the P2 will work as the main controller and a set of audio DACs, while the RPi will decode compressed audio files and send the audio data via I2S.

hinv · 2021-05-31 16:24

@rogloh said:

@hinv said:
Why not create a windowing system so that the whole framebuffer doesn't have to exist in memory? Let's say for instance, you could have a render list that schedules bitmaps to be pulled out of PSRAM, while things like text windows get generated on the fly, thus removing some of the bandwidth required for the PSRAM. You could still move windows around without having to snap to character boundaries and use nicely shaped window frames and generate them on the fly, I would think. Other things like image windows, oscilloscope windows, WYSIWYG word processors, etc could have a section of framebuffer.
I, and not I alone, use a LOT of text terminal windows or editors that would work just fine in a text only interface. I could see Chip's debug output windows handled the same way, where the serial stream window would be rendered on the fly with a fixed width font, and another window would be fully bitmapped, including sprites and other drawing tools.

I've tried to provide some of that in my video driver with regions. Some screen regions can be text, some can be bitmap graphics (sourced either from HUB or external memory), some sprite based etc, and you can dynamically create, destroy, move and resize these regions. The only limitation is that it is per scanline, so you can't mix different region types horizontally, only vertically. If you want to mix horizontally you'd need another COG that can generate the entire scan line (eg. another type of sprite or compositing driver). It's probably doable but would obviously require more COGs.

Well, we have 8 cogs, so using 2 of them for local debug wouldn't be that much of a problem. Alternatively, using a separate P2 for debug & development for the target P2 would be sweet. Maybe it could use a much faster serial or parallel connection too. This brings to mind Baggers' dedicated Propeller for graphics idea from years ago. P2 has SOOOO much more oomph!

hinv · 2021-05-31 16:34

@pik33 What is SMI? Is there another bare-metal environment that uses C or C++ instead? I'm not that interested in learning Pascal just so I could use Ultibo.

pik33 · 2021-05-31 18:48

@hinv said:
@pik33 What is SMI? Is there another bare-metal environment that uses C or C++ instead? I'm not that interested in learning Pascal just so I could use Ultibo.

Yes, there is a C++ environment called Circle. What I know about it: it exists.

SMI is the Secondary Memory Interface: 16 (if I remember) data lines and 6 address lines available on GPIO. The RPi is the master, so it has to initiate transmission, but as the hardware does all the hard stuff and you can use DMA, the thing can be fast (much faster than bitbanging a parallel bus on GPIO). I didn't test it yet.

hinv · 2021-06-01 04:12

6 address lines to me means that the RPi can only get 64 addresses at a time, if I am understanding you correctly. If it's 8 bits wide, that's only 64 bytes per transfer. Yeah, better than bitbanging on the RPi side. Then a handshake, I guess, for the RPi to acknowledge the transfer and start the next one?

pik33 · 2021-06-01 08:00

There are more than 6 of these address lines, only these 6 lines are physically available on RPi's GPIO pins. The example code I saw doesn't use these lines at all. The RPi sends bytes via data lines and the external device can receive it using control signals.

Example: a display memory. Tell the RPi via UART that you need a display line's data. The RPi then sends 1024 bytes via SMI. It "thinks" that it is writing these bytes to 1024 bytes of external memory. The P2 can use just the control signals to receive these 1024 bytes and place them in the display line cache.

OR: the P2 has data to send to the RPi. It then has to tell the RPi (via UART or something like it) that it needs to send data. Then the RPi can read the data. It will generate addresses on the address lines, but the P2 can ignore these addresses and simply put out the next byte for the RPi to receive.

PS - about Ultibo and Pascal learning: Ultibo can use static libraries written in C. You can write your algorithms in C, compile this code to .o (using gcc on the RPi) and make an .a file using ar. Then you can link it to the Ultibo project.
I used this feature to make an MP3 decoder for Ultibo: I recompiled the libmad library to .a format, then wrote a Pascal equivalent of the .h file - it looks like this: https://github.com/pik33/ultibo_retro_gui/blob/master/mp3.pas - and it simply works.

Pascal - for C users - is not that difficult to learn. It is the same family of languages, like English vs Dutch, Spanish vs Portuguese, Polish vs Czech... and not Mandarin Chinese (= Python).

Yanomani · 2021-06-05 06:05

@rogloh said:

If you don't want to have the analog path and are building your own board/system, you could go parallel out from the P2 and use a DVI/HDMI encoder chip like TI TFP410. This would allow 165MHz pixel clocks supporting WUXGA with reduced blanking. But it burns a lot of P2 pins for 24 bit colour (26 pins at least).

TI's TFP410 does also accept a 24-bit, DDR-style (only 12 single-ended data lanes) color transfer mode, thus it's possible to consume only 16 single-ended pins at the interface (color + Hsync + Vsync + DE + IDCK).

By providing two more low-speed pins (I2C SCL/SDA), the whole chip configuration can be controlled by the P2.

Using I2C also opens the possibility of tacking an 8-pin DIP switch onto DATA[23:16] that can be read thru an internal register address (or even an 8-bit latch, allowing one-way parameter-passing from another P2, or whichever).

Two P2 + 2 TFP410 can do 2 x DVI, including independent EVEN/ODD pixel control, in a dual-link setup (with 2 extra low-speed 8-bit control/parameter-passing channels between them, almost for free).

The TFP410 also has internal (programmable) provisions (PLL???) for a data de-skew feature, which allows freely adjusting the phase of IDCK relative to the DATA lanes' switching (the valid data capture window), in 90° steps (four phases).

If the Hypers had something similar, transferring DDR information to and from them would be just a breeze.

The same data de-skew feature opens a lot of other interesting possibilities, due to the somewhat relaxed timing constraints.

A lot of food for thought, and fun...

P.S. Even DE can be internally generated, based on Hsync/Vsync and IDCK counting...

rogloh · 2021-06-05 06:13

Interesting find Yanomani. Hmm, those 16 pins sound good for a double wide P2-EVAL breakout allowing DVI/HDMI output at much higher resolutions than VGA. Can the DDR operation be strapped or do you need those extra i2c pins to select it? The pixel data format in HUB RAM would need to be formatted appropriately to stream out to this odd/even format, so I guess that means RGB24 in 32 bit longs only, and no other colour modes (12 bits per 16 bit word, slightly different to native P2 DVI/VGA format of R:G:B:0).

Yanomani · 2021-06-05 06:21

Everything can be strapped, and I2C left totally inoperative (but I also love being able to keep almost everything under software control).

rogloh · 2021-06-05 06:26

Yes it is nice to be able to control it for debug etc with optional flying leads with i2c, but still handy to be able to strap if you wanted a clean double wide breakout board for P2-EVAL. Formatting the 12 bit pixel data for DDR with the streamer is probably the main issue to sort out. You'd need to have a nibble reserved in each with zeroed bits for H&V sync and just OR in the sync pattern at the right times outside of video (or do the reverse for inverted sync). It's a little messy but I suspect something could be done. You'd also need to OR the pixel clock as well over one of these 16 bits being streamed.
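
To make the formatting problem above a bit more concrete, here is a rough Python sketch of one way a 24-bit pixel could be split into two 12-bit halves inside a 32-bit long, with the top nibble of each 16-bit word left zeroed for sync. Everything about the layout is an assumption for illustration: the half that goes out first, the bit order, and the hypothetical `HSYNC_BIT`/`VSYNC_BIT` positions would all have to be checked against the TFP410 datasheet and the actual pin wiring.

```python
# Hypothetical sync bit positions inside the reserved top nibble
# of each 16-bit word (assumption, depends on board wiring).
HSYNC_BIT = 1 << 12
VSYNC_BIT = 1 << 13


def pack_ddr_long(r, g, b):
    """Split RGB24 into two 12-bit halves, one per 16-bit word.

    Assumed layout: low word = B[7:0] + G[3:0] (first clock edge),
    high word = G[7:4] + R[7:0] (second clock edge).
    Bits 15:12 of each word stay zero, reserved for sync.
    """
    rgb = ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF)
    first_half = rgb & 0xFFF
    second_half = (rgb >> 12) & 0xFFF
    return (second_half << 16) | first_half


def unpack_ddr_long(value):
    """Inverse of pack_ddr_long, ignoring the reserved nibbles."""
    rgb = ((value >> 16) & 0xFFF) << 12 | (value & 0xFFF)
    return (rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF


def add_sync(value, hsync=False, vsync=False):
    """OR the sync pattern into the reserved nibble of both words."""
    mask = (HSYNC_BIT if hsync else 0) | (VSYNC_BIT if vsync else 0)
    return value | mask | (mask << 16)


pixel = pack_ddr_long(0x12, 0x34, 0x56)
print(f"packed pixel : {pixel:08X}")
print(f"with hsync   : {add_sync(pixel, hsync=True):08X}")
```

Because the reserved nibbles are zero in the packed data, ORing in the sync bits never disturbs the colour bits, which is the property discussed above.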

Yanomani · 2021-06-05 06:38

Just remember that DE, Hsync and Vsync don't necessarily need to come from the HUB. Smart pins are there just for such cases (deterministic signaling). Even the "000"-data (inactive ray-tracing-data) can be fed thru them...

P.S. The pixel clock too...

The more I read (from the datasheet), the more I like it, and understand how much an external-memory-stuffed P2 deserves this kind of companion chip.

IMHO, trying to save just two low-speed pins (I2C) doesn't gain much, at least for a prototype.

rogloh · 2021-06-05 07:07

Yes I'd use smart pins for clock, and maybe even sync but I think the issue is smart pins OR their data on top of the streamer data (which can't be bit masked off). This was a concern of mine way back when I looked at parallel LCD data output path with smart pin control and questioned Chip on it. There wasn't an independent mask that could stop the streamer data interfering with the smart pin IIRC. As a result you need to keep the streamer data zeroed in the overlapping bits.

evanh · 2021-06-05 07:35

@rogloh said:
Yes I'd use smart pins for clock, and maybe even sync but I think the issue is smart pins OR their data on top of the streamer data ...

OUT/DIR from the cogs and streamers are all OR'd together but the smartpins are not part of that. It's an either/or case for smartpins. The pin mode selects either the smartpin as OUT source or the streamers/cogs as OUT source.

This is determined by the %MMMMM mode number: the modes listed with an asterisk, which carry the note "OUT signal overridden".
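
As a toy illustration of the rule described above (not checked against the silicon documentation), the pin's output source can be modelled in Python like this. The function and parameter names are made up for the example; `smartpin_overrides` stands in for "the pin is in one of the asterisked %MMMMM modes".

```python
def pin_out(cog_outs, streamer_out, smartpin_out, smartpin_overrides):
    """Model of one pin's OUT source as described in the post.

    cog_outs           -- iterable of 0/1 OUT bits from the cogs
    streamer_out       -- 0/1 OUT bit from the streamer(s)
    smartpin_out       -- 0/1 output of the smart pin
    smartpin_overrides -- True if the pin mode is an "OUT signal
                          overridden" mode (assumption: asterisked modes)
    """
    if smartpin_overrides:
        # Either/or: the smart pin alone drives the pin.
        return smartpin_out
    # Otherwise cog and streamer OUT bits are simply OR'd together.
    return int(any(cog_outs) or bool(streamer_out))


# Streamer drives a 1, but a smart pin clock in an override mode wins.
print(pin_out([0, 0], 1, 0, smartpin_overrides=True))
# Without the override, the streamer's 1 reaches the pin.
print(pin_out([0, 0], 1, 0, smartpin_overrides=False))
```

If this model is right, streamer data would not need to be zeroed on pins that are given to a smart pin in one of those modes.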

rogloh · 2021-06-05 07:58

If that's right evanh I'll be very happy if smart pin clocks can completely override streamer output. It allows 6bpc LCD modes and the unused two pins per byte to be freed up for clock and DE signals etc instead of spilling into other pins outside the group. That is different information to what I had (mis)understood earlier.

Hopefully there is also at least one mode in this asterisk group that still lets you somehow statically set a pin's state high or low. That would let it be used to control LCD power etc while the streamer still works around it.

evanh · 2021-06-05 09:01

@rogloh said:
Hopefully there is also at least one mode in this asterisk group that still lets you somehow statically set a pin's state high or low. That would let it be used to control LCD power etc while the streamer still works around it.

Only thing that comes to mind is the pulse out modes can be left in a static stopped idle state. Toggling the level would then be done with output drive polarity in the low level WRPIN mode bits: P_INVERT_OUTPUT

rogloh · 2021-06-05 09:30

@evanh said:
Only thing that comes to mind is the pulse out modes can be left in a static stopped idle state. Toggling the level would then be done with output drive polarity in the low level WRPIN mode bits: P_INVERT_OUTPUT

Yeah or maybe the sync serial port tx mode if the last shifted bit is replicated, you'd then load it with all ones or all zeroes to control the pin state (or just alter the drive polarity and leave the transmit data the same).
