HyperRam/Flash as VGA screen buffer (Now XGA, 720p &1080p) &Rev.B

Rayman · 2019-04-14 20:53

I don't think so. If this doesn't work for HyperRam too, I'll probably try that...

Rayman · 2019-04-14 21:17

Ok, I went back to the HyperRam 8bpp VGA demo and verified that the data was actually starting not at the first byte of each row, but at the fifth byte.
I was able to switch from waitse1 to waitx #38 on read and get a stable image at a 250 MHz clock. The write latency clocks were adjusted to put the data start at the first byte of each row.

Rayman · 2019-04-15 18:41

Finally! This took a lot of head scratching, but finally have the VGA bird streaming from HyperFlash.

There is very little difference between this and the HyperRam version in terms of the reading and display routines.
Writing is a bit different though. There are two programs here:

"HyperFlash_Test2d.spin2" loads the bitmap bits into flash. Stores 320 bytes on each 1024 byte row (easiest way).
"HyperVGA_640_x_480_8bpp_1p_Flash" is the modified version of the HyperRam VGA code. I've removed the writing portion and changed it to do two reads per line instead of just one. I also rearranged the latency waitx location.

jmg · 2019-04-15 19:50

Rayman wrote: »

Finally! This took a lot of head scratching, but finally have the VGA bird streaming from HyperFlash.

Good progress

Code appears to drive RWDS from P2, but the datasheet says it is a Flash output pin?
"The Read Data Strobe (RWDS) is an output from the HyperFlash device that indicates when data is being transferred from the memory to the host. RWDS is referenced to the rising and falling edges of CK during the data transfer portion of read operations."
Is this using clocked IO mode in the pin-config yet ?

Rayman · 2019-04-15 20:02

I don't think I'm using RWDS at all any more... Well, I think I am for HyperRam write, to specify latency, but I think it could just be tied to ground.

Also, I read somewhere that you don't need the Reset pin.

Not using clocked IO.

Yanomani · 2019-04-15 21:28

Hi Rayman

Sure you are bringing some good news from the HyperThings frontier. Another breakthrough!

I also tend to agree with kwinn's earlier comment about bringing such a nightmare-like interface to a new product. If I ever have to choose a dance, a polka is way funnier than a minuet, but...

But this also adds to your victory, on top of the determination you've shown, working the steel until the katana is straight and sharp.

Congratulations

Rayman · 2019-04-16 21:56

Looking closer into the datasheet, the latency crossing a page boundary (page=16 words=32 bytes) is fairly complicated....

But, I was completely ignoring this and have no gaps in data.
It appears that if I start a read on a word address with 3 lower bits all zero, then I can read the entire device in one operation with no latency gaps in the data.

This could make VGA display much simpler than with HyperRam. With HyperRam, I believe there is a latency when crossing a 1024-byte row boundary. And, you can't keep CS low for more than around 640 bytes.

Here, I think I can keep CS low during the whole display period.
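
As a rough illustration of the alignment rule described above (Python, not code from the thread): clearing the three lower bits of a word address rounds it down to an 8-word (16-byte) boundary. The function name and the example address are made up for illustration.

```python
def align_read_start(word_addr):
    """Round a word address down to a multiple of 8 words.

    Returns (aligned_word_addr, words_to_skip), where words_to_skip is the
    number of leading words to discard to reach the requested word.
    """
    aligned = word_addr & ~0b111   # clear the 3 lower address bits
    return aligned, word_addr - aligned

# Hypothetical example: a read that really wants word 0x1235
start, skip = align_read_start(0x1235)
print(hex(start), skip)
```

A reader that wants data from an unaligned word would start the burst at the aligned address and throw away the first few words.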

jmg · 2019-04-16 22:43

Rayman wrote: »

Looking closer into the datasheet, the latency crossing a page boundary (page=16 words=32 bytes) is fairly complicated....

But, I was completely ignoring this and have no gaps in data.
It appears that if I start a read on a word address with 3 lower bits all zero, then I can read the entire device in one operation with no latency gaps in the data.

This could make VGA display much simpler than with HyperRam...

Here, I think I can keep CS low during the whole display period.

I think that's related to fractional page crossings. As your tests indicate, larger blocks are ok, and I think the issue is that there has to be enough time to read the next page.
The example they give has only 9 clocks on first page, for a 12 clock latency so it has to add another 3. I think that also means if you started further back than -12 bytes for 12 setting, you would be ok.
Word_xx_000 would be a 16 byte lead-in, so that should be ok as linear page crossing.
ie your word address with 3 lower bits all zero rule is then enough to meet this :

"When configured in linear burst mode, while a page is being burst out, the device will automatically fetch the next sequential page from the MirrorBit flash memory array. This simultaneous burst output while fetching from the array allows for a linear sequential burst operation that can provide a sustained output of 333 MB/s data rate"

Rayman wrote: »

.. With HyperRam, I believe there is a latency when crossing a 1024-byte row boundary. And, you can't keep CS low for more than around 640 bytes.

That 640 looks to be refresh dictated, and if you did manual refresh during frame flyback, that may relax ?
Based on that 640, it looks like the refresh counter increments on (or near) CS=\_, then refreshes one row during the set-address part of the cycles, and needs to scan the whole RAM (there is no option to partial-scan to save time), so that would mandate so many CS/clks sets to keep inside the overall refresh milliseconds max. That's simple HW, with no extra clock generation needed.

Another way to 'hurry-along' the refresh counter, inside that overall milliseconds time ceiling, could be to issue a few CS =\__/== with some minimum clocks contained.
eg 3 'advance refresh' short bursts in a flyback, and then a full, real set-address-read would advance the refresh ctr 4 per line, and that might allow you to read 640*4 = 2560 in that 4th CS window ?
A smart pin might be able to do those 'advance refresh' bursts quite quickly, as address info is 'don't care'

Yanomani · 2019-04-16 23:19

Hi Rayman

Have you ever tried to leverage the fact that 640 = 512 + 128?

That way, each image you are using at 640 x 480 resolution, which occupies 307200 bytes (or exactly 300 x 1024 bytes), can be laid out in two blocks within HyperFlash's address space.

The first block, starting at, say, address 00000H, will hold 480 x 512 = 245760 bytes (240 kB), right through address 3BFFFH; the second block would start at address 3C000H, holding the remaining 480 x 128 = 61440 bytes (60 kB).

There would never be any address boundary crossing anymore, neither during Write Buffer Line (512 bytes) write operations nor during reads.
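
The two-block layout can be sketched in a few lines of Python. This is only an illustration under the numbers given in the post; the constant and function names are hypothetical, and the addresses are byte addresses as in the post, so a real HyperFlash command would still need them converted to word addresses.

```python
LINES = 480
BLOCK_A_BASE = 0x00000        # 480 x 512 bytes, ends at 0x3BFFF
BLOCK_B_BASE = 0x3C000        # 480 x 128 bytes
A_BYTES, B_BYTES = 512, 128   # 640 = 512 + 128

def line_addresses(y):
    """Return (addr_a, addr_b): byte addresses of the two pieces of line y."""
    if not 0 <= y < LINES:
        raise ValueError("line out of range")
    return BLOCK_A_BASE + y * A_BYTES, BLOCK_B_BASE + y * B_BYTES

for y in (0, 1, 479):
    a, b = line_addresses(y)
    print(y, hex(a), hex(b))
```

Because each piece starts on a multiple of its own size, neither piece straddles a 512-byte boundary.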

Since you are using HyperFlash, you don't need to worry about self-refresh limits when accessing the data blocks. Thus the longest period of Hyper_CSn = Low (256 + 24 = 280 clocks) would be limited to 4.480 µs and the shortest (64 + 24 = 88 clocks) to 1.408 µs, totaling 5.888 µs, or 147.2 pixel clocks (at 25 MHz per pixel). (The value of 24 should be enough to account for 16 latency clocks, plus the time for the CA-phase transfer and some spare cycles to keep the switching of Hyper_CSn and Hyper_CK within specified timings.)

Since the HFront Porch (16) + HSync Pulse (96) + HBack Porch (48) totals 160 pixel clocks (6.4 µs), I believe the two read operations needed to transfer each horizontal line under such conditions could fit within those limits, sparing the whole horizontal display period for any other activities one would intend to do.
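
The budget above can be rechecked with a short Python sketch. One assumption is made loudly here: the post does not state the HyperBus clock, but 280 clocks taking 4.480 µs implies 16 ns per clock (62.5 MHz), so that value is used. The names are hypothetical.

```python
CLOCK_NS = 16.0        # ASSUMPTION: implied by 280 clocks = 4.480 us in the post
PIXEL_NS = 40.0        # 25 MHz pixel clock
OVERHEAD_CLOCKS = 24   # latency + CA phase + spare cycles, from the post

def read_time_us(nbytes):
    """CS-low time for one burst: two bytes per clock (DDR) plus overhead."""
    clocks = nbytes // 2 + OVERHEAD_CLOCKS
    return clocks * CLOCK_NS / 1000.0

long_us = read_time_us(512)
short_us = read_time_us(128)
total_us = long_us + short_us
blanking_us = (16 + 96 + 48) * PIXEL_NS / 1000.0   # front porch + sync + back porch

print("reads:", round(total_us, 3), "us; blanking:", round(blanking_us, 3), "us")
print("fits in blanking:", total_us < blanking_us)
```

With a faster HyperBus clock the two reads would take proportionally less time, leaving more slack inside the blanking interval.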

Hope it helps a bit

Henrique

Rayman · 2019-11-13 15:54

Just tested this VGA HyperRAM code with Parallax board... Worked after just changing some pin assignments and forcing the reset pin high.
This was with the Rev.A board. (VGA adapter starting on P0, HyperRAM on P32) I'll try Rev.B next...

Rayman · 2019-11-14 00:46

Just got VGA going with Rev.B and Parallax HyperRam module…
To code in last post, had to change VGA a hair and add one clock here:

waitx     #50'49'38   'changed from 49 to 50 for Rev.B  'works at 38 at 250 MHz

Tubular · 2019-11-14 09:47

Ozprop showed me this code working today Rayman. Nice work!

Rayman · 2020-01-17 19:31

I've got the 16-bpp VGA example working with Rev.B eval board and Parallax HyperRam and VGA adapters.

HyperRam's base pin can now be either 0 or 32
VGA base pin can now be any multiple of 8.

You still need to run this twice with small code changes in 2 places to load both upper and lower images into HyperRam.

whicker · 2020-01-18 06:05

@Rayman
Can you please start to use the more preferred method of hub addressing that's discussed in this thread?
https://forums.parallax.com/discussion/comment/1484429/#Comment_1484429

Instead of hardcoding the BitmapData to a fixed address, use a blank ORGH and a label to get the hub address of the beginning of the file. And then skip the header by adding an offset (70) to the label?

...
'Load bmp from hub into HyperRam
'Note:  Change second line after this one to select upper/lower image location
                mov       SourceAdd,##BitmapDataStart + 70 'adding 70 to skip bitmap header
ImageSelect     mov       HyperRow,#240*0   'use 240*0 for top half of 640x480x16bpp image
                'mov       HyperRow,#240*2  'use 240*2 for bottom half of 640x480x16bpp image
                mov       k2,##240*2  'twice as much with 16bpp

...

DAT

	ORGH
BitmapDataStart
	' Bitmap 'Select top or bottom image here
        file    "macaws_top.bmp"
        'file    "macaws_bot.bmp"
        'another image from wikipedia:  By Olaf Oliviero Riemer, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=10969708
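
The offset of 70 depends on the header of this particular bitmap. The BMP file header stores the pixel-data offset as a little-endian 32-bit value at byte 10, so it can be checked on the PC side before assembling. A minimal Python sketch (a side tool, not part of the Spin2 code; the function name is hypothetical and the demo header is synthetic, not taken from the macaws image):

```python
import struct

def bmp_pixel_offset(header):
    """Return the pixel-data offset stored at bytes 10..13 of a BMP file."""
    if header[:2] != b"BM":
        raise ValueError("not a BMP file")
    (offset,) = struct.unpack_from("<I", header, 10)
    return offset

# Synthetic 14-byte file header for illustration only:
# signature, file size, two reserved words, pixel-data offset
demo = struct.pack("<2sIHHI", b"BM", 0, 0, 0, 70)
print(bmp_pixel_offset(demo))
```

For a real file, passing the first 14 bytes read in binary mode would show whether 70 is the right value to add to the label.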

Rayman · 2020-01-18 11:28

You’re in luck because I’m about to turn this into multi file spin with objects.
This kind of hard address coding doesn’t work there.

But, I’ve already figured out how to do this with the 90s 3D code...

sobakava · 2020-08-20 21:42

Would it be possible to capture parallel LCD data (24bit 800x480 TFT) with one of P2's COGs to HyperRAM and output VGA or HDMI using this driver?

Microprocessor>>> 24bit LCD driver signals >>> P2 >>> HyperRAM Frame Buffer >>> Text Overlay in Memory >>> P2 COG >>> HDMI or VGA

Wuerfel_21 · 2020-08-20 22:01

sobakava wrote: »

Would it be possible to capture parallel LCD data (24bit 800x480 TFT) with one of P2's COGs to HyperRAM and output VGA or HDMI using this driver?

Microprocessor>>> 24bit LCD driver signals >>> P2 >>> HyperRAM Frame Buffer >>> Text Overlay in Memory >>> P2 COG >>> HDMI or VGA

Yes, sounds possible.

You probably don't even need the HyperRAM buffer if the LCD is being driven at close-enough-to-standard timing - just buffer a few scanlines at a time and get lower latency, too.

sobakava · 2020-08-20 22:50

I still would like to capture entire frame to do some text and drawing operations.

I don't have a P2 demo board to test the concept yet, but I was thinking about a double buffer scheme: one buffer to capture the LCD driver image, the other to generate the video signal. I am not sure if there is enough bandwidth with HyperRAM to do this. Maybe two RAMs on separate ports?

rogloh · 2020-08-21 01:33

@sobakava
I suspect LCD frame capture like this is probably doable with two HyperRAM boards once you can capture the parallel data to hub and deal with the timing signals etc. 800x480 24bpp is going to generate about 108MB/s if you write everything directly to memory as longs. You can compress this down a bit by only saving active pixels per frame. But writes will happen at sysclk/2 so the P2 would only be able to write about 120-140MB/s or so, I'd expect.
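
A back-of-envelope version of these numbers in Python. The post does not give the LCD pixel clock or frame rate; the 27 MHz and 60 Hz used below are assumptions chosen for illustration (27 MHz happens to reproduce the 108 MB/s figure when every clock is stored as a long). Function names are hypothetical.

```python
WIDTH, HEIGHT = 800, 480
BYTES_PER_LONG = 4

def stream_rate_mb_s(pixel_clock_hz):
    """Every pixel clock (blanking included) stored as one long."""
    return pixel_clock_hz * BYTES_PER_LONG / 1e6

def active_rate_mb_s(frames_per_s, bytes_per_pixel=BYTES_PER_LONG):
    """Only the active 800x480 pixels of each frame are stored."""
    return WIDTH * HEIGHT * bytes_per_pixel * frames_per_s / 1e6

def write_rate_mb_s(sysclk_hz):
    """HyperRAM writes at sysclk/2, i.e. one byte per two system clocks."""
    return sysclk_hz / 2 / 1e6

print(stream_rate_mb_s(27_000_000))   # ASSUMED 27 MHz pixel clock
print(active_rate_mb_s(60))           # ASSUMED 60 frames per second
print(write_rate_mb_s(250_000_000))   # P2 at 250 MHz
```

Storing only active pixels, or packing 24-bit pixels into 3 bytes instead of a long, lowers the incoming rate and widens the margin against the write rate.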

You'd need four COGs at least, plus whatever COGs might manage it all.

Two HyperRAM drivers.
One video COG
One capture COG.

Good news is that right now I am working on supporting multiple independent HyperRAM banks accessible from my driver which would allow double buffer switching and think I have come up with something overnight that might allow this.

rogloh · 2020-08-21 04:39

@sobakava
Your particular application is going to be using a LOT of pins. You'll probably need to build your own board, as using two P2-EVAL HyperRAM breakouts may not leave you with enough contiguous data pins to control your application and it would reset on any serial port activity if a HyperRAM module is fitted in P48-P63. (If you gave up serial after startup, perhaps there would be a way for it to work.)

It needs
22 pins for 2 x HyperRAM buses
24 pins for RGB LCD data
1-3 pins for LCD clocking DE, (HSYNC, VSYNC?)
8 pins for HDMI out
Total = 55-57 pins

In terms of input sampling you might have to clock the P2 off the LCD clock if it remains continuous. Then you can just examine the DE pin for framing and triggering each line if you have this signal on the LCD (or use V+H syncs).

E.g. if the LCD panel was clocked at 25MHz you could feed this into the P2 and multiply by 10-12 with the PLL to run the P2 at 250MHz-300MHz, then the streamer can sample your LCD data synchronously (as 32 bits). You'd then have 125-150 MB/s raw HyperRAM write speed (excluding overheads) to play with, but this may be enough for the application with the double buffering approach. The HDMI would be driven out at the P2 clock speed.
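
Another way to look at the same margin, sketched in Python: with the P2 locked to the LCD clock, the PLL multiplier is also the number of system clocks available per LCD pixel, while writing one long at sysclk/2 costs 8 system clocks. The function name is hypothetical, and command and latency overheads are ignored, as in the raw figures above.

```python
SYSCLKS_PER_LONG_WRITE = 8   # 4 bytes at one byte per 2 sysclks (sysclk/2)

def clock_plan(lcd_clock_hz, multiplier):
    """Return (sysclk_hz, spare_sysclks_per_pixel) for a PLL multiplier."""
    sysclk = lcd_clock_hz * multiplier
    spare = multiplier - SYSCLKS_PER_LONG_WRITE
    return sysclk, spare

for mult in (10, 12):
    sysclk, spare = clock_plan(25_000_000, mult)
    print(mult, sysclk / 1e6, "MHz,", spare, "spare sysclks per pixel")
```

The spare clocks per pixel are what is left to pay for address setup, latency, and any reads the display side has to make, which is why the double-buffered, two-bank approach matters here.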

It's an interesting project to make work and it will load up the P2 quite a bit but I think it is still feasible.

sobakava · 2020-08-30 20:42

Hello Rogloh,

Thanks for the detailed information. I have just ordered P2 chips and HyperRAM. I will make a test board first with these.
I have been using P1, this will be my first P2 design.

Rayman · 2020-08-30 21:58

@sobakava
If you are using Eagle, I posted a design with P2 and a HyperRam chip...
