Memory Breakout Poll

Rayman · 2019-06-22 20:36

Refresh rate is 60 Hz and read rate is maybe up to 150 MB/s
I guess that defines it

cgracey · 2019-06-22 20:48

VonSzarvas wrote: »

For VGA buffering, what is the maximum RAM you'd expect to need per buffer?

640 x 480 8bpp needs 300KB.

1920 x 1080 8bpp needs 2.1MB.

1920 x 1080 32bpp needs 8.3MB.

Roy Eltham · 2019-06-22 21:57

1920x1080x32bit = 8,294,400 bytes
1920x1080x24bit = 6,220,800 bytes
1920x1080x8bit = 2,073,600 bytes
1280x720x24bit = 2,764,800 bytes
1280x720x8bit = 921,600 bytes
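
The figures above are just width x height x bytes per pixel. A minimal sketch that reproduces them, assuming a packed buffer with no row padding (the name `framebuffer_bytes` is only illustrative):

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed for one packed frame buffer (no row padding assumed)."""
    return width * height * bits_per_pixel // 8


# Modes listed in the post above.
modes = [
    (1920, 1080, 32),
    (1920, 1080, 24),
    (1920, 1080, 8),
    (1280, 720, 24),
    (1280, 720, 8),
]

for w, h, bpp in modes:
    print(f"{w}x{h}x{bpp}bit = {framebuffer_bytes(w, h, bpp):,} bytes")
```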

Yanomani · 2019-06-22 21:57

Rayman wrote: »

Also, I'm not sure I actually use RWDS anymore, but would need to confirm that...

Sure, it's possible to avoid RWDS entirely during main memory array reads and still capture every data packet without any flaw. But remember that, to the extent of my knowledge, all the attempts made so far were restricted to running HyperCK at up to P2 Sysclk/4, so the data throughput was limited to Sysclk/2 MB/s (with Sysclk in MHz).

Also to the extent of my knowledge, there has not been any successful attempt to take advantage of RWDS early signaling to achieve variable latency counts, thus shortening the overall time that main memory array read/write commands take to complete.

Since RWDS toggling stays closely aligned with D[7:0] timing during main memory array reads, it can be used to count the effective number of data items the HyperRAMs were able to send, just in time.

On another front, I have seen no attempt to use the absence of a rising incoming RWDS as a timeout, capable of signaling data transfer timing gaps, or even errors, when it is totally absent for more than 32 HyperCK cycles.

During main memory array writes, RWDS can be, and should have been (at least as a demo), used to mask the overwrite of any byte (or group of bytes, interleaved or in rows) among the ones the write command is operating on.

This can be done with the help of another streamer, transferring a single bit in sync with the main data streamer. Resource consuming? Yes. But we could find good uses for it, as Hyper breakouts should become generally available for experimentation in the near future.
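
To make the write-masking idea concrete, here is a small software model, not P2 code. It assumes the HyperBus convention that RWDS high during a write data byte means "leave this byte untouched"; the function name `masked_write` and the list-of-bits mask are purely illustrative.

```python
def masked_write(memory, start, data, mask):
    """Model of a burst write where a per-byte mask bit (RWDS) blocks writes.

    memory: bytearray standing in for the RAM array
    data:   bytes driven on D[7:0], one per clock edge
    mask:   one bit per data byte; 1 = masked (byte kept), 0 = written
    """
    if len(data) != len(mask):
        raise ValueError("need exactly one mask bit per data byte")
    for offset, (byte, masked) in enumerate(zip(data, mask)):
        if not masked:
            memory[start + offset] = byte
    return memory


ram = bytearray(8)  # eight zeroed bytes
masked_write(ram, 2, b"\xAA\xBB\xCC\xDD", [0, 1, 0, 1])
print(ram.hex())  # only the unmasked bytes at offsets 2 and 4 change
```

The point of the sketch is that the mask stream runs in lockstep with the data stream, which is why a second streamer (or a wider one) would be needed on the P2 side.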

IMHO, not enough work has been done in this area yet to justify dismissing anything.

A kangaroo without a tail could eventually be trained to jump and do some exhibitions wearing boxing gloves, but it couldn't kick anymore, so, no more street fighting.

Henrique

Wuerfel_21 · 2019-06-22 23:06

Roy Eltham wrote: »

1920x1080x32bit = 8,294,400 bytes
1920x1080x24bit = 6,220,800 bytes
1920x1080x8bit = 2,073,600 bytes
1280x720x24bit = 2,764,800 bytes
1280x720x8bit = 921,600 bytes

It should of course be noted that one usually would want at least two buffers

Peter Jakacki · 2019-06-22 23:32

Wuerfel_21 wrote: »

Roy Eltham wrote: »

1920x1080x32bit = 8,294,400 bytes
1920x1080x24bit = 6,220,800 bytes
1920x1080x8bit = 2,073,600 bytes
1280x720x24bit = 2,764,800 bytes
1280x720x8bit = 921,600 bytes

It should of course be noted that one usually would want at least two buffers

But updating the framebuffer from the backbuffer when they both exist in the same serial device will run at best at half the maximum speed. I would expect to use 6MB for a 1920x1080x24 image, so that would need two of those PSRAMs or a 16MB HyperRAM. While I have those HyperRAMs, I like the simplicity and low cost of the PSRAM chips, so I will probably explore using two of these either as independent 4-bit or ganged 8-bit. The independent 4-bit mode would allow a little hardware trick, though, in that I can switch the chips in software so that the backbuffer becomes the framebuffer, etc. Of course this would only be good for modes that don't require reading the backbuffer, such as viewing images. Maybe these little guys aren't fast enough for that resolution, but it will be fun to experiment nonetheless.

Rayman · 2019-06-23 01:24

HyperRAM at VGA resolution has the advantage that there is time to read in a line, modify it, and then write it back for each horizontal line.

evanh · 2019-06-23 02:36

1920x1080@60Hz video output is spec'd at 148.5 MHz pixel rate. A streamer could run at that speed with a sysclock of the same. This would be sufficient for signage displays. And a HyperRAM could, in theory, be operated from that same sysclock and same bandwidth.

I suspect upping it to 150 MHz pixel rate wouldn't be an issue if that was a preferable sysclock.

For PSRAM, without the DDR signalling, they would need to be four in parallel to get the same bandwidth.

First multiple is 297 MHz. 295-300 MHz sysclock is seriously pushing what the Prop2 can be reliable at. It will crash at higher temperatures. Digital outputs will be sloped, rounded and shrunk, aka attenuated. Would need special treatment to deploy.
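
A rough sketch of the arithmetic behind these numbers. The 2200 x 1125 total frame size is the usual 1080p60 timing and is not stated in the thread, and the memory clock of sysclock/2 is an assumption chosen so that 8-bit DDR matches the pixel rate at 8 bits per pixel. The PSRAM is modelled as a 4-bit, single-data-rate part on that same clock.

```python
# Assumed standard 1080p60 totals (active 1920x1080 plus blanking).
H_TOTAL, V_TOTAL, REFRESH_HZ = 2200, 1125, 60


def pixel_clock_hz(h_total, v_total, refresh_hz):
    """Pixel clock implied by total frame size and refresh rate."""
    return h_total * v_total * refresh_hz


def bus_bytes_per_second(clock_hz, width_bits, ddr):
    """Peak bytes/s for a bus of the given width, SDR or DDR."""
    edges = 2 if ddr else 1
    return clock_hz * edges * width_bits // 8


pixel_hz = pixel_clock_hz(H_TOTAL, V_TOTAL, REFRESH_HZ)  # 148,500,000
needed = pixel_hz  # bytes/s at 8 bits per pixel

mem_clock = pixel_hz // 2  # assumption: memory clock = sysclock / 2
hyperram = bus_bytes_per_second(mem_clock, 8, ddr=True)
psram = bus_bytes_per_second(mem_clock, 4, ddr=False)
psram_chips = -(-needed // psram)  # ceiling division

print(f"pixel clock       : {pixel_hz / 1e6} MHz")
print(f"HyperRAM 8-bit DDR: {hyperram / 1e6} MB/s")
print(f"PSRAM 4-bit SDR   : {psram / 1e6} MB/s each, {psram_chips} needed")
print(f"first multiple    : {2 * pixel_hz / 1e6} MHz")
```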

evanh · 2019-06-23 02:43

PS: Assumed to be eight bits per pixel in the frame buffer. More than eight bits would need a wider data path.

evanh · 2019-06-23 02:52

The downside here is only blanking time is remaining for buffer writes. That's a pretty sucky ratio for anything that isn't just an occasional page flip.

So, in practise, you'd still want to halve the bits per pixel or double the bandwidth available.
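
To put a number on that ratio, here is a back-of-envelope sketch. It assumes the usual 1080p60 totals of 2200 x 1125 (not stated in the thread) and a memory bus whose peak rate exactly equals the 148.5 MB/s needed for scan-out, so only blanking time is free for writes.

```python
H_ACTIVE, V_ACTIVE = 1920, 1080
H_TOTAL, V_TOTAL = 2200, 1125  # assumed standard 1080p60 totals
BUS_BYTES_PER_SECOND = 148_500_000  # bus matched to the pixel rate at 8 bpp


def blanking_fraction(h_active, v_active, h_total, v_total):
    """Fraction of each frame period not spent scanning out active pixels."""
    return 1 - (h_active * v_active) / (h_total * v_total)


free = blanking_fraction(H_ACTIVE, V_ACTIVE, H_TOTAL, V_TOTAL)
write_budget = free * BUS_BYTES_PER_SECOND

print(f"blanking fraction: {free:.1%}")
print(f"write bandwidth  : {write_budget / 1e6:.1f} MB/s")
```

Under these assumptions only about a sixth of the bus time is left over, which is the motivation for halving the bits per pixel or doubling the bandwidth.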

Roy Eltham · 2019-06-23 04:54

hey, evanh,
are you aware of the edit post feature? You can edit your first post to add the other comments... instead of replying with several short posts one after another....

evanh · 2019-06-23 07:19

Roy Eltham wrote: »

You can edit your first post to add the other comments...

I do that too.

I do a lot of post-edits, mostly for clarification/typo fixes but sometimes because I've decided I should test out my answer. Which has resulted in the occasional complete rewrite, but mostly I make amendments. Slight bad habit of speaking before absolute certainty. I end up testing for days sometimes though, so saying nothing isn't very engaging either. So the posts get a tad loose.

Seairth · 2019-06-23 14:17

Personally, I'm fine with any option, as long as it's not too complicated to code for. Part of the joy of using the Propeller is the relative simplicity of programming for it. I'd rather sacrifice data rates (and/or a few extra pins) for a solution that is simple to implement/understand. People who are making purpose-built boards can use whatever chips they want, but an add-on for the Eval board should (imo) stay true to the nature of the P2 itself.

whicker · 2019-06-25 03:35

I'm dead-set on HyperRam.
Once you get the 6 byte address in there (3 clock cycles) and the latency, you get data every clock edge.
Drop the latency clock setting to 3 and use variable latency and it's speedy enough for random access.

Or use about 18 pins and go two 8 MB chips in parallel with fixed latency to get a 16-bit wide bus that reads in 32-bits every full clock cycle. That'd be my dream option for 16-bit color with a 16-bit z-buffer, or RGB 888 plus alpha.
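
A simplified model of the transaction cost described above, for comparing options rather than predicting exact timing. It treats the command/address phase (3 clocks), the initial latency and the data burst as strictly sequential, which the datasheet timing does not quite do, and it doubles the latency when the extra (refresh-collision) latency applies. The function and parameter names are illustrative.

```python
CA_CLOCKS = 3  # 6 command/address bytes, two per clock (DDR)


def transfer_clocks(num_bytes, latency_clocks, chips_in_parallel=1,
                    extra_latency=False):
    """Approximate HyperCK cycles for one burst (sequential-phase model)."""
    bytes_per_clock = 2 * chips_in_parallel  # one byte per edge per chip
    latency = latency_clocks * (2 if extra_latency else 1)
    data_clocks = -(-num_bytes // bytes_per_clock)  # ceiling division
    return CA_CLOCKS + latency + data_clocks


# Small random access with latency count 3 and no extra latency.
print(transfer_clocks(4, 3))
# 64-byte burst on one chip when the extra latency is taken.
print(transfer_clocks(64, 3, extra_latency=True))
# Same burst on two chips ganged into a 16-bit bus (4 bytes per clock).
print(transfer_clocks(64, 3, chips_in_parallel=2))
```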

VonSzarvas · 2019-06-25 12:45

Thanks again for all the comments from everyone.

Here's a schematic of how it's looking.

The series R's (value not determined, but nominally shown as 27R) will probably be replaced by shunt footprints (hence experimental R108). Depends on optimal layout-- anyway the aim is that for those control signals that are shared, the user will be able to move the small resistor to "un-share" the pins, and then hook into control signals individually.

Of course this all shares a single databus- in the end the view was taken that for higher throughput experiments, people could get two of these, running one each side of the P2, and thus experiment with pretty much every combination of connections- including 16-wide bus.

Hopefully this will provide a really solid way to continue experiments with HyperRAM and HyperFLASH, and save the headaches of BGA assembly at home!

Does this tick the boxes for everyone ?

dMajo · 2019-06-25 13:25

VonSzarvas wrote: »

Thanks again for all the comments from everyone.

Here's a schematic of how it's looking.

The series R's (value not determined, but nominally shown as 27R) will probably be replaced by shunt footprints (hence experimental R108). Depends on optimal layout-- anyway the aim is that for those control signals that are shared, the user will be able to move the small resistor to "un-share" the pins, and then hook into control signals individually.

Of course this all shares a single databus- in the end the view was taken that for higher throughput experiments, people could get two of these, running one each side of the P2, and thus experiment with pretty much every combination of connections- including 16-wide bus.

Hopefully this will provide a really solid way to continue experiments with HyperRAM and HyperFLASH, and save the headaches of BGA assembly at home!

Does this tick the boxes for everyone ?

As this is a development/test add-on, would it be possible to have the reset signal of the four ICs connected to an on-board push-button and/or jumper, to free the socket pin, and have RWDS-U100-U101 go to IO-9 and RWDS-U102-U103 go to IO-10?

evanh · 2019-06-25 14:23

dMajo,
Just been reading what that might mean - A hand operated reset will corrupt DRAM content because the internal DRAM refresh is frozen for the duration of the reset.

dMajo · 2019-06-25 15:15

The aim of having two separate RWDS signals was to be able to copy content between chips (flash>ram and ram<->ram) without driving the databus.

Couldn't the Reset then just be on a jumper connector, so it can be wired to another P2 IO (out of the 16 on the 2 sockets)?
Having the 4 ICs on the same RWDS signal is not fine IMHO

evanh · 2019-06-25 15:35

I'm struggling to get my head around how RWDS can be obeyed without large stalls to test for it.

As far as I can tell, the best approach is to ensure addressing is managed to prevent RWDS from ever asserting outside of a predictable sequence. So then it can be ignored at all times.

EDIT: Or at the very least, ignored during the data burst phase. In this approach I imagine bit-bashing the control and addressing and then engage the smartpin+streamer only for the data burst.

evanh · 2019-06-25 15:50

Reset could be happily left unconnected. It doesn't really have a serious use. Don't bother with any push button.

evanh · 2019-06-25 16:09

Ah, I see, for DRAM writes, RWDS can be used to write selected bytes of the burst. Thereby providing the speed without the burden of a read-modify-write. That would be handy.

Is it possible to make a streamer have a 9 or 10 bit wide output? I'm thinking it can but would need 16-bit hubRAM data encoded with the control pattern.

VonSzarvas · 2019-06-25 16:31

Thinking out loud... To split RWDS we need to save an IO.

Maybe CS could be grouped into two's. Could use an inverting or tristate buffer, diode, a couple pull resistors and perhaps a little more thought! Such that setting CS low asserts CS0 low & CS1 high, setting CS high asserts CS0 high & CS1 low, setting CS to input leaves both CS0 and CS1 pulled high. Something like that.
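
The three-state behaviour described above can be written down as a truth table. A minimal sketch, modelling only the logic and not the buffer/diode/pull-resistor circuit; `None` stands for the P2 pin being an input (floating), and both chip selects are active low.

```python
def decode_cs(cs):
    """Map one shared CS pin state to (CS0, CS1), both active low.

    cs = 0    -> CS0 low (selected), CS1 high
    cs = 1    -> CS0 high, CS1 low (selected)
    cs = None -> pin is an input; pull-ups hold both high (none selected)
    """
    if cs is None:
        return (1, 1)
    if cs == 0:
        return (0, 1)
    if cs == 1:
        return (1, 0)
    raise ValueError("cs must be 0, 1 or None")


for state in (0, 1, None):
    print(state, "->", decode_cs(state))
```

Note that in this scheme the two devices in a group can never be selected at the same time.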

If that was doable, would that be suitably non-confusing enough to be worth gaining RWDS control ?

cgracey · 2019-06-25 16:41

evanh wrote: »

Ah, I see, for DRAM writes, RWDS can be used to write selected bytes of the burst. Thereby providing the speed without the burden of a read-modify-write. That would be handy.

Is it possible to make a streamer have a 9 or 10 bit wide output? I'm thinking it can but would need 16-bit hubRAM data encoded with the control pattern.

The streamer handles 1, 2, 4, 8, 16, or 32 I/Os. You could output 9 or 10 bits by not enabling all DIR bits, but you'd still be moving 16-bit values around.
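
A sketch of what "still moving 16-bit values around" means for the hub RAM buffer. The bit layout here is hypothetical: data on bits 0-7 and the RWDS mask on bit 8 of each little-endian 16-bit word, with the upper bits unused. This is host-side Python for illustration, not streamer code.

```python
import struct


def pack_burst(data, mask):
    """Interleave data bytes and RWDS mask bits into 16-bit words.

    Hypothetical layout: bits 0-7 = data byte, bit 8 = RWDS, bits 9-15 unused.
    """
    if len(data) != len(mask):
        raise ValueError("need exactly one mask bit per data byte")
    words = [byte | ((bit & 1) << 8) for byte, bit in zip(data, mask)]
    return struct.pack(f"<{len(words)}H", *words)


payload = b"\xAA\xBB\xCC\xDD"
packed = pack_burst(payload, [0, 1, 0, 1])
print(len(payload), "data bytes ->", len(packed), "bytes in hub RAM")
```

The buffer doubles in size and has to be encoded before every masked write, which is the overhead being weighed against a plain read-modify-write.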

evanh · 2019-06-25 16:56

cgracey wrote: »

The streamer handles 1, 2, 4, 8, 16, or 32 I/Os. You could output 9 or 10 bits by not enabling all DIR bits, but you'd still be moving 16-bit values around.

Thanks. Right. I doubt it could be used effectively. Better to just do the simple read-modify-write I think. No encoding needed then.

evanh · 2019-06-25 16:59

VonSzarvas wrote: »

If that was doable, would that be suitably non-confusing enough to be worth gaining RWDS control ?

I'd prefer ditching the reset line. Keeping the chip selects separate will give more flexibility to using the two RWDS lines.

PS: Reset isn't needed for anything particular. Not even for recovering from deep power down. The CS line can do that.

cgracey · 2019-06-25 17:07

So, there's some interest in getting rid of RESET# control and having separate RWDS lines for the flash and RAM groups?

evanh · 2019-06-25 17:13

dMajo suggested two groups of Flash+DRAM paired. The idea being that copying could then be DRAM<->DRAM or Flash<->Flash or diagonal DRAM<->Flash.

VonSzarvas · 2019-06-25 17:28

In case the 2nd schematic sheet didn't show up, all of those shared control lines (CS, CK, RESET, RWDS) are broken out to 0.1" pads for TH headers, such that they can be split up and controlled by other IOs independently if preferred.

So what is being proposed could already be achieved, by moving 2 resistor shunts and using whatever header & jumper cable you prefer to hook over to another P2 IO.

Sure, RWDS could be split by default as proposed. With RESET being left unconnected (and available on the optional header pads).

At the moment RESET is pulled low, so that the module is kept in reset until P2 asserts the reset pin.
Would need to re-think that if RESET is isolated by default.

If having RESET toggled at least once when the module is attached, to ensure a clean start, would be desirable, then we could add a latching buffer to one of the CS pins, so that the first time CS is asserted it releases the RESET.

evanh · 2019-06-25 17:55

True. I'm easy.

VonSzarvas · 2019-06-25 19:18

If having RWDS per channel available by default means getting various copy functions that many would want to use, then doing that kinda makes sense.

Putting reset to an option header (perhaps even pre-installed) provides a quick way for reset to be jumpered to some other P2 IO (or direct to RESn, which is available on RevB EVAL !).

Some ponders...

Is RESET really unnecessary here?

If RESET is left un-connected (pulled up), will the memory be totally usable, and reliable ? (ie. Is it sure, the memory wouldn't need at least one RESET after power-up, once VCC gets all the way up?) ** Also bearing in mind, the 3V3 rail can be power-cycled on the new Eval board via a shunt **

Will go read the datasheet some more... maybe there's some notes about RESET.
