My reality check :(

Rayman · 2009-04-17 18:02

I was hoping that my "Propeller System Module" (name subject to change) would be able to show full screen (320x240) video from the SD card...

But sadly, I've just found that it takes about a second just to show a single 320x240x24bit image...

It appears that the SD card is the bottleneck.· I've found a thread where Rokicki gives his read speed at between "271 to 430 KBytes/sec".· The filesize of a fullscreen, 24-bit bitmap is 226 kB.·

So, full-screen uncompressed video doesn't appear possible...

Even if I drop the bits/pixel it won't be anywhere near video speed.

MagIO2 · 2009-04-17 18:25

Does the SD card driver already use the SD mode? There you have 4 data lines and you can transfere 4x 271 to 430 KBytes/sec. But I'd guess that then HUB RAM access will be the bottleneck.

parsko · 2009-04-17 18:48

Rayman,

Within Tom's SD code, there is a timeout that one could lower to around 5-10msec. His driver will wait for the SD card to say okay. If it gets hung, it can wait up to a second (if I recall). If you turn this wait time down, it will simply bail on the read. Then you could restart the read at the next block (512bytes). Downside is you might miss some data here and there. Not sure how that would affect you, or the person watching.

The other option is to use more SD cards with another data bus. You could be clever and distribute the data between the two. Though, I'm not sure how easy this would be without you writing the driver to the write the data to the SD's.

The timeout is in the "checktime" part of his routine, in this line "cmp duration,clockfreq wc". (the clockfreq is the variable he checks against)

-Parsko

EDIT - I assume you are using the "sdspiqasm" routine. That is what I am referencing.

rokicki · 2009-04-17 19:09

It is possible to speed things up somewhat, at an expansion in code size. There are a lot of possibilities:

1. Use multiblock reads and much larger buffers

2. Eliminate all spin overhead (use "DMA" type transfers)

3. Unroll the read loop to do 32 bits at a time, one instruction per bit (or is it two instructions per
bit? it's been a long time since I did it)

4. Go to the four-bit mode (if you can get adequate documentation and if your use/distribution doesn't
conflict with any licensing rules)

5. Use multiple cards in parallel

Probably (2) above is the easiest and will give you some reasonable
improvement; (5) is also pretty easy, as is (3). Then, (1) is not too bad,
and finally (4) if you are desparate.

Post Edited (rokicki) : 4/17/2009 7:19:10 PM GMT

Toby Seckshund · 2009-04-17 19:35

With streaming video the next picture is far more important than data accuracy (within reason) drop as much housekeeping as possible to keep the bit rate up. This is what UDP does over TCP and video hard drives do not error check so much as data ones. Persistance of vision will smudge over small problems.

Rayman · 2009-04-17 20:59

Rockiki,

What would you predict the transfer rate from the SD card to be, if I were to take suggestion #2 and make the transfer all in assembly?

Could it be 20X faster?

Thanks,

Ray

rokicki · 2009-04-17 21:20

No, to get 20X you're probably going to have to go to a 4-bit transfer mode and use several cards.

If you go all assembly and DMA and big buffers, you might be able to get 1MB/s.

Note that the SPI protocol for an SD card *itself* tops out at 25MHz; assuming you overclock your
propeller and do some serious magic to make that actually work (maybe using multiple cogs and
storing the video in some interleaved format) even then you'd only get 3MB/sec. And eliminating
that overhead would be really tricky.

If you go 4-bit at 25MHz (and it's not clear the propeller *itself* will do this) you might be able to
get 12MB/s but I'd say it's unlikely, and you'd probably need all eight cogs just to do this.

You may want to start looking at simple video compression techniques that can be decompressed
in real time; I think that's probably where you'll get the best bang for your buck.

jazzed · 2009-04-17 23:15

Rayman,

Ariba (Andy) made what looks like a "near 20MB/s driver" on one COG for SD SRAM using the counters for clocking. That's about 1 bit per instruction [noparse]:)[/noparse] There is some overhead.

Have a look: http://forums.parallax.com/forums/default.aspx?f=25&m=331054&g=331127#m331127

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Propalyzer: Propeller PC Logic Analyzer
http://forums.parallax.com/showthread.php?p=788230

Rayman · 2009-04-17 23:57

Rokicki,
1MB/sec should be enough for 8-bit/pixel half-screen video. I could just stretch it to full screen... Worth a shot...

Steve,
I think that thread is about SRAM chips, right? I don't really know if that's the same as SD cards...

jazzed · 2009-04-17 23:59

They are the same type of physical interface. You would have to work some to replace the PASM layer ....

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Propalyzer: Propeller PC Logic Analyzer
http://forums.parallax.com/showthread.php?p=788230

rokicki · 2009-04-18 00:17

The SRAM object does byte reads and writes, not blocks, and there appears to be
significant command overhead; I don't think the bandwidth will be all that high.
He quotes speeds of

Read max. 300 ticks (3.7µs), Write max. 236 ticks (2.9µs)

These are per byte.

On the "20MB/s", that only applies when writing, and it just spools out the data and
clock one instruction per bit; fsrw has been doing that since its first release.

Getting reads fast is all about eliminating the overhead; on the prop you can
(in one cog) do about one bit per two instructions. To make it fast, you want to
do it 32 bits at a time (unrolled) rather than 8, so you can writelong the data into
hub ram. If you use two cogs, you can probably get two loops running in sync and actually
attain a 10Mbit input data rate (but you need to use multiblock reads to sustain this, and
there will be some amount drained off this for command overhead.) If you do it
with only one cog, the wrlong() and loop overhead will probably cut that down
by 25-50%.

The sdspi*.spin routines are pretty basic SPI readers/writers and should be pretty
easy to extend in this fashion. Of course you have to make sure your buffers are
longword aligned.

I did some of this work long ago but didn't release it because it expanded the code
substantially. I am not sure how much time I will have to revisit it, but the key is
to make sure that the reading *stays* in assembly; you can use Spin to set up a
circular buffer and coordinate on a pair of pointers (say, a buffer of 2K or more)
and you'd probably be best served by ensuring the data is contiguous on the card
(otherwise you'd need some sort of block list datastructure to keep in sync).

jazzed · 2009-04-18 00:42

Tom,

For whatever it's worth you had some nice little optimizations in your sdspiasm PASM. It is nice to see you posting on the forum today. I'm very interested in how this could turn out because I've been playing with another interface and would like to have a strategy for updating pieces of the display quickly enough to be effective.

I like having all options open in the display realm since I'm working on a related product to make display aspects easier. Ray ... you might be interested in this ... I'll PM you after I have a prototype next month.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Propalyzer: Propeller PC Logic Analyzer
http://forums.parallax.com/showthread.php?p=788230

Timothy D. Swieter · 2009-04-18 07:28

Rayman -

Maybe the SD card reading didn't work out, but what about other interfaces like external RAM or Ethernet or ....? I have wanted to make a display that I could stream data to over Ethernet. The data would come from a PC with a screen capture program and send the pixel information to a display like the one you are using.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Timothy D. Swieter, E.I.
www.brilldea.com - Prop Blade, LED Painter, RGB LEDs, 3.0" LCD Composite video display, eProto for SunSPOT
www.tdswieter.com

OwenS · 2009-04-18 11:38

You could cut down the bandwidth significantly if you DXT1 compressed the images. You still won't get it down as far as you need - but you will get it down to 17mbit/s, which a single SD card can do. I don't know how many cogs you have free but with enough you should be able to decompress it in real time

Rayman · 2009-04-18 11:42

Timothy,

Streaming video over ethernet would presumably be fast enough... But, I don't have any experience with ethernet on the Prop.

I can already send pixel data from PC over RS232, but it's very slow even at 115200 baud (~5 seconds or so). I've heard the FTDI chip can do ~2MBaud, so that might not be too bad.

For computer created images, I think the ideal thing would be to implement some kind of decompression. I was looking into the png compression, it doesn't look too bad. Some kind of RLL would be even easier...

But, I really want to do video from the SD card. I guess it just won't be as nice or easy as I had thought....

Anyway, this goal of video playback was just for show really anyway. It's not very practical without mp4 compression, and I'm pretty sure Prop I can't do real time mp4 decompression.

OwenS · 2009-04-18 11:49

DXT1 would cut it down to 17mbit/s, and it should be telling of it's simplicity that it's been done in hardware since the S3 Savage 3D in 1998. It's really just a bit of computation and a lookup table.

My reality check :(

Comments