My reality check :(
Rayman
Posts: 14,826
I was hoping that my "Propeller System Module" (name subject to change) would be able to show full screen (320x240) video from the SD card...
But sadly, I've just found that it takes about a second just to show a single 320x240x24bit image...
It appears that the SD card is the bottleneck.· I've found a thread where Rokicki gives his read speed at between "271 to 430 KBytes/sec".· The filesize of a fullscreen, 24-bit bitmap is 226 kB.·
So, full-screen uncompressed video doesn't appear possible...
Even if I drop the bits/pixel it won't be anywhere near video speed.
But sadly, I've just found that it takes about a second just to show a single 320x240x24bit image...
It appears that the SD card is the bottleneck.· I've found a thread where Rokicki gives his read speed at between "271 to 430 KBytes/sec".· The filesize of a fullscreen, 24-bit bitmap is 226 kB.·
So, full-screen uncompressed video doesn't appear possible...
Even if I drop the bits/pixel it won't be anywhere near video speed.
bmp
225K
Comments
Within Tom's SD code, there is a timeout that one could lower to around 5-10msec. His driver will wait for the SD card to say okay. If it gets hung, it can wait up to a second (if I recall). If you turn this wait time down, it will simply bail on the read. Then you could restart the read at the next block (512bytes). Downside is you might miss some data here and there. Not sure how that would affect you, or the person watching.
The other option is to use more SD cards with another data bus. You could be clever and distribute the data between the two. Though, I'm not sure how easy this would be without you writing the driver to the write the data to the SD's.
The timeout is in the "checktime" part of his routine, in this line "cmp duration,clockfreq wc". (the clockfreq is the variable he checks against)
-Parsko
EDIT - I assume you are using the "sdspiqasm" routine. That is what I am referencing.
1. Use multiblock reads and much larger buffers
2. Eliminate all spin overhead (use "DMA" type transfers)
3. Unroll the read loop to do 32 bits at a time, one instruction per bit (or is it two instructions per
bit? it's been a long time since I did it)
4. Go to the four-bit mode (if you can get adequate documentation and if your use/distribution doesn't
conflict with any licensing rules)
5. Use multiple cards in parallel
Probably (2) above is the easiest and will give you some reasonable
improvement; (5) is also pretty easy, as is (3). Then, (1) is not too bad,
and finally (4) if you are desparate.
Post Edited (rokicki) : 4/17/2009 7:19:10 PM GMT
What would you predict the transfer rate from the SD card to be, if I were to take suggestion #2 and make the transfer all in assembly?
Could it be 20X faster?
Thanks,
Ray
If you go all assembly and DMA and big buffers, you might be able to get 1MB/s.
Note that the SPI protocol for an SD card *itself* tops out at 25MHz; assuming you overclock your
propeller and do some serious magic to make that actually work (maybe using multiple cogs and
storing the video in some interleaved format) even then you'd only get 3MB/sec. And eliminating
that overhead would be really tricky.
If you go 4-bit at 25MHz (and it's not clear the propeller *itself* will do this) you might be able to
get 12MB/s but I'd say it's unlikely, and you'd probably need all eight cogs just to do this.
You may want to start looking at simple video compression techniques that can be decompressed
in real time; I think that's probably where you'll get the best bang for your buck.
Ariba (Andy) made what looks like a "near 20MB/s driver" on one COG for SD SRAM using the counters for clocking. That's about 1 bit per instruction [noparse]:)[/noparse] There is some overhead.
Have a look: http://forums.parallax.com/forums/default.aspx?f=25&m=331054&g=331127#m331127
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
Propalyzer: Propeller PC Logic Analyzer
http://forums.parallax.com/showthread.php?p=788230
1MB/sec should be enough for 8-bit/pixel half-screen video. I could just stretch it to full screen... Worth a shot...
Steve,
I think that thread is about SRAM chips, right? I don't really know if that's the same as SD cards...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
Propalyzer: Propeller PC Logic Analyzer
http://forums.parallax.com/showthread.php?p=788230
significant command overhead; I don't think the bandwidth will be all that high.
He quotes speeds of
Read max. 300 ticks (3.7µs), Write max. 236 ticks (2.9µs)
These are per byte.
On the "20MB/s", that only applies when writing, and it just spools out the data and
clock one instruction per bit; fsrw has been doing that since its first release.
Getting reads fast is all about eliminating the overhead; on the prop you can
(in one cog) do about one bit per two instructions. To make it fast, you want to
do it 32 bits at a time (unrolled) rather than 8, so you can writelong the data into
hub ram. If you use two cogs, you can probably get two loops running in sync and actually
attain a 10Mbit input data rate (but you need to use multiblock reads to sustain this, and
there will be some amount drained off this for command overhead.) If you do it
with only one cog, the wrlong() and loop overhead will probably cut that down
by 25-50%.
The sdspi*.spin routines are pretty basic SPI readers/writers and should be pretty
easy to extend in this fashion. Of course you have to make sure your buffers are
longword aligned.
I did some of this work long ago but didn't release it because it expanded the code
substantially. I am not sure how much time I will have to revisit it, but the key is
to make sure that the reading *stays* in assembly; you can use Spin to set up a
circular buffer and coordinate on a pair of pointers (say, a buffer of 2K or more)
and you'd probably be best served by ensuring the data is contiguous on the card
(otherwise you'd need some sort of block list datastructure to keep in sync).
For whatever it's worth you had some nice little optimizations in your sdspiasm PASM. It is nice to see you posting on the forum today. I'm very interested in how this could turn out because I've been playing with another interface and would like to have a strategy for updating pieces of the display quickly enough to be effective.
I like having all options open in the display realm since I'm working on a related product to make display aspects easier. Ray ... you might be interested in this ... I'll PM you after I have a prototype next month.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
Propalyzer: Propeller PC Logic Analyzer
http://forums.parallax.com/showthread.php?p=788230
Maybe the SD card reading didn't work out, but what about other interfaces like external RAM or Ethernet or ....? I have wanted to make a display that I could stream data to over Ethernet. The data would come from a PC with a screen capture program and send the pixel information to a display like the one you are using.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Timothy D. Swieter, E.I.
www.brilldea.com - Prop Blade, LED Painter, RGB LEDs, 3.0" LCD Composite video display, eProto for SunSPOT
www.tdswieter.com
Streaming video over ethernet would presumably be fast enough... But, I don't have any experience with ethernet on the Prop.
I can already send pixel data from PC over RS232, but it's very slow even at 115200 baud (~5 seconds or so). I've heard the FTDI chip can do ~2MBaud, so that might not be too bad.
For computer created images, I think the ideal thing would be to implement some kind of decompression. I was looking into the png compression, it doesn't look too bad. Some kind of RLL would be even easier...
But, I really want to do video from the SD card. I guess it just won't be as nice or easy as I had thought....
Anyway, this goal of video playback was just for show really anyway. It's not very practical without mp4 compression, and I'm pretty sure Prop I can't do real time mp4 decompression.