Shop OBEX P1 Docs P2 Docs Learn Events
My reality check :( — Parallax Forums

My reality check :(

RaymanRayman Posts: 14,826
edited 2009-04-18 11:49 in Propeller 1
I was hoping that my "Propeller System Module" (name subject to change) would be able to show full screen (320x240) video from the SD card...

But sadly, I've just found that it takes about a second just to show a single 320x240x24bit image...

It appears that the SD card is the bottleneck.· I've found a thread where Rokicki gives his read speed at between "271 to 430 KBytes/sec".· The filesize of a fullscreen, 24-bit bitmap is 226 kB.·

So, full-screen uncompressed video doesn't appear possible...

Even if I drop the bits/pixel it won't be anywhere near video speed.
1905 x 1536 - 712K

Comments

  • MagIO2MagIO2 Posts: 2,243
    edited 2009-04-17 18:25
    Does the SD card driver already use the SD mode? There you have 4 data lines and you can transfere 4x 271 to 430 KBytes/sec. But I'd guess that then HUB RAM access will be the bottleneck.
  • parskoparsko Posts: 501
    edited 2009-04-17 18:48
    Rayman,

    Within Tom's SD code, there is a timeout that one could lower to around 5-10msec. His driver will wait for the SD card to say okay. If it gets hung, it can wait up to a second (if I recall). If you turn this wait time down, it will simply bail on the read. Then you could restart the read at the next block (512bytes). Downside is you might miss some data here and there. Not sure how that would affect you, or the person watching.

    The other option is to use more SD cards with another data bus. You could be clever and distribute the data between the two. Though, I'm not sure how easy this would be without you writing the driver to the write the data to the SD's.

    The timeout is in the "checktime" part of his routine, in this line "cmp duration,clockfreq wc". (the clockfreq is the variable he checks against)

    -Parsko

    EDIT - I assume you are using the "sdspiqasm" routine. That is what I am referencing.
  • rokickirokicki Posts: 1,000
    edited 2009-04-17 19:09
    It is possible to speed things up somewhat, at an expansion in code size. There are a lot of possibilities:

    1. Use multiblock reads and much larger buffers

    2. Eliminate all spin overhead (use "DMA" type transfers)

    3. Unroll the read loop to do 32 bits at a time, one instruction per bit (or is it two instructions per
    bit? it's been a long time since I did it)

    4. Go to the four-bit mode (if you can get adequate documentation and if your use/distribution doesn't
    conflict with any licensing rules)

    5. Use multiple cards in parallel

    Probably (2) above is the easiest and will give you some reasonable
    improvement; (5) is also pretty easy, as is (3). Then, (1) is not too bad,
    and finally (4) if you are desparate.

    Post Edited (rokicki) : 4/17/2009 7:19:10 PM GMT
  • Toby SeckshundToby Seckshund Posts: 2,027
    edited 2009-04-17 19:35
    With streaming video the next picture is far more important than data accuracy (within reason) drop as much housekeeping as possible to keep the bit rate up. This is what UDP does over TCP and video hard drives do not error check so much as data ones. Persistance of vision will smudge over small problems.
  • RaymanRayman Posts: 14,826
    edited 2009-04-17 20:59
    Rockiki,

    What would you predict the transfer rate from the SD card to be, if I were to take suggestion #2 and make the transfer all in assembly?

    Could it be 20X faster?

    Thanks,

    Ray
  • rokickirokicki Posts: 1,000
    edited 2009-04-17 21:20
    No, to get 20X you're probably going to have to go to a 4-bit transfer mode and use several cards.

    If you go all assembly and DMA and big buffers, you might be able to get 1MB/s.

    Note that the SPI protocol for an SD card *itself* tops out at 25MHz; assuming you overclock your
    propeller and do some serious magic to make that actually work (maybe using multiple cogs and
    storing the video in some interleaved format) even then you'd only get 3MB/sec. And eliminating
    that overhead would be really tricky.

    If you go 4-bit at 25MHz (and it's not clear the propeller *itself* will do this) you might be able to
    get 12MB/s but I'd say it's unlikely, and you'd probably need all eight cogs just to do this.

    You may want to start looking at simple video compression techniques that can be decompressed
    in real time; I think that's probably where you'll get the best bang for your buck.
  • jazzedjazzed Posts: 11,803
    edited 2009-04-17 23:15
    Rayman,

    Ariba (Andy) made what looks like a "near 20MB/s driver" on one COG for SD SRAM using the counters for clocking. That's about 1 bit per instruction [noparse]:)[/noparse] There is some overhead.

    Have a look: http://forums.parallax.com/forums/default.aspx?f=25&m=331054&g=331127#m331127

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve


    Propalyzer: Propeller PC Logic Analyzer
    http://forums.parallax.com/showthread.php?p=788230
  • RaymanRayman Posts: 14,826
    edited 2009-04-17 23:57
    Rokicki,
    1MB/sec should be enough for 8-bit/pixel half-screen video. I could just stretch it to full screen... Worth a shot...

    Steve,
    I think that thread is about SRAM chips, right? I don't really know if that's the same as SD cards...
  • jazzedjazzed Posts: 11,803
    edited 2009-04-17 23:59
    They are the same type of physical interface. You would have to work some to replace the PASM layer ....

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve


    Propalyzer: Propeller PC Logic Analyzer
    http://forums.parallax.com/showthread.php?p=788230
  • rokickirokicki Posts: 1,000
    edited 2009-04-18 00:17
    The SRAM object does byte reads and writes, not blocks, and there appears to be
    significant command overhead; I don't think the bandwidth will be all that high.
    He quotes speeds of

    Read max. 300 ticks (3.7µs), Write max. 236 ticks (2.9µs)

    These are per byte.

    On the "20MB/s", that only applies when writing, and it just spools out the data and
    clock one instruction per bit; fsrw has been doing that since its first release.

    Getting reads fast is all about eliminating the overhead; on the prop you can
    (in one cog) do about one bit per two instructions. To make it fast, you want to
    do it 32 bits at a time (unrolled) rather than 8, so you can writelong the data into
    hub ram. If you use two cogs, you can probably get two loops running in sync and actually
    attain a 10Mbit input data rate (but you need to use multiblock reads to sustain this, and
    there will be some amount drained off this for command overhead.) If you do it
    with only one cog, the wrlong() and loop overhead will probably cut that down
    by 25-50%.

    The sdspi*.spin routines are pretty basic SPI readers/writers and should be pretty
    easy to extend in this fashion. Of course you have to make sure your buffers are
    longword aligned.

    I did some of this work long ago but didn't release it because it expanded the code
    substantially. I am not sure how much time I will have to revisit it, but the key is
    to make sure that the reading *stays* in assembly; you can use Spin to set up a
    circular buffer and coordinate on a pair of pointers (say, a buffer of 2K or more)
    and you'd probably be best served by ensuring the data is contiguous on the card
    (otherwise you'd need some sort of block list datastructure to keep in sync).
  • jazzedjazzed Posts: 11,803
    edited 2009-04-18 00:42
    Tom,

    For whatever it's worth you had some nice little optimizations in your sdspiasm PASM. It is nice to see you posting on the forum today. I'm very interested in how this could turn out because I've been playing with another interface and would like to have a strategy for updating pieces of the display quickly enough to be effective.

    I like having all options open in the display realm since I'm working on a related product to make display aspects easier. Ray ... you might be interested in this ... I'll PM you after I have a prototype next month.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve


    Propalyzer: Propeller PC Logic Analyzer
    http://forums.parallax.com/showthread.php?p=788230
  • Timothy D. SwieterTimothy D. Swieter Posts: 1,613
    edited 2009-04-18 07:28
    Rayman -

    Maybe the SD card reading didn't work out, but what about other interfaces like external RAM or Ethernet or ....? I have wanted to make a display that I could stream data to over Ethernet. The data would come from a PC with a screen capture program and send the pixel information to a display like the one you are using.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Timothy D. Swieter, E.I.
    www.brilldea.com - Prop Blade, LED Painter, RGB LEDs, 3.0" LCD Composite video display, eProto for SunSPOT
    www.tdswieter.com
  • OwenSOwenS Posts: 173
    edited 2009-04-18 11:38
    You could cut down the bandwidth significantly if you DXT1 compressed the images. You still won't get it down as far as you need - but you will get it down to 17mbit/s, which a single SD card can do. I don't know how many cogs you have free but with enough you should be able to decompress it in real time
  • RaymanRayman Posts: 14,826
    edited 2009-04-18 11:42
    Timothy,

    Streaming video over ethernet would presumably be fast enough... But, I don't have any experience with ethernet on the Prop.

    I can already send pixel data from PC over RS232, but it's very slow even at 115200 baud (~5 seconds or so). I've heard the FTDI chip can do ~2MBaud, so that might not be too bad.

    For computer created images, I think the ideal thing would be to implement some kind of decompression. I was looking into the png compression, it doesn't look too bad. Some kind of RLL would be even easier...

    But, I really want to do video from the SD card. I guess it just won't be as nice or easy as I had thought....

    Anyway, this goal of video playback was just for show really anyway. It's not very practical without mp4 compression, and I'm pretty sure Prop I can't do real time mp4 decompression.
  • OwenSOwenS Posts: 173
    edited 2009-04-18 11:49
    DXT1 would cut it down to 17mbit/s, and it should be telling of it's simplicity that it's been done in hardware since the S3 Savage 3D in 1998. It's really just a bit of computation and a lookup table.
Sign In or Register to comment.