OLED 96-prop - Loading an Image from uSD card, and draw speed. — Parallax Forums

etherVision Posts: 3
edited 2007-11-11 15:29 in Propeller 1
I've been working on displaying animation on the 96-Prop from 4D. I modified some of the demo code and added SD read capability. The problem I'm having now is image write speed, which seems to be about 2 fps. I added a speed-test section that just writes frames of solid color, 1 pixel at a time, and it still gets only about 2 fps. I'm posting the code here in the hope that someone can help me optimize it further, or figure out whether it's just a problem with the unit that I have.

Comments

  • etherVision Posts: 3
    edited 2007-11-10 06:02
    BTW, here are the 2 routines I added to the Prop driver code:

    PUB PutPixel16 (X, Y, V1, V2)
    '' Writes 2 bytes (16 bits) of color data to the upper-left corner (X,Y) of the area in
    '' Graphic RAM defined by the Set_GRAM_Access method.

      Set_GRAM_Access (X, 95, Y, 63)
      OUTA[CS_OLED] := 0
      OUTA[WR_OLED] := 0
      OUTA[7..0] := V1 ' MSB
      OUTA[WR_OLED] := 1
      OUTA[WR_OLED] := 0
      OUTA[7..0] := V2 ' LSB
      OUTA[WR_OLED] := 1
      OUTA[CS_OLED] := 1

    PUB PutPixelNext (V1, V2)
    '' Writes 2 bytes (16 bits) of color data to the next location

      OUTA[CS_OLED] := 0
      OUTA[WR_OLED] := 0
      OUTA[7..0] := V1 ' MSB
      OUTA[WR_OLED] := 1
      OUTA[WR_OLED] := 0
      OUTA[7..0] := V2 ' LSB
      OUTA[WR_OLED] := 1
      OUTA[CS_OLED] := 1
  • Mike Green Posts: 23,101
    edited 2007-11-10 06:21
    At roughly 6,000 pixels per image (96 x 64 = 6,144), with 8 Spin statements and a method
    call per pixel plus whatever code is in the routine that calls PutPixelNext, I can imagine
    that Spin is just a little too slow. I'm sure you could optimize the Spin code a little bit
    and get a few more frames per second, but you'll do better with assembly language.
  • deSilva Posts: 2,967
    edited 2007-11-10 07:21
    A rough estimate (covering the unshown loop and the basic routines shown) indicates that you spend at least 150 µs per pixel; writing about 6,000 pixels will take about one second.

    I doubt you even get 2 fps :)

    This is a limitation of Spin; you should easily reach 60 fps with machine code.
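
    For anyone who wants to check the arithmetic, here is a minimal Python sketch of the per-pixel time budget. It assumes the 96 x 64 panel size implied by the row counts in this thread and the 150 µs per pixel estimate above; the function names are made up for illustration.

```python
WIDTH, HEIGHT = 96, 64  # assumed panel size in pixels


def frames_per_second(seconds_per_pixel, pixels=WIDTH * HEIGHT):
    """Frame rate when every pixel costs a fixed amount of time."""
    return 1.0 / (seconds_per_pixel * pixels)


def max_seconds_per_pixel(target_fps, pixels=WIDTH * HEIGHT):
    """Longest time one pixel may take while still hitting target_fps."""
    return 1.0 / (target_fps * pixels)


spin_fps = frames_per_second(150e-6)      # Spin estimate: 150 us per pixel
budget_60 = max_seconds_per_pixel(60)     # time allowed per pixel at 60 fps

print(f"150 us/pixel gives {spin_fps:.2f} fps")
print(f"60 fps allows {budget_60 * 1e6:.2f} us per pixel")
```

    Under these assumptions the Spin version lands near 1 fps, and 60 fps leaves under 3 µs per pixel, which is why assembly is the usual suggestion.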
  • Greg P Posts: 58
    edited 2007-11-10 07:32
    I'm working (as I'm sure several other forum members are) on a full assembler version of the uOLED-96-Prop demo code. I've gotten most of the code translated to assembly and just need to work out some final details. I plan to use pread (from the FAT16 object) to fill a 384-byte buffer with data from the SD card. I'll then pass the Prop cog a request to write the buffer's contents to a specific start row, filling two complete display rows (96 pixels/row x 2 rows x 2 bytes/pixel). Do this 32 times and you have a full frame. A row-based approach simplifies the GRAM access, offering simple direct streaming of data from SD card to display.

    My results with a 2GB PNY uSD card are 164 kbytes/sec write, 316 kbytes/sec read. This translates to more than 24 frames per second ... i.e. true video. Not bad. Past experiments with the USBwiz, a USB host product by GHIelectronics, maxed out at 54 kbytes/sec write speed.
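
    The buffer and frame-rate figures above can be reproduced with a short Python sketch. It takes "kbytes" to mean 1000 bytes (an assumption; the post does not say) and ignores any per-block overhead, so treat the result as an upper bound rather than a measurement.

```python
WIDTH, HEIGHT, BYTES_PER_PIXEL = 96, 64, 2
ROWS_PER_CHUNK = 2  # two display rows per buffer, as described above

chunk_bytes = WIDTH * ROWS_PER_CHUNK * BYTES_PER_PIXEL   # size of one buffer
chunks_per_frame = HEIGHT // ROWS_PER_CHUNK              # buffers per frame
frame_bytes = chunk_bytes * chunks_per_frame             # bytes per full frame


def max_fps(read_bytes_per_second):
    """Upper bound on frame rate if SD reads are the only cost."""
    return read_bytes_per_second / frame_bytes


fps_at_read_rate = max_fps(316_000)  # 316 kbytes/sec read rate from the post

print(chunk_bytes, chunks_per_frame, frame_bytes)
print(f"{fps_at_read_rate:.1f} fps upper bound")
```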

    On another front, I have had success with a 'color averaging' routine implemented in Visual Basic 6. It may take 6 seconds or so to process a large 3 MB JPEG from my digital camera, but it examines individual pixel blocks, calculates an average color for each, and constructs 12 kbyte .txt files which can be written to the uSD card. It runs in batch mode, translating a folder full of JPGs to small BMPs and Propeller TXT files. I modified the uOLED-96-Prop demo code into a simple slideshow program with about three dozen colorful images grabbed using Google Images. The image quality is amazingly high. I love this thing!

    I've also been toying with the uSD card write operation, specifically the sdspiqasm code, the really fast version of the low-level SPI interface code. This code is amazing! The author has used the two cog counters as NCOs (numerically controlled oscillators) in a very unusual manner. One NCO drives the CLK pin to the uSD card, the other the DI pin (data to the card). The CLK NCO is configured in such a way that loading the PHSa register with, for example, #8 will cause the CLK pin to go low for 8 cycles before returning high. This saves an instruction or two. The other NCO is used as a bit shifter to an output pin, with its FRQ register set to zero. Data is shifted into bit 31, which maps to the DI pin. Another implementation elsewhere in his code toggles the clock line up and down repeatedly while assembly code reads in data bits from the SD card. Cool, amazing stuff! My hat's off. PS: I also liked the trick of editing the cog code while it is still in hub memory, changing a few longs which hold the pin numbers, THEN calling cognew() to load the Spin-modified code into the cog. This is truly Propeller-ology at its finest!

    Once I got my head around this code I thought I saw an opportunity to speed up the low-level writeblock routine. The bit shift-out uses 3 instructions in a loop. I rewrote the code without the loop and modified the phase and period of the NCO-generated CLK signal so that the CLK's rising edge (the SD card's signal to sample the data on its DI pin) aligned with the assembly instructions, which were just repeated "shl phsb,#1" + "nop" pairs, effectively 2 instructions per bit output. It seemed to work (at least nothing crashed and burned!). Unfortunately the final write time for 2 Mbytes was hardly changed, at about 12.8 seconds. WHY? My rough first estimate (from the timing loop) was that an upper limit of 666 kbytes per second would be possible IN THEORY. I thought I would at least see some small improvement.

    I don't have access to my logic analyzer over the weekend, but if I did, I'm fairly certain I would see long pauses during which a BUSY asserted by the SD card applies the brakes for a few milliseconds during each 512-byte write attempt. It's back to the drawing board, with the best hope being a multiblock write operation. The actual write of 512 bytes with my modified code should take less than 0.8 ms, so I still have another 2.3 ms of unexplained delay per block (I'm only at 25% of the theoretical max write speed). With multiblock I could just keep sending blocks and the SD card will (I think) keep accepting them until an erase sector's worth of data has been received, and only then force a BUSY. That's just a guess. If anyone out there has real working multiblock source code from any source, it would be a welcome starting point!
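
    As a sanity check on those numbers, here is a small Python sketch of the per-block time budget. It uses only figures from this post (2 Mbytes in 12.8 seconds, 512-byte blocks, 0.8 ms to clock a block out) and assumes 1 Mbyte = 1024 x 1024 bytes; the variable names are illustrative.

```python
BLOCK_BYTES = 512


def block_time_ms(total_bytes, total_seconds):
    """Average wall-clock time per 512-byte block, in milliseconds."""
    blocks = total_bytes / BLOCK_BYTES
    return total_seconds / blocks * 1000.0


observed_ms = block_time_ms(2 * 1024 * 1024, 12.8)  # measured average per block
transfer_ms = 0.8                                   # time to shift one block out
stall_ms = observed_ms - transfer_ms                # time not spent shifting bits
fraction = transfer_ms / observed_ms                # share of time moving data

print(f"{observed_ms:.3f} ms per block, {stall_ms:.3f} ms unexplained")
print(f"{fraction:.0%} of the time is spent actually sending data")
```

    The remainder is consistent with the card holding BUSY for most of each block, though only a logic analyzer trace would confirm that.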
  • deSilva Posts: 2,967
    edited 2007-11-10 08:13
    Greg P said...
    This code is amazing !! The author has used the two cog counters as NCOs (numerically controlled oscillators) in a very unusual manner. One NCO drives the CLK pin to the uSD card, the other the DI pin (data to the card). The CLK NCO is configured in such a way that loading (for example the PHSa register with #8) will cause the CLK pin to go low for 8 cycles before returning high. This saves an instruction or two. The other NCO is used as a bit-shift to an output pin, with its FRQ register set to zero. Data is shifted into bit 31 and this maps to the DI pin.
    Greg, this is nice of you to mention that! As often noted in other threads, the timers/counters tend to be forgotten, especially by Spin-only programmers. When deSilva wrote his Tutorial many months ago, he erroneously assumed that Spin programmers turning to machine code would know all about COGs and counters. He was dead wrong! But Christmas will be a good time for an update of the Tutorial, this time WITH examples of how to use timers/counters.

    In fact it is all standard use and not really amazing for the experienced programmer.

    A small addition for the folks who want to do it themselves.

    (1) To output a high pulse of length N ticks, set PHS to minus N and FRQ to 1. As this depends on CLKFREQ, it can become tricky when the crystal is not well matched to the required timing... but you can't do any better with the next-best option, the WAITCNT instruction.

    (2) Shifting things out from PHS has been discussed several times by our video specialists; the issue is shifting things in :) We still have nothing better than an unrolled INA loop...
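            MOV  FRQA, #0              ' FRQ = 0 (as in the quoted description), so PHSA changes only when shifted
            MOV  PHSA, theValue32
            MOV  CTRA, ncoModePin      ' NCO/PWM mode + pin; MSB now output ..
            MOV  loopc, #31            ' ...for 2 instruction lengths
    loop    SHL  PHSA, #1
            DJNZ loopc, #loop
            ' ncoModePin (name assumed): a long holding the NCO/PWM mode bits plus the pin number

    will do a 10 MHz shift-out (at an 80 MHz system clock), which is much slower than using the Video Logic.

    Edit: Oops, the last line must read 10 MHz of course
    Edit2: Oh dear, it must be 31 of course, as the MSB is already output before the loop starts...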
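
    Both tricks can be modelled on a PC. The Python sketch below is only a simplified model, assuming a 32-bit PHS register whose bit 31 drives the pin and to which FRQ is added once per system clock; it ignores instruction timing and the real counter hardware, and the function names are invented for illustration.

```python
MASK32 = 0xFFFFFFFF


def nco_pulse_ticks(n, max_ticks=1000):
    """Ticks the pin (PHS bit 31) stays high when PHS = -n and FRQ = 1."""
    phs = (-n) & MASK32
    ticks = 0
    while (phs >> 31) & 1 and ticks < max_ticks:
        ticks += 1
        phs = (phs + 1) & MASK32      # FRQ = 1 is added every system clock
    return ticks


def shift_out_bits(value):
    """Bits seen on the pin with FRQ = 0: the MSB first, then one per SHL."""
    phs = value & MASK32
    bits = [(phs >> 31) & 1]          # MSB appears as soon as CTR is set up
    for _ in range(31):               # 31 shifts, matching the loop counter
        phs = (phs << 1) & MASK32
        bits.append((phs >> 31) & 1)
    return bits


print(nco_pulse_ticks(8))
print(shift_out_bits(0x80000001))
```

    The second function also shows why the loop count is 31 and not 32: the first bit is already on the pin before any shift happens.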

    Post Edited (deSilva) : 11/10/2007 7:52:56 PM GMT
  • Greg P Posts: 58
    edited 2007-11-10 22:07
    Each bit shifted out the DI pin requires execution of two 4-cycle assembly instructions:

    4 clks: shl phsb,#1 ' output bit5
    4 clks: nop

    At the uOLED-96-Prop's system clock frequency of 64 MHz, that's 125 ns per bit, or an 8 MHz SPI clock.
    If running at 80 MHz, that's a 10 MHz SPI clock.
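
    The same calculation as a minimal Python sketch, assuming 4 system clocks per instruction as stated above and ignoring the per-byte overhead (rdbyte, add, djnz) between bytes; the helper names are illustrative.

```python
CLOCKS_PER_INSTRUCTION = 4  # as stated above for shl and nop


def spi_clock_hz(sysclk_hz, instructions_per_bit=2):
    """SPI bit rate when each bit costs a fixed number of instructions."""
    return sysclk_hz / (instructions_per_bit * CLOCKS_PER_INSTRUCTION)


def spi_bit_time_ns(sysclk_hz, instructions_per_bit=2):
    """Duration of one SPI bit in nanoseconds."""
    return 1e9 / spi_clock_hz(sysclk_hz, instructions_per_bit)


def block_shift_time_ms(sysclk_hz, block_bytes=512):
    """Raw time to shift one block, excluding all between-byte overhead."""
    return block_bytes * 8 * spi_bit_time_ns(sysclk_hz) / 1e6


print(spi_clock_hz(64_000_000), spi_bit_time_ns(64_000_000))
print(spi_clock_hz(80_000_000))
print(block_shift_time_ms(64_000_000))
```

    At 64 MHz the raw shift time for 512 bytes comes out near 0.5 ms, which fits the "less than 0.8 ms" estimate once between-byte instructions are added.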

    Can anyone explain why uncommenting the OR statement below would cause all $00 values to be written to the SD card instead of "always odd" values (with bits 7 through 1 preserved)? Note that it's not a timing issue, because a substituted NOP instruction has no effect: all data is correctly written and read.

    At first it seemed possible to drive the CLK pin at 16 MHz (2 clks high, 2 clks low), delete the NOP, and just have a string of SHL instructions, but how would you start the CLK and stop it from toggling beyond the last bit?

    Below is the modified code, with an 8 MHz SPI clock. It did not improve write speed, as I believe the bottleneck lies with an extended BUSY state issued by the SD card, not the Propeller SPI clock speed. It did, however, work, so it will serve as a good example to study (and hopefully improve!).

    I know nothing of the video registers. Can video & PHS be combined to shift out 32 data bits (WITH CLK controlled by NCO)?

    PS: OK, I just got it! Your code has no penalty for its DJNZ loop, as it serves the same purpose as my NOP, with much less code and a lot more elegance!

    MODIFIED SDSPIQASM:

    
    '@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                              
    '                       WRITE 512 BYTES TO uSD CARD   
    wbyte
            rdbyte phsb,accb   ' Read in actual data byte at hub-address 'accb'
    
            'or phsb, #$01    'This is an attempt to intentionally disrupt the value written to the card
                                     'so that I could confirm that I was truly writing data and correctly verifying
                                     'the data with a subsequent read. The idea was to write even byte values
                                     'as odd values, so that I would be reading back the wrong value half the time.
                                     'Oddly, instead I'm apparently writing $00 bytes all the time when the OR
                                     'statement's comment is removed. WHY WOULD THIS HAPPEN ???
     
            shl phsb,#23       ' Shift 8-bits of data byte to within 1-bit of PHSb[31]
            add accb,#1        ' Advance to next byte in hub memory
    
    '        mov ctr,#8         ' Load for 8 bit-shift-out cycles
    'wbit    mov phsa,#8        '4 Enable CTRa to drive CLK pin for 8 system clock cycles
    '        shl phsb,#1        '4 Shift into PHSb[31] - DI pin - the next output data bit
    '        djnz ctr,#wbit     '4 (if jump), 8 (at last call)
    
         '*** TEST ONLY ****
            mov frqa, #0      ' halt PHSa accumulation
            mov phsa, hifreq2 ' hifreq2 == long $80_00_01_00, CLK=HIGH
            mov frqa, freq    ' freq ==  long $20_00_00_00
                              ' Each sys clk, starting NOW, adds FRQa($20) to PHSa ($80)
    
            shl phsb,#1       ' $A0,C0,E0,00 ... output bit7, MSB
            nop               ' $20,40,60,80 ... at end of 'nop' CLK latches data on its rising edge
    
            shl phsb,#1       ' $A0,C0,E0,00 ... output bit6 (at CLK falling edge)
            nop               ' $20,40,60,80 ... at end of 'nop' CLK latches data on its rising edge
    
            shl phsb,#1       ' output bit5
            nop
    
            shl phsb,#1       ' output bit4
            nop
    
            shl phsb,#1       ' output bit3
            nop
    
            shl phsb,#1       ' output bit2
            nop
    
            shl phsb,#1       ' output bit1
            nop
    
            shl phsb,#1       ' output bit0
            mov frqa, freq2   ' freq2 ==  long $10_00_00_00 
            mov frqa,#0       ' halt
            
            djnz ctr2,#wbyte   'ctr2 == 512, loop to transmit next byte ! 
    
    '@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    
    
    
    
  • Greg P Posts: 58
    edited 2007-11-10 23:18
    I was able to answer one of my questions pretty quickly. When I inserted the 'OR' statement in an attempt to intentionally "mess up" writing to a file, I had failed to consider that writeblock is used for ALL writes to the drive, including the FAT table! When I popped the uSD card out, plugged it into its USB adapter and then into the PC, you should have seen the long list of wildly named files which had been inadvertently generated by my mischief. When playing with fire you may get burnt! I had to reformat the uSD card.
  • rokicki Posts: 1,000
    edited 2007-11-11 02:00
    Wow, that's why I like the OLED-96; one platform; SD socket built in; just plug it in and go.
    Great job guys.

    On the writing to the SD card, there's a couple of things going on. One is the between-byte
    overhead; that adds up. Another is the between-block overhead, but that's not too bad.
    What really hurts is, while most writes go pretty quick (the SD card says 'done'), a small
    percentage of the writes incur a significant latency (almost certainly due to the need to
    erase another erase block, but I'm guessing at this.) Depending on how you do your timing
    and how many bytes you are writing, you may miss this altogether, or see it as though it
    were split among the blocks, or if you time each individual block for enough of them, you'll
    see quick quick quick quick ... for a bunch and then *slow* and then quick quick quick ...

    In my tests speeding up the actual bit-by-bit code doesn't make much of a difference for
    writing.

    For reading, on the other hand, improvement is still possible, but (I believe) it will take a bunch
    more instructions. If you completely unroll 32-bits worth of input, two instructions per bit
    (this is possible), you should get a reasonably good improvement. It's slightly tricky to do this
    but not too bad. But as I said, you'll probably need a bunch more instructions.

    Actually the best improvement on reading data to the display is probably just not using pread
    but instead writing another subroutine or something that will (essentially) DMA data from a
    file directly to a circular buffer (of probably at least 1K so you can keep a 512 byte block in
    flight). This way the Spin code that does the FAT16 stuff happens in parallel with the SD
    readblock stuff which happens in parallel with the video writing stuff, and no memory copy
    is needed (pread does a memory copy that slows things down).
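
    A minimal Python model of that circular-buffer idea, for illustration only. It assumes a 1 KB buffer fed in 512-byte blocks and drained in arbitrary chunk sizes; the class and method names are invented, and on the Propeller the two sides would be separate cogs sharing hub memory rather than one thread.

```python
class RingBuffer:
    """Byte ring with a producer (SD reader) and a consumer (display writer)."""

    def __init__(self, capacity=1024):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.head = 0  # total bytes written so far by the producer
        self.tail = 0  # total bytes taken so far by the consumer

    def available(self):
        return self.head - self.tail

    def free(self):
        return self.capacity - self.available()

    def put_block(self, block):
        """Store one block if it fits; return False so the producer can retry."""
        if len(block) > self.free():
            return False
        for b in block:
            self.buf[self.head % self.capacity] = b
            self.head += 1
        return True

    def get(self, n):
        """Take up to n bytes in arrival order."""
        n = min(n, self.available())
        out = bytearray(n)
        for i in range(n):
            out[i] = self.buf[self.tail % self.capacity]
            self.tail += 1
        return bytes(out)


rb = RingBuffer(1024)
rb.put_block(bytes([1]) * 512)   # first SD block arrives
rb.put_block(bytes([2]) * 512)   # second block fills the ring
rows = rb.get(384)               # display side drains two rows' worth
print(len(rows), rb.free())
```

    With 1 KB of space, one block can be arriving while the previous one is still being drained, which is the "512 byte block in flight" point made above.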

    Definitely fun stuff. I hope Parallax adds an SD or microSD card slot to some version of one
    of their development boards; it really makes pretty amazing things possible, and it's just a
    ton of fun. (They might want to *call* it an MMC slot or something, though, so they don't
    have to pay royalties or whatnot; I'm not sure about the legalities.)
  • Greg P Posts: 58
    edited 2007-11-11 05:35
    That approach has worked incredibly well for one of our physics lab projects. The data acquisition cog would alternately write
    to two buffers while our modified serial cog continually monitored these same buffers for transmission as packets via a
    115.2 kbaud serial link to a PC. The two buffers are defined by Spin code, and a method associated with the data acquisition
    cog is passed the addresses of these two buffers. Likewise, the serial cog has a method which accepts the two buffer addresses.
    The serial cog runs an 'application' that repeatedly monitors the first byte of each buffer for non-zero content. The first byte of
    each buffer is used as the availability flag. Only once the data cog has filled a buffer does it write a non-zero entry
    into that buffer's first byte, which also happens to be the number of bytes in the buffer that need to be transmitted by the
    serial cog. The serial cog, upon detecting a non-zero first-byte entry in a buffer, proceeds to transmit its contents via the
    serial port. Upon completion it zeros the buffer's first byte, signalling the data cog that the buffer is now available to store
    more data. The experiment may run for days at a time, continually streaming 10K bytes per second to a file on the PC.
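
    The first-byte handshake described above can be sketched in Python as follows. This is a single-threaded model under stated assumptions: the 64-byte buffer size and the function names are hypothetical (the post gives neither), and the real version runs in two cogs sharing hub memory.

```python
BUF_SIZE = 64  # hypothetical size; byte 0 is the flag, the rest is payload


def producer_fill(buf, payload):
    """Data cog: copy the payload first, then publish its length in byte 0."""
    if buf[0] != 0:
        return False                  # consumer has not released this buffer
    n = len(payload)
    if not 0 < n < BUF_SIZE:
        raise ValueError("payload must fit behind the flag byte")
    buf[1:1 + n] = payload
    buf[0] = n                        # non-zero flag is written last
    return True


def consumer_poll(buf):
    """Serial cog: if the flag is non-zero, take the data and free the buffer."""
    n = buf[0]
    if n == 0:
        return None                   # nothing to send yet
    data = bytes(buf[1:1 + n])
    buf[0] = 0                        # zero flag tells the producer to refill
    return data


buffers = [bytearray(BUF_SIZE), bytearray(BUF_SIZE)]
producer_fill(buffers[0], b"sample-1")
producer_fill(buffers[1], b"sample-2")
print(consumer_poll(buffers[0]), consumer_poll(buffers[1]))
```

    Writing the flag last is the important detail: the consumer never sees a non-zero count before the data behind it is complete.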

    I have also double-buffered 11 kHz 8-bit audio data, reading from a USB flash drive (via USBwiz) to an audio pin with an RC
    filter attached. I plugged the speaker output into the mic input on my notebook PC, then played my "Wizard of Oz" DVD
    with Audacity configured for RAW 11.025 kHz 8-bit signed audio recording. The resulting 60 Mbyte audio file was then
    copied to the USB flash drive. I could select any start time (in tenths of seconds) and any playtime (in tenths of seconds).
    The audio quality was excellent (at least for my old ears!). I even had an old wired video editor remote control that could
    fast forward or rewind the audio. I would open the audio file with a USBwiz cog, then inform the sound cog of the addresses
    of two audio buffers. The sound cog would then internally ask the USBwiz cog to fill the buffers with data. Instead of
    waiting in a forever loop until the "command" value returned to zero, these methods would send the command to the cog
    and immediately exit. The "calling" application (another cog) would then merely examine each buffer's first byte to see
    when data actually arrived.

    I think the Spin portion of the pread() method may need to be integrated with the readblock code and placed within the cog
    SDSPI code. From my experience with the audio project, it was nice to be able to command the sound cog to start playing
    audio, THEN go along my merry way and monitor for key presses on the remote control in Spin. This frees the high-level
    Spin code from the microsecond-fast tasks.

    I'm thinking I would like to open a file for reading, then inform my video or audio cog of the addresses of the double buffers
    and let them INTERNALLY call a cog-based Pread() like function. These cog 'applications' would have to have access
    to the hub addresses of all the variables that pread() in SPIN code currently manipulates so that everything stays current
    and up to date. The same could be done for Pwrite().

    Tonight I finally came across some excellent source code for performing multiblock reads and writes. I hope to give
    it a test this coming week. My write rates (164 kbytes per second) are about half my read rates (316 kbytes per
    second). I still, perhaps foolishly, believe that it should be possible to double my write rate. My write code can spit
    out a 512-byte block in about 0.8 ms, yet 3.1 ms per block is the observed AVERAGE write time. The existing code
    can, indeed, deliver the data in a timely manner; it's just that the flashing of memory internal to the SD card takes
    time. While the internal host processor of the SD card is waiting for this flash process to complete, it could be busy
    reading in more data for the next flash operation. As I understand the process, if I'm writing multiblock, I instruct
    the flash drive to PRE-ERASE 'x' number of 512-byte BLOCKS, then I proceed to send block after block up to the
    number specified in the PRE-ERASE command. The SD card, if it has enough internal RAM, is happy to accept as
    many blocks as it has RAM to store, hopefully waiting until an ERASE SECTOR-sized number of blocks has been received
    before initiating the next flash operation. The SD card host processor can even skip its own normal internal erase
    operation, which would otherwise accompany every BLOCK write, as this task has already been done by the PRE-ERASE
    command. We will see, of course, if my crude understanding is correct! Hopefully progress can be made in the near future.
  • rjo_ Posts: 1,825
    edited 2007-11-11 13:40
    Guys

    Another Amazing Thread :)

    The 96-Prop has an 8 MHz crystal... using a 16x PLL, that would give a 128 MHz clkfreq (can we do this?) or a 16 MHz SPI (is this correct?)
  • Mike Green Posts: 23,101
    edited 2007-11-11 15:29
    rjo_,
    This has come up before ... The Prop (any incarnation) will not work reliably at 128 MHz. Some chips may and some may not.
    It depends mostly on temperature and power supply voltage.