Haar wavelet, data compression and image analysis — Parallax Forums

Perry Posts: 253
edited 2011-07-11 18:21 in Propeller 1
I have pretty much completed the B/W version of "Stupid Video Capture" and built a new test bed on one of Terry Hitt's dongles.
I have since been reading about Haar wavelet compression and the wider field of wavelets and image analysis techniques.

One of my biggest problems with "Stupid Video Capture" (http://forums.parallax.com/showthread.php?98516-stupid-video-capture&highlight=Stupid+video) is the size of the files on the SD card. The latest version produced roughly 30 MB per minute.

So I am initially working on development of two Haar wavelet functions.

1 Encoding/decoding the video in real time, so the program uses half the memory as well as half the SD card space. (This already partly works, but I am not happy with the display yet.)

2 A multi-level version for encoding/decoding before writing and after reading SD card data, to be used on both audio and video streams. (The code is roughly outlined but not really tested.)
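
As an off-target illustration of the multi-level idea in item 2, here is a minimal Python sketch (not Propeller code and not the author's implementation). It assumes an integer average/difference pair per level so that the round trip is exact, and it assumes the row length is divisible by 2 to the power of `levels`. The names `haar_step`, `haar_forward` and `haar_inverse` are made up for this example.

```python
def haar_step(values):
    """One Haar level: pairwise floor-averages and pairwise differences."""
    avgs, diffs = [], []
    for i in range(0, len(values), 2):
        left, right = values[i], values[i + 1]
        diff = left - right
        avgs.append(right + (diff >> 1))  # floor((left + right) / 2)
        diffs.append(diff)
    return avgs, diffs


def haar_forward(samples, levels):
    """Return the final averages followed by detail bands, coarsest first."""
    approx = list(samples)
    bands = []
    for _ in range(levels):
        approx, diffs = haar_step(approx)
        bands.insert(0, diffs)
    out = list(approx)
    for band in bands:
        out.extend(band)
    return out


def haar_inverse(coeffs, levels):
    """Undo haar_forward exactly (integer arithmetic, no rounding loss)."""
    count = len(coeffs) >> levels
    approx = list(coeffs[:count])
    pos = count
    for _ in range(levels):
        diffs = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        rebuilt = []
        for avg, diff in zip(approx, diffs):
            right = avg - (diff >> 1)
            rebuilt.extend((right + diff, right))
        approx = rebuilt
    return approx


row = [10, 12, 14, 200, 202, 204, 50, 52]   # made-up pixel values
coeffs = haar_forward(row, 3)
print(coeffs)
print(haar_inverse(coeffs, 3) == row)
```

The coefficients are not smaller than the input by themselves. Any saving would come from quantising or entropy-coding the detail values, most of which are small for smooth image rows.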

So ... I am hoping someone here will have experience, thoughts or code to help with this effort.

.... Perry

Comments

  • Dave Hein Posts: 6,347
    edited 2011-07-11 08:06
    Without compression you could record about an hour of video on a 2GB SD card. Do you need more than that? You should be able to get much more than a 2:1 compression with the Haar transform, especially if you do interframe compression. You may also want to consider the Hadamard transform or just simple DPCM.
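
A rough check of that estimate, as a Python sketch. It assumes the roughly 30 MB per minute rate quoted in the first post and treats a 2 GB card as 2000 MB; `minutes_on_card` is a hypothetical helper.

```python
def minutes_on_card(card_mb, rate_mb_per_min, compression_ratio=1.0):
    """Recording time in minutes for a given card size and raw data rate."""
    return card_mb * compression_ratio / rate_mb_per_min


uncompressed = minutes_on_card(2000, 30)      # a little over an hour
halved = minutes_on_card(2000, 30, 2.0)       # with 2:1 compression
print(round(uncompressed), round(halved))
```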
  • Perry Posts: 253
    edited 2011-07-11 08:36
    I am working on a "video doorbell" or alarm device, critter camera, interview camera; there are lots of possible uses. I wanted 100 files on 1 GB.

    The real-time pass is the first level; I want to use the second method on the columns.

    I found DPCM too complex for the video-in loop.

    Here's my real-time code. As you can see, it is still in flux.

    Video in (this loop has little leeway for changes in the number of instructions):
    :adcline1 
                    mov    s_,phsb        'capture PHSB 
                    mov   phsb,#0
    
                    test     asm_temp,#1 wz, nr ' test for even/odd
    
             if_z   mov l_,s_ ' get left
             if_nz  mov r_,s_ ' get right
                    mov   a_,l_
                    add   a_,r_     ' average
    '               shr   a_,#1     '  a := (l + r)/2
                    mov   d_,s_
    '                subs  d_,r_     'difference d := l - r
    '                rol   d_,#1     ' save 3bits + sign of d
    '                and a_,#$F0
                    and   d_,#$0F
                    or    a_,d_
           if_nz  wrbyte  a_,video_buf_ptr 'write the packed byte to the video buffer
    
            if_nz add     video_buf_ptr,#1       
                  djnz      asm_temp,#:adcline1
    
    
                  JMP       #OvrMain
    
    

    And for video out:
    :loop         and     numpixels,#1 wz, nr ' test for even/odd
       if_nz      jmp #:loop2
    
           RDBYTE   pixel, pixptr                      ' read pixel from memory
    '               mov a,pixel 'i
                   mov d,pixel 'i
                   and pixel,#$F0
    '               shr a,#1
                   and d,#$0F
                   shl d,#28
                   sar d,#28
                              mov      l,pixel
                              subs     l,d ' got left
                              ADD      l, #pwmlut   ' add offset
                              MOVS     :src, l 'pixel                 ' use as index
    :src                      MOV      FRQA, pwmlut+0                 ' draw pixel
    
                              nop 'ADD      pixptr, #1                         ' next pixel
                              DJNZ     numpixels, #:loop               ' loop active
    :loop2
                  RDBYTE   pixel, pixptr                      ' read pixel from memory
    '             mov a,pixel 'i
                  mov d,pixel 'i
                   and pixel,#$F0
    '               shr a,#1
                   and d,#$0F
                   shl d,#28
                   sar d,#28
    
                              mov      r,pixel
                              adds     r,d ' got right
                              ADD      r, #pwmlut   ' add offset
                              MOVS     :src1, r 'pixel                        ' use as index
    :src1                     MOV      FRQA, pwmlut+0                   ' draw pixel
    
                              ADD      pixptr, #1                         ' next pixel
                              DJNZ     numpixels, #:loop                  ' loop active
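
The packing that the two loops above are working toward can be modelled on a PC. This is a hedged Python sketch, not a translation of the PASM. It assumes the encoder stores the pair average in the upper nibble and half of the right-minus-left difference, clipped to a signed 4-bit range, in the lower nibble, which is what the output loop's reconstruction (left = a - d, right = a + d) implies. The function names and the final clamp to 0..255 are additions for illustration.

```python
def encode_pair(left, right):
    """Pack two 8-bit samples into one byte: average high, half-difference low."""
    avg = (left + right) >> 1
    half_diff = (right - left) >> 1
    half_diff = max(-8, min(7, half_diff))     # clip to a signed 4-bit range
    return (avg & 0xF0) | (half_diff & 0x0F)


def decode_pair(byte):
    """Rebuild (left, right) the way the output loop does."""
    avg = byte & 0xF0
    d = byte & 0x0F
    if d >= 8:                                 # sign-extend, like shl #28 / sar #28
        d -= 16
    left = max(0, min(255, avg - d))           # clamp added for this sketch
    right = max(0, min(255, avg + d))
    return left, right


packed = encode_pair(100, 110)
print(hex(packed), decode_pair(packed))
```

Because only four bits of the average survive, a flat area is quantised to 16 levels, which may be one source of visible banding.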
    
  • Dave Hein Posts: 6,347
    edited 2011-07-11 10:23
    Perry,

    It looks like your input loop takes 28*4 cycles per pair of pixels. You should unroll it so that you only do the operations for the left pixel on the first part of the loop, and only do the operations for the right pixels on the second half of the loop. This way you won't waste any cycles on the conditional execution instructions that aren't executed.

    It looks like you plan on putting the sum in the upper 4 bits and the difference in the lower 4 bits. Statistically, the difference should be smaller than the sum, so it should take fewer bits. However, you would need to clip the difference so it doesn't overflow.
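
To see why the clip matters, consider what a 4-bit field does with a value outside -8..+7. The Python sketch below is illustrative only; `stored_nibble` is a hypothetical helper that returns the value a decoder would read back.

```python
def stored_nibble(value, clip):
    """Value read back from a signed 4-bit field, with or without clipping."""
    if clip:
        value = max(-8, min(7, value))
    nibble = value & 0x0F                      # only four bits are kept
    return nibble - 16 if nibble >= 8 else nibble


print(stored_nibble(9, clip=False))   # wraps around to a negative value
print(stored_nibble(9, clip=True))    # saturates at the largest positive value
```

Without the clip, a difference of +9 comes back with the wrong sign, which is far more visible than the small error left by saturating.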

    I think DPCM with a single-pixel predictor would work better. It would look something like the code I show below. This loop takes the same number of cycles as yours, and I had to add 6 NOPs to get the same timing. It clips the input to 8 bits, and clips the differences to 4 bits.

    You could get this loop down to 24*4 cycles without too much work if that is desirable. The clipping to 255 probably isn't necessary since the loop takes less than 256 cycles, so it could probably be removed. You might be able to get the loop down to 20*4 cycles that way. Let me know if you have any questions.

    Dave
                    mov     sum,#0                  ' Initialize the running sum (the predictor) to zero
                    shr     asm_temp,#1             ' Divide count by 2 (two pixels per pass)
                                                    ' minus8 must be declared elsewhere as: minus8 long -8
    adcline1
                    ' Process the left pixel 
                    mov     sample,phsb             ' Capture PHSB
                    mov     phsb,#0                 ' Reset PHSB
                    max     sample,#255             ' Limit to 8 bits (MAX caps the value at 255)
                    
                    mov     diff1,sample            ' Get the current sample in diff
                    sub     diff1,sum               ' Compute the difference: sample - sum
                    maxs    diff1,#7                ' Limit the difference to +7
                    mins    diff1,minus8            ' Limit the difference to -8
                    add     sum,diff1               ' Add the clipped difference back to the sum
                    shl     diff1,#4                ' Move to the upper 4 bits
    
                    nop                             ' Room for one more instruction
                    nop                             ' Room for one more instruction
                    nop                             ' Room for one more instruction
                    nop                             ' Room for one more instruction
                    nop                             ' Room for one more instruction
    
                    ' Process the right pixel
                    mov     sample,phsb             ' Capture PHSB
                    mov     phsb,#0                 ' Reset PHSB
                    max     sample,#255             ' Limit to 8 bits (MAX caps the value at 255)
    
                    mov     diff2,sample            ' Get the current sample in diff
                    sub     diff2,sum               ' Compute the difference: sample - sum
                    maxs    diff2,#7                ' Limit the difference to +7
                    mins    diff2,minus8            ' Limit the difference to -8
                    add     sum,diff2               ' Add the clipped difference back to the sum
    
                    and     diff2,#$0F              ' Keep only the low nibble (this slot was a spare NOP)
                    or      diff1,diff2             ' Merge the two 4-bit values into one byte
                    wrbyte  diff1,video_buf_ptr     ' Write to the video buffer
                    add     video_buf_ptr,#1        ' Increment the buffer pointer
                    djnz    asm_temp,#adcline1      ' Get another pair of samples
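
For experimenting away from the hardware, single-pixel-predictor DPCM of this kind can be modelled in Python. The sketch below is a model built on stated assumptions rather than a port of the loop: it takes the prediction to be the previously reconstructed pixel, forms sample minus prediction, clips it to -8..+7, and packs the left pixel's nibble into the upper half of each byte. `clip4`, `dpcm_encode` and `dpcm_decode` are illustrative names.

```python
def clip4(value):
    """Clip to the signed 4-bit range -8..+7."""
    return max(-8, min(7, value))


def dpcm_encode(samples):
    """Pack pairs of clipped differences into bytes (a trailing odd sample is ignored)."""
    prediction = 0
    out = bytearray()
    for i in range(0, len(samples) - 1, 2):
        nibbles = []
        for sample in samples[i:i + 2]:
            diff = clip4(min(sample, 255) - prediction)
            prediction += diff                 # track what the decoder will rebuild
            nibbles.append(diff & 0x0F)
        out.append((nibbles[0] << 4) | nibbles[1])
    return bytes(out)


def dpcm_decode(data):
    """Rebuild pixel values by accumulating the signed 4-bit differences."""
    prediction = 0
    samples = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            diff = nibble - 16 if nibble >= 8 else nibble
            prediction += diff
            samples.append(prediction)
    return samples


encoded = dpcm_encode([3, 5, 4, 20])           # made-up pixel values
print(encoded.hex(), dpcm_decode(encoded))
```

The last pixel shows slope overload: a jump of 16 between neighbours is coded as +7, so the decoder needs a few pixels to catch up on a sharp edge.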
    
  • Perry Posts: 253
    edited 2011-07-11 17:48
    Thanks for the great suggestions. They helped me work out how to unroll the code.

    I should really be using waitcnt instructions to time the loops, but that is too tedious to adjust when you don't yet know how much code you need.

    I was confusing DPCM with ADPCM when I mentioned complexity. Perhaps ADPCM is doable?

    You almost got the simple Haar algorithm in your analysis. The strategy is actually to take the average of the two values and somehow encode the difference in the lower bits.

    I like to do this initial coding in real time because you can see the quality of the compression immediately.
    The columns will be done later, as described above (before/after SD access).

    I am still seeing columnar differences, and I am hoping to get them to be invisible to the eye.
  • Dave Hein Posts: 6,347
    edited 2011-07-11 18:21
    As I tried to say earlier, DPCM will work much better than the Haar technique that you are trying to do. Since you are only doing a 2-point transform I wouldn't even call it a Haar transform. All 2-point orthogonal transforms are identical whether it's the Haar, Hadamard, DCT, DFT, etc. It's basically the sum and difference of a pair of values.
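
The claim about 2-point transforms can be checked numerically. This Python sketch builds the orthonormal 2-point DCT-II and the unitary 2-point DFT from their standard definitions and compares them with the normalised sum/difference matrix used by the 2-point Haar and Hadamard transforms. The helper names are hypothetical.

```python
import cmath
import math

S = 1 / math.sqrt(2)
SUM_DIFF = [[S, S], [S, -S]]     # 2-point Haar and Hadamard, normalised


def dct2_matrix():
    """Orthonormal DCT-II matrix for N = 2."""
    n = 2
    rows = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def dft2_matrix():
    """Unitary DFT matrix for N = 2 (its entries are real at this size)."""
    return [[(cmath.exp(-2j * math.pi * j * k / 2) / math.sqrt(2)).real
             for j in range(2)] for k in range(2)]


def matrices_match(a, b, tol=1e-12):
    """True when two 2x2 matrices agree element by element."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


print(matrices_match(dct2_matrix(), SUM_DIFF), matrices_match(dft2_matrix(), SUM_DIFF))
```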

    The problem is that there is a lot of redundancy between the sum coefficients of adjacent blocks that you're not taking advantage of. The DPCM technique does take advantage of this redundancy. You could improve the performance of the 2-point transform if you use DPCM on the adjacent sum coefficients, but it would work just as well if you just use DPCM on the raw samples.
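
A small Python sketch of that redundancy, using made-up sample values for a smooth ramp: the block sums are all large and similar, while the differences between adjacent sums are small. `block_sums` and `first_differences` are illustrative helpers.

```python
def block_sums(samples):
    """Sum coefficient of each 2-point block."""
    return [samples[i] + samples[i + 1] for i in range(0, len(samples) - 1, 2)]


def first_differences(values):
    """DPCM-style view: the first value, then each value minus its predecessor."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


ramp = [100, 101, 103, 104, 106, 107, 109, 110]   # hypothetical smooth scan line
sums = block_sums(ramp)
print(sums)
print(first_differences(sums))
print(first_differences(ramp))
```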

    If you really want to improve coding performance you could use a larger two-dimensional block size, such as 8x8 blocks. However, this will be much more complicated than what you are currently trying to do.