Haar wavelet, data compression and image analysis

Perry · 2011-07-11 07:50

I have pretty well completed the B/W version of "Stupid Video Capture" and made a new test bed on one of Terry Hitt's dongles.
attachment.php?attachmentid=82884&d=1310394235


I have since been reading about Haar wavelet compression and the wider field of wavelets and image analysis techniques.

One of my biggest problems with "Stupid Video Capture" (http://forums.parallax.com/showthread.php?98516-stupid-video-capture&highlight=Stupid+video) is the size of the files on the SD card; the latest version produced roughly 30 MB per minute.

So I am starting by developing two Haar wavelet functions:

1. Encoding/decoding the video in real time, so the program uses half the memory as well as half the SD card space. (This already has some functionality, but I am not happy with the display yet.)

2. A multi-level version for encoding/decoding before writing and after reading SD card data, to be used on both audio and video streams. (The code is roughly outlined but not really tested.)
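
As a rough illustration of the multi-level idea in item 2, here is a minimal host-side sketch in Python. It is not the Propeller code (which is not shown in the thread). It assumes an integer Haar variant that stores a floor average and a full difference for each pair, so the round trip is lossless; the names `haar_forward` and `haar_inverse` are hypothetical.

```python
def haar_forward(samples, levels):
    """Integer multi-level Haar transform (assumed variant, lossless).

    Each level replaces the first n values with n/2 pair averages followed
    by n/2 pair differences, then repeats on the averages only.
    """
    if levels < 1 or len(samples) % (1 << levels) != 0:
        raise ValueError("length must be a multiple of 2**levels, levels >= 1")
    data = list(samples)
    n = len(data)
    for _ in range(levels):
        half = n // 2
        avgs, diffs = [], []
        for i in range(half):
            left, right = data[2 * i], data[2 * i + 1]
            diff = left - right
            avgs.append(right + (diff >> 1))  # equals floor((left + right) / 2)
            diffs.append(diff)
        data[:n] = avgs + diffs
        n = half
    return data


def haar_inverse(coeffs, levels):
    """Undo haar_forward, starting from the coarsest level."""
    if levels < 1 or len(coeffs) % (1 << levels) != 0:
        raise ValueError("length must be a multiple of 2**levels, levels >= 1")
    data = list(coeffs)
    n = len(data) >> (levels - 1)  # size of the last block that was transformed
    for _ in range(levels):
        half = n // 2
        out = []
        for i in range(half):
            avg, diff = data[i], data[half + i]
            right = avg - (diff >> 1)
            left = right + diff
            out += [left, right]
        data[:n] = out
        n *= 2
    return data


line = [10, 12, 14, 20, 200, 202, 50, 52]
coeffs = haar_forward(line, levels=2)
print(coeffs)
print(haar_inverse(coeffs, levels=2) == line)
```

In smooth regions the differences cluster near zero, which is what a later entropy or run-length stage would exploit; the transform by itself does not shrink the data.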

So ... I am hoping someone here will have experience, thoughts, or code to help with this effort.

.... Perry

Dave Hein · 2011-07-11 08:06

Without compression you could record about an hour of video on a 2GB SD card. Do you need more than that? You should be able to get much more than a 2:1 compression with the Haar transform, especially if you do interframe compression. You may also want to consider the Hadamard transform or just simple DPCM.
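
The recording-time estimate can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and the rate of roughly 30 MB per minute quoted earlier in the thread; `minutes_of_video` is a hypothetical helper.

```python
MB = 1_000_000       # assumption: decimal megabyte
GB = 1_000_000_000   # assumption: decimal gigabyte


def minutes_of_video(card_bytes, bytes_per_minute, compression_ratio=1.0):
    """Minutes of video that fit on a card at a given compression ratio."""
    return card_bytes * compression_ratio / bytes_per_minute


rate = 30 * MB  # uncompressed rate reported for the latest version
print(round(minutes_of_video(2 * GB, rate), 1))       # uncompressed, 2 GB card
print(round(minutes_of_video(2 * GB, rate, 2.0), 1))  # with 2:1 compression
```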

Perry · 2011-07-11 08:36


I am working on a "video doorbell" or alarm device, critter camera, or interview camera; there are lots of possible uses. I wanted to fit 100 files on 1 GB.

The real-time transform is the first level; I want to use the second method on the columns.

I found DPCM too complex for the video in.

Here's my real-time code; as you can see, it is still in flux.

Video in (this has little leeway for changes in the number of instructions):

:adcline1
                mov     s_, phsb                ' capture PHSB
                mov     phsb, #0                ' reset PHSB

                test    asm_temp, #1 wz, nr     ' test for even/odd

        if_z    mov     l_, s_                  ' get left
        if_nz   mov     r_, s_                  ' get right
                mov     a_, l_
                add     a_, r_                  ' sum (the average once the shr below is enabled)
'               shr     a_, #1                  ' a := (l + r)/2
                mov     d_, s_
'               subs    d_, r_                  ' difference d := l - r
'               rol     d_, #1                  ' save 3 bits + sign of d
'               and     a_, #$F0
                and     d_, #$0F
                or      a_, d_
        if_nz   wrbyte  a_, video_buf_ptr       ' write the packed byte to the video buffer

        if_nz   add     video_buf_ptr, #1
                djnz    asm_temp, #:adcline1


                jmp     #OvrMain

And for video out:

:loop           and     numpixels, #1 wz, nr    ' test for even/odd
        if_nz   jmp     #:loop2

                rdbyte  pixel, pixptr           ' read packed byte from memory
'               mov     a, pixel
                mov     d, pixel
                and     pixel, #$F0             ' keep the average (upper 4 bits)
'               shr     a, #1
                and     d, #$0F                 ' keep the difference (lower 4 bits)
                shl     d, #28                  ' sign-extend the 4-bit difference
                sar     d, #28
                mov     l, pixel
                subs    l, d                    ' got left
                add     l, #pwmlut              ' add offset
                movs    :src, l                 ' use as index
                nop                             ' was: add pixptr, #1 (also keeps movs one instruction away from :src)
:src            mov     frqa, pwmlut+0          ' draw pixel

                djnz    numpixels, #:loop       ' loop active
:loop2
                rdbyte  pixel, pixptr           ' read packed byte from memory
'               mov     a, pixel
                mov     d, pixel
                and     pixel, #$F0             ' keep the average (upper 4 bits)
'               shr     a, #1
                and     d, #$0F                 ' keep the difference (lower 4 bits)
                shl     d, #28                  ' sign-extend the 4-bit difference
                sar     d, #28

                mov     r, pixel
                adds    r, d                    ' got right
                add     r, #pwmlut              ' add offset
                movs    :src1, r                ' use as index
                add     pixptr, #1              ' next byte (also keeps movs one instruction away from :src1)
:src1           mov     frqa, pwmlut+0          ' draw pixel

                djnz    numpixels, #:loop       ' loop active

Dave Hein · 2011-07-11 10:23

Perry,

It looks like your input loop takes 28*4 cycles per pair of pixels. You should unroll it so that you only do the operations for the left pixel on the first part of the loop, and only do the operations for the right pixels on the second half of the loop. This way you won't waste any cycles on the conditional execution instructions that aren't executed.

It looks like you plan on putting the sum in the upper 4 bits and the difference in the lower 4 bits. Statistically, the difference should be smaller than the sum, so it should take fewer bits. However, you would need to clip the difference so it doesn't overflow.

I think DPCM with a single-pixel predictor would work better. It would look something like the code I show below. This loop takes the same number of cycles as yours, and I had to fill the spare slots with NOPs to get the same timing. It clips the input to 8 bits, and clips the differences to 4 bits.

You could get this loop down to 24*4 cycles without too much work if that is desirable. The clipping to 255 probably isn't necessary since the loop takes less than 256 cycles, so it could probably be removed. You might be able to get the loop down to 20*4 cycles that way. Let me know if you have any questions.

Dave

                mov     sum,#0                  ' Initialize the predictor (running sum) to zero
                shr     asm_temp,#1             ' Divide count by 2 (two pixels per pass)
adcline1
                ' Process the left pixel
                mov     sample,phsb             ' Capture PHSB
                mov     phsb,#0                 ' Reset PHSB
                max     sample,#255             ' Limit to 8 bits (MAX caps the upper bound)

                mov     diff1,sample            ' Get the current sample
                sub     diff1,sum               ' Difference = sample - predictor
                maxs    diff1,#7                ' Limit the difference to +7
                mins    diff1,minus8            ' Limit the difference to -8
                add     sum,diff1               ' Add the clipped difference back to the sum
                shl     diff1,#4                ' Move to the upper 4 bits

                nop                             ' Room for one more instruction
                nop                             ' Room for one more instruction
                nop                             ' Room for one more instruction
                nop                             ' Room for one more instruction
                nop                             ' Room for one more instruction

                ' Process the right pixel
                mov     sample,phsb             ' Capture PHSB
                mov     phsb,#0                 ' Reset PHSB
                max     sample,#255             ' Limit to 8 bits (MAX caps the upper bound)

                mov     diff2,sample            ' Get the current sample
                sub     diff2,sum               ' Difference = sample - predictor
                maxs    diff2,#7                ' Limit the difference to +7
                mins    diff2,minus8            ' Limit the difference to -8
                add     sum,diff2               ' Add the clipped difference back to the sum
                and     diff2,#$0F              ' Keep only the low nibble so the sign bits don't spill into diff1

                or      diff1,diff2             ' Merge the two 4-bit values into one byte
                wrbyte  diff1,video_buf_ptr     ' Write to the video buffer
                add     video_buf_ptr,#1        ' Increment the buffer pointer
                djnz    asm_temp,#adcline1      ' Get another pair of samples

' Assumes a constant declared elsewhere in the DAT section:  minus8  long  -8
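
For experimenting off-chip, the single-pixel-predictor DPCM scheme described above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not a translation of the PASM: the difference is taken as sample minus predictor, the first sample of a pair goes in the upper nibble, and the names `dpcm_encode` and `dpcm_decode` are hypothetical.

```python
def clip(value, low, high):
    """Clamp value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def dpcm_encode(samples):
    """Pack two clipped 4-bit differences per byte (len(samples) must be even)."""
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    predictor = 0
    packed = bytearray()
    for i in range(0, len(samples), 2):
        nibbles = []
        for sample in samples[i:i + 2]:
            sample = clip(sample, 0, 255)            # limit input to 8 bits
            diff = clip(sample - predictor, -8, 7)   # limit difference to 4 bits
            predictor += diff                        # track what the decoder will see
            nibbles.append(diff & 0x0F)              # two's-complement nibble
        packed.append((nibbles[0] << 4) | nibbles[1])
    return bytes(packed)


def dpcm_decode(packed):
    """Rebuild samples by accumulating the sign-extended nibbles."""
    predictor = 0
    out = []
    for byte in packed:
        for nibble in (byte >> 4, byte & 0x0F):
            diff = nibble - 16 if nibble >= 8 else nibble
            predictor += diff
            out.append(predictor)
    return out


line = [0, 5, 9, 12, 10, 3, 3, 100]
print(dpcm_decode(dpcm_encode(line)))
```

The last sample in the example jumps by 97, so the decoder can only follow it by 7 per pixel. That slope limiting is the trade-off of clipping the differences to 4 bits.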

Perry · 2011-07-11 17:48


Thanks for the great suggestions. They helped me work out how to unroll the code.

I should really be using waitcnt to time the loops, but it is too tedious to adjust when you don't yet know how much code you need.

I was confusing DPCM with ADPCM when I mentioned complexity. Perhaps ADPCM is doable?

You almost got the simple Haar algorithm in your analysis. The strategy is actually to take the average of the two values and somehow encode the difference in the lower bits.
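
One possible reading of that strategy, sketched in Python so the loss can be inspected off-chip. The layout is an assumption based on the video-out code earlier in the thread: the top 4 bits of the pair average go in the upper nibble, a clipped signed half-difference goes in the lower nibble, and the decoder forms left = average - d and right = average + d. The names `pack_pair` and `unpack_pair` are hypothetical.

```python
def clip(value, low, high):
    """Clamp value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def pack_pair(left, right):
    """Encode two 8-bit pixels into one byte (lossy)."""
    avg = (left + right) >> 1                      # pair average
    half_diff = clip((right - left) >> 1, -8, 7)   # signed 4-bit half-difference
    return (avg & 0xF0) | (half_diff & 0x0F)


def unpack_pair(byte):
    """Decode one byte back into an approximate (left, right) pair."""
    avg = byte & 0xF0
    nibble = byte & 0x0F
    half_diff = nibble - 16 if nibble >= 8 else nibble  # sign-extend 4 bits
    left = clip(avg - half_diff, 0, 255)
    right = clip(avg + half_diff, 0, 255)
    return left, right


packed = pack_pair(100, 110)
print(hex(packed), unpack_pair(packed))
```

Because the average keeps only 16 levels in this layout, the reconstructed pair can be off by up to 15 counts even when the difference itself fits, which is worth keeping in mind when judging the displayed quality.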

I like to do this initial coding in real time, because you can see the quality of the compression immediately. The columns will be done later, as described above (before/after SD access).

I am still seeing columnar differences, and I am hoping to make them invisible to the eye.

Dave Hein · 2011-07-11 18:21

As I tried to say earlier, DPCM will work much better than the Haar technique that you are trying to do. Since you are only doing a 2-point transform I wouldn't even call it a Haar transform. All 2-point orthogonal transforms are identical whether it's the Haar, Hadamard, DCT, DFT, etc. It's basically the sum and difference of a pair of values.
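
The claim about 2-point transforms can be checked numerically. The sketch below builds the orthonormal DCT-II and the unitary DFT from their general N-point formulas and compares them, for N = 2, with the normalized sum/difference matrix (which is what the 2-point Haar and Hadamard transforms are by definition). The function names are hypothetical.

```python
import cmath
import math


def dct2_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * col + 1) * k / (2 * n))
                     for col in range(n)])
    return rows


def dft_matrix(n):
    """Unitary DFT matrix of size n x n (complex entries)."""
    return [[cmath.exp(-2j * cmath.pi * k * col / n) / math.sqrt(n)
             for col in range(n)]
            for k in range(n)]


def matrices_close(a, b, tol=1e-12):
    """True if every pair of corresponding entries differs by at most tol."""
    return all(abs(x - y) <= tol
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


S = 1 / math.sqrt(2)
SUM_DIFF = [[S, S], [S, -S]]  # row 0: scaled sum, row 1: scaled difference

print(matrices_close(dct2_matrix(2), SUM_DIFF))
print(matrices_close(dft_matrix(2), SUM_DIFF))
```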

The problem is that there is a lot of redundancy between the sum coefficients of adjacent blocks that you're not taking advantage of. The DPCM technique does take advantage of this redundancy. You could improve the performance of the 2-point transform if you use DPCM on the adjacent sum coefficients, but it would work just as well if you just use DPCM on the raw samples.

If you really want to improve coding performance you could use a larger two-dimensional block size, such as 8x8 blocks. However, this will be much more complicated than what you are currently trying to do.
