Haar wavelet, data compression and image analysis
I have pretty well completed the B/W version of "Stupid Video Capture" and made a new test bed on one of Terry Hitt's dongles.

Now I have found stories about Haar wavelet compression and the whole field of wavelet activity and image analysis techniques.
One of my biggest problems with "Stupid Video Capture" (http://forums.parallax.com/showthread.php?98516-stupid-video-capture&highlight=Stupid+video) is the size of the files on the SD card; the latest version made roughly 30 MB per minute.
So I am initially working on development of two Haar wavelet functions.
1 Encoding/Decoding the video in real time so the program uses 1/2 the memory as well as 1/2 the SD card space. (this already has some functionality but I am not happy with the display yet)
2 Multi-level version for encoding/decoding before and after writing/reading SD card data, to be used on both audio and video streams. (code kinda outlined but not really tested)
So ... I am hoping someone here will have experience/thoughts/code to help with this effort.
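To make the multi-level idea concrete, here is a minimal Python sketch of an integer Haar transform on one row of samples. It is not the Propeller code from this thread; the function names are made up for illustration, and it assumes the row length is a multiple of 2**levels. Each level replaces pairs with their average and difference, then repeats on the averages.

```python
def haar_step(samples):
    """One Haar level: pairwise averages and differences (hypothetical helper)."""
    avgs, diffs = [], []
    for i in range(0, len(samples), 2):
        left, right = samples[i], samples[i + 1]
        avgs.append((left + right) >> 1)   # truncated average
        diffs.append(left - right)         # full difference
    return avgs, diffs


def haar_unstep(avgs, diffs):
    """Exact inverse of haar_step for integer inputs."""
    out = []
    for a, d in zip(avgs, diffs):
        left = a + ((d + 1) >> 1)          # restores the bit lost by truncation
        out.extend((left, left - d))
    return out


def haar_encode(samples, levels):
    """Apply haar_step repeatedly to the running averages."""
    approx = list(samples)
    details = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def haar_decode(approx, details):
    """Undo haar_encode, coarsest level first."""
    for d in reversed(details):
        approx = haar_unstep(approx, d)
    return approx


row = [10, 12, 14, 14, 200, 202, 90, 94]
approx, details = haar_encode(row, 2)
print(approx, details)
print(haar_decode(approx, details) == row)
```

As written this is lossless and saves nothing by itself; any saving would come from storing the difference values with fewer bits or dropping the small ones, which is a separate design choice.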
.... Perry


Comments
I am working on a "video doorbell" or alarm device, critter camera, interview camera; lots of possible uses. I wanted 100 files on 1 GB.
Real time is the first level; I want to use the second method on columns.
I found DPCM too complex for the video in.
Here's my real-time code; as you can see, it is still in flux.
video in ................ has little leeway for changes in the number of instructions
:adcline1 mov s_,phsb 'capture PHSA
          mov phsb,#0
          test asm_temp,#1 wz, nr ' test for even/odd
    if_z  mov l_,s_ ' get left
    if_nz mov r_,s_ ' get right
          mov a_,l_
          add a_,r_ ' average
'         shr a_,#1 ' a := (l + r)/2
          mov d_,s_ '
          subs d_,r_ 'difference d := l - r
'         rol d_,#1 ' save 3bits + sign of d '
          and a_,#$F0
          and d_,#$0F
          or a_,d_
    if_nz wrbyte a_,video_buf_ptr 'write sample back to Spin variable "sample"
    if_nz add video_buf_ptr,#1
          djnz asm_temp,#:adcline1
          JMP #OvrMain

and for video out ....
:loop     and numpixels,#1 wz, nr ' test for even/odd
    if_nz jmp #:loop2
          RDBYTE pixel, pixptr ' read pixel from memory
'         mov a,pixel 'i
          mov d,pixel 'i
          and pixel,#$F0
'         shr a,#1
          and d,#$0F
          shl d,#28
          sar d,#28
          mov l,pixel
          subs l,d ' got left
          ADD l, #pwmlut ' add offset
          MOVS :src, l 'pixel ' use as index
:src      MOV FRQA, pwmlut+0 ' draw pixel
          nop 'ADD pixptr, #1 ' next pixel
          DJNZ numpixels, #:loop ' loop active
:loop2    RDBYTE pixel, pixptr ' read pixel from memory
'         mov a,pixel 'i
          mov d,pixel 'i
          and pixel,#$F0
'         shr a,#1
          and d,#$0F
          shl d,#28
          sar d,#28
          mov r,pixel
          adds r,d ' got right
          ADD r, #pwmlut ' add offset
          MOVS :src1, r 'pixel ' use as index
:src1     MOV FRQA, pwmlut+0 ' draw pixel
          ADD pixptr, #1 ' next pixel
          DJNZ numpixels, #:loop ' loop active

It looks like your input loop takes 28*4 cycles per pair of pixels. You should unroll it so that you only do the operations for the left pixel on the first part of the loop, and only do the operations for the right pixels on the second half of the loop. This way you won't waste any cycles on the conditional execution instructions that aren't executed.
It looks like you plan on putting the sum in the upper 4 bits and the difference in the lower 4 bits. Statistically, the difference should be smaller than the sum, so it should take fewer bits. However, you would need to clip the difference so it doesn't overflow.
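A rough Python sketch of that packing, to show why the clip matters. The function names are invented for illustration, and storing half of the difference (rather than the full difference) in the low nibble is my assumption, not something taken from the posted code.

```python
def clip(value, lo, hi):
    """Limit value to the range lo..hi."""
    return max(lo, min(hi, value))


def pack_pair(left, right):
    """Pack two 8-bit samples into one byte: average high, difference low."""
    avg = (left + right) >> 1                      # 0..255
    half_diff = clip((right - left) >> 1, -8, 7)   # must fit a signed nibble
    return (avg & 0xF0) | (half_diff & 0x0F)


def unpack_pair(byte):
    """Recover an approximation of the two samples from a packed byte."""
    avg = byte & 0xF0
    d = byte & 0x0F
    if d >= 8:                                     # sign-extend the nibble
        d -= 16
    return clip(avg - d, 0, 255), clip(avg + d, 0, 255)


print(hex(pack_pair(100, 104)), unpack_pair(pack_pair(100, 104)))
print(hex(pack_pair(0, 255)), unpack_pair(pack_pair(0, 255)))
```

Without the clip, a half difference of 127 would be masked to $F, which decodes as -1 and flips the direction of the edge. With the clip the edge is only softened.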
I think DPCM with a single-pixel predictor would work better. It would look something like the code I show below. This loop takes the same number of cycles as yours, and I had to add 6 NOPs to get the same timing. It clips the input to 8 bits, and clips the differences to 4 bits.
You could get this loop down to 24*4 cycles without too much work if that is desirable. The clipping to 255 probably isn't necessary since the loop takes less than 256 cycles, so it could probably be removed. You might be able to get the loop down to 20*4 cycles that way. Let me know if you have any questions.
Dave
          mov sum,#0 ' Initialize sum to zero
          shr asm_temp,#1 ' Divide count by 2
adcline1
          ' Process the left pixel
          mov sample,phsb ' capture PHSA
          mov phsb,#0 ' Reset PHSA
          min sample,#255 ' Limit to 8 bits
          mov diff1,sum ' Get current sum in diff
          sub diff1,sample ' Compute the difference
          mins diff1,#7 ' Limit the difference to +7
          maxs diff1,minus8 ' Limit the difference to -8
          add sum,diff1 ' Add the clipped difference back to the sum
          shl diff1,#4 ' Move to the upper 4 bits
          nop ' Room for one more instruction
          nop ' Room for one more instruction
          nop ' Room for one more instruction
          nop ' Room for one more instruction
          nop ' Room for one more instruction
          ' Process the right pixel
          mov sample,phsb ' capture PHSA
          mov phsb,#0 ' Reset PHSA
          min sample,#255 ' Limit to 8 bits
          mov diff2,sum ' Get current sum in diff
          sub diff2,sample ' Compute the difference
          mins diff2,#7 ' Limit the difference to +7
          maxs diff2,minus8 ' Limit the difference to -8
          add sum,diff2 ' Add the clipped difference back to the sum
          or diff1,diff2 ' Merge the two 4-bit values into one byte
          wrbyte diff1,video_buf_ptr ' Write to the video buffer
          add video_buf_ptr,#1 ' Increment the buffer pointer
          nop ' Room for one more instruction
          djnz asm_temp,#adcline1 ' Get another pair of samples

Thanks for the great suggestions. Helped me work out unrolling the code.
I should really be using waitcnts to time the loops but it is too tedious to adjust when you don't yet know how much code you need.
I was confusing DPCM with ADPCM when I mentioned complexity; perhaps ADPCM is doable?
You almost got the simple Haar algorithm in your analysis. Actually, the strategy is to take the average of the two values and somehow encode the difference in the lower bits.
I like to do this initial coding in real time, as you can see the quality of the compression immediately.
The columns will be done as described above later (before/after SD access).
Still seeing columnar differences; hoping to get them to be invisible to the eye in the output.
The problem is that there is a lot of redundancy between the sum coefficients of adjacent blocks that you're not taking advantage of. The DPCM technique does take advantage of this redundancy. You could improve the performance of the 2-point transform if you use DPCM on the adjacent sum coefficients, but it would work just as well if you just use DPCM on the raw samples.
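For anyone following along, DPCM on the raw samples with a single-pixel predictor can be sketched in Python as below. This is only an illustration with made-up function names; the sign convention (sample minus prediction) and the 4-bit range of -8..+7 are assumptions chosen to match the nibble packing discussed in this thread.

```python
def dpcm_encode(samples):
    """Encode each sample as a clipped 4-bit difference from the prediction."""
    pred = 0                                   # decoder starts from the same value
    codes = []
    for s in samples:
        s = min(s, 255)                        # limit input to 8 bits
        diff = max(-8, min(7, s - pred))       # clip to a signed nibble
        pred += diff                           # track what the decoder will see
        codes.append(diff & 0x0F)
    return codes


def dpcm_decode(codes):
    """Rebuild samples by accumulating the sign-extended differences."""
    pred = 0
    out = []
    for c in codes:
        pred += c - 16 if c >= 8 else c
        out.append(pred)
    return out


def pack_nibbles(codes):
    """Two codes per byte, first code in the upper nibble (even count assumed)."""
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


line = [0, 5, 20, 20, 18, 3]
codes = dpcm_encode(line)
print(codes, dpcm_decode(codes), pack_nibbles(codes).hex())
```

Because the encoder updates its prediction with the clipped difference, encoder and decoder stay in step. The cost is that a sharp edge takes several pixels to catch up, since each step can move by at most 7 or 8.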
If you really want to improve coding performance you could use a larger two-dimensional block size, such as 8x8 blocks. However, this will be much more complicated than what you are currently trying to do.
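As a rough idea of what a two-dimensional block looks like, here is a Python sketch of one Haar level applied to the rows and then the columns of an 8x8 block. The choice of Haar for the block transform and the function names are assumptions for illustration; the suggestion above does not say which transform to use.

```python
def haar_step(values):
    """One Haar level on a sequence: averages first, then differences."""
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    avgs = [(a + b) >> 1 for a, b in pairs]
    diffs = [a - b for a, b in pairs]
    return avgs + diffs


def haar_2d_level(block):
    """Apply haar_step to every row, then to every column of the result."""
    rows = [haar_step(row) for row in block]
    cols = [haar_step(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]    # transpose back to row order


flat = [[50] * 8 for _ in range(8)]
for row in haar_2d_level(flat):
    print(row)
```

For a flat block, everything except the 4x4 corner of averages comes out as zero, which is where the extra coding gain would come from. The price is that a whole block of lines has to be buffered before it can be transformed.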