To BYTE or LONG

pgbpsu · 2009-12-03 19:27

I'm trying to write a compression algorithm based on bit packing (squeezing as many unnecessary bits out of my data as possible). I'll set aside 512 bytes of HUB memory for the packed data. My question is: should I declare the HUB memory space as BYTE, WORD, or LONG? I've decided to break my data down into 4 possible sizes: 24-bit, 12-bit, 8-bit, and 4-bit. The 512 bytes in memory will essentially be treated as a 4096-bit buffer. The first 24 bits will hold the first value of my data. The next 2 bits will be the size code for the sample immediately following, then comes the actual 24-, 12-, 8-, or 4-bit sample, followed by another 2-bit size code and its sample, and so on until I can no longer fit a full size code plus sample in the buffer. So I'll basically have a 2-bit size code followed by 24-bit, 12-bit, 8-bit, or 4-bit data all through the buffer.

Another way to think of it is as records (26-bit, 14-bit, 10-bit, or 6-bit, counting the size code) all jammed next to one another in the 512-byte buffer with no empty space between them. The real trouble comes when part of one record belongs in the lower part of one HUB byte and the rest belongs in the next byte.
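As a rough illustration of the layout described above, here is a minimal Python sketch. Python is used only so the arithmetic can be checked on a PC; it is not Spin. The 2-bit code assigned to each size, the function names, and the assumption that samples are signed two's-complement values are all made up for this example; the post does not specify them.

```python
# Sample widths from the post; the 2-bit code given to each one is an assumption.
SIZES = [4, 8, 12, 24]          # index in this list = hypothetical 2-bit size code

def size_code(value):
    """Return (code, width) for the narrowest width that holds a signed value."""
    for code, width in enumerate(SIZES):
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
        if lo <= value <= hi:
            return code, width
    raise ValueError("value does not fit in 24 bits")

def record_bits(value):
    """Bits used by one record: the 2-bit size code plus the sample itself."""
    _, width = size_code(value)
    return 2 + width

def records_that_fit(first_value_bits, values, buffer_bits=4096):
    """Count how many whole records fit after the initial full-width value."""
    used = first_value_bits
    count = 0
    for v in values:
        need = record_bits(v)
        if used + need > buffer_bits:
            break               # stop at the first record that will not fit whole
        used += need
        count += 1
    return count, used
```

In the best case, where every sample fits in 4 bits, 678 six-bit records fit after the 24-bit first value, using 4092 of the 4096 bits.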

I see some difficult bookkeeping and a lot of shifting and ORing in my future, but I wonder if some of it might be lessened by choosing BYTE alignment over LONG.

Any thoughts would be appreciated.
Regards,
Peter

Mike Green · 2009-12-03 20:17

Using LONGs and doing the bit packing on 32-bit values has a small speed advantage, particularly since some of the samples will be nearly 32 bits long. For the longer values, you're guaranteed to have to process them in several pieces if you use BYTE alignment and handle the values a byte at a time. With LONGs, some of the time you'll be able to handle an entire sample at once. If you decide to use assembly for speed, you'll be better off using LONGs, since assembly works entirely with 32-bit values except when values are moved to and from hub memory.
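To put numbers on the boundary argument above, here is a small Python sketch (not Spin, and the function names are invented for this example) that counts how many bytes or longs a field overlaps for every possible starting bit offset.

```python
def units_touched(bit_offset, nbits, unit_bits):
    """Number of storage units (bytes or longs) that a field overlaps."""
    first = bit_offset // unit_bits
    last = (bit_offset + nbits - 1) // unit_bits
    return last - first + 1

def worst_and_best(nbits, unit_bits):
    """(min, max) units touched over every starting offset within one unit."""
    counts = [units_touched(off, nbits, unit_bits) for off in range(unit_bits)]
    return min(counts), max(counts)
```

For a 26-bit record (2-bit size code plus 24-bit sample), `worst_and_best(26, 8)` gives 4 to 5 bytes, while `worst_and_best(26, 32)` gives 1 to 2 longs: the record fits in a single long only when it starts in the first 7 bit positions of that long.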

pgbpsu · 2009-12-03 20:24

Hi Mike-

Thanks for your thoughts. I see exactly what you mean. It will be much easier to deal with longs over bytes since I'll have so many fewer boundaries to deal with. I suspect I'll do this in SPIN but, as you say, the LONG alignment will help things should I move it to ASM.

Thanks,
Peter

Dave Hein · 2009-12-03 21:08

Peter,

You could implement your packing routine as shown below. The codes are packed into "shift_bits" one at a time, and then written to the memory buffer as bytes. If you want to write them to memory as longs you would need to use two long variables for "shift_bits".

I'm guessing you're encoding your data as differences, since you write the first value to the buffer and then use variable-length coding after that. You might be able to get more compression if you use more efficient codes, such as Huffman codes.

Dave

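DAT
  shift_cnt LONG 0     ' number of bits pending in shift_bits (0..7 between calls)
  shift_bits LONG 0    ' pending bits, left-justified in the long
  outnum LONG 0        ' index of the next free byte in outbuf

VAR
  BYTE outbuf[512]

PUB PackBits(code, nbits)
  ' Assumes code fits in nbits (no stray high bits, so mask negative values first)
  ' and that nbits + shift_cnt is at most 32, i.e. nbits is at most 25.
  shift_bits |= code << (32 - nbits - shift_cnt)
  shift_cnt += nbits
  REPEAT WHILE (shift_cnt => 8)
    outbuf[outnum++] := shift_bits >> 24
    shift_bits <<= 8
    shift_cnt -= 8
    IF (outnum => 512)
      OutputBuffer(@outbuf, 512)   ' OutputBuffer is not defined in this post
      outnum := 0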
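The routine above can be checked off-target with a small Python model. This is only a sketch under stated assumptions, not a tested port: the masking of `code`, the 25-bit limit check, the zero-padded flush of the final partial byte, and the `unpack` helper are additions for this example and do not appear in the Spin code. With that limit, a 26-bit record has to be packed as two calls, the 2-bit size code and then the 24-bit sample.

```python
MASK32 = 0xFFFFFFFF   # emulate a 32-bit Propeller long

def pack(fields):
    """Pack (code, nbits) pairs MSB-first into bytes, mirroring PackBits."""
    out = bytearray()
    shift_bits = 0    # 32-bit accumulator; pending bits sit at the top
    shift_cnt = 0     # number of pending bits (always < 8 between fields)
    for code, nbits in fields:
        if not 1 <= nbits <= 25:
            raise ValueError("nbits must be 1..25")
        code &= (1 << nbits) - 1          # added: drop sign-extension bits
        shift_bits |= (code << (32 - nbits - shift_cnt)) & MASK32
        shift_cnt += nbits
        while shift_cnt >= 8:             # emit every complete byte
            out.append(shift_bits >> 24)
            shift_bits = (shift_bits << 8) & MASK32
            shift_cnt -= 8
    if shift_cnt:                         # added: flush the last partial byte
        out.append(shift_bits >> 24)
    return bytes(out)

def unpack(data, widths):
    """Read back unsigned fields of the given widths from packed bytes."""
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    pos = 0
    fields = []
    for nbits in widths:
        shift = total - pos - nbits
        fields.append((bits >> shift) & ((1 << nbits) - 1))
        pos += nbits
    return fields
```

For example, `pack([(1, 2), (0xABCDEF, 24)])` yields the bytes `6A F3 7B C0`: the 26 packed bits followed by six padding zeros.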

pgbpsu · 2009-12-03 21:41

Hi Dave-

Indeed I'm sending first (or second depending on my success) differences. I'll have a look at the Huffman codes.

Thanks for the code snippet. Quick question. Why declare shift_cnt, shift_bits, and outnum in the DAT section and not the VAR section?

Regards and thanks.
Peter

Dave Hein · 2009-12-03 21:53

I put them in the DAT section so they would be initialized. outnum and shift_cnt should be initialized to zero. shift_bits doesn't need to be initialized. I am under the impression that variables in the VAR section are not initialized. Of course, you could put them in the VAR section and add a few lines of code to initialize them. This would be necessary if you wanted to restart the PackBits function.

I haven't tried running this code so I don't know if it's correct, but I think it should work.

Dave

pgbpsu · 2009-12-03 22:12

Hi Dave-

Thanks for the quick response. I'm in the process of coding it up now with some real data to see how it performs. I'll find out quickly if it works, although I'm just about to quit for the day so it may have to wait until tomorrow. When I apply it to my complete dataset I'll have over 400_000 longs to apply it to so I'll need to loop over many, many 512-byte buffers and will need to reinitialize things.

Thanks,
Peter
