To BYTE or LONG
pgbpsu
I'm trying to write a compression algorithm based on bit packing (squeezing as many unnecessary bits out of my data as possible). I'll set aside 512 bytes of HUB memory to hold the packed data. My question is: should I declare that HUB memory as BYTE, WORD, or LONG? I've decided to break my data down into 4 possible sizes: 24-bit, 12-bit, 8-bit, and 4-bit. The 512 bytes in memory will essentially be treated as a 4096-bit buffer. The first 24 bits will be the first value of my data. The next 2 bits will be the size code for the sample immediately following, then comes the actual 24-, 12-, 8-, or 4-bit sample, followed by another 2-bit size code and its sample, and so on until I can no longer fit a full size code + sample in the buffer.
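To make the layout concrete, here is a minimal sketch in Python (not Spin) of how a size code might be chosen for each value. The code assignments (0 = 4-bit, 1 = 8-bit, 2 = 12-bit, 3 = 24-bit), the assumption that samples are signed two's-complement values, and the names `SIZES`, `size_code_for` and `encoded_bits` are all illustrative; the post does not specify them.

```python
# Hypothetical mapping from 2-bit size code to sample width in bits.
SIZES = (4, 8, 12, 24)

def size_code_for(value):
    """Return the smallest size code whose signed range can hold value."""
    for code, width in enumerate(SIZES):
        lo = -(1 << (width - 1))
        hi = (1 << (width - 1)) - 1
        if lo <= value <= hi:
            return code
    raise ValueError("value does not fit in 24 bits")

def encoded_bits(samples):
    """Bits used: 24 for the first value, then 2 + width for each later sample."""
    return 24 + sum(2 + SIZES[size_code_for(s)] for s in samples[1:])
```

With this scheme a run of small values costs 6 bits each, so the payoff depends entirely on how often the 4-bit case occurs.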
Another way to think of it is as fields of size code plus sample (26-bit, 14-bit, 10-bit, or 6-bit) all jammed next to one another in the 512-byte buffer with no empty space between them. The real trouble comes when part of one field belongs in the lower part of a HUB-aligned byte and the rest belongs in the next byte.
I see some difficult bookkeeping and a lot of shifting and ORing in my future, but I wonder if some of it might be lessened by choosing BYTE alignment over LONG.
Any thoughts would be appreciated.
Regards,
Peter
Comments
Thanks for your thoughts. I see exactly what you mean. It will be much easier to deal with longs over bytes since I'll have so many fewer boundaries to deal with. I suspect I'll do this in SPIN but, as you say, the LONG alignment will help things should I move it to ASM.
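A rough way to see the difference in boundary handling is to count how many fields straddle a storage-unit boundary for a given unit size. This is a Python sketch for illustration only; `count_straddles` and the sample field widths are made up, not taken from the thread.

```python
def count_straddles(field_widths, unit_bits):
    """Count fields that span more than one storage unit of unit_bits bits."""
    pos = 0
    straddles = 0
    for width in field_widths:
        first_unit = pos // unit_bits
        last_unit = (pos + width - 1) // unit_bits
        if last_unit != first_unit:
            straddles += 1
        pos += width
    return straddles

# Example field widths (size code + sample), packed back to back.
fields = [6, 10, 14, 26, 6, 6]
byte_crossings = count_straddles(fields, 8)    # 4 fields cross a byte boundary
long_crossings = count_straddles(fields, 32)   # 2 fields cross a long boundary
```

Note also that a 26-bit field can touch as many as five consecutive bytes, but never more than two longs, so the long-based split logic only ever has one case to handle.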
Thanks,
Peter
You could implement your packing routine as shown below. The codes are packed into "shift_bits" one at a time, and then written to the memory buffer as bytes. If you want to write them to memory as longs you would need to use two long variables for "shift_bits".
I'm guessing you're encoding your data as differences, since you write the first value to the buffer and then use variable-length coding after that. You might be able to get more compression if you use more efficient codes, such as Huffman codes.
Dave
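DAT
  shift_cnt   LONG 0    ' number of bits waiting in shift_bits (0..7 between calls)
  shift_bits  LONG 0    ' pending bits, left-justified (MSB first)
  outnum      LONG 0    ' next free index in outbuf
VAR
  BYTE outbuf[512]
PUB PackBits(code, nbits)
  ' Mask to nbits so a sign-extended (negative) value cannot overwrite bits already queued.
  ' nbits must not exceed 25, or the shift count below can go negative.
  code &= (1 << nbits) - 1
  shift_bits |= code << (32 - nbits - shift_cnt)
  shift_cnt += nbits
  REPEAT WHILE (shift_cnt => 8)
    outbuf[outnum++] := shift_bits >> 24    ' emit the top byte
    shift_bits <<= 8
    shift_cnt -= 8
    IF (outnum => 512)
      OutputBuffer(@outbuf, 512)
      outnum := 0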
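Since the Spin routine is untested, it can help to check the logic on a PC first. The following is a Python model of the same idea, not a translation guaranteed to match Spin behaviour in every detail. The `BitPacker` class, its `flush` method, and the `read_bits` helper are additions for illustration; the model masks `code` to `nbits` so that negative differences do not disturb bits already queued, and it assumes `nbits` is at most 25.

```python
class BitPacker:
    """Model of the PackBits idea: MSB-first, flushed one byte at a time."""

    def __init__(self):
        self.shift_bits = 0      # 32-bit accumulator, pending bits at the top
        self.shift_cnt = 0       # number of pending bits (0..7 between calls)
        self.out = bytearray()   # stands in for outbuf

    def pack(self, code, nbits):
        # Assumes nbits <= 25 so that 32 - nbits - shift_cnt never goes negative.
        code &= (1 << nbits) - 1                  # drop sign-extension bits
        self.shift_bits |= code << (32 - nbits - self.shift_cnt)
        self.shift_cnt += nbits
        while self.shift_cnt >= 8:
            self.out.append(self.shift_bits >> 24)               # emit top byte
            self.shift_bits = (self.shift_bits << 8) & 0xFFFFFFFF
            self.shift_cnt -= 8

    def flush(self):
        """Emit any leftover bits, zero-padded to a full byte."""
        if self.shift_cnt:
            self.out.append(self.shift_bits >> 24)
            self.shift_bits = 0
            self.shift_cnt = 0


def read_bits(data, bitpos, nbits):
    """Read nbits starting at bit offset bitpos (MSB-first); result is unsigned."""
    value = 0
    for i in range(bitpos, bitpos + nbits):
        bit = (data[i // 8] >> (7 - i % 8)) & 1
        value = (value << 1) | bit
    return value
```

Packing a 24-bit first value, a 2-bit size code and a 4-bit sample, then reading them back with `read_bits`, is a quick round-trip check before moving to the Propeller.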
Indeed, I'm sending first (or second, depending on my success) differences. I'll have a look at the Huffman codes.
Thanks for the code snippet. Quick question: why declare shift_cnt, shift_bits, and outnum in the DAT section rather than the VAR section?
Regards and thanks.
Peter
I haven't tried running this code so I don't know if it's correct, but I think it should work.
Dave
Thanks for the quick response. I'm in the process of coding it up now with some real data to see how it performs. I'll find out quickly if it works, although I'm just about to quit for the day, so it may have to wait until tomorrow. My complete dataset has over 400_000 longs, so I'll need to loop over many, many 512-byte buffers and reinitialize the packing state for each one.
Thanks,
Peter
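For planning the loop over many buffers, here is a hedged Python sketch of how samples might be divided among 512-byte buffers when each buffer restarts with a raw 24-bit value followed by size-coded first differences. The widths table, the signed-range rule, and the names `field_bits` and `plan_buffers` are assumptions for illustration and are not taken from the thread.

```python
BUFFER_BITS = 512 * 8          # one 512-byte buffer
WIDTHS = (4, 8, 12, 24)        # hypothetical widths selected by the 2-bit code

def field_bits(diff):
    """Bits for one difference: 2-bit size code plus the smallest signed width."""
    for width in WIDTHS:
        if -(1 << (width - 1)) <= diff < (1 << (width - 1)):
            return 2 + width
    raise ValueError("difference does not fit in 24 bits")

def plan_buffers(samples):
    """Return (start_index, sample_count, bits_used) for each buffer."""
    plan = []
    i = 0
    while i < len(samples):
        start = i
        used = 24                      # each buffer restarts with a raw 24-bit value
        i += 1
        while i < len(samples):
            need = field_bits(samples[i] - samples[i - 1])
            if used + need > BUFFER_BITS:
                break                  # no room: this sample opens the next buffer
            used += need
            i += 1
        plan.append((start, i - start, used))
    return plan
```

Because every buffer begins with a raw value, each one can be decoded on its own, at the cost of 24 bits per buffer.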