DAT section alignment compiler bug?
BradC
I'm declaring a string in the DAT block using word rather than byte.
I'm doing this as the string is UTF-16 and I need each upper byte to be 0.
I would have assumed (yeah yeah, I know..) that the bytes would have been consistently right-aligned, however this is not the case.
Compile the code below with and without the extra "byte" commented out and check the hex window.
Is this expected behaviour or am I doing something the compiler should barf on?
PUB Bletch
  ' To satisfy the compiler

DAT
  ' If the following "byte" is uncommented the "word" string is left aligned
  ' whereas if it is commented the "word" string is right aligned
  ' Look for the disappearing "0" between the "F" and "r" in the hex output
  'byte 0
  byte 0
  byte 2
  byte "F"
  word "red Bloggs was 'ere"
  byte 0
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
Comments
If you put some unaligned code... it remains unaligned, you have to use org.
word "Fred"
I get 46 00 72 00 65 00 64 00
which is expected since the Propeller is little-endian. If you want a string
00 46 00 72 ..
then
word 0
byte 0
byte word "f"
byte word "r"
byte word "e"
byte word "d"
works: the first word 0 forces alignment, and then the string comes out correctly
Post Edited (Timmoore) : 6/14/2008 7:12:58 AM GMT
Which is what I'm trying to do..
In my case
byte 01
byte 02
byte 03
word "Fred"
gives me 01 02 03 00 46 00 72 00 65 00 64 00
                  ^^
Where did that extra 0 come from?
Is the compiler padding the start of the "word" definition to align it?
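For anyone following along, the extra byte is consistent with word alignment. A minimal annotated sketch of the same declaration, assuming the compiler always starts "word" data on an even address within the DAT image (the offsets in the comments are relative to the start of this fragment):

```spin
DAT
        byte  01          ' offset 0
        byte  02          ' offset 1
        byte  03          ' offset 2, so the next free offset (3) is odd
        word  "Fred"      ' one pad byte (00) at offset 3, then
                          ' 46 00 72 00 65 00 64 00 starting at offset 4
```

With an even number of bytes in front of the "word" line there would be nothing to pad, which matches the disappearing 0 in the original test.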
I need to be using
byte 01
byte 02
byte 03
byte word "Fred"
<edit>
Nope that did not work either.. back to the drawing board
byte word "Fred"
It stores the 1st character ("F") as a word, then reverts to byte for the rest of the string.
byte word "F", word "r", word "e", word "d"
word 0
byte 0
prior to any string definitions I do..
I wonder if the behaviour of
byte word "Fred"
generating 46 00 72 65 64 is correct though.
I would have thought 46 00 72 00 65 00 64 00 should have been the result
Welcome input from Parallax on the problem.
No. Again, it's adding padding to word-align the first byte, "F", which is what you've requested it to do with the byte word pragma.
Don't forget, the Prop uses little endian order, which means the least significant byte of a word or long comes first in memory.
-Phil
Post Edited (Phil Pilgrim (PhiPi)) : 6/14/2008 8:03:20 AM GMT
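A small sketch of the byte order point, assuming nothing beyond the little-endian storage described above. The comments show the bytes as they would appear in the hex window:

```spin
DAT
        word  $0046       ' stored as 46 00: least significant byte first
        word  "F"         ' same two bytes, since "F" is $46
        byte  $00, $46    ' explicit bytes, if 00 46 order is what is wanted
```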
I was using
byte "F",0,"r",0,"e",0,"d",0
previously, but thought there must be a better way. It's a pain to edit 40 character strings when they are broken up like that.
Thanks for the input. Appreciated.
For now I'll just lose the 3 bytes forcing alignment with word/byte. It's not like I'm pushed for space.
Ahh, but it's not padding anything. It does precisely what I want it to do for the 1st character in that it stores it as a word, it then stores all the following characters byte packed behind the word. I can play with the alignment any way I want and it still stores the structure precisely the same way. So it's not aligning _anything_
byte 1
byte 2
byte 3
byte word "Fred"
Whether the structure is odd or even aligned to begin with, it _always_ comes out as 01 02 03 46 00 72 65 64
If it were aligning or padding the result would change depending on the initial alignment.
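Putting the two observations side by side, here is a sketch that assumes the size override applies to one element at a time, as the output above suggests, and that it never inserts padding:

```spin
DAT
        byte  1, 2, 3
        byte  word "Fred"                               ' 46 00 72 65 64
                                                        ' only "F" is widened
        byte  word "F", word "r", word "e", word "d"    ' 46 00 72 00 65 00 64 00
                                                        ' every character widened
```

The second form is no easier to edit than "F",0,"r",0 but it does keep the data packed regardless of what precedes it.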
This is the same as:
I'm not sure that makes sense, though. Since "Fred" is a single syntactic element, treating all characters the same would seem more logical.
But why not use your original?
It does what you want without the awkward notation. You just lose a byte for the padding. To minimize losses such as this, declare all your longs first, then the words, and lastly the bytes. That way, there's no wasted space.
-Phil
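A sketch of the ordering suggested above, assuming the DAT block itself starts on a long boundary. The labels are made up for illustration; only the order of the sizes matters:

```spin
DAT
total   long  0               ' longs first: already long-aligned
title   word  "Fred", 0       ' words next: still on an even address
flags   byte  1, 2, 3         ' bytes last: any address will do
```

Declared in this order, no pad bytes should be needed between the three lines.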
The main reason is it's being packed into a structure that is being streamed. So, I can't just re-sort the data elements, and extraneous zeros being inserted in the wrong place tend to confuse the parser at the other end ;)
As each structure only has one "word" string in them at the moment I've shifted the alignment of the entire block so the structure is not unexpectedly padded, but it's not what I'd call an optimal solution.
It'll be interesting to get the authoritative Parallax take on it anyway.
Cheers for the follow-up
I know it's not elegant and it means unwanted extra coding, but what about a standard byte string followed by padding, with the bytes expanded to words at run-time start-up? Something like ...
That might not work because $ and labels have to be long aligned if I recall correctly.
Given that the strings aren't likely to change often, the easiest option may be to use the annoying "f",0,"r",0,"e",0,"d",0. I agree, though, that it would be nice if there were some directive to do this ('text byte', 'text word', and 'text long'), or if all characters of a string used the same sizing as the first (though changing that might break some existing code).
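For what it's worth, a rough sketch of the run-time expansion idea. It avoids "$" and label arithmetic altogether by keeping a separate buffer, and it copies with byte writes so the destination can sit on any address. MAX_CHARS, Demo, ExpandToWords, narrow and wide are all made-up names, and this has not been tried on hardware:

```spin
CON
  MAX_CHARS = 40                        ' assumed upper bound on string length

PUB Demo
  ExpandToWords(@narrow, @wide)         ' run once at start-up

PUB ExpandToWords(src, dst) | c
  ' Copy a zero-terminated byte string to dst, one 16-bit unit per character,
  ' low byte first. Only byte writes are used, so dst may be odd-aligned.
  repeat
    c := byte[src++]
    byte[dst++] := c                    ' low byte: the character itself
    byte[dst++] := 0                    ' high byte: always zero
  until c == 0                          ' the terminator is expanded as well

DAT
narrow  byte  "Fred", 0                 ' easy-to-edit source string
wide    byte  0[(MAX_CHARS + 1) * 2]    ' room for the expanded copy
```

The cost is the extra buffer and a few bytes of code, in exchange for strings that can be edited as plain text.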