DAT section alignment compiler bug?

BradC · 2008-06-14 06:36

I'm declaring a string in the DAT block using word rather than byte.
I'm doing this as the string is UTF-16 and I need each upper byte to be 0.
I would have assumed (yeah yeah, I know..) that the bytes would have been consistently right-aligned; however, this is not the case.

Compile the below code with and without the extra "byte" commented and check the hex window.

Is this expected behaviour or am I doing something the compiler should barf on?

PUB Bletch
  ' To satisfy the compiler
DAT

'If the following "byte" is uncommented the "word" string is left aligned
' whereas if it is commented the "word" string is right aligned
' Look for the disappearing "0" between the "F" and "r" in the hex output

'byte    0


byte    0
byte    2
byte    "F"
word    "red Bloggs was 'ere"
byte    0

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!

Ale · 2008-06-14 07:00

I did something similar, but the DAT with the code comes afterward. The org directive will align the code/data for that section if I'm not mistaken.
If you put some unaligned code... it remains unaligned, you have to use org.

Timmoore · 2008-06-14 07:05

What are you trying to do? If I do
word "Fred"
I get 46 00 72 00 65 00 64 00
which is expected since the Propeller is little endian. If you want a string
00 46 00 72 ..
then
word 0
byte 0
byte word "f"
byte word "r"
byte word "e"
byte word "d"
works, the first word 0 forces alignment, then the string is correct

Post Edited (Timmoore) : 6/14/2008 7:12:58 AM GMT

BradC · 2008-06-14 07:30

Timmoore said...
What are you trying to do? If I do
word "Fred"
I get 46 00 72 00 65 00 64 00
which is expected since the Propeller is little endian. If you want a string

Which is what I'm trying to do..
In my case
byte 01
byte 02
byte 03
word "Fred"

gives me 01 02 03 00 46 00 72 00 65 00 64 00
                  ^^
Where did that extra 0 come from?

Is the compiler padding the start of the "word" definition to align it?

BradC · 2008-06-14 07:34

OK, I RTFM..

I need to be using
byte 01
byte 02
byte 03
byte word "Fred"

<edit>
Nope that did not work either.. back to the drawing board

BradC · 2008-06-14 07:40

when using
byte word "Fred"

It stores the 1st character ("F") as a word, then reverts to byte for the rest of the string.

Timmoore · 2008-06-14 07:55

I found the same thing. The simplest form I found that does what you want is
byte word "F", word "r", word "e", word "d"

BradC · 2008-06-14 07:55

Ok, so I can work around the issue by forcing alignment using
word 0
byte 0

prior to any string definitions I do..
I wonder if the behaviour of

byte word "Fred"

generating 46 00 72 65 64 is correct though.
I would have thought 46 00 72 00 65 00 64 00 should have been the result

Welcome input from Parallax on the problem.

Phil Pilgrim (PhiPi) · 2008-06-14 07:56

BradC said...
Where did that extra 0 come from? Is the compiler padding the start of the "word" definition to align it?

Yes:

  01: byte 01
  02: byte 02
  03: byte 03
  00: '<padding>
0046: word "F"
0072: word "r"
0065: word "e"
0064: word "d"

BradC said...
It stores the 1st character ("F") as a word, then reverts to byte for the rest of the string.

No. Again, it's adding padding to word-align the first byte, "F", which is what you've requested it to do with the byte word pragma.

Don't forget, the Prop uses little endian order, which means the least significant byte of a word or long comes first in memory.

-Phil
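
The padding rule for plain `word` data can be checked off-chip with a small Python model. This is only a sketch, not Parallax code: `layout` and `word_string` are made-up names, and it assumes the block starts on an aligned address and that the compiler pads with zero bytes up to the next multiple of the element size, which matches the hex dump quoted earlier in the thread.

```python
def layout(items):
    """Model a DAT block. items is a list of (size_in_bytes, value) pairs."""
    out = bytearray()
    for size, value in items:
        # Assumed rule: pad with zeros until the offset is a multiple of the size.
        while len(out) % size:
            out.append(0)
        # The Propeller is little endian, so the low byte is stored first.
        out += value.to_bytes(size, "little")
    return bytes(out)


def word_string(text):
    """Model `word "..."`: every character becomes one 16-bit element."""
    return [(2, ord(ch)) for ch in text]


image = layout([(1, 1), (1, 2), (1, 3)] + word_string("Fred"))
print(image.hex(" "))  # 01 02 03 00 46 00 72 00 65 00 64 00
```

With an even number of leading bytes the model adds no padding at all, which is why the extra 00 seems to come and go when a single `byte` line is added or commented out.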

Post Edited (Phil Pilgrim (PhiPi)) : 6/14/2008 8:03:20 AM GMT

BradC · 2008-06-14 07:57

Timmoore said...
I found the same thing. The simplest form I found that does what you want is
byte word "F", word "r", word "e", word "d"

I was using
byte "F",0,"r",0,"e",0,"d",0
previously, but thought there must be a better way. It's a pain to edit 40 character strings when they are broken up like that.

Thanks for the input. Appreciated.
For now I'll just lose the 3 bytes forcing alignment with word/byte. It's not like I'm pushed for space.
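
If hand-editing the interleaved form is the main pain, one option is to generate that line with a small script and paste it into the DAT block. The helper below is a hypothetical sketch (`spin_utf16_bytes` is a made-up name, not part of any Propeller tool) and it assumes the string is plain printable ASCII with no double quotes.

```python
def spin_utf16_bytes(text, terminate=False):
    """Return a Spin DAT line that spells text as UTF-16LE byte pairs."""
    parts = []
    for ch in text:
        # Only printable ASCII is handled; a double quote would end the Spin literal.
        if not (32 <= ord(ch) < 127) or ch == '"':
            raise ValueError(f"unsupported character: {ch!r}")
        parts.append(f'"{ch}",0')
    if terminate:
        # Optional 16-bit zero terminator.
        parts.append("0,0")
    return "byte    " + ",".join(parts)


print(spin_utf16_bytes("Fred"))  # byte    "F",0,"r",0,"e",0,"d",0
```

Because everything is emitted as `byte` data, the result needs no alignment and can sit anywhere in a packed structure.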

Timmoore · 2008-06-14 07:57

If I remember UTF-16 correctly, you will also need a byte 0 on the end to get two 0s to terminate the string.
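
For reference, the target byte sequence can be produced on a PC with Python's built-in `utf-16-le` codec. This is only a cross-check sketch: whether the receiving end actually expects a terminator is not stated in the thread, so the two trailing zero bytes are an assumption.

```python
text = "Fred"

# UTF-16 little endian: low byte first, then the zero high byte.
payload = text.encode("utf-16-le")

# A 16-bit terminator is one code unit, which is two zero bytes.
terminated = payload + b"\x00\x00"

print(payload.hex(" "))     # 46 00 72 00 65 00 64 00
print(terminated.hex(" "))  # 46 00 72 00 65 00 64 00 00 00
```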

BradC · 2008-06-14 08:02

Phil Pilgrim (PhiPi) said...

BradC said...
It stores the 1st character ("F") as a word, then reverts to byte for the rest of the string.

No. Again, it's adding padding to word-align the first byte, "F", which is what you've requested it to do with the byte word pragma.

-Phil

Ahh, but it's not padding anything. It does precisely what I want it to do for the 1st character in that it stores it as a word, it then stores all the following characters byte packed behind the word. I can play with the alignment any way I want and it still stores the structure precisely the same way. So it's not aligning _anything_

byte 1
byte 2
byte 3
byte word "Fred"

Whether the structure is odd or even aligned to begin with, it _always_ comes out as 01 02 03 46 00 72 65 64
If it were aligning or padding the result would change depending on the initial alignment.

Phil Pilgrim (PhiPi) · 2008-06-14 08:26

My mistake. I thought you meant the 00 came before the 46, so I had to go back to the manual myself. Apparently it's being interpreted as:

    byte word "F", "red"

This is the same as:

    byte word "F"
    byte "red"

I'm not sure that makes sense, though. Since "Fred" is a single syntactic element, treating all characters the same would seem more logical.

But why not use your original?

    word "Fred"

It does what you want without the awkward notation. You just lose a byte for the padding. To minimize losses such as this, declare all your longs first, then the words, and lastly the bytes. That way, there's no wasted space.

-Phil
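
The two `byte word` forms discussed in this thread can be compared side by side with a short Python model. It is a sketch of the behaviour reported here, not of the compiler's internals: `byte_word_string` and `byte_word_each` are made-up names, and the model assumes that a size override inside a `byte` line never adds alignment padding.

```python
def byte_word_string(text):
    """Model `byte word "Fred"`: only the first character is widened to a word."""
    first, rest = text[:1], text[1:]
    return ord(first).to_bytes(2, "little") + rest.encode("ascii")


def byte_word_each(text):
    """Model `byte word "F", word "r", ...`: every character is widened."""
    return b"".join(ord(ch).to_bytes(2, "little") for ch in text)


prefix = bytes([1, 2, 3])
print((prefix + byte_word_string("Fred")).hex(" "))  # 01 02 03 46 00 72 65 64
print((prefix + byte_word_each("Fred")).hex(" "))    # 01 02 03 46 00 72 00 65 00 64 00
```

The first line matches the output reported above for any starting alignment; the second is the packed UTF-16 layout with no padding byte.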

BradC · 2008-06-14 08:37

Phil Pilgrim (PhiPi) said...

This is the same as:
    byte word "F"
    byte "red"
I'm not sure that makes sense, though. Since "Fred" is a single syntactic element, treating all characters the same would seem more logical.

But why not use your original?
    word "Fred"
It does what you want without the awkward notation. You just lose a byte for the padding. To minimize losses such as this, declare all your longs first, then the words, and lastly the bytes. That way, there's no wasted space.

-Phil

The main reason is it's being packed into a structure that is being streamed. So, I can't just re-sort the data elements, and extraneous zeros being inserted in the wrong place tend to confuse the parser at the other end ;)

As each structure only has one "word" string in them at the moment I've shifted the alignment of the entire block so the structure is not unexpectedly padded, but it's not what I'd call an optimal solution.

It'll be interesting to get the authoritative Parallax take on it anyway.
Cheers for the follow-up

Rayman · 2008-06-14 11:03

I've had a lot of trouble with this too... This is the same as trying to create a mixed type structure... It just doesn't work the way you might hope/expect. I was trying to create a contiguous header for bmp files with different sized elements, but it didn't work out...

hippy · 2008-06-14 11:57

I'm guessing these are the annoying USB description strings.

I know it's not elegant and it means unwanted extra coding, but what about a standard byte string followed by padding, with the bytes expanded to words at run-time start-up? Something like ...

MyDesc byte "jim"
       byte 0[$-MyDesc]

That might not work because $ and labels have to be long aligned if I recall correctly.

Given that the strings aren't likely to change often, it may be the easiest option to use the annoying "f",0,"r",0,"e",0,"d",0. I agree, though, that it would be nice if there were some directive to do this ('text byte', 'text word', and 'text long'), or if all characters of a string used the same sizing as the first (though changing that might break some existing code).
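
The run-time expansion idea can be modeled in Python to show why the padding has to be as long as the string and why the copy should run backwards. This is a sketch of the technique only: `widen_in_place` is a made-up name, and a real version would be a Spin or PASM loop over hub memory.

```python
def widen_in_place(buf, n):
    """Expand the first n bytes of buf into n little-endian 16-bit units, in place."""
    if len(buf) < 2 * n:
        raise ValueError("buffer needs n bytes of padding after the string")
    # Walk from the last character to the first so that no unread source
    # byte is overwritten by an earlier character's expanded pair.
    for i in range(n - 1, -1, -1):
        ch = buf[i]
        buf[2 * i] = ch     # low byte: the character
        buf[2 * i + 1] = 0  # high byte: always zero for ASCII
    return buf


buf = bytearray(b"jim" + bytes(3))  # string followed by equal-length padding
widen_in_place(buf, 3)
print(buf.hex(" "))  # 6a 00 69 00 6d 00
```

The string stays editable as ordinary text in the source, at the cost of a short loop at start-up.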
