DAT section alignment compiler bug? — Parallax Forums


BradC Posts: 2,601
edited 2008-06-14 11:57 in Propeller 1
I'm declaring a string in the DAT block using word rather than byte.
I'm doing this because the string is UTF-16 and I need each upper byte to be 0.
I would have assumed (yeah, yeah, I know...) that the bytes would be consistently right-aligned; however, this is not the case.

Compile the code below with the extra "byte" both commented and uncommented, and check the hex window.

Is this expected behaviour or am I doing something the compiler should barf on?

PUB Bletch
  ' To satisfy the compiler
DAT

'If the following "byte" is uncommented the "word" string is left aligned
' whereas if it is commented the "word" string is right aligned
' Look for the disappearing "0" between the "F" and "r" in the hex output

'byte    0


byte    0
byte    2
byte    "F"
word    "red Bloggs was 'ere"
byte    0
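The padding can be reproduced with a small Python model of how the DAT lines are laid out. This is a simplified sketch inferred from the behaviour reported in this thread, not the Propeller Tool's actual implementation. It assumes the DAT block starts on a word-aligned address (start=0), that `word` and `long` data are zero-padded up to their natural alignment, and that a size override such as `byte word` applies only to the first element of a string. The names `assemble`, `hexdump` and `SIZES` are hypothetical.

```python
# Simplified model of Spin DAT layout, inferred from this thread.
# Not the Propeller Tool's real implementation; assumptions are noted below.
SIZES = {"byte": 1, "word": 2, "long": 4}


def assemble(lines, start=0):
    """Return the bytes a list of DAT lines would occupy, starting at `start`.

    Each line is (directive, values). A value is an int, a str (one element
    per character), or (override, value), where the override size applies
    only to the first element, as reported for `byte word "Fred"`.
    Assumption: only the directive's own size triggers alignment padding.
    """
    out = bytearray()
    addr = start
    for directive, values in lines:
        size = SIZES[directive]
        # Zero-pad up to the directive's natural alignment.
        while addr % size:
            out.append(0)
            addr += 1
        for value in values:
            first = size
            if isinstance(value, tuple):
                override, value = value
                first = SIZES[override]
            if isinstance(value, str):
                elements = [ord(c) for c in value]
            else:
                elements = [value]
            for i, element in enumerate(elements):
                n = first if i == 0 else size
                out += element.to_bytes(n, "little")  # little-endian, like the Propeller
                addr += n
    return bytes(out)


def hexdump(data):
    return " ".join(f"{b:02X}" for b in data)


# The example above, with the string shortened to "red".
body = [("byte", [2]), ("byte", ["F"]), ("word", ["red"]), ("byte", [0])]
print(hexdump(assemble([("byte", [0])] + body)))
# 00 02 46 00 72 00 65 00 64 00 00   (extra byte commented out: pad before "r")
print(hexdump(assemble([("byte", [0]), ("byte", [0])] + body)))
# 00 00 02 46 72 00 65 00 64 00 00   (extra byte uncommented: no pad needed)
```

Under these assumptions the "disappearing 0" is the alignment pad for the `word` line: it appears only when the string would otherwise start on an odd address.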

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!

Comments

  • Ale Posts: 2,363
    edited 2008-06-14 07:00
    I did something similar, but the DAT with the code comes afterward. The org directive will align the code/data for that section, if I'm not mistaken.
    If you put in some unaligned code... it remains unaligned; you have to use org.
  • Timmoore Posts: 1,031
    edited 2008-06-14 07:05
    What are you trying to do? If I do
    word "Fred"
    I get 46 00 72 00 65 00 64 00
    which is expected, since the Propeller is little-endian. If you want a string
    00 46 00 72 ..
    then
    word 0
    byte 0
    byte word "F"
    byte word "r"
    byte word "e"
    byte word "d"
    works: the first word 0 forces word alignment, then the string comes out correctly.
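    The sequence above can be checked by hand with Python's `struct` module, using the characters of "Fred": `word 0` lands on a word boundary, `byte 0` leaves the address odd, and each `byte word` then stores a two-byte little-endian value. This sketch assumes, as reported later in the thread, that `byte word` never inserts padding and that the DAT block begins word-aligned.

```python
import struct

# word 0: starts on a word boundary, so no padding is needed.
data = struct.pack("<H", 0)
# byte 0: one byte, leaving the address odd.
data += struct.pack("<B", 0)
# byte word "F", byte word "r", ...: each character as a little-endian
# 16-bit value with no alignment padding (as reported for `byte word`).
for ch in "Fred":
    data += struct.pack("<H", ord(ch))

print(data.hex(" "))  # 00 00 00 46 00 72 00 65 00 64 00
# The characters themselves are ordinary UTF-16LE starting at offset 3.
print(data[3:] == "Fred".encode("utf-16-le"))  # True
```

    Reading from the `byte 0` onward gives the 00 46 00 72 ... sequence, while the characters themselves form a normal UTF-16LE string at an odd offset.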

    Post Edited (Timmoore) : 6/14/2008 7:12:58 AM GMT
  • BradC Posts: 2,601
    edited 2008-06-14 07:30
    Timmoore said...
    What are you trying to do? If I do
    word "Fred"
    I get 46 00 72 00 65 00 64 00
    which is expected, since the Propeller is little-endian. If you want a string

    Which is what I'm trying to do..
    In my case
    byte 01
    byte 02
    byte 03
    word "Fred"

    gives me 01 02 03 00 46 00 72 00 65 00 64 00
    (note the 00 right after the 03)
    Where did that extra 0 come from?

    Is the compiler padding the start of the "word" definition to align it?

  • BradC Posts: 2,601
    edited 2008-06-14 07:34
    OK, I RTFM..

    I need to be using
    byte 01
    byte 02
    byte 03
    byte word "Fred"

    <edit>
    Nope, that did not work either... back to the drawing board.

  • BradC Posts: 2,601
    edited 2008-06-14 07:40
    When using
    byte word "Fred"

    It stores the 1st character ("F") as a word, then reverts to byte for the rest of the string.
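    The reported bytes for `byte word "Fred"` match what you get if the `word` override applies only to the first character. A Python sketch of that observed result next to the all-words layout that was expected (for ASCII text, the all-words layout is simply UTF-16LE):

```python
import struct

text = "Fred"

# Observed: the override applies to "F" only; "red" stays byte-sized.
observed = struct.pack("<H", ord(text[0])) + text[1:].encode("ascii")

# Expected: every character stored as a little-endian 16-bit word.
expected = b"".join(struct.pack("<H", ord(ch)) for ch in text)

print(observed.hex(" "))  # 46 00 72 65 64
print(expected.hex(" "))  # 46 00 72 00 65 00 64 00
```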

  • Timmoore Posts: 1,031
    edited 2008-06-14 07:55
    I found the same thing. The simplest form I found that does what you want is
    byte word "F", word "r", word "e", word "d"
  • BradC Posts: 2,601
    edited 2008-06-14 07:55
    Ok, so I can work around the issue by forcing alignment using
    word 0
    byte 0

    prior to any string definitions I do..
    I wonder if the behaviour of

    byte word "Fred"

    generating 46 00 72 65 64 is correct, though.
    I would have thought 46 00 72 00 65 00 64 00 should have been the result.

    I'd welcome input from Parallax on the problem.

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-06-14 07:56
    BradC said...
    Where did that extra 0 come from? Is the compiler padding the start of the "word" definition to align it?
    Yes:

      01: byte 01
      02: byte 02
      03: byte 03
      00: '<padding>
    0046: word "F"
    0072: word "r"
    0065: word "e"
    0064: word "d"

    BradC said...
    It stores the 1st character ("F") as a word, then reverts to byte for the rest of the string.
    No. Again, it's adding padding to word-align the first byte, "F", which is what you've requested it to do with the byte word pragma.

    Don't forget, the Prop uses little endian order, which means the least significant byte of a word or long comes first in memory.

    -Phil

    Post Edited (Phil Pilgrim (PhiPi)) : 6/14/2008 8:03:20 AM GMT
  • BradC Posts: 2,601
    edited 2008-06-14 07:57
    Timmoore said...
    I found the same thing. The simplest form I found that does what you want is
    byte word "F", word "r", word "e", word "d"

    I was using
    byte "F",0,"r",0,"e",0,"d",0
    previously, but thought there must be a better way. It's a pain to edit 40-character strings when they are broken up like that.
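    One way to keep long strings editable while still emitting the per-character form is to generate the DAT line from plain text with a small script and paste the result into the DAT block. A sketch in Python; the function `utf16_dat_line` is hypothetical, it assumes non-empty ASCII text without double quotes, and it emits the same `byte "F",0,"r",0,...` form quoted above:

```python
def utf16_dat_line(text, label=""):
    """Build a Spin DAT line that stores `text` as UTF-16LE byte pairs.

    Hypothetical helper. Assumes non-empty ASCII text without double quotes,
    and emits the `byte "F",0,"r",0,...` form used in the thread.
    """
    if not text or not text.isascii() or '"' in text:
        raise ValueError("expected non-empty ASCII text without double quotes")
    pairs = ",".join(f'"{ch}",0' for ch in text)
    return f"{label} byte {pairs}".strip()


print(utf16_dat_line("Fred"))  # byte "F",0,"r",0,"e",0,"d",0
print(utf16_dat_line("Fred Bloggs", label="Name"))
```

    The string is then edited as ordinary text and the line regenerated, rather than editing the character/zero pairs by hand.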

    Thanks for the input. Appreciated.
    For now I'll just lose the 3 bytes forcing alignment with word/byte. It's not like I'm pushed for space.

  • Timmoore Posts: 1,031
    edited 2008-06-14 07:57
    If I remember UTF-16 correctly, you will also need an extra byte 0 on the end to get two zero bytes terminating the string.
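    In concrete bytes: a NUL-terminated UTF-16 string ends in a full 16-bit zero, i.e. two zero bytes, so a single trailing zero byte is not enough. A Python sketch (whether the receiving end expects a terminator at all depends on the protocol, which the thread does not specify):

```python
text = "Fred"

# UTF-16LE code units followed by a 16-bit NUL terminator (two zero bytes).
encoded = text.encode("utf-16-le") + b"\x00\x00"

print(encoded.hex(" "))  # 46 00 72 00 65 00 64 00 00 00
print(len(encoded))      # 10
```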
  • BradC Posts: 2,601
    edited 2008-06-14 08:02
    Phil Pilgrim (PhiPi) said...
    BradC said...
    It stores the 1st character ("F") as a word, then reverts to byte for the rest of the string.
    No. Again, it's adding padding to word-align the first byte, "F", which is what you've requested it to do with the byte word pragma.

    -Phil

    Ahh, but it's not padding anything. It does precisely what I want for the 1st character, in that it stores it as a word; it then stores all the following characters byte-packed behind the word. I can play with the alignment any way I want and it still stores the structure precisely the same way. So it's not aligning _anything_.

    byte 1
    byte 2
    byte 3
    byte word "Fred"

    Whether the structure is odd or even aligned to begin with, it _always_ comes out as 01 02 03 46 00 72 65 64
    If it were aligning or padding the result would change depending on the initial alignment.

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-06-14 08:26
    My mistake. I thought you meant the 00 came before the 46, so I had to go back to the manual myself. Apparently it's being interpreted as:

        byte word "F", "red"

    This is the same as:

        byte word "F"
        byte "red"

    I'm not sure that makes sense, though. Since "Fred" is a single syntactic element, treating all characters the same would seem more logical.

    But why not use your original?

        word "Fred"

    It does what you want without the awkward notation. You just lose a byte for the padding. To minimize losses such as this, declare all your longs first, then the words, and lastly the bytes. That way, there's no wasted space.
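    The ordering advice can be checked with a short calculation. This sketch counts the zero padding that natural alignment would insert for a given declaration order; it assumes each item must start at a multiple of its own size (1, 2 or 4 bytes) and that the block begins long-aligned. The function `padding_for` and the example item list are illustrative only.

```python
def padding_for(sizes, start=0):
    """Total pad bytes if each item is aligned to a multiple of its own size."""
    addr, pad = start, 0
    for size in sizes:
        gap = -addr % size  # bytes needed to reach the next multiple of `size`
        pad += gap
        addr += gap + size
    return pad


mixed = [1, 2, 1, 4, 1, 2]             # byte, word, byte, long, byte, word
ordered = sorted(mixed, reverse=True)  # longs first, then words, then bytes

print(padding_for(mixed))    # 5
print(padding_for(ordered))  # 0
```

    Sorting by size only helps when the order of the fields doesn't matter; a streamed structure with a fixed field order still needs explicit padding or a different encoding.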

    -Phil
  • BradC Posts: 2,601
    edited 2008-06-14 08:37
    Phil Pilgrim (PhiPi) said...

    This is the same as:

        byte word "F"
        byte "red"

    I'm not sure that makes sense, though. Since "Fred" is a single syntactic element, treating all characters the same would seem more logical.

    But why not use your original?

        word "Fred"

    It does what you want without the awkward notation. You just lose a byte for the padding. To minimize losses such as this, declare all your longs first, then the words, and lastly the bytes. That way, there's no wasted space.

    -Phil

    The main reason is that it's being packed into a structure that is being streamed. So I can't just re-sort the data elements, and extraneous zeros inserted in the wrong place tend to confuse the parser at the other end ;)

    As each structure only has one "word" string in it at the moment, I've shifted the alignment of the entire block so the structure is not unexpectedly padded, but it's not what I'd call an optimal solution.

    It'll be interesting to get the authoritative Parallax take on it anyway.
    Cheers for the follow-up

  • Rayman Posts: 14,817
    edited 2008-06-14 11:03
    I've had a lot of trouble with this too... This is the same as trying to create a mixed-type structure... It just doesn't work the way you might hope/expect. I was trying to create a contiguous header for BMP files with different-sized elements, but it didn't work out...
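    For comparison, here is the 14-byte BMP file header packed in Python with `struct`. The `<` prefix means little-endian with standard sizes and no alignment padding between fields, which is the contiguous layout that is awkward to get from mixed byte/word/long DAT lines. The field values are made-up examples.

```python
import struct

# BITMAPFILEHEADER: "BM", file size, two reserved words, pixel-data offset.
# "<" = little-endian, standard sizes, no padding between fields.
BMP_FILE_HEADER = "<2sIHHI"

file_size = 70     # example value
pixel_offset = 54  # 14-byte file header + 40-byte BITMAPINFOHEADER

header = struct.pack(BMP_FILE_HEADER, b"BM", file_size, 0, 0, pixel_offset)
print(len(header))      # 14
print(header.hex(" "))  # 42 4d 46 00 00 00 00 00 00 00 36 00 00 00
```

    With native alignment (`@` instead of `<`), the 4-byte size field would typically be padded to a 4-byte boundary on common platforms, the same kind of surprise seen with `word` in a DAT block.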
  • hippy Posts: 1,981
    edited 2008-06-14 11:57
    I'm guessing these are the annoying USB description strings.

    I know it's not elegant and means unwanted extra coding, but what about a standard byte string followed by padding, with the bytes expanded to words at run-time start-up? Something like...

    MyDesc byte "jim"
           byte 0[$-MyDesc]

    That might not work because $ and labels have to be long aligned if I recall correctly.

    Given that the strings aren't likely to change often, it may be easiest to use the annoying "f",0,"r",0,"e",0,"d",0. I agree, though, that it would be nice if there were some directive to do this ('text byte', 'text word', and 'text long'), or if all characters of a string used the same sizing as the first (though changing that might break some existing code).
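    The start-up expansion idea can be simulated in Python on a `bytearray`: the ASCII text is followed by the same number of zero bytes, and each byte is widened to a little-endian word in place. Working from the last character backwards means no character is overwritten before it has been read. On the Propeller this loop would be written in Spin; the function name `expand_in_place` is hypothetical.

```python
def expand_in_place(buf, length):
    """Widen buf[0:length] (ASCII) into UTF-16LE words occupying buf[0:2*length].

    Assumes buf has at least 2*length bytes (the text plus equal padding).
    Copies from the end so unread characters are never overwritten.
    """
    if len(buf) < 2 * length:
        raise ValueError("buffer too small for expansion")
    for i in range(length - 1, -1, -1):
        ch = buf[i]
        buf[2 * i] = ch     # low byte: the character
        buf[2 * i + 1] = 0  # high byte: zero
    return buf


desc = bytearray(b"jim" + bytes(3))  # like: MyDesc byte "jim" / byte 0[$-MyDesc]
expand_in_place(desc, 3)
print(desc.hex(" "))  # 6a 00 69 00 6d 00
```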