Aligned characters in the DAT section: some notable oddities

Phil Pilgrim (PhiPi) · 2008-01-28 19:05

Monday musings...

In a program's DAT section, if I enter

        word long "abc", "def"

the assembler produces: "a", 0, 0, 0, "b", 0, "c", 0, "d", 0, "e", 0, "f", 0. I find it odd that the default alignment gets applied to each character rather than each group of characters, and odder still that the size modifier gets applied only to the first character of a group. It appears that the notational cohesion apparent in "abc" extends no further than the source code.
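The layout described above can be reproduced with a small model. This is only a sketch in Python of the behavior reported in this post, not of the assembler itself; the function name `emit` and its arguments are made up for illustration, and it assumes the data starts on an aligned address so that no padding is involved.

```python
def emit(default_size, items):
    """Model the byte layout reported for one DAT line.

    default_size: width in bytes of the line's leading size (2 for word).
    items: list of (override_size_or_None, text) pairs.
    Assumption: the data starts on an aligned address, so no padding
    is modeled here.
    """
    out = bytearray()
    for override, text in items:
        for i, ch in enumerate(text):
            # Only the first character of an item takes the override width;
            # every other character falls back to the default width.
            width = override if (override and i == 0) else default_size
            out += ord(ch).to_bytes(width, "little")
    return bytes(out)

# Model of:  word long "abc", "def"
observed = emit(2, [(4, "abc"), (None, "def")])
print(list(observed))
```

If strings were treated as cohesive groups, one would expect the override to cover all of "abc"; the model shows that only "a" is widened to four bytes.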

I wonder if this was intended, as it leads to a possible problem. Suppose I want a sequence of characters to begin in memory on a long boundary. I try this:

        long byte "abcd"

The assembler flags the byte modifier with an error: "Size override must be larger." Hmm. Catch-22. I could certainly throw in a long 0 ahead of my string to align it, but that's awkward and wasteful. I could also make sure to declare my string right after some other long declaration. But what if I had a whole bunch of strings whose first characters I want aligned on long boundaries? Must I then count the characters and pad them out by hand?

The solution, as I've discovered, is to place an empty long ahead of each string, thus:

        long
        byte "abc"
        long
        byte "efg"

A bit awkward, but it works. Nonetheless, wouldn't the following notation have been more concise (but for the first behavior noted above)?

        long byte "abc", "efg"

Granted, treating strings cohesively would have required a way to include non-ASCII bytes within a group (which is what escape sequences are for). The string() notation might also have worked, but it generates an extra character and doesn't seem to be allowed in the DAT section anyway...
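To make the effect of the alternating long / byte form concrete, here is a rough Python model of where each string ends up. The helper name `place_on_long_boundaries` is hypothetical, and two things are assumed rather than taken from the thread: the block starts at offset 0, and the alignment gaps are filled with zero bytes.

```python
def place_on_long_boundaries(strings):
    """Return (offsets, image) with each string starting at a multiple of 4.

    Assumptions: the block begins at offset 0, and alignment gaps are
    filled with zero bytes (the thread does not say what the assembler
    actually writes there).
    """
    image = bytearray()
    offsets = []
    for text in strings:
        # Equivalent of the bare "long": pad up to the next long boundary.
        image += bytes(-len(image) % 4)
        offsets.append(len(image))
        # Equivalent of: byte "..."
        image += text.encode("ascii")
    return offsets, bytes(image)

offsets, image = place_on_long_boundaries(["abc", "efg"])
print(offsets, image)
```

The point of the model is that no character counting is needed: the bare long does the padding, whatever the length of the preceding string.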

-Phil

Mike Green · 2008-01-28 19:59

The "string()" notation wouldn't help because its value is the address of the string, not the string itself.

You just have to explicitly pack the characters as in:
        long "a" | "b"<<8 | "c"<<16 | "d"<<24, "e" | "f"<<8 | 0<<16

This is inconvenient, but gets the job done.
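The packing can also be done outside the Propeller tool. The following is a minimal Python sketch of a generator for such a line; `packed_long_line` is a hypothetical helper, not part of any Parallax tool, and it writes each long as a hexadecimal constant (using Spin's "$" prefix) instead of the shifted character expressions shown above.

```python
def packed_long_line(text, terminate=True):
    """Build a DAT 'long' line holding text packed four characters per long.

    Characters go in low byte first, matching shifts of 0, 8, 16 and 24.
    terminate=True appends a zero byte after the last character.
    """
    data = text.encode("ascii") + (b"\x00" if terminate else b"")
    data += bytes(-len(data) % 4)          # zero-fill the last long
    longs = [int.from_bytes(data[i:i + 4], "little")
             for i in range(0, len(data), 4)]
    # "$" is Spin's hexadecimal prefix.
    return "long " + ", ".join(f"${value:08X}" for value in longs)

line = packed_long_line("abcdef")
print(line)
```

The generated line would be pasted into the DAT section by hand, so this is a workaround for the missing notation rather than a replacement for it.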

I believe the current behavior is due to a string being treated as a list of character values, so "abc" is the same as "a","b","c" and the rest of the behavior follows from that.

deSilva · 2008-01-28 20:45

Confusing as it seems (and contrary to the manual!), the FIRST item is the "size and default alignment" (and the SECOND the "alignment override"). At least this is what can be seen in the dump.

The error message is confusing and most likely wrong.

As to the philosophy of "string denotations", especially in the context of LOOKUP, LOOKDOWN, and CASE, there is an extensive discussion in another thread, based mainly on Hippy's postings. It arrives at the same point Mike made above.

Edit. I noticed that the deleted statements above are just nonsense.

Post Edited (deSilva) : 1/29/2008 1:41:02 AM GMT

Phil Pilgrim (PhiPi) · 2008-01-28 22:20

I do remember that discussion, which, as I recall, was in the context of Spin. I don't know that this is a huge deal. The longish alternating form (third example in my post) works, but is wanting in terms of convenient notation. I s'pose it's just one more reason to yearn for preprocessor hooks, so all these little annoyances can be patched up and prettified to suit individual tastes.

-Phil
