[Solved?] Inline assembly: Immediate instructions broken?

jac_goudsmit · 2013-03-02 07:38

Dave asked in a different forum why I was working on Binutils. Here's why.

Here's an easy test:

/* Compile in COG mode */
int main(int argc, char **argv) {
   __asm__ __volatile__ (
   "x:"
   "     mov r8, x;"
   "     mov r8, #x;"
   "     jmp x;"
   "     jmp #x;"
   :
   :
   : "r8"
   );
 }

Obviously this won't produce a program that does anything useful. But when you compile this and then run objdump -D a.out, you get:

...
00000058 <x>:
  58:   1610bca0                        mov     20 <r8>, 58 <x>
  5c:   5810fca0                        mov     20 <r8>, #88        <------ WRONG: I expected to see 1610fca0
  60:   16003c5c                        jmp     58 <x>
  64:   16007c5c                        jmp     #58 <x>
...

If I have to believe the output of objdump, it botches up the source parameter of any instruction (except JMP or CALL) that uses immediate mode to get the address (not the contents) of a cog memory variable.
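To double-check what objdump is showing, here is a small decoder for the 32-bit Propeller 1 instruction word. It is a sketch that assumes the standard P1 field layout (opcode in bits 31..26, Z/C/R/I flags in bits 25..22, condition in 21..18, destination in 17..9, source in 8..0) and that objdump prints each long as its four bytes in memory (little-endian) order. The struct and function names are made up for this illustration.

```c
#include <stdint.h>

/* Fields of one Propeller 1 instruction long (assumed standard layout). */
typedef struct {
    unsigned opcode, z, c, r, imm, cond, dest, src;
} p1_fields;

/* objdump shows the bytes in memory order, so "5810fca0" is 0xa0fc1058. */
static uint32_t long_from_dump(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3) {
    return (uint32_t)b0 | ((uint32_t)b1 << 8) |
           ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
}

/* Split an instruction long into its fields. */
static p1_fields p1_decode(uint32_t ins) {
    p1_fields f;
    f.opcode = (ins >> 26) & 0x3F;   /* 0x28 = MOV, 0x17 = JMP/JMPRET  */
    f.z      = (ins >> 25) & 1;      /* write Z flag                   */
    f.c      = (ins >> 24) & 1;      /* write C flag                   */
    f.r      = (ins >> 23) & 1;      /* write result                   */
    f.imm    = (ins >> 22) & 1;      /* 1 = source field is immediate  */
    f.cond   = (ins >> 18) & 0xF;    /* 0xF = always execute           */
    f.dest   = (ins >> 9) & 0x1FF;   /* COG register index             */
    f.src    = ins & 0x1FF;          /* COG register index or literal  */
    return f;
}
```

Decoding 5810fca0 this way gives I = 1, destination 8 (r8) and source 0x58, the byte address of x. The expected 1610fca0 would carry 0x58 / 4 = 0x16 in the source field, which is exactly what both JMP encodings already contain.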

This happens regardless of whether I put a ".cog_ram" directive at the top (which is a silly directive in the first place but that's a whole different matter; I'll write a post on that next).

Have you posted an issue for it on Google Code? I hadn't heard about this until you posted this message. I also didn't realize you were working on binutils. We should probably coordinate our efforts.

I'm not sure yet whether the loader somehow fixes the problem, and it seems hard to believe that GCC can work at all with this; that's why I haven't filed an issue yet. I do think I brought it up here before.

===Jac

UPDATE: Apparently this problem is now solved (currently in the p2test branch only?): the .cog_ram directive was replaced with a .pasm directive, and an @ operator was added. Using cog addresses in constants (e.g. in a LONG directive) now also works, and disassembly with objdump is more readable.

ersmith · 2013-03-02 08:56

It's more likely that objdump is broken than that the assembler is. Even more likely is that there's a misunderstanding, and I think that's the case here. The value of "x" is 0x58 (you can see that in the listing). So "mov r8, #x" is going to put 0x58 into r8. That seems correct.

The difficulty is determining what the value of "x" should be. In code like:

x    long 0
y    long 0

what are the values of "x", "y", and "y-x"? If they are all labels in HUB RAM, it's clear that "y-x" must be 4; gcc won't work unless it is. And in general binutils is consistent about this; all labels are byte addresses, and so "y" is always "x+4" in the case above, even when the labels are in COG memory (because in most cases we don't know whether or not they are going in COG memory until the final link step).

This is different from PASM. In PASM, y is always "x+1", because the only unit of addressing PASM supports is the 32 bit word (gas is built to support 8 bit byte addressing). This can cause a great deal of confusion when porting code from PASM to gas.

The ".cog_ram" directive tells gas to divide addresses by 4 before use, which helps a lot. But there's no magic bullet to this issue. If you always divide addresses by 4, you'll get things wrong for HUB addresses (or for symbolic constants).
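The arithmetic behind this trade-off is easy to sketch. The helper below is a hypothetical "divide everything by 4" rule, not the actual .cog_ram implementation, and it assumes the COG image starts at byte 0 of its section.

```c
#include <stdint.h>

/* Hypothetical blanket rule: treat every value as a COG byte address and
 * turn it into a long (register) index by dividing by 4. */
static uint32_t blanket_div4(uint32_t value) {
    return value / 4;
}
```

For two labels declared one long apart, gas gives y - x == 4, while their COG indices differ by 1 (the PASM view). The same rule also turns a symbolic constant of 10 into 2, which is why a blanket divide cannot be applied to every operand.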

Eric

jac_goudsmit · 2013-03-03 18:42

ersmith wrote: »

It's more likely that objdump is broken than that the assembler is. Even more likely is that there's a misunderstanding, and I think that's the case here. The value of "x" is 0x58 (you can see that in the listing). So "mov r8, #x" is going to put 0x58 into r8. That seems correct.

Unfortunately, it seems that objdump is right and the assembler is broken. I added "LONG $A7A7A7A7" to the start to make the code easier to find in the binary output, and I used propeller-loader -s to generate a binary file. This is the result:

As you can see, the MOV instruction that uses the immediate value gets it wrong: it inserts 0x53 instead of 0x17.

The difficulty is determining what the value of "x" should be. In code like:
x    long 0
y    long 0
what are the values of "x", "y", and "y-x"? If they are all labels in HUB RAM, it's clear that "y-x" must be 4; gcc won't work unless it is. And in general binutils is consistent about this; all labels are byte addresses, and so "y" is always "x+4" in the case above, even when the labels are in COG memory (because in most cases we don't know whether or not they are going in COG memory until the final link step).

I understand that.

But also, when two modules have to end up in the same cog, the linker adds the length of the first module (in bytes, no doubt) to the addresses in the second module. That's wrong too.

Really, the addresses shouldn't be divided by four until the link stage, or maybe even the load stage.

If you always divide addresses by 4, you'll get things wrong for HUB addresses (or for symbolic constants).

Symbolic constants that are not related to a stored value (e.g. created with .set) should not be changed at all.

Symbols (with addresses) that are used in a regular instruction such as MOV or JMP (regardless of whether immediate mode is used) should always be encoded as a cog address, because they always refer to a cog location. The Propeller will interpret the instruction as having cog locations, so there's no ambiguity.

Only for RDBYTE/RDWORD/RDLONG/WRBYTE/WRWORD/WRLONG in immediate mode should a hub address be encoded in the source field. In non-immediate mode, they refer to a cog location, so the source should be encoded as a cog address. No ambiguity there either.

ONLY WHEN GENERATING DATA THAT REFERS TO ANOTHER ADDRESS (such as "LONG buffer"), there is ambiguity, and there is currently no way to force the address to be either a cog address or hub address, so I propose that in that case we introduce a new operator @ which would force the assembler to use a hub address instead of a cog address.
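The rules above can be written down as a small decision function. This is a hypothetical sketch of the proposal, not existing assembler code: the enum names and the hub_prefix flag (standing in for the proposed @ operator) are invented, and it ignores relocation by treating the assembler's byte value as the offset from the start of the COG image.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operand contexts for the proposed rules. */
typedef enum {
    CTX_COG_INSTR,     /* MOV, ADD, MOVS, JMP, ... (immediate or not)  */
    CTX_HUB_RDWR_IMM,  /* RDxxxx/WRxxxx with an immediate source       */
    CTX_DATA           /* LONG/WORD data that refers to another address */
} operand_ctx;

/* value:      the assembler's byte value of the expression
 * is_address: false for plain constants (e.g. defined with .set)
 * hub_prefix: true if the proposed '@' operator was used            */
static uint32_t encode_operand(uint32_t value, bool is_address,
                               operand_ctx ctx, bool hub_prefix) {
    if (!is_address)
        return value;            /* constants are never rescaled        */
    if (ctx == CTX_HUB_RDWR_IMM)
        return value;            /* hub access: keep the byte address   */
    if (ctx == CTX_DATA && hub_prefix)
        return value;            /* "@label" forces a hub address       */
    return value / 4;            /* everything else is a COG index      */
}
```

With x at byte 0x58, "mov r8, #x" and "long x" would both encode 0x16, "rdlong r8, #x" and "long @x" would keep 0x58, and a .set constant of 123 would stay 123.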

I think it's also clear that cog addresses aren't necessarily the same as hub addresses divided by four: just like the Spin assembler, the GNU assembler should keep track of cog addresses and hub addresses separately, and it should be possible to use the ORG directive to change the assumed cog address for the next instruction (and obviously ORG should not change the hub address).
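A minimal sketch of the two separate location counters described here, assuming one counter per address space; the struct and function names are hypothetical and not part of gas.

```c
#include <stdint.h>

/* Hypothetical pair of location counters, one per address space. */
typedef struct {
    uint32_t hub;  /* byte address where the next long lands in hub RAM  */
    uint32_t cog;  /* register index the next long will occupy in COG RAM */
} loc_counter;

/* Every emitted long advances both counters, each in its own unit. */
static void emit_long(loc_counter *lc) {
    lc->hub += 4;
    lc->cog += 1;
}

/* ORG only moves the assumed COG address; the hub address is untouched. */
static void org(loc_counter *lc, uint32_t cog_addr) {
    lc->cog = cog_addr;
}
```

Starting at hub 0x100 and COG 0, three longs leave the counters at 0x10C and 3; an ORG $1E0 then changes only the COG counter, so the next long goes to hub 0x10C but is assumed to run at COG address 0x1E0.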

But we can talk about the .cog_ram vs .hub_ram operation and the @ operator later. And we can declare having multiple modules in one cog as unsupported for now.

But I'd really like you to agree that assembly of immediate mode instructions (except JMP) is broken.

===Jac

ersmith · 2013-03-04 09:26

jac_goudsmit wrote: »

But I'd really like you to agree that assembly of immediate mode instructions (except JMP) is broken.

I don't know what the source code is for your most recent example, so I don't know why it's putting 0x53 into the register. But in your earlier example, the code

   mov r8, #x

is placing the value 0x58 into r8, and the value of the symbol "x" is 0x58. I disagree that this is wrong.

There is a legitimate argument that the value of the symbol "x" should, in fact, be 0x16 (an offset in longs rather than in bytes). But at the moment the tools use byte addresses everywhere, except where overridden with .cog_ram, and given that "x" is 0x58, I think that "mov r8, #x" should put 0x58 into r8. Doing otherwise is likely to result in problems elsewhere in the tool chain.

Eric

jac_goudsmit · 2013-03-04 22:13

Fair enough, here's that source code again.

int main(int argc, char **argv) {
   __asm__ __volatile__ (
   ".cog_ram;"
   "     long $A7A7A7A7;"
   "x:"
   "     mov r8, x;"
   "     mov r8, #x;"
   "     jmp x;"
   "     jmp #x;"
   :
   :
   : "r8"
   );
 }

I compiled this in cog mode in SimpleIDE. Even though there is a .cog_ram directive in place, it doesn't convert the address.

The same thing happens with a number of different instructions in my real program, such as MOVS and MOVD. All destinations are apparently calculated correctly, and so are sources, as long as I don't use a cog address as source in immediate mode.

Source addresses in immediate mode are only calculated correctly when they're used in a JMP or CALL.
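Why this matters for self-modifying code: MOVS and MOVD copy the low 9 bits of their source into the source or destination field of another instruction. The sketch below models that on plain integers (P1 semantics; the function names are made up) and patches the "mov r8, x" long from the first listing (1610bca0, i.e. 0xa0bc1016).

```c
#include <stdint.h>

/* MOVS: replace bits 8..0 (source field) of dest with the low 9 bits of value. */
static uint32_t p1_movs(uint32_t dest, uint32_t value) {
    return (dest & ~(uint32_t)0x1FF) | (value & 0x1FF);
}

/* MOVD: replace bits 17..9 (destination field) of dest with the low 9 bits. */
static uint32_t p1_movd(uint32_t dest, uint32_t value) {
    return (dest & ~((uint32_t)0x1FF << 9)) | ((value & 0x1FF) << 9);
}
```

If "#x" assembles to 0x58 instead of 0x16, the patched instruction reads COG register 0x58 (byte address 0x160) instead of register 0x16, so the self-modified code silently touches the wrong long.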

After a lot of browsing through the GAS sources, it looks to me like the problem starts here, in parse_src( ) in tc-propeller.c, where the source parameter is parsed:

      if (format != PROPELLER_OPERAND_JMP && format != PROPELLER_OPERAND_JMPRET && format != PROPELLER_OPERAND_MOVA)
        {
          integer_reloc = 1;
        }

The integer_reloc flag is used further down in the same function to set the reloc.type of the fixup to either BFD_RELOC_PROPELLER_SRC_IMM or BFD_RELOC_PROPELLER_SRC:

    case O_symbol:
    case O_add:
    case O_subtract:
      if (pcrel_reloc)
        {
          operand->reloc.type = compress ? BFD_RELOC_8_PCREL : BFD_RELOC_PROPELLER_PCREL10;
          operand->reloc.pc_rel = 1;
        }
      else
        {
          operand->reloc.type = integer_reloc ? BFD_RELOC_PROPELLER_SRC_IMM : BFD_RELOC_PROPELLER_SRC;
          operand->reloc.pc_rel = 0;
        }
      break;

Then in md_apply_fix (not shown here), the fixup is shifted right by 2 if the reloc.type is BFD_RELOC_PROPELLER_SRC, but not if it's BFD_RELOC_PROPELLER_SRC_IMM.

At first glance this makes sense: mov x, #123 should obviously not be shifted to the right and mov x, location should be shifted to the right.

But mov x, #location DOES need to be shifted even though it's immediate.

The fix seems to be to change the reloc.type assignment in the else-section of the last code fragment to simply:

operand->reloc.type = BFD_RELOC_PROPELLER_SRC;

This makes it so that the source parameter basically is calculated the same as for a JMP instruction, which is what we're after.
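The current and proposed behavior can be compared with a simplified model. This is a hypothetical sketch, not the real tc-propeller.c code: the names only mirror the post, it assumes the integer_reloc check applies only to '#' operands, and it covers only symbol expressions (the O_symbol/O_add/O_subtract path).

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { FMT_JMP, FMT_JMPRET, FMT_MOVA, FMT_OTHER } src_format;
typedef enum { RELOC_SRC, RELOC_SRC_IMM } src_reloc;

/* Current rule: an immediate symbol gets SRC_IMM unless the instruction
 * is JMP, JMPRET or MOVA. */
static src_reloc reloc_current(src_format fmt, bool immediate) {
    bool integer_reloc = immediate && fmt != FMT_JMP &&
                         fmt != FMT_JMPRET && fmt != FMT_MOVA;
    return integer_reloc ? RELOC_SRC_IMM : RELOC_SRC;
}

/* Proposed rule: symbol operands always get the plain SRC reloc. */
static src_reloc reloc_proposed(src_format fmt, bool immediate) {
    (void)fmt;
    (void)immediate;
    return RELOC_SRC;
}

/* md_apply_fix as described: SRC fixups are shifted right by 2,
 * SRC_IMM fixups are used unchanged. */
static uint32_t apply_fix(src_reloc r, uint32_t value) {
    return r == RELOC_SRC ? value >> 2 : value;
}
```

Under the current rule, "mov r8, #x" with x = 0x58 encodes 0x58 while "jmp #x" encodes 0x16; under the proposed rule both encode 0x16. The side effect to watch for is any symbol that stands for a plain number rather than a COG location: if it reaches this path, the proposed rule shifts it too (0x40 would become 0x10). Whether .set constants can reach this path depends on how gas resolves them, which this sketch does not model.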

I'm thinking there must have been an important reason why the code was written this way, and of course I don't know its history. So I'm worried that the fix is going to introduce some nasty side effects, and that's basically what I was still digging for.

Any help would be appreciated; my project, which has lots of self-modifying code, is basically stuck until this works.

===Jac

ersmith · 2013-03-05 10:47

I don't think there's any good solution to this problem. In general

  mov r0, #x

should put the value of x into r0, not the value of x/4 -- otherwise symbolic constants, and HUB addresses won't work. Your proposal to prefix HUB addresses with @ would fix the latter issue, but I think it would require a lot of changes to the existing compiler, tools, and libraries.

We have an alternate form of mov, "mova" (for "MOVe cog Address"), which does divide immediate operands by 4. It might also make sense to have MOVD, MOVS, and MOVI act like MOVA/JMP/JMPRET and divide immediates by 4. The only real downside of this would be breaking programs which use those instructions to manipulate integers rather than addresses, but there are probably few of those (and at the moment the compiler does not use those instructions).

Eric

jac_goudsmit · 2013-03-05 16:25

ersmith wrote: »
I don't think there's any good solution to this problem. In general
  mov r0, #x
should put the value of x into r0, not the value of x/4 -- otherwise symbolic constants, and HUB addresses won't work.

I understand the dilemma: either break compatibility with existing Spin ASM source, or make the code hard to read/write/debug because of the implicit divide-by-4's. Plus there is existing PropGCC code that may depend on how it works now (though I have a hard time imagining how someone would write MOV r0, #location and expect #location to be a cog location: you'd be 2 bits short and the instruction will execute differently from what you expect).

Adding pseudo-instructions like MOVSA, MOVDA and MOVIA would not go far enough either: there really would have to be two alternatives for any instruction that takes an immediate value such as ADD, SUB, CMP...

Either way, the way the compiler does it now (where some instructions such as JMP and JMPRET handle #address differently from others) is inconsistent. And even in situations like MOV r0, #location+1 it's difficult or impossible to figure out what the intention of the code is.

A quick-and-dirty solution would probably be to add a directive (say, .immref_cog) that makes the assembler divide the address by 4 in every subsequent immediate instruction (for all instructions, not just JMP), and another directive (say, .immref_normal) to revert to "normal" behavior. By default, the flag would be set to .immref_normal, which would be okay for the compiler and libraries but will break almost all self-modifying code.

An alternative (that doesn't require a directive and is probably so easy to program that I can do it myself, but still breaks compatibility) is to parse a different symbol to mean "immediate cog location", for example '&'. An instruction such as MOV r0, &location would load the cog address of "location" into r0. It shouldn't cause any parsing conflicts, it won't break the libraries, and its meaning should be pretty obvious to any C/C++ programmer.

Of course the LONG and WORD directives should also accept these operators, and the C/C++ compiler will need a new operand constraint to indicate that values are not just immediate, but immediate and in cog memory.

Your proposal to prefix HUB addresses with @ would fix the latter issue, but I think it would require a lot of changes to the existing compiler, tools, and libraries.

That kinda surprises me. Do the libraries and compiler really use instructions like MOV r0, #x to indicate a hub address? I should try and write some C code that would make the compiler use an immediate instruction and see how it handles that.

We have an alternate form of mov, "mova", (for "MOVe cog Address") which does do the division by 4 on immediate operands. It might also make sense to have MOVD, MOVS, and MOVI act like MOVA/JMP/JMPRET and divide immediates by 4. The only real downside of this would be breaking programs which use those instructions to manipulate integers rather than addresses

As I said, it's probably not hard to add a MOVSA/MOVDA/MOVIA and it won't break backward compatibility with existing PropGCC source (obviously it's already incompatible with Spin ASM). But it would require the addition of a lot of instructions.

===Jac

jac_goudsmit · 2013-03-15 11:34

David Betz wrote: »

I guess I need to go back and read your messages on this topic. I can't remember why the hack of doing #foo>>2 won't work or even using the .cog_ram directive. Does your assembly have to be inline or can you put it all in a separate .S file? Will that even help?

The address that ends up in the code when I do e.g. movs bar, #foo is not simply the actual address of foo multiplied by 4. The .cog_ram instruction helps but still doesn't fix "bar long foo".

I haven't tried using a .S file because I want this code to be part of some C code, and that's simply not possible with a .S file: as far as I understand, it's not possible to compile two modules and link them in such a way that they end up in the same cog (or is it?). I think that because the cog addresses are calculated early (in GAS instead of LD), the relocations aren't going to work: the fixups end up divided by 4 before the (byte-address based) offset is applied at link time.

I'm thinking we really need to implement my proposal of adding '@' and '&' operators, or we need to move all the hub-to-cog address calculations to the linker. I prefer the latter for compatibility with spin-pasm. Simply put: the assembler should emit records in the object file that indicate whether a fixup (source or destination operand) needs to be a cog address or a hub address, and the linker calculates the cog address (if applicable) by counting the number of LONGs since the last ORG. Obviously an ORG 0 is implied at the start of each segment (or whatever they're called in LD speak).

By the way, the ORG directive currently has no Propeller-specific implementation, so it doesn't work the way the spin-asm ORG does. This can be a problem for drivers that want to, e.g., copy themselves to the end of cog memory. But that's a whole different matter, so let's concentrate on the task at hand :-)

To see my code, go to https://github.com/jacgoudsmit/Propeddle/blob/master/Software/p6502control.cogc, the inline assembly code starts at line 312. The first few lines are debug lines that I added to troubleshoot the problem, but the code that actually belongs there and doesn't compile correctly starts at line 345. It uses a table (constructed with LONG directives) to patch code in the main loop and of course it needs to modify a lot of instructions using movs and movd. All of those are assembled wrongly if they have an immediate operand: they should be assembled the same way as jmp #foo but they aren't. Note, I haven't debugged the code so there may be functional mistakes, but it compiles without warnings or errors.

Thanks!

===Jac

David Betz · 2013-03-15 11:50

jac_goudsmit wrote: »

As far as I understand, it's not possible to compile two modules and link them in such a way that they end up in the same cog (is there?).

I don't have time for a complete reply right now but yes, it is possible to link multiple files into a single COG image.

pedward · 2013-03-15 12:17

Perhaps I'm not understanding the problem fully, but from a compiler standpoint it seems there are 2 simple rules:

If src operand is an immediate literal (integer, etc) then don't divide that token by 4
If src operand is an immediate symbol (name) then take contents and divide by 4 so the final value matches what's in the actual memory image of the COG.

I suspect there isn't a good way to do immediate expressions with the linker's help, so expressions such as #foo+1 wouldn't work (because you don't know the value of #foo until the linker is done). I think the proper way of solving this is to convert immediate expressions into meta code:

mov foo, #bar+1

converted

mov foo, #bar
add foo, #1

and

mov foo, #bar*10

converted

mov foo, #bar
shl foo, #3
add foo, #bar
add foo, #bar

I would assume there is a helper macro that will generate the multiplication component.

David Betz · 2013-03-15 13:13

jac_goudsmit wrote: »

To see my code, go to https://github.com/jacgoudsmit/Propeddle/blob/master/Software/p6502control.cogc, the inline assembly code starts at line 312. The first few lines are debug lines that I added to troubleshoot the problem, but the code that actually belongs there and doesn't compile correctly starts at line 345. It uses a table (constructed with LONG directives) to patch code in the main loop and of course it needs to modify a lot of instructions using movs and movd. All of those are assembled wrongly if they have an immediate operand: they should be assembled the same way as jmp #foo but they aren't. Note, I haven't debugged the code so there may be functional mistakes, but it compiles without warnings or errors.

Thanks!

===Jac

Wow! That's the longest inline assembly code I've ever seen. I think you could solve this problem by simply moving it all into a .S file that you link with your main C code. I can try doing that later for you if you'd like.

ersmith · 2013-03-15 13:50

jac_goudsmit wrote: »

The address that ends up in the code when I do e.g. movs bar, #foo is not simply the actual address of foo multiplied by 4.

Could you explain a bit further? As far as I've been able to see in my testing, the value placed in bar by the expression:

   movs bar, #foo

is the same value reported by all of the other binutils tools as the value of "foo". Note that this is the byte address of foo, so it probably is not what you really wanted (if foo is a COG label, that is... if foo is a symbolic constant then perhaps it is what you wanted!)

The .cog_ram instruction helps but still doesn't fix "bar long foo".

I'm not sure any solution would fix that. Again, the problem is that the value of "foo" is a byte address, rather than a long address. There are some workarounds (e.g. putting a "cogbegin" label right at the start of your COG code, and then using "bar long (foo-cogbegin)/4").
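The workaround is plain arithmetic on byte addresses: a label's COG register index is its byte distance from the start of the COG image, divided by 4. A minimal sketch, with made-up example addresses:

```c
#include <stdint.h>

/* "bar long (foo-cogbegin)/4" as arithmetic on byte addresses. */
static uint32_t cog_index(uint32_t label_byte, uint32_t cogbegin_byte) {
    return (label_byte - cogbegin_byte) / 4;
}
```

With cogbegin at byte 0 and foo at 0x58 this yields 0x16; if the COG code instead started at byte 0x40 of the section, the same expression would still give the right index, 6.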

I haven't tried using a .S file because I want this code to be part of some C code and that's simply not possible when using a .S file: As far as I understand, it's not possible to compile two modules and link them in such a way that they end up in the same cog (is there?).

Yes, you can link any number of modules together. There are several ways to do this. I'm not sure how exactly you're starting the code in other COGs. If it's as LMM C threads you can just link normally. If you want stand-alone COG C programs, the easiest way is to do a partial link (using the -r option to the linker) and then use objcopy --rename-section and --localize-text to prevent symbol conflicts between the COG C and the LMM C. The demo/toggle/cog_c_toggle has an example of this. The demo happens to only use one .c file, but you could have put multiple .c and .S files in the toggle_fw.o dependency.

If the GAS byte addressing is causing you headaches (and I can certainly understand why it might!), another option is to write PASM code in a .spin file, compile that with bstc or spin2cpp to binary code in a .dat file, and then launch that from the LMM C. The demos/toggle/pasm_toggle demo illustrates how to do this.

Finally, spin2cpp has a --gas option which converts Spin assembly to GAS assembly. It's still a bit experimental, though.

Eric

pedward · 2013-03-15 14:38

The root cause of this problem is the architecture of the original SPIN compiler and the PASM language.

The compiler and linker are one and thus all of the intermediate symbolic info is available at the linker stage. In the standard GNU toolchain with the divorced compiler, assembler, and linker, the information needs to be recovered or a workaround found.

I propose a workaround that is more in line with how the C universe tends to work: convert incompatible mnemonics into additional discrete instructions that achieve the same effect.

It doesn't help that Chip is probably the biggest offender of this compile time cheat...

jac_goudsmit · 2013-03-15 14:39

pedward wrote: »

Perhaps I'm not understanding the problem fully, but from a compiler standpoint it seems there are 2 simple rules:

If src operand is an immediate literal (integer, etc) then don't divide that token by 4
If src operand is an immediate symbol (name) then take contents and divide by 4 so the final value matches what's in the actual memory image of the COG.

Yes that's basically it, in a nutshell. However, you're forgetting that (unless I'm very much mistaken) it's not as easy as simply multiplying/dividing the value of the address by 4: the compiler and/or linker should take into account that if two C modules are linked into the same cog, all the fixups to addresses in the second module are increased by the size of the first module.

In other words: if you have a module "X" that's x bytes long, and you have a module "Y" that gets linked into the executable behind module "X" but has to be loaded in the same cog, all the relocations in module "Y" are processed by the linker, and the linker adds x to all relocations.

So even if you have .cog_ram set correctly and you did everything else right, an instruction

movs bar, #foo

or even

movs bar, #foo >> 2

will not be translated correctly because the actual address that needs to be encoded in the instruction should be

movs bar, #(foo + x) >> 2

and -- while I still don't fully have my finger on the problem -- I suspect that the encoded instruction ends up being something like

movs bar, #(foo >> 2) + x

.
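The difference between the two orders of operations is easy to see with numbers. In this sketch x is the size in bytes of module X and foo is the byte offset of foo inside module Y; both values are made up, and whether the linker really produces the second form is the suspicion stated above, not something this code confirms.

```c
#include <stdint.h>

/* Relocate first, then convert bytes to longs. */
static uint32_t cog_addr_expected(uint32_t foo, uint32_t x) {
    return (foo + x) >> 2;
}

/* Convert to longs first, then add a byte-based module offset. */
static uint32_t cog_addr_suspected(uint32_t foo, uint32_t x) {
    return (foo >> 2) + x;
}
```

With foo = 0x58 and x = 0x20 (eight longs), the expected COG address is 0x1E but the suspected encoding is 0x36. The two only agree when x is 0, i.e. for the first module in the cog.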

As an illustration: If you look at the hex dump that I pasted from the small code sample in post number 3 above, you can see that in that case, the assembler generated 0x53 instead of 0x17 which doesn't make sense at all: 0x17 << 2 is 0x5C, not 0x53. I don't know where the number 0x53 comes from.

I suspect there isn't a good way to do immediate expressions with the linker's help, so expressions such as #foo+1 wouldn't work (because you don't know the value of #foo until the linker is done).

I'm not sure if I'm right, but I'm pretty sure the expressions in arguments aren't calculated until the assembly is all the way done. The question for the assembler, however, is: what do you mean by "#foo + 1"? Is it the hub address of foo, increased by one? Or is it the cog address of foo, increased by one? In an instruction such as JMP or MOV, this is usually pretty obvious: the Propeller interprets the argument as a cog address, so clearly when converting the address into object code, it should generate the address of foo (possibly offset by the size of other modules in the same cog), divided by four, plus one, in that order. Similarly, in the case of

rdlong foo, #bar + 1

there is no ambiguity: bar is a hub location so it should be translated into the hub location (which is only known at link time) plus one. Clearly the linker has to resolve the issue in both cases and I'm pretty sure it will, except that I think it probably gets it wrong in the case where an address is a cog address which is already divided by 4 at assembly time (not link time).

In the case of a

foo long bar

there is ambiguity, where (as I mentioned before), we should probably make it possible (or necessary?) to use e.g. @ for hub addresses and & for cog addresses. Again, correct me if I'm wrong but I think in spin-asm this case is assembled as a cog address and you have to use Spin to initialize variables in DAT sections with hub addresses, but obviously that's not an option in C or C++. It's a real dilemma!

I think the proper way of solving this is to convert immediate expressions into meta code:

Converting an instruction to multiple instructions in the way you describe is not an option for an assembler: all instructions (except the ones that are clearly macros such as the special LMM instructions) should always compile to a single instruction. Besides, it's not necessary to compile an instruction such as

mov foo, #bar + 1

to multiple instructions: both #bar and 1 are known values before execution begins, so the instruction can be translated to one instruction.

===Jac

jac_goudsmit · 2013-03-15 14:52

David Betz wrote: »

Wow! That's the longest inline assembly code I've ever seen. I think you could solve this problem by simply moving it all into a .S file that you link with your main C code. I can try doing that later for you if you'd like.

Heh, I didn't intend for it to be that long at first. The original main loop is less than 20 instructions (I have to do the work for one clock pulse at 1MHz within the time allotted, i.e. 20 standard instruction times of 4 cycles on an 80MHz Propeller). But then I started adding functionality and wanted to reuse the same loop to do other things, so enter all that self-modifying code.
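The timing budget works out as follows, using only the figures from the post (80 MHz system clock, 1 MHz 6502 clock, 4 clocks per standard instruction); the names are made up for this sketch.

```c
/* Figures from the post: 80 MHz Propeller, 1 MHz 6502 clock,
 * 4 system clocks per standard instruction. */
enum { SYSCLK_HZ = 80000000, CPU6502_HZ = 1000000, CLKS_PER_INSTR = 4 };

/* System clocks available per 6502 clock pulse. */
static unsigned clocks_per_6502_cycle(void) {
    return SYSCLK_HZ / CPU6502_HZ;
}

/* Standard instructions that fit in one 6502 clock pulse. */
static unsigned instruction_budget(void) {
    return clocks_per_6502_cycle() / CLKS_PER_INSTR;
}
```

That is 80 clocks per pulse, or 20 standard instructions; anything that waits on the hub or adds instructions eats directly into that budget.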

Thanks for the offer, but I can rewrite it into a .S file myself. I might do that anyway because of other reasons too. For example, I ran into the fixed limit of 30 parameters for inline assembly and had to program some constant values as globals to work around it. Also, the C preprocessor is one of the reasons why the PropGCC toolchain is so interesting in the first place: if I want to add some debug code in there, I can just put #ifdef DEBUG / #endif around it, which is not an option in the Propeller tool.

But rewriting the code in a .S file doesn't solve the problem that even for short fragments of code such as the sample I put in the first posts, it's pretty much impossible to write self-modifying inline assembler code unless this bug is fixed. And I'm not even sure if it will work when I use a .S file (actually I'm pretty pessimistic about that).

Thanks for taking a look!

===Jac

jac_goudsmit · 2013-03-15 14:59

pedward wrote: »

The root cause of this problem is the architecture of the original SPIN compiler and the PASM language.

I agree.

I propose a workaround that is more inline with how the C universe tends to work, convert incompatible mnemonics into additional discrete instructions that can do the same effective thing.

As I mentioned, there should be a one-to-one relationship between instructions in the source and instructions in the object. If the assembler starts inserting instructions to calculate addresses at runtime, I won't be able to trust it to generate code that sticks to strict time limits. If my code absolutely positively has to execute within 80 cycles and the 20 instructions I write have timings that add up to exactly that, the assembler should not add extra instructions to work around this problem, because it would break my code!

I'm pretty sure that Binutils allows storing and communicating machine-dependent information such as whether a fixup is intended to be cog-relative or hub-relative, and the Propeller specific linker can look at this information and recalculate the relocations in a different way.

===Jac

jac_goudsmit · 2013-03-15 15:17

ersmith wrote: »
Could you explain a bit further? As far as I've been able to see in my testing, the value placed in bar by the expression:
   movs bar, #foo
is the same value reported by all of the other binutils tools as the value of "foo". Note that this is the byte address of foo, so it probably is not what you really wanted (if foo is a COG label, that is... if foo is a symbolic constant then perhaps it is what you wanted!)

I guess my main problem at this time is that the expression "#foo" is used inconsistently in the GAS assembler: in a JMP or JMPRET or CALL instruction, it's interpreted as a cog address (and apparently handled correctly) but in the case of any other instruction it's used as a hub address. You may call this a feature, and fixing it may cause breakage in GCC, but it makes it difficult for those who know spin-pasm to read GAS source code, and it makes it easier for them to make mistakes which are probably difficult to find.

I'm not sure any solution would fix that. Again, the problem is that the value of "foo" is a byte address, rather than a long address. There are some workarounds (e.g. putting a "cogbegin" label right at the start of your COG code, and then using "bar long (foo-cogbegin)/4").

I guess that would be an acceptable temporary workaround, but it would make for some incredibly messy code. And it's still error-prone.

I appreciate all the workarounds you're providing, but it doesn't really sound like anyone is interested in truly fixing this. I understand that you're afraid it might break GCC and the libraries if you do, but I believe it's possible to fix this the proper way (i.e. so that the GAS assembly language is fully backwards compatible with spin-asm, which currently it is not), without breaking GCC.

I still don't know much about the innards of Binutils, but I intend to work on this as soon as I can. I'm just running out of time to finish my project before the Official Parallax Conference, so making my stuff work (in spin-asm for now) has priority, and any bugfix from me for this will probably happen afterwards.

===Jac

David Betz · 2013-03-15 15:52

I'm beginning to think that your idea of using & to indicate a COG address might be the best solution if that information can be passed to the linker so it knows what to do. I suppose that ought to be possible since the targets of jmp instructions already have that handling. I'll take a look at that later.

pedward · 2013-03-15 16:14

If the assembler is generating symbol names to be dynamically replaced by the linker, then my solution would work, since you have two modes:

mov @symbol, @symbol

or

mov @symbol, [@symbol]

@symbol is the linker fixup target

The [@symbol] is a memory reference to the address @symbol

You could also use @@symbol to indicate location-of-location-of-symbol-name

So basically:

mov @foo, @bar

and

mov @foo, @@bar

or

mov @foo, @@bar+1

It's all valid compiler stuff; the real question is what the linker currently supports.

Funny how those are similar SPIN operators.

In C parlance:

mov *(foo), bar

mov *(foo), *(bar)

mov *(foo), *(bar) + 1

Just offering up as much as I can think of, to see if it helps make the problem more solvable. FWIW, I don't favor a different operator, because it's not the operator that's the problem; it's the implementation of immediate symbolic references.

jac_goudsmit · 2013-03-15 22:39

pedward wrote: »

If the assembler is generating symbol names to be dynamically replaced by the linker, then my solution would work, since you have two modes:

mov @symbol, @symbol

or

mov @symbol, [@symbol]

Thanks pedward, but that won't get us any closer to compatibility with the Spin-PASM compiler. And neither will my proposed & operator. The Spin assembler can do without those, and so could the Binutils assembler, except that without Spin you can't get the hub address of a variable, so that ability needs to be added. But the Binutils assembler already provides it, using the same syntax that in spin-asm would get the cog (not hub) address, and GCC and the libraries may depend on this (in my opinion) wrong implementation.

It's a complicated matter and it's a shame that we are (or at least I am) running into it at such a late stage in the development. I know for the GCC compiler and the libraries it's not a big deal that it works the way it does, but any code that people would want to port from the OBEX is not going to work without some serious reviewing, redesigning and rewriting. Some code may not work at all unless it's compiled as part of a DAT section in a Spin file and linked as a binary, probably without the ability to do any debugging. Programmers who learn PASM from existing code and documentation will get confused if PropGCC uses instructions differently, and it's basically going to be a major support headache in years to come if this isn't fixed. It worries me that this might turn people away from the Propeller, so I also want to make sure that all information about this bug is in one place as much as possible. Once it's solved, we can cover this thread with big "Solved" stickers so people won't panic about it. I hope others share my concern.

===Jac

pedward · 2013-03-15 22:44

I'm sorry if what I wrote didn't make sense; I was trying to use a symbolic notation since the subject has to do with the symbolic linker.

I'm not suggesting any operators, I'm suggesting that GAS expand the immediate symbol references into code that will result in the same runtime values, but is formatted in a way the linker can process.

GAS already emits symbols and the linker does all of the address fixups, it just needs to do the same thing as JMP fixups, but also account for compile time arithmetic.

jac_goudsmit · 2013-03-15 22:45

I think the addition of an '&' operator would be pretty trivial. Maybe I can look at it this weekend and post some diffs here.

===Jac

jac_goudsmit · 2013-03-15 22:54

pedward wrote: »

GAS already emits symbols and the linker does all of the address fixups, it just needs to do the same thing as JMP fixups, but also account for compile time arithmetic.

I don't grok Binutils 100% yet but I'm pretty sure that fixups are stored and communicated with the entire expression intact. That's why expressions can't be made all that complicated in assembly code: there is no recursion in the expression parser, it only accommodates a limited number of symbols in the expression which can only be combined in a few different ways. I know enough to know that it's possible to get information across to the linker, that's not the problem.

===Jac

ersmith · 2013-03-16 05:11

Jac:

It certainly is a complicated matter, and we've gone back and forth on this a number of times in the PropGCC development group. The fundamental issue I believe is that PASM keeps track of two addresses for each symbol (the HUB address and COG address) but GAS does not, and in fact cannot, since addresses are assigned by the linker and not the assembler. Changing binutils to keep two addresses per symbol is not really feasible.

The bottom line is that PASM and GAS use different syntaxes and are not compatible. We're not the first architecture for which GAS is not compatible with the "native" assembler. On the x86 platform, for example, GAS uses "AT&T" syntax by default, and only recently has it been able to support Intel syntax. Perhaps it would have been better if GAS for the Propeller were more obviously incompatible with PASM, perhaps by using some different mnemonics or conventions for registers. On the other hand, with the .cog_ram directive in place there's only a little bit of difference, so being able to easily port between assemblers is useful too.

Definitely we need documentation for this. I'm very puzzled -- I was sure we had a document in the propgcc "docs" directory that described the differences between assemblers, but I don't see it now. There are a few notes in the GAS documentation, but not enough (and nobody reads that anyway :-( ).

It might make sense to create a pasm assembler that outputs ELF .o files that can be linked with the linker and is designed for COG usage only (so all symbols have their COG address values). That would bypass some of the issues, although I'm not sure we could implement the PASM "@" operator properly since we'd only have the COG address available and not the HUB address.

Eric

David Betz · 2013-03-16 05:32

Thanks Eric. That's a very clear explanation of the problem. I guess a simple syntactic hack like I proposed to make doesn't really solve the problem and may even introduce new inconsistencies.

ersmith · 2013-03-16 05:40

Some technical notes:

(1) The assembler outputs relocation information for any expressions that are not fully defined. This includes pretty much all symbols (except constants in the same source file), since symbol values are not placed until link time. The linker is responsible for assigning symbol values and using the relocations to patch up the instructions with the correct addresses. All addresses are byte addresses.

(2) There are a lot of relocation types. The most relevant for our purposes are PROPELLER_DST (for a destination register), PROPELLER_SRC (for a source register), and PROPELLER_SRC_IMM (for a source immediate value). For example, in:

   mov A, B
   mov C, #D

we would get 4 relocation records: two PROPELLER_DST records for "A" and "C", a PROPELLER_SRC record for "B", and a PROPELLER_SRC_IMM record for "D". Note that in:

   long X

a completely different "RELOC_32" record is generated for "X".

(3) Some instructions (like JMP, JMPRET, and MOVA) always output PROPELLER_SRC relocations, even for immediate values. This was done because it makes no sense to jump to a byte aligned value, and it made porting simple assembly code from PASM to GAS much easier. This may or may not have been a good decision.
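Notes (2) and (3) amount to a small decision table that determines which relocation the assembler emits for an operand. Below is a minimal C sketch of that table as described here. The enum names, the operand-slot model and `pick_reloc` are hypothetical illustrations, not the actual binutils code. The mnemonic list covers only the three instructions named in note (3).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative names only; not the real binutils declarations. */
enum reloc_kind { R_DST, R_SRC, R_SRC_IMM, R_32 };
enum operand_slot { SLOT_DST, SLOT_SRC, SLOT_LONG_DATA };

/* Instructions whose immediate source is still emitted as PROPELLER_SRC
 * (note 3). Only the mnemonics named in the post are listed. */
static bool src_always_cog(const char *mnemonic)
{
    return strcmp(mnemonic, "jmp") == 0 ||
           strcmp(mnemonic, "jmpret") == 0 ||
           strcmp(mnemonic, "mova") == 0;
}

static enum reloc_kind pick_reloc(const char *mnemonic,
                                  enum operand_slot slot, bool immediate)
{
    assert(mnemonic != NULL);
    switch (slot) {
    case SLOT_DST:
        return R_DST;                 /* destination register, value / 4 */
    case SLOT_LONG_DATA:
        return R_32;                  /* e.g. "long X", full 32-bit value */
    case SLOT_SRC:
    default:
        if (immediate && !src_always_cog(mnemonic))
            return R_SRC_IMM;         /* byte value passed through unchanged */
        return R_SRC;                 /* value / 4, like a register */
    }
}
```

Under this model, `mov C, #D` yields R_SRC_IMM for D, while `jmp #D` yields R_SRC. That is exactly the inconsistency Jac describes at the start of this page.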

(4) PROPELLER_DST and PROPELLER_SRC records work by ignoring the bottom 2 bits of the address and (essentially) dividing by 4. This does the "right thing" for simple symbolic references, but note that any expressions like "x+1" are interpreted as "(x+1)/4", and not as "x/4 + 1". This means that for example

   jmp #x+1
   jmp #x

mean the same thing in GAS, whereas they do not mean the same thing in PASM!
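Note (4) can be checked with plain integer arithmetic. Below is a minimal C sketch. It assumes the relocation hands the linker a single byte-valued expression (symbol plus constant addend), and that PROPELLER_SRC drops the bottom two bits and masks the result to the 9-bit field. The function names are hypothetical. `x` is placed at byte 0x58 (long 0x16), as in the objdump listing at the top of the thread.

```c
#include <assert.h>
#include <stdint.h>

/* GAS / PROPELLER_SRC model: resolve the whole expression in bytes,
 * then drop the bottom two bits and keep the 9-bit field. */
static uint32_t gas_src_field(uint32_t symbol_bytes, int32_t addend)
{
    int64_t value = (int64_t)symbol_bytes + addend;   /* bytes */
    assert(value >= 0);
    return (uint32_t)(value >> 2) & 0x1FFu;
}

/* PASM model: the symbol is already a long address, so the addend
 * counts longs. */
static uint32_t pasm_src_field(uint32_t symbol_longs, int32_t addend)
{
    int64_t value = (int64_t)symbol_longs + addend;   /* longs */
    assert(value >= 0);
    return (uint32_t)value & 0x1FFu;
}
```

Under this model, `jmp #x+1` and `jmp #x` both encode 0x16 in GAS, while PASM's `jmp #x+1` encodes 0x17. The GAS spelling of the latter would be `jmp #x+4`.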

(5) PROPELLER_SRC_IMM passes the address through unchanged, which means that normally:

  mov r0, #x

puts into r0 the exact value that is found in the symbol table for "x" (unless .cog_ram is in effect, see below). This works well for symbolic constants and HUB addresses, but causes problems for COG addresses. Unfortunately, the assembler is not generally in a position to distinguish these two cases, since it is the linker that assigns sections to memory addresses.

(6) All symbols defined after a .cog_ram directive are marked with a flag (PROPELLER_OTHER_COG_RAM) which tells the linker to divide the symbol value by 4 before using it in a PROPELLER_SRC_IMM relocation. We had hoped that this would make porting code from PASM to GAS transparent. Unfortunately, while this helps, it does not fix all cases, because some relocations (like the RELOC_32 relocation generated by "long") still use the original value. It would be an interesting exercise to see if .cog_ram can usefully be extended to these other cases.
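Notes (5) and (6) explain the objdump listing at the top of the thread. There, `mov r8, #x` was printed as the byte sequence 5810fca0, which is the little-endian word 0xA0FC1058. Its 9-bit source field holds the byte address 0x58. Jac expected 1610fca0 (0xA0FC1016), which holds the long address 0x16.

Below is a minimal C sketch of the patch step, assuming a PROPELLER_SRC_IMM relocation simply replaces bits 8..0 of the instruction. `apply_src_imm` is a hypothetical stand-in for the linker code. Its `cog_ram_symbol` parameter stands in for the PROPELLER_OTHER_COG_RAM flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SRC_MASK 0x1FFu   /* bits 8..0: the 9-bit source field */

/* Hypothetical model of applying a PROPELLER_SRC_IMM relocation.
 * cog_ram_symbol stands in for the flag set by .cog_ram. */
static uint32_t apply_src_imm(uint32_t insn, uint32_t symbol_bytes,
                              bool cog_ram_symbol)
{
    uint32_t value = cog_ram_symbol ? symbol_bytes / 4u  /* cog long address */
                                    : symbol_bytes;      /* unchanged bytes */
    assert(value <= SRC_MASK);   /* must fit the 9-bit immediate */
    return (insn & ~SRC_MASK) | value;
}
```

Starting from `mov r8, #0` (0xA0FC1000), the unflagged path reproduces the 0xA0FC1058 that objdump showed, and the flagged path gives the expected 0xA0FC1016. A `long x` data word goes through RELOC_32 instead. As note (6) says, RELOC_32 still stores the original byte value 0x58.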

(7) GCC does in fact rely on the behavior of PROPELLER_SRC_IMM for LMM code. For example, it can generate code like:

   mov r0, #4
loop
   ...
   sub  pc, #(endloop-loop)
endloop

where it adjusts the LMM program counter to do a short branch. If "endloop-loop" were treated as a COG expression (calculated as number of instructions rather than number of bytes) this code would not work right. This is particularly acute in CMM mode, where the instructions do not necessarily fall on long word boundaries.
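To see why GCC depends on the byte-valued behavior in note (7), model the LMM short branch. Below is a minimal C sketch, assuming that when the `sub` executes, the kernel has already advanced `pc` to the hub byte address of the next instruction (`endloop`). The function name and the addresses are made up for illustration. Applying the same model to CMM is a simplification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "sub pc, #(endloop-loop)".
 * pc: hub byte address of the instruction after the sub (endloop).
 * imm: the 9-bit immediate that the relocation placed in the sub. */
static uint32_t lmm_branch_back(uint32_t pc, uint32_t imm)
{
    assert(imm <= 0x1FFu);   /* immediate is 9 bits */
    return pc - imm;         /* new hub byte address */
}
```

Suppose `loop` is at 0x1000 and `endloop` is at 0x1010. The byte difference 0x10 takes `pc` back to 0x1000, whereas a COG-style count of 4 would land at 0x100C. For a hypothetical 7-byte CMM loop body, the long count 7/4 = 1 cannot represent the distance at all.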

Eric

David Betz · 2013-03-16 06:53

Maybe the best solution would be to get spin2cpp to generate .elf files. I guess it would have to just throw away any Spin code in that mode but it would allow COG drivers to be written in pure PASM but still linked with C code. How hard would it be to get spin2cpp to generate .elf object files? I guess we might need to give it a different name if we do that though. Maybe a conditional compilation of the spin2cpp code could produce a propeller-elf-pasm assembler?

ersmith · 2013-03-16 08:14

That's a good idea. It probably wouldn't be too bad to get spin2cpp to produce linkable .o files. Having it output .o instead of .dat for files that have no external references should be easy (and the .dat output already throws away the Spin code, just like bstc does). Adding support for relocation and external references will take a while, though. I'll add it to my "to-do" list.

Eric

pedward · 2013-03-16 11:05

How does having Spin2CPP generate object files solve the problem if the linker is doing the wrong thing?

You have to differentiate the relocatable symbols from the expressions they are wrapped with.

I haven't seen any Propeller PASM that uses HUB data in a src operand for an instruction. All of the example code clearly shows a passing convention to get access to symbols. I could see this being used for tricky stuff where you would sacrifice hub accesses for COG memory storage.

The handling of immediate symbol references is well defined: even with the ambiguity, it's still a dereference (data at location) at compile time.

It seems straightforward to treat all symbolic immediate references exactly how JMP SRC operands are treated.

If that won't work, please explain why.

David Betz · 2013-03-16 13:00

I think Eric's example 7 in message 26 above explains one reason it won't work.

ersmith · 2013-03-16 13:31

pedward wrote: »

How does having Spin2CPP generate object files solve the problem if the linker is doing the wrong thing?

The linker is not doing the wrong thing -- it's substituting symbolic values as it's instructed to. The problem is that symbols in HUB RAM should have byte addresses, and symbols in COG RAM should have long addresses. For example, in code like:

x
    nop
y
    nop

the value of "y" is "x+1" in PASM, and "x+4" in GAS. That's the fundamental issue. We could certainly construct an assembler (perhaps using spin2cpp as a base) that output symbols as long addresses, like PASM does. In fact it could output two linker symbols for each symbol it encounters -- a COG symbol and a HUB symbol -- which would allow us to support expressions like @X. Such an assembler would be very useful for programmers porting PASM COG code. However, it would not be a good idea to make GAS behave that way -- not only would it be hard to do, but it would break all LMM and CMM programs.
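Eric's `x`/`y` example can be made concrete with a record that carries both addresses. That is what an assembler emitting a COG symbol and a HUB symbol for every label would need. Below is a minimal C sketch, assuming every instruction occupies one long and the hub image is long-aligned. `struct dual_symbol` and `label_at` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-label record for a PASM-style assembler that emits
 * both a HUB symbol and a COG symbol. */
struct dual_symbol {
    uint32_t hub_byte;   /* byte address, what GAS stores; needed for @X */
    uint32_t cog_long;   /* long address, what PASM uses for X in code */
};

/* Address of the insn_index-th one-long instruction in an image that
 * starts at hub_base (bytes) and is loaded at cog_base (longs). */
static struct dual_symbol label_at(uint32_t hub_base, uint32_t cog_base,
                                   uint32_t insn_index)
{
    struct dual_symbol s;
    assert((hub_base & 3u) == 0);   /* hub image is long-aligned */
    s.hub_byte = hub_base + 4u * insn_index;
    s.cog_long = cog_base + insn_index;
    return s;
}
```

With `x` as instruction 0 and `y` as instruction 1, `y.cog_long` is `x.cog_long + 1` (the PASM view), and `y.hub_byte` is `x.hub_byte + 4` (the GAS view).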

pedward wrote: »

It seems straightforward to treat all symbolic immediate references exactly how JMP SRC operands are treated.

If that won't work, please explain why.

'' this code breaks if immediates are treated like in JMP
   add pc, #x-.   '' jump to symbol x in LMM mode

'' this code does not work the same in GAS and PASM, no matter how JMP immediates are handled
   jmp #x+1
