Inline assembly bug/flaw for LONG directive
jac_goudsmit
Posts: 418
When you use a named parameter in inline assembly, you can specify a constraint (restriction) in the form of a string. One of the constraints you can use is "i", which means that the value is used as an immediate operand of an instruction. You can use enum values as parameters, and they get expanded correctly:
    enum myenum { myenum_A, myenum_B = 234 } myvar;

    __asm__ __volatile__ (
        "x long 0;"
        " mov x, %[myenumb];"          // expands to "mov x, #234"
        :                              // no outputs
        : [myenumb] "i" (myenum_B)
    );
There are other restrictions too, for example "rC" specifies that an operand is a register in Cog memory (right?). Example (based on the above):
    enum myenum { myenum_A, myenum_B = 234 } myvar;

    __asm__ __volatile__ (
        " mov %[myvar], %[myenumb];"   // expands to "mov myvar, #234"
        : [myvar] "=rC" (myvar)        // "=" means output, write-only
        : [myenumb] "i" (myenum_B)
    );
As every PASM programmer knows, the only way to use pointers is with self-modifying code. Example:
    enum myenum { myenum_A, myenum_B = 234 } myvar;

    __asm__ __volatile__ (
        " movd store_ins, ptr;"
        " nop;"                        // there must be one instruction between modifying code and modified code
        "store_ins"
        " mov 0, %[myenumb];"          // destination modified; result "mov myvar, #234"
        /*...*/
        "ptr long %[myvar]"
        : [myvar] "=rC" (myvar)        // "=" means output, write-only
        : [myenumb] "i" (myenum_B)
    );
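Now here comes the problem: what if these instructions are part of a bigger piece of code (e.g. a table parser that looks for an enum value), so that the enum value has to be stored in a long instead of being used directly as an immediate operand?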
    __asm__ __volatile__ (
        " movd store_ins, ptr;"
        " movs store_ins, value;"
        " nop;"                        // there must be one instruction between modifying code and modified code
        "store_ins"
        " mov 0, #0;"                  // destination and source modified
        /*...*/
        "ptr long %[myvar];"
        "value long %[myenumb]"        // ERROR: expands to "value long #234" which is invalid syntax
        : [myvar] "=rC" (myvar)        // "=" means output, write-only
        : [myenumb] "i" (myenum_B)
    );
If I change the "i" restriction to "rC", the inline assembler stores the value in a register somewhere, and then puts the address of the register in the long, which is wrong:
    __asm__ __volatile__ (
        // An instruction such as "mov r7, #234" is inserted here automatically by the compiler or assembler
        " movd store_ins, ptr;"
        " movs store_ins, value;"
        " nop;"                        // there must be one instruction between modifying code and modified code
        "store_ins"
        " mov 0, #0;"                  // destination and source modified
        /*...*/
        "ptr long %[myvar];"
        "value long %[myenumb]"        // Now expands to "value long r7" instead of "value long 234"
        : [myvar] "=rC" (myvar)        // "=" means output, write-only
        : [myenumb] "rC" (myenum_B)    // changed from "i" to "rC"
    );
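To see why a register address in the long gives the wrong result, it may help to model what `movd` and `movs` do to an instruction word. The following is only a host-side C sketch, not PropGCC code. It assumes the Propeller 1 instruction layout, with the source field in bits 8..0 and the destination field in bits 17..9. The helper names (`pasm_movd`, `pasm_movs`, `patch_store_ins`), the base instruction word and the example addresses are made up for illustration.

```c
#include <stdint.h>

/* Assumed Propeller 1 layout: 9-bit source field in bits 8..0,
   9-bit destination field in bits 17..9. */
#define FIELD_MASK 0x1FFu
#define DEST_SHIFT 9

/* Hypothetical model of MOVD: copy the low 9 bits of `value`
   into the destination field of `instr`. */
static uint32_t pasm_movd(uint32_t instr, uint32_t value)
{
    return (instr & ~(FIELD_MASK << DEST_SHIFT))
         | ((value & FIELD_MASK) << DEST_SHIFT);
}

/* Hypothetical model of MOVS: copy the low 9 bits of `value`
   into the source field of `instr`. */
static uint32_t pasm_movs(uint32_t instr, uint32_t value)
{
    return (instr & ~FIELD_MASK) | (value & FIELD_MASK);
}

static uint32_t dest_field(uint32_t instr)
{
    return (instr >> DEST_SHIFT) & FIELD_MASK;
}

static uint32_t src_field(uint32_t instr)
{
    return instr & FIELD_MASK;
}

/* Patch the placeholder instruction with the contents of the two longs,
   in the same order as the inline assembly does. */
static uint32_t patch_store_ins(uint32_t ptr_long, uint32_t value_long)
{
    uint32_t store_ins = 0xA0FC0000u;  /* assumed encoding of "mov 0, #0" */
    store_ins = pasm_movd(store_ins, ptr_long);
    store_ins = pasm_movs(store_ins, value_long);
    return store_ins;
}
```

In this model, `patch_store_ins(addr, 234)` leaves 234 in the source field, which corresponds to `mov myvar, #234`. If the long holds the address of a register instead (say 7 for a hypothetical `r7`), the source field becomes 7 and the instruction would store `#7`, not the enum value.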
I think that with the current software (SimpleIDE 0.8.5 and the GCC version that comes with it) there is no way to fix this or work around the problem, other than using a literal value in the LONG directive instead of an inline assembly parameter. Apparently this is a corner case that the Propeller GCC / GAS developers didn't think of.
I think the assembler should allow the LONG directive to accept a value with #, meaning "put this actual value here, not the address of the value" (an example of "put the address of the value" is the line "ptr long myvar" in the code above). And obviously there should be an error if # is used with a value that can only be used as an address (so "ptr long #myvar" in the above code would generate an error because myvar is a memory location, not a literal value that's known at compile time).
I expect the same problem to happen if you use one of the other directives (can BYTE or WORD even be used? I haven't tried).
I also filed an issue about this on the PropGCC Google Code site, but I'm not sure if that's the right location (after all, this is an assembler problem), and I thought it might be a good idea to explain the problem here a little further.
Thanks!
===Jac
Comments
In general the fewer things you put in inline assembly, the better.
Eric
The code I'm working on contains a patch table: a number of values that would be an array of structs if it were written in C. The table describes which instructions in the rest of the code need to be modified and how, before they are executed. The correct structs are selected from the table by a mode parameter, and I iterate through the values by comparing that parameter with a constant value in the table.
So basically, if I had the option of programming self-modifying code in C, it would look something like this:
So, the table contains literal values (represented as enums) combined with cog addresses that refer to code in the inline assembly. The addresses of the assembly code aren't available to the C compiler of course, so I can't move the table into the C part of the program. I also can't rewrite the code in C or C++ because the timing is too constrained.
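As a rough illustration of the kind of table described above, a patch table written as an array of structs might look like the sketch below. Everything in it is hypothetical: the struct layout, the names (`struct patch`, `patch_table`, `apply_patches`), the mode values and the addresses are assumptions, and the cog is modeled as a plain array so that the sketch runs on a host. It only patches the destination field (assumed to be bits 17..9 of the instruction).

```c
#include <stddef.h>
#include <stdint.h>

enum mode { MODE_A, MODE_B = 234 };   /* hypothetical mode values */

struct patch {
    enum mode mode;     /* literal value compared with the mode parameter */
    unsigned  target;   /* cog address of the instruction to modify */
    uint32_t  dest;     /* new 9-bit destination field for that instruction */
};

static const struct patch patch_table[] = {
    { MODE_A, 3, 0x010 },
    { MODE_B, 3, 0x020 },
    { MODE_B, 5, 0x021 },
};

/* Apply every table entry whose mode matches.
   Returns the number of instructions that were patched. */
static int apply_patches(uint32_t *cog, enum mode mode)
{
    int applied = 0;
    size_t count = sizeof patch_table / sizeof patch_table[0];

    for (size_t i = 0; i < count; i++) {
        const struct patch *p = &patch_table[i];
        if (p->mode != mode)
            continue;
        /* Replace bits 17..9 (destination field) of the target instruction. */
        cog[p->target] = (cog[p->target] & ~(0x1FFu << 9))
                       | ((p->dest & 0x1FFu) << 9);
        applied++;
    }
    return applied;
}
```

In the real code the `target` column would refer to labels inside the inline assembly, which is exactly what the C compiler cannot see.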
I know there are other ways to implement this but I think this is something that should be possible, right?
For now, I worked around it by declaring the mode values as preprocessor macros instead of enums, and I insert them in the code using the stringize operator of the preprocessor. It's ugly and I'm not sure yet if it works, but I'm working on it.
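A minimal sketch of that stringize workaround, assuming a hypothetical mode macro `MODE_B` and helper macros `STR`/`STR_` (none of these names come from the actual module). The two-level macro is needed so that `MODE_B` is expanded to its value before it is turned into text.

```c
#include <string.h>

/* Two-level stringize: STR(MODE_B) expands MODE_B first, then quotes it. */
#define STR_(x) #x
#define STR(x)  STR_(x)

#define MODE_B 234   /* hypothetical mode value, a macro instead of an enum */

/* The literal is pasted into the assembly text by the preprocessor,
   so the assembler would see the text below with no "#" and no register. */
static const char value_line[] = "value long " STR(MODE_B);
```

Inside the inline assembly, the fragment `"value long " STR(MODE_B)` would take the place of `"value long %[myenumb]"`, and the corresponding operand would no longer be needed. The drawback is that the macro body must be a plain literal: an expression or an enum name would be pasted as text rather than evaluated.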
I did some digging in the binutils code, and it looks like the "LONG" directive is handled by a different part of `gas` than the regular opcodes, so I understand that it would be difficult to implement my initial proposal. Maybe a new pseudo-opcode needs to be added?
If you can tell me in which binutils source file the LONG directive is handled, I may be able to post a patch. I'm not familiar with the binutils code but I'll be happy to do some more digging to make this work.
===Jac
(PS: the code in this post and the previous one isn't tested; I'm not trying to hide things, just making them easier to read. The entire module is several hundred lines of inline assembly and will be on GitHub soon.)