This works as it's supposed to. But as soon as I omit the second _waitms(), the compiler thinks it's smart and assumes that the two accesses to ctrl.currI must return the same value, so that hi and lo are the same and can be optimized into a single variable. The compiled code is:
mov arg01, #1
call #__system___waitms ' first _waitms()
wrpin ##1048624, #8
waitx ##400000 ' first _waitx()
add ptr__dat__, ##74320
rdlong local01, ptr__dat__ ' first read of ctrl.currI
wrpin ##1081392, #8
waitx ##400000 ' second _waitx()
add ptr__dat__, #40
rdlong arg01, ptr__dat__ ' read of ccog
sub ptr__dat__, ##74360 ' second read of ctrl.currI is optimized away!
cogstop arg01
mov arg02, ##1146928
wrpin ##1146928, #8
mov arg01, local01 ' should be hi-lo
sub arg01, local01 ' but compiles to lo-lo !
The call to _waitms() seems to act as some sort of "clear optimization cache" command. Same for printf() and the like. This makes debugging very unpredictable. As soon as I remove the debug output the behaviour of the program changes. I'm almost sure this is my fault. But it would be really helpful if I knew how to avoid that.
(@ersmith I can send you the complete project if you need it)
I thought volatile is exactly meant for that purpose. But currently, it has no effect at all. I wonder how all my other programs work at all. Things like mailboxes used by at least two cogs are very common with the Propeller. There should be a safe way to implement them without guessing about the optimisations.
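For reference, here is a minimal sketch of the mailbox pattern as standard C defines it, assuming volatile is honoured (which, per this thread, is currently not the case here). The names ctrl_t, read_delta and fake_other_cog are made up for illustration. The callback stands in for the real delay plus whatever another cog writes in the meantime.

```c
#include <stdint.h>

/* Hypothetical mailbox shared between cogs; the field name follows the post. */
typedef struct {
    volatile int32_t currI;   /* written by another cog */
} ctrl_t;

/* Stand-in for the delay between the reads (e.g. _waitx() on real hardware). */
typedef void (*delay_fn)(ctrl_t *ctrl);

/* Reads currI twice with a delay in between and returns hi - lo.
   Under standard C rules every access to a volatile object must
   actually be performed, so the two reads may not be merged. */
int32_t read_delta(ctrl_t *ctrl, delay_fn delay)
{
    int32_t lo = ctrl->currI;
    delay(ctrl);
    int32_t hi = ctrl->currI;
    return hi - lo;
}

/* Illustration only: simulates another cog changing the value during the delay. */
void fake_other_cog(ctrl_t *ctrl)
{
    ctrl->currI += 5;
}
```

With a compiler that implements volatile, read_delta() should return the change made during the delay rather than a constant 0.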
There's also a flag to disable these memory optimizations outright ...
The optimising options are in the "general.md" file. Read the section on "Per-function control." I only just now realised Eric had given me help via this mechanism to force using the full assembler on a per-function basis in my 4-bit SD card driver code.
Disclaimer: It's not my job to develop a compiler so I might be a little too naive. In reality it's probably a lot harder than I think so please take the following as suggestions from a somewhat ignorant outsider...
Optimisations are a good thing in general but the compiler shouldn't try to be smarter than the programmer. IMHO, it shouldn't try to optimize away whole lines of code. If the programmer wrote a specific statement then it's most likely there for a reason. Example
It's totally OK to not fetch the pointer again and keep it in a register. It's also OK to extract common subexpressions like (c+d). If there is no write in between then they most likely don't change. But the final read of myStruct.a and .b should always happen even if the compiler thinks that they have the same value. If I write two lines of code then I want it that way. And if it's not fully optimized it's my fault and/or intention.
I know... Changing the optimization strategies is probably not a good idea because it might introduce new bugs and cause a lot of work. Maybe it's easier to implement the volatile tag.
For now, I solved the problem by extracting the waitx() and the ctrl->currI read access into a separate function and switching off optimization locally.
This works but, interestingly, also switches off small method inlining. Without the attribute, the two calls to the function get inlined as two waitx() and rdlong instructions (of which the second rdlong gets optimized away).
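A sketch of that workaround, assuming the per-function attribute syntax from the "Per-function control" section of general.md and assuming __FLEXC__ is the compiler's predefined symbol. Both should be checked against the documentation. read_after_wait, delay_cycles and NO_OPT are hypothetical names, and the delay is a placeholder so the snippet also builds with other compilers.

```c
#include <stdint.h>

/* Assumption: the compiler accepts __attribute__(opt(0)) after the
   parameter list to turn optimization off for one function (see
   general.md). On other compilers the macro expands to nothing. */
#if defined(__FLEXC__)
#define NO_OPT __attribute__(opt(0))
#else
#define NO_OPT
#endif

typedef struct {
    int32_t currI;   /* updated by another cog */
} ctrl_t;

/* Placeholder delay; on the P2 this would be _waitx(cycles). */
static void delay_cycles(uint32_t cycles)
{
    (void)cycles;
}

/* Wait, then read currI. Keeping the wait and the read together in one
   non-optimized function is meant to stop the second read from being
   merged with the first. As described above, it also prevents inlining. */
int32_t read_after_wait(ctrl_t *ctrl, uint32_t cycles) NO_OPT
{
    delay_cycles(cycles);
    return ctrl->currI;
}
```

The caller would then compute lo and hi from two separate calls to read_after_wait().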
A lot of the problem here is that there's no distinction between code you explicitly wrote a certain way and code that comes to be through macros/inlining or as a result of other optimizations. Going aggressive on deleting RDLONG instructions is really valuable because RDLONG is extremely slow (even more so in hubexec).
If you want to control code exactly, just use ASM, it's better for that.
Come on! I know, ASM is best when you want exact timing, exact memory and register layout, and optimum performance. I enjoy programming in assembler as long as it's closely connected to the hardware and the code size remains manageable. But it is also very time-consuming and error-prone. The whole purpose of a compiler is to not be forced to use assembler for everything. Especially for larger projects I prefer high level languages because they make everything much more readable. Ok, C is not the best choice, probably...
I don't want to argue, you can't please everybody. If there's a workaround I'm happy. But it's one of those reasons why the Propeller is not so commonly used. It's very special. Everybody else uses C and "volatile".
@ersmith said:
I would use mostly temporary labels (like .x and .y) in the __asm. Another good solution would be the one I think you've already found, to enclose the assembly code in a module.
Well, I could declare ALL labels locally except for the entry point which has to be visible to the C code.
That's what I meant.
I've found that there are DAT section namespaces in Spin2 ("%namesp x"). Are they available in Spin style assembler sections?
Unfortunately those aren't hooked up to C or BASIC yet. Probably they could be made to work without too much effort, I'll look at it when I get the chance.
But I think a separate Spin2 file is the better solution because of the syntax highlighting. Spin style comments and constants ($ABCD) always look ugly inside C files. The only drawback is I can't use C style #defines in the Spin2 modules. So I have to use two separate include files for my global constants like pin numbers.
I agree with you that a separate Spin2 file is probably the best solution. I'm not sure what problem you're having with C style #define in Spin2; simple #define (enough to define constants like pin numbers) should work, although macros with parameters won't.
@ManAtWork said:
I don't want to argue, you can't please everybody. If there's a workaround I'm happy. But it's one of those reasons why the Propeller is not so commonly used. It's very special. Everybody else uses C and "volatile".
I don't want to argue either, I'm just not in a position right now to implement correct volatile handling or the explicit fence intrinsic I suggested earlier.
Implementing volatile is slightly tricky, because we'll have to somehow propagate the variable information down into the registers that hold the addresses and keep track of it everywhere. In principle this is do-able, but getting all of the cases right is likely to be tricky, we'll probably miss some.
A simpler stop-gap solution is to suppress the read combinations if a waitx or similar instruction (like waitct or one of the interrupt waiting instructions) is found between the reads. This is a good idea, because if we're waiting then another COG may well change the memory. Without a wait (and without a loop) then there's a race condition anyway, so combining the reads is relatively safe.
@ersmith said:
Implementing volatile is slightly tricky, because we'll have to somehow propagate the variable information down into the registers that hold the addresses and keep track of it everywhere. In principle this is do-able, but getting all of the cases right is likely to be tricky, we'll probably miss some.
I fully understand. But if volatile is ignored it might be a good idea to throw a warning.
A simpler stop-gap solution is to suppress the read combinations if a waitx or similar instruction (like waitct or one of the interrupt waiting instructions) is found between the reads. This is a good idea, because if we're waiting then another COG may well change the memory. Without a wait (and without a loop) then there's a race condition anyway, so combining the reads is relatively safe.
Comments
Next problem: How do I properly declare shared memory used by multiple cogs as volatile so that the compiler does not optimize away consecutive reads?
Example:
This works as it's supposed to. But as soon as I omit the second _waitms(), the compiler thinks it's smart and assumes that the two accesses to ctrl.currI must return the same value, so that hi and lo are the same and can be optimized into a single variable. The compiled code is:
The call to _waitms() seems to act as some sort of "clear optimization cache" command. Same for printf() and the like. This makes debugging very unpredictable. As soon as I remove the debug output the behaviour of the program changes.
I'm almost sure this is my fault. But it would be really helpful if I knew how to avoid that.
(@ersmith I can send you the complete project if you need it)
This read elimination is blocked by most function calls, other memory writes and certain other instructions.
We should add an explicit memory fence intrinsic though.
I thought volatile is exactly meant for that purpose. But currently, it has no effect at all. I wonder how all my other programs work at all. Things like mailboxes used by at least two cogs are very common with the Propeller. There should be a safe way to implement them without guessing about the optimisations.
Yeah, volatile is ignored by the frontend because ???
Spin doesn't have an equivalent annotation, so ??? there.
Branch instructions stop the optimization even if it could otherwise happen, so things like wait loops always act as a fence.
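A wait loop of the kind mentioned above could look like the following minimal sketch. The names mailbox_wait and MBOX_TIMEOUT are hypothetical, and the poll limit exists only to keep the example bounded. Portable C relies on the volatile qualifier for the fresh read on every pass, while here the loop's branch is what reportedly keeps the reads from being combined.

```c
#include <stdint.h>

/* Hypothetical return value when no command arrives in time. */
#define MBOX_TIMEOUT (-1)

/* Polls *cmd until another cog stores a non-zero value there,
   or until max_polls reads have been made without success. */
int32_t mailbox_wait(volatile int32_t *cmd, uint32_t max_polls)
{
    while (max_polls-- > 0) {
        int32_t v = *cmd;     /* one read of the mailbox per iteration */
        if (v != 0) {
            return v;         /* command received */
        }
    }
    return MBOX_TIMEOUT;      /* nothing arrived within the poll budget */
}
```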
There's also a flag to disable these memory optimizations outright, but I don't know what it's called and am phone-posting from a train right now.
I agree with you that a separate Spin2 file is probably the best solution. I'm not sure what problem you're having with C style #define in Spin2; simple #define (enough to define constants like pin numbers) should work, although macros with parameters won't.
I don't want to argue either, I'm just not in a position right now to implement correct volatile handling or the explicit fence intrinsic I suggested earlier.
Implementing volatile is slightly tricky, because we'll have to somehow propagate the variable information down into the registers that hold the addresses and keep track of it everywhere. In principle this is do-able, but getting all of the cases right is likely to be tricky, we'll probably miss some.
A simpler stop-gap solution is to suppress the read combinations if a waitx or similar instruction (like waitct or one of the interrupt waiting instructions) is found between the reads. This is a good idea, because if we're waiting then another COG may well change the memory. Without a wait (and without a loop) then there's a race condition anyway, so combining the reads is relatively safe.
I fully understand. But if volatile is ignored it might be a good idea to throw a warning.
That would be great.