Labels as Values with -mcog crashes compiler
Heater.
Posts: 21,230
I was looking to use the GCC feature of "labels as values" in some COG code. Would be useful for jump tables and thread switching etc. Turns out to crash the compiler:
Attached is a quick experiment with labels as values which compiles fine as LMM code but fails as below when -mcog is used. No matter the optimization level.
Attached is a quick experiment with labels as values which compiles fine as LMM code but fails as below when -mcog is used. No matter the optimization level.
heater@dadada.da:~$ propeller-elf-gcc -mcog -S -Os -Wall -o varlab.S varlab.c varlab.c: In function 'threads': varlab.c:64:1: error: unrecognizable insn: (insn 109 2 5 2 (set (reg:SI 7 r7) (symbol_ref/u:SI ("*L_LC16") [flags 0x2])) varlab.c:14 -1 (nil)) varlab.c:64:1: internal compiler error: in extract_insn, at recog.c:2109 Please submit a full bug report, with preprocessed source if appropriate. See <http://gcc.gnu.org/bugs.html> for instructions.
Comments
This works great in lmm mode. It's a problem in "-mcog" mode, though. gas knows that some instructions (notably jmpret) should have any immediate addresses divided by 4. But we don't know that the immediate label values should be so treated, and so they end up in a form that is not suitable for use by cog instructions.
Actually it's kind of a good sign that you got the crash report you did... it means we *may* be able to adjust the label values in -mcog mode. I'll look into the code generator to see if I can catch that specific case and adjust as needed.
Thanks for the bug report!
Eric
Thanks,
Eric
rxState = &&rx_state_0;
now generates:
mov _rxState, #L_L4/4
So I guess that the divide by 4 you mentioned.
I added the following global declarations to my full_duplex_serial_pt.c:
I then tried to set those values from labels within my tx_thread: I then compile to assembler with:
And inspect the PASM produced where for those assignments I see a sequence like: All looks good. Until I try to build the entire program which fails with:
Assembling my hand made .S file with:
produces the same errors.
Removing the / from "mov _txState, #(L_L2 / 4)" removes the error.
Eric
Thanks for the bug reports,
Eric
Hopefully I can find time to check out your new version this weekend. I really want to see a fast FullDuplexSerial in C and that will rely on jmpret or "goto variable" which needs to be able to pick up the addresses to start with.
On the subject of jmpret: I've been really uncomfortable about __builtin_propeller_taskswitch (or __builtin_propeller_jmpret) because it only works in cog mode. How about this instead: I think we can add a peephole optimizer to recognize that can be compiled to: That would remove the need for __builtin_propeller_jmpret, wouldn't it? The code produced would be the same in cog mode, but it would also work in LMM and XMM modes (more slowly, of course). Does that sound OK?
Eric
Just the man I need. It's 3.00am here in Finland and I'm having a sleepless night hacking out a new FDX that uses the taskswitch. I agree that the proto-threads version is just not quite cool enough and we can do 115200 with coroutines and jmpret in C.
I have no objection to dumping the taskswitch builtin if it can be done another way. In fact I was thinking of writing a version as you describe with the && operator. If that can be optimized to jmpret it would be brilliant.
Anyway I have seen some issues comming up here tonight.
For example I had a taskswich in a while loop. Both the compiled jmpret and the loop jmp were conditional on the Z flag. Well of course when control gets back from the othe coroutine the Z flag is no longer valid and things would fail.
I have found that using the << operator in my rx coroutine causes it to start using the stack. I'trying to keep everything stack free here.
I would post you the code I have so far but I only have a phone to connect to the world with just now.
Managed to get my draft version of FullDuplexSerial using taskswitch from PC to Android to forum.
Do have a look for "WARNING" in the code where I think the Z flag will get lost during a task switch. That loop could be rewritten as a "do...while" to work around that.
If you get rid of taskswitch this will have to be renamed.
I guess the task switch using goto could be tidied into a macro. In fact the proto-threads macros could be rewritten to use it if we wanted to go that far.
Yes, that's one of the hazards of the __builtin_taskswitch: it's a branch, but gcc doesn't know it's a branch, and so doesn't know that it has to be careful about flags and such. Another reason to get rid of it . The goto version shouldn't have that problem.
Odd. I don't think there's anything about the << operator itself that should cause it to require stack usage. Perhaps the code just got so complicated that it needed more temporary variables than it had registers?
Eric
The << causing stack usage was probably a byproduct of complexity as you say. It's still in the code and the stack usage has gone away.
I will now change FDX to use GCC label variables which should at least work as is now. If the peep hole optimization materializes that will be a bonus.
Here is a fullduplex serial cog driver using the GCC values as labels extension to create the tx and rx threads.
This works up to 115200 baud.!!
WARNING! Not fully tested at 115200 and relies on a few magic numbers lying around in the rx thread used to tweak the timing of the receiver, not nice:(
For some reason this insists on saving a bunch of regs to stack on main start up but after that it is stack free.
You will notice that both the threads are contained within main. I tried to break them out into their own functions but the resulting asm did not look at all right.
There are still a few issues with it:
1) Those magic numbers in rx bug me. They were just lucky guesses that got it working so I have no idea how marginal or otherwise the timing may be. Must be a sensible way to arrive at them or perhaps with future compiler optimisation they can go away.
2) Those threading macros are a mess. Too many parameters. If anyone has a better idea how to do that. The macros should be in their own header file.
3) The tx/rx functions should be in their own interface file.
I guess everyone is busy with other important things so I'll just tinker with this when I can.
On your point (2), gcc has a special macro __COUNTER__ which increments; you could probably use that to remove the various "line" parameters. Another option would be to use the __LINE__ macro, which expands to the line number of the input source file.
Eric
Can't even use __LINE__ as a param to the macro calls.
__COUNTER__ would give me the same problem.
Just defining concat(x,y) as x ## y won't do the trick, it has to go to another level of indirection to force the arguments to be substituted properly.
Eric
1) Uses "Heater Threads" much smaller an faster than proto-threads.
2) Runs in cog and works well at 115200 baud.
3) I streamlined the reciever and managed to get rid of all "fiddle factors" that were tweeking th timing of previous versions.
Wow Heater Threads. Nice code too.
Were you able to push it past 115200?
I'll try it in the morning.
"Nice code too."
Thanks, I spent a good long time making "pretty" and easy to read which had the side effect of making it work better.:)
PROBLEM: When I make my mailbox structure "static" nothing works any more. You will find a comment in the code "FAILS" where I have commented out the "static volatile ..." declaration. No idea what is going on here and no time to look just now.
I just noticed that proto threads has an option to use the GCC labels as values feture instead of the tricks it normally does with standard C switch statements.
However heater-threads are smaller and faster as they jump directly from thread to thread without having to return to a main line on every thread switch.