Shop OBEX P1 Docs P2 Docs Learn Events
Labels as Values with -mcog crashes compiler — Parallax Forums

Labels as Values with -mcog crashes compiler

Heater.Heater. Posts: 21,230
edited 2011-10-23 00:17 in Propeller 1
I was looking to use the GCC feature of "labels as values" in some COG code. Would be useful for jump tables and thread switching etc. Turns out to crash the compiler:

Attached is a quick experiment with labels as values which compiles fine as LMM code but fails as below when -mcog is used. No matter the optimization level.
heater@dadada.da:~$ propeller-elf-gcc -mcog  -S -Os -Wall -o varlab.S varlab.c
varlab.c: In function 'threads':
varlab.c:64:1: error: unrecognizable insn:
(insn 109 2 5 2 (set (reg:SI 7 r7)
        (symbol_ref/u:SI ("*L_LC16") [flags 0x2])) varlab.c:14 -1
     (nil))
varlab.c:64:1: internal compiler error: in extract_insn, at recog.c:2109
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
c
c

Comments

  • ersmithersmith Posts: 6,102
    edited 2011-10-05 15:08
    The problem is that gcc always treats addresses as byte quantities. So for example in the code below:
    label1 mov r0,#0
    label2 mov r1,#1
    label3 mov r2, #2
    
    The value of "label1" is 0, "label2" is 4, and "label3" is 8.

    This works great in lmm mode. It's a problem in "-mcog" mode, though. gas knows that some instructions (notably jmpret) should have any immediate addresses divided by 4. But we don't know that the immediate label values should be so treated, and so they end up in a form that is not suitable for use by cog instructions.

    Actually it's kind of a good sign that you got the crash report you did... it means we *may* be able to adjust the label values in -mcog mode. I'll look into the code generator to see if I can catch that specific case and adjust as needed.

    Thanks for the bug report!

    Eric
  • ersmithersmith Posts: 6,102
    edited 2011-10-05 16:50
    I think I have labels as values working correctly in -mcog mode now. Some other kinds of indirect jumps (e.g. to function pointers) probably will not work, but labels should be OK. Please let me know if you run into any other bugs.

    Thanks,
    Eric
  • Heater.Heater. Posts: 21,230
    edited 2011-10-06 03:55
    Wow, fixed already. No chance to test on the metal yet but I notice that for example:

    rxState = &&rx_state_0;

    now generates:

    mov _rxState, #L_L4/4

    So I guess that the divide by 4 you mentioned.
  • Heater.Heater. Posts: 21,230
    edited 2011-10-07 06:05
    Labels as Values is not entirely working out in COG mode at least.

    I added the following global declarations to my full_duplex_serial_pt.c:
    static volatile _COGMEM void* txState;
    static volatile _COGMEM void* rxState;
    
    I then tried to set those values from labels within my tx_thread:
    txState = &&tx_state_0;
    rxState = &&rx_state_0;
    
    I then compile to assembler with:
    propeller-elf-gcc -Os -mcog -I../libpropeller/../include -o full_duplex_serial_pt.S -S full_duplex_serial_pt.c
    
    And inspect the PASM produced where for those assignments I see a sequence like:
    mov     _txState, #(L_L2/4)
            mov     _rxState, #(L_L3/4)
    
    All looks good. Until I try to build the entire program which fails with:
    propeller-elf-gcc -Os -mcog -I../libpropeller/../include -o full_duplex_serial_pt.o -c full_duplex_serial_pt.c
    full_duplex_serial_pt.c: In function 'main':
    full_duplex_serial_pt.c:161:11: warning: assignment discards 'volatile' qualifier from pointer target type [enabled by default]
    full_duplex_serial_pt.c:162:11: warning: assignment discards 'volatile' qualifier from pointer target type [enabled by default]
    /tmp/ccvdjADT.s: Assembler messages:
    /tmp/ccvdjADT.s:5: Error: invalid sections for operation on `L_L2' and `L0'
    /tmp/ccvdjADT.s:6: Error: invalid sections for operation on `L_L3' and `L0'
    make: *** [full_duplex_serial_pt.o] Error 1
    
  • Heater.Heater. Posts: 21,230
    edited 2011-10-07 06:54
    So it turns out that gas does not like the divide by 4 in the generated code for && operator.

    Assembling my hand made .S file with:
    propeller-elf-gcc -Os -mcog -I../libpropeller/../include -o full_duplex_serial_pt.o -c full_duplex_serial_pt.S
    

    produces the same errors.

    Removing the / from "mov _txState, #(L_L2 / 4)" removes the error.
  • ersmithersmith Posts: 6,102
    edited 2011-10-07 10:06
    Thanks for tracking this down, heater. It looks like we may need to make some assembler changes to support this. I'll look into it some more.

    Eric
  • ersmithersmith Posts: 6,102
    edited 2011-10-07 10:21
    OK, I've updated the assembler to include a "mova" instruction that knows to divide the address by 4 (basically it treats an immediate value the same way the jmpret instruction does). The compiler now knows to emit this for moving cog memory addresses into registers, instead of trying to do the /4. This one should work for you -- at least, the generated .o files looks OK.

    Thanks for the bug reports,
    Eric
  • Heater.Heater. Posts: 21,230
    edited 2011-10-07 11:52
    Turns out the same /4 problem exists with getting function addresses. I guess your new mova instruction will have fixed that as well.
    Hopefully I can find time to check out your new version this weekend. I really want to see a fast FullDuplexSerial in C and that will rely on jmpret or "goto variable" which needs to be able to pick up the addresses to start with.
  • ersmithersmith Posts: 6,102
    edited 2011-10-07 15:17
    Your project is really cool, Heater -- it would definitely be interesting to have a fast FullDuplexSerial in C (what you've already done with pt is amazing, but getting up to 115200 would really be the icing on the cake!).

    On the subject of jmpret: I've been really uncomfortable about __builtin_propeller_taskswitch (or __builtin_propeller_jmpret) because it only works in cog mode. How about this instead: I think we can add a peephole optimizer to recognize that
      txdest = & & next;
      goto *rxdest;
    next:
    
    can be compiled to:
      jmpret txdest, rxdest
    next:
    
    That would remove the need for __builtin_propeller_jmpret, wouldn't it? The code produced would be the same in cog mode, but it would also work in LMM and XMM modes (more slowly, of course). Does that sound OK?

    Eric
  • Heater.Heater. Posts: 21,230
    edited 2011-10-07 17:00
    Eric,
    Just the man I need. It's 3.00am here in Finland and I'm having a sleepless night hacking out a new FDX that uses the taskswitch. I agree that the proto-threads version is just not quite cool enough and we can do 115200 with coroutines and jmpret in C.

    I have no objection to dumping the taskswitch builtin if it can be done another way. In fact I was thinking of writing a version as you describe with the && operator. If that can be optimized to jmpret it would be brilliant.

    Anyway I have seen some issues comming up here tonight.
    For example I had a taskswich in a while loop. Both the compiled jmpret and the loop jmp were conditional on the Z flag. Well of course when control gets back from the othe coroutine the Z flag is no longer valid and things would fail.

    I have found that using the << operator in my rx coroutine causes it to start using the stack. I'trying to keep everything stack free here.

    I would post you the code I have so far but I only have a phone to connect to the world with just now.
  • Heater.Heater. Posts: 21,230
    edited 2011-10-07 18:27
    Eric,

    Managed to get my draft version of FullDuplexSerial using taskswitch from PC to Android to forum.

    Do have a look for "WARNING" in the code where I think the Z flag will get lost during a task switch. That loop could be rewritten as a "do...while" to work around that.

    If you get rid of taskswitch this will have to be renamed.
    I guess the task switch using goto could be tidied into a macro. In fact the proto-threads macros could be rewritten to use it if we wanted to go that far.
  • ersmithersmith Posts: 6,102
    edited 2011-10-07 18:51
    Heater. wrote: »
    Anyway I have seen some issues comming up here tonight.
    For example I had a taskswich in a while loop. Both the compiled jmpret and the loop jmp were conditional on the Z flag. Well of course when control gets back from the othe coroutine the Z flag is no longer valid and things would fail.

    Yes, that's one of the hazards of the __builtin_taskswitch: it's a branch, but gcc doesn't know it's a branch, and so doesn't know that it has to be careful about flags and such. Another reason to get rid of it :). The goto version shouldn't have that problem.
    I have found that using the << operator in my rx coroutine causes it to start using the stack. I'trying to keep everything stack free here.

    Odd. I don't think there's anything about the << operator itself that should cause it to require stack usage. Perhaps the code just got so complicated that it needed more temporary variables than it had registers?

    Eric
  • Heater.Heater. Posts: 21,230
    edited 2011-10-08 00:21
    Ok, that's it then. I'm convinced. taskswitch/jmpret as a built in has to go. It's positively dangerous. That kind of behavior could waste someone a lot of debugging time if they were not expecting it.
    The << causing stack usage was probably a byproduct of complexity as you say. It's still in the code and the stack usage has gone away.
    I will now change FDX to use GCC label variables which should at least work as is now. If the peep hole optimization materializes that will be a bonus.
  • Heater.Heater. Posts: 21,230
    edited 2011-10-09 12:45
    Labels as variables looks to be in good shape for COG code now.

    Here is a fullduplex serial cog driver using the GCC values as labels extension to create the tx and rx threads.

    This works up to 115200 baud.!!

    WARNING! Not fully tested at 115200 and relies on a few magic numbers lying around in the rx thread used to tweak the timing of the receiver, not nice:(

    For some reason this insists on saving a bunch of regs to stack on main start up but after that it is stack free.
    You will notice that both the threads are contained within main. I tried to break them out into their own functions but the resulting asm did not look at all right.
  • Heater.Heater. Posts: 21,230
    edited 2011-10-11 00:13
    Yay, full_duplex_serial_vl has been echoing data all night at 115200 baud without error.
    There are still a few issues with it:
    1) Those magic numbers in rx bug me. They were just lucky guesses that got it working so I have no idea how marginal or otherwise the timing may be. Must be a sensible way to arrive at them or perhaps with future compiler optimisation they can go away.
    2) Those threading macros are a mess. Too many parameters. If anyone has a better idea how to do that. The macros should be in their own header file.
    3) The tx/rx functions should be in their own interface file.

    I guess everyone is busy with other important things so I'll just tinker with this when I can.
  • ersmithersmith Posts: 6,102
    edited 2011-10-12 14:09
    That looks very cool!

    On your point (2), gcc has a special macro __COUNTER__ which increments; you could probably use that to remove the various "line" parameters. Another option would be to use the __LINE__ macro, which expands to the line number of the input source file.

    Eric
  • Heater.Heater. Posts: 21,230
    edited 2011-10-12 14:25
    So far I have failed to figure out how to build a label, starting with an alpha character, using the number returned by the __LINE__ macro. You see I have next##line in the macros. If I use next##__LINE__ it does not do what I want.
    Can't even use __LINE__ as a param to the macro calls.

    __COUNTER__ would give me the same problem.
  • ersmithersmith Posts: 6,102
    edited 2011-10-13 05:09
    The C preprocessor is pretty tricky sometimes. In order to expand __LINE__ at the right time you have to do some indirection with the ## macro; in fact you need a double indirection (something similar to what is often done with the stringize operator #):
    #define concat_impl(x, y) x ## y
    #define concat(x, y) concat_impl(x, y)
    
    #define WAIT_WHILE(thisThread, nextThread, cond) \
        while(cond)                                                            \
        {                                                             \
            thisThread = && concat(next_,__LINE__); goto *thatThread; concat(next_,__LINE__) :; \
        }
    

    Just defining concat(x,y) as x ## y won't do the trick, it has to go to another level of indirection to force the arguments to be substituted properly.

    Eric
  • Heater.Heater. Posts: 21,230
    edited 2011-10-16 12:22
    Here is a full duplex serial using Erics macro advice.
    1) Uses "Heater Threads" much smaller an faster than proto-threads.
    2) Runs in cog and works well at 115200 baud.
    3) I streamlined the reciever and managed to get rid of all "fiddle factors" that were tweeking th timing of previous versions.
  • jazzedjazzed Posts: 11,803
    edited 2011-10-16 22:43
    Heater. wrote: »
    Here is a full duplex serial using Erics macro advice.
    1) Uses "Heater Threads" much smaller an faster than proto-threads.
    2) Runs in cog and works well at 115200 baud.
    3) I streamlined the reciever and managed to get rid of all "fiddle factors" that were tweeking th timing of previous versions.

    Wow :) Heater Threads. Nice code too.
    Were you able to push it past 115200?
    I'll try it in the morning.
  • Heater.Heater. Posts: 21,230
    edited 2011-10-17 03:42
    Jazzed,

    "Nice code too."

    Thanks, I spent a good long time making "pretty" and easy to read which had the side effect of making it work better.:)

    PROBLEM: When I make my mailbox structure "static" nothing works any more. You will find a comment in the code "FAILS" where I have commented out the "static volatile ..." declaration. No idea what is going on here and no time to look just now.
  • jazzedjazzed Posts: 11,803
    edited 2011-10-22 12:14
    @Heater, I want to add your super-powered demo to the repository.
  • Heater.Heater. Posts: 21,230
    edited 2011-10-23 00:17
    It's there for the taking.
    I just noticed that proto threads has an option to use the GCC labels as values feture instead of the tricks it normally does with standard C switch statements.
    However heater-threads are smaller and faster as they jump directly from thread to thread without having to return to a main line on every thread switch.
Sign In or Register to comment.