CogC. Why so many useless mov commands ? — Parallax Forums

dnalor Posts: 223
edited 2012-09-09 03:45 in Propeller 1
In the assembly listing of a compiled cogc file I see many useless mov commands.

If I have two variables:
_COGMEM volatile _UDWORD udwV1;
_COGMEM volatile _UDWORD udwAnswer1;

The line:
udwAnswer1 = udwV1;

is "translated" into this assembler code:
mov    r7, _udwV1
mov    _udwAnswer1, r7

Why ?
Optimization level seems to be irrelevant.

Is there a way to change this behavior or is it possible to declare variables as registers ?

Comments

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2012-09-07 11:53
    Although I don't program in C, this question interests me from a theoretical standpoint. My guess is that it has nothing to do with optimization but, rather, the object programming model implemented by the compiler. If it's a one-address programming model (i.e. register->accumulator->register), it may be hard to coax it to produce two-address object code. But let's see what the compiler writers have to say about it. ...

    -Phil
  • ersmith Posts: 6,095
    edited 2012-09-07 12:32
    dnalor wrote: »
    In the assembly listing of a compiled cogc file I see many useless mov commands.

    If I have two variables:
    _COGMEM volatile _UDWORD udwV1;
    _COGMEM volatile _UDWORD udwAnswer1;
    

    The output is awkward because of the "volatile" keyword on the variables. GCC has to be very conservative about removing any read or write accesses to volatile memory, and so many optimizations on volatile variables are suppressed.

    "volatile" says that the contents of the variable can change unexpectedly, either because of hardware (the variable is connected to some hardware) or because other threads can read/write the variable. Unless you're doing some fancy multiple threading within the cog, it's usually not necessary to specify "volatile" for any _COGMEM variables (other than the hardware registers, which are already defined in <propeller.h>).
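The difference described here can be reproduced with two pairs of cog variables, one volatile and one not. This is only a minimal sketch: `uint32_t` stands in for the poster's `_UDWORD` typedef, `udwPlainSrc`/`udwPlainDst` are hypothetical names, and the `__PROPELLER__` check is an assumption about what PropGCC predefines. On a host compiler `_COGMEM` is simply defined away.

```c
#include <stdint.h>

/* __PROPELLER__ is assumed to be predefined by PropGCC; elsewhere _COGMEM
   is defined away so the sketch still builds with a host compiler. */
#ifdef __PROPELLER__
#include <propeller.h>
#else
#define _COGMEM
#endif

/* uint32_t stands in for the poster's _UDWORD typedef (an assumption). */
_COGMEM volatile uint32_t udwV1;       /* every access must be emitted as written */
_COGMEM volatile uint32_t udwAnswer1;

_COGMEM uint32_t udwPlainSrc;          /* hypothetical non-volatile counterparts */
_COGMEM uint32_t udwPlainDst;

/* Volatile copy: reported in this thread to go through a scratch register. */
void copy_volatile(void)
{
    udwAnswer1 = udwV1;
}

/* Non-volatile copy: reported to become a single memory-to-memory mov. */
void copy_plain(void)
{
    udwPlainDst = udwPlainSrc;
}
```

Both functions behave identically from the program's point of view; only the generated assembly differs, so comparing the two in the listing file is the quickest way to see the effect.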
  • Kye Posts: 2,200
    edited 2012-09-07 12:32
    I believe GCC understands only loading and storing from registers. This means it thinks that it needs to use a register to move one piece of memory to another place in cog ram. For LMM/XMM modes this is certainly the case.


    Thanks,
  • ersmith Posts: 6,095
    edited 2012-09-07 12:34
    Kye wrote: »
    I believe GCC understands only loading and storing from registers. This means it thinks that it needs to use a register to move one piece of memory to another place in cog ram. For LMM/XMM modes this is certainly the case.

    Actually GCC will allow memory to memory moves on non-volatile values, and so taking the "volatile" keyword off you'll get the expected:
        mov    _udwAnswer1, _udwV1
    

    Mind you, there are still some optimization opportunities that gcc is missing. There's always room for improvement!

    Eric
  • Kye Posts: 2,200
    edited 2012-09-07 12:35
    Oh, that's cool that works!
  • David Betz Posts: 14,516
    edited 2012-09-07 12:38
    ersmith wrote: »
    Actually GCC will allow memory to memory moves on non-volatile values, and so taking the "volatile" keyword off you'll get the expected:
        mov    _udwAnswer1, _udwV1
    

    Mind you, there are still some optimization opportunities that gcc is missing. There's always room for improvement!

    Eric
    It seems kind of odd that volatile will prevent this optimization. Why would moving the value through a register be more correct for volatile variables?
  • ersmith Posts: 6,095
    edited 2012-09-07 13:01
    David Betz wrote: »
    It seems kind of odd that volatile will prevent this optimization. Why would moving the value through a register be more correct for volatile variables?

    Remember that _COGMEM is still memory as far as GCC is concerned, and I guess on some architectures the semantics of memory to memory moves are not compatible with "volatile" (maybe depending on the bus width the order of byte accesses is different?). Or maybe it's just an artifact of how GCC parses variables internally -- initially everything is moved into registers, and later the register moves are optimized away, but those optimizations are probably suppressed by the volatile keyword.

    I have put some propeller specific peephole optimizations to improve this kind of thing in the "performance" branch of propgcc, but those aren't stable enough for release yet.

    Eric
  • dnalor Posts: 223
    edited 2012-09-08 05:41
    ersmith wrote: »
    Actually GCC will allow memory to memory moves on non-volatile values, and so taking the "volatile" keyword off you'll get the expected:
        mov    _udwAnswer1, _udwV1
    

    Mind you, there are still some optimization opportunities that gcc is missing. There's always room for improvement!

    Eric

    Ok. It's true. Removed the volatile keyword and it looks better in this case.
    And that also explains why
    OUTA |= 0x01;
    
    does not give
    or OUTA, #1
    

    So the best solution seems to be to use local variables whenever possible. But then you run into stack usage very quickly, because 15 registers are not always enough.

    Is it possible to have more registers? Maybe 32?

    OK, you would maybe lose some code space, but if you can save 4 stack accesses, then you win.
    And unused register space could be reused with a variable declared as a register.
    register _UDWORD udwReg10 asm("r10");
    
    Seems to work.
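On the `OUTA |= 0x01` observation in this post: in C, a compound assignment on a volatile object is one read followed by one write, which is roughly the two-step form spelled out below. `fake_outa` is a hypothetical stand-in for the real OUTA register so that the sketch runs on a host compiler; it illustrates the C semantics only and makes no claim about what PropGCC's code generator does internally.

```c
#include <stdint.h>

/* Hypothetical stand-in for the OUTA register, so this runs on a host. */
volatile uint32_t fake_outa;

/* Compound form, as written in the post. */
void set_bit0_compound(void)
{
    fake_outa |= 0x01u;
}

/* The same operation spelled out: one volatile read, one volatile write. */
void set_bit0_explicit(void)
{
    uint32_t tmp = fake_outa;   /* read */
    tmp |= 0x01u;               /* modify in a temporary */
    fake_outa = tmp;            /* write */
}
```

A compiler that keeps volatile accesses strictly separate will tend to emit the second shape (mov, or, mov) even when the source uses the first.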
  • ersmith Posts: 6,095
    edited 2012-09-08 05:54
    dnalor wrote: »
    So the best solution seems to be to use local variables whenever possible. But then you run into stack usage very quickly, because 15 registers are not always enough.
    You can use _COGMEM registers all you want. Just don't declare them volatile. You only need the volatile keyword for variables that can be changed by more than one thread, and having more than one thread running in the same cog is rare. Heater did it in the FullDuplexSerial_ht example, but that's the only case I know of COG C code needing multiple threads. Even then, not all variables need to be volatile, only the ones shared by different threads.
  • dnalor Posts: 223
    edited 2012-09-08 06:39
    You don't understand me!
    In this subroutine udwI is not volatile, and yet at the end of the loop you still see the useless moves. If you use a local variable i instead, this does not happen.
    219:alfatdriver.cogc ****     for(udwI = 0; udwI < length; udwI++)
     148                      .loc 1 219 0
     149 00f0 00EEFCA0         mov    _udwI, #0
     150 00f4 00007C86         cmp    r14, #0 wz
     151 00f8 4900685C         IF_E     jmp    #.L9
     152                  .LVL16
     153                  .L13
     220:alfatdriver.cogc ****     {
     221:alfatdriver.cogc ****         b = udwPutByte(' ');
     154                      .loc 1 221 0
     155 00fc 2000FCA0         mov    r0, #32
     156 0100 0022FC5C         call    #_udwPutByte
     157                  .LVL17
     222:alfatdriver.cogc ****         if(ubpBuffer)
     158                      .loc 1 222 0
     159 0104 7600BCA2         mov    r7,_ubpBuffer wz
     223:alfatdriver.cogc ****             ubpBuffer[udwI] = b;
     160                      .loc 1 223 0
     161 0108 77009480         IF_NE add    r7, _udwI
     162 010c 00001400         IF_NE wrbyte    r0, r7
     219:alfatdriver.cogc ****     for(udwI = 0; udwI < length; udwI++)
     163                      .loc 1 219 0
     164 0110 7700BCA0         mov    r7, _udwI
     165 0114 0100FC80         add    r7, #1
     166 0118 00EEBCA0         mov    _udwI, r7
     167 011c 00003C85         cmp    r7, r14 wc
     168 0120 3F00705C         IF_B     jmp    #.L13
     169                  .LVL18
     170                  .L9
    

    But then you often get stack usage:
    sub    sp, #4
     124                  .LCFI6
     125 00c8 00003C08         wrlong    r13, sp
     126                  .LCFI7
     127 00cc 0400FC84         sub    sp, #4
     128                  .LCFI8
     129 00d0 00003C08         wrlong    r14, sp
    

    But if you had not only r0...r14 but, say, r0...r30, then this might not happen. I do not know how many registers gcc can handle, though.
  • ersmith Posts: 6,095
    edited 2012-09-09 02:56
    dnalor wrote: »
    But if you had not only r0...r14 but, say, r0...r30, then this might not happen. I do not know how many registers gcc can handle, though.

    Ah, I see the problem. Thanks for the idea -- it's a good one. Unfortunately, whatever number of registers we pick will probably be wrong for some people -- either too many or not enough.

    I think the better answer is to keep improving the Propeller specific code generator so that _COGMEM variables are as good as register variables. We're still working on that. For example, the CMM preview does produce better code than the original beta; things like:
    _DIRA |= 1
    
    should now produce
      or DIRA, #1
    
    at least some of the time.

    In the meantime using a mixture of local variables (for anything that does not need to be preserved across function calls) and COGMEM variables is probably the best way to work around this issue.
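The suggested mixture might look something like the sketch below: the buffer pointer, which must persist between calls, stays in non-volatile cog memory, while the loop counter becomes a local. `put_byte_stub` is a hypothetical stand-in for the `udwPutByte()` routine from the listing, `fill_buffer` is an invented name, and the `__PROPELLER__` check is an assumption about what PropGCC predefines.

```c
#include <stdint.h>

/* __PROPELLER__ is assumed to be predefined by PropGCC; elsewhere _COGMEM
   is defined away so the sketch still builds with a host compiler. */
#ifdef __PROPELLER__
#include <propeller.h>
#else
#define _COGMEM
#endif

/* State that must survive across calls stays in (non-volatile) cog memory. */
_COGMEM uint8_t *ubpBuffer;

/* Hypothetical stand-in for udwPutByte(); it just echoes its argument. */
static uint32_t put_byte_stub(uint32_t c)
{
    return c;
}

/* The loop counter is a local, so the compiler may keep it in a register. */
void fill_buffer(uint32_t length)
{
    uint32_t i;
    for (i = 0; i < length; i++) {
        uint32_t b = put_byte_stub(' ');
        if (ubpBuffer)
            ubpBuffer[i] = (uint8_t)b;
    }
}
```

As pointed out earlier in the thread, a local that is live across a function call still has to be preserved somewhere, so some stack traffic can remain in the prologue even with this pattern.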

    Thank you for thinking about this, though! It's good that people are questioning the code generation and offering these suggestions.

    Regards,
    Eric
  • dnalor Posts: 223
    edited 2012-09-09 03:45
    Eric, thank you for the answer.
    ersmith wrote: »
    I think the better answer is to keep improving the Propeller specific code generator so that _COGMEM variables are as good as register variables.
    That would of course be best, if possible. Is making the register count settable via a pragma or the command line too difficult? Maybe impossible?
    ersmith wrote: »
    Thank you for thinking about this, though! It's good that people are questioning the code generation and offering these suggestions.
    That's the way I work with every C compiler at the beginning, to get a feeling for which code gives the desired result and which should not be used. You're well on the way and near the goal.