Surprising propgcc code generation.

TomUdale · 2016-03-31 17:31

Hi All,

I am just putting the finishing touches on the firmware for a commercial Propeller application using propgcc. We are very tight on memory in several of our cogc cogs, so I was messing around trying to optimize things. We normally pass pointers to data for each cog via the PAR pointer in cogstart rather than access global variables because, well, that is how I think: you never know when you might want two cogs running the same routine. Of course, in the end all the cogs here are single instances, so I figured the compiler could probably do better with global data than with loading through a pointer all the time, since it can (in principle) precalculate the exact address of the data it is reading (this logic is based on PC programming, not my vast - ahem - Prop knowledge).

Well, I discovered I was completely wrong. Not only is reading from global variables not smaller than reading via a passed pointer, it is actually quite a bit larger. After poking at the assembly for a while, I discovered that the global variable seems to disable a very reasonable optimization.

Given a function call like this:

foo(pointer->a,pointer->b,pointer->c);

where pointer points to a struct with byte members a, b and c,

the compiler generates something like this:

mov     r7,r12  ;save address of pointer (previously saved in r12)
rdbyte  r0,r7   ;move a into r0
add     r7,#1   ;point r7 to b
rdbyte  r1,r7   ;move b into r1
add     r7,#1   ;point r7 to c
rdbyte  r2,r7   ;move c into r2
call foo        ;call foo (which I guess takes parameters in registers r0, r1...)

Now, if I simply change the code to be

foo(globalVar.a,globalVar.b,globalVar.c);

Where globalVar is the name of the variable pointed to by "pointer" in the first example, then we get this:

mov     r7,r12  ;save address of pointer (previously saved in r12)
rdbyte  r0,r7   ;move a into r0
mov     r7,r12  ;save address of pointer (previously saved in r12)
add     r7,#1   ;point r7 to b
rdbyte  r1,r7   ;move b into r1
mov     r7,r12  ;save address of pointer (previously saved in r12)
add     r7,#2   ;point r7 to c
rdbyte  r2,r7   ;move c into r2
call foo        ;call foo

You can see that the nice optimization that simply increments r7 to point from one member of the struct to the next is replaced by code that reloads and explicitly calculates the offset of each member every time.
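For readers who want to reproduce the comparison, here is a minimal C sketch of the two call forms. The struct layout, the name `settings`, the body of `foo`, and the wrapper function names are all assumptions made for illustration; the post only says the struct has byte members a, b and c. Compiling each wrapper to assembly (for example with GCC's standard `-S` option) should make the difference in the generated loads visible, though the exact output will depend on the compiler version and options.

```c
#include <stdint.h>

/* Hypothetical stand-in for the struct in the post: three byte members. */
struct settings {
    uint8_t a;
    uint8_t b;
    uint8_t c;
};

/* Single global instance, as used by the second call form. */
struct settings globalVar = { 1, 2, 3 };

/* Hypothetical callee; it packs its arguments so the result is easy to check. */
int foo(uint8_t a, uint8_t b, uint8_t c)
{
    return (a << 16) | (b << 8) | c;
}

/* Form 1: members read through a pointer (for example one received via PAR). */
int call_via_pointer(const struct settings *pointer)
{
    return foo(pointer->a, pointer->b, pointer->c);
}

/* Form 2: members read directly from the global by name. */
int call_via_global(void)
{
    return foo(globalVar.a, globalVar.b, globalVar.c);
}
```

Both wrappers compute the same value when `pointer` is `&globalVar`; only the generated address arithmetic is expected to differ.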

Now, given the way that COGC is implemented, I am not completely surprised that you could not achieve my hoped-for optimal result of

rdbyte  r0,#456  ;move a into r0 
rdbyte  r1,#457  ;move b into r1
rdbyte  r2,#458  ;move c into r2
call foo         ;call foo

as I expect that would require quite some linker trickery to resolve the final address of the struct, but I certainly did not expect things to get worse.

This is obviously a very minor point and I only bring it up because the compiler is in general so excellent in terms of optimizations that this really stood out to me. By and large, the source level optimizations I have been trying have already been performed by the compiler and don't get me anything. But I figured you might want to know about this (if not by design) so maybe you can see why there is the divergence in the optimization results for the two cases.

All the best,

Tom

DavidZemon · 2016-03-31 17:56

Hi Tom, glad to find another PropGCC user. If you're looking for every last byte, try the latest PropGCC development which is based on GCC 6.0.

As for your ideal dream of referencing the exact memory address, not only is it difficult, it's not always possible. The source and destination fields of a Propeller Assembly instruction are each only 9 bits wide, so they cannot directly address most of HUB memory.
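To put numbers on the 9-bit limit, here is a small C sketch. It assumes a Propeller 1 with 32 KB of hub RAM; the macro and function names are made up for this illustration. A 9-bit immediate can only express values 0 through 511, so only the first 512 bytes of hub memory could ever be named directly in an instruction.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIELD_BITS     9u               /* width of a source/destination field */
#define HUB_RAM_BYTES  (32u * 1024u)    /* assumption: Propeller 1 hub RAM size */

/* Number of distinct addresses a 9-bit immediate can express. */
uint32_t directly_addressable_bytes(void)
{
    return 1u << FIELD_BITS;
}

/* True if a hub address is small enough to fit in a 9-bit immediate. */
bool fits_in_field(uint32_t hub_address)
{
    return hub_address < directly_addressable_bytes();
}

/* How many times larger hub RAM is than the directly addressable range. */
uint32_t hub_to_field_ratio(void)
{
    return HUB_RAM_BYTES / directly_addressable_bytes();
}
```

Under these assumptions the illustrative address 456 from the first post would happen to fit, but anything at 512 or above would not, which covers all but 1/64 of hub RAM.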

DavidZemon · 2016-03-31 17:58

You can find the latest PropGCC download links in a number of places, including PropWare's related links page: http://david.zemon.name/PropWare/RelatedLinks.xhtml

TomUdale · 2016-03-31 18:03

Hi David,

DavidZemon wrote: »

Hi Tom, glad to find another PropGCC user. If you're looking for every last byte, try the latest PropGCC development which is based on GCC 6.0.

Thanks. I have been lazily just getting whatever comes with SimpleIDE but I will check to see what the most recent version does for me.

DavidZemon wrote: »

As for your ideal dream of referencing the exact memory address, not only is it difficult, it's not always possible. The source and destination fields of a Propeller Assembly instruction are each only 9 bits wide, so they cannot directly address most of HUB memory.

Ah, it is worse than I thought. Well, it was indeed just a dream.

All the best,

Tom

DavidZemon · 2016-03-31 18:10

Btw, you might do some investigation into fcache. I've found fcache + CMM code to be a fantastic compromise between small code and fast code. I have two such examples in PropWare's I2CBase class https://github.com/parallaxinc/PropWare/blob/develop/PropWare/i2cbase.h.

And here's some official docs on fcache (I just wrote some preprocessor macros to make it a little easier - and make it so the same code is compatible when compiled in COG or LMM/CMM mode) https://sites.google.com/site/propellergcc/documentation/faq#TOC-Q:-How-do-I-put-an-inline-assembly-loop-into-the-fcache-

And a description of fcache at the bottom of this page: http://propgcc.googlecode.com/hg/doc/Memory.html

ersmith · 2016-03-31 18:30

Tom:

Thanks for your post. It is interesting, but perhaps not as surprising as one might think at first. In general I think modern compilers will be able to optimize expressions involving local variables (including parameters) better than ones involving global variables. The reason is that globals can be changed in all sorts of ways beyond the compiler's control (other threads may be writing to them, or pointer aliasing may cause unexpected updates). In this particular case the compiler probably could have produced the same code for both, but usually the local variable / parameter will do better.
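The pointer aliasing concern can be illustrated with a short C sketch. The struct, the variable names, and the function are hypothetical and not taken from Tom's firmware. The point is only that a store through a pointer may land inside a global, so the compiler generally cannot keep the global's value cached in a register across that store. This shows why values of globals get reloaded; it does not by itself account for the address being recomputed in Tom's listing.

```c
#include <stdint.h>

/* Hypothetical global with byte members, mirroring the struct in the thread. */
struct settings {
    uint8_t a;
    uint8_t b;
    uint8_t c;
};

struct settings globalVar = { 1, 2, 3 };

/* A byte that is definitely not part of globalVar. */
uint8_t unrelated = 0;

/*
 * Reads globalVar.a, stores through 'out', then reads globalVar.a again.
 * Because 'out' might point at globalVar.a, the second read has to come
 * from memory rather than reuse the first value.
 */
int read_store_read(uint8_t *out)
{
    int first = globalVar.a;    /* first load of the global */
    *out = 7;                   /* may or may not write into globalVar */
    int second = globalVar.a;   /* must observe the store if it aliased */
    return first + second;
}
```

Calling `read_store_read(&unrelated)` leaves `globalVar.a` at 1 and returns 2, while `read_store_read(&globalVar.a)` returns 8 because the store changes the value between the two reads.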

Regards,
Eric

JasonDorie · 2016-03-31 23:39

A global variable can't *move* can it? In that case, the expected optimization is perfectly reasonable. If the global is a struct, with a fixed address in memory, there should be no need to store a pointer to it each time you want to access a member in sequence. In theory, the optimizer should be able to see that the address store is redundant, and just apply the necessary offset, no?

TomUdale · 2016-04-01 13:58

Hi Eric,

ersmith wrote: »

It is interesting, but perhaps not as surprising as one might think at first. In general I think modern compilers will be able to optimize expressions involving local variables (including parameters) better than ones involving global variables. The reason is that globals can be changed in all sorts of ways beyond the compiler's control (other threads may be writing to them, or pointer aliasing may cause unexpected updates). In this particular case the compiler probably could have produced the same code for both, but usually the local variable / parameter will do better.

I see. I gather then that you all are not in control of the code paths that would be making those optimization decisions.

It is still interesting to me because I would think that the data pointed to by a local pointer to an unknown object would be subject to at least as many caveats and surprises as the global name. Certainly I could see a huge difference for a local object, but the local pointer strikes me as very similar to the global name (which I think is basically just an address in the end anyway).

But it is all moot if you are at the mercy of the gcc guys on this anyway.

All the best,

Tom
