_COGMEM Automatic Variables
Kye
Posts: 2,200
Is it possible to have lots of _COGMEM automatic variables? I tried defining a bunch of _COGMEM variables outside of a _NAKED main function and I see that code was generated to store to these variables in the asm output. However, it looks like the compiler just stores to the variables and then keeps using its own local or stack variable for computation. My goal was to have it use the _COGMEM variable instead of using the stack.
Can you extend the register space using _COGMEM variables, or does the compiler only benefit from those variables when they have initialized data? The compiler is wasting a huge amount of time loading and storing data to the stack instead of just keeping a copy of the variable in cog RAM.
Comments
If any kind of optimization is enabled then the compiler will keep variables in its own registers (r0-r15) as much as possible, and only spill them out to the stack when it runs out of room. The stack only gets used if you forget to include -O (or -Os, or some other -O variant) on the command line, or if you have a really huge number of local variables. In the latter case it might be worth splitting your function up into smaller, simpler pieces.
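As a rough illustration of the "smaller, simpler pieces" suggestion, here is a minimal sketch in which two independent loops live in separate helpers, so each one has only a couple of live locals at a time. The function names and the data layout are hypothetical and not taken from this thread. Whether this actually avoids spills depends on the compiler version and flags, so it is worth checking the -S output.

```c
#include <stdint.h>

/* Hypothetical helper: only `sum` and `i` are live inside this loop. */
static uint32_t checksum(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;
    uint32_t i;

    for (i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Hypothetical helper: only `best` and `i` are live inside this loop. */
static uint8_t peak(const uint8_t *buf, uint32_t len)
{
    uint8_t best = 0;
    uint32_t i;

    for (i = 0; i < len; i++)
        if (buf[i] > best)
            best = buf[i];
    return best;
}

/* Combines both results: checksum in the upper bits, peak in the low byte. */
uint32_t summarize(const uint8_t *buf, uint32_t len)
{
    return (checksum(buf, len) << 8) | peak(buf, len);
}
```

Keep in mind that the optimizer may inline small static helpers back into the caller, so this is a starting point for experiments rather than a guarantee.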
1. GCC refuses to use cogmem variables the way it uses registers and will try to cache everything on its stack. This makes cogmem variables useless. The compiler will do one store to them for init purposes and that's it.
2. If GCC runs out of register space, it pushes registers onto the stack. In cogmode this wastes all your code space because of all the register thrashing going on. Instead of storing a variable back in cogmem, it pushes it to the stack.
3. If you declare the cogmem variable as volatile, then GCC can't cache it in its registers or overflow stack and has to move the value in and out of registers on every access. While this method uses two more moves than normal, it saves the pushes and pops to and from the stack. I suppose this is the best you can get.
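The volatile approach from point 3 can be sketched as follows. This is only an illustration under assumptions, not the code from this thread: the decoder state, the transition table, and the function name quad_step are hypothetical, and the __PROPELLER__ check is assumed to be the macro PropGCC predefines (the fallback simply lets the sketch compile on a host compiler, where _COGMEM expands to nothing).

```c
#include <stdint.h>

#ifdef __PROPELLER__
#include <propeller.h>   /* provides _COGMEM on PropGCC */
#else
#define _COGMEM          /* host fallback so the sketch compiles anywhere */
#endif

/* Hypothetical decoder state kept in cog RAM. Because both variables are
 * volatile, every access is a real load or store of the variable itself
 * and the compiler may not keep a cached copy elsewhere. */
static _COGMEM volatile int32_t position;
static _COGMEM volatile uint32_t prev_ab;

/* Transition table indexed by (prev_ab << 2) | ab.
 * The sequence 00 -> 01 -> 11 -> 10 counts up, the reverse counts down,
 * and no change or an invalid double step counts as 0. */
static const int8_t quad_delta[16] = {
     0, +1, -1,  0,
    -1,  0,  0, +1,
    +1,  0,  0, -1,
     0, -1, +1,  0
};

/* Feed one 2-bit A/B sample and return the updated position. */
int32_t quad_step(uint32_t ab)
{
    ab &= 3u;
    position += quad_delta[(prev_ab << 2) | ab];
    prev_ab = ab;
    return position;
}
```

Removing volatile from the two declarations and comparing the -S output of both builds is one way to see whether the compiler falls back to stack traffic in a particular function.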
The https://code.google.com/p/propgcc/wiki/PropGccInDepth article says not to declare your cogmem variables as volatile. From what I'm seeing, you have to do the opposite to get the performance and space gain.
Could you post some sample code? That's not the behavior I'm seeing, but I'm using pretty simple functions. For example: produces: which looks correct to me, and there's no stack or compiler registers involved (other than the parameter x).
Right now, the code should be able to complete the inner loop in less than 500 clock cycles (I counted ~470). I need this code to be that fast so that it doesn't miss quadrature encoder samples. If you make the cogmem variables non-volatile, you'll see the horror with all the stack pushes and pops.