PWM timing issue (HUB ram access issue?)

mbv · 2018-12-15 23:22

Hello,

I have a piece of C code which launches a PWM function into a cog using cog_run(). The pwm() function is supposed to generate 40kHz PWM on an H-bridge while the variable "pwm_mode" (a global static volatile int) equals 1. However, I am getting a 29kHz PWM signal. When I change the loop to while(1), so it doesn't need to read "pwm_mode" from the hub, the output is the expected 40kHz. This leads me to believe that reading the variable from hub RAM creates a massive delay, but from what I have read it should only take about 8 clock cycles. I have already thought of several ways to implement this without reading a variable from the hub, but why this delay happens bugs me, so an explanation would be great. I don't have the whole code on hand right now, but below is essentially the pwm() function; main only launches this function and writes to an OLED every 100ms. Memory model is LMM and optimization is -O2 Speed.

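void pwm()
{
  // make the four gate pins outputs
  DIRA = DIRA | (1 << P1G);
  DIRA = DIRA | (1 << N1G);
  DIRA = DIRA | (1 << P2G);
  DIRA = DIRA | (1 << N2G);

  while(1){
    // pwm_mode is a global static volatile int, so every test reads hub RAM
    while(pwm_mode == 1){
      // first half: P gates high, N gates low...
      OUTA = OUTA | (1 << P1G);
      OUTA = OUTA | (1 << P2G);
      OUTA = OUTA & ~(1 << N1G);
      OUTA = OUTA & ~(1 << N2G);

      // ...then P1G low, N2G high
      OUTA = OUTA & ~(1 << P1G);
      OUTA = OUTA | (1 << N2G);

      // 1000-clock half-period, minus 80 to compensate for the code above
      waitcnt(CNT + 1000 - 80);

      // second half: P gates high, N gates low...
      OUTA = OUTA | (1 << P1G);
      OUTA = OUTA | (1 << P2G);
      OUTA = OUTA & ~(1 << N1G);
      OUTA = OUTA & ~(1 << N2G);

      // ...then P2G low, N1G high
      OUTA = OUTA & ~(1 << P2G);
      OUTA = OUTA | (1 << N1G);

      waitcnt(CNT + 1000 - 80);
    }
  }
}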
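The two waitcnt(CNT + 1000 - 80) calls are meant to make each half-period 1000 clocks long, i.e. 2000 clocks per PWM cycle. A minimal sketch of that arithmetic, assuming an 80 MHz system clock (not stated in the post, but it is the clock at which 2000 clocks per cycle gives 40kHz); `pwm_freq_hz` and `CLKFREQ_HZ` are hypothetical names:

```c
#include <stdint.h>

/* Assumption: 80 MHz system clock (e.g. 5 MHz crystal with the x16 PLL). */
#define CLKFREQ_HZ 80000000u

/* PWM frequency for a loop that takes period_clocks system clocks per cycle. */
double pwm_freq_hz(uint32_t clkfreq_hz, uint32_t period_clocks)
{
    return (double)clkfreq_hz / (double)period_clocks;
}
```

With these numbers, pwm_freq_hz(CLKFREQ_HZ, 2 * 1000) gives 40000. If each cycle instead took around 700 clocks longer, pwm_freq_hz(CLKFREQ_HZ, 2700) gives roughly 29630, which is close to the 29kHz being measured.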

evanh · 2018-12-16 02:56

I'm no whizz on propGCC, but LMM code is fetched from hub RAM. So, unless there is some smart caching/optimising going on, every instruction incurs that 16-clock (not 8) hub fetch time. 5 MIPS isn't terrible, though.

To get top performance, I believe there is a way to have code compiled directly to native cog code, so that it runs as a tight program entirely within a cog. I've never written anything for the Prop1, so I don't have details, sorry.

Phil Pilgrim (PhiPi) · 2018-12-16 04:02

To save clocks, precompute your "1 << P1G", etc. Also, combine multiple bitwise | and & statements into one:

void pwm()
{
  unsigned int P1Gmask = 1 << P1G;
  unsigned int N1Gmask = 1 << N1G;
  unsigned int P2Gmask = 1 << P2G;
  unsigned int N2Gmask = 1 << N2G;
  DIRA |= P1Gmask | N1Gmask | P2Gmask | N2Gmask;

  while(1) {
    while(pwm_mode == 1) {

      OUTA |= P1Gmask | P2Gmask;
      OUTA &= ~N1Gmask & ~N2Gmask;
      OUTA &= ~P1Gmask;
      OUTA |= N2Gmask;

      waitcnt(CNT + 1000 - 80);

      OUTA |= P1Gmask | P2Gmask;
      OUTA &= ~N1Gmask & ~N2Gmask;
      OUTA &= ~P2Gmask;
      OUTA |= N1Gmask;

      waitcnt(CNT + 1000 - 80);
    }
  }
}

-Phil
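Whether the statements are written one bit at a time or with precomputed, combined masks, the gate pins should end up in the same state after each half-cycle, and other OUTA bits should be left alone. A host-side sketch that checks this by modelling OUTA as a plain 32-bit value; the pin numbers and function names are hypothetical, since the thread never gives the real ones:

```c
#include <stdint.h>

/* Hypothetical pin numbers for illustration only. */
#define P1G 0
#define N1G 1
#define P2G 2
#define N2G 3

/* First half-cycle written one bit per statement, as in the original post.
   outa models the OUTA register; the updated value is returned. */
uint32_t phase1_long(uint32_t outa)
{
    outa = outa | (1u << P1G);
    outa = outa | (1u << P2G);
    outa = outa & ~(1u << N1G);
    outa = outa & ~(1u << N2G);
    outa = outa & ~(1u << P1G);
    outa = outa | (1u << N2G);
    return outa;
}

/* Same half-cycle with precomputed masks and combined statements. */
uint32_t phase1_masked(uint32_t outa)
{
    const uint32_t p1 = 1u << P1G, n1 = 1u << N1G;
    const uint32_t p2 = 1u << P2G, n2 = 1u << N2G;
    outa |= p1 | p2;      /* P gates high */
    outa &= ~n1 & ~n2;    /* N gates low */
    outa &= ~p1;          /* then P1G low */
    outa |= n2;           /* and N2G high */
    return outa;
}

/* Second half-cycle with masks: ends with P2G low and N1G high. */
uint32_t phase2_masked(uint32_t outa)
{
    const uint32_t p1 = 1u << P1G, n1 = 1u << N1G;
    const uint32_t p2 = 1u << P2G, n2 = 1u << N2G;
    outa |= p1 | p2;      /* P gates high */
    outa &= ~n1 & ~n2;    /* N gates low */
    outa &= ~p2;          /* then P2G low */
    outa |= n1;           /* and N1G high */
    return outa;
}
```

With these pin numbers, phase1_masked(0x3) returns 0xC and phase2_masked(0xC) returns 0x3. One caution if you go further: collapsing a whole half-cycle into a single OUTA write would switch one diagonal off and the other on at the same instant. If, as the pin names suggest, P1G/P2G drive high-side P-channel FETs, that risks shoot-through, so the separate "all gates off" step is worth keeping.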

pmrobert · 2018-12-16 12:42

In addition to Phil's recommendation, you may want to make sure that fcache is enabled, either via the compiler command line or by placing an attribute before the function definition, e.g. "__attribute__((fcache)) void decode(void *par)".

iseries · 2018-12-16 13:35

Also, changing the code to use 1 instead of pwm_mode generates totally different code.

Using 1 causes the compiler to generate an unconditional branch ("branch always").

You can right-click on the program and pick "show assembler code". It's kind of hard to read, but you can see how it assigns the values to registers and then checks their values.

My test shows it adds about 700 clocks to the code.

As far as precomputing the shifts goes, that's not necessary, as the compiler takes care of that for you. It's like magic.
I assume those values are defines.

Mike

JonnyMac · 2018-12-16 16:57

Accessing the CNT register in the middle of the loop is also causing a problem: it resets the sync point, so the time spent in the other loop instructions is added on top of each delay. You need to read CNT once before you enter the loop; inside the loop you only update the sync point. In Spin, I would do it like this:

  sync := cnt
  repeat
    waitcnt(sync += LOOP_PERIOD)
    ' other loop code
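The difference between the two styles can be modelled on a PC. In this sketch `cnt` stands in for the Propeller's CNT register, a waitcnt is modelled as jumping `cnt` forward to its target, and `body` is a hypothetical cost in clocks of the OUTA code and loop test; the function names are made up for illustration:

```c
#include <stdint.h>

/* Style from the original post: waitcnt(CNT + delay) inside the loop.
   CNT is read after the body has already run, so the body's time is
   added on top of every delay. Returns total clocks for n half-periods. */
uint32_t run_relative(uint32_t n, uint32_t body, uint32_t delay)
{
    uint32_t cnt = 0;
    for (uint32_t i = 0; i < n; i++) {
        cnt += body;                    /* OUTA writes, pwm_mode test, etc. */
        uint32_t target = cnt + delay;  /* CNT + delay */
        cnt = target;                   /* waitcnt(target) returns here */
    }
    return cnt;
}

/* Sync-point style: read CNT once, then advance the target by a fixed
   half-period each time, so the body's time is absorbed by the wait.
   In PropGCC this would look roughly like (HALF is hypothetical):
       unsigned int sync = CNT;
       while (pwm_mode == 1) { ...; sync += HALF; waitcnt(sync); }   */
uint32_t run_sync(uint32_t n, uint32_t body, uint32_t half)
{
    uint32_t cnt = 0;
    uint32_t sync = cnt;                /* sync := cnt, before the loop */
    for (uint32_t i = 0; i < n; i++) {
        cnt += body;
        sync += half;                   /* waitcnt(sync += LOOP_PERIOD) */
        /* Model only: on a real Propeller, a target that has already passed
           makes waitcnt wait for the 32-bit counter to wrap (roughly 54 s
           at 80 MHz), so body must stay below half. */
        if (cnt < sync)
            cnt = sync;
    }
    return cnt;
}
```

With a body cost of exactly 80 clocks, run_relative(2, 80, 920) and run_sync(2, 80, 1000) both give 2000 clocks per cycle, which is what the `- 80` compensation assumes. With an arbitrary body cost of 460 clocks, the relative style stretches the cycle to 2760 clocks while the sync-point style stays at 2000.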

Mickster · 2018-12-16 17:26

evanh wrote: »

...every instruction is incurring that 16-clock, not 8, hub fetch time.

I've always had it in my head that the worst case was 22 clocks (?)

mbv · 2018-12-17 22:44

iseries wrote: »

My test shows it adds about 700 clocks to the code.

Thanks, this was pretty helpful. 700 extra clocks result in almost exactly the frequency I was seeing. I understand just a bit of assembler and wanted to play around with modifying it, but apparently the assembler view in SimpleIDE is view-only.

Accessing the CNT register adds only an insignificant delay, which I'm already compensating for, not a 10kHz reduction in PWM frequency. As someone already mentioned, the compiler takes care of the |, & etc. statements, which I wrote out the long way for the sake of clarity (messing these up results in a short circuit in the H-bridge / blown fuse).

I was able to get this to work perfectly and output the 40kHz I'm looking for simply by changing the optimization level from -O2 Speed to -Os Size. Sounds very logical, right? Some very unpredictable things seem to happen when the compiler optimizes code in Propeller C; I'm not used to seeing results this crazy from AVRs. And we're only talking about a 40kHz loop here.

I tested the output frequency with all other optimization levels too and this is approximately what I got:

-Os Size: 40kHz
-O2 Speed: 30kHz
-O1 Mixed: 30kHz
-O0 None: 17kHz
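Assuming the same 80 MHz system clock, the measured frequencies can be turned back into clocks per PWM cycle, which makes them easier to compare with the roughly 700-clock figure iseries reported. `clocks_per_cycle`, `extra_clocks` and `CLKFREQ_HZ` are hypothetical names:

```c
/* Assumption: 80 MHz system clock. */
#define CLKFREQ_HZ 80000000.0

/* Clocks per PWM cycle implied by a measured PWM frequency. */
double clocks_per_cycle(double clkfreq_hz, double pwm_hz)
{
    return clkfreq_hz / pwm_hz;
}

/* Clocks per cycle beyond the 2 x 1000 the loop was designed for. */
double extra_clocks(double clkfreq_hz, double pwm_hz)
{
    return clocks_per_cycle(clkfreq_hz, pwm_hz) - 2000.0;
}
```

Under that assumption, 40kHz corresponds to exactly 2000 clocks per cycle, 30kHz to about 2667 (roughly 667 extra, in line with iseries' ~700), and the 17kHz result to about 4706. Since the reported frequencies are approximate, these figures are only rough.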
