Mixed mode programming tutorial

Daniel Nagy · 2015-03-04 04:25

Hello,

My question is, do we have a good tutorial on mixed-mode programming, where one can run an LMM program and use some small and quick COGC modules for time critical parts?

Andy Lindsay (Parallax) · 2015-03-04 10:34

We don't have tutorials about that yet in the learn site, so let's post some draft material here. First installment:

Advanced Topic - Increase Function Execution Speed with _FCACHE

Here is an example with function code that gets copied into unused space in a cog. This reduces instruction fetch time because the kernel executing the machine codes looks inside its own Cog RAM for the next instruction instead of having to wait for its next access window to get the instruction from Main RAM, which is shared with 7 other cogs.

Run the program as-is, and note that P5 toggles at about 83 kHz. Then, un-comment the __attribute__((fcache)) line and re-run the program. The toggle rate should increase to about 2.5 MHz. That's about a 30x speed increase, and it also lends itself to precisely timed loops. You can also un-comment n++ in the pinToggle function to share counted repetitions with the main function. In that case, the frequency is about 48 kHz without fcache and about 828 kHz with it. The communication with Main RAM now slows loop execution, so there is only a 17x speed increase. ...but hey, that's still 17x!

#include "simpletools.h"

#define POS_EDGE_CTR (0b01010 << 26)          // Pos edge counter config val

void pinToggle(void *par);                    // Function prototype

volatile int dt, n, pin;                      // Shared between cogs
unsigned int stack[44 + 128];                 // Large stack - prototyping

int main()                                    // Main function
{
  pin = 5;
  //dt = CLKFREQ/10;                            // Set toggle interval

  cogstart(pinToggle, NULL,                   // Pin toggle process to other cog  
           stack, sizeof(stack));
           
  FRQA = 1;                                   // PHSA+1 for each pos edge                              
  CTRA = pin | POS_EDGE_CTR;                  // CTRA pos edge count on P5
  int dtm = CLKFREQ;                          // Time increment for main
  int t   = CNT;                              // Capture current tick count

  while(1)                                    // Main loop
  {
    PHSA = 0;                                 // Clear phase accumulator
    waitcnt(t += dtm);                        // Wait 1 s
    int cycles = PHSA;                        // Capture cycle count
    print("n = %d, ", n);                     // Display n
    print("cycles = %d\n", cycles);           // Display cycles counted
  }    
}

//__attribute__((fcache))                        // Cache this function in cog
void pinToggle(void *par)                      // pinToggle function
{
  int pmask = 1 << pin;                        // Set up pin mask
  DIRA |= pmask;                               // Pin to output
  //int t = CNT;                                 // Capture current time
  while(1)                                     // Main loop in cog
  {
    //n++;                                       // Keep a running count
    OUTA ^= pmask;                             // Toggle output
    //waitcnt(t+=dt);                            // Interval gate
  }                            
}

According to ersmith in the How can I get most out of Fcache thread, functions can have the fcache attribute applied to them for improved performance if the function is small enough to fit in a cog along with the kernel for the memory model. The allowable size of the code varies from one memory model to the next. If the function is too large, you will see a compiler error. fcached functions are also severely restricted on the functions they can call; they can only call NATIVE functions. You can add a native function by adding __attribute__((native)) or _NATIVE.

NOTE: The simpletools library includes cog.h, which has a macro that allows you to just add _FCACHE above or to the left of the function, which is a little more concise than __attribute__((fcache)).

Both fcached and native functions have to reside in Cog RAM. This means you cannot use many cmm/lmm/xmm library functions. For example, simpletools library functions like high, low and pause are not native. The propeller.h library has macros like OUTA, DIRA, and waitcnt that can be used to get the same functionality. These registers take a bit mask, not a pin number. So, instead of high(5), use OUTA |= 1 << 5 and DIRA |= 1 << 5. Instead of pause(100), use waitcnt(CNT + CLKFREQ/10).
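The two substitutions above come down to a bit mask and a tick count. Here is a minimal sketch of that arithmetic in plain C, so it can be checked on a PC. The helper names pin_mask and ms_to_ticks are made up for illustration and are not part of simpletools or propeller.h.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Bit mask for an I/O pin: P5 is bit 5, which is 0x20, not the number 5. */
uint32_t pin_mask(int pin)
{
  return UINT32_C(1) << pin;
}

/* System clock ticks in a given number of milliseconds. */
uint32_t ms_to_ticks(uint32_t clkfreq, uint32_t ms)
{
  return (clkfreq / 1000u) * ms;
}
```

These are written as functions only so the numbers can be verified. Inside an fcached function you would write the expressions inline, for example OUTA |= 1 << 5, since an ordinary helper function is not native and cannot be called from there.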

More Info: For more on propeller.h, open SimpleIDE, and click Help -> Simple Library Reference. Then, follow the propeller.h link in the web documentation that opens. For more in-depth treatment of these topics, check out PropGccInDepth

How it Works

This application measures the number of times the while(1) loop in the pinToggle function repeats by measuring the number of low-to-high transitions on P5. By commenting and un-commenting different parts of the example, you can measure different execution rates. Keep in mind that the actual repeat rate is twice the measured frequency because the pinToggle function's loop performs a low-to-high transition on one repetition, and a high-to-low transition on the next.
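To put the measured frequencies in perspective, here is a rough sketch of the arithmetic in plain C. It assumes the 80 MHz system clock this application runs at, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of simpletools or propeller.h. */

/* The loop toggles the pin once per repetition, so one full cycle on the
   pin takes two repetitions. */
uint32_t loop_reps_per_sec(uint32_t pin_cycles_per_sec)
{
  return 2u * pin_cycles_per_sec;
}

/* System clock ticks spent on each loop repetition (rounded down). */
uint32_t ticks_per_rep(uint32_t clkfreq, uint32_t pin_cycles_per_sec)
{
  return clkfreq / loop_reps_per_sec(pin_cycles_per_sec);
}

/* Speedup as a whole number (rounded down). */
uint32_t speedup(uint32_t slow_hz, uint32_t fast_hz)
{
  return fast_hz / slow_hz;
}
```

With the figures quoted earlier, ticks_per_rep(80000000, 83000) works out to 481 ticks per repetition without fcache, and ticks_per_rep(80000000, 2500000) works out to 16 ticks with it.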

The simpletools library has some convenience functions used by main.

#include "simpletools.h"

Each Propeller cog has CTRA and CTRB modules that can be configured to perform certain processes independently. One of the features is positive edge detection. In this mode, a counter module adds 1 to its corresponding PHSA/PHSB register every time a low-to-high transition is detected on a certain I/O pin. This macro definition is a value that can be ORed with an I/O pin number and then stored in the cog's CTRA or CTRB register to provide edge counting for measuring signal frequencies.

#define POS_EDGE_CTR (0b01010 << 26)
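As a quick check of what that value looks like once a pin number is ORed in, here is a small sketch in plain C. The ctr_config helper and the CTR_MODE_POS_EDGE name are hypothetical, and the mask on the pin number assumes pins P0 through P31.

```c
#include <stdint.h>

/* Hypothetical names for illustration, not from propeller.h. */
#define CTR_MODE_POS_EDGE 0x0Au            /* same bits as 0b01010 */

/* Build a counter configuration value: mode bits shifted left by 26,
   pin number in the lowest bits. */
uint32_t ctr_config(uint32_t mode, uint32_t pin)
{
  return (mode << 26) | (pin & 0x1Fu);
}
```

For positive edge counting on P5, ctr_config(CTR_MODE_POS_EDGE, 5) gives 0x28000005, which is the same value the application stores in CTRA.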

This is a function prototype for the pinToggle function. This function is designed to be launched into another cog. The actual function is below main.

void pinToggle(void *par);

These volatile variables are going to be modified/accessed by more than one function running in more than one cog. The volatile declaration ensures that the compiler doesn't optimize out code that re-checks a variable's value before performing each operation. Without it, the compiler cannot tell that a function in another cog is modifying the variable, so it might reuse a stale copy. Declaring the variable volatile prevents that from ever happening.

volatile int dt, n, pin;

Compact memory model (CMM) code running in another cog needs 176 bytes (44 ints) of stack space for the C kernel, and often additional stack space for function call/return, local variables, etc. I'm not sure if an fcached cog really needs any stack space, so this line is just out of habit at this point.

unsigned int stack[44 + 128];
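Since cogstart is given the stack size in bytes through sizeof(stack), it can help to see the byte count that declaration works out to. This is just a sketch of the arithmetic, using the 44-int figure quoted above. The helper name is made up.

```c
#include <stdint.h>
#include <stddef.h>

#define KERNEL_STACK_INTS 44               /* figure quoted above */

/* Hypothetical helper: total stack bytes for a given number of extra
   32-bit ints beyond what the kernel needs. */
size_t cog_stack_bytes(size_t extra_ints)
{
  return (KERNEL_STACK_INTS + extra_ints) * sizeof(uint32_t);
}
```

With no extra ints the result is the 176 bytes mentioned above, and with 128 extra ints it is 688 bytes.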

This main function is running in CMM mode. In this mode, the C kernel runs in a cog, and it fetches and executes machine codes stored in the Propeller chip's Main RAM.

int main()
{

Pin is one of the volatile variables shared by main and pinToggle. Main has to set it before starting pinToggle in another cog because pinToggle uses pin to determine which pin to toggle.

  pin = 5;

The pinToggle function also has commented code to keep the I/O pin on/off for a certain number of clock ticks. If statements with dt are un-commented in pinToggle, this also needs to be un-commented.

  //dt = CLKFREQ/10;

This starts the pinToggle function in another cog. For more info on this, see Multicore Example.

  cogstart(pinToggle, NULL,
           stack, sizeof(stack));

As mentioned earlier, POS_EDGE_CTR was defined so that it could be ORed with a pin number to configure a counter module. Here, the cog that's executing the main function gets its counter module A configured to count low-to-high transitions on P5.

  CTRA = pin | POS_EDGE_CTR;

The rule for counting positive transitions with counter module A is that the value in the FRQA register gets added to PHSA every time a positive transition is detected. So, we'll set FRQA to 1, so that PHSA increases by 1 with each transition.

  FRQA = 1;
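The counting rule can be modeled on a PC to see how FRQA and PHSA interact. This is only a software sketch of the behavior described above. On the Propeller, the counter hardware does this by itself with no code running, and count_pos_edges is a made-up name.

```c
#include <stdint.h>
#include <stddef.h>

/* PC-side model of positive edge counting. levels holds successive pin
   states (0 or 1), frq stands in for FRQA, and the return value is what
   PHSA would hold if it started at 0. */
uint32_t count_pos_edges(const int *levels, size_t count, uint32_t frq)
{
  uint32_t phs = 0;                        /* stands in for PHSA */
  for (size_t i = 1; i < count; i++)
  {
    if (levels[i - 1] == 0 && levels[i] != 0)   /* low-to-high */
      phs += frq;                          /* add FRQA's value */
  }
  return phs;
}
```

With frq set to 1, the result is simply the number of low-to-high transitions in the sample list, which is why the application sets FRQA to 1.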

These two variables are created for setting up a loop that repeats at precise time intervals. The first one establishes the time interval by setting dtm (time increment for main) to CLKFREQ, the number of system clock ticks in a second. The second marks the current number of clock ticks that have elapsed (CNT) by storing a copy of that value in t. Every time the Propeller chip's system clock ticks, the CNT register increases by 1. In this application, the system clock is running at 80 MHz. So, the value of CNT increases by 80,000,000 each second.

  int dtm = CLKFREQ;
  int t = CNT;

The main loop starts by setting PHSA to 0. Then, waitcnt(t += dtm) adds the number of clock ticks in 1 second to the system clock time that was previously stored in t. That sets a target CNT register value for the waitcnt function to wait for. It's more precise than the simpletools library's pause function. The waitcnt function allows the program to continue after 1 second, at which point int cycles = PHSA captures the number of low-to-high transitions that have occurred on P5. Then, two print function calls display that value along with the value of n. The value of n might or might not increase depending on whether or not n++ has been un-commented in pinToggle.

  while(1)
  {
    PHSA = 0;
    waitcnt(t += dtm);
    int cycles = PHSA;
    print("n = %d, ", n);
    print("cycles = %d\n", cycles);
  }
}
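One reason the t += dtm pattern keeps good time is that each target is computed from the previous target rather than from the current time, so the time spent in the print calls does not accumulate as drift. The 32-bit CNT register also rolls over (roughly every 54 seconds at 80 MHz), and the addition wraps around with it. Here is a sketch of that arithmetic in plain C, using unsigned 32-bit values and made-up helper names.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Next waitcnt target: previous target plus the interval. Unsigned
   32-bit arithmetic wraps around the same way the CNT register does. */
uint32_t next_target(uint32_t t, uint32_t dt)
{
  return t + dt;
}

/* Ticks remaining from now until target, correct across a rollover. */
uint32_t ticks_until(uint32_t now, uint32_t target)
{
  return target - now;
}
```

For example, a target of 0xFFFFFF00 plus an interval of 0x200 wraps to 0x100, and the distance from 0xFFFFFF00 to 0x100 still comes out as 0x200 ticks.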

The pinToggle function uses only built-in propeller.h macros for I/O pin manipulations, which allows it to be labeled with the fcache attribute so that it can be copied into the portion of Cog RAM not used by the C kernel. This greatly increases execution speed because the program does not have to wait for Main RAM access, which is shared with 7 other cogs, to get the next instruction. Local variables also have faster access because they are stored in Cog RAM as well. Global variables are stored in Main RAM, and when fast execution speed is the goal, they should be used sparingly.

IMPORTANT: You won't see the speed increase until you un-comment the __attribute__((fcache)) statement.

//__attribute__((fcache))
void pinToggle(void *par) // pinToggle function
{
  int pmask = 1 << pin;
  DIRA |= pmask;
  //int t = CNT;
  while(1)
  {
    //n++;
    OUTA ^= pmask;
    //waitcnt(t+=dt);
  }
}

Andy Lindsay (Parallax) · 2015-03-04 10:42

Advanced Topic - Mixed Mode Programming with CMM and COGC

Here is a COGC application that does the same thing as the fcache application from the previous post. One important difference is that the COGC kernel is smaller than the kernels for CMM or other memory models, which means your application can fit more code into the cog.

The shared variables from the previous post were modified and moved into a header file. The fcached function was also modified so that it could be run from a COGC file. So what's left in the main file is just code that launches the COGC cog and tests it.

Did You Know?

In this activity, you will add files to your project. If you want to see them in a list or reopen them after closing, just use the Project Manager panel. You can open it by clicking the Show/Hide Project Manager button in SimpleIDE's bottom-left corner.
If you want to copy your project, just use your file browser (Mac Finder, Windows Explorer) to copy the folder that contains all the files. Then, open the .side project in the folder copy you created.

Project Setup

First, follow the checklist instructions for creating a project and adding the three files below to it. Then continue to the Test Instructions.

Click SimpleIDE's New Project button.
In the New Project dialog, click the New Folder button, and name the folder Test pinToggle COGC App, then set the File name to Test pinToggle.side.
Copy the Test pinToggle.c code below into the SimpleIDE editor.
Now scroll down and check the instructions for the next file (pinToggle.h).

/*
  Test pinToggle.c
*/

#include "simpletools.h"                      // Include library headers
#include "pinToggle.h"

#define POS_EDGE_CTR (0b01010 << 26)          // Pos edge counter config val

int main()                                    // Main function
{
  pinToggle_t pinToggle;                      // Declare pinToggle type (structure)
  pinToggle.pin = 5;                          // Set pin to P5
  pinToggle.dt = CLKFREQ/10;                  // Set toggle interval
  pinToggle.n = 0;                            // Initialize n to 0

  // Launch COGC code into another cog.
  // IMPORTANT: pinToggle_cogc is a term that has to match pinToggle_cogc in
  // the pinToggle.cogc file.  If you change it to abc there, it would have to be
  // changed to abc here too. 
  extern unsigned int *pinToggle_cogc;
  int cog = cognew(pinToggle_cogc, &pinToggle) + 1;
           
  FRQA = 1;                                   // PHSA+1 for each pos edge                              
  CTRA = pinToggle.pin | POS_EDGE_CTR;        // CTRA pos edge count on P5
  int dtm = CLKFREQ;                          // Time increment for main
  int t   = CNT;                              // Capture current tick count

  while(1)                                    // Main loop
  {
    PHSA = 0;                                 // Clear phase accumulator
    waitcnt(t += dtm);                        // Wait 1 s
    int cycles = PHSA;                        // Capture cycle count
    print("n = %d, ", pinToggle.n);           // Display n
    print("cycles = %d\n", cycles);           // Display cycles counted
  }    
}

A header file with shared variables provides a convenient place where code in both files can access them.

Click Project and select Add Tab to Project.
Set the Save as type to C Header File (*.h).
Enter pinToggle.h into the File name field and then click Save.
Copy the pinToggle.h code below into pinToggle.h tab's editor pane.
Scroll down and check the instructions for the next file (pinToggle.cogc).

/*
  pinToggle.h
  Type-define a structure with variables that the code in Test pinToggle.c and 
  pinToggle.cogc can both access.  
*/

typedef                                       // A custom variable that's 
  struct pinToggle_s                          // a structure named pinToggle_s
  {
    volatile int dt, n, pin;                  // with vars shared by cogs
  } 
pinToggle_t;                                  // is type-defined pinToggle_t.

If the COGC code is going to be part of an application running in a different mode, like CMM, LMM or XMM, it needs to live in its own COGC file that's part of the project.

Click Project and select Add Tab to Project.
Set the Save as type to COG C File (*.cogc).
Enter pinToggle.cogc into the File name field and then click Save.
Copy the pinToggle.cogc code below into pinToggle.cogc tab's editor pane.
Scroll down and check the instructions for testing the application.

/*
  pinToggle.cogc
  The machine codes generated when this file is compiled will be copied to 
  Cog RAM and executed from there.
*/

extern unsigned int _load_start_pinToggle_cog[];
const unsigned int *pinToggle_cogc = _load_start_pinToggle_cog;

#include <propeller.h>                         // Header with I/O & timing defs
#include "pinToggle.h"                         // Header with pinToggle struct

__attribute__((naked))                         // Never returns
void main(pinToggle_t *share)                  // COGC main function
{
  int pmask = 1 << share->pin;                 // Set up pin mask
  DIRA |= pmask;                               // Pin to output
  int t = CNT;                                 // Capture current time

  while(1)                                     // Main loop in cog
  {
    (share->n)++;                              // Keep a running count
    OUTA ^= pmask;                             // Toggle output
    waitcnt(t+=share->dt);                     // Interval gate
  }                            
}
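To trace what the loop does to the shared structure, here is one repetition written as an ordinary function that compiles on a PC. It is only a sketch: the structure is repeated so the block stands alone, the outa parameter stands in for the OUTA register, the waitcnt is left out, and toggle_once is a made-up name.

```c
#include <stdint.h>

/* Same layout as pinToggle_t in pinToggle.h, repeated here so this
   block compiles by itself. */
typedef struct pinToggle_s
{
  volatile int dt, n, pin;
} pinToggle_t;

/* One repetition of the COGC loop without the interval gate. */
void toggle_once(pinToggle_t *share, uint32_t *outa)
{
  uint32_t pmask = UINT32_C(1) << share->pin;   /* Set up pin mask */
  (share->n)++;                                 /* Keep a running count */
  *outa ^= pmask;                               /* Toggle output bit */
}
```

Because the code only ever touches the structure through the share pointer, two cogs running the same COGC image can each be handed a different structure and work independently.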

Test Instructions

Now you are ready to run the application.

Click SimpleIDE's Run with Terminal button.
Verify that the application toggles the pin 5x per second.
Click the pinToggle.cogc tab
Comment out these two statements: (share->n)++; and waitcnt(t+=share->dt);
Click Run with Terminal again, and you'll be back up to about a 2.5 MHz signal, which means the loop is again repeating at 5 MHz.
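When the waitcnt interval gate is active, the frequency you should see on the pin follows directly from dt. Here is a sketch of that arithmetic in plain C, with a made-up helper name.

```c
#include <stdint.h>

/* Hypothetical helper: pin frequency in Hz for a loop that toggles the
   pin once every dt system clock ticks. Two toggles make one cycle. */
uint32_t toggle_freq_hz(uint32_t clkfreq, uint32_t dt)
{
  return clkfreq / (2u * dt);
}
```

At 80 MHz, a dt of CLKFREQ/10 gives 5 cycles per second, and a dt of CLKFREQ/100 gives 50.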

Try This

Let's try modifying the main file to set up two light blinking processes on P26 and P27.

Click the Save Project As button, and set the File name to PinToggle (1).side.
Modify the code as shown below.
Click Run with Terminal and verify that LEDs connected to P26 and P27 blink at different rates.

/*
  Test pinToggle.c
*/

#include "simpletools.h"                      // Include library headers
#include "pinToggle.h"

#define POS_EDGE_CTR (0b01010 << 26)          // Pos edge counter config val

int main()                                    // Main function
{
  pinToggle_t pinToggle;                      // Declare pinToggle type (structure)
  pinToggle.pin = 26;                         // Set pin to P26
  pinToggle.dt = CLKFREQ/10;                  // Set toggle interval
  pinToggle.n = 0;                            // Initialize n to 0

  // Launch COGC code into another cog.
  extern unsigned int *pinToggle_cogc;
  int cog = cognew(pinToggle_cogc, &pinToggle) + 1;
           
  pinToggle_t pinToggle2;                     // Declare pinToggle type (structure)
  pinToggle2.pin = 27;                        // Set pin to P27
  pinToggle2.dt = CLKFREQ/13;                 // Set toggle interval
  pinToggle2.n = 0;                           // Initialize n to 0

  // Launch COGC code into another cog.
  int cog2 = cognew(pinToggle_cogc, &pinToggle2) + 1;
           
  int dtm = CLKFREQ;                          // Time increment for main
  int t   = CNT;                              // Capture current tick count

  for(int i = 1; i <= 7; i++)                 // Main loop
  {
    PHSA = 0;                                 // Clear phase accumulator
    waitcnt(t += dtm);                        // Wait 1 s
    print("n = %d\n", pinToggle.n);           // Display n
  }  
  
  cogstop(cog - 1);                           // Stop first light
  
  pinToggle2.dt = CLKFREQ;                    // Change P27 rate
  print("n2 = %d\n", pinToggle2.n);           // Display n2
  pause(4000);                                // Wait 4 seconds
  print("n2 = %d\n", pinToggle2.n);           // Display n2
  
  cogstop(cog2 - 1);                          // Stop second light
}

Daniel Nagy · 2015-03-05 04:26

Hello Andy,

Thanks for the really quick and detailed example on fcache.

Andy Lindsay (Parallax) · 2015-03-06 17:54

Sure thing Daniel.

A working code example has been added to the COGC project in post #3. Next step is to write a narrative of what's happening in the code.

Daniel Nagy · 2015-03-07 12:11

Hello Andy,

Thanks for this second tutorial too.
They are very valuable to me and I think for others who are looking for advanced stuff as well.

Daniel

MikeForsyth · 2016-06-05 15:02

Thanks Andy, these tutorials just explained what I really needed to know!

DavidZemon · 2016-06-05 15:15

Thank you Andy! This question has come up many times and you seem to have answered both COGC and FCache quite thoroughly!

laurent974 · 2017-02-27 12:29

This is really informative. it should be in the learn multicore tutorial

sanandak · 2017-08-07 15:34

Daniel - I have written a book on programming the Propeller with Assembler and C - including a section on cog-c.
Attached here is the intro to the C section.

You can buy the book on leanpub (you can download a sample to give you an idea of the things I cover)
Propeller Programming

Leanpub has a 45 day return policy.

Best,
Sridhar
