Mixed mode programming tutorial

Hello,

My question is, do we have a good tutorial on mixed-mode programming, where one can run an LMM program and use some small and quick COGC modules for time critical parts?

Comments

  • edited March 2015
    We don't have tutorials about that yet in the learn site, so let's post some draft material here. First installment:

    Advanced Topic - Increase Function Execution Speed with _FCACHE

    Here is an example with function code that gets copied into unused space in a cog. This reduces the time it takes to fetch each instruction because the kernel executing the machine codes looks inside its own Cog RAM for the next instruction instead of having to wait for the next access window to get the instruction from Main RAM, which is shared with 7 other cogs.

    Run the program as-is, and note that P5 toggles at about 83 kHz. Then, un-comment the __attribute__((fcache)) line, and re-run the program. The toggle rate should increase to about 2.5 MHz. That's about a 30x speed increase, and it also lends itself to precisely timed loops. You can also un-comment n++ in the pinToggle function to share counted repetitions with the main function. In that case, the frequency is about 48 kHz without fcache and about 828 kHz with fcache. Because there is now some communication with Main RAM that slows loop execution, the speed increase is only about 17x. ...but hey, that's still 17x!
    #include "simpletools.h"
    
    #define POS_EDGE_CTR (0b01010 << 26)          // Pos edge counter config val
    
    void pinToggle(void *par);                    // Function prototype
    
    volatile int dt, n, pin;                      // Shared between cogs
    unsigned int stack[44 + 128];                 // Large stack - prototyping
    
    int main()                                    // Main function
    {
      pin = 5;
      //dt = CLKFREQ/10;                            // Set toggle interval
    
      cogstart(pinToggle, NULL,                   // Pin toggle process to other cog  
               stack, sizeof(stack));
               
      FRQA = 1;                                   // PHSA+1 for each pos edge                              
      CTRA = pin | POS_EDGE_CTR;                  // CTRA pos edge count on P5
      int dtm = CLKFREQ;                          // Time increment for main
      int t   = CNT;                              // Capture current tick count
    
      while(1)                                    // Main loop
      {
        PHSA = 0;                                 // Clear phase accumulator
        waitcnt(t += dtm);                        // Wait 1 s
        int cycles = PHSA;                        // Capture cycle count
        print("n = %d, ", n);                     // Display n
        print("cycles = %d\n", cycles);           // Display cycles counted
      }    
    }
    
    //__attribute__((fcache))                        // Cache this function in cog
    void pinToggle(void *par)                      // pinToggle function
    {
      int pmask = 1 << pin;                        // Set up pin mask
      DIRA |= pmask;                               // Pin to output
      //int t = CNT;                                 // Capture current time
      while(1)                                     // Main loop in cog
      {
        //n++;                                       // Keep a running count
        OUTA ^= pmask;                             // Toggle output
        //waitcnt(t+=dt);                            // Interval gate
      }                            
    }
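    As a quick check on the figures quoted above, here is a small host-side C sketch (it runs on a PC, not on the Propeller) that turns a measured P5 frequency into loop repetitions per second and system clock ticks per repetition. It assumes the 80 MHz system clock used in this application, and the function names are made up for this illustration.

```c
#include <stdint.h>

#define SYSCLK_HZ 80000000u   /* Assumed 80 MHz system clock */

/* Each loop repetition toggles the pin once, and one full cycle on the
   pin takes two toggles, so the loop rate is twice the pin frequency. */
uint32_t loop_rate_hz(uint32_t pin_freq_hz)
{
  return 2u * pin_freq_hz;
}

/* System clock ticks spent on one loop repetition (rounded down). */
uint32_t ticks_per_loop(uint32_t pin_freq_hz)
{
  return SYSCLK_HZ / loop_rate_hz(pin_freq_hz);
}

/* Speed-up factor, rounded down to a whole number. */
uint32_t speedup(uint32_t slow_hz, uint32_t fast_hz)
{
  return fast_hz / slow_hz;
}
```

    With the numbers from this post, speedup(83000, 2500000) gives 30 and speedup(48000, 828000) gives 17, while ticks_per_loop(2500000) works out to 16 clock ticks per repetition.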
    


    According to ersmith in the How can I get most out of Fcache thread, functions can have the fcache attribute applied to them for improved performance if the function is small enough to fit in a cog along with the kernel for the memory model. The allowable size of the code varies from one memory model to the next. If the function is too large, you will see a compiler error. fcached functions are also severely restricted on the functions they can call; they can only call NATIVE functions. You can add a native function by adding __attribute__((native)) or _NATIVE.
    NOTE: The simpletools library includes cog.h, which has a macro that allows you to just add _FCACHE above or to the left of the function, which is a little more concise than __attribute__((fcache)).

    Both fcached and native functions have to reside in Cog RAM. This means you cannot use many cmm/lmm/xmm library functions. For example, simpletools library functions like high, low and pause are not native. The propeller.h library has macros like OUTA, DIRA, and waitcnt that can be used to get the same functionality. So, instead of high(5), use OUTA |= 1 << 5 and DIRA |= 1 << 5 (the registers take a bit mask, not a pin number). Instead of pause(100), use waitcnt(CNT + CLKFREQ/10).
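    To see why a pin number has to be turned into a bit mask before it is ORed into OUTA and DIRA, here is a host-side sketch. The variables sim_outa and sim_dira are plain stand-ins for the real registers, and pin_high, pin_low and ticks_for_ms are hypothetical helpers written for this illustration; they are not part of propeller.h or simpletools.

```c
#include <stdint.h>

uint32_t sim_outa = 0;   /* Stand-in for OUTA (host simulation only) */
uint32_t sim_dira = 0;   /* Stand-in for DIRA (host simulation only) */

/* Rough equivalent of high(pin): drive one pin high as an output. */
void pin_high(int pin)
{
  uint32_t mask = 1u << pin;   /* Bit 'pin' set, all others clear */
  sim_outa |= mask;
  sim_dira |= mask;
}

/* Rough equivalent of low(pin): drive one pin low as an output. */
void pin_low(int pin)
{
  uint32_t mask = 1u << pin;
  sim_outa &= ~mask;
  sim_dira |= mask;
}

/* Clock ticks to add to CNT for a delay of 'ms' milliseconds. */
uint32_t ticks_for_ms(uint32_t clkfreq, uint32_t ms)
{
  return (clkfreq / 1000u) * ms;
}
```

    Here pin_high(5) sets only bit 5 (0x20). ORing in the plain number 5 (binary 101) would set bits 0 and 2 instead, which correspond to P0 and P2. At 80 MHz, ticks_for_ms(80000000, 100) gives 8000000, the same value as CLKFREQ/10.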

    More Info: For more on propeller.h, open SimpleIDE, and click Help -> Simple Library Reference. Then, follow the propeller.h link in the web documentation that opens. For more in-depth treatment of these topics, check out PropGccInDepth


    How it Works

    This application measures the number of times the while(1) loop in the pinToggle function repeats by measuring the number of low-to-high transitions on P5. By commenting and un-commenting different parts of the example, you can measure different execution rates. Keep in mind that the actual repeat rate is twice the measured frequency because the pinToggle function's loop performs a low-to-high transition on one repetition, and a high-to-low transition on the next.

    The simpletools library has some convenience functions used by main.
    #include "simpletools.h"

    Each Propeller cog has CTRA and CTRB modules that can be configured to perform certain processes independently. One of the features is positive edge detection. In this mode, a counter module adds 1 to its corresponding PHSA/PHSB register every time a low-to-high transition is detected on a certain I/O pin. This macro definition is a value that can be ORed with an I/O pin number and then stored in the cog's CTRA or CTRB register to provide edge counting for measuring signal frequencies.
    #define POS_EDGE_CTR (0b01010 << 26)
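    To make the bit layout concrete, here is a host-side sketch of how the counter configuration value is put together. The macro and function names are invented for this example, and the mode is written in hex (0x0A is the same value as 0b01010) because binary literals are a compiler extension in older C standards.

```c
#include <stdint.h>

/* CTRMODE occupies bits 30..26 of the counter control register;
   %01010 selects positive edge detection. */
#define CTRMODE_POS_EDGE 0x0Au
#define CTRMODE_SHIFT    26

/* Build a CTRA/CTRB value for positive-edge counting on 'pin'.
   The pin number goes in the lowest bits of the register. */
uint32_t pos_edge_ctr_value(uint32_t pin)
{
  return (CTRMODE_POS_EDGE << CTRMODE_SHIFT) | pin;
}
```

    For P5, pos_edge_ctr_value(5) returns 0x28000005: the mode bits in the upper part of the register and the pin number in the lowest bits.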

    This is a function prototype for the pinToggle function. This function is designed to be launched into another cog. The actual function is below main.
    void pinToggle(void *par);

    These volatile variables are going to be modified/accessed by more than one function running in more than one cog. The volatile declaration tells the compiler not to optimize out code that re-reads a variable's value before each operation. Without it, that could happen when the variable is modified from another cog, because the compiler cannot see that change.
    volatile int dt, n, pin;

    Compact memory model (CMM) code running in another cog needs 176 bytes (44 ints) of stack space for the C kernel, and often additional stack space for function call/return, local variables, etc. I'm not sure if an fcached cog really needs any stack space, so this line is just out of habit at this point.
    unsigned int stack[44 + 128];
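    The byte counts behind this declaration can be checked on a PC with the sketch below. It uses uint32_t to stand in for the Propeller's 4-byte int, and the names are chosen for this example only.

```c
#include <stddef.h>
#include <stdint.h>

#define KERNEL_STACK_INTS 44    /* 176 bytes for the C kernel, per the text */
#define EXTRA_STACK_INTS  128   /* Generous allowance while prototyping */

uint32_t demo_stack[KERNEL_STACK_INTS + EXTRA_STACK_INTS];

/* Size in bytes, as produced by sizeof(stack) in the cogstart call. */
size_t demo_stack_bytes(void)
{
  return sizeof(demo_stack);
}
```

    The result is 688 bytes in total, of which 44 * 4 = 176 bytes are the kernel's share.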

    This main function is running in cmm mode. In this mode, the C kernel runs in a cog and it fetches and executes machine codes stored in the Propeller chip's Main RAM.
    int main()
    {

    Pin is one of the volatile variables shared by main and pinToggle. Main has to set it before starting pinToggle in another cog because pinToggle uses pin to determine which pin to toggle.
      pin = 5;

    The pinToggle function also has commented code to keep the I/O pin on/off for a certain number of clock ticks. If statements with dt are un-commented in pinToggle, this also needs to be un-commented.
      //dt = CLKFREQ/10;

    This starts the pinToggle function in another cog. For more info on this, see Multicore Example.
      cogstart(pinToggle, NULL,
               stack, sizeof(stack));

    As mentioned earlier, POS_EDGE_CTR was defined so that it could be ORed with a pin number to configure a counter module. Here, the cog that's executing the main function gets its counter module A configured to count low-to-high transitions on P5.
      CTRA = pin | POS_EDGE_CTR;

    The rule for counting positive transitions with counter module A is that the value in the FRQA register gets added to PHSA every time a positive transition is detected. So, we'll set FRQA to 1, so that PHSA increases by 1 with each transition.
      FRQA = 1;

    These two variables are created for setting up a loop that repeats at precise time intervals. The first one establishes the time interval by setting dtm (time increment for main) to CLKFREQ, the number of system clock ticks in a second. The second marks the current number of clock ticks that have elapsed (CNT) by storing a copy of that value in t. Every time the Propeller chip's system clock ticks, the CNT register increases by 1. In this application, the system clock is running at 80 MHz. So, the value of CNT increases by 80,000,000 each second.
      int dtm = CLKFREQ;
      int t = CNT;
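    Because CNT is a 32-bit register, it wraps around to 0 after it reaches its maximum value, and the t += dtm arithmetic keeps working across that wrap. The host-side sketch below shows the idea with unsigned 32-bit values. The function names are hypothetical, and the starting tick values are picked only to force a wrap.

```c
#include <stdint.h>

/* Next waitcnt target: the same arithmetic as t += dtm, done on an
   unsigned 32-bit value so that wraparound is well defined on a host. */
uint32_t next_target(uint32_t t, uint32_t dtm)
{
  return (uint32_t)(t + dtm);  /* Wraps modulo 2^32, like CNT itself */
}

/* Ticks elapsed between two CNT readings, correct across one wrap. */
uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
  return (uint32_t)(now - start);
}

/* Whole seconds before a 32-bit tick counter wraps at 'clkfreq' Hz. */
uint32_t seconds_until_wrap(uint32_t clkfreq)
{
  return (uint32_t)(4294967296ULL / clkfreq);
}
```

    Starting 16 ticks before the wrap (0xFFFFFFF0) and adding 80,000,000 gives a target of 79,999,984, and ticks_elapsed still reports exactly 80,000,000 ticks between the two. At 80 MHz the counter wraps roughly every 53 seconds.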

    The main loop starts by setting PHSA to 0. Then, waitcnt(t += dtm) adds the number of clock ticks in 1 second to the system clock time that was previously stored in t. That sets a target CNT register value for the waitcnt function to wait for. It's more precise than the simpletools library's pause function. The waitcnt function allows the program to continue after 1 second, at which point int cycles = PHSA captures the number of low-to-high transitions that have occurred on P5. Then, two print function calls display that value along with the value of n. The value of n might or might not increase depending on whether or not n++ has been un-commented in pinToggle.
      while(1)
      {
        PHSA = 0;
        waitcnt(t += dtm);
        int cycles = PHSA;
        print("n = %d, ", n);
        print("cycles = %d\n", cycles);
      }
    }

    The pinToggle function uses only built-in propeller.h macros for I/O pin manipulations, which allows it to be labeled with the fcache attribute so that it can be copied into the portion of Cog RAM not used by the C kernel. This greatly increases execution speed because the program does not have to wait for Main RAM access, which is shared with 7 other cogs, to get the next instruction. Local variables also have faster access because they are stored in Cog RAM as well. Global variables are stored in Main RAM, and when fast execution speed is the goal, they should be used sparingly.

    IMPORTANT: You won't see the speed increase until you un-comment the __attribute__((fcache)) statement.
    //__attribute__((fcache))
    void pinToggle(void *par) // pinToggle function
    {
      int pmask = 1 << pin;
      DIRA |= pmask;
      //int t = CNT;
      while(1)
      {
        //n++;
        OUTA ^= pmask;
        //waitcnt(t+=dt);
      }
    }
  • edited March 2015
    Advanced Topic - Mixed Mode Programming with CMM and COGC

    Here is a COGC application that does the same thing as the fcache application from the previous post. One important difference is that the COGC kernel is smaller than the kernel for CMM or the other memory models, which means your application can fit more code into the cog.

    The shared variables from the previous post were modified and moved into a header file. The fcached function was also modified so that it could be run from a COGC file. So what's left in the main file is just code that launches the COGC cog and tests it.

    Did You Know?
    • In this activity, you will add files to your project. If you want to see them in a list or reopen them after closing, just use the Project Manager panel. You can open it by clicking the Show/Hide Project Manager button in SimpleIDE's bottom-left corner.
    • If you want to copy your project, just use your file browser (Mac Finder, Windows Explorer) to copy the folder that contains all the files. Then, open the .side project in the folder copy you created.
    Project Setup

    First, follow the checklist instructions for creating a project and adding the three files below to it. Then continue to the Test Instructions.
    • Click SimpleIDE's New Project button.
    • In the New Project dialog, click the New Folder button, and name the folder Test pinToggle COGC App, then set the File name to Test pinToggle.side.
    • Copy the Test pinToggle.c code below into the SimpleIDE editor.
    • Now scroll down and check the instructions for the next file (pinToggle.h).
    /*
      Test pinToggle.c
    */
    
    #include "simpletools.h"                      // Include library headers
    #include "pinToggle.h"
    
    #define POS_EDGE_CTR (0b01010 << 26)          // Pos edge counter config val
    
    int main()                                    // Main function
    {
      pinToggle_t pinToggle;                      // Declare pinToggle type (structure)
      pinToggle.pin = 5;                          // Set pin to P5
      pinToggle.dt = CLKFREQ/100;                 // Set toggle interval
      pinToggle.n = 0;                            // Initialize n to 0
    
      // Launch COGC code into another cog.
      // IMPORTANT: pinToggle_cogc is a term that has to match pinToggle_cogc in
      // the pinToggle.cogc file.  If you change it to abc there, it would have to be
      // changed to abc here too. 
      extern unsigned int *pinToggle_cogc;
      int cog = cognew(pinToggle_cogc, &pinToggle) + 1;
               
      FRQA = 1;                                   // PHSA+1 for each pos edge                              
      CTRA = pinToggle.pin | POS_EDGE_CTR;        // CTRA pos edge count on P5
      int dtm = CLKFREQ;                          // Time increment for main
      int t   = CNT;                              // Capture current tick count
    
      while(1)                                    // Main loop
      {
        PHSA = 0;                                 // Clear phase accumulator
        waitcnt(t += dtm);                        // Wait 1 s
        int cycles = PHSA;                        // Capture cycle count
        print("n = %d, ", pinToggle.n);           // Display n
        print("cycles = %d\n", cycles);           // Display cycles counted
      }    
    }
    


    A header file with shared variables provides a convenient place where code in both files can access them.
    • Click Project and select Add Tab to Project.
    • Set the Save as type to C Header File (*.h).
    • Enter pinToggle.h into the File name field and then click Save.
    • Copy the pinToggle.h code below into pinToggle.h tab's editor pane.
    • Scroll down and check the instructions for the next file (pinToggle.cogc).
    /*
      pinToggle.h
      Type-define a structure with variables that the code in Test pinToggle.c and 
      pinToggle.cogc can both access.  
    */
    
    typedef                                       // A custom variable that's 
      struct pinToggle_s                          // a structure named pinToggle_s
      {
        volatile int dt, n, pin;                  // with vars shared by cogs
      } 
    pinToggle_t;                                  // is type-defined pinToggle_t. 
    


    If the COGC code is going to be part of an application running in a different mode, like CMM, LMM or XMM, it needs to live in its own COGC file that's part of the project.
    • Click Project and select Add Tab to Project.
    • Set the Save as type to COG C File (*.cogc).
    • Enter pinToggle.cogc into the File name field and then click Save.
    • Copy the pinToggle.cogc code below into pinToggle.cogc tab's editor pane.
    • Scroll down and check the instructions for testing the application.
    /*
      pinToggle.cogc
      The machine codes generated when this file is compiled will be copied to 
      Cog RAM and executed from there.
    */
    
    extern unsigned int _load_start_pinToggle_cog[];
    const unsigned int *pinToggle_cogc = _load_start_pinToggle_cog;
    
    #include <propeller.h>                         // Header with I/O & timing defs
    #include "pinToggle.h"                         // Header with pinToggle struct
    
    __attribute__((naked))                         // Never returns
    void main(pinToggle_t *share)                  // COGC main function
    {
      int pmask = 1 << share->pin;                 // Set up pin mask
      DIRA |= pmask;                               // Pin to output
      int t = CNT;                                 // Capture current time
    
      while(1)                                     // Main loop in cog
      {
        (share->n)++;                              // Keep a running count
        OUTA ^= pmask;                             // Toggle output
        waitcnt(t+=share->dt);                     // Interval gate
      }                            
    }
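    The key idea in this file is that the cog receives a pointer to the shared structure, so anything it writes through share is visible to the code that launched it. Here is a host-side sketch of that pattern. It is a simulation only: run_toggle_reps is a hypothetical stand-in for the COGC main function, it runs a fixed number of repetitions instead of looping forever, and sim_outa replaces the real OUTA register.

```c
#include <stdint.h>

typedef struct pinToggle_s
{
  volatile int dt, n, pin;     /* Same fields as pinToggle.h */
} pinToggle_t;

/* Host stand-in for the COGC loop: runs 'reps' repetitions instead of
   looping forever, and toggles a simulated output register. */
uint32_t run_toggle_reps(pinToggle_t *share, int reps)
{
  uint32_t sim_outa = 0;                 /* Stand-in for OUTA */
  uint32_t pmask = 1u << share->pin;     /* Set up pin mask */

  for (int i = 0; i < reps; i++)
  {
    (share->n)++;                        /* Visible to the caller */
    sim_outa ^= pmask;                   /* Toggle simulated output */
  }
  return sim_outa;                       /* Final simulated pin states */
}
```

    After three repetitions with pin set to 5, the caller's copy of n reads 3 and the simulated output has bit 5 set (0x20), since an odd number of toggles leaves the pin high.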
    


    Test Instructions

    Now you are ready to run the application.
    • Click SimpleIDE's Run with Terminal button.
    • Verify that the terminal reports about 50 cycles per second (with dt set to CLKFREQ/100, the pin toggles 100 times per second).
    • Click the pinToggle.cogc tab
    • Comment out these two statements: (share->n)++; and waitcnt(t+=share->dt);
    • Click Run with Terminal again, and you'll be back up to about a 2.5 MHz signal, which means the loop is again repeating at 5 MHz.
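    When the waitcnt line is active, the toggle interval dt sets the pace, so the value to expect in the terminal can be worked out ahead of time. The sketch below does that arithmetic on a PC. The function names are made up for this example, and it assumes the loop body always finishes well within dt.

```c
#include <stdint.h>

/* Toggles per second when the loop waits clkfreq/divisor ticks
   between toggles (integer division, as in the C source). */
uint32_t toggles_per_second(uint32_t clkfreq, uint32_t divisor)
{
  uint32_t dt = clkfreq / divisor;   /* Ticks between toggles */
  return clkfreq / dt;
}

/* Positive edges per second: one for every two toggles. */
uint32_t cycles_per_second(uint32_t clkfreq, uint32_t divisor)
{
  return toggles_per_second(clkfreq, divisor) / 2u;
}
```

    With an 80 MHz clock, a dt of CLKFREQ/100 gives 100 toggles and 50 counted cycles per second, and a dt of CLKFREQ/10 gives 10 toggles and 5 counted cycles per second.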
    Try This

    Let's try modifying the main file to set up two light blinking processes on P26 and P27.
    • Click the Save Project As button, and set the File name to PinToggle (1).side.
    • Modify the code as shown below.
    • Click Run with Terminal and verify that LEDs connected to P26 and P27 blink at different rates.
    /*
      Test pinToggle.c
    */
    
    #include "simpletools.h"                      // Include library headers
    #include "pinToggle.h"
    
    #define POS_EDGE_CTR (0b01010 << 26)          // Pos edge counter config val
    
    int main()                                    // Main function
    {
      pinToggle_t pinToggle;                      // Declare pinToggle type (structure)
      pinToggle.pin = 26;                         // Set pin to P26
      pinToggle.dt = CLKFREQ/10;                  // Set toggle interval
      pinToggle.n = 0;                            // Initialize n to 0
    
      // Launch COGC code into another cog.
      extern unsigned int *pinToggle_cogc;
      int cog = cognew(pinToggle_cogc, &pinToggle) + 1;
               
      pinToggle_t pinToggle2;                     // Declare pinToggle type (structure)
      pinToggle2.pin = 27;                        // Set pin to P27
      pinToggle2.dt = CLKFREQ/13;                 // Set toggle interval
      pinToggle2.n = 0;                           // Initialize n to 0
    
      // Launch COGC code into another cog.
      int cog2 = cognew(pinToggle_cogc, &pinToggle2) + 1;
               
      int dtm = CLKFREQ;                          // Time increment for main
      int t   = CNT;                              // Capture current tick count
    
      for(int i = 1; i <= 7; i++)                 // Main loop
      {
        PHSA = 0;                                 // Clear phase accumulator
        waitcnt(t += dtm);                        // Wait 1 s
        print("n = %d\n", pinToggle.n);           // Display n
      }  
      
      cogstop(cog - 1);                           // Stop first light
      
      pinToggle2.dt = CLKFREQ;                    // Change P27 rate
      print("n2 = %d\n", pinToggle2.n);           // Display n2
      pause(4000);                                // Wait 4 seconds
      print("n2 = %d\n", pinToggle2.n);           // Display n2
      
      cogstop(cog2 - 1);                          // Stop second light
    }
    
  • Daniel Nagy Posts: 26
    edited March 2015
    Hello Andy,

    Thanks for the really quick and detailed example on fcache. :)
  • edited March 2015
    Sure thing Daniel.

    A working code example has been added to the COGC project in post #3. Next step is to write a narrative of what's happening in the code.
  • Daniel Nagy Posts: 26
    edited March 2015
    Hello Andy,

    Thanks for this second tutorial too.
    They are very valuable to me and I think for others who are looking for advanced stuff as well.

    Daniel
  • Thanks Andy, these tutorials just explained what I really needed to know!
  • Thank you Andy! This question has come up many times and you seem to have answered both COGC and FCache quite thoroughly!
    David
    PropWare: C++ HAL (Hardware Abstraction Layer) for PropGCC; Robust build system using CMake; Integrated Simple Library, libpropeller, and libPropelleruino (Arduino port); Instructions for Eclipse and JetBrain's CLion; Example projects; Doxygen documentation
  • This is really informative. It should be in the learn multicore tutorial.