How to get accurate micro sec Pauses? — Parallax Forums

BTL24 Posts: 54
edited 2013-11-21 18:19 in Propeller 1
I am having trouble getting microsecond pauses to work with the Propeller Activity Board. I followed the library guidelines and called set_pause_dt(CLKFREQ/1000000);, which is supposed to switch pause() from 1 mS increments to 1 uS increments.

When I program pause(10), I see 40 uS on the scope. When I program pause(20), I see 50 uS.

I have tried a higher resolution with set_pause_dt(CLKFREQ/2000000);, but all I see is a constant high output.

I have also gone the other way and programmed 10 uS increments with set_pause_dt(CLKFREQ/100000); and then pause(1), but again I see 40 uS.

What am I doing wrong?

Is there a better way to get 10 uS pulses than using the pause() command?

Thanks,
BTL24 (Brian).
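
The two scope readings above are enough to separate the step size from a fixed offset. Below is a minimal sketch in plain C that runs on a PC rather than the Propeller. The function names are made up for illustration, and it assumes the observed delay is linear in the pause() argument.

```c
#include <assert.h>

/* Hypothetical helper names; host-side C, not Propeller library code.
 * Assumed model: observed_us = step_us * count + offset_us. */

/* Step size in microseconds per pause() count, from two scope readings. */
double pause_step_us(double count_a, double obs_a_us,
                     double count_b, double obs_b_us)
{
    assert(count_a != count_b);   /* two distinct settings are required */
    return (obs_b_us - obs_a_us) / (count_b - count_a);
}

/* Fixed offset in microseconds that is present whatever the count is. */
double pause_offset_us(double count_a, double obs_a_us,
                       double count_b, double obs_b_us)
{
    double step = pause_step_us(count_a, obs_a_us, count_b, obs_b_us);
    return obs_a_us - step * count_a;
}
```

With pause(10) at 40 uS and pause(20) at 50 uS, the step comes out as 1 uS per count and the offset as 30 uS. Under the linear assumption, that would mean the 1 uS increment is taking effect and roughly 30 uS is being added by whatever surrounds the pause.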

Comments

  • jmg Posts: 15,173
    edited 2013-11-20 18:39
    Try some values further from the origin, e.g., see if 200, 201, and 202 give 1 us changes.

    Any delay library is going to have some minimum time, and then some granularity.
    With Prop GCC there may be other loading delays to consider as well.
  • BTL24 Posts: 54
    edited 2013-11-20 20:16
    I ran some more tests (found in the library) that measure clock ticks against the system timer, CNT. The results are discouraging, but they explain why I can't get 1 uS increments from the pause() function.

    Optimization level apparently makes a difference, but I still can't get to 1 uS increments. The results below were built optimized for size. When I optimized for speed, the system just hung. Other optimization levels gave worse results.

    Results:

    For ptick10 (10 uS), the display reports PTick = 800 ticks against a measured delay of 1216 ticks (1.52 times greater).

    For ptick20 (20 uS), the display reports PTick = 1600 ticks against a measured delay of 2016 ticks (1.26 times greater).

    In summary, I am surprised that a processor of the Prop's caliber can't get below a 20 uS pause. What am I doing wrong? Is it the overhead of C?

    If this truly is the case, I will have to abandon the Prop for another processor. In my work, I have to manipulate bits and timing down to 100s of nS, sometimes smaller.

    #include "simpletools.h"                // Include simpletools header
    
    unsigned int us, ptick10, ptick20, ti, tf;
    
    int main()                              // main function
    {
      us = CLKFREQ/1000000;                // Clock ticks in 1 uS
      ptick10 = 10 * us;                   // Ticks for a 10 uS pause
      ptick20 = 20 * us;                   // Ticks for a 20 uS pause
      ti = CNT;
      pause_ticks(ptick10);
      tf = CNT;
      print("PTick = %d\n", ptick10);
      print("delay tick = %d\n",tf-ti);
    }
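
Subtracting the requested ticks from the measured ticks in the results above gives the same difference both times, which points to a fixed cost rather than a scaling error. Here is a small host-side sketch of that arithmetic in plain C. The helper names are hypothetical, and the 80 MHz clock is an assumption inferred from 10 uS = 800 ticks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for checking the numbers on a PC. */

/* Ticks that were measured but not requested. */
uint32_t overhead_ticks(uint32_t measured, uint32_t requested)
{
    assert(measured >= requested);
    return measured - requested;
}

/* Convert system-clock ticks to nanoseconds for a given clock frequency. */
uint64_t ticks_to_ns(uint32_t ticks, uint32_t clkfreq_hz)
{
    assert(clkfreq_hz > 0);
    return ((uint64_t)ticks * 1000000000ULL) / clkfreq_hz;
}
```

Both runs give 416 ticks of overhead (1216 - 800 and 2016 - 1600), which is about 5.2 uS at 80 MHz. The 1.52 and 1.26 ratios differ only because the same fixed cost is compared against different requested delays.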
    
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2013-11-20 20:24
    The problem with any kind of pause function is, "Where does the pause begin and end?" The answer is that it begins after the procedure call overhead and ends before the procedure return overhead. In between those two events, it's totally accurate.

    The Propeller, with a timing granularity of 12.5 ns, can definitely do what you want; but don't expect it to happen in C, Spin, or any other high-level language. For that, you will have to take the time to learn Propeller assembly (PASM). It's nothing to be afraid of, BTW: PASM is one of the friendliest, most approachable assembly languages out there and well worth the time to master.

    -Phil
  • BTL24 Posts: 54
    edited 2013-11-20 20:41
    Phil Pilgrim (PhiPi) wrote: »
    The problem with any kind of pause function is, "Where does the pause begin and end?" The answer is that it begins after the procedure call overhead and ends before the procedure return overhead. In between those two events, it's totally accurate.

    The Propeller, with a timing granularity of 12.5 ns, can definitely do what you want; but don't expect it to happen in C, Spin, or any other high-level language. For that, you will have to take the time to learn Propeller assembly (PASM). It's nothing to be afraid of, BTW: PASM is one of the friendliest, most approachable assembly languages out there and well worth the time to master.

    -Phil

    Phil,

    Thanks for the quick response, and I was just coming to that same conclusion. For the 10 uS and 20 uS critical timing I need, I thought of Spin, but I take it you are saying assembly would be better.

    Can inline assembly be placed inside a C program? Or is it best to assemble outside and then include the function later during the build?

    Regards,
    BTL24 (Brian)
  • jmg Posts: 15,173
    edited 2013-11-20 20:56
    BTL24 wrote: »
    In summary, I am surprised that a processor of the Prop's caliber can't get below a 20 uS pause. What am I doing wrong? Is it the overhead of C?

    If this truly is the case, I will have to abandon the Prop for another processor. In my work, I have to manipulate bits and timing down to 100s of nS, sometimes smaller.

    On any uC, sub-us timing is really going to need assembler, and care.

    Can you clarify "manipulate bits and timing"? What granularity in edges do you need, what are the min and max times between edges, and is this output only, or do you also need to capture/time arriving edges?
  • jazzed Posts: 11,803
    edited 2013-11-20 21:13
    BTL24 wrote: »
    Can inline assembly be placed inside a C program?

    Yes. There are several examples. Here is one that mixes C+ASM. It could be faster without the C code.

    Care to guess what it does?
    __attribute__((fcache)) static void _outbyte(int bitcycles, int txmask, int value)
    {
      int j = 10;
      int waitcycles;
    
      waitcycles = CNT + bitcycles;
      while(j-- > 0) {
        __asm__ volatile("waitcnt %[_waitcycles], %[_bitcycles]"
        : [_waitcycles] "+r" (waitcycles)
        : [_bitcycles] "r" (bitcycles));
    
        __asm__ volatile("shr %[_value],#1 wc \n\t"
        "muxc outa, %[_mask]"
        : [_value] "+r" (value)
        : [_mask] "r" (txmask));
      }
    }
    
    For guidance on GCC in-line ASM look to this: http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html

    Note that propeller-gcc in-line ASM is like PASM with regard to source and destination.
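
One way to check a guess about the loop above is to model its two data instructions on a PC. The sketch below is plain C, not Propeller code. The function name and the levels array are made up, and it assumes the documented PASM behavior that "shr ... wc" puts the bit shifted out (the old bit 0) into C and that "muxc" sets or clears the masked bits according to C. The waitcnt timing is left out, so only the sequence of pin levels is recorded.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model (hypothetical name) of the loop body in _outbyte().
 * Stores the masked pin level after each of the 10 iterations in
 * levels[0..9] and returns the final value of the modelled OUTA. */
uint32_t model_outbyte(uint32_t outa, uint32_t txmask, uint32_t value,
                       int levels[10])
{
    assert(txmask != 0);
    for (int j = 0; j < 10; j++) {
        uint32_t carry = value & 1u;   /* shr value,#1 wc : C = old bit 0 */
        value >>= 1;
        if (carry)                     /* muxc outa, mask : copy C to pin */
            outa |= txmask;
        else
            outa &= ~txmask;
        levels[j] = (outa & txmask) != 0;
    }
    return outa;
}
```

Feeding it (0x41 << 1) | 0x200 yields the levels 0,1,0,0,0,0,0,1,0,1: a low bit first, then the eight bits of 0x41 least significant bit first, then a high bit. That framing of value is an assumption about the caller, which the post does not show, but under that reading the routine shifts out ten bits, one per bitcycles interval.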
  • SRLM Posts: 5,045
    edited 2013-11-20 21:42
    BTL24 wrote: »
    Can inline assembly be placed inside a C program? Or is it best to assemble outside and then include the function later during the build?

    Well, by definition, "inline assembly" is inline and not linked in. Here's a tutorial on that:

    https://sites.google.com/site/propellergcc/documentation/tutorials/inline-asm-basics

    And this I2C driver uses inline assembly for the bit banging: https://github.com/libpropeller/libpropeller/blob/master/libpropeller/i2c/i2c_base.h
  • ersmith Posts: 6,054
    edited 2013-11-21 04:46
    BTL24 wrote: »

    Is there a better way to get 10 uS pulses than using the pause() command?

    Possibly not with the pause() function, but the waitcnt() function can give pauses down to about 0.2us (or 0.05us if you're inside an FCACHE loop or otherwise in COG code rather than LMM mode).
  • ersmith Posts: 6,054
    edited 2013-11-21 05:05
    Phil Pilgrim (PhiPi) wrote: »
    The Propeller, with a timing granularity of 12.5 ns, can definitely do what you want; but don't expect it to happen in C, Spin, or any other high-level language.
    It certainly can happen in C, although to get the maximum 12.5ns granularity will take some care, especially in LMM mode.

    For example:
    #include <stdio.h>
    #include <propeller.h>
    
    //
    // wait for N cycles
    // returns the actual number of cycles waited
    //
    
    __attribute__((fcache))
    int fastwait(int N)
    {
        int ti, tf;
        ti = CNT;
        waitcnt(ti+N);
        tf = CNT;
        return tf - ti;
    }
    
    int
    main()
    {
        unsigned int us, delay;
        us = CLKFREQ / 1000000; // aim for 1 uS time
        printf("timing test:\n");
        delay = fastwait(us);
        printf("us = %u actual delay = %u\n", us, delay);
        return 0;
    }
    

    Produces:
    [ Entering terminal mode. Type ESC or Control-C to exit. ]
    timing test:
    us = 80 actual delay = 85
    
    4 of the extra 5 cycles are due to the fetch of CNT after the waitcnt; I'm not sure where the other cycle came from, but it's not C overhead (you'll get the same result with PASM).

    But Phil is right, PASM is a very friendly assembly language, and probably worth learning!

    Eric
  • BTL24 Posts: 54
    edited 2013-11-21 06:18
    jmg wrote: »
    On any uC, sub-us timing is really going to need assembler, and care.

    Can you clarify "manipulate bits and timing"? What granularity in edges do you need, what are the min and max times between edges, and is this output only, or do you also need to capture/time arriving edges?

    Agreed on assembler and care. I measured the Prop C high() and low() functions: they each take 12 uS to execute, just by themselves! That, added to the other overhead of calling functions and such, means I will never make the pulse times I want with this approach. What is happening in the Prop with C code now makes sense. Assembly appears to be the answer.

    My Goal:

    What I am trying to do is create a serial stream of pulses whose high and low levels each last a precise number of uS. This will be used to send a signal to a string of GE Color Effects (GECE) smart pixel lights that reside on a 3-wire system: +5 V, Gnd, and signal (serial stream). After an enumeration process (assigning each bulb an address), the color and intensity of the light can be managed per bulb, as each is addressable.

    FYI, I have done this very reliably with the SX processor for years using simple HIGH and LOW commands with the PAUSEUS() function. By bit-banging the signal line with pulses that meet the unique format and timing, I was able to talk to the bulbs successfully. Obviously the overhead with the SX is less than with Prop C code. I am now trying to convert that code to the Prop.

    GECE Serial Signal Format is as follows...

    Serial stream consists of :
    Start Pulse is high pulse for 10 uS
    Address of bulb - 6 bits
    Intensity - 8 bits
    Blue Color - 4 bits
    Green Color - 4 bits
    Red Color - 4 bits
    Idle time is 20 uS at a logical level of low


    Special Timing/Structure of "1" and "0"...
    Logical 0 is 10 uS low followed by 20 uS of high
    Logical 1 is 20 uS low followed by 10 uS of high
    Data is sent MSB first, LSB last

    I may also have to buffer the Prop output from 3 V up to 5 V levels. I am not sure the GECE will recognize that low a level, but that is another matter that I can easily solve.

    Regards,
    BTL24 (Brian)
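
The format described above can be turned into a list of line levels and durations before any pin is touched, which makes the timing easy to check on a PC. The following is a rough sketch in plain C, not a working driver. The type and function names are invented for illustration, and the field order and the 20 uS idle time are taken as stated in this post.

```c
#include <assert.h>
#include <stdint.h>

#define GECE_BITS 26                  /* 6 address + 8 intensity + 4 + 4 + 4 */
#define GECE_MAX_SEGMENTS (2 + 2 * GECE_BITS)

/* One stretch of constant line level. */
typedef struct {
    int level;        /* 1 = high, 0 = low */
    int duration_us;
} gece_segment;

/* Pack the fields in the order listed in the post, address first. */
uint32_t gece_pack(uint32_t addr, uint32_t intensity,
                   uint32_t blue, uint32_t green, uint32_t red)
{
    assert(addr < 64 && intensity < 256);
    assert(blue < 16 && green < 16 && red < 16);
    return (addr << 20) | (intensity << 12) | (blue << 8) | (green << 4) | red;
}

/* Expand a packed frame into line segments; returns the segment count. */
int gece_encode(uint32_t frame, gece_segment out[GECE_MAX_SEGMENTS])
{
    int n = 0;
    out[n++] = (gece_segment){1, 10};                 /* start pulse */
    for (int bit = GECE_BITS - 1; bit >= 0; bit--) {  /* MSB first */
        int one = (int)((frame >> bit) & 1u);
        out[n++] = (gece_segment){0, one ? 20 : 10};  /* low part of the bit */
        out[n++] = (gece_segment){1, one ? 10 : 20};  /* high part of the bit */
    }
    out[n++] = (gece_segment){0, 20};                 /* idle time */
    return n;
}
```

Every bit takes 30 uS whether it is a 0 or a 1, so a frame built this way always lasts 10 + 26 * 30 + 20 = 810 uS. On the Propeller, each duration would then be multiplied by CLKFREQ/1000000 to get the tick count for a waitcnt-based delay.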
  • BTL24 Posts: 54
    edited 2013-11-21 06:20
    jazzed wrote: »
    Yes. There are several examples. Here is one that mixes C+ASM. It could be faster without the C code.

    Care to guess what it does?
    __attribute__((fcache)) static void _outbyte(int bitcycles, int txmask, int value)
    {
      int j = 10;
      int waitcycles;
    
      waitcycles = CNT + bitcycles;
      while(j-- > 0) {
        __asm__ volatile("waitcnt %[_waitcycles], %[_bitcycles]"
        : [_waitcycles] "+r" (waitcycles)
        : [_bitcycles] "r" (bitcycles));
    
        __asm__ volatile("shr %[_value],#1 wc \n\t"
        "muxc outa, %[_mask]"
        : [_value] "+r" (value)
        : [_mask] "r" (txmask));
      }
    }
    
    For guidance on GCC in-line ASM look to this: http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html

    Note that propeller-gcc in-line ASM is like PASM with regard to source and destination.

    Outstanding! It creates a delay with very fine resolution, AND I get to see how to inject assembly into C.


    I need to better understand fcache. I think this will be the way to speed things up (or running in its own cog).

    Thanks.

    Regards.
    BTL24 (Brian)
  • BTL24 Posts: 54
    edited 2013-11-21 06:21
    SRLM wrote: »
    Well, by definition, "inline assembly" is inline and not linked in. Here's a tutorial on that:

    https://sites.google.com/site/propellergcc/documentation/tutorials/inline-asm-basics

    And this I2C driver uses inline assembly for the bit banging: https://github.com/libpropeller/libpropeller/blob/master/libpropeller/i2c/i2c_base.h

    Thanks for the links...

    Brian
  • BTL24 Posts: 54
    edited 2013-11-21 06:23
    ersmith wrote: »
    It certainly can happen in C, although to get the maximum 12.5ns granularity will take some care, especially in LMM mode.

    For example:
    #include <stdio.h>
    #include <propeller.h>
    
    //
    // wait for N cycles
    // returns the actual number of cycles waited
    //
    
    __attribute__((fcache))
    int fastwait(int N)
    {
        int ti, tf;
        ti = CNT;
        waitcnt(ti+N);
        tf = CNT;
        return tf - ti;
    }
    
    int
    main()
    {
        unsigned int us, delay;
        us = CLKFREQ / 1000000; // aim for 1 uS time
        printf("timing test:\n");
        delay = fastwait(us);
        printf("us = %u actual delay = %u\n", us, delay);
        return 0;
    }
    

    Produces:
    [ Entering terminal mode. Type ESC or Control-C to exit. ]
    timing test:
    us = 80 actual delay = 85
    
    4 of the extra 5 cycles are due to the fetch of CNT after the waitcnt; I'm not sure where the other cycle came from, but it's not C overhead (you'll get the same result with PASM).

    But Phil is right, PASM is a very friendly assembly language, and probably worth learning!

    Eric

    Thank you for the example. I learn the fastest with examples like yours. fcache apparently is critical to getting things moving fast with the prop.

    Regards,
    Brian
  • jmg Posts: 15,173
    edited 2013-11-21 14:40
    ersmith wrote: »
    4 of the extra 5 cycles are due to the fetch of CNT after the waitcnt; I'm not sure where the other cycle came from, but it's not C overhead (you'll get the same result with PASM).

    The unexpected 5th cycle has to come from somewhere.

    What happens as you vary the delay value? (There will be some minimum where it breaks.)

    Also, how does the call overhead change? User code often does the pin I/O on either side of a customdelay() call, but for best precision (least SW jitter), it is probably better to do the pin I/O inside the function, right next to the waitcnt()?
  • kuroneko Posts: 3,623
    edited 2013-11-21 15:08
    jmg wrote: »
    The unexpected 5th cycle has to come from somewhere.
    There is no unexpected cycle. The code comes down to the following sequence:
    mov     one, cnt
    add     one, #delay
    waitcnt one, #0
    mov     two, cnt
    
    The difference between two and one is the time taken for add/waitcnt + 4. IOW it gives us the time for the first 3 insns. Which breaks down to 9{14} + 71. Nine cycles are required to stop waitcnt from blocking for a full period which gives us a base delay of 14 cycles. With the remainder of 71 you'll get 85.
  • jmg Posts: 15,173
    edited 2013-11-21 16:17
    kuroneko wrote: »
    The difference between two and one is the time taken for add/waitcnt + 4. IOW it gives us the time for the first 3 insns. Which breaks down to 9{14} + 71. Nine cycles are required to stop waitcnt from blocking for a full period which gives us a base delay of 14 cycles. With the remainder of 71 you'll get 85.

    I do not quite follow this. The P1 data says MOV/ADD are 4 cycles, and that leaves WAITCNT, which says 6+N, but it auto-adjusts once it has got going, so I can see 4 extras come from one ASM line. I can get the 14 minimum as 4+4+6.

    Maybe WAITCNT effectively needs one clock to complete/exit? (That would make sense to me.)
  • kuroneko Posts: 3,623
    edited 2013-11-21 16:41
    jmg wrote: »
    I do not quite follow this. The P1 data says MOV/ADD are 4 cycles, and that leaves WAITCNT, which says 6+N, but it auto-adjusts once it has got going, so I can see 4 extras come from one ASM line. I can get the 14 minimum as 4+4+6.
    In order to achieve 14 cycle minimum you need to add 9 to cnt (being fetched first), otherwise you wait for rollover (cnt is sampled during the 3rd insn cycle, waitcnt does the first match during its 4th).
    mov     cnt, cnt
    add     cnt, #9{14} + N
    waitcnt cnt, #0
    
    With N equal 0 you get 14 cycles. With 80 passed in, N is 71 (14 + 71 = 85).

    [Attached image: cycle timing diagram, 657 x 565]
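
The accounting in this post can be restated as a small model and checked against the 85 cycles that ersmith measured. This is a host-side sketch in plain C with hypothetical function names. It only encodes the figures quoted in the thread (a delay of 9 gives the 14-cycle minimum, and each extra cycle of delay adds one cycle), and the rollover branch is one reading of "otherwise you wait for rollover".

```c
#include <assert.h>
#include <stdint.h>

/* Model of the elapsed cycles between the two CNT reads in
 *     mov one, cnt / add one, #delay / waitcnt one, #0 / mov two, cnt
 * using the numbers given in the thread. */
uint64_t fastwait_measured_cycles(uint32_t delay)
{
    const uint32_t min_delay = 9;     /* smallest delay that is not missed */
    const uint32_t base_cycles = 14;  /* elapsed cycles at that minimum */

    if (delay >= min_delay)
        return (uint64_t)base_cycles + (delay - min_delay);

    /* Assumed: the target has already passed when waitcnt compares, so
     * the 32-bit counter must wrap once before the values match. */
    return ((uint64_t)1 << 32) + base_cycles - (min_delay - delay);
}

/* Delay argument to pass for a wanted elapsed cycle count. */
uint32_t fastwait_delay_for(uint32_t wanted_cycles)
{
    assert(wanted_cycles >= 14);      /* below the modelled minimum */
    return wanted_cycles - 5;         /* 14 - 9 = 5 cycles of fixed cost */
}
```

With delay = 80 the model gives 14 + 71 = 85 cycles, matching the program output earlier in the thread, and it suggests passing 5 cycles less than the interval actually wanted, as long as the result stays at 9 or above.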
  • Mike Green Posts: 23,101
    edited 2013-11-21 16:55
    In case you get confused about Kuroneko's example and its use of the "cnt" register, remember that the cog memory is actually 512 longs and each special function register (SFR) like CNT has a corresponding "shadow ram" location in the cog memory. For all read / write SFRs, you can't really access the shadow ram since all accesses refer to the SFR. For read-only SFRs like CNT, the SFR is only accessible as a source field. Any use of that location as a destination field actually accesses the shadow ram memory and Kuroneko is making use of that shadow ram location for temporary storage. Other, stranger, uses of shadow ram can occur. For example, read / write accesses write to shadow ram as well as the hardware register and instruction fetches are done from shadow ram rather than the SFRs.
  • jmg Posts: 15,173
    edited 2013-11-21 17:21
    kuroneko wrote: »
    With N equal 0 you get 14 cycles. With 80 passed in, N is 71 (14 + 71 = 85).

    - but the user asked for 80, so the number added is 80. [add one, #delay]

    Missing from the cycles diagram are the exit cases, and exactly where in the 6 cycles the added wait clocks get packed.
    That detail determines where cnt is when the next opcode starts, which is the info I am chasing.

    The ideal code is where those 5 cycles are corrected for, inside the delay routine.
  • jmg Posts: 15,173
    edited 2013-11-21 17:24
    Mike Green wrote: »
    In case you get confused about Kuroneko's example and its use of the "cnt" register, remember that the cog memory is actually 512 longs and each special function register (SFR) like CNT has a corresponding "shadow ram" location in the cog memory.

    Yes, I prefer the format of #16, especially for examples/tutorials.
  • kuroneko Posts: 3,623
    edited 2013-11-21 17:36
    jmg wrote: »
    - but the user asked for 80, so the number added is 80. [add one, #delay]
    Yes, did I say anything different? The point is that the insn sequence steals 9 out of those 80 to maintain its 14 cycle minimum. Which only leaves 71 extra (above minimum) cycles. Are we talking about the same thing here?
    jmg wrote: »
    Missing from the cycles diagram are the exit cases, and exactly where in the 6 cycles the added wait clocks get packed.
    That detail determines where cnt is when the next opcode starts, which is the info I am chasing.
    Can you elaborate? Not sure I understand the question.
  • jmg Posts: 15,173
    edited 2013-11-21 17:48
    kuroneko wrote: »
    Can you elaborate? Not sure I understand the question.

    Your diagram shows S D w m . R for the 6 cycles min of WAIT, but does not show where that stretches.
    ie does it do [S D w m . . . . . R] or [S D w m . R R R R R ] or ?

    Whether it gains 1 count on entry or on exit is less important than correcting for the +5 overall path.
  • kuroneko Posts: 3,623
    edited 2013-11-21 17:51
    jmg wrote: »
    Your diagram shows S D w m . R for the 6 cycles min of WAIT, but does not show where that stretches.
    ie does it do [S D w m . . . . . R] or [S D w m . R R R R R ] or ?
    The match cycle repeats, i.e.
    [S D w m m m m m m . R]
    
    which is actually not active code but the cog sleeping until some h/w decides it can continue.
  • jmg Posts: 15,173
    edited 2013-11-21 18:04
    kuroneko wrote: »
    The match cycle repeats, i.e.
    [S D w m m m m m m . R]
    
    which is actually not active code but the cog sleeping until some h/w decides it can continue.

    Cool, so I make that, merging with your other code
    [ S  D  w  m  m  m  m  m  m  .  R][ S  D  e  R]
              14  .. 77 78 79 80 81 82  83 84 85
                              ^^EV            ^^RV
    EV is Exit-trigger value, and RV is read value.
    
    

    Porting this to a PinAction function, like the OP seeks, and using the above example, would this be coded something like this?
    __attribute__((fcache))
    void fastPulse(int N)
    {
        PinAction;
        waitcnt(CNT+N-5);  // 5 or now 9 ?
        PinAction;   // Pin edges are exactly N cycles apart, N >  ?
    }
    
    or maybe less compiler dependent is ?
    
    __attribute__((fcache))
    void fastPulse(int N)
    {
        int ti;
        PinAction;
        ti = CNT;
        waitcnt(ti+N-9);
        PinAction;   // Pin edges are exactly N cycles apart, N >  ?
    }
    
    
    
  • kuroneko Posts: 3,623
    edited 2013-11-21 18:19
    Whether it needs five or nine adjustment cycles really depends on the final order of insns. I'll have to play with the compiler a bit to see how the order is affected. But I'm sure that's now a minor issue :)