Cogs running in parallel - Newbie stuff! — Parallax Forums

Your typical useless local newbie here again, and I'm looking for information on using the major power of the Propeller2 chip - the multiple cogs running in parallel!

Example code and/or a detailed tutorial would be nice, and some pointers would help too!

For background, all this is supposed to be done in Spin2, tested there (on a P2 Edge w/ breadboard) then compiled to make it faster, but the whole point is to get things done in parallel.

Imagine, if you will, an orchestra. Pseudo-code:

In Cog 0, the Conductor raises the baton, and brings it down! Upon that signal
Cog 1, the violins go scrape
Cog 2, the woodwinds go tweet
Cog 3, the horns go honk
and in Cog 4 the tympani goes BOOM;

Obviously these should all happen in parallel. Each cog needs to share some data (e.g. the string "Beethoven's 5th") but not much (in my application, anyhow).

However, I'm terribly unclear on just how to put code in cogs (put code in the hub?), share data between cogs (put it into the hubram?) and fire off one signal that all the other cogs should pay attention to (cogatn?). If you stop a cog does it keep all its memories for the next start? Can you make them loop endlessly until the signal comes again or is restarting a better way?

I have code in Spin2 for Conductor(), violins(scrape), woodwinds(tweet), &c., but all running in sequence in Cog 0, making for lousy music*. Unfortunately, my search-fu upon the forum isn't coming up with much on how to parallel them - perhaps there's a thread or a project somewhere I've missed?

Anyhow, thanks much for what I have been able to find! I'll keep poking around. S.

  * Note: References to music are strictly for illustrative purposes. I'm not actually trying to do MIDI (or even video) here, just embedded parallel processing.

Comments

  • pik33 Posts: 2,366
    edited 2021-03-30 12:58

    There are a lot of possibilities.

    The simplest is:

    cogspin(cog#, violins(), @violinstack)

    It starts the function violins() in cog number cog#, using the stack space in violinstack, which you have to declare.

    You can start all your 4 functions in 4 cogs that way.

    They will run until you use cogstop (cog#)

    The cog can stop itself using cogstop(cogid)

    It is better not to stop and restart cogs unless you want to change a cog's function. You can always make a cog sleep and wait for an event.

    They can talk to each other using:

    • hub ram locations
    • cogatn
    • you can start a pair of cogs with shared LUT RAM
    • smart pin as a storage
    • you can also waste some pins and set up serial communication between cogs, using those pins for serial transmit/receive.

    This is a Propeller. There is always another method to do something.
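
    As a rough illustration of the cogspin approach described above, here is a minimal sketch. The method name play(), the variable names, the 64-long stacks and the clock setting are all assumptions made for the example, not anything from the original poster's program.

    ```spin2
    ' Sketch only: names, stack sizes and the clock setting are assumptions.
    CON
      _clkfreq = 200_000_000

    VAR
      long violinStack[64]          ' each Spin2 cog needs its own private stack
      long hornStack[64]
      long notesPlayed[2]           ' VAR data lives in hub RAM, so every cog can see it

    PUB main() | violinCog, hornCog
      ' NEWCOG asks for any free cog; cogspin returns its ID, or -1 if none is free
      violinCog := cogspin(NEWCOG, play(0, 250), @violinStack)
      hornCog   := cogspin(NEWCOG, play(1, 400), @hornStack)
      if violinCog < 0 or hornCog < 0
        return
      waitms(5_000)                 ' both sections run in parallel for five seconds
      cogstop(violinCog)
      cogstop(hornCog)

    PRI play(section, periodMs)
      repeat                        ' runs until another cog calls cogstop on this cog
        notesPlayed[section]++      ' each cog writes only its own array element
        waitms(periodMs)
    ```

    Giving each cog its own element of notesPlayed means no two cogs ever write the same hub location, which sidesteps the sharing problems discussed later in the thread.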

  • @pik33 said:
    They will run until you use cogstop (cog#)

    The cog can stop itself using cogstop(cogid)

    A Spin cog also stops itself when the function it was started with ends or returns.

  • @pik33 said:

    This is a Propeller. There is always another method to do something.

    I don't really want lots of ways! I get enough of that from Windows (16 ways to do 16 different things, but lord help you if you want to do a 17th...) I'd like one way - or at least a comparison job of the multiple ways. Perhaps JonnyMac needs to do another article for Nuts&Volts :)

    @pik33 said:
    There are lot of possibilities.

    The simplest is:

    cogspin(cog#, violins(), @violinstack)

    It starts the function violins() in cog number cog#, using the stack space in violinstack, which you have to declare.

    This looks excellent, but how do I determine "violinstack"? In my real application I don't need much space (a dozen longs, perhaps, not counting subroutine calls), although I'll need some, and I want it to be fast, and I'll eventually need to feed a ring buffer into it (from chips external to the Prop(!)) but baby steps here. I think I can keep track of cog numbers in my head - once determined, they won't change, and there's no need to spontaneously start new ones (unless the Pink Panther invades my orchestra!).

    I'm thinking to throw 'flags' around, I'd be interested in the 16 'semaphores', sort of like, in the cog:

    10 repeat while !(downbeat)
    20 silence
    30 Then
    40 Play note, and goto 10

    That way the cog just keeps running. An interrupt would be even better.
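
    One way to get this "wait for the downbeat" behaviour without polling a flag is the attention (ATN) signal: Spin2 provides cogatn(mask) to strobe other cogs and waitatn() to block until strobed. The sketch below is only an illustration under assumed names (conductor, player, beats) and an assumed 500 ms tempo.

    ```spin2
    ' Sketch only: names, stack sizes and timing are assumptions.
    CON
      _clkfreq = 200_000_000

    VAR
      long stack1[64]
      long stack2[64]
      long beats[2]                 ' one counter per player cog

    PUB conductor() | cogA, cogB, mask
      cogA := cogspin(NEWCOG, player(0), @stack1)
      cogB := cogspin(NEWCOG, player(1), @stack2)
      if cogA < 0 or cogB < 0
        return                      ' could not get two free cogs
      mask := (1 << cogA) | (1 << cogB)   ' one bit per cog to be signalled
      repeat
        waitms(500)
        cogatn(mask)                ' the downbeat: strobe ATN on both players at once

    PRI player(index)
      repeat
        waitatn()                   ' the cog idles here until the conductor strobes it
        beats[index]++              ' "play the note", then go back to waiting
    ```

    The player cogs never stop or restart; they simply sit in waitatn() between beats.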

    And it leads to all sorts of other fun stuff. In Spin2, how does one say "Put this variable in HubRam?" and how to make a cog play "fetch" (if I may... :p ) Sometimes the shared data will be the same, so re-fetching it's a waste. Sometimes it'll change, so has to be re-fetched (another flag?). I understand the documentation's a bit of a work in progress - hopefully I can figure it out and maybe writing it down will help. Thanks again, S.

  • pik33 Posts: 2,366
    edited 2021-03-31 04:40

    Declare a spin variable in var section. 64 longs should be enough.

    var
    
    long violinstack[64]
    

    If the procedure is started by cogspin, you don't need to do anything to get your variable, simply use it.

    var
    
    long i
    

    From cog 1:

    i:=1

    From cog 2

    repeat
      repeat while i=0
      '(do something with i)
      i:=0
    

    If you want to run a cog from cog RAM with code written in assembly, you can pass a pointer to the cog via the PTRA register:

    var
    
    long i
    '(..)
    
    coginit(cog#, @code,@i)
    
    (...)
    dat
       org 
    (...)
      rdlong asmvar,ptra
    (...)
     asmvar  long 0
    (...)
    
    

    There are a lot of examples included with the Propeller Tool and Flexspin: read them.
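
    For anyone who wants to see the PTRA fragment above as one complete file, here is a minimal sketch. The labels entry and value, the starting value 41 and the clock setting are assumptions for the example; the PASM cog just adds one to the shared long and then stops itself.

    ```spin2
    ' Sketch only: labels, values and the clock setting are assumptions.
    CON
      _clkfreq = 200_000_000

    VAR
      long i                        ' shared long in hub RAM

    PUB main()
      i := 41
      coginit(NEWCOG, @entry, @i)   ' the third argument arrives in the new cog's PTRA
      repeat while i == 41          ' wait for the PASM cog to write its result back
      ' i should now hold 42

    DAT
                  org
    entry         rdlong  value, ptra       ' fetch i from hub RAM via the pointer
                  add     value, #1
                  wrlong  value, ptra       ' store the result back to hub RAM
                  cogid   value             ' get this cog's own ID
                  cogstop value             ' and stop, so execution never reaches the data
    value         long    0
    ```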

  • Awesome! Looks good, I'll try it. I hope I won't need incessant hand-holding all the way, but old dog, new tricks. I'll update and get back to you when it's behaving itself. Thanks! S.

  • Scroungre Posts: 161
    edited 2021-04-01 08:29

    Hm. Some success - I wrote up some cog code, and it runs in its own cog happily enough. But then I spun up three more, that do basically the same thing on different data, and they trample on each other. Running each one separately they each come out with the correct results (I'm feeding them pre-calculated test numbers) but as soon as I spin up two or more, the results are all off.

    cogspin(1,Ax0_1(),@stack0)
    'cogspin(2,Ax2_3(),@stack2)
    'cogspin(3,Ax4_5(),@stack4)
    'cogspin(4,Ax6_7(),@stack6)

    Each on their own gives the correct results! (and the test data is such that all the results are different), but un-commenting any two (or more) of those lines causes all of them to give incorrect results. They're consistent between runs, and uncommenting different combinations gets you different numbers, but they're all wrong. Changing the stack size (I tried 32, 64, and 128) doesn't seem to make any difference.

    Obviously more poking at it is called for. Thanks for the headstart! S.

    ETA: Using entire bytes as flags seems to have helped a lot, although the waste bugs me a bit - I like using bits in a byte (or word, or long!) as flags, but I suspect it meant that particular byte got passed around through all the cogs, and sometimes showed up improperly updated. Ah well, seems happier now (although running it through the flexprop compiler cooked up all sorts of garbage. Wow. Not done yet...)

  • Cluso99 Posts: 18,069

    Paste your whole code with three backticks before and after it. Then we might be able to help.
    As for flags vs. bytes vs. longs, it is best to use bytes or longs. When you get to PASM, remember that code and variables in cog and LUT RAM can only be longs.

  • pik33 Posts: 2,366

    You have to remember that if several cogs work on one variable, side effects may occur.

    If 2 cogs both do

    rdlong something, somewhere
    add something, #1
    wrlong something, somewhere

    you may end up with 1 added instead of 2, because both cogs can read the old value before either one writes its result back.

    There are hardware locks available.
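
    A minimal sketch of how the hardware locks can guard a shared counter in Spin2, using locknew(), locktry() and lockrel(). The names counter, lockId and bump, and the count of 1,000, are assumptions for illustration.

    ```spin2
    ' Sketch only: names, sizes and counts are assumptions.
    CON
      _clkfreq = 200_000_000

    VAR
      long counter                  ' shared hub variable that both cogs increment
      long lockId                   ' hardware lock number handed out by locknew()
      long stackA[64]
      long stackB[64]

    PUB main()
      lockId := locknew()           ' 0..15 on success, negative if no lock is free
      if lockId < 0
        return
      cogspin(NEWCOG, bump(1_000), @stackA)
      cogspin(NEWCOG, bump(1_000), @stackB)

    PRI bump(count)
      repeat count
        repeat until locktry(lockId)    ' spin until this cog owns the lock
        counter++                       ' the read-modify-write is now exclusive
        lockrel(lockId)                 ' release so the other cog can proceed
    ```

    Holding the lock only around the single update keeps the serialized part short, so the cogs still do everything else in parallel.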

  • Every time I try to paste code here it turns into a disaster area, trashing the format something fierce (CR/LF issues, among other things). But I'll keep trying.

    Most of the compiler weirdness turned out to be that it seems FlexProp & flexspin do NOT automatically initialize all variables to zero while the Propeller Tool does, so I got to stick in a lot more declarations of zero, and it got happier. Probably a good idea anyhow.

    Oddly enough I was getting bigger numbers than I expected - 999 was expected on one channel, and I kept getting things from 1,090 to 1,300-ish - they were all high by about the same percentage. But they're happy now. I considered lots of locks, but that would sort of defeat the parallelism. Lots of bytes worked (Effectively, I've cloned the conductor 16 times, and now there's sixteen near-simultaneous downbeats... :) )

    And for now I'm sorta trying to avoid PASM, not because it's bad in any way (I like assembler - wrote quite a lot of it for AVRs) but it's a learning curve I'll climb later. For now, compiled spin is fast enough (interpreted spin might be fast enough, but it's looking highly marginal. The 3-4x speedup the compiler seems to get me looks like it'll do fine). S.

  • VAR variables might be 0 after load but not after reset; they keep their current value. DAT variables are set by yourself.

    Local variables in a method are NOT initialized; only return values are set to 0 by default.

    This behavior is identical in SPIN1, SPIN2, PropTool, Pnut and FlexProp.

    Enjoy!

    Mike

  • @msrobots said:
    VAR variables might be 0 after load but not after reset; they keep their current value. DAT variables are set by yourself.

    Local variables in a method are NOT initialized; only return values are set to 0 by default.

    This behavior is identical in SPIN1, SPIN2, PropTool, Pnut and FlexProp.

    Enjoy!

    Mike

    VARs are always initialized to zero, and there's plenty of code that would not work if that wasn't the case.

  • I assumed that too, but was surprised that a reset (or soft reset via cmd) does not clear the hub RAM, on either P1 or P2.

    It might be that the SPIN interpreter clears the VARs on load of the object; I have not tested it. RES in PASM contains whatever is behind the DAT section, since a COG always loads the full COG RAM regardless of PASM size.

    Anyways it is a good thing in programming to initialize vars in general and not assume any given start value.

    Enjoy!

    Mike
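
    Whichever tool turns out to be right about VAR start-up values, explicit initialization takes only a line or two with bytefill(). The sketch below assumes the byte variables are regrouped into two arrays named B and Z, which is not how the test program later in this thread declares them.

    ```spin2
    ' Sketch only: the array names and sizes are assumptions.
    VAR
      byte B[8]
      byte Z[8]

    PUB main()
      bytefill(@B, 0, 8)            ' zero all eight B bytes explicitly
      bytefill(@Z, 0, 8)            ' zero all eight Z bytes explicitly
      ' From here on the start-up contents of hub RAM no longer matter.
    ```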

  • Scroungre Posts: 161
    edited 2021-04-02 06:08

    This is readily testable, so I did. Here is the compleat code, three backticks 'n all**:

    ```' Move tinkering

    CON
    _clkfreq = 350_000_000 ' Why not run as fast as we can? (350M)
    DEBUG_BAUD = 2_000_000

    VAR
    stack0[32]
    stack1[32]
    stack2[32]
    stack3[32]

    BYTE S0
    BYTE S1
    BYTE S2
    BYTE S3
    BYTE S4
    BYTE S5
    BYTE S6
    BYTE S7

    BYTE B0
    BYTE B1
    BYTE B2
    BYTE B3
    BYTE B4
    BYTE B5
    BYTE B6
    BYTE B7

    BYTE Z0
    BYTE Z1
    BYTE Z2
    BYTE Z3
    BYTE Z4
    BYTE Z5
    BYTE Z6
    BYTE Z7

    PUB main()

    S0 := 1 ' Setting all the S bytes to a fixed value
    S1 := 2 '
    S2 := 3 '
    S3 := 4 '
    S4 := 5 '
    S5 := 6 '
    S6 := 7 '
    S7 := 8 '

    ' But deliberately NOT setting the B bytes to zero!
    ' But deliberately NOT setting the Z bytes to zero!

    ' Edge board LEDs:
    pinl(57) ' On program end, these both reappear.
    pinl(56)

    ' Cog 0 is us here now
    'cogspin(1,A0(),@stack0)
    'cogspin(2,A1(),@stack1)
    'cogspin(3,A2(),@stack2)
    'cogspin(4,A3(),@stack3)

    repeat 500 ' For now, to make it give up eventually
    SplusB()
    debug(udec_byte(S0), udec_byte(B0), udec_byte(Z0))
    debug(udec_byte(S1), udec_byte(B1), udec_byte(Z1))
    debug(udec_byte(S2), udec_byte(B2), udec_byte(Z2))
    debug(udec_byte(S3), udec_byte(B3), udec_byte(Z3))
    debug(udec_byte(S4), udec_byte(B4), udec_byte(Z4))
    debug(udec_byte(S5), udec_byte(B5), udec_byte(Z5))
    debug(udec_byte(S6), udec_byte(B6), udec_byte(Z6))
    debug(udec_byte(S7), udec_byte(B7), udec_byte(Z7))

    ' More Edge board LEDs
    repeat '
    pint(56) ' Just leaving it here after we're all done
    waitms(1000)
    pint(57)
    waitms(1000)

    PUB SplusB()

    Z0 := S0 + B0
    Z1 := S1 + B1
    Z2 := S2 + B2
    Z3 := S3 + B3
    Z4 := S4 + B4
    Z5 := S5 + B5
    Z6 := S6 + B6
    Z7 := S7 + B7

    ```

    I left in some cogspins (commented out) because I thought it might make a difference, but it doesn't.

    The Propeller Tool (2.5.3) gives the expected results if the variables are zeroed:

    When the code is merely cut and pasted into FlexProp (5.2.0), without editing, the results are garbage*:

    Turning the Propeller Edge off and on again seemed to make basically no difference. Perhaps I need a newer version of FlexProp. Have fun, all. S.

    * Turning the board off while the FlexProp debug window was open got me lots and lots of annoyed beeping. OS is Windows 7 64-bit.

    ** changing the tick direction changed not a thing.

  • @msrobots said:
    Anyways it is a good thing in programming to initialize vars in general and not assume any given start value.

    Enjoy!

    Mike

    Yes, one of the first rules!

  • TonyB_ Posts: 2,178
    edited 2021-04-03 00:08

    @Scroungre said:
    This is readily testable, so I did. Here is the complete code, three backticks 'n all**:

    Three backticks must be on their own on a line:

    ' More tinkering
    
    CON
    _clkfreq = 350_000_000 ' Why not run as fast as we can? (350M)
    DEBUG_BAUD = 2_000_000
    
    VAR
    stack0[32]
    stack1[32]
    stack2[32]
    stack3[32]
    
    BYTE S0
    BYTE S1
    BYTE S2
    BYTE S3
    BYTE S4
    BYTE S5
    BYTE S6
    BYTE S7
    
    BYTE B0
    BYTE B1
    BYTE B2
    BYTE B3
    BYTE B4
    BYTE B5
    BYTE B6
    BYTE B7
    
    BYTE Z0
    BYTE Z1
    BYTE Z2
    BYTE Z3
    BYTE Z4
    BYTE Z5
    BYTE Z6
    BYTE Z7
    
    PUB main()
    
    S0 := 1 ' Setting all the S bytes to a fixed value
    S1 := 2 '
    S2 := 3 '
    S3 := 4 '
    S4 := 5 '
    S5 := 6 '
    S6 := 7 '
    S7 := 8 '
    
    ' But deliberately NOT setting the B bytes to zero!
    ' But deliberately NOT setting the Z bytes to zero!
    
    ' Edge board LEDs:
    pinl(57) ' On program end, these both reappear.
    pinl(56)
    
    ' Cog 0 is us here now
    'cogspin(1,A0(),@stack0)
    'cogspin(2,A1(),@stack1)
    'cogspin(3,A2(),@stack2)
    'cogspin(4,A3(),@stack3)
    
    repeat 500 ' For now, to make it give up eventually
      SplusB()
      debug(udec_byte(S0), udec_byte(B0), udec_byte(Z0))
      debug(udec_byte(S1), udec_byte(B1), udec_byte(Z1))
      debug(udec_byte(S2), udec_byte(B2), udec_byte(Z2))
      debug(udec_byte(S3), udec_byte(B3), udec_byte(Z3))
      debug(udec_byte(S4), udec_byte(B4), udec_byte(Z4))
      debug(udec_byte(S5), udec_byte(B5), udec_byte(Z5))
      debug(udec_byte(S6), udec_byte(B6), udec_byte(Z6))
      debug(udec_byte(S7), udec_byte(B7), udec_byte(Z7))
    
    ' More Edge board LEDs
    repeat '
      pint(56) ' Just leaving it here after we're all done
      waitms(1000)
      pint(57)
      waitms(1000)
    
    PUB SplusB()
    
    Z0 := S0 + B0
    Z1 := S1 + B1
    Z2 := S2 + B2
    Z3 := S3 + B3
    Z4 := S4 + B4
    Z5 := S5 + B5
    Z6 := S6 + B6
    Z7 := S7 + B7
    
    
  • @TonyB_ said:
    Three backticks must be on their own on a line:

    'anks. Learn something new every day. S.
