Multiple cogs and shared memory... — Parallax Forums


pgbpsu Posts: 460
edited 2007-11-19 21:34 in Propeller 1
I have an application that will require 3 different cogs (all running assembly code) to share hub memory. Two cogs (running more or less the same code in parallel) will populate the hub memory. The third cog will pull data from hub memory and spit it out on OUTA. I have successfully written a few assembly methods (with many thanks to all the contributors here), but have never used multiple cogs or shared memory via the hub.

After hours and hours of reading the forums and poring over deSilva's "Tutorial", I figured it was time to jump in. I've put together the following code to gain some understanding of how to do this. However, I'm getting results I don't expect. Here's what I want to do, followed by the code, which is clearly what I'm actually doing.

1. Set aside 40 bytes of hub memory called Shared - I'm thinking of this variable as an array. I know deSilva will say there are no arrays in Spin, but I'm using this mostly as a way to conceive of the problem.
2. launch one cog to write the current system counter to bytes 0-3;8-11;16-19; etc - the even elements of Shared
3. launch second cog to write the current system counter to bytes 4-7;12-15; etc the odd elements of Shared
4. After filling all 40 bytes of shared, display the results.

I'm using PropTerminal (what a great tool!) to view the results.

There are a couple of commented-out lines in the assembly methods which I used to prove to myself that my way of putting info into Shared worked. I can successfully use one cog to put 0,2,4,6,8 into Shared and another to put 1,3,5,7,9, and get them out as 0,1,2,3,4,5,6,7,8,9. So I think that part of the code is working. However, my "poor man's" way of syncing the cogs (asking them both to waitcnt = 50) is WAY off, although I'm not sure why. That seems the most basic way to use that command.

Here's the code followed by my results:

{{ SharedMemoryTest.spin
This is my first attempt at starting multiple cogs and having
them populate main memory space.  This program should start 2
different cogs running very similar assembly code.  Cog1 should
fill even longs of the shared memory.  Cog2 should fill odd entries
in shared memory.  The shared memory is then displayed to a
terminal window using PropTerminal (written by Andy Schenk, Switzerland, October 2007)
}}

CON

  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

  #0, WHITE,CYAN,RED,PINK,BLUE,GREEN,YELLOW,GRAY        'Standard colors

OBJ
  term   :       "PC_Interface"   'Include PropTerminal object for debugging
  Num    :       "Numbers"        'Include Numbers object for writing numbers to terminal
  
VAR
  long  cx,cy,ch,mx,my,mxo,myo  ' Variables for PropTerminal
  byte   Index                  ' Index into shared variable
  long   Counter1               ' Place to store cnt value
  long   Shared[10]             ' Shared variable (Spin and Assembly)

PUB Main
' get start time of this method
   Counter1 := cnt
   cognew(@ADC1, @Shared)       ' Launch cog to populate even half of shared variable
   cognew(@ADC2, @Shared)       ' Launch cog to populate odd half of shared variable
   waitcnt(5000)                          ' Do nothing so ADC1 and 2 have time to finish
   Display                                   ' Display results


PUB Display
  ' this method displays the results of our program

  'start the interface
  term.start(31,30)


  repeat while term.abs_x == 0    'wait for PropTerminal.exe started

  setpos(0,12)
  term.setcol(BLUE)
  Num.Init                                    'Initialize Numbers   

  term.Str(string("Start time of Program: "))                'display it and  
  term.Str(Num.ToStr((Counter1), Num#DDEC))           'result in decimal
  term.Out(13)
  
' Loop over our variable and display each entry in Shared
  repeat Index from 0 to 9
    term.Str(string("Index: "))                                   'then display it and  
    term.Str(Num.ToStr((Index), Num#DDEC))           'its result in decimal
    term.Str(Num.ToStr((Shared[Index]), Num#DDEC))           'its result in decimal
    term.Out(13)    

' get end time of this method
  Counter1 := cnt
  term.Str(string("End time of Display: "))                    'then display it and  
  term.Str(Num.ToStr((Counter1), Num#DDEC))           'its result in decimal
  term.Out(13)
  

PRI setpos(px, py)

  term.out(10)                                                  
  term.out(px)
  term.out(11)
  term.out(py)

DAT
ADC1          ORG       0
{{ This cog should put CNT into the "even" elements of Shared.
Consider Shared to be an array with 10 elements.  This method
should only alter Shared[0],Shared[2],Shared[4],...Shared[8]

Shared is 40 bytes long.  If we start at the beginning, our
first wrlong will fill bytes 0-3 of Shared, the next time
this method writes to Shared we want to write to 8-11, then
16-19.  After wrlong to addr1, I add 8 to addr1 to skip over
the "odd" entries.  Those should be populated by the method
below.  }}
              mov       addr1, PAR              ' get starting address of shared
              mov       iterations1, #5         ' set number of iterations = 1/2 length of Shared
              waitcnt   50, #100                 ' wait for CNT = 50
                                                         ' my attempt at syncing cogs
              mov       Val_Reg1, cnt
'              mov       Val_Reg1, #0
:loop       wrlong    Val_Reg1, addr1     ' write CNT to Shared
              add       addr1, #8
'              add       Val_reg1, #2
              mov       Val_reg1, cnt
              djnz      iterations1, #:loop    ' move to next group of samples  
              cogid     addr1                      ' only here because I was following
                                                         ' others examples NOT SURE WHAT THESE
              cogstop   addr1                     ' DO

addr1          res       1                          ' Main Memory address to write to 
Val_Reg1       res       1
iterations1    res       1                         ' Number of loops
               
ADC2          ORG       0
{{ This cog should put CNT into the ODD elements of Shared.
Consider Shared to be an array with 10 elements.  This method
should only alter Shared[1],Shared[3],Shared[5],...Shared[9]

Shared is 40 bytes long.  If we start at the beginning, we want our
first wrlong will fill bytes 4-7 of Shared, the next time
this method writes to Shared we want to write to 12-15, then
20-23.  I start by adding 4 to addr2 so we will start writing
into Shared.  Thereafter, add 8 to addr2 to skip over
the "even" entries.  Those should be populated by the method
above.  }}
              mov       addr2, PAR              ' get starting address of Shared
              mov       iterations2, #5         ' set number of iterations = 1/2 length of Shared
              add       addr2, #4                 ' skip over first 4-bytes.  These
                                                         ' should have been written above.  This
                                                         ' should now be ready to write odd entries in Shared
              waitcnt   50, #100                 ' wait for CNT = 50
                                                         ' my attempt at syncing cogs
              mov       Val_Reg2, cnt
'              mov       Val_Reg2, #1
:loop       wrlong    Val_Reg2, addr2      ' write CNT to Shared
              add       addr2, #8
'              add       Val_reg2, #2
              mov       Val_reg2, cnt
              djnz      iterations2, #:loop     ' move to next group of samples  
              cogid     addr2                       ' only here because I was following
                                                          ' others examples NOT SURE WHAT THESE
              cogstop   addr2                      ' DO

addr2          res       1
Val_Reg2       res       1
iterations2    res       1
               
                                                 




Here's what I get when I run this code: In the lines which read "Index: 0 50,660,879", 0 is the element of Shared and 50,660,879 is the CNT value that was placed there.

Start time of Program: 1,306,470,624
Index: 0 50,660,879
Index: 1 17,957,899
Index: 2 50,660,902
Index: 3 17,957,928
Index: 4 50,660,934
Index: 5 17,957,960
Index: 6 50,660,966
Index: 7 17,957,992
Index: 8 50,660,998
Index: 9 17,958,024
End time of Display: 249,393,280

I was expecting Index 0 to have a CNT value of 54. The cog should only move beyond the waitcnt command when system CNT = 50. Add to that the mov instructions (4 clocks) and I get 54. Since the two cogs are running in parallel (ADC1 and ADC2) I expect Index1 to also be 54. I expected subsequent entries in Shared to increase between 19 and 34 counts in both the even and odd elements. I also expected this difference between Index0 and 2,4,6,8 and Index1 and 3,5,7,9 to remain constant.

Clearly I'm missing some important elements, such as why the counter values are so far off from what I expect. Beyond that, I'd appreciate comments on my reasoning and implementation.
Constructive criticism welcome.

Thanks,
Peter

Comments

  • Fred Hawkins Posts: 997
    edited 2007-11-19 17:22
    Add two bytes to your array. Let each loader routine toggle its byte when it is done. Let your reader check the two bytes, and read the array only when both bytes say the data is good to go. No wait states at all.
  • pgbpsu Posts: 460
    edited 2007-11-19 17:46
    Hi Fred-

    Thanks for the tip. I'll give it a try. If I understand how this trick should work, it will only tell me when both cogs have finished filling their portion of the array, which is a step in the right direction. My application is very timing sensitive (hence everything in assembly). I don't see how this trick will help to sync the two cogs filling the array.

    It seems the fundamental problem with the code I posted is that the two cogs aren't starting to fill the array at exactly the same time, despite my request that they waitcnt 50. That's the first gap in my understanding.

    I'll give your suggestion a try and see what the results look like.

    Thanks,
    p
  • Fred Hawkins Posts: 997
    edited 2007-11-19 17:56
    True, it ignores sync between the loaders. But I don't think that their data intersects in any way. They are independent routines trudging along in their own cogs.

    I suppose the loaders ought to hang on the condition bytes being cleared by the reader when it has finished getting the data.
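
The flag-byte handshake described in the two comments above can be sketched on a PC. This is only a Python model, not Propeller code: threads stand in for cogs, a list stands in for the Shared array, and the names `shared`, `done`, `loader` and `reader` are made up for illustration.

```python
import threading

SLOTS = 10
shared = [0] * SLOTS        # stands in for the 10-long Shared array
done = bytearray(2)         # the two extra flag bytes, one per loader


def loader(which):
    """which = 0 fills the even slots, which = 1 fills the odd slots."""
    for i in range(which, SLOTS, 2):
        shared[i] = i
    done[which] = 1         # raise the flag only after the last write


def reader():
    """Poll both flags, copy the data, then clear the flags."""
    while not (done[0] and done[1]):
        pass                # nothing is read until both loaders are done
    snapshot = list(shared)
    done[0] = 0             # cleared flags tell the loaders they may refill
    done[1] = 0
    return snapshot


threads = [threading.Thread(target=loader, args=(w,)) for w in (0, 1)]
for t in threads:
    t.start()
result = reader()
for t in threads:
    t.join()
print(result)
```

As noted in the thread, this only tells the reader when the data is complete. It does nothing to line up the two loaders in time.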
  • Ariba Posts: 2,685
    edited 2007-11-19 19:11
    pgbpsu

    WAITCNT 50,#100 does not wait for the value 50 but for the value in register 50. It can work if the value of reg 50 is the same in both cogs ;)
    And you know that syncing with waitcnt to a fixed value (#50 or any other) can take more than 53 seconds at 80 MHz in the worst case (when cnt was 51 just before the waitcnt executes).

    So the better way is to calculate a cnt value in the Spin Main method before starting the 2 ASM cogs and pass this value to both cogs. You can use the Shared array for that!

    in Main:
       Counter1 := cnt
       Shared[0] := Counter1+clkfreq  'Sync after 1 second
       cognew(@ADC1, @Shared)       ' Launch cog to populate even half of shared variable
       cognew(@ADC2, @Shared)       ' Launch cog to populate odd half of shared variable
    
    in the ASM cogs:
       rdlong syncval,par
       waitcnt syncval,#100
    ...
    syncval   res  1
    
    



    Andy

    (P.S. the "pc_interface" object also has a .dec method: term.dec(Shared[Index]), so you don't need the "Numbers" object)
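
The 53-second figure and the benefit of a relative target both follow from CNT being a free-running 32-bit counter. A rough Python check of that arithmetic is below. The 80 MHz clock comes from the CON block in the first post; the helper name `clocks_until` and the CNT snapshot value are made up for illustration, and the sketch ignores the time the cogs need to start up before they reach their waitcnt.

```python
CNT_MODULUS = 1 << 32       # CNT is a free-running 32-bit counter
CLKFREQ = 80_000_000        # 5 MHz crystal * pll16x, as in the CON block


def clocks_until(target, cnt_now):
    """Clocks until CNT next equals target, allowing for 32-bit wrap-around."""
    return (target - cnt_now) % CNT_MODULUS


# Fixed target that CNT has just passed: almost a full wrap of the counter.
worst_case_s = clocks_until(50, 51) / CLKFREQ

# Target computed from a CNT snapshot, as in Shared[0] := Counter1 + clkfreq.
cnt_snapshot = 0xFFFF_FF00                       # hypothetical value near the wrap
sync_target = (cnt_snapshot + CLKFREQ) % CNT_MODULUS
relative_s = clocks_until(sync_target, cnt_snapshot) / CLKFREQ

print(f"fixed target, worst case: {worst_case_s:.2f} s")
print(f"relative target: {relative_s:.2f} s")
```

With a relative target the wait is one second regardless of where CNT happens to be, even when the addition wraps past zero.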
  • pgbpsu Posts: 460
    edited 2007-11-19 20:03
    Ariba-

    I was aware of the 53 second roll-over problem, but was willing to wait if it would sync my cogs. As you've pointed out, I wasn't implementing that correctly AND it's not necessary. Thanks.

    I've altered my code as per your suggestions and here's the result:

    Start time of Program: 331,237,295
    Index: 1 411,237,300
    Index: 2 411,237,300
    Index: 3 411,237,317
    Index: 4 411,237,319
    Index: 5 411,237,349
    Index: 6 411,237,351
    Index: 7 411,237,381
    Index: 8 411,237,383
    Index: 9 411,237,413
    Index: 10 411,237,415
    End time of Display: 564,729,423

    At first I couldn't figure out why entries 3 and 4 differed by 2 clk counts.

    Since the two cogs which are populating my shared variable are consecutive, those two clk cycles must be the time it takes the HUB to move from cog0 to cog1 in the wrlong command. Am I finally getting the hang of this?

    I think I can use this example (and the terrific PropTerminal) to move to my larger application. This simple program has convinced me I understand it well enough to tackle my real application.

    Thanks Ariba!
  • pgbpsu Posts: 460
    edited 2007-11-19 20:08
    Ariba-

    Thanks for your code suggestions and for the great PropTerminal. I'd be dead in the water without that utility.

    What a great contribution to the community.
    Thanks,
    Peter
  • Ariba Posts: 2,685
    edited 2007-11-19 21:34
    I'm glad to hear that :)

    Your results make a lot of sense:
    - the first 2 values are equal in both cogs because of the synchronisation with waitcnt
    - then the wrlong of the second cog synchronizes to the hub 2 clocks later than the first cog's.
    - one loop needs 32 cycles because the wrlong always synchronizes to a multiple of 16 clocks (2 clocks per cog * 8 cogs), and the wrlong plus the other three instructions (12 clocks) do not fit into a single 16-clock turn.

    Andy
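
The three points above can be checked against Peter's numbers with a small model. This is a simplified Python sketch, not cycle-exact Propeller behaviour. It assumes the two loader cogs have IDs 1 and 2, that a wrlong takes 8 clocks once its hub window opens, that every other instruction takes 4 clocks, and that CNT is sampled at the start of the mov. The start phase of 13 is not from the thread. It was picked so that the model lines up with the posted results.

```python
HUB_PERIOD = 16        # clocks between hub windows for one cog (2 clocks * 8 cogs)
CLOCKS_PER_COG = 2     # the hub window moves on to the next cog every 2 clocks


def next_hub_window(t, cog_id):
    """First clock >= t at which cog_id's hub window opens (model phase)."""
    phase = (CLOCKS_PER_COG * cog_id) % HUB_PERIOD
    return t + (phase - t) % HUB_PERIOD


def simulate(cog_id, start, iterations=5):
    """CNT values written by the loop: wrlong / add / mov cnt / djnz."""
    t = start
    value = t                                  # mov Val_Reg, cnt (after waitcnt)
    t += 4
    written = []
    for _ in range(iterations):
        written.append(value)                  # value this wrlong stores
        t = next_hub_window(t, cog_id) + 8     # wait for the window, then 8 clocks
        t += 4                                 # add addr, #8
        value = t                              # mov Val_Reg, cnt
        t += 4
        t += 4                                 # djnz
    return written


START = 13             # assumed hub phase when both cogs leave waitcnt
cog_a = simulate(1, START)
cog_b = simulate(2, START)
offsets_a = [v - cog_a[0] for v in cog_a]
offsets_b = [v - cog_b[0] for v in cog_b]
print(offsets_a)       # [0, 17, 49, 81, 113]
print(offsets_b)       # [0, 19, 51, 83, 115]
```

Measured from 411,237,300, the posted values are 0, 17, 49, 81, 113 for one cog and 0, 19, 51, 83, 115 for the other, which is the same pattern: an equal first sample, a constant 2-clock offset afterwards, and 32 clocks per loop.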