Multiple cogs and shared memory...
pgbpsu
I have an application that will require 3 different cogs (all running assembly code) to share Hub memory. Two cogs (running more or less the same code in parallel) will populate the Hub memory. The third cog will pull data from Hub memory and spit it out on OUTA. I have successfully written a few assembly methods (with many thanks to all the contributors here), but I have never used multiple cogs or shared memory via the Hub.
After hours and hours of reading the forums and poring over deSilva's "Tutorial", I figured it was time to jump in. I've put together the following code to gain some understanding of how to do this. However, I'm getting results I don't expect. Here's what I want to do, followed by the code, which is clearly what I'm actually doing.
1. Set aside 40 bytes of Hub memory called Shared. I'm thinking of this variable as an array. I know deSilva will say there are no arrays in Spin, but I'm using this mostly as a way to conceive of the problem.
2. Launch one cog to write the current system counter to bytes 0-3, 8-11, 16-19, etc., i.e. the even elements of Shared.
3. Launch a second cog to write the current system counter to bytes 4-7, 12-15, etc., i.e. the odd elements of Shared.
4. After all 40 bytes of Shared are filled, display the results.
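The interleaved layout in steps 2 and 3 comes down to byte-offset arithmetic: with 4-byte longs, the cog filling the even elements writes at PAR + 0, 8, 16, ..., and the cog filling the odd elements at PAR + 4, 12, 20, .... Here is a small Python sketch of that arithmetic, only a model for checking the offsets and not Propeller code; the name `long_offsets` is hypothetical.

```python
LONG_SIZE = 4        # bytes per long
NUM_ELEMENTS = 10    # long Shared[10], i.e. 40 bytes
NUM_WRITERS = 2      # two cogs interleave their writes

def long_offsets(writer: int) -> list[int]:
    """Byte offsets (relative to PAR) that one writer cog fills.

    writer 0 fills the even elements, writer 1 the odd ones.
    The first offset is writer * 4 (like ADC2's `add addr2, #4`), and each
    step skips over the other cog's long (like `add addrN, #8`).
    """
    start = writer * LONG_SIZE
    stride = NUM_WRITERS * LONG_SIZE
    iterations = NUM_ELEMENTS // NUM_WRITERS   # like `mov iterationsN, #5`
    return [start + i * stride for i in range(iterations)]

if __name__ == "__main__":
    print(long_offsets(0))   # [0, 8, 16, 24, 32]
    print(long_offsets(1))   # [4, 12, 20, 28, 36]
```

Together the two lists cover every long in the 40-byte block exactly once, which is why the even/odd split needs no locking: the two cogs never write the same address.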
I'm using PropTerminal (what a great tool!) to view the results.
There are a couple of commented-out lines in the assembly methods, which I used to prove to myself that my way of putting info into Shared worked. I can successfully use one cog to put 0,2,4,6,8 into Shared and another to put 1,3,5,7,9, and get them out as 0,1,2,3,4,5,6,7,8,9. So I think that part of the code is working. However, my "poor man's" way of syncing the cogs (asking both of them to waitcnt until CNT = 50) is WAY off, although I'm not sure why. That seems the most basic way to use that command.
Here's the code followed by my results:
{{ SharedMemoryTest.spin

  This is my first attempt at starting multiple cogs and having them populate
  main memory space. This program should start 2 different cogs running very
  similar assembly code. Cog1 should fill even longs of the shared memory.
  Cog2 should fill odd entries in shared memory. The shared memory is then
  displayed to a terminal window using PropTerminal (written by Andy Schenk,
  Switzerland, October 2007)
}}

CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

  #0, WHITE,CYAN,RED,PINK,BLUE,GREEN,YELLOW,GRAY   'Standard colors

OBJ
  term : "PC_Interface"   'Include PropTerminal object for debugging
  Num  : "Numbers"        'Include Numbers object for writing numbers to terminal

VAR
  long cx,cy,ch,mx,my,mxo,myo   ' Variables for PropTerminal
  byte Index                    ' Index into shared variable
  long Counter1                 ' Place to store cnt value
  long Shared[10]               ' Shared variable (Spin and Assembly)

PUB Main
  ' get start time of this method
  Counter1 := cnt
  cognew(@ADC1, @Shared)   ' Launch cog to populate even half of shared variable
  cognew(@ADC2, @Shared)   ' Launch cog to populate odd half of shared variable
  waitcnt(5000)            ' Do nothing so ADC1 and 2 have time to finish
  Display                  ' Display results

PUB Display
  ' this method displays the results of our program
  'start the interface
  term.start(31,30)
  repeat while term.abs_x == 0                       'wait for PropTerminal.exe started
  setpos(0,12)
  term.setcol(BLUE)
  Num.Init                                           'Initialize Numbers
  term.Str(string("Start time of Program: "))        'display it and
  term.Str(Num.ToStr((Counter1), Num#DDEC))          'result in decimal
  term.Out(13)
  ' Loop over our variable and display each entry in Shared
  repeat Index from 0 to 9
    term.Str(string("Index: "))                      'then display it and
    term.Str(Num.ToStr((Index), Num#DDEC))           'its result in decimal
    term.Str(Num.ToStr((Shared[Index]), Num#DDEC))   'its result in decimal
    term.Out(13)
  ' get end time of this method
  Counter1 := cnt
  term.Str(string("End time of Display: "))          'then display it and
  term.Str(Num.ToStr((Counter1), Num#DDEC))          'its result in decimal
  term.Out(13)

PRI setpos(px, py)
  term.out(10)
  term.out(px)
  term.out(11)
  term.out(py)

DAT
ADC1          ORG 0
{{ This cog should put CNT into the "even" elements of Shared. Consider Shared
   to be an array with 10 elements. This method should only alter
   Shared[0],Shared[2],Shared[4],...Shared[8]. Shared is 40 bytes long. If we
   start at the beginning, our first wrlong will fill bytes 0-3 of Shared; the
   next time this method writes to Shared we want to write to 8-11, then 16-19.
   After each wrlong to addr1, I add 8 to addr1 to skip over the "odd" entries.
   Those should be populated by the method below. }}
              mov     addr1, PAR            ' get starting address of Shared
              mov     iterations1, #5       ' set number of iterations = 1/2 length of Shared
              waitcnt 50, #100              ' wait for CNT = 50
                                            ' my attempt at syncing cogs
              mov     Val_Reg1, cnt
'             mov     Val_Reg1, #0
:loop         wrlong  Val_Reg1, addr1       ' write CNT to Shared
              add     addr1, #8
'             add     Val_reg1, #2
              mov     Val_reg1, cnt
              djnz    iterations1, #:loop   ' move to next group of samples
              cogid   addr1                 ' only here because I was following
                                            ' others' examples NOT SURE WHAT THESE
              cogstop addr1                 ' DO

addr1         res 1                         ' Main Memory address to write to
Val_Reg1      res 1
iterations1   res 1                         ' Number of loops

ADC2          ORG 0
{{ This cog should put CNT into the ODD elements of Shared. Consider Shared to
   be an array with 10 elements. This method should only alter
   Shared[1],Shared[3],Shared[5],...Shared[9]. Shared is 40 bytes long. If we
   start at the beginning, we want our first wrlong to fill bytes 4-7 of
   Shared; the next time this method writes to Shared we want to write to
   12-15, then 20-23. I start by adding 4 to addr2 so we will start writing
   into Shared at the first odd element. Thereafter, add 8 to addr2 to skip
   over the "even" entries. Those should be populated by the method above. }}
              mov     addr2, PAR            ' get starting address of Shared
              mov     iterations2, #5       ' set number of iterations = 1/2 length of Shared
              add     addr2, #4             ' skip over first 4 bytes. These
                                            ' should have been written above. This
                                            ' should now be ready to write odd entries in Shared
              waitcnt 50, #100              ' wait for CNT = 50
                                            ' my attempt at syncing cogs
              mov     Val_Reg2, cnt
'             mov     Val_Reg2, #1
:loop         wrlong  Val_Reg2, addr2       ' write CNT to Shared
              add     addr2, #8
'             add     Val_reg2, #2
              mov     Val_reg2, cnt
              djnz    iterations2, #:loop   ' move to next group of samples
              cogid   addr2                 ' only here because I was following
                                            ' others' examples NOT SURE WHAT THESE
              cogstop addr2                 ' DO

addr2         res 1
Val_Reg2      res 1
iterations2   res 1
Here's what I get when I run this code. In lines such as "Index: 0 50,660,879", 0 is the element of Shared and 50,660,879 is the CNT value that was placed there.
Start time of Program: 1,306,470,624
Index: 0 50,660,879
Index: 1 17,957,899
Index: 2 50,660,902
Index: 3 17,957,928
Index: 4 50,660,934
Index: 5 17,957,960
Index: 6 50,660,966
Index: 7 17,957,992
Index: 8 50,660,998
Index: 9 17,958,024
End time of Display: 249,393,280
I was expecting Index 0 to have a CNT value of 54: the cog should only move beyond the waitcnt instruction when the system CNT = 50, and adding the mov instruction (4 clocks) gives 54. Since the two cogs (ADC1 and ADC2) are running in parallel, I expected Index 1 to be 54 as well. I expected subsequent entries in Shared to increase by between 19 and 34 counts in both the even and odd elements. I also expected the spacing between Index 0, 2, 4, 6, 8 and between Index 1, 3, 5, 7, 9 to remain constant.
Clearly I'm missing some important elements, such as why the counter values are so far off from what I expect. Beyond that, I'd appreciate comments on my reasoning and implementation.
Constructive criticism welcome.
Thanks,
Peter
Comments
Thanks for the tip. I'll give it a try. If I understand how this trick should work, it will only tell me when both cogs have finished filling their portion of the array, which is a step in the right direction. My application is very timing-sensitive (hence everything in assembly). I don't see how this trick will help to sync the two cogs filling the array.
It seems the fundamental problem with the code I posted is that the two cogs aren't starting to fill the array at exactly the same time, despite my request that they waitcnt 50. That's the first gap in my understanding.
I'll give your suggestion a try and see what the results look like.
Thanks,
p
I suppose the loaders ought to hang until the reader has cleared the condition bytes, which it does once it has finished getting the data.
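The handshake described above can be sketched as a Python model, with threads standing in for the two loader cogs and the reader cog and a list standing in for Hub RAM. The `ready` condition bytes and all function names are hypothetical, not part of the posted code; on the Propeller the polling would be done with Hub reads and writes such as rdbyte/wrbyte. Each loader fills its half and raises its own condition byte, then hangs until the reader clears it; the reader waits for both bytes, takes the data, and clears them.

```python
import threading
import time

NUM_ELEMENTS = 10
shared = [0] * NUM_ELEMENTS   # stands in for long Shared[10] in Hub RAM
ready = bytearray(2)          # hypothetical condition bytes, one per loader

def loader(writer: int, rounds: int) -> None:
    """Fill every other element, then raise this loader's condition byte."""
    for r in range(rounds):
        while ready[writer]:            # hang until the reader clears our byte
            time.sleep(0)
        for i in range(writer, NUM_ELEMENTS, 2):
            shared[i] = r * 100 + i     # dummy sample data
        ready[writer] = 1               # tell the reader this half is filled

def reader(rounds: int, out: list) -> None:
    """Wait for both condition bytes, copy the data, then clear them."""
    for _ in range(rounds):
        while not (ready[0] and ready[1]):
            time.sleep(0)
        out.append(list(shared))        # "spit out" the data (here: copy it)
        ready[0] = 0                    # release loader 0 for the next round
        ready[1] = 0                    # release loader 1 for the next round

def run(rounds: int = 3) -> list:
    out = []
    threads = [threading.Thread(target=loader, args=(w, rounds)) for w in (0, 1)]
    threads.append(threading.Thread(target=reader, args=(rounds, out)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

if __name__ == "__main__":
    for row in run():
        print(row)
```

As noted in the reply above, a handshake like this only tells the reader when the data is complete; it does not make the two loaders start their writes on the same clock.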
WAITCNT 50,#100 does not wait for the value 50 but for the value in register 50. It can work if the value of reg 50 is the same in both cogs ;)
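A small Python model of why the two cogs woke at such different counts: the destination field of PASM `waitcnt D, S` names a cog register, so `waitcnt 50, #100` waits for whatever long happens to sit in register 50. Because cognew copies 496 longs from Hub into the cog, that register holds the Hub long located 50 longs past @ADC1 or @ADC2, which differs between the two cogs. The Hub contents and code addresses below are hypothetical, and pipeline overhead is ignored.

```python
MASK = 0xFFFF_FFFF   # CNT is a 32-bit counter
COG_LONGS = 496      # cognew copies 496 longs from Hub into cog RAM

def load_cog(hub: list[int], start: int) -> list[int]:
    """Model of cognew: copy 496 longs starting at the code address."""
    return hub[start:start + COG_LONGS]

def waitcnt(cog_ram: list[int], d: int, s: int) -> int:
    """Model of PASM `waitcnt D, S`, where D is a register address.

    The cog resumes when CNT equals the contents of register D;
    afterwards S is added to that register. Returns the resume CNT.
    """
    target = cog_ram[d]
    cog_ram[d] = (target + s) & MASK
    return target

if __name__ == "__main__":
    # Hypothetical Hub contents: whatever happens to follow the code.
    hub = [(i * 2_654_435_761) & MASK for i in range(1024)]
    adc1 = load_cog(hub, 0)     # @ADC1 (hypothetical address)
    adc2 = load_cog(hub, 16)    # @ADC2, 16 longs later (hypothetical)
    print(waitcnt(adc1, 50, 100))   # hub[50], not 50
    print(waitcnt(adc2, 50, 100))   # hub[66], a different wake-up count
```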
And you know that syncing with waitcnt to a fixed value (#50 or any other) can take more than 53 seconds @80MHz in the worst case (when CNT was 51 before the waitcnt executes).
So the better way is to calculate a CNT value in the Spin Main method before starting the 2 ASM cogs and pass this value to both cogs. You can use the Shared array for that!
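The 53-second figure follows from CNT being a free-running 32-bit counter: if the target value has just been passed, waitcnt blocks until CNT wraps all the way around. A minimal Python sketch of that arithmetic, assuming the 80 MHz clock set by the posted CON block (5 MHz crystal with pll16x); the name `clocks_until` is hypothetical.

```python
CLOCK_HZ = 80_000_000   # 5 MHz crystal x pll16x, as in the posted CON block
CNT_MODULUS = 2 ** 32   # CNT is a free-running 32-bit counter

def clocks_until(target: int, cnt_now: int) -> int:
    """Clocks a waitcnt on `target` blocks when CNT currently reads cnt_now.

    If the target has just been passed, CNT must wrap all the way around.
    """
    return (target - cnt_now) % CNT_MODULUS

if __name__ == "__main__":
    worst = clocks_until(50, 51)          # target missed by one clock
    print(worst, worst / CLOCK_HZ)        # 4294967295 clocks, about 53.7 s
```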
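The idea above can be modeled in Python: Main reads CNT once, adds a margin, and stores the result (for example in Shared[0]); each cog would then read that value into a register (e.g. with rdlong) and waitcnt on it. As long as both cogs reach their waitcnt before the target, both resume on the same clock, no matter how far apart they were launched. The function name, margins and launch gaps below are hypothetical; if the margin is too short, the wait wraps around by roughly 53 seconds.

```python
CNT_MODULUS = 2 ** 32   # CNT is a free-running 32-bit counter

def sync_start(cnt_main: int, delay: int, launch_gaps: list[int]) -> list[int]:
    """Unwrapped clock at which each cog leaves its waitcnt.

    cnt_main    -- CNT read by Spin Main before the cognew calls
    delay       -- margin added by Main (hypothetical value)
    launch_gaps -- clocks from cnt_main until each cog reaches its waitcnt
    """
    start = (cnt_main + delay) % CNT_MODULUS   # value Main would pass to the cogs
    times = []
    for gap in launch_gaps:
        arrive = cnt_main + gap                # this cog arrives at its waitcnt
        times.append(arrive + (start - arrive) % CNT_MODULUS)
    return times

if __name__ == "__main__":
    # Both cogs arrive in time: they resume together, one second after Main.
    print(sync_start(331_237_295, 80_000_000, [10_000, 20_000]))
    # Margin shorter than the launch time: the wait wraps (about 53 s late).
    print(sync_start(0, 1_000, [10_000]))
```

The margin only has to exceed the time it takes Main to launch both cogs and for them to reach their waitcnt; choosing it generously costs nothing but start-up delay.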
Andy
(P.S. the "pc_interface" object also has a .dec method: term.dec(Shared[Index]); you don't need the "Numbers" object.)
I was aware of the 53 second roll-over problem, but was willing to wait if it would sync my cogs. As you've pointed out, I wasn't implementing that correctly AND it's not necessary. Thanks.
I've altered my code as per your suggestions and here's the result:
Start time of Program: 331,237,295
Index: 1 411,237,300
Index: 2 411,237,300
Index: 3 411,237,317
Index: 4 411,237,319
Index: 5 411,237,349
Index: 6 411,237,351
Index: 7 411,237,381
Index: 8 411,237,383
Index: 9 411,237,413
Index: 10 411,237,415
End time of Display: 564,729,423
At first I couldn't figure out why entries 3 and 4 differed by 2 clk counts.
Since the two cogs which are populating my shared variable are consecutive, those two clk cycles must be the time it takes the HUB to move from cog0 to cog1 in the wrlong command. Am I finally getting the hang of this?
I think I can use this example (and the terrific PropTerminal) to move to my larger application. This simple program has convinced me I understand it well enough to tackle my real application.
Thanks Ariba!
Thanks for your code suggestions and for the great PropTerminal. I'd be dead in the water without that utility.
What a great contribution to the community.
Thanks,
Peter
Your results make a lot of sense:
- the first 2 values are equal at both cogs because of the synchronisation with waitcnt
- then the wrlong of the second cog synchronizes to the hub 2 clocks displaced from the first cog.
- one loop needs 32 cycles because the wrlong always synchronizes to a multiple of 16 clocks (2 clocks per cog * 8 cogs).
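The three points above can be reproduced with a rough Python model of the Hub's round-robin: each cog gets a Hub window every 16 clocks, and consecutive cogs' windows are 2 clocks apart. After the shared waitcnt both cogs sample CNT on the same clock; each wrlong then stalls until its cog's window, and since the rest of the loop takes more than 16 clocks, every wrlong after the first misses one window and waits for the next, giving a 32-clock period. The 8-clock Hub access, the 4-clock instruction time and the exact point where `mov Val, cnt` samples are simplifying assumptions, so the model reproduces the pattern (equal first samples, 2-clock offset, 32-clock period) rather than the exact counts posted.

```python
HUB_CYCLE = 16    # 8 cogs x 2 clocks per Hub slot
INSTR = 4         # ordinary PASM instruction
HUB_ACCESS = 8    # assumed clocks from catching the Hub window to finishing wrlong

def next_window(t: int, cog: int) -> int:
    """First clock >= t at which `cog` owns the Hub (model: windows at 2*cog mod 16)."""
    return t + (2 * cog - t) % HUB_CYCLE

def sample_times(cog: int, start: int, iterations: int = 5) -> list[int]:
    """CNT values one ADC loop would store, loosely following the posted loop:
         :loop wrlong Val, addr / add addr, #8 / mov Val, cnt / djnz ...
    """
    samples = []
    t = start
    val = start                                # mov Val, cnt right after the shared waitcnt
    for _ in range(iterations):
        samples.append(val)                    # wrlong stores the previous sample
        t = next_window(t, cog) + HUB_ACCESS   # stall until this cog's Hub slot
        t += INSTR                             # add addr, #8
        val = t                                # mov Val, cnt (approximate sample point)
        t += 2 * INSTR                         # rest of mov, plus djnz
    return samples

if __name__ == "__main__":
    a = sample_times(cog=1, start=0)
    b = sample_times(cog=2, start=0)
    print(a)                                    # first cog's samples
    print(b)                                    # equal first sample, then 2 clocks later
    print([y - x for x, y in zip(a, a[1:])])    # 32-clock period after the first gap
```

In this model the order of the two cogs depends on where the common start time falls relative to the Hub cycle: if the start lands between the two cogs' windows, the higher-numbered cog goes first and the other trails by 14 clocks instead of 2.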
Andy