cog 2 cog communication

kbash · 2019-01-08 18:06

Does anyone have an example of passing data back and forth between two cogs using the lookup RAM? I went through the docs; they explain what CAN be done but are sparse on HOW to do it. Some example code would be helpful.

Mark_T · 2019-01-09 12:58

I think, rereading the documentation, that in this mode the LUT RAMs of both cogs become shared memory, in
that all writes are duplicated to both (if enabled by the receiver cog). The intention is to use events to signal
such writes, or locks for protecting data structures in the RAM. Well, that's my take. I guess a shared circular
buffer would be a good test case.

Cluso99 · 2019-01-09 14:08

Yes, data written by either cog will appear in both LUTs once the receiving cog has enabled this condition.
This feature permits pairs of adjacent cogs to share/pass information, i.e. to work in cooperation.
The P2 is new. Expecting example code at this time is unreasonable. That is why there are Engineering Samples out there so that experienced users can start to write test code and tools to test and prepare the P2 for launch.

evanh · 2019-01-09 20:16

A couple of points:
1. LUTSON must be issued in both cogs if you want bidirectional flow. As Cluso indicated, a LUTSON instruction enables writes to that cog's LUTRAM only. Writing back to the other cog also requires that other cog to issue its own LUTSON.
2. There is a known flaw in the FPGA design as it stands, where reading with one cog and writing the same location with the other cog at the same time will corrupt the data. I should test this out in the ES silicon now that I have it. Thanks for the reminder.
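
To make point 1 concrete, here is a minimal Python sketch of the enable as described above. It is a toy model, not P2 code: the `CogPair` class and its method names are made up for illustration, it only models which LUT a write lands in, and it does not model the read/write collision flaw from point 2.

```python
LUT_SIZE = 512  # each cog's LUT holds 512 longs


class CogPair:
    """Toy model of an even/odd cog pair and their two LUT RAMs."""

    def __init__(self):
        self.lut = [[0] * LUT_SIZE, [0] * LUT_SIZE]
        # Set by each cog's own SETLUTS #1 (LUTSON): "I accept my neighbour's writes".
        self.accepts_writes = [False, False]

    def setluts(self, cog, enable):
        """Model SETLUTS #1 / #0 issued by `cog` (0 or 1 within the pair)."""
        self.accepts_writes[cog] = bool(enable)

    def wrlut(self, cog, value, addr):
        """Model WRLUT: always lands in the writer's LUT, and in the
        neighbour's LUT only if the neighbour has opted in."""
        self.lut[cog][addr] = value & 0xFFFFFFFF
        other = cog ^ 1
        if self.accepts_writes[other]:
            self.lut[other][addr] = value & 0xFFFFFFFF

    def rdlut(self, cog, addr):
        """Model RDLUT: a cog only ever reads its own LUT."""
        return self.lut[cog][addr]


pair = CogPair()
pair.setluts(0, True)   # only cog 0 issues LUTSON
pair.wrlut(1, 5, 0)     # cog 1 writes: mirrored into cog 0's LUT
pair.wrlut(0, 9, 1)     # cog 0 writes: cog 1 never opted in, so not mirrored
print(pair.rdlut(0, 0), pair.rdlut(1, 1))   # prints: 5 0
```

In this model, one-way flow needs a single LUTSON in the receiving cog, and bidirectional flow needs one in each cog.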

kbash · 2019-01-09 22:18

Thanks guys.

I managed to get my code working last night by passing data through hub RAM. One cog is waiting for movement data to be passed to it by another cog before it makes a 6 axis move. I don't have a machine hooked up to my P2 yet but the diagnostics (LEDs) I planted in the code seem to verify that it might be running.

Cluso99, it is unreasonable to EXPECT that sample code will be available, but I hope it's not unreasonable to ask and HOPE that someone has come across a particular issue and figured it out. Many of you who have been working with the FPGAs are already over some of the simple humps that those of us who waited for silicon are now facing. I hope you won't be offended if I ask similar questions in the future.

I spent several hours today figuring out the LUT stuff. Attached is an example of one cog setting values in the LUT of another cog. In this example, Cog 0 starts and initializes a blink program into Cog 2.

This blink program adds the value at LUT 0 to the base (57) LED address.

The Cog 3 program slowly indexes the value from 7 to 0 in LUT 0. It also blinks LED 56 mainly to show Cog 3 is working.

If anyone thinks this example is worth attaching to the example code in google docs feel free, or let me know and I'll try to figure out how to do it myself.

evanh, the P2 docs said that I needed to use SETLUTS #1. That seems to work fine. What is LUTSON for? This is the first time I've seen that one.

{ The following code is an example of one cog passing data to another cog using their shared LUT RAM.

    Cog 2 blinks an LED on the P2-EV board with a base address of port 57.
    The value located in LUT 0 is added to the base port address.

    Cog 3 slowly decrements this value from 7 to 0 in the shared LUT RAM,
    changing the blinking LED port by 1 each time.

   Ken Bash
}



dat
		orgh	0
 
 
		org
 
 
coginit	cognum,#@blink  ' init a single cog to use the blink program 

coginit	cognum2,#@blinkndex
 

'   *** longs declared here should be used as constants and not written to by the program. 
'   ***   variables go at the end of the program 

cognum		long	2          ' cog numbers must form an adjoining even/odd cog pair
cognum2 	long    3


      		 cogstop 0                         ' stop the initialization cog 0

 


		org

 
blink
                wrlut #5, #0     	' write 5 into LUT position 0 as a start value
              

                SETLUTS #1		' enable LUT sharing between two cogs

blink1
               	mov x, #57		' set the base address for the LED to blink	


                rdlut val1 ,#0          ' read the value from the lut to val1

                add  x, val1            ' add val1 to the blink address
                mov z, x                ' save the led address in z 

		drvnot	x		'output and flip that pin
		shl	x,#16		'shift up to make it big
	   	waitx	x		'wait that many clocks
                
              				' now clear the previous LED in case it was left on
                add z, #1
                drvh z			 

		jmp	#blink1		'do it again





blinkndex ' index the lookup table value

             '   SETLUTS #1             ' enable LUT sharing between two cogs (not needed on cog 3)

		mov y, #300       	' set a slower blink time in y    
                mov val2, #5

                         
blinkndex2
             	mov y, #500       	' set a slower blink time in y  

	        drvnot   #56            ' blink LED 56 to show blinkndex is running in cog 3
		shl	y,#16		'shift up to make it big
		waitx   y		'wait that many clocks



     		sub val2, #1		' decrement val2 by 1
                wrlut val2, #0     	'write val2 into the lut position 0                  
     

                cmp val2, #0 wz          ' when val2 reaches 0, fall through and start again

               	if_nz jmp #blinkndex2 

               	mov val2, #7 		 'restart the blink count at led 63 

               	jmp #blinkndex2 

 


x		res	1
y		res	1
z 		res	1

val1		res     1
val2 		res     1
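
As a cross-check of the indexing loop, here is a small Python trace of the values `blinkndex` writes to LUT 0 and the LED pins that result once `blink` adds them to 57. It is a sketch that assumes the loop is read correctly, not P2 code, and `lut0_values` is a made-up helper name.

```python
BASE_PIN = 57  # base LED pin used by the blink cog


def lut0_values(count, start=5, restart=7):
    """Yield the values the blinkndex loop writes to LUT 0, one per pass."""
    val2 = start
    for _ in range(count):
        val2 -= 1            # sub val2, #1
        yield val2           # wrlut val2, #0
        if val2 == 0:        # cmp val2, #0 wz ... fall through when zero
            val2 = restart   # mov val2, #7


values = list(lut0_values(12))
pins = [BASE_PIN + v for v in values]
print(values)   # [4, 3, 2, 1, 0, 6, 5, 4, 3, 2, 1, 0]
print(pins)     # [61, 60, 59, 58, 57, 63, 62, 61, 60, 59, 58, 57]
```

Traced this way, the first pass counts 4 down to 0 and every later pass counts 6 down to 0, because the restart value of 7 is decremented before it is written. That keeps the blinking LED within pins 57 to 63.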

evanh · 2019-01-09 23:23

LUTSON is just the alias for SETLUTS #1. And LUTSOFF is the alias for SETLUTS #0. I've been using the aliases and kind of forgotten there were alternatives.

msrobots · 2019-01-09 23:51

You should put your cogstop above cognum and cognum2.

You are running into your data.

Mike

kbash · 2019-01-10 02:33

Thanks Mike. I can use all the help I can get. My programming is a lot like setting a monkey in front of a keyboard typing random things then seeing if anything works. I eventually managed to get the P1 to do what I needed in assembly combined with Spin. I suspect I will with the P2 as well.

I've worked with assembly "All my life". I learned to program concatenating bit codes into hex on an SDK-86 and learned Forth on a Rockwell AIM-65. The P2 is a lot like combining both of those together but I've lost a LOT of brain cells between that time and now. Thanks for any pointers you care to give.

I find that some of the stuff that passes back and forth here on the forum makes my head ache just trying to figure out what the hell people are talking about. I also find that some of the stuff intended to "Clarify" the operation and instruction of the P2 comes out more like Sanskrit than logic.

At the top of the P2 instructions spreadsheet is the following statement:
** If #S and cogex, PC += signed(S). If #S and hubex, PC += signed(S*4). If S, PC = register S.

I've been following the development of the P2 since Chip first started working on it but I read stuff like that and just shake my head in wonder.

Thanks for the help. I'll let you know if I stumble on anything else that actually works.

K.B.

evanh · 2019-01-10 02:50

kbash wrote: »

I find that some of the stuff that passes back and forth here on the forum makes my head ache just trying to figure out what the hell people are talking about. I also find that some of the stuff intended to "Clarify" the operation and instruction of the P2 comes out more like Sanskrit than logic.

Lol, don't worry, we're all guessing at times. My education is half baked for sure.

Learning old news for the first time still feels new. (That's gotta be a quote!)

evanh · 2019-01-10 02:57

kbash wrote: »

At the top of the P2 instructions spreadsheet is the following statement:
** If #S and cogex, PC += signed(S). If #S and hubex, PC += signed(S*4). If S, PC = register S.

Chip is very concise at times. That detail is about the branch address of conditional branch instructions. "If #S" and "If S" are two addressing-mode variations: immediate and register direct. It's stating that register direct mode uses absolute addressing whereas immediate mode uses PC-relative addressing. Immediate mode is further split into cogexec and hubexec cases because cog addressing is in longword units while hub addressing is in byte units.
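
For anyone who prefers arithmetic to prose, here is a minimal Python sketch of that spreadsheet note. It is an illustration, not P2 code: `branch_target` and `sign_extend` are made-up names, S is treated as the 9-bit source field, and because the note does not say which PC value the offset is applied to, the model simply takes that base as the `pc` argument.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def branch_target(pc, s, immediate, hubexec=False, registers=None):
    """Apply the quoted rule and return the new PC.

    pc        -- PC value the offset is applied to (assumption, see lead-in)
    s         -- 9-bit S field of the instruction
    immediate -- True for the #S form, False for the register form
    hubexec   -- True when executing from hub (byte addresses)
    registers -- mapping of register address to contents, used when not immediate
    """
    if not immediate:
        return registers[s]        # "If S, PC = register S"
    offset = sign_extend(s, 9)     # "PC += signed(S)" in cogexec
    if hubexec:
        offset *= 4                # "PC += signed(S*4)" in hubexec
    return pc + offset


# cogexec, #S = $1FE is an offset of -2 longs
print(hex(branch_target(0x020, 0x1FE, immediate=True)))
# hubexec, the same #S is an offset of -8 bytes
print(hex(branch_target(0x1000, 0x1FE, immediate=True, hubexec=True)))
# register form: the register's contents become the PC
print(hex(branch_target(0x020, 0x010, immediate=False, registers={0x010: 0x123})))
```

The same 9-bit field therefore moves the PC by longs in cog memory and by four bytes per step in hub memory, while the register form ignores the current PC entirely.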

EDIT: Had a look at what it affects and narrowed the applicability.
EDIT2: Clarify more.

Electrodude · 2019-01-10 04:16

So you can do SETLUTS #1 from one cog in a pair but not the other if you want to? It seems like it should be somehow useful, but I can't think of any uses for it.

msrobots · 2019-01-10 05:02

@kbash,

I feel you. Sadly, I have to admit I am a Parallax addict. I have been following this forum for a decade now, almost religiously. The P2 saga kept me coming back every morning and evening. Finally I have an EVAL board in my hands and am diving into the possibilities.

My guess is that I read every post about the P2 at least twice, but I still struggle with a lot of it. I did not understand a shxx of the discussion about the random generator. Same goes for that ADC thread. But I still need to read it somehow.

The P1 was at first a challenge for me, rethinking the way to program: multiple cores running in parallel and sharing mailboxes in HUB. But I found out it is brilliant. The P2 now takes all of this to a new level. This LUT sharing you showed is not complicated; it is simple once written. But it opens up so many possibilities. Now two cores can work as a team without needing the slower HUB.

As far as the documentation goes, I mostly try to learn by reading code, not manuals. My most used reference for the P1 is that short reference sheet in the Propeller Tool. I rarely need to look something up in the long description of commands. But for sure I read the whole thing tons of times.

I am just a Code Monkey. I have worked for my living as a Code Monkey for over 35 years. If I were one of those sophisticated programmers I would not need to work anymore. But I am not.

Currently the P2 is a challenge for me. I try to find, run and hoard as many examples as I can, to get a 'feeling' for the assembler code. What I really like is that fastspin shows the PASM output, which is very helpful for me. PropBASIC does this also and is a nice tool to use for the P1.

Exciting times!

Mike

evanh · 2019-01-10 08:01

Electrodude wrote: »

So you can do SETLUTS #1 from one cog in a pair but not the other if you want to? It seems like it should be somehow useful, but I can't think of any uses for it.

I used only one-way flow for the emulated sinc3 filter to get a maximised update rate on the filter. LUTRAM was used to mailbox the final filter stage, every eight clocks, to be decimated by a slower loop without any variation in the sinc3 filter loop time.

That's when I discovered the garbaged data when the decimation sampling was at a specific phase alignment to the filter update. And, thinking about it, probably was also the cause of the partial garbaging when decimation was not synchronised to the sinc3 filter.

EDIT: Here's a link - https://forums.parallax.com/discussion/comment/1457556/#Comment_1457556
That's the sinc3 part. Note it doesn't have a LUTSON but is writing to $3ff. The other cog does the LUTSON to be able to read that written data.

kbash · 2019-01-10 15:51

@msrobots

Hey Mike, I tried putting the cogstop 0 before the cognum and cognum2. The program doesn't run that way on my Eval board.

Oddly enough, when I put cogstop 0 both before AND after the cognum definitions... it works fine.

It's obvious that I don't really understand much of what Pnut is doing but I'm not going to worry about it as long as I can somehow manage to get things running.

I hate feeling like I'm doing "Poke and Hope" programming, but I'll wait for the "Real" tools to be available before I worry much about understanding the nuts and bolts of the compiler.

Rayman · 2019-01-10 17:03

doesn't it have to be:

cogstop #0

Think you're missing the #

kbash · 2019-01-10 18:03

Good catch Rayman,

Code works fine that way. Thanks!

That brings the instructions I think I understand up equal to the number of impossible things I can believe before breakfast.

Mark_T · 2019-01-27 02:02

Here's my sample code for a producer/consumer pair using a LUT buffer with event-counters. With event counters
a single-producer/single-consumer pair needs no other locking as each cog only alters its own event counter.

I set up a pair of cogs sharing LUT.

CON
    OSCMODE = $010c3f04
    FREQ = 160_000_000
    BAUD = 2*115200

    ' buffer in LUT from start of LUT
    BUFSIZE  = 250
    ' event counter addresses in LUT
    INSERT_EVCOUNT = 250
    EXTRACT_EVCOUNT = 251

    ' DAC setup
    dither = %0000_0000_000_10100_00000000_01_00010_0    ' remember to wypin
    DACpin = 48  ' output DAC

OBJ
    ser: "SmartSerial.spin2"
	

PUB  Demo
    clkset (OSCMODE, FREQ)
    ser.start (63, 62, 0, BAUD)
    ser.str (string ("LUT share prod consume"))
    ser.tx(13)
    ser.tx(10)
  
    coginit (%0_1_0001, @workers)  ' start pair of cogs

DAT
		ORG	0
workers
		wrlut	#0, #INSERT_EVCOUNT   ' LUT not cleared at cog start, so must init variables in it
		wrlut	#0, #EXTRACT_EVCOUNT  ' either cog might get here first

		LUTSON  ' LUT sharing enabled

		cogid	cognum    ' cogs have to agree which is producer and which consumer
		testb	cognum, #0  wz
	if_z	jmp	#producer


' consumer truncates items from the buffer to 16 bits and outputs them on the DAC pin
consumer	wrpin	##dither, #DACpin
.loop		call	#extract
		zerox   item, #15
		wypin	item, #DACpin
		jmp	#.loop

' producer generates random samples with random delays
producer	
.loop 		getrnd	item
		mov	del, item
		shr	del, #16
		waitx	del
		call	#insert
		jmp	#producer


' event-counter style producer consumer pair using buffer at start of LUT ram

insert		sub	ins_count, #BUFSIZE
.insert_wait	RDLUT	ext_count, #EXTRACT_EVCOUNT
		cmp	ins_count, ext_count  wz
	if_z	jmp	#.insert_wait  ' wait on buffer full (inptr == outptr+BUFSIZE)

		WRLUT   item, wrptr    ' currently restricted to start of LUT ram by using incmod
		incmod	wrptr, #BUFSIZE-1

		add	ins_count, #BUFSIZE+1
		WRLUT	ins_count, #INSERT_EVCOUNT   ' signal new item
insert_ret	ret

wrptr		long	0   ' wrapping pointer for adding items
ins_count	long	0   ' insert event counter


extract
.extract_wait   RDLUT	ins_count, #INSERT_EVCOUNT
		cmp	ext_count, ins_count  wz
	if_z	jmp	#.extract_wait ' wait on buffer empty (inptr == outptr)

		RDLUT	item, rdptr    ' currently restricted to start of LUT ram by using incmod
		incmod	rdptr, #BUFSIZE-1

		add	ext_count, #1
		WRLUT	ext_count, #EXTRACT_EVCOUNT  ' signal item taken
extract_ret	ret

rdptr		long	0   ' wrapping pointer for removing items
ext_count	long	0   ' extract event counter

cognum		res	1
item		res	1
del		res	1

		FIT	$1F0
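
The event-counter scheme above can also be modelled on a PC to check the full and empty conditions. The sketch below is a single-threaded Python model, not P2 code. The `Producer` and `Consumer` classes and the non-blocking `try_insert` / `try_extract` methods are illustrative names only (the PASM routines spin-wait instead of returning), and one Python list stands in for the two mirrored LUTs.

```python
MASK32 = 0xFFFFFFFF
BUFSIZE = 250            # buffer occupies LUT addresses 0..249
INSERT_EVCOUNT = 250     # written only by the producer
EXTRACT_EVCOUNT = 251    # written only by the consumer


class Producer:
    def __init__(self, lut):
        self.lut = lut
        self.wrptr = 0
        self.ins_count = 0

    def try_insert(self, item):
        """Store item and signal it; return False if the buffer is full."""
        ext_count = self.lut[EXTRACT_EVCOUNT]
        if (self.ins_count - ext_count) & MASK32 == BUFSIZE:
            return False                            # full: inserts == extracts + BUFSIZE
        self.lut[self.wrptr] = item & MASK32
        self.wrptr = (self.wrptr + 1) % BUFSIZE     # same wrap as incmod wrptr, #BUFSIZE-1
        self.ins_count = (self.ins_count + 1) & MASK32
        self.lut[INSERT_EVCOUNT] = self.ins_count   # signal new item
        return True


class Consumer:
    def __init__(self, lut):
        self.lut = lut
        self.rdptr = 0
        self.ext_count = 0

    def try_extract(self):
        """Return the next item, or None if the buffer is empty."""
        if self.lut[INSERT_EVCOUNT] == self.ext_count:
            return None                             # empty: inserts == extracts
        item = self.lut[self.rdptr]
        self.rdptr = (self.rdptr + 1) % BUFSIZE
        self.ext_count = (self.ext_count + 1) & MASK32
        self.lut[EXTRACT_EVCOUNT] = self.ext_count  # signal item taken
        return item


lut = [0] * 512
producer, consumer = Producer(lut), Consumer(lut)

accepted = sum(producer.try_insert(n) for n in range(300))  # only 250 fit
drained = []
item = consumer.try_extract()
while item is not None:
    drained.append(item)
    item = consumer.try_extract()
print(accepted, len(drained), drained[:3], drained[-1])
```

Each side writes only its own counter and only reads the other's, which is why this single-producer/single-consumer arrangement gets by without a lock. The model says nothing about the hardware read/write collision; it only checks the counting logic.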

evanh · 2019-01-27 02:30

Mark,
Be prepared for both event counts to glitch on rare occasions because of the hardware flaw with lutRAM sharing. Here's some test code to show the issue - https://forums.parallax.com/discussion/comment/1462672/#Comment_1462672

evanh · 2019-01-27 02:37

The good news is that the glitch only occurs when there is an incrementing WRLUT. Because of the simple actions this would just trigger an increment at the glitching RDLUT routine, which is what's wanted in the first place.

Mark_T · 2019-01-27 12:56

The code never tries to write to the same address at the same time from different cogs, and the event counters are strictly incrementing, so I think my example is completely immune - I've checksummed the values at both ends for 10^9 transfers with randomized timing and seen no difference or lock up.

kbash · 2019-01-27 15:37

Hi evanh, I had some difficulty getting a program to work with the LUT value passing. I suspect it was my own fault, trying to pass data at the wrong time and being unfamiliar with LUT sharing. I have my code working fine with two cogs passing data back and forth through the hub RAM but when I tried to convert it to use the LUT only I ran into problems.

I saw the code in your link but I have no idea if I'm running into anything similar.

I will eventually need to have two cogs sending data back and forth through the LUT. Is there any quick "Trick" you know of ( inserting delays, etc ) to ensure that I'm not running into this issue?

evanh · 2019-01-27 16:00

Two workarounds I can think of (as handshakes to ensure a read and a write of lutRAM do not coincide) are:
1. Use COGATN or LUT write hardware event types.
2. Use the "long repository" mailbox feature of a spare smartpin.

EDIT: Removed LOCKREL event from list because it is a hub-op with variable stall amount. A hubRAM mailbox could also be used with bigger stalls again.

evanh · 2019-01-27 16:15

Mark_T wrote: »

... event counters are strictly incrementing, so I think my example is completely immune - I've checksummed the values at both ends for 10^9 transfers with randomized timing and seen no difference or lock up.

Good, it does appear to inherently recover from the glitch. Just be wary of this if you start expanding features of your code.
