cog 2 cog communication


Does anyone have an example of passing data back and forth between two cogs using the lookup RAM (LUT)? I went through the docs; they explain what CAN be done but are sparse on HOW to do it. Some example code would be helpful.

Comments

  • I think, rereading the documentation, that in this mode both LUT rams for both cogs become shared memory, in
    that all writes are duplicated to both (if enabled by the receiver cog). The intention is to use events to signal
    such writes, or locks for protecting data structures in the RAM. Well, that's my take. I guess a shared circular
    buffer would be a good test case.
  • Yes, both cogs will contain the same data being written to either, once the receiver cogs set this condition.
    This feature is to permit pairs of adjacent cogs to share/pass information, i.e. to work in cooperation.
    The P2 is new. Expecting example code at this time is unreasonable. That is why there are Engineering Samples out there so that experienced users can start to write test code and tools to test and prepare the P2 for launch.
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • evanh Posts: 6,614
    edited 2019-01-09 - 20:19:33
    A couple of points:
    1. LUTSON must be issued in both cogs if you want bidirectional flow. As Cluso indicated, a LUTSON instruction enables writes to that cog's LUTRAM only. To write back to the other cog also requires that other cog to issue its own LUTSON.
    2. There is a known flaw in the FPGA design as it stands, where reading with one cog and writing the same location with the other cog at the same time will corrupt the data. I should test this out in the ES silicon now that I have it, thanks for the reminder.
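
    A minimal sketch of point 1, not tested on hardware. It assumes this one block has been started in both cogs of an adjacent even/odd pair; the labels pair, cog_a, cog_b, id, msg, and reply, and the choice of LUT addresses 0 and 1, are made up for illustration.

                org     0
    pair        setluts #1              ' both cogs: accept the partner's WRLUTs into this cog's LUT
                cogid   id
                testb   id, #0  wz      ' Z = bit 0 of the cog ID, so Z is set in the odd cog
        if_z    jmp     #cog_b

    cog_a       wrlut   msg, #0         ' written to our LUT, and mirrored to cog_b's LUT
                rdlut   reply, #1       ' picks up whatever cog_b last wrote to address 1
                add     msg, #1
                jmp     #cog_a

    cog_b       rdlut   msg, #0         ' sees cog_a's value
                wrlut   msg, #1         ' echo it back; mirrored to cog_a's LUT
                jmp     #cog_b

    id          long    0
    msg         long    0
    reply       long    0

    If only one of the two cogs issued SETLUTS #1, data would flow one way only, into that cog. Nothing here synchronises the two cogs, so this sketch does not avoid the same-address read/write flaw in point 2.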

    "There's no huge amount of massive material
    hidden in the rings that we can't see,
    the rings are almost pure ice."
  • Thanks guys.


    I managed to get my code working last night by passing data through hub RAM. One cog waits for movement data to be passed to it by another cog before it makes a 6 axis move. I don't have a machine hooked up to my P2 yet, but the diagnostics (LEDs) I planted in the code seem to verify that it might be running.

    Cluso99, it is unreasonable to EXPECT that sample code will be available, but I hope it's not unreasonable to ask and HOPE that someone has come across a particular issue and figured it out. Many of you who have been working with the FPGAs are already over some of the simple humps that those of us who waited for silicon are now facing. I hope you won't be offended if I ask similar questions in the future.


    I spent several hours today figuring out the LUT stuff. Attached is an example of one cog setting values in the LUT of another cog. In this example, Cog 0 starts a blink program in Cog 2 and an index program in Cog 3.

    This blink program adds the value at LUT 0 to the base (57) LED address.

    The Cog 3 program slowly indexes the value from 7 to 0 in LUT 0. It also blinks LED 56 mainly to show Cog 3 is working.


    If anyone thinks this example is worth attaching to the example code in google docs feel free, or let me know and I'll try to figure out how to do it myself.



    evanh, the P2 docs said that I needed to use SETLUTS #1. That seems to work fine. What is LUTSON for? This is the first time I've seen that one.


    { The following code is an example of one cog passing data to another cog using their shared LUT RAM.
    
        Cog 2 blinks an led on the P2-EV board with a base address of port 57.  
        The value located in LUT 0 is added to the base port address.
      
        Cog 3 slowly decrements this value from 7 to 0 in the shared LUT RAM
        changing the blinking LED port by 1 each time. 
    
       Ken Bash
    }
    
    
    
    dat
    		orgh	0
     
     
    		org
     
     
    coginit	cognum,#@blink  ' init a single cog to use the blink program 
    
    coginit	cognum2,#@blinkndex
     
    
    '   *** longs declared here should be used as constants and not written to by the program. 
    '   ***   variables go at the end of the program 
    
    cognum		long	2          ' set the cog ajoining even/odd cog pair 
    cognum2 	long    3
    
    
          		 cogstop 0                         ' stop the initialization cog 0
    
     
    
    
    		org
    
     
    blink
                    wrlut #5, #0     	'write 5 into the lut position 0  as a start value 
                  
    
                SETLUTS #1		' enable LUT sharing between two cogs
    
    blink1
                   	mov x, #57		' set the base address for the LED to blink	
    
    
                    rdlut val1 ,#0          ' read the value from the lut to val1
    
                add  x, val1            ' add val1 to the blink address 
                    mov z, x                ' save the led address in z 
    
    		drvnot	x		'output and flip that pin
    		shl	x,#16		'shift up to make it big
    	   	waitx	x		'wait that many clocks
                    
                  				' now clear the previous LED in case it was left on
                    add z, #1
                    drvh z			 
    
    		jmp	#blink1		'do it again
    
    
    
    
    
    blinkndex ' index the lookup table value
    
             '   SETLUTS #1             ' enable LUT sharing between two cogs (not needed on cog 3) 
    
    		mov y, #300       	' set a slower blink time in y    
                    mov val2, #5
    
                             
    blinkndex2
                 	mov y, #500       	' set a slower blink time in y  
    
    	        drvnot   #56            ' blink LED 56 to show blinkdex is running in cog 3
    		shl	y,#16		'shift up to make it big
    		waitx   y		'wait that many clocks
    
    
    
     		sub val2, #1		'decrement val2 by 1
                    wrlut val2, #0     	'write val2 into the lut position 0                  
         
    
                cmp val2, #0 wz          ' when val2 reaches 0, fall through and start again
    
                   	if_nz jmp #blinkndex2 
    
                   	mov val2, #7 		 'restart the blink count at led 63 
    
                   	jmp #blinkndex2 
    
     
    
    
    x		res	1
    y		res	1
    z 		res	1
    
    val1		res     1
    val2 		res     1
    
    

  • LUTSON is just the alias for SETLUTS #1. And LUTSOFF is the alias for SETLUTS #0. I've been using the aliases and had kind of forgotten there were alternatives.

  • You should put your cogstop above cognum and cognum2.

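    You are running into your data: after the two coginits, cog 0 carries on into those longs and tries to execute them as instructions.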

    Mike
    I am just another Code Monkey.
    A determined coder can write COBOL programs in any language. -- Author unknown.
    Press any key to continue, any other key to quit

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.

  • Thanks Mike. I can use all the help I can get. My programming is a lot like setting a monkey in front of a keyboard typing random things then seeing if anything works. I eventually managed to get the P1 to do what I needed in assembly combined with Spin. I suspect I will with the P2 as well.

    I've worked with assembly "All my life". I learned to program concatenating bit codes into hex on an SDK-86 and learned Forth on a Rockwell AIM-65. The P2 is a lot like combining both of those together but I've lost a LOT of brain cells between that time and now. Thanks for any pointers you care to give.

    I find that some of the stuff that passes back and forth here on the forum makes my head ache just trying to figure out what the hell people are talking about. I also find that some of the stuff intended to "Clarify" the operation and instructions of the P2 comes out more like Sanskrit than logic.

    At the top of the P2 instructions spreadsheet is the following statement:
    ** If #S and cogex, PC += signed(S). If #S and hubex, PC += signed(S*4). If S, PC = register S.


    I've been following the development of the P2 since Chip first started working on it but I read stuff like that and just shake my head in wonder.

    Thanks for the help. I'll let you know if I stumble on anything else that actually works.

    K.B.



  • kbash wrote: »
    I find that some of the stuff that passes back and forth here on the forum makes my head ache just trying to figure out what the hell people are talking about. I also find that some of the stuff intended to "Clarify" the operation and instructions of the P2 comes out more like Sanskrit than logic.

    Lol, don't worry, we're all guessing at times. My education is half baked for sure.

    Learning old news for the first time still feels new. (That's gotta be a quote! :))
  • evanh Posts: 6,614
    edited 2019-01-10 - 03:22:12
    kbash wrote: »
    At the top of the P2 instructions spreadsheet is the following statement:
    ** If #S and cogex, PC += signed(S). If #S and hubex, PC += signed(S*4). If S, PC = register S.

    Chip is very concise at times. That detail is about the branch address of the conditional branching instructions. "If #S" vs "If S" are the two addressing mode variations, immediate vs register direct. It's stating that register direct mode uses absolute addressing (PC is loaded from the register), whereas immediate mode uses PC-relative addressing. And immediate is further split into two cases, cogexec vs hubexec, because cog addressing is longword scale while hub addressing is byte scale.


    EDIT: Had a look at what it affects and narrowed the applicability.
    EDIT2: Clarify more. :)
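
    A small sketch of the two forms, assuming cog-exec code. The labels count, target, loop, and done are hypothetical, and this has not been run on hardware.

                mov     count, #3
    loop        nop
                djnz    count, #loop    ' #S: the assembler encodes a signed offset, so PC += offset
                mov     target, #done   ' target now holds the cog address of done
                tjz     count, target   ' S without #: PC = contents of register target
                nop                     ' skipped, because count is 0 at this point
    done        jmp     #done

    count       long    0
    target      long    0

    In hub-exec the same #loop form would be encoded as a byte offset, which is the S*4 in Chip's note.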
  • So you can do SETLUTS #1 from one cog in a pair but not the other if you want to? It seems like it should be somehow useful, but I can't think of any uses for it.
  • @kbash,

    I can feel you. Sadly I have to admit I am a Parallax addict. I have been following this forum for a decade now, almost religiously. The P2 saga kept me coming back every morning and evening. Finally I have an EVAL board in my hands and am diving into the possibilities.

    My guess is that I read every post about the P2 at least twice, but I still struggle with a lot of it. I did not understand a shxx of the discussion about the random generator. Same goes for that ADC thread. But I still need to read it somehow.

    The P1 was at first a challenge for me, rethinking the way to program: multiple cores running in parallel and sharing mailboxes in HUB. But I found out it is brilliant. The P2 now takes all of this to a new level. This LUT sharing you showed is not complicated; it is simple once written. But it opens so many possibilities. Now two cores can work as a team without needing to use the slower HUB.

    As far as the documentation goes, I mostly try to learn by reading code, not Manuals. My most used reference for the P1 is that short reference sheet in the propeller tool. Rarely I need to look something up in the long description of commands. But for sure I read the whole thing tons of times.

    I am just a Code Monkey. I have worked for my living as a Code Monkey for over 35 years. If I were one of those sophisticated programmers I would not need to work anymore. But I am not.

    Currently the P2 is a challenge for me. I try to find, run and hoard as many examples as I can, to get a 'feeling' for the assembler code. What I really like is that fastspin shows the PASM output, which is very helpful for me. PropBasic does this also and is a nice tool to use for the P1.

    exciting times!

    Mike
  • evanh Posts: 6,614
    edited 2019-01-10 - 08:13:55
    So you can do SETLUTS #1 from one cog in a pair but not the other if you want to? It seems like it should be somehow useful, but I can't think of any uses for it.

    I used only one way flow for the emulated sinc3 filter to get maximised update rate on the filter. LUTRAM was used to mailbox the final filter stage, every eight clocks, to be decimated by a slower loop without any variation in the sinc3 filter loop time.

    That's when I discovered the garbaged data when the decimation sampling was at a specific phase alignment to the filter update. And, thinking about it, probably was also the cause of the partial garbaging when decimation was not synchronised to the sinc3 filter.

    EDIT: Here's a link - https://forums.parallax.com/discussion/comment/1457556/#Comment_1457556
    That's the sinc3 part. Note it doesn't have a LUTSON but is writing to $3ff. The other cog does the LUTSON to be able to read that written data.
  • @msrobots

    Hey Mike, I tried putting the cogstop 0 before the cognum and cognum2. The program doesn't run that way on my Eval board.


    Oddly enough, when I put cogstop 0 both before AND after the cognum definitions... it works fine.

    It's obvious that I don't really understand much of what Pnut is doing but I'm not going to worry about it as long as I can somehow manage to get things running.


    I hate feeling like I'm doing "Poke and Hope" programming, but I'll wait for the "Real" tools to be available before I worry much about understanding the nuts and bolts of the compiler.

  • Rayman Posts: 9,297
    edited 2019-01-10 - 17:03:35
    doesn't it have to be:
    cogstop #0
    
    Think you're missing the #
    Prop Info and Apps: http://www.rayslogic.com/
  • Good catch Rayman,

    Code works fine that way. Thanks!

    That brings the instructions I think I understand up equal to the number of impossible things I can believe before breakfast.
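
    Putting Mike's and Rayman's points together, the start of the program might look like the sketch below. It assumes the rest of the listing is unchanged, and it has only been reasoned through, not re-run.

    dat
                orgh    0
                org
                coginit cognum, #@blink
                coginit cognum2, #@blinkndex
                cogstop #0              ' immediate: stop cog 0. Without the #, 0 names register 0
                                        ' and the cog number is taken from that register's contents.
    cognum      long    2               ' data now sits after the cogstop, out of cog 0's execution path
    cognum2     long    3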
  • Here's my sample code for a producer/consumer pair using a LUT buffer with event-counters. With event counters
    a single-producer/single-consumer pair needs no other locking as each cog only alters its own event counter.

    I set up a pair of cogs sharing LUT.
    CON
        OSCMODE = $010c3f04
        FREQ = 160_000_000
        BAUD = 2*115200
    
        ' buffer in LUT from start of LUT
        BUFSIZE  = 250
        ' event counter addresses in LUT
        INSERT_EVCOUNT = 250
        EXTRACT_EVCOUNT = 251
    
        ' DAC setup
        dither = %0000_0000_000_10100_00000000_01_00010_0    ' remember to wypin
        DACpin = 48  ' output DAC
    
    OBJ
        ser: "SmartSerial.spin2"
    	
    
    PUB  Demo
        clkset (OSCMODE, FREQ)
        ser.start (63, 62, 0, BAUD)
        ser.str (string ("LUT share prod consume"))
        ser.tx(13)
        ser.tx(10)
      
        coginit (%0_1_0001, @workers)  ' start pair of cogs
    
    DAT
    		ORG	0
    workers
    		wrlut	#0, #INSERT_EVCOUNT   ' LUT not cleared at cog start, so must init variables in it
    		wrlut	#0, #EXTRACT_EVCOUNT  ' either cog might get here first
    
    		LUTSON  ' LUT sharing enabled
    
    		cogid	cognum    ' cogs have to agree which is producer and which consumer
    		testb	cognum, #0  wz
    	if_z	jmp	#producer
    
    
    ' consumer truncates items from the buffer to 16 bits and output on DAC pin
    consumer	wrpin	##dither, #DACpin
    .loop		call	#extract
    		zerox   item, #15
    		wypin	item, #DACpin
    		jmp	#.loop
    
    ' producer generates random samples with random delays
    producer	
    .loop 		getrnd	item
    		mov	del, item
    		shr	del, #16
    		waitx	del
    		call	#insert
    		jmp	#producer
    
    
    ' event-counter style producer consumer pair using buffer at start of LUT ram
    
    insert		sub	ins_count, #BUFSIZE
    .insert_wait	RDLUT	ext_count, #EXTRACT_EVCOUNT
    		cmp	ins_count, ext_count  wz
    	if_z	jmp	#.insert_wait  ' wait on buffer full (inptr == outptr+BUFSIZE)
    
    		WRLUT   item, wrptr    ' currently restricted to start of LUT ram by using incmod
    		incmod	wrptr, #BUFSIZE-1
    
    		add	ins_count, #BUFSIZE+1
    		WRLUT	ins_count, #INSERT_EVCOUNT   ' signal new item
    insert_ret	ret
    
    wrptr		long	0   ' wrapping pointer for adding items
    ins_count	long	0   ' insert event counter
    
    
    extract
    .extract_wait   RDLUT	ins_count, #INSERT_EVCOUNT
    		cmp	ext_count, ins_count  wz
    	if_z	jmp	#.extract_wait ' wait on buffer empty (inptr == outptr)
    
    		RDLUT	item, rdptr    ' currently restricted to start of LUT ram by using incmod
    		incmod	rdptr, #BUFSIZE-1
    
    		add	ext_count, #1
    		WRLUT	ext_count, #EXTRACT_EVCOUNT  ' signal item taken
    extract_ret	ret
    
    rdptr		long	0   ' wrapping pointer for removing items
    ext_count	long	0   ' extract event counter
    
    cognum		res	1
    item		res	1
    del		res	1
    
    		FIT	$1F0
    
  • Mark,
    Be prepared for both event counts to glitch on rare occasions because of the hardware flaw with lutRAM sharing. Here's some test code to show the issue - https://forums.parallax.com/discussion/comment/1462672/#Comment_1462672
  • The good news is that the glitch only occurs when there is an incrementing WRLUT. Because of the simple actions this would just trigger an increment at the glitching RDLUT routine, which is what's wanted in the first place.
  • The code never tries to write to the same address at the same time from different cogs, and the event counters are strictly incrementing, so I think my example is completely immune - I've checksummed the values at both ends for 10^9 transfers with randomized timing and seen no difference or lock up.
  • Hi evanh, I had some difficulty getting a program to work with the LUT value passing. I suspect it was my own fault, from trying to pass data at the wrong time and from unfamiliarity with LUT sharing. I have my code working fine with two cogs passing data back and forth through the hub RAM, but when I tried to convert it to use the LUT only I ran into problems.

    I saw the code in your link but I have no idea if I'm running into anything similar.

    I will eventually need to have two cogs sending data back and forth through the LUT. Is there any quick "Trick" you know of ( inserting delays, etc ) to ensure that I'm not running into this issue?
  • evanhevanh Posts: 6,614
    edited 2019-01-27 - 18:51:21
    Two workarounds I can think of (as handshakes to ensure a read and a write of lutRAM do not coincide) are:
    1. Use COGATN or LUT write hardware event types.
    2. Use the "long repository" mailbox feature of a spare smartpin.

    EDIT: Removed LOCKREL event from list because it is a hub-op with variable stall amount. A hubRAM mailbox could also be used with bigger stalls again.
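
    A rough sketch of workaround 1 using COGATN, assuming the writer runs in cog 2 and the reader in cog 3. The labels and the use of LUT address 0 are made up, and this has not been run on silicon.

    ' writer, in cog 2
    writer      waitatn                 ' wait until the reader says it is not touching the LUT
                wrlut   value, #0       ' lands in cog 3's LUT because cog 3 issued SETLUTS #1
                cogatn  #%1000          ' bit 3 set: strobe attention on cog 3
                add     value, #1
                jmp     #writer
    value       long    0

    ' reader, in cog 3
    reader      setluts #1              ' accept cog 2's WRLUTs
    rd_next     cogatn  #%0100          ' bit 2 set: tell cog 2 it may write
                waitatn                 ' wait for cog 2 to finish its write
                rdlut   got, #0
                jmp     #rd_next
    got         long    0

    Each cog only touches LUT address 0 after the other has signalled, so the read and the write should never coincide. The cost is that the two loops run in lock step.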
  • Mark_T wrote: »
    ... event counters are strictly incrementing, so I think my example is completely immune - I've checksummed the values at both ends for 10^9 transfers with randomized timing and seen no difference or lock up.

    Good, it does appear to inherently recover from the glitch. Just be wary of this if you start expanding features of your code.
