cog 2 cog communication
in Propeller 2
Does anyone have an example of passing data back and forth between two cogs using the lookup ram? I went through the docs, they explain what CAN be done but are somewhat sparse in explanation about HOW to do it. Some example code would be helpful.

Comments
that all writes are duplicated to both (if enabled by the receiver cog). The intention is to use events to signal
such writes, or locks for protecting data structures in the RAM. Well, that's my take. I guess a shared circular
buffer would be a good test case.
This feature is to permit pairs of adjacent cogs to share/pass information, i.e. to work in cooperation.
The P2 is new. Expecting example code at this time is unreasonable. That is why there are Engineering Samples out there so that experienced users can start to write test code and tools to test and prepare the P2 for launch.
1. LUTSON must be issued in both cogs if you want bidirectional flow. As Cluso indicated, a LUTSON instruction enables writes to that cog's LUTRAM only. Writing back to the other cog also requires that other cog to issue its own LUTSON.
2. There is a known flaw in the FPGA design as it stands, where reading with one cog and writing the same location with the other cog at the same time will corrupt the data. I should test this out in the ES silicon now that I have it, thanks for the reminder, ...
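For anyone trying to picture the enable rule in point 1, here is a rough host-side model in Python. It is not P2 code and it is only a sketch: the `Cog` class, `sharing_on` flag and `make_pair` helper are names made up for illustration, and the model deliberately ignores the same-address read/write corruption described in point 2.

```python
LUT_SIZE = 512  # longs of lookup RAM per cog


class Cog:
    """Toy model of one cog's LUT RAM (hypothetical helper, not a real API)."""

    def __init__(self):
        self.lut = [0] * LUT_SIZE
        self.sharing_on = False  # models this cog having issued LUTSON / SETLUTS #1
        self.partner = None      # the adjacent even/odd cog

    def wrlut(self, value, addr):
        """Write own LUT; the write is mirrored only if the partner opted in."""
        value &= 0xFFFFFFFF
        self.lut[addr] = value
        if self.partner is not None and self.partner.sharing_on:
            self.partner.lut[addr] = value

    def rdlut(self, addr):
        return self.lut[addr]


def make_pair():
    even, odd = Cog(), Cog()
    even.partner, odd.partner = odd, even
    return even, odd


cog2, cog3 = make_pair()
cog2.sharing_on = True        # only the receiving cog enables sharing
cog3.wrlut(7, 0)              # cog 3 -> cog 2 is mirrored
one_way = cog2.rdlut(0)
cog2.wrlut(5, 0)              # cog 2 -> cog 3 is not mirrored yet
not_mirrored = cog3.rdlut(0)  # cog 3 still sees its own value
cog3.sharing_on = True        # now both directions are enabled
cog2.wrlut(9, 0)
both_ways = cog3.rdlut(0)
```

In this model the receiver decides whether it accepts its partner's writes, which is why a one-way mailbox needs the enable in only one cog and a two-way exchange needs it in both.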
I managed to get my code working last night by passing data through hub RAM. One cog is waiting for movement data to be passed to it by another cog before it makes a 6 axis move. I don't have a machine hooked up to my P2 yet but the diagnostics (LEDs) I planted in the code seem to verify that it might be running.
Cluso99, It is unreasonable to EXPECT that sample code will be available, but I hope it's not unreasonable to ask and HOPE that someone has come across a particular issue and figured it out. To many of you who have been working with the FPGAs you are already over some of the simple humps that those of us who waited for silicon are now facing. I hope you won't be offended if I ask similar questions in the future.
I spent several hours today figuring out the LUT stuff. Attached is an example of one cog setting values in the LUT of another cog. In this example, Cog 0 starts and initializes a blink program into Cog 2.
This blink program adds the value at LUT 0 to the base (57) LED address.
The Cog 3 program slowly indexes the value from 7 to 0 in LUT 0. It also blinks LED 56 mainly to show Cog 3 is working.
If anyone thinks this example is worth attaching to the example code in google docs feel free, or let me know and I'll try to figure out how to do it myself.
evanh, the P2 docs said that I needed to use SETLUTS #1. That seems to work fine. What is LUTSON for? This is the first time I've seen that one.
{
  The following code is an example of one cog passing data to another cog using their shared LUT RAM.
  Cog 2 blinks an LED on the P2-EV board with a base address of port 57. The value located in LUT 0 is added to the base port address.
  Cog 3 slowly decrements this value from 7 to 0 in the shared LUT RAM, changing the blinking LED port by 1 each time.
  Ken Bash
}
dat
                orgh    0
                org
                coginit cognum,#@blink          ' init a single cog to use the blink program
                coginit cognum2,#@blinkndex

' *** longs declared here should be used as constants and not written to by the program.
' *** variables go at the end of the program
cognum          long    2                       ' set the cog adjoining even/odd cog pair
cognum2         long    3
                cogstop 0                       ' stop the initialization cog 0

                org
blink           wrlut   #5, #0                  ' write 5 into the lut position 0 as a start value
                SETLUTS #1                      ' enable LUT sharing between two cogs
blink1          mov     x, #57                  ' set the base address for the LED to blink
                rdlut   val1, #0                ' read the value from the lut to val1
                add     x, val1                 ' add val1 to the blink address
                mov     z, x                    ' save the led address in z
                drvnot  x                       ' output and flip that pin
                shl     x, #16                  ' shift up to make it big
                waitx   x                       ' wait that many clocks
' now clear the previous LED in case it was left on
                add     z, #1
                drvh    z
                jmp     #blink1                 ' do it again

blinkndex                                       ' index the lookup table value
'               SETLUTS #1                      ' enable LUT sharing between two cogs (not needed on cog 3)
                mov     y, #300                 ' set a slower blink time in y
                mov     val2, #5
blinkndex2      mov     y, #500                 ' set a slower blink time in y
                drvnot  #56                     ' blink LED 56 to show blinkndex is running in cog 3
                shl     y, #16                  ' shift up to make it big
                waitx   y                       ' wait that many clocks
                sub     val2, #1                ' decrement val2 by 1
                wrlut   val2, #0                ' write val2 into the lut position 0
                cmp     val2, #0        wz      ' when val2 reaches 0, fall through and start again
        if_nz   jmp     #blinkndex2
                mov     val2, #7                ' restart the blink count at led 63
                jmp     #blinkndex2

x               res     1
y               res     1
z               res     1
val1            res     1
val2            res     1

You are running into your data
Mike
Thanks Mike. I can use all the help I can get. My programming is a lot like setting a monkey in front of a keyboard typing random things then seeing if anything works. I eventually managed to get the P1 to do what I needed in assembly combined with spin. I suspect I will with the P2 as well.
I've worked with assembly "All my life". I learned to program concatenating bit codes into hex on an SDK-86 and learned Forth on a Rockwell AIM-65. The P2 is a lot like combining both of those together but I've lost a LOT of brain cells between that time and now. Thanks for any pointers you care to give.
I find that some of the stuff that passes back and forth here on the forum makes my head ache just trying to figure out what the hell people are talking about. I also find that some of the stuff intended to "Clarify" the operation and instruction of the P2 comes out more like Sanskrit than logic.
At the top of the P2 instructions spreadsheet is the following statement:
** If #S and cogex, PC += signed(S). If #S and hubex, PC += signed(S*4). If S, PC = register S.
I've been following the development of the P2 since Chip first started working on it but I read stuff like that and just shake my head in wonder.
Thanks for the help. I'll let you know if I stumble on anything else that actually works.
K.B.
Lol, don't worry, we're all guessing at times. My education is half baked for sure.
Learning old news for the first time still feels new. (That's gotta be a quote!)
Chip is very concise at times. That detail is about the branch address of conditional branching instructions. "If #S" vs "If S" are two addressing mode variations, immediate vs register direct. It's stating that register direct mode uses absolute addressing whereas immediate mode uses PC-relative addressing. And immediate is further split into two cases of cogexec vs hubexec because cog addressing is longword scale while hub addressing is byte scale.
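If it helps, the three cases in that spreadsheet note can be written out as a few lines of Python. This is only a sketch of the note as quoted, under assumptions: `next_pc` and `sign_extend` are made-up names, S is treated as the 9-bit instruction field, `pc` is whatever value the PC holds when the offset is applied (the note does not say whether that is the branch itself or the following instruction), and the register case simply masks to a 20-bit address.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def next_pc(pc, s, immediate, hubexec=False, regs=None, s_bits=9):
    """Branch target per the quoted note (illustrative model, not an emulator).

    immediate=True  models '#S': PC-relative, scaled by 4 in hubexec because
                    hub addresses count bytes while cog addresses count longs.
    immediate=False models 'S' : absolute, PC is loaded from register S.
    """
    if immediate:
        offset = sign_extend(s, s_bits)
        if hubexec:
            offset *= 4
        return pc + offset
    return regs[s] & 0xFFFFF  # assumption: keep 20 address bits


cog_target = next_pc(0x010, 0x1FE, immediate=True)                  # S = -2 longs
hub_target = next_pc(0x1000, 0x1FE, immediate=True, hubexec=True)   # S = -2 longs = -8 bytes
reg_target = next_pc(0x010, 0x020, immediate=False, regs={0x020: 0x400})
```

The same S value moves the PC back two instructions in both cogexec and hubexec; only the units differ.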
EDIT: Had a look at what it affects and narrowed the applicability.
EDIT2: Clarify more.
I feel you. Sadly, I have to admit I am a Parallax addict. I have been following this forum for a decade now, almost religiously. The P2 saga kept me coming back every morning and evening. Finally I have an Eval board in my hands and am diving into the possibilities.
My guess is that I read every post about the P2 at least twice, but I still struggle with a lot of it. I did not understand a shxx of the discussion about the random generator. Same goes for that ADC thread. But I still need to read it somehow.
The P1 was at first a challenge for me, rethinking the way to program. Multiple cores running in parallel and sharing mailboxes in HUB. But I found out it is brilliant. The P2 now takes all of this to a new level. This LUT sharing you showed is not complicated, it is simple once written. But it opens up so many possibilities. Now two cores can work as a team without needing to use the slower HUB.
As far as the documentation goes, I mostly try to learn by reading code, not manuals. My most used reference for the P1 is that short reference sheet in the Propeller Tool. I rarely need to look something up in the long description of commands. But for sure I read the whole thing tons of times.
I am just a Code Monkey. I have worked for my living as a Code Monkey for over 35 years. If I were one of those sophisticated programmers I would not need to work anymore. But I am not.
Currently the P2 is a challenge for me. I try to find, run and hoard as many examples as I can find, to get a 'feeling' for the assembler code. What I really like is that fastspin shows the PASM output, which is very helpful for me. PropBasic does this also and is a nice tool to use for the P1.
exciting times!
Mike
I used only one way flow for the emulated sinc3 filter to get maximised update rate on the filter. LUTRAM was used to mailbox the final filter stage, every eight clocks, to be decimated by a slower loop without any variation in the sinc3 filter loop time.
That's when I discovered the garbaged data when the decimation sampling was at a specific phase alignment to the filter update. And, thinking about it, probably was also the cause of the partial garbaging when decimation was not synchronised to the sinc3 filter.
EDIT: Here's a link - https://forums.parallax.com/discussion/comment/1457556/#Comment_1457556
That's the sinc3 part. Note it doesn't have a LUTSON but is writing to $3ff. The other cog does the LUTSON to be able to read that written data.
Hey Mike, I tried putting the cogstop 0 before the cognum and cognum2. The program doesn't run that way on my Eval board.
Oddly enough, when I put cogstop 0 both before AND after the cognum definitions... it works fine.
It's obvious that I don't really understand much of what Pnut is doing but I'm not going to worry about it as long as I can somehow manage to get things running.
I hate feeling like I'm doing "Poke and Hope" programming, but I'll wait for the "Real" tools to be available before I worry much about understanding the nuts and bolts of the compiler.
Code works fine that way. Thanks!
That brings the instructions I think I understand up equal to the number of impossible things I can believe before breakfast.
a single-producer/single-consumer pair needs no other locking as each cog only alters its own event counter.
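To show why no lock is needed, here is a small Python model of the event-counter idea. It is a sketch under assumptions, not P2 code: `SharedRing`, `Producer` and `Consumer` are illustrative names, the buffer is shrunk to 4 slots, and an ordinary Python object stands in for the shared LUT. Each side writes only its own counter and only reads the other one.

```python
BUFSIZE = 4           # small for the demo
MASK = 0xFFFFFFFF     # counters wrap like 32-bit longs


class SharedRing:
    """Stands in for the shared LUT: item slots plus two event counters."""

    def __init__(self):
        self.slots = [0] * BUFSIZE
        self.insert_count = 0    # written only by the producer
        self.extract_count = 0   # written only by the consumer


class Producer:
    def __init__(self, ring):
        self.ring, self.wrptr, self.count = ring, 0, 0

    def try_insert(self, item):
        # Full when the producer is BUFSIZE events ahead of the consumer.
        if (self.count - self.ring.extract_count) & MASK == BUFSIZE:
            return False
        self.ring.slots[self.wrptr] = item
        self.wrptr = (self.wrptr + 1) % BUFSIZE
        self.count = (self.count + 1) & MASK
        self.ring.insert_count = self.count   # publish only after the item is stored
        return True


class Consumer:
    def __init__(self, ring):
        self.ring, self.rdptr, self.count = ring, 0, 0

    def try_extract(self):
        # Empty when both event counters agree.
        if self.count == self.ring.insert_count:
            return None
        item = self.ring.slots[self.rdptr]
        self.rdptr = (self.rdptr + 1) % BUFSIZE
        self.count = (self.count + 1) & MASK
        self.ring.extract_count = self.count  # signal that the slot is free again
        return item


ring = SharedRing()
prod, cons = Producer(ring), Consumer(ring)
accepted = [prod.try_insert(n) for n in range(6)]   # last two are refused: buffer full
drained = [cons.try_extract() for _ in range(5)]    # last one is None: buffer empty
```

The model returns immediately on full or empty, where a cog would spin and re-read the other side's counter instead. It also assumes every counter read returns a clean value, which is exactly what the LUT sharing flaw mentioned elsewhere in this thread can violate.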
I set up a pair of cogs sharing LUT.
CON
  OSCMODE = $010c3f04
  FREQ = 160_000_000
  BAUD = 2*115200

  ' buffer in LUT from start of LUT
  BUFSIZE = 250
  ' event counter addresses in LUT
  INSERT_EVCOUNT = 250
  EXTRACT_EVCOUNT = 251

  ' DAC setup
  dither = %0000_0000_000_10100_00000000_01_00010_0   ' remember to wypin
  DACpin = 48                                         ' output DAC

OBJ
  ser: "SmartSerial.spin2"

PUB Demo
  clkset (OSCMODE, FREQ)
  ser.start (63, 62, 0, BAUD)
  ser.str (string ("LUT share prod consume"))
  ser.tx(13)
  ser.tx(10)
  coginit (%0_1_0001, @workers)   ' start pair of cogs

DAT
                ORG     0
workers
                wrlut   #0, #INSERT_EVCOUNT     ' LUT not cleared at cog start, so must init variables in it
                wrlut   #0, #EXTRACT_EVCOUNT    ' either cog might get here first
                LUTSON                          ' LUT sharing enabled
                cogid   cognum                  ' cogs have to agree which is producer and which consumer
                testb   cognum, #0      wz
        if_z    jmp     #producer

' consumer truncates items from the buffer to 16 bits and output on DAC pin
consumer
                wrpin   ##dither, #DACpin
.loop
                call    #extract
                zerox   item, #15
                wypin   item, #DACpin
                jmp     #.loop

' producer generates random samples with random delays
producer
.loop
                getrnd  item
                mov     del, item
                shr     del, #16
                waitx   del
                call    #insert
                jmp     #producer

' event-counter style producer consumer pair using buffer at start of LUT ram
insert
                sub     ins_count, #BUFSIZE
.insert_wait
                RDLUT   ext_count, #EXTRACT_EVCOUNT
                cmp     ins_count, ext_count    wz
        if_z    jmp     #.insert_wait           ' wait on buffer full (inptr == outptr+BUFSIZE)
                WRLUT   item, wrptr             ' currently restricted to start of LUT ram by using incmod
                incmod  wrptr, #BUFSIZE-1
                add     ins_count, #BUFSIZE+1
                WRLUT   ins_count, #INSERT_EVCOUNT      ' signal new item
insert_ret      ret

wrptr           long    0                       ' wrapping pointer for adding items
ins_count       long    0                       ' insert event counter

extract
.extract_wait
                RDLUT   ins_count, #INSERT_EVCOUNT
                cmp     ext_count, ins_count    wz
        if_z    jmp     #.extract_wait          ' wait on buffer empty (inptr == outptr)
                RDLUT   item, rdptr             ' currently restricted to start of LUT ram by using incmod
                incmod  rdptr, #BUFSIZE-1
                add     ext_count, #1
                WRLUT   ext_count, #EXTRACT_EVCOUNT     ' signal item taken
extract_ret     ret

rdptr           long    0                       ' wrapping pointer for removing items
ext_count       long    0                       ' extract event counter

cognum          res     1
item            res     1
del             res     1

                FIT     $1F0

Be prepared for both event counts to glitch on rare occasions because of the hardware flaw with lutRAM sharing. Here's some test code to show the issue - https://forums.parallax.com/discussion/comment/1462672/#Comment_1462672
I saw the code in your link but I have no idea if I'm running into anything similar.
I will eventually need to have two cogs sending data back and forth through the LUT. Is there any quick "Trick" you know of ( inserting delays, etc ) to ensure that I'm not running into this issue?
1. Use COGATN or LUT write hardware event types.
2. Use the "long repository" mailbox feature of a spare smartpin.
EDIT: Removed LOCKREL event from list because it is a hub-op with variable stall amount. A hubRAM mailbox could also be used with bigger stalls again.
Good, it does appear to inherently recover from the glitch. Just be wary of this if you start expanding features of your code.