cog 2 cog communication
kbash
in Propeller 2
Does anyone have an example of passing data back and forth between two cogs using the lookup ram? I went through the docs, they explain what CAN be done but are somewhat sparse in explanation about HOW to do it. Some example code would be helpful.
Comments
that all writes are duplicated to both (if enabled by the receiver cog). The intention is to use events to signal
such writes, or locks for protecting data structures in the RAM. Well, that's my take. I guess a shared circular
buffer would be a good test case.
This feature permits pairs of adjacent cogs to share or pass information, i.e. to work in cooperation.
The P2 is new. Expecting example code at this time is unreasonable. That is why there are Engineering Samples out there so that experienced users can start to write test code and tools to test and prepare the P2 for launch.
1. LUTSON must be issued in both cogs if you want bidirectional flow. As Cluso indicated, a LUTSON instruction enables writes to that cog's LUTRAM only. Writing back to the other cog also requires that other cog to issue its own LUTSON.
2. There is a known flaw in the FPGA design as it stands, where reading with one cog and writing the same location with the other cog at the same time will corrupt the data. I should test this out in the ES silicon now that I have it, thanks for the reminder, ...
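Point 1 can be illustrated with a small behavioural model. This is a Python sketch, not P2 code, and it only models the rule as described in this thread: a cog's LUT write is copied into its partner's LUT when the partner (the receiver) has enabled sharing. The names `Cog`, `enable_sharing`, `wrlut` and `pair` are hypothetical and exist only for this illustration. The hardware flaw from point 2 is not modelled.

```python
# Toy model of paired-cog LUT sharing as described in this thread.
# NOT PASM2: Cog, enable_sharing, wrlut and pair are hypothetical names.

LUT_LONGS = 512  # each cog's LUT RAM holds 512 longs


class Cog:
    def __init__(self, cog_id):
        self.cog_id = cog_id
        self.lut = [0] * LUT_LONGS
        self.sharing = False  # stands in for the receiver issuing LUTSON
        self.partner = None

    def enable_sharing(self):
        """Allow the partner cog's LUT writes to be copied into this LUT."""
        self.sharing = True

    def wrlut(self, value, addr):
        """Write to the local LUT; mirror to the partner only if it opted in."""
        value &= 0xFFFFFFFF
        self.lut[addr] = value
        if self.partner is not None and self.partner.sharing:
            self.partner.lut[addr] = value


def pair(even, odd):
    """Link two adjacent cogs as LUT-sharing partners."""
    even.partner, odd.partner = odd, even


cog2, cog3 = Cog(2), Cog(3)
pair(cog2, cog3)

# One-way flow: only cog 2 has enabled sharing.
cog2.enable_sharing()
cog3.wrlut(7, 0)    # copied into cog 2's LUT
cog2.wrlut(99, 1)   # stays local, cog 3 has not opted in
one_way = (cog2.lut[0], cog3.lut[1])

# Bidirectional flow: cog 3 enables sharing as well.
cog3.enable_sharing()
cog2.wrlut(99, 1)
two_way = (cog2.lut[0], cog3.lut[1])

print(one_way, two_way)
```

The first pair shows that cog 3 never sees cog 2's write until cog 3 has enabled sharing itself, which is the reason both cogs need the instruction for two-way traffic.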
I managed to get my code working last night by passing data through hub RAM. One cog is waiting for movement data to be passed to it by another cog before it makes a 6 axis move. I don't have a machine hooked up to my P2 yet but the diagnostics (LEDs) I planted in the code seem to verify that it might be running.
Cluso99, it is unreasonable to EXPECT that sample code will be available, but I hope it's not unreasonable to ask and HOPE that someone has come across a particular issue and figured it out. Many of you who have been working with the FPGAs are already over some of the simple humps that those of us who waited for silicon are now facing. I hope you won't be offended if I ask similar questions in the future.
I spent several hours today figuring out the LUT stuff. Attached is an example of one cog setting values in the LUT of another cog. In this example, Cog 0 starts and initializes a blink program into Cog 2.
This blink program adds the value at LUT 0 to the base (57) LED address.
The Cog 3 program slowly indexes the value from 7 to 0 in LUT 0. It also blinks LED 56 mainly to show Cog 3 is working.
If anyone thinks this example is worth attaching to the example code in google docs feel free, or let me know and I'll try to figure out how to do it myself.
evanh, the P2 docs said that I needed to use SETLUTS #1. That seems to work fine. What is LUTSON for? This is the first time I've seen that one.
You are running into your data
Mike
Thanks Mike. I can use all the help I can get. My programming is a lot like setting a monkey in front of a keyboard typing random things then seeing if anything works. I eventually managed to get the P1 to do what I needed in assembly combined with Spin. I suspect I will with the P2 as well.
I've worked with assembly "All my life". I learned to program concatenating bit codes into hex on an SDK-86 and learned Forth on a Rockwell AIM-65. The P2 is a lot like combining both of those together but I've lost a LOT of brain cells between that time and now. Thanks for any pointers you care to give.
I find that some of the stuff that passes back and forth here on the forum makes my head ache just trying to figure out what the hell people are talking about. I also find that some of the stuff intended to "Clarify" the operation and instructions of the P2 comes out more like Sanskrit than logic.
At the top of the P2 instructions spreadsheet is the following statement:
** If #S and cogex, PC += signed(S). If #S and hubex, PC += signed(S*4). If S, PC = register S.
I've been following the development of the P2 since Chip first started working on it but I read stuff like that and just shake my head in wonder.
Thanks for the help. I'll let you know if I stumble on anything else that actually works.
K.B.
Lol, don't worry, we're all guessing at times. My education is half baked for sure.
Learning old news for the first time still feels new. (That's gotta be a quote!)
Chip is very concise at times. That detail is about branching address of conditional branching instructions. If #S vs If S are two addressing mode variations, immediate vs register direct. It's stating that register direct mode uses absolute addressing whereas immediate mode uses PC-relative addressing. And immediate is further split into two cases of cogexec vs hubexec because cog addressing is longword scale while hub addressing is byte scale.
EDIT: Had a look at what it affects and narrowed the applicability.
EDIT2: Clarify more.
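The rule quoted from the spreadsheet may be easier to follow as a small model. The Python sketch below is only an illustration of the three cases as explained above, not a description of the silicon. The function names, the `regs` dictionary and the 9-bit width assumed for the immediate S field are assumptions made for this example, and `pc` simply stands for whatever base PC value the hardware applies the offset to.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def branch_target(pc, s, immediate, hubexec, regs=None, s_bits=9):
    """Model of the quoted rule (hypothetical helper, s_bits is an assumption).

    #S and cogex: PC += signed(S)      (cog addresses count longs)
    #S and hubex: PC += signed(S * 4)  (hub addresses count bytes)
    S (register): PC = contents of register S (absolute)
    """
    if not immediate:
        return regs[s]
    offset = sign_extend(s, s_bits)
    return pc + (offset * 4 if hubexec else offset)


# Immediate, cog exec: step back two longs.
cog_target = branch_target(0x010, 0x1FE, immediate=True, hubexec=False)

# Immediate, hub exec: the same field moves two instructions, i.e. 8 bytes.
hub_target = branch_target(0x1000, 0x1FE, immediate=True, hubexec=True)

# Register direct: the register holds the absolute destination.
reg_target = branch_target(0x010, 0x1F0, immediate=False, hubexec=False,
                           regs={0x1F0: 0x400})

print(hex(cog_target), hex(hub_target), hex(reg_target))
```

The same immediate value moves the PC by the same number of instructions in both execution modes; only the address units differ.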
I can feel you. Sadly I have to admit I am a Parallax addict. I have been following this forum for a decade now, almost religiously. The P2 saga kept me coming back every morning and evening. Finally I have an EVAL board in my hands and am diving into the possibilities.
My guess is that I read every post about the P2 at least twice, but I still struggle with a lot of it. I did not understand a shxx of the discussion about the random generator. Same goes for that ADC thread. But I still need to read it somehow.
The P1 was at first a challenge for me, rethinking the way to program. Multiple cores running in parallel and sharing mailboxes in HUB. But I found out it is brilliant. The P2 now takes all of this to a new level. This LUT sharing you showed is not complicated, it is simple once written. But it opens up so many possibilities. Now two cores can work as a team without the need to use the slower HUB.
As far as the documentation goes, I mostly try to learn by reading code, not manuals. My most used reference for the P1 is that short reference sheet in the Propeller Tool. I rarely need to look something up in the long description of commands. But for sure I read the whole thing tons of times.
I am just a Code Monkey. I have worked for my living as a Code Monkey for over 35 years. If I were one of those sophisticated programmers I would not need to work anymore. But I am not.
Currently the P2 is a challenge for me. I try to find, run and hoard as many examples as I can, to get a 'feeling' for the assembler code. What I really like is that fastspin shows the PASM output, which is very helpful for me. PropBasic does this also and is a nice tool to use for the P1.
Exciting times!
Mike
I used only one way flow for the emulated sinc3 filter to get maximised update rate on the filter. LUTRAM was used to mailbox the final filter stage, every eight clocks, to be decimated by a slower loop without any variation in the sinc3 filter loop time.
That's when I discovered the garbaged data when the decimation sampling was at a specific phase alignment to the filter update. And, thinking about it, that was probably also the cause of the partial garbaging when decimation was not synchronised to the sinc3 filter.
EDIT: Here's a link - https://forums.parallax.com/discussion/comment/1457556/#Comment_1457556
That's the sinc3 part. Note it doesn't have a LUTSON but is writing to $3ff. The other cog does the LUTSON to be able to read that written data.
Hey Mike, I tried putting the cogstop 0 before the cognum and cognum2. The program doesn't run that way on my Eval board.
Oddly enough, when I put cogstop 0 both before AND after the cognum definitions... it works fine.
It's obvious that I don't really understand much of what Pnut is doing but I'm not going to worry about it as long as I can somehow manage to get things running.
I hate feeling like I'm doing "Poke and Hope" programming, but I'll wait for the "Real" tools to be available before I worry much about understanding the nuts and bolts of the compiler.
Code works fine that way. Thanks!
That brings the instructions I think I understand up equal to the number of impossible things I can believe before breakfast.
a single-producer/single-consumer pair needs no other locking as each cog only alters its own event counter.
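The idea that each side only alters its own counter can be sketched as a ring buffer with two running counts. This is a Python illustration under stated assumptions, not code from this thread: `SharedRing`, `try_put`, `try_get` and the ring size of 8 are made up for the example, and the list stands in for a block of shared LUT longs. On real hardware the counters would be fixed-width longs that wrap, and the sketch does not model the simultaneous read/write issue with shared LUT RAM.

```python
RING_SIZE = 8  # hypothetical size; a power of two keeps the index math simple


class SharedRing:
    """Single-producer/single-consumer ring with one counter per side."""

    def __init__(self):
        self.slots = [0] * RING_SIZE
        self.produced = 0  # written only by the producer
        self.consumed = 0  # written only by the consumer

    def try_put(self, value):
        """Producer side: reads both counters, writes only `produced`."""
        if self.produced - self.consumed == RING_SIZE:
            return False  # ring is full
        self.slots[self.produced % RING_SIZE] = value
        self.produced += 1  # publish only after the data is in place
        return True

    def try_get(self):
        """Consumer side: reads both counters, writes only `consumed`."""
        if self.produced == self.consumed:
            return None  # ring is empty
        value = self.slots[self.consumed % RING_SIZE]
        self.consumed += 1  # release the slot only after reading it
        return value


ring = SharedRing()
accepted = [ring.try_put(i) for i in range(10)]  # the last two do not fit
first = [ring.try_get() for _ in range(3)]
late = [ring.try_put(100), ring.try_put(101)]    # freed slots get reused

rest = []
while (item := ring.try_get()) is not None:
    rest.append(item)

print(accepted.count(True), first, late, rest)
```

Because neither side ever writes the other's counter, there is no read-modify-write race between them, which is why no lock is needed for this one-producer, one-consumer case.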
I set up a pair of cogs sharing LUT.
Be prepared for both event counts to glitch on rare occasions because of the hardware flaw with lutRAM sharing. Here's some test code to show the issue - https://forums.parallax.com/discussion/comment/1462672/#Comment_1462672
I saw the code in your link but I have no idea if I'm running into anything similar.
I will eventually need to have two cogs sending data back and forth through the LUT. Is there any quick "Trick" you know of (inserting delays, etc.) to ensure that I'm not running into this issue?
1. Use COGATN or LUT write hardware event types.
2. Use the "long repository" mailbox feature of a spare smartpin.
EDIT: Removed LOCKREL event from list because it is a hub-op with variable stall amount. A hubRAM mailbox could also be used with bigger stalls again.
Good, it does appear to inherently recover from the glitch. Just be wary of this if you start expanding features of your code.