synchronizing code actions between cogs

David B · 2006-02-22 00:23

Just out of curiosity...

on synchronized triggering from an external event -

If a shared IO pin is made input, all Cogs can read it, right? So can several instances of code trigger simultaneously by having several Cogs waiting on the same pin, and detecting when an external event toggles that pin?

on synchronized triggering from an internal event -

Is the system counter similar to the SX RTCC in that it is a counter that increments on every clock? Will its current (32 bit) value be accessible to each CPU, like the RTCC is in the SX? So could several Cogs simultaneously run some code at some particular counter value?

on synchronized triggering from another cog -

Can one cog read an IO pin as an input that another cog has set as output? So can one or more cogs run when triggered to go by another cog? Or is there a better way that 2 or more cogs can communicate without having to wait for the hub to come around?

How fast does the Hub circulate? How long does it leave each Cog enabled? In other words, how much information could a single cog pass thru the hub to RAM on one pass? And how long might a cog have to wait for its turn to access the ram through the hub?

If a cog is disabled, or sleeping or whatever, will the hub skip it and service the other cogs faster?

David

Tracy Allen · 2006-02-22 07:44

I'll take a chance and tackle your questions. (I can stand to be corrected and learn).

The answers to your questions that require a yes/no answer are IMHO, all "yes".

WRT cogs reading pins and triggering simultaneously on a pin, yes. A Spin interpreter instruction you might use to accomplish that would be WAITPEQ (wait for pin equal) or WAITPNE (...not equal). These commands take a mask, which selects the pins to be watched, and the state you are looking for on those pins. The cog then effectively shuts down (low power) until the target state is entered (or departed in the case of PNE). At that point in time, the code starts operating immediately, that is to say, within one clock cycle. While it takes 4 clock cycles to execute each instruction (with a few exceptions), the response of WAITPEQ to the target state does not have to wait for the downbeat of a 4/4 measure. There is no downbeat as such. The sync would be one clock cycle. The equivalent asm instruction is WAITPEQ D,S. There is an efficient mapping from the interpreter to the machine language.

The system counter is a register that can be read by any cog at any time, and it increments on every clock cycle. The salient command to sync all processors would be WAITCNT (value). That would cause them to wait until the system clock register = value. If you want the processor to pause for a certain amount of time, then use WAITCNT (value+cnt). That gives a relative offset from the current reading. "cnt" refers to the current value of the system clock register, which is available at any time to all cogs.

In my first program attempt, I made a little routine to flash an led, which used a WAITCNT delay, then used the same led as a photodiode to charge a capacitor in a kind of RCTIME arrangement to measure light level. The steps in that were: read the system clock (the cnt register again) to get the current value into a variable, then WAITPEQ with a mask to wait for the desired pin to go high, and then read the counter again and subtract the initial count (modulo 2^32) to get an RCTIME value. Then off to another routine that used WAITCNT (bitdelay + cnt) in a loop to send out the resulting time as serial digits. That is just to give you an idea of how these commands work. That kind of synchronization to events could be a powerful element in this processor and I think the synchronization schemes between cogs could be the same.
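The "subtract the initial count (modulo 2^32)" step can be sketched as follows. This is Python used purely as a model of the arithmetic, not Propeller code; the function names `elapsed_ticks` and `wait_target` are hypothetical, and the only fact taken from the thread is that the counter is 32 bits wide and wraps around.

```python
MASK32 = 0xFFFFFFFF  # the system counter is 32 bits wide


def elapsed_ticks(start: int, end: int) -> int:
    """Ticks from start to end, still correct if the counter wrapped once."""
    return (end - start) & MASK32


def wait_target(cnt: int, delay: int) -> int:
    """Model of the relative form WAITCNT(delay + cnt): the sum wraps too."""
    return (cnt + delay) & MASK32


# The counter wrapped between the two readings; the masked
# subtraction still yields the true interval.
print(elapsed_ticks(0xFFFFFFF0, 0x00000010))
```

Masking after the subtraction is what makes the result independent of where in the 32-bit range the measurement started, provided the interval itself is shorter than 2^32 ticks.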

Cogs can monitor one another too, and can I think serve a watchdog function to keep other cogs running, via io pins. I think that answers your question about one cog communicating with another via the io pins. Yes, they can. But they have to follow the rules about how the pins resolve contentions.

The hub circulates at half the rate of the system clock and makes a complete circuit of all the cogs in 16 clock cycles. Always. Completely deterministic. Even if a cog is asleep or empty it still gets a visit by the hub. A cog has to be ready and waiting with a hub instruction when the hub arrives. If not, it has to wait up to 15 more clock cycles before its chance comes again. At each hub access, the cog can execute only one hub instruction, to read or write the main memory. You can see that those operations will be quite slow in comparison to intracog operations. It is possible for the cog to become synchronized to the hub for extended operations, at which point transfer of data between a cog and main memory is limited to one long (32 bits) per 16 clock cycles. That is throughput from the standpoint of the cog. The instructions from different cogs accessing the main memory can overlap (even though each hub instruction takes 7 clock cycles -- it is kind of complicated). So from the standpoint of the hub or the main memory, the throughput could reach one long per 2 clock cycles when all cogs are accessing at full tilt. There are mechanisms for keeping the memory accesses from colliding, but I think there are lots of possibilities for bugs there if the programmer is not careful.
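The round-robin timing described above can be modelled with a few lines of arithmetic. This is a rough Python sketch, not a cycle-accurate simulation: it assumes 8 cogs, one window per cog every 16 clocks, and (hypothetically) that cog n's window opens when the clock count modulo 16 equals 2*n. The 80 MHz figure is only an example clock, not something stated in the thread.

```python
COGS = 8
ROTATION = 16                         # clocks for one full hub circuit
CLOCKS_PER_WINDOW = ROTATION // COGS  # hub moves on every 2 clocks


def clocks_until_window(clock: int, cog: int) -> int:
    """Clocks a ready cog waits before its next hub window opens."""
    return (cog * CLOCKS_PER_WINDOW - clock) % ROTATION


def longs_per_second(clkfreq: int) -> int:
    """Best-case transfer rate for one cog: one long per rotation."""
    return clkfreq // ROTATION


worst_wait = max(clocks_until_window(t, 0) for t in range(ROTATION))
print(worst_wait)                            # worst case for a cog that just missed
print(longs_per_second(80_000_000))          # one cog at an assumed 80 MHz
print(COGS * longs_per_second(80_000_000))   # all eight cogs together
```

With all eight cogs transferring, the aggregate works out to one long every 2 clocks, which matches the figure given from the hub's point of view.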

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Post Edited (Tracy Allen) : 2/22/2006 8:37:05 AM GMT

Stan671 · 2006-04-18 21:35

I am asking these very basic timing questions because I have not worked with a multi-processor system like this before and I have to get my mind around how the timing and sequencing works for the shared main memory.

So, it sounds like a cog can read a byte, word or long from main memory _OR_ write a byte, word or long to main memory when it gets its chance with the hub. But it cannot do both in one turn at the hub. Is this correct?

Many of the fetches a cog does from main memory will just be the Spin interpreter reading a token and its parameters anyway.

For a Spin command like "X = X + 1", this would be followed by a main memory read in order to get the variable data. Then there will be a period of work inside the cog as the Spin interpreter does its arithmetic on the data. Then this would be followed by a write back out to main memory of the variable data. And each time the cog was ready to interface with main memory, it would wait patiently for its turn with the hub and then do the read _OR_ write operation with main memory. Is this (perhaps oversimplified) correct?

I was trying to think up a scenario when it would be a problem with more than one cog reading and writing the same memory location, but I kept coming up with silly sounding examples. The point is that the programmer must be aware that a memory location can be changed by another cog in between the time that one cog reads the location, works on the data, and is ready to write it back. This can be either a very clever and powerful feature if used carefully or a debugging nightmare if abused or done accidentally.
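One non-silly scenario is two cogs each running "X = X + 1" on the same variable. The sketch below replays the read, add and write steps in a chosen order. It is a Python model only; the `run` function, the cog names "A" and "B", and the schedules are hypothetical illustrations, not Propeller code.

```python
def run(schedule):
    """Replay (cog, step) pairs against one shared variable; return its final value."""
    shared = {"x": 0}  # stands in for one long in main memory
    local = {}         # each cog's private copy, like a cog register
    for cog, step in schedule:
        if step == "read":
            local[cog] = shared["x"]   # hub read
        elif step == "add":
            local[cog] += 1            # work done inside the cog
        elif step == "write":
            shared["x"] = local[cog]   # hub write
    return shared["x"]


# Cog A finishes before cog B starts: both increments land.
sequential = [("A", "read"), ("A", "add"), ("A", "write"),
              ("B", "read"), ("B", "add"), ("B", "write")]

# Both cogs read before either writes: one increment is lost.
interleaved = [("A", "read"), ("B", "read"), ("A", "add"),
               ("B", "add"), ("A", "write"), ("B", "write")]

print(run(sequential), run(interleaved))
```

In the interleaved order both cogs start from the same stale value, so the second write simply overwrites the first.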

My next question relates to re-entrant code and the difference between public and private variables.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Stan Dobrowski

Paul Baker · 2006-04-18 21:55

You are correct, there are no RMW (read-modify-write) hub operations on hub memory. In nearly all data passing routines you will have at most one producer and one consumer; in such situations there would never be a collision. You can however run into problems with dynamic multi-element data structures (such as a stack) when you have multiple producers or consumers. The problem arises because updating the pointer takes multiple hub accesses (read it, update it, write it back). If two cogs are trying to pop or push during the same rotation a collision may occur. The Propeller provides 8 locks (aka semaphores) which are RMW in nature to permit such operations. A lock is a binary value indicating a status of locked or unlocked. A cog about to use a data structure on which it could collide with another cog sets the lock, and the lock's previous value is returned. If the returned value is locked, the cog does nothing (another cog is using the structure); if the returned value is unlocked, it proceeds to use the data structure and unlocks the semaphore when it's done.
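The set-and-return-previous-value behaviour described above can be modelled like this. It is a minimal Python sketch under stated assumptions: `LockBank` and `try_push` are hypothetical names, the method names merely echo the Propeller's LOCKSET and LOCKCLR instructions, and the atomicity that the real hub provides is taken for granted here because the model is single-threaded.

```python
class LockBank:
    """Model of the 8 hub locks; each is a single locked/unlocked bit."""

    def __init__(self, count=8):
        self.state = [False] * count  # False = unlocked

    def lockset(self, lock_id):
        """Set the lock and return its PREVIOUS state in one step."""
        previous = self.state[lock_id]
        self.state[lock_id] = True
        return previous

    def lockclr(self, lock_id):
        """Release the lock."""
        self.state[lock_id] = False


def try_push(locks, lock_id, stack, item):
    """Push only if the lock was free; return whether the push happened."""
    if locks.lockset(lock_id):
        return False          # was already locked: another cog owns the stack
    try:
        stack.append(item)    # the multi-step update is now protected
    finally:
        locks.lockclr(lock_id)
    return True


locks, stack = LockBank(), []
locks.lockset(0)                          # pretend another cog holds lock 0
blocked = try_push(locks, 0, stack, 1)    # fails, stack untouched
locks.lockclr(0)                          # the other cog releases it
allowed = try_push(locks, 0, stack, 1)    # succeeds
print(blocked, allowed, stack)
```

A cog that finds the lock taken would normally retry later rather than give up; the single attempt here just keeps the example short.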

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
1+1=10

Post Edited (Paul Baker) : 4/19/2006 2:11:52 AM GMT

Stan671 · 2006-04-19 01:24

Paul, thank you for the info. You described what I wanted to say much better than I was able to.

I see your point about one producer and one consumer not being a problem. That is clear to me.

The problem I was worried about was multiple producers and one consumer, such as a buffer for messages going out to an LCD display. Now I see how the semaphores come into play.

The more I study the Propeller, the more I am impressed!

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Stan Dobrowski
