COG internals

Buddha · 2006-06-27 12:55

The documentation states that MIPS = Freq in MHz / 4 * Number of Active Cogs.
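
As a quick worked instance of that formula (a sketch only: the function name is made up for illustration, and 80 MHz with 1 or 8 cogs are just example inputs):

```python
def propeller_mips(freq_mhz: float, active_cogs: int) -> float:
    """MIPS = Freq in MHz / 4 * number of active cogs, as quoted above."""
    return freq_mhz / 4 * active_cogs


print(propeller_mips(80, 1))  # a single cog at 80 MHz
print(propeller_mips(80, 8))  # eight cogs at 80 MHz
```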

This seems to indicate that either the Cogs are clocked at 1/4 Freq internally, or that they have a 4-stage fetch-execute cycle. My best guess so far is something along the lines of:

Fetch (instruction)
Load (data)
Execute (instruction)
Store (data)

Is this how a Cog processes an instruction every 4 clocks, or is that way off base?

Thanks! :)

cgracey · 2006-06-27 16:51

It goes like this:

0: Read S for Inst
1: Read D for Inst
2: Read Inst+1
3: Write D for Inst, Inst++, loop

This way, the ALU has two clocks to settle in before D must be written back.
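
The four steps above can be laid out as a clock-by-clock timeline. This is only a simplified model for illustration: it assumes every instruction is an ordinary 4-clock one (no hub instructions or waits), and the names `PHASES` and `cog_timeline` are invented here, not anything from the chip documentation.

```python
# Phase text follows the four steps listed in the post above.
PHASES = (
    "read S for Inst",
    "read D for Inst",
    "read Inst+1",
    "write D for Inst",
)


def cog_timeline(n_instructions: int):
    """Yield (clock, instruction index, phase text) for back-to-back 4-clock instructions."""
    for inst in range(n_instructions):
        for phase, text in enumerate(PHASES):
            yield 4 * inst + phase, inst, text


for clock, inst, text in cog_timeline(2):
    print(f"clock {clock}: inst {inst}: {text}")
```

In this model S and D have both been read by the end of clock 1 and the result is not written until clock 3, which is the settling time mentioned above.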

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

Buddha · 2006-06-27 22:24

Interesting! That definitely makes sense from a hardware design point of view.

What happens starting at cycle zero, when a COG is first started? Does it idle for a few cycles so it can execute step 2 and come back around to 0 (i.e. the very first instruction on startup takes 8 cycles)?

Or does it execute a special "read first instruction" step for one cycle (i.e. the first instruction takes 5 cycles)?

Or does something else happen, like the HUB stuffs the first instruction into the internal CPU register before it starts execution?

cgracey · 2006-06-28 02:35

When a COG is launched, its hardware forces the instruction sequence:

RDLONG 0, START+0*4
RDLONG 1, START+1*4
RDLONG 2, START+2*4
....
RDLONG 511, START+511*4
YOUR CODE $000
YOUR CODE $001
....

So, you can see that after loading all 512 locations using forced RDLONG instructions, the COG program counter rolls over, right into your code at $000. BTW, the last 16 RDLONGs read 0's, clearing the RAM behind the I/O registers. The DIRA, CTRA, CTRB, and VCFG I/O registers are all cleared asynchronously as long as the COG is inactive (COG reset is held low). This way, as long as (or immediately when) a COG is inactive, its influence completely ceases, since its important I/O registers are cleared. When the next program gets loaded, all the registers from $1F0-$1FF get cleared to 0 as well. This means that you can rely on all I/O registers, except the read-only ones (PAR, CNT, INA, INB), being cleared when your program starts.
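
The load sequence described above can be modelled in a few lines. This is a sketch under stated assumptions, not the hardware's actual logic: hub RAM is represented as a `bytes` object, longs are taken to be little-endian, and `load_cog`, `COG_LONGS`, and `IO_REGS` are names invented for the example.

```python
COG_LONGS = 512  # cog RAM locations $000-$1FF
IO_REGS = 16     # $1F0-$1FF; the last 16 forced reads return 0


def load_cog(hub: bytes, start: int) -> list[int]:
    """Model the forced RDLONG i, START+i*4 sequence for i = 0..511."""
    cog = []
    for i in range(COG_LONGS):
        if i >= COG_LONGS - IO_REGS:
            # The last 16 RDLONGs read 0's, clearing the RAM behind the I/O registers.
            cog.append(0)
        else:
            addr = start + i * 4
            # Assumption: hub longs are stored little-endian.
            cog.append(int.from_bytes(hub[addr:addr + 4], "little"))
    return cog


hub = bytes(range(256)) * 8  # 2048 bytes of dummy hub data
cog = load_cog(hub, 0)
print(hex(cog[0]), hex(cog[1]), cog[0x1F0], cog[0x1FF])
```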

Post Edited (Chip Gracey (Parallax)) : 6/28/2006 2:38:56 AM GMT

Cliff L. Biffle · 2006-06-28 05:02

Chip,

So, does this mean that a read from INA will be latched at the second cycle of the instruction? That's good info.

What's the latency in cycles for COGINIT (from start of COGINIT to start of first instruction in the other COG)?

More generally, is it possible to get two COGs running 2 (or 1 or 3) cycles out of sync, so they could interleave reads from INA?

cgracey · 2006-06-28 06:35

Cliff L. Biffle said...
Chip,

So, does this mean that a read from INA will be latched at the second cycle of the instruction? That's good info.
You could look at it like that, yes!

What's the latency in cycles for COGINIT (from start of COGINIT to start of first instruction in the other COG)?
The first COG has to sync to the hub to do his COGINIT. This takes 7..22 cycles, depending on his hub-phase. After that, the other COG will take 512 hub cycles to load his 512 longs (496, really) before he starts executing. This all takes ~103us at 80MHz.
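
The ~103us figure can be checked with a little arithmetic. The sketch below assumes one hub cycle is 16 system clocks, which is not stated in this thread; the function and constant names are made up for illustration.

```python
CLOCKS_PER_HUB_CYCLE = 16  # assumption: each cog gets a hub window every 16 clocks
HUB_CYCLES_TO_LOAD = 512   # from the post above


def coginit_latency_us(sync_clocks: int, freq_hz: float = 80e6) -> float:
    """Latency from the start of COGINIT to the first instruction in the new cog."""
    if not 7 <= sync_clocks <= 22:
        raise ValueError("hub sync takes 7..22 clocks")
    total_clocks = sync_clocks + HUB_CYCLES_TO_LOAD * CLOCKS_PER_HUB_CYCLE
    return total_clocks / freq_hz * 1e6


print(coginit_latency_us(7))   # best-case hub phase
print(coginit_latency_us(22))  # worst-case hub phase
```

Under that assumption both ends of the range land between 102 and 103 microseconds at 80 MHz, consistent with the figure quoted above.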

More generally, is it possible to get two COGs running 2 (or 1 or 3) cycles out of sync, so they could interleave reads from INA?

Here's how you do this:

Before launching any COGs that will sync together, you pick a CNT value that is on the near horizon, but far enough away that all COGs will be launched and executing. 20,000 + CNT is a generous amount of time for this. After determining that value, you either write it into the program to be launched, and then launch the COGs, or launch the COGs with their PAR parameters pointing to a long containing this value. Once the COGs start up, the first thing they do is a WAITCNT on this pre-computed value. That will put them all into perfect sync. To get them 2 clocks out of sync, you could give them slightly different values to do the WAITCNT on. They will remain in sync (or relative sync in the case of the 2-clock offset for INA reading) until you perform a hub instruction which will sync them to the hub, instead. Of course, this assumes that their execution flow is identical (same programs with same branch patterns).
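
The target values for that scheme could be computed along these lines. This is a host-side sketch in Python rather than Propeller code, with invented names (`waitcnt_targets`, `offset_clocks`, `lead`); it only shows the arithmetic, assuming CNT is a 32-bit counter that wraps.

```python
CNT_MASK = 0xFFFFFFFF  # assumption: CNT is 32 bits wide and wraps around


def waitcnt_targets(cnt_now: int, n_cogs: int,
                    offset_clocks: int = 0, lead: int = 20_000) -> list[int]:
    """Return one WAITCNT target per cog, each offset_clocks after the previous one."""
    base = cnt_now + lead  # a point on the near horizon, as described above
    return [(base + i * offset_clocks) & CNT_MASK for i in range(n_cogs)]


print(waitcnt_targets(1_000_000, 2))                   # [1020000, 1020000]
print(waitcnt_targets(1_000_000, 2, offset_clocks=2))  # [1020000, 1020002]
```

Each cog would then receive its own target, either patched into its program or through a long pointed to by PAR, and wait on it before doing anything else.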

Cliff L. Biffle · 2006-06-28 16:01

Chip,

Of course! I'm spending so much time on single-cycle-instruction machines lately that I missed this entirely.

How silly of me. Thanks!

Buddha · 2006-07-06 05:27

What's the actual start address of the instructions that are loaded into the first cog on reset? The manual says that the boot loader starts at $F002, but that doesn't seem like it could be the code start, since RDLONG ignores the two lowest address bits, making it impossible to read an unaligned long. So, does the code start at $F004, $FDFF, or some other location?
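
To make the alignment point concrete, here is a sketch that assumes, as the question does, that RDLONG simply ignores the two lowest address bits. The function name is invented for the example.

```python
def rdlong_effective_address(addr: int) -> int:
    """Clear the two lowest address bits, giving the long-aligned address."""
    return addr & ~0b11


for a in (0xF002, 0xF004, 0xFDFF):
    print(f"${a:04X} -> ${rdlong_effective_address(a):04X}")
```

Under that assumption a read at $F002 would actually fetch the long at $F000, so $F002 cannot be the first long of a program.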

cgracey · 2006-07-06 06:37

Buddha,

The booter is at $F800 and the interpreter is at $F004. You will not be able to disassemble these programs, though, since the data is scrambled and only gets unscrambled by the HUB during launching. This is the only 'code protection' that the chip has and it's designed to slow down others from making me-too Propeller-like chip products. This issue has never come up before, but I figured your next post would be something about "How come this code looks jumbled?" I would like to share the booter and interpreter with our interested customers, but I don't want to let it out because I don't want to make it too simple for a competitor to get started. If they have to write their own interpreter in 496 longs that is compatible with ours, it may never happen, but if we publish the code, then they'll make an innocuous change here and there and claim it as their own. I don't want that to happen.

Post Edited (Chip Gracey (Parallax)) : 7/6/2006 6:46:07 AM GMT

Buddha · 2006-07-06 16:38

Chip Gracey (Parallax) said...
The booter is at $F800 and the interpreter is at $F004. You will not be able to disassemble these programs, though, since the data is scrambled and only gets unscrambled by the HUB during launching. This is the only 'code protection' that the chip has and it's designed to slow down others from making me-too Propeller-like chip products.

That's completely understandable. Thanks for the info! That definitely saved me some time from banging my head against a glass wall that I wouldn't even know was there. :)
