Synchronizing multiple cogs?

Bill Henning · 2006-09-26 07:53

Hi,

I've been reading the Propeller manual, and it has been giving me a LOT of interesting ideas; however, I find that a few things are not clear... and there are a few things I'd like to see in the future.

1) Are all the cogs in lock-step?

i.e., most instructions take four clocks to execute (a few take eight), and it takes 7-22 clocks to synchronize to the hub. Is it safe to assume that the four-cycle "instruction windows" are in sync across cogs, so that there cannot be a 1-3 cycle offset in instruction timing between cogs?

2) If I do a READ/WRITE to global memory, and I am careful to schedule instructions in 16-clock "bundles", can I count on the next global window being in sync?

3) A "pulse" instruction that could be used as a single-instruction strobe signal would be great... in both H->L and L->H flavors :)

4) Exposing a "STROBE" output that was asserted whenever a value was written to the output port would also be great.

5) A counter mode where the counter incremented every time it was read would be great; it would allow it to be used as an auto-increment pointer.

6) I can't see any way of byte-addressing a cog's local memory!

7) MORE RAM PLEASE! ... For compatibility, leave the 32k RAM / 32k ROM where they are, but please add at least another 64k of global RAM (128k would be even better) and 32k-64k of flash (especially if it were read-protectable; I can't see how to ship an application where the code cannot be read out!)

I must say, I am very, very close to ordering a development board and a few chips... I've been having a lot of fun playing with the Propeller on paper, seeing what I could make it do.

Best,

Bill

Post Edited (Bill Henning) : 9/26/2006 8:08:36 AM GMT

simonl · 2006-09-26 09:25

Hi Bill,

Not sure if this is what you're looking for: http://forums.parallax.com/forums/default.aspx?f=25&m=136129

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Cheers,

Simon

Mike Green · 2006-09-26 13:58

1) Because the WAITxxx instructions have a granularity of one clock cycle, it is possible (easy) for cogs to be at different phases of the 4-clock instruction cycle.
2) It depends on what else you're doing (see #1)
3 & 4) In assembly language, you have a granularity of 50ns per instruction and you can write to all output bits in one instruction so you can have a strobe that's synchronous with the data write. The minimum strobe width is 50ns.
5) Maybe. Pointers in assembly are awkward at very high speeds. It really takes several instructions to manage them, particularly storing results in cog memory (you have to increment the instruction itself since there's no indirection). Still, with the raw speed, you can do a lot very fast.
6) Yep! With the barrel shifter, it's very easy to break apart 32 bit words. You can put out a byte stream with a 50ns strobe at a rate of one byte every 500ns packing 4 bytes per long word.
7) Yeah. More RAM would be nice and, I believe, is intended for the next generation of Propeller chips, but you can do a lot with what's there and you can easily attach more. There's serial EEPROM up to several MB and serial FRAM up to 32K bytes per chip. You can directly access SD/MMC cards if you want. With a little external circuitry, you can use external parallel static RAM, although it eats up I/O pins.
There has been extensive discussion here with Parallax on the issue of protecting intellectual property with the Propeller. Their feeling is that you can't prevent determined people from stealing your object code, that there are lots of ways to make it hard enough with the existing system that casual thieves won't do it successfully, and the cost of trying to do it better at present would unreasonably limit the capabilities of the Propeller chip.
It's one thing to fiddle with a paper description. The Demo Board is cheap enough that you can see what the thing can actually do and really judge the development environment.
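Mike's point 6 (pulling bytes out of 32-bit longs with the barrel shifter) can be modeled in a few lines. This is a rough sketch in Python rather than Propeller assembly; it assumes little-endian packing (byte 0 in bits 7..0), and the helper names pack_bytes and unpack_bytes are hypothetical. Each shift by a multiple of 8 followed by an AND with $FF stands in for one shift-and-mask step a cog would perform.

```python
def unpack_bytes(longs):
    """Yield the bytes packed in 32-bit longs, lowest byte first.

    Models a cog shifting each long right by 0, 8, 16 and 24 bits and
    masking with $FF to isolate one byte at a time.
    """
    for value in longs:
        value &= 0xFFFF_FFFF              # keep to 32 bits, like a cog register
        for shift in range(0, 32, 8):
            yield (value >> shift) & 0xFF


def pack_bytes(data):
    """Pack a byte sequence into 32-bit longs, four bytes per long."""
    longs = []
    for i in range(0, len(data), 4):
        value = 0
        for j, byte in enumerate(data[i:i + 4]):
            value |= (byte & 0xFF) << (8 * j)
        longs.append(value)
    return longs


packed = pack_bytes(b"Propeller")        # 9 bytes -> 3 longs, last one zero-padded
print([hex(x) for x in packed])
print(bytes(unpack_bytes(packed)))
```

For example, pack_bytes(b"Prop") gives [0x706F7250], and unpacking it returns the four bytes in their original order.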

pjv · 2006-09-26 14:57

Hi Bill;

In regard to your first point (and please don't hold me to this, as I'm going by memory): when I was first exploring the Propeller, I was able to synchronize any cog to another at the clock level. It was a little tricky because one had to generate waits that were effectively fractional instructions in length. I didn't make notes on how it was done, so one would have to experiment again, but I seem to remember that it was possible.

I have been away from the Propeller for a while now, but was fascinated experimenting with its intricacies, and I intend to take it up again as soon as time permits. I will try to recreate this important concept.

Cheers,

Peter (pjv)

Paul Baker · 2006-09-26 17:49

Though others have done a good job answering your questions, I'll answer them from an insider's perspective.
1) No. Cogs can be synchronized to any of the 4 phases; there are situations where this extra granularity is desirable.

2) When doing global memory accesses in assembly, there is only time to perform two instructions before entering into the next access if you want to catch every available window.

3) A single-pulse strobe is possible using a counter in DUTY mode. By setting FREQ to 1 and PHS to (0 - #clock cycles delay), a single-cycle pulse is generated, with enough time afterwards to freeze the counter (by setting FREQ to 0) or to reload PHS with the next delay to generate the next single-cycle pulse. By setting FREQ to -1 ($FFFF_FFFF), single-clock-cycle pulses of the opposite polarity are also possible.

4) By properly setting the # of cycles of delay using the technique above, the pulse can occur simultaneously with the cog performing a data write operation.

5) Both counters can be used together to generate an auto-increment/decrement situation. Use the first counter as a pulse strobe with a period equal to the execution path, and the second as a POS detector (or a NEG detector if the strobe's polarity is reversed); the second counter will then hold an address that automatically shifts in the desired direction every time it is accessed.

6) Using the barrel shifter in multiples of 8 (and optionally an AND mask) provides access to single bytes.

7) The next version of Propeller will have more memory. The others are issues that are being considered, but commenting on them at this juncture would be speculation, so I won't.
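Points 3 and 5 can be checked with a simplified software model of the counters. The sketch below is Python, not Spin or assembly, and rests on stated assumptions: in DUTY mode PHS gains FREQ on every system clock and the pin follows the carry out of bit 31, and a POS detector adds its own FREQ on every clock its input pin is high. The class and function names are hypothetical, and pipeline delays and the real counter-mode encodings are ignored. Note that in this model the opposite-polarity pulse (FREQ = -1) needs PHS loaded with +N rather than 0 - N.

```python
MASK32 = 0xFFFF_FFFF


class DutyCounter:
    """Simplified DUTY-mode counter: PHS += FREQ each clock, pin = carry out."""

    def __init__(self, frq, phs):
        self.frq = frq & MASK32
        self.phs = phs & MASK32

    def clock(self):
        total = self.phs + self.frq
        self.phs = total & MASK32
        return 1 if total > MASK32 else 0   # carry out of bit 31 drives the pin


class PosDetector:
    """Simplified POS detector: PHS += FREQ on every clock the input is high."""

    def __init__(self, frq, phs=0):
        self.frq = frq & MASK32
        self.phs = phs & MASK32

    def clock(self, pin):
        if pin:
            self.phs = (self.phs + self.frq) & MASK32


def duty_levels(frq, phs, clocks):
    """Pin level on each of `clocks` consecutive system clocks."""
    ctr = DutyCounter(frq, phs)
    return [ctr.clock() for _ in range(clocks)]


def auto_increment(start, step, period, loops):
    """Address seen at the end of each `period`-clock loop.

    Counter A strobes once per loop (FREQ = 2**32 / period, so period must
    be a power of two); counter B, a POS detector on that strobe, advances
    by `step` on every strobe. A negative step counts down.
    """
    if period < 2 or period & (period - 1):
        raise ValueError("period must be a power of two >= 2")
    strobe = DutyCounter(frq=(1 << 32) // period, phs=0)
    address = PosDetector(frq=step, phs=start)
    samples = []
    for clock in range(period * loops):
        address.clock(strobe.clock())
        if clock % period == period - 1:
            samples.append(address.phs)
    return samples


print(duty_levels(1, -20, 40))          # one high clock after the 20-clock delay
print(duty_levels(-1, 20, 40))          # one low clock: the opposite polarity
print([hex(a) for a in auto_increment(0x100, 4, 16, 3)])
```

With a 20-clock delay, duty_levels(1, -20, 40) is high on exactly one clock and duty_levels(-1, 20, 40) is low on exactly one clock. auto_increment(0x100, 4, 16, 3) returns [0x104, 0x108, 0x10C]: a long-aligned address that moves once per 16-clock loop without an explicit add in the loop body.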

On-board EEPROM with code protection is, as Mike has said, a security blanket. Anyone who has access to the proper tools (which are very common in the industry) and enough determination can break any code protection scheme in existence. This doesn't mean we won't consider implementing it, but it is a strong consideration.

Something that many people don't realize is that mechanical measures can provide protection comparable to software protection. I am toying with the idea of writing an application note on it, but it will be quite a while before I get to it. Here is a synopsis: Use the QFN version of the Propeller and a similar leadless package for the EEPROM, such as the 8-DFN. Place each part on the opposite side of the board so that the SDA and SCL pads line up. Place the vias so that they are interior to the chip (care must be taken so that the vias are not too close to the ground plane; a higher-quality board manufacturer with small vias may be necessary). Design the paste mask so that paste applied on the ground plane of each chip is not near the vias. Attach both parts to the board and cover both with electronic encapsulant epoxy of hardness 85 or more and high chemical resistance. This should provide a board from which it is very difficult to remove the EEPROM or to gain access to the SDA and SCL signal paths. Also, the final board should have neither the programming interface nor any use of P30 or P31 (these should have no traces leading from them), since loading a program directly into RAM would provide access to the EEPROM. This method isn't perfect, but neither is software copy protection.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer

Parallax, Inc.

Bill Henning · 2006-09-27 05:58

simonl

- Thanks, that's an interesting thread you pointed me to; not quite what I wanted, but useful!

pjv

Thanks, I will explore that as soon as I get the board I am ordering this week :)

Bill Henning · 2006-09-27 05:59

Mike

1&2) Thanks, that's exactly what I was not sure about.
3&4) So with one cog, I can at best generate a 10MHz square wave (eor, djnz)
5) I was thinking of using the counter as the address for referencing global memory
6) that works for extracting bytes; writing bytes requires several instructions due to masking
7) yes, I really want more ram!
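Bill's point 6 (writing a single byte takes several instructions because of masking) comes down to a clear-then-merge on a long. A minimal Python sketch, using a hypothetical helper named insert_byte and the same little-endian byte numbering: in a cog this is roughly "shift the mask and value into place, clear the field (ANDN-style), then OR in the new byte". The same pattern would apply to updating one byte of the output register without disturbing the other pins.

```python
def insert_byte(long_value, byte, index):
    """Return long_value with byte number `index` (0 = bits 7..0) replaced.

    Mirrors the clear-then-merge sequence: build a shifted $FF mask,
    clear that field, then OR in the shifted byte.
    """
    if not 0 <= index <= 3:
        raise ValueError("index must be 0..3")
    shift = 8 * index
    mask = 0xFF << shift
    cleared = long_value & ~mask & 0xFFFF_FFFF   # clear the target byte only
    return cleared | ((byte & 0xFF) << shift)    # merge in the new byte


word = 0x11223344
word = insert_byte(word, 0xAB, 1)
print(hex(word))
```

Here the second byte is replaced and the other three are untouched, so the result is 0x1122ab44.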

I know it's not perfect, but it would still be nice to have on-board flash that was tough to read out; as I don't see a JTAG port on the Propeller (it may well be there, but undocumented), it would be decent protection.

Bill Henning · 2006-09-27 06:10

Paul

Thanks for the additional information and clarification!

Yes, after considering it, I can see where the ability to sync to single master clock cycles would be handy, although it does complicate instruction scheduling a bit if I want to get fancy and schedule tightly timed transfers (for, say, high-resolution VGA generation).

I can see I have to think a lot more about how the timers can be used; that looks very, very useful! It looks like by being VERY careful, I could use a properly configured timer to coincide with writing a byte value to a D/A converter, and/or use a counter to generate addresses for global memory reads... I was aware of the barrel shifter; I was lamenting the lack of an easy way to write an isolated byte or short in cog memory, or to the I/O pins.

I'm glad to hear the next Propeller will have more memory :)

I can hardly wait to play with the current one (will order this week if the starter kit comes into stock! I want a printed manual) and I'll also get a couple of spare chips.

Thank you for the mechanical idea; it is an interesting approach. In my opinion, though, it is still easier to read out the EEPROM (after separating it) than to read out the guts of a chip (the Propeller) without a documented JTAG interface.

A follow-up question... if I have more than one cog waiting for the same event (same clock count, same pin state change), can I assume that they will start in lock-step with each other, i.e. wake up at the same time?

Thank you for your help! Even before getting the part, I am having a blast thinking of ideas I can apply the Propeller to...

Paul Baker · 2006-09-27 14:57

To answer your last question: yes, the Propeller easily handles synchronizing to an event. The simplest is synchronizing to the state of a pin (or pins). Synchronizing to a counter value is also possible; it just takes a little more effort to pick a counter value that will occur in the future and communicate it to all cogs so that they are all waiting on this value before it passes. It's not hard to do unless you are trying to find the minimal clock value needed to achieve synchronization. When synchronization occurs, all cogs will be in lock-step; however, if they access main memory, they will afterwards be displaced by 2 cycles according to their cog's location, because of the nature of main memory accesses. But as long as the cogs stay within themselves, they will remain synchronized.
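Paul's description can be summarized with a small timing model. This is a Python sketch under simplifying assumptions: WAITCNT wake-up latency is treated as identical for every cog (so it is dropped), and the hub is assumed to give each cog a 2-clock slot in cog-ID order within a 16-clock rotation, which is what produces the 2-cycle-per-cog displacement Paul mentions. The function names and the example arrival times are hypothetical.

```python
HUB_PERIOD = 16   # clocks for one full hub rotation (assumed)
SLOT = 2          # clocks per cog slot within the rotation (assumed)


def wake_time(arrive, target):
    """Clock at which a cog that reaches WAITCNT at `arrive` resumes.

    If the shared target has already passed, the 32-bit counter must wrap
    all the way around (about 53.7 s at 80 MHz), so the target needs margin.
    """
    return target if arrive <= target else target + (1 << 32)


def next_hub_window(t, cog_id):
    """First clock >= t at which cog `cog_id` owns the hub (simplified)."""
    return t + (SLOT * cog_id - t) % HUB_PERIOD


# Three cogs reach their WAITCNT at different clocks, and therefore at
# different positions within the 4-clock instruction cycle.
arrivals = {0: 1000, 3: 1003, 5: 1017}
target = 1200     # agreed future CNT value, chosen with plenty of margin

for cog, arrive in arrivals.items():
    woke = wake_time(arrive, target)
    print(cog, "phase before:", arrive % 4, "woke at:", woke,
          "hub window at:", next_hub_window(woke, cog))
```

All three cogs wake on clock 1200 regardless of their earlier phase; their first hub windows then fall at 1200, 1206 and 1210, i.e. 2 clocks per cog ID in this model.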

Post Edited (Paul Baker (Parallax)) : 9/27/2006 3:02:40 PM GMT

Mike Green · 2006-09-27 15:42

Have a look at the hires VGA text driver. This uses two synchronized cogs which alternately load up a buffer in cog memory, then generate 4 scan lines at a resolution up to 1024 x 768. The Hydra is due to be released this fall with lots of examples of multiple synchronized cogs doing sprite and other graphics video generation.

You can generate a square wave with a frequency up to about 128MHz using the counters. There's a Frequency Synthesizer program in the Object Library that allows you to specify two frequencies that can be mixed (since there are two counters per cog).
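For counter-generated square waves in plain NCO mode, the arithmetic is simple: PHS gains FRQ every system clock and the pin follows bit 31, so the output frequency is clkfreq * FRQ / 2^32. The Python sketch below assumes an 80MHz system clock and covers only that mode (the ~128MHz figure relies on the counters' PLL modes, which are not modeled). For comparison it also computes the rate of a software toggle loop: assuming 4 clocks per instruction with the branch taken, an XOR plus a jump back costs 8 clocks per toggle, so such a loop would give 5MHz, and only an unrolled run of XORs (4 clocks per toggle) would reach 10MHz. The function names are hypothetical.

```python
CLKFREQ = 80_000_000   # assumed system clock, Hz


def nco_frq(f_out_hz, clkfreq_hz=CLKFREQ):
    """FRQ value that makes PHS bit 31 toggle at roughly f_out_hz."""
    if not 0 < f_out_hz <= clkfreq_hz / 2:
        raise ValueError("plain NCO output is limited to clkfreq/2 in this model")
    return round(f_out_hz * (1 << 32) / clkfreq_hz)


def nco_actual_hz(frq, clkfreq_hz=CLKFREQ):
    """Frequency actually produced by a given FRQ value."""
    return clkfreq_hz * frq / (1 << 32)


def loop_square_wave_hz(clocks_per_toggle, clkfreq_hz=CLKFREQ):
    """Square wave from a software toggle loop: one period is two toggles."""
    return clkfreq_hz / (2 * clocks_per_toggle)


frq = nco_frq(1_000_000)
print(frq, nco_actual_hz(frq))
print(loop_square_wave_hz(8))   # XOR + taken jump per toggle
print(loop_square_wave_hz(4))   # unrolled XORs
```

For 1MHz at 80MHz this gives FRQ = 53687091, which lands within a fraction of a hertz of the target; the remaining error is just the rounding of FRQ.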

The built-in counters have only one bit that can be sent to an I/O pin automatically. The counter value is effectively a cog memory location so you can manipulate it with instructions.

If you want arbitrary read/write access to individual bytes in a cog table, it will take several instructions. Typically, you're scanning sequentially through a sequence of bytes, which is much faster. It's usually faster to keep the byte buffer in HUB memory since that is byte-addressable and, if you're careful, you can make use of most of the HUB cycle waiting time (I think there are 11 clocks between successive HUB accesses, and you can do two 4-clock instructions in that time).
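The "two instructions between hub accesses" rule described in this thread falls out of a simple model. This Python sketch assumes a 16-clock hub rotation, that a hub instruction issued exactly on its sweet spot takes 7 clocks, and that any other issue time adds the wait for the next window (which reproduces the 7-22 clock range quoted at the start of the thread); ordinary instructions take 4 clocks. It is not cycle-exact, and the names are hypothetical.

```python
HUB_PERIOD = 16   # clocks between this cog's hub windows (assumed)
HUB_MIN = 7       # hub instruction cost when issued on the sweet spot (assumed)
INSTR = 4         # clocks for an ordinary instruction


def hub_op_clocks(issue, sweet_spot=0):
    """Clocks taken by a hub instruction issued at clock `issue` (7..22)."""
    return HUB_MIN + (sweet_spot - issue) % HUB_PERIOD


def loop_period(n_between, iterations=8):
    """Steady-state clocks per pass of: one hub op, then n_between plain instructions."""
    t = 0
    issues = []
    for _ in range(iterations):
        issues.append(t)
        t += hub_op_clocks(t) + n_between * INSTR
    return issues[-1] - issues[-2]


for n in range(5):
    print(n, "instructions between hub ops ->", loop_period(n), "clocks per pass")
```

In this model up to two instructions fit between hub accesses without missing a window (16 clocks per pass); a third pushes every access to the following rotation and doubles the loop time to 32 clocks.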

Bill Henning · 2006-09-27 17:00

Thanks Paul, that answers my question! A VERY interesting and flexible architecture; I am ordering a couple of cheap LCD panels to drive with a Propeller.

Mike - that looks like exactly what I want to take a look at! Thank you! I will also check out the Frequency Synthesizer object. Yes, it looks like I'll need to keep all frame buffer memory in hub memory, but I am still considering keeping some CLUTs and/or a few character cells in cog memory (mostly to free up global memory).

I find the architecture fascinating... and since I don't mind assembly language and carefully hand-scheduling instructions the possibilities are amazing.

Paul Baker · 2006-09-27 17:49

Bill, a feature of Propeller assembly you will find highly useful is conditional execution; it makes symmetric multipath execution a breeze. On the SX, coding case-like branches so that every branch executed in the same number of clocks was a pain; within the Propeller this is virtually eliminated as a concern.
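The point about conditional execution can be illustrated with a toy model. In this Python sketch (the class, function, and condition names are hypothetical, loosely modeled on the Propeller's IF_ prefixes), every instruction costs 4 clocks whether or not its condition is met, so a four-way "case" on the Z and C flags takes the same time on every path.

```python
from dataclasses import dataclass

CLOCKS_PER_INSTR = 4


@dataclass
class Instr:
    cond: str   # condition name, e.g. "if_z_and_nc"
    op: str     # label only; the model does not compute results


def run(program, z, c):
    """Run a straight-line predicated program; return (clocks, executed ops).

    An instruction whose condition is false still occupies its 4 clocks,
    so the total time does not depend on the flags.
    """
    checks = {
        "always": True,
        "if_z": z, "if_nz": not z,
        "if_c": c, "if_nc": not c,
        "if_z_and_c": z and c,
        "if_z_and_nc": z and not c,
        "if_nz_and_c": (not z) and c,
        "if_nz_and_nc": (not z) and (not c),
    }
    executed = [i.op for i in program if checks[i.cond]]
    return CLOCKS_PER_INSTR * len(program), executed


CASE = [
    Instr("if_z_and_c", "case 0"),
    Instr("if_z_and_nc", "case 1"),
    Instr("if_nz_and_c", "case 2"),
    Instr("if_nz_and_nc", "case 3"),
    Instr("always", "common tail"),
]

for z in (False, True):
    for c in (False, True):
        print(z, c, run(CASE, z, c))
```

For example, run(CASE, True, False) executes ["case 1", "common tail"], and every flag combination takes 20 clocks.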

Mike Green · 2006-09-27 18:50

Do try running multiple displays. It's really neat to have a program that uses a hires VGA text display with a TV display on the side for debugging, all running at the same time along with a mouse, keyboard, high-speed SPI channel, etc.
