Synchronizing multiple cogs?
Bill Henning
Posts: 6,445
Hi,
I've been reading the Propeller manual, and it has been giving me a LOT of interesting ideas; however I find that a few things are not clear... and a few things I'd like to see in the future
1) Are all the cogs in lock-step?
i.e., most instructions take four clocks to execute (a few take 8), and it takes 7-22 clocks to synchronize to the hub. Is it safe to assume that the four-cycle "instruction windows" are in sync, so that there cannot be a 1-3 cycle offset between instruction timing across cogs?
2) If I do a READ/WRITE to global memory, and I am careful to schedule instructions in 16-clock "bundles", can I count on the next global window to be in sync?
3) A "pulse" instruction that could be used as a single-instruction strobe signal would be great... in both H->L and L->H flavors :)
4) exposing a "STROBE" signal that was an output whenever a value was written to the output port would also be great
5) a counter mode where the counter incremented every time it was read would be great, it would allow it to be used as an auto increment pointer
6) I can't see any way of byte-addressing a cog's local memory!
7) MORE RAM PLEASE! ... for compatibility, leave the 32k ram / 32k rom where it is, but please add at least another 64k of global ram (128 would be even better) and 32k-64k of flash (especially if it was read protectable - I can't see how to ship an application where the code cannot be read out!)
I must say, I am very very close to ordering a devel board and a few chips... I've been having a lot of fun playing with the propeller on paper, seeing what I could make it do.
Best,
Bill
Post Edited (Bill Henning) : 9/26/2006 8:08:36 AM GMT
Comments
Not sure if this is what you're looking for: http://forums.parallax.com/forums/default.aspx?f=25&m=136129
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Cheers,
Simon
2) It depends on what else you're doing (see #1)
3 & 4) In assembly language, you have a granularity of 50ns per instruction and you can write to all output bits in one instruction so you can have a strobe that's synchronous with the data write. The minimum strobe width is 50ns.
5) Maybe. Pointers in assembly are awkward at very high speeds. It really takes several instructions to manage them, particularly storing results in cog memory (you have to increment the instruction itself since there's no indirection). Still, with the raw speed, you can do a lot very fast.
6) Yep! With the barrel shifter, it's very easy to break apart 32 bit words. You can put out a byte stream with a 50ns strobe at a rate of one byte every 500ns packing 4 bytes per long word.
7) Yeah. More RAM would be nice and, I believe, is intended for the next generation of Propeller chips, but you can do a lot with what's there and you can easily attach more. There's serial EEPROM up to several MB and serial FRAM up to 32K bytes per chip. You can directly access SD/MMC cards if you want. With a little external circuitry, you can use external parallel static RAM, although it eats up I/O pins.
There has been extensive discussion here with Parallax on the issue of protecting intellectual property with the Propeller. Their feeling is that you can't prevent determined people from stealing your object code, that there are lots of ways to make it hard enough with the existing system that casual thieves won't do it successfully, and the cost of trying to do it better at present would unreasonably limit the capabilities of the Propeller chip.
It's one thing to fiddle with a paper description. The Demo Board is cheap enough that you can see what the thing can actually do and really judge the development environment.
In regard to your first point (and please don't hold me to this, as I'm going by memory): when I was first exploring the Propeller, I was able to synchronize any cog to another at the clock level. It was a little tricky because one had to generate waits that were effectively fractional instructions in length. I didn't make notes on how it was done, so one would have to experiment again, but I seem to remember that it was possible.
I have been away from the Propeller for a while now, but was fascinated by experimenting with its intricacies, and I intend to take it up again as soon as time permits. I will try to recreate this important concept.
Cheers,
Peter (pjv)
1) No. Cogs can be synchronized to any of the 4 phases; there are situations where this extra granularity is desirable.
2) When doing global memory accesses in assembly, there is only time to perform two instructions before entering into the next access if you want to catch every available window.
3) A single-pulse strobe is possible by using a counter in DUTY mode. With FREQ set to 1 and PHS set to (0 - number of clock cycles of delay), a single-cycle pulse is generated, with enough time afterwards to freeze the counter (by setting FREQ to 0) or to reload PHS with the next delay to generate the next single-cycle pulse. Setting FREQ to -1 ($FFFF_FFFF) produces single-clock-cycle pulses of the opposite polarity.
4) By properly setting the # of cycles of delay using the technique above, the pulse can occur simultaneous to the cog performing a data write operation.
5) Both counters can be used together to generate an auto-increment/decrement situation. Using the first counter as a pulse strobe with a period equal to the execution path, and the second as a POS detector (or NEG detector if reverse polarity is used for the strobe), the second counter will hold an address that automatically shifts in the desired direction every time it is accessed.
6) Using the barrel shifter in multiples of 8 (and optionally an AND mask) provides access to single bytes.
7) The next version of Propeller will have more memory. The others are issues that are being considered, but commenting on them at this juncture would be speculation, so I won't.
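The single-cycle pulse described in point 3 can be sketched with a small simulation. This is only a rough model, not Propeller code: it assumes that in DUTY mode the counter adds FREQ to PHS on every clock and that the output is the carry out of that 32-bit addition. The function name `duty_output`, the 5-clock delay, and the choice of PHS = +delay for the FREQ = -1 case are illustrative assumptions, not taken from the post.

```python
MASK32 = 0xFFFFFFFF

def duty_output(phs, frq, clocks):
    """Per-clock output bits of a simplified DUTY-mode counter.

    Assumed model: on each clock PHS = PHS + FRQ (32-bit wraparound),
    and the output pin shows the carry out of bit 31.
    """
    phs &= MASK32
    frq &= MASK32
    bits = []
    for _ in range(clocks):
        total = phs + frq
        bits.append(total >> 32)   # 1 only when the addition overflows
        phs = total & MASK32
    return bits

delay = 5  # hypothetical delay in clocks

# FREQ = 1, PHS = 0 - delay: one high pulse, `delay` clocks after loading
high_pulse = duty_output(-delay, 1, 10)

# FREQ = -1 ($FFFF_FFFF): the carry is set except when PHS is 0,
# so the output idles high and dips low for a single clock
low_pulse = duty_output(delay, -1, 10)
```

In this model the output stays quiet after the pulse for a long time (until PHS wraps again), which leaves room to set FREQ to 0 or reload PHS as described above.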
The use of onboard EEPROM with code protection is, as Mike has said, a security blanket. Anyone who has access to the proper tools (which are very common in the industry) and enough determination can break any code protection scheme in existence. This doesn't mean we won't consider implementing it, but it is a strong consideration.
Something that many people don't realize is that mechanical measures can be taken to provide protection comparable to software protection. I am toying with the idea of writing an application note on it, but it will be quite a while before I get to it. Here is a synopsis: Use the QFN version of the Propeller, and use a similar leadless package for the EEPROM, such as the 8-DFN. Place the parts on opposite sides of the board so that the SDA and SCL pads line up. Place the vias so that they are interior to the chip (care must be taken so that the vias are not too close to the ground plane; a higher-quality board manufacturer with small vias may be necessary). Design the paste mask so that paste applied on the ground plane of each chip is not near the vias. Attach both parts to the board and cover both with electronic encapsulant epoxy of hardness 85 or more and high chemical resistivity. This should provide a board from which it is very difficult to remove the EEPROM or gain access to the SDA and SCL signal paths. Also, the final board should neither have the programming interface nor use either P30 or P31 (these should have no traces leading from them), since loading a program directly into RAM would provide access to the EEPROM. This method isn't perfect, but neither is software copy protection.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer
Parallax, Inc.
- thanks, interesting thread you pointed me to; not quite what I wanted, but useful!
pjv
Thanks, I will explore that as soon as I get the board I am ordering this week :)
1&2) Thanks, that's exactly what I was not sure about.
3&4) So with one cog, I can at best generate a 10MHz square wave (eor, djnz)
5) I was thinking of using the counter as the address for referencing global memory
6) that works for extracting bytes; writing bytes requires several instructions due to masking
7) yes, I really want more ram!
I know it's not perfect, but it would still be nice to have on-board flash that was tough to read out. As I don't see a JTAG interface on the Propeller (even though it may well be there, but undocumented), it would be decent protection.
Thanks for the additional information and clarification!
Yes, after considering it, I can see where the ability to sync to single master clock cycles would be handy, although it does complicate instruction scheduling a bit if I want to get fancy and schedule tightly timed transfers (for, say, high-resolution VGA generation).
I can see I have to think a lot more about how the timers can be used; that looks very very useful! It looks like by being VERY careful, I could use a properly configured timer to coincide with writing a byte value to a D/A converter, and/or also use a counter to generate addresses for global memory reads... I was aware of the barrel shifter, I was lamenting the lack of an easy way of writing an isolated byte or short in cog memory, or to the IO pins.
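The difference between reading and writing an isolated byte can be sketched as below. This is a minimal illustration in Python rather than Propeller assembly; `get_byte` and `set_byte` are hypothetical helper names, and the byte index is assumed to count from the least significant byte of the 32-bit long. Reading needs a shift and a mask, while writing needs the old byte cleared before the new one is merged in, which is why it costs extra instructions.

```python
MASK32 = 0xFFFFFFFF

def get_byte(long_value, index):
    """Extract byte `index` (0 = least significant) with a shift and an AND."""
    return (long_value >> (8 * index)) & 0xFF

def set_byte(long_value, index, byte):
    """Replace byte `index`: clear the old byte with a mask, then OR in the new one."""
    shift = 8 * index
    cleared = long_value & ~(0xFF << shift) & MASK32
    return cleared | ((byte & 0xFF) << shift)

word = 0x11223344
low = get_byte(word, 0)             # shift by 0, mask
high = get_byte(word, 3)            # shift by 24, mask
patched = set_byte(word, 1, 0xAB)   # mask out byte 1, merge the new value
```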
I'm glad to hear the next propeller will have more memory :)
I can hardly wait to play with the current one (will order this week if the starter kit comes into stock! I want a printed manual) and I'll also get a couple of spare chips.
Thank you for the mechanical idea, it is an interesting approach - but in my opinion, it is still easier to read out the eeprom (after separating it) than reading out the guts of a chip (propeller) without a documented jtag interface.
A followup question... if I have more than one cog wait for the same event (same clock count, same pin state change) can I assume that they will start in lock-step with each other - ie wake up at the same time?
Thank you for your help! Even before getting the part I am having a blast thinking of ideas I can apply the propeller to....
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer
Parallax, Inc.
Post Edited (Paul Baker (Parallax)) : 9/27/2006 3:02:40 PM GMT
You can generate a square wave with a frequency up to about 128MHz using the counters. There's a Frequency Synthesizer program in the Object Library that allows you to specify two frequencies that can be mixed (since there are two counters per cog).
The built-in counters have only one bit that can be sent to an I/O pin automatically. The counter value is effectively a cog memory location so you can manipulate it with instructions.
If you want arbitrary read/write access to individual bytes in a cog table, it will take several instructions. Typically, you're scanning sequentially through a sequence of bytes, which is much faster. It's usually faster to keep the byte buffer in HUB memory since that is byte addressable and, if you're careful, you can make use of most of the HUB cycle waiting time (I think there's 11 clocks between successive HUB accesses and you can do two 4-clock instructions in that time).
Mike - that looks like exactly what I want to take a look at! Thank you! I will also check out the Frequency Synthesizer object. Yes, it looks like I'll need to keep all frame buffer memory in hub memory, but I am still considering keeping some CLUTs and/or a few character cells in cog memory (mostly to free up global memory).
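The hub-window budget can be checked with a little arithmetic. The sketch below is a simplified model built only from figures quoted in this thread: a hub window every 16 clocks, a hub instruction that takes a minimum of 7 clocks once its window arrives (the low end of the manual's 7-22 range), and 4 clocks per ordinary instruction. The constants and the function name `loop_period` are assumptions for illustration; real timing should be measured on hardware.

```python
HUB_WINDOW = 16  # clocks between one cog's hub windows (assumed from the thread)
HUB_MIN = 7      # clocks a hub instruction takes once its window arrives (assumed)
INSTR = 4        # clocks per ordinary instruction

def loop_period(instructions_between):
    """Clocks per pass of a loop: one hub access plus N ordinary instructions.

    The loop is assumed to start exactly on a hub window; the next hub
    access then stalls until the following window comes around.
    """
    busy = HUB_MIN + instructions_between * INSTR
    wait = (-busy) % HUB_WINDOW   # stall until the next multiple of 16
    return busy + wait

two = loop_period(2)    # two instructions fit: every window is caught
three = loop_period(3)  # a third instruction overruns and skips a window
```

Under these assumptions two ordinary instructions between hub accesses keep the loop at one window per pass, while a third doubles the loop time.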
I find the architecture fascinating... and since I don't mind assembly language and carefully hand-scheduling instructions the possibilities are amazing.