What would you want more of, cogs or RAM?

Paul Baker · 2006-12-20 19:02

Mike is correct; the ADC capabilities were a serendipitous discovery, but because they weren't deliberately planned for, they aren't optimized. The next chip will likely have one key element changed to make the current ADC capabilities better.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer

Parallax, Inc.

Post Edited (Paul Baker (Parallax)) : 12/20/2006 7:09:20 PM GMT

paulmac · 2006-12-20 23:08

Hi all,

The Propeller is the chip that finally got me interested enough in microcontrollers to actually spend cash and try to implement some ideas. Why? 8 cogs! It's jaw-dropping stuff.
I think that 16 would be even better.

Christof Eb. · 2006-12-21 09:15

Hi Chip and all,

Well, it is nearly Christmas; time for some wishes?

16 cogs, because the cogs give the chip its great versatility and take the place of the dedicated special I/O features of other controllers.
And as much RAM as possible.

and:
for easy interfacing:
* full 5 V compatibility without the 1k resistors
* internal switchable pull-up resistors
* the delta-sigma ADC concept is good and rather fast, I think. Perhaps it could be improved by adding some hysteresis to reduce noise. Could a dedicated reference-voltage input improve accuracy?

* I would like hub access to be equal for all cogs by default, with the option to reassign slots, because there will always be slow tasks like serial I/O.

* Assembly programs should also be able to run from main memory. Yes, this will be much slower than assembly running in a cog is now, but it will open the system to compilers, and a Spin compiler would still be faster than the Spin interpreter. I personally like Spin, but I think that if you want to sell this chip to industry, support for a well-known language is a key feature. "Register" variables could reside in cog RAM.

* The ability to do high-level debugging. I think this requires a way to freeze all cogs at a breakpoint in one cog. Stepping must then restart all cogs at that position until the next breakpoint is reached. Cog RAM should be readable, or copied into hub RAM, after a breakpoint.
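Christof's hub-access suggestion (equal shares by default, with slots reassignable away from slow tasks) can be pictured as a slot table that the hub steps through in order. The Python model below is hypothetical, not a Propeller feature; `build_slot_table`, `hub_owner`, and the 16-slot table for 8 cogs are assumptions, chosen so a slow cog can give up one slot without losing hub access entirely.

```python
def build_slot_table(n_cogs, n_slots, reassign=None):
    """Hub slot table: by default every cog gets an equal share, round-robin.

    `reassign` (hypothetical) maps a slot index to a different cog, e.g. to move
    one slot away from a slow serial-I/O cog to a busier one.
    """
    table = [slot % n_cogs for slot in range(n_slots)]
    for slot, cog in (reassign or {}).items():
        table[slot] = cog
    return table


def hub_owner(table, hub_cycle):
    """Cog that owns the hub on a given hub cycle; the table repeats forever."""
    return table[hub_cycle % len(table)]


table = build_slot_table(8, 16, {11: 0})   # slot 11 normally belongs to cog 3
print(table.count(3), table.count(0))      # 1 3
print(hub_owner(table, 27))                # 0  (27 % 16 == 11)
```

The design choice here is that reassignment only changes ownership of existing slots, so the hub's fixed rotation (and each cog's timing) stays predictable.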

cgracey · 2006-12-21 21:55

Sorry for my silence on this matter, but the capability was planned. It can be optimized on the next chip, though, by closing the feedback loop between two adjacent pins. That way the signals don't have to go across the chip, in and out of the cogs, and back out again, eating valuable feedback time.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.
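Chip's point about shortening the feedback path is easier to follow with the delta-sigma loop itself in view. Below is an idealized Python model, not Propeller code: it assumes a perfect comparator, zero loop delay, and an integrator standing in for the external RC network; `delta_sigma_adc` is a hypothetical name. Each clock the sampled bit is fed back against the input, so the count of 1s over a window tracks the input as a fraction of full scale. The delay that routing the signal through a cog adds is exactly what this model leaves out.

```python
def delta_sigma_adc(vin, clocks):
    """Idealized first-order delta-sigma conversion.

    vin is the input as a fraction of full scale (0.0 to 1.0).
    Returns the number of 1 bits seen in `clocks` samples; count/clocks ~ vin.
    """
    integ = 0.0   # integrator state (stands in for the RC filter capacitor)
    fb = 0        # feedback bit currently driven back toward the input
    ones = 0      # the counter's accumulator
    for _ in range(clocks):
        integ += vin - fb              # input minus fed-back output
        fb = 1 if integ > 0 else 0     # comparator (input pin) decision
        ones += fb
    return ones


print(delta_sigma_adc(0.5, 1000))      # 500
print(delta_sigma_adc(0.25, 1000))     # 250
```

Because the integrator stays bounded, the count can only drift from `vin * clocks` by about one, which is why a longer window gives more resolution.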

IanM · 2006-12-21 22:55

Chip, any details yet on improving the RF signal generation (in terms of purity of signal) for the Prop 2?

Cheers, Ian

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Ian Mitchell
www.research.utas.edu.au

Paul Baker · 2006-12-22 01:19

We talked about arbitrary N/M PLLs, but with 32 instances (if we go with 16 cogs) that's a lot of real estate.
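An "arbitrary N/M" synthesizer multiplies a reference by N and divides by M, so almost any output frequency can be approximated. The Python sketch below, an illustration rather than a Prop 2 design, searches for the N/M pair that lands closest to a target; the 1..64 ranges and the name `best_nm` are assumptions. The real-estate concern comes from needing one such PLL per counter: 16 cogs × 2 counters = 32 instances.

```python
def best_nm(f_ref, f_target, n_max=64, m_max=64):
    """Pick N and M (1..n_max, 1..m_max) so f_ref * N / M lands closest to f_target."""
    best = None
    for m in range(1, m_max + 1):
        n = min(max(round(f_target * m / f_ref), 1), n_max)  # nearest usable N for this M
        f_out = f_ref * n / m
        err = abs(f_out - f_target)
        if best is None or err < best[3]:
            best = (n, m, f_out, err)
    return best[:3]


print(best_nm(5e6, 12.5e6))   # (5, 2, 12500000.0)
```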

scotta · 2007-01-10 18:52

It would be nice to have the 8-cog processor plus another 8 cogs that don't share the hub but communicate with the other cogs through a few bytes of RAM.

Reason: most of my cogs are doing real-time background tasks; they only need a few bytes in and a few bytes out to function.

Larger RAM would help with writing a C compiler.

How about floating-point support in hardware?

Scott

GdSis · 2007-01-10 19:44

Hello,

The more I use this chip, the more I like it. It is very different from other microcontrollers, so different that I find it hard to write programs without some kind of interrupts. I know I still have to shift my mental paradigm to match this new chip, but even with 8 cogs it feels wasteful to poll for events or to use wait instructions. The POS/NEG detector counter modes partly address pin/timer events, but polling counter results while doing something else puts you out of sync with those events. IMHO, a cog-local interrupt (maybe implemented as a new counter mode?) would speed things up even more, allowing tasks to be multiplexed inside a cog.

Another important thing I haven't been able to address yet is some form of copy protection.

Respectfully, Gus

Tracy Allen · 2007-01-10 20:04

When this choice first came up, I had voted for the more RAM option. But the more I think about it and consider comments here, I want to change my vote for more cogs. That really is the core attraction of the Prop.

An aside: In yesterday's MacWorld extravaganza, Steve Jobs quoted guru Alan Kay, "People who are really serious about software should design their own hardware." That could certainly apply to Parallax and the Propeller.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

hinv · 2007-01-10 20:20

When it comes to interrupts, I never really got the hang of them. The last time I played with microcontrollers it was a 68HC11, and if I remember right, the interrupts were too complicated, so I never actually implemented them. I used to be in the hardware-peripheral camp because hardware was sooooo much faster than software.

With the Propeller, things are quite a bit different. You can use a COG as your hardware device, and it should be a simple programming paradigm to pick up. Correct me if I am wrong, but you should be able to multitask much the way DOS did: not by preempting the running program, but by having a program schedule time with a kernel, in TSR fashion. For a Unix/IRIX guy who hated DOS, this may be an odd thing to say, but I understand Unix's shortcomings in real-time applications.

My recommendation is to go with 16 cogs ONLY if unused cogs wouldn't use up shared time slots as they do in the current design. Failing that, 8 cogs and 256 KB.
I am REALLY excited either way about having 64 I/Os!

When it comes down to it, I hope that whatever path you choose, you gain popularity. I don't understand why Parallax wasn't even listed in the Embedded Systems Design October 2006 article "What processor is in your product?", even though they listed processors down to 1%. I hope the Propeller fixes this.

Thanks for reading the ramblings of a newbie,

Doug

Mike Green · 2007-01-10 20:44

Keep in mind that one of the reasons for the Propeller being designed this way is to avoid the need for multitasking. Any time you share a processor among several execution "threads", you have the overhead of context switching (saving flags, registers, program counter, etc., then restoring them from somewhere else) and the added level of complexity for actions that might take some time (like I/O). When there are enough processors to do the needed simultaneous work, this all gets much simpler, more efficient, and easier to understand (with a corresponding increase in reliability).

Part of the tension between having 8 cogs (and more memory) vs. 16 cogs (and the same memory) is that there's been enough development of code that some of us are already using 8 or close to 8 cogs and foresee the need for more, yet would like more memory as well.

hinv · 2007-01-10 21:11

Hi Mike,

I understand the paradigm, and can't wait to get my Demo Board and run your OS on it. I do wonder, however, with 160 MHz/160 MIPS per cog, why one would not multitask, since things like watching a serial line use very few of those MIPS. If your clock is at 160 MHz, doesn't that mean all of the COGs run at 160 MHz? If so, that means quite a lot of power used (and heat generated) just watching a serial line, for instance. I would think it would be better to do a sort of TSR-style multitasking on a cog when cog utilization or power is an issue.

GdSis · 2007-01-10 21:55

Mike,
The key words here are "When there are enough processors to do the needed simultaneous work". Without interrupts you soon hit the 8-cog barrier. I'm just starting with this chip and I already have; I'm sure you have too. Then you have to start cleverly ping-ponging to do more tasks... and adding all kinds of problems such as lost determinism, out-of-sync events, etc. In my opinion that adds more overhead than a simple context switch. I feel it's a waste to have a 20/160 MIPS cog sit there just to watch an event. That said, I like the simplicity of the current design, but I think a cog can give much more juice.

Respectfully, Gus

Mike Green · 2007-01-10 22:12

With the cost of silicon real estate being what it is (low and going lower), it's not a waste to have an idle cog, particularly if it doesn't take any significant power, if what you get is simpler, more reliable, cheaper to produce code that may in fact just wait for an event to happen.

GdSis · 2007-01-10 22:12

hinv,
A cog executing a wait instruction runs in a low-power state, drawing very little power, so that isn't a problem, but you are right about the MIPS.
TSR pseudo-multitasking in old DOS was interrupt-driven!

Gus

hinv · 2007-01-10 22:54

Sorry for the ignorance, but can a program running on one cog suspend or stop code running on another cog? I had suspected that TSRs were interrupt-driven, but there may be another way to do it.
If you are a programmer running out of cogs and, for instance, you are monitoring 4 serial ports, you could have 1 cog monitor all 4 and do other work in its main loop too, but this would not lend itself to reusable, cheaper-to-produce code.

As I see it, the real issues in question are:
1) Is the shared-resource system going to use the same round-robin approach, which would mean lower shared-resource performance with more cogs?
2) Is there going to be an inexpensive, high-speed way to add memory? It doesn't even have to be shared for some apps, but it has to be fast and reliable. I have seen the 30-pin SIMM solution, but I don't know how fast it is because I am still waiting for my Demo Board.

If those 2 problems are satisfactorily solved, put me in the 16-cog camp, which I would think would use less power for apps that fit in 8 cogs and 128k.

Mike Green · 2007-01-10 23:13

A program running on one cog can stop any cog (including itself). There is no way to suspend a cog, although the program running in a cog can wait for some outside event while consuming little power.

You're not likely to see an inexpensive, high-speed way to add memory. Serial memory is relatively slow. Parallel memory consumes a lot of I/O pins (and therefore chip area) and power (because of the speed and the power demand of off-chip connections). Still, SPI serial memory can easily be clocked in excess of 2 MHz with the current Propeller and only uses 3-4 I/O pins.
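To put the 2 MHz figure in perspective, here is the arithmetic for reading from an SPI memory. It assumes the common read sequence of an 8-bit command followed by a 16-bit address, as used by parts such as the 25LC256 EEPROM; check the datasheet of the actual part. It also ignores chip-select and software overhead, and `spi_read_rate` is a hypothetical helper.

```python
def spi_read_rate(sclk_hz, data_bytes=1, cmd_bits=8, addr_bits=16):
    """Bytes per second when every read of `data_bytes` costs a command + address.

    Assumes one bit per SCLK, and ignores chip-select and software overhead.
    """
    clocks = cmd_bits + addr_bits + 8 * data_bytes
    return sclk_hz * data_bytes / clocks


print(spi_read_rate(2e6))         # 62500.0 bytes/s for random single-byte reads
print(spi_read_rate(2e6, 256))    # roughly 247,000 bytes/s for 256-byte bursts
```

The gap between the two numbers shows why sequential bursts matter more than the raw clock rate when serial memory is used for random access.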

Lawson · 2007-01-10 23:42

Hopefully the Prop 2 will include the suggested upgrades to the counter/video hardware mentioned earlier in this thread. The mods that accelerate synchronous serial communication (like SPI or I2C) will go a LONG way toward solving the COG-to-COG, Prop-to-Prop, and COG-to-external-memory communication issues.

On the issue of interrupts: I personally hate them, but with the current Prop, interrupts can be simulated with Bill Henning's primitives (i.e. running assembly right out of the hub). Thankfully, interrupts are only one of many ways to simulate multitasking on a microprocessor. Among the Seattle Robotics Society Encoder back issues is an article on a cooperative multitasking "OS". Basically, each task periodically stores its context and passes control to a task manager, which then decides which task to run next. Another method I've seen is to use state machines. This is a simple, extensible way to build a fast-polling system that simulates multitasking. I'm sure there are many other methods too.

My 2 cents,
Marty
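A minimal sketch of the state-machine approach mentioned above, in Python rather than Propeller code: each task keeps its own state and does one small, non-blocking step every time the main loop calls it, so several tasks share one processor without interrupts. The serial receiver samples the line once per bit time, an idealization (a real receiver oversamples to find the bit centers); all names here are hypothetical.

```python
IDLE, DATA, STOP = range(3)


def frame(byte):
    """8N1 line samples for one byte: start bit, 8 data bits LSB first, stop bit."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


class SerialRx:
    """Receive state machine; each step() consumes one sample per bit time (idealized)."""

    def __init__(self, samples):
        self.samples = iter(samples)
        self.state = IDLE
        self.bits = []
        self.received = []

    def step(self):
        bit = next(self.samples, 1)          # an idle line reads high
        if self.state == IDLE:
            if bit == 0:                     # start bit seen
                self.state, self.bits = DATA, []
        elif self.state == DATA:
            self.bits.append(bit)
            if len(self.bits) == 8:
                self.state = STOP
        else:                                # STOP: keep the byte only if the stop bit is high
            if bit == 1:
                self.received.append(sum(b << i for i, b in enumerate(self.bits)))
            self.state = IDLE


class Heartbeat:
    """A second, unrelated task sharing the same loop."""

    def __init__(self):
        self.ticks = 0

    def step(self):
        self.ticks += 1


def run(tasks, passes):
    """The whole "scheduler": give every task one short, non-blocking step per pass."""
    for _ in range(passes):
        for task in tasks:
            task.step()


rx, beat = SerialRx(frame(0x41) + frame(0x5A)), Heartbeat()
run([rx, beat], 30)
print([hex(b) for b in rx.received], beat.ticks)   # ['0x41', '0x5a'] 30
```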

Paul Baker · 2007-01-10 23:48

The cooperative multitasking method you describe is already possible using the JMPRET instruction. A lightweight multitasking routine using JMPRET is shown in the FullDuplexSerial object. A full scale context switching system is possible using the example as a base.
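For readers who haven't looked inside FullDuplexSerial: its receive and transmit code each give up the cog with a JMPRET, which saves the current task's resume address and jumps to the other task's saved address. The Python sketch below models that hand-off with generators, where each `yield` stands in for the JMPRET; it is a conceptual model only, and the task names are hypothetical.

```python
def task(name, steps, log):
    """A task that does one unit of work, then hands control away."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield        # plays the role of JMPRET: remember where we are, run the other task


def run_pair(a, b):
    """Alternate between two tasks until both are finished."""
    tasks = [a, b]
    while tasks:
        for t in list(tasks):
            try:
                next(t)
            except StopIteration:
                tasks.remove(t)


log = []
run_pair(task("rx", 3, log), task("tx", 2, log))
print(log)   # ['rx0', 'tx0', 'rx1', 'tx1', 'rx2']
```

As with JMPRET, nothing preempts a task: each one decides when to hand over, so a task that never yields starves the other.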

Tracy Allen · 2007-01-10 23:57

hinv, I think part of the answer to your question 1 is that yes, it would be a 16-cycle round robin. However, to compensate:
-- the new Prop would have a faster clock (up to 160 MHz), compared to the currently suggested maximum of 80 MHz
-- instructions are pipelined at 1:1 instead of 1:4, so compare instruction execution at 160 MIPS to the current 20 MIPS
-- the hub rotation would be 16 clock cycles for 16 cogs, compared to the current 16 clock cycles for 8 cogs
-- hub instructions would take 2 clock cycles compared to the current 7 (leaving 14 clock cycles "free" between accesses if there are 16 cogs)
-- the chip would have 64 I/O pins, so you could dedicate some of those to a fast parallel memory scheme, or use high-speed serial. This is, after all, an embedded processor, and whether any more memory is needed at all, and of what type, will be highly application-dependent.
(above specs to be taken with a grain of salt)
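These figures can be checked with a little arithmetic. The sketch below plugs in the numbers from the post (flagged there as tentative) alongside today's chip; using the best-case hub-instruction length, and counting how many ordinary instructions fit in the leftover clocks, are simplifying assumptions. `CogTiming` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class CogTiming:
    clock_hz: float
    cogs: int
    clocks_per_instr: int      # ordinary (non-hub) instruction
    hub_rotation_clocks: int   # clocks between one cog's hub windows
    hub_instr_clocks: int      # best-case length of a hub instruction

    def mips_per_cog(self):
        return self.clock_hz / self.clocks_per_instr / 1e6

    def hub_windows_per_sec(self):
        return self.clock_hz / self.hub_rotation_clocks

    def instrs_between_hub_ops(self):
        """Ordinary instructions that fit between back-to-back hub accesses."""
        free = self.hub_rotation_clocks - self.hub_instr_clocks
        return free // self.clocks_per_instr


current = CogTiming(80e6, 8, 4, 16, 7)      # Propeller today
proposed = CogTiming(160e6, 16, 1, 16, 2)   # figures from the post (tentative)

for name, t in (("current", current), ("proposed", proposed)):
    print(name, t.mips_per_cog(), t.hub_windows_per_sec(), t.instrs_between_hub_ops())
# current 20.0 5000000.0 2
# proposed 160.0 10000000.0 14
```

So even with twice as many cogs sharing the hub, each cog would get hub windows twice as often, and could do far more work between them.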

There was discussion earlier in this thread about how to prioritize hub access, but it got complicated fast.

Post Edited (Tracy Allen) : 1/11/2007 12:02:16 AM GMT

pjv · 2007-01-11 00:26

Hi All;

I'm really glad the interrupt thing has surfaced ...... for high performance, the lack of it leaves a huge hole.

Having written a really tight preemptive multitasking OS for the SX (only 99 bytes long), I drool at the thought of having an interrupt available in each cog to do some awesome deterministic stuff with the Propeller.

The interrupt could be very simple: just a jump vector on a counter match (JMPEQ). The silicon would not need to do much of a context save, perhaps only the return address... the interrupt handler can likely do the rest, although I will need to think that through.

Having 8 cogs is great, and 16 is even greater, but at 160 MIPS each, they spend too much time sitting and waiting for something to happen. With a simple-minded clock-based interrupt one would be able to make this chip SCREAM through mountains of code and keep its determinism!
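The proposal (vector on a counter match, with hardware saving only the return address) can be modeled in a few lines. This is a toy Python simulation of a hypothetical feature: the current Propeller has no interrupts, and the `reti` op, the list-based program layout, and the name `run_cog` are all invented for illustration. A match that arrives while the handler is running is simply ignored here, one of the details a real design would have to settle.

```python
def run_cog(main, handler, match_every, clocks):
    """Toy cog executing one instruction per clock.

    `main` loops forever. Whenever the free-running counter reaches a multiple of
    `match_every`, execution vectors to `handler`, saving only the return address.
    `handler` must end with the hypothetical "reti" op.
    """
    pc, ret, in_handler = 0, 0, False
    trace = []
    for counter in range(1, clocks + 1):
        if counter % match_every == 0 and not in_handler:   # matches inside the handler are dropped
            ret, pc, in_handler = pc, 0, True               # save return address, jump to vector
        if in_handler:
            op = handler[pc]
            trace.append(op)
            if op == "reti":
                pc, in_handler = ret, False                 # resume main where it left off
            else:
                pc += 1
        else:
            trace.append(main[pc])
            pc = (pc + 1) % len(main)
    return trace


print(run_cog(["a", "b", "c"], ["h", "reti"], match_every=4, clocks=8))
# ['a', 'b', 'c', 'h', 'reti', 'a', 'b', 'h']
```

Because the match is driven by the counter, the handler starts on an exact clock every time, which is the determinism argument for this kind of interrupt.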

Cheers,

Peter (pjv)

Bill Henning · 2007-01-11 00:50

Hi pjv,

My large model code addresses this, I have an (untested) multi-tasking kernel written for cogs that allows them to execute code in hub memory normally with 95%+ of the performance of executing code out of cog memory (when single threaded); theoretically this code can run 20+ threads per cog. On the next generation propeller it would be trivial to extend the pico kernel to have "interrupts" that can be checked every 'n' large model instructions, vectoring to either cog or large model code; the infrastructure for this is already in place.
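The core idea behind the large memory model is a small loop in the cog that reads the next long from hub RAM, advances the program counter by 4, and executes what it fetched. The Python sketch below mimics that loop with a made-up three-instruction set; it is not Bill's kernel and does not use real Propeller encodings. In this toy, branches simply rewrite the kernel's `pc`.

```python
def lmm_run(hub, regs, max_steps=10_000):
    """Toy large-memory-model kernel.

    `hub` holds one instruction per long; pc is a byte address, so it advances by 4.
    """
    pc = 0
    for _ in range(max_steps):
        op, *args = hub[pc // 4]   # fetch the next long from hub memory
        pc += 4                    # point at the following long
        if op == "mov":            # mov reg, #imm
            regs[args[0]] = args[1]
        elif op == "add":          # add reg, reg
            regs[args[0]] += regs[args[1]]
        elif op == "djnz":         # branches just rewrite the kernel's pc
            regs[args[0]] -= 1
            if regs[args[0]] != 0:
                pc = args[1]
        elif op == "halt":
            break
    return regs


program = [
    ("mov", "acc", 0),      # 0
    ("mov", "n", 4),        # 4
    ("mov", "k", 3),        # 8
    ("add", "acc", "k"),    # 12: loop body
    ("djnz", "n", 12),      # 16
    ("halt",),              # 20
]
print(lmm_run(program, {}))   # {'acc': 12, 'n': 0, 'k': 3}
```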

The reason I have not been posting more code and work on this is lack of time; currently I am working 70h+/wk, leaving little time for the Propeller. However, I *HAVE* been squeezing some time in, working on the needed tool chain. Chip's ORGX extension to the Spin environment only whetted my appetite; the IDE was still very limiting for large model, so I started to work on a tool chain and environment for large model code (single threaded and multi-threaded pico kernels for cogs, a memory management library, and a large model assembler, to be followed by a linker and a large model compiler).

The current status is:

- single threaded pico kernel completed, not tested
- multi-threaded pico kernel completed, not tested
- memory management infrastructure defined, not implemented yet
- macro assembler 80% completed
- linker is currently being designed

Given the progress being made on decoding the spin byte code, I even have some hope of being able to launch spin objects under my large model infrastructure... and I'm trying to keep the assembler as compatible with Spin's as I can, so most assembly objects will either need no modifications, or just some trivial ones.

The assembler supports conditional assembly, nested include files, nested macros, and has built in support for generating large model code.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers

scotta · 2007-01-11 02:03

The ability to single step cogA from cogB, through hardware or software

pjv · 2007-01-11 05:41

Hi Bill;

Well, you HAVE been busy......

Do I then understand that the code executed is assembler code? And determinism is totally the case?

Can this work while still operating other deterministic code in different, perhaps unrelated cogs?

Keep up the good work .... at the pace you're working, I hope your body doesn't give out !

Cheers,

Peter (pjv)

Bill Henning · 2007-01-11 06:50

Hi pjv,
Busy is good :)
The code executed is assembler code; see the 'large memory model' thread I started a couple of months ago for more info. Fortunately/unfortunately that is when I picked up the other consulting gig, and my hours went nuts, not leaving much time for working with the propeller.
The large model code is not as deterministic as code running in a cog, but if you are careful and schedule instructions carefully, it can be fairly deterministic. If you are willing to give up some performance, it can be quite deterministic: without unrolling the fetch/exec loop, it takes 32 cycles per instruction for most instructions, and predictable times for others, compared to 20 cycles per instruction for the four-way unrolled loop. You can also have totally deterministic FCACHE'd blocks :)
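The cycle counts above translate into throughput as follows, assuming an 80 MHz clock. The 32-clock figure is exactly two 16-clock hub rotations, which is one way to see why the plain loop lands on its hub windows the same way every time. `lmm_mips` is a hypothetical helper.

```python
CLOCK_HZ = 80e6         # assumed system clock (the currently suggested maximum)
HUB_ROTATION = 16       # clocks between a cog's hub windows


def lmm_mips(cycles_per_instr, clock_hz=CLOCK_HZ):
    """Million large-model instructions per second for a given fetch/execute cost."""
    return clock_hz / cycles_per_instr / 1e6


print(lmm_mips(20))               # 4.0  (four-way unrolled loop)
print(lmm_mips(32))               # 2.5  (plain loop)
print(32 % HUB_ROTATION == 0)     # True: the plain loop spans exactly two hub rotations
```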
Each cog may run in one of the following modes:
- small model (regular assembly code, compatible with current drivers and cog code)
- large model / single threaded
- large model / multi-threaded
I also really hope to support cogs running Spin code, Forth code, etc etc
This evening I had some time to work on the assembler. It is now parsing source files, including nested includes, but it is not quite generating code yet (even though all the information is there; the instruction table has the binary bit patterns, the effect codes are defined, the condition codes too :-) ), nor are macros fully implemented yet. I'm within a few hours of work of generating static object code; so I'm pausing to define a loadable object format. I'm thinking something simple to start with, assembling to static addresses, generating a .obj file and a .sym file for the linker (to be written).
I'll probably start putting up some design docs on my blog soon.
Best,
Bill
p.s.
Thanks for the concern - my body did give out a bit yesterday... I slept right through my LOUD alarm clock. Guess I was exhausted.
UPDATE: the assembler is generating code now :-) .... I am testing it for correctness now

Post Edited (Bill Henning) : 1/11/2007 8:52:49 AM GMT

Gadgetman · 2007-01-11 09:24

I'm moving more and more over to the 16COG camp...

The reason?
I got hold of a 'Silverlit X-uFO' RC-controlled flying... whatever...
I want to rip out the junk electronics and put a Propeller inside, making it semi-autonomous.
(Of course a machine with 4 spinning propellers needs a Propeller chip)

The problem?
I NEED to read 2 to 5 analog signals for stability (a 2-axis gyro and, as I progress, a 3-axis accelerometer; these replace the original 2-axis mechanical whirligig gyro), and I'll be using IR proximity sensors to avoid bumping into stuff. That will take at least 2 (bottom and forward) and up to 6 (top, bottom, four directions) A/D inputs. That is at least 4 A/D inputs, with a theoretical maximum of 11.

That doesn't leave a lot of COGs free to do D/A to control the 4 separately powered motors...
(Not to mention, any AI or receiving commands)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't visit my new website...

Gavin · 2007-01-11 12:22

Gadgetman,
Get PWM-output sensors, not the analog ones, and measure the pulse width with a cog.
You could even multiplex them onto one input pin if you want.
There are two timers per cog, but I reckon you only need one cog for 4 PWM motor speeds.
A good way to start would be a ballbot: write code to stop it falling over.
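Reading a PWM-output sensor comes down to timing the high portion and the period of its output; the duty cycle then encodes the reading. On the chip this would be done with a cog's counters or wait instructions; the Python sketch below works on a list of pin samples taken at a fixed rate instead, and `measure_pwm` is a hypothetical name.

```python
def measure_pwm(samples):
    """Return (high_time, period) in samples for the first full PWM cycle, or None.

    A cycle runs from one rising edge to the next.
    """
    rises = [i for i in range(1, len(samples))
             if samples[i - 1] == 0 and samples[i] == 1]
    if len(rises) < 2:
        return None                      # not enough edges for a full cycle yet
    start, end = rises[0], rises[1]
    high = sum(samples[start:end])       # samples spent high within the cycle
    return high, end - start


wave = [0, 0] + ([1] * 3 + [0] * 7) * 3  # 30% duty, 10-sample period
high, period = measure_pwm(wave)
print(high, period, high / period)       # 3 10 0.3
```

Measuring from rising edge to rising edge makes the result independent of where sampling starts, at the cost of waiting up to two periods for the first reading.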

I had a similar idea; I'm looking for a very small GPS unit to add to a gyro and accelerometers for a micro navigation module.
The Prop is perfect for UAV stuff. I was thinking the same module could plug into a car DVD player and read maps off an SD card.

I got my second micro SD card and ordered the Hydra book to study graphics; maps only need 8 colours, but I don't know enough about bitmaps on the Prop. Putting 3D flight paths onto an SD card is another level above that.

Gavin

pjv · 2007-01-11 16:27

Hi Gadgetman;

You don't necessarily have to use the Parallax-proposed counter method of making a virtual A/D. The method used in the SXes also works fine: use the cog's counter as the time base that sets the epoch of the conversion, and internal cog memory as accumulators. In this manner a single cog can operate as 16 A/Ds, limited by the 32-pin count.

For slightly poorer performance and speed, you can in fact create up to 32 single pin virtual A/Ds all in one cog. In this case the limit would be the width of the bus, as well as the pin count.
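The single-cog scheme can be sketched as one loop that services every channel in turn, with a fixed number of samples (the epoch) per conversion and one accumulator per channel. This Python model is idealized (perfect comparators, no loop delay) and the function name is hypothetical; with two pins per channel, 32 pins cap it at 16 channels. In a real cog, the time each pass takes sets the sample rate for every channel, so more channels means slower conversions.

```python
def multi_channel_adc(vins, epoch):
    """Service several idealized delta-sigma channels from one loop.

    vins: each channel's input as a fraction of full scale (0.0 to 1.0).
    epoch: samples per conversion (set by the cog's counter in this scheme).
    Returns each channel's accumulator; acc/epoch ~ input.
    """
    n = len(vins)
    integ = [0.0] * n   # per-channel integrator (the external RC network)
    fb = [0] * n        # per-channel feedback pin state
    acc = [0] * n       # per-channel accumulator held in cog RAM
    for _ in range(epoch):
        for ch, v in enumerate(vins):   # one pass touches every channel
            integ[ch] += v - fb[ch]
            fb[ch] = 1 if integ[ch] > 0 else 0
            acc[ch] += fb[ch]
    return acc


print(multi_channel_adc([0.0, 0.25, 0.5, 1.0], 1000))   # [0, 250, 500, 1000]
```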

Cheers,

Peter (pjv)

Gadgetman · 2007-01-11 17:55

Thanks for all the tips...

It may be an idea to take this to another thread before the great guys at Parallax accuse me of thread hijacking...
http://forums.parallax.com/forums/default.aspx?f=15&m=159155

crgwbr · 2007-01-11 18:07

I might have read it incorrectly, but did someone say 160 MIPS per cog and 16 cogs? If that's true, that would be a total of 2,560 MIPS, roughly "2.56 GHz" worth of processing power. That's about double the speed my computer runs at. Add a hard drive and some more RAM and you've got yourself a pretty decent desktop computer.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
NerdMaster
For
Life
