What would you want more of, cogs or RAM? - Page 8 — Parallax Forums

What would you want more of, cogs or RAM?

Comments

  • Paul Baker Posts: 6,351
    edited 2006-12-20 19:02
    Mike is correct, the ADC capabilities were a serendipitous discovery, but because it wasn't deliberately planned for, it isn't optimized. The next chip will likely have one key element changed to make the current ADC capabilities better.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Paul Baker
    Propeller Applications Engineer

    Parallax, Inc.

    Post Edited (Paul Baker (Parallax)) : 12/20/2006 7:09:20 PM GMT
  • paulmac Posts: 51
    edited 2006-12-20 23:08
    Hi all,

    The Propeller is the chip which finally got me interested enough in microcontrollers to actually spend cash and try to implement some ideas. Why? 8 cogs! It's jaw-dropping stuff.
    I think that 16 would be even better.
  • Christof Eb. Posts: 1,175
    edited 2006-12-21 09:15
    Hi Chip and all,

    well, it is nearly Christmas, time for some wishes?

    16 cogs, because the cogs give the chip its great versatility and stand in for the dedicated special I/O features of other controllers,
    and as much RAM as possible.

    and:
    for easy interfacing:
    * full 5 V compatibility without these 1k resistors
    * internal switchable pullup resistors
    * the delta-sigma ADC concept is good and rather fast, I think. Perhaps it could be improved with some hysteresis to reduce the noise. A dedicated reference voltage input could possibly improve accuracy?

    * I would like hub access to be equal for all cogs by default, with the possibility of assigning slots for access, because there will always be slow tasks like serial I/O.

    * Assembler programs should be able to run from main memory too. Yes, this will be very much slower than assembler is now. But it will open the system to compilers. And a Spin compiler will still be faster than the Spin interpreter. I personally like Spin. But I think, if you want to sell this chip to industry, the use of a known language is a key feature. "Register" variables could reside in cog RAM.

    * Ability for high-level debugging. I think for this there must be a way to freeze all cogs at a breakpoint in one cog. Stepping must then restart all cogs from that position until the next breakpoint is reached. The cog RAM should be readable, or copied into hub RAM, after a breakpoint.
  • cgracey Posts: 14,134
    edited 2006-12-21 21:55
    Sorry for my silence on this matter, but the capability was planned. It can be optimized on the next chip, though, by closing the feedback loop between two adjacent pins. That means the signals don't have to go across the chip, in and out of cogs, and back out, eating valuable feedback time.
    Paul Baker (Parallax) said...
    Mike is correct, the ADC capabilities were a serendipitous discovery, but because it wasn't deliberately planned for, it isn't optimized. The next chip will likely have one key element changed to make the current ADC capabilities better.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • IanM Posts: 40
    edited 2006-12-21 22:55
    Chip, any details yet on improving the RF signal generation (in terms of purity of signal) for the Prop 2?

    Cheers, Ian

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Ian Mitchell
    www.research.utas.edu.au
  • Paul Baker Posts: 6,351
    edited 2006-12-22 01:19
    We talked about arbitrary N/M PLLs, but with 32 instances (if we go with 16 cogs) that's a lot of real estate.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Paul Baker
    Propeller Applications Engineer

    Parallax, Inc.
  • scotta Posts: 168
    edited 2007-01-10 18:52
    It would be nice to have the 8 cog processor, with another 8 cogs that don't share
    the hub, but communicate with the other cogs with a few bytes of ram.

    Reason: Most of my cogs are doing real-time background tasks; they only need a few
    bytes in and a few bytes out to function.

    Larger ram would help writing a C compiler.

    How about floating-point support in hardware?

    Scott
  • GdSis Posts: 12
    edited 2007-01-10 19:44
    Hello,

    The more I use this chip, the more I like it. It is very different from other uCs, so different that I find it hard to write programs without some kind of interrupts. I know I still have to shift my mental paradigm to match this new chip, but even with 8 cogs it feels like a waste to poll for events or to use wait instructions. The POS/NEG detector counter modes somewhat address pin/timer events, but polling counter results while doing something else puts you out of sync with those events. IMHO, a cog-local interrupt (maybe implemented as a new counter mode?) would speed things up even more, allowing tasks to be multiplexed inside a cog.

    Another important thing I couldn't address yet is some copy protection measure.

    Respectfully, Gus
  • Tracy Allen Posts: 6,662
    edited 2007-01-10 20:04
    When this choice first came up, I voted for the more-RAM option. But the more I think about it and consider comments here, I want to change my vote to more cogs. That really is the core attraction of the Prop.

    An aside: In yesterday's MacWorld extravaganza, Steve Jobs quoted guru Alan Kay, "People who are really serious about software should design their own hardware." That could certainly apply to Parallax and the Propeller.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Tracy Allen
    www.emesystems.com
  • hinv Posts: 1,255
    edited 2007-01-10 20:20
    When it comes to interrupts, I never really got the hang of them. When I last played with microcontrollers, it was a 68HC11, and if I remember right, the interrupts were too complicated, so I never actually implemented them. I used to be in the hardware peripheral camp because hardware was sooooo much faster than software.

    With the Propeller, things are quite a bit different. You can use a COG as your hardware device, and it should be a simple programming paradigm to pick up. Correct me if I am wrong, but you should be able to multitask in much the way DOS did, not by preempting the running program, but by a program scheduling time with a kernel, in a TSR fashion. For a Unix/Irix guy who hated DOS, this may be a strange thing for me to be saying, but I understand Unix's shortcomings in real-time applications.

    My recommendation is to go with 16 cogs ONLY if the unused ones wouldn't use up shared timeslots as they do in the current situation. Failing that, 8 cogs, 256kb.
    I am REALLY excited either way about having 64 I/Os!

    When it comes down to it, I hope that whatever path you choose, you gain popularity. I don't understand why Parallax wasn't even listed in the Embedded System Design October 2006 article "What Processor is in your product?", even when they listed down to 1%. I hope the Propeller fixes this.

    Thanks for reading the ramblings of a newbie,

    Doug
  • Mike Green Posts: 23,101
    edited 2007-01-10 20:44
    Keep in mind that one of the reasons for the Propeller being designed this way is to avoid the need for multitasking. Any time you share a processor among several execution "threads", you have the overhead of context switching (saving flags, registers, program counter, etc., then restoring them from somewhere else) and the added level of complexity for actions that might take some time (like I/O). When there are enough processors to do the needed simultaneous work, this all gets much simpler, efficient, and easier to understand (with the corresponding increase in reliability).

    Part of the tension between having 8 cogs (and more memory) vs. 16 cogs (and the same memory) is that there's been enough development of code that some of us are already using 8 or close to 8 cogs and foresee the need for more, yet would like more memory as well.
  • hinv Posts: 1,255
    edited 2007-01-10 21:11
    Hi Mike,

    I understand the paradigm, and can't wait to get my demo board here and run your OS on it. I do wonder, however, at 160MHz/160MIPS per cog, why one would not multitask since things like watching a serial line use very few of those mips. If your clock is at 160MHz, doesn't that mean that all of the COGs will run at 160MHz? If so, that means quite a lot of power/heat used/generated just watching a serial line for instance. I would think that it would be better to do a sort of TSR type multitasking on a cog when cog utilization or power is an issue.
  • GdSis Posts: 12
    edited 2007-01-10 21:55
    Mike Green said...
    Keep in mind that one of the reasons for the Propeller being designed this way is to avoid the need for multitasking. Any time you share a processor among several execution "threads", you have the overhead of context switching (saving flags, registers, program counter, etc., then restoring them from somewhere else) and the added level of complexity for actions that might take some time (like I/O). When there are enough processors to do the needed simultaneous work, this all gets much simpler, efficient, and easier to understand (with the corresponding increase in reliability).

    Part of the tension between having 8 cogs (and more memory) vs. 16 cogs (and the same memory) is that there's been enough development of code that some of us are already using 8 or close to 8 cogs and foresee the need for more, yet would like more memory as well.
    Mike,
    The key words here are "When there are enough processors to do the needed simultaneous work". Without interrupts you soon hit the 8-cog barrier. I'm just starting with this chip and I already did; I'm sure you did too. Then you have to start cleverly ping-ponging to do more tasks... and that adds all kinds of problems such as lost determinism, out-of-sync events, etc. That, in my opinion, adds more overhead than a simple context switch. I feel it's a waste to have a 20/160 MIPS cog sit there just to watch an event. That said, I like the current design's simplicity, but I think a cog can give much more juice.

    Respectfully, Gus
  • Mike Green Posts: 23,101
    edited 2007-01-10 22:12
    With the cost of silicon real estate being what it is (low and going lower), it's not a waste to have an idle cog, particularly if it doesn't take any significant power, if what you get is simpler, more reliable, cheaper to produce code that may in fact just wait for an event to happen.
  • GdSis Posts: 12
    edited 2007-01-10 22:12
    hinv said...
    Hi Mike,

    I understand the paradigm, and can't wait to get my demo board here and run your OS on it. I do wonder, however, at 160MHz/160MIPS per cog, why one would not multitask since things like watching a serial line use very few of those mips. If your clock is at 160MHz, doesn't that mean that all of the COGs will run at 160MHz? If so, that means quite a lot of power/heat used/generated just watching a serial line for instance. I would think that it would be better to do a sort of TSR type multitasking on a cog when cog utilization or power is an issue.
    hinv,
    A cog's wait function runs in a low-power state, drawing very little power, so that isn't a problem, but you are right about the MIPS thing.
    TSR pseudo-multitasking in the old DOS is interrupt driven!

    Gus
  • hinv Posts: 1,255
    edited 2007-01-10 22:54
    Sorry for the ignorance, but can a program running on one cog suspend or stop code running on another cog? I had suspected that TSRs were interrupt driven, but there may be another way to do it.
    If you are a programmer running out of cogs, and for instance you are monitoring 4 serial ports, you could have 1 cog monitoring 4 of them, and do other stuff too in the main loop, but this would not lend itself to reusable, cheaper to produce code.

    The real issues in question as I see it are:
    1) Is the shared resource system going to be the same round-robin approach, which would lead to lower-performance shared resources with more cogs?
    2) Is there going to be an inexpensive, high-speed way to add memory? It doesn't even have to be shared for some apps, but it has to be fast and reliable. I have seen the 30-pin SIMM solution, but I don't know how fast it is because I am still waiting for my Demo Board.

    If those 2 problems are satisfactorily solved, put me in the 16-cog camp, which I would think would use less power for those apps that fit in 8 cogs and 128k.
  • Mike Green Posts: 23,101
    edited 2007-01-10 23:13
    A program running on one cog can stop any cog (including itself). There is no way to suspend a cog although the program running in a cog can wait for some outside event consuming little power.

    You're not likely to see an inexpensive, high speed way to add memory. Serial memory is relatively slow. Parallel memory is very consumptive of I/O pins (therefore chip area) and power (because of the speed and the power demand of off-chip connections). Still, SPI serial memory can easily be clocked in excess of 2MHz with the current Propeller and only uses 3-4 I/O pins.
  • Lawson Posts: 870
    edited 2007-01-10 23:42
    Hopefully the Prop 2 will include the suggested upgrades to the counter/video hardware mentioned earlier in this thread. The mods that accelerate synchronous serial communication (like SPI or I2C) will go a LONG way to solving the COG to COG, Prop to Prop, and COG to external memory communication issues.

    On the issue of interrupts: I hate them personally, but with the current Prop, interrupts can be simulated with Bill Henning's primitives (i.e. running assembly right out of the hub). Thankfully, interrupts are only one of many ways to simulate multitasking on a microprocessor. Among the Seattle Robotics Society Encoder back articles is one on a cooperative multitasking "OS." Basically, each task periodically stores its context and passes control to a task manager, which then decides which task to run next. Another method I've seen is to use state machines. This is a simple, extensible method to create a fast-polling system to simulate multitasking. I'm sure there are many other methods too.
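
    For illustration only, the cooperative scheme described above (each task gives up control voluntarily and a task manager picks the next one) can be sketched in Python. This is not Propeller code and not the code from the Encoder article; the task names and the round-robin policy are assumptions made for the example, and a generator's `yield` stands in for "store the context and pass control".

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: do one unit of work, then hand control back."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # the generator keeps the task's context until it is resumed

def run_round_robin(tasks):
    """Task manager: resume each ready task in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task up to its next yield
            ready.append(current)  # still alive, so queue it again
        except StopIteration:
            pass                   # task finished; drop it from the queue

log = []
# "serial" and "blink" are hypothetical task names used only for this sketch.
run_round_robin([task("serial", 2, log), task("blink", 3, log)])
print(log)
```

    No task is ever preempted here, so timing stays predictable only as long as every task yields promptly, which is the same trade-off as a polled state machine.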

    My 2 cents,
    Marty
  • Paul Baker Posts: 6,351
    edited 2007-01-10 23:48
    The cooperative multitasking method you describe is already possible using the JMPRET instruction. A lightweight multitasking routine using JMPRET is shown in the FullDuplexSerial object. A full scale context switching system is possible using the example as a base.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Paul Baker
    Propeller Applications Engineer

    Parallax, Inc.
  • Tracy Allen Posts: 6,662
    edited 2007-01-10 23:57
    Hinv, I think part of the answer to your question 1 is that yes, it would be a 16-cycle round robin. However, to compensate:
    -- the new Prop would have a faster clock (up to 160 MHz), compared to the currently suggested max of 80 MHz
    -- instructions would be pipelined at 1:1 instead of 1:4, so compare instruction execution at 160 MIPS to the current 20 MIPS
    -- the hub rotation would be 16 clock cycles for 16 cogs, compared to the current 16 clock cycles for 8 cogs
    -- hub instructions would take 2 clock cycles compared to the current 7 clock cycles (leaving 14 clock cycles "free" between accesses if there are 16 cogs)
    -- the chip would have 64 I/O pins, so you could dedicate some of those to a fast parallel memory scheme, or use the high-speed serial. This is after all an embedded processor, and the need for any more memory at all, and the specific type required, will be highly application dependent.
    (above specs to be taken with grain of salt)
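
    As a rough check of the figures in the list above, the per-cog numbers can be worked out in a few lines of Python. The function and its parameter names are invented for this sketch, and the proposed-chip inputs are the speculative values quoted in the post, not confirmed specifications.

```python
def cog_timing(clock_hz, clocks_per_instr, rotation_clocks, hub_instr_clocks):
    """Derive per-cog figures from clock and hub parameters (integer math)."""
    return {
        # instructions per second per cog, in millions
        "mips": clock_hz // clocks_per_instr // 1_000_000,
        # time between one cog's consecutive hub access windows
        "hub_window_ns": rotation_clocks * 1_000_000_000 // clock_hz,
        # clocks left over in each rotation after one hub instruction
        "free_clocks": rotation_clocks - hub_instr_clocks,
    }

# Current chip: 80 MHz, 4 clocks per instruction, 16-clock rotation, 7-clock hub op.
current = cog_timing(80_000_000, 4, 16, 7)
# Proposed chip (speculative): 160 MHz, 1 clock per instruction, 2-clock hub op.
proposed = cog_timing(160_000_000, 1, 16, 2)

print("current: ", current)
print("proposed:", proposed)
```

    On these assumptions each cog would reach the hub every 100 ns instead of every 200 ns, even though twice as many cogs share the rotation.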

    There was discussion earlier in this thread about how to prioritize hub access, but it got complicated fast.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Tracy Allen
    www.emesystems.com

    Post Edited (Tracy Allen) : 1/11/2007 12:02:16 AM GMT
  • pjv Posts: 1,903
    edited 2007-01-11 00:26
    Hi All;

    I'm really glad the interrupt thing has surfaced ...... for high performance, the lack of it leaves a huge hole.

    Having written a really tight preemptive multi tasking OS for the SX (only 99 bytes long), I drool at having an interrupt available in each cog to be able to do some awesome deterministic stuff with the Propeller.

    The interrupt could be very simple; just a jump vector on a counter match (JMPEQ). The silicon would not need to do much of a context save, perhaps only the return address ... the interrupt handler can likely do the rest, although will need to think that through.

    Having 8 cogs is great, and 16 is even greater, but at 160 MIPS each, they are spending too much time sitting waiting for something to happen. With a simple-minded clock-based interrupt one would be able to make this chip SCREAM through mountains of code and keep its determinism!

    Cheers,

    Peter (pjv)
  • Bill Henning Posts: 6,445
    edited 2007-01-11 00:50
    Hi pjv,

    My large model code addresses this, I have an (untested) multi-tasking kernel written for cogs that allows them to execute code in hub memory normally with 95%+ of the performance of executing code out of cog memory (when single threaded); theoretically this code can run 20+ threads per cog. On the next generation propeller it would be trivial to extend the pico kernel to have "interrupts" that can be checked every 'n' large model instructions, vectoring to either cog or large model code; the infrastructure for this is already in place.

    The reason I have not been posting more code and work on this is due to a lack of time; currently I am working 70h+/wk leaving little time for the propeller; however I *HAVE* been squeezing some time in; working on the needed tool chain. Chip's ORGX extension to the Spin environment only whetted my appetite; the IDE was still very limiting for large model, so I started to work on a tool chain and environment for large model code (single threaded and multi-threaded pico kernels for cogs, a memory management library, and a large model assembler, to be followed by a linker and a large model compiler)

    The current status is:

    - single threaded pico kernel completed, not tested
    - multi-threaded pico kernel completed, not tested
    - memory management infrastructure defined, not implemented yet
    - macro assembler 80% completed
    - linker is currently being designed

    Given the progress being made on decoding the spin byte code, I even have some hope of being able to launch spin objects under my large model infrastructure... and I'm trying to keep the assembler as compatible with Spin's as I can, so most assembly objects will either need no modifications, or just some trivial ones.

    The assembler supports conditional assembly, nested include files, nested macros, and has built in support for generating large model code.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    www.mikronauts.com - a new blog about microcontrollers
  • scotta Posts: 168
    edited 2007-01-11 02:03
    The ability to single step cogA from cogB, through hardware or software
  • pjv Posts: 1,903
    edited 2007-01-11 05:41
    Hi Bill;

    Well, you HAVE been busy......

    Do I then understand that the code executed is assembler code? And determinism is totally the case?

    Can this work while still operating other deterministic code in different, perhaps unrelated cogs?

    Keep up the good work .... at the pace you're working, I hope your body doesn't give out !

    Cheers,

    Peter (pjv)
  • Bill Henning Posts: 6,445
    edited 2007-01-11 06:50
    Hi pjv,
    Busy is good :)
    The code executed is assembler code; see the 'large memory model' thread I started a couple of months ago for more info. Fortunately/unfortunately that is when I picked up the other consulting gig, and my hours went nuts, not leaving much time for working with the propeller.
    The large model code is not as deterministic as code running in a cog, but if you are careful, and schedule instructions carefully, it can be fairly deterministic; if you are willing to give up some performance, it can be quite deterministic (by not unrolling the fetch/exec loop, it will then take on 32 cycles per instruction for most instructions, and predictable times for others, compared to the 20 cycles per instruction for the four-way unrolled loop); however you can have totally deterministic FCACHE'd blocks :)
    Each cog may run in one of the following modes:
    - small model (regular assembly code, compatible with current drivers and cog code)
    - large model / single threaded
    - large model / multi-threaded
    I also really hope to support cogs running Spin code, Forth code, etc etc
    This evening I had some time to work on the assembler. It is now parsing source files, including nested includes, but it is not quite generating code yet (even though all the information is there; the instruction table has the binary bit patterns, the effect codes are defined, the condition codes too :-) ), nor are macros fully implemented yet. I'm within a few hours of work of generating static object code; so I'm pausing to define a loadable object format. I'm thinking something simple to start with, assembling to static addresses, generating a .obj file and a .sym file for the linker (to be written).
    I'll probably start putting up some design docs on my blog soon.
    Best,
    Bill
    p.s.
    Thanks for the concern - my body did give out a bit yesterday... I slept right through my LOUD alarm clock. Guess I was exhausted.
    UPDATE: assembler is generating code now :-) .... I am testing it for correctness now
    pjv said...
    Hi Bill;

    Well, you HAVE been busy......

    Do I then understand that the code executed is assembler code? And determinism is totally the case?

    Can this work while still operating other deterministic code in different, perhaps unrelated cogs?

    Keep up the good work .... at the pace you're working, I hope your body doesn't give out !

    Cheers,

    Peter (pjv)
    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    www.mikronauts.com - a new blog about microcontrollers

    Post Edited (Bill Henning) : 1/11/2007 8:52:49 AM GMT
  • Gadgetman Posts: 2,436
    edited 2007-01-11 09:24
    I'm moving more and more over to the 16COG camp...

    The reason?
    I got hold of a 'Silverlit X-uFO' RC-controlled flying... whatever...
    I want to rip out the junk electronics and put a Propeller inside, making it semi-autonomous.
    (Of course a machine with 4 spinning propellers needs a Propeller chip)

    The problem?
    I NEED to read 2 - 5 analog signals for stability (a 2-axis gyro and, as I progress, a 3-axis accelerometer; this replaces the original 2-axis mechanical whirlygig gyro), and I'll be using IR proximity sensors to avoid bumping it into stuff. That will take at least 2 (bottom and forwards) and up to 6 (top, bottom, four directions) AD inputs. That is at least 4 AD inputs, with a theoretical 11 possible.

    That doesn't leave a lot of COGs free to do DA to control the 4 separately-powered motors...
    (Not to mention, any AI or receiving commands)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Don't visit my new website...
  • Gavin Posts: 134
    edited 2007-01-11 12:22
    Gadgetman,
    Get pwm output sensors not the analog ones, measure period width with a cog.
    Could even multiplex them if you want onto one input pin.
    Two timers per cog, but I reckon you only need one cog for 4 pwm motor speeds.
    Good way to start would be a ball bot, write code to stop it falling over.

    Had a similar idea, looking for a very small GPS unit to add to gyro, accels for micro navigation module.
    Prop is perfect for UAV stuff. Was thinking same module could plug into CAR DVD player and read maps off SD card.

    Got my second micro SD card and ordered the hydra book to study graphics, maps only need 8 colours but I don't know enough about bitmaps with the prop. Putting 3D flight paths into a SD card is another level above that.

    Gavin
  • pjv Posts: 1,903
    edited 2007-01-11 16:27
    Hi Gadgetman;

    You don't necessarily have to use the Parallax-proposed counter method of making a virtual A/D. The method used in the SXes also works fine; then use the cog's counter as the time base for setting the epoch of the conversion, and internal cog memory as accumulators. In this manner a single cog can operate as 16 A/Ds, limited by the 32-pin count.

    For slightly poorer performance and speed, you can in fact create up to 32 single pin virtual A/Ds all in one cog. In this case the limit would be the width of the bus, as well as the pin count.

    Cheers,

    Peter (pjv)
  • Gadgetman Posts: 2,436
    edited 2007-01-11 17:55
    Thanks for all the tips...

    It may be an idea to take it to another thread before the great guys at Parallax accuse me of thread hijacking...
    http://forums.parallax.com/forums/default.aspx?f=15&m=159155

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Don't visit my new website...
  • crgwbr Posts: 614
    edited 2007-01-11 18:07
    I might have read it incorrectly, but did someone say 160 MIPS per cog and 16 cogs? If that's true, that would equal a total of 2.56 GHz of processing power. That's about double the speed my computer runs at. Add a hard drive and some more RAM and you've got yourself a pretty decent desktop computer.
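
    The arithmetic behind that total, as a quick sketch using the speculative figures from earlier in the thread (the variable names are made up for the example):

```python
cogs = 16            # proposed cog count discussed in this thread
mips_per_cog = 160   # speculative per-cog instruction rate

# Aggregate instruction rate if every cog is kept busy at once.
aggregate_mips = cogs * mips_per_cog
print(aggregate_mips, "MIPS in total")
```

    Note that this is an aggregate instruction rate spread over 16 independent cogs rather than a clock frequency; each cog would still run at 160 MHz, so a single-threaded program would not see the combined figure.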

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    NerdMaster
    For
    Life