What would you want more of, cogs or RAM?

cgracey Posts: 11,130
edited 2010-12-15 - 16:21:50 in Propeller 1
What would you rather have in a future Propeller chip:

Option 1: 16 cogs with 128KB of hub RAM. Hub access once every 16 clocks.
Option 2: 8 cogs with 256KB of hub RAM. Hub access once every 8 clocks.

Note that each cog would run at about 160 MIPS, as opposed to the current 20 MIPS.
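
For comparison, the figures quoted for the two options can be tabulated with a few lines of Python. This is only a sketch of the arithmetic, assuming a 160 MHz clock at one instruction per clock (one reading of "about 160 MIPS"); `Option`, `summarize`, and `OPTIONS` are illustrative names, not part of any Propeller tool.

```python
from dataclasses import dataclass

CLOCK_HZ = 160_000_000  # assumption: 160 MHz clock, one instruction per clock


@dataclass(frozen=True)
class Option:
    name: str
    cogs: int
    hub_ram_kb: int
    clocks_per_hub_access: int  # each cog gets one hub slot every N clocks


def summarize(opt: Option) -> dict:
    """Derive aggregate figures from the numbers quoted in the post."""
    per_cog_hub_rate = CLOCK_HZ / opt.clocks_per_hub_access  # accesses/s per cog
    return {
        "total_mips": opt.cogs * CLOCK_HZ / 1e6,
        "hub_accesses_per_cog_mhz": per_cog_hub_rate / 1e6,
        "hub_ram_per_cog_kb": opt.hub_ram_kb / opt.cogs,
    }


OPTIONS = [
    Option("Option 1", cogs=16, hub_ram_kb=128, clocks_per_hub_access=16),
    Option("Option 2", cogs=8, hub_ram_kb=256, clocks_per_hub_access=8),
]

if __name__ == "__main__":
    for opt in OPTIONS:
        print(opt.name, summarize(opt))
```

Under these assumptions Option 1 works out to 2,560 aggregate MIPS with a 10 MHz hub access rate per cog, and Option 2 to 1,280 MIPS with 20 MHz per cog.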

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


Chip Gracey
Parallax, Inc.

Post Edited (Chip Gracey (Parallax)) : 11/25/2006 6:16:55 AM GMT

Comments

  • AndreL Posts: 1,004
    edited 2006-11-25 - 06:19:57
    Guess what -- why decide, build both! Seriously, problems that are compute intensive use one model, problems that are codespace intensive use the other prop. Think BIG, think 3-5 years from now. Microchip/AVR/Renesas each have about a zillion flavors, so I think that a little bit of variety might be the way to go here.

    Andre'
  • Cobalt Posts: 31
    edited 2006-11-25 - 06:32:29
    Personally, I would like to see more cogs (and hopefully more I/O pins with them).

    This is because I feel that with more than 8 cogs, you can make some very impressive embedded systems!
    I know that I would love to be able to make a system that can use multiple vga monitors, a keyboard, mouse and other interface nonsense. The extra cogs would let you run multiple vga's at higher resolutions and have enough cogs left to support the other stuff. More I/O pins would rock as well... because at 8 pins per vga hookup, the current Propeller can only support 4 monitors maximum - and that is using every pin possible.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    while alive = 1
    wakeup
    program(propeller)
    eat(3)
    sleep(7)
  • Bill Henning Posts: 6,445
    edited 2006-11-25 - 07:29:32
    Hmm... I surface from consulting work, and face a tough question...

    Before I can intelligently decide which way to vote, I'd like to know how many cycles the hub instructions will take...

    Also, I'm guessing you are targeting 160MHz... correct?
    Chip Gracey (Parallax) said...
    What would you rather have in a future Propeller chip:

    Option 1: 16 cogs with 128KB of hub RAM. Hub access once every 16 clocks.
    Option 2: 8 cogs with 256KB of hub RAM. Hub access once every 8 clocks.

    Note that each cog would run at about 160 MIPS, as opposed to the current 20 MIPS.
    www.mikronauts.com / E-mail: mikronauts _at_ gmail _dot_ com / @Mikronauts on Twitter
    RoboPi: The most advanced Robot controller for the Raspberry Pi (Propeller based)
  • cgracey Posts: 11,130
    edited 2006-11-25 - 07:32:27
    Bill Henning said...

    Hmm... I surface from consulting work, and face a tough question...

    Before I can intelligently decide which way to vote, I'd like to know how many cycles the hub instructions will take...

    Also, I'm guessing you are targeting 160MHz... correct?

    Hub instructions will take 2 clocks, so you can fit six regular instructions between them. 160MHz is the clock goal. Cogs are pipelined so instructions take 1 clock.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • Bill Henning Posts: 6,445
    edited 2006-11-25 - 07:40:29
    Very cool!

    16 cogs please with 128k!

    Are you also adding more ROM space?

    Plus the enhancements to the video shift registers and timers I suggested earlier, and the I/O strobes.

    oohhh the things we will be able to do!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    Chip....

    Are you going to stay in a 40-pin DIP package, or go to, say, a PLCC-84 to get 'port B' on-line? Having both choices would be good; it could be a packaging option.

    If you are going 84 pin, we can get the port strobes without sacrificing general purpose I/O pins!!!!!!

    Are branches still going to take effectively one clock if taken / two if not?

    If you need more cycles, how about adding delay slots, so that we can use the otherwise wasted cycles?
    Chip Gracey (Parallax) said...
    Bill Henning said...

    Hmm... I surface from consulting work, and face a tough question...

    Before I can intelligently decide which way to vote, I'd like to know how many cycles the hub instructions will take...

    Also, I'm guessing you are targeting 160MHz... correct?


    Hub instructions will take 2 clocks, so you can fit six regular instructions between them. 160MHz is the clock goal. Cogs are pipelined so instructions take 1 clock.
  • Phil Pilgrim (PhiPi) Posts: 22,325
    edited 2006-11-25 - 07:42:42
    Chip,

    Before I consider the ramifications of each: 'still 512 longs per cog, right?

    Thanks,
    Phil
    “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.” -Antoine de Saint-Exupery
  • Ian Cameron Posts: 2
    edited 2006-11-25 - 07:47:03
    Option 1: 16 cogs with 128KB of hub RAM. Hub access once every 16 clocks.
    Option 2: 8 cogs with 256KB of hub RAM. Hub access once every 8 clocks.
    Option 1, it has more of both! I feel more guilty about using an extra cog than RAM.

    Ian.
  • cgracey Posts: 11,130
    edited 2006-11-25 - 07:52:42
    Right.
    Phil Pilgrim (PhiPi) said...
    Chip,

    Before I consider the ramifications of each: 'still 512 longs per cog, right?

    Thanks,
    Phil
    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • Bill Henning Posts: 6,445
    edited 2006-11-25 - 08:03:26
    Chip, the more I think about it, the more I want 16 cogs / 128K

    Consider tight loops... once sync'ed to the hub "rotor", if I understand your intentions...


    tightloop rdbyte b, ptr ' 2 cycles

    ' WE HAVE ROOM FOR 13 INSTRUCTIONS!!!!!!
    ' FANCY VM's here we come!!!!
    ' FANCY VGA MODES TOO!!!

    djnz somecount, #tightloop ' 1 cycle when taken

    SPIN/FORTH/large model etc. simple primitives could execute at 10 MIPS (160MHz / 16 clocks per hub access)
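
The instruction budget in the loop above can be checked with a little arithmetic. This sketch assumes the figures given earlier in the thread (hub instructions take 2 clocks, other instructions 1 clock, and a taken `djnz` 1 clock); `free_slots` and `hub_synced_loop_rate_mhz` are illustrative names, not anything from the Propeller tools.

```python
HUB_INSTRUCTION_CLOCKS = 2      # from the thread: hub instructions take 2 clocks
REGULAR_INSTRUCTION_CLOCKS = 1  # from the thread: pipelined cogs, 1 clock each


def free_slots(hub_window_clocks: int, loop_overhead_instructions: int = 0) -> int:
    """Regular instructions that fit in one hub window besides the hub access.

    loop_overhead_instructions counts 1-clock instructions already spent on
    loop control (for example a taken djnz).
    """
    used = (HUB_INSTRUCTION_CLOCKS
            + loop_overhead_instructions * REGULAR_INSTRUCTION_CLOCKS)
    remaining = hub_window_clocks - used
    if remaining < 0:
        raise ValueError("loop does not fit in one hub window")
    return remaining // REGULAR_INSTRUCTION_CLOCKS


def hub_synced_loop_rate_mhz(clock_mhz: float, hub_window_clocks: int) -> float:
    """Loop passes per microsecond when each pass waits for one hub slot."""
    return clock_mhz / hub_window_clocks
```

With a 16-clock window and one clock spent on the taken `djnz`, `free_slots(16, 1)` gives 13, matching the comment in the loop, and `hub_synced_loop_rate_mhz(160, 16)` gives the 10 MIPS figure. For the 8-cog option, `free_slots(8)` gives the six instructions mentioned earlier.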

    think of the fancy video drivers... "real" sprites become possible!

    Post Edited (Bill Henning) : 11/25/2006 8:07:52 AM GMT
  • Bill Henning Posts: 6,445
    edited 2006-11-25 - 08:07:40
    Not to mention...

    16 cogs * 160 mips = 2,560 mips = 2.56bips!
  • potatohead Posts: 9,715
    edited 2006-11-25 - 08:09:12
    I'm with Andre, build both.

    But, build the larger RAM model first. If the current speed / RAM limitation holds, and I think it will, then we will once again be struggling with on-chip RAM.

    One of the biggest strengths is the all in one chip approach. Larger RAM space = more one chip applications.

    The higher speed HUB access means faster big mem code. That tells me one COG could do a lot of things that do not require the highest speeds as well. We would end up getting a lot more per COG.

    Post Edited (potatohead) : 11/25/2006 8:14:08 AM GMT
    Do not taunt Happy Fun Ball! @opengeekorg ---> Be Excellent To One Another SKYPE = acuity_doug
    Parallax colors simplified: https://forums.parallax.com/discussion/123709/commented-graphics-demo-spin
  • Phil Pilgrim (PhiPi) Posts: 22,325
    edited 2006-11-25 - 08:18:25
    Chip,

    In that case, I'd say eight cogs with more RAM. My reasoning is this: Current programs that use close to eight cogs usually have some sort of interleaving going on -- similarly-programmed cogs taking turns (for performance reasons) doing the same thing. But 160 MIPS addresses the speed issue quite handily, freeing these cogs for other, more independent tasks. So, with better-optimized speed and cog usage, the resource limitation that will be most keenly felt is cog RAM. Therefore, hub access speed becomes paramount for overlay swapping and the like. And the eight-cog option delivers hub access at double the rate of the 16-cog option.

    -Phil
  • Bill Henning Posts: 6,445
    edited 2006-11-25 - 08:44:23
    Dang it Phil, now I want both options for different uses!
  • cgracey Posts: 11,130
    edited 2006-11-25 - 09:03:20
    Bill Henning said...
    Dang it Phil, now I want both options for different uses!
    Yeah, I was thinking along the same lines as Phil when I asked the question. It's been nagging me that with 16 cogs, hub memory would only be twice as fast as it is now (in MHz), while total chip MIPS would be 16x. This would greatly skew instructions to hub accesses, effectively starving cogs of needed memory bandwidth. By having only 8 cogs, but with twice the hub accesses, memory-intensive apps could run a lot faster. The need to gang cogs for memory-bandwidth reasons (like in video displays) would be greatly reduced. You could still fit 6 instructions between hub accesses. Imagine trying to always come up with 14, though. That would rarely work out nicely.


    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • cgracey Posts: 11,130
    edited 2006-11-25 - 09:24:00
    Maybe when a cog is launched, its hub-access requirement could be stated, and then the launch would pass/fail based not just on whether or not a cog was available, but also on whether or not a requested-bandwidth hub slot was available. For example, you could have 1:4 being the highest, then 1:8, 1:16, and finally 1:32. Every program should use the lowest-possible setting. It would take only a bit of logic in the hub to negotiate the setup requests and then serve them deterministically thereafter.

             0         1         2         3
    Slot     01234567890123456789012345678901

    A (1:4)  A...A...A...A...A...A...A...A...
             .A...A...A...A...A...A...A...A..
             ..A...A...A...A...A...A...A...A.
             ...A...A...A...A...A...A...A...A

    B (1:8)  B.......B.......B.......B.......
             .B.......B.......B.......B......
             ..B.......B.......B.......B.....
             ...B.......B.......B.......B....
             ....B.......B.......B.......B...
             .....B.......B.......B.......B..
             ......B.......B.......B.......B.
             .......B.......B.......B.......B

    C (1:16) C...............C...............
             .C...............C..............
             ..C...............C.............
             ...C...............C............
             ....C...............C...........
             .....C...............C..........
             ......C...............C.........
             .......C...............C........
             ........C...............C.......
             .........C...............C......
             ..........C...............C.....
             ...........C...............C....
             ............C...............C...
             .............C...............C..
             ..............C...............C.
             ...............C...............C

    D (1:32) D...............................
             .D..............................
             ..D.............................
             ...D............................
             ....D...........................
             .....D..........................
             ......D.........................
             .......D........................
             ........D.......................
             .........D......................
             ..........D.....................
             ...........D....................
             ............D...................
             .............D..................
             ..............D.................
             ...............D................
             ................D...............
             .................D..............
             ..................D.............
             ...................D............
             ....................D...........
             .....................D..........
             ......................D.........
             .......................D........
             ........................D.......
             .........................D......
             ..........................D.....
             ...........................D....
             ............................D...
             .............................D..
             ..............................D.
             ...............................D





    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.

    Post Edited (Chip Gracey (Parallax)) : 11/25/2006 9:40:30 AM GMT
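
The slot-negotiation idea in this post can be modelled in a few lines. The sketch below is hypothetical and is not how any Propeller hub is implemented: it assumes a 32-slot repeating frame (as in the diagram), a first-fit search over the possible phases, and invented names (`HubSchedule`, `request`, `release`, `owner_at`).

```python
from typing import List, Optional

FRAME_SLOTS = 32                   # the diagram spans hub slots 0..31
ALLOWED_PERIODS = (4, 8, 16, 32)   # 1:4, 1:8, 1:16, 1:32


class HubSchedule:
    """Hypothetical model of a hub that hands out fixed, repeating slots."""

    def __init__(self) -> None:
        # Owner cog id for each slot in the frame, or None if the slot is free.
        self.slots: List[Optional[int]] = [None] * FRAME_SLOTS

    def request(self, cog_id: int, period: int) -> Optional[int]:
        """Try to reserve one slot every `period` slots; return the phase or None."""
        if period not in ALLOWED_PERIODS:
            raise ValueError(f"unsupported period 1:{period}")
        for phase in range(period):  # first fit: lowest free phase wins
            wanted = range(phase, FRAME_SLOTS, period)
            if all(self.slots[s] is None for s in wanted):
                for s in wanted:
                    self.slots[s] = cog_id
                return phase
        return None  # the launch would fail: no pattern with that bandwidth is free

    def release(self, cog_id: int) -> None:
        """Free every slot held by a cog (for example when it stops)."""
        self.slots = [None if owner == cog_id else owner for owner in self.slots]

    def owner_at(self, clock: int) -> Optional[int]:
        """Cog served on a given clock; the pattern repeats every FRAME_SLOTS clocks."""
        return self.slots[clock % FRAME_SLOTS]
```

In this model, four 1:4 requests fill the whole frame, so any further request returns `None`. That matches the pass/fail launch behaviour described above. A real arbiter might pick phases differently; first fit is just the simplest choice for a sketch.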
  • Dennis Ferron Posts: 480
    edited 2006-11-25 - 09:24:57
    Go with the RAM. With the ability to generate video and talk to the mouse and keyboard, the Propeller can be used to make homebrew computers, but with only 32K of RAM it is a big limitation on the operating system that can be run and the kinds of languages that can be fit into it. IMO 256K is the point at which it goes from being a microcontroller to being a computer. Many more languages could be compiled to fit into 256K than 128K, and a true multitasking OS could be written.

    It would also be good for games. Games need space to store graphics, sounds, etc. With more RAM you could do more than 4 colors per tile, higher resolutions, and 2 video pages. And how about bigger maps for real time strategy games?

    I think we're going to be hitting the memory limitations faster than the speed limitations. If you have a lot of RAM but fewer processors, you can use some of that memory to make more memory hungry but faster algorithms to conserve processing power. (For example, using memory to store indexes for faster searches.) But, if you have a lot of processors and little RAM, once you run out of space, that's it, that's all there is. Then you have to try to squeeze bytes here and there, and you have to stop adding new features to the program.

    Not to cast stones but I think the people who want 16 cogs and less memory are reacting more from the emotional attraction of the idea than the concrete benefits vs. more RAM. You can use two Props to get 16 cogs if you need to manage that many independent processes but using 2 cogs (edit: I mean props) to get double the RAM doesn't work out so well because you can't share the RAM across them.

    The multiple cogs are just one of the things that makes the Propeller revolutionary. The "secret sauce" of the current propeller is not just the cogs, it's both the cogs and the generous amount of RAM, so increasing either is great.

    But having 64 I/O pins will be wonderful. How will you get that into a DIP package? Can you use the "giant" 64 pin DIP package like the Motorola 68000 used? Because I have a lot of sockets for those chips!

    Post Edited (Dennis Ferron) : 11/25/2006 9:28:37 AM GMT
  • Bill Henning Posts: 6,445
    edited 2006-11-25 - 09:31:17
    Hmmm...

    For the 16 cog / 128k case: The cycles would not be wasted on very CISCy VMs, and each cog would still get twice the HUB bandwidth cogs get now - and there would be 16 cogs...

    on the other hand...

    Chip, I'd be curious to see how many cogs you'd need to do the tiled 1280x1024 vga mode with both approaches, compared to the current approach.

    Also, I have an idea for another video mode.

    4 bits per pixel

    instead of passing in a long with the four six bit color lookup values, the long is used as the base address of a 16 entry color lookup table in cog memory (or dare I dream... a 256 entry LUT in hub space?)
    Chip Gracey (Parallax) said...
    Bill Henning said...
    Dang it Phil, now I want both options for different uses!
    Yeah, I was thinking along the same lines as Phil when I asked the question. It's been nagging me that with 16 cogs, hub memory would only be twice as fast as it is now (in MHz), while total chip MIPS would be 16x. This would greatly skew instructions to hub accesses, effectively starving cogs of needed memory bandwidth. By having only 8 cogs, but with twice the hub accesses, memory-intensive apps could run a lot faster. The need to gang cogs for memory-bandwidth reasons (like in video displays) would be greatly reduced. You could still fit 6 instructions between hub accesses. Imagine trying to always come up with 14, though. That would rarely work out nicely.


  • nutson Posts: 240
    edited 2006-11-25 - 09:35:40
    If a choice needs to be made, I back the 8 cog / 256KB RAM option. As Phil mentioned, at 160 MIPS it is possible to multiplex all low-speed tasks such as serial interfaces, keyboard, and I2C into one cog using Bill's method. A single cog can handle a video or VGA output channel, so we would still have 6 x 160 = 960 MIPS left (!) for data processing tasks. No need for more cogs. Program and display memory size and access time will become the limiting factor. Example, image processing: a modest graphical processing/display area of 256 x 256 x 8 bit already takes 64 KByte.
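
The memory and MIPS figures in this post are easy to check. The sketch below only restates that arithmetic against the two hub sizes under discussion (128 KB and 256 KB); the function names are made up for illustration, and 1 KB is taken as 1024 bytes.

```python
def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed for a packed frame buffer, rounded up to whole bytes."""
    return (width * height * bits_per_pixel + 7) // 8


def fraction_of_hub(buffer_bytes: int, hub_kb: int) -> float:
    """Share of hub RAM a buffer would occupy, with 1 KB = 1024 bytes."""
    return buffer_bytes / (hub_kb * 1024)


def spare_mips(total_cogs: int, busy_cogs: int, mips_per_cog: int = 160) -> int:
    """MIPS left over once some cogs are dedicated to fixed tasks."""
    if not 0 <= busy_cogs <= total_cogs:
        raise ValueError("busy_cogs must be between 0 and total_cogs")
    return (total_cogs - busy_cogs) * mips_per_cog
```

A 256 x 256 x 8-bit buffer comes to 65,536 bytes, which would be half of a 128 KB hub but a quarter of a 256 KB hub. With one cog for low-speed I/O and one for video, `spare_mips(8, 2)` gives the 960 MIPS quoted above.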

    Producing multiple versions would need volumes to ramp up steeply. The overhead costs for a version, especially the masks for the stepper (you need 25-30 of them per version!), are significant in submicron processes.

    Nico Hattink
  • cgracey Posts: 11,130
    edited 2006-11-25 - 09:54:27
    To do the 1280x1024 vga tile driver, it would take only one cog in the future chip if it had 8 cogs. If it had 16 cogs, it would require two cogs because of the relative hub bottleneck. In the current chip, it takes 3 cogs.
    Bill Henning said...

    Chip, I'd be curious to see how many cogs you'd need to do the tiled 1280x1024 vga mode with both approaches, compared to the current approach.

    Also, I have an idea for another video mode.

    4 bits per pixel

    instead of passing in a long with the four six bit color lookup values, the long is used as the base address of a 16 entry color lookup table in cog memory (or dare I dream... a 256 entry LUT in hub space?)

    That sounds good. There might need to be a FIFO though, to overcome the conchunkulation between the hub and video timing.


    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.

    Post Edited (Chip Gracey (Parallax)) : 11/25/2006 10:01:22 AM GMT
  • Jim C Posts: 76
    edited 2006-11-25 - 11:50:57
    I was first for the 16-cog option. But seeing the Big-Video driver can fit in one cog of the 8-cog version, I'd vote for the 8-cog Propeller that has more hub memory.

    Jim C
  • inaki Posts: 262
    edited 2006-11-25 - 12:17:05
    I vote for the 256K HUB memory/8 COGs.
    By the way, do you have any plan for a future (even distant) extension of the COG's space ?

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
  • cgracey Posts: 11,130
    edited 2006-11-25 - 12:22:04
    inaki said...
    I vote for the 256K HUB memory/8 COGs.
    By the way, do you have any plan for a future (even distant) extension of the COG's space ?


    Yes, but it would be a 64-bit version with up to 64K longs per cog. This would be best for 90nm or smaller technology.

    The only way we could augment the current architecture's cog RAM would be to have switchable banks, say in the $000-$0FF region.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.

    Post Edited (Chip Gracey (Parallax)) : 7/5/2007 3:46:33 AM GMT
  • Gavin Posts: 134
    edited 2006-11-25 - 13:02:25
    8 faster cogs, more memory for OS's and more bits for colour. Any advantage to going to only 12 cogs and not 16?
    Perhaps an external high speed memory bus? Could be used for prop to prop coms and/or memory.
    64 I/O

    It sounds like faster cogs will allow better video on less cogs, freeing up more cogs for other stuff anyway.

    Gavin
  • parsko Posts: 501
    edited 2006-11-25 - 13:43:30
    Chip Gracey (Parallax) said...
    This would be best for 90nm or smaller technology.

    Chip, I'm working my arse off to help get you guys to 90nm! Now I know what you're using to print your litho! Now that I know, my imagination is running wild at where Parallax will be able to go with this. Once you guys start scaling this beautifully simple design, we're going to get some ridiculous speeds out of the Propeller. I'm happy I got in on the scene at day two!

    From my perspective, 8 cogs, 256k RAM, faster HUB access.

    In order to control each cylinder of a V10 engine running at 10,000rpm (600us per control cycle), you'd need the RAM for large lookups, and the faster HUB access to get that data. I see 16 COGs as a daunting task to try to manage in software. Propeller Assembly is so wicked fast and efficient, that if you doubled its speed and hub access, we'd all be sitting pretty! Plus, as was said, with the larger RAM and speed, one could write software routines to do things that would otherwise be taking up a cog or two, like the floating math.

    .........Wow, you guys are getting this sort of speed out of these litho tools, wooooowwwwwwww! You are just barely into the lasers!

    -Parsko
  • M. K. Borri Posts: 278
    edited 2006-11-25 - 14:15:56
    I would have to vote for cogs overall. Especially since I'm going to bet that a future prop tool will have the LMM ASM as an option, so memory is less of a concern ;)
  • DuckHead Posts: 7
    edited 2006-11-25 - 14:17:16
    I back up the 8 cog version, but please introduce some fast way to send data between cogs. I have to agree with the rest of the crowd; this is an awesome controller. I love it and can't wait to see what you come up with in the future. Keep up the good work!!
  • James Long Posts: 1,181
    edited 2006-11-25 - 14:31:07
    I would think.....you could produce the 8 cog version first. Then ......if you think there is a demand.....produce the 16 cog version.

    The other way around is not financially smart. Although I'm sure the production costs are similar.

    I agree with parsko.....managing 16 cogs would be a chore...unless the software was updated to show available cogs and such.

    These are only my opinions, so take them as that.

    If the chip was fast enough with eight cogs.....it would be totally possible to cog swap for other processes.

    I would go with 8 cogs, faster, with more hub memory.

    My opinion,

    James L
  • CJ Posts: 470
    edited 2006-11-25 - 14:38:50
    I would go with the 8cog/256K model, with that extra hub access speed, you won't need as many cogs to do stuff like video (the current cog hog). I can only imagine what everyone will come up with.

    edit: looking at the numbers, that model would be getting hub access at 20MHz (smells like a logic analyzer to me)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Who says you have to have knowledge to use it?

    I've killed a fly with my bare mind.

    Post Edited (CJ) : 11/25/2006 2:53:23 PM GMT
  • ALIBE Posts: 299
    edited 2006-11-25 - 15:04:12
    Chip,

    Many times I have found myself needing extra processing power and then immediately extra memory to support the extra processing. I would tend to think of the needs going hand in hand. However, if I were to pick one vs the other, I would pick more cogs and more i/o pins vs more memory and current # of cogs.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    "any small object, accidentally dropped, goes and hides behind a larger object."


    ALIBE - Artificial LIfe BEing. In search of building autonomous land robot
    http://ALIBE.crosscity.com/
  • M. K. Borri Posts: 278
    edited 2006-11-25 - 15:29:04
    Funny how when we got our Props we probably were all like "This is an actual computer, I'll never fill it up".... and that was what, six months ago?

    640k should be enough for everybody, right? ;)