Ditto for hippy's suggestion. Chip had mentioned possibly having some kind of decoder for supporting USB and Ethernet reception and tweaking the video generator for easier encoding of the same. The encoder/decoder could also be used to support Prop to Prop high speed networking with low coding overhead (important given the limited cog memory).
How big would a finished chip be with an 8 mm sq die? Is the extra 4 sq mm mega bucks more? And which manufacturer or what chip(s) are you trying to compete with?
The 16-cog die would be 64 square millimeters (8x8), at a cost of about $.10 per square millimeter, or $6.40 before packaging and yield loss. The current die is 52 square millimeters at a cost of about $.05 per square millimeter, or $2.60 before packaging and yield loss. So, it would be a lot more money.
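The arithmetic behind those two figures can be checked with a short sketch. The function name `die_cost` and the choice of cents per square millimeter as the unit are illustrative assumptions, not anything from Parallax; the areas and prices are the rough numbers quoted above, before packaging and yield loss.

```python
def die_cost(area_mm2: float, cents_per_mm2: float) -> float:
    """Return the raw die cost in dollars, before packaging and yield loss."""
    return area_mm2 * cents_per_mm2 / 100


# Rough figures quoted in the post (assumed to be per square millimeter).
cost_16_cog = die_cost(64, 10)   # 8 mm x 8 mm die at about $0.10 per mm^2
cost_current = die_cost(52, 5)   # 52 mm^2 die at about $0.05 per mm^2

print(f"16-cog die: ${cost_16_cog:.2f}")
print(f"current die: ${cost_current:.2f}")
print(f"ratio: {cost_16_cog / cost_current:.2f}x")
```

Note that the cost grows faster than the area because the quoted price per square millimeter also doubles.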
We are not aiming to compete with any particular manufacturer or chip. We're just doing what we feel inspired to do.
I think of our competition as being everything that misdirects people from the joy that comes from working, creating, and learning. I could list a hundred things that contribute to this, but the mass media would be near the top. When we moved, I got rid of all our TVs. What a better home! Now, if I could just stop reading the "news" on the internet.
Well it was a nice ride anyway. Don't suppose you would consider just releasing a double-speed Prop-I or give a little time to making the 64 IO Prop-I work? Clearly Prop-II won't be available for a long time. Meanwhile, we can continue making the best of what we have now. Thanks for the repositioning.
Here's a thought, although one that should maybe remain unspoken:
Providing every resource to every cog for its exclusive use will always result in wasted silicon. After all, how many Prop I apps use eight video generators? Or 16 counters? How many Prop II apps would need 64 analog inputs? Would it make any sense for adjacent cogs to share some of these resources? A COGNEW would have to include, as an argument, which shared resources it needed, and the hardware would assign either the first cog that satisfied the request (first fit), or the cog with the least available resources that satisfied the request (best fit). Once a cog claimed a shared resource from the "easement" with its neighbor, it would have exclusive use of that resource until it stopped.
Or does this just open a huge can of worms that would escalate your tech support headaches? For example, a successful allocation may depend on the order in which the requests were made.
(BTW, none of these issues should be a source of disappointment for anyone. Budgeting, whether it's silicon or tax dollars, is just a fact of life that provides opportunities for creativity, as opposed to gluttony.)
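To make the first-fit versus best-fit idea concrete, here is a minimal sketch. Everything in it is hypothetical: the `allocate` and `run` helpers, the resource names, and the two-cog layout are invented for illustration, and the model simply gives each cog a fixed set of resources rather than modelling a shared "easement" between neighbors. It does show how a successful allocation can depend on request order.

```python
from typing import Dict, Optional, Set


def allocate(cogs: Dict[int, Set[str]], free: Set[int],
             request: Set[str], policy: str = "first") -> Optional[int]:
    """Claim a free cog whose resources cover the request, or return None."""
    candidates = [cog for cog in sorted(free) if request <= cogs[cog]]
    if not candidates:
        return None
    if policy == "best":
        # Best fit: the candidate with the fewest resources on offer.
        chosen = min(candidates, key=lambda cog: len(cogs[cog]))
    else:
        # First fit: the lowest-numbered candidate.
        chosen = candidates[0]
    free.discard(chosen)  # the cog keeps its resources until it stops
    return chosen


def run(requests, policy):
    """Start from a fresh two-cog chip and serve the requests in order."""
    cogs = {0: {"video", "counter"}, 1: {"counter"}}  # hypothetical layout
    free = set(cogs)
    return [allocate(cogs, free, set(r), policy) for r in requests]


counter_then_video = [["counter"], ["video"]]
video_then_counter = [["video"], ["counter"]]

print(run(counter_then_video, "first"))  # second request cannot be satisfied
print(run(video_then_counter, "first"))  # same requests, other order, both fit
print(run(counter_then_video, "best"))   # best fit avoids the failure here
```

In this toy layout first fit hands the only video-capable cog to a request that did not need video, which is the order dependence raised above; best fit sidesteps it in this case but is not guaranteed to in general.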
Chip, does the number of cogs have to be a power of 2? If 16 is too expensive, how about 12? My outdoor robot is using 7 cogs now, and a GPS and CMU Cam will probably add 2 more. The current road ahead is to leave out the GPS or CMU Cam, or do serious surgery on the floating point object to get everything into one cog instead of 2. This robot does not use a subsumption architecture, but if that architecture is used, a cog per behavior is a clean implementation.
1. You are right, LMM is not "parallel power processing". However, I'm not sure the Prop is targeted at that anyway. But as a micro-controller, having cogs running PASM at full speed handling peripheral hardware/tasks, instead of messing with interrupts, is blissful. Then you often only need one SPIN/LMM main program strolling around doing the bulk of the application.
2. I agree.
Hippy:
Much disappointment for sure. I also LONG for that transputer-like communication between cogs, both intra- and inter-Prop chip.
Parallax:
I love this open discussion, but I'm starting to think about the Osborne computer story. Osborne supposedly went bankrupt despite the highly successful Osborne I because they announced a much better Osborne II was coming, causing sales of the first version to dry up as people waited.
Perhaps it's time to resume "quiet mode" again.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Erik Friesen said...
Hmm. Do I understand the above to mean a semi-PIC-style banksel?
Well, whatever, but if you do it that way, please make it easy to keep straight.
It would actually simplify your code by keeping INP (really INA, INB, INC, IND...) at a single register address. Imagine that your assembly app needs to update a PARameter location with a certain pin:
       rdlong   pin,PTRA
       setport  pin            'set INP window to INA, INB, INC, or IND
If you didn't have this port indirection, your code might look like this:
       rdlong   pin,PTRA
       test     pin,#$20    wc  'set S in mov instruction to INA, INB, INC, or IND
 if_c  add      loop,#3         '(this only affects the single move)
       test     pin,#$40    wc
 if_c  add      loop,#6
       nop
So, it helps genericize code that talks to ports selected at run-time. The penalty for compile-time port code is that you'd have to do a 'setport' instruction to select the port.
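As a rough software model of what the proposed port window does, the sketch below treats bits 5 and 6 of the pin number as the port selector (matching the `#$20` and `#$40` tests above) and the low five bits as the bit position within the port. The function names and the assumptions of four 32-pin ports and a shift that uses only the low five bits are mine, not a description of the final hardware.

```python
PORT_NAMES = ["INA", "INB", "INC", "IND"]


def select_port(pin: int) -> int:
    """Model of the proposed 'setport': bits 5 and 6 of the pin pick the port."""
    return (pin >> 5) & 0b11


def read_pin(ports: list, pin: int) -> int:
    """Read one pin through the INP window (assumes 32 pins per port)."""
    inp = ports[select_port(pin)]      # INP now aliases INA, INB, INC, or IND
    return (inp >> (pin & 31)) & 1     # shr state,pin / and state,#1


# Hypothetical input state: only bit 3 of the third port (INC) is high.
ports = [0, 0, 1 << 3, 0]

pin = 64 + 3
print(PORT_NAMES[select_port(pin)], read_pin(ports, pin))
print(PORT_NAMES[select_port(3)], read_pin(ports, 3))
```

The point of the indirection is visible in `read_pin`: the same code path serves any pin number chosen at run time, with no patching of the instruction that reads the port.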
If 16 cogs are so important to so many people, we can put them in there. We'll certainly make accommodation in the architecture for even more than that.
For me, 16 cogs are less important than having the chip be reasonably inexpensive. If the chip cost goes above, say, $25, it suddenly becomes less attractive.
...Providing every resource to every cog for its exclusive use will always result in wasted silicon. After all, how many Prop I apps use eight video generators? Or 16 counters? How many Prop II apps would need 64 analog inputs? Would it make any sense for adjacent cogs to share some of these resources? A COGNEW would have to include, as an argument, which shared resources it needed, and the hardware would assign either the first cog that satisfied the request (first fit), or the cog with the least available resources that satisfied the request (best fit). Once a cog claimed a shared resource from the "easement" with its neighbor, it would have exclusive use of that resource until it stopped...
If we were to split up resources, this sounds like the way to do it. I don't think we will, though, because while it would save silicon, it would make the chip more complex. And sometimes it's fun to have video generators in every cog because you can do things like renderers that require concurrency.
There's always the option Andre' proposed: two versions, where a 16-cog version has 128k and an 8-cog version has 256k. It causes all sorts of nightmares with code compatibility for objects relying on deterministic programming, because each would have a different HUB window scheme (once per 16 clocks vs. once per 8 clocks), but it is a potential path.
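The timing difference between the two hub window schemes can be sketched as follows. This assumes a simple round-robin hub with one clock slot per cog, which is only what "once per 16 clocks vs once per 8 clocks" suggests; the function names are hypothetical and the real arbitration may differ.

```python
def clocks_until_hub_window(cog_id: int, clock: int, num_cogs: int) -> int:
    """Clocks a cog must wait for its next hub slot (round-robin assumption)."""
    return (cog_id - clock) % num_cogs


def worst_case_wait(num_cogs: int) -> int:
    """Longest wait any cog can see under the same assumption."""
    return max(clocks_until_hub_window(0, t, num_cogs) for t in range(num_cogs))


for num_cogs in (8, 16):
    print(num_cogs, "cogs: worst-case wait", worst_case_wait(num_cogs), "clocks")
```

Code that counts instructions between hub accesses to stay in step with its window would therefore need different padding on each version, which is the compatibility concern raised above.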
Chip Gracey (Parallax) said...
If 16 cogs are so important to so many people, we can put them in there. We'll certainly make accommodation in the architecture for even more than that.
I don't think that 16 is a magic number or anything, but the more the better!
I know it's kinda late, but one more vote for "Clean Slate". However, with one major caveat. There should be minimal differences between SPIN on the Prop II and SPIN on the Prop I. The differences should be small enough that a utility could be made to convert Prop I code to Prop II code. In other words, don't suddenly decide that this chip will be programmed in C, abandoning SPIN (not likely to happen, but...). That being said, improvements to SPIN, like conditional compiling, data structures, etc., would be welcome.
Why not have shared cog memory? In other words, each cog gets a chunk of memory when initialized. For example, 16 cogs all running at the same time would split 2 MB of SRAM for 128K each. However, if the program only uses one cog out of the 16, that cog would have access to all of the 2 MB. This is, of course, very tricky, but the programmer would simply need to know that if they anticipate initializing one or more extra cogs later in the program, they shouldn't use memory past a certain point. That point would be moot simply with the use of a flag. By default, set a flag to indicate that each cog only gets access to 128K. Allow advanced programmers to change this flag as needed.
This might allow better graphics and sound and even the possibility of an OS to be made, all currently impossible (or extremely difficult) with the current available memory.
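The flag idea can be sketched in a few lines. The function name and the `partitioned` flag are hypothetical, chosen only to illustrate the proposal; the 2 MB and 16-cog figures come from the post above.

```python
KIB = 1024
MIB = 1024 * KIB


def cog_memory_limit(total_bytes: int, num_cogs: int, partitioned: bool = True) -> int:
    """Bytes one cog may address: an equal share by default, everything if the flag is cleared."""
    return total_bytes // num_cogs if partitioned else total_bytes


default_limit = cog_memory_limit(2 * MIB, 16)                   # flag set (default)
full_limit = cog_memory_limit(2 * MIB, 16, partitioned=False)   # advanced use

print(default_limit // KIB, "KiB per cog by default")
print(full_limit // KIB, "KiB when one cog takes everything")
```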
You know, it is really strange to be able to sit down and give suggestions to the engineer of a product. This is creeping me out! :D You're supposed to be chained to a desk in your fluorescent-lit dunge.. ur.. office with no customer contact allowed. I wonder what would happen if intel, nVidia, Microchip, Atmel, etc., would allow this.
MarkS said...
...I wonder what would happen if intel, nVidia, Microchip, Atmel, etc., would allow this.
We would all be really interested in learning about how those nVidia chips work. They perform many complex transforms, something like the equivalent of a 1,000 line program, for every pixel on the screen, at 60Hz update rates. Can you imagine that?
...I wonder what would happen if intel, nVidia, Microchip, Atmel, etc., would allow this.
We would all be really interested in learning about how those nVidia chips work. They perform really complex transforms, something like the equivalent of a 1,000 line program for every pixel on the screen, at 60Hz update rates. Can you imagine that? It's hard.
I've tried thinking about that. I have a book, "Computer Graphics: Principles and Practice" which I believe was originally printed in 1990. It's almost a joke when they describe the massive hardware systems needed to display 10,000 Gouraud-shaded 100 pixel triangles per second. What can be done now with a single chip is mind blowing and only 18 years have passed.
Is a doubling of the hub RAM and ROM (if I read the layout correctly) a necessary prerequisite for 16 cogs? Also, does the die have to be square, or can it be longer in one dimension?
-Phil
In those layouts, the 8 cog version has 128KB of hub ram, while the 16 cog version has 256KB.
Generally, squarish is better, as it fits into the smallest package and there is the least scribe-line loss.
The one thing that we plan to add is function pointers, so that you can indirectly call PUBs/PRIs.
Yippie!!! And when you can pass an object around as a parameter
to a method and be able to call its methods by '.' notation and access
'protected' variables directly rather than with accessor methods,
I'll be really impressed.
A pseudo-DMA function would also be much appreciated since we
can't access pins very fast or much faster even with the Prop-II.
Pseudo-DMA would let us specify a buffer and maximum access
length and provide start, stop, and terminal-count flags in a
control/status register. This could be a mode in the counter registers.
Those are the guts of the cogs which are being defined now. We've got all the elemental blocks designed and laid out already, we just need to nail down the logic for the cog guts. Each cog has a 3-read/1-write port ram that is long and skinny. The guts will mate up against the skinny end. In the 8 cog picture, those 'mem_cog' boxes need to rotate 90 degrees.
Comments
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
- just my opinion
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
-Phil
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Still some PropSTICK Kit bare PCBs left!
Post Edited (Phil Pilgrim (PhiPi)) : 8/28/2008 9:42:23 PM GMT
John Abshier
You said:
"Providing every resource to every cog for its exclusive use will always result in wasted silicon."
Yes, that is correct, but giving every cog its own resources is exactly where the Prop gets its power in parallel processing.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nothing is impossible, there are only different degrees of difficulty.
Sapieha
       rdlong   pin,PTRA
       setport  pin            'set INP window to INA, INB, INC, or IND
loop   mov      state,INP
       shr      state,pin
       and      state,#1
       wrlong   state,PAR[1]
       jmp      #loop
If you didn't have this port indirection, your code might look like this:
       rdlong   pin,PTRA
       test     pin,#$20    wc  'set S in mov instruction to INA, INB, INC, or IND
 if_c  add      loop,#3         '(this only affects the single move)
       test     pin,#$40    wc
 if_c  add      loop,#6
       nop
loop   mov      state,INA
       shr      state,pin
       and      state,#1
       wrlong   state,PAR[1]
       jmp      #loop
So, it helps genericize code that talks to ports selected at run-time. The penalty for compile-time port code is that you'd have to do a 'setport' instruction to select the port.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
This is awesome... What did you have for breakfast this morning? Whatever it was, have more!
Regards,
John
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Necessity is the mother of invention'
Those who can, do. Those who can’t, teach.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer
Parallax, Inc.
Not to mention they can be used for lots of things besides video. If they are there, people will use them, like the counters.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
Post Edited (MarkS) : 8/28/2008 10:45:56 PM GMT
My only request for the video counters: more run modes.
Sapieha
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
The image on the left is a 6mm X 6mm proposed floorplan, while the image on the right is an 8mm X 8mm proposed floorplan.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Beau Schwabe
IC Layout Engineer
Parallax, Inc.
If intel, nVidia, Microchip, Atmel, etc., would allow this, they would have more customer-friendly products.
Sapieha
Well, the solution to this is to use a 45-nm process.
Post Edited (Phil Pilgrim (PhiPi)) : 8/28/2008 11:16:44 PM GMT
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
In my opinion, the 16-cog version is the better solution.
One question: what are the blue areas?
Sapieha
Shhh... That's where the magic is located. It's a secret.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.