It will be able to use the additional cog RAM (the LUT) as internal stack space, which will save heaps of hub accesses and therefore give a good speed improvement!
Also, there is hub read caching. This will buffer up the hub Spin bytecode, again resulting in a good performance increase!
Concurrent multiprocessor remains, IMHO, the most appropriate description. The smart pins will only reinforce that, as they too will do some processing and can be concurrent, just as COGs can.
Chip's docs and the low-level work can use COG, and there will be a niche use of that term because of it.
More general audience material can be whatever else may make sense.
Are the "smart pins" actually programmable or just configurable? In other words, is there an instruction set for the smart pin processors?
I'd say configurable, but they may act more like programmable.
I would rather "cogs" be referred to as "cogs/cores" or just "cores". It was fine when the P1 was released, but the world has changed now, and everyone looking at chips knows what cores are. (IMHO)
It would be interesting if the smart pins had a tiny specialised CPU. I wonder how much would fit in the same space?
I think Chip is still shooting for 64.
I've suggested 32 paired cells, but if Chip can pack in 64 (without lowering CODE), that allows 1 Rx and 63 Tx, for example, which is certainly attention-getting!
63 frequency-counting or capture channels would also be attention-getting.
If that does not fit, 1 Rx & 31 Tx is still impressive.
One question is how many customers they gain with any boost from 31 to 63?
You know, if smart pin logic size is an issue, maybe Chip could just support the smart pins on Port B. Leave Port A purely digital (like P1). Having to choose between 64 semi-smart pins and 32 digital + 32 very-smart pins, I think I'd rather have the latter.
Just thought I'd throw that out there.
Of course, it may all be academic. Chip may be able to cram it all in there in the end.
Whoops, SmartPins might be so advanced that they seem to reduce the previous need for more Cores.
Once this is FINAL, it really might be worth looking at a P2-Mini with same number of pins and 8 Cores?
Though IIRC Core die space is minimal so the overall die size savings might be minimal or too small to really save anything on production cost.
There are worse problems one could have
Hopefully 'programming' SmartPins can be made simple/logical enough to really attract and keep new users' attention. I think this could be a real Killer Feature that will set the P2 apart from the crowd.
I believe a good Smart Pin cell will create considerable overlap with FPGA uses, which is a good place for Parallax to head.
Low-end FPGAs tend not to fall in price but to keep adding more (rather like USB flash drives), and low-end FPGAs/high-end CPLDs are often used to augment MCUs in the 'many more of one peripheral' case.
A same-pins variant does not make so much sense, as it does not offer new mechanical design choices, and the pad ring broadly sets the die size.
There is nothing to stop a large customer completing a design that uses (say) 6 COGs, 128K RAM and 28 pins from getting Parallax to do a variant.
That just needs enough zeros on the order.
Parallax could look at a BGA version & should discuss that with OnSemi.
Guys, even in the P1 we have parallel processors, not concurrent ones.
That is true whether or not the programmer uses them that way. You can use the cogs concurrently (as is often done), but you do not have to.
An example: @Lonesock's SPI drivers for FSRW.
Fast transfer of the hub buffer into the cog (thanks, @kuroneko), then writing to the SD card in parallel while the main process continues. Write-behind in parallel, not concurrent.
Same on reading: read-ahead. You request sector x and get it delivered. While you are processing it, the driver fetches x+1 in parallel, and can deliver it, should you need the next sector, as fast as you can transfer between cog and hub (thanks again, @kuroneko).
This is just one example.
Working with parallel processors needs a different mindset, and this will be more important on the P2. It is way different from threads.
A lot of examples use blocking calls to the cogs: put something into a mailbox and WAIT for it to be done, signalled by a 0 in the mailbox, or similar.
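The read-ahead idea above can be sketched in Go (used here purely as convenient shorthand; readSector and startReader are invented names, not FSRW code). A buffered channel of size one plays the mailbox, and a goroutine plays the driver cog that fetches sector x+1 while the consumer is still processing sector x:

```go
package main

import "fmt"

// readSector stands in for the slow SD-card read a driver cog performs.
// (Hypothetical; returns a dummy 2-byte "sector".)
func readSector(n int) []byte {
	return []byte{byte(n), byte(n)}
}

// startReader launches a "driver cog" goroutine that stays one sector
// ahead of the consumer: the buffered channel holds the prefetched
// sector x+1 while sector x is being processed.
func startReader(first, count int) <-chan []byte {
	out := make(chan []byte, 1) // room for exactly one read-ahead sector
	go func() {
		defer close(out)
		for n := first; n < first+count; n++ {
			out <- readSector(n) // blocks only if the consumer falls behind
		}
	}()
	return out
}

func main() {
	for sector := range startReader(0, 3) {
		fmt.Println("processing sector of", len(sector), "bytes")
	}
}
```

The consumer only waits when it outruns the prefetch; otherwise the next sector is already sitting in the buffer, which is the whole point of the pattern described above.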
The way it was explained to me is concurrency refers to individual work units able to perform their work at the same time.
Parallel is the same, but with work units working on a related problem set in place of a single work unit doing the same.
A concurrent multiprocessor is one capable of many kinds of differentiated processing happening at the same time. Parallel processors tend to be more about distributing a similar task across many processors.
GPU shaders would be a great example of parallel processing. A multiuser system handling lots of different tasks, depending on what the users wanted would be a concurrent multiprocessing environment, when more than one processor is available.
That's info I got in answer to some of my multiprocessing-related questions when learning about the SGI multiprocessor systems and how they differed from a GPU, etc., and that was how SGI broke it down for me.
So I've always seen a distinction along the lines of symmetry. Very similar, or even the same bit of code well distributed across processors is parallel. More differentiated code and potentially tasks is more generally known as concurrent.
Propellers do both kinds of things, which is why I said superset.
Early on, Andre', apparently having come from a similar school, arrived at a similar place, also calling the Prop a concurrent multiprocessor.
Good grief indeed!
Edit: I may have also seen this distinction made a lot longer ago too. I'm off to browse some Rockwell and MOS datasheets to check.
Good grief. I have been searching for the difference between "concurrent" and "parallel" for years.
I still don't get it.
"concurrent" as in things happening at the same time.
"parallel" as in things happening at the same time.
Durp, durp...
There is no way I can do "concurrent" or "parallel" on a single CPU. It's physically impossible.
Yeah, we can run threads and multiple processes on a single CPU. So? As far as my code is concerned it's the same.
I am probably completely wrong here but didn't the Transputer break up the same task amongst other Transputers as opposed to concurrently working on some other task?
concurrent - multiple (usually unrelated) tasks being performed at the same time on multiple processors
parallel - multiple related tasks being performed at the same time on multiple processors.
I see it as: parallel programming needs special tools to turn an algorithm into a parallel program - this is where things generally get stuck, trying to break code down into parallel programs.
Map/reduce gets to a parallel processing solution in a Hadoop cluster. What is done with most multiple processor systems is just concurrent processing.
At least that's how my little brain wraps around it.
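The "parallel" case above - related work units split across workers on the same problem set, map/reduce style - might look like this in Go. A minimal sketch with invented names, not a claim about any particular cluster framework:

```go
package main

import (
	"fmt"
	"sync"
)

// parallelSum splits one job (summing a slice) across nWorkers -
// the "parallel" case: related work units on one problem set.
func parallelSum(xs []int, nWorkers int) int {
	partial := make(chan int, nWorkers)
	var wg sync.WaitGroup
	chunk := (len(xs) + nWorkers - 1) / nWorkers
	for w := 0; w < nWorkers; w++ {
		lo, hi := w*chunk, w*chunk+chunk
		if lo > len(xs) {
			lo = len(xs)
		}
		if hi > len(xs) {
			hi = len(xs)
		}
		wg.Add(1)
		go func(part []int) { // map: each worker sums its own chunk
			defer wg.Done()
			s := 0
			for _, x := range part {
				s += x
			}
			partial <- s
		}(xs[lo:hi])
	}
	wg.Wait()
	close(partial)
	total := 0
	for s := range partial { // reduce: combine the partial sums
		total += s
	}
	return total
}

func main() {
	fmt.Println(parallelSum([]int{1, 2, 3, 4, 5, 6, 7, 8}, 4)) // prints 36
}
```

The concurrent case would simply be workers running unrelated functions instead of chunks of the same one.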
A single Transputer chip only had one processor. So down at that hardware level it was no more parallel than most other micro-processors.
The language supplied with the Transputer, Occam, had parallelism built right into its syntax. For example:
SEQ
  doSomething()
  doSomethingElse()
PAR
  doSomething()
  doSomethingElse()
Those statements in SEQ would execute in sequential time order as normal. Those statements in PAR could be run in parallel. You actually had to use SEQ; there was no default assumption of sequential execution as in most other languages. Of course, on a single processor those parallel statements get time-sliced, so it does not buy you performance. It does make program construction a lot easier though.
But then comes the Transputer magic. With its fast chip-to-chip data links it could spread your code around a number of processors. Those PAR parts could actually be running in parallel. Significantly, the very same code could run on one processor or many.
All this is based on the idea of Communicating Sequential Processes (CSP) as laid down by Tony Hoare in 1978.
The modern-day XMOS chips carry on this tradition with the XC language, a C-like language that enforces CSP. But the xcore chips do actually have multiple cores.
To say it a little differently: looking at the Occam code, there is no visible difference between concurrent and parallel. The different processes communicate via virtual channels, and the scheduler activates code when a channel is filled, so the processes run concurrently on one processor. But a virtual channel can be connected to a physical link to another Transputer chip, and now the processes are distributed over two processors, so they run partly in parallel.
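As a rough modern analogue (Go comes up later in this thread as a modern-day Occam), goroutines plus a channel express the same PAR-plus-channels idea. The par helper here is an invented name, not a standard library function:

```go
package main

import (
	"fmt"
	"sync"
)

// par runs its branches "in PAR": each becomes a goroutine, and the Go
// runtime decides whether they are time-sliced on one core or run truly
// in parallel on several - just as Occam's PAR could.
func par(branches ...func()) {
	var wg sync.WaitGroup
	for _, b := range branches {
		wg.Add(1)
		go func(f func()) {
			defer wg.Done()
			f()
		}(b)
	}
	wg.Wait()
}

func main() {
	// A channel plays the role of an Occam channel between processes.
	ch := make(chan string, 2)

	par(
		func() { ch <- "doSomething" },
		func() { ch <- "doSomethingElse" },
	)
	close(ch)

	// SEQ needs no keyword in Go: plain statement order is sequential.
	for msg := range ch {
		fmt.Println(msg)
	}
}
```

As with Occam, the same source runs unchanged whether the branches are time-sliced or genuinely parallel.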
Odd. Most people, on discovering the Transputer, went "Wow" and proceeded to dream up applications for vast arrays of them. The company I worked for put together a 1024-node Transputer array for analysis of radar returns from a 3D phased-array radar.
Had Atari managed to get their Transputer workstation off the ground things might have gone differently.
But as you say, the Transputer and Occam and the whole CSP concept were just too much for the world at the time. Things have indeed changed: now clustering everything is all the rage, and Google developed the Go language to help do so, which is basically a modern-day Occam.
And yes, the Transputer was pricey. By the time the price came down it could no longer compete on raw performance. Sadly, the T9000 successor failed to materialize.
Ah well. We have XMOS today to fly the flag of the Transputer concept.
Been out of the prop 2 loop for about a year. Seems Parallax is still developing the prop 2. A few questions:
* Projected release date?
* Projected price?
* Summary of specs for Jan. 2017?
* Party/convention date?
Internal strife at Arduino seemed to chill excitement over their products, especially the Arduino Due ($49.95). For a while, the Due was no longer being sold by Arduino.cc and some vendors were out of stock! That left the field wide open for a 32-bit MCU. Well, the Teensy 3.5 & 3.6 filled that void: breadboard friendly, peripheral rich, faster than the Due, low priced ($29.95), and Arduino IDE compatible. It hit the mark.
IMHO, Parallax missed the boat. I hope the Prop 2 nears its introduction to the market.
Not certain on the release date, but all hardware design has settled now. Just doing some new layout to reflect the current schematic and waiting to start the synthesis. I don't know what it will cost, but I imagine around $6-$10. The die is a huge 72 square mm; there's a lot in there. As far as specs go, if you go to the "Prop2 FPGA Files!!!" thread in this forum, click on the docs link in the first post. It's pretty much all in there. Party? Yes, when that day comes.
We've missed a lot of boats, so far, and may miss a few more. I think the Prop2 will be uniquely useful and productive to use. It's meant to be fully understood, inside and out, and it can breathe a lot of random signals.
The 555 timer chip was introduced in 1971. A very simple little chip, but very flexible. Its simple arrangement of analogue comparator, flip-flop, etc. could be configured for all kinds of different uses. Even today, in the face of thousands of other "smarter" chips, including micro-controllers and such, the humble 555 sells by the billion.
I sometimes wonder how it would be if the 555 had not come out in 1971 but was instead launched in recent years. Would all the kids of today look at it and sneer, laughing at how crude it is compared to an AVR Tiny whatever? Not understanding what can be done with it. Resulting in no 555 sales.
All of which makes me sometimes wonder if the P1 is like that 555 that was introduced into the wrong era? It's not a high speed compute engine, it's not a SoC loaded with hardware peripherals, it is a very simple and flexible device that the kids don't know what to do with.
I was introduced to the 555 about four years later; if it wasn't for RS and an LED counter module kit needing a clock input, it would have been later than that.
Kids today don't have a lot of extra time for a hobby, between the required studies and sports activities, or even a forty-five minute or more bus ride like my children had. The free time that was left in the day wasn't used for brain work, except for the mounds of homework to be accomplished.
I wasn't an honor student, so I found the time to pursue what interested me.
My oldest grandchild is in high school and has the long hair thing going on. It just seems a little annoying to me; he and his younger brother just don't wear it well. It's either in their face or up in a bun like a sumo wrestler.
This old hippie will try the P2. Party, yes, and I will try a slice of vegan pizza and whatever you wash that down with.
The hippie movement has made a resurgence, nothing like the original. Just a superficial copy, not willing to understand the way things work. If it's broke, toss it out and get the latest model.
Just to add one thing: I think Blockly is great. Someone new to robotics can get in there and move a new robot around, and feel like they really accomplished something, which they have. And if they want to get into the real nuts and bolts of it, they can do that too.
Comments
If it were me, nope. Really interested parties can join the fun and learn lots on an FPGA.
When we arrive at a tapeout, shuttle, etc... Maybe start thinking about it on a working proto.
Maybe it gets compiled to PASM too.
The longer-term worry is what impact the pin-logic area may have on RAM size; if that 512K is impacted, then 32 smarter cells may be the choice.
<ducks for cover>
Personally, I don't see the point of doing SPI/I2C/serial with smart pins when there are plenty of cogs to spare for that.
ADC and DAC are nice though. Hopefully, there will still be USB and SDRAM support...
i.e., same die, more than one package is common.
That is indeed concurrent execution.
But you do not have to do that.
Enjoy!
Mike
I know that you will know this stuff
Any relation to Tom Sawyer ;-)
Ah, the Transputer. A great example.
So, parallel, concurrent? Meh.
The neat thing about the Transputer was that the whole thing from chip to programming language was designed around that CSP idea.
Those channels in the Occam language reflected hardware features. The PAR reflected the way processes could be distributed easily over many chips.
Even the concept of time is built into Occam's syntax.
It's that same "integrated whole" feeling you get with the Propeller and Spin/PASM.
Of course, there was no IoT, and a nice RGB board would set you back about $3000...
My how things have changed.
Which boat would that be? No Arduino can come close to the real-time capabilities of even the P1.