Right. That's one long per clock. I'm trying not to think about executing from hub RAM, directly, with automatic RDOCTL's in the background. CALLA/RETA would be a suitable stack, after widening the PC bits from 9 to 18.
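To make the hub-execution idea concrete, here is a rough Python model. It is not a description of the real hardware. It assumes that hub RAM is addressed in longs, that each automatic background fetch (standing in for an RDOCTL) pulls in one aligned line of 8 longs, and that the PC is widened from 9 bits (512 cog longs) to 18 bits. The class, its methods, and the plain list used as a CALLA/RETA-style return stack are all hypothetical.

```python
# Hypothetical model of cog code executing directly from hub RAM.
# Assumptions (not from any spec): hub RAM is long-addressed, each
# background fetch (standing in for an automatic RDOCTL) loads one
# aligned line of 8 longs, and the PC is widened from 9 to 18 bits.

LINE_LONGS = 8
PC_MASK_9 = (1 << 9) - 1    # today's cog PC: 512 long addresses
PC_MASK_18 = (1 << 18) - 1  # widened PC: 262,144 long addresses


class HubExecModel:
    def __init__(self, hub):
        self.hub = hub           # hub RAM as a list of longs
        self.pc = 0
        self.line_base = None    # hub address of the cached 8-long line
        self.line = []
        self.fetches = 0         # number of 8-long line fetches so far
        self.stack = []          # return addresses, CALLA/RETA style

    def _fetch_line(self, addr):
        # Align down to an 8-long boundary and load the whole line.
        base = addr & ~(LINE_LONGS - 1)
        self.line = self.hub[base:base + LINE_LONGS]
        self.line_base = base
        self.fetches += 1

    def _in_line(self, addr):
        return (self.line_base is not None
                and self.line_base <= addr < self.line_base + LINE_LONGS)

    def next_instruction(self):
        # Refetch only when the PC leaves the cached line.
        if not self._in_line(self.pc):
            self._fetch_line(self.pc)
        instr = self.line[self.pc - self.line_base]
        self.pc = (self.pc + 1) & PC_MASK_18
        return instr

    def call(self, target):
        self.stack.append(self.pc)          # push return address
        self.pc = target & PC_MASK_18

    def ret(self):
        self.pc = self.stack.pop()          # pop return address


m = HubExecModel(list(range(64)))
first = [m.next_instruction() for _ in range(16)]
print(first, "fetches:", m.fetches)
m.call(40)
print("after call:", m.next_instruction(), "fetches:", m.fetches)
```

In this toy run, sixteen sequential instructions cost only two line fetches. A call (or a return) to a different line forces a fresh fetch, which is where real hub-exec latency would show up.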
Wow! That's the most encouraging thing I've heard from you in a while. "Trying not to think about executing from hub RAM directly" suggests to me that you're at least tempted to think about it! :-)
This chip can do a combination of things without an operating system that just isn't really out there.
Let's hope that is the case. I think it might be, barring some research to really nail more of this down.
Essentially, I'm saying if the P2 is really good, it can bend that curve some, and being a little late to be that good is perfectly fine in the long run.
If the P2 were just another ARM device, certainly windows would be closing.
It is not; in fact, it sits somewhat off on its own, and many things outside the P2 chip design itself work to its advantage.
* Software development is proceeding in parallel, in a way that could not have happened 10 years ago.
* There is some potential customer overlap with FPGA, again providing an advantage as time moves forward.
* FPGA vendors are leaving some markets behind as they chase the leading edge.
* Large FPGAs of the type that can emulate a P2 are falling in price.
Looking at just MHz (or ns), the RAM base of the P2, combined with the sluggish FLASH speeds of others, means it can offer a time-domain granularity and determinism that other microcontroller vendors can only envy, in spite of being 15 years late in process terms.
* Large Micros are getting larger - that means even MORE software, and likely an operating system.
Suddenly, you have an expanding base of (RaspPi et al.) users who would like to rediscover hard real time.
* Large pixel-count LCDs are getting cheaper, and now an expanding base of users wants better display performance, ideally without needing to whack-on-a-Linux-box to get it.
It is smarter to focus on those emerging markets, and ensure the P2, when released, can give those customers something they need.
Here is a pic that I think reflects the current data bus architecture between hub/cog/aux.
By the way, the AUX RAM in Cluso's earlier pic already effectively has two ports (though one side is read-only), so maybe we could put a 2:1 MUX between the video generator and the AUX RAM, connect the MUX's second input to the hub, and make that a R/W port instead of a read-only one. In theory, this lets you select whether the hub accesses the AUX RAM (on behalf of any non-video COG) or the video generator does. The COG that owns the AUX RAM sets this up as it wants, depending on what it needs to do. This approach would make better use of the AUX RAM's existing dual-port capability in non-video applications: not all COGs are going to use their video generator, and when they don't, that dual-port capability is otherwise wasted. It also means you don't really need to enlarge the AUX RAM to share data; the RAM is already there, so it costs more control logic but not necessarily more RAM cells.
This would basically allow a driver COG to continually make some of its AUX RAM contents available to all other COGs somewhere in the common hub address space (the ROM hole?), without needing to push data out via the hub all the time, or read/poll it in all the time, which always adds extra latency. Multi-COG I/O drivers could benefit if they can share data faster, and perhaps cache drivers too. The hub mechanism already resolves multiple writers among the other COGs; the remaining case is when the native COG and the hub both try to write the same AUX RAM address, so one needs priority (probably the native COG).
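A small behavioral sketch of the MUX proposal may help. It is Python, not HDL, and everything in it is an assumption: a 256-long AUX RAM, port A owned by the native COG, port B routed by a select that the owning COG sets to either its video generator (read-only) or the hub (read/write), and the native COG winning a same-address write collision, as suggested above. Class and method names are hypothetical.

```python
# Behavioral sketch (hypothetical) of sharing AUX RAM through port B.
AUX_LONGS = 256
MASK32 = 0xFFFFFFFF


class SharedAux:
    def __init__(self):
        self.mem = [0] * AUX_LONGS
        self.port_b = "video"   # 2:1 MUX select, set by the owning cog

    def select_port_b(self, source):
        if source not in ("video", "hub"):
            raise ValueError(f"unknown port B source: {source}")
        self.port_b = source

    def _read_b(self, who, addr):
        if self.port_b != who:
            raise PermissionError(f"port B is routed to {self.port_b}")
        return self.mem[addr % AUX_LONGS]

    def video_read(self, addr):
        return self._read_b("video", addr)

    def hub_read(self, addr):
        return self._read_b("hub", addr)

    def clock(self, cog_write=None, hub_write=None):
        """Apply up to one write per port in one clock; each is (addr, value)."""
        if hub_write is not None:
            if self.port_b != "hub":
                raise PermissionError("hub writes need port B routed to the hub")
            addr, value = hub_write
            self.mem[addr % AUX_LONGS] = value & MASK32
        if cog_write is not None:
            addr, value = cog_write
            # Applied last, so the native cog wins a same-address collision.
            self.mem[addr % AUX_LONGS] = value & MASK32


aux = SharedAux()
aux.clock(cog_write=(5, 111))          # driver cog publishes a value
print("video sees:", aux.video_read(5))
aux.select_port_b("hub")               # hand port B to the hub
aux.clock(cog_write=(7, 1), hub_write=(7, 2))
print("collision result:", aux.hub_read(7))
```

Applying the hub write first and the cog write second is just one cheap way to express "native COG has priority"; real logic might instead stall or drop the hub write.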
...treat the P2 as an iterative, community-influenced (designed, tested, etc.) CPU that only ever manifested as an FPGA core. From that, we get the P3 (an actual ASIC). From there, the P4 can be the next iterative design, followed by the P5 as the next ASIC.
In a way, this honors the real work and value of the P2, while also conveying the progress made as an actual commercial product (as the P3).
So when do revenues start to come in?
I've already predicted that the P2, as a real chip, won't be on the shelves of distributors, in sensible quantities, for another 12 months.
You could make an equally valid case that it is ~15 years late, as the 180nm process was first released in 1999.
This is how I kind of see things, too.
Yes, it could be argued that the Prop2 is so late by some metrics that it's not even worth making. But, how many other chips out there have been (or are being) painstakingly designed by people (me, you, others here) who've loved programming for perhaps 30 years, on average, who remember the old feelings of confidence and sheer joy that came from developing on/for systems that were reliable and responded logically to all their efforts, allowing them to build machines that worked perfectly? That kind of experience is long gone, buried by impenetrable layers of mucky-muck that have sapped almost all the fun out of engineering. The younger generation doesn't even know what it's missing. "Normal" to them is usually "broken" to us. They're relegated to playing with a lot of mediocre junk, and as a strange result, technology is relegated to serving as a social umbrella under which the would-be engineers gather, identifying themselves as "nerds" and "geeks" who can "hack" things. They're in need of some real nutrition, I think.
Here is a "proposed" pic with 256-bit hub RD/WROCTET (8 longs) hub/aux access...
By building the Aux Ram in 8 blocks of 32 longs (256 longs) and also permitting byte/word/long access...
* The hub could transfer directly to/from any Aux Ram block of 32 longs in a single hub access cycle.
* For video, the cog no longer has to setup SPx and perform 8 PUSH instructions to copy the longs from cache to Aux Ram - a huge saving!
* By also permitting BYTE/WORD/LONG access to the Aux Ram, the Aux can also act as the 8 Long Cache.
* The cog can immediately access the Aux Ram without the need to get it out of (or put it into) the cache and put it into Aux Ram.
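Here is one possible reading of the banked layout, sketched in Python. The interpretation is an assumption: long address `a` sits in block `a % 8` at row `a // 8`, so a single 256-bit hub transfer moves one whole row (one long from each of the 8 blocks), and 32 such transfers cover all 256 longs. Little-endian byte and word lanes within a long are also an assumption, and the class and method names are hypothetical.

```python
# One reading (hypothetical) of AUX RAM built as 8 blocks x 32 longs.
BLOCKS = 8
ROWS = 32
MASK32 = 0xFFFFFFFF


class BankedAux:
    def __init__(self):
        self.blocks = [[0] * ROWS for _ in range(BLOCKS)]
        self.hub_accesses = 0

    # 256-bit hub side: one row across all 8 blocks per access.
    def write_octet(self, row, longs):
        if len(longs) != BLOCKS:
            raise ValueError("an octet transfer is exactly 8 longs")
        for block, value in enumerate(longs):
            self.blocks[block][row % ROWS] = value & MASK32
        self.hub_accesses += 1

    def read_octet(self, row):
        self.hub_accesses += 1
        return [self.blocks[block][row % ROWS] for block in range(BLOCKS)]

    # Cog side: long, word and byte access to the same cells.
    def read_long(self, a):
        return self.blocks[a % BLOCKS][(a // BLOCKS) % ROWS]

    def write_long(self, a, value):
        self.blocks[a % BLOCKS][(a // BLOCKS) % ROWS] = value & MASK32

    def read_word(self, byte_addr):
        if byte_addr % 2:
            raise ValueError("word access must be 2-byte aligned")
        shift = (byte_addr % 4) * 8
        return (self.read_long(byte_addr // 4) >> shift) & 0xFFFF

    def read_byte(self, byte_addr):
        shift = (byte_addr % 4) * 8
        return (self.read_long(byte_addr // 4) >> shift) & 0xFF


banked = BankedAux()
banked.write_octet(0, [0x11223344] + [0] * 7)   # one hub access fills longs 0..7
banked.write_octet(1, list(range(100, 108)))    # one more fills longs 8..15
print(hex(banked.read_long(0)), hex(banked.read_word(2)), hex(banked.read_byte(0)))
print(banked.read_long(9), "hub accesses:", banked.hub_accesses)
```

The counter shows the point of the layout under this reading: 16 longs land in AUX with 2 hub accesses and no SPx setup or PUSH loop, and the cog can then read the same cells as longs, words or bytes.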
Couldn't agree more!
I have no enthusiasm for getting to know the ins and outs of ARM. However, I can understand a lot of the P1 & P2 implementation details, and with the explanations you "share" with us, my learning is exploding.
The last month or two has seen the P2 shift rapidly into much more powerful cores, with a much more regular instruction set and many additional instructions that make life so much easier and also produce more compact code. These delays have brought terrific improvements; as long as they do not hurt Parallax's income stream, the potential markets for the Prop have grown exponentially because of what it can now do.
And of course, in the meantime we have some really great code being written on the FPGA. Brian's (ozpropdev) single cog has just shown what could be achieved before these latest mods. Imagine what he can do now!
Lastly, letting us take part in the process has been a fantastic experience. One with no equal in any other organisation. I feel so honoured to be a part of it.
Thanks Chip, Ken, Beau & Parallax!
If they don't get it from this chip, it's not going to be practical for them to get it, IMHO. So here's the thing. We've been here before, and it seems a mess, but then it's excellent, and the cycle repeats, and then I think it won't happen.
And a lot of people are connected to P2 happening and doing well. Perhaps it's just a lack of perspective on my part. The chip being discussed right now is a whole lot better than the one we almost had. Amazing really.
I'm likely going to pull or trim down the other post, as I've heard what I need to. Thanks for expressing what you two did, JMG & Chip.
I like DQUAD
RDOCTO
RDOCTOC
WROCTO
Well there is RDDQUAD... and that's getting long.
Call it a page? That's a common term for 256 bytes. RDPAGE, WRPAGE, RDPAGEC?
I didn't look at it like that before, but moving whole pages to and from the HUB is powerful. Maybe that's a big enough chunk of data to abandon the nomenclature.
How about:
RDPARA
RDPARAC
WRPARA
PARAgraph - smaller than a page.
PAGE looks better, but it isn't quite a page. We could bend the meaning, though.
I was so disappointed when the last chip run failed.
Now I think it was the best thing that could have happened for the P2, Parallax, and all of us that will eventually use it. The improvements in the instruction set that Chip has done alone make what we have now significantly better! Add to that the additional functionality of more Hub RAM, wider data paths, and (if I am reading it right) the theoretical possibility to run cog programs direct from hub? This really is getting the P3, but in 6 months instead of 7 years!
I completely agree with what makes this chip so special compared to ARM, PIC, or any SoC out there. My project has been on the back burner for almost 20 years. The mechanical hardware was designed long ago. The holdup is the brains. The first attempt was a PC driving a user interface to an SX chip over a serial port, which then tried to coordinate 5 other SX chips. While it 'worked', it was not something that I would want to sell or even try to support for in-house use. The 32-bit PICs are very capable, but you still have very complicated code, or are forced into an RTOS and its learning curve, and STILL need multiple systems trying to talk together in a kludge that, again, is not something I would want to market or support. The P2 has the potential to not only do it all, but do it BETTER than any other combo system out there.
Just look at the Cubieboard2 with the A20 on it. Looks GREAT! Tons of memory, high-quality graphics, USB, Ethernet, lots of I/O left over... Then you realize you HAVE to use an OS just to get it to turn on. Now we are looking at Android or Linux and dealing with very complex software, where it would be SO easy to make a change in one place and break something somewhere else.
The P2 is going to be a great educational tool. My son is having a BLAST with his P1 learning C and designing robots (both fun and functional) that it might one day run. He is 11. With the P1 I can easily teach him real time process control with simple C.
The P2, for those who think outside of the box, will open up a whole new world of commercial possibilities.
Can we please stop calling these octets? I'm getting a little bit disappointed hearing we can transfer a whole 8 bits.
http://en.wikipedia.org/wiki/Octet_%28computing%29
But anyways, the back and forth that brought about the "octolongs" has been really exciting to follow in realtime.
What the heck are 8 longs called?
Yes, it is very exciting!
Thank you.
I like DQUADs, it fits well I think.
@JMG: Maybe.
I like many of the points you raise. Hope they play out. Clock is ticking. I'm looking forward to chips.
OCTET from the Oxford Dictionary...
noun
- a group of eight people or things, in particular:
- a group of eight musicians.
- a musical composition for eight voices or instruments.
- the first eight lines of a sonnet.
- Chemistry a stable group of eight electrons occupying a single shell in an atom.
Origin: mid 19th century: from Italian ottetto or German Oktett, on the pattern of duet and quartet
http://www.oxforddictionaries.com/definition/english/octet?q=octet
Here we are referring to 8 longs so it is an "octet of longs".
Anyway, Chip will work out what he will call the instruction.
A BS1, 160,000,000 times a second...
LOL, this discussion happened before.
1 bit - bit
2 bits - snak
4 bits - Nibble (or Nybble)
8 bits - Byte
16 bits - Word
32 bits - Long
64 bits - Dlong (Double long)
128 bits - Qlong (Quad long)
256 bits - Olong (Octo long)
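The naming list above maps cleanly onto byte and long counts. A quick Python check also shows why PAGE was felt to be a stretch: an 8-long (256-bit) transfer is 32 bytes, while the page mentioned earlier in the thread is 256 bytes. The names are the thread's informal suggestions, not official terminology.

```python
from fractions import Fraction

# Informal size names suggested in the thread (not official terminology).
SIZES = {
    "bit": 1, "snak": 2, "nibble": 4, "byte": 8, "word": 16,
    "long": 32, "dlong": 64, "qlong": 128, "olong": 256,
}

PAGE_BYTES = 256  # the "page" size mentioned earlier in the thread


def size_in(bits):
    """Return (bytes, longs) for a width in bits, as exact fractions."""
    return Fraction(bits, 8), Fraction(bits, 32)


for name, bits in SIZES.items():
    nbytes, nlongs = size_in(bits)
    print(f"{name:<7} {bits:>4} bits = {str(nbytes):>4} bytes = {str(nlongs):>4} longs")

# 256 bytes is 2048 bits, so a 256-bit olong is 1/8 of such a page.
print(f"an olong is 1/{PAGE_BYTES * 8 // SIZES['olong']} of a {PAGE_BYTES}-byte page")
```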
I honestly don't care. Fun to discuss though. Whatever ends up being in the docs is very highly likely what I'll end up using.
Just saw Cluso's post. Yeah, there it is, right there.
I think maybe OCTO could work. Or DQUAD. Or DQLONG. Or DQL.
RDOCTO
RDOCTOC
WROCTO
I wish OCTO started with a consonant, instead of a vowel. Those instruction names don't look right, or would take some getting used to.
Agreed!
Paragraph... LOL.
RDVLW, WRVLW, RDVLWC
Very long word?
That's probably it for me and naming ideas. In the end, it gets called what it's called, and once we all type it a few times, great!
One last one: Block.
RDBLK, WRBLK, RDBLKC
Too easily confused with PARAmeter.
perhaps
RDWIDE
RDWIDEC
WRWIDE
I think WIDE is it. It reads well and is very simple to remember.
RDHUB1
RDHUB2
etc...
Or maybe:
RDHUB reg, addr, #1
RDHUB reg, addr, #2
etc...
What do the numbers represent?
Keep up the great work Chip/Ken/crew.