It probably is a WAIT instruction. I believe all XMOS chips have 8 hardware threads per core and would have been designed explicitly to suspend context independently for each thread.
That was something that gave Chip some headaches when trying to merge HubExec with the pre-existing time-slicing on the Prop2-Hot design.
Of course the current Prop2 design dropped the threads to simplify the Cogs, but we gained double the Cogs in the process ... And interrupts too, although I'm not entirely convinced this was the best idea.
Oh ye of little faith. They may not be needed by most users but for those that do need them they will be a great boon.
I think introducing interrupts is a step backwards. It's totally antagonistic to the elegance and regularity of the multi-core solution to the problem that interrupts are supposed to solve.
I'm pretty sure that if a COG could interleave alternate instructions from two threads that would achieve what can be done with a traditional background loop and an interrupt routine in 99.999% of cases. It would be a simpler more elegant solution. Basically you end up with two half speed processors that are easy to reason about and give deterministic timing easily. If that could be stretched to 4 interleaved instruction streams instead of one background loop and 3 interrupt routines it gets even better.
Sadly my threading suggestion had to be dumped along with a lot of other stuff when the P2-hot got too hot and we had to adopt the stone age interrupt solution. Actually I was amazed when Chip squeezed those interleaved threads in there.
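To make that concrete, here is a minimal C sketch of strict 1:1 interleaving of two instruction streams. It is ordinary desktop C, not P2 code, and the two step functions are invented purely for illustration; the point is that each stream gets exactly half the cycles with fully deterministic timing.

/* Two "threads" interleaved one instruction at a time.  Hardware would do the
 * alternation implicitly; here a loop plays the role of the fetch stage.     */
#include <stdio.h>

static void thread_a_step(int *state) { (*state)++; }        /* "background" work    */
static void thread_b_step(int *state) { (*state) += 10; }    /* "event handler" work */

int main(void)
{
    int a = 0, b = 0;

    for (int cycle = 0; cycle < 10; cycle++) {
        if (cycle & 1)
            thread_b_step(&b);    /* odd cycles belong to stream B  */
        else
            thread_a_step(&a);    /* even cycles belong to stream A */
    }

    printf("a=%d b=%d\n", a, b);  /* a=5 b=50: each stream got exactly 5 slots */
    return 0;
}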
I think introducing interrupts is a step backwards. It's totally antagonistic to the elegance and regularity of the multi-core solution to the problem that interrupts are supposed to solve.
+1
Yeah, I was a little surprised to see the pro-interrupt faction prevail here. But, as others have said, you don't have to use them if you don't want to.
I think introducing interrupts is a step backwards. It's totally antagonistic to the elegance and regularity of the multi-core solution to the problem that interrupts are supposed to solve.
I'm pretty sure that if a COG could interleave alternate instructions from two threads that would achieve what can be done with a traditional background loop and an interrupt routine in 99.999% of cases. It would be a simpler more elegant solution. Basically you end up with two half speed processors that are easy to reason about and give deterministic timing easily. If that could be stretched to 4 interleaved instruction streams instead of one background loop and 3 interrupt routines it gets even better.
Sadly my threading suggestion had to be dumped along with a lot of other stuff when the P2-hot got too hot and we had to adopt the stone age interrupt solution. Actually I was amazed when Chip squeezed those interleaved threads in there.
Ah well.
What you are suggesting was very close to what I initially proposed. The major difference was that there would be a secondary program counter/status flags and an external signal would cause execution to switch to that pc/status when the current instruction finished.
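Here is a rough C model of that mechanism, assuming one spare PC/flags pair and a switch that can only happen between instructions. The context layout, the toy instruction executor and the point at which the signal fires are all invented for illustration; this is not P2 hardware or anything Chip published.

/* One primary and one secondary execution context; an external signal flips
 * execution to the other context, but only at an instruction boundary.       */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t pc;      /* program counter for this context        */
    uint32_t flags;   /* C/Z status flags saved with the context */
} context_t;

static volatile bool external_signal;   /* would be driven by a pin or event */

/* Stand-in for "execute the instruction at ctx->pc": here it just advances pc. */
static void run_one_instruction(context_t *ctx) { ctx->pc += 4; }

int main(void)
{
    context_t primary = { 0x000, 0 }, secondary = { 0x800, 0 };
    context_t *current = &primary;

    for (int step = 0; step < 8; step++) {
        run_one_instruction(current);   /* the current instruction always completes */

        if (step == 2)                  /* pretend the external event fires here     */
            external_signal = true;

        /* Switching only between instructions keeps per-instruction timing
         * deterministic for whichever code is running.  A real design would
         * also need a way to return to the primary context.                 */
        if (external_signal) {
            external_signal = false;
            current = (current == &primary) ? &secondary : &primary;
        }
    }

    printf("primary.pc=0x%03x secondary.pc=0x%03x\n",
           (unsigned)primary.pc, (unsigned)secondary.pc);
    return 0;
}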
Yeah, I was a little surprised to see the pro-interrupt faction prevail here. But, as others have said, you don't have to use them if you don't want to.
Not really - interrupts use less silicon, so whilst they may not be as elegant as hard time slicing, they are simpler, cheaper and, most importantly, possible.
Interrupts also allow dual-code-path use above 50% of fSys. Hard slices are very deterministic, but they are also coarse.
A feature that means the device cannot ship is not a feature at all.
At first, I wasn't crazy about the interrupt mechanism. But I've since changed my mind. The reality is that the interrupts are just another option. You don't need to use them (except when you do).
The fact that they are local to a single COG means you can grab an object from OBEX that uses interrupts without having to worry about how to integrate it into your application, or even caring that it uses interrupts.
I'm all for the possible. I guess we will never know how impossible the interleaved instruction threads were; the P2-hot was doomed by many other features as well.
Yes, David, the fact that interrupts are confined to a COG and an object is what caused me to stop campaigning against them.
We have what we have and I'm happy. Or would be if I actually had it. As it were.
Shame that it wasn't built. We could have lived within the power constraints. At least we would have had a P2. I am not so sure anymore, and if it does eventually see the light of day, I am not so sure it will sell well either. There are just so many other chips around now.
Shame that it wasn't built. We could have lived within the power constraints. At least we would have had a P2.
For a few minutes, anyway, before it died from overheating.
No, the P2Hot was this:
It was the culmination of trying to realize everyone's fantasies of what the P2 should be. Rather than being a detour, I suppose it was a necessary step along the path to the current design, since it proved to all concerned parties that you can't have everything you might wish for.
One problem with P2HOT was that all the RAM was active on every clock. By dividing the RAM up into blocks, only a small block would need to be active each clock. However, the egg-beater means that the whole RAM may be active on every clock too, since all 16 cogs can access the hub on each clock. RAM was 128KB back then, and now it is 512KB. IIRC the hub is now using OnSemi's RAM cells, whereas I think the P2HOT used hand-crafted RAM cells.
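As a rough sketch of the egg-beater idea: hub RAM is split into 16 slices selected by the low bits of the long address, and each clock pairs every cog with a different slice, rotating by one per clock. The rotation direction and offset used by the real silicon are assumptions here; the point is only that all 16 cogs can hit some slice on every clock, while any particular address comes around once every 16 clocks.

/* Print which hub slice each cog may access on the first few clocks. */
#include <stdio.h>

#define NUM_COGS   16
#define NUM_SLICES 16

/* Which hub slice cog `cog` may access on clock `clock` (assumed rotation). */
static int slice_for(int cog, unsigned clock)
{
    return (int)((cog + clock) % NUM_SLICES);
}

int main(void)
{
    for (unsigned clock = 0; clock < 4; clock++) {
        printf("clock %u:", clock);
        for (int cog = 0; cog < NUM_COGS; cog++)
            printf(" c%d->s%d", cog, slice_for(cog, clock));
        printf("\n");
    }
    return 0;
}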
There were also other things that were calculated as worst case (everything active) whereas in real life that would not normally be so. Things like all 4 threads active in every cog.
So it is quite probable that the P2HOT would have been fine for many uses, particularly the case where all we wanted was a better P1.
Of course, the new P2 design has 512KB HUB RAM, and that is a major requirement today.
Anyway, the subject is moot. We don't have anything!
It wasn't an SRAM activation issue; it was something about the 128-bit HubRAM databus width running around every Cog. Chip posted a link to a neat write-up about "dark silicon" and using it as a solution to thermal design problems. SRAM blocks count as dark silicon.
The other issues were things like the huge multiplier instruction and other similarly complex instructions that had to resolve in very short order.
The hardware threads did not impose a thermal cost but did add a pile of context state data that wasn't planned for.
We were told many years ago that we would get free P2s for working on PropGCC. I wonder if that deal is still in effect? Maybe there was an expiration date on it? :-)
Hi David,
this could be read in 2 different ways:
1. getting P2s and starting to work on PropGCC
==> makes no sense - when people have the first P2s in their hands, PropGCC must be there, full, working, with libraries, drivers, examples ... to give them a really quick start, or it will create lots of frustration and kill the momentum of the P2 introduction.
2. giving the people who make the P2 PropGCC a reward for their work
==> definitely, they will have earned it ...
Not that I will ever use PropGCC -
I am fully happy with Tachyon,
which will be there, ready as a full system, probably very early.
Originally, PropGCC for P1 was just a stepping stone to PropGCC for P2. Unfortunately, the timeframe got extended so much that the P2 version didn't follow immediately after the P1 version. Of course, I am not really worried if I get a free P2. I'd be happy to buy one if there was one to buy. I'm looking forward to it and probably to trying Tachyon again.
I'm looking forward to it and probably to trying Tachyon again.
Don't wait for the P2 to try Tachyon again.
The new version 4.5 has gained a new wordcode kernel in place of the original bytecode kernel.
It is much improved in speed and functionality.
Thanks to NOT having a P2 yet ;-)
PropGCC may not be ready when the P2 is available, but p2gcc will be. Actually it already exists, but with a few limitations. I've been working on floating point support for the past few weeks. I'm hoping to post an update by the end of this week or sometime next week. I've been working on porting an orbital simulator program that I wrote a few years ago. I basically have it working, but it needs to be cleaned up. I also implemented a 640x480 16-color display that is used by the orbital simulator. Maybe once I post it somebody could try porting it to Tachyon. It would be interesting to see if it could be done in Forth.
Ummm... I suspect I asked this a while back but just in case, why not spend your time getting up to speed on GCC internals so you can help with retargeting GCC to P2 instead of working on p2gcc? Won't all of that effort be wasted once the real GCC is available?
p2gcc uses PropGCC to generate P1 COG assembly code. p2gcc then uses a utility called s2pasm to convert this to P2 hubexec assembly. p2asm is used to assemble the file to an object file that uses the format I used with the Taz C compiler. The object files are linked together using p2link, which I also borrowed from Taz C. Most of the work I've done on p2gcc was to write the s2pasm converter and to create the p2gcc script file that ties everything together. I've also written a few library routines that are needed to run a C program.
I have looked at the internals of GCC, but it is beyond my understanding. The task of implementing GCC on the P2 is probably 100 times the effort that I've spent on doing p2gcc. For someone else that is experienced with GCC it may only be 10 times the effort that I've spent. Also, I've gotten no support from Parallax on this project. Those are the reasons that I work on p2gcc instead of GCC.
To be honest, I really haven't done much on p2gcc for the last 2 months. I am mostly playing around with demo programs that use p2gcc. The floating point routines that I'm currently working on are more a way to understand the qxxx instructions in the P2. Maybe some of this work can be reused if and when Parallax supports a GCC project for the P2.
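As an example of the kind of routine involved, here is a much-simplified single-precision multiply in plain C. The 32x32 -> 64-bit unsigned multiply at its core is the piece that, as I understand the P2, a QMUL/GETQX/GETQY sequence would provide in hardware; everything else is bit-fiddling. Zeros are handled, but infinities, NaNs, subnormals and correct rounding are deliberately left out to keep the sketch short.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static float fmul_sketch(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, 4);
    memcpy(&ub, &b, 4);

    uint32_t sign = (ua ^ ub) & 0x80000000u;
    int32_t  ea   = (int32_t)((ua >> 23) & 0xFF) - 127;
    int32_t  eb   = (int32_t)((ub >> 23) & 0xFF) - 127;
    uint32_t ma   = ua & 0x007FFFFFu;
    uint32_t mb   = ub & 0x007FFFFFu;

    if (((ua << 1) == 0) || ((ub << 1) == 0))     /* either operand is +/-0   */
        return 0.0f;

    ma |= 0x00800000u;                            /* restore implicit 1 bits  */
    mb |= 0x00800000u;

    /* 24-bit x 24-bit mantissa product; on the P2 this is where the hardware
     * multiplier would earn its keep.                                        */
    uint64_t prod = (uint64_t)ma * (uint64_t)mb;  /* 47 or 48 significant bits */

    int32_t e = ea + eb;
    if (prod & (1ULL << 47)) {                    /* product in [2,4): renormalize */
        prod >>= 24;                              /* keep the top 24 bits          */
        e += 1;
    } else {
        prod >>= 23;                              /* product already in [1,2)      */
    }

    uint32_t result = sign | (uint32_t)((e + 127) & 0xFF) << 23
                           | ((uint32_t)prod & 0x007FFFFFu);
    float out;
    memcpy(&out, &result, 4);
    return out;
}

int main(void)
{
    printf("%g * %g = %g\n", 1.5, 2.5, fmul_sketch(1.5f, 2.5f));   /* expect 3.75 */
    printf("%g * %g = %g\n", -3.0, 0.5, fmul_sketch(-3.0f, 0.5f)); /* expect -1.5 */
    return 0;
}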
p2gcc uses PropGCC to generate P1 COG assembly code. p2gcc then uses a utility called s2pasm to convert this to P2 hubexec assembly. p2asm is used to assemble the file to an object file that uses the format I used with the Taz C compiler. The object files are linked together using p2link, which I also borrowed from Taz C. Most of the work I've done on p2gcc was to write the s2pasm converter and to create the p2gcc script file that ties everything together. I've also written a few library routines that are needed to run a C program.
Thanks.
Does that first step impose a code size limit, or can you run above the true COG size limit (virtual COG?) to eventually get larger P2 hubexec assembly?
David, I'm not a C++ programmer, but I just tried a test with a simple "Hello World" cpp program. It compiles OK, but the "cout <<" statement requires library functions that I don't have. The names are also very long, and I currently have a limit of 39 characters per name. I can change that in the future to make the name length unlimited. I replaced the "cout <<" with a printf, and it works OK, but it's not a very thorough test of C++.
jmg, fortunately PropGCC doesn't put a limit on the size of a COG program at compile time. The limit is enforced only at link time. So I can produce large P2 programs using the COG model. The current program that I'm working on is about 83K in size. It also uses 150K for the display, so I'm using most of the 256K that's available on the DE2-115. Here's a screen shot of the output of the program.
David, I'm not a C++ programmer, but I just tried a test with a simple "Hello World" cpp program. It compiles OK, but the "cout <<" statement requires library functions that I don't have. The names are also very long, and I currently have a limit of 39 characters per name. I can change that in the future to make the name length unlimited. I replaced the "cout <<" with a printf, and it works OK, but it's not a very thorough test of C++.
Sounds like it's mostly library work that needs to be done at this point. Maybe your approach is good enough for a first release of the P2? It would save Parallax money not having to do PropGCC for P2. On the other hand, I suspect that we could make use of more of the P2 features if we directly targeted it.
Shame that it wasn't built. We could have lived within the power constraints. At least we would have had a P2. I am not so sure anymore, and if it does eventually see the light of day, I am not so sure it will sell well either. There are just so many other chips around now.
I'll buy one if you make it.
Alas, the P2 prototype PCB and other samples you sent me have compound dust on them.
My compound interest has mostly faded, but I'd still buy a real P2 for novelty if nothing else.
The new version 4.5 has gained a new wordcode kernel in place of the original bytecode kernel.
It is much improved in speed and functionality.
Thanks to NOT having a P2 yet ;-)
- and why someone might use one or the other?
I somehow imagined an orbital simulator would be text based for some reason