The P2 is now at the point where it is several different architectures built on the same P1 foundation. From Heater's list, you could almost come away thinking the features will be used in a mutually exclusive fashion, unless you are someone like Chip (one of a very small group of gifted Propeller developers).
Us common folk will see a bigger better P1 with faster COGs to run PASM and a bigger place to put our Spin (or whatever) business logic and glue code.
We may consciously try some HUBEXEC at first, those threads look interesting, try blending some things together, work in some stack space in the AUX area.....
It's going to be a mishmash of code and confusion for quite a while. Some of it will click in our heads, some of it will stick as useful, some of it will stay as separate architecture solving different problems.
Power to those that can blend it all together and make it sing!
I hope the fun and wonder and elegant simplicity hasn't disappeared from playing at the machine level!
Do we have any detail on that? I seem to recall seeing that as well, but can't find it.
I'm wondering how that would work. It seems possible that, depending on how the tasks are set up, a task might never get hub access.
I believe that's why the task-scheduling approach was modified. The idea is to allow task time slots to be arranged such that every task is eventually guaranteed to get a hub slot (though I guess you could have a number of misses leading up to it).
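To make the concern concrete, here is a minimal sketch of the alignment problem. It is a toy model, not the actual P2 scheduler: the 8-clock hub window, the slot-pattern lengths and the function name `hub_hits` are all assumptions for illustration. It shows why a slot pattern whose length shares a factor with the hub period can starve a task, while a pattern length coprime to the hub period eventually lines every slot up with a hub window.

```python
from math import gcd

def hub_hits(pattern, hub_period=8):
    """Return {task: first clock at which that task's slot lands on a hub window}.

    pattern    -- list of task numbers, one per clock, repeated forever (assumed model)
    hub_period -- clocks between hub windows for this cog (assumed value)
    """
    n = len(pattern)
    # The combined (slot, hub phase) state repeats every lcm(n, hub_period) clocks.
    horizon = n * hub_period // gcd(n, hub_period)
    first = {}
    for clk in range(0, horizon, hub_period):   # only clocks with a hub window
        task = pattern[clk % n]
        first.setdefault(task, clk)
    return first

# 16-slot round robin: 16 is a multiple of 8, so only slots 0 and 8 ever see the hub.
print(hub_hits([0, 1, 2, 3] * 4))
# 15-slot pattern: 15 and 8 are coprime, so every slot eventually meets a hub window.
print(hub_hits(([0, 1, 2, 3] * 4)[:15]))
```

In the second case each task does get there, but only after a number of misses, which matches the caveat above.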
I haven't looked in detail, but didn't ARM go mad with the "Thumb" instruction set to produce smaller code? Then they threw in hardware execution of Java byte codes (does anyone even use that?). Then they have half a dozen different floating point implementations going on and different generations of SIMD instructions. Chaos.
I guess 300 odd instructions on the P2 comes down to 150 different operations or so when we take into account A/B variants that can easily be memorized. That is still a lot though.
Now what about all those execution modes:
1) In COG
2) In HUB
3) Single threaded
4) Hardware threaded
5) Stack in HUB?
6) Stack in AUX
7) Greedy or not?
That's a lot to keep track of. And what instructions can I use in what modes and how does everything interact with everything else?
How many of these "modes" will, say, propgcc ever support?
I'm just feeling overwhelmed at the moment. Tell me it's just a case of not being able to see the forest for the trees.
It's just a case of not being able to see the forest for the trees. - do you feel better now?
A big part of the instructions are SET and GET instructions that access hardware-registers. On Prop1 that was all done with one instruction: MOV.
Now it is as you would count:
MOV dst,DIRA
MOV OUTA,src
MOV dst,CNT
..
all as single instructions.
Many of the new instructions other than GET and SET make programming easier instead of more complicated. You can multiply, divide, set and read pins with a pin number, access bits in a register without masks and so on.
Yes, the various execution modes add a lot of complexity, mainly hardware tasks (but that's your fault ).
Fortunately you can just ignore them and use the Prop2 like the Prop1 if it's too complex for you.
What's easy for gurus isn't easy or apparent for the beginner or newcomer to the strange world of the Prop, especially given that the P2 is targeted at the commercial sector (in order to recoup the costs involved; if it doesn't recoup them, forget about the dreamy processor). If the gurus here want to present it in its best light, I'd suggest they sit down and churn out some intro PDFs showing how easy it is to code for the P2 in C and PASM, etc.
Now what about all those execution modes:
1) In COG
2) In HUB
3) Single threaded
4) Hardware threaded
5) Stack in HUB?
6) Stack in AUX
7) Greedy or not?
Some might see 7, I would see that as only 3 - and that is not uncommon on small Micros.
1) COG/HUB
2) Single/Multi threaded
3) Stack in HUB/AUX
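As a rough illustration of that regrouping, the sketch below treats the three items as independent two-way choices and enumerates the combinations. The axis names and values are just labels taken from the list above; nothing here claims that every combination is actually valid on the hardware.

```python
from itertools import product

# Three two-way choices, as regrouped above (labels only, not hardware settings).
AXES = {
    "code": ("COG", "HUB"),
    "threads": ("single", "multi"),
    "stack": ("HUB", "AUX"),
}

def configurations(axes):
    """Return every combination of one value per axis as a list of dicts."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]

configs = configurations(AXES)
print(len(configs))          # number of combinations to document
for cfg in configs:
    print(cfg)
```

Three choices is easy to hold in your head, but they still multiply out to eight combinations per cog, which is where the documentation worry comes from.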
1) Main program starts with the main/glue code (business logic) running in COG0 using HUBEXEC mode (GREEDY)
2) Main fires off COG1 to run in hardware threaded mode to handle keyboard, mouse and a UART connection to a Bluetooth
3) Main fires off COG2 to give me a pretty HDMI display
4) Main fires off COG3 with something that is using AUX as a buffer Q
5) Main fires off COG4 to use an SD
.
.
.
To all this, you add the various memory models in PropGCC and how they can and can't interact, and which things you can and can't do from which mode. Our nice, simple and elegant Propeller world has become complex and difficult to document, to provide tutorials, examples and app notes for, and to present to potential new users or actual new adopters in a digestible manner.
We are in the process of moving to Las Vegas, a great place for a P2 meeting :) I was tasked with driving Libby's car out, so I have been incommunicado for the last week, and what a week you guys have had :)
I think I am one of the few hobbyists trying to follow all of this, and I can say that the complexity doesn't bother me a bit.
I don't understand it but there are a lot of other things that I don't understand, and it doesn't keep me from doing what I want to do.
In fact, I am delighted by it all. A few months back I asked about digitizing HDMI signals and I was told "NO." Now that we have hungry cogs, hub exec and zillion-bit transfers to the hub, it sort of looks like it is a matter of invention.
What I am wondering about today is what is there in the hardware that will facilitate multi-Prop2 architectures?
1) Main program starts with the main/glue code (business logic) running in COG0 using HUBEXEC mode (GREEDY)
2) Main fires off COG1 to run in hardware threaded mode to handle keyboard, mouse and a UART connection to a Bluetooth
3) Main fires off COG2 to give me a pretty HDMI display
4) Main fires off COG3 with something that is using AUX as a buffer Q
5) Main fires off COG4 to use an SD
Wot ? No Audio ? With I2S in SERDES, (likely) Audio could gain some valuable processing for mixing/gain/effects.
What I am wondering about today is what is there in the hardware that will facilitate multi-Prop2 architectures?
Super high-speed UART, something like 100Mbit/sec.
With regard to complexity of the Prop2 I think it often comes down to working examples, with each instruction or hardware config, in the instruction manual. This obviously makes a lot more work for the manual writers but it makes the learning process way faster by the simple process of giving context to particular definitions.
What I am wondering about today is… what is there in the hardware that will facilitate multi-Prop2 architectures?
The SERDES there now can do 32-bit transfers. The final SERDES should give more choices and maybe more buffers.
If DDR Quad SPI makes the cut, that could transfer 50MB/s at practical port I/O speeds, but may push clock skew management.
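The 50MB/s figure follows from simple arithmetic: four data lines, two edges per clock, so one byte per clock. A minimal sketch of that calculation is below; the function name is made up for illustration, and the 50MHz and 100MHz clocks are just the round numbers floated in this thread, not measured limits.

```python
def bus_throughput_bytes_per_sec(clock_hz, data_lines, edges_per_clock):
    """Raw payload rate of a synchronous bus, ignoring command/address overhead."""
    bits_per_sec = clock_hz * data_lines * edges_per_clock
    return bits_per_sec / 8

# DDR Quad SPI: 4 lines, data on both clock edges, 50 MHz clock.
print(bus_throughput_bytes_per_sec(50e6, data_lines=4, edges_per_clock=2))
# A single-line link at 100 Mbit/s for comparison.
print(bus_throughput_bytes_per_sec(100e6, data_lines=1, edges_per_clock=1))
```

So a quad DDR link at a 50MHz pin clock would carry four times the payload of a 100Mbit/s single-line link while running the pins at half the rate.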
Wot ? No Audio ? With I2S in SERDES, (likely) Audio could gain some valuable processing for mixing/gain/effects.
Maybe, but I wasn't sure as to any restrictions, limitations, special cases or other considerations that I had to follow to use the SERDES. The App note I found only covered a few of the mode/language combinations. :frown:
Super high-speed UART, something like 100Mbit/sec.
That would need to be Sync mode, USART not the classic Async UART, and 100MHz is likely to hit port skew and delay issues. Other parts seem to be able to get above 50MHz, but not a lot above 50MHz.
Maybe, but I wasn't sure as to any restrictions, limitations, special cases or other considerations that I had to follow to use the SERDES. The App note I found only covered a few of the mode/language combinations. :frown:
Thanks, that is what I thought, just wanted to be sure.
One of the outcomes of slot sharing, if we get it, is likely to be pressure to not poll the hub in a tight loop unless really needed.
We will likely want to come up with some good non-hub methods of signaling between cogs, so data only needs to be read from the hub when it is ready instead of being constantly polled using hub reads.
With apologies to Monty Python...
Because every slot is sacred
Every slot is great
If a slot is wasted
God gets quite irate
One of the outcomes of slot sharing, if we get it, is likely to be pressure to not poll the hub in a tight loop unless really needed.
We will likely want to come up with some good non-hub methods of signaling between cogs, so data only needs to be read from the hub when it is ready instead of being constantly polled using hub reads.
If a floor support does not make it in silicon, you can always use waits to slow down the polling, to whatever speed you really need.
Many COGs' code will be fine with far less than max-rate polling.
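To put a number on how much a wait helps, here is a back-of-the-envelope sketch. It assumes a cog gets one hub window every 8 clocks and that a poll completes on the next window after the wait expires; both the model and the name `hub_slot_usage` are assumptions for illustration, not a description of the final silicon.

```python
import math

def hub_slot_usage(poll_interval_clocks, hub_period=8):
    """Fraction of a cog's hub windows consumed by a polling loop.

    A hub read can only complete on a hub window, so the effective polling
    interval rounds up to the next multiple of hub_period (assumed model).
    """
    windows = max(1, math.ceil(poll_interval_clocks / hub_period))
    return 1 / windows

print(hub_slot_usage(1))     # tight loop: every window is used
print(hub_slot_usage(100))   # wait ~100 clocks between polls
print(hub_slot_usage(800))   # wait ~800 clocks between polls
```

Under these assumptions, even a modest wait of a hundred clocks leaves over 90% of that cog's hub windows free for others to use.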
Thanks guys. I really like the concept of hungry cogs… it just "sounds" so right.
Has there been any word on Parallax's various other FPGA initiatives? I have a DE0-Nano and I am going to be lost in the dust if I can't upgrade around the time of Chip's next release.
With regard to complexity of the Prop2 I think it often comes down to working examples, with each instruction or hardware config, in the instruction manual. This obviously makes a lot more work for the manual writers but it makes the learning process way faster by the simple process of giving context to particular definitions.
One of my concerns: how many of the gurus, kibitzers, experts and other contributors are going to be around when it comes time to write documentation, provide working, clear, well-documented examples and app notes, and then be technical editors for the features they have fought so hard for? Parallax has limited resources for these types of things, which are CRITICAL to a successful launch. EVERYBODY and their brother wants to contribute to the design... once FPGA blobs or first-run chips are out, everyone will run off and play and we'll be surrounded by crashing silence!
The manual will be a BEAST and the amount of information for any writer or team of writers to cover will be HUGE!
That would need to be Sync mode, USART not the classic Async UART, and 100MHz is likely to hit port skew and delay issues. Other parts seem to be able to get above 50MHz, but not a lot above 50MHz.
I had the impression Chip had it designed for half the system clock. It only needs a consistent bit rate; the amount of skew is not important.
How reliable it actually ends up being is, I guess, another matter, but I don't see the currently non-existent sync mode being any better, really. The reason sync mode is being requested is compatibility/flexibility.
I plan on being around for those things in my areas of interest and maybe some beyond. Who knows?
You are not alone in your concern. It's really easy to just have fun influencing the design. It is quite another to contribute meaningfully beyond that.
Another open contribution and evolution to be encouraged of the forum members ...
Two proven things that thin the herd: a pack of lionesses or asking for volunteers to help with documentation! (especially if they have a new toy to play with!)
Sounds like you're getting close. Congratulations! It's been a long haul but very interesting to watch from the sidelines. Dare I ask what ever happened to the idea of a CALL instruction variant that places its return address in a known COG location? Is that in there already or still planned?
I haven't forgotten about that, David. I'll revisit that at the end of the current hub exec work. It's funny you mentioned this, because it just popped back into my head last night.
I'm just feeling overwhelmed at the moment. Tell me it's just a case of not being able to see the forest for the trees.
There are a lot of instructions, but they distill down to only several categories. Just think of them as friends you haven't met yet, who will all want to help you out.
The manual will be a BEAST and the amount of information for any writer or team of writers to cover will be HUGE!
Actually, I don't think the manual needs to be that big. It's the app notes that can go on forever. The manual just needs to explain how it works. What you DO with the chip is the endless matter.
Comments
It's just a case of not being able to see the forest for the trees. - do you feel better now?
A big part of the instructions are SET and GET instructions that access hardware-registers. On Prop1 that was all done with one instruction: MOV.
Now it is as you would count: all as single instructions.
Many of the new instructions other than GET and SET make programming easier instead of more complicated. You can multiply, divide, set and read pins with a pin number, access bits in a register without masks and so on.
Yes, the various execution modes add a lot of complexity, mainly hardware tasks (but that's your fault ).
Fortunately you can just ignore them and use the Prop2 like the Prop1 if it's too complex for you.
Andy
Some might see 7, I would see that as only 3 - and that is not uncommon on small Micros.
1) COG/HUB
2) Single/Multi threaded
3) Stack in HUB/AUX
GCC already has more modes than 3 it can manage right now, see
https://sites.google.com/site/propellergcc/documentation/faq
.. it's just a case of not being able to see the forest for the trees.
- but spare a thought for Chip, he is underneath all this...
1) Main program starts with the main/glue code (business logic) running in COG0 using HUBEXEC mode (GREEDY)
2) Main fires off COG1 to run in hardware threaded mode to handle keyboard, mouse and a UART connection to a Bluetooth
3) Main fires off COG2 to give me a pretty HDMI display
4) Main fires off COG3 with something that is using AUX as a buffer Q
5) Main fires off COG4 to use an SD
.
.
.
To all this, you add the various memory models in PropGCC and how they can and can't interact, and which things you can and can't do from which mode. Our nice, simple and elegant Propeller world has become complex and difficult to document, to provide tutorials, examples and app notes for, and to present to potential new users or actual new adopters in a digestible manner.
C.W.
That's correct.
SERDES is still a work in progress...
Sideline is a very relative term here. :P
Thanks, that is what I thought, just wanted to be sure.
One of the outcomes of slot sharing, if we get it, is likely to be pressure to not poll the hub in a tight loop unless really needed.
We will likely want to come up with some good non-hub methods of signaling between cogs, so data only needs to be read from the hub when it is ready instead of being constantly polled using hub reads.
With apologies to Monty Python...
Because every slot is sacred
Every slot is great
If a slot is wasted
God gets quite irate
C.W.
Hub instructions all stall the pipeline, except when read cache hits occur.
No. There is a 32-bit XCH (exchange) system that can route 32-bits per clock between/among any/all cogs.