There are no more multi-personality wait/loop instructions, though there probably ought to be for WAITVID, so that other tasks can continue when one is doing a WAITVID.
Please clarify. Does this mean that a task cannot do a WAITXXX instruction on a pin or CNT without blocking all tasks? That is, do tasks have to poll for pin and time events?
I guess we are OK with that, it's what FullDuplexSerial does in the P1 now.
Eh, where did all this SONOS thing come from? As far as I know it's not on the table as OnSemi don't offer it. Not to mention it's yet another huge change in plans that we don't need at this point.
WAITxxx instructions will stall the cog, so they are not useful for multitasking programs. They wouldn't make sense in multitasking, anyway, because what they are looking for may come and go before the task gets another chance to check for the WAITxxx target condition. Tasks will need to code those checks in other ways.
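To make the "come and go" problem concrete, here is a toy Python model, not Propeller code. A pin carries a short pulse, and a task that only gets a slot every N clocks samples it. The 4-clock slot spacing and the pulse timing are assumptions chosen for illustration.

```python
def pulse_train(length, pulses):
    """Pin level (0 or 1) for each clock; pulses is a list of (start, width)."""
    levels = [0] * length
    for start, width in pulses:
        for t in range(start, min(start + width, length)):
            levels[t] = 1
    return levels


def polled_highs(levels, period, phase=0):
    """Clocks at which a task sampling every `period` clocks sees the pin high."""
    return [t for t in range(phase, len(levels), period) if levels[t]]


# A 2-clock pulse at clocks 5-6 and a 6-clock pulse at clocks 20-25.
pin = pulse_train(32, [(5, 2), (20, 6)])

print(polled_highs(pin, 1))  # single task, every clock: [5, 6, 20, 21, 22, 23, 24, 25]
print(polled_highs(pin, 4))  # one slot in 4 (assumed): [20, 24], short pulse missed
```

Shifting the slot by one clock (`phase=1`) happens to catch the short pulse, which is exactly the problem: whether an event is seen depends on where the task's slot falls.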
Just an idea: if there is more cog ram than can be addressed by 9 bits, it can be reached by pointers: "mov reg, ptr" and ptr can address 32 bits. Or: whenever ptr exceeds core space, it is redirected to hub memory. ErNa
... if there is more cog ram than can be addressed by 9 bits,...
There isn't. The COG does not have RAM. The COG only has registers. At least on a P1. Hopefully on a P2 as well.
496 registers is huge by comparison to any other processor you can buy.
And as a bonus you can use those registers to cache instructions you want to run fast. Can't do that on any other processor I know of.
See how naming and describing things differently changes the entire expectation?
...it can be reached by pointers: "mov reg, ptr"...
That instruction looks awfully like a RDLONG or WRLONG except working into some RAM space that is not shared. Would it actually be much more performant than using HUB RAM with RDLONG/WRLONG or using hubexec?
Just an idea: if there is more cog ram than can be addressed by 9 bits, it can be reached by pointers: "mov reg, ptr" and ptr can address 32 bits. Or: whenever ptr exceeds core space, it is redirected to hub memory. ErNa
This could easily be done using INDx, at which point you could actually do the following:
1) Increase the COG registers to 1024.
2) Increase INDA, INDB, and PC to 10 bits.
3) Make all addresses over 511 accessible only via INDA/INDB (this might make an argument for a few more INDx registers).
An obvious usage pattern would be to load the program into the upper 512 registers, then use the lower 512 just for data. This also has the advantage that even more cog registers could be added without any changes to compiled code!
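A toy Python model of the INDx idea, assuming 1024 registers, a 9-bit direct field and a 10-bit INDA pointer. The class and method names are hypothetical; only INDA is modelled, and INDB would work the same way.

```python
class CogRegisters:
    """Toy model: 1024 registers, 9-bit direct field, 10-bit INDA pointer.

    Hypothetical names; only INDA is modelled, INDB would be identical.
    """

    SIZE = 1024
    DIRECT_MASK = 0x1FF   # a 9-bit D/S field reaches registers 0..511
    POINTER_MASK = 0x3FF  # a 10-bit pointer reaches registers 0..1023

    def __init__(self):
        self.regs = [0] * self.SIZE
        self.inda = 0

    def set_inda(self, address):
        self.inda = address & self.POINTER_MASK

    def read_direct(self, field):
        # Bits above the 9-bit field simply do not exist in the encoding.
        return self.regs[field & self.DIRECT_MASK]

    def write_direct(self, field, value):
        self.regs[field & self.DIRECT_MASK] = value & 0xFFFFFFFF

    def read_indirect(self):
        return self.regs[self.inda]

    def write_indirect(self, value):
        self.regs[self.inda] = value & 0xFFFFFFFF


cog = CogRegisters()
cog.set_inda(700)                # above 511: reachable only through INDA
cog.write_indirect(0xDEADBEEF)
print(hex(cog.read_indirect()))  # 0xdeadbeef
print(cog.read_direct(700))      # 0: the 9-bit field wraps to register 188
```

Because compiled code only ever touches the upper half through the pointer, growing `SIZE` (and the pointer width) would not change any existing instruction encodings, which is the point made above.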
>You can bet SONOS is patented and requires $ per chip.
You license it and they provide the expertise; of course, they need to get paid.
But it's only 2 extra masks vs. 7 for flash, so maybe the manufacturing cost is something Parallax could afford? (And only if OnSemi can license and fab it.)
Having 16 Insta-Cores with Smart Insta-IO would sound good to the marketing department.
Having some type of non-volatile storage, no matter how little, without relying on external ICs is what the Prop badly needs.
On the Prop, the SONOS cells would not be addressable; their contents would just be copied to the cog at power-up. There should be a hubop that copies a cog back to SONOS (5 ms to erase and 2 ms to write).
If the 2 extra masks can just be put on top of the SRAM for fast parallel loading, it should simplify the design.
That's an important differentiator, Heater. Heck, if we are going to tweak on the lingo, that's worth some discussion, and it would help people better understand the addressing differences they find on a Prop.
You see, if you tell someone this newfangled microcontroller has 512KB of RAM, usable for code or data, and sixteen 32-bit processors running at however many MIPS, they might be quite interested. Especially if they want lots of pins and analog stuff.
You might throw in the fact that the processors have 496 registers. At that point they might fall off their chairs, given that every other CPU they have met has only a handful of registers. People, especially compiler writers, are always complaining that there are not enough registers. Well, now they have them.
Oh, and by the way, you can run code from those registers directly for an extra boost and fine-grained real-time control. At that point you might be getting open-mouthed stares of amazement and disbelief.
See?
That is such a different story from the old P1: it runs code from COG RAM, but that's really small; you have to have an interpreter, Spin, to make bigger programs, but that's horribly slow. Oh, you can use C, but the LMM interpreter makes very big code.
Totally agreed. I've got the better part of an essay done about this. Spent some time with peers as I said I would. After talking things through, we came to the conclusion there is no net gain for Parallax, US, and Propeller in general, to be had by "being like the other guy"
There is a basic conflict between education needs, marketing, and our overall discussion here. Unifying that around what the other guys do isn't good for us at all.
What we need to do is compartmentalize these things, maximize them, and present as needed. I'm in full agreement with you on how to "spin" our distinctive features. Embrace this, don't dilute it!
That is such a different story from the old P1: it runs code from COG RAM, but that's really small; you have to have an interpreter, Spin, to make bigger programs, but that's horribly slow. Oh, you can use C, but the LMM interpreter makes very big code.
And the C CMM interpreter makes smaller code. LMM code will not be smaller on the new chip, but the new chip will have more memory. LMM will be faster than the old LMM.
Totally agreed. I've got the better part of an essay done about this. Spent some time with peers as I said I would. After talking things through, we came to the conclusion there is no net gain for Parallax, US, and Propeller in general, to be had by "being like the other guy"
I agree, mainly because of the cost (cash and opportunity) for Parallax to be like the other guy. Industry standards would always need to be met, for example, and Parallax usually foo-foos such things as not being important. It would take a lot of effort for Parallax to understand and implement current industry standards, so it's better to just do what they have always done, i.e. whatever suits Parallax is good enough for the customers they have or may expect to get.
I've got the better part of an essay done about this.
Oh my God. It takes me the better part of a day and ten cups of tea to read some of your posts here. A whole essay might be too much!
I agree: "being like the other guy", or claiming to be, is a waste of effort. The other guys do that "other guy" thing much better and cheaper. Luckily, a Propeller is not like them. My only points, really, are:
1) The Prop II, with its speed, memory size and hubexec, is not confined like the P1; it is a radically different beast despite being very similar. The "spin" should be very different from the P1's.
2) Use terms people can identify with. Things they want. And tell them we have lots of it!
What we need to do is compartmentalize these things, maximize them, and present as needed. I'm in full agreement with you on how to "spin" our distinctive features. Embrace this, don't dilute it!
Exactly.
The terms need to be 'standard' so that people get it at first mention; THEN those terms can be used to describe how unique the Propeller is. That way your discussion is, right from the start, about the cool features of the Propeller rather than a long vocabulary lesson before they have any clue what you are saying.
Your description of:
Multi-core processor with 512K RAM and 16 32-bit cores, each with equal access to all 64 smart I/O pins and each with 512 32-bit registers that can be used for data, pointers or a local program cache for fast execution of code as needed. Shared access to RAM allows core-to-core communication and data sharing.
Really is something that people would GET the first time they hear it. Now the conversation turns to how they can leverage these amazing capabilities to their advantage.
For the new user, the first experience programming a Prop needs to be as familiar as possible to what they are used to doing.
The term "HUBEXEC" should be dropped, as that should be the 'default' mode; it is what many people would use to start. Then you can get into using the 512 registers to run cached code for driver optimization as an advanced programming tool. If the tool chain is set up from the beginning to do what we know as HUBEXEC by default, then programming the Prop would feel very familiar to any programmer.
Again, it is all about TERMS and the impressions they create when first heard. Someone new to the Prop hearing "and you do this with HUBEXEC mode..." would instantly think "oh, great, another learning curve...".
They should be able to open SPIN or GCC for the first time, write a standard template example program and have it just work. They don't need to know it is technically running in what is called HUBEXEC mode. The performance option would be to select 'compile to cache code'. The guys that really want to maximize the thing will dive in, read the back-end docs that get into the details of how the cogs run code, and hand-code PASM.
That's a very good point and follows on from HUB becoming just "memory".
+1
Yes, HUBEXEC is a feature most other controllers have too (sort of), but they don't have to call it that. Reiterating what Kerry S. said: it is the default mode of operation.
Propeller Chip may get less eccentric in spite of itself.
Reminds me of an episode of "Yes Prime Minister"...
James Hacker: Eccentricity can be a virtue.
Sir Humphrey: If you call it individualism.
Bernard Woolley: That's one of those irregular verbs, isn't it. I have an independent mind, you are an eccentric, he is round the twist.
SETEVENT pin,#event ' rising edge, falling edge, state change, etc.; could be a 9-bit mask so any of 9 types of events can be selected
POLLPIN event, pin wc ' ask the pin for its last event; could optionally loop in place (task, cog) until it gets an event; clears the event from the smart pin for that cog; D is set to the event sent from the pin
Basically, have a smart pin send a message back, releasing the cog/task to continue. While the cog is waiting, it uses only quiescent current.
Task can't miss event it was looking for.
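A toy Python model of the proposed SETEVENT/POLLPIN pair, showing why a latched ("sticky") event cannot be missed by a task that polls only occasionally. The bit values, the three event types and the per-clock update are assumptions; SETEVENT and POLLPIN are only proposals here, not real instructions.

```python
RISE, FALL, CHANGE = 0b001, 0b010, 0b100  # hypothetical event bits


class SmartPin:
    """Toy smart pin that latches requested events until a cog polls them."""

    def __init__(self):
        self.level = 0
        self.mask = 0     # events requested with SETEVENT
        self.latched = 0  # sticky events not yet collected by POLLPIN

    def set_event(self, mask):
        self.mask = mask
        self.latched = 0

    def clock(self, new_level):
        """Called once per system clock with the pin's input level."""
        seen = 0
        if new_level != self.level:
            seen = CHANGE | (RISE if new_level else FALL)
        self.level = new_level
        self.latched |= seen & self.mask

    def poll(self):
        """Return and clear the latched events; in this sketch nonzero means C=1."""
        events, self.latched = self.latched, 0
        return events


pin = SmartPin()
pin.set_event(RISE)
levels = [0, 1, 0, 0, 0, 0, 0, 0]  # a 1-clock pulse between two poll slots
results = []
for t, level in enumerate(levels):
    pin.clock(level)
    if t % 4 == 3:                  # the task gets a slot every 4th clock (assumed)
        results.append((t, pin.poll()))
print(results)  # [(3, 1), (7, 0)]
```

The pin is already low again at clock 3, so sampling the input directly would have seen nothing; the latch is what keeps the event alive until the task's slot comes round.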
This probably belongs in the Pin thread, as it relates to pin smarts? (And this thread has been diverted onto semantics.)
I think you are asking for the pin to manage the WAIT and have a sticky flag the tasking COG can check?
e.g. setting a counter to -1 and waiting on overflow would wait on any INC condition on the counter (H, L, rising edge, falling edge, etc.).
The latency would be in setting up the pin cell to do what you want.
That would avoid missing events, but it also loses the phase-locked-release nature of WAIT.
I guess a single task keeps that, and if you really need that phase-locked release, you dedicate a single-task COG.
In most cases, the 4-cycle sampling window in the task would be OK?
Could be another case for the more parallel bus access to the pin cells?
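Two small Python sketches of the points above, both illustrative assumptions rather than actual pin-cell behavior: a 32-bit counter preset to -1 overflows on its very first increment, and a task that polls every 4 clocks sees an event 0 to 3 clocks late. That spread is the jitter a phase-locked WAIT avoids.

```python
MASK32 = 0xFFFFFFFF


class OverflowCounter:
    """Toy 32-bit counter with a sticky overflow flag (illustrative only)."""

    def __init__(self, preset):
        self.value = preset & MASK32  # -1 becomes 0xFFFFFFFF
        self.overflow = False

    def increment(self):
        self.value = (self.value + 1) & MASK32
        if self.value == 0:
            self.overflow = True  # stays set until software clears it


def release_delay(event_clock, period=4, phase=0):
    """Clocks from an event until the next poll slot at or after it,
    for a task that samples every `period` clocks starting at `phase`."""
    return (phase - event_clock) % period


counter = OverflowCounter(-1)
counter.increment()                     # any single qualifying edge...
print(counter.overflow, counter.value)  # True 0

# Spread of release times over many event arrival times:
print(sorted({release_delay(t) for t in range(100)}))  # [0, 1, 2, 3]
```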
potatohead:
Don't forget to list all the intelligent peripherals that the prop can have. List them all, and that they can be on (almost) any pins. Then say they are soft and emulated by the cores, so there is no limitation to what mix of peripherals you can have. (this is a problem with P1 - prospective customers don't see the big list of peripherals that those other chips have)
And don't forget to have Parallax "Gold Standard" code available in one easy-to-find spot, with app notes: code that is known, understood and supported by Parallax FEs. There can be optional, experimental and derived versions in OBEX, but the standard Parallax soft peripherals should not live there.
Totally agree. In addition to building the list, Parallax (IMHO) must decide whether it officially supports that list, as it becomes a kind of SDK for the chip. With other chips, what sits between the real world and the programmer's code is the chip (with all its included HW peripherals); on the Propeller you have to include an additional layer (SW peripherals) which you didn't develop, which varies in style, and which brings a certain degree of uncertainty. Warranted support from the chip manufacturer (rather than the unpaid community) erases that uncertainty.
Now let's put ourselves in the shoes of the newcomer. It's similar to the open-source community, which is free but implies hidden costs. When you buy a Propeller, have you already paid for that support/warranty, or is it up to you, the developer, to clear up that uncertainty?
And the C CMM interpreter makes smaller code. LMM code will not be smaller on the new chip, but the new chip will have more memory. LMM will be faster than the old LMM.
Will there even be a need for LMM at all now that we can execute code directly from hub memory? What I'd like to see is a hybrid hub execution and CMM mode where some code can be compiled to CMM opcodes if it isn't time critical and some can be in native PASM. We couldn't mix LMM and CMM before because both kernels wouldn't fit in COG memory at the same time but we should be able to have a CMM kernel in COG memory (or even hub memory) and use it along with native code running from either hub memory or COG memory or both. (Eric will probably kill me for saying that though since it might be complex to support in the compiler!)
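A toy Python sketch of the hybrid idea: a compact bytecode interpreter (standing in for a CMM kernel) with one opcode that hands off to a "native" routine. The opcodes and encoding are invented for illustration and are not the PropGCC CMM format.

```python
PUSH, ADD, HALT, CALL_NATIVE = 0, 1, 2, 3  # invented opcodes


def run(bytecode, native_table):
    """Tiny stack-machine interpreter standing in for a compact (CMM-like) kernel.

    CALL_NATIVE n pops one value, passes it to native_table[n] (standing in
    for time-critical code compiled to real instructions) and pushes the result.
    """
    stack = []
    pc = 0
    while True:
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            stack.append(bytecode[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == CALL_NATIVE:
            routine = native_table[bytecode[pc]]
            pc += 1
            stack.append(routine(stack.pop()))
        elif op == HALT:
            return stack
        else:
            raise ValueError(f"bad opcode {op} at {pc - 1}")


def fast_square(x):  # stands in for a native routine
    return x * x


program = [PUSH, 3, PUSH, 4, ADD, CALL_NATIVE, 0, HALT]
print(run(program, [fast_square]))  # [49]
```

The dispatch itself is trivial; presumably the real work for the compiler would be a calling convention (where arguments and return values live) shared by both code models.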
Will there even be a need for LMM at all now that we can execute code directly from hub memory?
Technically LMM is not the right word. The point is that LMM used 32-bit instructions, so there will be little difference in code size for the same programs between LMM and hubexec/native/default, whatever it's called. CMM will probably still be a valid thing to have. Given the amount of headroom, we may end up with more interesting things too. Mixed code models, though? Isn't that a bit bizarre? It seems rather complicated; maybe worth a white paper.
As long as you have the ability to hook external RAM onto the Propeller, there will be an LMM mode.
The Hubexec mode is an intermediate mode between normal cog execution and LMM. Maybe it should be called HMM.
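For readers who haven't met LMM: the kernel is a small loop in cog registers that fetches each instruction from a memory the cog can't execute from directly, advances a software PC, and executes the fetched instruction. Here is a toy Python model with a made-up instruction format; a real P1 kernel does the fetch with RDLONG and adds 4 to a byte address, whereas here the PC simply counts instructions.

```python
def lmm_run(memory, pc, regs):
    """Toy LMM kernel; the instruction format is made up for illustration.

    Each loop iteration is the software fetch (the RDLONG step), a PC
    increment, and execution of the fetched instruction.
    Returns the registers and the number of instructions fetched.
    """
    fetches = 0
    while True:
        op, dst, src = memory[pc]  # fetch from memory the cog cannot run from
        pc += 1                    # software program counter
        fetches += 1
        if op == "mov":            # dst = immediate
            regs[dst] = src
        elif op == "add":          # dst += register src
            regs[dst] += regs[src]
        elif op == "djnz":         # decrement dst, jump to src if nonzero
            regs[dst] -= 1
            if regs[dst] != 0:
                pc = src
        elif op == "halt":
            return regs, fetches
        else:
            raise ValueError(f"unknown op {op!r}")


program = [
    ("mov", "sum", 0),
    ("mov", "n", 5),
    ("add", "sum", "n"),   # address 2: sum += n
    ("djnz", "n", 2),      # loop back while n != 0
    ("halt", None, None),
]
print(lmm_run(program, 0, {}))  # ({'sum': 15, 'n': 0}, 13)
```

Hubexec moves that fetch into hardware; the software loop is what remains necessary when code lives somewhere the cog cannot fetch from directly, such as external RAM.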
And as a bonus you can use those registers to cache instructions you want to run fast. Can't do that on any other processor I know of.
And run fast deterministically, at that.
That's an important differentiator Heater. Heck, if we are going to tweak on the lingo, that's worth some discussion, and would help people better understand the addressing differences they find on a Prop.
"Tweaking" on the lingo. Yep, very important.
Note the fix to REP (not fixed in the attached file)
become...
This only changes the compiler, which now must recognize #LO, #HI, #XOR, #OFF, #Z, #NZ, #C, #NC as reserved names, but simplifies it down to one instruction.
This reduces the instruction count (theoretically) and simplifies the manual.
The only change to the existing instruction coding is I have rearranged the lower 3 bits of "S" (mainly to permit #0 and #1 for CLR and SET).
Thoughts anyone???
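A toy Python sketch of the assembler side of this proposal: resolving the reserved names to the low 3 bits of S. Only #LO = 0 and #HI = 1 are implied by the post (#0 and #1 for CLR and SET); the other six codes are placeholders in the order the names were listed, because the attached file is not available.

```python
# Only LO = 0 and HI = 1 follow from the post ("#0 and #1 for CLR and SET").
# The other six codes are PLACEHOLDERS in the order the names were listed.
OPERATION_CODES = {
    "LO": 0, "HI": 1, "XOR": 2, "OFF": 3,
    "Z": 4, "NZ": 5, "C": 6, "NC": 7,
}


def resolve_s_operand(text):
    """Resolve an immediate S operand such as '#HI' or '#1' to a 3-bit code."""
    if not text.startswith("#"):
        raise ValueError(f"expected an immediate operand, got {text!r}")
    body = text[1:].strip().upper()
    if body in OPERATION_CODES:
        return OPERATION_CODES[body]
    value = int(body)  # plain decimal only in this sketch
    if not 0 <= value <= 7:
        raise ValueError(f"{text} does not fit in the low 3 bits of S")
    return value


print(resolve_s_operand("#HI"), resolve_s_operand("#1"))  # 1 1
print(resolve_s_operand("#lo"), resolve_s_operand("#0"))  # 0 0
```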