But there is more to the hub than just shared RAM.
What do you mean? The HUB is 512 KB of SRAM, accessible by all COGs, so 'Shared RAM' seems like quite a good banner description?
All shared RAM has rules, so those details can come later. The Streamer also shares the RAM, as do the FIFOs, but none of that conflicts with the 'Shared RAM' banner?
I also think that throughout the instructions and documentation, COG should be replaced by CORE. If that would break the magical 8-character tab stop in the instruction listings, COG could become COR instead: same length, and more understandable for people who don't know the P1.
Replacing HUB with SHARED MEMORY or SHARED RESOURCES also makes sense to me; it's not far from the truth, and it's more common language.
The main point is the Smart Pins. They differentiate the P2 from everything else.
Don't confuse newcomers any further; let them first digest the 64 programmable pin subsystems running in parallel, on top of 8 CORES.
Nothing to do with assembler, but a comment Mike just made about the 64 I/O: I know we call them smart pins or SmartPins or something, but I like to describe them as a "pin peripheral", or as a peripheral per pin, as distinct from a bunch of peripherals that may or may not be able to route to a pin, or be usable at all.
Above all, if we write the PC-based tools in Python, I'm sure we would have a wider range of folks who could contribute to and maintain this code. It primarily needs to be CLI-first, so that it can be tied into whatever IDEs and GUIs someone comes up with or uses.
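As a sketch of what "CLI first" could look like (the p2asm name and all the flags here are hypothetical, not any existing tool's interface), the assembler would just be a plain command-line program that any IDE or GUI can shell out to:

```python
import argparse

def build_parser():
    # Hypothetical front end for a Python-hosted P2 assembler;
    # tool name and flags are illustrative only.
    p = argparse.ArgumentParser(prog="p2asm",
                                description="assemble PASM source to a binary")
    p.add_argument("source", help="PASM source file")
    p.add_argument("-o", "--output", default="a.out", help="binary output path")
    p.add_argument("-l", "--listing", action="store_true",
                   help="also emit a listing file")
    return p

def main(argv=None):
    args = build_parser().parse_args(argv)
    # ... assemble args.source and write args.output here ...
    return 0
```

An IDE would then just invoke `p2asm blink.spin2 -o blink.bin` and capture the exit code and stderr.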
Also, "shared memory" is not a hardware term. It refers to software multitasking functionality where the OS allocates common physical RAM for sharing data between tasks/processes/programs in an organised manner.
HubRAM would just be called "main memory" in general computing. You could call it unified memory, but that's a term normally reserved for non-symmetrical processor arrangements, e.g. CPU+GPU. CogRAM is the processor's general register set. Obviously, programs normally can't be executed from the general registers. LUTRAM would normally be classed as dedicated memory, but it can also hold executing code.
We certainly don't want to be doing something like what XMOS did - abusing the term "core". All in all, the specific names HubRAM/CogRAM/LUTRAM serve a useful purpose.
You're going to write a C compiler in Python? That would be interesting since Python itself is written in C last I knew.
This also highlights the immaturity of battling over language supremacy. German (by way of example only) is a great language for commanding an army with a loud voice, but other languages have their advantages. C itself may have been written in an earlier version of C, but the early versions of C were written in B, which was written in BCPL, which descended from CPL, which descended from ALGOL, and so on; basically any language can be, and has been, written in assembly. So while assembly is great for low-level stuff, and maybe even for booting up a new language (which can thereafter bootstrap itself), it is a lot of hard "verk".
Python is easy to learn and use and certainly seems portable. I would never expect to have the P2 run Python, maybe a crippled Python, but what's the point of that?
I suggested an open source Python tool chain and/or Pnut with command line support.
Yes, so I went looking for examples of Python-coded assemblers, as you (and others) suggested.
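For a sense of scale, the skeleton of a two-pass assembler really is small in Python. This toy (the three-instruction ISA is invented for the example, not PASM) shows the shape such a tool takes: pass one collects label addresses, pass two encodes against the symbol table:

```python
# Toy two-pass assembler for a made-up ISA (NOT PASM) - just to show
# the shape a Python-coded assembler takes.
OPCODES = {"nop": 0x00, "ldi": 0x10, "jmp": 0x20}   # invented encodings

def assemble(lines):
    symbols, addr = {}, 0
    # Pass 1: record the address of each label.
    for line in lines:
        line = line.split(";")[0].strip()   # strip comments/blank lines
        if not line:
            continue
        if line.endswith(":"):
            symbols[line[:-1]] = addr
        else:
            addr += 1
    # Pass 2: encode instructions, resolving labels via the symbol table.
    out = []
    for line in lines:
        line = line.split(";")[0].strip()
        if not line or line.endswith(":"):
            continue
        parts = line.split()
        op, arg = OPCODES[parts[0]], 0
        if len(parts) > 1:
            a = parts[1]
            arg = symbols[a] if a in symbols else int(a, 0)
        out.append((op << 8) | arg)         # 16-bit word: opcode | operand
    return out
```

A real PASM2 assembler adds condition codes, effects flags, and hub/cog addressing on top, but the two-pass core stays this shape.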
That Python-hosted-assembler discussion could be moved into another thread?
Wow!
It seems we have run off the rails here people.
Getting back to the heading of "Assembler mnemonics and syntax", so far I think we have these suggestions:
* Change COG to CORE etc, HUB to ?? to 'mainstream' the P2 semantics.
* Improve the labels and immediate usage to reduce user errors
* Reduce conflicts with C/GAS
* Macros and conditional assembly (already in some offerings?)
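On the second bullet, one tooling-side illustration: a hypothetical lint pass (the mnemonic subset and the rule here are invented for the example) could flag the classic slip of omitting '#' on a branch target, which silently turns a jump-to-label into a jump-to-the-address-held-in-that-register:

```python
import re

# Assumption: a tiny illustrative subset of branch mnemonics whose
# target is normally an immediate address.
BRANCH_RE = re.compile(r"\b(jmp|call)\s+(\S+)", re.IGNORECASE)

def lint_immediates(source):
    """Return (lineno, code) pairs for branches whose target lacks '#' -
    the classic PASM slip where 'jmp label' jumps to the address stored
    AT label rather than to label itself."""
    warnings = []
    for lineno, line in enumerate(source.splitlines(), 1):
        code = line.split("'")[0]          # strip PASM line comments
        m = BRANCH_RE.search(code)
        if m and not m.group(2).startswith("#"):
            warnings.append((lineno, code.strip()))
    return warnings
```

A real assembler could emit such warnings itself, of course; the point is just that immediate-usage errors are mechanically detectable.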
Maybe a simple table of existing P2 assembler paths/tools, what they support, and their source URLs, would help?
AFAIK what Parallax calls HUB has generally been referred to as SHARED MEMORY or SHARED RAM on computers with more than one CPU. While I have no problem with the current COG/HUB nomenclature, using CORE/SHARED RAM would probably mean more to someone who is not familiar with the Prop.
Yes, there are more shared resources in HUB than ram, but just calling it HUB is not very descriptive to someone familiar with multi-CPU systems but not with the Prop architecture. The term "HUB" can still be used, but described better, perhaps something like "The HUB contains the resources that are shared by all the cores, and provides the SHARED RAM, CORDIC, STREAMER, FIFO,...".
NXP seem quite happy using "shared memory" as a hardware term for their multi-core microcontrollers:
Interprocessor communications via shared memory and interrupt events
Dual core debugging
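That "interprocessor communications via shared memory" bullet is the classic mailbox pattern: the writer fills a buffer in shared RAM, then raises a flag (or fires an interrupt event); the reader waits on the flag, consumes the data, and clears it. A rough sketch of the handshake, using Python threads as stand-ins for the two cores (names are illustrative only):

```python
import threading

# Illustrative mailbox: shared storage plus a 'full' flag.
mailbox = {"data": None}
full = threading.Event()      # stands in for a flag word / interrupt event

def core0_send(value):
    mailbox["data"] = value   # 1. write the payload into shared RAM
    full.set()                # 2. raise the flag (or fire an interrupt)

def core1_receive():
    full.wait()               # 3. block/poll until the flag is raised
    value = mailbox["data"]   # 4. read the payload
    full.clear()              # 5. clear the flag so the sender can reuse it
    return value
```

On real silicon the flag is just another shared-RAM word (or a hardware event register), but the ordering of the steps is the same.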
Not COG/CORE RAM - that is purely local to that CORE.
ST have arbitrarily chosen to market some as "shared" when it's just an ordinary use. I can do that too - Cogram has multiple intra-cog resources using it: the program counter for one address source, effective addresses for another address source, also direct mode, and of course the destination address too. So that's the instruction register, S/D registers, and S/D ALU ports and R port/register also. Six obvious sharings for cogram alone.
This is going way OT; I'd like this thread to "mainly" stay on topic. We can easily start a new thread and discuss architecture and naming ad infinitum.
The main problem is that the term "shared ram" has multiple definitions, not one rigid definition. From a P1 hardware view the cog/core ram is not shared. The CPU uses those registers to perform its functions, but they are not shared with hub or other cogs/cores. I'm not sure since I have not followed all the changes, but on the P2 the core ram may be shared with the streamer and other cores.
There is not really anything in the instructions that need changing for HUB. Just in the documentation when referring to the hub memory, call it shared memory. When you talk about hubexec, mention that it's execution from shared memory.
The HUB is shared resources for all the Cores.
Or HUBEXEC could become SMEXEC, allowing HUB to be retired almost entirely?
kwinn,
......
I was just demonstrating how anything can be misused if desired. The XMOS naming of "cores" is a good example of doing it wrong.
I agree to an extent. What XMOS has is what I would consider "hardware assisted threading" or "hardware assisted task switching". It was an ingenious idea and allowed a single core to switch between tasks with virtually no overhead. Granted, each task ran at half speed, but that was still much better than software task switching. They may have called the tasks cores because, from a software perspective, each task would behave as if it was running on a separate core.
Not quite so simple.
In the XMOS architecture threads are pipelined/interleaved such that one can have up to 4 threads all running at the same speed as if one only had 1. Each one runs at the maximum MIPS of the machine.
Starting a 5th thread slows things down a bit as that exceeds the 4 pipeline slots. A 6th thread slows things a bit more. Until one gets to eight threads where they are all running at half the MIPS rating of the machine.
From a software perspective each thread appears as a separate core. EXCEPT starting and stopping those 5th, 6th, 7th, 8th threads modulates the speed of all the others. The illusion is shattered and timing determinism goes out the window.
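The slot arithmetic described above reduces to a simple rule: with a four-slot round-robin pipeline, each of up to four threads gets the full per-thread rate (one quarter of the clock), and beyond four the slots are shared, so each thread gets clock/N. A quick sketch, assuming that model (the 400 MIPS figure below is just an illustrative number):

```python
def per_thread_mips(pipeline_clock_mips, n_threads):
    # With a 4-slot interleaved pipeline, 1-4 threads each get the
    # full per-thread rate (clock / 4); with more than 4 threads the
    # slots are shared round-robin, so each gets clock / n.
    slots = 4
    return pipeline_clock_mips / max(slots, n_threads)
```

So on a notional 400 MIPS part, threads 1 through 4 each see 100 MIPS, a fifth thread drops everyone to 80, and at eight threads each sees 50 - half the four-thread rate, as described above.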
I recall having a lengthy debate with David May on the XMOS forum, until he finally conceded that timing determinism of the XMOS was not quite as advertised.
For this reason I did not like the way XMOS marketing suddenly started calling threads "cores".
To be fair though, they did mostly call them "logical cores". And the XMOS devices have all kinds of other features to ensure deterministic timing, clocked I/O ports for example. One is not supposed to do timing by instruction counting.
Aren't you being a bit hard on XMOS? It seems to me that no matter how many "cores" you use you still get deterministic timing from a system wide perspective. It's just that you have to consider the system as a whole. You can't just look at each of the cores individually. I guess that isn't true if you have cores starting and stopping all the time but if you always have the same number running then the timing is deterministic.
Perhaps. I just did not like the way hardware scheduled threads became "logical cores" and cores became "tiles". It was not originally pitched like that but some marketing wool pulling that came later.
If you have to consider the system as a whole then each hardware thread does not have deterministic timing. The Propeller does not suffer from that, every COG runs at the same pace all the time no matter what others are doing.
It is that independent, deterministic timing of the Propeller that makes mixing and matching one's Spin objects with others from OBEX or wherever else so easy. They do not get in each other's way timing-wise, unlike systems that use interrupts to juggle multiple things at the same time.
With XMOS one would normally not have all threads running at the same time. Threads will stop as they wait on timers, I/O pins, communications channels etc. This random stopping and starting of threads will modulate the execution speed of running threads.
On the other hand, as I said, one is not supposed to count instructions to get the timing right on XMOS, there are hardware facilities to get I/O timing spot on and the compiler can tell you if your code can meet timing constraints.
Edit: for a Python-coded C compiler, maybe you can start from this: https://github.com/ShivamSarodia/ShivyC
Yep, one smartpin is a multifunction peripheral dedicated to one pin. Naming it, in this case smartpin, is a good thing.
Capitalising, or not, is a matter of preference I guess. I'm easy - cogram, hubram, lutram
Likewise, ST use terms like "common shared memory".
Dual-Port memory is certainly shared memory.
-Phil
@Phil - heck, I'd write it in Forth
I'm now thinking ST is using the correct meaning.