
Assembler mnemonics and syntax


Comments

  • jmg wrote: »
    David Betz wrote: »
    But there is more to the hub than just shared RAM.
    What do you mean? HUB is 512 KB of SRAM, accessible by all COGs, so Shared RAM seems quite a good banner description?

    All shared RAM has rules, so those details can come later. The Streamer also shares the RAM, as do the FIFOs, but none of that conflicts with the 'Shared RAM' banner?
    Doesn't the hub still have locks?
  • I also think that in all the instructions and documentation one should replace COG with CORE; if in the instruction mnemonics this hits the magical 8-character tab stop, one can replace COG with COR instead: same length, and more understandable for non-P1 people.

    Replacing HUB with SHARED MEMORY or SHARED RESOURCES also makes sense to me; it's not far from the truth, and it's more common language.

    The main point is the Smart-Pins. They differentiate the P2 from everything else.

    Don't confuse newcomers any more than necessary; let them first swallow the 64 parallel-running, programmable Pin Sub Systems on top of 8 CORES.

    Enjoy

    Mike
  • Nothing to do with assembler, but a comment Mike just made about the 64 I/Os. I know we call them smart pins or SmartPins or something, but I like to describe them as a "pin peripheral", or as a peripheral per pin, as distinct from a bunch of peripherals that may or may not be able to route to a pin, or be used at all.

    Above all, if we write the PC-based tools in Python, I'm sure we would have a wider range of folks who could contribute and maintain this code. It primarily needs to be CLI-first so that it can be tied into whatever IDEs and GUIs someone comes up with or uses.
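    Purely to illustrate the CLI-first idea, here is a minimal sketch in Python; the tool name "p2asm" and all of its flags are invented for illustration, not an existing Parallax tool. An IDE or GUI would simply shell out to the same entry point:

        # Hypothetical CLI-first front end for a P2 assembler (all names invented).
        import argparse
        import sys

        def assemble(source, output, listing):
            # Placeholder for the real assembly pass.
            print(f"assembling {source} -> {output}")
            if listing:
                print("writing listing file")

        def main(argv=None):
            parser = argparse.ArgumentParser(
                prog="p2asm",  # hypothetical tool name
                description="Assemble P2 source to a binary image.")
            parser.add_argument("source", help="input source file")
            parser.add_argument("-o", "--output", default="a.bin",
                                help="output binary (default: a.bin)")
            parser.add_argument("-l", "--listing", action="store_true",
                                help="also emit a listing file")
            args = parser.parse_args(argv)
            assemble(args.source, args.output, args.listing)
            return 0

        if __name__ == "__main__":
            sys.exit(main())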
  • evanh Posts: 15,126
    Also, "shared memory" is not a hardware term. It refers to software multitasking functionality where the OS allocates common physical RAM for sharing data between tasks/processes/programs in an organised manner.

    HubRAM would just be called "main memory" in general computing. You could call it unified memory, but that's a term normally reserved for asymmetric processor arrangements, e.g. CPU+GPU. CogRAM is the processor's general register set; obviously, programs normally can't be executed from general registers. LUTRAM would normally be classed as dedicated memory, but it can also hold executing code.

    We certainly don't want to be doing something like what XMOS did - abusing the term "core". All in all, the specific names HubRAM/CogRAM/LUTRAM serve a useful purpose.

  • David Betz Posts: 14,511
    edited 2018-08-14 23:53
    Nothing to do with assembler, but a comment Mike just made about the 64 I/Os. I know we call them smart pins or SmartPins or something, but I like to describe them as a "pin peripheral", or as a peripheral per pin, as distinct from a bunch of peripherals that may or may not be able to route to a pin, or be used at all.

    Above all, if we write the PC-based tools in Python, I'm sure we would have a wider range of folks who could contribute and maintain this code. It primarily needs to be CLI-first so that it can be tied into whatever IDEs and GUIs someone comes up with or uses.
    You're going to write a C compiler in Python? That would be interesting, since Python itself is written in C, last I knew. :smile:

    Edit: Maybe you can start from this: https://github.com/ShivamSarodia/ShivyC

  • evanh Posts: 15,126
    edited 2018-08-15 00:03
    Peter,
    Yep, one smartpin is a multifunction peripheral dedicated to one pin. Naming it, in this case smartpin, is a good thing.

    Capitalising, or not, is a matter of preference, I guess. I'm easy: cogram, hubram, lutram.

  • Peter Jakacki Posts: 10,193
    edited 2018-08-15 00:06
    David Betz wrote: »
    Nothing to do with assembler, but a comment Mike just made about the 64 I/Os. I know we call them smart pins or SmartPins or something, but I like to describe them as a "pin peripheral", or as a peripheral per pin, as distinct from a bunch of peripherals that may or may not be able to route to a pin, or be used at all.

    Above all, if we write the PC-based tools in Python, I'm sure we would have a wider range of folks who could contribute and maintain this code. It primarily needs to be CLI-first so that it can be tied into whatever IDEs and GUIs someone comes up with or uses.
    You're going to write a C compiler in Python? That would be interesting, since Python itself is written in C, last I knew. :smile:

    This also highlights the immaturity of battling over language supremacy :) German (by way of example only) is a great language for commanding an army with a loud voice, but other languages have their advantages. C itself may have been written in an earlier version of C, but the early versions of C were written in B, which was written in BCPL, which descended from CPL, which descended from ALGOL, etc.; basically, any language can be, and has been, written in assembly. So while assembly is great for low-level stuff, and maybe even for booting up a new language (which can thereafter bootstrap itself), it is a lot of hard "verk".

    Python is easy to learn and use, and certainly seems portable. I would never expect to have the P2 run Python; maybe a crippled Python, but what's the point of that?
  • kwinn Posts: 8,697
    David Betz wrote: »
    kwinn wrote: »
    jmg wrote: »
    ozpropdev wrote: »
    I suggested an open source Python tool chain and/or Pnut with command line support.
    Yes, so I went looking for examples of Python-coded assemblers, as you (& others) suggested.
    That Python-hosted-assembler discussion could be moved into another thread?


    ozpropdev wrote: »
    Wow!
    It seems we have run off the rails here, people.
    Getting back to the heading of "Assembler mnemonics and syntax", so far I think we have suggestions of

    * Change COG to CORE etc, HUB to ?? to 'mainstream' the P2 semantics.
    * Improve the labels and immediate usage to reduce user errors
    * Reduce conflicts with C/GAS
    * macros and conditional assembly (already in some offerings?)

    Maybe a simple table of existing P2 Assembler paths/tools, and what they support, and their source urls, would help ?



    AFAIK what Parallax calls HUB has generally been referred to as SHARED MEMORY or SHARED RAM on computers with more than one CPU. While I have no problem with the current COG/HUB nomenclature, using CORE/SHARED RAM would probably mean more to someone who is not familiar with the Prop.
    But there is more to the hub than just shared RAM.

    Yes, there are more shared resources in the HUB than RAM, but just calling it HUB is not very descriptive to someone familiar with multi-CPU systems but not with the Prop architecture. The term "HUB" can still be used, but described better; perhaps something like "The HUB contains the resources that are shared by all the cores, and provides the SHARED RAM, CORDIC, STREAMER, FIFO, ...".
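    As a sketch of the "reduce user errors" item in the list quoted above: a Python-hosted assembler could warn when a known label is used as a source operand without the '#' immediate prefix, a classic PASM slip. Everything below (the label table, the syntax assumptions, the message wording) is invented for illustration, not taken from any existing P2 assembler:

        # Illustrative lint pass for missing '#' immediates (all details assumed).
        import re

        labels = {"buffer", "loop_count"}   # a real first pass would collect these
        line_re = re.compile(r"^\s*(\w+)\s+(\w+)\s*,\s*(#?)(\w+)")

        def check_line(num, line):
            m = line_re.match(line)
            if not m:
                return
            mnemonic, dest, hash_sign, src = m.groups()
            if src in labels and not hash_sign:
                print(f"line {num}: '{src}' used without '#'; "
                      f"did you mean the immediate address #{src}?")

        source = [
            "        mov     ptr, buffer      ' probably wanted #buffer",
            "        mov     ptr, #buffer     ' ok: immediate address",
        ]
        for n, text in enumerate(source, 1):
            check_line(n, text)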
  • jmg Posts: 15,140
    evanh wrote: »
    Also, "shared memory" is not a hardware term. It refers to software multitasking functionality where the OS allocates common physical RAM for sharing data between tasks/processes/programs in an organised manner.

    ? NXP seem quite happy using it like this for their multi-core microcontrollers:

    Interprocessor communications via shared memory and interrupt events
    Dual core debugging


    Likewise, ST use terms like "common shared memory".

    Dual-Port memory is certainly shared memory.
  • evanh Posts: 15,126
    All RAM is shared in some way or other.

  • David Betz wrote:
    You're going to write a C compiler in Python? That would be interesting, since Python itself is written in C, last I knew.
    Heck, if I were into writing a C compiler (I'm not), I'd do it in Perl!

    -Phil
  • jmg Posts: 15,140
    evanh wrote: »
    All RAM is shared in some way or other.

    Not COG/CORE RAM - that is purely local to that CORE.

  • evanh Posts: 15,126
    edited 2018-08-15 01:15
    jmg wrote: »
    evanh wrote: »
    All RAM is shared in some way or other.
    Not COG/CORE RAM - that is purely local to that CORE.
    ST have arbitrarily chosen to market some as "shared" when it's just an ordinary use. I can do that too - Cogram has multiple intra-cog resources using it: the program counter for one address source, effective addresses for another address source, also direct mode, and of course the destination address too. So that's the instruction register, S/D registers, and S/D ALU ports and R port/register also. Four obvious sharings for cogram alone.
  • evanh Posts: 15,126
    Actually, the ST document could be interpreted as referring to multitasking allocation, which makes the argument moot.

  • This is going way OT; I'd like this thread to "mainly" stay on topic. We can easily start a new thread and discuss architecture and naming ad infinitum.

    @Phil - heck, I'd write it in Forth :)
  • kwinn Posts: 8,697
    evanh wrote: »
    jmg wrote: »
    evanh wrote: »
    All RAM is shared in some way or other.
    Not COG/CORE RAM - that is purely local to that CORE.
    ST have arbitrarily chosen to market some as "shared" when it's just an ordinary use. I can do that too - Cogram has multiple intra-cog resources using it: the program counter for one address source, effective addresses for another address source, also direct mode, and of course the destination address too. So that's the instruction register, S/D registers, and S/D ALU ports and R port/register also. Four obvious sharings for cogram alone.

    The main problem is that the term "shared RAM" has multiple definitions, not one rigid definition. From a P1 hardware view the cog/core RAM is not shared: the CPU uses those registers to perform its functions, but they are not shared with the hub or other cogs/cores. I'm not sure, since I have not followed all the changes, but on the P2 the core RAM may be shared with the streamer and other cores.
  • evanh Posts: 15,126
    edited 2018-08-15 02:29
    kwinn,
    I'm now thinking ST is using the correct meaning.

    I was just demonstrating how anything can be misused if desired. The XMOS naming of "cores" is a good example of doing it wrong.
  • Roy Eltham Posts: 2,996
    edited 2018-08-15 04:47
    There is not really anything in the instructions that needs changing for HUB. Just in the documentation: when referring to the hub memory, call it shared memory. When you talk about hubexec, mention that it's execution from shared memory.
    The HUB holds the shared resources for all the Cores.
  • jmg Posts: 15,140
    Roy Eltham wrote: »
    There is not really anything in the instructions that needs changing for HUB. Just in the documentation: when referring to the hub memory, call it shared memory. When you talk about hubexec, mention that it's execution from shared memory.

    .. or that HUBEXEC could become SMEXEC, allowing HUB to be retired almost entirely?

  • kwinn Posts: 8,697
    evanh wrote: »
    kwinn,
    ......
    I was just demonstrating how anything can be misused if desired. The XMOS naming of "cores" is a good example of doing it wrong.
    I agree to an extent. What XMOS has is what I would consider "hardware-assisted threading" or "hardware-assisted task switching". It was an ingenious idea, and allowed a single core to switch between tasks with virtually no overhead. Granted, each task ran at half speed, but that was still much better than software switching. They may have called the tasks cores because, from a software perspective, each task would behave as if it were running on a separate core.
  • Heater. Posts: 21,230
    kwinn,
    What XMOS has is what I would consider "hardware-assisted threading" or "hardware-assisted task switching". It was an ingenious idea, and allowed a single core to switch between tasks with virtually no overhead. Granted, each task ran at half speed, but that was still much better than software switching. They may have called the tasks cores because, from a software perspective, each task would behave as if it were running on a separate core.
    Not quite so simple.

    In the XMOS architecture threads are pipelined/interleaved such that one can have up to 4 threads all running at the same speed as if one only had 1. Each one runs at the maximum MIPS of the machine.

    Starting a 5th thread slows things down a bit, as that exceeds the 4 pipeline slots. A 6th thread slows things a bit more, and so on, until at eight threads they are all running at half the MIPS rating of the machine.

    From a software perspective each thread appears as a separate core. EXCEPT starting and stopping those 5th, 6th, 7th, 8th threads modulates the speed of all the others. The illusion is shattered and timing determinism goes out the window.

    I recall having a lengthy debate with David May on the XMOS forum, until he finally conceded that timing determinism of the XMOS was not quite as advertised.

    For this reason I did not like the way XMOS marketing suddenly started calling threads "cores".

    To be fair though, they did mostly call them "logical cores". And the XMOS devices have all kinds of other features to ensure deterministic timing, clocked I/O ports for example. One is not supposed to do timing by instruction counting.
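    To put numbers on that: each active thread gets one issue slot per round-robin pass, and a pass is never shorter than the four pipeline stages. A back-of-envelope model in Python (my reading of the behaviour described above, not XMOS-published figures):

        # Assumed XS1-style scheduling: per-thread rate = clock / max(4, n).
        def per_thread_mips(clock_mhz, active_threads):
            # Round-robin issue among active threads, minimum 4 slots per pass.
            return clock_mhz / max(4, active_threads)

        for n in (1, 4, 5, 8):
            print(f"{n} thread(s): {per_thread_mips(400, n):.1f} MIPS each")
        # 1-4 threads: 100.0 each; 5 threads: 80.0; 8 threads: 50.0 (half of 100).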





  • Heater. wrote: »
    kwinn,
    What XMOS has is what I would consider "hardware-assisted threading" or "hardware-assisted task switching". It was an ingenious idea, and allowed a single core to switch between tasks with virtually no overhead. Granted, each task ran at half speed, but that was still much better than software switching. They may have called the tasks cores because, from a software perspective, each task would behave as if it were running on a separate core.
    Not quite so simple.

    In the XMOS architecture threads are pipelined/interleaved such that one can have up to 4 threads all running at the same speed as if one only had 1. Each one runs at the maximum MIPS of the machine.

    Starting a 5th thread slows things down a bit, as that exceeds the 4 pipeline slots. A 6th thread slows things a bit more, and so on, until at eight threads they are all running at half the MIPS rating of the machine.

    From a software perspective each thread appears as a separate core. EXCEPT starting and stopping those 5th, 6th, 7th, 8th threads modulates the speed of all the others. The illusion is shattered and timing determinism goes out the window.

    I recall having a lengthy debate with David May on the XMOS forum, until he finally conceded that timing determinism of the XMOS was not quite as advertised.

    For this reason I did not like the way XMOS marketing suddenly started calling threads "cores".

    To be fair though, they did mostly call them "logical cores". And the XMOS devices have all kinds of other features to ensure deterministic timing, clocked I/O ports for example. One is not supposed to do timing by instruction counting.




    Aren't you being a bit hard on XMOS? It seems to me that no matter how many "cores" you use, you still get deterministic timing from a system-wide perspective. It's just that you have to consider the system as a whole; you can't just look at each of the cores individually. I guess that isn't true if you have cores starting and stopping all the time, but if you always have the same number running then the timing is deterministic.
  • Heater. Posts: 21,230
    edited 2018-08-15 12:49
    Perhaps. I just did not like the way hardware-scheduled threads became "logical cores" and cores became "tiles". It was not originally pitched like that; it was some marketing wool-pulling that came later.

    If you have to consider the system as a whole, then each hardware thread does not have deterministic timing. The Propeller does not suffer from that: every COG runs at the same pace all the time, no matter what the others are doing.

    It is that independent, deterministic timing of the Propeller that makes mixing and matching one's Spin objects with others from OBEX or wherever else so easy. They do not get in each other's way timing-wise. Unlike systems that use interrupts to juggle multiple things at the same time.

    With XMOS one would normally not have all threads running at the same time. Threads will stop as they wait on timers, I/O pins, communication channels, etc. This random stopping and starting of threads will modulate the execution speed of the running threads.

    On the other hand, as I said, one is not supposed to count instructions to get the timing right on XMOS; there are hardware facilities to get I/O timing spot on, and the compiler can tell you if your code can meet timing constraints.


  • Heater. wrote: »
    Perhaps. I just did not like the way hardware-scheduled threads became "logical cores" and cores became "tiles". It was not originally pitched like that; it was some marketing wool-pulling that came later.
    I agree that their terminology is misleading.
