Large / Virtual Memory Models for the Propeller — Parallax Forums

Painless Posts: 69
edited 2009-07-16 05:00 in Propeller 1
I'm still very new (and addicted) to the Propeller but certainly not new to computers and programming. I cut my computing teeth on a ZX81 and Spectrum 48k at the tender age of 11 (my 40th birthday is coming up) and have delved with interest into many areas of computing and programming, from DOS to Linux as well as IBM mainframes.

I've just been browsing the Propeller wiki, expanding my knowledge of this fantastically addictive little device, and came across the page on Large Memory Models. This brought back some memories of the early virtual memory models on IBM mainframes, which were designed to overcome their limited memory addressing.

I was very interested in the 4 instruction method of running assembler instructions in cog memory by fetching them from hub memory:

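nxt    rdlong  instr,pc   ' fetch the next instruction long from hub memory
       add     pc,#4      ' advance the hub program counter by one long
instr  nop                ' placeholder!
       jmp     #nxt       ' loop back (immediate address, hence the #)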
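
For readers more at home in C than PASM, the loop above can be modelled as a tiny fetch-execute simulation. This is only a sketch: the three toy opcodes (`OP_ADD`, `OP_JMP`, `OP_HALT`), the `lmm_instr` type and the `lmm_run` function are invented for illustration and are not part of any real LMM kernel. On the real chip the fetched long is dropped into the `instr` slot and executed natively rather than decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes; real LMM executes native PASM longs instead. */
enum { OP_HALT = 0, OP_ADD = 1, OP_JMP = 2 };

typedef struct { uint8_t op; int32_t arg; } lmm_instr;

/* Runs a program held in "hub" memory and returns the accumulator.
 * max_steps guards against programs that never halt. */
int32_t lmm_run(const lmm_instr *hub, size_t len, size_t max_steps)
{
    size_t pc = 0;      /* plays the role of the hub program counter */
    int32_t acc = 0;

    while (max_steps-- > 0) {
        assert(pc < len);
        lmm_instr instr = hub[pc];  /* rdlong instr,pc        */
        pc += 1;                    /* add pc,#4 (one long)   */
        switch (instr.op) {         /* the instr slot "executes" here */
        case OP_ADD:  acc += instr.arg;      break;
        case OP_JMP:  pc = (size_t)instr.arg; break; /* a jump just rewrites pc */
        default:      return acc;   /* OP_HALT or unknown opcode */
        }
    }                               /* jump back to nxt */
    return acc;
}
```

The point of the model is that a jump in LMM code never touches the cog's own program counter; it only changes where the next `rdlong` fetches from.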



...is this the kind of approach that is becoming a standard for the Propeller (if indeed one exists yet)? I noted that several implementations are in the works, although some of them seem more commercial.

The main point of this long, rambling post is that I'm wondering if anyone has tackled the large / virtual memory subject with the method of page swapping, i.e. a pre-processor (on the PC/Mac) splits the assembler into, say, 384-byte pages with a main table of reference data pertaining to jumps, data references, etc. On the Propeller, code could be included in the cog RAM to handle any instructions that require the page in the cog to be swapped with another. Such a model would allow pretty much any program storage medium to be used for assembler code, be it hub RAM, external RAM or even disk or SD / flash media.
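
The page-swapping idea can be sketched in C as a single page slot that is refilled whenever execution leaves the resident page. Everything here is an assumption made for illustration: the `pager` struct, `pager_init` and `pager_fetch` are hypothetical names, the backing store is just an array, and the only figure taken from the post is the 384-byte page size (96 longs).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_LONGS 96u   /* 384 bytes / 4 bytes per long, as in the post */

typedef struct {
    const uint32_t *backing;   /* whole program in hub, external RAM, SD, ... */
    uint32_t total_longs;
    uint32_t slot[PAGE_LONGS]; /* stands in for the cog RAM page area */
    int32_t  resident_page;    /* -1 means nothing loaded yet */
    uint32_t swaps;            /* how many page loads have happened */
} pager;

void pager_init(pager *p, const uint32_t *backing, uint32_t total_longs)
{
    p->backing = backing;
    p->total_longs = total_longs;
    p->resident_page = -1;
    p->swaps = 0;
    memset(p->slot, 0, sizeof p->slot);
}

/* Returns the long at program address addr, swapping its page in first
 * if it is not the resident one. */
uint32_t pager_fetch(pager *p, uint32_t addr)
{
    assert(addr < p->total_longs);
    uint32_t page = addr / PAGE_LONGS;
    if ((int32_t)page != p->resident_page) {
        uint32_t start = page * PAGE_LONGS;
        uint32_t count = p->total_longs - start;
        if (count > PAGE_LONGS)
            count = PAGE_LONGS;     /* the last page may be partial */
        memcpy(p->slot, p->backing + start, count * sizeof(uint32_t));
        p->resident_page = (int32_t)page;
        p->swaps++;
    }
    return p->slot[addr % PAGE_LONGS];
}
```

The `swaps` counter is the number to watch: code that bounces between two pages inside a loop pays for a full page copy on every crossing.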

...or am I totally off base here or missing something in the propeller which would make this either unachievable or horrendously inefficient?

Russ.

Comments

  • Bill Henning Posts: 6,445
    edited 2009-07-15 22:18
    Welcome!

    When I came up with LMM a bit over two years ago I considered swapping 256-long chunks of COG memory instead. It can be pretty efficient if you use DJNZ's counter as the cog address for the RDLONGs (I think Phil suggested that). However, I decided (for now) to go with the short loop and explicit FCACHE blocks.

    Why?

    Most loops fit within 128 longs, and paging 256 longs could easily lead to a lot of page thrashing. I bet on a slightly unrolled (4-way) VM loop and explicit use of FCACHE being faster most of the time.

    Recently I released a preview release of my Las assembler that makes writing LMM code reasonably painless, as it pretty much hides all the differences between LMM and regular PASM.

    As far as extensions for implementing virtual memory, it is quite possible, and actually fairly easy with LMM - just have a small page table in hub memory, and keep a small working set in the hub - however right now I can put 7.5MB of memory into my Morpheus system, which is LOTS for quite a while.

    You can download Las at the link in my signature.

    Bill
    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    www.mikronauts.com - my site 6.250MHz custom Crystals for running Propellers at 100MHz
    Las - Large model assembler for the Propeller Largos - a feature full nano operating system for the Propeller
    Morpheus & Mem+ Advanced dual Propeller SBC with XMM and 256 Color VGA
    Please use mikronauts _at_ gmail _dot_ com to contact me off-forum, my PM is almost totally full

    Post Edited (Bill Henning) : 7/15/2009 10:58:40 PM GMT
  • Painless Posts: 69
    edited 2009-07-15 22:35
    Hello Bill!

    Thanks for your reply. What you describe certainly makes a lot of sense as a starting point, with unrolling increasing efficiency as well. I'm wondering if, down the line, implementing a swapping system with much smaller pages (say 32 or 64 longs) would yield a good increase in performance. More than one target area for bringing the pages into cog RAM could be utilized, allowing for code 'caching' (perhaps even with a way for the programmer to specify certain areas of code to be 'pinned'). With a pre-processing split into even small pages, small jumps could be left as-is where they stay inside their own page, leading to some decent speed increases.
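
The pre-processing step for jumps can be sketched as a simple classification: a jump whose source and target share a page can stay a plain jump, while any other jump would need rewriting into a call to the page-swap handler. The names `jump_kind`, `classify_jump` and `count_far_jumps` are hypothetical, and addresses are assumed to be counted in longs from the start of the program.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { JUMP_LOCAL, JUMP_FAR } jump_kind;

/* A jump is local when source and target fall in the same page; only
 * far jumps would need rewriting by the pre-processor. */
jump_kind classify_jump(uint32_t from, uint32_t to, uint32_t page_longs)
{
    assert(page_longs > 0);
    return (from / page_longs == to / page_longs) ? JUMP_LOCAL : JUMP_FAR;
}

/* Counts how many of n jumps would have to be rewritten. */
uint32_t count_far_jumps(const uint32_t *from, const uint32_t *to,
                         uint32_t n, uint32_t page_longs)
{
    uint32_t far_count = 0;
    for (uint32_t i = 0; i < n; i++)
        if (classify_jump(from[i], to[i], page_longs) == JUMP_FAR)
            far_count++;
    return far_count;
}
```

Running the same jump list with different `page_longs` values gives a rough feel for the tradeoff: smaller pages load faster but turn more jumps into far jumps.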

    This is so interesting. I find myself delving into so many different areas of experimentation with the Propeller that it's almost getting frustrating. I need to focus on one or two 'projects' and stick with them!

    Russ.
  • Bill Henning Posts: 6,445
    edited 2009-07-15 23:02
    Hi Russ,

    Why don't you try it? I believe that the approach I chose maintains a good balance between simplicity and speed: an unrolled loop runs the "glue"/"main" code between small loops and looping functions, while FCACHE loads and executes 2-256 longs starting at $80. However, there may indeed be faster approaches.

    I'd love to see how it works out.

    Yes, it is very fascinating - so much so that I ended up designing a dual Propeller based computer with extended memory and a full expansion bus so I (and others) could better play with it :)
  • Painless Posts: 69
    edited 2009-07-15 23:11
    Bill,

    I've just started reading about your Morpheus system, and it looks amazing! I think I really need to invest more time in the standard setup first, though, before I delve into the more complicated. I need to take some time to learn standard PASM; it's been such a LONG time (no pun intended) since I did any assembler (hint: my last assembler program was Z80A) that I'm sure I will just need to start afresh. Spin seems, so far, to be a pretty capable language, although its lack of string manipulation functions is a little frustrating. I suppose I've gotten too used to string slicing functions.

    Once I've become a lot more familiar with PASM I definitely want to learn a lot more about your LMM system.

    Good luck with Morpheus and Largos!

    Russ.
  • Bill Henning Posts: 6,445
    edited 2009-07-16 00:22
    Hi Russ,

    Thank you for the kind words :) I had a blast designing it, and believe it or not, I am even having fun documenting it!

    Starting with Spin & PASM is a great idea - once you grow used to it, it is a killer combination for most applications, and yes, string handling is fairly poor.

    Thanks,

    Bill
  • RossH Posts: 5,336
    edited 2009-07-16 00:41
    Hi Russ,

    If you're more familiar with C, Catalina also uses an LMM model (based on Bill's original work). See my signature line for a link.

    As Bill implies, simplicity is the key here. Even the FCACHE mechanism is difficult to use effectively unless you are programming directly in PASM. Catalina doesn't use FCACHE because I am hoping someone will eventually come up with a better general purpose VM mechanism - but this has not happened yet!

    (BTW I believe the Imagecraft C compiler may implement FCACHE - you may also want to have a look at that)

    There are an infinite number of paging and VM techniques that could be implemented on the Prop - but the problem with most of them is that their complexity blows the limits of what can be achieved in a single cog (which also has to run an LMM kernel) and the speed loss of having to communicate between multiple cogs both complicates things and also slows things down too much.

    But this is definitely a "hot topic" with various people working on it, so by all means have a go.

    Ross.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Catalina - a FREE C compiler for the Propeller - see Catalina
  • Bill Henning Posts: 6,445
    edited 2009-07-16 00:50
    FCACHE can be used simply in a couple of cases with huge benefit:

    while (expression) {
    ...
    }

    for(;;) {
    ...
    }

    Example 1:

    char *d, *s;
    ...
    while (*d++ = *s++);
    
    



    Example 2:

    int i,j,arr[100][100];
    ...
    for(i=0;i<100;i++)
      for(j=0;j<100;j++)  /* of course an optimizer should move the invariant i*100 out of the inner loop */
        arr[i][j]=i*100+j;
    
    



    As long as there are no function calls in the code, just compile it as an FCACHE block starting at cog address $80. It should fit fine in the 128-long limit (which can be stretched to 256 if you don't use DCACHE).

    If there are one or possibly two function calls, and the code will still fit in the FCACHE limit after in-lining those functions, inline them.

    Never put in-line non-looping code into FCACHE; it ends up slower than letting it run as regular LMM.

    Given how often short pointer and other loops are used in C, this will result in a HUGE speedup.
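
The rule about non-looping code follows from a simple cost comparison, sketched below in C. This is a back-of-envelope model, not a measurement: the `cycle_costs` struct and the three functions are hypothetical, and the per-instruction cycle figures are supplied by the caller as assumptions rather than taken from any Propeller timing table.

```c
#include <assert.h>
#include <stdint.h>

/* Cycle costs per instruction; all three are caller-supplied assumptions. */
typedef struct {
    uint32_t lmm;     /* executing one instruction through the LMM loop */
    uint32_t native;  /* executing one instruction from cog RAM */
    uint32_t load;    /* copying one long from hub into the FCACHE area */
} cycle_costs;

/* Cost of running a block of `longs` instructions `iterations` times as LMM. */
uint64_t cost_lmm(cycle_costs c, uint32_t longs, uint32_t iterations)
{
    return (uint64_t)c.lmm * longs * iterations;
}

/* Same block via FCACHE: pay the load once, then run at native speed. */
uint64_t cost_fcache(cycle_costs c, uint32_t longs, uint32_t iterations)
{
    return (uint64_t)c.load * longs + (uint64_t)c.native * longs * iterations;
}

/* Smallest iteration count at which the FCACHE version is strictly cheaper. */
uint32_t fcache_break_even(cycle_costs c)
{
    assert(c.lmm > c.native);
    return c.load / (c.lmm - c.native) + 1;
}
```

With placeholder costs of 16 cycles for LMM, 4 native and 16 per loaded long, a block that runs once costs more through FCACHE than through plain LMM, and the break-even comes at the second iteration. Under this model the block length cancels out, so only the iteration count matters.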

    Post Edited (Bill Henning) : 7/16/2009 1:01:42 AM GMT
  • RossH Posts: 5,336
    edited 2009-07-16 01:55
    Hi Bill,

    Yes, I agree - I meant it was difficult to use from a compiler-writer's perspective :)

    I do intend having a go at implementing something in Catalina when I get time - the reason I haven't done so yet is twofold:

    1. The major savings are for small 'leaf' functions which make no external calls. But when speed really is important there are usually better ways to do things than relying on the compiler to optimize these for you. For instance, a better solution for the first example (copying bytes in memory) is to use the ANSI 'memcpy' function, which is normally implemented directly in assembly anyway for speed (it currently isn't in Catalina, but it's on my list of things to do!). The second example (initializing an array with something other than a constant value) would probably only be done once during program initialization. If it has to be done over and over then it would be better to build one initialized version and (again) use memcpy to reinitialize each time after that, although admittedly you may have to manually re-initialize each time if memory space is very tight.

    2. I am already struggling to fit everything I need in my LMM Kernel (especially when I include XMM support code). Dedicating 128 longs to an FCACHE block that only gets used in cases like those described above currently seems to be a poor tradeoff.
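
The memcpy suggestion in point 1 can be sketched as follows. The names `arr_template`, `build_template` and `reset_arr` are hypothetical, and the 100x100 size is kept only to match the earlier example. The template doubles the memory footprint, which is the tradeoff noted at the end of point 1, and two arrays of this size would not fit in the Propeller's 32 KB of hub RAM.

```c
#include <assert.h>
#include <string.h>

#define ROWS 100
#define COLS 100

static int arr_template[ROWS][COLS]; /* built once; costs a second copy of the data */
static int arr[ROWS][COLS];          /* the working array that gets modified */

/* Fill the template a single time, e.g. during program start-up. */
void build_template(void)
{
    for (int i = 0; i < ROWS; i++) {
        int base = i * COLS;         /* invariant hoisted out of the inner loop */
        for (int j = 0; j < COLS; j++)
            arr_template[i][j] = base + j;
    }
}

/* Restore the working array with one block copy instead of two nested loops. */
void reset_arr(void)
{
    assert(sizeof arr == sizeof arr_template);
    memcpy(arr, arr_template, sizeof arr);
}
```

Whether this beats re-running the loops depends on how memcpy itself is implemented on the target, which is the point being made about writing it in assembly.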

    However ... if we restrict candidates for FCACHE further by saying that they must make no global references at all (i.e. they can only refer to local variables) then the same space used for FCACHE could also be used for paging - and then it becomes very much more attractive. I'm working on a version of Catalina that would support this - i.e. it uses a separate memory space for local variables (which will always be in Hub RAM) versus global variables (which will always be in XMM RAM).

    Ross.

  • Bill Henning Posts: 6,445
    edited 2009-07-16 03:18
    Sounds great!

    The examples I gave were deliberately contrived and trivial, but slightly more complicated versions are often used for:

    - initializing data structures
    - inverting matrices
    - walking lists
    - walking trees etc
    - drawing lines on a bitmap
    - etc etc etc
    - whole leaf functions, as long as they loop

    Straight non-looping code is not worth fcaching.

    Best,

    Bill
  • Cluso99 Posts: 18,066
    edited 2009-07-16 03:31
    @Painless:
    If you follow my link to the tools thread you will find links to my Zero footprint debugger, which uses a similar loop to the one you posted for LMM. It resides in the shadow registers at $1F0...
    You will also find my Overlay Loader which loads pasm code fast from hub. The length of the overlay is variable.
    Appropriate acknowledgements are in my code and threads.

    I had to develop both of these while writing a faster Spin interpreter. In ZiCog, Heater uses the overlay loading and also the decoding method from my faster Spin interpreter.

    ZiCog is a Z80 emulator, and we are running CP/M 2.2 and CP/M 3, although CP/M 3 has a keyboard input problem which I have not had the time to debug.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Links to other interesting threads:

    · Home of the MultiBladeProps: TriBladeProp, RamBlade, TwinBlade, SixBlade, website
    · Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
    · Prop Tools under Development or Completed (Index)
    · Emulators: Micros eg Altair, and Terminals eg VT100 (Index) ZiCog (Z80), MoCog (6809)
    · Search the Propeller forums (via Google)
    My cruising website is: www.bluemagic.biz  MultiBladeProp is: www.bluemagic.biz/cluso.htm
  • Painless Posts: 69
    edited 2009-07-16 03:36
    Hi RossH!

    I've been wanting to check out your Catalina C compiler for some time, but haven't gotten around to it yet. I've used C under Linux on and off and have found it to be a powerful, if frustrating to learn, language. Did you ever hear the joke about the constipated C programmer?.... he couldn't pass a parameter... Ok... that was bad.... I'll get my coat.

    Ross & Bill,

    Please don't think that I started this thread to be critical of your work with LMM; far from it. It was more from a standpoint of interest than anything else. I'm still very unfamiliar with the inner workings of the Propeller and am just starting to dive into its assembler code. In short, I'm not qualified to make any definite statements about what is best and what isn't.

    Thanks for the information you've both shared.

    Russ.
  • RossH Posts: 5,336
    edited 2009-07-16 04:01
    Hi Russ,

    No worries. We didn't intend to hijack your thread. And I'm sure none of us read any criticism into your posts. No-one in these forums claims to be the font of all wisdom on the Propeller - part of the fun is being surprised when someone new comes along and does something with the Prop that no-one else had thought of. I've been proven wrong several times in these forums - mostly when I said I didn't think something was possible on the Propeller. I should learn not to say it!

    Ross.

    P.S. Yes (bad jokes aside!) C can be a frustrating language, and it can be completely impenetrable when badly written. But it's still worth learning since it's easily the most portable language ever invented - it runs natively on just about any computer chip ever made.

  • Bill Henning Posts: 6,445
    edited 2009-07-16 05:00
    No worries, I did not see any criticism!