Is This Possible Yet? Efficient Use of HUB RAM

A few years ago the answer to this question was 'NO'. I wonder if things have changed?

Using a P1 programmed in 'C'.

At boot 32k is automagically loaded from the external memory into HUB memory. Eight blocks of up to 2k each get loaded into the 8 COGs, which start doing their own thing, using COG RAM for storage etc. Since the full 32k of HUB is now no longer needed by the COGs or the boot process, can it be used as a contiguous block of RAM?

Comments

  • Just to be clear...

    the COGs are running in COG mode, loaded with a C function (or 2k of assembly).
  • The P1 hardware has not changed, so the same limitations apply. I'm not sure whether cog 0 can be reloaded with PASM code after the Spin interpreter loads the other 7 cogs, but AFAIK all 8 cogs can access all 32K of hub RAM.

    The cogs have only one mode, in which they execute P1 machine-language instructions. Any "C function" would be converted to PASM by the C compiler, loaded into hub RAM, and then copied into cog RAM.
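In cog mode the compiled image only has to sit in hub RAM until the cog is started: the P1's COGINIT copies 496 longs (cog addresses $000 to $1EF) from hub into the cog, and from then on the cog runs out of its own RAM. The following is a host-side model of that copy, not Propeller code; `model_coginit`, `hub` and `cog_ram` are hypothetical names that exist only in this illustration.

```c
#include <stdint.h>
#include <string.h>

#define HUB_BYTES  32768
#define COG_LOADED 496          /* COGINIT copies cog addresses $000-$1EF */

static uint8_t  hub[HUB_BYTES];     /* host stand-in for the 32K of hub RAM */
static uint32_t cog_ram[8][512];    /* host stand-in for each cog's 512 longs */

/* Model of starting a cog: copy 496 longs from hub RAM, starting at
   image_addr, into that cog's RAM. On real hardware image_addr must be
   long-aligned; here it must also satisfy image_addr + 1984 <= HUB_BYTES.
   After this returns, the model never reads the hub copy again, so that
   region is free to be reused for data. */
void model_coginit(int cog, uint32_t image_addr)
{
    memcpy(cog_ram[cog], &hub[image_addr], COG_LOADED * sizeof(uint32_t));
}
```

Overwriting the hub bytes after `model_coginit()` returns leaves the cog's copy untouched, which is the property that lets the image's hub space be recycled.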
  • kwinn wrote: »
    The P1 hardware has not changed, so the same limitations apply.

    When I originally asked the question the limitation was the software. So, I guess the question is "can prop-gcc do this?"
  • From what I've read, you might want to look into overlays. It might be possible to overlay the boot sections.
  • Yes, of course you can write a C program that loads PASM code into all 8 cogs, and then use the hub RAM for data. It's a little trickier to create COG C programs that don't require hub RAM, but it can be done. Normally, a COG C program uses some hub RAM for the stack, but if you're careful you can avoid using the stack.
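Being "careful" to avoid the stack usually means writing leaf routines: no calls to other functions, no recursion, no local arrays and no taking the address of a local, so that every variable can live in a register. The function below is a generic C sketch of that shape (`sum_bytes` is a hypothetical example); whether a particular compiler, PropGCC in cog mode included, really emits it without any stack frame is up to the compiler and should be checked in the generated code.

```c
#include <stdint.h>

/* Leaf routine: calls nothing, is not recursive, declares no arrays and
   never takes the address of a local, so a compiler can keep total, i,
   buf and len entirely in registers. */
uint32_t sum_bytes(const volatile uint8_t *buf, uint32_t len)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < len; i++)
        total += buf[i];     /* reads hub data through a pointer, no stack needed */
    return total;
}
```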
  • Dave Hein wrote: »
    Normally, a COG C program uses some hub RAM for the stack, but if you're careful you can avoid using the stack.

    Why can't they run with a local stack?

    It seems crazy that a multi-core processor can't run all its cores totally stand-alone and independently of each other.

  • By way of an (admittedly contrived) example...
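    #include "simpletools.h"   // SimpleIDE library that provides cog_run()

    volatile unsigned char buffer1[2048];
    // ...total of 32k of buffers...
    volatile unsigned char buffer16[2048];

    // cog_run() expects a function of the form void f(void *par)
    void func1(void *par)
    {
      int n;
      while(1)
      {
        // stop at 2046 so that buffer2[n+1] stays inside its 2048 bytes
        for (n = 0; n < 2047; n++)
        {
          buffer1[n] = buffer2[n+1];
          // other things going on, up to 512 words of code less space for registers and local stack.
        }
      }
    }

    // ...total of 8 functions...

    void func8(void *par)
    {
      int n;
      while(1)
      {
        for (n = 0; n < 2047; n++)
        {
          buffer15[n] = buffer16[n+1];
          // other things going on, up to 512 words of code less space for registers and local stack.
        }
      }
    }

    int main()
    {
      // The second argument sizes the stack that cog_run() allocates in hub RAM.
      int *cog1 = cog_run(func1, 128);
      // ...total of 8 functions...
      // Note: main() itself already occupies cog 0, so at most 7 more cogs can be started.
      int *cog8 = cog_run(func8, 128);
    }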
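As written, the contrived example cannot fit: sixteen 2048-byte buffers are exactly 32768 bytes, which is all of the P1's hub RAM, before counting the program image that the C code itself occupies in hub or the stacks that cog_run() allocates there. A quick budget check makes this concrete; `hub_bytes_left` is a hypothetical helper, and the image and stack sizes used with it are made-up placeholders rather than measured PropGCC figures.

```c
/* P1 hub RAM size in bytes. */
#define HUB_BYTES 32768L

/* Hub bytes left after reserving the data buffers, the program image
   and one stack per launched cog. A negative result means the layout
   cannot fit in hub RAM. */
long hub_bytes_left(long n_buffers, long buffer_bytes,
                    long image_bytes, long n_stacks, long stack_bytes)
{
    return HUB_BYTES - n_buffers * buffer_bytes
                     - image_bytes
                     - n_stacks * stack_bytes;
}
```

For instance, `hub_bytes_left(16, 2048, 0, 0, 0)` returns 0: the buffers alone consume the whole hub, so any program image or stack at all pushes the total over 32K.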
  • There are no COG instructions for maintaining a stack efficiently.

  • David Betz wrote: »
    There are no COG instructions for maintaining a stack efficiently.

    Thanks David. I'd overlooked the lack of addressing modes.
  • Also, there is a limited amount of cog RAM available to use for a stack. It's certainly possible to implement a way to address cog RAM with self-modifying code. So it would be possible to have a cog mode that uses a cog stack. However, given the limited amount of cog RAM it's probably better to use the native mode to avoid using the stack. This makes for faster and more compact code, which is usually the reason to use cog mode in the first place.
  • Dave Hein wrote: »
    It's certainly possible to implement a way to address cog RAM with self-modifying code. So it would be possible to have a cog mode that uses a cog stack.
    Yes, of course that would be possible. However, that would result in very poor code density and, as you mention, the COG has a limited amount of memory.
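Why the self-modifying approach costs code density: a P1 instruction holds its destination (bits 17..9) and source (bits 8..0) as direct 9-bit cog addresses, with no indirect mode, so reaching "the slot that SP points to" means patching the instruction first with MOVD or MOVS and only then executing it. On real hardware at least one other instruction must sit between the patch and the patched instruction, because of the pipeline. The host-side C model below shows a push and pop built that way; the register addresses, `stack_init`, `push` and `pop` are hypothetical, and the opcode bits are not modelled.

```c
#include <stdint.h>

#define COG_LONGS  512
#define DST_SHIFT  9          /* destination field: bits 17..9 */
#define FIELD_MASK 0x1FFu     /* 9-bit cog address */

/* Hypothetical cog-RAM layout used only by this model. */
#define SP_REG     0x100      /* next free stack slot */
#define VAL_REG    0x101      /* value to push / value popped */
#define PUSH_INSN  0x102      /* "mov 0-0, VAL_REG" */
#define POP_INSN   0x103      /* "mov VAL_REG, 0-0" */
#define STACK_BASE 0x180

static uint32_t cog[COG_LONGS];

/* MOVD: overwrite only the destination field of cog[insn]. */
static void movd(unsigned insn, uint32_t addr)
{
    cog[insn] = (cog[insn] & ~(FIELD_MASK << DST_SHIFT))
              | ((addr & FIELD_MASK) << DST_SHIFT);
}

/* MOVS: overwrite only the source field of cog[insn]. */
static void movs(unsigned insn, uint32_t addr)
{
    cog[insn] = (cog[insn] & ~FIELD_MASK) | (addr & FIELD_MASK);
}

/* Execute the MOV stored at cog[insn]: cog[dst] = cog[src]. */
static void exec_mov(unsigned insn)
{
    unsigned dst = (cog[insn] >> DST_SHIFT) & FIELD_MASK;
    unsigned src = cog[insn] & FIELD_MASK;
    cog[dst] = cog[src];
}

void stack_init(void)
{
    cog[SP_REG] = STACK_BASE;
    cog[PUSH_INSN] = 0;
    movs(PUSH_INSN, VAL_REG);   /* push instruction always reads VAL_REG */
    cog[POP_INSN] = 0;
    movd(POP_INSN, VAL_REG);    /* pop instruction always writes VAL_REG */
}

/* push = movd + (spacer on real hardware) + patched mov + add sp, #1.
   No overflow check in this sketch. */
void push(uint32_t value)
{
    cog[VAL_REG] = value;
    movd(PUSH_INSN, cog[SP_REG]);
    exec_mov(PUSH_INSN);
    cog[SP_REG] += 1;
}

/* pop = sub sp, #1 + movs + (spacer) + patched mov. No underflow check. */
uint32_t pop(void)
{
    cog[SP_REG] -= 1;
    movs(POP_INSN, cog[SP_REG]);
    exec_mov(POP_INSN);
    return cog[VAL_REG];
}
```

Every push or pop therefore takes three or four instructions plus the reserved patch slot, where a processor with indirect addressing would need one.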

  • You need a 2K Hub RAM buffer for the cog image in order to start a cog, after which that buffer isn't needed any more. Spin was horribly inefficient about this, leaving those buffers tied up forever by default, and there were schemes to get around it, usually by reusing PASM images as data or video buffers. Matching the size of the images to the needed buffers was a bit tricky, but there are some OBEX objects that do this. While a lot of the original OBEX objects provide for stopping and restarting cogs, meaning you might need the image again one day, in practice cogs are almost never stopped once started and the PASM image can be discarded.

    In C, I think it is more straightforward to make a buffer, load the PASM image from some outside source like high EEPROM, then reuse the buffer for something else. But the basic principle would be similar.
  • There is code in a later version of PropGCC than the one that comes with SimpleIDE that can load COG images from EEPROM. It still requires a single 2K buffer in hub memory but it can load any number of COGs using that one buffer. The propeller-load program also knows how to write the COG images into EEPROM.
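The scheme described in the last two comments (stage each cog image in one shared 2K hub buffer, start the cog, repeat, then reuse the buffer as data) can be sketched as a host-side C model. `load_image_from_eeprom` and `start_cogs_then_reuse` are hypothetical names; the EEPROM loader in later PropGCC versions has its own API, which this does not reproduce, and the memcpy stands in for COGINIT's 496-long copy.

```c
#include <stdint.h>
#include <string.h>

#define IMAGE_LONGS 496             /* longs copied into a cog at start-up */

/* One shared 2K hub buffer, viewed either as a cog image or as data. */
static union {
    uint32_t image[512];            /* staging area for a cog image */
    uint8_t  data[2048];            /* the same bytes reused as a data buffer */
} shared;

static uint32_t cog_ram[8][512];    /* host stand-in for cog RAM */

/* Hypothetical stand-in for reading image n from upper EEPROM; it just
   writes a recognisable pattern. */
static void load_image_from_eeprom(int n, uint32_t *dst)
{
    for (int i = 0; i < IMAGE_LONGS; i++)
        dst[i] = ((uint32_t)n << 16) | (uint32_t)i;
}

/* Start cogs 1..ncogs through the one buffer, then hand it back as data.
   Cog 0 is running this code, so at most 7 cogs can be started. */
uint8_t *start_cogs_then_reuse(int ncogs)
{
    if (ncogs > 7)
        ncogs = 7;
    for (int cog = 1; cog <= ncogs; cog++) {
        load_image_from_eeprom(cog, shared.image);
        /* model of COGINIT: the cog receives its own copy of the image */
        memcpy(cog_ram[cog], shared.image, IMAGE_LONGS * sizeof(uint32_t));
    }
    memset(shared.data, 0, sizeof shared.data);   /* images no longer needed */
    return shared.data;
}
```

Writing into the returned buffer afterwards does not disturb any cog, since each one is running from its own copy.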