Why does C code in Prop take so much memory? — Parallax Forums

homosapien Posts: 147
edited 2020-06-02 17:12 in Propeller 1
A while back I was working on a CNC controller using a P1 chip, programming in C/C++. I was hitting the 32K memory limit when compiling, so I ported the code to an Arduino (which can use various microcontrollers with more available memory), and it took only ~17K of memory when compiled on that platform. The port was minimal: basically just replacing the Prop-specific function calls with mainstream C calls. I ended up running it on an Arduino Nano, which has about the same amount of memory as the P1, and still had lots of room left over for further modifications and improvements.

The Arduino does have hardware serial (the P1 has to spend memory on the serial code itself), but both devices use memory for the serial buffer, so it seems hard to believe that this could be the issue.

Does anyone know why the "equivalent" C code takes so much more memory on a P1 than on an Arduino (AVR ATmega family)?

Comments

  • At least in the case of the ATmega328, it probably has a lot to do with the fact that it's 8-bit versus the Prop's 32-bit (assuming the AVR compiler builds native code?). I don't fully understand how the different memory models in the Prop load C-compiled code, but it also has to fit in cog register RAM (2 KB at a time), so I'm sure that affects how the program, or the compiler, is written.
  • Some functions in SimpleIDE use a lot of memory, so knowing which ones to avoid can save space. Also, once a library is referenced, some of its code gets linked in even if it is not used.

    Using floating point can use up a lot of memory.

    Mike
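Because floating point drags in a sizeable support library, a G-code interpreter can keep coordinates in integer fixed point instead. A minimal sketch, assuming micrometre resolution is enough for the machine; `parse_um` and `um_to_steps` are hypothetical helpers, not SimpleIDE library functions:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parse a decimal coordinate such as "-12.5" into
 * integer micrometres (1 mm = 1000 um) without any floating point.
 * Digits past the third decimal place are truncated. Returns a pointer
 * just past the number, or NULL if no digits were found. */
const char *parse_um(const char *s, int32_t *out)
{
    int neg = 0, digits = 0;
    int32_t whole = 0, frac = 0, scale = 100;

    if (*s == '-' || *s == '+')
        neg = (*s++ == '-');
    while (*s >= '0' && *s <= '9') {
        whole = whole * 10 + (*s++ - '0');
        digits++;
    }
    if (*s == '.') {
        s++;
        while (*s >= '0' && *s <= '9') {
            if (scale > 0) {              /* keep only three decimal places */
                frac += (*s - '0') * scale;
                scale /= 10;
            }
            s++;
            digits++;
        }
    }
    if (digits == 0)
        return NULL;
    *out = (whole * 1000 + frac) * (neg ? -1 : 1);
    return s;
}

/* Hypothetical helper: micrometres to motor steps, truncating toward zero.
 * Stays within 32 bits while |um * steps_per_mm| < 2^31, e.g. up to
 * 1 m of travel at 2000 steps/mm. */
int32_t um_to_steps(int32_t um, int32_t steps_per_mm)
{
    return um * steps_per_mm / 1000;
}
```

The returned pointer lets the caller continue scanning the rest of the G-code word list (for example from `"X10 Y5"` to `" Y5"`).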
  • P1 GCC supports multiple memory models with different tradeoffs:
    - Cog: regular 32 bit instructions, limited to 2K, very fast, not very dense.
    - LMM: regular 32 bit instructions, but loaded from Hub RAM with the help of a kernel. Quite fast, somewhat less dense than Cog code.
    - CMM: compressed variable-length instructions interpreted by a kernel. Not very fast, but very dense (somewhat similar to 8 bit microcontrollers, but not as dense as Spin bytecodes).
    - XMM: regular 32 bit instructions, but loaded from some external memory with the help of a kernel. Also not very fast (well, it depends on what memory you use), but large memory can hold a large program.

    IIRC LMM is the default, so memory fills up quickly if your program is non-trivial.
  • File I/O, floating point and formatted prints will increase the size of the executable. You can reduce the size by checking "Simple printf" and "32bit Double" under the compiler options, and "Tiny lib" under linker options. Also use the CMM memory model if program speed is not an issue. If you can post your code we might be able to give you other suggestions on reducing the size of the binary.
  • Not sure if this is part of the problem, but I have seen a full 32 bits dedicated to a single flag.
  • There are several reasons why the same code is bigger on the Propeller than on the Arduino UNO.
    Aside from the different architecture (32-bit vs. 8-bit code), which by itself may produce slightly bigger code, the Propeller doesn't have hardware peripherals, so it needs code to implement UART, SPI, I2C, etc., where an Arduino may only require poking a few I/O registers to set up and operate them.
    Additionally, any code running on a cog eats up code memory that can't be used for C/C++ code, so it is very easy to use all available memory even for small programs.
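One way to avoid spending a whole 32-bit long per flag is to pack several flags into a single byte. A minimal sketch; the flag names are hypothetical and not taken from the original program:

```c
#include <stdint.h>

/* Hypothetical machine-state flags: one bit each instead of one int each. */
enum {
    FLAG_HOMED    = 1 << 0,
    FLAG_LASER_ON = 1 << 1,
    FLAG_ABSOLUTE = 1 << 2,   /* absolute vs. relative coordinates */
    FLAG_PAUSED   = 1 << 3
};

typedef uint8_t machine_flags;   /* room for up to 8 flags in one byte */

static inline void flag_set(machine_flags *f, uint8_t mask)
{
    *f = (uint8_t)(*f | mask);
}

static inline void flag_clear(machine_flags *f, uint8_t mask)
{
    *f = (uint8_t)(*f & ~mask);
}

static inline int flag_test(machine_flags f, uint8_t mask)
{
    return (f & mask) != 0;
}
```

This only pays off when there are many flags: every access adds a mask operation, so with just two or three flags the saving may be negligible.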

    I don't know which memory model you are using, but try to use CMM (-mcmm), this will generate smaller code at the price of being slower than LMM.

    If you want to know how memory is used, let gcc generate the memory map with -Wl,-Map=memory.map so you can see what is used and maybe do some optimizations.
  • I currently have no PropGCC installed, so I can only show it in FlexC:
    $ cat minimal-hello-world.c
    main() {
      char *text = "Hello, World!\n";
      while( *text ) _tx(*text++);
    }
    
    $ fastspin -O1 minimal-hello-world.c 
    Propeller Spin/PASM Compiler 'FastSpin' (c) 2011-2020 Total Spectrum Software Inc.
    Version 4.2.0 Compiled on: May 30 2020
    minimal-hello-world.c
    minimal-hello-world.pasm
    Done.
    Program size is 784 bytes
    
    $ spinsim -b minimal-hello-world.binary 
    Hello, World!
    
    784 bytes is not sooo bad for LMM.

    As cog code it is below 256 bytes.

    Know your compiler.
    Know your libs, mostly for knowing what to avoid.
    Know your hardware/board.

    I bet tiny binaries in GCC are possible in a similar way.

    The smallest hello world binary I was able to build in C in Atari ST days was 92 bytes. Instead of using the C standard libs, I could use the BIOS directly for terminal output. Roughly the same applies to C on CP/M.
  • Thank you for the responses.

    As I said in the original post, this has been a straightforward port from P1 to the Arduino. It is a CNC (laser) controller, basically a G-code interpreter driving the motors, so speed is important. No floating point math, no file operations, no EEPROM usage. The only included libraries are simpletools and fdserial.

    Thinking about it, I do think the issue is the 8-bit vs. 32-bit architecture. I had not tried the CMM memory model because I assumed the slower speed would be a problem, but I may give it a try (it actually compiles to ~17 kB, much closer to the Arduino code size). I will put it on a machine later today and see what actually happens in terms of movement.

    I kind of drank the Arduino Kool-Aid after moving the project to that platform. The plentiful, inexpensive hardware, the wealth of easily available information on the ATmega architecture, and the variety of chips with different peripherals and memory sizes have had me like a kid in a candy store with a fistful of cash. :(
  • My 3d printer runs on the Arduino Atmega using the Open source Marlin code. Works great.

    I'm not a fan of ATmega hardware or 8 bit machines.

    Mike
  • homosapien wrote: »
    I had not tried the CMM memory model, I assumed the slower speed would be a problem

    I believe you will find that CMM on the Propeller is similar in speed to the 8-bit ATmega, if not much faster... don't quote me on that, though.

    Importantly, the Propeller gives you fcache, allowing the compactness of CMM with the speed of native assembly (not just LMM, but native cog-executed PASM).
    For help with fcache, start with google (search for "fcache site:forums.parallax.com"), then see some examples in PropWare, then come back here with any specific questions.
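As a rough illustration of requesting fcache from C, here is a sketch that assumes PropGCC's `fcache` function attribute and assumes the toolchain predefines `__PROPELLER__` (check both against your compiler's documentation). The checksum routine is a hypothetical example of a small, call-free hot loop:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the Propeller toolchain predefines __PROPELLER__ and accepts
 * the fcache function attribute. Elsewhere the macro expands to nothing,
 * so the same source still compiles and runs, just without fcache. */
#if defined(__PROPELLER__)
#define FCACHE __attribute__((fcache))
#else
#define FCACHE
#endif

/* Hypothetical hot loop: XOR checksum over one received line.
 * Kept small and free of function calls so it can plausibly be
 * loaded into the cog as a single fcache block. */
FCACHE uint8_t line_checksum(const char *line, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum ^= (uint8_t)line[i];
    return sum;
}
```

The idea is that the rest of a CMM program stays compact while this one loop runs at cog speed; functions marked this way generally need to be small enough to fit the fcache area.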
  • I am also having a problem with large files when I compile and link. I spent a "while" trying to solve random problems that didn't exist and was shocked to learn that the IDE/Propgcc was not warning me about a shortage of memory.

    Does anyone know what the maximum memory that you can use might be?
    Does anyone know how to get a memory map of the program?

    I've read the post about the different memory models above and will try them. The formatted printf and fprintf seem to use a gigantic amount of memory.
  • In Arduino-land, printf is legendary for using lots of program memory. Unless you are formatting a lot of output, you are almost always better off doing it manually: convert numbers to strings yourself a digit at a time and pad or adjust them as necessary for output.
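A minimal sketch of converting a number a digit at a time with padding, as described above. `format_uint` is a hypothetical helper; the resulting string would be passed to whatever string or character output routine your serial library provides:

```c
#include <stddef.h>

/* Hypothetical helper: write `value` as decimal text into buf, right-aligned
 * in a field of `width` characters filled with `pad` (e.g. ' ' or '0').
 * Returns the string length, or 0 if buf (bufsize bytes) is too small. */
size_t format_uint(unsigned value, char *buf, size_t bufsize,
                   size_t width, char pad)
{
    char tmp[3 * sizeof(unsigned)];   /* enough digits for any unsigned */
    size_t n = 0;

    /* Peel off digits least-significant first. */
    do {
        tmp[n++] = (char)('0' + value % 10u);
        value /= 10u;
    } while (value != 0);

    size_t len = (n > width) ? n : width;
    if (len + 1 > bufsize)
        return 0;

    size_t i = 0;
    while (i < len - n)               /* leading padding */
        buf[i++] = pad;
    while (n > 0)                     /* digits, most-significant first */
        buf[i++] = tmp[--n];
    buf[i] = '\0';
    return len;
}
```

Signed values can be handled by emitting a '-' and formatting the magnitude; fixed-point values by formatting the integer and fractional parts separately.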
  • To save memory:
    When using C on the P1 I would use either "print" or, if there are no floating point values to be printed, "printi". That will save significant memory over "printf". You can still format the output. Check the simpletext.h library for limitations.

    Also use the CMM memory model (although I think that is the default).

    Tom
  • bitnerd wrote: »
    I am also having a problem with large files when I compile and link. I spent a "while" trying to solve random problems that didn't exist and was shocked to learn that the IDE/Propgcc was not warning me about a shortage of memory.

    SimpleIDE writes the code size in the lower-left corner of the Window (code size xxx bytes (yyyy total)) and the build should fail when you exceed the maximum available memory.
    As far as I know, the only "random" problems caused by a shortage of memory happen when using a lot of variables on the stack or when allocating memory without checking the return values. What kind of problems did you have?
    Does anyone know what the maximum memory that you can use might be?

    The Propeller has 32k of memory, so 32k minus some amount for the stack that depends on your program. Up to 30k of code and static data (the sizes reported by SimpleIDE) should be safe for any application.
    Does anyone know how to get a memory map of the program??

    Add -Wl,-Map=memory.map to the Other link options in the Linker tab, it will generate a file with the detailed memory map. Don't know if it is possible to automatically use the project name.
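To see how much of the space left after code and static data the stack actually uses, one common technique is stack painting: fill the region with a known byte at startup, run the program, then count how much of the pattern was overwritten. A minimal sketch, assuming a downward-growing stack and a caller-supplied region; where the stack really lives on the P1 depends on the toolchain and memory model, so the buffer here is only a stand-in:

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xA5u   /* arbitrary fill pattern */

/* Fill a memory region with the pattern before it is used as a stack. */
void stack_paint(uint8_t *region, size_t size)
{
    for (size_t i = 0; i < size; i++)
        region[i] = STACK_PAINT;
}

/* Estimate peak usage of a stack that grows downward into `region`:
 * bytes at the low end that still hold the pattern were never touched.
 * A stored value that happens to equal the pattern can make this
 * under-report by a few bytes. */
size_t stack_high_water(const uint8_t *region, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && region[untouched] == STACK_PAINT)
        untouched++;
    return size - untouched;
}
```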
  • RossH Posts: 5,344
    Just to throw a curve ball into the mix ...

    Catalina now supports multiple memory models, so your program can be part LMM (not very compact, but quite fast), part CMM (very compact, but not very fast) and part XMM (programs can be very large, but quite slow). So you can mix and match to suit the needs of your application.

    Also, as others have mentioned - avoid using stdio functions (such as printf) if possible. The stdio library is not really suitable when memory is limited. Catalina provides a plethora of alternatives that take far less memory.

    Ross.
  • Many thanks to all who responded. I will try the recommendations.

    I use the CMM memory model, and random errors start to creep in around 29k. I'm currently at 29,208 bytes and at least it runs. I start to have really serious problems around 29.6k.

    The thought of going to 32k is unimaginable.
  • @bitnerd,

    Yes, using the standard C library to perform I/O functions like printf() will eat your lunch real fast, even while running in CMM.

    I started with SimpleIDE but rapidly outgrew it with my ever expanding program. Internal memory simply wasn't sufficient.

    The only way around this was to switch to the external memory model (XMM), which SimpleIDE no longer supports, even though over the years developers have created various Propeller boards with external memory.

    That's when I switched to the Catalina ANSI C compiler. It supports LMM, CMM, and XMM code natively, and contains drivers for many of these XMM equipped boards.

    Or you can create your own XMM board, and Catalina allows you to create custom drivers to make it work. I did this by taking the Parallax USB Project board, adding a couple of 256KB SPI SRAM chips, and replacing the existing 64KB EEPROM with a 256KB one. I wrote my own memory driver in assembly for the SPI SRAMs.

    Catalina also contains extensive documentation, but if you still have some questions you can ask @RossH, the Catalina creator, on this forum. He's helped me immensely in this journey.

    My existing code is about 110KB and growing, and it's working great. I suspect I'll hit 128KB before too much longer.

    As @RossH mentioned above, Catalina now supports multiple memory models in one program. That means you can have your main, or Primary, code running in XMM, with all of those large C libraries to perform printf(), scanf(), etc., while CMM code runs on one (or several) cogs internally to perform other operations. There's also a feature where XMM and CMM programs can share common memory within the 32KB hub RAM of the Propeller to exchange information between them.

    In my case, I have this:

    One XMM Primary program that generates user menus, sensor monitoring, and stepper motor control instructions. All user menus are placed into Shared Memory for output.

    One CMM program handles all UART serial traffic for Console and Control/Display port. It pulls the XMM generated menus from Shared Memory and sends them to either the Console or Control/Display port (or both), relieving the XMM program of this burden.

    Another CMM program handles all UART I/O traffic to a GPS receiver. It parses the GPS messages, decodes them, and places them within Shared Memory for review by the XMM program.

    Another CMM program handles I2C functions along with stepper motor control.

    Once I get the Wifi radio code perfected, it too will be assigned to a CMM program for management.

    All of these programs are operating simultaneously and independently of each other. The only commonality is the Shared Memory.

    It takes a little bit of time to get accustomed to the extensive capability offered by Catalina, but once you do, it's a joy to work with.

    But if you don't want to take the XMM leap, Catalina offers alternate library functions to perform I/O that are much smaller than the standard stdio ones. You might want to take a look at them to see if they would work for your application.
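The Shared Memory arrangement described above boils down to one program writing a buffer and another reading it. A minimal single-writer/single-reader mailbox sketch; this is not Catalina's API, the struct layout and names are hypothetical, and how both programs agree on the mailbox's hub address is left out:

```c
#include <stddef.h>
#include <string.h>

#define MENU_MAX 128   /* hypothetical buffer size */

/* Hypothetical mailbox living in shared hub RAM. `volatile` stops the
 * compiler from caching or reordering these accesses, because another
 * cog changes them behind this code's back. */
typedef struct {
    volatile int  full;             /* 0 = empty, 1 = message waiting */
    volatile char text[MENU_MAX];
} menu_mailbox;

/* Producer side (e.g. the XMM program). Returns 0 if the previous
 * message has not been consumed yet or the text is too long. */
int mailbox_post(menu_mailbox *mb, const char *msg)
{
    size_t len = strlen(msg);
    if (mb->full || len >= MENU_MAX)
        return 0;
    for (size_t i = 0; i <= len; i++)   /* copy including the terminator */
        mb->text[i] = msg[i];
    mb->full = 1;                       /* publish only after the data is written */
    return 1;
}

/* Consumer side (e.g. the CMM serial cog). Copies the message out and
 * hands the buffer back; returns 0 if nothing is waiting. */
int mailbox_take(menu_mailbox *mb, char *out, size_t outsize)
{
    if (!mb->full || outsize == 0)
        return 0;
    size_t i = 0;
    while (i + 1 < outsize && mb->text[i] != '\0') {
        out[i] = mb->text[i];
        i++;
    }
    out[i] = '\0';
    mb->full = 0;                       /* buffer now belongs to the producer */
    return 1;
}
```

Each side only touches `text` while it owns the buffer, as signalled by `full`. This relies on hub RAM having no data cache between cogs; on a multi-core desktop CPU you would use C11 atomics rather than plain `volatile`.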

  • @Wingineer19, it seems you are giving the P1 quite a workout. Nice that you got your driver working.

    Enjoy!

    Mike
  • @msrobots,

    Indeed I have.

    When I first started this project I wondered what the Prop1 could really do.

    As I went down my checklist of what it needed to do, my focus started to shift toward the Prop2 because I wasn't confident that the Prop1 was up to the task.

    But with the advent of the Multi Memory Model recently introduced in Catalina, it's clear that many of those checklist items can be performed on the Prop1.

    The only big task left for a CMM cog is to parse radio traffic from the Wifi link and separate RTK GPS corrections from the uplink control commands sent by my control computer.

    Both data types will be multiplexed within the Wifi stream so the CMM cog will have to separate them, provide the RTK data to the GPS receiver, and then forward the uplink command to the XMM program for processing.

    Other tasks like reading temperature sensors, voltage sensors, current sensors, battery management sensors, etc, should be easily handled by the XMM primary program. Stepper motor steering information should also be easily computed by the XMM program which would then provide this data to the CMM stepper motor control program for execution.

    Ultimately I may yet need to move to the Prop2 for project completion, but indications right now are that the Prop1 will likely be able to handle it all.

    I wish a FLiP-type module existed with 512KB EEPROM and 512KB SPI SRAM installed, but since there isn't one, I will need to stick with my add-on memory board for the existing FLiP or design a whole new module myself. In that case, my next project is to become a KiCad expert...
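The Wifi cog described above has to split one byte stream into two destinations. A minimal sketch of one way to do it as a byte-at-a-time state machine, assuming a hypothetical framing (type byte, length byte, payload); the actual Wifi protocol in this project isn't described in the thread, and a real link would also want a sync marker and a checksum:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame types: RTK corrections vs. uplink commands. */
enum { TYPE_RTK = 'R', TYPE_CMD = 'C' };

typedef struct {
    enum { WAIT_TYPE, WAIT_LEN, IN_PAYLOAD } state;
    uint8_t type;
    size_t  len;        /* payload length announced by the frame */
    size_t  got;        /* payload bytes received so far */
    uint8_t buf[255];   /* length is one byte, so 255 is the maximum */
} demux;

void demux_init(demux *d)
{
    d->state = WAIT_TYPE;
    d->type = 0;
    d->len = 0;
    d->got = 0;
}

/* Feed one received byte. Returns the frame type once a complete frame
 * is in d->buf / d->len, or 0 while a frame is still arriving. */
int demux_feed(demux *d, uint8_t byte)
{
    switch (d->state) {
    case WAIT_TYPE:
        if (byte == TYPE_RTK || byte == TYPE_CMD) {   /* skip anything else */
            d->type = byte;
            d->state = WAIT_LEN;
        }
        return 0;
    case WAIT_LEN:
        d->len = byte;
        d->got = 0;
        if (d->len == 0) {                /* empty frame is complete at once */
            d->state = WAIT_TYPE;
            return d->type;
        }
        d->state = IN_PAYLOAD;
        return 0;
    case IN_PAYLOAD:
        d->buf[d->got++] = byte;
        if (d->got == d->len) {
            d->state = WAIT_TYPE;
            return d->type;
        }
        return 0;
    }
    return 0;
}
```

In the cog's main loop, each byte read from the Wifi UART would go through `demux_feed`: a `TYPE_RTK` result means `buf`/`len` should be forwarded to the GPS receiver, and `TYPE_CMD` means it should be copied into shared memory for the XMM program. The buffer is reused, so the payload has to be handed off before the next byte is fed.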
  • Cluso99 Posts: 18,069
    We use Catalina in a commercial project that has 3 P1 chips.
    One of those P1's has 512KB of SRAM plus an SD card. This P1 runs Catalina and the large star/nebulae/etc databases and instructs another P1 to control a telescope. The third P1 controls a handheld keyboard and 4x20 red LCD display.
  • What memory model are you using with Catalina? Are you running XMM with code in the SRAM?

  • Cluso99 Posts: 18,069
    David Betz wrote: »
    What memory model are you using with Catalina? Are you running XMM with code in the SRAM?
    Yes I believe so. I didn’t write the C code.
  • Cluso99 wrote: »
    Yes I believe so. I didn’t write the C code.
    I'm just looking for some indication that XMM was ever used for anything. I'm not sure the PropGCC XMM was used.

  • I appreciate the info on Catalina. Many thanks.

  • I added -WI and -Map=memory.map, but I was never able to find any file with a memory map. Where does it put this info?

  • bitnerd wrote: »
    I add the -WI and -Map=memory.map, but I was never able to find any file with a memory map. Where does it put this info ??

    The file is memory.map (or whatever name you wrote after -Map=), and it should be in the folder where your project is located.
  • Cluso99 Posts: 18,069
    edited 2020-07-13 06:41
    David Betz wrote: »
    I'm just looking for some indication that XMM was ever used for anything. I'm not sure the PropGCC XMM was used.
    Yes. Just confirmed that XMM is being used. The latest binary I have online is 178KB. It uses the RAMBLADE3 switch.
  • Cluso99 wrote: »
    Yes. Just confirmed that XMM is being used. The latest binary I have online is 178KB. It uses the RAMBLADE3 switch.
    Cool! Thanks for verifying. It's good to hear that XMM was useful to someone.
