
The Great printf() Challenge

Wingineer19 Posts: 279
edited 2023-09-13 02:04 in C/C++

I originally posted this code under the Catalina - ANSI C for the Propeller 1 & 2 category to have RossH take a look at it. But I think it's important enough to warrant a wider examination by the C programming community here.

As we all know, one of the fastest ways to consume memory on the Prop1 is to use the printf() library function. I've looked around the Web and found some pretty good rewrites for use with microcontrollers which drastically reduced the required memory footprint. Unfortunately, the ones I came across don't support floating point which is something I need for my applications.

Last year I decided to attempt my own rewrite by starting from scratch, building upon what others have done, but adding my own tweaks and refinements. I called it ConPrint to distinguish it from the library printf() function.

This version does support floating point, but the technique I use to do it results in some loss of precision. It would be really nice if that could be fixed without bloating the code. So, I've decided to freely post it here on the Forum to see what you C programming gurus can do to make it better, smaller, more efficient, and more accurate.

I'm not really a programmer but I have stayed in a Holiday Inn Express (for those of you who remember their commercials from years ago). Hence, I don't document my code. If I had to suffer writing it, then so do those trying to decipher it :smile:

I'm afraid the only prize I have to offer is my gratitude and the admiration of others who marvel at your programming skills.

Anyway, here it is:

//Segment Is Conprint.c
//Attempts To Replace The Library printf() function -- but in a much smaller package

#include <stdarg.h>
#include <math.h>

#define ln10  2.302585093

char ConTxDStr[256];

char *int2asc(char *InStr,unsigned int value,char sign,char wide,char uplow,char base)
{
 unsigned int temp=value;
 unsigned int remain;
 char diff;
 char deep=1;
 char *iptr=InStr;
 if((base < 2) || (base > 16)) return InStr;
 if(sign != 0) *iptr++ = sign;
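 //count digits with integer division; log(value)/log(base) can round down at exact powers of the base
 for(temp=value; temp >= (unsigned int) base; temp=temp / base) deep++;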
 iptr=iptr + deep;
 for(diff=deep; diff<wide; diff++) iptr++;
 InStr=iptr;
 *iptr-- = 0;
 do
  {
   temp=value;
   value=value / base;
   remain=temp - value*base;
   if(remain < 10) *iptr-- = remain + '0';  
   else *iptr-- = remain - 10 + ((uplow == 'X') ? 'A': 'a');
  }while(value);
 for(diff=deep; diff<wide; diff++) *iptr-- = '0';
 return(InStr);
}

float ipowten(short x)
{
 return(exp(x * ln10));
}

char *eftoa(char *InStr,float fval,short wide,short precise,char sign,char type)
{
 char *fptr=InStr;
 char fsign;
 float fexpo;
 float emult;
 float pmult;
 short expo;
 short index;
 short ival;
 short point=0;
 fsign=(fval < 0.0) ? -1 : (fval > 0.0) ? 1 : 0;
 switch(fsign)
  {
   case 0:  expo=0;
            *fptr++='0';
            *fptr++='.';
            for(index=0; index<precise; index++) *fptr++ = '0';
            if(type == 'e') goto makex;
            break; 
   case -1: fval=-fval;
            sign='-';
   case 1:  if(sign != 0) *fptr++ = sign;
            fexpo=log10(fval);
            expo=(short) fexpo;
            if(fexpo < 0.0) expo=expo - 1;           
            emult=ipowten(expo);
            pmult=ipowten(precise);
            fval=fval / emult;
            if(emult > 1.0) pmult=pmult * emult;
            fval=(fval * pmult + 0.5) / pmult;
            switch(type)
             {
              case 'f':  if(expo > 0)
                          {
                           precise=precise + expo;
                           point=expo;
                          }
                         else
                          {
                           fval=fval * emult;
                           expo=0;
                          }                        
                         for(index=0; index<(wide - expo - 1); index++) *fptr++ = '0';
              case 'e':  for(index=0; index<=precise; index++)
                          {
                           ival=(short) fval;
                           *fptr++ = '0' + ival;
                           fval=10.0 * (fval - (float) ival);
                           if(index == point) *fptr++ = '.';
                          }
                         if(type != 'e') break;
makex:                   *fptr++='E';
                         if(expo < 0)
                          {
                           *fptr++='-';
                           expo=-expo;
                          }
                         else *fptr++='+';
                         *fptr++=(expo / 10) + '0';
                         *fptr++=expo - (expo/10)*10 + '0';
                         break;
             }
           break;
  } 
 *fptr=0;
 return(fptr);
}

char *strprint(char *result,char *InStr,va_list valist)
{
 char *optr=result;
 char *nptr;
 char *sptr;
 char val;
 char sign=0;
 char flag=0;
 int wide=0;
 int deep=0;
 int sval;
 unsigned int uval;
 float fval;
 while((val=*InStr++) != 0x00)
  {
   switch(flag)
    {
     case 0:  switch(val)
               {
                case '%': wide=0;
                          flag=1;
                          break;
                default:  sign=0;
                          *optr++=val;
                          *optr=0;
                          break;
               }
              break;
     default: nptr=optr;
              switch(val)
               {
                case '%': *optr++=val;
                          *optr=0;
                          break;
                case '+': sign='+';
                          break;
                case 'b': uval=va_arg(valist,unsigned int);
                          optr=int2asc(optr,uval,0,wide,0,2);
                          break;
                case 'o': uval=va_arg(valist,unsigned int);
                          optr=int2asc(optr,uval,0,wide,0,8);
                          break;
                case 'c': val=(char) va_arg(valist,int);
                          *optr++=val;
                          *optr=0;
                          break;
                case 'd': sval=va_arg(valist,int);
                          if(sval < 0)
                           {
                            uval=-sval;
                            sign='-';
                           }
                          else uval=sval;
                          optr=int2asc(optr,uval,sign,wide,0,10);
                          break;
                case 'E': val='e';
                case 'e':
                case 'f': fval=(float) va_arg(valist,double);
                          if(fval < 0.0)
                           {
                            fval=-fval;
                            sign='-';
                           }
                          optr=eftoa(optr,fval,wide,deep,sign,val);
                          break;
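                case 's': sptr=va_arg(valist,char*);
                          //test before copying so an empty string adds nothing
                          while(*sptr != 0) *optr++ = *sptr++;
                          //leave format mode even when nothing was copied
                          flag=0;
                          break;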
                case 'u': uval=va_arg(valist,unsigned int);
                          optr=int2asc(optr,uval,0,wide,0,10);
                          break;
                case 'x':
                case 'X': uval=va_arg(valist,unsigned int);
                          optr=int2asc(optr,uval,0,wide,val,16);
                          break;
                case '.': deep=0;
                          flag=2;
                          break;
                default:  if(val < '0') break;
                          if(val > '9') break;
                          if(flag == 1) wide=wide * 10 + (val - '0');
                          else deep=deep * 10 + (val - '0');
                          break;
               }
              if(nptr == optr) break;
              nptr=optr;
              flag=0;
              break;
     }
   }
 //no va_end() here: the caller that ran va_start() also runs va_end()
 *optr=0;
 return(result);
}

void ConPrint(char *format,...)
{
 char *txd=ConTxDStr;
 va_list ptr;
 va_start(ptr,format);
 txd=strprint(txd,format,ptr);
 va_end(ptr);
 s4_str(0,ConTxDStr);
}

Comments

  • Float is bloat, as they say. The obvious first step is to re-evaluate why you need floats in the first place. There's a good chance you don't and you've just been tricked by floats being the only kind of fractional number directly supported in C. On the same note, question whether emulating the printf-esque format strings really is a good idea. It's usually more economical to just have individual functions to print certain items. This removes format string parsing bloat, but can blow up the call sites a bit more.
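
    A minimal sketch of the "individual functions" idea, assuming a hypothetical emit() character sink. Here it writes into a small buffer so it can be checked on a PC; on the Propeller it would wrap whatever serial send routine is in use. The names emit, print_str, print_udec and print_dec are made up for illustration:

    ```c
    #include <stddef.h>

    /* Hypothetical character sink: collects output in a fixed buffer. */
    static char   sink[64];
    static size_t sink_len = 0;

    static void emit(char c)
    {
        if (sink_len < sizeof(sink) - 1) {   /* always leave room for the NUL */
            sink[sink_len++] = c;
            sink[sink_len] = '\0';
        }
    }

    /* Print a NUL-terminated string. */
    void print_str(const char *s)
    {
        while (*s) emit(*s++);
    }

    /* Print an unsigned value in decimal: no padding, no format parsing. */
    void print_udec(unsigned int v)
    {
        char tmp[10];                 /* a 32-bit unsigned has at most 10 digits */
        int  n = 0;
        do {
            tmp[n++] = (char)('0' + v % 10);
            v /= 10;
        } while (v != 0);
        while (n > 0) emit(tmp[--n]); /* digits were produced in reverse */
    }

    /* Print a signed value in decimal. */
    void print_dec(int v)
    {
        unsigned int u = (unsigned int)v;
        if (v < 0) {
            emit('-');
            u = 0u - u;               /* negate as unsigned: safe for INT_MIN */
        }
        print_udec(u);
    }
    ```

    A call site such as print_str("rpm="); print_dec(rpm); then stands in for printf("rpm=%d", rpm): more calls, but no parser.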

    Other obvious spots:

       case -1: fval=-fval;
                sign='-';
       case 1:  if(sign != 0) *fptr++ = sign;
    

    toss the sign variable, you can just write the sign entirely inside the -1 case.

    Also, the whole sign thing is kinda wrongly construed because there are separate +0 and -0 values that need to be distinguished.
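
    On the signed-zero point: a comparison such as fval < 0.0 is false for -0.0, so the sign has to be read from the sign bit itself. A small sketch, assuming float is a 32-bit IEEE 754 value (C99 also provides signbit() in <math.h> where the library has it); float_is_negative is a made-up name:

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Returns 1 when the sign bit is set, including for -0.0f.
       Assumes float is a 32-bit IEEE 754 value. */
    int float_is_negative(float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);   /* well-defined way to read the bits */
        return (int)(bits >> 31);
    }
    ```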


    I'm not sure how this is handled in Catalina (does it even have double?), but in general you *always* want to suffix single-precision floating point constants with f (i.e. 3.14f instead of 3.14) since otherwise the C standard mandates that the intermediate calculation is blown up to double precision, possibly introducing two float<>double conversions. Here's an example of this happening with GCC on x86: https://godbolt.org/z/aM6Podrr1
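
    To make the suffix point concrete, here is a small sketch with two otherwise identical functions (the names are illustrative). On a compiler where double is wider than float, the first one converts x to double, multiplies in double, and converts back; the second has type float throughout:

    ```c
    /* Unsuffixed constant: 0.1 is a double, so x is promoted, the multiply
       has type double, and the result is converted back to float. */
    float scale_double_const(float x)
    {
        return x * 0.1;
    }

    /* Suffixed constant: the whole expression has type float. */
    float scale_float_const(float x)
    {
        return x * 0.1f;
    }
    ```

    A quick way to see the difference without reading assembly is sizeof(x * 0.1) versus sizeof(x * 0.1f).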


    Also, it is generally considered a bad idea to do string functions with unlimited length. Basically a bug waiting to happen. For printing to the console you can obviously just call the send function each time (and it might actually be faster due to not blocking on the tx ring buffer). This also eliminates whatever buffer memory you'd need to hold the string.
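
    If a buffer is kept, one way to bound it is to carry the capacity along with the pointer. This is only a sketch under that assumption; struct outbuf and its functions are hypothetical names, not part of ConPrint or any library:

    ```c
    #include <stddef.h>

    /* Hypothetical bounded output buffer: writes past the end are dropped
       instead of overrunning memory. */
    struct outbuf {
        char  *data;
        size_t cap;    /* total size of data[], including the NUL */
        size_t len;    /* characters stored so far */
    };

    void outbuf_init(struct outbuf *b, char *storage, size_t cap)
    {
        b->data = storage;
        b->cap  = cap;
        b->len  = 0;
        if (cap > 0) storage[0] = '\0';
    }

    /* Append one character; returns 0 on success, -1 if the buffer is full. */
    int outbuf_putc(struct outbuf *b, char c)
    {
        if (b->len + 1 >= b->cap) return -1;   /* keep one byte for the NUL */
        b->data[b->len++] = c;
        b->data[b->len] = '\0';
        return 0;
    }

    /* Append a string, truncating if needed; returns characters written. */
    size_t outbuf_puts(struct outbuf *b, const char *s)
    {
        size_t n = 0;
        while (*s && outbuf_putc(b, *s) == 0) { s++; n++; }
        return n;
    }
    ```

    Swapping the body of outbuf_putc() for a direct call to the serial send routine gives the buffer-free variant described above.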


    Sorry, too late around here, will not think about the actual printing algorithm after midnight, strange things would happen.

  • @Wuerfel_21 said:
    Sorry, too late around here, will not think about the actual printing algorithm after midnight, strange things would happen.

    Sleep well and wishing you very pleasant dreams about printf() :smile:

  • RossH Posts: 5,353
    edited 2023-09-13 07:05

    @Wuerfel_21 said:

    I'm not sure how this is handled in Catalina (does it even have double?)

    In Catalina a double is the same as a float. This is permitted by the C standards, which are pretty loose (the main requirement for double is that it not be smaller than a float - it does not need to be larger).

    I agree that floating point tends to bloat code, which is why Catalina provides two versions of the standard libraries - one that excludes floating point support, which makes them much smaller.

    Ross.

  • @RossH said:
    In Catalina a double is the same as a float. This is permitted by the C standards, which are pretty loose (the main requirement for double is that it not be smaller than a float - it does not need to be larger).

    Yes, in the C standard double is allowed to be the same size as a float, but only if it meets some precision requirements that cannot be satisfied in 32 bits. Specifically, doubles must have at least 10 decimal digits of precision (requiring 34 bits for the mantissa). So making both float and double be, for example, 48 bits would satisfy the standard, but making them both 32 bits does not.
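
    The limits in question are visible through <float.h>, so a compiler can be checked directly. A minimal sketch (the function names are illustrative); the C standard requires FLT_DIG to be at least 6 and DBL_DIG to be at least 10:

    ```c
    #include <float.h>

    /* Decimal digits that survive a decimal -> binary -> decimal round trip. */
    int float_digits(void)  { return FLT_DIG; }
    int double_digits(void) { return DBL_DIG; }

    /* Returns 1 if double offers the 10 decimal digits the standard asks for. */
    int double_meets_minimum(void)
    {
        return DBL_DIG >= 10;
    }
    ```

    With IEEE 754 single precision FLT_DIG is 6, so a double that shares the 32-bit float format cannot report the required 10.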

    In practice most microcontroller users may not care.

  • RossH Posts: 5,353

    @ersmith said:

    @RossH said:
    In Catalina a double is the same as a float. This is permitted by the C standards, which are pretty loose (the main requirement for double is that it not be smaller than a float - it does not need to be larger).

    Yes, in the C standard double is allowed to be the same size as a float, but only if it meets some precision requirements that cannot be satisfied in 32 bits. Specifically, doubles must have at least 10 decimal digits of precision (requiring 34 bits for the mantissa). So making both float and double be, for example, 48 bits would satisfy the standard, but making them both 32 bits does not.

    In practice most microcontroller users may not care.

    Yes, I think you are correct. Certainly about most users not caring!

    However, it is reasonably trivial to add a double type to Catalina (LCC already supports both doubles and long doubles) and the quickest and easiest thing to do would be to convert the basic float type to 8 bytes just to satisfy the standard. But the floating point libraries would be entirely software-based (i.e. no plugin support!) and therefore might be too large and too slow to be of much practical use - at least on the P1. I still have an initial implementation from the early days of Catalina, so I can give it a brush up and see if I can remember why I decided on 32 bits only. I believe the main reason was that the original Spin floating point plugins (written by Cam Thomson for the Propeller 1) were only 32 bits, and I needed to use those to keep the code size small enough and the speed fast enough to make floating point at all useful.

    I had pretty much run out of things to add to Catalina, so thanks ... I guess ... :smiley:

    Ross.

  • Getting back to the original topic: the key part of accurately printing floating point numbers is calculating 10**N for various integer values of N. If your library's pow() function is written very carefully you could use that, but for 32 bit floats it is probably easier / safer to use a table of precalculated values. Here's a short program to print floating point numbers. It's not quite perfect (printing floats is hard) but it's much better than the naive divide by 10.0 approach. It should work on any C compiler with 32 bit IEEE floats.

    /*
    // code to print floats
    // Written by Eric R. Smith, placed in the public domain
    */
    
    #include <stdio.h>
    #include <stdint.h>
    
    #define DOUBLE_BITS 23
    #define MAX_DEC_DIGITS 9
    #define DOUBLE_ONE ((unsigned)(1<<DOUBLE_BITS))
    
    #define DOUBLE_MASK (DOUBLE_ONE-1)
    
    typedef float FTYPE;
    typedef uint32_t UITYPE;
    typedef union D_OR_I {
        FTYPE d;
        UITYPE i;
    } D_OR_I;
    
    static float powten_tab[] = {
                      1e-37, 1e-36, 1e-35, 1e-34, 1e-33, 1e-32, 1e-31, 1e-30,
        1e-29, 1e-28, 1e-27, 1e-26, 1e-25, 1e-24, 1e-23, 1e-22, 1e-21, 1e-20,
        1e-19, 1e-18, 1e-17, 1e-16, 1e-15, 1e-14, 1e-13, 1e-12, 1e-11, 1e-10,
        1e-09, 1e-08, 1e-07, 1e-06, 1e-05, 1e-04, 1e-03, 1e-02, 1e-01, 1e-00,
        1e+01, 1e+02, 1e+03, 1e+04, 1e+05, 1e+06, 1e+07, 1e+08, 1e+09, 1e+10,
        1e+11, 1e+12, 1e+13, 1e+14, 1e+15, 1e+16, 1e+17, 1e+18, 1e+19, 1e+20,
        1e+21, 1e+22, 1e+23, 1e+24, 1e+25, 1e+26, 1e+27, 1e+28, 1e+29, 1e+30,
        1e+31, 1e+32, 1e+33, 1e+34, 1e+35, 1e+36, 1e+37, 1e+38
    };
    
    
    /* calculate 10^n with as much precision as possible */
    float powten(int n)
    {
        if (n < -37) {
            return 0.0f;
        }
        if (n > 38) {
            return 1e+99; /* infinity */
        }
        return powten_tab[n+37];
    }
    
    
    /* extract base 2 log of a floating point number */
    /* if your system library has ilogb, you can use that */
    int my_ilogbf(float f) {
        D_OR_I u;
        UITYPE i;
    
        u.d = f; i = u.i;
        i = (i<<1) >> (DOUBLE_BITS+1);
        return i - 127;
    }
    
    /* check for infinity */
    /* if your system library has isinf, you can use that */
    int my_isinff(float f) {
        D_OR_I u;
        UITYPE i;
    
        u.d = f; i = u.i;
        i = i >> DOUBLE_BITS;
        return i == 255;
    }
    
    /*
    // disassemble a number into integer and exponents
    // fixed to base 10 for now
    */
    
    static void disassemble(FTYPE x, UITYPE *aip, int *np, int numdigits)
    {
        FTYPE a;
        int maxdigits;
        UITYPE ai;
        UITYPE u;
        UITYPE maxu;
        int n;
        int i;
        int trys = 0;
        D_OR_I un;
    
        if (x == 0.0f) {
            *aip = 0;
            *np = 0;
            return;
        }
    
        /* first, find (a,n) such that
        // 1.0 <= a < 10 and x = a * 10^n
        */
        n = my_ilogbf(x);
        /* initial estimate: 2^10 ~= 10^3, so 3/10 of ilogb is a good first guess */
        n = (3 * n)/10 ;
        maxdigits = MAX_DEC_DIGITS;
    
        while (trys++ < 8) {
            FTYPE p;
            p = powten(n);
            a = x / p;
            if (a < 1.0f) {
                --n;
            } else if (a >= 10.0f) {
                ++n;
            } else {
                break;
            }
        }
        i = my_ilogbf(a);
        un.d = a;
        ai = un.i & DOUBLE_MASK;
        ai |= DOUBLE_ONE;
        ai = ai<<i;
    
        /*
        // now extract as many significant digits as we can
        // into u
        */
        u = 0;
        if (numdigits< 0) {
            numdigits = n - numdigits;
    
            /* "0" digits is a special case (we may need to round the
            // implicit 0 up to 1)
            // negative digits will always mean 0 though
            */
    
            if (numdigits < 0) {
                goto done;
            }
        } else {
            numdigits = numdigits+1;
        }
        if (numdigits > maxdigits)
            numdigits = maxdigits;
        maxu = 1; /* for overflow */
        while ( u < DOUBLE_ONE && numdigits-- > 0) {
            UITYPE d;
            d = (ai >> DOUBLE_BITS); /* next digit */
            ai &= DOUBLE_MASK;
            u = u * 10;
            maxu = maxu * 10;
            ai = ai * 10;
            u = u+d;
        }
        /*
        // round
        */
        if (ai > (unsigned)(10*DOUBLE_ONE/2) || (ai == (unsigned)(10*DOUBLE_ONE/2) && (u & 1))) {
            u++;
            if (u == maxu) {
                ++n;
            }
        }
    done:
        *aip = u;
        *np = n;
    }
    
    /*
    // print out an unsigned decimal number
    // if needpt is true, print '.' after the first character
    // divider is 10^desired digits
    */
    static void putdec(unsigned ai, unsigned divider, int needpt) {
        int digit;
        do {
            digit = ai / divider;
            ai = ai % divider;
            divider = divider / 10;
            putchar('0' + digit);
            if (needpt) {
                putchar('.');
                needpt = 0;
            }
        } while (divider > 0);
    }
    
    
    /*
    // print a float out in exponential format
    */
    
    void printfloat(float x) {
        UITYPE ai;
        int exp;
        int sign;
    
        if (x < 0) {
            x = -x;
            sign = '-';
        } else {
            sign = '+';
        }
    
        putchar(sign);
    
        /* handle special cases */
        if (x != x) {  /* only can happen for NaN */
            fputs("nan", stdout);
            return;
        }
        if (my_isinff(x)) {
            fputs("inf", stdout);
            return;
        }
        /* get ai as 7 digits using base 10 (the "6" is digits after the decimal point) */
        disassemble(x, &ai, &exp, 6);
    
        /* print ai out in 7 digits with a . after the first digit */
        putdec(ai, 1000000, 1);
    
        /* now print the exponent */
        putchar('E');
        if (exp < 0) {
            putchar('-');
            exp = -exp;
        } else {
            putchar('+');
        }
        putdec(exp, 100, 0);
    }
    
    void test(float f) {
        printfloat(f);
        putchar('\n');
    }
    
    int main() {
        /* test program */
        test(-9.0f);
        test(12.34561f);
        test(-0.72f);
        test(999999.1f);
        test(3.1234567e+10f);
        return 0;
    }
    
  • Yes, but remember the objective here is to maximize the accuracy of the floating point printout while minimizing the code size.

    I haven't messed with my ConPrint function for nearly a year, but now, for experimental purposes, I've extracted the portions of the code that create the floating point string and put them into a new function for further examination.

    Let's just call this new function float2str(). You input a floating point value, and it returns a string containing the floating point number. I mentioned previously that there is some loss of precision using my technique.

    For the sake of argument let's say that we want to convert floating point value fval into its string equivalent fstr.

    One idea I'm looking at now to mitigate the loss of precision involves an iterative approach:

    1. Initially copy fval into a variable called rval: rval=fval;

    2. Feed rval into the float2str() function.

    3. Retrieve the output fstr string from float2str().

    4. Convert fstr into a new rval using the atof() function: rval=atof(fstr);

    5. Calculate a difference value between fval and rval: dval=fval - rval;

    6. Add dval to fval to create a new rval : rval=fval + dval;

    7. Repeat #2 above until the dval is minimized.

    Using this approach I've noticed that dval is minimized (i.e. fstr printout converges toward fval) after a few iterations. Accuracy is much better than without the iterative approach, but it's still not perfect.
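
    The seven steps can be sketched as below. The real float2str() is not shown in this thread, so the version here is a deliberately lossy stand-in that truncates to three decimals and only handles non-negative values; refine() is likewise a made-up name. With this stand-in the result alternates between two neighbouring strings rather than settling, so the sketch simply keeps whichever string converts back closest to fval:

    ```c
    #include <float.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-in for float2str(): truncates to three decimals instead of
       rounding. Hypothetical, non-negative values only. */
    static void float2str(float v, char *out, size_t cap)
    {
        long scaled = (long)(v * 1000.0f);   /* truncation is the lossy step */
        snprintf(out, cap, "%ld.%03ld", scaled / 1000, scaled % 1000);
    }

    /* Feed the conversion error back in and keep the best string seen. */
    void refine(float fval, char *best, size_t cap, int max_iter)
    {
        char  fstr[32];
        float rval = fval;                        /* step 1 */
        float best_err = FLT_MAX;
        int   i;

        for (i = 0; i < max_iter; i++) {          /* step 7: repeat */
            float back, dval, err;
            float2str(rval, fstr, sizeof fstr);   /* steps 2 and 3 */
            back = (float)atof(fstr);             /* step 4 */
            dval = fval - back;                   /* step 5 */
            err  = (dval < 0.0f) ? -dval : dval;
            if (err < best_err) {                 /* remember the closest result */
                best_err = err;
                snprintf(best, cap, "%s", fstr);
            }
            rval = fval + dval;                   /* step 6 */
        }
    }
    ```

    For 1.23456 the truncating stand-in alone gives 1.234, while a few passes through refine() settle on 1.235, which is the correctly rounded three-decimal value.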

    The question for me is how accurate is good enough considering I'm working with 32-bit floats and not 64-bit and I'm not interested in bloating the code any more than it already is.

    I'll continue to experiment with this approach and see if it can be improved further. Onward and upward...

  • @Wingineer19 :

    Yes, but remember the objective here is to maximize the accuracy of the floating point printout while minimizing the code size.

    Oh, I haven't forgotten -- sometimes having a pre-computed lookup table is both faster and smaller than trying to compute the data accurately at run time. A table of powers of 10 only needs about 78 entries or so for 32 bit floats, which isn't that big (it's equivalent to 78 machine instructions in LMM, which is a fairly small function). The code I posted also has the advantage that it does round-trip conversion, that is float -> string -> float should always produce the original float (that, at least, was the case when something similar was used in the PropGCC printf/scanf routines, I haven't tried with the slightly modified code I posted).
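
    The round-trip property is easy to test on a PC. The sketch below uses the host C library's snprintf() and strtof() rather than the code posted above, and round_trips is an illustrative name. For IEEE 754 single precision, nine significant decimal digits are always enough to recover the original value, provided the library conversions round correctly:

    ```c
    #include <stdio.h>
    #include <stdlib.h>

    /* Returns 1 if printing f with 9 significant digits and parsing the
       text back yields exactly the same float. */
    int round_trips(float f)
    {
        char  text[32];
        float back;
        snprintf(text, sizeof text, "%.8e", (double)f);  /* 1 + 8 digits */
        back = strtof(text, NULL);
        return back == f;
    }
    ```

    The same harness can be pointed at any float-to-string routine by replacing the snprintf() call.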

  • As we all know, the standard library printf() function consumes so much HubRam memory on the Prop1 that not much of the 32KB remains for any useful program to be run.

    Hence, the purpose of this exercise has been to rewrite printf() to consume a much smaller memory footprint yet retain most of the original printf() functionality, including floating point support. To do that four new functions were written:

    1. conprint() -- this is the function the user calls which takes the place of printf(). It essentially populates an output format string and the va_list pointer and passes this info to the strprint() function.

    2. strprint() -- similar in functionality to the library sprintf() function, it receives the format string and va_list pointer from conprint() and determines how to fill the output string. For individual characters and strings it populates the output string directly, but will call the int2asc() function to format integers or the eftoa() function to format floating point numbers.

    3. int2asc() -- receives an integer value from strprint() and converts it into a binary, octal, signed or unsigned decimal, or hex string.

    4. eftoa() -- receives a floating point value from strprint() and converts it into a string in floating point or exponential format.

    With the exception of eftoa(), which I'm still working on to reduce size and maximize accuracy, the functions are pretty much in their final format. As is, these four functions appear to consume about half the memory of the original printf() function.

    That's good, and provides some breathing room in HubRam for a useful program, but there's one additional possibility that could be explored as well: the use of an XMM cog and memory to perform the printf() function. In my case, I'm using a FLIP, and although I built an external SRAM memory module for it, I would prefer to run the FLIP without it. But how can this be done without using an external memory add-on?

    Well, Catalina allows the user to run an XMM program directly from EEPROM while using some HubRam as cache. In this mode, the XMM program runs in the Catalina SMALL Memory Mode, which means the code itself is stored and fetched from EEPROM, while data, heap, and stack are stored within HubRam.

    Catalina also allows Multi-Model-Memory, which means that one cog can execute from XMM memory, while the remaining cog(s) can execute LMM or CMM code (and maybe PASM as well) from HubRam. The XMM cog can communicate with the LMM/CMM cog(s) via a shared memory structure.

    I've found it best to keep the Shared Structure as small as possible, and since the XMM and LMM/CMM programs use HubRam for their data memory, populating the Shared Structure with a few variable types and pointers is the best bet.

    The following code demonstrates the Catalina Multi-Model-Memory feature where the CMM Secondary program computes the sine and cosine values of an angle then passes this information over to the XMM Primary program for printout.

    Of course using this technique to have the XMM program format values for printout is slower than having the CMM program do it, but if one can live with the result it's an excellent way to save HubRam memory.

    First, look at the Secondary program:

    //segment is secondary.c
    //last revision on 07Nov23
    
    //catalina secondary.c -lc -lm -lserial4 -C NO_HMI -C NO_ARGS -C FLIP -C COMPACT -R 0x57d8 -M64k -y
    //spinc -B2 -n SECONDARY -s 512 -c -l secondary.binary > secondary.inc
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdarg.h>
    #include <math.h>
    #include <time.h>
    #include <propeller.h>
    
    #define     Yes                 1
    #define     No                  0
    #define     conport             0
    #define     print_command       1
    #define     ROUNDUP             128
    
    struct shared
    {
        char request;
        char *format;
        char *outstr;    
     va_list ptr;
    };
    
    struct shared *secondary;
    
    char   outstr[256];
    
    void secprint(char *format,...)
    { 
     va_list ptr;
     va_start(ptr,format);
     (*secondary).format=format;
     (*secondary).ptr=ptr;
     (*secondary).outstr=&outstr[0];
     (*secondary).request=print_command;
     while((*secondary).request != 0);
     s4_str(conport,outstr);
    }
    
    void main(struct shared *primary)
    {
     float theta;
     secondary=primary; 
     for(;;)
      {
       for(theta=0.0; theta<6.283185307; theta+=0.0062831853)
        {
         secprint("theta=%f,sine=%f,cosine=%f\r\n",theta,sin(theta),cos(theta));
        }
      }
    }
    

    We then compile this Secondary code:

    c:\> catalina secondary.c -lc -lm -lserial4 -C NO_HMI -C NO_ARGS -C FLIP -C COMPACT -R 0x57d8 -M64k -y
    Catalina Compiler 6.1.1
    
    code = 2952 bytes
    cnst = 72 bytes
    init = 44 bytes
    data = 264 bytes
    file = 33012 bytes
    
    c:\> spinc -B2 -n SECONDARY -s 512 -c -l secondary.binary > secondary.inc
    Catalina SpinC 6.0
    c:\>
    

    Note that we must specify the location in HubRam where this code is to be stored using the -R 0x57d8 directive. No need to worry about determining this value. If it's wrong, the code will tell us at runtime what the value should be so we can just update it and recompile.

    Second, look at the Primary code:

    //segment is primary.c
    //last revision on 07Nov23
    
    // catalina primary.c -lc -lm -lserial4 -C NO_HMI -C FLIP -C XEPROM -C SMALL -C CACHED_4K -y
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdarg.h>
    #include <math.h>
    #include <time.h>
    #include <propeller.h>
    #include "secondary.inc"
    
    #define     Yes                 1
    #define     No                  0
    #define     conport             0
    #define     print_command       1
    #define     ROUNDUP             128
    
    void check_memory(void *secondary_memory)
    {
     if(SECONDARY_CODE_ADDRESS == (int) secondary_memory) return;
     for(;;)
      {
       s4_str(conport,"Error - Change secondary Memory To 0x");
       s4_hex(conport,secondary_memory,4);
       s4_str(conport,"\r\n");
       while(s4_rxcheck(conport) < 0);
      }
    }
    
    struct shared
    {
        char request;
        char *format;
        char *outstr;    
     va_list ptr;
    };
    
    static struct shared  primary;
    
    void priprint(void)
    {
     vsprintf(primary.outstr,primary.format,primary.ptr);
     va_end(primary.ptr); 
     primary.request=0;
    }
    
    void main(void)
    {
     char SECONDARY_RESERVED_SPACE[ROUNDUP * ((SECONDARY_RUNTIME_SIZE + ROUNDUP -1)/ROUNDUP)]; 
     check_memory(SECONDARY_RESERVED_SPACE);
     start_SECONDARY(&primary,ANY_COG);
     for(;;)
      {
       if(primary.request == print_command) priprint();
      }
    }
    

    Now compile the Primary code:

    c:\> catalina primary.c -lc -lm -lserial4 -C NO_HMI -C FLIP -C XEPROM -C SMALL -C CACHED_4K -y
    Catalina Compiler 6.1.1
    
    code = 25284 bytes
    cnst = 172 bytes
    init = 5240 bytes
    data = 368 bytes
    file = 63912 bytes
    c:\>
    

    Notice that in this Primary code example the priprint() function uses the library vsprintf() to format the output and not my replacement printf() functions mentioned above. If I had used my replacement functions the file size shown above would have been considerably smaller.

    Finally, upload the compiled code to EEPROM and execute it:

    c:\> payload eeprom primary -I vt100
    Using Propeller (version 1) on port COM3 for first download
    Using Secondary Loader on port COM3 for subsequent download
    c:\>
    

    The code outputs this info:

    theta=0.000000,sine=0.000000,cosine=1.000000
    theta=0.006283,sine=0.006283,cosine=0.999980
    theta=0.012566,sine=0.012566,cosine=0.999921
    theta=0.018850,sine=0.018848,cosine=0.999822
    theta=0.025133,sine=0.025130,cosine=0.999684
    theta=0.031416,sine=0.031411,cosine=0.999507
    theta=0.037699,sine=0.037690,cosine=0.999289
    theta=0.043982,sine=0.043968,cosine=0.999033
    theta=0.050265,sine=0.050244,cosine=0.998737
    theta=0.056549,sine=0.056519,cosine=0.998402
    theta=0.062832,sine=0.062791,cosine=0.998027
    theta=0.069115,sine=0.069060,cosine=0.997612
    theta=0.075398,sine=0.075327,cosine=0.997159
    theta=0.081681,sine=0.081591,cosine=0.996666
    theta=0.087965,sine=0.087851,cosine=0.996134
    theta=0.094248,sine=0.094108,cosine=0.995562
    theta=0.100531,sine=0.100362,cosine=0.994951
    theta=0.106814,sine=0.106611,cosine=0.994301
    theta=0.113097,sine=0.112856,cosine=0.993611
    theta=0.119381,sine=0.119097,cosine=0.992883
    theta=0.125664,sine=0.125333,cosine=0.992115
    theta=0.131947,sine=0.131564,cosine=0.991308
    theta=0.138230,sine=0.137790,cosine=0.990461
    theta=0.144513,sine=0.144011,cosine=0.989576
    theta=0.150796,sine=0.150226,cosine=0.988652
    theta=0.157080,sine=0.156434,cosine=0.987688
    theta=0.163363,sine=0.162637,cosine=0.986686
    theta=0.169646,sine=0.168833,cosine=0.985645
    theta=0.175929,sine=0.175023,cosine=0.984564
    theta=0.182212,sine=0.181206,cosine=0.983445
    theta=0.188496,sine=0.187381,cosine=0.982287
    

    This was just a quick demo of how the Catalina Multi-Model-Memory feature could be used to execute the library printf() function without consuming massive amounts of HubRam memory.

    This program barely fit within the 64 kilobyte EEPROM of the FLiP, so in my real-world application I will be replacing it with a 128 kilobyte one. In my view this is still better than attaching an external SRAM memory module to the FLiP: although running from EEPROM is not nearly as fast as running from such a module, it provides acceptable speed for my application.

    Catalina also allows the storage of C strings within EEPROM. This is handy for storing a multitude of Menus and submenus in upper EEPROM, and the XMM kernel allows quick retrieval and display of these strings for the operator.

    My hope is that by using all of these tricks and techniques I can coax the FLiP into doing what I want without having to upgrade to the Prop2.

  • RossH Posts: 5,353

    Hello @Wingineer19

    This sounds very promising!

    Almost by definition, if you are using the stdio functions (like printf and friends) then you only need human speed and not real-time speed. So this seems like a good solution - execute from the limited Hub RAM those things that need speed, and execute from the XMM RAM those things that need lots of memory but only need human speed!

    I hope to get time to examine it more closely over the next few days - but it sounds like you have it all well in hand! :)

    Ross.

  • Wingineer19 Posts: 279
    edited 2023-11-10 03:53

    @RossH said:
    ...So this seems like a good solution - execute from the limited Hub RAM those things that need speed, and execute from the XMM RAM those things that need lots of memory but only need human speed!

    Precisely. For those functions which require lots of code but use very little data, why not push them over to the XMM side and use the Shared Structure to pass inputs to them and get the results back?
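
    A minimal sketch of that kind of handshake, written as plain C that runs on a desktop compiler. The structure layout and the names shared_t, post_request() and service_request() are hypothetical and are not taken from Catalina or from the Primary/Secondary code earlier in this thread; they only illustrate the idea of one side writing inputs and the other side writing results through a shared block of memory.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical mailbox; field names are illustrative, not Catalina's. */
    typedef struct {
        volatile int request;   /* 0 = idle, nonzero = work pending          */
        float        theta;     /* input written by the fast (Hub) side      */
        char         text[64];  /* result written by the slow (XMM) side     */
    } shared_t;

    /* Fast side: fill in the input first, then raise the request flag last. */
    void post_request(shared_t *s, float theta)
    {
        assert(s != NULL);
        s->theta   = theta;
        s->request = 1;
    }

    /* Slow side: if work is pending, format it and clear the flag.
       Returns 1 if a request was serviced, 0 if the mailbox was idle. */
    int service_request(shared_t *s)
    {
        assert(s != NULL);
        if (s->request == 0)
            return 0;
        snprintf(s->text, sizeof s->text, "theta=%f", s->theta);
        s->request = 0;         /* cleared flag tells the fast side it is done */
        return 1;
    }
    ```

    On real hardware the fast side would call post_request() and poll until request returns to 0, while the slow side would call service_request() in a loop. The flag is marked volatile so that it is re-read on every poll.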

    The downside is that the user will have to accept that getting the desired results will take longer, because XMM code executes more slowly than CMM code running from HubRam. But if one can live with that, then why not go with it?

    For this test the SMALL Memory Model fits the bill perfectly, and by using the XEPROM driver no add-on memory is required on something like the FLiP. The only caveat is that the 64 kilobyte EEPROM will need to be replaced with a 128 kilobyte EEPROM to support a decent code size.

    The above code example should work fine on any Prop1 platform that has at least a 64 kilobyte EEPROM for anyone wanting to take it for a spin (no pun intended). The only requirement is that one has Catalina installed on their computer along with something like a FLiP to perform the experiment.

    In my real-world software development case, I'm using a Prop1 USB Project Board with 256 kilobytes of EEPROM installed.

    I've modified the for() statement within the Primary code to include a switch() statement which looks at the request value within the Shared Structure and performs a different task depending on that value.
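
    As a rough illustration of that dispatch pattern (not the actual Primary code), here is a desktop-C sketch. The request codes, the shared_t layout, the menus[] table and the dispatch() function are all hypothetical stand-ins; in particular, menus[] is an ordinary array standing in for strings that would really live in EEPROM.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical request codes; the real values are not given in this thread. */
    enum { REQ_NONE = 0, REQ_PRINT = 1, REQ_MENU = 2 };

    typedef struct {
        volatile int request;   /* which task the other cog is asking for */
        int          arg;       /* task-specific input                    */
        char         text[64];  /* task-specific output                   */
        int          status;    /* 0 = ok, -1 = bad request or argument   */
    } shared_t;

    /* Stand-in for menu strings that would be fetched from EEPROM. */
    static const char *const menus[] = { "Main Menu", "WiFi Menu" };

    /* One pass of the service loop body.
       Returns 1 if a request was handled, 0 if nothing was pending. */
    int dispatch(shared_t *s)
    {
        assert(s != NULL);
        if (s->request == REQ_NONE)
            return 0;

        s->status = 0;
        switch (s->request) {
        case REQ_PRINT:         /* formatted output on behalf of the caller */
            snprintf(s->text, sizeof s->text, "value=%d", s->arg);
            break;
        case REQ_MENU:          /* look up a menu string by index */
            if (s->arg >= 0 && s->arg < (int)(sizeof menus / sizeof menus[0]))
                snprintf(s->text, sizeof s->text, "%s", menus[s->arg]);
            else
                s->status = -1;
            break;
        default:                /* unknown request code */
            s->status = -1;
            break;
        }
        s->request = REQ_NONE;  /* mark the request as completed */
        return 1;
    }
    ```

    The surrounding loop would then be nothing more than for (;;) dispatch(&shared); and adding a new housekeeping chore means adding one request code and one case.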

    For example, in addition to performing the printf() function as in the above example, the XMM cog will, upon request, retrieve and display Menus stored within EEPROM, populate a WiFi Ground Station list with default values, populate various WiFi IP settings with default values, and set a default Reference Location. When I get the time I'm going to add a feature to allow the XMM cog to retrieve each of these WiFi values directly from EEPROM. Once I get the EEPROM write functions perfected, the XMM cog will be able to store these values there as well.

    With the XMM cog assigned these various housekeeping chores, there should be sufficient HubRam remaining for CMM cog(s) to perform time-critical tasks like reading and formatting data from a GPS receiver, estimating position from WiFi station multilateration using Jacobian Matrix functions, and controlling some servos.

    It's still early in the overall coding process so I'm not certain yet that the Prop1 will be able to do all of these necessary things using the Multi-Model-Memory. I only know that it's not possible without using it.

    Like I said, my fallback position is to move to the Prop2, but if the Prop1 can get the job done then why not stay with it?
