How can I get the most out of FCACHE?

Christof Eb. · 2012-09-12 12:43

Hi,
is there any documentation for fcache yet?

I would like to get the most out of this mechanism, which is very powerful. Can I convince it to cache more than the innermost loop?
Thanks
Christof

ersmith · 2012-09-12 12:50

Christof Eb. wrote: »

Hi,
is there any documentation for fcache yet?

I would like to get the most out of this mechanism, which is very powerful. Can I convince it to cache more than the innermost loop?
Thanks
Christof

It will automatically cache outer loops too, as long as the normal conditions for fitting in fcache are met: there can be no function calls (except calls to NATIVE functions) inside the loop, no branches to points outside the loop, and the loop has to fit.

In the CMM preview you can also suggest to the compiler that it put a whole function into FCACHE, by putting __attribute__((fcache)) before the function declaration. Provided the function fits, and has no calls to other functions (actually NATIVE functions are allowed, but that's a special case), then the whole function will be placed into FCACHE. This is useful if you need to guarantee the timing of initialization code relative to some loops, or if the function has multiple independent loops and you want to keep them in the FCACHE together.

Eric
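
A minimal sketch of the whole-function case described above. It assumes PropGCC predefines __propeller__; the FCACHE_FN macro and the sum_then_xor function are hypothetical names used only for illustration. The function has two independent loops and calls no other functions, so it meets the stated conditions as long as it also fits:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: PropGCC predefines __propeller__. On any other compiler the
   attribute is dropped so the sketch still builds and can be tried on a PC. */
#if defined(__propeller__)
#define FCACHE_FN __attribute__((fcache))
#else
#define FCACHE_FN
#endif

/* Hypothetical example function: two independent loops, no calls to other
   functions, no branches out of the function. */
FCACHE_FN
uint32_t sum_then_xor(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    uint32_t x = 0;
    size_t i;

    /* First loop: byte sum. */
    for (i = 0; i < len; i++)
        sum += buf[i];

    /* Second loop: byte-wise XOR. */
    for (i = 0; i < len; i++)
        x ^= buf[i];

    /* Pack both results into one word. */
    return (sum << 8) | x;
}
```

Whether the function really ends up in FCACHE still depends on it fitting, so checking the generated assembly is worthwhile.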

Ariba · 2012-09-12 21:14

ersmith wrote: »

...
In the CMM preview you can also suggest to the compiler that it put a whole function into FCACHE, by putting __attribute__((fcache)) before the function declaration. Provided the function fits, and has no calls to other functions (actually NATIVE functions are allowed, but that's a special case), then the whole function will be placed into FCACHE. This is useful if you need to guarantee the timing of initialization code relative to some loops, or if the function has multiple independent loops and you want to keep them in the FCACHE together.

This sounds like exactly what I was looking for: a way to always execute a short function from the cache. This allows fast routines with tight timing without using a PASM cog. So I tried this test code:

/*
 * This is a simple serial test for CMM.
 */
#include <propeller.h>

int baudticks;
int txmask = 1<<30;

__attribute__((fcache))
void tx(int val)
{
  int bitm, tm;
  val = (val | 0x300)<<1;   // bit 0 = start bit (0), bits 1-8 = data, bits 9-10 = two stop bits (1)
  tm = _CNT + baudticks;
  for(bitm=1; bitm<0x800; bitm<<=1) {
    if(val & bitm) _OUTA |= txmask;
    else _OUTA &= ~txmask;
    tm = __builtin_propeller_waitcnt(tm, baudticks);
  }
}

void str(char* strp)
{
  while(*strp > 0) tx(*strp++);
}

int main(void)
{
    baudticks = _CLKFREQ / 115200;
    _OUTA |= txmask;
    _DIRA |= txmask;
    while(1) {
        waitcnt(CLKFREQ/4+CNT);
        str("Hello World\r");
    }
    return 0;
}

With this I can output the Hello World text at 115200 baud in all memory models (LMM, CMM, XMMC, SDXMMC) with no problems. The question is: is it safe to expect that the tx function is always executed from the cache, or can it run slower the first time, or when the cache is full of other cached routines?

Andy
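
To make the framing in tx easier to follow, here is a host-side sketch (plain C, not Propeller code) of the 11-bit frame it shifts out and of the bit time. The helper names make_frame, frame_level and ticks_per_bit are hypothetical, and the 80 MHz clock used in the note below is an assumption, not something stated in the post:

```c
#include <stdint.h>

/* Number of line levels per character: 1 start + 8 data + 2 stop bits. */
enum { FRAME_BITS = 11 };

/* Same expression as in tx(): set bits 8 and 9, then shift left by one so
   that bit 0 becomes the start bit (0) and bits 9-10 become stop bits (1). */
uint32_t make_frame(uint8_t val)
{
    return ((uint32_t)val | 0x300u) << 1;
}

/* Line level (0 or 1) of the n-th bit sent, counting from 0, LSB first. */
int frame_level(uint32_t frame, int n)
{
    return (int)((frame >> n) & 1u);
}

/* Counter ticks per bit, using the same integer division as the test code. */
uint32_t ticks_per_bit(uint32_t clkfreq, uint32_t baud)
{
    return clkfreq / baud;
}
```

For 'A' (0x41) the frame is 0x682, sent as 0 1 0 0 0 0 0 1 0 1 1. With an assumed 80 MHz clock, 115200 baud gives 694 ticks per bit, so one frame takes 7634 ticks.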

ersmith · 2012-09-13 05:07

Ariba wrote: »

The question is: is it safe to expect that the tx function is always executed from the cache, or can it run slower the first time, or when the cache is full of other cached routines?

A function declared __attribute__((fcache)) is always completely loaded into FCACHE before it is executed, so the relative timings of the instructions within the function will always stay the same. The time it takes to start the function will vary depending on whether or not it is already in the cache... if some other function was using the cache then the first instruction of tx will not start until the cache has been re-loaded.

I'll also note that there is an undocumented feature: a function declared with __attribute__((native)) is always placed in the kernel memory, and hence always runs at full speed. This feature is undocumented because the size of kernel memory is subject to change; in particular there is very little space left in the XMM and CMM kernels. It's probably OK to use native functions in pure LMM mode, but I wouldn't do it in other modes.

Eric
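
The distinction between start latency and in-function timing can be illustrated with a small host-side model (plain C, not Propeller code). Everything here is a simplifying assumption: sim_cnt stands in for the CNT register, sim_waitcnt models a wait that blocks until the counter reaches the target and returns target + delta, and start_delay stands in for the variable time spent loading the function into FCACHE:

```c
#include <stdint.h>

/* Simulated system counter; a stand-in for CNT in this host-side model. */
static uint32_t sim_cnt;

/* Model of the wait used in tx(): block until the counter reaches target,
   then return the next deadline, target + delta. */
static uint32_t sim_waitcnt(uint32_t target, uint32_t delta)
{
    sim_cnt = target;          /* time "passes" until the deadline */
    return target + delta;
}

/* Counter value at which the k-th wait (k = 0, 1, ...) returns.
   start_delay: ticks before the first instruction of the function runs.
   body_cost:   ticks of work per loop iteration; must stay below delta. */
uint32_t edge_time(uint32_t start_delay, uint32_t body_cost,
                   uint32_t delta, int k)
{
    uint32_t tm;
    int i;

    sim_cnt = start_delay;
    tm = sim_cnt + delta;      /* first deadline, as in tx() */
    for (i = 0; i <= k; i++) {
        sim_cnt += body_cost;  /* loop body executes */
        tm = sim_waitcnt(tm, delta);
    }
    return sim_cnt;
}
```

In this model a larger start_delay shifts every edge by the same amount, while the spacing between consecutive edges stays at delta regardless of body_cost, which matches the behaviour described above for a function that runs entirely from FCACHE.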

Christof Eb. · 2012-09-13 12:24

Hi Eric, is there a way to have the C source included as comments in the compiler's PASM output (.s file)? That would be very helpful, for example to see whether a routine fits into fcache.
Thanks a lot, Christof

ersmith · 2012-09-13 12:40

Christof Eb. wrote: »

Hi Eric, is there a way to have the C source included as comments in the compiler's PASM output (.s file)?

Try:

propeller-elf-gcc -g -c -Os -Wa,-ahl=foo.s foo.c

The combination of the "-g" flag to gcc and the "-Wa,-ahl=foo.s" option, which gcc passes through to the assembler, causes the assembler to write a listing to foo.s with the C source interleaved with the assembly.

Eric

Christof Eb. · 2012-09-15 10:30

Thanks, Eric, there is so much to know.... Christof
