Running binaries from Hub RAM

d2rk · 2012-03-29 10:46

There are two binaries in Hub RAM, at addresses binary1_addr and binary2_addr respectively. Each binary has a fixed 16-byte header and contains the following program:

int main() {
  while (1) {
    printf("TEST#%d on COG#%d\n", TEST_ID, cogid());
    msec_delay(DELAY); /* each test has its own delay value */
  }
  return 0;
}

compiled with the LMM model. The following code launches each binary in turn on a new cog:

add     binary1_addr, #4
shl     binary1_addr, #16
mov     dst, interpreter
or      dst, binary1_addr
coginit dst


mov     time, cnt               ' small delay to allow COG to settle
add     time, delay 
waitcnt time, #0


add     binary2_addr, #4
shl     binary2_addr, #16
mov     dst, interpreter
or      dst, binary2_addr
coginit dst

where interpreter is

interpreter  long  ($0000 << 16) | ($F004 << 2) | %1000
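For readers following the bit arithmetic, here is a minimal host-side C sketch of how the COGINIT operand above appears to be packed. It assumes the Propeller 1 layout as I understand it: bits 31..18 hold the PAR value (hub address divided by 4), bits 17..4 hold the code start address (also divided by 4), and bit 3 requests the next free cog. The function name coginit_operand is hypothetical and is not part of any Parallax library.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the operand for the Propeller 1 COGINIT instruction (assumed layout):
 *   bits 31..18  PAR value, i.e. hub address >> 2
 *   bits 17..4   code start address, i.e. hub address >> 2
 *   bit  3       1 = start the next free cog
 *   bits 2..0    cog id, ignored when bit 3 is set
 * coginit_operand is a hypothetical helper name used only for illustration. */
uint32_t coginit_operand(uint32_t par_addr, uint32_t code_addr)
{
    /* Both addresses must be long-aligned and fit in 16 bits. */
    assert((par_addr & 3u) == 0 && par_addr <= 0xFFFCu);
    assert((code_addr & 3u) == 0 && code_addr <= 0xFFFCu);

    return ((par_addr >> 2) << 18) | ((code_addr >> 2) << 4) | 0x8u;
}
```

For long-aligned addresses this gives the same value as the PASM sequence in the post: shifting the PAR address left by 16 and OR-ing it with ($F004 << 2) | %1000. For example, coginit_operand(0x1004, 0xF004) equals (0x1004 << 16) | (0xF004 << 2) | 8.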

As a result, each cog should print a message with its test ID and cog ID. The actual output:

TEST#1 on COG#0

TEST#2 on COG#1

TEST#1 on COG#0

TEST#2 on COG#1

TEST#1 on COG#0

TEST#2 on COG#1

TEST#1 on COG#0

TEST#2 on COG#1

TEST#1 on COG#0

TEST#1 on COG#1

TEST#1 on COG#0

TEST#1 on COG#1

TEST#1 on COG#0

TEST#1 on COG#1

TEST#1 on COG#0

TEST#1 on COG#1

As can be seen, at some point the first test takes over both cogs. Where could the problem be, and how can I fix it?

Thank you.

denominator · 2012-03-29 10:54

That looks like a dangerous way to run two different C programs. The problem I see here is that you're trying to kick off two different copies of the LMM interpreter. The issue here is that they are going to try to share the same stack and heap space, and otherwise get in each other's face.

If you want to run two different threads of C under the LMM kernel, you should have the same executable and use the _start_cog_thread function to start each thread with its own runtime and stack space.

If you really need separate executables, you're going to need to modify the LMM kernel to be able to locate the stack/heap in a constrained area that you set up for each program. This probably is not a trivial exercise.

d2rk · 2012-03-29 11:25

Yes, the goal is to run separate executables.

I have just started to review the LMM kernel. I thought it would use the right stack information from the header.

denominator · 2012-03-29 11:49

d2rk wrote: »

I thought it would use the right stack information from the header.

It does, but that's assuming the interpreter has been kicked off by the original interpreter (via the aforementioned _start_cog_thread). You're trying to use the LMM kernel in a way it wasn't designed to be used, so I think you're blazing your own trail here...

Dave Hein · 2012-03-29 14:05

There are a few problems here. First of all, LMM programs contain a small Spin program that starts up the LMM kernel. You would have to add the memory offset to the Spin header before you do the coginit. You also need to ensure that there is a small amount of Spin stack space, a few longs in size. The second problem is that the startup Spin program sets the C stack pointer to $8000, so both C programs are going to be using the same stack space. You will also need to ensure that the program in the lower address range doesn't clobber the program in the higher address range with its heap manager. The third problem is that C LMM programs are linked starting at address 0. They cannot easily be relocated like a Spin program can.

Were the 2 programs linked to start at binary1_addr and binary2_addr? If so, how did you do that?

I have been able to run LMM C programs at address 0 while running Spin programs at higher addresses. I had to patch the startup code in the C code so that I could specify a stack address other than $8000.

denominator · 2012-03-29 21:58

Dave Hein wrote: »

Were the 2 programs linked to start at binary1_addr and binary2_addr? If so, how did you do that?

It appears that program 1 and program 2 were identical in his test - although d2rk may want to run two different executables, it looks like he started by trying to run two instances of the same executable.

d2rk · 2012-03-29 22:07

Hi Dave. The first and third problems are already solved. Each Spin header is fixed up with the right offsets, and the C LMM programs are linked starting at the correct address. I have my own tool that generates the linker script, fixes the headers, compiles the sources and downloads them to the chip. It can also work with multiple binaries.

Now I need to solve the second problem. I will try to fix the LMM interpreter first. If that fails, I have one more idea.

d2rk · 2012-03-29 22:15

denominator wrote: »

It appears that program 1 and program 2 were identical in his test - although d2rk may want to run two different executables, it looks like he started by trying to run two instances of the same executable.

No, I have two different executables and run two different instances. All the addresses have the right offsets. See the thread "Just another image base address".

d2rk · 2012-03-30 03:31

The problem (the second problem raised by Dave) has been solved temporarily and everything works perfectly. But I'm still looking for a good solution.

For now I have modified the crt0_lmm.s file, which contains the LMM kernel, so that the stack pointer is based on the __hub_end value, which is unique for each binary.
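The post does not say how much stack is reserved or exactly how the pointer is derived from __hub_end, so the following is only a host-side C sketch of the idea under stated assumptions: each image owns the range from its load address up to __hub_end plus a fixed stack reservation, the stack grows down from the top of that range, and two images can coexist only if those ranges do not intersect. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one image's hub RAM footprint. */
struct image_layout {
    uint32_t base;       /* load address of the image                  */
    uint32_t hub_end;    /* value of __hub_end for this image          */
    uint32_t stack_size; /* bytes reserved above __hub_end (assumed)   */
};

/* Assumed initial stack pointer: the stack grows down from
 * hub_end + stack_size, so that is where the pointer starts. */
uint32_t stack_top(const struct image_layout *img)
{
    assert(img->hub_end >= img->base);
    return img->hub_end + img->stack_size;
}

/* True when the ranges [base, stack_top) of the two images do not intersect. */
bool layouts_disjoint(const struct image_layout *a, const struct image_layout *b)
{
    return stack_top(a) <= b->base || stack_top(b) <= a->base;
}
```

A loader tool could run a check like layouts_disjoint over every pair of images before downloading them. It says nothing about the heap, which Dave mentions as a separate way for one program to clobber another.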

Dave Hein · 2012-03-30 05:42

The C startup code uses the value of the stack pointer address (which is passed in PAR) to determine whether it is starting up the first instance or a thread. A PAR value of $8000 indicates the first instance. When it runs the first instance it allocates a lock and stores the lock number, which is used by the atomic read-modify LMM operator. In Spinix, I patch cog location 2 with an instruction to test for a non-zero cog number instead of testing for $8000. This allows me to pass a stack address that is not equal to $8000. Of course, this requires that lock number zero be allocated before starting up the C code. This can be avoided by rewriting the startup code so that it ORs a value of 256 into the lock number, but I haven't bothered doing that yet.

denominator · 2012-04-09 21:25

Dave Hein wrote: »

I have been able to run LMM C programs at address 0 while running Spin programs at higher addresses.

Dave,

How did you relocate the Spin programs? Did you do this manually, or do you have a programmatic way to do that? I saw this thread: http://forums.parallax.com/showthread.php?138534-Spin-modules-as-separate-relocatable-modules-(in-OS) and I'm guessing that you have solved that problem for Spinix (http://forums.parallax.com/showthread.php?123795-spinix). Do you have a pointer to the code and/or methodology you use to allow an arbitrary Spin program to be loaded at a different address?

I'm asking because I'd like to use arbitrary Spin objects from within a C program. I'd like to kick off the Spin interpreter running Spin in a separate cog, the Spin program would be located at a non-standard location in RAM (i.e. it wouldn't be at 0x20); that's the trick I don't currently know how to do. I'd use the more-or-less "standard" mailbox approach to send method requests through a mailbox and wait for the response; I think I know what to do in that regard.

Thanks!

- Ted

Dave Hein · 2012-04-10 04:54

The source code for the Spinix run routine is shown below. I edited out most of the Spinix-specific stuff to make it easier to read. The program offset is added to the last 5 words of the program's header. The first 3 words aren't used by the program since it always references the absolute addresses of $0000 and $0004 for CLKFREQ and CLKMODE.

PUB run(fname) | infile, size, AppAddr, i, cognum, vbase, dbase

  ' Open the program file  
  infile := fopen(fname, string("r"))
  ifnot infile
    return -3

  ' Determine the program size including the VAR section
  AppAddr := mem.malloc(16)
  fread(AppAddr, 1, 16, infile)
  size := word[AppAddr][5] ' size is equal to the value of dbase
  mem.free(AppAddr)
  fseek(infile, 0, SEEK_SET)

  ' Allocate the memory plus some for the stack
  AppAddr := mem.malloc(size + 800)
  ifnot AppAddr
    fclose(infile)
    return -2
  fread(AppAddr, 1, size, infile)

  ' Add offset to the data pointers
  repeat i from 3 to 7
    word[AppAddr][i] += AppAddr
    
  fclose(infile)

  'pbase := word[AppAddr][3]
  vbase := word[AppAddr][4]
  dbase := word[AppAddr][5]
  'pcurr := word[AppAddr][6]
  'dcurr := word[AppAddr][7]

  ' Set up the program's stack frame
  word[dbase][-4] := (@@0) | 2          ' pbase + abort trap and return value
  word[dbase][-3] := @firstvar          ' vbase set to point to firstvar
  word[dbase][-2] := dbase + 64         ' dbase set to a safe place far away
  word[dbase][-1] := word[@@0][2] + @@0 ' Starting address of exit routine -- must be first PUB
  long[dbase]~                          ' Zero result

  ' Zero the VAR area
  longfill(vbase, 0, (dbase - vbase - 8) >> 2)

  ' Start the program in a new cog
  cognum := cognew($f004, AppAddr + 4)
  if cognum == -1
    mem.free(AppAddr)
    return -4

  return cognum
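For anyone doing the same fix-up in a host-side tool rather than on the chip, here is a minimal C sketch of just the header relocation step from the routine above. It assumes the 16-byte header is stored as eight little-endian 16-bit words and that words 3..7 are pbase, vbase, dbase, pcurr and dcurr, as the commented-out lines in the Spin code suggest. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SPIN_HEADER_BYTES = 16 };

/* Read the little-endian 16-bit word at word index i of the image. */
uint16_t get_word(const uint8_t *image, size_t i)
{
    return (uint16_t)(image[2 * i] | (image[2 * i + 1] << 8));
}

/* Write a little-endian 16-bit word at word index i of the image. */
void put_word(uint8_t *image, size_t i, uint16_t value)
{
    image[2 * i]     = (uint8_t)(value & 0xFFu);
    image[2 * i + 1] = (uint8_t)(value >> 8);
}

/* Add load_addr to header words 3..7 (pbase, vbase, dbase, pcurr, dcurr).
 * Words 0..2 are left untouched, mirroring the Spin loop above. */
void relocate_spin_header(uint8_t *image, size_t image_len, uint16_t load_addr)
{
    assert(image_len >= SPIN_HEADER_BYTES);
    for (size_t i = 3; i <= 7; i++) {
        put_word(image, i, (uint16_t)(get_word(image, i) + load_addr));
    }
}
```

This covers only the pointer fix-up. Setting up the initial stack frame and zeroing the VAR area still have to happen before the cog is started.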

denominator · 2012-04-10 06:54

Wow, that looks simpler than I thought it would be. Thank you so much!!!

Dave Hein · 2012-04-10 08:54

Yes, Parallax (Chip) did a great job in developing Spin. The use of PBASE allows objects to be relocated anywhere in memory, as long as they don't reference hardcoded absolute addresses. I think the only thing that looks complicated in my run code is how I set up the stack frame. The stack frame consists of the 4 words located just before the RESULT stack variable. The DBASE pointer points to the RESULT variable, and all other stack variables are referenced from DBASE.

The Spin runner code normally puts the values of $FFFFFFF9 in the two longs (4 words) that make up the initial stack frame. This basically traps any aborts and discards the return value, and causes a return jump to location $FFF9 in the ROM. The code at $FFF9 does a cogstop(cogid), which stops the cog. I wanted to jump to an exit routine instead, so I set the initial stack frame up differently. word[dbase][-4] specifies the calling program's PBASE. The 2 least-significant bits are used to trap aborts and indicate whether a return value should be pushed to the stack. PBASE is just given by @@0.

word[dbase][-3] is the calling program's VBASE. I get this from the address of the first long in the VAR section, which in my case is "firstvar". word[dbase][-2] is the value of DCURR that should be used after a return. I set this to 16 longs after DBASE. In theory, this would be set to DBASE plus four times the number of local variables that a method uses, but the exact value wasn't important in my case. word[dbase][-1] is the return address. In my case, I want it to return to my exit routine, which I placed as the first method in the object. This way I could obtain its address from the method table as word[pbase][2] + pbase.
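The four-word frame described above can be written down as a small C structure. This is only a sketch of the layout as described in this post, not code taken from Spinix: the struct, its field names and make_exit_frame are hypothetical, and the sketch assumes PBASE is long-aligned so that its two low bits are free for the flags.

```c
#include <assert.h>
#include <stdint.h>

/* The four words that precede RESULT, in memory order word[dbase][-4..-1].
 * Field names are hypothetical and follow the description in the post. */
struct spin_stack_frame {
    uint16_t pbase_flags; /* word[-4]: caller's PBASE, low 2 bits are flags */
    uint16_t vbase;       /* word[-3]: caller's VBASE                       */
    uint16_t stack_word;  /* word[-2]: dbase + 64 in the post               */
    uint16_t return_pc;   /* word[-1]: address execution returns to         */
};

/* Build a frame that returns to the first method of the calling object.
 * first_method_offset stands for word[pbase][2] from the method table. */
struct spin_stack_frame make_exit_frame(uint16_t caller_pbase,
                                        uint16_t caller_vbase,
                                        uint16_t dbase,
                                        uint16_t first_method_offset)
{
    struct spin_stack_frame f;

    assert((caller_pbase & 3u) == 0); /* low bits must be free for flags */

    f.pbase_flags = (uint16_t)(caller_pbase | 2u);      /* same "| 2" as the Spin code */
    f.vbase       = caller_vbase;
    f.stack_word  = (uint16_t)(dbase + 64u);            /* 16 longs past DBASE */
    f.return_pc   = (uint16_t)(first_method_offset + caller_pbase);
    return f;
}
```

The four fields would then be stored, in this order, into the words just below the new program's DBASE, followed by a zeroed RESULT long.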

d2rk · 2012-04-22 05:57

The discussion continues in the "Making PropGCC programs more OS friendly" thread.
