SpinLMM

Version 0.9


Introduction

SpinLMM is an enhancement to the Spin interpreter that provides the capability to execute PASM instructions using the large memory model (LMM). LMM was created by Bill Henning to provide a way to run PASM programs from the hub memory of the Propeller. The large memory model uses an interpreter that runs in a cog to fetch an instruction from the hub RAM and execute it in place. This is feasible because the carry and zero flags are not normally modified unless they are explicitly defined in the PASM instruction.

SpinLMM can be used to implement PASM routines that normally run in a seperate cog from the Spin program. There are many existing applications where PASM routines use up an entire cog, but they are called infrequently and/or seqeuentially, and the parallel execution feature of a seperate cog is not needed. Some examples of this may be I2C access, floating-point operations or half-duplex serial operation.

Programming for SpinLMM requires a basic understanding of PASM. A beginner programmer should first become familiar with programming in PASM using multiple cogs before working with SpinLMM. However, in some respects SpinLMM is simpler than multi-cog PASM because of the direct interface to the Spin code through the stack. SpinLMM uses the @@@ and Bytecode extensions to the Spin language that are available in the BST and homespun compilers. The Parallax Spin Tool does not support these extensions, so either BST or homespun must be used with SpinLMM.

The Basics

An example of a simple SpinLMM program that returns a value of 5 is given below.

The call to the lmm.start method installs the LMM PASM interpreter into the Spin interpreter. The lmm.run0 method invokes the LMM PASM interpreter starting at GetFive. The jmp #FRETX instruction pushes the value of the reg0 register onto the stack and returns control to the Spin interpreter.

Parameters are passed to LMM routines by pushing them onto the Spin stack. The LMM PASM routine must pop them off of the stack to access them. The following program shows a routine that adds two numbers and returns the sum to the Spin program.

The stack address is contained in the pcurr register that is used by the Spin interpreter. The lmm.run2 method pushes the two parameters onto the stack, and the LMM routine pops them off of the stack by decrementing the pcurr register and reading the values. The registers reg0, reg1 and pcurr are all pre-defined in the SpinLMM interpreter. A complete list of the available registers is given later in this document.

Branching

LMM PASM programs cannot branch the same way a normal PASM program branches because it is executed by an interpreter and not directly by the processor. Instruction addresses in an LMM PASM program refer to hub memory, and not cog memory. The LMM interpreter uses a small loop to execute each PASM instruction one at a time. The loop looks like this.

An instruction is read from the address in the lmm_pc register, and then executed after incrementing the lmm_pc register by four. The interpreter then jumps back to the beginning of the loop to fetch the next instruction. The instructions in this loop total to 5 instruction cycles, or 20 system cycles. However, three additional instruction cycles are required to align with the memory access window of the cog. This results in a total of eight instruction cycles, or 32 system cycles to execute each LMM PASM instruction. This is an 8:1 speed reduction compared to straight PASM.

This loop could be unrolled several times to approach an average of 16 cycles per instruction, with only a 4:1 reduction in speed of straight PASM. However, the SpinLMM interpreter only uses the four-instruction loop. There is actually no speed penalty for hub-access instructions when using the four-instruction loop versus an unrolled loop. This is because hub-access instructions, such as rdlong and wrlong can execute within the extra cycles needed for hub access window alignment.

Branching in an LMM PASM program is performed by manipulating the program counter, or lmm_pc register in the interpreter. The interpreter adds 4 to the lmm_pc immediately after the instruction is read. At the time when an LMM PASM instruction is executed the lmm_pc register is pointing to the next long location. An example of a routine that performs branching is shown below. This routine computes the sum of the integer numbers from 1 to N using a small loop.

This program performs a jump back to the beginning of the loop by loading the address of LoopIt into the lmm_pc register. Notice that the addess is stored immediately after the rdlong lmm_pc, lmm_pc instruction by using long @@@LoopIt. The @@@ operator generates the absolute address of LoopIt. This operator is only supported by the BST and homespun compilers. The Parallax Spin Tool will not be able to correctly compile this code.

There are various branching instructions that cannot be efficiently implemented directly in LMM PASM such as the djnz and call instructions. Special psuedo-ops have been created to facilitate these branches. LMM psuedo-ops are just small routines within the interpreter that the LMM PASM program can jump to. FRETX is a psuedo-op that returns control to the Spin interpreter. Other psuedo-ops are defined in the next section.


Pseudo-ops

FJMP

The FJMP pseudo-op causes the execution to jump to the location specified in the next long after the FJMP instruction. The FJMP pseudo-op sets the LMM program counter to this value by executing a rdlong lmm_pc, lmm_pc instruction. It can be used with the jmp instruction, and it is especially useful in conjuction with the djnz instruction as shown below.

FCALL

FCALL is used to implement a subroutine call. The FCALL pseudo-op saves the program counter in the dedicated lmm_ret register, and then jumps to the FJMP pseudo-op. If the called routine also uses the FCALL pseudo-op it must save the lmm_ret register so that it can be used later when returning. A return is performed by copying the lmm_ret register to the program counter, lmm_pc. An example of using the FCALL pseudo-op and performing a return is shown below.

ICALL

The ICALL pseudo-op is used to perform an indirect, or indexed jump. It is normally used in conjunction with a calling table. It is similar to FCALL except that the value of the reg7 register is added to the contents of the next long to determine the jump address. It is implemented with a rdlong lmm_pc, lmm_pc instruction followed by an add lmm_pc, reg7, which is then followed by another rdlong lmm_pc, lmm_pc. An example of using ICALL is shown below.

This code will cause sub3 to be called. Note that the index address is the sum of the long after the jmp #ICALL instruction and the contents of register reg7. Therefore, it would also work to store the address of the call table in reg7, and store the table offset after the jmp #ICALL instruction. This is useful for driver code that implements a set of basic functions, but each instance of the driver uses different code for the basic functions.

The ICALL peusdo-op can be used to perform a simple indirect call if either the immediate value or reg7 is always zero. In this case, the call table is just a single long that contains the indirect address.

IJMP

IJMP is similar to ICALL except that a return address is not stored in lmm_ret. The IJMP pseudo-op performs indexed jumps in the same way that ICALL does.

FRET

This pseudo-op returns control back to the Spin interpreter. It does not push a return value onto the stack.

FRETX

This pseudo-op pushes the contents of reg0 onto the stack and returns control back to the Spin interpreter.

FCACHE

The FCACHE pseudo-op is used to load a section of code into the cog's memory and then jump to it. The cache area consists of 16 consecutive long locations, and it uses the same space as reg8 through reg23. reg8 is aliased with the label cache_addr, which can be used to set the cog address with the PASM org instruction. The instructions to be cached must be terminated by a zero long value, and it must contain a jump back to the LMM or Spin interpreters.

Note, the FCACHE is the only pseudo-op that changes the status flags. The FCACHE sets the zero flag before jumping to the cache. The carry flag is unaffected. The cache area is 16 longs in size, and the cached loop must fit within this space. This includes the terminating zero at the end of the loop. An example using the FCACHE pseudo-op with the SumIt routine is shown below.

LMM_LOOP

This is the starting address of the LMM interpreter loop. An FCACHE program can continue normal execution by jumping to this location.


Registers

There are several registers defined in the Spin and LMM interpreters. Most of the registers are used for general operations. Some of the registers have special functions, such as reg0, reg7, pcurr, lmm_pc and lmm_ret. The registers are defined below.