LMM Pacito
Title:LMM Pacito
Author:thehappyhippy
Published:Wed, 19 Dec 2007 17:35:46 GMT

Specification for a Large Memory Model with access to up to 512 klongs of code and 512 kbytes of data.


LMM Kernel Specification v1.0 - pacito version

The purpose of this specification is to serve as a basis for a large memory model (LMM) for the Parallax Propeller. This would extend the usable memory area beyond the 2 kbytes of a COG's memory. This specification requires the following hardware support:

- A Parallax Propeller v1.0
- An external RAM (DRAM or SRAM) for more than 32 kbytes of code

Note: Instruction names refer to the COG native instruction set. When a reference to a new instruction is made, its functionality is implied. The term "program" refers to the un-preprocessed source, which mixes native and non-native instructions. The term "compiled program" refers to that program after preprocessing and compilation; it contains only COG machine code.

Scope

The COG executes machine code very quickly, so it makes sense to use this built-in language instead of creating a second-level language. If a few instructions are worked around, the limitations on memory access can be bypassed. The native COG has the following characteristics:

- Memory addressing is limited to 512 memory locations
- Program execution is not available beyond 511 memory locations
- No support for high-level languages like C
- Up to 512 registers

The last point is a strength of the way the cog works, but because the program is going to be relocated, it has to be reduced to a mere 16 registers, 8 of them with special meanings.

R0..R7: general purpose registers
PC : program counter
SP : Stack pointer
DP : Data pointer (points to the beginning of the Data section)
IM : Immediate register, used by the kernel
BP : Base pointer, used for stack access
DL : Data length, length of the Data area
U0, U1 : not yet defined

To address the first two points a new set of instructions is proposed:

ldb, ldw and ldd : These instructions can read any memory location, at byte, word and long lengths. Their counterparts stb, stw and std write to any memory location.

Jump, subroutine call and return are also supported, via rjmp, rcall and rret. These instructions can reach any position inside program memory. The stack is handled automatically.

Stack manipulation is also provided: push, pop, and in-place stack reads and writes relative to a base pointer via the special instructions ldbpb, ldbpw, ldbpd, stbpb, stbpw and stbpd. As part of the stack manipulation, two more instructions, enter and leave, reserve and release stack space. Stack overflow is checked on every enter instruction.
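The enter/leave pair and the base-pointer accesses can be pictured with a small Python model. This is only a sketch under assumptions that the specification does not fix: the stack grows upward, it is counted in longs, enter saves the old BP before reserving space, and only the long-sized variants (ldbpd, stbpd) are modelled. The class and exception names are hypothetical.

```python
class StackOverflow(Exception):
    """Raised when a push or enter would exceed the stack area."""


class Stack:
    def __init__(self, size_longs):
        self.mem = [0] * size_longs  # stack area, one entry per long
        self.sp = 0                  # next free long (assumed: grows upward)
        self.bp = 0                  # base pointer of the current frame

    def push(self, value):
        if self.sp >= len(self.mem):
            raise StackOverflow("push beyond stack area")
        self.mem[self.sp] = value
        self.sp += 1

    def pop(self):
        self.sp -= 1
        return self.mem[self.sp]

    def enter(self, n_longs):
        # Overflow is checked here, before anything is modified.
        if self.sp + 1 + n_longs > len(self.mem):
            raise StackOverflow("enter beyond stack area")
        self.push(self.bp)    # save the caller's base pointer
        self.bp = self.sp     # new frame starts at the first reserved long
        self.sp += n_longs    # reserve the frame

    def leave(self):
        self.sp = self.bp     # drop the frame
        self.bp = self.pop()  # restore the caller's base pointer

    def ldbpd(self, index):
        return self.mem[self.bp + index]

    def stbpd(self, index, value):
        self.mem[self.bp + index] = value
```

With this model, `enter(3)` followed by `stbpd(2, 99)` writes the third long of the new frame, and a matching `leave()` restores both SP and BP to their values before the enter.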

A new instruction to load long immediates from program memory is provided, too.

Some of these instructions will be subroutine calls to kernel functions, and some will require arguments. Extra arguments will be either a long constant after the instruction (jumps and loads/stores from memory) or, for small arguments, a mov instruction to a special kernel register before the call. This will create multi-long pseudo-instructions. As the program will be divided into 32-long chunks, special care is taken not to start a double-long instruction on the last long of a chunk. A nop will be added in this case.
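The padding rule can be sketched as a small layout pass. The following Python model is an illustration only, not the actual preprocessor: the function name `pad_chunks` and the `(name, size_in_longs)` input format are assumptions made for the example.

```python
CHUNK_LONGS = 32  # chunk size in longs, as given in the specification


def pad_chunks(instructions, chunk_longs=CHUNK_LONGS):
    """Assign long addresses to pseudo-instructions, inserting nops.

    `instructions` is a list of (name, size_in_longs) pairs. Whenever a
    multi-long pseudo-instruction would cross a chunk boundary, nops are
    emitted so that it starts at the beginning of the next chunk.
    Returns a list of (address, name) pairs.
    """
    layout = []
    addr = 0  # address in longs from the start of the program
    for name, size in instructions:
        room = chunk_longs - (addr % chunk_longs)  # longs left in this chunk
        if size > room:
            for _ in range(room):
                layout.append((addr, "nop"))
                addr += 1
        layout.append((addr, name))
        addr += size
    return layout
```

For example, 31 single-long instructions followed by a two-long rcall leave only one long in the first chunk, so a nop is placed at address 31 and the rcall starts at address 32.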


Memory management

As memory inside the cog is limited to 496 instructions, the only way to extend it is to use caching techniques. The cog's memory will act as a Level 1 cache, and the HUB memory can act as a secondary cache if desired when external memory is used. When only HUB memory is used, this feature can be left out.

The cache will occupy 128 longs and will have 4 chunks (lines) of 32 longs each, leaving the rest to the kernel. Caching occurs transparently to the application, and the extra instructions provide the special support needed to use the increased address space. The lines will be filled on demand and reused as needed in a round-robin fashion, without overwriting the last used line. An aging mechanism could be implemented if space allows.

Each cache line contains two extra fields: a memory pointer holding its absolute address (aligned to a 32-long boundary) and a jump instruction at the end that returns to the kernel. The first allows fast look-up on an rjmp / rcall instruction; the second returns control to the kernel, which loads another cache line or continues execution if the line is already present. No multi-long instruction may be split across two cache lines; the preprocessor will ensure this.
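The look-up and replacement policy described above can be modelled in a few lines of Python. This is a sketch under assumptions: `fetch_line` is a hypothetical callback standing in for whatever reads 32 longs from HUB or external memory, and the way the last used line is skipped (advance to the next slot) is one possible reading of the round-robin rule, not something the specification defines.

```python
LINE_LONGS = 32  # longs per cache line
NUM_LINES = 4    # cache lines held in cog memory


class LineCache:
    def __init__(self, fetch_line):
        self.fetch_line = fetch_line    # hypothetical: base address -> 32 longs
        self.tags = [None] * NUM_LINES  # absolute base address of each line
        self.lines = [None] * NUM_LINES
        self.next_victim = 0            # round-robin replacement pointer
        self.last_used = None           # slot of the most recently used line
        self.misses = 0

    def lookup(self, addr):
        """Return (slot, offset) for a long address, filling a line on a miss."""
        base = addr & ~(LINE_LONGS - 1)   # aligned to a 32-long boundary
        offset = addr & (LINE_LONGS - 1)  # position inside the line
        if base in self.tags:
            slot = self.tags.index(base)
        else:
            self.misses += 1
            slot = self.next_victim
            if slot == self.last_used:    # never overwrite the last used line
                slot = (slot + 1) % NUM_LINES
            self.tags[slot] = base
            self.lines[slot] = self.fetch_line(base)
            self.next_victim = (slot + 1) % NUM_LINES
        self.last_used = slot
        return slot, offset
```

Because the tag stored with each line is its aligned absolute address, a jump target only needs to be masked and compared against four tags to decide between a hit and a line fill.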

Data memory and stack are accessed outside this caching scheme. Self-modifying code is not possible at this time.

Control transfer instructions

The program shall use only rcall, rret and rjmp instructions for control transfer. These instructions support transfer of control across the whole memory space, beyond the cog's memory limitation, using the techniques described in the caching section.

Each new instruction will be replaced by the preprocessor with a sequence of native cog machine code. For example,

rcall #a_function_beyond_2k

is going to be replaced by

call #krnl_rcall
long a_function_beyond_2k

The COG ignores the long because its condition bits are zero. That limits the address to 19 bits, which would mean 2^19 longs of code; the same restriction exists for memory load/store, so data is limited to 2^19 bytes.
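The call-plus-inline-long mechanism can be illustrated with a Python model. It is a sketch, not the kernel code: it assumes the standard Propeller 1 instruction layout, where the condition field sits in bits 21..18, and it works with program addresses in longs rather than cog cache addresses. How the specification packs its address bits around the condition field is not stated here, so the example simply uses a small address that leaves the field at zero. All function names are hypothetical stand-ins for krnl_rcall and krnl_rret.

```python
COND_SHIFT = 18               # assumed: condition field is bits 21..18
COND_MASK = 0xF << COND_SHIFT


def is_never_executed(long_value):
    """True if the long has an all-zero condition field, so the cog skips it."""
    return (long_value & COND_MASK) == 0


def krnl_rcall(memory, return_addr, stack):
    """Model of the kernel call routine.

    `return_addr` is the address of the long following the native call,
    which is where the preprocessor placed the target address.
    """
    target = memory[return_addr]
    stack.append(return_addr + 1)  # resume after the inline long
    return target                  # new program counter


def krnl_rret(stack):
    """Model of the kernel return routine: pop the saved program counter."""
    return stack.pop()
```

If the long at address 11 holds 0x1234, a call placed at address 10 transfers control to 0x1234 and leaves 12 on the stack, so the matching rret resumes just past the inline argument.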

rret is going to be replaced just with a call to krnl_rret. No extra arguments are needed.

The case of jumps is a bit more complicated. If a native jmp were used, the line would have to be relocated appropriately; to avoid that, all jumps have to be rjmp instructions. With the scheme above, conditional jumps work without problems: because the long argument has the condition IF_NEVER (all zeros), the call/jump itself can carry a condition, and when the condition fails the COG simply steps over the long as a no-op. That somewhat reduces the overhead.

A special rjmp could be used if the jump would occur inside the cache line, but memory constraints, i.e. no place to implement it, can limit its availability.

Memory load/store

The special instructions ldb, ldw, ldd, stb, stw and std are used to access data in the data section. Together with DP, they address up to 512 kbytes of data. Reads and writes to memory occur without caching.
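A minimal Python model of DP-relative access is shown below. Several details are assumptions rather than part of the specification: little-endian byte order (as in HUB memory), a bounds check against DL, and a `bytearray` standing in for HUB or external RAM. Alignment is not modelled, and the class and method names are hypothetical.

```python
class DataSection:
    def __init__(self, memory, dp, dl):
        self.memory = memory  # bytearray standing in for HUB or external RAM
        self.dp = dp          # DP: start of the data section
        self.dl = dl          # DL: length of the data section in bytes

    def _span(self, offset, size):
        # Assumed policy: reject accesses outside the data section.
        if offset < 0 or offset + size > self.dl:
            raise IndexError("access outside the data section")
        start = self.dp + offset
        return start, start + size

    def load(self, offset, size):
        """size 1, 2 or 4 models ldb, ldw or ldd."""
        start, end = self._span(offset, size)
        return int.from_bytes(self.memory[start:end], "little")

    def store(self, offset, size, value):
        """size 1, 2 or 4 models stb, stw or std."""
        start, end = self._span(offset, size)
        mask = (1 << (8 * size)) - 1  # truncate to the access width
        self.memory[start:end] = (value & mask).to_bytes(size, "little")
```

Storing the long 0x11223344 at offset 4 and then reading one, two and four bytes from the same offset returns 0x44, 0x3344 and 0x11223344 under the little-endian assumption.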

Stack load/store

The stack has its own group of instructions for stack manipulation. The stack also lies in a 512 kbyte area.

Program termination

The program terminates when the instruction term is executed. This instruction is replaced by a call to the kernel routine krnl_term.

Syscall and interaction with the hardware

The syscall mechanism is going to be handed over to another COG; for that purpose, an area of HUB memory will be used. This point still needs work.

Additional hardware

To use more than 32 kbytes of code/data, an external memory is needed. Two possible connection methods have been envisioned (though more exist): an SRAM or DRAM connected directly to the Propeller executing the LMM kernel, or an SRAM/DRAM connected to a helper circuit, such as a CPLD or a second Propeller. This second approach is being tested as of now. The helper circuit is responsible for obtaining the data to fill the cache lines at a rate of one long every 5 or 6 instructions, generating the lower addresses, and interfacing with the memory device.