Memory Manager (malloc & free) for Propeller! (DOWNLOAD LINK) UPDATED: 2008-03-

Jasper_M · 2008-03-09 00:47

This object provides you with malloc(bytes) and free(ptr) functions for the SPIN language.

You can (read: must) specify your program's stack usage, so that the heap won't collide with stack. You can also block size (minimum allocation unit) in the configuration. It takes 215 longs of Hub memory (+ of course the heap). Hub memory consumption is not dependent on heap size or block size because memory allocation table is stored in Cog RAM. Speaking of which, this requires one dedicated cog...

A test program is included. It tests that the memory manager works as designed.

New in version 011
- AFAIK the driver works correctly (multiple bugs fixed, real test program)
- Real test program
- Aligned mallocs (can be aligned to 2,4,8,16 times the block size)
- Heap expands from end of memory towards SPIN stack
- malloc returns NULL (0) on failure
- The test program uses Parallax TV Text driver, so it should run on all boards
- Included the MIT license
- Added locks to allow multi-cog usage

Download the object from OBEX:
obex.parallax.com/objects/275/

As the OBEX object is not approved yet, I also attached the driver in this post.

Code example:


OBJ
  pmm : "JAM_PMM_010"
PUB main | x, i
  pmm.start(512) 'Specify our program's stack usage.
  x := pmm.malloc(12) 'Allocate 12 bytes
  if(x) 'If no out-of-memory condition, then...
    'Fill the allocated memory with numbers from 0 to 11
    repeat i from 0 to 11
      BYTE[noparse][[/noparse]x][noparse][[/noparse] i ] := i 
    pmm.free(x) 'Free the allocated memory to be recycled :)

Jasper Mattsson

Post Edited (Jasper_M) : 3/13/2008 8:57:59 PM GMT

deSilva · 2008-03-09 05:00

Good documentation in source; but no copyleft disclaimer at the end

Can you please move the four VARs to DAT? This object is a singleton if ever I have seen one...

Can you think of some way to consider an "alignment", especially 64 bytes.... I see that it can be done a general block size, but I would use it for graphics memory allocation as well. See the point?
It becomes difficult to just put this manually to the end of RAM now - which had never been a good practice BTW.
Also, you could much simpler adjust its size!

Jasper_M · 2008-03-09 12:22

deSilva said...
Good documentation in source; but no copyleft disclaimer at the end

Can you please move the four VARs to DAT? This object is a singleton if ever I have seen one...

Can you think of some way to consider an "alignment", especially 64 bytes.... I see that it can be done a general block size, but I would use it for graphics memory allocation as well. See the point?
It becomes difficult to just put this manually to the end of RAM now - which had never been a good practice BTW.
Also, you could much simpler adjust its size!

Thanks for the feedback : )

... I didn't remember that VAR region is duplicated for each object... (long time, no SPIN [noparse]:)[/noparse])

And come to think of it, I don't know why I didn't allocate the HEAP in the DAT section too... seems incredibly stupid afterwards...

I think I can pretty easily make an aligned version of malloc, with 0 performance hit on unaligned mallocs.

And I'll include the MIT license with the next version.

So the next version will have all the changes you mentioned.

deSilva · 2008-03-09 12:36

@Jsper: Don't rush!

Putting the Heap in the DAT section has two great drawbacks:
- One would have to modify it all the time to adjust it to each program. It would be much better it will dynamically use the space upto the current main-stack pointer (inlcuding some spare). Though I am not sure where to get this main-stack pointe address from, when the object is called from a nother COG with a local stack....

- The complete DAT will be transfered to the Propeller. This will multiply the load time!!!!

Allowing alligned allocation will mean you have to keep book for the padding as well...

Jasper_M · 2008-03-09 12:49

DAT block for the heap could be declared like


heap                    LONG    0[noparse][[/noparse]HEAP_SIZE_LONGS]

So it would still be sufficient to alter only one constant.

Also, the stack pointer when calling PMM is probably not the maximum stack pointer:


pmm.malloc(12) 
some_recursive_or_deep_function 'Stack pointer in this function might be even 100 bytes greater than in malloc

The only problem here would be the load time...

And about the aligned allocation, I can make it search for free memory chunks that only start from addresses that are aligned. Aligned malloc will probably be about 30 or 40% slower than unaligned mallocs.

deSilva · 2008-03-09 13:01

I see no advantage of utilizingg the DAT over growing the heap from the RAM/ROM border...
Each simple heap system has the STACK/HEAP clash problem...
Maybe give the "max heap size" as a safe line, and also check the local (!) top of stack aith each malloc?
This will not help for the local stackframe from a derived COG, but will do no harm either...

Jasper_M · 2008-03-09 13:25

It would at least allow the user to see how much memory is really consumed in the F8 screen of Propeller Tool.
And it would set absolute compiler-enforced limits automatically, which is good in some (or most) microcontroller applications where reliability is important.

Usually memory is allocated first, processing done then. This makes the case where stack overwrites heap more probable than heap overwriting stack, so in the most common case all the protections in malloc would be pretty useless.

On the other hand, your approach would significantly reduce required configuration and allow deep functions run before allocating memory.

It's getting pretty hard to decide...

hippy · 2008-03-09 16:13

Jasper_M said...
It's getting pretty hard to decide...

Perhaps the solution is to have the PMM determine its operation depending on how it is first initialised ?

It could either be passed the address and size of a heap which exists in VAR or DAT space as defined by the end-programmer, or passed a zero/null reference so it starts to allocate from top of memory, or some offset below for cases where a programmer may wish to reserve RAM above heap for a fixed purpose such as video buffer etc.

My personal preference is for OBEX code to be flexible and multi-purpose, easy to use however the end programmer wants to work, with the later removal of unnecessary or unwanted capabilities put into the hands of the end-programmer.

All multi-purpose objects will have some bloat which programmers may not want and a well designed and modularised object will help those who want to cut the bloat out. Beginners who simply want to get things working quickly will appreciate the flexibility and accept the bloat as the price for that.

Jasper_M · 2008-03-09 17:55

hippy said...

Jasper_M said...
It's getting pretty hard to decide...

Perhaps the solution is to have the PMM determine its operation depending on how it is first initialised ?

It could either be passed the address and size of a heap which exists in VAR or DAT space as defined by the end-programmer, or passed a zero/null reference so it starts to allocate from top of memory, or some offset below for cases where a programmer may wish to reserve RAM above heap for a fixed purpose such as video buffer etc.

My personal preference is for OBEX code to be flexible and multi-purpose, easy to use however the end programmer wants to work, with the later removal of unnecessary or unwanted capabilities put into the hands of the end-programmer.

All multi-purpose objects will have some bloat which programmers may not want and a well designed and modularised object will help those who want to cut the bloat out. Beginners who simply want to get things working quickly will appreciate the flexibility and accept the bloat as the price for that.

I began to implement it so that the stack expands from end of the RAM (I noticed that aligned mallocs would be easier and it'd also be easier for the user, as deSilva pointed out). The ASM functions would be so different that it's not practical to make a version that can do both.

Now I'm having a problem with moving parameters from VAR to DAT: If I move the arguments from VAR section to DAT, it'll be very slow - one assignment to DAT section takes almost 500 clock cycles in SPIN, so they'd make about 20% of the time malloc takes. Now I'm trying to make it so that it passes a pointer to vars in function stack...

stevenmess2004 · 2008-03-09 21:30

Yeah, the spin interperter has fast access to the first 8 arguments/local variables and the first 8 VAR variables. Not sure about the DAT section though. One of the words in the first 16 longs of memory is the initial stack pointer. Maybe you could make a start routine for it that got passed the amount of free stack space. You would simply allocate memory from the ROM down to stackStart+stackSize.

Jasper_M · 2008-03-09 22:02

stevenmess2004 said...
Yeah, the spin interperter has fast access to the first 8 arguments/local variables and the first 8 VAR variables. Not sure about the DAT section though. One of the words in the first 16 longs of memory is the initial stack pointer. Maybe you could make a start routine for it that got passed the amount of free stack space. You would simply allocate memory from the ROM down to stackStart+stackSize.

Yeah, I'll try that tomorrow. I mean reading the stack length from Hub RAM.

DAT space requires absolute addressing, because the SPIN interpreter only knows object (VAR) & stack base pointers. Also, I think that in DAT section the SPIN interpreter does some aligning, so that

DAT
             byte x
variable  long y

works. (Not sure though, I've not read that far the source code yet)

stevenmess2004 · 2008-03-10 01:29

Here is the header info from some of hippy's documentation.

That didn't work real well so the file is attached.

Note that the description block is a mix of byte, word and long entries.

                           .----------.
                           |__________|         Long : Frequency (Hz)
                           |__|                 Byte : XTAL mode
                           |__|__               Byte : Checksum
                .----------|_____|              Word : Base of program
            .---|----------|_____|              Word : Base of variables
          .-|---|----------|_____|              Word : Base of stack
          | | .-|----------|_____|              Wopd : Initial program counter
        .-|-|-|-|----------|     |              Word : Initial stack pointer
        | | | | |          `-----'
        | | | | |
        | | | | |          .----------.
        | | | | `-> .------|__________|    +0   Pointer to next object
        | | | |     |   .--|__________|    +1   Pointer to first method
        | | | |     |   |  |__________|
        | | | |     | .-|--|          |    +N   Pointer to last method
        | | | |     | | |  |----------|
        | | | |     | | |  | DAT      |         Data Area
        | | | |     | | |  |----------|
        | | | `---> | | `->| PUB      |
        | | |       | |    :          :         Methods
        | | |       | `--->| PRI      |
        | | |       |      `----------'
        | | |       |      .----------.
        | | `-----> `----->| vars     |        Main Program Variables
        | |                |          |
        | |                `----------'
        | |                .----------.
        | `--------------->| stack    |         Stack Space
        `----------------->|          |
                           `----------'

Jasper_M · 2008-03-10 17:09

Thanks! But I think I'll use


PUB start | stack_var
  stack_top := @stack_var

so that I'll get more accurate representation of what the stack pointer will be.
... No, it's not accurate because stack is usually allocated in VAR section for other cogs... So I'll use the other way

BTW, where's Hippy's documentation located at? (is there more of it?)

Post Edited (Jasper_M) : 3/10/2008 5:25:39 PM GMT

deSilva · 2008-03-10 17:23

Jasper_M said...
Now I'm having a problem with moving parameters from VAR to DAT: If I move the arguments from VAR section to DAT, it'll be very slow - one assignment to DAT section takes almost 500 clock cycles in SPIN, so they'd make about 20% of the time malloc takes. Now I'm trying to make it so that it passes a pointer to vars in function stack...

I am not sure what you are talking about Jasper; The only thing needed was to put the 4 variables from VAR to DAT...

A variable access in SPIN is around 200 ticks; As Steven said it's somewhat faster with the first 8 oder 16 VAR variables...
I doubt your value of 500 ticks = 60µs. If you insist I might even activate a test program to prove the contrary...

What do you mean: "DAT space needs absolute addressing.."???

There is no difference at all between using DAT and VAR space, believe me!

deSilva · 2008-03-10 17:30

Jasper_M said...
Thanks! But I think I'll use
PUB start | stack_var
  stack_top := @stack_var
so that I'll get more accurate representation of what the stack pointer will be.
... No, it's not accurate because stack is usually allocated in VAR section for other cogs...

Yes that's what I have disussed above: It's not "safe" to use that value.
What you can do is use a value given be the user initialization. Then you can also check the@stack_var as you planned.
But using the "official" stack pointer will be best. Of course there cann be always clashes at thenext procedure call with 100 local variables...

Jasper_M · 2008-03-10 17:47

By absolute addressing I meant that the address has to be stored in SPIN bytecode. I was wrong, a 1-byte offset from pbase is enough, assuming that the DAT section is close to PBASE. If not, absolute addressing is used. With VARs you have the 8 shortcut instructions.

I tried out the speed in a laboratory test, and got results that VAR read accesses take 325 cycles on average while dat accesses take 437. It's not as high a difference as in my memory manager speed test, so the problem is somewhere else : /

EDIT: Also, deSilva and others, what do you think that would be a safe default for space that's left for the stack?

deSilva · 2008-03-10 18:02

There is no such value! Only the user knows how his stack will grow.

User behaviour is different. I personally allocate all vectors on the stack, thousends of longs..
Recursive routines are not very popular but that may change when new people come to the propeller with more programming experience and more advanced algorithms..

An initialization parameter can be -
- absolute required value of heap (this is something the user will (has to!) know.
- espected additional stack space /this is something the user is not so sure about, but he can guess)

This will already allow a first clash estimation....

It can also work the other way round: The user gives the expected stack and the initialization routine returns the max heap available. Then the program can decide wether to use a smaller or larger graphics area e.g....

Jasper_M · 2008-03-10 18:09

Now that's a good idea : ) I mean returning the heap space available.

jazzed · 2008-03-10 19:45

Link provided at top mentions "object pending approval" ... guess they're waiting for the license ?

Having looked at the code before (wanting to look some more) I noticed your malloc returns < 0
on failure. Normally malloc returns 0 on failure since that is compatible with address spaces on
being used by most O/S. Just a nit, and I guess it's not that important since Prop is the only device
for this code [noparse]:)[/noparse], but not following generally accepted industry norms can get you into "tricks & traps."

Is this first fit, free, and consolidate ? Perhaps another method was used ? Please elaborate.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
jazzed·... about·living in·http://en.wikipedia.org/wiki/Silicon_Valley

Traffic is slow at times, but Parallax orders·always get here fast 8)

Jasper_M · 2008-03-10 20:22

Yeah, when I was developing the driver, heap began at 0, so it used to be a legal value in the testing phase. I'll change it in next release. Then you can do


x := pmm.malloc(10)
if(x)
 ...

too.

The algorithm is first-fit. It uses a table that has 2-bit states of all the blocks. 0 means free, 1-3 mean allocated. It scans them, finds the first fit and marks it as allocated with a number (called color in the source code) 1-3. Before marking it as allocated, it checks the color of the block before the allocated chunk and after it, and then makes sure it's not the same color as either of those (so that when freed, it can be distinguished from its neighbours).

There's one optimization - scanning the table begins either from the address after the last allocation or the address of last free (in the case the freed address is less than the address after last malloc). If no fit is found, the scan is repeated, this time starting from 0. This causes memory fragmentation in some cases, but significantly increases (doubles) speed when mallocs happen sequentially.

deSilva · 2008-03-10 20:29

I just stumbled over Peter Verkaik's HEAP Object: obex.parallax.com/objects/241/

Post Edited (deSilva) : 3/10/2008 8:50:01 PM GMT

Jasper_M · 2008-03-10 20:38

Oh... Didn't notice that one before... Looks like it's 100% SPIN and linked-list based. One block has a 4-byte Hub RAM overhead. But it won't need a dedicated cog... And if you remove integrity, isValid, available you can squeeze it into just over 100 longs...

Jasper_M · 2008-03-10 21:08

Serious problems with the driver... I'll fix them probably tomorrow.

Same addresses get allocated twice in certain cases. -.-''

jazzed · 2008-03-10 21:17

Jasper,

I take it your heap grows up ? Growing down is simpler, and knowing
stack end while important to understand, is not necessary at start.
Using a dedicated block of memory helps avoid stack collisions too,
but in a cog, it's not clear to me at least how one would do that. Also
growing down provides fast allocations, but memory reuse is slower
... give and take [noparse]:)[/noparse].

I've seen examples of memory management where special numbers
are used to indicate the state of memory ... i.e. virgin, freed, reclaimed
and in use (of course) with the first 3 having patterns like 0, 6's, 7's.
Linux and other virtual memory systems have identifyable numbering.
Your use of "color" is more or less a demark ... for debugger acuity ?

Last time I looked, Peter's heap did not allow reusing freed memory. I
wrote a malloc recently in spin similar to his heap implementation, but
grows down, allows for easily changing the data width, memory reuse,
reclaimation, and requires less memory. Also built slingly linked list,
stack, and queue too, but I've been too busy with other stuff to test
it thoughroughly for a release. Not trying to hijack your thread ....

This kind of stuff interests me greatly ... [noparse]:)[/noparse]

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
jazzed·... about·living in·http://en.wikipedia.org/wiki/Silicon_Valley

Traffic is slow at times, but Parallax orders·always get here fast 8)

Jasper_M · 2008-03-10 21:53

Allocation table in cog memory grows up. In the released (BUGGY!) version the heap grows up. In the version I'm working on (probably release tomorrow), the heap grows down, but the allocation table in cog ram still grows up. The address is just converted before returning. And in my previous post, term block didn't mean a block returned by malloc but the smallest possible allocation unit - the table holds the separate status of each 8 bytes in memory. It does not store information like allocation length and start address.

And the term color is loosely related to graph coloring, because adjacent memory chunks cannot be of same color.

mcstar · 2008-03-12 17:37

Speaking of memory managers, Chad George· released LockPool and LockQueue last July which I've used successfully with several projects.· In one case I'm communicating with another device and don't know the message size until runtime.· Using LockPool I ask for a chunk of memory and get back a long with the size (2 high bytes)·and location of the memory allocated (2 low bytes). Then I push that onto the LockQueue.· Later I can pop the value from the queue which gives me the information I can use elsewhere to read/write the memory block.· My avatar shows·the·display of an·early test program which populates and deallocates memory using the manager.

·

Jasper_M · 2008-03-13 20:54

mcstar said...
Speaking of memory managers, Chad George released LockPool and LockQueue last July which I've used successfully with several projects. In one case I'm communicating with another device and don't know the message size until runtime. Using LockPool I ask for a chunk of memory and get back a long with the size (2 high bytes) and location of the memory allocated (2 low bytes). Then I push that onto the LockQueue. Later I can pop the value from the queue which gives me the information I can use elsewhere to read/write the memory block. My avatar shows the display of an early test program which populates and deallocates memory using the manager.

Those sound interesting, I have to check those out. BTW, could you tell more specifically what are the projects you're using that with : ) I'm curious...

Also, thread updated, new version. Please use shift+del on the previous version.

mcstar · 2008-03-14 02:54

Early on I just wrote simple tests,·a hex viewer, etc.
One that I'm working on now is a driver that communicates with TI caculators.· It lets the prop send/recieve data to/from the calc.· When I get it done I plan to use the calc as a portable keyboard/display for the prop. The memory manager comes in handy because data needs to be recieved in large blocks from the calc to implement the protocol properly.· I have no way to know how large the data structures are at design time, so I allocate memory as needed at runtime when to hold data from the calc.· The·first couple·bytes·recieved tell me·data type and size information.· I use the LockQueue to implement a memory table for all the data structures.· Take note, today I found what might be a bug in the LockPool.· It still works, but you may notice the bug when you start dealing with larger data structures.· I'm working on a fix for it and will post an update if I can confirm/fix it.

·Here's the link to thread with the spin object from Chad George
http://forums.parallax.com/showthread.php?p=665326

UPDATE : The "Bug" I· found in the LockPool as it turns out is only a lack of bounds checking.· If you specify a block size of > 1 (say 4), then you can easily/inadvertantly get memory allocations outside the range of your allocated variable space if you are assuming a byte size of allocation.· In my case, I was allocating 1024 bytes but my block size was 4 so, when I asked for a block of 128 (I was thinking bytes), but the Alloc method it turns out, searched for a block of free space of 128*4 bytes, and I was getting locations outside of my expected memory pool.· Just keep this mind in if you need to use this object.· Do your math first.

Post Edited (mcstar) : 3/14/2008 9:00:40 PM GMT

Jasper_M · 2008-03-15 12:01

Cool : ) Thanks for the link, I will check it out.

What calculators are you making the program for? Z80 or MC68000 ones? The interface could also be used to import data from a propeller-based datalogger or ADC to the calculator for analysis : )

mcstar · 2008-03-16 01:34

The calculator I'm using is the TI-92.· It has·a decent screen and a full qwerty keyboard.

Jasper_M said...
....·The interface could also be used to import data from a propeller-based datalogger or ADC to the calculator for analysis : )

That is exactly one of the uses I have in mind.· Since the calculator can do linear/quad/other regression, you can do all kinds of cool things with real world inputs.· It also has great statistics support.· I like the data types supported by the TI-92 as well. It supports
[*][url=]Expressions[/url] [*][url=]Lists[/url] [*][url=]Matrices[/url] [*][url=]Data Variables[/url] [*][url=]Text Files[/url] [*][url=]Strings[/url] [*][url=]GDB's[/url] [*][url=]Figures[/url] [*][url=]Pictures[/url] [*][url=]Programs[/url] [*][url=]Functions[/url] [*][url=]Macros[/url]

Once I get propeller code that can read/write/interpret some of these datatypes, I'll have the makings of a decent OS.· Perhaps that's a bit ambitious.· First step is to get where I can transfer files, and read/write lists.

stevenmess2004 · 2008-03-16 01:52

I like this. I have a TI89 which should probably work as well. I wonder if we can write a small asm program for the calculator so that the calculator works like propTerminal? Than we would have a screen, keyboard, something to solve complicated equations and a propeller to do sensors and other things as well (like an SD card).

Memory Manager (malloc & free) for Propeller! (DOWNLOAD LINK) UPDATED: 2008-03-

Comments