Memory Manager (malloc & free) for Propeller! (DOWNLOAD LINK) UPDATED: 2008-03-
Jasper_M
Posts: 222
This object provides you with malloc(bytes) and free(ptr) functions for the SPIN language.
You can (read: must) specify your program's stack usage, so that the heap won't collide with stack. You can also block size (minimum allocation unit) in the configuration. It takes 215 longs of Hub memory (+ of course the heap). Hub memory consumption is not dependent on heap size or block size because memory allocation table is stored in Cog RAM. Speaking of which, this requires one dedicated cog...
A test program is included. It tests that the memory manager works as designed.
New in version 011
- AFAIK the driver works correctly (multiple bugs fixed, real test program)
- Real test program
- Aligned mallocs (can be aligned to 2,4,8,16 times the block size)
- Heap expands from end of memory towards SPIN stack
- malloc returns NULL (0) on failure
- The test program uses Parallax TV Text driver, so it should run on all boards
- Included the MIT license
- Added locks to allow multi-cog usage
Download the object from OBEX:
obex.parallax.com/objects/275/
As the OBEX object is not approved yet, I also attached the driver in this post.
Code example:
Jasper Mattsson
Post Edited (Jasper_M) : 3/13/2008 8:57:59 PM GMT
You can (read: must) specify your program's stack usage, so that the heap won't collide with stack. You can also block size (minimum allocation unit) in the configuration. It takes 215 longs of Hub memory (+ of course the heap). Hub memory consumption is not dependent on heap size or block size because memory allocation table is stored in Cog RAM. Speaking of which, this requires one dedicated cog...
A test program is included. It tests that the memory manager works as designed.
New in version 011
- AFAIK the driver works correctly (multiple bugs fixed, real test program)
- Real test program
- Aligned mallocs (can be aligned to 2,4,8,16 times the block size)
- Heap expands from end of memory towards SPIN stack
- malloc returns NULL (0) on failure
- The test program uses Parallax TV Text driver, so it should run on all boards
- Included the MIT license
- Added locks to allow multi-cog usage
Download the object from OBEX:
obex.parallax.com/objects/275/
As the OBEX object is not approved yet, I also attached the driver in this post.
Code example:
OBJ pmm : "JAM_PMM_010" PUB main | x, i pmm.start(512) 'Specify our program's stack usage. x := pmm.malloc(12) 'Allocate 12 bytes if(x) 'If no out-of-memory condition, then... 'Fill the allocated memory with numbers from 0 to 11 repeat i from 0 to 11 BYTE[noparse][[/noparse]x][noparse][[/noparse] i ] := i pmm.free(x) 'Free the allocated memory to be recycled :)
Jasper Mattsson
Post Edited (Jasper_M) : 3/13/2008 8:57:59 PM GMT
zip
22K
Comments
Can you please move the four VARs to DAT? This object is a singleton if ever I have seen one...
Can you think of some way to consider an "alignment", especially 64 bytes.... I see that it can be done a general block size, but I would use it for graphics memory allocation as well. See the point?
It becomes difficult to just put this manually to the end of RAM now - which had never been a good practice BTW.
Also, you could much simpler adjust its size!
Thanks for the feedback : )
... I didn't remember that VAR region is duplicated for each object... (long time, no SPIN [noparse]:)[/noparse])
And come to think of it, I don't know why I didn't allocate the HEAP in the DAT section too... seems incredibly stupid afterwards...
I think I can pretty easily make an aligned version of malloc, with 0 performance hit on unaligned mallocs.
And I'll include the MIT license with the next version.
So the next version will have all the changes you mentioned.
Putting the Heap in the DAT section has two great drawbacks:
- One would have to modify it all the time to adjust it to each program. It would be much better it will dynamically use the space upto the current main-stack pointer (inlcuding some spare). Though I am not sure where to get this main-stack pointe address from, when the object is called from a nother COG with a local stack....
- The complete DAT will be transfered to the Propeller. This will multiply the load time!!!!
Allowing alligned allocation will mean you have to keep book for the padding as well...
So it would still be sufficient to alter only one constant.
Also, the stack pointer when calling PMM is probably not the maximum stack pointer:
The only problem here would be the load time...
And about the aligned allocation, I can make it search for free memory chunks that only start from addresses that are aligned. Aligned malloc will probably be about 30 or 40% slower than unaligned mallocs.
Each simple heap system has the STACK/HEAP clash problem...
Maybe give the "max heap size" as a safe line, and also check the local (!) top of stack aith each malloc?
This will not help for the local stackframe from a derived COG, but will do no harm either...
And it would set absolute compiler-enforced limits automatically, which is good in some (or most) microcontroller applications where reliability is important.
Usually memory is allocated first, processing done then. This makes the case where stack overwrites heap more probable than heap overwriting stack, so in the most common case all the protections in malloc would be pretty useless.
On the other hand, your approach would significantly reduce required configuration and allow deep functions run before allocating memory.
It's getting pretty hard to decide...
Perhaps the solution is to have the PMM determine its operation depending on how it is first initialised ?
It could either be passed the address and size of a heap which exists in VAR or DAT space as defined by the end-programmer, or passed a zero/null reference so it starts to allocate from top of memory, or some offset below for cases where a programmer may wish to reserve RAM above heap for a fixed purpose such as video buffer etc.
My personal preference is for OBEX code to be flexible and multi-purpose, easy to use however the end programmer wants to work, with the later removal of unnecessary or unwanted capabilities put into the hands of the end-programmer.
All multi-purpose objects will have some bloat which programmers may not want and a well designed and modularised object will help those who want to cut the bloat out. Beginners who simply want to get things working quickly will appreciate the flexibility and accept the bloat as the price for that.
I began to implement it so that the stack expands from end of the RAM (I noticed that aligned mallocs would be easier and it'd also be easier for the user, as deSilva pointed out). The ASM functions would be so different that it's not practical to make a version that can do both.
Now I'm having a problem with moving parameters from VAR to DAT: If I move the arguments from VAR section to DAT, it'll be very slow - one assignment to DAT section takes almost 500 clock cycles in SPIN, so they'd make about 20% of the time malloc takes. Now I'm trying to make it so that it passes a pointer to vars in function stack...
Yeah, I'll try that tomorrow. I mean reading the stack length from Hub RAM.
DAT space requires absolute addressing, because the SPIN interpreter only knows object (VAR) & stack base pointers. Also, I think that in DAT section the SPIN interpreter does some aligning, so that
works. (Not sure though, I've not read that far the source code yet)
That didn't work real well so the file is attached.
so that I'll get more accurate representation of what the stack pointer will be.
... No, it's not accurate because stack is usually allocated in VAR section for other cogs... So I'll use the other way
BTW, where's Hippy's documentation located at? (is there more of it?)
Post Edited (Jasper_M) : 3/10/2008 5:25:39 PM GMT
I am not sure what you are talking about Jasper; The only thing needed was to put the 4 variables from VAR to DAT...
A variable access in SPIN is around 200 ticks; As Steven said it's somewhat faster with the first 8 oder 16 VAR variables...
I doubt your value of 500 ticks = 60µs. If you insist I might even activate a test program to prove the contrary...
What do you mean: "DAT space needs absolute addressing.."???
There is no difference at all between using DAT and VAR space, believe me!
Yes that's what I have disussed above: It's not "safe" to use that value.
What you can do is use a value given be the user initialization. Then you can also check the@stack_var as you planned.
But using the "official" stack pointer will be best. Of course there cann be always clashes at thenext procedure call with 100 local variables...
I tried out the speed in a laboratory test, and got results that VAR read accesses take 325 cycles on average while dat accesses take 437. It's not as high a difference as in my memory manager speed test, so the problem is somewhere else : /
EDIT: Also, deSilva and others, what do you think that would be a safe default for space that's left for the stack?
User behaviour is different. I personally allocate all vectors on the stack, thousends of longs..
Recursive routines are not very popular but that may change when new people come to the propeller with more programming experience and more advanced algorithms..
An initialization parameter can be -
- absolute required value of heap (this is something the user will (has to!) know.
- espected additional stack space /this is something the user is not so sure about, but he can guess)
This will already allow a first clash estimation....
It can also work the other way round: The user gives the expected stack and the initialization routine returns the max heap available. Then the program can decide wether to use a smaller or larger graphics area e.g....
Having looked at the code before (wanting to look some more) I noticed your malloc returns < 0
on failure. Normally malloc returns 0 on failure since that is compatible with address spaces on
being used by most O/S. Just a nit, and I guess it's not that important since Prop is the only device
for this code [noparse]:)[/noparse], but not following generally accepted industry norms can get you into "tricks & traps."
Is this first fit, free, and consolidate ? Perhaps another method was used ? Please elaborate.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
jazzed·... about·living in·http://en.wikipedia.org/wiki/Silicon_Valley
Traffic is slow at times, but Parallax orders·always get here fast 8)
too.
The algorithm is first-fit. It uses a table that has 2-bit states of all the blocks. 0 means free, 1-3 mean allocated. It scans them, finds the first fit and marks it as allocated with a number (called color in the source code) 1-3. Before marking it as allocated, it checks the color of the block before the allocated chunk and after it, and then makes sure it's not the same color as either of those (so that when freed, it can be distinguished from its neighbours).
There's one optimization - scanning the table begins either from the address after the last allocation or the address of last free (in the case the freed address is less than the address after last malloc). If no fit is found, the scan is repeated, this time starting from 0. This causes memory fragmentation in some cases, but significantly increases (doubles) speed when mallocs happen sequentially.
Post Edited (deSilva) : 3/10/2008 8:50:01 PM GMT
Same addresses get allocated twice in certain cases. -.-''
I take it your heap grows up ? Growing down is simpler, and knowing
stack end while important to understand, is not necessary at start.
Using a dedicated block of memory helps avoid stack collisions too,
but in a cog, it's not clear to me at least how one would do that. Also
growing down provides fast allocations, but memory reuse is slower
... give and take [noparse]:)[/noparse].
I've seen examples of memory management where special numbers
are used to indicate the state of memory ... i.e. virgin, freed, reclaimed
and in use (of course) with the first 3 having patterns like 0, 6's, 7's.
Linux and other virtual memory systems have identifyable numbering.
Your use of "color" is more or less a demark ... for debugger acuity ?
Last time I looked, Peter's heap did not allow reusing freed memory. I
wrote a malloc recently in spin similar to his heap implementation, but
grows down, allows for easily changing the data width, memory reuse,
reclaimation, and requires less memory. Also built slingly linked list,
stack, and queue too, but I've been too busy with other stuff to test
it thoughroughly for a release. Not trying to hijack your thread ....
This kind of stuff interests me greatly ... [noparse]:)[/noparse]
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
jazzed·... about·living in·http://en.wikipedia.org/wiki/Silicon_Valley
Traffic is slow at times, but Parallax orders·always get here fast 8)
And the term color is loosely related to graph coloring, because adjacent memory chunks cannot be of same color.
·
Those sound interesting, I have to check those out. BTW, could you tell more specifically what are the projects you're using that with : ) I'm curious...
Also, thread updated, new version. Please use shift+del on the previous version.
One that I'm working on now is a driver that communicates with TI caculators.· It lets the prop send/recieve data to/from the calc.· When I get it done I plan to use the calc as a portable keyboard/display for the prop. The memory manager comes in handy because data needs to be recieved in large blocks from the calc to implement the protocol properly.· I have no way to know how large the data structures are at design time, so I allocate memory as needed at runtime when to hold data from the calc.· The·first couple·bytes·recieved tell me·data type and size information.· I use the LockQueue to implement a memory table for all the data structures.· Take note, today I found what might be a bug in the LockPool.· It still works, but you may notice the bug when you start dealing with larger data structures.· I'm working on a fix for it and will post an update if I can confirm/fix it.
·Here's the link to thread with the spin object from Chad George
http://forums.parallax.com/showthread.php?p=665326
UPDATE : The "Bug" I· found in the LockPool as it turns out is only a lack of bounds checking.· If you specify a block size of > 1 (say 4), then you can easily/inadvertantly get memory allocations outside the range of your allocated variable space if you are assuming a byte size of allocation.· In my case, I was allocating 1024 bytes but my block size was 4 so, when I asked for a block of 128 (I was thinking bytes), but the Alloc method it turns out, searched for a block of free space of 128*4 bytes, and I was getting locations outside of my expected memory pool.· Just keep this mind in if you need to use this object.· Do your math first.
Post Edited (mcstar) : 3/14/2008 9:00:40 PM GMT
What calculators are you making the program for? Z80 or MC68000 ones? The interface could also be used to import data from a propeller-based datalogger or ADC to the calculator for analysis : )
[*][url=]Expressions[/url] [*][url=]Lists[/url] [*][url=]Matrices[/url] [*][url=]Data Variables[/url] [*][url=]Text Files[/url] [*][url=]Strings[/url] [*][url=]GDB's[/url] [*][url=]Figures[/url] [*][url=]Pictures[/url] [*][url=]Programs[/url] [*][url=]Functions[/url] [*][url=]Macros[/url]
Once I get propeller code that can read/write/interpret some of these datatypes, I'll have the makings of a decent OS.· Perhaps that's a bit ambitious.· First step is to get where I can transfer files, and read/write lists.