
Software Mutex in SPIN

Andrew E Mileski Posts: 77
edited 2015-10-16 09:54 in Propeller 1
Thought I'd toss this against the wall and see if it sticks.

Requires 15 bytes allocated per mutex for 8 clients (one client per cog). A maximum of 128 clients is possible (a whopping 255 bytes per mutex) for those doing time-slicing.

Unfortunately, one cannot really "try" the lock without actually locking, as testing takes many cycles. It could be added though.

I know all this can be optimized further.

Note to self: I need to code, or find, a tiny & efficient heap allocator.
{{
mutex.spin

Author:  Andrew E. Mileski <andrewm@isoar.ca>
License:  MIT
Version:  0.1

The following SPIN code allows for up to 128 clients per mutex.

Each mutex requires 2 * clients - 1 bytes of memory.

Clients must be manually assigned a unique consecutive numeric ID from 0 to 127.

Mutexes are referred-to by address, so that they can be accessed across cogs.

Filter algorithm: Peterson's algorithm for N processes
http://en.wikipedia.org/wiki/Peterson%27s_algorithm

TLDR:
* The mutex is contention based: waiting clients cannot be starved, though strict first-in-first-out (FIFO) order is not guaranteed.
* Memory read & write must be atomic (atomic read-modify-write not required).
* There are N levels for N clients: the released level plus N - 1 waiting levels.
* The client with the highest lock level holds the lock, and all other clients wait.
* Every lock level has at most one fewer waiting client than any lower lock level.
* A client can only advance to the next lock level if all higher lock levels are unoccupied.
}}
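{{
Worked example (the three clients A, B, and C are only for illustration):

If A, B, and C call Aquire at about the same time, all three write the level-0
wait entry; whichever wrote it last is held at level 0. The other two advance
to the next level, where again the last one to write that level's wait entry
is held. The one client left passes the final level and owns the lock. When it
calls Release, its level drops to LEVEL_RELEASED, the held clients see the
levels above them empty out, and they move up one at a time.
}}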
CON
                LEVEL_RELEASED  = -1    ' Lock levels must be higher
                INVALID_ID      = -1    ' Client IDs must be higher
                
VAR
                ' Max number of clients for this mutex
                byte    clients
                
                ' Address of array of current lock level for each client
                word    addr_clients

                ' Address of array of client that is waiting on lock level
                word    addr_waits
                                                
PUB Start(_addr_mutex, _max_clients) : next_addr
{{
Purpose:  Setup a mutex

Parameters:
_addr_mutex     Address of the mutex data (not allocated here).
_max_clients    Maximum number of clients allowed (max. 128, but depends on memory allocated)

Return:
next_addr       The address of the byte following the mutex data.
}}
                clients := _max_clients
                addr_clients := _addr_mutex
                addr_waits := addr_clients + clients
                next_addr := addr_waits + clients - 1
                
                bytefill(addr_clients, LEVEL_RELEASED, clients)
                bytefill(addr_waits, INVALID_ID, clients - 1)

PUB Stop

                return

PUB GetMutex : mutex
{{
Purpose:  Get the address of the mutex data.

Return:
Address of the mutex data.
}}
                mutex := addr_clients

PUB GetMutexSize(_clients) : mutex_size

                mutex_size := (_clients << 1) - 1
                                
PUB Aquire(_client_id) | addr_client, addr_wait, client, level
{{
Purpose:  Acquire the lock (busy-wait until it is available).

Parameters:
_client_id      Client ID of the client to acquire the lock for.

Return:
0, always succeeds.
}}

                ' Optimization, used more than once
                addr_client := addr_clients + _client_id
                
                repeat level from 0 to (clients - 2)

                        ' Optimization, used more than once
                        addr_wait := addr_waits + level
                        
                        ' Set client's lock level to current level
                        BYTE[addr_client] := level

                        ' Last client entering a level waits
                        BYTE[addr_wait] := _client_id
                
                        ' Busy-wait while there are other clients ahead of this client
                        repeat

                                ' Can advance if another client entered this level after this client
                                if (BYTE[addr_wait] <> _client_id)
                                        quit

                                ' Check all other clients
                                repeat client from 0 to (clients - 1)

                                        ' Ignore the current client
                                        if (client == _client_id)
                                                next

                                        ' Check if there is another client at same or higher level
                                        if (~BYTE[addr_clients + client] => level)
                                                quit

                                ' The inner loop only completes (leaving client equal
                                ' to clients) when no other client is at this level or
                                ' higher, so it is safe to advance to the next level
                                if (client == clients)
                                        quit
 
PUB Release(_client_id)
{{
Purpose:  Release the lock.

Parameters:
_client_id      Client ID of the client to release the lock for.

Return:
0, always succeeds.
}}
                BYTE[addr_clients + _client_id] := LEVEL_RELEASED
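
A minimal usage sketch, assuming the object above is saved as mutex.spin (the two-client setup, the Worker method, and the shared counter below are only for illustration):

CON
                MAX_CLIENTS     = 2                     ' one client per cog, IDs 0 and 1

OBJ
                mtx : "mutex"

VAR
                ' 2 * clients - 1 bytes of mutex data (the same value GetMutexSize returns)
                byte    mutex_data[2 * MAX_CLIENTS - 1]

                long    counter
                long    stack[32]

PUB Main
                mtx.Start(@mutex_data, MAX_CLIENTS)

                ' Client ID 1 runs in a second cog, client ID 0 stays in this one
                cognew(Worker(1), @stack)
                Worker(0)

PRI Worker(_client_id)
                repeat
                        mtx.Aquire(_client_id)          ' busy-wait until the lock is held
                        counter++                       ' hub read-modify-write, now safe
                        mtx.Release(_client_id)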


EDIT: Just realized I had the wrong license. Meant MIT (corrected).

Comments

  • Mike Green Posts: 23,101
    edited 2015-10-14 03:53
    There are no atomic read/modify/write memory cycles in hub memory. The only instructions that access hub memory directly are RDxxxx and WRxxxx (COGINIT indirectly reads from hub memory to load a cog image). All of the operators in Spin that appear to do read/modify/write are not atomic. Only the LOCKxxx instructions modify/test a shared resource atomically. RDxxxx and WRxxxx are atomic in that the entire storage value, whether byte, word, or long, is handled as a single atomic entity. (A minimal sketch of the LOCKxxx approach follows these comments.)

    In "if (~BYTE[addr_clients + client] => level)" what happens if two cogs are running and both fetch the same non-zero BYTE[addr_clients + client] before either cog can set the byte to zero?
  • Andrew E Mileski Posts: 77
    edited 2015-10-16 17:40
    SPIN doesn't change anything. Reading/writing hub memory may take many cycles to set up, but once it finally starts, a RD* or WR* machine instruction in the SPIN interpreter is atomic. SPIN also doesn't re-order operations, nor optimize.

    The algorithm was designed for systems that lack read-modify-write operations. It is overkill when those operations exist.

    The algorithm is cleverly designed (I didn't invent it) to break ties. It is basically a funnel, narrowing by AT LEAST one client every level. At the final level, only one client remains, and can pass through the lock.

    Consider BYTE[addr_waits + 0], the entry to the mutex. Every client writes to it first. Some client is going to be last, because hub memory is shared (via TDM, and not multi-ported). Whoever is last cannot advance.

    The real key to it all seems to be that advancing to the next level also requires that ALL higher levels be unoccupied.
    E.g., if there are 10 levels and one client is at level 7, a newly entering client can't climb up to level 6; it remains at level 0.
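
For comparison with the LOCKxxx instructions Mike Green mentions above, the same guard is usually built on one of the Propeller's eight hardware locks. A minimal sketch, with made-up names, might look like this:

VAR
                long    lock_id

PUB Start
                ' Claim one of the 8 hardware locks (LOCKNEW returns -1 if none are free)
                lock_id := locknew

PUB Acquire
                ' Spin until LOCKSET reports the lock was previously clear
                repeat while lockset(lock_id)

PUB Release
                ' Clear the lock so another cog can take it
                lockclr(lock_id)

PUB Stop
                ' Return the hardware lock to the pool
                lockret(lock_id)

The trade-off is that only eight hardware locks exist per chip, while the filter approach above uses none of them.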