Locks on the P2
RossH
Posts: 5,660
in Propeller 2
Hello all
I'm having some trouble using locks on the P2. They seem to operate quite differently to the locks on the P1.
I was aware that the result of the P2 "locktry" instruction (i.e. the carry flag) is apparently the reverse of the P1 "lockset" instruction, but the differences seem to go deeper than that.
On the P1, only one "lockset" instruction - executed on any cog - would return the result that the lock had been acquired. But on the P2 the "locktry" operation seems to return that result every time it is executed on the same cog once the lock is acquired - i.e. the lock seems to belong to the whole cog, rather than to any particular program executing on that cog.
Can anyone confirm that this is correct?
Thanks!
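To make the difference concrete, here is a small C model of the two behaviours described above. It is only a sketch: the P2 half encodes the behaviour observed in this post (a lock that remembers its owning cog), not anything taken from the P2 documentation, and all the type and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* P1 model: a lock is a single bit. "lockset" sets the bit and the caller
 * has acquired the lock only if the bit was clear beforehand. The cog
 * number plays no part, so only one caller anywhere can succeed. */
typedef struct { bool set; } p1_lock_t;

static bool p1_lockset_acquired(p1_lock_t *l)
{
    bool was_set = l->set;
    l->set = true;
    return !was_set;
}

static void p1_lockclr(p1_lock_t *l)
{
    l->set = false;
}

/* P2 model (ASSUMPTION, based only on the behaviour reported above):
 * the lock remembers which cog owns it, and "locktry" reports success
 * whenever the lock is free or already owned by the calling cog. */
typedef struct { bool taken; int owner; } p2_lock_t;

static bool p2_locktry(p2_lock_t *l, int cog)
{
    assert(cog >= 0 && cog < 8);
    if (!l->taken) {
        l->taken = true;
        l->owner = cog;
        return true;
    }
    return l->owner == cog;   /* repeat tries by the owner also succeed */
}

/* Only the owning cog can release the lock in this model. */
static bool p2_lockrel(p2_lock_t *l, int cog)
{
    if (l->taken && l->owner == cog) {
        l->taken = false;
        return true;
    }
    return false;
}
```

In this model two threads on the same cog would both see p2_locktry succeed, which is exactly why such a lock cannot protect a critical section within a cog.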

Comments
https://docs.google.com/document/d/1UnelI6fpVPHFISQ9vpLzOVa8oUghxpI6UpkXVsYgBEQ/edit?usp=sharing
I changed the way they worked to make them more robust for managing debugging. I can't remember the details of the "whys" at the moment.
Yes, I read that - it seemed to confirm what I am seeing in practice - i.e. that locks now belong to the entire cog, and can no longer be used as semaphores to protect critical code segments within a cog. I will try to implement my own.
I guess the more interesting case is when you need protection across COGs at the same time as within a COG. Some multi-core RTOS or something weird like that.
I have tested running 1500 threads on multiple cogs (I used to only be able to run 80 per cog on the P1!) and it works fine - except for the locks.
/***************************************************************************\
 *                                                                         *
 *                          Multiple Thread Demo                           *
 *                                                                         *
 *    Demonstrates many threads executing concurrently on a single cog     *
 *                                                                         *
\***************************************************************************/

/*
 * include Catalina multi-threading:
 */
#include <catalina_threads.h>

/*
 * include some useful multi-threading utility functions:
 */
#include <thread_utilities.h>

/*
 * define how many threads we want:
 */
#ifdef __CATALINA_P2
#define THREAD_COUNT 300 // can go higher, but things start to slow down!
#else
#define THREAD_COUNT 80 // (just barely on the Propeller 1!)
#endif

/*
 * define the stack size each thread needs (since this number depends on the
 * function executed by the thread, the smallest possible stack size has to be
 * established by trial and error):
 */
#define STACK_SIZE (MIN_THREAD_STACK_SIZE + 45)

/*
 * define the number of thread locks we need:
 */
#define NUM_LOCKS 1

/*
 * define some global variables that all threads will share:
 */
static int ping;

/*
 * a pool of thread locks - note that the pool must be 5 bytes larger than
 * the actual number of locks required (MIN_THREAD_POOL_SIZE = 5)
 */
static char pool[MIN_THREAD_POOL_SIZE + NUM_LOCKS];

static int lock;

/*
 * function : this function can be executed as a thread.
 */
int function(int me, char *not_used[]) {
   while (1) {
      if (ping == me) {
         // print our id
         _thread_printf(pool, lock, "%d ", (unsigned)me);
         ping = 0;
      }
      else {
         // nothing to do, so yield
         _thread_yield();
      }
   }
   return 0;
}

/*
 * main : start up to THREAD_COUNT threads, then ping each one in turn
 */
int main(void) {
   int i = 0;
   int lock;
   void *thread_id;
   unsigned long stacks[STACK_SIZE * THREAD_COUNT];

   // assign a lock to avoid context switch contention
   _thread_set_lock(_locknew());

   // initialize a pool of thread locks
   _thread_init_lock_pool (pool, NUM_LOCKS, _locknew());

   // assign a thread lock to avoid plugin contention
   lock = _thread_locknew(pool);

   _thread_printf(pool, lock, "Press a key to start\n");
   k_wait();

   // start instances of function until we have started THREAD_COUNT of them
   for (i = 1; i <= THREAD_COUNT; i++) {
      thread_id = _thread_start(&function, &stacks[STACK_SIZE*i], i, NULL);
      _thread_printf(pool, lock, "thread %d ", i);
      if (thread_id == (void *)0) {
         _thread_printf(pool, lock, " failed to start\n");
         while (1) { };
      }
      else {
         _thread_printf(pool, lock, " started, id = %d\n", (unsigned)thread_id);
      }
   }

   // now loop forever, pinging each thread in turn
   while (1) {
      _thread_printf(pool, lock, "\n\nPress a key to ping all threads\n");
      k_wait();
      for (i = 1; i <= THREAD_COUNT; i++) {
         _thread_printf(pool, lock, "%d:", i);
         // ping the thread
         ping = i;
         // wait till thread responds
         while (ping) {
            // nothing to do, so yield
            _thread_yield();
         };
      }
   }
   return 0;
}

None of the bit setting instructions have an equivalent though, so you have to use a whole register for each lock.
EDIT: INCMOD/DECMOD can do this too.
No, it is not based on co-routines (if that's what you mean by co-operative). You may have thought so because of the "yield" operations shown in the example. However, these are not necessary, and the program works with them removed - they are included so that a thread that finds it has nothing useful to do can tell the kernel that it can context switch to another thread if there are any waiting (otherwise it does nothing).
But it is also not pre-emptive. There is just a simple round-robin scheduler built into each multi-threading kernel. And yes, on the P1 it works without interrupts. I may modify it to use interrupts on the P2 - in fact, I will need to for the new "NATIVE" mode, when there is no actual kernel that can do the task scheduling.
Ross.
Thanks. I will investigate. However, I have to be able to implement locks without using up cog resources for each one. If you are running thousands of threads and each one needs a lock (for some reason) then you would soon run out of cog resources!
With the P1-style semaphores, I can implement as many thread locks as I need using just one "true" lock and some hub RAM. But this fails on the P2, because the locks are not true semaphores.
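The "one true lock plus some hub RAM" idea can be sketched roughly as follows. This is not Catalina's actual implementation: the pool layout and every name here are hypothetical, and a C11 atomic_flag stands in for the single hardware lock so the sketch can run on a PC.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define POOL_LOCKS 64   /* hypothetical pool size */

typedef struct {
    atomic_flag   guard;             /* stands in for the one "true" lock  */
    unsigned char held[POOL_LOCKS];  /* one byte of shared RAM per lock    */
} lock_pool_t;

static void pool_init(lock_pool_t *p)
{
    atomic_flag_clear(&p->guard);
    for (int i = 0; i < POOL_LOCKS; i++) {
        p->held[i] = 0;
    }
}

/* Try to take thread lock `id`; returns true if the caller now holds it. */
static bool pool_lockset(lock_pool_t *p, int id)
{
    assert(id >= 0 && id < POOL_LOCKS);
    /* spin on the guard: test-and-set reports the PRIOR state to every
     * caller, which is the P1-style semaphore property relied on here */
    while (atomic_flag_test_and_set(&p->guard)) { }
    bool got = (p->held[id] == 0);
    if (got) {
        p->held[id] = 1;
    }
    atomic_flag_clear(&p->guard);
    return got;
}

/* Release thread lock `id`. */
static void pool_lockclr(lock_pool_t *p, int id)
{
    assert(id >= 0 && id < POOL_LOCKS);
    while (atomic_flag_test_and_set(&p->guard)) { }
    p->held[id] = 0;
    atomic_flag_clear(&p->guard);
}
```

The whole scheme depends on the guard failing for every caller except one. If the guard instead succeeds for any thread on the owning cog, as the P2 locks appear to, two threads on that cog could both be inside the guarded section at once.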
There will be a solution - I just don't know what it is yet!
Yes, this might work. I would have to use one hub lock to resolve inter-cog conflicts, plus one register per cog to prevent intra-cog conflicts.
Thanks.
EDIT: Okay, yes, these are the best for the job. BITH both sets the target bit and returns its prior state. Dunno why I thought otherwise now.
DAT

' set_lock : return with carry flag set if we successfully set the lock
set_lock
        decmod  lock,max wc     ' if we set the lock then C will be set
 if_nc  incmod  lock,max        ' we did not set the lock, so restore it
        ret

' clr_lock : we must clear the lock to allow others to set it
clr_lock
 _ret_  incmod  lock,max        ' release the lock

' lock variables :
lock    long    0               ' lock must initially be zero
max     long    10000           ' must be larger than max number of threads

Can anyone see any problems, or improve on this?
Thanks!
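For anyone wanting to check the DECMOD/INCMOD routine above on a PC, here is a small C model of it. It assumes the usual reading of the two instructions (DECMOD wraps from 0 to S and sets C; INCMOD wraps from S to 0 and sets C); the helper names and LOCK_MAX are just for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_MAX 10000u   /* mirrors the "max" long in the PASM listing */

/* DECMOD D,S WC : if D == 0 then D = S and C = 1, else D = D - 1 and C = 0 */
static bool decmod(uint32_t *d, uint32_t s)
{
    if (*d == 0) {
        *d = s;
        return true;
    }
    *d -= 1;
    return false;
}

/* INCMOD D,S WC : if D == S then D = 0 and C = 1, else D = D + 1 and C = 0 */
static bool incmod(uint32_t *d, uint32_t s)
{
    if (*d == s) {
        *d = 0;
        return true;
    }
    *d += 1;
    return false;
}

/* set_lock : returns true (C set) if the lock was free and is now ours */
static bool set_lock(uint32_t *lock)
{
    assert(*lock <= LOCK_MAX);
    bool c = decmod(lock, LOCK_MAX);   /* 0 -> max means we got the lock   */
    if (!c) {
        incmod(lock, LOCK_MAX);        /* undo the decrement on failure    */
    }
    return c;
}

/* clr_lock : max -> 0 releases the lock */
static void clr_lock(uint32_t *lock)
{
    incmod(lock, LOCK_MAX);
}
```

In this model the lock register only ever holds 0 (free) or max (taken) between calls, since a failed attempt puts back what it took.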
EDIT: Here's an example using BITH and BITL (limited to 32 locks):
Yes, your bith/bitl solution looks better than mine!
No need to CALL it, even. Just put the instruction wherever it's needed.
And again, if anyone can see something wrong or has an improvement, all suggestions welcome!
' Simulating P1-style locks on the P2 ...

' set_lock : return with C=1 and Z=0 (i.e. C_AND_NZ) if we get the lock.
'            note we must get both inter-cog and intra-cog locks.
set_lock
                bith    lock,#31 wcz    ' can we get intra-cog lock?
 if_nz          locktry lock wc         ' Z=0 means yes - can we get inter-cog lock?
 if_nz_and_nc   bitl    lock,#31        ' C=0 means no - release intra-cog lock
                ret

' clr_lock : release both locks.
clr_lock
                lockrel lock            ' release inter-cog lock
                bitl    lock,#31        ' release intra-cog lock
                ret

' lock : bits 3:0 hold the number of the inter-cog lock,
'        while bit 31 is the actual intra-cog lock
lock            long    0

EDIT: Oops! Must use wcz with bith. Why?
Plus, there are logical flag operators for TESTB/TESTBN:
TESTB   D,{#}S   WC/WZ
TESTBN  D,{#}S   WC/WZ
TESTB   D,{#}S   ANDC/ANDZ
TESTBN  D,{#}S   ANDC/ANDZ
TESTB   D,{#}S   ORC/ORZ
TESTBN  D,{#}S   ORC/ORZ
TESTB   D,{#}S   XORC/XORZ
TESTBN  D,{#}S   XORC/XORZ

BITL    D,{#}S   {WCZ}
BITH    D,{#}S   {WCZ}
BITC    D,{#}S   {WCZ}
BITNC   D,{#}S   {WCZ}
BITZ    D,{#}S   {WCZ}
BITNZ   D,{#}S   {WCZ}
BITRND  D,{#}S   {WCZ}
BITNOT  D,{#}S   {WCZ}

I think it should be "if_nz" ... because BITH returns the prior state, not the change of state. C/Z comes back low for a successful try.
Yes, you are correct. Amended.
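The amended two-level scheme (intra-cog bit first, then the inter-cog hub lock) can be modelled in C like this. It is a sketch only: BITH is modelled as "set the bit and return its prior state" as discussed above, the hub lock uses the owner-cog behaviour reported at the top of this thread (an assumption, not documented fact), and the struct and function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hub lock model (ASSUMPTION): succeeds if free or owned by the caller. */
typedef struct { bool taken; int owner; } hub_lock_t;

static bool hub_locktry(hub_lock_t *l, int cog)
{
    if (!l->taken) {
        l->taken = true;
        l->owner = cog;
        return true;
    }
    return l->owner == cog;
}

static void hub_lockrel(hub_lock_t *l, int cog)
{
    if (l->taken && l->owner == cog) {
        l->taken = false;
    }
}

/* BITH D,#bit WCZ : set the bit and return its PRIOR state */
static bool bith(uint32_t *d, unsigned bit)
{
    assert(bit < 32);
    bool prior = ((*d >> bit) & 1u) != 0;
    *d |= (uint32_t)1 << bit;
    return prior;
}

/* BITL D,#bit : clear the bit */
static void bitl(uint32_t *d, unsigned bit)
{
    assert(bit < 32);
    *d &= ~((uint32_t)1 << bit);
}

/* One of these per cog: bit 31 of reg is the intra-cog lock. */
typedef struct { uint32_t reg; int cog; } cog_lock_t;

/* set_lock : true only if we hold BOTH the intra-cog and inter-cog locks */
static bool set_lock(cog_lock_t *c, hub_lock_t *hub)
{
    if (bith(&c->reg, 31)) {
        return false;               /* another thread on this cog has it   */
    }
    if (hub_locktry(hub, c->cog)) {
        return true;                /* got both locks                      */
    }
    bitl(&c->reg, 31);              /* another cog has it - back off       */
    return false;
}

/* clr_lock : release the inter-cog lock, then the intra-cog lock */
static void clr_lock(cog_lock_t *c, hub_lock_t *hub)
{
    hub_lockrel(hub, c->cog);
    bitl(&c->reg, 31);
}
```

Checking the prior state of bit 31 is what stops a second thread on the same cog, even though the hub lock alone would have let it through.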
Here is a more sophisticated multi-threading demo - this program runs 5 multi-threaded kernel cogs (4 started dynamically) and then 50 threads. The threads wander around between the kernel cogs, moving themselves from cog to cog randomly. As usual, this program is compiled for the P2 EVAL board, serial interface, 230400 baud.
/***************************************************************************\
 *                                                                         *
 *                          Thread Affinity Demo                           *
 *                                                                         *
 *             Demonstrates changing the affinity of a thread              *
 *                                                                         *
 *     (i.e. moving threads between kernels running on different cogs)     *
 *                                                                         *
\***************************************************************************/

/*
 * include Catalina multi-threading functions:
 */
#include <catalina_threads.h>

/*
 * include some useful multi-threading utility functions:
 */
#include <thread_utilities.h>

/*
 * define how many additional kernel cogs we want (note: there must
 * be this many free cogs available!):
 */
#define NUM_KERNELS 4

/*
 * define how many threads we want per kernel:
 */
#define NUM_THREADS 10

/*
 * define how many thread locks we want (we only really need 1):
 */
#define NUM_LOCKS 1

/*
 * define the stack size for each kernel cog and each thread:
 */
#define STACK_SIZE (MIN_THREAD_STACK_SIZE + 100)

/*
 * global variables that all multi-threaded cogs will share ...
 */

/*
 * flag to tell all kernels to start their threads:
 */
static int start_threads;

/*
 * flag to tell all threads to start switching between kernels:
 */
static int start_switching;

/*
 * a lock to use to avoid kernel contention (all kernels must use
 * the same lock for this purpose)
 */
static int kernel_lock;

/*
 * a pool of thread locks - note that the pool must be 5 bytes larger than
 * the actual number of locks required (MIN_THREAD_POOL_SIZE = 5):
 */
static char pool[MIN_THREAD_POOL_SIZE + NUM_LOCKS];

/*
 * The particular thread lock (out of the pool above) that we will use to
 * protect our HMI functions:
 */
static int hmi_lock;

/*
 * cogs running multithreading kernels notify the threads that they are
 * available by putting a 1 in this array:
 */
static int kernel[8] = { 0 };

/*
 * thread_function : this function can be started as a thread. It runs on the
 *                   cog it is started on for a while, then moves itself to
 *                   the next available cog (cogs running multi-threading
 *                   kernels are indicated by the value 1 in the kernel array).
 */
int thread_function(int argc, char *argv[]) {
   void *me = _thread_id();
   int old_cog;
   int new_cog;

   // get our initial cog
   old_cog = _cogid();

   // print where we were started
   _thread_printf(pool, hmi_lock, "Thread %d (%s) started on cog %d\n",
                  argc, argv[0], old_cog);

   // wait until we are told to start switching
   while (!start_switching) {
      _thread_yield();
   }

   while (1) {
      // wait a random time (to mix things up a little, but
      // not go so fast that we can't read the messages!)
      _thread_wait(200*random(5));

      // get our current cog
      old_cog = _cogid();

      // find the next available multi-threading kernel
      new_cog = old_cog;
      do {
         new_cog = (new_cog + 1) % 8;
      } while (kernel[new_cog] == 0);

      // 50% of the time, move ourselves to the new kernel
      if (random(100) > 50) {
         _thread_affinity_change (me, new_cog);
      }

      // get our new cog
      new_cog = _cogid();

      // print a message if we moved
      if ((new_cog != old_cog)) {
         _thread_printf(pool, hmi_lock,
                        "Thread %d (%s) moved from cog %d to cog %d\n",
                        argc, argv[0], old_cog, new_cog);
      }
   }
   return 0;
}

/*
 * cog_function : this function will be run as the first thread of a new
 *                multi-threading kernel on a new cog. This function will
 *                then start NUM_THREADS threads, which will wander between
 *                all the available multi-threading kernels.
 */
int cog_function(int argc, char *argv[]) {
   int cog = _cogid();
   void *me = _thread_id();
   void *thread;
   char *message[1] = {"g'day!"};
   int i;

   // stack space for threads
   unsigned long thread_stack[STACK_SIZE * NUM_THREADS];

   // set the lock of this kernel (all kernels must use the same lock, and
   // this must be set up before any other thread functions are called)
   _thread_set_lock(kernel_lock);

   // announce ourselves
   _thread_printf(pool, hmi_lock,
                  "Multi-threading kernel (%s) started on cog %d\n",
                  argv[0], cog);

   // indicate we are available to run threads
   kernel[cog] = 1;

   // wait until we are told to start the threads
   while (!start_threads) {
      _thread_yield();
   }

   // start some threads that will wander between the kernels
   for (i = 0; i < NUM_THREADS; i++) {
      thread = _thread_start(&thread_function,
                             &thread_stack[STACK_SIZE * (i + 1)],
                             (cog+1)*NUM_THREADS + i, message);
      if (thread == 0) {
         _thread_printf(pool, hmi_lock, "Failed to start thread\n");
      }
   }

   // now wait forever - this thread does not actually do anything
   // except give the multi-threading kernel something to execute
   // when it is not executing any other threads. It could perform
   // other tasks if required.
   while (1) {
      _thread_yield();
   }
   return 0;
}

/*
 * main : Start NUM_KERNELS additional kernels, and then start NUM_THREADS
 *        threads that will switch between them. Each kernel will also start
 *        NUM_THREADS threads of their own.
 */
int main(int argc, char *argv[]) {
   int i;
   int cog;
   void *thread;
   char *message[1] = {"hello!"};

   // stack space for kernels and threads
   unsigned long kernel_stack[NUM_KERNELS * (STACK_SIZE * NUM_THREADS + 100)];
   unsigned long thread_stack[STACK_SIZE * NUM_THREADS];

   // assign a lock to be used to avoid kernel contention
   kernel_lock = _locknew();

   // set the lock of this kernel (all kernels must use the same lock, and
   // this must be set up before any other thread functions are called)
   _thread_set_lock(kernel_lock);

   // initialize a pool of thread locks
   _thread_init_lock_pool (pool, NUM_LOCKS, _locknew());

   // assign a thread lock to avoid plugin contention
   hmi_lock = _thread_locknew(pool);

   // a delay here is used to introduce some randomness
   _thread_printf(pool, hmi_lock, "\nPress a key to start kernels\n");
   k_wait();
   randomize();

   // start additional multi-threading kernels
   for (i = 0; i < NUM_KERNELS; i++) {
      cog = _thread_cog(&cog_function,
                        &kernel_stack[(STACK_SIZE*NUM_THREADS + 100)*(i + 1)],
                        i, message);
      if (cog < 0) {
         _thread_printf(pool, hmi_lock, "Failed to start kernel\n");
      }
   }

   // announce ourselves
   cog = _cogid();
   _thread_printf(pool, hmi_lock,
                  "Multi-threading kernel also running on cog %d\n", cog);

   // declare ourselves available to run threads
   kernel[cog] = 1;

   _thread_wait(500);

   // now start the threads on all the kernels
   _thread_printf(pool, hmi_lock, "\nPress a key to start all threads\n");
   k_wait();
   start_threads = 1;

   // start some threads of our own that will wander between the kernels
   for (i = 0; i < NUM_THREADS; i++) {
      thread = _thread_start(&thread_function,
                             &thread_stack[STACK_SIZE * (i + 1)],
                             (cog+1)*NUM_THREADS + i, message);
      if (thread == 0) {
         _thread_printf(pool, hmi_lock, "Failed to start thread\n");
      }
   }

   _thread_wait(500);

   // now allow all the threads to switch between kernels
   _thread_printf(pool, hmi_lock, "\nPress a key to start thread switching\n");
   k_wait();
   start_switching = 1;

   // now wait forever - this thread does not actually do anything
   // except give the multi-threading kernel something to execute
   // when it is not executing any other threads. It could perform
   // other tasks if required.
   while (1) {
      _thread_yield();
   }
   return 0;
}

The multi-threading support will be part of the next release of Catalina.
Possibly. I'll be able to answer that question better once I have completed the thread support for the new "native" mode ... because there is no kernel in this mode!
The demo program was compiled in "compact" mode, so no pasm. Wait till I finish the other modes (compact mode is always the first one I work on, because it is the easiest).