Single Cog Multi Threading Made Easy (assembler)
pjv
Posts: 1,903
Hi All;
Here is a small 25-instruction Scheduler Kernel that allows you to easily implement up to a dozen or so independently timed threads in a single assembler Cog. That means you can write multiple applications for a single Cog as if they were operating by themselves, without much timing concern for each other. There are some limits; however, 64 independent simple LED flashers have been demonstrated, as has a 19,200 baud UART running simultaneously with a 9600 baud UART, both full duplex with simultaneous transmit and receive. Demo code for this as well as 8 LED flashers is attached. It is a trivial matter to combine the 8 flashers with the UART application and a repeating trigger to run 13 threads, all independently, and in a single Cog.
The basic concept here is to not permit any thread to hog the processor and hold it in a WAIT loop. Make the threads all co-operate with each other (and this is essential): have each thread execute only a few instructions, say a dozen or so, and then at programmatically convenient spots jump back to the Scheduler kernel. In this manner all threads get interleaved, while the timing and pointer handling details for all threads are handled by the kernel.
For example, a simple LED flasher thread could look like:
LED1          or      outa,#pin1              'turn led1 on
              jmpret  VLED1,#NextThread       'save thread1 return address and return to kernel to reload timer and do next thread
              andn    outa,#pin1              'turn led1 off
              jmpret  VLED1,#NextThread       'save thread1 return address and return to kernel to reload timer and do next thread
              jmp     #LED1                   'jump back to beginning of thread 1
And two LED flashers could be:
LED1          or      outa,#pin1              'turn led1 on
              jmpret  VLED1,#NextThread       'save thread1 return address and return to kernel to reload timer and do next thread
              andn    outa,#pin1              'turn led1 off
              jmpret  VLED1,#NextThread       'save thread1 return address and return to kernel to reload timer and do next thread
              jmp     #LED1                   'jump back to beginning of thread 1

LED2          or      outa,#pin2              'turn led2 on
              jmpret  VLED2,#NextThread       'save thread2 return address and return to kernel to reload timer and do next thread
              andn    outa,#pin2              'turn led2 off
              jmpret  VLED2,#NextThread       'save thread2 return address and return to kernel to reload timer and do next thread
              jmp     #LED2                   'jump back to beginning of thread 2
And so on.
Each thread requires two vectors. The first holds the thread's timer reload value (in ticks) in bits 31..9 plus the current jump address for the thread in bits 8..0. The second vector holds the current value of the thread's count-down timer as it is ticked down by the scheduler kernel. The vectors for all threads must be in one single contiguous space, and are typically preloaded at compile time with LONG values.
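As a rough illustration of that first-vector layout, here is a small Python model (not Propeller code) that packs a reload value and a resume address into one 32-bit long and splits them again. The function names and the example address $040 are made up for this sketch; only the bit positions come from the description above.

```python
ADDR_BITS = 9                        # bits 8..0 hold the resume (jump) address
ADDR_MASK = (1 << ADDR_BITS) - 1     # $1FF
RELOAD_MAX = (1 << 23) - 1           # bits 31..9 leave 23 bits for the reload value


def pack_vector(reload_ticks, resume_addr):
    """Build a first vector: reload ticks in bits 31..9, address in bits 8..0."""
    if not 0 <= reload_ticks <= RELOAD_MAX:
        raise ValueError("reload value does not fit in 23 bits")
    if not 0 <= resume_addr <= ADDR_MASK:
        raise ValueError("resume address does not fit in 9 bits")
    return (reload_ticks << ADDR_BITS) | resume_addr


def unpack_vector(vector):
    """Split a first vector back into (reload_ticks, resume_addr)."""
    return vector >> ADDR_BITS, vector & ADDR_MASK


# Hypothetical thread: reload every 100_000 ticks, code starts at cog address $040
first_vector = pack_vector(100_000, 0x040)
print(hex(first_vector))
print(unpack_vector(first_vector))
```

The 9-bit address field is enough to reach any cog register, which is presumably why the resume address can share a long with the reload value.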
The Scheduler operates on a tick basis. The tick may be set to some number of clocks representing 2 to 10 microseconds or so, and this time is the basis and granularity for all tick timing.
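To put some numbers on the tick, the following Python sketch works through the arithmetic. It assumes an 80 MHz system clock (a 5 MHz crystal with the 16x PLL) and the 23-bit reload field described above; the helper names are hypothetical.

```python
CLOCK_HZ = 80_000_000      # assumption: 5 MHz crystal x 16 PLL
RELOAD_BITS = 23           # reload value occupies bits 31..9 of the first vector


def tick_clocks(tick_us):
    """System clocks in one scheduler tick of tick_us microseconds."""
    return tick_us * CLOCK_HZ // 1_000_000


def ticks_for(delay_s, tick_us):
    """Reload value (in ticks) that gives a delay of delay_s seconds."""
    return round(delay_s * 1_000_000 / tick_us)


def max_delay_s(tick_us):
    """Longest delay a 23-bit reload value can express, in seconds."""
    return ((1 << RELOAD_BITS) - 1) * tick_us / 1_000_000


print(tick_clocks(5))                # clocks per 5 us tick
print(ticks_for(0.5, 5))             # reload value for a half-second LED toggle
print(f"{max_delay_s(5):.1f} s")     # longest single delay with a 5 us tick
```

With a 5 microsecond tick this gives 400 clocks per tick, a reload value of 100,000 for a half-second delay, and a longest single delay of roughly 42 seconds.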
The kernel operates in the following manner:
- It is initialized to point to thread1 to start operating.
- It waits for a tick.
- The current thread(1)'s down counter is ticked, and tested for zero... if not zero, the kernel points to the next thread(2) and repeats the process.
- On finally reaching zero, the thread(1) resume jump address is extracted from the first vector, and the jump is made to the thread(1) application code.
- The thread executes some (for the simple LEDs only two) instructions, and returns to the kernel.
- The jmpret instruction saves the next address as the continuation address for the thread the next time the thread's counter reaches zero.
- The kernel also extracts the timer reload value from the first vector, and loads that into the thread's second vector to start the timing process for thread(1) all over.
- As each thread is ticked down in this interleaved manner, each thread is continued from the point where it returned to the kernel.
- When all threads have been serviced with the current tick, the kernel is pointed back to the first thread, and does a WAIT for the next tick.
- This is the ONLY wait in the system; no others are permitted.
- Any thread may be suspended from operating. This is indicated by its count-down vector containing a value of zero.
- Threads that need to run only once can effect this by returning to the scheduler kernel at the special location named NEXTTHREAD, where their counter is NOT reloaded and hence remains at zero.
- To trigger any suspended thread, simply have another, supervising thread set the suspended thread's count-down vector to a non-zero value. The suspended thread will then re-commence operation after the specified number of ticks.
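The kernel steps above can be modelled in a few lines of Python. This is only a behavioural sketch, not the assembler kernel: a Python generator stands in for the resume address that jmpret saves, and the names RELOAD, NO_RELOAD, Thread and run_kernel are inventions for the example.

```python
RELOAD, NO_RELOAD = True, False   # how a thread hands control back to the kernel


class Thread:
    def __init__(self, name, reload_ticks, body, countdown=1):
        self.name = name
        self.reload_ticks = reload_ticks   # first vector, bits 31..9
        self.resume = body()               # generator plays the role of the resume address
        self.countdown = countdown         # second vector; zero means suspended


def run_kernel(threads, n_ticks):
    """Service every thread once per tick; return a list of (tick, name) dispatches."""
    dispatched = []
    for tick in range(1, n_ticks + 1):     # each pass stands for one wait-for-tick
        for t in threads:
            if t.countdown == 0:           # suspended: the model simply skips it
                continue
            t.countdown -= 1
            if t.countdown > 0:            # not due yet, move on to the next thread
                continue
            dispatched.append((tick, t.name))
            if next(t.resume) is RELOAD:   # run the thread up to its next hand-back
                t.countdown = t.reload_ticks
    return dispatched


pins = {"led1": 0, "led2": 0, "once": 0}


def flasher(pin):
    def body():
        while True:
            pins[pin] = 1
            yield RELOAD                   # return to kernel, timer reloaded
            pins[pin] = 0
            yield RELOAD
    return body


def one_shot():
    while True:
        pins["once"] += 1
        yield NO_RELOAD                    # counter stays at zero: thread suspends itself


threads = [
    Thread("led1", 2, flasher("led1")),
    Thread("led2", 3, flasher("led2")),
    Thread("once", 0, one_shot, countdown=2),
]
log = run_kernel(threads, 7)
print(log)
```

In this run the two flashers are dispatched every 2 and 3 ticks respectively, while the one-shot thread runs once at tick 2 and then stays suspended because its counter is never reloaded.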
If desired, any thread could vary the counting load values or also the current count values of any other thread's vectors, including suspending a thread by setting its counter to zero.
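A minimal Python sketch of what such a supervising thread would be doing to another thread's vector pair, assuming the bit layout described earlier (reload in bits 31..9, resume address in bits 8..0). The set_reload helper and the example values are hypothetical.

```python
ADDR_MASK = 0x1FF          # bits 8..0: resume address, must be preserved
MASK32 = 0xFFFF_FFFF


def set_reload(first_vector, ticks):
    """Replace the reload field (bits 31..9) and keep the resume address."""
    return ((ticks << 9) | (first_vector & ADDR_MASK)) & MASK32


# A vector pair modelled as [first, second]: reload 1000 ticks, address $055, 250 ticks left
pair = [(1000 << 9) | 0x055, 250]

pair[0] = set_reload(pair[0], 2000)   # slow the thread down to every 2000 ticks
pair[1] = 0                           # suspend it: a zero count-down never fires
suspended = pair[1] == 0
pair[1] = 10                          # trigger it again, 10 ticks from now

print(pair[0] >> 9, hex(pair[0] & ADDR_MASK), pair[1])
```

The point of masking is that the resume address shares a long with the reload value, so a supervisor that rewrites the reload field must leave bits 8..0 alone or the target thread would resume at the wrong place.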
Lots of possibilities with so little time to explain.
Perhaps some more demos in the next few days... please indicate if this stuff is of interest; I hate to be wasting my time!
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

PUB Main
  cognew(@Scheduler, 0)

DAT
Scheduler     org     0

'Scheduler Routine =================================================================
'uses two vectors per thread.... all CONTIGUOUS
'first holds tick reload value in bits 31..9, and the thread's jump (resume) address in bits 8..0
'second of pair holds the thread's schedule tick down counter
'The whole idea is to not let any thread loop in a continuous fashion... always return to the scheduler kernel.
'It will then search through all the threads' timers and dispatch the next one to run at the appropriate time.

CON
  Pin0       = 1 <<0                                    '
  Margin     = 12                                       'headroom value
  TickClocks = 400                                      'set tick time to 5 usec at 80 MHz
  TickuS     = TickClocks / 80                          '80 clocks per uSec

DAT
              mov     Temp,#0
              sub     Temp,#8
              ror     Temp,#3                           'bits 31,30,29 input all others output
              mov     dira,Temp
              mov     CntTarget,cnt                     'match counter
              jmp     #AllThreads                       'initialize to first thread

RestoreSusp   addx    0-0,#0                            'restore underflow on suspended thread
NextThread    add     CountDown,SegDPlus2               'access next thread vector pair
              add     RestoreSusp,SegDPlus2             '
              djnz    ThreadCtr,#CountDown              'downcount
AllThreads    mov     ThreadCtr,#(LastThread -FirstThread)/2  'how many threads... cannot be zero
              movd    RestoreSusp,#FirstThread+1        'init to first thread's downcounter
              movd    CountDown,#FirstThread+1          'init to first thread's downcounter
WaitForTick   add     CntTarget,#TickClocks -Margin     'add one tick of clocks minus check margin
              sub     CntTarget,cnt             nr,wc   'compare to current value
              add     CntTarget,#Margin                 'reverse margin
              or      outa,#Pin0                        'show tick and free time
        if_nc waitcnt CntTarget,#0                      'wait till current catches up
              andn    outa,#Pin0                        'end of free time pulse
CountDown     djnz    0-0,#RestoreSusp          wc      'count down and monitor underflow
              mov     Temp,CountDown                    'copy current instruction
              shr     Temp,#9                           'expose thread downcounter address
              movd    SetTime,Temp                      'merge into required thread
              sub     Temp,#1                           'calc thread repeat load address
              movs    Jumper,Temp                       'merge into jumper
              movs    ThreadRet,Temp                    'and into load instruction
Jumper        jmp     0-0                               'jump to selected thread
ThreadRet     mov     Temp,0-0                          'get tick count for resuming thread
              shr     Temp,#9                           'get rid of jump address
SetTime       add     0-0,Temp                          'load thread's resume down count timer into first of pair
              jmp     #NextThread                       'calc next thread address

Temp          long    0                                 '
ThreadCtr     long    0                                 'how many threads
CntTarget     long    0                                 'counter target for next tick
SegDPlus2     long    $400                              'to add 2 to the Dest segment

'APPLICATION PROGRAM ============================================================================

'Thread Timing and Jump Vectors ================================================================
'This section is the assignment of thread vector contiguous pairs, one pair per thread.
'Thread vector pairs MUST be contiguous.
'Each first vector holds the resume delay tick count (repeat) loading value in bits (b31..b9), and the thread's resume address in bits (b8..b0).
'    RRRRRRRR_RRRRRRRR_RRRRRRR__AAAAAAAAA
'Each second vector is the thread's delay downcounting (by ticks) timer.
'The delay downcounter holds the number of ticks remaining until the thread (re)activates.
'A thread is triggered when its delay downcounter reaches zero.
'A thread self suspends when its delay downcounter is not reloaded, or loaded with zero.
'A thread is indefinitely suspended while its delay downcounter is zero.

DAT
FirstThread                                             'simply a place holder before the first thread vector pair.
LastThread                                              'simply a place holder after the last thread vector pair.
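Reading the listing, a suspended thread's zero counter appears to survive each tick through the djnz/addx pair: djnz with wc wraps a zero counter to $FFFFFFFF and sets the carry on the borrow, and the addx at RestoreSusp then adds that carry back, wrapping the counter to zero again. The Python model below is my interpretation of that mechanism, with made-up function names; it is not taken from the attached demos.

```python
MASK32 = 0xFFFF_FFFF


def djnz_wc(value):
    """Decrement a 32-bit value; carry is 1 on borrow, i.e. when value was 0."""
    return (value - 1) & MASK32, int(value == 0)


def addx_zero(value, carry):
    """Model of 'addx value,#0': adds only the carry flag, with 32-bit wrap."""
    return (value + carry) & MASK32


def tick_counter(value):
    """One kernel tick for one count-down vector. Returns (new_value, fires)."""
    value, carry = djnz_wc(value)
    if value == 0:
        return 0, True                       # djnz falls through: dispatch the thread
    return addx_zero(value, carry), False    # RestoreSusp path


counter = 0                                  # suspended thread
for _ in range(3):
    counter, fires = tick_counter(counter)   # wraps to $FFFFFFFF, then back to 0
print(counter, fires)

counter = 3                                  # supervisor arms the thread
results = []
for _ in range(3):
    counter, fires = tick_counter(counter)
    results.append(fires)
print(results)
```

For a running thread the carry is clear, so the addx changes nothing and the counter just counts down; only the zero case takes the wrap-and-restore round trip.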
The attached demos include the initial vector values at compile time and are ready to try... have fun.
Post Edit.... finally figured out how to add some screen shots of the simultaneous threads
Cheers, Peter (pjv)
Post Edited (pjv) : 3/7/2010 7:32:57 PM GMT
Comments
Ray
I like what you're doing with your scheduler and see utility in it for the given COG constraints.
Curious how hard would it be to extend the idea to an overlay or LMM engine approach for bigger applications?
Either would be slower, but being able to run larger applications in N threads (more than 8) could be very useful.
Thanks,
--Steve
Assembler is really very easy; it's Spin with all those cryptic symbols that gives me nightmares. I find it very hard to read. As for what else is up my sleeve; well, perhaps some surprises.... I'm targeting UPEW to showcase a bunch of stuff.... lots of development going on.
@ Steve;
I have not delved into the LMM as yet, so I can't comment. But since all threads are independent, you could probably run larger programs in segments across numerous Cogs. Perhaps some small handler could give clean yet effective thread interaction via hub memory semaphores. I bet that 256 threads across 8 Cogs would be quite doable!
Hmmmmm, need to think on that some.
Cheers,
Peter (pjv)