Single Cog Multi Threading Made Easy (assembler)
Hi All;
Here is a small 25-instruction Scheduler Kernel that lets you easily implement up to a dozen or so independently timed threads in a single assembler Cog. That means you can write multiple applications for a single Cog as if each were running by itself, without much timing concern for the others. There are some limits; however, 64 independent simple LED flashers have been demonstrated, as has a 19,200 baud UART running simultaneously with a 9600 baud UART, both full duplex (simultaneous transmit and receive). Demo code for this, as well as for 8 LED flashers, is attached. It is a trivial matter to combine the 8 flashers with the UART application and a repeating trigger to run 13 threads, all independently and in a single Cog.
The basic concept here is to not permit any thread to hog the processor by holding it in a WAIT loop. The threads must all co-operate with each other (this is essential): each thread executes only a few instructions, say a dozen or so, and then, at programmatically convenient spots, jumps back to the Scheduler kernel. In this manner all threads get interleaved, while the timing and pointer-handling details for all threads are handled by the kernel.
For example, a simple LED flasher thread could look like:
LED1          or      outa,#pin1              'turn led1 on
              jmpret  VLED1,#ThreadRet        'save thread1 resume address and return to kernel to reload timer and do next thread
              andn    outa,#pin1              'turn led1 off
              jmpret  VLED1,#ThreadRet        'save thread1 resume address and return to kernel to reload timer and do next thread
              jmp     #LED1                   'jump back to beginning of thread 1
And two LED flashers could be:
LED1          or      outa,#pin1              'turn led1 on
              jmpret  VLED1,#ThreadRet        'save thread1 resume address and return to kernel to reload timer and do next thread
              andn    outa,#pin1              'turn led1 off
              jmpret  VLED1,#ThreadRet        'save thread1 resume address and return to kernel to reload timer and do next thread
              jmp     #LED1                   'jump back to beginning of thread 1
LED2          or      outa,#pin2              'turn led2 on
              jmpret  VLED2,#ThreadRet        'save thread2 resume address and return to kernel to reload timer and do next thread
              andn    outa,#pin2              'turn led2 off
              jmpret  VLED2,#ThreadRet        'save thread2 resume address and return to kernel to reload timer and do next thread
              jmp     #LED2                   'jump back to beginning of thread 2
And so on.
Each thread requires two vectors. The first holds the thread's timer reload value (in ticks) in bits 31..9, plus the thread's current jump address in bits 8..0. The second holds the current value of the thread's count-down timer as it is ticked down by the scheduler kernel. The vectors for all threads must occupy one single contiguous block, and are typically preloaded at compile time with LONG values.
The Scheduler operates on a tick basis. The tick may be set to some number of clocks representing 2 to 10 microseconds or so, and this time is the basis and granularity for all tick timing.
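As a rough sketch of what a pair of such vectors might look like for the two LED flashers (this is not copied from the attached demos; the 250 ms and 150 ms periods and the initial count of 1 are assumptions chosen purely for illustration, and it assumes the assembler resolves a bare label such as LED1 inside a LONG expression to its cog address):

```spin
DAT
FirstThread                                                 'place holder before the first vector pair
VLED1         long    ((250_000 / TickuS) << 9) + LED1      'reload = 50_000 ticks (250 ms at 5 us per tick) in b31..b9, resume address LED1 in b8..b0
              long    1                                     'down counter: thread 1 is dispatched on the first tick
VLED2         long    ((150_000 / TickuS) << 9) + LED2      'reload = 30_000 ticks (150 ms), resume address LED2
              long    1                                     'down counter: thread 2 is dispatched on the first tick
LastThread                                                  'place holder after the last vector pair
```

With only 23 bits available for the reload value, the longest delay between dispatches is 8_388_607 ticks. A down counter preloaded with zero would leave that thread suspended from the start.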
The kernel operates in the following manner:
- It is initialized to point to thread1 to start operating.
- It waits for a tick.
- The current thread(1)'s down counter is ticked, and tested for zero... if not zero, the kernel points to the next thread(2) and repeats the process.
- On finally reaching zero, the thread(1) resume jump address is extracted from the first vector, and the jump is made to the thread(1) application code.
- The thread executes some (for the simple LEDs only two) instructions, and returns to the kernel.
- The jmpret instruction saves the next address as the continuation address for the thread, to be used the next time the thread's counter reaches zero.
- The kernel also extracts the timer reload value from the first vector, and loads that into the thread's second vector to start the timing process for thread(1) all over.
- As each thread is ticked down in this interleaved manner, each thread is continued from the point where it returned to the kernel.
- When all threads have been serviced with the current tick, the kernel is pointed back to the first thread, and does a WAIT for the next tick.
- This is the ONLY wait in the system, no others are permitted.
- Any thread may be suspended from operating. This is indicated by its count-down vector containing a value of zero.
- Threads that need to run only once can effect this by returning to the scheduler kernel at the special location named NextThread, where their counter is NOT reloaded and hence remains at zero.
- To trigger any suspended thread, simply have another, supervising thread set the suspended thread's count-down vector to a non-zero value. The suspended thread will then recommence operation in the specified number of ticks.
If desired, any thread can vary the reload values, or the current count values, of any other thread's vectors, including suspending a thread by setting its counter to zero.
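The suspend and trigger mechanism might be used along these lines. This is only a sketch under stated assumptions: the thread names OneShot and Super, their vectors VOneShot and VSuper, the 10-tick delay and the 20_000-tick supervisor period are all hypothetical and not taken from the attached demos. ThreadRet and NextThread are the kernel entry points from the listing, and pin2 is the same pin mask used by the second LED flasher.

```spin
'One-shot thread: pulses pin2 once each time it is triggered, then suspends itself again.
OneShot       or      outa,#pin2               'pulse on
              andn    outa,#pin2               'pulse off
              jmpret  VOneShot,#NextThread     'yield WITHOUT reload: down counter stays at zero, thread is suspended
              jmp     #OneShot                 'resumes here on the next trigger

'Supervisor thread: each time its own timer expires it re-triggers the one-shot.
Super         mov     VOneShot+1,#10           'write the one-shot's down counter: it runs about 10 ticks from now
              jmpret  VSuper,#ThreadRet        'yield WITH reload of the supervisor's own timer
              jmp     #Super                   'resumes here on the next expiry

'Vector pairs for the two threads (these belong inside the contiguous vector block).
VOneShot      long    OneShot                  'reload value 0 (never used), resume address OneShot
              long    0                        'down counter 0: starts out suspended
VSuper        long    (20_000 << 9) + Super    'reload = 20_000 ticks, resume address Super
              long    1                        'down counter: supervisor is dispatched on the first tick
```

The only difference between the two threads' yields is the kernel entry point: ThreadRet reloads the down counter from the first vector, NextThread leaves it at zero.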
Lots of possibilities with so little time to explain.
Perhaps some more demos in the next few days... please indicate if this stuff is of interest; I hate to be wasting my time!
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

PUB Main
  cognew(@Scheduler, 0)

DAT
Scheduler     org     0
'Scheduler Routine =================================================================
'uses two vectors per thread.... all CONTIGUOUS
'first holds tick reload value in bits 31..9, and the thread's jump (resume) address in bits 8..0
'second of pair holds the thread's schedule tick down counter
'The whole idea is to not let any thread loop in a continuous fashion... always return to the scheduler kernel.
'It will then search through all the threads' timers and dispatch the next one to run at the appropriate time.
CON
  Pin0       = 1 << 0                'P0 shows tick and free time
  Margin     = 12                    'headroom value
  TickClocks = 400                   'set tick time to 5 usec at 80 MHz
  TickuS     = TickClocks / 80       '80 clocks per uSec
DAT
              mov     Temp,#0
              sub     Temp,#8
              ror     Temp,#3                         'bits 31,30,29 input, all others output
              mov     dira,Temp
              mov     CntTarget,cnt                   'match counter
              jmp     #AllThreads                     'initialize to first thread
RestoreSusp   addx    0-0,#0                          'restore underflow on suspended thread
NextThread    add     CountDown,SegDPlus2             'access next thread vector pair
              add     RestoreSusp,SegDPlus2           '
              djnz    ThreadCtr,#CountDown            'downcount
AllThreads    mov     ThreadCtr,#(LastThread-FirstThread)/2 'how many threads... cannot be zero
              movd    RestoreSusp,#FirstThread+1      'init to first thread's downcounter
              movd    CountDown,#FirstThread+1        'init to first thread's downcounter
WaitForTick   add     CntTarget,#TickClocks-Margin    'add one tick of clocks minus check margin
              sub     CntTarget,cnt  nr,wc            'compare to current value
              add     CntTarget,#Margin               'reverse margin
              or      outa,#Pin0                      'show tick and free time
        if_nc waitcnt CntTarget,#0                    'wait till current catches up
              andn    outa,#Pin0                      'end of free time pulse
CountDown     djnz    0-0,#RestoreSusp  wc            'count down and monitor underflow
              mov     Temp,CountDown                  'copy current instruction
              shr     Temp,#9                         'expose thread downcounter address
              movd    SetTime,Temp                    'merge into required thread
              sub     Temp,#1                         'calc thread repeat load address
              movs    Jumper,Temp                     'merge into jumper
              movs    ThreadRet,Temp                  'and into load instruction
Jumper        jmp     0-0                             'jump to selected thread
ThreadRet     mov     Temp,0-0                        'get tick count for resuming thread
              shr     Temp,#9                         'get rid of jump address
SetTime       add     0-0,Temp                        'load thread's resume down count timer into second of pair
              jmp     #NextThread                     'calc next thread address
Temp          long    0                               '
ThreadCtr     long    0                               'how many threads
CntTarget     long    0                               'counter target for next tick
SegDPlus2     long    $400                            'to add 2 to the Dest segment
'APPLICATION PROGRAM ============================================================================
'Thread Timing and Jump Vectors ================================================================
'This section is the assignment of thread vector contiguous pairs, one pair per thread.
'Thread vector pairs MUST be contiguous.
'Each first vector holds the resume delay tick count (repeat) loading value in bits (b31..b9), and the thread's resume address in bits (b8..b0).
' RRRRRRRR_RRRRRRRR_RRRRRRR__AAAAAAAAA
'Each second vector is the thread's delay downcounting (by ticks) timer.
'The delay downcounter holds the number of ticks remaining until the thread (re)activates.
'A thread is triggered when its delay downcounter reaches zero.
'A thread self suspends when its delay downcounter is not reloaded, or loaded with zero.
'A thread is indefinitely suspended while its delay downcounter is zero.
DAT
FirstThread 'simply a place holder before the first thread vector pair.
LastThread 'simply a place holder after the last thread vector pair.
The attached demos include the vector initial values at compile time and are ready to try... have fun.
Post Edit: finally figured out how to add some screenshots of the simultaneous threads.
Cheers, Peter (pjv)
Post Edited (pjv) : 3/7/2010 7:32:57 PM GMT


Comments
Ray
I like what you're doing with your scheduler and see utility in it for the given COG constraints.
Curious how hard would it be to extend the idea to an overlay or LMM engine approach for bigger applications?
Either would be slower, but being able to run larger applications in N threads (more than 8) could be very useful.
Thanks,
--Steve
Assembler is really very easy; it's Spin with all those cryptic symbols that gives me nightmares. I find it very hard to read. As for what else is up my sleeve: well, perhaps some surprises... I'm targeting UPEW to showcase a bunch of stuff; lots of development going on.
@ Steve;
I have not delved into the LMM as yet, so I can't comment. But since all threads are independent, you could probably run larger programs in segments across numerous Cogs, perhaps with some small handler to give clean yet effective thread interaction via hub memory semaphores. I bet that 256 threads across 8 Cogs would be quite do-able!
Hmmmmm, need to think on that some.
Cheers,
Peter (pjv)