Single Cog Multi Threading Made Easy (assembler)

pjv · 2010-03-02 05:23

Hi All;

Here is a small 25-instruction scheduler kernel that lets you easily implement up to a dozen or so independently timed threads in a single assembler Cog. That means you can write several applications for one Cog as if each were running by itself, with little concern for the others' timing. There are some limits; however, 64 independent simple LED flashers have been demonstrated, as have two simultaneous full-duplex UARTs (one at 19,200 baud and one at 9,600 baud), each transmitting and receiving at the same time. Demo code for this, as well as for 8 LED flashers, is attached. It is a trivial matter to combine the 8 flashers with the UART application and a repeating trigger to run 13 threads, all independently, in a single Cog.

The basic concept is to never let a thread hog the processor by holding it in a WAIT loop. Instead, the threads must all cooperate with each other. This is essential: each thread executes only a few instructions (say a dozen or so) and then, at programmatically convenient spots, jumps back to the scheduler kernel. In this manner all threads get interleaved, while the kernel handles the timing and pointer details for every thread.

For example, a simple LED flasher thread could look like:

 
LED1    or      outa,#pin1              'turn led1 on
        jmpret  VLED1,#ThreadRet        'save thread1 resume address, return to kernel to reload timer and do next thread
        andn    outa,#pin1              'turn led1 off
        jmpret  VLED1,#ThreadRet        'save thread1 resume address, return to kernel to reload timer and do next thread
        jmp     #LED1                   'jump back to beginning of thread 1

And two LED flashers could be:

LED1    or      outa,#pin1              'turn led1 on
        jmpret  VLED1,#ThreadRet        'save thread1 resume address, return to kernel to reload timer and do next thread
        andn    outa,#pin1              'turn led1 off
        jmpret  VLED1,#ThreadRet        'save thread1 resume address, return to kernel to reload timer and do next thread
        jmp     #LED1                   'jump back to beginning of thread 1
LED2    or      outa,#pin2              'turn led2 on
        jmpret  VLED2,#ThreadRet        'save thread2 resume address, return to kernel to reload timer and do next thread
        andn    outa,#pin2              'turn led2 off
        jmpret  VLED2,#ThreadRet        'save thread2 resume address, return to kernel to reload timer and do next thread
        jmp     #LED2                   'jump back to beginning of thread 2

And so on.

Each thread requires two vectors. The first holds the thread's timer reload value (in ticks) in bits 31..9, plus the thread's current jump (resume) address in bits 8..0. The second holds the current value of the thread's count-down timer as the scheduler kernel ticks it down. The vectors for all threads must be in one contiguous space, and are typically preloaded at compile time with LONG values.
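The layout above works because jmpret writes only the 9-bit source field (bits 8..0) of its destination register, so each time a thread yields, its resume address is updated without disturbing the reload value stored above it. The Python sketch below models that packing. It is an illustration only; the helper names (pack_vector, unpack_vector, jmpret_store) and the thread address $40 are hypothetical.

```python
# Sketch: model of a thread's first vector (hypothetical helper names).
# bits 31..9 = tick reload value, bits 8..0 = cog resume address.
ADDR_BITS = 9
ADDR_MASK = (1 << ADDR_BITS) - 1           # 0x1FF, the 9-bit S field
MAX_RELOAD = (1 << (32 - ADDR_BITS)) - 1   # largest 23-bit reload value


def pack_vector(reload_ticks: int, resume_addr: int) -> int:
    """Build the 32-bit first vector (like a compile-time LONG value)."""
    if not 0 <= reload_ticks <= MAX_RELOAD:
        raise ValueError("reload value must fit in 23 bits")
    if not 0 <= resume_addr <= ADDR_MASK:
        raise ValueError("cog address must fit in 9 bits")
    return (reload_ticks << ADDR_BITS) | resume_addr


def unpack_vector(vector: int) -> tuple[int, int]:
    """Split into (reload_ticks, resume_addr), as the kernel's shr #9 and indirect jmp do."""
    return vector >> ADDR_BITS, vector & ADDR_MASK


def jmpret_store(vector: int, return_addr: int) -> int:
    """Model jmpret: only bits 8..0 are overwritten, so the reload value survives."""
    return (vector & ~ADDR_MASK & 0xFFFFFFFF) | (return_addr & ADDR_MASK)


# Hypothetical thread whose code starts at cog address $40, reloading every 100_000 ticks.
v = pack_vector(100_000, 0x40)
v = jmpret_store(v, 0x42)        # a jmpret at $41 saves $42 as the resume point
print(unpack_vector(v))          # (100000, 66)
```

Because the kernel only ever reads the reload bits and jmpret only ever writes the address bits, the two fields can share one long without any masking code in the thread itself.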

The scheduler operates on a tick basis, which may be set to some number of clocks representing roughly 2 to 10 microseconds; this tick is the basis and granularity for all thread timing.
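As a rough worked example, assuming the listing's 80 MHz clock and TickClocks = 400, one tick is 5 µs, and the 23-bit reload field (bits 31..9) caps a single thread delay at about 42 seconds. One more constraint is visible in the kernel listing: `add CntTarget,#TickClocks -Margin` uses an immediate operand, and PASM immediates are 9 bits (0..511), so as written TickClocks - Margin must stay at or below 511. A longer tick (for example 10 µs, or 800 clocks, at 80 MHz) would need that value held in a register instead. The helper names below are hypothetical.

```python
# Sketch: tick arithmetic, assuming the listing's 80 MHz clock
# (5 MHz crystal x PLL16) and the 23-bit reload field. Names are hypothetical.
CLKFREQ_HZ = 80_000_000
TICK_CLOCKS = 400      # TickClocks in the listing
MARGIN = 12            # Margin in the listing
RELOAD_BITS = 23       # bits 31..9 of the first vector


def tick_us(tick_clocks: int = TICK_CLOCKS, clkfreq: int = CLKFREQ_HZ) -> float:
    """Length of one scheduler tick in microseconds."""
    return tick_clocks * 1_000_000 / clkfreq


def ticks_for(period_us: float, tick_clocks: int = TICK_CLOCKS) -> int:
    """Nearest whole number of ticks for a desired period."""
    return round(period_us / tick_us(tick_clocks))


def max_delay_s(tick_clocks: int = TICK_CLOCKS) -> float:
    """Longest delay a 23-bit reload value can express, in seconds."""
    return ((1 << RELOAD_BITS) - 1) * tick_us(tick_clocks) / 1_000_000


def immediate_ok(tick_clocks: int = TICK_CLOCKS, margin: int = MARGIN) -> bool:
    """True if '#TickClocks -Margin' fits a 9-bit PASM immediate (0..511)."""
    return 0 <= tick_clocks - margin <= 511


print(tick_us())               # 5.0
print(ticks_for(500_000))      # 100000 ticks for a 0.5 s LED half-period
print(ticks_for(1e6 / 9600))   # 21 ticks = 105 us, vs. an ideal 104.17 us bit at 9600 baud
print(immediate_ok(800))       # False: a 10 us tick at 80 MHz needs a register instead
```

The rounding in the 9600 baud line shows why the tick length matters for UART threads: every bit time is quantized to a whole number of ticks.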

The kernel operates in the following manner:

- It is initialized to point to thread(1) to start operating.
- It waits for a tick.
- The current thread(1)'s down counter is ticked and tested for zero; if it is not zero, the kernel points to the next thread(2) and repeats the process.
- On finally reaching zero, the thread(1) resume jump address is extracted from the first vector, and the jump is made to the thread(1) application code.
- The thread executes a few instructions (only two for the simple LEDs) and returns to the kernel.
- The jmpret instruction saves the address following it as the thread's continuation address for the next time the thread's counter reaches zero.
- When the thread returns at ThreadRet, the kernel also extracts the timer reload value from the first vector and loads it into the thread's second vector, starting the timing process for thread(1) all over again.
- As each thread is ticked down in this interleaved manner, it continues from the point where it last returned to the kernel.
- When all threads have been serviced for the current tick, the kernel is pointed back to the first thread and does a WAIT for the next tick.
- This is the ONLY wait in the system; no others are permitted.

- Any thread may be suspended from operating. This is indicated by its count-down vector containing a value of zero.
- Threads that need to run only once can do so by returning to the scheduler kernel at the special location named NextThread, where their counter is NOT reloaded and hence remains at zero.
- To trigger any suspended thread, simply have another, supervising thread set the suspended thread's count-down vector to a nonzero value. The suspended thread will then resume operation after the specified number of ticks.
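To make the dispatch rules in the list concrete, here is a minimal Python model of the kernel loop. It is a sketch under stated assumptions, not the PASM kernel: Python generators stand in for the jmpret resume addresses, yielding RELOAD stands for returning to ThreadRet, yielding ONCE stands for returning to NextThread, and the thread names (led1, pulse, trigger) are hypothetical. The supervising thread re-triggers the suspended one-shot thread by writing a nonzero count into its counter.

```python
# Sketch: a Python model of the scheduler's dispatch rules (not the PASM kernel).
# Generators stand in for jmpret resume addresses; all thread names are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Dict, Generator, List, Tuple

RELOAD = "ThreadRet"   # returning here reloads the counter: the thread repeats
ONCE = "NextThread"    # returning here leaves the counter at 0: the thread suspends


@dataclass
class Thread:
    reload: int                          # tick reload value (first vector, bits 31..9)
    count: int                           # tick down counter (second vector), 0 = suspended
    body: Generator = field(repr=False)  # resume point (first vector, bits 8..0)


class Kernel:
    def __init__(self) -> None:
        self.threads: Dict[str, Thread] = {}   # insertion order = vector pair order
        self.tick = 0
        self.log: List[Tuple[int, str]] = []

    def add(self, name: str, reload: int, body: Callable, start: int = 1) -> None:
        self.threads[name] = Thread(reload, start, body(self))

    def run(self, ticks: int) -> None:
        for _ in range(ticks):
            self.tick += 1                      # WaitForTick: the only wait allowed
            for t in self.threads.values():
                if t.count == 0:
                    continue                    # suspended: counter stays at zero
                t.count -= 1
                if t.count:
                    continue                    # not due yet: go to the next thread
                if next(t.body) == RELOAD:      # run the thread until it yields back
                    t.count = t.reload          # ThreadRet reloads the down counter


def led(name: str) -> Callable:
    def body(k: Kernel):
        while True:
            k.log.append((k.tick, f"{name} on"))
            yield RELOAD
            k.log.append((k.tick, f"{name} off"))
            yield RELOAD
    return body


def one_shot(k: Kernel):
    while True:
        k.log.append((k.tick, "pulse"))
        yield ONCE                              # no reload: suspend until triggered


def supervisor(k: Kernel):
    while True:
        k.threads["pulse"].count = 2            # trigger the suspended thread
        yield RELOAD


k = Kernel()
k.add("led1", reload=3, body=led("led1"))
k.add("pulse", reload=0, body=one_shot, start=0)   # starts suspended
k.add("trigger", reload=5, body=supervisor)
k.run(8)
print(k.log)
# [(1, 'led1 on'), (3, 'pulse'), (4, 'led1 off'), (7, 'led1 on'), (8, 'pulse')]
```

Note that pulse fires on tick 3 rather than tick 2: the supervisor sets its counter to 2 during tick 1, after pulse has already been serviced for that tick, so the count-down starts on the following tick.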

If desired, any thread can change the reload values or the current count values in any other thread's vectors, including suspending a thread by setting its counter to zero.

Lots of possibilities with so little time to explain.

Perhaps some more demos in the next few days... please indicate whether this stuff is of interest; I hate to be wasting my time!

CON
        _clkmode        = xtal1 + pll16x
        _xinfreq        = 5_000_000
PUB  Main
     cognew(@Scheduler, 0)
DAT
Scheduler               org       0
'Scheduler Routine =================================================================
'uses two vectors per thread.... all CONTIGUOUS
'first holds tick reload value in bits 31..9, and the thread's jump (resume) address in bits 8..0
'second of pair holds the thread's schedule tick down counter
'The whole idea is to not let any thread loop in a continuous fashion... always return to the scheduler kernel.
'It will then search through all the threads' timers and dispatch the next one to run at the appropriate time. 
CON
Pin0            = 1 <<0                                 '
Margin          = 12                                    'headroom value
TickClocks      = 400                                   'set tick time to 5 usec at 80 MHz
TickuS          = TickClocks / 80                       '80 clocks per uSec
DAT
              mov       Temp,#0
              sub       Temp,#8
              ror       Temp,#3                         'bits 31,30,29 input all others output
              mov       dira,Temp
              mov       CntTarget,cnt                   'match counter
              jmp       #AllThreads                     'initialize to first thread
RestoreSusp   addx      0-0,#0                          'restore underflow on suspended thread
NextThread    add       CountDown,SegDPlus2             'access next thread vector pair
              add       RestoreSusp,SegDPlus2           '
              djnz      ThreadCtr,#CountDown            'downcount
AllThreads    mov       ThreadCtr,#(LastThread -FirstThread)/2 'how many threads... cannot be zero
              movd      RestoreSusp,#FirstThread+1      'init to first thread's downcounter
              movd      CountDown,#FirstThread+1        'init to first thread's downcounter
WaitForTick   add       CntTarget,#TickClocks -Margin   'add one tick of clocks minus check margin
              sub       CntTarget,cnt           nr,wc   'compare to current value
              add       CntTarget,#Margin               'reverse margin
              or        outa,#Pin0                      'show tick and free time
        if_nc waitcnt   CntTarget,#0                    'wait till current catches up
              andn      outa,#Pin0                      'end of free time pulse
              
CountDown     djnz      0-0,#RestoreSusp        wc      'count down and monitor underflow
              mov       Temp,CountDown                  'copy current instruction
              shr       Temp,#9                         'expose thread downcounter address
              movd      SetTime,Temp                    'merge into required thread
              sub       Temp,#1                         'calc thread repeat load address
              movs      Jumper,Temp                     'merge into jumper
              movs      ThreadRet,Temp                  'and into load instruction
Jumper        jmp       0-0                             'jump to selected thread
ThreadRet     mov       Temp,0-0                        'get tick count for resuming thread 
              shr       Temp,#9                         'get rid of jump address
SetTime       add       0-0,Temp                        'load thread's resume down count timer into second of pair
              jmp       #NextThread                     'calc next thread address
Temp          long      0                               '
ThreadCtr     long      0                               'how many threads
CntTarget     long      0                               'counter target for next tick
SegDPlus2     long      $400                            'to add 2 to the Dest segment

'APPLICATION PROGRAM ============================================================================
'Thread Timing and Jump Vectors ================================================================
'This section is the assignment of thread vector contiguous pairs, one pair per thread.
'Thread vector pairs MUST be contiguous.
'Each first vector holds the resume delay tick count (repeat) loading value in bits (b31..b9), and the thread's resume address in bits (b8..b0).
' RRRRRRRR_RRRRRRRR_RRRRRRR__AAAAAAAAA
'Each second vector is the thread's delay downcounting (by ticks) timer.
'The delay downcounter holds the number of ticks remaining until the thread (re)activates.
'A thread is triggered when its delay downcounter reaches zero.
'A thread self suspends when its delay downcounter is not reloaded, or loaded with zero.
'A thread is indefinitely suspended while its delay downcounter is zero.
DAT
FirstThread   'simply a place holder before the first thread vector pair.
LastThread    'simply a place holder after the last thread vector pair.
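Two details of the kernel listing are easy to miss. First, a suspended counter of zero is decremented by `djnz ... wc` to $FFFFFFFF with C set (the underflow), and RestoreSusp's `addx 0-0,#0` adds that carry back, returning the counter to zero. Second, adding SegDPlus2 ($400, i.e. 2 << 9) to the CountDown and RestoreSusp instructions steps their D fields to the next vector pair. The Python sketch below models both, assuming the Propeller 1 behaviour of djnz setting C on an unsigned borrow when wc is given; the function names are hypothetical.

```python
# Sketch: two PASM tricks from the kernel, modelled in Python (hypothetical names).
# Assumes Propeller 1 semantics: djnz decrements, jumps if the result is nonzero,
# and with wc sets C on an unsigned borrow (0 -> $FFFFFFFF).
MASK32 = 0xFFFFFFFF
D_SHIFT = 9              # the D field occupies bits 17..9 of an instruction
SEG_D_PLUS_2 = 0x400     # SegDPlus2: 2 << 9, i.e. +2 in the D field


def djnz_wc(value: int) -> tuple[int, bool, bool]:
    """Return (new value, jump taken, C flag) for 'djnz reg,#addr wc'."""
    result = (value - 1) & MASK32
    return result, result != 0, value == 0


def addx_zero(value: int, carry: bool) -> int:
    """'addx reg,#0': add 0 plus C, wrapping at 32 bits."""
    return (value + int(carry)) & MASK32


def service(counter: int) -> tuple[int, bool]:
    """One visit by CountDown; returns (new counter, thread dispatched)."""
    counter, jumped, carry = djnz_wc(counter)
    if jumped:                      # not due (or suspended): fall into RestoreSusp
        return addx_zero(counter, carry), False
    return counter, True            # reached zero: dispatch the thread


def d_field(instr: int) -> int:
    return (instr >> D_SHIFT) & 0x1FF


print(service(0))   # (0, False): a suspended thread stays suspended
print(service(5))   # (4, False)
print(service(1))   # (0, True)

instr = 5 << D_SHIFT                # hypothetical CountDown with D = FirstThread+1 = 5
for _ in range(3):
    print(d_field(instr))           # 5, then 7, then 9: one vector pair per step
    instr += SEG_D_PLUS_2
```

For a thread that is merely counting down, C is clear after djnz, so the same addx adds nothing; one instruction handles both cases without a branch.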

The attached demos include the vectors' initial values at compile time and are ready to try... have fun.

Post Edit.... finally figured out how to add some screen shots of the simultaneous threads

Cheers, Peter (pjv)

Post Edited (pjv) : 3/7/2010 7:32:57 PM GMT

Rsadeika · 2010-03-02 17:30

One of these days I will have to make a serious effort at learning PASM, just to see what your stuff is all about. Your code looks very interesting and professionally done; what else do you have up your sleeve?

Ray

jazzed · 2010-03-02 18:45

@Peter (pjv)

I like what you're doing with your scheduler and see utility in it for the given COG constraints.

Curious how hard would it be to extend the idea to an overlay or LMM engine approach for bigger applications?
Either would be slower, but being able to run larger applications in N threads (more than 8) could be very useful.

Thanks,
--Steve

pjv · 2010-03-02 20:03

@Ray;

Assembler is really very easy; it's Spin, with all those cryptic symbols, that gives me nightmares. I find it very hard to read. As far as further up my sleeve, well, perhaps some surprises.... I'm targeting UPEW to showcase a bunch of stuff.... lots of development going on.

@ Steve;

I have not delved into the LMM as yet, so I can't comment. But since all threads are independent, you could probably run larger programs in segments across numerous Cogs, perhaps with some small handler to give clean yet effective thread interaction via hub memory semaphores. I bet that 256 threads across 8 Cogs would be quite do-able!

Hmmmmm, need to think on that some.

Cheers,

Peter (pjv)
