
Multitasking with P1+

Seairth Posts: 2,474
edited 2014-04-18 10:29 in Propeller 2
As the current planned design does not include hardware multitasking, and the instruction set differs somewhat from P1, I figured I'd start this thread to discuss possible multitasking approaches (and maybe move some of that discussion out of the current P1+ thread). No, I do not want to argue about how a software approach is inferior to a hardware approach, or how this will be too slow to be useful, or introduce too much jitter, and so on. I think our time would be better spent coming up with constructive and useful ideas.

To get things started, here's how I imagine a simple, round-robin approach would work:
	mov	taddr1, #task1		' seed each task's resume address with its entry point
	mov	taddr2, #task2		' (taddr0 gets written by task0's first JMPSW)
	mov	taddr3, #task3

task0	xor	outa, #1		' toggle P0
	jmpsw	taddr0, taddr1	wc wz	' save PC+1 and C/Z in taddr0, resume task1 via taddr1
	jmp	#task0			' executed on resume: loop back to the top of task0

task1	xor	outa, #2		' toggle P1
	jmpsw	taddr1, taddr2	wc wz
	jmp	#task1

task2	xor	outa, #4		' toggle P2
	jmpsw	taddr2, taddr3	wc wz
	jmp	#task2

task3	xor	outa, #8		' toggle P3
	jmpsw	taddr3, taddr0	wc wz
	jmp	#task3

taddr0	res 1				' per-task saved state: resume address plus C/Z
taddr1	res 1
taddr2	res 1
taddr3	res 1

From here, I'm wondering if there's an effective way to use ALTDS in conjunction with JMPSW to uncouple the tasks from one another. Also, even though this is a contrived example, it would be nice to combine the JMPSW and JMP into a single instruction, though I don't see how.
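
For comparison, and purely as a known-working reference point rather than part of the proposal, the same hand-off pattern already runs on the P1 using JMPRET, which writes PC+1 into the source field of its destination register and then jumps to the address held in its source register (the coroutine idiom used by objects like FullDuplexSerial). JMPSW as imagined above is essentially JMPRET plus the C/Z save/restore. A minimal two-task P1 sketch:
	mov	bpc, #taskb		' taskb resumes at its entry point the first time

taska	xor	outa, #1		' toggle P0
	jmpret	apc, bpc		' save resume point in apc, switch to taskb
	jmp	#taska			' executed on resume: loop back to taska

taskb	xor	outa, #2		' toggle P1
	jmpret	bpc, apc		' save resume point in bpc, switch back to taska
	jmp	#taskb

apc	res 1				' taska's saved resume address (written before first use)
bpc	res 1				' taskb's saved resume address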

Comments

  • Seairth Posts: 2,474
    edited 2014-04-17 12:00
    Seairth wrote: »
    Also, even though this is a contrived example, it would be nice to combine the JMPSW and JMP into a single instruction, though I don't see how.

    To clarify, it seems to me that JMPs are a natural place to perform context switches. Since JMPSW is also a JMP, it would be nice if the resume point it saves could be something other than PC+1, which is what forces the extra JMP back to the top of each task.
  • Seairth Posts: 2,474
    edited 2014-04-17 12:28
    And here is the same round-robin example using a very simple "scheduler" routine:
    		mov	taddr1, #task1
    		mov	taddr2, #task2
    		mov	taddr3, #task3
    		mov	tindex, #0
    
    task0		xor	outa, #1
    		jmpsw	taddr0, @sched	wc wz
    		jmp	#task0
    
    task1		xor	outa, #2
    		jmpsw	taddr1, @sched	wc wz
    		jmp	#task1
    
    task2		xor	outa, #4
    		jmpsw	taddr2, @sched	wc wz
    		jmp	#task2
    
    task3		xor	outa, #8
    		jmpsw	taddr3, @sched	wc wz
    		jmp	#task3
    		
    		' C/Z already restored, so avoid clobbering them prior to JMP
    sched		add 	tindex, #1	nc nz	' increment to next index
    		and	tindex, #3	nc nz
    		mov 	t0, tindex		' get address of taddr[tindex]
    		add 	t0, #taddr0	nc nz
    		jmp	t0			' jump to that address
    
    t0		res 1
    tindex		res 1		
    taddr0		res 1
    taddr1		res 1
    taddr2		res 1
    taddr3		res 1
    

    With this, each task only needs to know about its own taddrX, which reduces the coupling a bit.
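
    One detail worth pinning down: if JMP keeps its P1 meaning (the register form jumps to the address held in its operand), the final "jmp t0" would land on the taddrX register itself rather than at the address stored in it, so the dispatch needs one more level of indirection. On the P1 that indirection is usually done by patching the source field of a register-form JMP with MOVS; presumably an ALTDS-style instruction would do the same job on the P1+ without the self-modify gap. A rough P1-flavored sketch of the dispatch (none of these instructions touch C/Z unless WC/WZ is given, so the restored flags survive):
    sched		add	tindex, #1		' increment to next index
    		and	tindex, #3
    		mov	t0, tindex		' get address of taddr[tindex]
    		add	t0, #taddr0
    		movs	:go, t0			' patch the S field of the jump below
    		nop				' P1 rule: can't modify the very next instruction
    :go		jmp	0-0			' register-form jump: target = contents of taddr[tindex]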
  • prof_braino Posts: 4,313
    edited 2014-04-17 18:52
    If you want to see what Sal does with propforth, I could help start the conversation.

    He's got cooperative multitasking on the P1 going very nicely. It's very simple, and very fast. Low overhead is the key.
  • Seairth Posts: 2,474
    edited 2014-04-17 20:06
    prof_braino wrote: »
    If you want to see what Sal does with propforth, I could help start the conversation.

    He's got cooperative multitasking on the P1 going very nicely. It's very simple, and very fast. Low overhead is the key.

    I'm not familiar with propforth. I'm curious how the P1+ changes will affect it, and whether there are any performance improvements to be gained.
  • prof_braino Posts: 4,313
    edited 2014-04-18 10:29
    As long as the basic functionality needed to implement the kernel remains, there typically is no problem.

    The Forth cooperative multitasker allows each task to run until it hits a "slow" operation. These are typically I/O operations, the end of a loop, etc. (there is a rough sketch of this at the end of this post).

    The idea is that the processor(s) can do something else while waiting for the I/O to complete.

    Each task is in a queue, and after it completes a single pass, it is placed back in the queue.

    Usually each task appears to have the full resources at its disposal.

    In the case of propforth, the cogs are treated as a pool. The next cog in the pool is assigned the next task in the queue. Otherwise, the method is pretty much the same as traditional Forth.

    The effect is that when a slow task is encountered, it does not bog down the entire system. The slow task is processed, and the system capacity is only reduced by 1/#cogs. So if we have 8 cogs, an uncooperative task slows us down by 1/8th.

    On a 16-cog part, we would only be slowed by 1/16th, which would have a very low impact.

    Look for "Forth multitasker" if you are interested, its pretty neat. They've refined this for decades, it pretty good. And there's no reason you could do the same scheme in any environment