Propeller II update - BLOG

cgracey · 2014-02-27 16:27

Cluso99 wrote: »

Sorry, but I am yet to be convinced any of this is necessary.

Why can't all this run under the normal tasking?
Task 0 is a "super task" where it can set/reset "stall" bits for the other 3 tasks (the pipeline just effectively ignores the instruction and does not advance the PC if the "Stall" bit is active).
Task 0 can switch in the PTRA/B etc of any task so it can r/w those values.

Task 0 could then stall a task, and by examining the PC of the subject task, determine the next instruction to be executed (that is not in the pipe) and replace it with a new instruction to jmp (saving pc,z,c) to some special debugging code.

Perhaps there is something even simpler than this.

I've read what you wrote a few times, but I'm not understanding what you mean. It does sound like more work (in software) than what is being cooked up right now.

cgracey · 2014-02-27 16:30

Cluso99 wrote: »

Chip,
May I be so bold to ask that you suspend this and get a release out so that we can at least get on with some serious testing?
Then get on with USB and SERDES. If there is time, you can always come back to this later.

Would you give me another two days? Documentation is a big detour that I'd rather take once than twice.

jmg · 2014-02-27 16:33

cgracey wrote: »

I've read what you wrote a few times, but I'm not understanding what you mean. It does sound like more work (in software) than what is being cooked up right now.

My take was this is a suggestion for a way to single-step, by fiddling the pipeline.
- ie the 'super task' allows one opcode from another task to dribble thru the pipeline, with as many packing NULLs as needed.
A NULL is like a nop, but it does not change the PC.

A breakpoint may need to trigger a tail version of the same thing, to be able to stop with no added changes.

cgracey · 2014-02-27 16:34

jmg wrote: »

My take was this is a suggestion for a way to single-step, by fiddling the pipeline.
- ie the super task allows one opcode to dribble thru the pipeline, with as many packing NULLs as needed.
A NULL is like a nop, but it does not change the PC

Okay. I kind of got that part, but I didn't understand how this would facilitate preemptive multitasking.

cgracey · 2014-02-27 16:36

Cluso99 wrote: »

Chip,
May I be so bold to ask that you suspend this and get a release out so that we can at least get on with some serious testing?
Then get on with USB and SERDES. If there is time, you can always come back to this later.

Cluso99, are you running a DE2-115 or a DE0-Nano? If you've got the DE2-115, I could squeeze one out, but the docs would be a little out of sync, though the instruction set would clue you in.

jmg · 2014-02-27 16:40

cgracey wrote: »

Okay. I kind of got that part, but I didn't understand how this would facilitate preemptive multitasking.

I think it was mainly about debug, but task-swap 'comes for free' in any full Debug support, but it may not be as rapid as a WIDE swap.
Debug cares less about how quickly it takes to save/restore, than high performance task-swap does.

Cluso99 · 2014-02-27 16:45

cgracey wrote: »

Cluso99, are you running a DE2-115 or a DE0-Nano? If you've got the DE2-115, I could squeeze one out, but the docs would be a little out of sync, though the instruction set would clue you in.

Unfortunately I have a DE0 so I know you will require extra time to make it fit. I don't use video if that helps (a few of us DE0 users don't require video any more, ozprop included). I will ask on a new thread what we can remove if that will help save you time?

Cluso99 · 2014-02-27 16:59

cgracey wrote: »

Cluso99 wrote: »

Sorry, but I am yet to be convinced any of this is necessary.

Why can't all this run under the normal tasking?
Task 0 is a "super task" where it can set/reset "stall" bits for the other 3 tasks (the pipeline just effectively ignores the instruction and does not advance the PC if the "Stall" bit is active).
Task 0 can switch in the PTRA/B etc of any task so it can r/w those values.

Task 0 could then stall a task, and by examining the PC of the subject task, determine the next instruction to be executed (that is not in the pipe) and replace it with a new instruction to jmp (saving pc,z,c) to some special debugging code.

Perhaps there is something even simpler than this.

I've read what you wrote a few times, but I'm not understanding what you mean. It does sound like more work (in software) than what is being cooked up right now.

I really don't understand why preemptive multitasking is required at all.

But, my thoughts were that to interrupt a task could be done simply by stalling the task, then by software examine/save its settings, and then release it.

While I understand saving all the threads' states using a wide would be nice, it does not totally permit the thread to continue later by restoring these states alone. Surely things like waitcnt/passcnt, reps etc would all have major caveats here.

We have up to 4 tasks. Surely, the task scheduler could, by SETTASK control which tasks were running, and this alone would permit preemptive tasking by starting a new task (or itself performing this). This means that 1 task is effectively set aside to perform the preemptive task. So in many ways I don't understand why the current design does not permit this.

Mixed into this, it was brought up about debugging. IMHO that is actually quite simple, although a stall through the pipeline could aid single-stepping. Of course, this does not permit real-time tracing so it's not cycle accurate.

jmg · 2014-02-27 17:15

Cluso99 wrote: »

We have up to 4 tasks. Surely, the task scheduler could, by SETTASK control which tasks were running, and this alone would permit preemptive tasking by starting a new task (or itself performing this). This means that 1 task is effectively set aside to perform the preemptive task. So in many ways I don't understand why the current design does not permit this.

Mixed into this, it was brought up about debugging. IMHO that is actually quite simple, although a stall through the pipeline could aid single-stepping. Of course, this does not permit real-time tracing so it's not cycle accurate.

The important detail (which comes 'free' with debug support), is the ability to do a 100% Task Swap.
That needs more than just Task launch, it needs to save/restore the PC & flags, and other details.

cgracey · 2014-02-27 17:19

It seems a shame that we can't do a complete context store in one WIDE.

What about this: We reduce the LIFO from 32 bits wide down to the CALL/RET-requisite 18 bits (Z/C/PC), we can increase it by a few levels, too. This will let us pack everything (including DCACHE-address and DCACHE-valid bits) into one WIDE.

The whole process of switching threads would be:

	SETTASK	D/#		'starve the other task of time slots, it had 15/16
				'starting on the next instruction, we'll have all the slots

	SETPTRA	save_addr	'point to 'save' area
	WRWIDE	PTRA[0]		'save current WIDEs
	RDTASK	D/#		'read task's states into WIDEs, along with DCACHE-address and DCACHE-valid bits
	WRWIDE	PTRA[1]		'save the entire task state

	SETPTRA	restore_addr	'point to 'restore' area
	RDWIDE	PTRA[1]		'read new task state
	WRTASK	D/#		'establish new task state, along with DCACHE-address and DCACHE-valid bits
	RDWIDEQ PTRA[0]		'set the new WIDEs without affecting DCACHE-address or DCACHE-valid (RDWIDEQ = quiet)

	SETTASK	D/#		'give task 15/16 time slots to turn it back on in its new state

	<wait for some time, update pointers, loop>
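
The sequence above can be modelled in a few lines of Python to check the bookkeeping: each save area holds two slots, the WIDEs at index 0 and the task state at index 1, matching PTRA[0] and PTRA[1]. This is only a sketch under assumptions. `TaskState`, `Cog`, and `switch_thread` are hypothetical names, the fields in `TaskState` are a guess at what RDTASK/WRTASK would carry, and the DCACHE-address and DCACHE-valid bits are not modelled at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskState:
    """Hypothetical per-task state moved by RDTASK/WRTASK (fields assumed)."""
    pc: int
    z: bool
    c: bool
    lifo: tuple = ()


class Cog:
    """Toy stand-in for one cog: eight WIDE longs plus one switched task."""

    def __init__(self, wides, task):
        self.wides = tuple(wides)
        self.task = task
        self.task_running = True


def switch_thread(cog, save_area, restore_area):
    """Mirror the proposed sequence; the areas are dicts with slots 0 and 1."""
    cog.task_running = False        # SETTASK: starve the task of time slots
    save_area[0] = cog.wides        # WRWIDE PTRA[0]: save current WIDEs
    save_area[1] = cog.task         # RDTASK, then WRWIDE PTRA[1]
    cog.task = restore_area[1]      # RDWIDE PTRA[1], then WRTASK
    cog.wides = restore_area[0]     # RDWIDEQ PTRA[0]: restore the WIDEs
    cog.task_running = True         # SETTASK: hand the time slots back


thread_a = TaskState(pc=0x100, z=True, c=False, lifo=(0x0F0,))
thread_b = TaskState(pc=0x200, z=False, c=True)

cog = Cog(wides=range(8), task=thread_a)
area_a = {}                                # filled on the first switch
area_b = {0: (0,) * 8, 1: thread_b}        # thread B starts with zeroed WIDEs

switch_thread(cog, area_a, area_b)         # A out, B in
switch_thread(cog, area_b, area_a)         # B out, A back in
```

After the two switches, thread A is running again with its original WIDEs, and thread B's state sits in its own area ready for the next swap.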

cgracey · 2014-02-27 17:21

Cluso99 wrote: »

Unfortunately I have a DE0 so I know you will require extra time to make it fit. I don't use video if that helps (a few of us DE0 users don't require video any more, ozprop included). I will ask on a new thread what we can remove if that will help save you time?

I'll pull out everything disposable until the core fits.

ozpropdev · 2014-02-27 17:28

cgracey wrote: »
It seems a shame that we can't do a complete context store in one WIDE.

What about this: We reduce the LIFO from 32 bits wide down to the CALL/RET-requisite 18 bits (Z/C/PC), we can increase it by a few levels, too. This will let us pack everything (including DCACHE-address and DCACHE-valid bits) into one WIDE.

The whole process of switching threads would be:
	SETTASK	D/#		'starve the other task of time slots, it had 15/16
				'starting on the next instruction, we'll have all the slots

	SETPTRA	save_addr	'point to 'save' area
	WRWIDE	PTRA[0]		'save current WIDEs
	RDTASK	D/#		'read task's states into WIDEs, along with DCACHE-address and DCACHE-valid bits
	WRWIDE	PTRA[1]		'save the entire task state

	SETPTRA	restore_addr	'point to 'restore' area
	RDWIDE	PTRA[1]		'read new task state
	WRTASK	D/#		'establish new task state, along with DCACHE-address and DCACHE-valid bits
	RDWIDEQ PTRA[0]		'set the new WIDEs without affecting DCACHE-address or DCACHE-valid (RDWIDEQ = quiet)

	SETTASK	D/#		'give task 15/16 time slots to turn it back on in its new state

	<wait for some time, update pointers, loop>

That looks neat and tidy (Intuitive) Chip!

Roy Eltham · 2014-02-27 17:35

We already have 8 cogs that can do 4 tasks each for 32 tasks total. Do we really need more tasks than that?

Also (tongue in cheek a bit, but), Chip, this is the slippery slope that eventually leads to the stuff you hate about Windows.... So many things going on pre-empting each other such that it all turns to molasses or worse, tasks get blocked out by other tasks so much that it stops responding to your input well.

Personally, it's starting to get frustrating to watch these threads anymore. You still have to get this whole thing through synthesis and married to the I/O pad frame, what happens when it doesn't fit? Also, we need things to be stable ASAP so we can get the docs and software done in time for shipping. It's going to take months and months to get the compilers/etc. built and solid, it needs to have started last year, but at the moment we don't even have Spin2 at all and PASM2 is a moving target.

Sorry to be negative at all here, but something's gotta give...

jmg · 2014-02-27 17:41

cgracey wrote: »
It seems a shame that we can't do a complete context store in one WIDE.

What about this: We reduce the LIFO from 32 bits wide down to the CALL/RET-requisite 18 bits (Z/C/PC), we can increase it by a few levels, too. This will let us pack everything (including DCACHE-address and DCACHE-valid bits) into one WIDE.

The whole process of switching threads would be:
	SETTASK	D/#		'starve the other task of time slots, it had 15/16
				'starting on the next instruction, we'll have all the slots

	SETPTRA	save_addr	'point to 'save' area
	WRWIDE	PTRA[0]		'save current WIDEs
	RDTASK	D/#		'read task's states into WIDEs, along with DCACHE-address and DCACHE-valid bits
	WRWIDE	PTRA[1]		'save the entire task state

	SETPTRA	restore_addr	'point to 'restore' area
	RDWIDE	PTRA[1]		'read new task state
	WRTASK	D/#		'establish new task state, along with DCACHE-address and DCACHE-valid bits
	RDWIDEQ PTRA[0]		'set the new WIDEs without affecting DCACHE-address or DCACHE-valid (RDWIDEQ = quiet)

	SETTASK	D/#		'give task 15/16 time slots to turn it back on in its new state

	<wait for some time, update pointers, loop>

Sounds good, if it is practical.
Is this different from the wide approach looked at earlier ?

jmg · 2014-02-27 17:47

Roy Eltham wrote: »

We already have 8 cogs that can do 4 tasks each for 32 tasks total. Do we really need more tasks than that?

Tasks I see as less important than having good Debug support.
- but once you get good debug snapshots, task swap tends to come too, so anyone who wants to do that, can.

Bill Henning · 2014-02-27 17:51

Interesting... will chew on it, but initial impression is that I like it.

Some possibilities:

1) when you take a task's slots away, perhaps delay until the pipeline is empty? If you do that, less state to save

2) if the LIFO is reduced to 16 bits... 4x16lifo + 16pc + z+c = 82 bits, WIDE is 256 bits... yep, you could increase the LIFO size

3) TYIELD #n fits this scheme too, as the scheduler can take away the task's slots when it notices the task has yielded... I like

4) TSTEP - could TSTEP just give the task 4 cycles for one instruction to complete (or one JMPD variant, with 3 shadow instructions)
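
The bit budget in point 2 can be checked with a short Python sketch. It also packs a 16-bit PC with Z and C into the 18-bit entry Chip describes. The helper names and the bit layout (Z in bit 17, C in bit 16) are assumptions for illustration only, and whatever else has to share the WIDE (DCACHE address and valid bits, for instance) is not counted.

```python
WIDE_BITS = 256  # one WIDE is 8 longs of 32 bits


def pack_entry(pc, z, c):
    """Pack a 16-bit PC plus Z and C into 18 bits (layout is assumed)."""
    if not 0 <= pc < (1 << 16):
        raise ValueError("pc must fit in 16 bits")
    return (int(z) << 17) | (int(c) << 16) | pc


def unpack_entry(word):
    """Inverse of pack_entry: returns (pc, z, c)."""
    return word & 0xFFFF, bool((word >> 17) & 1), bool((word >> 16) & 1)


def context_bits(lifo_levels, lifo_width, pc_bits=16, flag_bits=2):
    """Bits needed for the LIFO plus the current PC and the Z/C flags."""
    return lifo_levels * lifo_width + pc_bits + flag_bits


used = context_bits(lifo_levels=4, lifo_width=16)   # the 4x16 case above
spare = WIDE_BITS - used                            # room left in one WIDE
```

With four 16-bit levels, `used` comes to 82 bits, which agrees with the figure above and leaves 174 bits of the WIDE for deeper LIFO levels and any other state.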

cgracey wrote: »
It seems a shame that we can't do a complete context store in one WIDE.

What about this: We reduce the LIFO from 32 bits wide down to the CALL/RET-requisite 18 bits (Z/C/PC), we can increase it by a few levels, too. This will let us pack everything (including DCACHE-address and DCACHE-valid bits) into one WIDE.

The whole process of switching threads would be:
	SETTASK	D/#		'starve the other task of time slots, it had 15/16
				'starting on the next instruction, we'll have all the slots

	SETPTRA	save_addr	'point to 'save' area
	WRWIDE	PTRA[0]		'save current WIDEs
	RDTASK	D/#		'read task's states into WIDEs, along with DCACHE-address and DCACHE-valid bits
	WRWIDE	PTRA[1]		'save the entire task state

	SETPTRA	restore_addr	'point to 'restore' area
	RDWIDE	PTRA[1]		'read new task state
	WRTASK	D/#		'establish new task state, along with DCACHE-address and DCACHE-valid bits
	RDWIDEQ PTRA[0]		'set the new WIDEs without affecting DCACHE-address or DCACHE-valid (RDWIDEQ = quiet)

	SETTASK	D/#		'give task 15/16 time slots to turn it back on in its new state

	<wait for some time, update pointers, loop>

David Betz · 2014-02-27 17:52

jmg wrote: »

Tasks I see as less important than having good Debug support.
- but once you get good debug snapshots, task swap tends to come too, so anyone who wants to do that, can.

Debug support? I thought that was what the LED on the board was for. :-)

Bill Henning · 2014-02-27 17:53

As I recall, threads came about due to propgcc not being able to pre-empt threads for the pthreads library (as there is no LMM anymore to conveniently allow pre-empting)

I am VERY much looking forward to improved debugging capabilities!

jmg wrote: »

Tasks I see as less important than having good Debug support.
- but once you get good debug snapshots, task swap tends to come too, so anyone who wants to do that, can.

potatohead · 2014-02-27 17:57

David Betz wrote: »

Debug support? I thought that was what the LED on the board was for. :-)

My thoughts exactly. Truth is, we have the means needed to debug rather nicely right now.

For detailed testing, simulation code running right on the P2 can single-step, or do whatever else is needed, using an LMM-style kernel.

David Betz · 2014-02-27 17:58

Bill Henning wrote: »

As I recall, threads came about due to propgcc not being able to pre-empt threads for the pthreads library (as there is no LMM anymore to conveniently allow pre-empting)

I am VERY much looking forward to improved debugging capabilities!

Really? Did Eric ask for them because I didn't.

cgracey · 2014-02-27 18:00

Roy Eltham wrote: »

We already have 8 cogs that can do 4 tasks each for 32 tasks total. Do we really need more tasks than that?

Also (tongue in cheek a bit, but), Chip, this is the slippery slope that eventually leads to the stuff you hate about Windows.... So many things going on pre-empting each other such that it all turns to molasses or worse, tasks get blocked out by other tasks so much that it stops responding to your input well.

Personally, it's starting to get frustrating to watch these threads anymore. You still have to get this whole thing through synthesis and married to the I/O pad frame, what happens when it doesn't fit? Also, we need things to be stable ASAP so we can get the docs and software done in time for shipping. It's going to take months and months to get the compilers/etc. built and solid, it needs to have started last year, but at the moment we don't even have Spin2 at all and PASM2 is a moving target.

Sorry to be negative at all here, but something's gotta give...

How this all gets used is up to programmers. There is, at least, a lot of debugging value in this addition.

Otherwise, you are right about time. I want this thing to be over, too. We are at the edge of the project and it's hard not to put a few more things in.

ozpropdev · 2014-02-27 18:03

David Betz wrote: »

Debug support? I thought that was what the LED on the board was for. :-)

LOL David

Unfortunately I need another 100 leds!

Brian

Sapieha · 2014-02-27 18:09

Hi Chip.

While you are still adding new things:

Can you make some additions to the SERDES I/O?

1. Serial clock output on a pin (master mode)
2. Slave mode, with an external clock input on a pin
3. One I/O that connects both SERIN and SEROUT to the same pin, like I2C

Number 3, combined with 1 and 2, could be very useful.

cgracey wrote: »

How this all gets used is up to programmers. There is, at least, a lot of debugging value in this addition.

Otherwise, you are right about time. I want this thing to be over, too. We are at the edge of the project and it's hard not to put a few more things in.

Bill Henning · 2014-02-27 18:13

As my wife is fond of reminding me, I am not as young as I used to be, and neither is my memory - so I could be wrong about that recollection. Perhaps I dreamt the pthreads issue while snoring loudly (my wife assures me I shake the roof).

I do definitely recall Chip starting a thread about it... let me check... AHA!

Ahle has mentioned he'd like to make a preemptive multitasking system on the Prop2, so I've been thinking about what might be needed in the chip to facilitate it.

Found it!

http://forums.parallax.com/showthread.php/154167-What-s-needed-for-preemptive-multitasking

David Betz wrote: »

Really? Did Eric ask for them because I didn't.

ctwardell · 2014-02-27 18:15

I think that since time has already been spent on this (preemptive threading) and Chip has a handle on how to approach it that we should support him in seeing it through.

I know everyone is chomping at the bit to move ahead, but we would regret it later if PropGCC on the P2 has to use an LMM type setup to support threading instead of direct hubexec.

Commercial users would also be pleased with the ability to implement a proper RTOS.

My 2 cents,

C.W.

David Betz · 2014-02-27 18:18

Bill Henning wrote: »

As my wife is fond of reminding me, I am not as young as I used to be, and neither is my memory - so I could be wrong about that recollection. Perhaps I dreamt the pthreads issue while snoring loudly (my wife assures me I shake the roof).

I do definitely recall Chip starting a thread about it... let me check... AHA!

Found it!

http://forums.parallax.com/showthread.php/154167-What-s-needed-for-preemptive-multitasking

I wasn't necessarily saying it was a bad idea. I just meant that it wasn't something proposed by the propgcc team as far as I know. Anyway, it sounds like Chip has it all worked out. What I did propose was a TLB and a trap mechanism to handle TLB misses. However, I am well aware that that is out of the scope of P2 and maybe P3 as well.

David Betz · 2014-02-27 18:20

ctwardell wrote: »

I think that since time has already been spent on this (preemptive threading) and Chip has a handle on how to approach it that we should support him in seeing it through.

I know everyone is chomping at the bit to move ahead, but we would regret it later if PropGCC on the P2 has to use an LMM type setup to support threading instead of direct hubexec.

Commercial users would also be pleased with the ability to implement a proper RTOS.

My 2 cents,

C.W.

Unless things have changed since I last looked, I don't think PropGCC ever preempts threads on P1. It only switches threads when one thread yields. Of course you can run multiple threads in parallel if they are on different COGs.
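
To make "only switches threads when one thread yields" concrete, here is a minimal cooperative round-robin sketch in Python using generators. It is not how PropGCC implements pthreads; `worker` and `run_cooperative` are hypothetical names, and the point is only that a thread keeps the processor until it reaches an explicit yield.

```python
from collections import deque


def worker(name, steps, log):
    """A thread body: does one unit of work, then yields voluntarily."""
    for i in range(steps):
        log.append((name, i))
        yield  # the only place a switch can happen


def run_cooperative(tasks):
    """Round-robin over generator-based threads until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the thread up to its next yield
        except StopIteration:
            continue            # thread finished, drop it
        ready.append(task)      # still alive, back of the queue


log = []
run_cooperative([worker("a", 2, log), worker("b", 2, log)])
```

If `worker` never reached its `yield`, the other thread would never run. That is the gap a preemptive scheduler closes.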

jmg · 2014-02-27 18:22

Sapieha wrote: »

Can you make some additions to the SERDES I/O?

1. Serial clock output on a pin (master mode)
2. Slave mode, with an external clock input on a pin
3. One I/O that connects both SERIN and SEROUT to the same pin, like I2C

I think that, and more, is planned for the serdes enhance pass.
Certainly clocked Sync modes are on the list.

Cluso99 · 2014-02-27 18:35

Chip's latest proposal sounds simple enough.

As an aside, he has realised the LIFOs only need to be 16 bits wide (Z+C needed too???). This is another prize.

But it was only Ale2 who originally asked, and he subsequently pleaded for it to stop.

So may I ask, who is going to use this preemptive tasking???

Maybe the silicon is a waste and could be better utilised (we might even run out of it)?

Certainly I think for debugging, there are perhaps simpler and better ways to do it. Of course the ultimate is to buy a Parallax FPGA and have Chip add some special debug code in there for us.

rogloh · 2014-02-27 19:07

Cluso99 wrote: »

So may I ask, who is going to use this preemptive tasking???

Probably I would say all the people who use and write code in high level languages that want multithreading (eg typical RTOS users) that don't necessarily want to go to the trouble of adding YIELDS all over the place in their source to allow co-operative multithreading. The good thing about preemptive multitasking is you basically can use your original source code as is and don't need to worry (too much) about where you are being switched out. The RTOS does all that for you.

From what I see Chip is adding we still get total control of the switching from the scheduler task so the thread preemption is not completely automatic, software is still involved there. So we can still add more features in software such as thread locking, thread priorities etc layered on top of the scheduler for allowing critical thread sections and to influence thread switching order. It seems the best of all worlds here. We don't need interrupts but get what a lot of an RTOS normally offers along with rapid context switches.

PS. One question for Chip: if multitasking is enabled and a task is waiting on the system counter with WAITCNT (such as the scheduler task might be doing), does the time have to match exactly to wake it up or just that the elapsed time is greater than some calculated amount when WAITCNT executes? Because due to multitasking the clock cycle that exactly matches the WAITCNT instruction may not be exactly the one when it executes and checks the timer, the system ticks might be observed jumping up by 4's for example (with 4 tasks). We don't have some issue there do we?

Roger.
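
The WAITCNT concern in the PS can be illustrated with a small Python model. It says nothing about what the P2 hardware actually does, which is the open question. It only contrasts the two comparison rules when a task samples the counter every `stride` ticks. All function names here are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # the system counter is treated as 32 bits wide


def reached_exact(now, target):
    """Wake only when the counter equals the target exactly."""
    return (now & MASK32) == (target & MASK32)


def reached_elapsed(now, target):
    """Wake once the target has passed, using a wrap-safe 32-bit difference."""
    return ((now - target) & MASK32) < 0x80000000


def first_wake(start, target, stride, check, limit=64):
    """Sample the counter every `stride` ticks; return the first waking value."""
    now = start & MASK32
    for _ in range(limit):
        if check(now, target):
            return now
        now = (now + stride) & MASK32
    return None  # never woke within `limit` samples


missed = first_wake(0, 10, 4, reached_exact)        # 0, 4, 8, 12... skips 10
caught = first_wake(0, 10, 4, reached_elapsed)      # wakes at the next sample
wrapped = first_wake(0xFFFFFFF8, 2, 4, reached_elapsed)  # across the rollover
```

With a stride of 4 the exact-match rule never sees the value 10, while the elapsed-time rule wakes at 12, and it still behaves across the 32-bit rollover.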
