Propeller II update - BLOG

Sapieha · 2014-03-08 21:32

Hi.

What about THDNZ?

Task Halt if D Non Zero.

cgracey wrote: »

THALT is pretty good! It's nice and terse and reads well.

It works well, too, because it doesn't imply something that may or may not happen afterwards, like how TPAUSE suggests that things will resume, when maybe that's just not in the program.

JRetSapDoog · 2014-03-08 21:33

Whatever you name it, you've simply got to pair some silicon with ozpropdev's "TBREAK"; it's just too clever...having a thread or whatever stop (indefinitely) for some tea and crumpets.

ozpropdev · 2014-03-08 21:41

JRetSapDoog wrote: »

Whatever you name it, you've simply got to pair some silicon with ozpropdev's "TBREAK"; it's just too clever...having a thread or whatever stop (indefinitely) for some tea and crumpets.

I actually was being serious with TBREAK, because the most common use seemed to be a BREAKPOINT.

Edit: Although Tea and Crumpets does sound like a good idea!

cgracey · 2014-03-08 21:42

jmg wrote: »

or TSIGLNZ ?

Task Signaling by Write to Register and Loop while register Non Zero

That's very descriptive, but hard to remember.

I think you, jmg, don't mind things being cryptic if it means they can be more exacting. I think the rest of us don't trust ourselves so well to remember names like that, so we look for simple compromises that convey the gist of the functionality, without getting too detailed and, consequently, choppy. TSIGLNZ does have a nice, perhaps Germanic, ring to it. Actually, it seems like something Enigma might have employed.

potatohead · 2014-03-08 21:53

TSALT Task Signal and Halt?

eldonb46 · 2014-03-08 22:09

(post deleted)

Sapieha · 2014-03-08 22:19

Hi Chip.

In the attachment you will find what is needed for a SERDES-to-USB bridge.

cgracey · 2014-03-08 22:26

Sapieha wrote: »

Hi Chip.

In the attachment you will find what is needed for a SERDES-to-USB bridge.

Those are only $0.37 @2,500 units from Digikey. I hope we can get by without any extra chips.

Sapieha · 2014-03-08 22:31

Hi Chip.

The only thing about using an external one is that it consumes more than 2 pins.

cgracey wrote: »

Those are only $0.37 @2,500 units from Digikey. I hope we can get by without any extra chips.

Sapieha · 2014-03-08 23:02

Hi Chip.

I looked at their site, and they say not to use it for new designs.

So I am attaching the PDF for the newer one.

cgracey wrote: »

Those are only $0.37 @2,500 units from Digikey. I hope we can get by without any extra chips.

whicker · 2014-03-08 23:08

Sapieha,

That's a transceiver or essentially just a line driver chip.
The physical signaling isn't the problem; from what I understand, it's the support for generating and interpreting the serial bitstream.

Ahle2 · 2014-03-08 23:37

What about "TSUBMIT"?
It has a double nature; it means "to give over control to authority" and at the same time "report", "put in", "propose", "pass on". So it basically means "signal and wait for authority to process and grant further advancement".

Btw, the reason that I'm not very talkative about preemptive threading, even though I brought it up, is that JMG and Bill pretty much "do my work". And everything is turning out for the better.

/Johannes

Cluso99 · 2014-03-08 23:37

I have just been catching up and a few things come to mind...

TPAUSE etc

I love the new function of TPAUSE (whatever it will be called) and the removal of TRESUME.

Actually I preferred the name TPAUSE but TSTALL / TSTALLNZ / STALLNZ / PASSNZ also sound great. This instruction has other uses because now tasks can communicate with themselves, and simply stall while waiting for something, all in a single instruction. This has lots of uses.

I might actually agree that maybe we could just look for BIT 0 = 0 for resuming.

Can we actually use AUGS and AUGD with TPAUSE ? I presume AUGS would work to pass a 32bit parameter, but since D is a cog register AUGD is useless ?

Single-step and Breakpoints

Earlier I was trying to describe why we needed the controlling task monitoring TPAUSE to be able to either inc PC or not. I didn't make the point clear so I will try again.

I am unsure whether the latest change to TPAUSE precludes this anyway ?

The old trace method I used on P1 (and was developing for P2) was that the instructions were executed from an LMM-style routine. With the new relative jumps, this is not possible without some considerable software to recalculate the relative address and plug it into the instruction, perhaps with the addition of AUGx as well.

What had been discussed is that the original instruction would replace the TPAUSE instruction and the task would be permitted to continue (without incrementing PC, or by decrementing the PC in WIDE and then TRESUME with increment PC which cancels the decrement/increment).

I am unsure what the impact of the new TPAUSE has on this. I presume we can no longer replace the TPAUSE instruction as it is currently held within the pipe rather than being cleared out (sort of like PASSCNT does for tasks).

I will wait to see the answers.

Anyway, I think it's time to try these task things, and while that occurs, perhaps we can move on to USB/SERDES. If there is some compelling reason, the tasking can always be revisited later or, if simple, during USB/SERDES.

David,
I did not see an answer to your question as to why TSAVE could not save to cog ram instead of a WIDE register. The reason is we are saving 8 longs and this would take 8 clocks to put into cog ram (because it is only 32bits wide).

Cluso99 · 2014-03-08 23:43

Sapieha wrote: »

Hi Chip.

I looked at their site, and they say not to use it for new designs.

So I am attaching the PDF for the newer one.

Sapieha,
As has been said, it's just the electrical interface. But the gem is here...

Compatible with the VHDL “Serial Interface Engine”

from USB Implementers' Forum

Can you find the VHDL code, and what licenses does it entail?

Cluso99 · 2014-03-08 23:47

I have just thought about it...

TPAUSE is really just these 2 instructions combined

MOV reg,S/#
JNZ *

Is there anything special that it also does???
Does it prime the ability to use T3SAVE or can this be done (successfully) at any point in task #3 (ie, only useful if task 3 is stalled with TPAUSE or PASSCNT or WAITxxx ???)

Cluso99 · 2014-03-08 23:52

Chip,
I am presuming the reason we only have T3SAVE & T3LOAD is that the mux to/from Task 3's PC/Z/C/states are the large silicon part, and not the requirement for multiple WIDES ?

Sapieha · 2014-03-09 00:06

Hi Cluso

First of all, about these PDFs: my idea was to show what logic is needed on silicon to manage USB.

Cluso99 wrote: »

Sapieha,
As has been said, it's just the electrical interface. But the gem is here...
Compatible with the VHDL “Serial Interface Engine”
from USB Implementers' Forum

Can you find the VHDL code, and what licenses does it entail?

cgracey · 2014-03-09 00:57

Cluso99 wrote: »

Chip,
I am presuming the reason we only have T3SAVE & T3LOAD is that the mux to/from Task 3's PC/Z/C/states are the large silicon part, and not the requirement for multiple WIDES ?

T3SAVE/T3LOAD move 244 bits of context data between the cog/task3 and WIDE registers. Data handled includes:

of cog:

INDA, INDB setup and current values
DCACHE address and validity

of task 3:

PTRA, PTRB, PTRX, PTRY
PORA, PORB, PORC, PORD
Z, C, PC
AUGS and AUGD states
delayed jump states
TLOCK pending state
TCHECK state

Task 3's LIFO stack must be handled outside of T3SAVE/T3LOAD.

T3SAVE/T3LOAD make quite a comprehensive context switcher. The threads running in task 3 can use INDA/INDB, the WIDE registers, and their own resources, all with impunity. Threads must do a TLOCK while using the MUL/DIV/SQRT/CORDIC circuits, though. And there would be some limitation on registers. Aside from that, the threads can act like they have the whole cog to themselves.

TCHECK (or whatever it will get called) can use AUGS to write a big value, if you needed one.

TCHECK works completely independently of T3SAVE/T3LOAD, but it may be useful to notify a scheduler of early thread completion in which T3SAVE/T3LOAD may come into play.
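
As a rough illustration of the list above, the context that T3SAVE/T3LOAD moves can be pictured as one record. This is only a Python sketch: the class name `Task3Context`, the field names and types, and the helper functions `t3save`/`t3load` are assumptions made for illustration, the real 244-bit layout is not given here, and the LIFO stack is deliberately left out because it is handled separately.

```python
from dataclasses import dataclass, replace


@dataclass
class Task3Context:
    """Hypothetical picture of the state moved by T3SAVE/T3LOAD."""
    # Cog-level state
    inda: int = 0            # INDA setup and current value (simplified to one int)
    indb: int = 0            # INDB setup and current value (simplified to one int)
    dcache_addr: int = 0     # DCACHE address
    dcache_valid: bool = False
    # Task 3 state
    ptra: int = 0
    ptrb: int = 0
    ptrx: int = 0
    ptry: int = 0
    pora: int = 0
    porb: int = 0
    porc: int = 0
    pord: int = 0
    z: bool = False
    c: bool = False
    pc: int = 0
    aug_state: int = 0           # AUGS and AUGD states (encoding assumed)
    delayed_jump_state: int = 0  # encoding assumed
    tlock_pending: bool = False
    tcheck_state: bool = False


def t3save(live: Task3Context) -> Task3Context:
    """Model of T3SAVE: copy the live context into a WIDE-like snapshot."""
    return replace(live)


def t3load(snapshot: Task3Context) -> Task3Context:
    """Model of T3LOAD: rebuild the live context from a snapshot."""
    return replace(snapshot)
```

A scheduler in this model would keep one snapshot per thread, call `t3save` on the outgoing thread and `t3load` on the incoming one.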

cgracey · 2014-03-09 01:00

Cluso99 wrote: »

I have just thought about it...

TPAUSE is really just these 2 instructions combined

MOV reg,S/#
JNZ *

Is there anything special that it also does???
Does it prime the ability to use T3SAVE or can this be done (successfully) at any point in task #3 (ie, only useful if task 3 is stalled with TPAUSE or PASSCNT or WAITxxx ???)

TCHECK (was TPAUSE) works like this little program:

MOV D,S/#
JNZ D,@$

Note that the MOV executes only once, but the JNZ executes indefinitely.
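
For anyone who wants to poke at the behaviour, here is a rough Python model of the description above. It is not PASM and says nothing about how the silicon is built. The class name `TCheckModel`, the dict standing in for the cog register file, and the assumption that the first-iteration bit re-arms once the task falls through are all illustrative guesses.

```python
class TCheckModel:
    """Toy model of one task sitting on a TCHECK D,S/# instruction."""

    def __init__(self, regs, d, s):
        self.regs = regs    # shared register file: dict of name -> 32-bit value
        self.d = d          # name of the D register
        self.s = s          # S/# value written on the first iteration
        self.first = True   # the single state bit: first iteration or not

    def step(self):
        """Run one iteration; return True once the task falls through."""
        if self.first:
            # The MOV part: executes only once, then always jumps to self.
            self.regs[self.d] = self.s & 0xFFFFFFFF
            self.first = False
            return False
        if self.regs[self.d] != 0:
            # The JNZ part: keep jumping to self while D is non-zero.
            return False
        self.first = True   # assumption: state bit re-arms for the next TCHECK
        return True
```

In this model, another task releases the stalled one simply by writing 0 into the same register between calls to `step()`.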

cgracey · 2014-03-09 01:09

Cluso99 wrote: »

What had been discussed is that the original instruction would replace the TPAUSE instruction and the task would be permitted to continue (without incrementing PC, or by decrementing the PC in WIDE and then TRESUME with increment PC which cancels the decrement/increment).

I am unsure what the impact of the new TPAUSE has on this. I presume we can no longer replace the TPAUSE instruction as it is currently held within the pipe rather than being cleared out (sort of like PASSCNT does for tasks).

It sounds like you are talking about implementing breakpoints here. If the target task is stuck in a TCHECK, you could just overwrite that TCHECK instruction with the original and it would go along its merry way. You'd only need to clear that task from the pipeline in order to single-step it, so that you could replace the original instruction again with the TCHECK. This is going to make breakpoints really fun and simple.
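
The instruction-swapping idea can be sketched in a few lines of Python. This is only a model: the program is a plain list of instruction strings, and the helper names `set_breakpoint` and `clear_breakpoint`, the `saved` dict, and the placeholder `TCHECK treg,#1` text are all hypothetical. It does not model the pipeline or the single-step case.

```python
def set_breakpoint(code, addr, saved, stall="TCHECK treg,#1"):
    """Swap the instruction at addr for a TCHECK and remember the original."""
    saved[addr] = code[addr]
    code[addr] = stall


def clear_breakpoint(code, addr, saved):
    """Put the original instruction back so the stalled task can run it."""
    code[addr] = saved.pop(addr)
```

A debugger task would call `set_breakpoint` before letting the target run, inspect whatever it needs while the target is stuck, and then call `clear_breakpoint` to let it continue.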

Ahle2 · 2014-03-09 01:30

cgracey wrote: »

TPAUSE and TRESUME are gone.

In TPAUSE's old place is:

TCHECK D,S/# 'Write S/# into D and jump to self. On subsequent iterations, don't write D, but jump to self if D <> 0.

This gets rid of the need for TRESUME. It takes one bit of state storage to track TCHECK now, so that we know if it's on its first or a subsequent iteration. On the first iteration, it writes S/# into D and jumps to itself. On subsequent iterations, it doesn't write D, but jumps to itself if D <> 0.

So, task A does a TCHECK to write a non-zero value into some register. Task B notices the non-0 value and can do whatever it wants about it, but can write 0 to the register to release Task A.

If you use "D == original D" as the condition instead, I think we can resolve JMG's and Bill's dispute once and for all.
Both proposals need the register value to change in order to work. It is then up to the coder to choose the most suitable implementation. Probably a third alternative. All implementations need the value to change in some way or another.
I love the simplicity of JMG's atomic signal passing between master and slave, but without losing the MSB.

/Johannes

jmg · 2014-03-09 01:36

Cluso99 wrote: »

Is there anything special that it also does???
Does it prime the ability to use T3SAVE or can this be done (successfully) at any point in task #3 (ie, only useful if task 3 is stalled with TPAUSE or PASSCNT or WAITxxx ???)

I think the exact semantics here are that Task #3 has to be starved of cycles/slices before it can SAVE.
It can use TPAUSE to stay in a known place, whilst the master (debug) task removes its slices.

There were problems with T3SAVE & T3LOAD if a multi-cycle opcode had not fully completed.

One suggestion was to make T3SAVE, (or another opcode), wait for completion of any multicycle opcodes.
Note here multi-cycle means to include preamble loads, and post-amble result-read of shared opcodes.
Such auto-complete would also allow any threads to happily round-robin share common COG resource (not just one thread).

I'm not sure of the status of that detail ?

The ideal is to allow T3 to SAVE/LOAD without affecting the slices of any other tasks, and to allow any task to use Mathops.
(with the usual one at a time caveat of shared resource)

Cluso99 · 2014-03-09 01:38

cgracey wrote: »

It sounds like you are talking about implementing breakpoints here. If the target task is stuck in a TCHECK, you could just overwrite that TCHECK instruction with the original and it would go along its merry way. You'd only need to clear that task from the pipeline in order to single-step it, so that you could replace the original instruction again with the TCHECK. This is going to make breakpoints really fun and simple.

Thanks Chip - yes I was referring to breakpoints.

Because TCHECK would be in the pipeline, I presume it stays there unless we permit it to run thru the pipeline by giving it an instruction?
What would be the sequence needed to replace the breakpoint instruction (TCHECK) and have it be executed (something like the following??)?

TCHECK treg,#n 'task 3 executes a "stall on nz"

'now the supervising task executes the following...
SETTASK #n 'starve task 3 of cycles
T3SAVE 'save the current state of task3 in WIDE (in case we need to look at anything, and do that)
...
'find out the PC of the TCHECK instruction using the WIDE (or we may have been in a fixed location). Set that to be the "instr" in the following MOV.
MOV instr,originstr 'replace the TCHECK instruction with the original program's instruction
MOV treg,#0 'remove the TCHECK stall by setting treg=0
SETTASK #m 'give task 3 cycles again

' task 3 resumes by refetching the replaced TCHECK instruction...

Now, did we also need to do any further stepping to get the pipeline free of task 3 after the SETTASK #n starved task 3 of clocks ???

jmg · 2014-03-09 01:48

Ahle2 wrote: »

If you use "D <> original D" as the condition instead, I think we can resolve JMG's and Bill's dispute once and for all.
Both proposals need the register value to change in order to work. It is then up to the coder to choose the most suitable implementation. Probably a third alternative. All implementations need the value to change in some way or another.
I personally love the simplicity of JMG's atomic signal passing between master and slave, but without losing the MSB.

/Johannes

"D32 <> oD32" has some merit, and I considered that, but keep in mind, in terms of gate level logic resource, the most compact is a single Bit test (JB31), followed by a Zero test (JZ) and then '<>' needs more compare logic, and also a temp location to keep a copy of oD32.
Also Not-Equals can manage the waiting task, but the polling task also needs a compact check. JZ and JB are both compact. (1 line)
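
To make the three candidate release conditions concrete, here is a small Python comparison. It is a sketch only: the function names are made up, and the polarity of the single-bit test (release when bit 31 is clear) is an assumption, since the thread does not pin it down. The cost remarks in the comments restate the point above and are not measured gate counts.

```python
MASK32 = 0xFFFFFFFF


def released_by_bit31(d):
    """Single-bit test (JB31 style): looks at one bit of D only."""
    return ((d >> 31) & 1) == 0


def released_by_zero(d):
    """Zero test (JZ style): all 32 bits of D must be zero."""
    return (d & MASK32) == 0


def released_by_change(d, original):
    """Not-equal test: needs a 32-bit compare plus a stored copy of D."""
    return (d & MASK32) != (original & MASK32)
```

With the bit-31 and change tests the other 31 or 32 bits stay free to carry a message back to the waiting task, whereas the zero test uses the whole register as the flag.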

Cluso99 · 2014-03-09 01:50

I have edited your post in red to indicate what I think you mean or happens - but I am not sure???

jmg wrote: »

I think the exact semantics here are that Task #3 has to be starved of cycles/slices before it can be SAVEd with T3SAVE by another task.
It can use TCHECK (was TPAUSE) to stay in a known place, whilst the master (debug) task removes its slices.

There were problems with T3SAVE & T3LOAD if a multi-cycle opcode had not fully completed.
Understood. We could fetch the instruction that we have stalled it at, and determine if it was a multi-cycle opcode. This of course is in the case it was not stopped by a TCHECK. Or perhaps by allowing a single cycle to task3 (SETTASK followed by another SETTASK) and then re-examining a T3SAVE and checking if the PC had incremented, we would know the next instruction was stopped before it could be executed.

One suggestion was to make T3SAVE, (or another opcode), wait for completion of any multicycle opcodes.
Note here multi-cycle means to include preamble loads, and post-amble result-read of shared opcodes.
Such auto-complete would also allow any threads to happily round-robin share common COG resource (not just one thread).

I'm not sure of the status of that detail ?
This seems way too complex in silicon.

The ideal is to allow T3 to SAVE/LOAD without affecting the slices of any other tasks, and to allow any task to use Mathops.
(with the usual one at a time caveat of shared resource)

Does this make sense to you?

Cluso99 · 2014-03-09 01:53

Ahle2 wrote: »

If you use "D == original D" as the condition instead, I think we can resolve JMG's and Bill's dispute once and for all.
Both proposals need the register value to change in order to work. It is then up to the coder to choose the most suitable implementation. Probably a third alternative. All implementations need the value to change in some way or another.
I love the simplicity of JMG's atomic signal passing between master and slave, but without losing the MSB.

/Johannes

Sounds like a nice alternative. I will wait for Chip's comments as to silicon/timing.

jmg · 2014-03-09 01:57

cgracey wrote: »

T3SAVE/T3LOAD make quite a comprehensive context switcher. The threads running in task 3 can use INDA/INDB, the WIDE registers, and their own resources, all with impunity. Threads must do a TLOCK while using the MUL/DIV/SQRT/CORDIC circuits, though. And there would be some limitation on registers.

So that limitation/rule-book is still present ?
- Instead of the silicon managing sharing, the programmer has to use the big hammer and disable all other threads before the first mathop load, and until after the last mathop read, for each math opcode? That can add a lot of jitter.

Cluso99 · 2014-03-09 03:05

jmg wrote: »

So that limitation/rule-book is still present ?
- Instead of the silicon managing sharing, the programmer has to use the big hammer and disable all other threads before the first mathop load, and until after the last mathop read, for each math opcode? That can add a lot of jitter.

Or the programmer ensures only 1 task uses these singular resources at a time.

BTW This was accepted as a limitation way back when tasks were added.

jmg · 2014-03-09 03:07

Cluso99 wrote: »

I have edited your post in red to indicate what I think you mean or happens - but I am not sure???

Pretty much.

Cluso99 wrote:

Understood. We could fetch the instruction that we have stalled it at, and determine if it was a multi-cycle opcode.

True, but it is more complex, as you need to encapsulate the preamble, and post-amble save of results, on the same side of a SAVE.
If you T3SAVE too early, when you swap back later, the immediately executed result-read does not get the answer you hoped for.
A band-aid is to block all other threads (even though they may never have requested the resource).

One suggestion was to make T3SAVE, (or another opcode), wait for completion of any multicycle opcodes.
Note here multi-cycle means to include preamble loads, and post-amble result-read of shared opcodes.
Such auto-complete would also allow any threads to happily round-robin share common COG resource (not just one thread).

I'm not sure of the status of that detail ?
This seems way too complex in silicon.

Not really, it is essentially a busy flag, like many micros have already on peripherals.
It has no effect on those not wanting to share.
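
A busy flag of the kind described can be modelled in a few lines of Python. This is a sketch under stated assumptions: the class `SharedMathUnit` and its method names are invented for illustration, the multiply stands in for any of the shared MUL/DIV/SQRT/CORDIC operations, and the rule that a save must wait until the result has been read is the suggestion under discussion, not confirmed hardware behaviour.

```python
class SharedMathUnit:
    """Toy model of one shared math circuit guarded by a busy flag."""

    def __init__(self):
        self.owner = None    # id of the task holding the unit, None when free
        self.result = None

    def start_mul(self, task, a, b):
        """Try to load operands; return False if another op is in flight."""
        if self.owner is not None:
            return False     # busy: the caller simply retries later
        self.owner = task
        self.result = a * b
        return True

    def read_result(self, task):
        """Read the result and release the unit (the post-amble)."""
        if self.owner != task:
            raise RuntimeError("result belongs to another task")
        value = self.result
        self.owner = None
        self.result = None
        return value

    def safe_to_save(self, task):
        """A save of `task` should wait while it still owns a pending op."""
        return self.owner != task
```

Tasks that never touch the unit never see the flag, which is the sense in which it has no effect on those not wanting to share.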

Ahle2 · 2014-03-09 03:07

jmg wrote: »

Also Not-Equals can manage the waiting task, but the polling task also needs a compact check. JZ and JB are both compact. (1 line)

I don't follow. You can use these instructions as before on the polling task. My proposal does not change a single line (or character) of code in either Bill's or your examples. They would work the same as before. And both would work instead of just one of them.

NOTE! I changed the "<>" to "==" in my post above!
