Propeller II update - BLOG

cgracey · 2014-03-07 14:06

jmg wrote:

That makes sense, saving silicon is always good.

Is there a significant cost to allowing run-time choice of which task uses that single SWAP resource ?

A problem is that, as coded now, a user has to do a rebuild and task remap to debug a different task.

Also, can a user still 'see' the PC and flags of the other tasks? That is, how much debug visibility is possible on the other tasks?

I think the only way this is going to be possible is to remap the task numbers, which is only 8 flops and a few muxes. I've been thinking about doing this, but haven't looked into it yet.

By the way, in the new setup, the tasks' PCs are all reset to $0000 for cog launches or $XXXX for hub exec launches. To point them somewhere different, before giving them time slots, you must use JMPTASK. So, if the tasks were to be remapped, allowing task 3 to masquerade as any other task, no special setup data would be forfeited.
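One way to picture the remap idea: four 2-bit registers (4 x 2 = 8 flops) translate the task number that a time slot or JMPTASK refers to into the physical task context that actually runs. This is only a Python model under that assumption; the class name, the swap operation, and the bit packing are all hypothetical and not a description of Chip's actual logic.

```python
class TaskRemap:
    """Hypothetical model of a 4-entry task remap table (2 bits per entry)."""

    def __init__(self) -> None:
        # Identity mapping at reset: logical task n runs in physical context n.
        self.table = [0, 1, 2, 3]

    def physical(self, logical: int) -> int:
        """Physical context that a logical task number is routed to."""
        return self.table[logical & 3]

    def swap(self, a: int, b: int) -> None:
        """Exchange two entries, e.g. route logical task 1 to context 3."""
        self.table[a], self.table[b] = self.table[b], self.table[a]

    def as_bits(self) -> int:
        """Pack the table into 8 bits: entry n sits at bits 2n+1..2n."""
        return sum(p << (2 * n) for n, p in enumerate(self.table))


remap = TaskRemap()
remap.swap(1, 3)  # let context 3 (the one with the rich save/restore) host task 1
print(remap.physical(1), f"{remap.as_bits():08b}")  # prints: 3 01101100
```

Using swaps (rather than arbitrary writes) keeps the table a permutation, so every physical context is still used by exactly one logical task.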

cgracey · 2014-03-07 14:10

pedward wrote:

sem_wait() decrements (locks) the semaphore pointed to by sem. If the semaphore's value is greater than zero, then the decrement proceeds, and the function returns, immediately. If the semaphore currently has the value zero, then the call blocks until either it becomes possible to perform the decrement (i.e., the semaphore value rises above zero), or a signal handler interrupts the call.

We've got this covered with CMPSUB D,S/#. If D >= S/# then S/# is subtracted from D and C is set to one. Z is set to one if the overall result is 0.
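A small Python model of the CMPSUB behaviour described here, used as the non-blocking half of sem_wait(). It assumes 32-bit unsigned operands and reads "the overall result" as the final value of D (unchanged when no subtraction happens). The blocking half would be a retry loop, or a TPAUSE to the scheduler, around it; that part and any concurrency between tasks are not modeled. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def cmpsub(d: int, s: int) -> tuple[int, bool, bool]:
    """Model of CMPSUB D,S: returns (new D, C flag, Z flag), 32-bit unsigned."""
    d &= MASK32
    s &= MASK32
    c = d >= s
    if c:
        d -= s            # subtract only when D >= S
    return d, c, d == 0   # Z reflects the final value of D


def sem_trywait(count: int) -> tuple[int, bool]:
    """Non-blocking half of sem_wait(): decrement if the count is above zero."""
    new_count, acquired, _ = cmpsub(count, 1)
    return new_count, acquired
```

With S = 1, C tells the task whether it got the semaphore, and D is left at 0 instead of wrapping when nothing is available.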

One of the other cases, from what I understood, can be handled by SETB/CLRB D,S/#, which returns the original bit's value into C.
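Returning the original bit in C is what makes SETB usable as a test-and-set lock. Below is a Python sketch assuming SETB/CLRB behave exactly as described (set or clear one bit of D, old bit value into C); the helper names are hypothetical.

```python
def setb(d: int, bit: int) -> tuple[int, bool]:
    """Model of SETB D,#bit: set the bit, C = the bit's original value."""
    mask = 1 << (bit & 31)
    return (d | mask) & 0xFFFFFFFF, bool(d & mask)


def clrb(d: int, bit: int) -> tuple[int, bool]:
    """Model of CLRB D,#bit: clear the bit, C = the bit's original value."""
    mask = 1 << (bit & 31)
    return d & ~mask & 0xFFFFFFFF, bool(d & mask)


def try_lock(word: int, bit: int) -> tuple[int, bool]:
    """Test-and-set: the lock is ours only if the bit was clear before SETB."""
    word, was_set = setb(word, bit)
    return word, not was_set
```

Releasing the lock is just CLRB on the same bit; one 32-bit register could hold 32 independent locks.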

Heater. · 2014-03-07 15:36

pedward,

No.

I agree that yes, using common lingo is a good idea, but:

Historically, all these concepts like "yield", "suspend", etc. are high-level language or operating system concepts.

What you like to call "yield" was called "SUSPEND" back in the days of the MASCOT standards for military software in the 1970s.

Whilst you are busy thinking about mutexes, semaphores and locks, remember that no matter what POSIX says, others have a different idea. The Ada language has the "rendezvous".

At the end of the day, all these high-level abstractions rely on a simple processor implementation of an atomic operation, like "lock exchange" back in the multiprocessor LOCUS 16 days, or the "lock" prefix in the modern Intel x86 world.

Heater. · 2014-03-07 15:50

Bill.

Thanks Chip,
That has started to un-confuse me

Problem for me is that everything after that did not make any sense at all.

I guess I'm just too stupid to get the PII idea.

In the old days "yield" was called JMPRET. As used in FullDuplexSerial.

Then we have "automatic JMPRET" with the hardware scheduled threads. Sorry it was me who suggested that.

After that I totally lost the plot.

Bill Henning · 2014-03-07 16:13

Heater,

You are a smart guy... I have difficulty believing you don't understand the summary in

http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG/page298

I suspect it is more a case of you don't believe it is needed, and as you don't plan to use it, you are not interested - which is a valid point of view for someone who does not need those features.

The tasking hardware is a heck of a lot more than an automated jmpret, and I think that everyone will appreciate the new debugging capabilities.

Software threads come basically free with the debugging features added to tasking, and will be very useful once Ethernet sockets and USB endpoints get used a lot.

I am eagerly awaiting the next FPGA image drop, and SERDES/USB after that!

FYI, as 10Mbps soft Ethernet will clearly be impossible with the P2, I guess I'll just have to hook up an ENC28J60 to my DE2-115...

Heater. wrote:

Bill.

Problem for me is that everything after that did not make any sense at all.

I guess I'm just too stupid to get the PII idea.

In the old days "yield" was called JMPRET. As used in FullDuplexSerial.

Then we have "automatic JMPRET" with the hardware scheduled threads. Sorry it was me who suggested that.

After that I totally lost the plot.

Cluso99 · 2014-03-07 16:13

Bill
Nice summary.

Chip,
Yes, I agree that the default start address is $000 for all tasks as JMPTASK works nicely to change it.

Do you have any spare bits left in the WIDE when T3SAVE is executed?
The reason I ask is that it might be nice to know where the other tasks are up to (approximate is nice). Of course a T3LOAD would ignore those bits.

rogloh · 2014-03-07 16:17

@Chip, looking good though I do have a question for you on the use of TPAUSE/TRESUME for breakpoints.

Let's assume we wish to use TPAUSE as a breakpoint instruction, injected at run time by an interactive debugger we control, into the user task code where we want execution to stop. This will obviously overwrite one original instruction in our code. Once the user task hits this location and pauses, the scheduler detects the pause by monitoring the register that TPAUSE writes, and we do all our task state analysis etc. Then we decide we want to continue and allow the original instruction that our TPAUSE breakpoint overwrote to execute. If we just increment the PC using TRESUME, that instruction will not be executed; execution will resume at the next one. So I was also hoping the new changes would give us some way to patch the original instruction back over the top of the TPAUSE and then continue running the task that way. I know we would lose our breakpoint instruction temporarily when we do this, but the scheduler/debugger could always patch it back in immediately if required, once the original instruction got executed.

Is this type of sequence, or something equivalent, going to be possible? It's not really a problem if we have to stop the task and drain the pipeline etc. Also, I know it's not an option to run the clobbered original instruction in the context of the debugger task instead of our user task; it has to be the user task if it modifies task-specific state such as its own PTRA or remapped registers etc. There might be other ways, such as redirecting the PC to some temporary location in the user code, doing the operation there, and jumping back, but that just adds more side effects, so the ideal place to run the code is at the actual address where it normally runs. Can it be done?

Bill Henning · 2014-03-07 16:25

Ray,

Thanks.

rogloh,

- whenever a breakpoint is inserted, save the original instruction in a breakpoint table, which has one slot per breakpoint number
- before TRESUME, modify the WIDE to decrement the PC, put the original instruction back, then TRESUME

I think the above should work... if not, Chip will think of another way!
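Bill's two steps, sketched as a Python model of the debugger side. Assumptions (not confirmed hardware behaviour): instructions are strings in a list standing in for memory, the parked task's PC still points at its TPAUSE, and TRESUME adds 1 to whatever PC is in the saved context. BreakpointTable and resume_pc are hypothetical names.

```python
TPAUSE = "TPAUSE"  # placeholder for the real encoded breakpoint opcode


class BreakpointTable:
    """Hypothetical debugger-side table: one slot per breakpoint number."""

    def __init__(self, code: list[str]) -> None:
        self.code = code                              # stands in for cog/hub memory
        self.slots: dict[int, tuple[int, str]] = {}   # number -> (address, original)

    def insert(self, num: int, addr: int) -> None:
        # Step 1: save the original instruction before overwriting it.
        self.slots[num] = (addr, self.code[addr])
        self.code[addr] = TPAUSE

    def resume_pc(self, num: int, parked_pc: int) -> int:
        """Put the original back and return the PC the task will run next."""
        addr, original = self.slots.pop(num)
        self.code[addr] = original                        # restore the instruction
        pc_in_wide = (parked_pc - 1) % len(self.code)     # Step 2: decrement PC in the WIDE
        return (pc_in_wide + 1) % len(self.code)          # TRESUME's +1 lands back on addr
```

The decrement exists only to cancel TRESUME's increment, which is why Chip later points out that you can skip TRESUME entirely in this case.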

rogloh · 2014-03-07 16:35

Bill Henning wrote:

rogloh,

- whenever a breakpoint is inserted, save the original instruction in a breakpoint table, which has one slot per breakpoint number
- before TRESUME, modify the WIDE to decrement the PC, put the original instruction back, then TRESUME

I think the above should work... if not, Chip will think of another way!

Yeah, makes sense. Not a problem if we have to jump through a few extra hoops, just as long as there is going to be some way to achieve the intended result.

Rayman · 2014-03-07 16:55

Is Prop2 still on track with Moore's law with respect to Prop1?

cgracey · 2014-03-07 17:10

rogloh wrote:

@Chip, looking good though I do have a question for you on the use of TPAUSE/TRESUME for breakpoints.

Let's assume we wish to use TPAUSE as a breakpoint instruction, injected at run time by an interactive debugger we control, into the user task code where we want execution to stop. This will obviously overwrite one original instruction in our code. Once the user task hits this location and pauses, the scheduler detects the pause by monitoring the register that TPAUSE writes, and we do all our task state analysis etc. Then we decide we want to continue and allow the original instruction that our TPAUSE breakpoint overwrote to execute. If we just increment the PC using TRESUME, that instruction will not be executed; execution will resume at the next one. So I was also hoping the new changes would give us some way to patch the original instruction back over the top of the TPAUSE and then continue running the task that way. I know we would lose our breakpoint instruction temporarily when we do this, but the scheduler/debugger could always patch it back in immediately if required, once the original instruction got executed.

Is this type of sequence, or something equivalent, going to be possible? It's not really a problem if we have to stop the task and drain the pipeline etc. Also, I know it's not an option to run the clobbered original instruction in the context of the debugger task instead of our user task; it has to be the user task if it modifies task-specific state such as its own PTRA or remapped registers etc. There might be other ways, such as redirecting the PC to some temporary location in the user code, doing the operation there, and jumping back, but that just adds more side effects, so the ideal place to run the code is at the actual address where it normally runs. Can it be done?

Whenever you detect a TPAUSE, you will need to drain that task from the pipeline and starve it of cycles, so that you can do something with it.

You could then do a TRESUME and start it back up.

Or, you could replace the original instruction where the TPAUSE was inserted, and then restart. That would be like a breakpoint.

cgracey · 2014-03-07 17:12

Cluso99 wrote:

Bill
Nice summary.

Chip,
Yes, I agree that the default start address is $000 for all tasks as JMPTASK works nicely to change it.

Do you have any spare bits left in the WIDE when T3SAVE is executed?
The reason I ask is that it might be nice to know where the other tasks are up to (approximate is nice). Of course a T3LOAD would ignore those bits.

I think we have 13 bits free in the context WIDE.

Roy Eltham · 2014-03-07 17:18

Just wanted to toss out that typically with breakpoints, you want to set them and hit them every time. Not just once. So we'll need to hit a breakpoint, put the original instruction back, step once, then put the breakpoint back, then continue.
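Roy's "hit every time" sequence (restore, single-step, re-arm, continue), shown as a toy Python interpreter. The two-opcode instruction set, the string placeholder for TPAUSE, and the assumption that the debugger can single-step the task exactly one instruction are all hypothetical; on the real cog that step means giving the task one instruction's worth of pipeline time.

```python
TPAUSE = "TPAUSE"  # placeholder for the breakpoint opcode


def step(code: list[str], pc: int, acc: int) -> tuple[int, int]:
    """Execute one toy instruction: 'ADD n' or 'JMP addr'."""
    op, *args = code[pc].split()
    if op == "ADD":
        return pc + 1, acc + int(args[0])
    if op == "JMP":
        return int(args[0]), acc
    raise ValueError(f"unknown op {op!r}")


def run_with_breakpoint(code: list[str], bp_addr: int, max_hits: int) -> tuple[int, int]:
    original = code[bp_addr]
    code[bp_addr] = TPAUSE                      # arm the breakpoint
    pc, acc, hits = 0, 0, 0
    while hits < max_hits:
        if code[pc] == TPAUSE:
            hits += 1                           # debugger inspects the task here
            code[bp_addr] = original            # 1. put the original instruction back
            pc, acc = step(code, pc, acc)       # 2. single-step it in the task
            code[bp_addr] = TPAUSE              # 3. re-arm the breakpoint
            continue                            # 4. let the task continue
        pc, acc = step(code, pc, acc)
    return acc, hits


print(run_with_breakpoint(["ADD 1", "ADD 10", "JMP 0"], 1, 3))  # prints: (33, 3)
```

The breakpoint fires on every pass through the loop, yet the overwritten "ADD 10" still executes each time.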

cgracey · 2014-03-07 18:07

Roy Eltham wrote:

Just wanted to toss out that typically with breakpoints, you want to set them and hit them every time. Not just once. So we'll need to hit a breakpoint, put the original instruction back, step once, then put the breakpoint back, then continue.

Good point.

rogloh · 2014-03-07 18:18

cgracey wrote:

Whenever you detect a TPAUSE, you will need to drain that task from the pipeline and starve it of cycles, so that you can do something with it.

You could then do a TRESUME and start it back up.

Or, you could replace the original instruction where the TPAUSE was inserted, and then restart. That would be like a breakpoint.

Great. Just what I was hoping for. I was beginning to think that the TRESUME itself was always required to clear an internal CPU flag indicating a paused state, but if we can simply drain the pipeline and overwrite the TPAUSE instruction with our original to achieve the same result without even needing to go touch the WIDE, that is pretty cool too. Thanks. I know we also want to be able to put the breakpoint back and that will require an extra single step operation in the debugger for doing that. Not a problem.

All sounds good.

Cluso99 · 2014-03-07 18:37

cgracey wrote:

Whenever you detect a TPAUSE, you will need to drain that task from the pipeline and starve it of cycles, so that you can do something with it.

You could then do a TRESUME and start it back up.

Or, you could replace the original instruction where the TPAUSE was inserted, and then restart. That would be like a breakpoint.

To TRESUME the instruction replacing TPAUSE, the PC will need to be decremented first, because TRESUME increments the PC; normally it would be the instruction following the TPAUSE that gets executed. This is going to require a T3SAVE, decrementing the PC bits in the WIDE (a few instructions), then a T3LOAD.

Would it be possible for TRESUME to have an extra bit (WC?) to determine whether to INC the PC or not?
BTW this would then work for other tasks provided you didn't need to do T3SAVE/T3LOAD (ie examine the PC etc).
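The "decrement the PC bits in the WIDE" step is an ordinary read-modify-write on a bit field. The actual position and width of the PC inside the T3SAVE context image aren't given in this thread, so PC_SHIFT and PC_BITS below are hypothetical placeholders; this is a Python sketch of the masking, not the real layout.

```python
PC_SHIFT = 0        # hypothetical: PC occupies the low bits of one context long
PC_BITS = 16        # hypothetical: 16-bit PC, wide enough for hub exec addresses
PC_FIELD = (1 << PC_BITS) - 1
PC_MASK = PC_FIELD << PC_SHIFT


def decrement_pc(context_long: int) -> int:
    """Decrement the PC field of a saved-context long, leaving other bits alone."""
    pc = (context_long & PC_MASK) >> PC_SHIFT
    pc = (pc - 1) & PC_FIELD                    # wrap like a hardware counter
    return (context_long & ~PC_MASK & 0xFFFFFFFF) | (pc << PC_SHIFT)
```

In P2 assembly this would presumably be a GETx/SUB/SETx-style sequence on the saved long between T3SAVE and T3LOAD; Chip's reply suggests avoiding it altogether by not issuing TRESUME.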

cgracey · 2014-03-07 19:00

Cluso99 wrote:

To TRESUME the instruction replacing TPAUSE, the PC will need to be decremented first, because TRESUME increments the PC; normally it would be the instruction following the TPAUSE that gets executed. This is going to require a T3SAVE, decrementing the PC bits in the WIDE (a few instructions), then a T3LOAD.

Would it be possible for TRESUME to have an extra bit (WC?) to determine whether to INC the PC or not?
BTW this would then work for other tasks provided you didn't need to do T3SAVE/T3LOAD (ie examine the PC etc).

There's no need to do any of that. Just don't do a TRESUME and, instead, replace the TPAUSE instruction with the original instruction at that "breakpoint", and then start it up again.

I think TRESUME is too loaded a name. TNUDGE would be better. It just nudges the target task's PC. There's no need to use it if you don't want the task's PC incremented. It's just a little pry bar you can pull out when needed to spring a task out of a TPAUSE. I mean, we could even just get rid of it and you could manually increment the PC in the WIDE 'task save' image. Ah, never mind - that would only work for task 3, which has the rich context save. Other tasks don't have that, so you'd need TRESUME to kick them out of TPAUSEs.

Roy Eltham · 2014-03-07 19:09

This may sound odd, but maybe TRESUME could become a way to continue after a breakpoint without removing the breakpoint. If it could start the task back up but use the instruction in D, instead of the one at PC, as the first one loaded into the pipe, then we could have easier "hit every time" breakpoints. Does that even make sense?

cgracey · 2014-03-07 19:11

Roy Eltham wrote:

This may sound odd, but maybe TRESUME could become a way to continue after a breakpoint without removing the breakpoint. If it could start the task back up but use the instruction in D, instead of the one at PC, as the first one loaded into the pipe, then we could have easier "hit every time" breakpoints. Does that even make sense?

It would save a few instructions, but complicate things in the pipeline. Better to let it just fetch instructions on its own.

jmg · 2014-03-07 19:25

Roy Eltham wrote:

Just wanted to toss out that typically with breakpoints, you want to set them and hit them every time. Not just once. So we'll need to hit a breakpoint, put the original instruction back, step once, then put the breakpoint back, then continue.

Yes, and also any breakpoint should be a single opcode long.

I'm not sure TPAUSE quite meets that yet, given the talk of preamble SW semaphores?

Q: Is there an example of the new instructions used as single-opcode breakpoints?

potatohead · 2014-03-07 19:29

Why one cycle? Isn't that a nice-to-have as opposed to a must-have?

Bill Henning · 2014-03-07 19:34

See http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1248284&viewfull=1#post1248284

Basically, which codes are used as breakpoints is up to the debugger/scheduler.

Normally,

TPAUSE taskreg,#code

taskreg is the cog location the debugger/scheduler is monitoring for a non-zero value

1..511 is a constant passed as code (you can pass bigger numbers using AUGS or the S register)

I'd suggest '1' be used for the task to request being shut down (based on exit(1))

A range (say 2-127) could be used to mean break points

Another range (say 128-255) could be used for system services (putch, getch, puts, read, write etc)

Last range could be signals being waited on (256-511)

When waiting for one-of-many signals, perhaps the bits 9..31 of a source register could indicate signals.

Basically, the mechanism is software-defined, so it can be adapted to specific requirements (and does not require hard logic to implement)

BTW, before resuming the task, the debugger/scheduler would be responsible for clearing taskreg to 0.
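A Python sketch of how a scheduler might decode the example code ranges suggested above. It assumes one taskreg per task (the list index is the task number), which the post does not specify; PauseKind, classify and poll are hypothetical names, and the ranges are only the ones proposed here.

```python
from enum import Enum


class PauseKind(Enum):
    RUNNING = 0        # taskreg == 0: nothing posted
    EXIT = 1           # code 1: task asks to be shut down (like exit(1))
    BREAKPOINT = 2     # codes 2..127
    SERVICE = 3        # codes 128..255: putch, getch, puts, read, write...
    SIGNAL_WAIT = 4    # codes 256..511
    EXTENDED = 5       # > 511, only reachable via AUGS or an S register


def classify(code: int) -> PauseKind:
    """Map a TPAUSE code to the example ranges suggested above."""
    if code == 0:
        return PauseKind.RUNNING
    if code == 1:
        return PauseKind.EXIT
    if code <= 127:
        return PauseKind.BREAKPOINT
    if code <= 255:
        return PauseKind.SERVICE
    if code <= 511:
        return PauseKind.SIGNAL_WAIT
    return PauseKind.EXTENDED


def poll(taskregs: list[int]) -> list[tuple[int, PauseKind]]:
    """One scheduler pass: report paused tasks and clear their taskregs."""
    events = []
    for task, code in enumerate(taskregs):
        kind = classify(code)
        if kind is not PauseKind.RUNNING:
            events.append((task, kind))
            taskregs[task] = 0   # scheduler clears taskreg before resuming
    return events
```

Because the ranges live entirely in software, changing the policy means editing classify(), not the silicon.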

jmg wrote:

Yes, and also any breakpoint should be one cycle long.

I'm not sure TPAUSE quite meets that yet, given the talk of preamble SW semaphores?

Q: Is there an example of the new instructions used as single-opcode breakpoints?

jmg · 2014-03-07 19:37

potatohead wrote:

Why one cycle? Isn't that a nice-to-have as opposed to a must-have?

I'll reword that: a breakpoint should be a single opcode long, because breakpoints need to fit in anywhere. In many processors this is called a TRAP.

jmg · 2014-03-07 19:48

Bill Henning wrote:

TPAUSE taskreg,#code

taskreg is the cog location the debugger/scheduler is monitoring for a non-zero value

1..511 is a constant passed as code (you can pass bigger numbers using AUGS or the S register)

I'd suggest '1' be used for the task to request being shut down (based on exit(1))
A range (say 2-127) could be used to mean break points
Another range (say 128-255) could be used for system services (putch, getch, puts, read, write etc)
Last range could be signals being waited on (256-511)

I'm now unclear: is this what Chip has implemented, or is this a suggestion?

rogloh · 2014-03-07 20:00

Bill Henning wrote:

See http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1248284&viewfull=1#post1248284

Basically, which codes are used as breakpoints is up to the debugger/scheduler.

Normally,

TPAUSE taskreg,#code

taskreg is the cog location the debugger/scheduler is monitoring for a non-zero value

1..511 is a constant passed as code (you can pass bigger numbers using AUGS or the S register)

I'd suggest '1' be used for the task to request being shut down (based on exit(1))

A range (say 2-127) could be used to mean break points

Another range (say 128-255) could be used for system services (putch, getch, puts, read, write etc)

Last range could be signals being waited on (256-511)

When waiting for one-of-many signals, perhaps the bits 9..31 of a source register could indicate signals.

Basically, the mechanism is software-defined, so it can be adapted to specific requirements (and does not require hard logic to implement)

BTW, before resuming the task, the debugger/scheduler would be responsible for clearing taskreg to 0.

I like having the ability to use all 32 bits (via AUGS) for the parameter value used in TPAUSE, as it is going to allow even more capabilities. For example, you could encode either a hub or cog address pointer in the value passed in S. This could tell the scheduler to periodically poll some hub (or cog) memory location slowly in the background and wake up the task if it sees that location change value. You would essentially sleep the user task, let the scheduler poll the address in the background, and later resume the task when it requires attention. In the meantime, other user tasks could potentially be swapped in and executed instead, or the freed-up time slots could be given to another hardware task. All sorts of things would be possible, like Spin tasks in one cog writing hub data that wakes up C tasks in another.
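A minimal Python sketch of the "sleep until a location changes" idea, assuming the task passes the address to watch in S and the scheduler remembers the value seen at sleep time. Hub memory is modeled as a plain list, and WatchScheduler, sleep and poll are hypothetical names, not part of any P2 API.

```python
class WatchScheduler:
    """Hypothetical scheduler that wakes tasks when a watched location changes."""

    def __init__(self) -> None:
        # task number -> (watched address, value seen when the task went to sleep)
        self.sleeping: dict[int, tuple[int, int]] = {}

    def sleep(self, task: int, address: int, memory: list[int]) -> None:
        """Park a task; address is what it passed in S with its TPAUSE."""
        self.sleeping[task] = (address, memory[address])

    def poll(self, memory: list[int]) -> list[int]:
        """Background pass: return (and un-park) tasks whose location changed."""
        woken = [t for t, (addr, seen) in self.sleeping.items() if memory[addr] != seen]
        for t in woken:
            del self.sleeping[t]
        return woken
```

Comparing against the value captured at sleep time (rather than testing for non-zero) means any write that changes the location wakes the task, which suits the Spin-writes/C-wakes case.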

jmg · 2014-03-07 20:02

Cluso99 wrote:

Would it be possible for TRESUME to have an extra bit (WC?) to determine whether to INC the PC or not?

Are you meaning TRESUME is more like a TRELEASE, that removes any stall, but does not INC the PC itself ?

i.e., instead, the INC is done by the resumed task: because the stall condition is removed, when cycles are given back, the line executes and then INCs the PC in the normal fashion.
(well, almost normal: a released TPAUSE would not re-trigger)

That sounds a little safer: the slave task itself stays in full control of its own PC, and any out-of-phase TRELEASE (resume) is effectively harmless.

It also allows the replace-break handling, as there you do want to first execute the replaced opcode, which may or may not advance the PC on that cycle.

An option to allow the controller task to force an INC would cover all bases?
(But if it releases the stall, is this even needed anymore?)
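The difference between the two release styles, as a toy Python model. tresume() follows the behaviour currently discussed (nudge the PC past the TPAUSE); trelease() is jmg's proposal (lift the stall, let the task execute whatever is at its PC, with a released TPAUSE falling through once). Everything here, including the one-shot "released" flag, is a hypothetical model rather than actual hardware behaviour.

```python
from dataclasses import dataclass


@dataclass
class Task:
    pc: int
    paused: bool = False
    released: bool = False   # set by trelease(): next TPAUSE fetched falls through


def execute(t: Task, code: list[str]):
    """Run one instruction for task t; return the opcode executed, or None if stalled."""
    if t.paused:
        return None
    op = code[t.pc]
    if op == "TPAUSE" and not t.released:
        t.paused = True          # park on the TPAUSE itself
        return None
    t.released = False           # release is one-shot
    t.pc += 1
    return op


def tresume(t: Task) -> None:
    """Behaviour as currently discussed: nudge the PC past the TPAUSE."""
    t.pc += 1
    t.paused = False


def trelease(t: Task) -> None:
    """jmg's alternative: lift the stall, leave the PC to the task itself."""
    t.paused = False
    t.released = True
```

With trelease(), a debugger that has already put the original instruction back gets it executed by the task itself; with tresume(), the same restored instruction would be skipped.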

Bill Henning · 2014-03-07 20:07

This is what Chip said he would implement.

The reason TRESUME increments the PC is to skip past TPAUSE looping to itself.

jmg wrote:

I'm now unclear: is this what Chip has implemented, or is this a suggestion?

Bill Henning · 2014-03-07 20:09

TRESUME does increment the PC in the currently discussed version.

It would be possible to instead have TPAUSE taskreg,#code exit its internal looping when taskreg is 0 (after it has written the non-zero code), then TRESUME would not need to increment the PC. If this was done, it might be possible to get rid of the TRESUME instruction?

jmg wrote:

Are you meaning TRESUME is more like a TRELEASE, that removes any stall, but does not INC the PC itself ?

i.e., instead, the INC is done by the resumed task: because the stall condition is removed, when cycles are given back, the line executes and then INCs the PC in the normal fashion.
(well, almost normal: a released TPAUSE would not re-trigger)

That sounds a little safer: the slave task itself stays in full control of its own PC, and any out-of-phase TRELEASE (resume) is effectively harmless.

It also allows the replace-break handling, as there you do want to first execute the replaced opcode, which may or may not advance the PC on that cycle.

An option to allow the controller task to force an INC would cover all bases?
(But if it releases the stall, is this even needed anymore?)

jmg · 2014-03-07 20:12

Bill Henning wrote:

This is what Chip said he would implement.

The reason TRESUME increments the PC is to skip past TPAUSE looping to itself.

I understand that, but there will be more than one way to implement release of a pause.

I'm trying to see how a breakpoint -> code -> breakpoint sequence works, with an external forced INC PC.
(Which I think is also what Cluso was covering?)

Dave Hein · 2014-03-07 20:18

Rayman wrote:

Is Prop2 still on track with Moore's law with respect to Prop1?

I think we were on track until someone suggested preemptive threads. I believe the Moore's law train jumped the tracks about that time.
