These instructions completely stall the pipeline and are not useful for multi-tasking. What the cog can do afterwards on the pins, though, looks like hardware on the scope - perfect timing. I didn't make an any-edge version, because it seemed practically useless. If you want to trigger on both edges, you can use alternating WAITPOS/WAITNEG's.
I guess, but this does cost more code. There are DDR and Manchester apps, where any-edge decisions can be useful.
If this works with a pin mask, you could wait_any_edge(2 pins) as the core of a SW quadrature counter.
I went through a long mental exercise when designing the counters and determined that counting either negative or positive edges was useful, but counting all edges was superfluous, or even sloppy.
Counting or timing both edges is less common, (usually because of jitter), but capture on any edge can be quite useful.
Really interesting idea, jac_goudsmit. Something like that wouldn't even require any more flip-flops. I'll see what I can do there. Wait.. I just realized that it wouldn't catch intervening events, so it's not going to be a solution to the did-a-change-happen? problem.
I abandoned the course I was on with the extra registers - too big and messy. Instead, I exploited an input-pin mux that was already present for the pin instructions and made an edge detector that stalls the pipeline until the desired edge event occurs. These are only for single-task programs that need to have perfect registration to external (pin) or internal (XCH) edge events:
WAITPOS D/#n 'specify pin and wait for positive edge
WAITNEG D/#n 'specify pin and wait for negative edge
They work beautifully. They turn the cog into an edge-triggered flip-flop, which jmg would appreciate. These are way simpler to use than WAITPEQ/WAITPNE would be, and they detect the edge event, not just states, so one of them does the trick.
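To make the edge-versus-state distinction concrete, here is a minimal Python sketch. It is only a software model over a list of per-clock pin samples, not the cog logic, and the names wait_state and wait_edge are made up for illustration.

```python
def wait_state(samples, level, start=0):
    """State wait: first clock at or after `start` where the pin equals `level`."""
    for i in range(start, len(samples)):
        if samples[i] == level:
            return i
    return None  # level never seen in this trace


def wait_edge(samples, kind, start=0):
    """Edge wait: first clock after `start` where the pin makes the requested
    transition. `kind` is "pos" (0 -> 1) or "neg" (1 -> 0)."""
    wanted = (0, 1) if kind == "pos" else (1, 0)
    for i in range(start + 1, len(samples)):
        if (samples[i - 1], samples[i]) == wanted:
            return i
    return None  # no such edge in this trace


# Pin is already high when the wait begins.
pin = [1, 1, 1, 0, 0, 1, 1]
state_hit = wait_state(pin, 1)    # releases immediately on the existing high
pos_hit = wait_edge(pin, "pos")   # releases only at the real 0 -> 1 transition
neg_hit = wait_edge(pin, "neg")
```

In this trace the state wait is satisfied at clock 0 because the pin is already high, while the edge wait holds off until the actual rising transition, which is why a single edge instruction can replace a wait-for-low/wait-for-high pair.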
I think these two instructions will make fast XCH comms a snap. Using these to frame nibble/byte/word moves over I/O pins should make fast inter-Prop2 comms possible, too.
I like it, but I still have a question:
Can they have a timeout?
Yes!!! That's practically free. I can just borrow the timeout detector from WAITPEQ/WAITPNE. Good thinking! I'll add the any-edge mode, too. That makes three: WAITPOS/WAITNEG/WAITANY. What's a better name: WAITCHG or WAITANY?
I like WAITCHG. Seems to me it would raise fewer questions than WAITANY would. Change leads somebody right to positive or negative edges by virtue of those instructions being there. WAITANY immediately triggers, "What is any?"
WaitAny reminds me too much of
'Press any key'
'Where's the any key ? '
So change or delta are better, and CHG packs into 3 letters so WAITCHG is fine.
These are at least 2-clock instructions because they need to develop a 2-clock history for comparison before they can release.
WAITPOS/WAITNEG/WAITCHG take a pin number as the input, not a mask. WAITPEQ/WAITPNE use 32-bit mask and compare values, instead.
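A rough Python model of the two input styles, for comparison. Assumptions are mine: each list entry is the 32-bit pin word seen on one clock, the timeout is counted in clocks from the start of the wait, and running off the end of the trace is treated like a timeout. The function names are illustrative, not real instructions or APIs.

```python
def waitpeq(ins, value, mask, timeout=None):
    """Mask-and-compare wait: release when (pins & mask) == value.
    Returns (clock, timed_out)."""
    for clk, word in enumerate(ins):
        if timeout is not None and clk >= timeout:
            return clk, True
        if (word & mask) == value:
            return clk, False
    return len(ins), True  # ran out of trace, treated as a timeout


def waitchg(ins, pin, timeout=None):
    """Pin-number wait: release when the selected pin changes.
    The first sample only builds history, so the earliest possible
    release is on the second sample. Returns (clock, timed_out)."""
    prev = (ins[0] >> pin) & 1
    for clk in range(1, len(ins)):
        if timeout is not None and clk >= timeout:
            return clk, True
        cur = (ins[clk] >> pin) & 1
        if cur != prev:
            return clk, False
        prev = cur
    return len(ins), True  # ran out of trace, treated as a timeout


trace = [0b00, 0b00, 0b10, 0b11, 0b01]
both_high = waitpeq(trace, 0b11, 0b11)     # waits for pins 0 and 1 both high
pin1_change = waitchg(trace, 1)            # waits for any change on pin 1
pin0_timeout = waitchg(trace, 0, timeout=2)
```

The timed_out flag stands in for whatever flag the real instruction would report; the thread only says the timeout detector is borrowed from WAITPEQ/WAITPNE.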
If an edge arrives just before this instruction starts, does it see it ?
or does the opcode auto-clear an edge detect, which then can catch an edge how close to the opcode start ?
ie what apertures does this have ?
If it was just polling a 2FF edge detector, that has single clk outputs, and it could catch an edge arriving close to the same time as the opcode. (given the 2FF parallel pipeline effect)
An edge arriving before the wait part of the opcode, is discarded as unseen.
I was going to reply with the same thing. Where is the any key? I am still looking for it. My vote is WAITCHG.
I guess, but this does cost more code. There are DDR and Manchester apps, where any-edge decisions can be useful.
More code in a loop translates into reduced speed. And this could be very useful for LPDDR (as LPDDR has no minimum operating frequency, unlike DDR). I know that P2 cannot interface directly to the reduced voltage of LPDDR, but maybe other protocols could benefit.
WAITPEQ, WAITPNE, WAITPOS, WAITNEG, WAITCHG, WAITCNT and WAITVID:
Are the cccc (conditional execution) bits required???
I am asking this because I cannot recall using conditionals on these instructions.
If they were available, this would provide the instruction extension bits for...
Optional cnt timeout (instead of using WC)
Optional poll versions of these
This would free the WC & WZ flags so we would know if it were a timeout (C) and the pin states (Z)
We also have the "R" bit available.
If more than 1 pin was specified (use a mask as for waitpeq/waitpne), then the WAITPOS, WAITNEG, WAITCHG should wait for just 1 of the pins to go POS/NEG/CHG. This then complements the WAITPEQ and WAITPNE nicely.
I've used conditions on WAITVID to control things like sync differences between standards and or to provide different pixel options depending on some data read mid-frame. At higher resolutions and driver complexity levels, time and code space are important.
If there are variants possible, then sticky/non sticky could be useful.
Non sticky is a read-only poll of a 2-D-FF + XOR edge detector. Prior edges are simply missed.
Sticky has to add a JK FF, so any edge sets and opcode-test clears. Prior edges are caught, but their timing may be skewed.
More resource is needed for sticky, as the opcode now also writes to the edge-cell.
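The sticky/non-sticky difference can be sketched as a small behavioral model in Python. This is my reading of the description above, not actual hardware: the class and method names are hypothetical, and the JK flip-flop is reduced to a simple set/clear flag.

```python
class NonStickyEdge:
    """Two D flip-flops plus an XOR: the output is high for exactly one
    clock after the pin changes. A poll on any other clock misses it."""

    def __init__(self, level=0):
        self.ff1 = level
        self.ff2 = level

    def clock(self, pin):
        self.ff2 = self.ff1
        self.ff1 = pin
        return self.ff1 ^ self.ff2


class StickyEdge:
    """Same detector feeding a latch: any edge sets the flag and the
    opcode's test clears it, so an earlier edge is still reported."""

    def __init__(self, level=0):
        self.detector = NonStickyEdge(level)
        self.flag = 0

    def clock(self, pin):
        if self.detector.clock(pin):
            self.flag = 1

    def test_and_clear(self):
        seen, self.flag = self.flag, 0
        return seen


pin_trace = [0, 1, 1, 1, 1]          # edge arrives on clock 1

ns = NonStickyEdge()
ns_outputs = [ns.clock(p) for p in pin_trace]

st = StickyEdge()
for p in pin_trace:
    st.clock(p)
first_test = st.test_and_clear()     # polled late, edge still reported
second_test = st.test_and_clear()    # flag was cleared by the first test
```

A poll on the last clock sees nothing from the non-sticky detector, while the sticky one still reports the edge, although without saying when it happened. That is the timing skew mentioned above.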
I went through a long mental exercise when designing the counters and determined that counting either negative or positive edges was useful, but counting all edges was superfluous, or even sloppy.
If there will be a counter mode available for incremental encoders, counting on both edges of the A and B signals will quadruple the encoder PPR resolution.
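For reference, the x4 figure falls out of standard quadrature decoding: one encoder cycle walks through four A/B states, and every state change is one count. Below is a small Python sketch of that decode. It is a software illustration with made-up function names, where each sample is (A << 1) | B, and it says nothing about how a P2 counter mode would actually be built.

```python
# Forward rotation steps through the Gray sequence 00 -> 01 -> 11 -> 10 -> 00.
GRAY_ORDER = [0b00, 0b01, 0b11, 0b10]


def quad_count_x4(states):
    """Count every valid A/B transition: +1 forward, -1 reverse."""
    count = 0
    prev = states[0]
    for cur in states[1:]:
        if cur == prev:
            continue
        step = (GRAY_ORDER.index(cur) - GRAY_ORDER.index(prev)) % 4
        if step == 1:
            count += 1
        elif step == 3:
            count -= 1
        # step == 2 means both signals changed at once: invalid, ignored
        prev = cur
    return count


def rising_edges_a(states):
    """Count only the 0 -> 1 transitions of A, for comparison."""
    a = [(s >> 1) & 1 for s in states]
    return sum(1 for p, c in zip(a, a[1:]) if (p, c) == (0, 1))


one_cycle = [0b00, 0b01, 0b11, 0b10, 0b00]
x4 = quad_count_x4(one_cycle)          # counts all four edges of the cycle
x1 = rising_edges_a(one_cycle)         # counts one edge per cycle
reverse = quad_count_x4(one_cycle[::-1])
```

Over the same encoder cycle the all-edges decode yields four counts against one for a single-edge count, and running the sequence backwards gives the negative of that.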
With WAITPOS/WAITNEG/WAITCHG, SPI would be almost trivial to implement as a pure software solution. Add two shift registers and a few instructions (see [post=1211433]#235 in the other thread[/post]), and you can shift out bits from SRA on one edge and shift in bits to SRB on the other edge. With the hardware support, this would make the loop approximately 8-10 clock cycles (WAITPOS and WAITNEG each take two cycles, the shift instructions each take one cycle, the branch instruction takes a cycle, maybe increment a counter, etc.) giving a throughput of maybe 10-15MHz. It might not be as fast as some would like, but I think it would cover 80% of the use cases out there (at least until the better, faster, stronger P3 arrives).
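As a sanity check of the shift-in-on-one-edge, shift-out-on-the-other idea, here is a Python model of a slave handling one byte. Everything in it is an assumption for illustration: the function names are hypothetical, data goes MSB first, MOSI is sampled on the positive SCK edge and MISO is updated on the negative edge, and SRA/SRB are plain 8-bit integers.

```python
def spi_slave_byte(sck, mosi, tx_byte):
    """Walk per-clock samples of SCK and MOSI. Returns (rx_byte, miso_trace)."""
    sra = tx_byte & 0xFF          # shift-out register
    srb = 0                       # shift-in register
    miso = (sra >> 7) & 1         # MSB is presented before the first edge
    miso_trace = [miso]
    for i in range(1, len(sck)):
        if sck[i - 1] == 0 and sck[i] == 1:      # positive edge: shift in
            srb = ((srb << 1) | mosi[i]) & 0xFF
        elif sck[i - 1] == 1 and sck[i] == 0:    # negative edge: shift out
            sra = (sra << 1) & 0xFF
            miso = (sra >> 7) & 1
        miso_trace.append(miso)
    return srb, miso_trace


def make_frame(byte):
    """Build SCK/MOSI sample lists for one byte, two samples per clock phase."""
    sck, mosi = [], []
    for k in range(7, -1, -1):
        bit = (byte >> k) & 1
        sck += [0, 0, 1, 1]
        mosi += [bit] * 4
    return sck, mosi


def master_read(sck, miso_trace):
    """What a master sampling MISO on each positive edge would receive."""
    value = 0
    for i in range(1, len(sck)):
        if sck[i - 1] == 0 and sck[i] == 1:
            value = (value << 1) | miso_trace[i]
    return value


sck, mosi = make_frame(0xA5)
rx, miso_trace = spi_slave_byte(sck, mosi, 0x3C)
tx_seen_by_master = master_read(sck, miso_trace)
```

The two edge tests in the loop are where the WAITPOS and WAITNEG stalls would sit in a cog version. The model says nothing about cycle counts.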
I'll also point out that the receive UART could also be implemented using the same instructions. However, I am not arguing that the current serial hardware should be removed (I don't want to start that conversation again here). I'm just pointing out the possibilities...
I like how you think, Seairth. You haven't managed to lose your focus, props(!) for that.
.... giving a throughput of maybe 10-15MHz. It might not be as fast as some would like...
You are correct it is not as fast as some would like.
It would seriously limit the edge that P2 has, to make it a me-too struggler.
Lots of parts can do 10~15MHz, with less CPU used, and less power, than a P2 bit-bashing.
This approach is fine for adding extra shifters, but the hardware should also support UART/Sync/SPI modes, to whatever limit the pins impose (50MHZ+?)
If you have used all that already, then move to SW versions.
If it will be a counter mode available for incremental encoders, counting on both the edges of the A and B signals will quadruple the encoder ppr resolution.
I think the counters can now do that in hardware.
When Chip releases the details, someone could test this new mode in field conditions.
I'd like to see clkfreq/2 for spi master, and ideally for slave as well (though it may have to be /3 with an external clock), with the cog only having to deal with 8 bit or larger chunks. That way, fast master/slave drivers can run as a task.
Currently I am struggling to read 12MHz (using 80MHz clock) because I have to sample every 7/6/7 clocks (6.667). I can do that but no processing is possible. I know the real hardware will double the speed. Some of these instructions will help. So will a serial shifter, especially on the input side.
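The 7/6/7 pattern comes straight from the ratio: 80 MHz / 12 MHz = 6.667 clocks per bit, so three bits take exactly 20 clocks. A short Python sketch (the function name is hypothetical) that derives the gaps by rounding the ideal sample times to whole clocks:

```python
from fractions import Fraction


def sample_gaps(clk_hz, bit_hz, n_bits):
    """Clock counts between successive samples when the ideal sample
    times k * (clk_hz / bit_hz) are rounded to the nearest whole clock."""
    period = Fraction(clk_hz, bit_hz)            # exact clocks per bit
    times = [round(period * k) for k in range(n_bits + 1)]
    return [later - earlier for earlier, later in zip(times, times[1:])]


gaps = sample_gaps(80_000_000, 12_000_000, 6)
clocks_per_three_bits = sum(gaps[:3])
```

Using Fraction keeps the ratio exact, so the rounding error never accumulates across a long run of bits.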
Out of curiosity, how do the WAITPEQ/WAITPNE instructions work internally? Are they just examining the INx register itself, or is it checking deeper than that? If checking the register, it might be nice to generalize those instructions for any register. Combine this with tasks (and the new changes to the WAITxxx instructions where they don't stall the pipeline), this would allow for an effective synchronization mechanism between tasks.
Don't say "interrupt". That word is banned around here.
Say "event driven". Which is what you always wanted anyway.
It's just a case of having the right events for your threads to wait on.
Event-driven is certainly the ideal. I suppose its zenith might be an automated system which can launch a cog to process an event. Then, program flow is in no way "interrupted" yet events are processed in a timely, automated manner - sans blocking and code stratagems. This also makes efficient use of cogs, since a cog is used only when the event requires processing.
If an edge arrives just before this instruction starts, does it see it ?
or does the opcode auto-clear an edge detect, which then can catch an edge how close to the opcode start ?
ie what apertures does this have ?
If it was just polling a 2FF edge detector, that has single clk outputs, and it could catch an edge arriving close to the same time as the opcode. (given the 2FF parallel pipeline effect)
An edge arriving before the wait part of the opcode, is discarded as unseen.
The trouble is that you don't know what pin to start staring at until the instruction is at pipeline stage 4, so edge monitoring cannot happen early. Well, it could happen early if we had 2 stages of 128 flip-flops that always monitored every pin signal, but 256*8 more flops is too many.
How about if the first time you execute one of these edge detection instructions it always fails to detect an edge no matter what the state of the pin but it remembers what pin is being monitored so it can constantly sample it until the instruction comes around again at which time you return the real result. So, if the pin in the instruction matches the one that is being remembered then you return a real value. If not, you change the pin being remembered and start monitoring that new pin until encountering another instruction checking the same pin. I guess that's a bit convoluted though.
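One way to read that proposal, as a Python behavioral sketch. The class and method names are hypothetical, and two details are my own assumptions rather than part of the suggestion: the detector reports any change on the remembered pin, and a successful poll clears the stored result.

```python
class RememberedPinEdge:
    """Watches only the most recently polled pin between polls."""

    def __init__(self):
        self.pin = None     # pin currently being remembered
        self.last = None    # last sampled level of that pin
        self.seen = False   # has the remembered pin changed since last poll?

    def clock(self, pins):
        """Call once per clock with the current pin word."""
        if self.pin is None:
            return
        cur = (pins >> self.pin) & 1
        if self.last is not None and cur != self.last:
            self.seen = True
        self.last = cur

    def poll(self, pin):
        """The edge instruction. A poll of a new pin always reports no
        edge and just switches the remembered pin."""
        if pin != self.pin:
            self.pin, self.last, self.seen = pin, None, False
            return False
        seen, self.seen = self.seen, False
        return seen


det = RememberedPinEdge()
first = det.poll(3)        # first poll of pin 3: nothing remembered yet
det.clock(0b0000)
det.clock(0b1000)          # pin 3 rises while no instruction is waiting
second = det.poll(3)       # same pin again: the intervening edge is reported
third = det.poll(3)        # already consumed
switched = det.poll(5)     # different pin: starts over
```

The cost is one wasted poll each time the pin changes, and only one pin can be watched at a time.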
Ah ok, so the structure is a 128:1 mux and then a local edge detector ?
This would have a leading edge blanking scheme, to avoid false-triggering when activated on a pin already hi ?
Comments
I am a bit unclear on whether these would work on a specific pin number, or a pin mask like WAITPEQ/WAITPNE
if it is a mask, does it block until all pins rise/fall/change, or until just one?
Can I assume these WAITxxx instructions behave like one cycle instructions, ie release the stall in the following cycle?
Thanks -- that gives many possibilities.
As for naming --- as you know, English is not my strong language
These are at least 2-clock instructions because they need to develop a 2-clock history for comparison before they can release.
WAITPOS/WAITNEG/WAITCHG take a pin number as the input, not a mask. WAITPEQ/WAITPNE use 32-bit mask and compare values, instead.
No. It's all the same clock domain. The whole core is one clock.
More code in a loop translates into reduced speed. And this could be very useful for LPDDR (as LPDDR has no minimum operating frequency, unlike DDR). I know that P2 cannot interface directly to the reduced voltage of LPDDR, but maybe other protocols could benefit.
I mean that any-edge decisions could be more useful (not more code and reduced speed).
All the more reason to support (using the task registers) a cooperative tasking mode!
10-15 MHz is not fast enough for many uses.
I'd like to see clkfreq/2 for spi master, and ideally for slave as well (though it may have to be /3 with an external clock), with the cog only having to deal with 8 bit or larger chunks. That way, fast master/slave drivers can run as a task.