I don't think WC is the right answer here. Its primary (only?) purpose is to avoid a potential bug in the code. If you forget the WC and deadlock, there is no recovery. No debugging.
Also, I think it would be much quicker for Chip to simply get rid of the stall altogether than to make it conditional. At this point, simplify, document, and move on. We have other fish that need frying (smart pins)!
An immediate return from GETQx would need an event/poll or something similar.
A third option is that QMUL doesn't return until the operation is done. But this has many timing repercussions on top of the obvious lost MIPS.
I wasn't very clear, was I? I had in mind polling commands similar to WAITxxx and POLLxxx. Then you know if you have data ready. Simple would be best.
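For context, here's a minimal sketch of the polling pattern being debated, reusing the register names from the code example later in the thread; note that GETQX reporting a "result ready" indication in C was only a proposal at this point, not actual silicon behaviour:
qmul m1,m2 'start one CORDIC operation
poll getqx m3 wc 'proposed form: C would indicate whether a valid result was captured
if_nc jmp #poll 'not ready yet - spin (or go do other work) instead of stalling
getqy m4 'once X is valid, Y can be collected too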
I don't think a cog program is going to do a GETQX or GETQY before a CORDIC command.
As long as you write sensible code, there should be no problems. The only possible disaster I see is if you are overlapping CORDIC commands and an interrupt occurs. That can be easily avoided, though, by either not doing that if you are using interrupts, or using STALLI/ALLOWI around that code.
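A minimal sketch of that guarding, using the STALLI/ALLOWI instructions mentioned above and the register names from the code example later in the thread:
stalli 'hold off interrupts while CORDIC results are pending
qmul m1,m2 'first operation
qmul m1a,m2a 'second operation, overlapped with the first
getqx m3 'first result
getqy m4
getqx m3a 'second result
getqy m4a
allowi 'safe to take interrupts again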
The issue isn't whether a normal program will do the right thing. The issue is that there is a condition in which it is possible to irrecoverably lock up a cog, which should never be possible.
Indeed, it's the cog equivalent of "Blue Screen Of Death". Bad show.
At least BSOD gives some diagnostic info.
Regardless, I think we all know what the current limitations of the CORDIC engine are. If you follow these rules (a minimal sketch follows this post), you will always be safe:
* Perform no more than one operation at a time.
* Do not perform a CORDIC operation inside an ISR if the interrupt can occur while the main code or another ISR is already in the middle of a CORDIC operation.
If Chip isn't going to change anything at this time, then let's move on! (can you tell that I'm eager to get those smart pins?)
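A minimal sketch of the "one operation at a time" rule, with a, b, n, d, lo, hi, quot and rem as assumed cog registers:
qmul a,b 'issue exactly one CORDIC operation
getqx lo 'collect both results...
getqy hi
qdiv n,d '...before issuing the next one
getqx quot 'quotient
getqy rem 'remainder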
There are actually quite a few things that can lock a cog up. How about doing a WAITxxx without setting things up first, or without the event ever occurring? Bad programming cannot be guarded against.
True. I hadn't thought about the WAITxxx instructions. So, just leave CORDIC as it is for now.
What would be ideal? I think we need to come to a happy medium between too many silicon belts and suspenders and identifying issues (probably many strange cases of bad programming), and just document what you can't do or what will get you in trouble. Let's find the danger zones, put up signs, and document preventative processes.
There are always going to be cases of, "if I do A,B and C then this bad thing happens" - well then read this and don't do A,B and C without doing X before C like it says in the document.
We're all big kids around here. I don't want to program at lower levels while wearing my helmet, knee pads and elbow pads.
I also imagine there were things that would have or could have been disallowed on the P1 that people have found ways to exploit in a good way.
Yeah, at some point, the programmer has to take full responsibility.
Okay, how about an instruction to reset the cog's CORDIC result flags, or a variant of GETQX/Y that always returns immediately and can be used to force-reset the flags, or something similar to establish a known state.
That way, a person could set a cog watchdog-type flag and recover by seeing it not update, and a supervisor role performed by another cog could restart the cog.
Or, say things get loaded dynamically. People might do this. That code could reset the CORDIC prior to doing anything.
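A rough sketch of that supervisor idea, assuming hbeat holds a shared hub address and tick, now, last and timeout are cog registers (initialisation of last and the choice of timeout are omitted); the actual restart step is left as a comment since no particular COGINIT setup is implied here:
'in the worker cog, somewhere in its main loop
rdlong tick,hbeat 'bump a heartbeat counter in hub RAM
add tick,#1
wrlong tick,hbeat
'in the supervisor cog
check rdlong now,hbeat 'sample the heartbeat
cmp now,last wz 'unchanged since the last check?
mov last,now
if_z jmp #stuck 'counter stopped moving - worker presumed hung
waitx timeout 'give the worker another interval
jmp #check
stuck 'restart the hung cog here (e.g. COGSTOP then COGINIT - details not shown)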
I'm not sure just the flags get us there. I do agree with docs and examples rather than too many care functions.
Experienced users may drive the math hard, and that is nice to have on the table. Brian got three results, for example. It's like the P1 waitvid in this way. Some glitches were possible, but the basic nature of it allowed for a lot more performance than expected when that behavior [waitvid hand off point] ended up being properly exploited.
We can put minimums out there too. People stick to those and have few worries.
Couldn't an (existing) timeout interrupt work here? (Interrupts get us into trouble, interrupts get us out of trouble.)
If you have a cog already using interrupts that might affect a 3-CORDIC instruction chain, then set a 'guard time', a bit like a watchdog timer, and if interrupts push over that timer boundary you can intervene via the timeout ISR. Yes, it's horribly non-deterministic, but you must have already compromised determinism to get into this fix. Anyway, it's a way out of the lockup.
If the WC option alone could not fulfil the mission, because it only tells you that the result just captured using GETQX or GETQY is not valid, why not establish an operation counter (individual for each cog), incremented every time a new CORDIC operation is submitted by that cog and decremented when the final result reaches the CORDIC SM's output stage?
If one uses GETQX or GETQY while at least one earlier-requested operation is in progress but hasn't produced its result yet, C returns accordingly and the count of remaining (in-progress) ops is returned in place of X or Y or both, whichever option fits best.
Sure, if there are no operations progressing through the CORDIC engine for this particular cog, C and a count value of zero tell the whole story.
This way, nothing could possibly be blocked, and you know how many results should be grabbed (and discarded, if that is the intent) before you start another round of CORDIC operations.
Long-running ISRs (single-step included) could also use this counter value to advantage, since the coder can establish a mechanism for the ISR itself to recover any results still progressing through the CORDIC resolver and make them available (buffer/scratchpad) and flagged upon return.
In the single-step case, it's even better to establish such a mechanism, because the ISR routine could progress to the point where GETQX or GETQY would be executed and use the earlier-recovered values that exited the CORDIC SM in a previous iteration to simulate the function of the GETQx itself, thus not disrupting this kind of code.
Only my two cents.
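Purely for illustration, a sketch of how such a counter might be consumed, using a hypothetical GETQCT instruction (not a real P2 instruction) that would return this cog's count of in-flight CORDIC operations, with n, junkx and junky as assumed registers:
drain getqct n 'hypothetical: read the count of in-flight ops for this cog
tjz n,#start_next 'nothing pending - safe to issue a new operation
getqx junkx 'otherwise drain (and here discard) one pending result
getqy junky
jmp #drain
start_next qmul m1,m2 'now at most one operation is in flight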
Therein lies an example of my whole concern. I'm thinking there are likely other examples like this where, even within a running program, things can get confused. And even sticking to one CORDIC operation at a time is not going to be a guarantee of reliability.
What are the failure triggers for one CORDIC operation at a time ?
I think this needs the failure conditions identified for the 1-, 2- and 3-deep CORDIC cases, to decide if rules alone are enough.
I'd be fine with the tools warning on a 3-deep case, if that were the only risk area, but if even a 1-deep CORDIC can fail in an unexpected lockup, that is best avoided.
Even timer-miss coding does eventually release, and so gives some diagnostic information.
Ah, a variant of that idea, Yanomani, is that each result flag also has a partnering "in-progress" flag. With an in-progress flag, any GETQx instruction can know whether it should continue to wait for a result or not. This is probably what Chip was trying to do in the first version of the result flag, but it really does need two flags to work correctly.
That point, of needing both an in-progress flag and a done flag, has only just crystallised for me. I've seen that mechanism many times before but haven't paid it much attention until now.
....
However, if I do stuff after the last QMUL I have to be careful as to how long I take, otherwise I can "hang" the code.
qmul m1,m2 'first operation
qmul m1a,m2a 'second operation
qmul m1b,m2b 'third operation
waitx #19 '<-- do stuff here
getqx m3 'first result
getqy m4
setb outb,#led0
getqx m3a 'second result
getqy m4a
setb outb,#led1 'waitx >34 never gets here
getqx m3b 'third result
getqy m4b
setb outb,#led2 'waitx >18 never gets here
Is this as expected?
It's the "never gets here" that is the scariest part here.
This seems like a HW-SW phasing issue, and in MCUs this is not uncommon & can be managed.
One example is SPI ports, where they have a WCOL flag for Write Collision, which is set if you write too soon.
i.e. the HW-SW interface never locks up, but it does signal when things went wrong.
Ideally with CORDIC, one-deep should always be 100% safe (is it now?), and two- or three-deep may need increasing user care, but should never lock solid, and should have some means to tell that a bad result occurred.
I'm unclear on the exact SW-HW lock mechanism, but if a full FIFO is too costly, would something like a simple up/down counter avoid a SW stall?
Roughly, if 2 reads are needed, QMUL increments the counter by 2, and GETQx waits only if it is > 0, then decrements it by 1.
The wait test for > 0 means an under-run never stalls.
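Applying that counting rule (as stated here; the increment timing gets refined a couple of posts further down) to a two-deep version of the earlier example, with the counter values in the comments being the proposal's bookkeeping rather than anything real silicon reports:
qmul m1,m2 'counter 0 -> 2
qmul m1a,m2a 'counter 2 -> 4
getqx m3 'counter > 0: wait for the result if needed, then 4 -> 3
getqy m4 '3 -> 2
getqx m3a '2 -> 1
getqy m4a '1 -> 0
getqx m3b 'counter = 0: nothing outstanding, so return immediately instead of stalling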
JMG, it's completely solved by adding the second flag. No need to reset the CORDIC, no bad-code lockups, nor corner cases from, e.g., a non-maskable debug interrupt.
It can be used in any situation with the worst case being wrong results.
There actually is no FIFO. I was surprised you were able to stack three operations and get correct results back. I figure that only two are sustainable on a continuous basis.
I think in the above failure case there are only two in flight; it just appears to be 3, but the timing means the read managed to remove the first result before the third one arrives.
This means the simple counter may work: it needs to track what is available, and when there is no possible read, it avoids the stall.
In the above, the +2 would be done on Result_Ready timing and saturate at 4; the -1 is done on read.
Carefully timed Wr/Rd/Rd would be fine to 3 deep, but a large (user-unexpected) delay would mean later read attempts find a Ctr=0 flag and thus skip (and set a flag to give a testable under-run error?).
An "in-progress" flag will tell you that something should be recovered in the future.
But it doesn't tell you how many operations are flowing through.
The count will also tell you (or at least allow you to calculate) how much time you have to get your job done before the next result reaches the output stage or, worse, gets overwritten (overrun?) by another operation in progress.
I had long ISRs (many instructions to execute) in mind when I was thinking about this, and was looking for a mechanism that lets the decision be made inside them: whether, how often, and for how long they should dedicate their attention to the CORDIC SM in order to perform their (ISR) functions while not disturbing the routines they interrupted (provided, of course, that those were coded to act in some cooperative way).
The single-stepper just happens to fit as one more case that could benefit from such an approach.
It'll enable the single-step ISR (or any other) to start on the fly and look at the neighborhood to tell if something is about to crash.