But it doesn't tell you "how many operations are flowing through".
There is only a single result buffer per Cog and it has no need of reporting the number of commands in the pipeline. There is a fixed known CORDIC execution time.
Chip said there were two levels possible.
It may be that a tracking counter can reduce to a flag, as they are essentially covering the same issues.
The counter is more granular, and should catch, and be able to tag, incorrect reads.
It is only 2 bits and some state logic.
The four states are:
1 - Command(s) running but no new result yet (time to wait)
2 - Command(s) running and new result (don't wait, another result coming)
3 - Command(s) not running and new result (better late than never)
4 - Command not running and no new result (all results read or full chip reset)
JMG,
We've generally agreed that the programmer has to take some responsibility for keeping order. There will be situations that produce wrong results because the result buffer wasn't checked at the right times. That's perfectly acceptable imho.
What worried me the most was the potential lock-ups.
Are you meaning results from earlier operations, caught on the fly and certified by the presence of "done", but not pertaining to the current set, the one the code is supposed to be working on?
Couldn't the counter be externally simulated (maintained in software), so the code can be assured that it got n results from n intended operations, and not n+1, n+2 or, at most, n+3, where, in such cases, the first ones were due to earlier operations whose results weren't fully grabbed from the SM's output stages?
There is only a single result buffer per Cog and it has no need of reporting the number of commands in the pipeline. There is a fixed known CORDIC execution time.
Chip said there were two levels possible.
I believe he was talking about the number of commands in the pipeline. Three does just fit for a few clocks ... so a maximally pumped CORDIC will alternate between two and three.
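The "alternate between two and three" remark is simple arithmetic: if a command is issued every `interval` clocks and each result appears `latency` clocks later, the number of commands in flight at any moment is either floor(latency/interval) or ceil(latency/interval). A small Python sketch, where `occupancy()` is a hypothetical helper and the numbers (latency 20, interval 8) are illustrative assumptions, not measured P2 figures:

```python
def occupancy(latency, interval, horizon):
    """Commands in flight at each clock tick, for a command issued every
    `interval` ticks whose result pops out `latency` ticks later."""
    issues = range(0, horizon, interval)
    return [sum(1 for s in issues if s <= t < s + latency) for t in range(horizon)]


# Illustrative numbers only: latency / interval = 2.5
steady = occupancy(latency=20, interval=8, horizon=200)[40:180]
print(sorted(set(steady)))   # [2, 3]
```

Changing `latency` or `interval` shows when a third command can be in the pipeline at all; with latency 16 and interval 8, for example, the steady-state count stays at 2.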
1: We know it's not hard to get a dangling result. Any code that attempts to flush the CORDIC without a dangling result will hang. There is no way to check for the excess nor any way to software reset the CORDIC.
2: Any code that is issuing multiple CORDIC commands can be thrown in the same manner as the above dangling result but in this case the extra results are actually lost. When returning to collect the remaining singular result, if no corrective action is taken, then it'll hang on the second GETQX/Y.
PS: These two points are based on Chip's current implementation only.
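Points 1 and 2 can be reproduced with a toy model. This is a minimal Python sketch under stated assumptions, not Chip's Verilog: the latency is an arbitrary number of model ticks, `Hang` is raised where GETQX/GETQY would wait forever, a new command clears the done flag, and results arriving while the single buffer is still full are lost, as described above.

```python
class Hang(Exception):
    """Raised where the real GETQX/GETQY would wait forever."""


class CurrentCordic:
    """Toy model of one cog's CORDIC interface (assumptions in the text)."""

    def __init__(self, latency=4):
        self.latency = latency        # hypothetical, in model ticks
        self.tick = 0
        self.in_flight = []           # (ready_tick, value) for each issued command
        self.result = None            # the single result buffer
        self.done = False

    def step(self, n=1):
        """Advance time; results pop out of the pipeline when their tick comes."""
        for _ in range(n):
            self.tick += 1
            arrived = [v for t, v in self.in_flight if t == self.tick]
            self.in_flight = [(t, v) for t, v in self.in_flight if t != self.tick]
            for v in arrived:
                if not self.done:
                    self.result, self.done = v, True   # buffer takes this result
                # else: the buffer is occupied and this result vanishes

    def command(self, value):
        self.done = False             # a new command clears 'done'
        self.in_flight.append((self.tick + self.latency, value))
        self.step()

    def getq(self):
        """Wait for 'done'; with nothing in flight that wait never ends."""
        while not self.done:
            if not self.in_flight:
                raise Hang("GETQ on an empty pipeline")
            self.step()
        value, self.done = self.result, False
        self.step()
        return value


c = CurrentCordic()
for v in (10, 20, 30):        # three overlapped commands
    c.command(v)
c.step(10)                    # e.g. an interrupt steals cycles before the reads
print(c.getq())               # 10
try:
    c.getq()
except Hang:
    print("second GETQ hangs")   # the other two results were lost
```

In this model a single command read late still returns its result; only the combination of overlapped commands and a late read leaves the final GETQ waiting forever. A flush (a GETQ on a fresh instance) hangs the same way when there is no dangling result.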
What is a bad result?
Are you meaning results from earlier operations, caught on the fly and certified by the presence of "done", but not pertaining to the current set, the one the code is supposed to be working on?
Assuming the stall is fixed, the CORDIC queue design means too many reads of the wrong phase will not give the expected result.
Earlier examples had this repeating the last value.
Couldn't the counter be externally simulated (maintained in software), so the code can be assured that it got n results from n intended operations, and not n+1, n+2 or, at most, n+3, where, in such cases, the first ones were due to earlier operations whose results weren't fully grabbed from the SM's output stages?
I don't think so, as the effect ozpropdev shows works fine with 3 commands issued, provided timing is such that the first result is removed soon enough.
A SW counter does not cycle-count.
It seems simple enough to use the 2-bit up/down tracking queue counter, which will track timing, as it is INC on Result_Ready, and it can also (e.g.) write C or Z on an error.
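One way to read the 2-bit tracking counter proposal, as a Python sketch. Assumptions, none of which come from an actual P2 design: the counter increments when a result becomes ready and decrements on each GETQ, it saturates at 3, the single result buffer keeps the first unread result, and the returned `error` value stands in for the C or Z write. A GETQ with the counter at zero and nothing in flight returns at once with `error` set instead of stalling.

```python
class TrackedCordic:
    """Toy model of a hypothetical 2-bit result-tracking counter."""

    MAX = 3  # a 2-bit counter saturates at 3

    def __init__(self, latency=4):
        self.latency = latency        # hypothetical, in model ticks
        self.tick = 0
        self.in_flight = []           # (ready_tick, value) per issued command
        self.held = None              # the single result buffer
        self.count = 0                # results ready but not yet read

    def step(self, n=1):
        for _ in range(n):
            self.tick += 1
            arrived = [v for t, v in self.in_flight if t == self.tick]
            self.in_flight = [(t, v) for t, v in self.in_flight if t != self.tick]
            for v in arrived:
                self.count = min(self.count + 1, self.MAX)   # INC on Result_Ready
                if self.held is None:
                    self.held = v     # the buffer keeps the first unread result

    def command(self, value):
        self.in_flight.append((self.tick + self.latency, value))
        self.step()

    def getq(self):
        """Return (value, error); error stands in for writing C or Z."""
        while self.count == 0 and self.in_flight:
            self.step()               # result still coming: wait
        if self.count == 0:
            return None, True         # underflow: return at once, no stall
        error = self.count > 1 or self.held is None   # a result was dropped
        value, self.held = self.held, None
        self.count -= 1               # DEC on read
        self.step()
        return value, error


c = TrackedCordic()
print(c.getq())          # (None, True): nothing issued, no hang
for v in (1, 2, 3):
    c.command(v)
c.step(10)               # reads delayed, e.g. by an interrupt
print(c.getq())          # (1, True): first result, overrun flagged
print(c.getq())          # (None, True): lost results are reported, not waited on
```

After the flagged reads the counter is back at zero, so the next command starts in sync; whether real hardware could behave exactly like this is part of the assumption.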
During testing and debug, you can confirm you have valid results, and if you have a stable design case (e.g. ozpropdev's, with no interrupts), you could choose to skip the validity test if time is very tight.
No program is going to do a GETQX/GETQY before it issues a CORDIC command, right? That seems kind of crazy to worry about.
I don't see any point in even initializing the 'done' flags, as they will get cleared on a CORDIC command, set when done (a result pops out, may be more coming), and cleared again on GETQX/GETQY. The mechanism, as is, does all you'd need it to do to allow overlapped operations.
If you never overlap CORDIC commands and you don't have interrupts doing CORDIC commands or GETQX/GETQY, there should never be a problem.
If you DO overlap CORDIC operations, you'd better use STALLI to protect against interruption, so that you can get the results back out before they get overwritten due to cycles stolen by the interrupt, which would otherwise result in a hang when you do the last GETQX/GETQY.
I may not be seeing everything, of course.
It just seems better to keep it simple, not worrying about checks. Those checks take a good chunk of time, themselves, and code, too.
The main concern is the failure mode: drop dead.
It can be triggered by a change in timing, which is quite subtle, and it likely also means break/step debugging of such code could simply freeze.
Or, if someone inserts a printf, or moves an array store...
The mechanism, as is, does all you'd need it to do to allow overlapped operations.
True, except for the timing case ozpropdev has given.
The problem with timing-triggered failures is that they are very hard to nail down, and it is very hard to prove you can never have them.
It just seems better to keep it simple, not worrying about checks. Those checks take a good chunk of time, themselves, and code, too.
Simple is good, but surely a simple queue-tracking counter has no code or time overhead?
That avoids the stall, and can provide a Valid/Error flag, which can assist debug.
It also allows power-users to queue 3 Cordic calls, if they want to.
A count doesn't help here because there is only one buffer for the results. Anything beyond a single result is trashed. All that's needed to make things clean is the in-progress and done pair of flags.
This will always have to be a generic first step when using the CORDIC the way it is. But even with this I'm still nervous. The extra flag would clear this issue completely.
Chip,
The four states formed from the two flags are:
1 - Command(s) running but no new result yet (time to wait)
2 - Command(s) running and new result (don't wait, another result coming)
3 - Command(s) not running and new result (better late than never)
4 - Command not running and no new result (all results read or full chip reset)
There are three effective states: state 1 is the only one that waits. States 2 and 3 are treated as one and the same. State 4 is the extra one needed to allow a clean, immediate return from the GETQx instructions, where currently they just hang.
And this also means no extra instructions, no additional checks or status or resets. It just works as you've already designed it to.
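The four states and the GETQx behaviour described above fit in a small decision function. A Python sketch, assuming hypothetical names `in_progress` (the proposed extra flag) and `done` (the existing flag); `with_flag=False` approximates the current design, which only looks at `done`:

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait for the result"
    READ = "return the buffered result"
    RETURN_NOW = "return immediately (nothing to wait for)"


def state_number(in_progress: bool, done: bool) -> int:
    """Number the flag combinations as in the four-state list."""
    if in_progress:
        return 2 if done else 1
    return 3 if done else 4


def getq_action(in_progress: bool, done: bool, with_flag: bool = True) -> Action:
    """What GETQX/GETQY would do in each state.

    with_flag=False models the current design, which only looks at 'done'
    and therefore also waits in state 4 (the hang).
    """
    if done:
        return Action.READ            # states 2 and 3 behave the same
    if in_progress or not with_flag:
        return Action.WAIT            # state 1 (and state 4 today)
    return Action.RETURN_NOW          # state 4 with the extra flag


for ip in (True, False):
    for d in (False, True):
        print(state_number(ip, d), getq_action(ip, d).value)
```

The only difference between the two designs is the last branch: with the extra flag, state 4 returns instead of waiting forever.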
This will always have to be a generic first step when using the CORDIC the way it is. But even with this I'm still nervous. The extra flag would clear this issue completely.
I finally understand what the concern is:
A CORDIC computation may be in progress when the cog is restarted.
If something was already in transit, that could pose a problem. It would only happen if the cog was restarted in hub exec mode (no lengthy load time), immediately did a CORDIC command and then went into a GETQX/GETQY, which could catch results still in transit from when the cog was running before. That seems like an extremely long shot, but it does need to be covered. A simple WAITX #36, or so, before using the CORDIC would certainly cover that possibility.
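The in-transit case comes down to ordering: our command clears the ready flags, so a stale result only does harm if it lands after our command and before our own result. A Python sketch with hypothetical tick values; `latency` is an assumed fixed pipeline length, and `delay` plays the role of the WAITX before the first command:

```python
def result_source(old_ready_ticks, delay, latency):
    """Which result does the first GETQX after a cog restart see?

    old_ready_ticks: ticks (relative to the restart) at which results from the
        previous program's commands come out of the pipeline; all are below
        `latency` because those commands were issued before the restart.
    delay: ticks waited (e.g. via WAITX) before issuing our own command.
    latency: assumed fixed CORDIC latency, in the same ticks.
    """
    cmd_tick = delay                  # our command clears the ready flags here
    our_ready = cmd_tick + latency
    stale = [t for t in old_ready_ticks if cmd_tick < t < our_ready]
    return "stale" if stale else "ours"


print(result_source([3], delay=0, latency=8))   # stale
print(result_source([3], delay=8, latency=8))   # ours
```

Any `delay` of at least `latency` makes the stale case impossible in this model, which is the reasoning behind waiting before the first CORDIC command; it does not cover the out-of-sync results evanh describes.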
As soon as a CORDIC command executes, those 'ready' flags for X and Y are cleared and will not be set until results arrive. We just need to be sure that they are the results of our own CORDIC command, and not from another program that used the same cog.
Chip,
It's more than that. The other half of the problem is out-of-sync results. A fast cog start-up isn't needed to get out of sync. That's why a CORDIC flush is needed at code start. Pondering it a bit more, out-of-sync can still happen even with the extra in-progress flag, so the CORDIC flush would still be needed; it just shrinks to the two GETQx instructions.
It also allows power-users to queue 3 Cordic calls, if they want to.
A count doesn't help here because there is only one buffer for the results. Anything beyond a single result is trashed. All that's needed to make things clean is the in-progress and done pair of flags.
A count does not create a buffer but it does track what is possible - as you also said above "Three does just fit for a few clocks ... so a maximally pumped CORDIC will alternate between two and three."
It is both that 'alternate between three and two' case and underflow that a counter can manage.
You can always think of a counter as flags, if you like
If three commands are issued to the CORDIC but the program fails to pick up any results in a timely manner, then only the first result gets held in the buffer. The following two results just vanish as they come off the pipeline. The only thing left to track is that first result.
How would you debug ozpropdev's code example above?
What if you added a save-to-hub line to assist debug, for example ?
The common expectation is that would merely slow things down, not kill it stone dead.
This will happen all right. Ozprop's example could be considered as manual simulation of what can happen with a debug interruption. He doesn't demo a failed condition, which would have required two commands in the CORDIC pipeline, but it's not too hard to extrapolate to that happening.
Data corruption can be expected but the hardware just immediately hanging is not quite on. I presume even the debug functionality will be locked out at that point.
There is only a single result buffer per Cog and it has no need of reporting the number of commands in the pipeline. There is a fixed known CORDIC execution time.
Chip said there were two levels possible.
It may be that a tracking counter can reduce to a flag, as they are essentially covering the same issues.
The counter is more granular, and should catch, and be able to tag, incorrect reads.
It is only 2 bits and some state logic.
You're right!
If there is "done", grab results, then save, then update results pointer.
If there is still "in-progress", loop back and wait for "done".
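The loop described above maps onto a short polling routine. A minimal Python sketch, assuming a hypothetical `FakeCordic` whose `poll()` reports both the existing done flag and the proposed in-progress flag; neither the class nor its method names come from the P2 instruction set:

```python
from collections import deque


class FakeCordic:
    """Hypothetical stand-in for one cog's CORDIC with the proposed
    in-progress flag. Each poll() advances time by one tick."""

    def __init__(self, arrivals):
        self.pending = deque(sorted(arrivals))   # (tick, value) still in the pipeline
        self.tick = 0
        self.result = None
        self.done = False

    def poll(self):
        """Return (done, in_progress) after one tick."""
        self.tick += 1
        while self.pending and self.pending[0][0] <= self.tick:
            value = self.pending.popleft()[1]
            if not self.done:
                self.result, self.done = value, True   # single result buffer
            # else: the result is lost, as described for the current hardware
        return self.done, bool(self.pending)

    def getq(self):
        self.done = False
        return self.result


def drain(cordic):
    """Collect every result without ever blocking on an empty pipeline."""
    results = []
    while True:
        done, in_progress = cordic.poll()
        if done:
            results.append(cordic.getq())   # grab, save, advance the pointer
        elif not in_progress:
            return results                  # nothing ready, nothing running
        # else: still in progress, loop back and wait for 'done'


print(drain(FakeCordic([(3, "a"), (9, "b")])))   # ['a', 'b']
print(drain(FakeCordic([])))                     # []
```

Without the in-progress flag the `elif` branch has nothing to test, which is exactly where the current design can only wait.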
Sorry for the messy oyster brain that fills my skull.
1 - Command(s) running but no new result yet (time to wait)
2 - Command(s) running and new result (don't wait, another result coming)
3 - Command(s) not running and new result (better late than never)
4 - Command not running and no new result (all results read or full chip reset)
We've generally agreed that the programmer has to take some responsibility for keeping order. There will be situations that produce wrong results because the result buffer wasn't checked at the right times. That's perfectly acceptable imho.
What worried me the most was the potential lock-ups.
What is a bad result?
Are you meaning results from earlier operations, caught on the fly and certified by the presence of "done", but not pertaining to the current set, the one the code is supposed to be working on?
Couldn't the counter be externally simulated (maintained in software), so the code can be assured that it got n results from n intended operations, and not n+1, n+2 or, at most, n+3, where, in such cases, the first ones were due to earlier operations whose results weren't fully grabbed from the SM's output stages?
However, there seems to be a simple way to also give the equivalent of the SPI WCOL example here.
It could also make debug work better.
Do you have an example of a single-queue lockup?
You mentioned earlier that "And even sticking to one CORDIC operation at a time is not going to guarantee reliability."
I believe he was talking about the number of commands in the pipeline. Three does just fit for a few clocks ... so a maximally pumped CORDIC will alternate between two and three.
1: We know it's not hard to get a dangling result. Any code that attempts to flush the CORDIC without a dangling result will hang. There is no way to check for the excess nor any way to software reset the CORDIC.
2: Any code that is issuing multiple CORDIC commands can be thrown in the same manner as the above dangling result but in this case the extra results are actually lost. When returning to collect the remaining singular result, if no corrective action is taken, then it'll hang on the second GETQX/Y.
PS: These two points are based on Chip's current implementation only.
Earlier examples had this repeating the last value.
I don't think so, as the effect ozpropdev shows works fine with 3 commands issued, provided timing is such that the first result is removed soon enough.
A SW counter does not cycle-count.
It seems simple enough to use the 2-bit up/down tracking queue counter, which will track timing, as it is INC on Result_Ready, and it can also (e.g.) write C or Z on an error.
During testing and debug, you can confirm you have valid results, and if you have a stable design case (e.g. ozpropdev's, with no interrupts), you could choose to skip the validity test if time is very tight.
You can software reset the CORDIC with only three instructions:
No program is going to do a GETQX/GETQY before it issues a CORDIC command, right? That seems kind of crazy to worry about.
I don't see any point in even initializing the 'done' flags, as they will get cleared on a CORDIC command, set when done (a result pops out, may be more coming), and cleared again on GETQX/GETQY. The mechanism, as is, does all you'd need it to do to allow overlapped operations.
If you never overlap CORDIC commands and you don't have interrupts doing CORDIC commands or GETQX/GETQY, there should never be a problem.
If you DO overlap CORDIC operations, you'd better use STALLI to protect against interruption, so that you can get the results back out before they get overwritten due to cycles stolen by the interrupt, which would otherwise result in a hang when you do the last GETQX/GETQY.
I may not be seeing everything, of course.
It just seems better to keep it simple, not worrying about checks. Those checks take a good chunk of time, themselves, and code, too.
What if you try using the debug interrupt with that?
If you did have a big math problem, maybe better to put 8 cogs on it instead?
The main concern is the failure mode: drop dead.
It can be triggered by a change in timing, which is quite subtle, and it likely also means break/step debugging of such code could simply freeze.
Or, if someone inserts a printf, or moves an array store...
Correct, but even if it did, that code ideally should simply return a bad result.
True, except for the timing case ozpropdev has given.
The problem with timing-triggered failures is that they are very hard to nail down, and it is very hard to prove you can never have them.
Simple is good, but surely a simple queue-tracking counter has no code or time overhead?
That avoids the stall, and can provide a Valid/Error flag, which can assist debug.
It also allows power-users to queue 3 Cordic calls, if they want to.
A count doesn't help here because there is only one buffer for the results. Anything beyond a single result is trashed. All that's needed to make things clean is the in-progress and done pair of flags.
Ah yes, I got your point, but it needs a fix...
This will always have to be a generic first step when using the CORDIC the way it is. But even with this I'm still nervous. The extra flag would clear this issue completely.
The four states formed from the two flags are:
1 - Command(s) running but no new result yet (time to wait)
2 - Command(s) running and new result (don't wait, another result coming)
3 - Command(s) not running and new result (better late than never)
4 - Command not running and no new result (all results read or full chip reset)
There are three effective states: state 1 is the only one that waits. States 2 and 3 are treated as one and the same. State 4 is the extra one needed to allow a clean, immediate return from the GETQx instructions, where currently they just hang.
And this also means no extra instructions, no additional checks or status or resets. It just works as you've already designed it to.
I finally understand what the concern is:
If something was already in transit, that could pose a problem. It would only happen if the cog was restarted in hub exec mode (no lengthy load time), immediately did a CORDIC command and then went into a GETQX/GETQY, which could catch results still in transit from when the cog was running before. That seems like an extremely long shot, but it does need to be covered. A simple WAITX #36, or so, before using the CORDIC would certainly cover that possibility.
As soon as a CORDIC command executes, those 'ready' flags for X and Y are cleared and will not be set until results arrive. We just need to be sure that they are the results of our own CORDIC command, and not from another program that used the same cog.
Is this the matter?
Chip,
It's more than that. The other half of the problem is out-of-sync results. A fast cog start-up isn't needed to get out of sync. That's why a CORDIC flush is needed at code start. Pondering it a bit more, out-of-sync can still happen even with the extra in-progress flag, so the CORDIC flush would still be needed; it just shrinks to the two GETQx instructions.
A count does not create a buffer but it does track what is possible - as you also said above "Three does just fit for a few clocks ... so a maximally pumped CORDIC will alternate between two and three."
It is both that 'alternate between three and two' case and underflow that a counter can manage.
You can always think of a counter as flags, if you like
How would you debug ozpropdev's code example above?
What if you added a save-to-hub line to assist debug, for example ?
The common expectation is that would merely slow things down, not kill it stone dead.
If three commands are issued to the CORDIC but the program fails to pick up any results in a timely manner, then only the first result gets held in the buffer. The following two results just vanish as they come off the pipeline. The only thing left to track is that first result.
This will happen all right. Ozprop's example could be considered as manual simulation of what can happen with a debug interruption. He doesn't demo a failed condition, which would have required two commands in the CORDIC pipeline, but it's not too hard to extrapolate to that happening.
Data corruption can be expected but the hardware just immediately hanging is not quite on. I presume even the debug functionality will be locked out at that point.
The failure condition is pretty clear from his comments in the code; maybe you missed those?
if "waitx >34 never gets here"
if "waitx >18 never gets here"
The code failure example I am referring to is here:
http://forums.parallax.com/discussion/comment/1351783/#Comment_1351783