Cordic pipeline test

evanh · 2015-10-30 18:55

Oi! JMG, you're commenting on old posts!

evanh · 2015-10-30 18:58

Yanomani wrote: »

But it doesn't tell "how many operations are flowing through".

There is only a single result buffer per Cog and it has no need of reporting the number of commands in the pipeline. There is a fixed known CORDIC execution time.

evanh · 2015-10-30 19:01

"In-progress" plus "done" is totally enough.

jmg · 2015-10-30 19:13

evanh wrote: »

There is only a single result buffer per Cog and it has no need of reporting the number of commands in the pipeline. There is a fixed known CORDIC execution time.

Chip said there were two levels possible.

It may be that a tracking counter can reduce to a flag, as they are essentially covering the same issues.

The counter is more granular, and should catch, and be able to tag, incorrect reads.
It is only 2 bits and some state logic.

Yanomani · 2015-10-30 19:15

evanh

You're right!

If there is "done", grab results, then save, then update results pointer.
If there is still "in-progress", loop back and wait for "done".

Sorry for the messy oyster brain that fills my skull.

evanh · 2015-10-30 19:16

The four states are:
1 - Command(s) running but no new result yet (time to wait)
2 - Command(s) running and new result (don't wait, another result coming)
3 - Command(s) not running and new result (better late than never)
4 - Command not running and no new result (all results read or full chip reset)

jmg · 2015-10-30 19:17

Yanomani wrote: »

If there is "done", grab results, then save, then update results pointer.
If there is still "in-progress", loop back and wait for "done".

.. and can this also tag a bad result ?

evanh · 2015-10-30 19:28

JMG,
We've generally agreed that the programmer has to take some responsibility for keeping order. There will be situations that produce wrong results because the result buffer wasn't checked at the right times. That's perfectly acceptable imho.

What worried me the most was the potential lock-ups.

Yanomani · 2015-10-30 19:35

jmg

What is a bad result?

Do you mean results from earlier operations, caught on the fly and certified by the presence of "done", but not pertaining to the current set, the one the code "supposes" it is working on?

Couldn't the counter be externally simulated (maintained), so the code can be assured that it got n results, from n intended operations, and not n+1, n+2 or (if possible, and maximum) n+3, where, in such cases, the first ones were due to earlier operations, that weren't fully grabbed from SM's output stages?

jmg · 2015-10-30 19:38

evanh wrote: »

What worried me the most was the potential lock-ups.

Agreed on the lockup.

However, there seems to be a simple way to also give the equivalent of the SPI WCOL example here.
It could also make debug work better.

jmg · 2015-10-30 19:42

evanh wrote: »

What worried me the most was the potential lock-ups.

Do you have an example of single queue lockup ?

You mentioned earlier that "And even sticking to one CORDIC operation at a time is not going to guaranty of reliability."

evanh · 2015-10-30 19:46

jmg wrote: »

evanh wrote: »

There is only a single result buffer per Cog and it has no need of reporting the number of commands in the pipeline. There is a fixed known CORDIC execution time.

Chip said there were two levels possible.

I believe he was talking about the number of commands in the pipeline. Three does just fit for a few clocks ... so a maximally pumped CORDIC will alternate between two and three.

evanh · 2015-10-30 19:58

jmg wrote: »

Do you have an example of single queue lockup ?

1: We know it's not hard to get a dangling result. Any code that attempts to flush the CORDIC without a dangling result will hang. There is no way to check for the excess nor any way to software reset the CORDIC.

2: Any code that is issuing multiple CORDIC commands can be thrown in the same manner as the above dangling result but in this case the extra results are actually lost. When returning to collect the remaining singular result, if no corrective action is taken, then it'll hang on the second GETQX/Y.

PS: These two points are based on Chip's current implementation only.

jmg · 2015-10-30 20:27

Yanomani wrote: »

jmg

What is a bad result?
Do you mean results from earlier operations, caught on the fly and certified by the presence of "done", but not pertaining to the current set, the one the code "supposes" it is working on?

Assuming the stall is fixed, the CORDIC queue design means too many reads of the wrong phase will not give the expected result.
Earlier examples had this repeating the last value.

Yanomani wrote: »

Couldn't the counter be externally simulated (maintained), so the code can be assured that it got n results, from n intended operations, and not n+1, n+2 or (if possible, and maximum) n+3, where, in such cases, the first ones were due to earlier operations, that weren't fully grabbed from SM's output stages?

I don't think so, as the effect ozpropdev shows works fine with 3 issues, provided timing is such that the first result is removed soon enough.
A SW counter does not cycle-count.

It seems simple enough to use the 2-bit Up/Down tracking queue counter, which will track timing, as it is INC on Result_Ready, and it can also (eg) write C or Z on an error.

During testing and debug, you can confirm you have valid results, and if you have a stable design case,
(eg ozpropdev with no interrupts) you could choose to skip testing valid if time is very tight.
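
A rough Python sketch of one possible reading of such a tracking counter. Only "INC on Result_Ready" and "flag an error" come from the post above; the decrement on read, the saturation at 3 and every name here (QueueTracker, result_ready, read) are assumptions made for illustration, not the hardware design.

```python
class QueueTracker:
    """Hypothetical 2-bit up/down counter with an error flag."""

    MAX = 3  # largest value a 2-bit counter can hold

    def __init__(self):
        self.count = 0      # results that have arrived but not been read
        self.error = False  # stands in for writing C or Z on an error

    def result_ready(self):
        # INC on Result_Ready; going past the 2-bit range is flagged.
        if self.count < self.MAX:
            self.count += 1
        else:
            self.error = True

    def read(self):
        # Assumed: DEC on a GETQx read. A read with nothing counted is
        # flagged as an error instead of stalling.
        if self.count == 0:
            self.error = True
            return False
        self.count -= 1
        return True


tracker = QueueTracker()
tracker.result_ready()
print(tracker.read(), tracker.error)  # one result, one read
print(tracker.read(), tracker.error)  # one read too many
```

In this model the second read returns False and sets the error flag, which is the "tag an incorrect read" behaviour asked for, without any claim about how many results the single buffer can actually hold.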

Electrodude · 2015-10-30 21:17

evanh wrote: »

There is no way to check for the excess nor any way to software reset the CORDIC.

You can software reset the CORDIC with only three instructions:

QMUL dummy, dummy
GETQX dummy
GETQY dummy

cgracey · 2015-10-30 22:49

I don't see the concern here.

No program is going to do a GETQX/GETQY before it issues a CORDIC command, right? That seems kind of crazy to worry about.

I don't see any point in even initializing the 'done' flags, as they will get cleared on a CORDIC command, set when done (a result pops out, may be more coming), and cleared again on GETQX/GETQY. The mechanism, as is, does all you'd need it to do to allow overlapped operations.

If you never overlap CORDIC commands and you don't have interrupts doing CORDIC commands or GETQX/GETQY, there should never be a problem.

If you DO overlap CORDIC operations, you better just use STALLI to protect from interruption, so that you can get the results back out before they get overwritten due to cycles stolen by the interrupt, eventuating in a hang situation when you do the last GETQX/GETQY.

I may not be seeing everything, of course.

It just seems better to keep it simple, not worrying about checks. Those checks take a good chunk of time, themselves, and code, too.
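
A minimal Python sketch of the flag behaviour described in this post, to show where the hang comes from. It assumes a single 'done' flag (X and Y merged) and adds an in_flight count purely as model bookkeeping; the class and method names are hypothetical.

```python
class CordicFlags:
    """Toy model of one cog's CORDIC 'done' flag (X and Y merged into one)."""

    def __init__(self):
        self.done = False   # set when a result pops out of the pipeline
        self.in_flight = 0  # commands still in the pipeline (model bookkeeping)

    def command(self):
        # Issuing a CORDIC command clears the done flag.
        self.done = False
        self.in_flight += 1

    def result_arrives(self):
        # A result leaves the pipeline and sets the done flag.
        self.in_flight -= 1
        self.done = True

    def getq(self):
        # GETQX/GETQY clears the flag when a result is present.
        if self.done:
            self.done = False
            return "ok"
        # No result present: fine if one is still coming, a hang if not.
        return "wait" if self.in_flight > 0 else "hang"


def delayed_reads():
    """Two overlapped commands whose reads are delayed past both results."""
    cordic = CordicFlags()
    cordic.command()
    cordic.command()
    cordic.result_arrives()
    cordic.result_arrives()  # an interrupt stole the time for the first read
    return [cordic.getq(), cordic.getq()]


print(delayed_reads())
```

In this model the first read succeeds and the second reports "hang", since the flag was already cleared and nothing is left in the pipeline to set it again. That is the case STALLI is meant to prevent.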

Rayman · 2015-10-30 23:48

Overlapping cordic sounds like a use with extreme caution thing...

What if you try using the debug interrupt with that?

If you did have a big math problem, maybe better to put 8 cogs on it instead?

jmg · 2015-10-30 23:52

cgracey wrote: »

I don't see the concern here.

The main concern, is the failure mode : Drop Dead.
It can be triggered by a change in timing, which is quite subtle, and likely also means Break/ Step debug of such code could simply freeze.
Or, if someone inserts a printf, or moves an array store...

cgracey wrote: »

No program is going to do a GETQX/GETQY before it issues a CORDIC command, right? That seems kind of crazy to worry about.

Correct, but even if it did, that code ideally should simply return a bad result.

cgracey wrote: »

The mechanism, as is, does all you'd need it to do to allow overlapped operations.

True, except for the timing case ozpropdev has given.
The problem with timing triggered failures, is they are very hard to nail down, and very hard to prove you cannot ever have.

cgracey wrote: »

It just seems better to keep it simple, not worrying about checks. Those checks take a good chunk of time, themselves, and code, too.

Simple is good, but there is no code or time overhead to a simple queue tracking counter ?
That avoids the stall, and can provide a Valid/Error flag, which can assist debug.
It also allows power-users to queue 3 Cordic calls, if they want to.

evanh · 2015-10-31 01:14

jmg wrote: »

It also allows power-users to queue 3 Cordic calls, if they want to.

A count doesn't help here because there is only one buffer for the results. Anything beyond a single result is trashed. All that's needed to make things clean is the in-progress and done pair of flags.

evanh · 2015-10-31 01:20

Electrodude wrote: »

You can software reset the CORDIC with only three instructions:

QMUL dummy, dummy
GETQX dummy
GETQY dummy

Ah yes, I got your point, but needs a fix ...

QMUL dummy, dummy
WAITX #40
GETQX dummy
GETQY dummy

This will always have to be a generic first step when using the CORDIC the way it is. But even with this I'm still nervous. The extra flag would clear this issue completely.

evanh · 2015-10-31 01:23

Chip,
The four states formed from the two flags are:
1 - Command(s) running but no new result yet (time to wait)
2 - Command(s) running and new result (don't wait, another result coming)
3 - Command(s) not running and new result (better late than never)
4 - Command not running and no new result (all results read or full chip reset)

There are three effective states: State 1 is the only one that waits. States 2 and 3 are treated as one and the same. State 4 is the extra needed one that allows a clean immediate return from GETQx instructions where currently it will just hang.

And this also means no extra instructions, no additional checks or status or resets. It just works as you've already designed it to.
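
The mapping from the two flags to what a GETQx would do can be written as a small Python decision function. This is only a sketch of the proposal as worded above; the names GetqAction and getq_action are made up for illustration.

```python
from enum import Enum


class GetqAction(Enum):
    WAIT = "wait for the result"
    TAKE = "take the buffered result"
    RETURN_EMPTY = "return immediately, nothing to collect"


def getq_action(in_progress: bool, done: bool) -> GetqAction:
    """What GETQx would do for each combination of the two flags."""
    if done:
        # States 2 and 3: a new result is waiting, running or not.
        return GetqAction.TAKE
    if in_progress:
        # State 1: command(s) running, no new result yet.
        return GetqAction.WAIT
    # State 4: nothing running and nothing new, so do not hang.
    return GetqAction.RETURN_EMPTY


for in_progress in (True, False):
    for done in (False, True):
        action = getq_action(in_progress, done)
        print(f"in_progress={in_progress} done={done} -> {action.name}")
```

Only the last branch differs from the behaviour being criticised in this thread: with both flags clear, the read returns instead of waiting forever.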

cgracey · 2015-10-31 01:41

evanh wrote: »

Electrodude wrote: »

You can software reset the CORDIC with only three instructions:

QMUL dummy, dummy
GETQX dummy
GETQY dummy

Ah yes, I got your point, but needs a fix ...

QMUL dummy, dummy
WAITX #40
GETQX dummy
GETQY dummy

This will always have to be a generic first step when using the CORDIC the way it is. But even with this I'm still nervous. The extra flag would clear this issue completely.

I finally understand what the concern is:

A CORDIC computation may be in progress when the cog is restarted.

If something was already in transit, that could pose a problem. It would only happen if the cog was restarted in hub exec mode (no lengthy load time) and it immediately did a CORDIC command and then went into a GETQX/GETQY, which could catch those results that were in transit from when the cog was running before. That seems like an extreme long shot, but it does need to be covered for. A simple WAITX #36, or so, would certainly cover for that possibility, before using the CORDIC.

As soon as a CORDIC command executes, those 'ready' flags for X and Y are cleared and will not be set until results arrive. We just need to be sure that they are the results of our own CORDIC command, and not from another program that used the same cog.

Is this the matter?

evanh · 2015-10-31 01:48

[moved content to previous post]

evanh · 2015-10-31 02:39

Phone call ... a bit more thinking ...

Chip,
It's more than that. The other half of the problem is out of sync results. A fast Cog start-up is not needed to still get out of sync. That's why a CORDIC flush is needed at code start. Pondering it a bit more, out-of-sync can still happen even with the extra in-progress flag. So, the CORDIC flush would still be needed. Just it shrinks to the two GETQx instructions.

jmg · 2015-10-31 03:55

evanh wrote: »

jmg wrote: »

It also allows power-users to queue 3 Cordic calls, if they want to.

A count doesn't help here because there is only one buffer for the results. Anything beyond a single result is trashed. All that's needed to make things clean is the in-progress and done pair of flags.

A count does not create a buffer but it does track what is possible - as you also said above "Three does just fit for a few clocks ... so a maximally pumped CORDIC will alternate between two and three."

It is both that 'alternate between three and two' case, as well as underflow, that a Counter can manage.

You can always think of a counter as flags, if you like

jmg · 2015-10-31 04:00

cgracey wrote: »

Is this the matter?

Not just that.
How would you debug ozpropdev's code example above ?
What if you added a save-to-hub line to assist debug, for example ?

The common expectation is that would merely slow things down, not kill it stone dead.

evanh · 2015-10-31 05:15

jmg wrote: »

You can always think of a counter as flags, if you like

If there are three commands issued to the CORDIC but the program fails to pick up any results in a timely manner then only the first result gets held in the buffer. The following two results just vanish as they come off the pipeline. The only thing left to track is that first result.
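
As a sketch of that single-buffer behaviour in Python, assuming (as described here) that the first unread result is held and later ones are dropped; the function name collect_late is hypothetical.

```python
def collect_late(results):
    """Model results arriving in order with no read until all have arrived.

    Returns the one result left in the buffer and the list of lost results.
    """
    buffer = None
    lost = []
    for result in results:
        if buffer is None:
            buffer = result      # first result is held in the buffer
        else:
            lost.append(result)  # buffer occupied: later results vanish
    return buffer, lost


held, lost = collect_late(["r1", "r2", "r3"])
print("held:", held, "lost:", lost)
```

With three commands and no timely reads, the model leaves "r1" in the buffer and loses "r2" and "r3", so there is only ever one result to account for.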

evanh · 2015-10-31 05:53

Ozprop's earlier example - http://forums.parallax.com/discussion/comment/1351931/#Comment_1351931

jmg wrote: »

How would you debug ozpropdev's code example above ?
What if you added a save-to-hub line to assist debug, for example ?
The common expectation is that would merely slow things down, not kill it stone dead.

This will happen all right. Ozprop's example could be considered as manual simulation of what can happen with a debug interruption. He doesn't demo a failed condition, which would have required two commands in the CORDIC pipeline, but it's not too hard to extrapolate to that happening.

Data corruption can be expected but the hardware just immediately hanging is not quite on. I presume even the debug functionality will be locked out at that point.

jmg · 2015-10-31 06:03

evanh wrote: »

He doesn't demo a failed condition,...

? The failure condition is pretty clear from his comments in the code - maybe you missed those ?

if "waitx >34 never gets here"
if "waitx >18 never gets here"

Code failure example I am referring to is here :
http://forums.parallax.com/discussion/comment/1351783/#Comment_1351783

evanh · 2015-10-31 06:16

Oh, of course, that even earlier example is even plainer isn't it. Unexpected delays blow up, end of story.
