Cordic pipeline test

ozpropdev · 2015-10-24 13:37

Hi All
I've just been looking at the cordic again and tried a simple test.
It appears that the cordic can only deal with one operation per cog at a time.
My code is split into two tests.

		qmul	m1,m2			'first operation
		getqx	m3
		getqy	m4
		qmul	m1a,m2a			'second operation
		getqx	m3a
		getqy	m4a

and then

		qmul	m1,m2			'first operation
		waitx 	wclk		
		qmul	m1a,m2a			'second operation
		getqx	m3
		getqy	m4
		getqx	m3a
		getqy	m4a

Notice that the second variant uses WAITX to pause a bit before sending another operation.
Here are the results:

 Cordic pipeline test - Ozpropdev 2015
 Reference values
 1st result = $31F46C22
 2nd result = $13F9AC78
 Waitx value ?#28
 Test results
 1st result = $31F46C22
 2nd result = $31F46C22
 Waitx value ?#30
 Test results
 1st result = $13F9AC78
 2nd result = $13F9AC78
 Waitx value ?

With a WAITX value of 28, both reads return the first result; with a value of 30, both return the second result.
Included is my full test code (support code is a work in progress).

cgracey · 2015-10-24 15:05

Try getting rid of the WAITX. There's an opportunity every 16 clocks to start another CORDIC operation.
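As a rough illustration of that 16-clock issue window, here is a small Python model (not PASM) of when back-to-back commands from one cog would actually enter the CORDIC. It assumes the cog's opportunities fall at a fixed phase plus multiples of 16 clocks, and that a command issued between opportunities stalls until the next free one. `issue_slots` and its `phase` parameter are hypothetical names used only for this sketch.

```python
SLOT_PERIOD = 16  # clocks between one cog's chances to start a CORDIC operation


def issue_slots(issue_times, phase=0):
    """Return the clock at which each command enters the CORDIC.

    Assumption (hypothetical model): opportunities occur at phase + 16*k,
    and each command takes the first unused opportunity at or after the
    clock on which it was issued.
    """
    slots = []
    next_free = phase
    for t in issue_times:
        # Round t up to the next opportunity boundary (ceiling division).
        k = max(0, -(-(t - phase) // SLOT_PERIOD))
        slot = max(phase + k * SLOT_PERIOD, next_free)
        slots.append(slot)
        next_free = slot + SLOT_PERIOD  # this opportunity is now used
    return slots


# Three QMULs issued two clocks apart land in consecutive opportunities.
print(issue_slots([0, 2, 4]))  # [0, 16, 32]
```

In this model, back-to-back QMULs fall into consecutive opportunities without any padding, and if the pipeline latency is fixed their results would be expected to come back 16 clocks apart.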

I will review the GETQX/GETQY behavior.

ozpropdev · 2015-10-25 02:36

Thanks Chip.
In a new test I was able to start three math ops on the CORDIC and get the right results.
It works great, but some care is needed to synchronize the retrieval window to get correct results.

		qmul	m1,m2			'first operation
		qmul	m1a,m2a			'second operation
		qmul	m1b,m2b			'third operation
		getqx	m3			'first result
		getqy	m4
		waitx 	#25			'10-25 works		
		getqx	m3a			'second result
		getqy	m4a
		waitx 	#25		
		getqx	m3b			'third result
		getqy	m4b

evanh · 2015-10-25 03:40

Can up to seven instructions fit between each QMUL without missing a timing slot? And presumably a similar number of instructions could replace the WAITs as well?

What happens if the GETQX is late by one clock?

evanh · 2015-10-25 03:54

Ah, forget the GETQX question. I can see from reading Oz's results at least part of the answer is likely there already.

Oz, try logging GETCNT in the test with the adjustable WAIT. I think you'll find there are extra delays occurring, i.e., an effective timeout due to having missed a result.

jmg · 2015-10-25 04:06

ozpropdev wrote: »

waitx #25 '10-25 works
....
waitx #25

Interesting. Is that 10~25 range valid for both waits?
Do the upper and lower limits imply that reading too late misses a value?
What happens if an interrupt fires during this time?

Electrodude · 2015-10-25 04:43

If you read the result too early, do you get the previous result or what?

Can this be changed to block instead?

ozpropdev · 2015-10-25 04:49

Here are the results for too early (9) and too late (26):

 Cordic pipeline test - Ozpropdev 2015
 Reference values
 1st result = $31F46C22
 2nd result = $13F9AC78
 3rd result = $041EB7B8
 Waitx value ?9
 Test results
 1st result = $31F46C22
 2nd result = $31F46C22
 3rd result = $13F9AC78
 Waitx value ?#26
 Test results
 1st result = $31F46C22
 2nd result = $041EB7B8
 3rd result = $041EB7B8
 Waitx value ?

evanh · 2015-10-25 04:58

Oh, doesn't look too good, does it? I think I understand, though. Try GETCNT to see if a timeout is occurring. I bet there is one when the WAIT is too large.

The way I'm understanding it, GETQX waits for an edge condition (event) but also has a timeout to prevent a hang. The timeout will need to be extended a little, methinks, to eliminate the need for WAITX #10.

jmg · 2015-10-25 04:58

Interesting. If that is as-intended behaviour, maybe a WAITC (wait CORDIC) is needed, which would act like a FIFO read and tolerate interrupts?

Electrodude · 2015-10-25 05:22

So getqx and getqy return whatever answer is currently waiting for your cog to read, and it gets overwritten when it gets overwritten and doesn't get cleared when you read it?

I can't really see the point of a WAITC unless you plan to attempt hubram ops after submitting an operation and before reading the result, which seems a little crazy. Maybe it could instead be an assembler macro that just does WAITX #x for the appropriate value of x, maybe taking a relative pointer to where the corresponding Qxxx instruction is so the assembler can calculate how many ticks it should wait.
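The arithmetic such a macro would do is simple; here it is sketched in Python rather than in assembler syntax. It assumes ordinary cog instructions take 2 clocks each and that the CORDIC latency is a known constant. Both the latency value and `pad_clocks` are hypothetical, and the result is a clock count to pad, not a verified WAITX operand.

```python
CLOCKS_PER_INSTR = 2  # assumption: ordinary cog instructions take 2 clocks


def pad_clocks(latency, instrs_between):
    """Clocks of padding to insert before a GETQx.

    latency        -- hypothetical CORDIC latency, in clocks after the Qxxx
    instrs_between -- ordinary instructions placed between the Qxxx and GETQx
    Returns 0 when the intervening code already covers the latency.
    """
    elapsed = CLOCKS_PER_INSTR * instrs_between
    return max(0, latency - elapsed)


print(pad_clocks(40, 5))   # 30
print(pad_clocks(40, 25))  # 0
```

Instructions with variable timing, such as hub RAM accesses, would make this count unreliable.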

evanh · 2015-10-25 05:25

Electrodude wrote: »

So getqx and getqy return whatever answer is currently stored for your cog, and it gets overwritten when it gets overwritten?

No, the GETQx instructions do wait for the next result to drop off the CORDIC pipe. Adding a WAITC won't help.

evanh · 2015-10-25 05:27

But if a result doesn't turn up in time then you do get whatever last came out. That's my understanding.

PS: A missed hub slot could also add an extra 16 clocks. Not sure if hub RAM timing applies to the CORDIC, though.

Electrodude · 2015-10-25 05:37

evanh wrote: »

Electrodude wrote: »

So getqx and getqy return whatever answer is currently stored for your cog, and it gets overwritten when it gets overwritten?

No, the GETQx instructions do wait for the next result to drop off the CORDIC pipe. Adding a WAITC won't help.

Then how did ozpropdev get GETQx to return the same thing twice by making the delay too short?

evanh · 2015-10-25 05:46

That's where the length of the hard-coded GETQx timeout is important. If it's too short, then the padding from WAITX #10 helps it out. I'm speculating, of course.

jmg · 2015-10-25 06:00

Electrodude wrote: »

Then how did ozpropdev get GETQx to return the same thing twice by making the delay too short?

Yes, looks like this is not quite working as expected...

cgracey · 2015-10-25 08:53

The way it works now is like this:

When a CORDIC command is executed, the "got" flag is cleared.
When the results come back, the "got" flag is set.
Once the "got" flag is set, GETQX/GETQY release with the results.

How should this work for overlapped commands? It's a bit of a brain bender.

I think "got" needs to be cleared on GETQX/GETQY. In fact, GETQX/GETQY would each need their own "got" flag. They'd get set on results arriving and cleared on results being read by GETQX/GETQY. Is that the recipe we need? I think they'd also need to be cleared on a CORDIC command.

Is this what we need?

X and Y "got" flags cleared on CORDIC command
X and Y "got" flags set on results arrival
X and Y "got" flags individually cleared on GETQX and GETQY

That's simple to do, anyway. We just need to nail down how the flags should behave. Would that behavior give us what we want?
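To compare the two rule sets, here is a behavioural model in Python (not HDL or PASM) of one cog reading three overlapped results. It tracks only the X side, since the proposed Y flag would behave the same way, and it assumes all three commands were issued before clock 0 and that each arriving result overwrites a single result register. The arrival clocks (20, 36, 52, i.e. 16 apart) are arbitrary model values, the read gaps are effective clocks between reads rather than WAITX operands, and `run_reads` is a hypothetical name; the values are ozpropdev's reference results.

```python
R1, R2, R3 = "$31F46C22", "$13F9AC78", "$041EB7B8"  # ozpropdev's reference values
ARRIVALS = [(20, R1), (36, R2), (52, R3)]  # arbitrary model clocks, 16 apart


def run_reads(arrivals, gaps, clear_on_read):
    """Simulate one cog's GETQX reads of overlapped CORDIC results.

    arrivals      -- [(clock, value), ...] in clock order; all commands were
                     issued before clock 0, so the "got" flag starts clear
    gaps          -- clocks spent before each read (gaps[0] precedes the first)
    clear_on_read -- False: original rules (a read never clears the flag)
                     True:  proposed rules (GETQX clears its flag)
    Returns the values read; None means the read would wait forever.
    """
    t = 0
    flag = False
    latched = 0       # number of arrivals already written to the register
    register = None   # result register, overwritten by every arrival
    out = []
    for gap in gaps:
        t += gap
        # Latch every result that has arrived by clock t; each sets the flag.
        while latched < len(arrivals) and arrivals[latched][0] <= t:
            register = arrivals[latched][1]
            flag = True
            latched += 1
        if not flag:
            if latched == len(arrivals):
                out.append(None)  # no result will ever come: the cog hangs
                break
            # Stall until the next result lands, then release with it.
            t, register = arrivals[latched]
            latched += 1
            flag = True
        out.append(register)
        if clear_on_read:
            flag = False
    return out


# Original rules: reading too early repeats a result, too late skips one.
print(run_reads(ARRIVALS, [0, 10, 10], clear_on_read=False))
print(run_reads(ARRIVALS, [0, 33, 33], clear_on_read=False))
# Proposed rules: no padding needed, but a late read loses a result and
# the last read has nothing left to wait for (None).
print(run_reads(ARRIVALS, [0, 0, 0], clear_on_read=True))
print(run_reads(ARRIVALS, [0, 33, 33], clear_on_read=True))
```

With these gaps, the original rules reproduce the pattern in ozpropdev's too-early and too-late logs, and the working WAITX range of 10 to 25 he found is 16 values wide, which is at least consistent with results arriving 16 clocks apart. Under the proposed rules, back-to-back reads need no padding, but a read delayed past the next arrival still loses a result, and the following read waits for data that never comes.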

jmg · 2015-10-25 09:07

I think it needs a genuine pipeline alongside the ripple-thru flags, just like a FIFO.
ie an interrupt may delay the read of queued results.
-or, it may be simpler to queue at the cordic front end, if the true fifo like operation needs too much extra logic.
It does need to avoid timing changes changing the nett results.

cgracey · 2015-10-25 09:27

jmg wrote: »

I think it needs a genuine pipeline alongside the ripple-thru flags, just like a FIFO.
ie an interrupt may delay the read of queued results.
-or, it may be simpler to queue at the cordic front end, if the true fifo like operation needs too much extra logic.
It does need to avoid timing changes changing the nett results.

If you're not overlapping commands, it should work fine, as is.

I just compiled with the new rules I listed above and it seems to work as expected. You get to do GETQX/GETQY only once after a CORDIC command now. This should allow you to overlap commands without any WAITX's. The next FPGA release will have this behavior.

evanh · 2015-10-25 09:38

jmg wrote: »

I think it needs a genuine pipeline alongside the ripple-thru flags, just like a FIFO.
ie an interrupt may delay the read of queued results.
-or, it may be simpler to queue at the cordic front end, if the true fifo like operation needs too much extra logic.
It does need to avoid timing changes changing the nett results.

Lol, here comes the scourge that is interrupts - and introduced after the CORDIC too. The pipeline is already there but yeah, an additional FIFO for each Cog's results may be the only reliable solution.

And this isn't even directly addressing the current issue.

cgracey · 2015-10-25 09:42

evanh wrote: »

jmg wrote: »

I think it needs a genuine pipeline alongside the ripple-thru flags, just like a FIFO.
ie an interrupt may delay the read of queued results.
-or, it may be simpler to queue at the cordic front end, if the true fifo like operation needs too much extra logic.
It does need to avoid timing changes changing the nett results.

Lol, here comes the scourge that is interrupts - and introduced after the CORDIC too. The pipeline is already there but yeah, an additional FIFO for each Cog's results may be the only reliable solution.

And this isn't even directly addressing the current issue.

I'm not seeing why a FIFO is needed.

You give a command, and when it's done you can read the results. How would interrupts destroy this? Are you assuming the interrupt code is going to use the CORDIC, too? That would not work if mainline code was already using it.

evanh · 2015-10-25 09:43

cgracey wrote: »

If you're not overlapping commands, it should work fine, as is.

Without bold print rules, overlapping CORDIC commands and interrupts will be mixed together.

cgracey · 2015-10-25 09:51

evanh wrote: »

cgracey wrote: »

If you're not overlapping commands, it should work fine, as is.

Without bold print rules, overlapping CORDIC commands and interrupts will be mixed together.

Ah, I see. You won't be able to get your results out in time before they get overwritten. Then, when you go to get the second results, you lock up because there's no new data. Better not do overlapping CORDIC commands if you've got interrupts occurring. That's not a hardship, really. People just have to be told. Putting a FIFO in there to deal with that would be extreme overkill. Wait... it would only have to have one level to overcome the problem. Maybe later we'll deal with this, if it seems necessary.

jmg · 2015-10-25 18:33

cgracey wrote: »

Ah, I see. You won't be able to get your results out in time before they get overwritten. Then, when you go to get the second results, you lock up because there's no new data. Better not do overlapping CORDIC commands if you've got interrupts occurring. That's not a hardship, really. People just have to be told. ...

Most important is not having read-delays affect the result delivery, which I think you have fixed.
(Is the special case of a same-clock read and next arrival handled OK?)

Not mixing queued CORDIC and interrupts is tolerable, as that is a common rule already - most MCU floating point libraries are not reentrant.

evanh · 2015-10-25 21:33

I wouldn't put this in the same category for one reason: The whole Cog becomes banned, not just the ISR.

That said, using IRQ blocking is a workaround, and probably the one that'll be most used.

evanh · 2015-10-25 21:40

On the current problem though, I'm a bit uneasy that there is potential for total lock-up on a GETQx instruction. If I'm reading Chip right then an incorrectly placed GETQx can wait forever.

jmg · 2015-10-25 21:44

evanh wrote: »

On the current problem though, I'm a bit uneasy that there is potential for total lock-up on a GETQx instruction. If I'm reading Chip right then an incorrectly placed GETQx can wait forever.

Not on current code, given ozpropdev's examples.
There, a same-as-before result occurs.

I'm not sure what Chip's upcoming changes will do to the examples.

evanh · 2015-10-25 21:46

I have a separate question too: What are the timings for cog access to the CORDIC, namely for the GETQx instructions?

evanh · 2015-10-25 21:55

And does GETQX or GETQY reset the result flag? Does it matter what the instruction order is? What if the next result arrives when only half the result has been read?

jmg · 2015-10-25 23:19

evanh wrote: »

And does GETQX or GETQY reset the result flag? Does it matter what the instruction order is? What if the next result arrives when only half the result has been read?

I think Chip has now expanded it to one flag each?
From above:
"In fact, GETQX/GETQY would each need their own "got" flag." ...

"I just compiled with the new rules I listed above and it seems to work as expected. You get to do GETQX/GETQY only once after a CORDIC command now. This should allow you to overlap commands without any WAITX's. The next FPGA release will have this behavior. "

I think the rare but possible case of a new result arriving on the same clock edge as a read of the previous one may need special care.
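One way the same-clock case could be handled, sketched as a Python next-state rule for a single "got" flag: give an arriving result priority over a read on the same clock. Assuming the read samples the register before the edge overwrites it, the read then takes the older result and the new one stays flagged for the next GETQx. The thread does not say how Chip's fix actually works; `next_flag` is a hypothetical illustration.

```python
def next_flag(flag, arrive, read):
    """Value of a "got" flag after one clock edge.

    flag   -- current flag state
    arrive -- a new result is written into the register on this edge
    read   -- a GETQx consumes the register on this edge
    An arrival wins over a simultaneous read, so a fresh result is never
    marked as already read.
    """
    return arrive or (flag and not read)


# Print the full truth table: (flag, arrive, read) -> next flag.
for flag in (False, True):
    for arrive in (False, True):
        for read in (False, True):
            print(flag, arrive, read, "->", next_flag(flag, arrive, read))
```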

cgracey · 2015-10-26 13:48

I've got the read/new result overlap case covered. I expect to have new FPGA files out today. Just need to compile for all 4 boards.
