P2 Tricks, Traps & Differences between P1 (general discussion)

cgracey · 2018-10-11 09:29

evanh wrote: »

Of those 54 stages, how many are recursive in nature?

None. I made it as short as I could. There used to be only 38 stages, but there was no time to do the K-factor compensation within the normal stages. So, I had to make 16 discrete stages, littered among the iteration stages, just to do subtractions to keep the scale at 1.00000000.

The great thing about those periodic subtraction stages is that they keep overflow totally in check. You can rotate ($7FFF_FFFF,$0000_0000) by $8000_0000 and get ($8000_0001,$0000_0000). In most CORDIC implementations, the K-factor compensation is done at the end of the computation and you need a few guard MSBs to contain the result, then an over-sized multiplier to scale the result down. Much easier to tap it down here and there, along the way, so that it comes out perfect at the end.
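A quick way to see why the K-factor matters: every CORDIC micro-rotation i stretches the vector by sqrt(1 + 2^-2i), so 32 iterations grow it by about 1.6468. The Python sketch below is a floating-point illustration only, not the P2's fixed-point datapath, and `cordic_gain` is a hypothetical helper. It computes that gain and shows that a full-scale 32-bit input would exceed the signed range if compensation were left until the end.

```python
import math

INT32_MAX = 0x7FFF_FFFF

def cordic_gain(iterations: int) -> float:
    """Total magnitude growth after `iterations` CORDIC micro-rotations."""
    gain = 1.0
    for i in range(iterations):
        # Each micro-rotation by arctan(2^-i) lengthens the vector by sqrt(1 + 2^-2i).
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain

if __name__ == "__main__":
    k = cordic_gain(32)
    print(f"gain after 32 iterations: {k:.7f}")
    # Uncompensated, a full-scale x would need more than 32 signed bits.
    print("full-scale input overflows:", INT32_MAX * k > INT32_MAX)
```

Dividing by that gain once at the end is the conventional approach described above; the P2 instead spreads the correction over its 16 subtraction stages, whose exact shift amounts are not given here.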

evanh · 2018-10-11 09:46

By "in nature" I meant: could they be rolled back up into a loop if dedicated to one command only?

EDIT: You've kind of already answered that previously, by saying it could be done with just two barrel shifters.

cgracey · 2018-10-11 09:52

evanh wrote: »

By "in nature" I meant: could they be rolled back up into a loop if dedicated to one command only?

EDIT: You've kind of already answered that previously, by saying it could be done with just two barrel shifters.

Okay. I see what you are asking about now. Thirty-two of those pipelined stages are iterative and would otherwise have been implemented by two barrel shifters and two adders.
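For contrast, here is roughly what the rolled-up form looks like: a single loop whose shift distance changes on every pass, which is why x and y each need a barrel shifter. This is a generic textbook CORDIC rotation in Python, assuming P2-style 32-bit binary angles ($8000_0000 = 180°) and a ±90° pre-rotation added so the loop converges; it is not the P2's circuit, and `cordic_rotate` and its helpers are hypothetical names.

```python
import math

TURN = 1 << 32        # binary angle units: $8000_0000 = 180 degrees
ITERATIONS = 32
# arctan(2^-i) in binary angle units, one entry per iteration
ATAN_TABLE = [round(math.atan(2.0 ** -i) * TURN / (2 * math.pi)) for i in range(ITERATIONS)]
# Total CORDIC gain, removed once at the end in this sketch
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))

def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement number."""
    value &= 0xFFFF_FFFF
    return value - TURN if value & 0x8000_0000 else value

def cordic_rotate(x: int, y: int, angle: int) -> tuple[int, int]:
    """Rotate (x, y) by a 32-bit binary angle with one rolled-up iteration loop."""
    z = to_signed32(angle)
    quarter = TURN // 4
    # Pre-rotate by +/-90 degrees so the remaining angle is inside CORDIC's convergence range.
    if z > quarter:
        x, y, z = -y, x, z - quarter
    elif z < -quarter:
        x, y, z = y, -x, z + quarter
    for i in range(ITERATIONS):
        # The shift distance i changes every pass, so x and y each need a barrel shifter.
        x_shifted, y_shifted = x >> i, y >> i
        if z >= 0:
            x, y, z = x - y_shifted, y + x_shifted, z - ATAN_TABLE[i]
        else:
            x, y, z = x + y_shifted, y - x_shifted, z + ATAN_TABLE[i]
    # Python ints never overflow, so the gain can simply be divided out here.
    return round(x / GAIN), round(y / GAIN)

if __name__ == "__main__":
    print(cordic_rotate(1 << 30, 0, TURN // 8))   # rotate by 45 degrees
```

Unrolling this loop into fixed-shift stages is what turns the 32 iterations into 32 of the pipelined stages.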

cgracey · 2018-10-11 10:00

Here is the pipeline order:

1 - magnitude determination of inputs
2 - initial x,y,z shift
3..52 - 32 iteration stages punctuated by 16 subtraction stages and 2 extra hyperbolic stages
53 - post-iteration shift and round
54 - final x,y selection/adding
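The stage list above can be modelled as a plain one-stage-per-clock pipeline. The Python sketch below only counts stage kinds, because the interleaving of the 50 middle stages is not given in the list, and it assumes a command occupies stage 1 on the clock it is issued; instruction and GETQx overheads are ignored. `completion_clock` and `in_flight` are hypothetical helpers.

```python
from collections import Counter

# Stage kinds from the pipeline order above; the middle 50 are only counted, not ordered.
MIDDLE = ["iteration"] * 32 + ["subtraction"] * 16 + ["hyperbolic"] * 2
STAGES = ["magnitude", "initial shift", *MIDDLE, "shift and round", "final x,y"]

def completion_clock(issue_clock: int) -> int:
    """Clock on which a command issued at issue_clock occupies the last stage."""
    return issue_clock + len(STAGES) - 1

def in_flight(issue_clocks: list[int], clock: int) -> int:
    """Number of commands occupying some stage on the given clock."""
    return sum(1 for t in issue_clocks if t <= clock <= completion_clock(t))

if __name__ == "__main__":
    print(len(STAGES), Counter(MIDDLE))
    # One new command every clock (cogs taking turns): the pipeline fills up.
    print(in_flight(list(range(200)), 100))
```

In steady state with one command entering per clock, 54 commands are in flight at once, one per stage.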

ozpropdev · 2018-10-11 10:21

John Abshier · 2018-10-11 13:41

I am not sure I understand this. It appears that if you use CORDIC, you cannot use interrupts. Does that mean that one OBEX object can break another? If so, P2 tools need a built-in lint.

John Abshier

cgracey · 2018-10-11 13:54

John Abshier wrote: »

I am not sure I understand this. It appears that if you use CORDIC, you cannot use interrupts. Does that mean that one OBEX object can break another? If so, P2 tools need a built-in lint.

John Abshier

John, you can use interrupts with CORDIC. You just can't interleave CORDIC operations for really high throughput while interrupts are active. And this all happens within one cog, so it wouldn't affect OBEX programs running in other cogs.

evanh · 2018-10-11 14:00

Chip,
From that info I've built the following CORDIC execution map for a single cog of a 16-cog Prop2, feeding the CORDIC at full speed. Let me know if anything is in the wrong place.

 1   x,y,z mag		49   CORDIC		33   sub		17   sub
 2   x,y,z sft		50   CORDIC		34   CORDIC		18   CORDIC
 3   CORDIC		51   sub		35   CORDIC		19   CORDIC
 4   CORDIC		52   hyperbolic		36   sub		20   sub
 5   sub		53   shift and round	37   CORDIC		21   CORDIC
 6   CORDIC		54   final x,y		38   CORDIC		22   CORDIC
 7   CORDIC		55			39   sub		23   sub
 8   sub		56			40   CORDIC		24   CORDIC
 9   CORDIC		57			41   CORDIC		25   CORDIC
10   CORDIC		58			42   sub		26   sub
11   sub		59			43   CORDIC		27   hyperbolic
12   CORDIC		60			44   CORDIC		28   CORDIC
13   CORDIC		61			45   sub		29   CORDIC
14   sub		62			46   CORDIC		30   sub
15   CORDIC		63			47   CORDIC		31   CORDIC
16   CORDIC		64			48   sub		32   CORDIC
17   sub		 1   x,y,z mag		49   CORDIC		33   sub
18   CORDIC		 2   x,y,z sft		50   CORDIC		34   CORDIC
19   CORDIC		 3   CORDIC		51   sub		35   CORDIC
20   sub		 4   CORDIC		52   hyperbolic		36   sub
21   CORDIC		 5   sub		53   shift and round	37   CORDIC
22   CORDIC		 6   CORDIC		54   final x,y		38   CORDIC
23   sub		 7   CORDIC		55			39   sub	
24   CORDIC		 8   sub		56			40   CORDIC
25   CORDIC		 9   CORDIC		57			41   CORDIC
26   sub		10   CORDIC		58			42   sub	
27   hyperbolic		11   sub		59			43   CORDIC
28   CORDIC		12   CORDIC		60			44   CORDIC
29   CORDIC		13   CORDIC		61			45   sub	
30   sub		14   sub		62			46   CORDIC
31   CORDIC		15   CORDIC		63			47   CORDIC
32   CORDIC		16   CORDIC		64			48   sub	
33   sub		17   sub		 1   x,y,z mag		49   CORDIC
34   CORDIC		18   CORDIC		 2   x,y,z sft		50   CORDIC
35   CORDIC		19   CORDIC		 3   CORDIC		51   sub	
36   sub		20   sub		 4   CORDIC		52   hyperbolic
37   CORDIC		21   CORDIC		 5   sub		53   shift and round
38   CORDIC		22   CORDIC		 6   CORDIC		54   final x,y
39   sub		23   sub		 7   CORDIC		55
40   CORDIC		24   CORDIC		 8   sub		56
41   CORDIC		25   CORDIC		 9   CORDIC		57
42   sub		26   sub		10   CORDIC		58
43   CORDIC		27   hyperbolic		11   sub		59
44   CORDIC		28   CORDIC		12   CORDIC		60
45   sub		29   CORDIC		13   CORDIC		61
46   CORDIC		30   sub		14   sub		62
47   CORDIC		31   CORDIC		15   CORDIC		63
48   sub		32   CORDIC		16   CORDIC		64
49   CORDIC		33   sub		17   sub		 1   x,y,z mag
50   CORDIC		34   CORDIC		18   CORDIC		 2   x,y,z sft
51   sub		35   CORDIC		19   CORDIC		 3   CORDIC
52   hyperbolic		36   sub		20   sub		 4   CORDIC
53   shift and round	37   CORDIC		21   CORDIC		 5   sub
54   final x,y		38   CORDIC		22   CORDIC		 6   CORDIC
55			39   sub		23   sub		 7   CORDIC
56			40   CORDIC		24   CORDIC		 8   sub
57			41   CORDIC		25   CORDIC		 9   CORDIC
58			42   sub		26   sub		10   CORDIC
59			43   CORDIC		27   hyperbolic		11   sub
60			44   CORDIC		28   CORDIC		12   CORDIC
61			45   sub		29   CORDIC		13   CORDIC
62			46   CORDIC		30   sub		14   sub
63			47   CORDIC		31   CORDIC		15   CORDIC
64			48   sub		32   CORDIC		16   CORDIC
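The map above follows a simple rule: each row is one clock, and each of the four columns is one command, issued 16 clocks (one slot of a 16-cog chip) after the command to its left, so the pattern repeats every 64 clocks and stages 55-64 are idle padding. The Python sketch below regenerates the table from that rule. The middle-stage order is evanh's reading of the stage list and may not match the silicon; `stage_label`, `stage_at`, and `execution_map` are hypothetical names.

```python
NCOGS = 16          # 16-cog Prop2: a cog's CORDIC slot repeats every 16 clocks
CYCLE = 64          # four 16-clock slots cover the 54 stages; stages 55..64 are idle
COLUMNS = CYCLE // NCOGS

def stage_label(stage: int) -> str:
    """evanh's order: two CORDIC iterations then one subtraction, hyperbolic at 27 and 52."""
    fixed = {1: "x,y,z mag", 2: "x,y,z sft", 27: "hyperbolic", 52: "hyperbolic",
             53: "shift and round", 54: "final x,y"}
    if stage in fixed:
        return fixed[stage]
    if stage > 54:
        return ""                      # idle padding until the next slot
    offset = stage - 3 if stage < 27 else stage - 28
    return "sub" if offset % 3 == 2 else "CORDIC"

def stage_at(row: int, column: int) -> int:
    """Stage of the command in `column` on clock `row`; each column starts 16 clocks after the previous one."""
    return (row - 1 - NCOGS * column) % CYCLE + 1

def execution_map() -> list[str]:
    rows = []
    for row in range(1, CYCLE + 1):
        cells = [f"{stage_at(row, col):2d}   {stage_label(stage_at(row, col)):<16}"
                 for col in range(COLUMNS)]
        rows.append("".join(cells).rstrip())
    return rows

if __name__ == "__main__":
    print("\n".join(execution_map()))
```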

jmg · 2018-10-11 16:15

cgracey wrote: »

If you want CORDIC throughput, batch up your operations in special timed code. Once the first CORDIC command executes, your timing will be locked in. No getting off that crazy train. Once you are on, you are committed. No interruptions allowed. You will always come out the other end safely, with all your results. It is GLORIOUS!!!!

If no interrupts are allowed, then the CORDIC should disable interrupts in hardware; however, that is the exact inverse of how users expect interrupts to work (and indeed why they are named interrupts!).

Can the CORDIC instead be paused for that cog if an interrupt does occur?
That’s the more expected operation.

cgracey · 2018-10-11 16:34

If you do a single CORDIC instruction and then get the results, interrupts are fine. If you want high throughput by interleaving CORDIC operations, then interrupts are not fine.
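A toy model makes the distinction concrete. It assumes, for illustration only, that each cog's result register holds just the newest result, that results land 54 clocks after issue, and that an 8-cog chip gives this cog a slot every 8 clocks; real P2 timing has more detail than this. A single command can be collected whenever you like, but when commands are interleaved, a GETQX delayed by an interrupt lets later results overwrite earlier ones. `collect` is a hypothetical helper.

```python
LATENCY = 54   # clocks from issue to result (simplified from the 54-stage count)
SLOT = 8       # 8-cog chip: this cog's CORDIC slot comes around every 8 clocks

def collect(values: list[int], read_clocks: list[int]) -> tuple[list[int], list[int]]:
    """Issue one command per slot, then do a GETQX at each read clock.

    The result register holds only the newest result, so anything that lands
    and is overwritten before a read is lost. Returns (read, lost)."""
    arrivals = [(i * SLOT + LATENCY, v) for i, v in enumerate(values)]
    read, lost = [], []
    nxt = 0
    for clock in read_clocks:
        landed = []
        while nxt < len(arrivals) and arrivals[nxt][0] <= clock:
            landed.append(arrivals[nxt][1])
            nxt += 1
        if not landed and nxt < len(arrivals):
            # Nothing new yet: GETQX stalls until the next result lands.
            landed.append(arrivals[nxt][1])
            nxt += 1
        if landed:
            lost.extend(landed[:-1])   # overwritten before anyone read them
            read.append(landed[-1])
        # else: nothing pending; hardware would return the old value and raise QMT (not modelled)
    return read, lost

if __name__ == "__main__":
    on_time = [i * SLOT + LATENCY for i in range(4)]
    print(collect([10, 20, 30, 40], on_time))            # every read lands in time
    print(collect([10, 20, 30, 40], [54, 80, 88, 96]))   # second read delayed by an ISR
```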

Rayman · 2018-10-11 16:38

Well, you could have the ISR routine set a flag that tells code that cordic results may be invalid and make it do it again, right?

potatohead · 2018-10-11 16:43

Jmg, if we pause the thing, its deterministic nature will be impacted, which will affect any other cogs using it. It's up to each user of the CORDIC to meet the timing.

The way it is right now, any cog can do whatever it wants and not affect any other cog.

I think people are getting hung up on a couple of things:

Interrupts are not global to the P2, only to the cog in which they happen. This means programs in the object exchange will not break one another, because they're all running in different cogs.

The other thing that was done, which is different from P1, is that we definitely put in facilities for non-deterministic programs.

So people have to choose on this. And the reward for making that choice is a lot of fast math. It's a killer feature.

They either meet timing and write their programs in a way that does that, or they're interrupt driven and write their programs in a way that deals with that.

There's no protecting anyone on this without a big logic cost, or without breaking the symmetry of this thing and thereby limiting its throughput.

The CORDIC is super simple: input arguments, then hit the timing to get results. That's it. People just have to do that, and they only really have to do that if they're doing a whole lot of math and it needs to be fast.

jmg · 2018-10-11 16:52

Rayman wrote: »

Well, you could have the ISR routine set a flag that tells code that cordic results may be invalid and make it do it again, right?

I think there was also an underflow event mentioned, set when this corruption/loss occurs?
(Reading a non-existent answer.)
That could trigger a redo and alert the user that they have an issue?

evanh · 2018-10-11 18:15

cgracey wrote: »

Here is the pipeline order:

1 - magnitude determination of inputs
2 - initial x,y,z shift
3..52 - 32 iteration stages punctuated by 16 subtraction stages and 2 extra hyperbolic stages
53 - post-iteration shift and round
54 - final x,y selection/adding

I can see now that what I wanted to do as a partial pipeline doesn't pack well. It would still need the fully unrolled CORDIC for larger cog counts, and the resource savings probably wouldn't be as good as I had hoped.

I'm done with this.

Electrodude · 2018-10-11 18:22

What if you added a mode in which it automatically drops CORDIC results into ascending LUTRAM addresses? You would submit CORDIC commands as fast as you could, and they'd show up in LUTRAM eventually. This would be convenient for FFTs: you could do the smaller sub-transforms out of LUTRAM and have the results automagically show up in the right places in LUTRAM. If an interrupt happened while you were submitting CORDIC commands, the results would still go to the right place. The CORDIC underflow event would let you know when they were all done.

EDIT: The write would happen at the same time as a write from the neighboring cog would take place; one would win if they both tried to write at the same time. For simplicity, I guess you'd have it so getqx and getqy would still do the right things, although I can't imagine why you'd want to use both.
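The proposal can be sketched as a model: instead of the cog collecting each result with GETQX, the CORDIC would write result k to LUT address start + k on the clock it arrives, so the writes no longer depend on when the cog gets around to reading. This is a hypothetical mode that does not exist in the P2; the 512-long LUT size is the P2's, while the 54-clock latency, the 8-clock slot, the range check and the `auto_lut_fill` name are illustrative assumptions.

```python
LATENCY = 54   # clocks from issue to result (simplified)
SLOT = 8       # 8-cog chip: one CORDIC slot per cog every 8 clocks
LUT_LONGS = 512

def auto_lut_fill(values: list[int], start: int) -> tuple[list[int], int]:
    """Hypothetical mode: each result is written to the next LUT address as it arrives.

    Returns the LUT contents and the clock of the last write (when an
    'all done' event would tell the cog the batch is complete)."""
    if start < 0 or start + len(values) > LUT_LONGS:
        raise ValueError("batch would run past the end of LUT RAM")
    lut = [0] * LUT_LONGS
    last_write = -1
    for k, value in enumerate(values):
        last_write = k * SLOT + LATENCY   # lands on its own clock; no GETQX needed
        lut[start + k] = value
    return lut, last_write

if __name__ == "__main__":
    lut, done = auto_lut_fill([11, 22, 33], start=100)
    print(lut[100:103], "complete at clock", done)
```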

evanh · 2018-10-11 18:34

I like it. Good from the cog's point of view. Not sure how easy it would be for the CORDIC to reach into every cog like that, though. Currently the cogs are all reaching out.

evanh · 2018-10-11 18:39

It would need a config setting to limit the fill range. Effectively, that's 16 DMA channels.

AJL · 2018-10-11 21:45

Perhaps for the next design.

Sounds like too big a change for the current design, and the 'knobs' that Chip has detailed for scaling of the current design do not include this type of change either.

Electrodude · 2018-10-11 22:05

It doesn't have to change anything that's already there; getq[xy] would work just as they do now; there would just be a second way to get the results that is activated when you run an instruction to set it up with a start pointer. If it's added and it does cause any problems, it can be ignored until the next design. The only problem it could cause that would break current functionality is if the muxing of the LUT write port is buggy.

cgracey · 2018-10-12 00:57

I agree that the only way to automate high CORDIC throughput would be to have it write directly into the LUT. That's probably more change than we have safe margin for, at this point.

I was pleased to confirm yesterday that in an 8-cog setup, any mix of CORDIC commands can be initiated, overlapped, and trailing results received at a pace of 8 clocks per function. That's pretty decent and not hard to manage. You just need to get the concept clear in your head.
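The 8-clock figure falls out of the slot arithmetic: an 8-cog chip gives each cog a CORDIC slot every 8 clocks, so a cog that issues in every slot receives one result every 8 clocks and, with a 54-stage pipeline, has ceil(54/8) = 7 commands outstanding before the first result returns. The Python sketch below assumes results return exactly 54 clocks after issue and ignores instruction overhead; `per_cog_batch` and `max_outstanding` are hypothetical names. The same arithmetic gives 4 outstanding commands for 16 cogs, matching the four columns of evanh's execution map.

```python
import math

STAGES = 54  # pipeline depth from the stage list earlier in the thread

def per_cog_batch(ncogs: int, count: int, first_slot: int = 0) -> tuple[list[int], list[int]]:
    """Issue clocks and result clocks for one cog that uses every one of its slots."""
    issues = [first_slot + k * ncogs for k in range(count)]
    results = [t + STAGES for t in issues]
    return issues, results

def max_outstanding(ncogs: int) -> int:
    """Commands a cog has issued before its first result comes back."""
    return math.ceil(STAGES / ncogs)

if __name__ == "__main__":
    for ncogs in (8, 16):
        issues, results = per_cog_batch(ncogs, 4)
        print(ncogs, "cogs:", issues, "->", results, "outstanding:", max_outstanding(ncogs))
```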

cgracey · 2018-10-12 01:03

evanh wrote: »

Chip,
From that info I've built the following CORDIC execution map for a single cog of a 16-cog Prop2, feeding the CORDIC at full speed. Let me know if anything is in the wrong place.

...

Evanh, that looks correct, never mind the exact order of the middle stages. You could treat all 54 stages as black boxes for the purpose of helping a programmer understand.

jmg · 2018-10-12 01:28

evanh wrote: »

EDIT: But there is an event (QMT) for when the last GETQx got nothing. This'll probably trigger if attempting to re-retrieve the final result.

Is using that event trap going to be a reliable 'lost cordic value' flag? The 'probably' sounds a little unsure?

cgracey · 2018-10-12 02:34

jmg wrote: »

evanh wrote: »

EDIT: But there is an event (QMT) for when the last GETQx got nothing. This'll probably trigger if attempting to re-retrieve the final result.

Is using that event trap going to be a reliable 'lost cordic value' flag? The 'probably' sounds a little unsure?

Maybe the event should capture both:

a) Result overwritten with new result because GETQX/GETQY didn't execute in time.

b) GETQX/GETQY executed without prior CORDIC instruction.

AJL · 2018-10-12 03:36

cgracey wrote: »

jmg wrote: »

evanh wrote: »

EDIT: But there is an event (QMT) for when the last GETQx got nothing. This'll probably trigger if attempting to re-retrieve the final result.

Is using that event trap going to be a reliable 'lost cordic value' flag? The 'probably' sounds a little unsure?

Maybe the event should capture both:

a) Result overwritten with new result because GETQX/GETQY didn't execute in time.

b) GETQX/GETQY executed without prior CORDIC instruction.

So the event flag would mean "CORDIC result not valid"?

evanh · 2018-10-12 04:56

jmg wrote: »

evanh wrote: »

EDIT: But there is an event (QMT) for when the last GETQx got nothing. This'll probably trigger if attempting to re-retrieve the final result.

Is using that event trap going to be a reliable 'lost cordic value' flag? The 'probably' sounds a little unsure?

That "probably" wasn't about lost data. I was unsure of the exact condition that could trigger a QMT event at all. The thing is, a result that was produced a million clocks earlier will still be there to be collected.

What must happen is that GETQx flags that it has done the collection, i.e. the buffer becomes empty. Attempting another result fetch will then either wait for an upcoming result or, if no more data is to come, not wait but trigger the QMT event.

QX and QY will each have an empty flag. Either can trigger the QMT event upon a GETQx while empty and inactive.

So Chip is now asking us whether we want another condition combined into the same QMT event, which again both QX and QY could trigger. It would detect a new result arriving at the result buffer while the buffer is not empty, i.e. the prior result being overwritten.

PS: A small detail: the buffer empty flags are forced set whenever a solitary command is issued, i.e. the first command of a batch.
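The description above can be written down as a small state machine. The Python sketch below is a behavioural model, not the RTL: each of QX and QY would be one `QResult`, `pending` stands in for results still in the pipeline, and the overrun trap (a) is only Chip's proposal, so it is off by default. Class and method names are hypothetical, though `pollqmt` mirrors the POLLQMT instruction's read-then-clear behaviour.

```python
class QResult:
    """Model of one result buffer (QX or QY) with an empty flag.

    `trap_overrun` enables the proposed condition (a); condition (b) is always on."""

    def __init__(self, trap_overrun: bool = False):
        self.trap_overrun = trap_overrun
        self.value = 0
        self.empty = True
        self.pending = 0      # results still in the pipeline for this cog
        self.qmt = False      # QMT event flag

    def issue(self) -> None:
        if self.pending == 0:
            self.empty = True   # solitary / first command of a batch forces 'empty'
        self.pending += 1

    def arrive(self, value: int) -> None:
        self.pending -= 1
        if self.trap_overrun and not self.empty:
            self.qmt = True     # (a): previous result overwritten before collection
        self.value, self.empty = value, False

    def getq(self) -> int:
        if self.empty and self.pending:
            raise RuntimeError("GETQx would stall here until the next result arrives")
        if self.empty:
            self.qmt = True     # (b): nothing collected and nothing more to come
        self.empty = True
        return self.value       # in case (b) this is the previous result again

    def pollqmt(self) -> bool:
        flag, self.qmt = self.qmt, False
        return flag

if __name__ == "__main__":
    q = QResult()
    q.issue(); q.arrive(42)
    print(q.getq(), q.pollqmt())   # first fetch: real result, no event
    print(q.getq(), q.pollqmt())   # second fetch: same value again, QMT set
```

With the default setting this reproduces the qdiv/getqx/getqx case: the second fetch returns the same value and sets QMT.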

evanh · 2018-10-12 05:21

cgracey wrote: »

Evanh, that looks correct, never mind the exact order of the middle stages.

It was the exact order I was interested in. The alignment I already understood.

I've worked out enough now to be sure it would need two designs, depending on cog count. So I've thrown in the towel.

evanh · 2018-10-12 06:51

AJL wrote: »

cgracey wrote: »

...
b) GETQX/GETQY executed without prior CORDIC instruction.

So the event flag would mean "CORDIC result not valid"?

I've just checked it: "b)" means GETQx has returned immediately with the same result as before and there's nothing new to come.

Example:

		qdiv    length, #10
		getqx   shortlen
		getqx   shortlen

shortlen will correctly be length/10, but the second GETQX will also trigger a QMT event.

evanh · 2018-10-13 05:45

cgracey wrote: »

Maybe the event should capture both:

a) Result overwritten with new result because GETQX/GETQY didn't execute in time.

b) GETQX/GETQY executed without prior CORDIC instruction.

Here's an example of using the QMT event as it is right now, (b) only:

emptycordic
		pollqmt               'clear any incidental QMT event
.qmtl
		getqx   temp1         'fetch next CORDIC result, QMT event occurs if empty
_ret_		jnqmt   #.qmtl        'loop until QMT event, auto clears

cgracey · 2018-10-13 05:53

evanh wrote: »
cgracey wrote: »

Maybe the event should capture both:

a) Result overwritten with new result because GETQX/GETQY didn't execute in time.

b) GETQX/GETQY executed without prior CORDIC instruction.

Here's an example of using the QMT event as it is right now, (b) only:
emptycordic
		pollqmt               'clear any incidental QMT event
.qmtl
		getqx   temp1         'fetch next CORDIC result, QMT event occurs if empty
_ret_		jnqmt   #.qmtl        'loop until QMT event, auto clears

So, do you think it would be better to trap overrun, too?

jmg · 2018-10-13 06:31

cgracey wrote: »

So, do you think it would be better to trap overrun, too?

I do not follow the depths of the CORDIC queue, but yes, to me it makes sense to also have

a) Result overwritten with new result because GETQX/GETQY didn't execute in time.

because (I think) that gives you an earlier warning, and that makes both recovery and debug easier.
