P2 Tricks, Traps & Differences between P1 (general discussion)


  • cgracey Posts: 9,599
    edited October 11
    evanh wrote: »
    Of those 54 stages, how many are recursive in nature?

    None. I made it as short as I could. There used to be only 38 stages, but there was no time to do the K-factor compensation within the normal stages. So, I had to make 16 discrete stages, littered among the iteration stages, just to do subtractions to keep the scale at 1.00000000.

    The great thing about those periodic subtraction stages is that they keep overflow totally in check. You can rotate ($7FFF_FFFF,$0000_0000) by $8000_0000 and get ($8000_0001,$0000_0000). In most CORDIC implementations, the K-factor compensation is done at the end of the computation and you need a few guard MSBs to contain the result, then an over-sized multiplier to scale the result down. Much easier to tap it down here and there, along the way, so that it comes out perfect at the end.
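As a rough illustration of the difference between compensating once at the end and tapping the gain down along the way, here is a floating-point sketch in Python. It is not the P2 hardware: the name `cordic_rotate`, the `compensate_every` parameter, and the exact division by the accumulated gain are assumptions made for illustration (the silicon uses fixed-point subtraction stages whose details the thread does not give), and the sketch only handles angles within roughly ±99 degrees.

```python
import math

ITERATIONS = 32
# Total gain of an uncompensated CORDIC rotation (about 1.6468).
CORDIC_GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(ITERATIONS))


def cordic_rotate(x, y, angle, compensate_every=2):
    """Rotate (x, y) by angle in radians; return (x, y, peak magnitude seen).

    compensate_every=2 removes the accumulated gain after every second
    iteration (32 iterations -> 16 compensation steps). Pass ITERATIONS
    to compensate only once, at the end.
    """
    z = angle
    pending_gain = 1.0            # gain added since the last compensation
    peak = math.hypot(x, y)
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0.0 else -1.0
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)
        pending_gain *= math.sqrt(1.0 + shift * shift)
        peak = max(peak, math.hypot(x, y))
        if (i + 1) % compensate_every == 0:
            x /= pending_gain     # hypothetical stand-in for a subtraction stage
            y /= pending_gain
            pending_gain = 1.0
    return x / pending_gain, y / pending_gain, peak


x, y, peak = cordic_rotate(1.0, 0.0, math.pi / 6)
print(f"interleaved: x={x:.6f} y={y:.6f} peak={peak:.4f}")
_, _, peak_end = cordic_rotate(1.0, 0.0, math.pi / 6, compensate_every=ITERATIONS)
print(f"end only:    peak={peak_end:.4f}")
```

Under these assumptions the interleaved run peaks at about 1.58 times the input magnitude, against about 1.65 when all compensation is left to the end, which is why fewer guard bits are needed.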
  • evanh Posts: 5,431
    edited October 11
    In nature, as in they can be rolled back up if dedicated to one command only.

    EDIT: You kind of have already answered previously by saying it could be done with just two barrel shifters.

    "Are we alone in the universe?"
    "Yes," said the Oracle.
    "So there's no other life out there?"
    "There is. They're alone too."
  • cgracey Posts: 9,599
    edited October 11
    evanh wrote: »
    In nature, as in they can be rolled back up if dedicated to one command only.

    EDIT: You kind of have already answered previously by saying it could be done with just two barrel shifters.

    Okay. I see what you are asking about now. Thirty-two of those pipelined stages are iterative and would otherwise have been implemented by two barrel shifters and two adders.
  • cgracey Posts: 9,599
    edited October 11
    Here is the pipeline order:

    1 - magnitude determination of inputs
    2 - initial x,y,z shift
    3..52 - 32 iteration stages punctuated by 16 subtraction stages and 2 extra hyperbolic stages
    53 - post-iteration shift and round
    54 - final x,y selection/adding
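A quick Python tally of the stage counts listed above confirms that they add up to the 54 stages mentioned earlier. The group labels are copied from the list; the variable names are only for illustration.

```python
# Stage counts taken from the pipeline order listed above.
STAGE_GROUPS = [
    ("magnitude determination of inputs", 1),
    ("initial x,y,z shift", 1),
    ("iteration stages", 32),
    ("subtraction stages", 16),
    ("extra hyperbolic stages", 2),
    ("post-iteration shift and round", 1),
    ("final x,y selection/adding", 1),
]

total_stages = sum(count for _, count in STAGE_GROUPS)
middle_stages = sum(count for _, count in STAGE_GROUPS[2:5])

print(total_stages)   # 54
print(middle_stages)  # 50, i.e. stages 3..52
```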
  • I am not sure I understand this. It appears that if you use CORDIC, you cannot use interrupts. That means that one OBEX can break another? If so, P2 tools need a built-in lint.

    John Abshier
    I am not sure I understand this. It appears that if you use CORDIC, you cannot use interrupts. That means that one OBEX can break another? If so, P2 tools need a built-in lint.

    John Abshier

    John, you can use interrupts with CORDIC. You just can't interleave CORDIC operations to get really high throughput. And this all happens within one COG. It wouldn't impact OBEX programs.
  • evanh Posts: 5,431
    edited October 11
    Chip,
    From that info I've built the following cordic execution map for a single cog, of a 16-cog prop2, feeding the cordic at full speed. Let me know if anything is in the wrong place.
     1   x,y,z mag		49   CORDIC		33   sub		17   sub
     2   x,y,z sft		50   CORDIC		34   CORDIC		18   CORDIC
     3   CORDIC		51   sub		35   CORDIC		19   CORDIC
     4   CORDIC		52   hyperbolic		36   sub		20   sub
     5   sub		53   shift and round	37   CORDIC		21   CORDIC
     6   CORDIC		54   final x,y		38   CORDIC		22   CORDIC
     7   CORDIC		55			39   sub		23   sub
     8   sub		56			40   CORDIC		24   CORDIC
     9   CORDIC		57			41   CORDIC		25   CORDIC
    10   CORDIC		58			42   sub		26   sub
    11   sub		59			43   CORDIC		27   hyperbolic
    12   CORDIC		60			44   CORDIC		28   CORDIC
    13   CORDIC		61			45   sub		29   CORDIC
    14   sub		62			46   CORDIC		30   sub
    15   CORDIC		63			47   CORDIC		31   CORDIC
    16   CORDIC		64			48   sub		32   CORDIC
    17   sub		 1   x,y,z mag		49   CORDIC		33   sub
    18   CORDIC		 2   x,y,z sft		50   CORDIC		34   CORDIC
    19   CORDIC		 3   CORDIC		51   sub		35   CORDIC
    20   sub		 4   CORDIC		52   hyperbolic		36   sub
    21   CORDIC		 5   sub		53   shift and round	37   CORDIC
    22   CORDIC		 6   CORDIC		54   final x,y		38   CORDIC
    23   sub		 7   CORDIC		55			39   sub	
    24   CORDIC		 8   sub		56			40   CORDIC
    25   CORDIC		 9   CORDIC		57			41   CORDIC
    26   sub		10   CORDIC		58			42   sub	
    27   hyperbolic		11   sub		59			43   CORDIC
    28   CORDIC		12   CORDIC		60			44   CORDIC
    29   CORDIC		13   CORDIC		61			45   sub	
    30   sub		14   sub		62			46   CORDIC
    31   CORDIC		15   CORDIC		63			47   CORDIC
    32   CORDIC		16   CORDIC		64			48   sub	
    33   sub		17   sub		 1   x,y,z mag		49   CORDIC
    34   CORDIC		18   CORDIC		 2   x,y,z sft		50   CORDIC
    35   CORDIC		19   CORDIC		 3   CORDIC		51   sub	
    36   sub		20   sub		 4   CORDIC		52   hyperbolic
    37   CORDIC		21   CORDIC		 5   sub		53   shift and round
    38   CORDIC		22   CORDIC		 6   CORDIC		54   final x,y selection/adding
    39   sub		23   sub		 7   CORDIC		55
    40   CORDIC		24   CORDIC		 8   sub		56
    41   CORDIC		25   CORDIC		 9   CORDIC		57
    42   sub		26   sub		10   CORDIC		58
    43   CORDIC		27   hyperbolic		11   sub		59
    44   CORDIC		28   CORDIC		12   CORDIC		60
    45   sub		29   CORDIC		13   CORDIC		61
    46   CORDIC		30   sub		14   sub		62
    47   CORDIC		31   CORDIC		15   CORDIC		63
    48   sub		32   CORDIC		16   CORDIC		64
    49   CORDIC		33   sub		17   sub		 1   magnitude determination of inputs
    50   CORDIC		34   CORDIC		18   CORDIC		 2   initial x,y,z shift
    51   sub		35   CORDIC		19   CORDIC		 3   CORDIC
    52   hyperbolic		36   sub		20   sub		 4   CORDIC
    53   shift and round	37   CORDIC		21   CORDIC		 5   sub
    54   final x,y		38   CORDIC		22   CORDIC		 6   CORDIC
    55			39   sub		23   sub		 7   CORDIC
    56			40   CORDIC		24   CORDIC		 8   sub
    57			41   CORDIC		25   CORDIC		 9   CORDIC
    58			42   sub		26   sub		10   CORDIC
    59			43   CORDIC		27   hyperbolic		11   sub
    60			44   CORDIC		28   CORDIC		12   CORDIC
    61			45   sub		29   CORDIC		13   CORDIC
    62			46   CORDIC		30   sub		14   sub
    63			47   CORDIC		31   CORDIC		15   CORDIC
    64			48   sub		32   CORDIC		16   CORDIC
  • cgracey wrote: »
    If you want CORDIC throughput, batch up your operations in special timed code. Once the first CORDIC command executes, your timing will be locked in. No getting off that crazy train. Once you are on, you are committed. No interruptions allowed. You will always come out the other end safely, with all your results. It is GLORIOUS!!!!
    If no interrupts are allowed, then the CORDIC should disable interrupts in hardware. However, that is the exact inverse of how users expect interrupts to work (and indeed why they are named interrupts!).

    Can the cordic instead be paused for that cog, if an interrupt does occur ?
    That’s the more expected operation.
  • If you do a single CORDIC instruction and then get the results, interrupts are fine. If you want high throughput by interleaving CORDIC operations, then interrupts are not fine.
  • Well, you could have the ISR routine set a flag that tells code that cordic results may be invalid and make it do it again, right?
    Prop Info and Apps: http://www.rayslogic.com/
  • potatohead Posts: 9,368
    edited October 11
    Jmg, if we pause the thing, its deterministic nature will be impacted, which will affect any other cogs using it. It's up to each user of the CORDIC to meet the timing.

    The way it is right now, any cog can do whatever it wants and not affect any other cog.

    I think people are getting hung up on a couple of things:

    Interrupts are not global to the P2; they are local to the cog in which they happen. This means programs in the Object Exchange will not break one another, because they're all running in different cogs.

    The other thing that was done, which is different from P1, is we definitely put facilities in for non-deterministic programs.

    So people got to choose on this. And the reward for making that choice, is a lot of Fast Math. It's a killer feature.

    They either meet timing, write their programs in a way that does that, or they're interrupt driven, and they write their programs in a way that deals with that.

    There's no protecting anyone on this, without a big logic cost, or breaking the symmetry of this thing and with that limiting its throughput.

    The CORDIC is super simple, input arguments, hit the timing to get results. That's it. People just have to do that. And they only really have to do that, if they're doing a whole lot of math. And it needs to be fast.



    Do not taunt Happy Fun Ball! @opengeekorg ---> Be Excellent To One Another SKYPE = acuity_doug
    Parallax colors simplified: http://forums.parallax.com/showthread.php?123709-Commented-Graphics_Demo.spin
  • Rayman wrote: »
    Well, you could have the ISR routine set a flag that tells code that cordic results may be invalid and make it do it again, right?
    I think there is also an underflow event that was mentioned, set when this corruption/loss occurs?
    (Reading a nonexistent answer)
    That could trigger a re-do and alert the user that they have an issue?

  • cgracey wrote: »
    Here is the pipeline order:

    1 - magnitude determination of inputs
    2 - initial x,y,z shift
    3..52 - 32 iteration stages punctuated by 16 subtraction stages and 2 extra hyperbolic stages
    53 - post-iteration shift and round
    54 - final x,y selection/adding

    I can see now that what I wanted to do as a partial pipeline doesn't pack well. It would still need the fully unrolled cordic for larger cog counts ... and the resource saving probably wouldn't be as good as I had hoped.

    I'm done with this.
  • Electrodude Posts: 1,160
    edited October 11
    What if you added a mode in which it automatically drops CORDIC results in ascending LUTRAM addresses? You would submit CORDIC commands as fast as you could, and they'd show up in LUTRAM eventually. This would be convenient for FFTs: you could do the smaller sub-transforms out of LUTRAM and have the results automagically show up in the right places in LUTRAM. If an interrupt happened while you were submitting CORDIC commands, the results would still go to the right place. The CORDIC underflow event would let you know when they were all done.

    EDIT: The write would happen at the same time when a write from the neighboring cog would take place - one would win if they both tried to write at the same time. For simplicity, I guess you'd have it so getqx and getqy would still do the right things, although I can't imagine why you'd want to do both.
  • I like it. Good from cog view. Not sure how easy it will be for cordic to reach into every cog like that though. Currently the cogs are all reaching out.
  • Needs config to limit the fill range. Effectively 16 DMA channels.
  • AJL Posts: 63
    edited October 11
    Perhaps for the next design.

    Sounds like too big a change for the current design, and the 'knobs' that Chip has detailed for scaling of the current design do not include this type of change either.
  • It doesn't have to change anything that's already there; getq[xy] would work just as they do now; there would just be a second way to get the results that is activated when you run an instruction to set it up with a start pointer. If it's added and it does cause any problems, it can be ignored until the next design. The only problem it could cause that would break current functionality is if the muxing of the LUT write port is buggy.
  • I agree that the only way to automate high CORDIC throughput would be to have it write directly into the LUT. That's probably more change than we have safe margin for, at this point.

    I was pleased to confirm yesterday that in an 8-cog setup, any mix of CORDIC commands can be initiated, overlapped, and trailing results received at a pace of 8 clocks per function. That's pretty decent and not hard to manage. You just need to get the concept clear in your head.
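To make the overlap concrete, here is a small Python model of how many commands one cog can have in the pipeline at once. It assumes one command slot per cog every N clocks on an N-cog chip and that a command occupies the pipeline for exactly 54 clocks; neither the exact result latency nor these function names come from the thread, so treat it as a sketch of the idea rather than a timing reference.

```python
import math

PIPELINE_STAGES = 54  # from the pipeline order given earlier in the thread


def max_commands_in_flight(issue_interval):
    """Most commands one cog can have in the pipeline at once."""
    return math.ceil(PIPELINE_STAGES / issue_interval)


def stages_at(clock, issue_interval):
    """Stages (1-based) occupied at `clock` by commands issued at clocks 0, N, 2N, ..."""
    stages = []
    for issued in range(0, clock + 1, issue_interval):
        stage = clock - issued + 1
        if stage <= PIPELINE_STAGES:   # older commands have already left the pipeline
            stages.append(stage)
    return stages


print(max_commands_in_flight(16))  # 4
print(max_commands_in_flight(8))   # 7
print(stages_at(48, 16))           # [49, 33, 17, 1]
```

For a 16-clock interval this reproduces the four columns of evanh's execution map: when the fourth command enters stage 1, the earlier three sit in stages 49, 33 and 17.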
  • cgracey Posts: 9,599
    edited October 12
    evanh wrote: »
    Chip,
    From that info I've built the following cordic execution map for a single cog, of a 16-cog prop2, feeding the cordic at full speed. Let me know if anything is in the wrong place.
    ...

    Evanh, that looks correct, neverminding the exact order of the middle stages. You could treat all 54 stages as black boxes for the purpose of helping a programmer understand.
  • evanh wrote: »
    EDIT: But there is an event (QMT) for last GETQx got nothing. This'll probably trigger if attempting to re-retrieve the final result.

    Is using that event trap going to be a reliable 'lost cordic value' flag ? The 'probably' sounds a little unsure ?

  • cgracey Posts: 9,599
    edited October 12
    jmg wrote: »
    evanh wrote: »
    EDIT: But there is an event (QMT) for last GETQx got nothing. This'll probably trigger if attempting to re-retrieve the final result.

    Is using that event trap going to be a reliable 'lost cordic value' flag ? The 'probably' sounds a little unsure ?

    Maybe the event should capture both:

    a) Result overwritten with new result because GETQX/GETQY didn't execute in time.

    b) GETQX/GETQY executed without prior CORDIC instruction.
  • cgracey wrote: »
    jmg wrote: »
    evanh wrote: »
    EDIT: But there is an event (QMT) for last GETQx got nothing. This'll probably trigger if attempting to re-retrieve the final result.

    Is using that event trap going to be a reliable 'lost cordic value' flag ? The 'probably' sounds a little unsure ?

    Maybe the event should capture both:

    a) Result overwritten with new result because GETQX/GETQY didn't execute in time.

    b) GETQX/GETQY executed without prior CORDIC instruction.

    So the event flag would mean "CORDIC result not valid"?
  • evanh Posts: 5,431
    edited October 12
    jmg wrote: »
    evanh wrote: »
    EDIT: But there is an event (QMT) for last GETQx got nothing. This'll probably trigger if attempting to re-retrieve the final result.
    Is using that event trap going to be a reliable 'lost cordic value' flag ? The 'probably' sounds a little unsure ?
    That "probably" wasn't about lost data. I was unsure of the exact condition that could trigger a QMT event at all. The thing is, a result that was produced a million clocks prior will still be there to be collected.

    What must happen is that GETQx flags that it has done the collection, so the buffer becomes empty. Attempting another result fetch will then either wait for an upcoming result or, if there is no more data to come, return without waiting and trigger the QMT event.

    QX and QY will each have an empty flag. Either can trigger the QMT event upon GETQx while empty and inactive.

    So Chip is now asking us if we want another condition combined into the same QMT event, again both QX and QY can trigger it. It detects new result arriving at the result buffer while the buffer is not empty, ie: prior result overwritten.


    PS: A small detail: The buffer empty flags are forced set whenever a solitary command is issued, ie: the first command of a batch.
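The buffer behaviour described in this post can be modelled in a few lines of Python. This is a toy model under stated assumptions, not the hardware: the class and method names are hypothetical, a stalled GETQx is simulated by delivering the next result immediately, and the `overrun` flag represents the extra condition Chip is proposing rather than anything that exists today.

```python
from collections import deque


class CordicResultBuffer:
    """Toy model of one result buffer (QX or QY) with its empty flag."""

    def __init__(self):
        self.in_flight = deque()  # results still inside the pipeline
        self.value = 0            # last result latched into the buffer
        self.empty = True
        self.qmt = False          # QMT event flag
        self.overrun = False      # proposed extra condition, not current behaviour

    def issue(self, result):
        """Start a command; `result` is what the pipeline will produce."""
        if not self.in_flight:
            self.empty = True     # solitary command forces the empty flag
        self.in_flight.append(result)

    def deliver(self):
        """The pipeline finishes its oldest command."""
        if not self.empty:
            self.overrun = True   # previous result was never collected
        self.value = self.in_flight.popleft()
        self.empty = False

    def getqx(self):
        if self.empty and self.in_flight:
            self.deliver()        # real hardware would stall until it arrives
        elif self.empty:
            self.qmt = True       # empty and inactive: nothing new to come
            return self.value     # same result as before
        self.empty = True
        return self.value

    def pollqmt(self):
        """Return the QMT flag and clear it."""
        flag, self.qmt = self.qmt, False
        return flag


buf = CordicResultBuffer()
buf.issue(120 // 10)      # stands in for: qdiv length, #10
first = buf.getqx()
second = buf.getqx()
print(first, second, buf.pollqmt())
```

In this model the second fetch returns the same value as the first and raises the QMT flag, while two results delivered back to back without a fetch in between set `overrun`.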
  • cgracey wrote: »
    Evanh, that looks correct, neverminding the exact order of the middle stages.
    It was the exact order I was interested in. The alignment I already understood.

    I've worked out enough now to be sure it would need two designs, depending on cog count. So have thrown in the towel.
  • evanh Posts: 5,431
    edited October 12
    AJL wrote: »
    cgracey wrote: »
    ...
    b) GETQX/GETQY executed without prior CORDIC instruction.

    So the event flag would mean "CORDIC result not valid"?

    I've just checked it: "b)" means GETQx has returned immediately with the same result as before and there's nothing new to come.

    Example:
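    		qdiv    length, #10   'start a CORDIC divide: length / 10
    		getqx   shortlen      'waits for the result, then fetches the quotient
    		getqx   shortlen      'nothing new to come: same result again, QMT event raised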
    

    shortlen will be correctly length/10. But that will also trigger a QMT event.
  • evanh Posts: 5,431
    edited October 13
    cgracey wrote: »
    Maybe the event should capture both:

    a) Result overwritten with new result because GETQX/GETQY didn't execute in time.

    b) GETQX/GETQY executed without prior CORDIC instruction.

    Here's an example of using the QMT event as it is right now, (b) only:
    emptycordic
    		pollqmt               'clear any incidental QMT event
    .qmtl
    		getqx   temp1         'fetch next CORDIC result, QMT event occurs if empty
    _ret_		jnqmt   #.qmtl        'loop until QMT event, auto clears
  • evanh wrote: »
    cgracey wrote: »
    Maybe the event should capture both:

    a) Result overwritten with new result because GETQX/GETQY didn't execute in time.

    b) GETQX/GETQY executed without prior CORDIC instruction.

    Here's an example of using the QMT event as it is right now, (b) only:
    emptycordic
    		pollqmt               'clear any incidental QMT event
    .qmtl
    		getqx   temp1         'fetch next CORDIC result, QMT event occurs if empty
    _ret_		jnqmt   #.qmtl        'loop until QMT event, auto clears
    

    So, do you think it would be better to trap overrun, too?
  • cgracey wrote: »
    So, do you think it would be better to trap overrun, too?

    I do not follow the depths of the Cordic queue, but yes, to me it makes sense to also have

    a) Result overwritten with new result because GETQX/GETQY didn't execute in time.

    because (I think) that gives you an earlier warning, and that makes both recovery and debug easier.