Changes and GETCT

System · 2019-07-14 07:59

This discussion was created from comments split from: Using GETCT - Was "What can your P2 do?".

hinv · 2019-07-10 15:23

cgracey wrote: »

frank freedman wrote: »

Hmmmmmm, no P2 to teach new tricks to; wonder when that can be changed. A couple of bare chips would be enough of a start...

We should have P2 Eval boards with the new chip, probably in mid August, to sell.

What advantages/fixes do you hope to come out in the new chip?

whicker · 2019-07-10 16:29

hinv wrote: »

What advantages/fixes do you hope to come out in the new chip?

Please clarify: are you asking what the revision fixed from the previous hundred samples, or what the P2 can do in general?

Cluso99 · 2019-07-10 21:58

Search for the thread listing the fixes/changes for the P2-ES chip. It might be in the "sticky" thread.

cgracey · 2019-07-10 22:00

Here are the new silicon changes from the Google doc:

-All known prior bugs fixed.

-Clock-gating implemented, reduces power by ~40%.

-PLL filter modified to reduce jitter and improve lock.

-System counter extended to 64 bits. GETCT WC retrieves upper 32-bits.

-Streamer has many new modes with SINC1/SINC2 ADC conversions for Goertzel mode.

-HDMI mode added to streamer with ascending and descending pinouts for easy PCB layout.

-SINC2/SINC3 filters added to smart pins for improving ENOB in ADC conversions.

-Each cog has four 8-bit sample-per-clock ADC channels that feed from new smart pin 'scope' modes.

-BITL/BITH/BITC/BITNC/BITZ/BITNZ/BITRND/BITNOT can now work on a span of bits (+S[9:5] bits). Prior SETQ overrides S[9:5].

-DIRx/OUTx/FLTx/DRVx can now work on a span of pins (+D[10:6] pins). Prior SETQ overrides D[10:6].

-WRPIN/WXPIN/WYPIN/AKPIN can now work on a span of pins (+S[10:6] pins). Prior SETQ overrides S[10:6].

-BIT_DAC output now has two 4-bit settings for low and high states, instead of one 8-bit high-state setting.

-RDxxxx/WRxxxx+PTRx expressions now index -16..+16 with updating and -32..+31 without updating.

-Sensible PTRx behavior implemented for 'SETQ(2) + RDLONG/WRLONG/WMLONG' operations.

-RDLUT/WRLUT can now handle PTRx expressions.

-Cog LUT sharing is now glitch-free.

-POP now returns Z=1 if result=0, used to return result[30].

-XORO32 improved.

-Main PRNG upgraded to Xoroshiro128**.
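To make the new span encoding for the bit instructions concrete, here is a small host-side Python model (not P2 code) of the mask a BITL/BITH-style operation would touch. S[4:0] picks the first bit, and S[9:5] adds that many more bits above it. The helper names `bit_span_mask`, `bith` and `bitl` are hypothetical. The model ignores the SETQ override and the flag results. It also assumes the span wraps from bit 31 back to bit 0; that edge behavior is an assumption and should be checked against the silicon documentation.

```python
MASK32 = 0xFFFFFFFF


def bit_span_mask(s: int) -> int:
    """Mask of bits touched by a BITx D,S op under the span encoding.

    S[4:0] = first bit, S[9:5] = number of ADDITIONAL bits.
    Assumption (not from the source): the span wraps from bit 31 to bit 0.
    """
    first = s & 0x1F
    extra = (s >> 5) & 0x1F
    mask = 0
    for i in range(extra + 1):
        mask |= 1 << ((first + i) % 32)
    return mask


def bith(d: int, s: int) -> int:
    """Model of BITH: set every bit in the span, leave the others alone."""
    return (d | bit_span_mask(s)) & MASK32


def bitl(d: int, s: int) -> int:
    """Model of BITL: clear every bit in the span, leave the others alone."""
    return d & ~bit_span_mask(s) & MASK32


# Bits 4..7: first bit 4, plus three additional bits.
s = 4 | (3 << 5)
print(hex(bith(0, s)))           # 0xf0
print(hex(bitl(0xFFFFFFFF, s)))  # 0xffffff0f
```

With S[9:5] = 0, the operation degenerates to the old single-bit behavior. That is why old code keeps working as long as those upper S bits were zero.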

Phil Pilgrim (PhiPi) · 2019-07-11 20:16

cgracey wrote:

-System counter extended to 64 bits. GETCT WC retrieves upper 32-bits.

Do you have to get it twice to make sure it didn't change between gets to the lower 32 bits? Or is there some sort of locking mechanism in place?

-Phil
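For a counter whose two halves are not aligned in hardware, the usual answer to Phil's question is to read high, read low, then re-read high and retry if it changed. The Python sketch below models this with a hypothetical `FakeCounter` that advances two clocks per 32-bit read. The two-clock step is an assumption chosen to mimic instruction timing. The sketch contrasts the retry loop with a naive two-read version that goes wrong right at the low-word rollover.

```python
MASK32 = 0xFFFFFFFF


class FakeCounter:
    """Hypothetical 64-bit free-running counter that advances a fixed number
    of clocks on every 32-bit read, to mimic instruction time passing."""

    def __init__(self, start: int, clocks_per_read: int = 2):
        self.ct = start
        self.step = clocks_per_read

    def read_hi(self) -> int:
        value = (self.ct >> 32) & MASK32
        self.ct += self.step
        return value

    def read_lo(self) -> int:
        value = self.ct & MASK32
        self.ct += self.step
        return value


def read64_naive(c: FakeCounter) -> int:
    """High then low with no protection: can be off by 2^32 at a rollover."""
    hi = c.read_hi()
    lo = c.read_lo()
    return (hi << 32) | lo


def read64_retry(c: FakeCounter) -> int:
    """Read high, low, high again; retry until both high reads agree."""
    while True:
        hi1 = c.read_hi()
        lo = c.read_lo()
        hi2 = c.read_hi()
        if hi1 == hi2:
            return (hi1 << 32) | lo


print(hex(read64_naive(FakeCounter(0xFFFFFFFE))))  # 0x0: 2^32 counts too low
print(hex(read64_retry(FakeCounter(0xFFFFFFFE))))  # 0x100000006
```

The retry loop costs a third read and a variable number of passes. A hardware-aligned pair avoids both.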

Cluso99 · 2019-07-11 22:00

Chip made the counter work so that two instructions issued back to back, upper half first and then lower half, get a consistent 64-bit value.

ozpropdev · 2019-07-11 22:15

Chip said:

When you execute 'GETCT D WC', you will get the top 32 bits. The next instruction is shielded from interrupts, and if it's 'GETCT D', you will get the time-aligned lower 32 bits that go with the prior instruction's reading of the top 32 bits.

I verified this the other day on the FPGA V33k image (first written as V32i), including the interrupt shielding.

Edit: correction
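Here is a minimal Python model of what Chip describes. It assumes, as he explains further down, that the upper half is adjusted by two clocks and that each GETCT takes two sysclks; the function names are hypothetical. The WC read returns the upper half of the counter as it will be when the next instruction samples the lower half. As a result, the two halves always describe the same instant, even across a low-word rollover.

```python
MASK32 = 0xFFFFFFFF
CLK = 2  # assumption: GETCT takes 2 sysclks, so the second read lands 2 clocks later


def getct_wc(ct_now: int) -> int:
    """GETCT D WC: upper 32 bits, adjusted to the next instruction's timestamp."""
    return ((ct_now + CLK) >> 32) & MASK32


def getct(ct_now: int) -> int:
    """GETCT D: lower 32 bits of CT at this instruction's own sample time."""
    return ct_now & MASK32


def read_ct64(t: int) -> int:
    """GETCT hi WC at clock t, immediately followed by GETCT lo at t + CLK."""
    hi = getct_wc(t)
    lo = getct(t + CLK)
    return (hi << 32) | lo


# Each result equals t + 2, including the cases straddling a low-word rollover.
for t in (0x1234, 0xFFFFFFFD, 0xFFFFFFFE, 0x1FFFFFFFF):
    print(hex(t), "->", hex(read_ct64(t)))
```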

evanh · 2019-07-11 22:58

That would be a v33

ozpropdev · 2019-07-11 23:32

evanh wrote: »

That would be a v33

Doh! V33k

ersmith · 2019-07-12 16:13

How long does the GETCT D WC hold the lower 32 bits? Is it just for the next instruction? That is, if we do:

   GETCT A WC
   NOP
   NOP
   GETCT B

is "B" the same as what we would have gotten if GETCT A WC were replaced with a NOP?

ozpropdev · 2019-07-12 23:50

ersmith wrote: »
How long does the GETCT D WC hold the lower 32 bits? Is it just for the next instruction? That is, if we do:
   GETCT A WC
   NOP
   NOP
   GETCT B
is "B" the same as what we would have gotten if GETCT A WC were replaced with a NOP?

Yes, 'B' will be the same.

cgracey · 2019-07-13 00:48

ersmith wrote: »
How long does the GETCT D WC hold the lower 32 bits? Is it just for the next instruction? That is, if we do:
   GETCT A WC
   NOP
   NOP
   GETCT B
is "B" the same as what we would have gotten if GETCT A WC were replaced with a NOP?

It's just for the next instruction. I think the top 32 bits are two clocks ahead of the lower 32 bits. That's how it winds up time-aligned.

msrobots · 2019-07-13 03:24

If I remember correctly, there was some discussion about this in the long-winded P2 thread. The two instructions have to follow each other with no NOPs in between, they are protected from interrupts, and together they deliver the 64-bit counter as of the first one (the one with WC).

GETCT returns the value saved 2 clocks earlier if a GETCT WC saved it; otherwise it returns the current lower 32 bits. That makes the subtraction unnecessary.

I might have misunderstood that, but it did make sense to me.

It should be easy to test:

 GETCT A WC
 GETCT B
 GETCT C

Either B differs from C by 2, or by 4.

I'm remodeling my man cave, so I have no bench to test it on.

Chip said the high count is delivered 2 clocks ahead. If so, this test won't work, since the difference will just be 2.

This could verify all the assumptions:

 GETCT A
 GETCT B WC
 GETCT C
 GETCT D

Enjoy!

Mike
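Mike's first test distinguishes two readings of the mechanism. The Python sketch below computes what B and C would be under each. The function name is hypothetical, and two clocks per instruction are assumed. If GETCT returned a low half latched by the preceding WC read, C - B would be 4. If it samples CT live, C - B is 2. In evanh's report further down, ticks+2 and ticks+3 (072aec18 and 072aec1a) follow a WC read in exactly this pattern, and they differ by 2.

```python
MASK32 = 0xFFFFFFFF
CLK = 2  # assumed sysclks per instruction


def plain_after_wc(t0: int, latched: bool) -> tuple:
    """B and C for: GETCT A WC at t0, GETCT B at t0+2, GETCT C at t0+4.

    latched=True:  hypothetical variant where B returns a low half saved by the WC read.
    latched=False: B samples CT at its own issue time.
    """
    b = t0 & MASK32 if latched else (t0 + CLK) & MASK32
    c = (t0 + 2 * CLK) & MASK32
    return b, c


for latched in (True, False):
    b, c = plain_after_wc(0x1000, latched)
    print("latched" if latched else "live", "C - B =", c - b)
```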

evanh · 2019-07-13 03:35

I've tested it, Mike: there is a two-clock difference between the two consecutive instructions, as Chip just indicated above.

The WC variant doesn't in any way capture the lower 32 bits. It only masks interrupts for two clocks, enough for one subsequent instruction. That can be any instruction, but obviously only the non-WC variant will usefully be affected.
EDIT2: This has now been proven incorrect. The data below doesn't prove what I'd said.

EDIT: Here's a report line from my testing:

072aec14   0000005e   072aec18   072aec1a   0000005e   072aec22   072aec24   0000005e   0000005e   072aec2a   072aec2c

And here's the v33k sampling code:

		getct	ticks
		getct	ticks+1		wc
		getct	ticks+2
		getct	ticks+3
		getct	ticks+4		wc
		nop
		nop
		getct	ticks+5
		getct	ticks+6
		getct	ticks+7		wc
		getct	ticks+8		wc
		getct	ticks+9
		getct	ticks+10
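Reading evanh's report as clock counts makes the timing visible. The Python snippet below is a host-side helper, not P2 code. It parses the report line above and drops the four WC samples, which all read the upper half (0000005e). It then prints the gaps between the remaining lower-half samples.

```python
report = ("072aec14   0000005e   072aec18   072aec1a   0000005e   072aec22 "
          "  072aec24   0000005e   0000005e   072aec2a   072aec2c")

# Slots of ticks..ticks+10 that were written by a GETCT ... WC in the sampling code.
WC_SLOTS = {1, 4, 7, 8}

values = [int(field, 16) for field in report.split()]
low = {i: v for i, v in enumerate(values) if i not in WC_SLOTS}

slots = sorted(low)
for a, b in zip(slots, slots[1:]):
    print(f"ticks+{a} -> ticks+{b}: {low[b] - low[a]} clocks")
```

The gaps come out as 4, 2, 8, 2, 6, 2. That is 4 across one WC read, 8 across a WC read plus two NOPs, and 6 across two WC reads. This fits every GETCT, GETCT WC and NOP taking two clocks.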

Cluso99 · 2019-07-13 03:40

If you are really interested there was a big thread about the extended CNT register.
Otherwise, they are two instructions that must follow one another.

jmg · 2019-07-13 03:46

evanh wrote: »
cgracey wrote: »

I think the top 32 bits are two clocks ahead of the lower 32 bits. That's how it winds up time-aligned.

Oh, correct, I just tested this. I'd misunderstood what you meant by time-aligned. So it's not a prefix in any way; they are just two instructions that independently sample the two halves of the 64-bit counter at their respective timestamps.

EDIT: Therefore, the lower 32 bits could have rolled over between the two instructions. The IRQ masking does nothing useful. EDIT2: On second thought, the masking can be put to good use by always subtracting two from the value of the following non-WC instruction. This way any rollover gets reversed and no further checks are needed, i.e., the following is the best way to read the whole 64-bit counter without glitches:
   GETCT A WC
   GETCT B
   SUB B, #2
EDIT3: There is still a caveat with interrupts there. The SUB B,#2 may need to be shielded depending on what code path that snippet is placed in.

I'm not following the hops here...
I think Chip was saying the upper 32b has a 2 clock phase shift in the carry, so the two reads appear to be capturing the same instant in time, on a 64b counter,
i.e., if the first read has just rolled over, the second read will also have just incremented.
The required phase fix depends on the designed read order, HL or LH; I think only one order is time-aligned.
If the HW is designed for HL, then I think the carry needs to be at 2^32-2.
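jmg's "carry at 2^32-2" idea (and Cluso's later FFFF-FFFD guess) can be checked with a small Python model. This is one possible arrangement consistent with the measured behavior, not Chip's confirmed implementation; the names are hypothetical. The split counter below fires the carry into the high word when the low word steps onto 2^32 - 2. The model then verifies that the high word always equals the upper half of (true count + 2), so a WC read of the high word followed two clocks later by a read of the low word would agree.

```python
MASK32 = 0xFFFFFFFF
EARLY = 2  # hypothetical: carry fires when the low word reaches 2^32 - 2


def tick(hi: int, lo: int) -> tuple:
    """Advance a split hi:lo counter by one sysclk. The carry into `hi` fires
    when `lo` steps onto 2^32 - EARLY instead of when it wraps to 0."""
    lo = (lo + 1) & MASK32
    if lo == (1 << 32) - EARLY:
        hi = (hi + 1) & MASK32
    return hi, lo


def check(start_ct: int, steps: int) -> bool:
    """Run the split counter alongside a true 64-bit count and verify that
    hi always equals the upper half of (true count + EARLY)."""
    ct = start_ct
    hi, lo = ((ct + EARLY) >> 32) & MASK32, ct & MASK32  # consistent start state
    for _ in range(steps):
        assert lo == ct & MASK32
        assert hi == ((ct + EARLY) >> 32) & MASK32
        hi, lo = tick(hi, lo)
        ct += 1
    return True


print(check((1 << 32) - 16, 64))  # straddles the first low-word rollover
print(check((3 << 32) - 16, 64))  # and a later one
```

This matches evanh's point below: the early carry advances the high word rather than retarding it, which only works because the high word is read first.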

evanh · 2019-07-13 03:51

jmg wrote: »

I think Chip was saying the upper 32b has a 2 clock phase shift in the carry, so the two reads appear to be capturing the same instant in time, on a 64b counter
ie if the first read has just rolled over, the second read will also have just incremented.

I had originally assumed that as well. But the testing has definitely confirmed otherwise. See my other posting - https://forums.parallax.com/discussion/comment/1473650/#Comment_1473650 Just the first three samples show there are four counts from the first to the third sample.

PS: There's no way Chip would have put an extra 64-bit adder in there to advance the second sample (with WC) by two to make it suit the third sample.

PPS: Also, phase shifting the carry will retard the high word, not advance it. That would work if the sampling order was reversed.

Yanomani · 2019-07-13 04:07

Asking Peter's indulgence: I fully agree with his observation about the objectives of this thread, but I also believe it is important for every prospective P2 user to know more about the "inner stuff". That way, each of us can better understand what is inside its guts and why things were done one way or another, enabling more people to take advantage of its power.

In the present case, since we are talking about timestamps, it's truly important to understand why interrupts cannot be allowed to interfere with the execution of a sequence such as GETCT M WC; GETCT N.

To simplify things, the sequence GETCT M WC; GETCT N should be understood as an unbreakable 64-bit-wide instruction that transfers a snapshot of the instantaneous contents of an ever-updating 64-bit register, which increments at the sysclk rate.

Each P2 cog is a 32-bit processor that normally moves information in transactions up to 32 bits wide. Once CT was extended to 64 bits, some mechanism was needed to retrieve the full 64-bit timestamp as a single snapshot. Picture a super-high-speed camera pointed at an imaginary LED panel that displays the counter's fast-rolling bits, updating at the sysclk rate.

During the design phase, it was decided that the lower 32 bits in the snapshot should match the moment of execution of the second instruction of the pair (GETCT N). So if you use GETCT M WC; GETCT N, M and N must together reflect the exact contents of CT at that moment: not earlier, not later.

To achieve that, the value transferred to M by the first half of the "double" instruction (GETCT M WC) must reflect the contents CT will have two full sysclk cycles in the future. Its upper 32-bit half may need to be incremented by one to account for a rollover of the lower 32 bits. That rollover can happen before those bits are retrieved by the second half of the double instruction (GETCT N) and transferred to N.

It's all about staying consistent with the 64-bit register contents while doing the transfer as two consecutive (and uninterruptible) 32-bit moves.

If interrupts could not shift the moment GETCT N executes relative to GETCT M WC, there would be no reason for an interrupt-blocking mechanism.

Beyond its pure use as a timestamp grabber, the GETCT M WC interrupt-delaying mechanism has other possible applications. It can delay interrupts by very short, odd sysclk-count intervals, for example by executing GETCT D WC followed by, say, RDLUT D,{#}S. It can even sync the start of an interrupt-acceptance window with the hub egg-beater timing, by following GETCT D WC with any of the hub instructions whose timing depends on the egg-beater.

Possibilities are almost limitless.
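The need for the interrupt shielding can be shown with a small Python model. The names are hypothetical, and two sysclks are assumed for the first GETCT. A `gap` parameter stands in for clocks an interrupt would steal between GETCT M WC and GETCT N. Real hardware shields the pair, so in practice the gap is always zero. Away from a rollover, a gap only makes the timestamp late. Next to a rollover, the low half wraps while M does not, and the result loses 2^32 counts.

```python
MASK32 = 0xFFFFFFFF
CLK = 2  # assumed sysclks for GETCT M WC


def paired_read(t: int, gap: int = 0) -> int:
    """GETCT M WC at clock t, GETCT N at clock t + CLK + gap.

    M is the upper half adjusted to t + CLK, the intended sample time of N.
    `gap` models clocks stolen by an interrupt between the two instructions.
    """
    m = ((t + CLK) >> 32) & MASK32
    n = (t + CLK + gap) & MASK32
    return (m << 32) | n


t = 0xFFFFFFF0
print(hex(paired_read(t)))          # 0xfffffff2: correct
print(hex(paired_read(t, gap=20)))  # 0x6: the low half rolled over, M did not
print(hex(paired_read(0x1000, gap=20)))  # 0x1016: merely late, still consistent
```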

msrobots · 2019-07-13 04:20

Thanks for testing, @evan. Now we know for sure that it is working as expected; your numbers show it. Nice.

Enjoy!

Mike

evanh · 2019-07-13 04:53

Holy cow! JMG is right in some way. Mike, I was wrong; I'll update the earlier post.

Somehow Chip must be conditionally incrementing the high word during the WC instruction depending on if the value of the lower 32 bits is within a couple of counts of carrying.

A delayed carry from low to high would have been much smaller hardware, methinks, in both the instruction and the counter logic.

Okay, look at the second line below. It's a count-aligned report that lands the third sample bang on the low-word rollover, with the first sample four clocks earlier. That means I was expecting the second sample (with WC) to be all zeros... but it has been incremented to match the third sample's timing.

7ffffffc   00000000   80000000   80000002   00000000   8000000a   8000000c   00000000   00000000   80000012   80000014 
fffffffc   00000001   00000000   00000002   00000001   0000000a   0000000c   00000001   00000001   00000012   00000014

Repost of sampling code:

		getct	ticks
		getct	ticks+1		wc
		getct	ticks+2
		getct	ticks+3
		getct	ticks+4		wc
		nop
		nop
		getct	ticks+5
		getct	ticks+6
		getct	ticks+7		wc
		getct	ticks+8		wc
		getct	ticks+9
		getct	ticks+10
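The whole rollover report can be reproduced from a simple model that reads GETCT D WC as "upper half of CT two clocks later" and plain GETCT D as "lower half of CT now". The Python sketch below assumes two sysclks per instruction, for GETCT and NOP alike. It encodes evanh's sampling sequence as a hypothetical PROGRAM table. It matches both report lines above, but it does not show how the silicon implements the adjustment.

```python
MASK32 = 0xFFFFFFFF
CLK = 2  # assumed sysclks per instruction, GETCT and NOP alike

# evanh's sampling sequence: (ticks slot, or None for NOP; uses WC?)
PROGRAM = [(0, False), (1, True), (2, False), (3, False), (4, True),
           (None, False), (None, False),
           (5, False), (6, False), (7, True), (8, True), (9, False), (10, False)]


def simulate(t0: int) -> list:
    """Return ticks..ticks+10 for a run whose first GETCT samples at clock t0."""
    ticks = [0] * 11
    t = t0
    for slot, wc in PROGRAM:
        if slot is not None:
            ticks[slot] = ((t + CLK) >> 32) & MASK32 if wc else t & MASK32
        t += CLK
    return ticks


def parse(row: str) -> list:
    return [int(x, 16) for x in row.split()]


row1 = "7ffffffc 00000000 80000000 80000002 00000000 8000000a 8000000c 00000000 00000000 80000012 80000014"
row2 = "fffffffc 00000001 00000000 00000002 00000001 0000000a 0000000c 00000001 00000001 00000012 00000014"

print(simulate(0x7FFFFFFC) == parse(row1))  # True
print(simulate(0xFFFFFFFC) == parse(row2))  # True
```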

evanh · 2019-07-13 05:16

In case it's not clear. My subtraction of two is wrong, don't do that.
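The rollover sample itself shows why the earlier SUB B,#2 suggestion breaks things. In this Python sketch (helper names hypothetical), ticks+1 = 00000001 from the WC read and ticks+2 = 00000000 from the following GETCT already form the correct 64-bit value. Subtracting two from the low half alone borrows without touching the upper half, so the result lands almost 2^32 counts too high.

```python
MASK32 = 0xFFFFFFFF


def combine(a: int, b: int) -> int:
    """A from GETCT A WC, B from the GETCT B right after it."""
    return ((a & MASK32) << 32) | (b & MASK32)


def combine_sub2(a: int, b: int) -> int:
    """The retracted variant: SUB B,#2 applied to the low half only."""
    return combine(a, (b - 2) & MASK32)


a, b = 0x00000001, 0x00000000   # ticks+1 / ticks+2 from the rollover report
print(hex(combine(a, b)))       # 0x100000000
print(hex(combine_sub2(a, b)))  # 0x1fffffffe
```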

evanh · 2019-07-13 05:25

Quoted from tricks and traps topic so as not to create a mess there too:

Cluso99 wrote: »

I think the top 32 bits are two clocks ahead of the lower 32 bits. That's how it winds up time-aligned.

That's a problematic description Cluso. "two clocks ahead" to me means at an earlier timestamp.

EDIT: How about "The first read upper 32 bits are adjusted to a timestamp of +2 clocks to suit the consecutively read lower 32 bits".

Cluso99 · 2019-07-13 05:45

Yes, the two reads will get the correct value. The lower 32-bit count is delayed by 2 clocks such that the two successive reads will get the correct 64-bit value. It's all done with smoke and mirrors.

Postedit: I am not sure how Chip achieved it in silicon. Perhaps the upper 32-bit counter is incremented at the lower 32-bit count of FFFF-FFFD instead of FFFF-FFFF. It does not matter how, just that it works.

evanh · 2019-07-13 05:47

But the description is not clear as is. Not a good idea to leave it that way.

EDIT: It's why the questions keep recurring.

EDIT2: Defining a time compensation within a time measurement is a brain teaser. The direction conveyed by forward, back, delayed, ahead and behind become unclear. Even terms like advance and retard have to be understood rigidly to be useful in this context.

I'm now of the opinion that, technically, the first (high) read is effectively retarded (behind) by two clocks. Not ahead.

EDIT3: I guess it depends on whether the counter value is considered to be time or not. That's why it's important to describe the instruction actions as post-read adds and subtracts.

cgracey · 2019-07-13 07:06

I actually don't remember the exact mechanism, but the 64-bit reading is always correct. I have a program which tests the dual instructions right around CT rollover. I will post it when I get back home.
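Chip's actual test program is at the link ozpropdev gives below. The Python sketch here only illustrates what a rollover sweep can check. It starts one paired read on every clock around the first low-word rollover and flags any step where the 64-bit result does not advance by exactly one. The names are hypothetical, and two clocks per GETCT are assumed. A single cog could not issue pairs on consecutive clocks; the sweep does so only to cover every phase. The compensated model passes. A hypothetical pair without the +2 adjustment fails twice at the rollover.

```python
MASK32 = 0xFFFFFFFF
CLK = 2  # assumed sysclks per GETCT


def pair_compensated(t: int) -> int:
    """GETCT WC at t (upper half adjusted by +CLK), GETCT at t + CLK."""
    hi = ((t + CLK) >> 32) & MASK32
    lo = (t + CLK) & MASK32
    return (hi << 32) | lo


def pair_uncompensated(t: int) -> int:
    """Hypothetical hardware WITHOUT the adjustment, for comparison."""
    hi = (t >> 32) & MASK32
    lo = (t + CLK) & MASK32
    return (hi << 32) | lo


def find_glitches(read_pair, start: int, count: int) -> list:
    """Start one paired read on every clock from `start` for `count` clocks and
    return the start clocks where the 64-bit result did not advance by 1."""
    glitches = []
    prev = read_pair(start)
    for t in range(start + 1, start + count):
        cur = read_pair(t)
        if cur - prev != 1:
            glitches.append(t)
        prev = cur
    return glitches


start = (1 << 32) - 64  # straddle the first low-word rollover
print(find_glitches(pair_compensated, start, 128))                       # []
print([hex(t) for t in find_glitches(pair_uncompensated, start, 128)])   # ['0xfffffffe', '0x100000000']
```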

jmg · 2019-07-13 07:13

Cluso99 wrote: »

Yes, the two reads will get the correct value. The lower 32-bit count is delayed by 2 clocks such that the two successive reads will get the correct 64-bit value. It's all done with smoke and mirrors

Postedit: I am not sure how Chip achieved it in silicon. Perhaps the upper 32 bit counter is incremented at the lower 32-bit count of FFFF-FFFD instead of FFFF-FFFF. It does not matter, just that it works.

Yes, that was what my post above suggested: that the carry is '2 early', which is hopefully similar logic to a carry on rollover?

cgracey wrote: »

I actually don't remember the exact mechanism, but the 64-bit reading is always correct. I have a program which tests the dual instructions right around CT rollover. I will post it when I get back home.

IIRC, this was tested and proven OK?

ozpropdev · 2019-07-13 07:23

cgracey wrote: »

I actually don't remember the exact mechanism, but the 64-bit reading is always correct. I have a program which tests the dual instructions right around CT rollover. I will post it when I get back home.

Chip, you have already posted it here.
http://forums.parallax.com/discussion/comment/1463105/#Comment_1463105

evanh · 2019-07-13 07:43

Brian,
This one on the previous page might be what you meant - http://forums.parallax.com/discussion/comment/1463013/#Comment_1463013

evanh · 2019-07-13 07:46

I'm satisfied the GETCT instructions for reading the 64-bit CT counter are fine and don't need any software compensation. Just the documentation needs to be much clearer.

ozpropdev · 2019-07-13 07:54

evanh wrote: »

Brian,
This one on the previous page might be what you meant - http://forums.parallax.com/discussion/comment/1463013/#Comment_1463013

Oops!
