When you execute 'GETCT D WC', you will get the top 32 bits. The next instruction is shielded from interrupts, and if it's 'GETCT D', you will get the time-aligned lower 32 bits that go with the prior instruction's reading of the top 32 bits.
I verified this the other day on the FPGA V32i V33k image, including the interrupt shielding.
If I remember correctly, there is some discussion about it in the long-winded P2 thread. The two instructions have to follow each other with no NOPs in between; they are secure from interrupts and deliver the 64-bit counter at invocation of the first one, the one with WC.
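To make the claimed behavior concrete, here is a small behavioral model in Python (not PASM2, and not a description of how the silicon is built). It assumes CT advances by one count per sysclk, that GETCT D WC issues at clock t and the following GETCT D at t+2, and that the WC read returns the upper half of CT as it will be at t+2. The helper names (ct_at, getct_pair, combine) are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def ct_at(start: int, clocks: int) -> int:
    """64-bit system counter value `clocks` sysclks after `start` (wraps at 2**64)."""
    return (start + clocks) & MASK64


def getct_pair(ct_t: int) -> tuple[int, int]:
    """Model of 'GETCT D WC' at clock t followed by 'GETCT D' at clock t+2.

    Assumption: the WC read returns the upper half of CT as of t+2,
    so both halves describe the same instant.
    """
    hi = ct_at(ct_t, 2) >> 32          # GETCT D WC (issued at t)
    lo = ct_at(ct_t, 2) & MASK32       # GETCT D    (issued at t+2)
    return hi, lo


def combine(hi: int, lo: int) -> int:
    """Join two 32-bit halves into one 64-bit timestamp."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


# Sweep across a low-word rollover: every combined reading equals CT at t+2.
for offset in range(-6, 6):
    t = (5 << 32) + offset
    assert combine(*getct_pair(t)) == t + 2
```

Under these assumptions the assertion holds for every offset in the sweep, including the two where the low half rolls over between the two instructions.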
GETCT gets the saved value from 2 clocks before if a GETCT WC saved it, else the current lower 32 bits. That makes the subtraction unnecessary.
I might have misunderstood that, but it did make sense to me.
Should be easy to test:
GETCT A WC
GETCT B
GETCT C
Either B differs from C by 2, or by 4.
I'm remodeling my man cave, so I have no bench to test it.
Chip said the high count is delivered 2 clocks ahead; if so, the test will not work, since the difference will be 2.
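The proposed A/B/C test can be predicted with a tiny model (Python, hypothetical function name). It assumes each instruction takes two sysclks, so A issues at t, B at t+2 and C at t+4, and compares the hypothesis that GETCT WC latches the low half with the one where each plain GETCT reads the live low half.

```python
MASK32 = 0xFFFF_FFFF


def predicted_c_minus_b(ct_t: int, wc_saves_low: bool) -> int:
    """Predict C - B for: GETCT A WC (clock t); GETCT B (t+2); GETCT C (t+4).

    wc_saves_low=True models the hypothesis that GETCT WC latches the low
    half at clock t and the next GETCT returns that latched value.
    wc_saves_low=False models each GETCT reading the live low half.
    """
    b_clock = ct_t if wc_saves_low else ct_t + 2
    b = b_clock & MASK32
    c = (ct_t + 4) & MASK32
    return (c - b) & MASK32   # modular difference survives a low-word rollover


print(predicted_c_minus_b(1000, wc_saves_low=True))   # 4
print(predicted_c_minus_b(1000, wc_saves_low=False))  # 2
```

A difference of 2, the value measured in this thread, fits the live-read model; by itself it says nothing about how the high half is aligned, which is why the test is not conclusive.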
I've tested it, Mike; there is a two-clock difference between the two consecutive instructions, as Chip just indicated above.
The WC variant doesn't in any way capture the lower 32 bits. It only masks interrupts for two clocks, enough for one subsequent instruction, which can be any instruction, but obviously only the non-WC variant will usefully be affected. EDIT2: This has now been proven incorrect. The data below doesn't prove what I'd said.
If you are really interested, there was a big thread about the extended CNT register.
Otherwise, they are two instructions that must follow one another.
I think the top 32 bits are two clocks ahead of the lower 32 bits. That's how it winds up time-aligned.
Oh, correct, I just tested this. I'd misunderstood what you meant by time-aligned. So it's not a prefix in any way; they are just two instructions that independently sample two halves of the 64-bit counter at their respective timestamps.
EDIT: Therefore, the lower 32 bits could have rolled over between the two instructions, and the IRQ masking does nothing useful. EDIT2: On second thought, the masking can be put to good use by always subtracting two from the value of the following non-WC instruction. This way any rollover gets reversed and no further checks are needed, i.e., the following is the best way to read the whole 64-bit counter without glitches:
GETCT A WC
GETCT B
SUB B, #2
EDIT3: There is still a caveat with interrupts there. The SUB B,#2 may need to be shielded, depending on the code path that snippet is placed in.
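The glitch described in this post, and the SUB B,#2 fix, can be checked against a model of the hypothesis the post assumes: each GETCT independently samples its half of a plain 64-bit counter, the WC read at clock t and the plain read at t+2. Later posts in the thread find that the silicon does not work this way, so this is a sketch of the hypothesis only, in Python, with hypothetical helper names.

```python
MASK32 = 0xFFFF_FFFF


def naive_pair(ct_t: int) -> tuple[int, int]:
    """Hypothesis only: high half sampled at t, low half sampled at t+2."""
    return ct_t >> 32, (ct_t + 2) & MASK32


def join(hi: int, lo: int) -> int:
    """Join two 32-bit halves into one 64-bit value."""
    return (hi << 32) | lo


t = (1 << 32) - 1                 # low word is 0xFFFFFFFF at clock t
hi, lo = naive_pair(t)
print(hex(join(hi, lo)))          # 0x1: the carry was missed, ~2**32 too small
print(join(hi, (lo - 2) & MASK32) == t)   # True: "SUB B,#2" realigns to clock t
```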
I'm not following the hops here...
I think Chip was saying the upper 32 bits have a 2-clock phase shift in the carry, so the two reads appear to capture the same instant in time on a 64-bit counter,
i.e., if the first read has just rolled over, the second read will also have just incremented.
The required phase fix depends on the designed read order, HL or LH; I think only one order is time-aligned.
If the HW is designed for HL, then I think the carry needs to happen at 2^32-2.
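A Python sketch of why only one read order lines up with an early carry. It assumes the high register carries when the low word reaches 2^32-2 (so it always shows the upper half of CT+2) and that the two reads are two sysclks apart; EARLY, high_reg, read_hl and read_lh are illustrative names, not P2 features.

```python
MASK32 = 0xFFFF_FFFF
EARLY = 2   # assumed: high word carries when the low word reaches 2**32 - 2


def high_reg(ct: int) -> int:
    """Upper-half register whose carry fires EARLY counts before rollover."""
    return (ct + EARLY) >> 32


def low_reg(ct: int) -> int:
    """Plain lower-half register."""
    return ct & MASK32


def read_hl(t: int) -> int:
    """High half read at clock t, low half at t+2 (the P2 order)."""
    return (high_reg(t) << 32) | low_reg(t + 2)


def read_lh(t: int) -> int:
    """Low half read at clock t, high half at t+2 (reversed order)."""
    return (high_reg(t + 2) << 32) | low_reg(t)


base = 7 << 32
hl_ok = all(read_hl(base + d) == base + d + 2 for d in range(-8, 8))
lh_ok = all(read_lh(base + d) == base + d for d in range(-8, 8))
print(hl_ok, lh_ok)   # True False
```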
Asking Peter's indulgence, since I fully agree with his observation about the objectives of this thread: at the same time, I believe it is important for every prospective P2 user to know more about the "inner stuff", so that each one can better understand what is inside its guts and why things were done this way or that, enabling more people to take advantage of its power.
In the present case, since we are talking about time stamps, it's truly important to understand why interrupts cannot be allowed to interfere with the execution path of a sequence such as GETCT M WC; GETCT N.
To simplify things, the sequence GETCT M WC; GETCT N should be understood as an unbreakable 64-bit-wide instruction, able to transfer a snapshot of the instantaneous contents of an ever-updating 64-bit-wide register that increments at the sysclk rate.
Each P2 cog is a 32-bit processor that normally moves information in transactions up to 32 bits wide. Once CT was extended to 64 bits, some mechanism was needed to retrieve the full 64-bit time stamp as a snapshot, as if taken by a super-high-speed camera pointed at an imaginary LED panel on which the ultra-fast-rolling bits are displayed while being updated at the sysclk rate.
During the design phase, it was decided that the lower 32 bits taken by the snapshot should agree with the moment of execution of the second instruction of the pair (GETCT N). So, if you use GETCT M WC; GETCT N, M and N must reflect the exact contents of CT at that moment: no earlier and no later.
To achieve that goal, the value transferred to M by the first part of the "double" instruction (GETCT M WC) must reflect the contents CT will have at a future moment, two full sysclk cycles ahead, because its upper 32-bit half may need to be incremented by one due to a rollover of the lower 32 bits before those bits are retrieved by the second half of the double instruction (GETCT N) and transferred to N.
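The timing argument above can be written as arithmetic, assuming CT advances by exactly one count per sysclk, GETCT M WC executes at clock t and GETCT N at t+2, and ignoring the 2^64 wraparound. Under those assumptions the pair reconstructs CT at the moment GETCT N executes:

```latex
\begin{aligned}
M &= \left\lfloor \frac{CT(t) + 2}{2^{32}} \right\rfloor, \qquad
N = CT(t+2) \bmod 2^{32} = \bigl(CT(t) + 2\bigr) \bmod 2^{32}, \\
M \cdot 2^{32} + N &= CT(t) + 2 = CT(t+2).
\end{aligned}
```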
It's all about keeping the 64-bit register contents consistent while doing the transfer in two consecutive (and uninterruptible) 32-bit-wide moves.
If interrupts could not interfere with the moment GETCT N executes relative to the moment GETCT M WC executes, there would be no reason to create an interrupt-blocking mechanism.
Beyond its pure use as a time-stamp grabber, other possible applications of the GETCT M WC interrupt-delaying mechanism include delaying interrupts by very short, odd sysclk-count intervals, such as executing GETCT D WC followed by, say, RDLUT D,{#}S. It could even sync the start of an interrupt-acceptance window with the egg-beater hub timing, by following GETCT D WC with any of the instructions that depend on egg-beater hub timing.
Holy cow! JMG is right in some way. Mike, I was wrong, I'll update the earlier post.
Somehow Chip must be conditionally incrementing the high word during the WC instruction depending on if the value of the lower 32 bits is within a couple of counts of carrying.
A delayed carry from low to high would have been much smaller hardware, methinks, both in the instruction and counter logic.
Okay, looking at the second line below, here's a count-aligned report that lands the third sample bang on the low-word rollover, the first sample being four clocks earlier. That means I was expecting the second sample (with WC) to be all zeros... but it has been incremented to match the third sample's timing.
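The report in this post can be reproduced in a behavioral model. The actual sampling code is not shown here, so the sketch assumes the sequence GETCT (low) at clock t, GETCT WC at t+2 and GETCT (low) at t+4, with the WC read returning the upper half of CT as of the following instruction; sample_trio is a hypothetical name.

```python
MASK32 = 0xFFFF_FFFF


def sample_trio(ct_t: int) -> tuple[int, int, int]:
    """Assumed sequence: GETCT x (t), GETCT y WC (t+2), GETCT z (t+4)."""
    first = ct_t & MASK32                  # low half at t
    second = ((ct_t + 2) + 2) >> 32        # WC read: high half as of t+4
    third = (ct_t + 4) & MASK32            # low half at t+4
    return first, second, third


# Place the third sample exactly on a low-word rollover (high word 0 -> 1).
first, second, third = sample_trio((1 << 32) - 4)
print(hex(first), second, third)   # 0xfffffffc 1 0
```

The WC sample reads 1, not the 0 a plain counter would hold at t+2, matching the third sample's timing.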
Yes, the two reads will get the correct value. The lower 32-bit count is delayed by 2 clocks such that the two successive reads will get the correct 64-bit value. It's all done with smoke and mirrors.
Postedit: I am not sure how Chip achieved it in silicon. Perhaps the upper 32-bit counter is incremented at a lower 32-bit count of FFFF-FFFD instead of FFFF-FFFF. It does not matter; what matters is that it works.
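The FFFF-FFFD guess can be checked with a register-level sketch in Python. The stepping rule below is an assumption about one possible implementation, not a description of the P2 silicon; step is a hypothetical name.

```python
MASK32 = 0xFFFF_FFFF


def step(hi: int, lo: int) -> tuple[int, int]:
    """Advance a split counter by one sysclk.

    Hypothetical rule: the high register increments when the low register
    steps from 0xFFFFFFFD to 0xFFFFFFFE, i.e. two counts before rollover.
    """
    lo = (lo + 1) & MASK32
    if lo == 0xFFFF_FFFE:
        hi = (hi + 1) & MASK32
    return hi, lo


# Start a few clocks before the early carry and compare with the +2 formula.
ct = (3 << 32) + 0xFFFF_FFF8
hi, lo = (ct + 2) >> 32, ct & MASK32     # registers consistent with the rule
for _ in range(16):
    ct += 1
    hi, lo = step(hi, lo)
    assert lo == ct & MASK32
    assert hi == (ct + 2) >> 32          # high register == upper half of CT+2
print("early-carry register matches (CT + 2) >> 32")
```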
But the description is not clear as is. Not a good idea to leave it that way.
EDIT: It's why the questions keep recurring.
EDIT2: Defining a time compensation within a time measurement is a brain teaser. The direction conveyed by forward, back, delayed, ahead and behind becomes unclear. Even terms like advance and retard have to be understood rigidly to be useful in this context.
I'm now of the opinion that, technically, the first (high) read is effectively retarded (behind) by two clocks. Not ahead.
EDIT3: I guess it depends on whether the counter value is considered to be time or not. Hence why it's important to describe the instruction actions as post-read adds and subtracts.
I actually don't remember the exact mechanism, but the 64-bit reading is always correct. I have a program which tests the dual instructions right around CT rollover. I will post it when I get back home.
Yes, that was what my post above suggested: that the carry is '2 early', which is hopefully similar logic to the carry on rollover?
I'm satisfied the GETCT instructions for reading the 64-bit CT counter are fine and don't need any software compensation. Just the documentation needs to be much clearer.
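A Python sketch of why no software compensation is needed, assuming (as tested in this thread) that the pair returns both halves of CT as of the second instruction. Applying the SUB B,#2 idea from earlier in the thread on top of that would misalign the halves whenever the low word has just rolled over. getct_pair and join are hypothetical helper names.

```python
MASK32 = 0xFFFF_FFFF


def getct_pair(ct_t: int) -> tuple[int, int]:
    """Assumed silicon behavior: both halves describe CT at clock t+2."""
    now = ct_t + 2
    return now >> 32, now & MASK32


def join(hi: int, lo: int) -> int:
    """Join two 32-bit halves into one 64-bit value."""
    return (hi << 32) | lo


t = (9 << 32) - 2                    # low half rolls to 0 exactly at t+2
hi, lo = getct_pair(t)
print(join(hi, lo) == t + 2)         # True: raw pair is already consistent
patched = join(hi, (lo - 2) & MASK32)
print(patched - t)                   # 4294967296: "SUB B,#2" adds a 2**32 error
```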
What advantages or fixes do you hope will come out in the new chip?
Please clarify: are you asking what the revision fixed from the previous hundred samples, or what the P2 can do in general?
-All known prior bugs fixed.
-Clock-gating implemented, reduces power by ~40%.
-PLL filter modified to reduce jitter and improve lock.
-System counter extended to 64 bits. GETCT WC retrieves the upper 32 bits.
-Streamer has many new modes with SINC1/SINC2 ADC conversions for Goertzel mode.
-HDMI mode added to streamer with ascending and descending pinouts for easy PCB layout.
-SINC2/SINC3 filters added to smart pins for improving ENOB in ADC conversions.
-Each cog has four 8-bit sample-per-clock ADC channels that feed from new smart pin 'scope' modes.
-BITL/BITH/BITC/BITNC/BITZ/BITNZ/BITRND/BITNOT can now work on a span of bits (+S[9:5] bits). Prior SETQ overrides S[9:5].
-DIRx/OUTx/FLTx/DRVx can now work on a span of pins (+D[10:6] pins). Prior SETQ overrides D[10:6].
-WRPIN/WXPIN/WYPIN/AKPIN can now work on a span of pins (+S[10:6] pins). Prior SETQ overrides S[10:6].
-BIT_DAC output now has two 4-bit settings for low and high states, instead of one 8-bit high-state setting.
-RDxxxx/WRxxxx+PTRx expressions now index -16..+16 with updating and -32..+31 without updating.
-Sensible PTRx behavior implemented for 'SETQ(2) + RDLONG/WRLONG/WMLONG' operations.
-RDLUT/WRLUT can now handle PTRx expressions.
-Cog LUT sharing is now glitch-free.
-POP now returns Z=1 if result=0; it used to return result[30].
-XORO32 improved.
-Main PRNG upgraded to Xoroshiro128**.
-Phil
Edit: correction
It's just for the next instruction. I think the top 32 bits are two clocks ahead of the lower 32 bits. That's how it winds up time-aligned.
This could verify all assumptions.
Enjoy!
Mike
EDIT: Here's a report line from my testing:
And here's the v33k sampling code:
PS: There's no way Chip would have put an extra 64-bit adder in there to advance the second sample (with WC) by two to make it suit the third sample.
PPS: Also, phase-shifting the carry will retard the high word, not advance it. That would work if the sampling order were reversed.
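The PPS point can be checked with a Python sketch of a delayed carry. It assumes the carry into the high word arrives two sysclks late and that the two reads are two sysclks apart; LAG, high_reg, read_lh and read_hl are illustrative names only.

```python
MASK32 = 0xFFFF_FFFF
LAG = 2   # assumed: the carry into the high word arrives LAG clocks late


def high_reg(ct: int) -> int:
    """High register with a delayed carry: shows the upper half of CT - LAG."""
    return (ct - LAG) >> 32


def read_lh(t: int) -> int:
    """Low half read at clock t, high half read at t+2 (reversed order)."""
    return (high_reg(t + 2) << 32) | (t & MASK32)


def read_hl(t: int) -> int:
    """High half read at clock t, low half read at t+2 (the P2 order)."""
    return (high_reg(t) << 32) | ((t + 2) & MASK32)


base = 11 << 32
print(all(read_lh(base + d) == base + d for d in range(-8, 8)))      # True
print(all(read_hl(base + d) == base + d + 2 for d in range(-8, 8)))  # False
```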
Possibilities are almost limitless.
Enjoy!
Mike
Repost of sampling code:
EDIT: How about "The first read's upper 32 bits are adjusted to a timestamp of +2 clocks to suit the consecutively read lower 32 bits"?
IIRC this was tested and proven OK?
Chip, you have already posted it here:
http://forums.parallax.com/discussion/comment/1463105/#Comment_1463105
This one on the previous page might be what you meant: http://forums.parallax.com/discussion/comment/1463013/#Comment_1463013