Changes and GETCT
System
in Propeller 2
This discussion was created from comments split from: Using GETCT - Was "What can your P2 do?".
Comments
What advantages/fixes do you hope to come out in the new chip?
Please clarify: are you asking what the revision fixed from the previous hundred samples, or what the P2 can do in general?
-All known prior bugs fixed.
-Clock-gating implemented, reduces power by ~40%.
-PLL filter modified to reduce jitter and improve lock.
-System counter extended to 64 bits. GETCT WC retrieves the upper 32 bits.
-Streamer has many new modes with SINC1/SINC2 ADC conversions for Goertzel mode.
-HDMI mode added to streamer with ascending and descending pinouts for easy PCB layout.
-SINC2/SINC3 filters added to smart pins for improving ENOB in ADC conversions.
-Each cog has four 8-bit sample-per-clock ADC channels that feed from new smart pin 'scope' modes.
-BITL/BITH/BITC/BITNC/BITZ/BITNZ/BITRND/BITNOT can now work on a span of bits (+S[9:5] bits). Prior SETQ overrides S[9:5].
-DIRx/OUTx/FLTx/DRVx can now work on a span of pins (+D[10:6] pins). Prior SETQ overrides D[10:6].
-WRPIN/WXPIN/WYPIN/AKPIN can now work on a span of pins (+S[10:6] pins). Prior SETQ overrides S[10:6].
-BIT_DAC output now has two 4-bit settings for low and high states, instead of one 8-bit high-state setting.
-RDxxxx/WRxxxx+PTRx expressions now index -16..+16 with updating and -32..+31 without updating.
-Sensible PTRx behavior implemented for 'SETQ(2) + RDLONG/WRLONG/WMLONG' operations.
-RDLUT/WRLUT can now handle PTRx expressions.
-Cog LUT sharing is now glitch-free.
-POP now returns Z=1 if result=0; it used to return result[30].
-XORO32 improved.
-Main PRNG upgraded to Xoroshiro128**.
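To make the span encoding in the BITx item concrete, here is a minimal Python model of how S might select bits for BITL/BITH. It is a sketch under stated assumptions, not Parallax's definition: S[4:0] is taken as the first bit, S[9:5] as the number of additional bits, a prior SETQ is modelled by the hypothetical `q` parameter (assumed to use its low 5 bits), and spans that run past bit 31 are assumed to wrap to bit 0.

```python
from typing import Optional

MASK32 = 0xFFFF_FFFF


def span_mask(s: int, q: Optional[int] = None) -> int:
    """Mask of the bits selected by BITx D,S.

    Assumptions: S[4:0] is the first bit, S[9:5] (or q, standing in for a
    prior SETQ) is the number of extra bits, and the span wraps past bit 31.
    """
    first = s & 0x1F
    extra = ((s >> 5) if q is None else q) & 0x1F
    mask = 0
    for i in range(extra + 1):
        mask |= 1 << ((first + i) % 32)
    return mask


def bitl(d: int, s: int, q: Optional[int] = None) -> int:
    """BITL model: clear the selected span of bits in D."""
    return d & ~span_mask(s, q) & MASK32


def bith(d: int, s: int, q: Optional[int] = None) -> int:
    """BITH model: set the selected span of bits in D."""
    return (d | span_mask(s, q)) & MASK32


print(hex(bith(0, (3 << 5) | 4)))   # bits 4..7 set: 0xf0
print(hex(bitl(0xFFFF_FFFF, 8)))    # single bit 8 cleared: 0xfffffeff
```

With S[9:5] = 0 this reduces to the old single-bit behaviour, which is why existing code keeps working.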
-Phil
Edit: correction
It's just for the next instruction. I think the top 32 bits are two clocks ahead of the lower 32 bits. That's how it winds up time-aligned.
GETCT gets the saved value from 2 clocks before if a GETCT WC saved it, otherwise the current lower 32 bits. That makes the subtraction unnecessary.
I might have misunderstood that, but it did make sense to me.
Should be easy to test:
either B differs from C by 2 or by 4.
I'm remodeling my man cave, so I have no bench to test it on.
Chip said the high count is delivered 2 clocks ahead; if so, the test will not work, since it will be 2.
This could verify all assumptions.
Enjoy!
Mike
The WC variant doesn't in any way capture the lower 32 bits. It only masks interrupts for two clocks, enough for one subsequent instruction, which can be any instruction but obviously only the non-WC variant will usefully be affected.
EDIT2: This has now been proven incorrect. The data below doesn't prove what I'd said.
EDIT: Here's a report line from my testing:
And here's the v33k sampling code:
Otherwise, they are two instructions that must follow one another.
I'm not following the hops here...
I think Chip was saying the upper 32 bits have a 2-clock phase shift in the carry, so the two reads appear to capture the same instant in time on a 64-bit counter,
i.e. if the first read has just rolled over, the second read will also have just incremented.
The required phase-fix depends on the designed read order, HL or LH - I think only one order is time-aligned.
If the hardware is designed for HL, then I think the carry needs to be at 2^32-2.
PS: There's no way Chip would have put an extra 64-bit adder in there to advance the second sample (with WC) by two to make it suit the third sample.
PPS: Also, phase shifting the carry will retard the high word, not advance it. That would work if the sampling order was reversed.
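A small Python model can check the claims about read order and carry direction. It assumes the two reads are exactly 2 clocks apart and models CT as a plain integer `t`; `carry_shift` is a hypothetical parameter that moves the high word's carry earlier (positive) or later (negative). A read counts as aligned when the combined value equals CT at the clock the low word was read.

```python
MASK32 = 0xFFFF_FFFF
GAP = 2  # assumed: clocks between the first and second read


def high_word(t, carry_shift):
    """Upper 32 bits of a 64-bit counter whose carry is shifted by carry_shift.

    carry_shift > 0: carry arrives early; carry_shift < 0: carry arrives late.
    """
    return ((t + carry_shift) >> 32) & MASK32


def read_hl(t, carry_shift):
    """High word at clock t, low word at t + GAP. Returns (value, low-read clock)."""
    hi = high_word(t, carry_shift)
    lo = (t + GAP) & MASK32
    return (hi << 32) | lo, t + GAP


def read_lh(t, carry_shift):
    """Low word at clock t, high word at t + GAP. Returns (value, low-read clock)."""
    lo = t & MASK32
    hi = high_word(t + GAP, carry_shift)
    return (hi << 32) | lo, t


def aligned(reader, carry_shift, around=1 << 32, span=8):
    """True if every read near `around` equals the counter at its low-read clock."""
    for t in range(around - span, around + span):
        value, stamp = reader(t, carry_shift)
        if value != stamp:
            return False
    return True


for order in (read_hl, read_lh):
    for shift in (-2, 0, 2):
        print(order.__name__, shift, aligned(order, shift))
```

Under these assumptions only `read_hl` with a +2 (early) carry and `read_lh` with a -2 (late) carry come out aligned, which matches the point that retarding the carry would only suit the reversed sampling order.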
In the present case, since we are talking about time stamps, it's truly important to understand why interrupts cannot be allowed to interfere with the execution of a sequence such as GETCT M WC; GETCT N.
To simplify things, the sequence GETCT M WC; GETCT N should be understood as one unbreakable 64-bit-wide instruction, able to transfer a snapshot of the instantaneous contents of an ever-updating 64-bit register that increments at sysclk rate.
Each P2 cog is a 32-bit processor that normally moves information in transactions of up to 32 bits. Once CT was extended to 64 bits, some mechanism was needed to retrieve the full 64-bit time stamp as a single snapshot, as if taken by a super-high-speed camera pointed at an imaginary LED panel that displays the counter's ultra-fast-rolling bits as they update at sysclk rate.
During the design phase, it was decided that the lower 32 bits of the snapshot should match the moment the second instruction of the pair (GETCT N) executes. So if you use GETCT M WC; GETCT N, M and N must reflect the exact contents of CT at that moment: not earlier, not later.
To achieve that, the value transferred to M by the first half of the "double" instruction (GETCT M WC) must reflect the contents CT will have two full sysclk cycles in the future. Its upper 32-bit half may need to be incremented by one, because the lower 32 bits can roll over before they are retrieved by the second half of the double instruction (GETCT N) and transferred to N.
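The snapshot requirement can be checked with a minimal Python model. It assumes GETCT N executes exactly 2 clocks after GETCT M WC, and that GETCT M WC returns the upper 32 bits CT will have at that later clock; the function names are illustrative, not a Parallax API. `torn_clocks` lists the clocks where the combined M:N value differs from CT at the moment GETCT N executes.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def getct_wc(ct):
    """Model of GETCT M WC: upper 32 bits of CT as it will be 2 clocks later."""
    return ((ct + 2) >> 32) & MASK32


def getct(ct):
    """Model of GETCT N: lower 32 bits of CT at this clock."""
    return ct & MASK32


def read_ct64(t):
    """GETCT M WC at clock t, then GETCT N at clock t + 2; combine M:N."""
    m = getct_wc(t)
    n = getct(t + 2)
    return (m << 32) | n


def naive_read_ct64(t):
    """Same sequence, but the high word is taken at clock t with no compensation."""
    m = (t >> 32) & MASK32
    n = getct(t + 2)
    return (m << 32) | n


def torn_clocks(reader, around, span=6):
    """Clocks near `around` where M:N differs from CT at the GETCT N clock."""
    return [t for t in range(around - span, around + span)
            if reader(t) != (t + 2) & MASK64]


print(torn_clocks(read_ct64, 1 << 32))        # []
print(torn_clocks(naive_read_ct64, 1 << 32))  # [4294967294, 4294967295]
```

Without the +2 adjustment, the two clocks just before the low-word rollover give a result that is short by exactly 2^32.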
It's all about keeping the result consistent with the contents of a 64-bit register while doing the transfer as two consecutive (and uninterruptible) 32-bit moves.
If interrupts could never intervene between the execution of GETCT M WC and GETCT N, there would be no reason to create an interrupt-blocking mechanism at all.
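What an interrupt would do to the pair can be sketched the same way. This hypothetical model keeps the +2-clock compensation on the high word but lets GETCT N run `gap` clocks after GETCT M WC; the 10-clock interrupt service routine (ISR) is an arbitrary illustrative value, not a measured P2 figure.

```python
MASK32 = 0xFFFF_FFFF


def read_pair(t, gap):
    """GETCT M WC at clock t, GETCT N `gap` clocks later (behavioural model).

    The high word is compensated for a 2-clock gap. gap == 2 is the
    uninterrupted case; gap > 2 stands in for an ISR that ran between
    the two instructions.
    """
    m = ((t + 2) >> 32) & MASK32
    n = (t + gap) & MASK32
    return (m << 32) | n


ISR_CLOCKS = 10      # hypothetical ISR length, for illustration only
t = (1 << 32) - 3    # WC read 3 clocks before the low word rolls over

ok = read_pair(t, gap=2)
torn = read_pair(t, gap=2 + ISR_CLOCKS)
print(hex(ok), hex(t + 2))                  # 0xffffffff 0xffffffff
print(hex(torn), hex(t + 2 + ISR_CLOCKS))   # 0x9 0x100000009
```

The compensation is only right when the gap is exactly 2 clocks; once an ISR slips in and the low word rolls over meanwhile, the result is off by 2^32, which is what the interrupt blocking prevents.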
Beyond its use as a pure time-stamp grabber, other possible applications of the GETCT M WC interrupt-delaying mechanism include delaying interrupts by very short, odd sysclk-count intervals, such as executing GETCT D WC followed by, say, RDLUT D,{#}S. It can even sync the start of an interrupt-acceptance window with the hub egg-beater timing, by following GETCT D WC with any instruction whose timing depends on the egg-beater.
Possibilities are almost limitless.
Enjoy!
Mike
Somehow Chip must be conditionally incrementing the high word during the WC instruction, depending on whether the value of the lower 32 bits is within a couple of counts of carrying.
A delayed carry from low to high would have been much smaller hardware, methinks, both in the instruction logic and in the counter logic.
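The two hardware ideas (a conditional increment applied during the WC read, or a high counter whose carry arrives two clocks early) produce the same high word. The Python sketch below compares them near several rollovers, including the full 64-bit wrap; it is a behavioural model only and says nothing about which approach the silicon actually uses. In this model the WC high word first changes when the low word steps from 0xFFFF_FFFD to 0xFFFF_FFFE.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def wc_high_conditional(ct):
    """Bump the high word during the WC read if the low word is within
    2 counts of carrying (low >= 0xFFFF_FFFE)."""
    hi, lo = (ct >> 32) & MASK32, ct & MASK32
    return (hi + (1 if lo >= 0xFFFF_FFFE else 0)) & MASK32


def wc_high_early_carry(ct):
    """High counter whose carry arrives two clocks before the low word rolls over."""
    return ((ct + 2) >> 32) & MASK32


def models_agree(bases=(1 << 32, 7 << 32, 1 << 64), span=5):
    """Compare both models on clocks near each base (wrapping at 2**64)."""
    return all(wc_high_conditional(ct & MASK64) == wc_high_early_carry(ct & MASK64)
               for base in bases
               for ct in range(base - span, base + span))


print(models_agree())  # True
```

They agree because adding 2 to the 64-bit value carries into the high word exactly when the low word is 0xFFFF_FFFE or 0xFFFF_FFFF.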
Okay, looking at the second line below, here's a count-aligned report that lands the third sample bang on the low-word rollover, with the first sample four clocks earlier. This means I was expecting the second sample (with WC) to be all zeros... but it has been incremented to match the third sample's timing.
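The scenario in this report can be replayed with a minimal model. The sketch only assumes what the post describes: three consecutive 2-clock samples, the third landing exactly on the first low-word rollover, and GETCT WC returning the upper 32 bits of CT + 2 (the behaviour inferred in this thread). It does not reproduce the actual report data.

```python
MASK32 = 0xFFFF_FFFF
ROLL = 1 << 32  # assumed: the first rollover of the low word


def getct(ct):
    """Model of plain GETCT: lower 32 bits of CT."""
    return ct & MASK32


def getct_wc(ct):
    """Model of GETCT WC as inferred in this thread: upper 32 bits of CT + 2."""
    return ((ct + 2) >> 32) & MASK32


# Three consecutive 2-clock instructions; the third lands exactly on ROLL.
samples = [getct(ROLL - 4), getct_wc(ROLL - 2), getct(ROLL)]
print([f"{s:08X}" for s in samples])  # ['FFFFFFFC', '00000001', '00000000']

# An uncompensated high read at ROLL - 2 would have given all zeros instead.
print(f"{((ROLL - 2) >> 32) & MASK32:08X}")  # 00000000
```

Combining the second and third samples gives 00000001_00000000, i.e. CT at the third sample, which is the incremented result the report shows.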
Repost of sampling code:
EDIT: How about "The upper 32 bits from the first read are adjusted to a timestamp of +2 clocks to suit the lower 32 bits read immediately afterwards"?
Post-edit: I am not sure how Chip achieved it in silicon. Perhaps the upper 32-bit counter is incremented at the lower 32-bit count of FFFF-FFFD instead of FFFF-FFFF. It doesn't matter; all that matters is that it works.
EDIT: It's why the questions keep recurring.
EDIT2: Defining a time compensation within a time measurement is a brain teaser. The directions conveyed by forward, back, delayed, ahead and behind become unclear. Even terms like advance and retard have to be understood rigidly to be useful in this context.
I'm now of the opinion that, technically, the first (high) read is effectively retarded (behind) by two clocks. Not ahead.
EDIT3: I guess it depends on whether the counter value is considered to be time or not. That's why it's important to describe the instruction actions as post-read adds and subtracts.
Yes, that was what my post above suggested: that the carry is '2 early', which is hopefully similar logic to a carry on rollover?
IIRC this was tested and proven OK?
Chip, you have already posted it here:
http://forums.parallax.com/discussion/comment/1463105/#Comment_1463105
This one on the previous page might be what you meant: http://forums.parallax.com/discussion/comment/1463013/#Comment_1463013