CNT extension to 64-bit
Cluso99
Posts: 18,069
I say from the outset, I am concerned about feature creep.
Postedit: Thought about it. It's feature creep and not warranted. Forget it
But here I raised whether it may be simple to extend the CNT register to permit above 32 bits
forums.parallax.com/discussion/comment/1453238/#Comment_1453238
Requirement: Minimal logic, minimal change, Minimal risk
For reference, here is the GETCT instruction
EEEE 1101011 000 DDDDDDDDD 000011000 GETCT D
Note CZL=000 which allows for expansion
Tony suggested this as an extension to my suggestion...
Could GETCT set Z if CT has passed through zero since the last GETCT? This rollover bit would be cleared after GETCT. It could also set C if CT[31]=1, for completeness. The 64-bit count code could be:
MOV CTHI,#0
...
GETCT CTLO WZ
IF_Z ADD CTHI,#1
If Z is difficult, then use C.
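To make the rollover-bit idea concrete, here is a rough Python model of it. This is only a sketch: the class `RolloverCounter`, the helper `read64` and the `state` dict are made-up names for illustration, and the sticky bit stands in for the proposed Z result of GETCT, which does not exist in the real instruction.

```python
MASK32 = 0xFFFFFFFF

class RolloverCounter:
    """Hypothetical model: 32-bit free-running CT plus a sticky rollover bit."""

    def __init__(self, start=0):
        self.ct = start & MASK32
        self.rolled = False

    def tick(self, n=1):
        # Advance by n clocks (n < 2**32); remember if CT passed through zero.
        total = self.ct + n
        if total > MASK32:
            self.rolled = True
        self.ct = total & MASK32

    def getct(self):
        # Models "GETCT CTLO WZ": return the value and Z, then clear the bit.
        z = self.rolled
        self.rolled = False
        return self.ct, z

def read64(counter, state):
    # Models "GETCT CTLO WZ" followed by "IF_Z ADD CTHI,#1".
    ctlo, z = counter.getct()
    if z:
        state["cthi"] = (state["cthi"] + 1) & MASK32
    return (state["cthi"] << 32) | ctlo

counter = RolloverCounter(start=0xFFFFFFF0)
state = {"cthi": 0}
before = read64(counter, state)
counter.tick(0x20)                 # CT wraps once between the two polls
after = read64(counter, state)
print(hex(before), hex(after))
```

The flag is a single bit, so this only stays correct if the software polls at least once per 32-bit wrap. Two wraps between polls are counted as one.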
Further refinement...
* CNT[31:0] is extended to be CNT[63:0]
* When CNT[31] overflows (ie goes from 1 to 0), it sets an internal flag "FLAG31"
* GETCT instruction extended to be GETCNT D and GETCNTH D {WC}
* When GETCNT executes, it clears "FLAG31"
* When GETCNTH D {WC} executes, C is optionally set if "FLAG31" is set (ie an overflow of CNT[31] occurred since the last GETCNT executed)
GETCNT countl
GETCNTH counth wc
if_c sub counth,#1 'because CNT[31:0] overflowed after it was read, so we need to adjust CNTH
I would expect in many instances, that when requiring longer count times, that only CNTH (CNT[63:32]) would be required.
EEEE 1101011 000 DDDDDDDDD 000011000 GETCNT D
EEEE 1101011 C01 DDDDDDDDD 000011000 GETCNTH D {WC}
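For anyone who wants to check the FLAG31 bookkeeping, here is a small Python model of the proposal above. It is a sketch under assumptions: `Cnt64`, `read64` and the `gap` parameter (clocks elapsing between the two reads) are invented for illustration, and GETCNT/GETCNTH are only the proposed instructions, not existing ones.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

class Cnt64:
    """Hypothetical model of CNT[63:0] with the internal FLAG31."""

    def __init__(self, start=0):
        self.cnt = start & MASK64
        self.flag31 = False

    def tick(self, n=1):
        # FLAG31 is set when CNT[31:0] overflows into CNT[32].
        if (self.cnt & MASK32) + n > MASK32:
            self.flag31 = True
        self.cnt = (self.cnt + n) & MASK64

    def getcnt(self):
        # Proposed GETCNT D: read CNT[31:0] and clear FLAG31.
        self.flag31 = False
        return self.cnt & MASK32

    def getcnth(self):
        # Proposed GETCNTH D WC: read CNT[63:32], C = FLAG31.
        return (self.cnt >> 32) & MASK32, self.flag31

def read64(c, gap=2):
    countl = c.getcnt()
    c.tick(gap)                      # clocks pass between the two instructions
    counth, carry = c.getcnth()
    if carry:                        # if_c sub counth,#1
        counth = (counth - 1) & MASK32
    return (counth << 32) | countl

c = Cnt64(start=0x5_FFFF_FFFF)       # low half is about to overflow
print(hex(read64(c)))
```

The result is the 64-bit value as it was when the low half was read. The model assumes fewer than 2^32 clocks pass between the two reads.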
Comments
Extending to 64 bits would give ~1,753 years!!!
An extra 8 bits, CNT[39:0], gives ~55 minutes
An extra 16 bits, CNT[47:0], gives ~9.7 days
So perhaps we only need an extra 8 bits???
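The wrap times above can be reproduced with a few lines of Python. Note the clock rate is my assumption: the thread does not state it, but a system clock of about 333.3 MHz is the value that matches the quoted ~55 minutes, ~9.7 days and ~1,753 years. `SYSCLK_HZ` and `wrap_seconds` are names made up for this sketch.

```python
# Assumption: ~333.3 MHz system clock (inferred from the quoted figures,
# not stated in the thread).
SYSCLK_HZ = 1_000_000_000 / 3

def wrap_seconds(bits, clk_hz=SYSCLK_HZ):
    """Seconds until a free-running counter of the given width wraps."""
    return 2**bits / clk_hz

for bits in (32, 40, 44, 48, 56, 64):
    s = wrap_seconds(bits)
    print(f"CNT[{bits - 1}:0]: {s:.3e} s = {s / 60:.1f} min "
          f"= {s / 86400:.2f} days = {s / (365.25 * 86400):.2f} years")
```

Changing `clk_hz` shows how the figures scale with the clock: every extra counter bit doubles the wrap time, and halving the clock does the same.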
So to simplify, just extend CNT to CNT[39:0] and change the GETCT D instruction to
EEEE 1101011 000 DDDDDDDDD 000011000 GETCT D 'reads CNT[31:0]
EEEE 1101011 001 DDDDDDDDD 000011000 GETCTX D 'reads CNT[39:8]
A problem here is interrupts. With split reads, if an INT hits between the two reads and an overflow occurs during it, the overflow info is lost by the time the INT returns.
One solution would be to have GETCNTH RegAdr read 64 bits in an atomic manner to 2 adjacent locations. Taking 3 or 4 SysCLKs to do so ?
Another is to have GETCNT always pause any INTs for 2 more sysclks, to effectively allow atomic 64b read. This could be a second GETCNT read, if opcode space is tight.
That extends the useful range of CNT to just 2339 years, but should be sufficient for most applications.
I think this is way more important than convenience stuff that could just be handled in the compiler/documentation (the WRPIN things).
This is something that people will run into and have to handle all the time.
Isn't 12 seconds enough ???
Hehe, everyone has their priorities.
I'm personally okay with 6 seconds. Software can do the rest.
I stumbled a lot over the P1 overflow at ~57 seconds. 55 minutes would be better, but if it is not prohibitive I would like to see a 64-bit cnt.
Just setting a flag at 32-bit overflow and using that to count overflows oneself in another register would need constant attention.
I am quite unsure about the getx, gety things, but if it were possible to take a snapshot of both cnt values at the same time and then read them one after the other, that would avoid any interrupt problems.
Mike
See above, that is still not atomic.
GETCT could pause any pending INT for 2 SysCLKS, allowing any (optional) second GETCT to immediately follow.
If one GETCT immediately follows another (now safe from INTs), the second GETCT reads upper part of 40~64b, and does not add any INT pause. (A tiny state engine)
With a now known and locked 2 Sysclk delay on the 2 readings, I think a simple fixed 2 sysclk overflow pipeline is enough to give a perfect 40~64b capture, no upper holding buffer needed & no housekeeping.
There is only one CNT, so logic cost is not great. 8b means user-code still needs to manage rollovers.
24b is just under 7 years wrap time.
Yes, those are waitcnts, which test for >=, the CNT extension was not going to affect those. If someone is waiting, they can do that in a loop.
This 64b read of CNT means you do not have to ensure some SW, somewhere in the system, is alive and tracking those 6~17s quanta.
Isn't that per-COG shifted-window view of 64b more complex, in logic and use, than just reading twice if you want 64b?
Having to continuously handle cnt overflow for long duration timing operations is tedious.
For high precision timers, you choose the lowest 32bits, for seconds you'd bump it up a bit.
The window idea nicely handles the need for high/low precision, and for atomic transfers of the count.
Best of all, the count stays large precision and monotonic. CNT would never overflow, even in the worst cases.
You can easily sample the lower 32 bits, then the upper 32 bits and based on the lower 32bit value, sample it again if there was an overflow imminent.
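As a sketch of that "sample again if an overflow was imminent" approach, here is a Python model. Everything named here is hypothetical: `SplitCounter` stands in for a 64-bit counter that can only be read 32 bits at a time, `ticks_per_read` is the assumed cost of each read, and `margin` must be larger than the worst-case clocks between the two reads (which assumes no interrupt lands in between).

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

class SplitCounter:
    """Hypothetical 64-bit counter readable only as two 32-bit halves."""

    def __init__(self, start, ticks_per_read=2):
        self.cnt = start & MASK64
        self.ticks_per_read = ticks_per_read

    def _advance(self):
        self.cnt = (self.cnt + self.ticks_per_read) & MASK64

    def read_lo(self):
        value = self.cnt & MASK32
        self._advance()              # the counter keeps running after the read
        return value

    def read_hi(self):
        value = (self.cnt >> 32) & MASK32
        self._advance()
        return value

def read64(counter, margin=16):
    # Read low, then high. If low was within `margin` clocks of wrapping,
    # the high half may already belong to the next wrap, so sample again.
    while True:
        lo = counter.read_lo()
        hi = counter.read_hi()
        if lo <= MASK32 - margin:
            return (hi << 32) | lo

print(hex(read64(SplitCounter(0x7_FFFF_FFFE))))
```

Without the retry, the first pass in this example would pair the old low half with the new high half and be off by 2^32. The retry costs one extra pair of reads, and only near a wrap.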
You do need to consider the logic cost here - if you expect a nice-to-have 1 bit shifting granularity onto a 64b counter, that is a lot of config (5 offset bits per COG) and many MUXes .... (all this duplicated 8 times...)
So there is just one added instruction this way. Nothing else changes for the compilers.
PS: This also means you can implement CT as two cascaded 32-bit counters, eliminating any risk of critical-path timing problems later.
That was what I was trying to get at with the GETCTX D.
With a 40-bit counter, GETCTX D would return CNT[39:8]. This gives ~55mins.
With a 44-bit counter, GETCTX D would return CNT[43:12]. This would give ~14.5 hours.
With a 48-bit counter, GETCTX D would return CNT[47:16]. This gives ~9.7 days.
With any case (just implement one case) there would be no need to have rollover.
Chip is talking about a shiftable window. Where you could specify how far up the 64 bits you want to read the 32 bits or, for events, compare your 32-bits against. More flexible but more costly too.
Rayman,
It kind of is in the hub in that all cogs share the one counter, but, since it is read only, it can be read by all cogs simultaneously so it acts like it's part of each cog.
Yes. But I don't think it needs to be this flexible.
Just choose one of 40/44/48 bits and implement this. Definitely no need to be any larger than 48 bits. Keep it simple.
There are software ways around this if better granularity is also required by a specific user.
By making the total bits less than 64, software can read both halves if necessary, and compare the overlapping bits to see if rollover occurred. This is why I suggested say 48 bits.
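Here is a rough Python sketch of how software could use the overlapping bits. It assumes the 48-bit case, with GETCT returning CNT[31:0] and GETCTX returning CNT[47:16], so CNT[31:16] appears in both reads. The function name `combine48` is made up, and the sketch assumes the low half is read first and that fewer than 2^32 clocks pass before the high read.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def combine48(lo, hi):
    """Rebuild CNT[47:0] as it was when `lo` was read.

    lo: CNT[31:0]  from the first read (GETCT)
    hi: CNT[47:16] from the later read (proposed GETCTX)
    """
    overlap_lo = lo >> 16            # CNT[31:16] at the time of the first read
    overlap_hi = hi & MASK16         # CNT[31:16] at the time of the second read
    # Number of 2**16-clock steps that elapsed between the two reads.
    steps = (overlap_hi - overlap_lo) & MASK16
    hi_at_lo = (hi - steps) & MASK32 # wind the window back to the first read
    return (hi_at_lo << 16) | (lo & MASK16)

cnt0 = 0x1234_FFFF_FFF0              # value when GETCT executes
cnt1 = cnt0 + 0x20                   # value when GETCTX executes, after a rollover
lo = cnt0 & MASK32
hi = (cnt1 >> 16) & MASK32
print(hex(combine48(lo, hi)))
```

The overlap does more than detect a rollover: the difference between the two copies of CNT[31:16] says how far the upper window moved, so the read can be corrected rather than repeated.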
This is true!
I'm not through reading comments here, but maybe a 48-bit counter with 16, or even 8, selectable window offsets would be good.
That's more config, logic and granularity compromise than a queued 32+32 read.
A 2 SysClk L32-> H32 delay (matches reading delay) would mean the upper 32 bits do not even need a holding register, and a 2 SysCLK monostable is used to hold-off interrupts to avoid INT aperture effects.
All up, quite simple logic, done once.
But if you want a long timeout using built-in timeout mechanisms, you are limited to ~8s.
I need to look through the Verilog code and see how CNT is used everywhere. That will indicate what approaches might be best.