That's more config, logic and granularity compromise than a queued 32+32 read.
A 2 SysClk L32-> H32 delay (matches reading delay) would mean the upper 32 bits do not even need a holding register, and a 2 SysCLK monostable is used to hold-off interrupts to avoid INT aperture effects.
All up, quite simple logic, done once.
But if you want a long timeout using built-in timeout mechanisms, you are limited to ~8s.
Well, yes, this thread is about 64b CNT and I'm talking about how to best read that 64b CNT, mainly for system timebase operation cases.
All COGS then have the same time, and to the same SYSCLK granularity.
Or did you mean the shifting window would apply to ALL CNT operations? I would tend to leave the WAIT code the same.
That could quickly get complex for OBEX-type code use, as well as between COGs, since a window shifted to the upper 32b of 48b delivers a 200us granularity.
Maybe a compromise of using the movable window as a master scalar on a per cog basis. Add it to HUBSET and then there are no changes to anything else at all.
Makes the cost a lot smaller by only having one per cog then. EDIT: A 48-bit counter would also work out well here.
I mostly try to stay out of these "gotta have" discussions but it seems simple to do as long as we can keep it simple.
I know we would never need a 64-bit count, especially since it can't be set or prescaled so as to represent milliseconds from a reference date. But if we did have 64 bits, then why can't the upper count be latched whenever the lower count is accessed? That way you only ever read the latched value and never have to take inter-stage rollover into account.
But if you had 48 bits, then reading the top or bottom 32 bits directly would be fine and keeps it super simple.
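To make the latching idea concrete, here is a minimal Python sketch of it. It is only a toy model, not how the silicon works; the class and method names (`LatchedCounter64`, `read_low`, `read_high`) are made up for illustration.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


class LatchedCounter64:
    """Toy model (hypothetical): reading the low long latches the high long."""

    def __init__(self, start=0):
        self.count = start & MASK64
        self._high_latch = 0

    def tick(self, n=1):
        # Advance the free-running counter by n system clocks.
        self.count = (self.count + n) & MASK64

    def read_low(self):
        # Capture the upper 32 bits at the same instant as the low read.
        self._high_latch = self.count >> 32
        return self.count & MASK32

    def read_high(self):
        # Always returns the value latched by the last read_low().
        return self._high_latch


ctr = LatchedCounter64(start=0x0000_0001_FFFF_FFFF)
low = ctr.read_low()
ctr.tick(2)                 # the low long rolls over between the two reads
high = ctr.read_high()
value = (high << 32) | low  # still the count at the moment of the low read
```

The cost in hardware would be a 32-bit holding register for the latch, which is the part the delayed-carry proposals try to avoid.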
Maybe if you do two consecutive GETCT instructions the second returns the top 32 bits.
No extra instructions needed.
Yes, exactly my proposal.
With the addition of a small interrupt gate, so the first GETCT delays any INTs for 2 sysclks, so that the second GETCT is reliable/atomic. And a tweak to delay the carry L32 -> U32, so the delayed read appears correctly in-phase (no split overflows).
What are people going to do with their 64-bit counts, anyway?
Save them, scale them, compare them? The usual stuff...
It means all COGs have the same CNT result, always...
Users can shift to scale to any LSB they choose, along with their SysCLK. eg 262.144MHz >> 18 gives 1.000 ms LSB
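As a quick check of the shift-to-scale arithmetic, here is a small Python sketch. The constant and function names are just for illustration; the only figures taken from the post are the 262.144MHz clock and the shift of 18.

```python
SYSCLK_HZ = 262_144_000  # 262.144 MHz, the example clock from the post
SHIFT = 18               # 2**18 = 262_144 ticks, i.e. 1 ms at this clock


def ticks_to_ms(ct64):
    """Scale a raw 64-bit tick count to whole milliseconds by shifting."""
    return ct64 >> SHIFT


one_second = ticks_to_ms(SYSCLK_HZ)  # one second's worth of ticks
```

With any other SysCLK the shift gives a power-of-two fraction of the clock period, so the LSB would not land on a round number like this.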
Are you guys serious?!! It seems rather late in the game to be talking about this kind of stuff. Besides, any program that can't decide within 2^32 clock ticks how to deal with counter overflow purely in software must be pretty lame. The P2 will never get finished if these kinds of distractions continue to get traction.
-Phil
Are there any more interrupt event slots left? Can one be on 32-bit counter rollover?
This would allow you to do a delay longer than 2^32 cycles by first setting the interrupt to fire on 32-bit counter rollover for some number of rollovers, and then setting a normal 32-bit timer to fire at the appropriate point in the last 32-bit period.
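The bookkeeping for that scheme is small. A Python sketch, assuming a hypothetical rollover event exists (the function name `plan_long_delay` is made up here): split the 64-bit target into a number of low-long rollovers to count, plus a final 32-bit compare value.

```python
def plan_long_delay(now_low, delay_ticks):
    """Split a long delay into (rollovers to count, final 32-bit compare value).

    now_low is the current 32-bit counter reading; delay_ticks may exceed 2**32.
    """
    target = now_low + delay_ticks
    rollovers = target >> 32              # how many times the low long wraps
    final_compare = target & 0xFFFF_FFFF  # value for the normal 32-bit timer
    return rollovers, final_compare


rollovers, final_compare = plan_long_delay(0xFFFF_FFF0, 0x2_0000_0020)
```

One thing to watch: if the final compare value is very small, the normal timer has to be armed within that many clocks of the last rollover, so software would need to handle that case separately.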
Selectable windows are not a good solution.
Either make it so we can properly read the whole thing or don't do anything.
Any other solution means we have to dedicate software to handle the wrap.
This is like a fundamental feature that we had in the p2-hot, and should have been here too. If I had noticed it was missing sooner I would have said something much earlier.
Timing is a critical part of most of the things we often do with the prop, and having it be a hassle to use in many cases makes it just annoying.
It should be 64bit and have a way to read sanely. This, in my mind, is WAY WAY more important than the WRPIN Smile you want. This will impact way more people with annoying stuff that is non-trivial to deal with (like the WRPIN stuff is).
Yes it's late, but Chip is doing other things that are way less important and might change things more than this would (lut read/write with incrementing ptr stuff). And changing things that solve much simpler issues than this just because he wants to make it easier for users.
So as it is if you want to time anything longer than a few seconds you have to do extra software to deal with longer timers.
I liked the way it was in P2-Hot. However, a 64-bit counter isn't really a necessity since it can be done in software. If I was Parallax I wouldn't make a lot of changes to the P2 at this point. It seems like all they should be doing is fixing bugs, and not introducing new features. I hope the next version of silicon isn't going to need bug fixes due to the new features.
Dave/Roy,
I gather that means you want the CT event compares to be upgraded ... hmm, ADDCTx refresh would have to change its behaviour. Currently ADDCTx uses a cog general 32-bit register for working space that gets initialised with a GETCT.
I guess this could be all folded into hidden 64-bit registers.
Another problem is that with everybody soon playing with actual silicon, who's going to test out new FPGA images?
No, I don't want the CT event compares upgraded. I'm suggesting that there should be no changes at all to the P2, except for bug fixes. A 64-bit counter can be implemented in software the same way we currently do it on the P1. In addition, the P2 supports interrupts, so there are additional ways that a 64-bit counter can be implemented on the P2.
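For reference, the usual software approach looks roughly like this Python sketch (names such as `SoftCounter64` are hypothetical): keep a high word in memory and bump it whenever a new 32-bit reading is smaller than the previous one.

```python
class SoftCounter64:
    """Extend a 32-bit hardware counter to 64 bits in software (sketch)."""

    def __init__(self):
        self.high = 0
        self.last_low = 0

    def update(self, low):
        """Feed a fresh 32-bit reading; returns the extended 64-bit count.

        Assumption: this is called at least once per 2**32 ticks, otherwise
        a wrap goes unnoticed.
        """
        if low < self.last_low:
            self.high += 1  # the low long wrapped since the last call
        self.last_low = low
        return (self.high << 32) | low


sc = SoftCounter64()
first = sc.update(0xFFFF_FF00)
second = sc.update(0x0000_0010)  # reading went backwards, so a wrap occurred
third = sc.update(0x0000_0020)
```

The catch is the polling requirement: something has to call `update` at least once per wrap, and on a multi-cog system each cog either does its own tracking or shares the high word through hub memory.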
Some accommodations would have to be made for these to take advantage of a 64-bit CT.
What should they be?
The docs are not clear on whether CT is cleared on reset or can be loaded. My inference is that it is random?
With a 64b CT, that would make sense to clear on reset, as you then have a System UP timer (likely to be quite important), as well as a solid absolute time, available to all COGs without needing some COG as time manager.
Any COG can be a watchdog / SOP monitor.
Re the above 3, the question is really how can those co-operate with a 64b CT ?
If you do need a 64b time waypoint, seems you could set a trigger on the L.32, and interrupt every 8.5 seconds at 250MHz, and check 64b CT, if still < Threshold, continue the INT polling.
ie do they need to be 64b deep ?
A system like a PLC would likely scale the 64b CT to some sensible LSB and then use 32b waypoints, e.g. a 1ms scan time LSB gives 49 days of range.
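The 49 days figure for a 32b waypoint with a 1ms LSB can be checked with a couple of lines of Python (the helper name `rollover_days` is just for illustration):

```python
def rollover_days(bits, lsb_seconds):
    """Time until a counter of the given width wraps, in days."""
    return (2 ** bits) * lsb_seconds / 86_400  # 86,400 seconds per day


days_32bit_ms = rollover_days(32, 1e-3)  # 32-bit waypoint, 1 ms per count
```

That works out to a little under 50 days before a 32b millisecond count wraps.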
Personally, I would just increase CNT to 48 bits, and add a variant GETCTX D (get count extended). This would return the top 32 bits of CNT (i.e. it overlaps the top 16 bits of the lower 32 bits). This would allow coarser timing if required. 48 bits at 333MHz gives ~9.7 days, which is plenty long enough.
Anyone who really wants a full 48 bits (all the maths will be difficult to manipulate on a 32 bit cpu) can compare the overlapped 16 bits to ensure a rollover did not occur between the two reads (if yes, upper requires a decrement).
This is a really simple change. KISS
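The overlap check described above can be sketched in Python like this. It assumes the low read comes first, the extended read follows well within 2^16 clocks, and the extended read returns bits 16..47; `combine_48` is a made-up helper name.

```python
MASK32 = 0xFFFF_FFFF


def combine_48(low32, ext32):
    """Rebuild a 48-bit count from two overlapping 32-bit reads.

    low32 = bits 0..31, read first; ext32 = bits 16..47, read shortly after.
    """
    if (ext32 & 0xFFFF) != (low32 >> 16):
        # Bit 16 and up advanced between the reads: step the upper read back.
        ext32 = (ext32 - 1) & MASK32
    return (ext32 << 16) | (low32 & 0xFFFF)


count_at_first_read = 0x0001_FFFF_FFFE
low32 = count_at_first_read & MASK32
count_at_second_read = count_at_first_read + 4  # low long wraps in between
ext32 = (count_at_second_read >> 16) & MASK32
full = combine_48(low32, ext32)
```

The result is the count as of the first (low) read, so no interrupt shielding is needed as long as the two reads are not separated by 65536 clocks or more.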
In order to get time-aligned reads 2 clocks apart (GETCT takes 2 clocks), the upper long increments when the lower long is $0000_0001, not $FFFF_FFFF. This means that on reset, the counter must be initialized to $0000_0000_0000_0002 to avoid an early increment in the upper long. By the time user code starts running, the counter is already into the tens of thousands.
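A small Python simulation of that delayed carry shows why the two reads line up. This is only a model of the behaviour as described in the post, with a narrow low word so the wrap happens often; the class name and the `true_count` reference field are made up for checking.

```python
class DelayedCarryCounter:
    """Model: the upper word increments when the lower word is 1."""

    def __init__(self, low_bits=32):
        self.low_bits = low_bits
        self.low_mask = (1 << low_bits) - 1
        self.low = 2         # reset value from the post: $..._0002
        self.high = 0
        self.true_count = 2  # plain reference count, for checking only

    def tick(self):
        # The carry into the upper word is taken two clocks late.
        if self.low == 1:
            self.high += 1
        self.low = (self.low + 1) & self.low_mask
        self.true_count += 1

    def read64(self):
        low = self.low    # first GETCT reads the lower word
        self.tick()
        self.tick()       # GETCT takes 2 clocks
        high = self.high  # second GETCT reads the upper word
        return (high << self.low_bits) | low


ctr = DelayedCarryCounter(low_bits=4)  # 4-bit low word wraps every 16 ticks
mismatches = 0
for _ in range(200):
    expected = ctr.true_count  # count at the moment of the low read
    if ctr.read64() != expected:
        mismatches += 1
    ctr.tick()                 # vary the phase between successive reads
```

Because the upper word lags by exactly the two clocks between the reads, the pair always describes the count at the instant of the first read, with no holding register. It does rely on nothing (such as an interrupt) getting in between the two reads.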
I'd have a second instruction because it costs nothing being a single operand variety, and also you get the freedom to use it at a later point along with no interrupt shielding.
Good point. I've been avoiding adding new instructions.
Wait! Without interrupt shielding, you can't get a reliable count. I think it's maybe best the way it is, in that case.
I suspect that'll be Chip's fallback if the alternatives get too messy.
WAITX could be extended with a SETQ.
If WC is used, C=1 if a timeout occurred; otherwise the event occurred.
Would just being able to read the lower and upper longs be sufficient?
What about the timer events?
What about timeout for WAITxxx instructions?
Currently, there are three things CT gets used in:
1) GETCT - read counter
2) ADDCTx - timer events
3) SETQ + WAITxxx WC - timeout
A simple yes/no to each to avoid getting bogged down in the details.
GETCT is now shielded from interrupts.
Works like this:
Do you guys think this is sufficient?
Ah, no buffer holding register then.