CNT extension to 64-bit

jmg · 2018-11-15 03:14

cgracey wrote: »

I started making it so that the 2nd GETCT would not shield interrupts, but then I figured it was more trouble than it was worth. More to explain, at least.

Cluso99's reverse read above, with a new opcode, avoids that double-delay effect (but it costs the new opcode decode, and TC is now on $FFFF_FFFE; maybe that can be 31b carry & 1b==, to keep speed and reduce logic?)

jmg · 2018-11-15 03:19

cgracey wrote: »

Good idea. Just delay the carry by two clocks. That would be one flop used as an enable to the upper long flops. That would be smaller than a 32'h0000_0001 detector. I'll do it that way.

I know FPGAs take special care to have fast carry for counters, and I expect ASIC compilers can do the same thing too. 'See' a counter and optimize the design for Terminal Count?
It would be nice to have the CT -> 64b well away from the critical path, for those keen on overclocking.

TonyB_ · 2018-11-15 03:44

Cluso99 wrote: »

BTW I make that ~2,338 years !!!!!

Cluso, thanks for re-editing the thread title!

cgracey · 2018-11-15 03:55

Cluso99 wrote: »
Chip,
You seemed to miss this post...
It's

Cluso99 wrote: »

1. With 2 instructions (can use existing GETCT instruction as CZI are not used) if you read the high first, that can disable Interrupts for one instruction. If you only need the lower then all is fine with no penalty.

2. With a smaller, say 48 bits, this provides advantages to do higher granularity simply. I see this as a more useful solution. For the rarer case where full granularity is required, then it's solvable by software. A 64 bit doesn't give this option without further software fiddling

From 1. above: So, with 64bits, only if you read the hi
		getct	lo			' current normal use (no hold interrupts)
...
		getct	hi			' just read the hi bits (holds off interrupts until next instruction executed)
		xxxx	  			' 
...
		getct	hi			' read the hi bits (holds off interrupts until next instruction executed)
		getct	lo			' read the lo bits

		setq	hi			'convert to seconds @250MHz
		qdiv	lo,##250_000_000
		getqx	seconds			'tops out at ~136 years of seconds
The hi now needs to +1 early rather than late (eg ~$FFFF_FFFE)

I like reading the top first and only shielding interrupts for that instruction. I don't understand how you get more time, though.
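For anyone wondering why the two halves have to be read as a pair, here is a toy Python model, not P2 code. The names `Counter64`, `naive_read` and `retry_read` are made up for illustration, and the model assumes the counter advances one tick per register read. It shows the generic software-only hi/lo/hi re-read trick, which is a different technique from the interrupt shielding and early-carry schemes discussed above.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


class Counter64:
    """Toy free-running 64-bit counter; every register read costs one tick."""

    def __init__(self, start):
        self.value = start & MASK64

    def _tick(self):
        self.value = (self.value + 1) & MASK64

    def read_lo(self):
        v = self.value & MASK32
        self._tick()
        return v

    def read_hi(self):
        v = self.value >> 32
        self._tick()
        return v


def naive_read(ct):
    """Read hi then lo with no protection; can tear across a lo rollover."""
    hi = ct.read_hi()
    lo = ct.read_lo()
    return (hi << 32) | lo


def retry_read(ct):
    """Re-read hi after lo and retry until both hi samples agree."""
    while True:
        hi1 = ct.read_hi()
        lo = ct.read_lo()
        hi2 = ct.read_hi()
        if hi1 == hi2:
            return (hi1 << 32) | lo


if __name__ == "__main__":
    start = 0x0000_0000_FFFF_FFFF  # lo is one tick away from rolling over
    print(hex(naive_read(Counter64(start))))
    print(hex(retry_read(Counter64(start))))
```

In this model the naive read lands 2^32 ticks behind the real count when lo rolls over between the two reads, which is the case the hardware proposals above are trying to make impossible.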

jmg · 2018-11-15 04:16

cgracey wrote: »

.... I don't understand how you get more time, though.

Do you mean the years ? Both numbers can apply, but to different things...

I think one number is the reach of 64 bits, whilst the other one is the 32bit reach, in seconds.
ie because when you divide 64b by 250M, it does not all fit into 32b
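A quick Python check of the two figures, as a sketch only. It assumes a 250 MHz clock and a 365.25-day year, and it models the divide as plain integer division rather than the actual CORDIC behaviour.

```python
CLOCK_HZ = 250_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Full 64-bit tick count converted to whole seconds.
max_seconds = (2**64 - 1) // CLOCK_HZ
quotient_bits = max_seconds.bit_length()

# Reach of the whole counter versus reach of a 32-bit seconds result.
years_64bit_counter = max_seconds / SECONDS_PER_YEAR
years_32bit_quotient = 2**32 / SECONDS_PER_YEAR

# The seconds value fits in 32 bits only while hi < CLOCK_HZ.
largest_ok = (((CLOCK_HZ - 1) << 32) | 0xFFFF_FFFF) // CLOCK_HZ
first_overflow = (CLOCK_HZ << 32) // CLOCK_HZ

if __name__ == "__main__":
    print(quotient_bits, round(years_64bit_counter), round(years_32bit_quotient))
```

So the ~2,338 years is how long the 64-bit count itself lasts, and the ~136 years is how long a 32-bit seconds value lasts.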

potatohead · 2018-11-15 04:18

This is sufficient. And it's easy.

And we've shielded stateful things from interrupts all over the place. IMHO, very, very good calls all of them, because overall chip complexity would have gone off the charts. If you ask me, that's one of the very best trade-offs made in this design cycle, and it's going to make P2 a notable chip, once people see it in action.

This simple thing fits right in with a whole lot of other simple things. Has my vote.

Another instruction would expand on this, and it's a different sort of simple thing, but not the very simplest thing. Honestly, people will pick up on how it is right now, and that's the dominant use case too.

Cluso99 · 2018-11-15 05:30

cgracey wrote: »

... I don't understand how you get more time, though.

64 bits = 4,294,967,296 * 4,294,967,296 = 1.844674407370955e+19
Now, in seconds at 250MHz = 1.844674407370955e+19 / 250,000,000 = 73,786,976,294.83821
In minutes = 73,786,976,294.83821 / 60 = 1,229,782,938.247303
In hours = 1,229,782,938.247303 / 60 = 20,496,382.30412172
In days = 20,496,382.30412172 / 24 = 854,015.9293384052
In years = 854,015.9293384052 / 365.25 (century non-leap years not accounted for) = 2,338.168184362506 years

Cluso99 · 2018-11-15 05:47

Chip,
As I understand this, there is a single counter running of 64 bits (was 32 bits) and this has to feed every cog. That is one hell of a bus of wires for little use. And it's clocking all the time on those long wires. I do hope the extra congestion doesn't cause timing or routing issues for OnSemi.

I am wondering how this could be simplified. As I said previously, I think 48 bits would be fine.

I wonder if the counter could be gated onto the I/O bus, or the HUB RAM bus, going to the cog(s) when being read?

Alternately, what about each cog having its own counter, or at least most of the counter's bits. I know the silicon for a big counter is a lot of flops, but surely this would be relevant compared to the bus, and would relieve the routing congestion.

To me, this no longer seems the simple solution we all thought it would be, as the ramifications seem much bigger than first thought.

cgracey · 2018-11-15 05:51

Cluso99 wrote: »

Chip,
As I understand this, there is a single counter running of 64 bits (was 32 bits) and this has to feed every cog. That is one hell of a bus of wires for little use. And it's clocking all the time on those long wires. I do hope the extra congestion doesn't cause timing or routing issues for OnSemi.

I am wondering how this could be simplified. As I said previously, I think 48 bits would be fine.

I wonder if the counter could be gated onto the I/O bus, or the HUB RAM bus, going to the cog(s) when being read?

Alternately, what about each cog having its own counter, or at least most of the counter's bits. I know the silicon for a big counter is a lot of flops, but surely this would be relevant compared to the bus, and would relieve the routing congestion.

To me, this no longer seems the simple solution we all thought it would be, as the ramifications seem much bigger than first thought.

There are some wires, but out of 64, only 2 change state, on average. It all goes into the wash. Nothing to get concerned about.
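The "only 2 change state, on average" figure can be checked with a small Python sketch. It assumes a plain binary up-counter (not Gray code or anything else), and `average_toggles` is a made-up helper name.

```python
def average_toggles(width, increments, start=0):
    """Average number of bits that flip per increment of a binary counter."""
    mask = (1 << width) - 1
    value = start & mask
    toggles = 0
    for _ in range(increments):
        nxt = (value + 1) & mask
        # XOR marks the bits that differ between consecutive counts.
        toggles += bin(value ^ nxt).count("1")
        value = nxt
    return toggles / increments


if __name__ == "__main__":
    print(average_toggles(64, 65536))
```

Bit 0 flips every tick, bit 1 every second tick, bit 2 every fourth, and so on, so the average tends to 1 + 1/2 + 1/4 + ... = 2 regardless of the counter width.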

Peter Jakacki · 2018-11-15 06:01

With CT incrementing for every tick of a 320MHz clock we will have to wait until the year 3844 for it to roll over.

-1 U. 18446744073709551615  ok
$7FFFFFFFFFFFFFFF DUP . 9223372036854775807  ok
DUP 320000000 / DUP . 28823037615  ok
3600 / DUP . 8006399  ok
24 / DUP . 333599  ok
365 / DUP . 913  ok
913 2018 + . 2931  ok
913 2* . 1826  ok
1826 2018 + . 3844  ok
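The same steps in Python, for anyone who doesn't read Forth. This is a sketch that mirrors the truncating integer divisions above (365-day years, starting from 2018); `forth_style_years` is a made-up name. It also runs the full unsigned 2^64 range directly instead of doubling the signed half.

```python
CLOCK_HZ = 320_000_000
START_YEAR = 2018


def forth_style_years(ticks, hz=CLOCK_HZ):
    """Ticks -> seconds -> hours -> days -> 365-day years, truncating each step."""
    seconds = ticks // hz
    hours = seconds // 3600
    days = hours // 24
    return days // 365


# Half range (largest signed 64-bit value), then doubled as in the post.
half_range_years = forth_style_years(0x7FFF_FFFF_FFFF_FFFF)
doubled_years = half_range_years * 2

# Full unsigned range done in one pass, so truncation happens only once per step.
full_range_years = forth_style_years(2**64 - 1)

if __name__ == "__main__":
    print(START_YEAR + doubled_years, START_YEAR + full_range_years)
```

The one-year difference between the two results is only integer truncation: doubling 913 loses the fraction that was dropped before the doubling.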

Electrodude · 2018-11-15 06:06

Would it save routing to just give each cog its own counter and make sure they're always synchronized on reset?

cgracey · 2018-11-15 06:20

Electrodude wrote: »

Would it save routing to just give each cog its own counter and make sure they're always synchronized on reset?

No, because then we have to recreate the counter logic in each cog. Better to send wires.

kwinn · 2018-11-15 20:38

Peter Jakacki wrote: »
With CT incrementing for every tick of a 320MHz clock we will have to wait until the year 3844 for it to roll over.
-1 U. 18446744073709551615  ok
$7FFFFFFFFFFFFFFF DUP . 9223372036854775807  ok
DUP 320000000 / DUP . 28823037615  ok
3600 / DUP . 8006399  ok
24 / DUP . 333599  ok
365 / DUP . 913  ok
913 2018 + . 2931  ok
913 2* . 1826  ok
1826 2018 + . 3844  ok

So no worries about a Y2K type problem for a while then.

AJL · 2018-11-15 22:32

kwinn wrote: »
Peter Jakacki wrote: »
With CT incrementing for every tick of a 320MHz clock we will have to wait until the year 3844 for it to roll over.
-1 U. 18446744073709551615  ok
$7FFFFFFFFFFFFFFF DUP . 9223372036854775807  ok
DUP 320000000 / DUP . 28823037615  ok
3600 / DUP . 8006399  ok
24 / DUP . 333599  ok
365 / DUP . 913  ok
913 2018 + . 2931  ok
913 2* . 1826  ok
1826 2018 + . 3844  ok
So no worries about a Y2K type problem for a while then.

Not questioning the reliability of the P2, but I seriously doubt that one will run for more than a thousand years between reset events :-)

cgracey · 2018-11-15 22:43

AJL wrote: »
kwinn wrote: »
Peter Jakacki wrote: »
With CT incrementing for every tick of a 320MHz clock we will have to wait until the year 3844 for it to roll over.
-1 U. 18446744073709551615  ok
$7FFFFFFFFFFFFFFF DUP . 9223372036854775807  ok
DUP 320000000 / DUP . 28823037615  ok
3600 / DUP . 8006399  ok
24 / DUP . 333599  ok
365 / DUP . 913  ok
913 2018 + . 2931  ok
913 2* . 1826  ok
1826 2018 + . 3844  ok
So no worries about a Y2K type problem for a while then.
Not questioning the reliability of the P2, but I seriously doubt that one will run for more than a thousand years between reset events :-)

I think the part would fail from eventual electromigration effects after 100 years, if it were run at high speed. If you ran it at 1MHz, you might make it to the end of the 64-bit counter. Of course, that would take even longer.

Peter Jakacki · 2018-11-15 22:43

AJL wrote: »
kwinn wrote: »
Peter Jakacki wrote: »
With CT incrementing for every tick of a 320MHz clock we will have to wait until the year 3844 for it to roll over.
-1 U. 18446744073709551615  ok
$7FFFFFFFFFFFFFFF DUP . 9223372036854775807  ok
DUP 320000000 / DUP . 28823037615  ok
3600 / DUP . 8006399  ok
24 / DUP . 333599  ok
365 / DUP . 913  ok
913 2018 + . 2931  ok
913 2* . 1826  ok
1826 2018 + . 3844  ok
So no worries about a Y2K type problem for a while then.
Not questioning the reliability of the P2, but I seriously doubt that one will run for more than a thousand years between reset events :-)

and that's the absurdity of having a full 64-bits since we would never use it in a thousand years, literally.

cgracey · 2018-11-15 22:44

Peter Jakacki wrote: »
AJL wrote: »
kwinn wrote: »
Peter Jakacki wrote: »
With CT incrementing for every tick of a 320MHz clock we will have to wait until the year 3844 for it to roll over.
-1 U. 18446744073709551615  ok
$7FFFFFFFFFFFFFFF DUP . 9223372036854775807  ok
DUP 320000000 / DUP . 28823037615  ok
3600 / DUP . 8006399  ok
24 / DUP . 333599  ok
365 / DUP . 913  ok
913 2018 + . 2931  ok
913 2* . 1826  ok
1826 2018 + . 3844  ok
So no worries about a Y2K type problem for a while then.
Not questioning the reliability of the P2, but I seriously doubt that one will run for more than a thousand years between reset events :-)
and that's the absurdity of having a full 64-bits since we would never use it in a thousand years, literally.

If we were to leave for a while at near-light-speed, we'd like to come back later and see that it was still working, though.

jmg · 2018-11-15 23:07

Peter Jakacki wrote: »

and that's the absurdity of having a full 64-bits since we would never use it in a thousand years, literally.

It can of course be made smaller, with small but finite routing and register savings.
10 years is probably too small, but ~100 could be ok, which comes in at a round 60 bits.
If the local eggbeater spins 3 LSBs, always in sync, you can also save routing those 3 bits, so there is scope to maybe shave 4+3 = 7 bits off the total route needs.

kwinn · 2018-11-15 23:18

Well, that comment was made with tongue firmly in cheek. Peter does have a point as far as 64 bits providing an absurdly long count. An additional 16 or even 8 bits would have been more than adequate for most things, although I suspect either one would not be all that much simpler than the 32 bit version. Perhaps "better to have it and not need it..." applies in this case.

jmg · 2018-11-15 23:26

kwinn wrote: »

... An additional 16 or even 8 bits would have been more than adequate for most things...

Not really, if you want a useful time-since-reset, you do not want that to wrap inside any sensible time. Another 8 bits gives up-time wraps every 1 hour!
Even 16 bits only nudges you out to 10 days. Both would need additional software and some time-manager COG allocated.
As my numbers indicated, you can decrease from 64b, but not by very much (~ 60 bits).
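The wrap times for a few counter widths can be tabulated with a short Python sketch. It assumes a 320 MHz clock, the figure used earlier in the thread, and a 365.25-day year; `wrap_seconds` is a made-up helper name.

```python
CLOCK_HZ = 320_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def wrap_seconds(bits, hz=CLOCK_HZ):
    """Seconds until a free-running counter of the given width wraps."""
    return 2**bits / hz


if __name__ == "__main__":
    for bits in (32, 40, 48, 60, 64):
        s = wrap_seconds(bits)
        print(f"{bits} bits: {s:.3g} s = {s / 3600:.3g} h = "
              f"{s / 86400:.3g} days = {s / SECONDS_PER_YEAR:.3g} years")
```

At this clock, 40 bits wraps in a little under an hour, 48 bits in about 10 days, and 60 bits in a bit over a century, which matches the figures quoted above.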

cgracey · 2018-11-15 23:28

jmg wrote: »

kwinn wrote: »

... An additional 16 or even 8 bits would have been more than adequate for most things...

Not really, if you want a useful time-since-reset, you do not want that to wrap inside any sensible time. Another 8 bits gives up-time wraps every 1 hour!
Even 16 bits only nudges you out to 10 days. Both would need additional software and some time-manager COG allocated.
As my numbers indicated, you can decrease from 64b, but not by very much (~ 60 bits).

If it were any less than 64 bits, it would seem miserly. The next step after 32 is 64.

ersmith · 2018-11-15 23:31

cgracey wrote: »

jmg wrote: »

kwinn wrote: »

... An additional 16 or even 8 bits would have been more than adequate for most things...

Not really, if you want a useful time-since-reset, you do not want that to wrap inside any sensible time. Another 8 bits gives up-time wraps every 1 hour!
Even 16 bits only nudges you out to 10 days. Both would need additional software and some time-manager COG allocated.
As my numbers indicated, you can decrease from 64b, but not by very much (~ 60 bits).

If it were any less than 64 bits, it would seem miserly. The next step after 32 is 64.

We could present it in the API as if it were 64 bits, but leave the top N bits hardcoded to 0. (This would have to be documented, but I doubt anyone would complain if the counter were restricted to, say, 100 years worth of cycles.) That's probably not going to be a huge saving, but it's something to consider if routing turns out to be tricky.

cgracey · 2018-11-15 23:37

ersmith wrote: »

cgracey wrote: »

jmg wrote: »

kwinn wrote: »

... An additional 16 or even 8 bits would have been more than adequate for most things...

Not really, if you want a useful time-since-reset, you do not want that to wrap inside any sensible time. Another 8 bits gives up-time wraps every 1 hour!
Even 16 bits only nudges you out to 10 days. Both would need additional software and some time-manager COG allocated.
As my numbers indicated, you can decrease from 64b, but not by very much (~ 60 bits).

If it were any less than 64 bits, it would seem miserly. The next step after 32 is 64.

We could present it in the API as if it were 64 bits, but leave the top N bits hardcoded to 0. (This would have to be documented, but I doubt anyone would complain if the counter were restricted to, say, 100 years worth of cycles.) That's probably not going to be a huge saving, but it's something to consider if routing turns out to be tricky.

Those three bits are a drop in the ocean, amid everything else in there.

jmg · 2018-11-16 00:09

ersmith wrote: »

We could present it in the API as if it were 64 bits, but leave the top N bits hardcoded to 0. (This would have to be documented, but I doubt anyone would complain if the counter were restricted to, say, 100 years worth of cycles.) That's probably not going to be a huge saving, but it's something to consider if routing turns out to be tricky.

Exactly. They have to prove it is not 64 bits first

There is no law that says you have to implement in quanta of 32 bits, eg I see parts with 24b counters.

cgracey wrote: »

Those three bits are a drop in the ocean, amid everything else in there.

Perhaps, but it looks like you can save 4+3 bits of routing, and it all adds up. At some stage, all the added stuff will start to push down system clock speeds.

Rayman · 2018-11-16 01:43

Ok, 64 bits does seem a bit excessive...

Maybe there's something clever that can be done with upper bits?

Perhaps upper 5 bits get inc'd whenever an interrupt is called?
There must be something fun here...

Or, maybe it gets inc'd whenever in a wait state?

Ok, maybe don't need to be 100% efficient. Maybe 2000+ years of counter life is cool...

Or, upper bits get inc'd by internal RC oscillator?

jmg · 2018-11-16 01:50

Rayman wrote: »

Ok, 64 bits does seem a bit excessive...

Maybe there's something clever that can be done with upper bits?

Perhaps upper 5 bits get inc'd whenever an interrupt is called?
There must be something fun here...

There might be, but that would dictate adding masking for normal use comparisons.
I'd be fine with saving routing to 60 (or 57) lines, and reading undefined as 0000.

Rayman wrote: »

Or, upper bits get inc'd by internal RC oscillator?

hehe, if that were possible, we could measure RC osc using the crystal.... Sadly, no..

Rayman · 2018-11-16 01:54

Well, people are amazed at some 200 year old light bulb in fire station...

Maybe more amazing will be 2k year old p2 system showing count on lcd...

Wait: the forever clock! Already exist?

Phil Pilgrim (PhiPi) · 2018-11-16 02:40

jmg wrote:

10 years is probably too small, but ~100 could be ok

cgracey wrote:

If it were any less than 64 bits, it would seem miserly. The next step after 32 is 64.

Oh, please! This is just nuts! In the P1, we've been more than happy with a 53-second rollover. I still submit that 32 bits is plenty, regardless of the clock speed. It has nothing to do with real time, only the number of clock ticks it takes to deal with rollover in software. And 2³² ticks is more than enough.
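The software rollover handling referred to here is usually just a modulo-2^32 subtraction. A minimal Python sketch follows, with the 80 MHz P1 clock taken as an assumption inferred from the 53-second figure, and `elapsed_ticks` as a made-up name.

```python
MASK32 = 0xFFFF_FFFF
P1_CLOCK_HZ = 80_000_000  # assumption: the clock that gives a ~53 s rollover


def elapsed_ticks(start, now):
    """Ticks between two 32-bit counter samples, correct across one rollover."""
    return (now - start) & MASK32


rollover_seconds = 2**32 / P1_CLOCK_HZ

if __name__ == "__main__":
    # The start sample is taken just before the counter wraps, the second just after.
    print(elapsed_ticks(0xFFFF_FFF0, 0x0000_0010))
    print(rollover_seconds)
```

This only works while the interval being measured is shorter than one full rollover period; anything longer needs extra bookkeeping in software.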

At this rate, the P2 is never going to get finished! Chip, and his forumista enablers (yes, "enablers", since mission creep is a form of addiction for Chip), what the hell are you thinking?!!

-Phil

Dave Hein · 2018-11-16 03:07

Phil, the P2 will be finished at some point, but it may take an extra round of silicon to fix the bugs that may get introduced with the new features. In the meantime, we'll be able to play around with the P2 from the first round of silicon. This will be obsoleted by the second round of silicon, which may end up being obsoleted by the third round of silicon. Eventually, there will be a stable version of the P2.

potatohead · 2018-11-16 03:17

I'm not overly concerned. I'm also not really advocating for new features, but I'm not going to balk at them. I think Chip knows which domains worked well and which didn't, and frankly what he wrote worked, minus interpretation differences in the tools.

Worst case, we do have a working mask set. If people get impatient, or use cases present themselves, OnSemi can be asked to make those, and it can work.

So we have one revision of a P2. It can be made into additional chips.
