Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

evanh · 2018-01-31 11:42

F=A isn't going to show me anything!

Why should one Smartpin/pin combo see different speeds to another Smartpin/pin combo? You can't tell me the drive circuits differ that much.

cgracey · 2018-01-31 11:44

Well, at 80MHz, you might be on the cusp of pass/fail, so you may see variances of one count.

Maybe I'm not getting something.

evanh · 2018-01-31 11:44

cgracey wrote: »

It looks like my timing assignments are absorbed, finally, so I'm recompiling for the Prop123-A9 with 8 cogs, 512KB, and 64 smart pins. I'll have this posted in the morning.

Cool. Looking forward to it.

I'm off to bed now myself.

cgracey · 2018-01-31 22:02

Here is a new Prop123_A9_Prop2_8cogs_v31a:

https://drive.google.com/file/d/1m2vqJys_FjMVEhIH5z5yKpKuw-0kbWSk/view?usp=sharing

This file is using OnSemi configuration, where only the RESn, XI, XO, and P[63:0] come out. This means every pin is mapped directly to the pin on the Prop123_A9. There are no read-only buttons, or anything like that. Just pins. No LEDs, even. Just pins.

For the clock setting:

HUBSET #7 '80 MHz
HUBSET #0 '20 MHz
HUBSET #1 '20 kHz

This has 1ns timing constraints on P[63:0] inputs and outputs. Not sure if it will solve this 80MHz I/O problem, but let's see.

evanh · 2018-01-31 22:24

Huh, well, that's a result, and totally consistent now. It seems to have forced all the Smartpins further away from the physical pins. Every pin from 0 to 60 is identical in its Smartpin-measured rise and fall times.

At 20 MHz:
- With registered pins, it's 3 clocks per transition.
- Without registered pins, it's 1 clock per transition.

At 80 MHz:
- With registered pins, it's 4 clocks per transition.
- Without registered pins, it's 2 clocks per transition.
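To compare the two clock rates directly, the clock counts can be converted into nanoseconds. A minimal Python sketch, assuming the HUBSET settings give exactly 20 MHz and 80 MHz; `clocks_to_ns` is a hypothetical helper name used only for illustration:

```python
def clocks_to_ns(clocks, freq_mhz):
    """Convert a count of system clocks into nanoseconds at freq_mhz."""
    period_ns = 1000.0 / freq_mhz  # one clock period in ns
    return clocks * period_ns

# Measured clocks per transition (v31a): (freq_mhz, registered, unregistered)
measurements = [(20, 3, 1), (80, 4, 2)]

for freq, reg, unreg in measurements:
    print(f"{freq} MHz: registered {clocks_to_ns(reg, freq):.1f} ns, "
          f"unregistered {clocks_to_ns(unreg, freq):.1f} ns")
# 20 MHz: registered 150.0 ns, unregistered 50.0 ns
# 80 MHz: registered 50.0 ns, unregistered 25.0 ns
```

On these numbers, the unregistered case at 80 MHz spans 25 ns, which is two 12.5 ns clock periods.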

evanh · 2018-01-31 22:33

I was hoping for the opposite swing direction. I guess there is some timing constraint pulling the Smartpins towards the Cogs and this has just added to it.

cgracey · 2018-01-31 23:04

Thanks for checking that, Evanh.

So, there were some faster paths, by chance, before. Now we have equal misery.

evanh · 2018-01-31 23:15

cgracey wrote: »

Now we have equal misery.

I still don't think it should be so. It'll just mean that at 60 MHz or thereabouts we'll get uneven results again.

cgracey · 2018-01-31 23:17

evanh wrote: »

I still don't think it should be so. It'll just mean that at 60 MHz or thereabouts we'll get uneven results again.

That's true. The constraints just tightened up the region of uncertainty.

evanh · 2018-01-31 23:29

I can't see why something as small as a Smartpin (and so directly associated with a single pin) has to deal with such long latencies. It's almost as if the OUT signal from the Cog/Hub takes two separate routes, one to the Smartpin and one to the physical pin.

Seairth · 2018-02-01 00:21

btw, does any of this affect the expected maximum frequency on the final silicon? Are we still looking at 160MHz?

evanh · 2018-02-01 00:43

Seairth wrote: »

btw, does any of this affect the expected maximum frequency on the final silicon? Are we still looking at 160MHz?

I think Chip mentioned "unrestrained" when talking about Cog OUTs, meaning they have no relation to clock speed.

The I/O pins aren't a data bus, so they aren't required to meet any particular timing limits. Hence the unpredictable readings when measuring a change coming from the Cogs. So this area of focus could be going at 500 MHz and not be any more upset.

cgracey · 2018-02-01 01:07

Seairth wrote: »

btw, does any of this affect the expected maximum frequency on the final silicon? Are we still looking at 160MHz?

Yes, 160MHz.

cgracey · 2018-02-01 01:18

evanh wrote: »

I think Chip mentioned "unrestrained" when talking about Cog OUTs, meaning they have no relation to clock speed.

The I/O pins aren't a data bus, so they aren't required to meet any particular timing limits. Hence the unpredictable readings when measuring a change coming from the Cogs. So this area of focus could be going at 500 MHz and not be any more upset.

Remember that for an OUT to drive a pin to a state, and then that pin voltage to drive the return IN signal, it takes time. I believe this is why we are seeing an extra clock.

I don't know if this is being understood, so I'll detail it. Remember, each of these takes time, particularly when the external pin is involved:

1) The OUT bit changes in the core and propagates to the I/O pin.
2) The I/O pin receives the OUT state and drives the bond pad accordingly.
3) The external I/O pin transitions, overcoming parasitic capacitance on the circuit board that is MANY times higher than the internal nodes.
4) The I/O pad input circuit sees the voltage change on the bond pad, and that effects a change in the read state.
5) The changed read state propagates back into the core.

This is why we are seeing an extra cycle at higher clock rates. We are wrongly assuming that we have time to output and input within the same clock.
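The five steps above can be modelled as a chain of delays that must all elapse before the IN sample sees the change. A rough Python sketch, assuming a simplified model in which OUT changes on one clock edge and IN is registered on the first edge at or after the change arrives back; `clocks_for_round_trip` and the per-stage delays in the example are hypothetical placeholders, not measured values:

```python
import math

def clocks_for_round_trip(stage_delays_ns, freq_mhz):
    """Clock edges until a registered IN reflects an OUT change.

    Simplified model: the OUT change ripples through every stage in turn,
    and IN is sampled on the first edge at or after it arrives.
    Setup time and clock skew are ignored.
    """
    total_ns = sum(stage_delays_ns)
    period_ns = 1000.0 / freq_mhz
    return max(1, math.ceil(total_ns / period_ns))

# Hypothetical delays for steps 1) to 5), in ns (placeholders only)
stages = [1.0, 1.5, 2.0, 1.0, 1.0]  # sums to 6.5 ns

print(clocks_for_round_trip(stages, 80))   # 1: fits in a 12.5 ns period
print(clocks_for_round_trip(stages, 200))  # 2: exceeds a 5 ns period
```

The same total delay that fits within one clock at a low rate spills into an extra cycle once the clock period shrinks below it.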

evanh · 2018-02-01 01:27

Yeah, I'm not convinced that that is anywhere near enough. Those FPGA pins will be fast acting.

It seems a Smartpin should be making use of its potential for geolocal placement next to the physical pin. If the FPGA software doesn't care to do this, will the ASIC software care?

cgracey · 2018-02-01 01:50

evanh wrote: »

Yeah, I'm not convinced that that is anywhere near enough. Those FPGA pins will be fast acting.

It seems a Smartpin should be making use of its potential for geolocal placement next to the physical pin. If the FPGA software doesn't care to do this, will the ASIC software care?

I don't think that placement is an issue. Remember that those pin input signals are all registered immediately in the core, before going to smart pins or cogs. And I set all their timing constraints to 1ns, which is about as low as you can go.

It's the outputs that take a long time to make a round trip. To output to a pin, very weak core signals must be amplified ~40x, through a cascade of increasingly-large inverters, just to drive the main output transistors. Then, that pin begins to slew, dragging all the PCB capacitance along with it. Then, the input receiver has to see the transition for a change in read-state to occur. Without any pin/PCB loading, it takes our custom Prop2 pins almost 5ns to make this turn-around in the worst-case corner.

If you were to stimulate the smart pins via sensed pin states, and not use the OUT bit as a direct input, you should see expected cycle counts up to 80MHz. I think because you are XOR'ing an internal signal against a round-trip signal, you are seeing an extra clock. If you are all internal, or all external, the timing should be as expected. Mixing the two is causing disparity. My thinking, anyway.

cgracey · 2018-02-01 01:59

Look at these I/O timings.

It looks to me like it's taking ~10ns to get the input states registered. That leaves only a few nanoseconds for the pin turn-around.

Maybe I can place some constraint to cause the input signals to get registered sooner.
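To put that in numbers: whatever part of a clock period the input path consumes is no longer available for the pin turn-around. A minimal Python sketch, assuming Chip's ~10 ns input-path estimate and ignoring setup time and clock skew; `turnaround_budget_ns` is a hypothetical helper:

```python
def turnaround_budget_ns(freq_mhz, input_path_ns=10.0):
    """Time left in one clock period after the input path, in ns."""
    period_ns = 1000.0 / freq_mhz
    return period_ns - input_path_ns

for freq in (20, 40, 80):
    print(f"{freq} MHz: {turnaround_budget_ns(freq):.1f} ns left")
# 20 MHz: 40.0 ns left
# 40 MHz: 15.0 ns left
# 80 MHz: 2.5 ns left
```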

cgracey · 2018-02-01 02:16

Evanh, could you please do your test with clocking enabled (P[8] = 1)?

I'm thinking that might use the register in the FPGA pin, itself.

Apparently, Quartus cannot be forced to use the input register inside the actual pin, but WILL do so from inference if the pin signal is being registered without intervening logic. That is the case when P[8] = 1. Let's see what that does.

evanh · 2018-02-01 03:14

cgracey wrote: »

Evanh, could you please do your test with clocking enabled (P[8] = 1)?

Those are the "registered" results I posted above (registering adds 2 extra clocks):

v31a with the extra 1ns constraints:
At 20 MHz:
- With registered pins, it's 3 clocks per transition.
- Without registered pins, it's 1 clock per transition.
At 80 MHz:
- With registered pins, it's 4 clocks per transition.
- Without registered pins, it's 2 clocks per transition.

v31 without those constraints:
At 40 MHz:
- With registered pins, it's 3 clocks per transition.
- Without registered pins, it's 1 clock per transition.
At 80 MHz:
- With registered pins, it's 3 or 4 clocks per transition.
- Without registered pins, it's 1 or 2 clocks per transition.

cgracey · 2018-02-01 03:21

Sorry, Evanh. I saw your results, but then forgot that you had reported for both modes, already.

evanh · 2018-02-02 00:49

cgracey wrote: »

... Remember that those pin input signals are all registered immediately in the core, before going to smart pins or cogs. ...

Oh, I'd read that as the outputs registered in the hub, not the inputs in the core. When you say "core", what does that mean? Maybe this could be preventing the dedicated input flops from being used?

EDIT: I suppose that's why you asked to check the use of the extra registering, so that it would use those dedicated flops ... which didn't appear to make any difference ... which raises the question of why the inputs seem so slow.

Rayman · 2018-02-03 19:06

Wow, more than a month without an update! That's what I call progress.
Maybe the next milestone is when Chip says Spin2 is done...

evanh · 2018-02-04 07:09

cgracey wrote: »

Look at these I/O timings.

It looks to me like it's taking ~10ns to get the input states registered. That leaves only a few nanoseconds for the pin turn-around.

Maybe I can place some constraint to cause the input signals to get registered sooner.

Chip,
I've had a shot at designing something using Quartus. It's certainly a learning curve! I've done the simplest thing I can think of for this test: chaining a bunch of I/O together in a ring with one inline inverter.

I imported your TOP.QSF file from the P123_A9 board files. Compiling my design, I get 275 warnings! There's no optimiser either; the TimeQuest package didn't want to install. Not that I think it would make any difference to this test.

Some of the warnings are: no clocks defined, no constraints, and a missing "top" entity. That last one will be from your imported parameters, since I've named the top entity differently. But the bulk of the warnings were due to not using all the assigned pins.

Anyway, after Googling how to build an RBF file and using PX.EXE to load it into the FPGA, I got the 12 pins toggling high and low at about 10 MHz: 50 ns for a rise to go all the way round, followed by another 50 ns for the subsequent fall.

So that's just over 4 ns of output+input propagation per pin.
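The per-pin figure follows from treating the chain as a ring oscillator: each half of the oscillation period is one full lap of an edge around the ring. A minimal Python sketch, assuming the oscillation is exactly 10 MHz and that the inline inverter's delay is simply shared out across the 12 pin stages; `ring_stage_delay_ns` is a hypothetical helper:

```python
def ring_stage_delay_ns(osc_freq_mhz, stages):
    """Average delay per stage of a ring oscillator, in ns.

    An edge must travel once around the ring to flip the output, so each
    half of the oscillation period equals one full lap of the ring.
    """
    period_ns = 1000.0 / osc_freq_mhz
    lap_ns = period_ns / 2  # one rise (or fall) all the way round
    return lap_ns / stages

print(f"{ring_stage_delay_ns(10, 12):.2f} ns per pin")  # 4.17 ns per pin
```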

Attached is the schematic for the design.

cgracey · 2018-02-04 09:13

evanh wrote: »

Chip,
I've had a shot at designing something using Quartus. It's certainly a learning curve! I've done the simplest thing I can think of for this test: chaining a bunch of I/O together in a ring with one inline inverter.

I imported your TOP.QSF file from the P123_A9 board files. Compiling my design, I get 275 warnings! There's no optimiser either; the TimeQuest package didn't want to install. Not that I think it would make any difference to this test.

Some of the warnings are: no clocks defined, no constraints, and a missing "top" entity. That last one will be from your imported parameters, since I've named the top entity differently. But the bulk of the warnings were due to not using all the assigned pins.

Anyway, after Googling how to build an RBF file and using PX.EXE to load it into the FPGA, I got the 12 pins toggling high and low at about 10 MHz: 50 ns for a rise to go all the way round, followed by another 50 ns for the subsequent fall.

So that's just over 4 ns of output+input propagation per pin.

Attached is the schematic for the design.

Wow! I'm glad you thought to try that out.

So, 4ns per pin. That timing report I posted showed a full 10ns per pin for time spent in the routing fabric. I think those two numbers might add together to make 14ns, or ~71MHz. Does that seem likely? Perhaps Quartus effectively winds up adding delay to make I/Os just right for meeting timing? I don't know. What do you think?
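That estimate can be checked by adding the two delays and inverting the sum. A minimal Python sketch, assuming (as Chip hypothesises, not as measured) that the 10 ns report figure is time actually spent in the routing fabric and that it simply adds to the ~4 ns pin round trip; `max_single_cycle_mhz` is a hypothetical helper:

```python
def max_single_cycle_mhz(*delays_ns):
    """Highest clock (MHz) at which the summed path fits in one period."""
    return 1000.0 / sum(delays_ns)

pin_round_trip_ns = 4.0  # per-pin estimate from the ring-oscillator test
fabric_ns = 10.0         # routing-fabric figure from the timing report

f = max_single_cycle_mhz(pin_round_trip_ns, fabric_ns)
print(f"{f:.1f} MHz")  # 71.4 MHz, below the 80 MHz test clock
```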

evanh · 2018-02-04 09:37

cgracey wrote: »

Perhaps Quartus effectively winds up adding delay to make I/Os just right for meeting timing? I don't know. What do you think?

Surely not, but I couldn't really venture an answer to that. Trying to work out what happens in that mass of software called Quartus is a lifetime's work, methinks.

cgracey · 2018-02-04 17:51

evanh wrote: »

Surely not, but I couldn't really venture an answer to that. Trying to work out what happens in that mass of software called Quartus is a lifetime's work, methinks.

What do you make of the timing report I posted earlier?

It looks like Quartus had to improve the routing, just to meet the timing requirements.

Yanomani · 2018-02-04 20:47

Hi Chip

Where does the 1 ns delay fit among those reported timings, from the clock edge to a valid output level at the physical pin, for the last 3.3 V-domain flip-flop, whose insertion is selected by the %C bit value during WRPIN execution?

evanh · 2018-02-04 23:11

cgracey wrote: »

It looks like Quartus had to improve the routing, just to meet the timing requirements.

That was my early thinking. It just didn't seem to be trying to make quick I/O timing. At this stage it's still just a hunch.

I don't know what that report is really saying. 10.000 exactly, for everything, seems more like a setting than a simulation result. I'd like to do the same at my end and try to make sense of what is what.

Where in the menus is the report?

cgracey · 2018-02-06 06:09

evanh wrote: »

That was my early thinking. It just didn't seem to be trying to make quick I/O timing. At this stage it's still just a hunch.

I don't know what that report is really saying. 10.000 exactly, for everything, seems more like a setting than a simulation result. I'd like to do the same at my end and try to make sense of what is what.

Where in the menus is the report?

I used some selection to show all I/O timing. It was near the top of the list.

The 10.0 is nanoseconds. That is the timing goal I gave it.
