Understanding timing analysis
Seairth
Posts: 2,474
I am guessing that most of my issues understanding TimeQuest timing analysis stem from a lack of experience. I've attached the SDC file that I created, but I am a bit unsure whether it's correct. Feedback would be appreciated.
DE0-Nano.sdc.7z
Assuming that it is correct, I see that the Video PLLs are consistently an issue. Which leads me to the following question: for those of you who use video on the P1, how often do you need more than one Video PLL? Would it be feasible to move it out of CTR (of every cog) and have a single one that's global and configured via a hubop? This would reduce per-cog logic (252 registers, to start) and get rid of seven global clock lines. All cogs would share the single, global Video PLL.
Note, however, that I'm still not sure if it would fix any timing issues. (To whoever it was that removed the video functions altogether, how much of a difference did it make to fMax?)
2014-09-25: I have updated the attachment with a version that appears to be correct. Well, it allows the Nano build to compile without any TimeQuest errors. My guess is that the same file will also work for the DE2-115 and the BEMicro CV (just rename and copy).
2014-09-26: Added documentation links that are also posted later in the thread:
Quartus II Vol 2-6: Timing Analysis Overview
Quartus II Vol 2-7: The Quartus II TimeQuest Timing Analyzer
SDC and TimeQuest API Reference Manual
TimeQuest User Guide.pdf
Comments
I'm happy to see you made your own fork, Seairth! Thanks for joining!
===Jac
TimeQuest User Guide.pdf
I'm still working my way through it. It's definitely worth the read if you are going to do any timing analysis at all.
Unravelling the mysteries of Quartus is quite time consuming.
Not nearly enough spare Nanoseconds in the day!
Maybe tomorrow I will publish a graphic VGA driver using SRAM. Today it started to work after a lot of experiments and using Polish equivalents of all English 4-letter words... and some other bad Polish words which (I think) have no one-word English equivalent. The timing problems were awful. The state machine hung, the addresses were wrong, and then the barrel shifter of the P1V was too slow. Trying to use optimizations made compilation several times slower (from 6..8 to 20..30 minutes) without any visible result. So I had to slow the Propeller to 106.25 MHz, using the VGA pixel clock and saving one PLL. Then I used some weird tricks to make the signal reach its destination in time - even if this time was one cycle later, so that if the state machine hung at counter==0, it could run again at counter==1.
So maybe there is a method to speed up what is critical to speed up using this tool. This barrel shifter starts to make random errors over 110 MHz, where the rest of the Propeller works @ 140 MHz. This needs to be optimized. The first try failed - my version of the shifter using a case statement was even slower than Chip's (and he simply used the >> operator). So maybe using timing constraints can make Quartus compile this barrel shifter to run within these 7 nanoseconds, and put some signals together in less than 9 nanoseconds to make the state machine work with the 106 MHz pixel clock.
Take a look at the first post in this thread. There is one in the attachment.
Hi pik33,
Can you tell me: when you were getting these random barrel shifter errors, did the Quartus FMAX reports for the hot/cold temperature range etc. say that the maximum frequency of the design was expected to be over 110MHz? If so, I am wondering if we can have faith in these numbers without a detailed SDC constraint file setup...I was sort of hoping we wouldn't need to dive that deep into it.
I've just browsed through the TimeQuest stuff Seairth sent the link to. It looks somewhat complex, but I'd be surprised if we need to dig down into every little nitty-gritty detail just to get a basic first-order estimate of the maximum clock rate. I would have hoped, for example, that the setup/hold timing within the paths used internally by the P1V design was already fixed by the Altera device characteristics and by how Quartus fits the RTL, so that it could use that in its calculations. Unfortunately it's still early days, so I don't really know enough about it yet to make any informed decisions there. There are probably FPGA veterans saying no, it doesn't work like that at all and you need to put timing constraints everywhere to get even a moderately useful FMAX analysis. I was just hoping it wasn't that complicated, but often these days it really is.
Roger.
Here's a run-down of the SDC:
- create_clock is used to define the 50MHz base clock and a virtual clock to be used with the I/O constraints.
- derive_pll_clocks causes TimeQuest to find the PLLs and call create_generated_clock for each one (just one, in this case).
- create_generated_clock is used to define the cog_clk.
- create_generated_clock is used to define the video clock (for each tap) that can be generated by each cog.
- set_input_delay and set_output_delay provide constraints for the I/O. I just left the min/max values at 0, since I have no idea what is going to be externally connected to those pins (and the LEDs are irrelevant, I think). This is also why I defined a virtual clock earlier.
- set_clock_groups is used to make sure that TimeQuest doesn't assume that there are paths between each of the defined clocks. Clocks in the same group are related, while clocks in different groups are not. For instance, the video clock taps are multiplexed, so there is no relationship between each of the taps.
- set_false_path is an alternative approach to set_clock_groups. TimeQuest saw that there was a relationship between the virtual clock and ctra outputs. I didn't want that relationship evaluated, so I indicated that it was a false path.
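For anyone who hasn't opened the attachment, the run-down above can be sketched roughly like this. It is not the contents of DE0-Nano.sdc: every port, pin, register and clock name here is a made-up placeholder, and the divide ratios and the choice of -exclusive are assumptions, so treat it only as an outline of which command does what.

```tcl
# Hedged sketch only. All names below (clock_50, virt_clk, io[*], the pll and
# cog0 hierarchy paths, cog0_vid_tap*) are hypothetical placeholders.

# 50 MHz base clock (20 ns period) and a virtual clock for the I/O constraints.
create_clock -name clock_50 -period 20.000 [get_ports {clock_50}]
create_clock -name virt_clk -period 20.000

# Let TimeQuest find the PLLs and create a generated clock for each output.
derive_pll_clocks

# cog_clk as a generated clock; the source pin and divide ratio are assumed.
create_generated_clock -name cog_clk \
    -source [get_pins {pll|altpll_component|auto_generated|pll1|clk[0]}] \
    -divide_by 2 [get_registers {clkgen|cog_clk_reg}]

# One video clock tap per generated clock (two taps of one cog shown).
create_generated_clock -name cog0_vid_tap0 -source [get_registers {clkgen|cog_clk_reg}] \
    -divide_by 1 [get_registers {cog0|vid_pll_tap[0]}]
create_generated_clock -name cog0_vid_tap1 -source [get_registers {clkgen|cog_clk_reg}] \
    -divide_by 2 [get_registers {cog0|vid_pll_tap[1]}]

# I/O delays left at 0 because the external hardware is unknown.
set_input_delay  -clock virt_clk -min 0.0 [get_ports {io[*]}]
set_input_delay  -clock virt_clk -max 0.0 [get_ports {io[*]}]
set_output_delay -clock virt_clk -min 0.0 [get_ports {io[*]}]
set_output_delay -clock virt_clk -max 0.0 [get_ports {io[*]}]

# The taps are multiplexed, so no paths between them should be analyzed.
set_clock_groups -exclusive -group {cog0_vid_tap0} -group {cog0_vid_tap1}

# Cut one specific relationship instead of declaring whole clock groups.
set_false_path -from [get_clocks {virt_clk}] -to [get_clocks {cog0_vid_tap*}]
```

The last command also hints at the wildcard idea: a pattern such as cog*_vid_tap* in get_clocks could cover every cog at once, assuming the clocks are named consistently.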
There are probably some things that could be simplified. For instance, I might have been able to use wildcards for the video clocks, which would also make the SDC more flexible when changing the number of cogs. But my TCL is weak, so I didn't get too crazy with this file.
Thanks for the PDF...I'm going to look, but I'm not expecting much:)
Quartus II Vol 2-6: Timing Analysis Overview
Quartus II Vol 2-7: The Quartus II TimeQuest Timing Analyzer
Edit: And if you want to get into the Tcl weeds:
SDC and TimeQuest API Reference Manual
http://forums.parallax.com/showthread.php/156851-Some-overclocking-)?p=1284958&viewfull=1#post1284958
It seems to me that identifying these multicycle paths to TimeQuest is quite important to getting accurate timing results and - if I understand Chip's comment correctly - getting the best actual FMAX.
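As a rough illustration of what telling TimeQuest about a multicycle path looks like, here is a minimal sketch. It assumes a path really does get two clocks to settle, and the register names are invented for the example rather than taken from cog.v.

```tcl
# Hedged sketch: the register patterns below are hypothetical and must be
# replaced with names from the real design hierarchy.

# Give data launched from the source registers two clock periods to arrive.
set_multicycle_path -setup -end 2 \
    -from [get_registers {*cog*|s[*]}] \
    -to   [get_registers {*cog*|result[*]}]

# Pull the hold check back by one cycle so it stays at the launch edge.
set_multicycle_path -hold -end 1 \
    -from [get_registers {*cog*|s[*]}] \
    -to   [get_registers {*cog*|result[*]}]
```

The setup and hold exceptions are usually written as a pair: relaxing setup to 2 without the matching hold value of 1 leaves the fitter trying to meet a hold requirement one cycle later than needed.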
Thanks! I was actually looking for that post!
Thanks for this summary and the other information you provided Seairth. I'll want to try to look at this again sometime when I get a chance to understand it more and I can see ultimately it will be important.
This is what that post stated:
Only, I am not seeing this for the ALU. For instance, in cog.v, I see that the "s" and "d" registers are updated on "m[2]":
But the results from the ALU are then written on m[3]:
I don't see where the ALU has two clocks to settle, as m[2] transitions to m[3] on the same clock that "s" and "d" are being written. What am I missing?