FPGA timing constraints

I'm having a devil of a time getting these compiles done.

I added some timing constraints to make sure that the data was settling after the clock for the DAC outputs. That works. Meanwhile, the crazy optimizer sees these paths, which it construes as being very late, and can't get around to optimizing what matters.

Does anybody know about SDC constraints and what I need to do to make this work?

Comments

  • One of the main features of the Avant! place-and-route tool (Avant! was bought by Synopsys) solves exactly what you are dealing with. Originally meant for solving clock-tree lead/lag timing issues, it lets you define any signal or group of signals so that multiple data paths all arrive at the destination within a specified timing window. The layout guys should be able to use a similar tool and set the defined parameters. I wouldn't try to tackle this by hand; you need a tool. I used Avant! extensively for gigabit Ethernet chip designs with tight timing requirements, where even with a CPU farm of a dozen or so UNIX computers, a solution sometimes took days to work out.



    Beau Schwabe -- Submicron Forensic Engineer
    www.Kit-Start.com - bschwabe@Kit-Start.com | www.BScircuitDesigns.com - icbeau@bscircuitdesigns.com

    "You just wait and see what I have up my sleeve ... After figuring out functionality, I feel like the guy who solved broadcasting color TV while keeping compatibility with existing B&W TV's"
  • Ale Posts: 2,296
    Sadly, I can't offer any advice; I don't know the topic very well. I just think this is something that has to be learned from experienced people. Maybe there is a course or training sophisticated enough to cover the really advanced stuff.

    Just a thought: could it be that the free version of the tools has some limitations? (I read somewhere that you were using the free version, and that the paid one didn't offer anything more.)
  • cgracey Posts: 7,835
    edited May 18
    Thanks for your input, Guys.

    I found this video which helps, though it's from Xilinx, not Altera:

  • cgracey Posts: 7,835
    edited May 18
    It seems to compile and work okay for one cog. Now I'm halfway through compiling all 16. We'll see if that works.

    I set up the output delays to what has been working, and then set those to false paths to get the optimizer out of the picture:
    # setup: derive the PLL output clocks
    derive_pll_clocks -use_tan_name
    
    # base clocks: 20 ns (50 MHz) clock_50 and 10 ns (100 MHz) clkbuf
    create_clock -period "20 ns" -name {clock_50} {clock_50}
    create_clock -period "10 ns" -name {clkbuf} {clkbuf}
    
    # false paths: keep the optimizer off the clock-config and DAC output paths
    set_false_path -from [get_keepers {*clkcfgi*}] -to [get_keepers {*clkcfg*}]
    set_false_path -to [get_ports dac_*]
    set_false_path -to [get_ports dac0_clk]
    set_false_path -to [get_ports dac1_clk]
    
    # DAC data outputs: 4 ns output delay relative to clkbuf
    set_output_delay -clock clkbuf -max -rise 4 [get_ports dac_*]
    set_output_delay -clock clkbuf -max -fall 4 [get_ports dac_*]
    set_output_delay -clock clkbuf -min -rise 4 [get_ports dac_*]
    set_output_delay -clock clkbuf -min -fall 4 [get_ports dac_*]
    
    # DAC clock outputs: 1 ns output delay relative to clkbuf
    set_output_delay -clock clkbuf -max -rise 1 [get_ports dac0_clk]
    set_output_delay -clock clkbuf -max -fall 1 [get_ports dac0_clk]
    set_output_delay -clock clkbuf -min -rise 1 [get_ports dac0_clk]
    set_output_delay -clock clkbuf -min -fall 1 [get_ports dac0_clk]
    
    set_output_delay -clock clkbuf -max -rise 1 [get_ports dac1_clk]
    set_output_delay -clock clkbuf -max -fall 1 [get_ports dac1_clk]
    set_output_delay -clock clkbuf -min -rise 1 [get_ports dac1_clk]
    set_output_delay -clock clkbuf -min -fall 1 [get_ports dac1_clk]
    

    This, I think (as it seems to work), ensures that the output data to the DACs changes 3ns after the clock changes, providing Tcycle - 3ns of setup time. This may all just be wrong-headed, though. We'll see what the result does. It's one hour into a two-hour compile and I'm going to sleep.
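    The arithmetic behind that reading can be sketched in a few lines of Python. This is only a sketch of the reasoning in the post, not a TimeQuest calculation: the function name is hypothetical, and it assumes the 3ns figure is the difference between the 4ns data delay and the 1ns clock delay in the constraints above, with Tcycle taken from the 10 ns clkbuf period.

```python
def dac_timing_budget(period_ns, data_delay_ns, clock_delay_ns):
    """Return (offset_ns, setup_ns) under the post's interpretation.

    offset_ns: how long after the DAC clock edge the data changes.
    setup_ns:  what remains of the cycle as setup time (Tcycle - offset).
    """
    offset_ns = data_delay_ns - clock_delay_ns
    setup_ns = period_ns - offset_ns
    return offset_ns, setup_ns


# Values taken from the SDC fragment: 10 ns clkbuf, 4 ns data, 1 ns clock.
offset, setup = dac_timing_budget(10.0, 4.0, 1.0)
print(offset, setup)  # 3.0 7.0
```

    With a shorter clock period the same 3ns offset leaves correspondingly less setup time, which is why the offset and the period are kept as separate parameters here.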
  • cgracey Posts: 7,835
    edited May 18
    As optimizations go, I think I've hit the limit. The critical paths are big dumb things that really can't be further simplified. I've made compiles where that critical-path stuff was pre-decoded in the prior clock, but it had no net effect. This is what a place-and-route guy once told me was "pushing sand" - you can squeeze it from one place to another, but you can't further redistribute it to cause a net speed increase. I feel like all the optimization work is complete.
  • These next compiles are going to be for 120MHz! That's 960 MIPS for 16 cogs.

    If anyone suspects flakiness at 120MHz, do a 'SETCLK #$7F' to drop down to 60MHz and see if things behave differently. The Fmax for 16 cogs is coming in at around 82MHz. So, we are overclocking by ~46%. The Fmax, of course, assumes high temp, low voltage, and worst process, which all amounts to SLOW expectations. The lighter 4-cog compiles are getting Fmax values of ~94MHz, which means we are only overclocking by ~28% in those cases.
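    The overclocking percentages and the MIPS figure can be checked with a short Python sketch. The helper name is hypothetical, and the two-clocks-per-instruction value is an assumption inferred from the 960 MIPS / 16 cogs / 120MHz figures in the post rather than something stated there.

```python
def overclock_percent(run_mhz, fmax_mhz):
    """Percentage by which the running clock exceeds the reported Fmax."""
    return (run_mhz / fmax_mhz - 1.0) * 100.0


print(round(overclock_percent(120, 82)))  # 46  (16-cog compile)
print(round(overclock_percent(120, 94)))  # 28  (4-cog compile)

# Assumption: 2 clocks per instruction, inferred from the post's numbers.
CLOCKS_PER_INSTRUCTION = 2
COGS = 16
print(120 / CLOCKS_PER_INSTRUCTION * COGS)  # 960.0 MIPS
```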
  • 120 MHz would definitely help motivate me to update...

    None of the recent changes really inspired me to update.
    Was going to wait until changes actually stopped...
    Prop Info and Apps: http://www.rayslogic.com/
  • jmg Posts: 10,345
    cgracey wrote: »
    These next compiles are going to be for 120MHz! That's 960 MIPS for 16 cogs.

    If anyone suspects flakiness at 120MHz, do a 'SETCLK #$7F' to drop down to 60MHz and see if things behave differently. The Fmax for 16 cogs is coming in at around 82MHz. So, we are overclocking by ~46%. The Fmax, of course, assumes high temp, low voltage, and worst process, which all amounts to SLOW expectations. The lighter 4-cog compiles are getting Fmax values of ~94MHz, which means we are only overclocking by ~28% in those cases.

    If you have a working SETCLK feature, perhaps consider changing the default on the next build?

    * default to 60MHz, and allow users to increase to 120MHz -or-
    * default to the RC osc value (20MHz min now?) and go up from there
    * if there is a PLL+divider in the FPGA, start from 240MHz and support /2, /3, /4... as values, which adds 80MHz to 120MHz and 60MHz, and allows same-as-before operation.

    One risk of starting at 120MHz is that if someone's system is flaky, it may not run far enough to change SETCLK....
  • jmg wrote: »
    If you have a working SETCLK feature, perhaps consider changing the default on the next build?

    * default to 60MHz, and allow users to increase to 120MHz -or-
    * default to the RC osc value (20MHz min now?) and go up from there
    * if there is a PLL+divider in the FPGA, start from 240MHz and support /2, /3, /4... as values, which adds 80MHz to 120MHz and 60MHz, and allows same-as-before operation.

    One risk of starting at 120MHz is that if someone's system is flaky, it may not run far enough to change SETCLK....

    I was thinking the same thing. Just before I read your message, I had changed things to start up at 60MHz, with a 'SETCLK #$FF' needed to get to 120MHz.
  • The 120 MHz Prop123-A9 compile turned out really well. Those timing constraints seem to work perfectly. Now, onto the rest of the compiles...
  • Wow. 60 MIPs is quite a jump forward.
  • Tubular wrote: »
    Wow. 60 MIPs is quite a jump forward.

    And it's exactly 10 clocks per USB bit period.
  • cgracey wrote: »
    The 120 MHz Prop123-A9 compile turned out really well. Those timing constraints seem to work perfectly. Now, onto the rest of the compiles...
    Nice! :)
    looking forward to taking V19 for a test drive.

    Melbourne, Australia
  • I've got all the compiles done, except for the BeMicro_A9, which is now underway. Probably upwards of two hours more to go. I tested every compile and they are all okay at 120MHz. I'll have this out in about 12 hours, I think.
  • evanh Posts: 4,047
    yummy :)
  • I was just running all the demos to make sure they are up-to-date and I realized that for USB, it doesn't matter that we are 10.0000 clocks per USB bit, because we use a 16-bit NCO to generate the USB bit clock. We have an adder value of $1999 (not as expensive as it looks) to get 12Mbps at 120MHz. So, every great once-in-a-while, there's going to be an extra system clock within a USB bit. Oh, well.
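    The "extra system clock every once in a while" behaviour can be illustrated with a small Python simulation. This is a sketch under an assumption: it models the NCO as a plain 16-bit accumulator that adds $1999 on every system clock and produces a USB bit tick whenever it overflows. The real hardware may differ in detail, and the function name is hypothetical.

```python
def nco_bit_lengths(adder=0x1999, width=16, n_clocks=120_000):
    """Simulate an accumulator NCO for n_clocks system clocks.

    Returns the number of system clocks between successive overflows,
    i.e. the length of each generated USB bit period in clocks.
    """
    modulus = 1 << width
    acc = 0
    since_last = 0
    lengths = []
    for _ in range(n_clocks):
        acc += adder
        since_last += 1
        if acc >= modulus:      # overflow marks one USB bit tick
            acc -= modulus
            lengths.append(since_last)
            since_last = 0
    return lengths


lengths = nco_bit_lengths()          # 120000 clocks = 1 ms at 120MHz
print(sorted(set(lengths)))          # distinct bit lengths, in system clocks
print(len(lengths), lengths.count(11))
```

    Under this model every bit period is either 10 or 11 system clocks long, with the 11-clock periods being rare.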
  • All I need to do now is update the Google doc and wait for the BeMicro-A9 compile to finish. I'll be back in the morning to wrap this up.
  • So ~9.995 clocks per USB clock?
  • No need. Take the weekend off! We will come back on Monday.
  • cgracey wrote: »
    The 120 MHz Prop123-A9 compile turned out really well. Those timing constraints seem to work perfectly. Now, onto the rest of the compiles...
    I'm looking forward to v19 too. The ~26 additional P2 clocks per USB byte should allow me to remove the "if full-speed" exception code on USB receive. :)
    garryj
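    The "~26 additional clocks" figure is consistent with a move from 80MHz to 120MHz at the 12Mbps full-speed bit rate. A quick Python check follows; the 80MHz baseline is an assumption (80MHz is only mentioned in passing elsewhere in the thread), and the helper name is hypothetical.

```python
USB_FS_BIT_RATE = 12_000_000  # full-speed USB, bits per second


def clocks_per_usb_byte(sysclk_hz):
    """System clocks available per 8-bit USB byte at full speed."""
    return 8 * sysclk_hz / USB_FS_BIT_RATE


# Assumed baseline of 80MHz versus the new 120MHz compiles.
extra = clocks_per_usb_byte(120_000_000) - clocks_per_usb_byte(80_000_000)
print(round(extra, 1))  # 26.7
```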
  • jmg Posts: 10,345
    cgracey wrote: »
    I was just running all the demos to make sure they are up-to-date and I realized that for USB, it doesn't matter that we are 10.0000 clocks per USB bit, because we use a 16-bit NCO to generate the USB bit clock. We have an adder value of $1999 (not as expensive as it looks) to get 12Mbps at 120MHz. So, every great once-in-a-while, there's going to be an extra system clock within a USB bit. Oh, well.
    I get this NCO effect:

    Fa=120M*0x1999/2^16 = 11998901.37 or 91.5ppm low (USB FS can tolerate 2500ppm)

    actual effect is to sometimes divide by 11, but mostly divide by 10
    To find how frequently those /11 appear :

    K = 1093 Ff = 1/(((K-1)*10/120M+1*11/120M)/K) = 11998902.2
    K = 1092 Ff = 1/(((K-1)*10/120M+1*11/120M)/K) = 11998901.2

    1-Ff/Fa = 13.973ppb difference (because it is not exactly every 1092 USB bits)

    1092/12M = 91us (approx.) between repeats of the 8.33' ns jitter error in the ideal 83.33' ns bit period,
    or around once every 136 bytes of data.
    It's much better than 80MHz, even if nominally outside the jitter number I found earlier.

    The 1ms frame rate should be free of this effect if the wait is framed to a precise 120000 SysCLKs. Systems that lock an RC osc to USB do so using the frame rate.

    A sampled USB system should not care, and a PLL-based one should be OK with a 1:1092 impulse.
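    The figures in this analysis can be reproduced with a few lines of Python. This is a sketch that simply re-evaluates the formulas quoted in the post; the names are hypothetical.

```python
SYSCLK = 120_000_000   # system clock, Hz
ADDER = 0x1999         # NCO adder value from the thread
TARGET = 12_000_000    # ideal full-speed USB bit rate, Hz

# Average NCO output rate and its error relative to 12 Mbps.
fa = SYSCLK * ADDER / 2**16
ppm_low = (1 - fa / TARGET) * 1e6


def mixed_divide_rate(k):
    """Average bit rate when one bit in every k is /11 and the rest are /10."""
    return SYSCLK * k / (10 * (k - 1) + 11)


print(round(fa, 2), round(ppm_low, 2))
print(round(mixed_divide_rate(1093), 1), round(mixed_divide_rate(1092), 1))

# Spacing of the /11 bits, in microseconds and in bytes, and clocks per 1 ms frame.
print(1092 / TARGET * 1e6, 1092 / 8, SYSCLK // 1000)
```

    The NCO's average rate falls between the K = 1092 and K = 1093 cases, which matches the remark that the /11 bit does not arrive exactly every 1092 USB bits.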
  • Thanks for analyzing that, jmg. We'll see what kind of experience garryj has with this.