Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i


Comments

  • TonyB_ Posts: 1,324
    edited 2018-03-17 - 22:50:35
    jmg wrote: »
    Cluso99 wrote: »
    One thing that concerns me is the 16KB of Hub being mapped to the top of Hub RAM. Previously it was to be dual-mapped, i.e. for 512KB, the 16KB at 496-512KB would also appear at 1008-1024KB. That gave us the ability to have a whole contiguous block of 512KB (or 256KB, etc.). Removing 16KB from the block disrupts the contiguous-block concept. We also have a disruption at the bottom end, where cog and LUT addresses interfere with the mapping. If you cannot dual-map the 16KB block, could you simply add an extra 4/8/16KB block at the end? Alternatively, why not just ignore the top address bit(s) and thereby map the 512KB into both the lower and upper 512KB spaces? Same for 256KB: map it 4x by ignoring the top 2 address bits?
    'Adding on the end' was suggested before, but that requires a new, skewed-size memory compile, and it needs another address bit.
    Allowing the area to dual-map seems like an OK thing to do; most programmers should grasp that. You would just recommend using the top copy, as the lower ones may change on 1MB variants.

    I think the full 512KB of hub RAM should always be available, unless and until the user enables debugging. I agree about dual mapping, as address conflicts between cog/LUT RAM and low hub RAM could be avoided by using the upper 512KB bank for the latter.
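    A minimal Python sketch of the mirroring idea discussed above, assuming a 1MB (20-bit) hub address space and a power-of-two RAM size. The helper names are hypothetical, and `upper_alias` only illustrates reaching low RAM through the upper 512KB copy to sidestep the low cog/LUT address range; it does not model how the silicon actually decodes those addresses.

```python
HUB_SPACE = 1 << 20   # assumed 1MB hub address space (20-bit addresses)
KB = 1024


def physical_address(addr: int, ram_size: int) -> int:
    """Map a hub address onto RAM by ignoring the upper address bits.

    With ram_size a power of two, 512KB appears twice in the 1MB space
    and 256KB appears four times.
    """
    if ram_size <= 0 or ram_size & (ram_size - 1):
        raise ValueError("ram_size must be a power of two")
    return (addr % HUB_SPACE) & (ram_size - 1)


def debug_window(ram_size: int, window: int = 16 * KB) -> range:
    """Top `window` bytes of physical RAM (the proposed 16KB debug area)."""
    return range(ram_size - window, ram_size)


def upper_alias(addr: int, ram_size: int) -> int:
    """Hypothetical helper: the same RAM byte seen through the next mirror up."""
    alias = addr + ram_size
    if alias >= HUB_SPACE:
        raise ValueError("no higher mirror for this address")
    return alias


if __name__ == "__main__":
    ram = 512 * KB
    # The 16KB at 496-512KB is also visible at 1008-1024KB.
    print(hex(physical_address(1008 * KB, ram)))   # 0x7c000, i.e. 496KB
    # A low address reached through the upper 512KB copy.
    print(hex(upper_alias(0x100, ram)))            # 0x80100
```

    The masking approach needs no extra address bit, which is why it only works cleanly for power-of-two RAM sizes.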
    jmg wrote: »
    cgracey wrote: »
    I added C and Z flags for when you read breakpoint status using 'GETBRK D WC/WZ/WCZ' in debug ISRs.

    On debug ISR entry, 'GETBRK D WC' can be used to find out if the cog has just been (re)started (C=0).
    What does the Z bit encode?

    Something else, perhaps. Why C=0 instead of C=1?

    I mentioned this a week ago (mnemonic changed since):
    TonyB_ wrote: »
    Is it possible for GETINT (now GETBRK) D WCZ to write something useful to the flags, e.g. C = D[0] and Z = (D[31:0] == 0)?

    GETBRK D WCZ returns all the pending skip bits and can be used outside debug interrupts for nested skipping. This is a great free bonus from the recent debug changes. If the GETBRK D WCZ flags are as I suggest, one can tell immediately whether the next instruction will be skipped and whether skipping has finished.

    It would be a waste of a useful flag for C to mean the same thing in both WC and WCZ.
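    To make the proposal concrete, here is a tiny Python model of the suggested flag semantics (C = D[0], Z = (D[31:0] == 0)), treating D as the pending skip bits with bit 0 belonging to the next instruction. It models the suggestion only, not confirmed hardware behavior.

```python
MASK32 = 0xFFFF_FFFF


def proposed_getbrk_flags(skip_bits: int) -> tuple[int, int]:
    """Return (C, Z) as proposed for GETBRK D WCZ."""
    d = skip_bits & MASK32
    c = d & 1            # C = D[0]: the next instruction will be skipped
    z = int(d == 0)      # Z = (D == 0): no skip bits pending, skipping finished
    return c, z


if __name__ == "__main__":
    print(proposed_getbrk_flags(0b101))   # next skipped, more pending
    print(proposed_getbrk_flags(0))       # skipping finished
```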

    Formerly known as TonyB
  • This ROM is not an actual memory instance from On Semi. It is just a few lines of Verilog code that gets compiled into logic.
  • jmg Posts: 14,013
    edited 2018-03-17 - 22:59:36
    cgracey wrote: »
    This ROM is not an actual memory instance from On Semi. It is just a few lines of Verilog code that gets compiled into logic.

    There are two ROMs in play. The Debug keyhole ROM is very small, just 8 longs, and is compiled Verilog.
    IIRC the BOOT ROM is 16k (an OnSemi true ROM library cell?) and is serially loaded into RAM at startup? That may be the ROM Roy meant?
    Code for that will be a binary file, so it could come slightly later than the Verilog, but not much later if their scripts are well primed: a few days?
  • Chip,
    I meant the 16k ROM that is serially read in to HUB with the boot code, etc.
  • Roy Eltham wrote: »
    LOCKBYE is a terrible name for it. Please use one of the other suggested names for it.

    All this debugging stuff is great, but I too am worried about what this does to the timeline. I thought the Verilog had to be locked down because OnSemi was working out the synthesizing? Isn't this causing them to have to redo stuff?

    I would think that LOCKREL would be a more logical and descriptive name since you are actually releasing the lock.
    In science there is no authority. There is only experiment.
    Life is unpredictable. Eat dessert first.
  • cgracey Posts: 11,745
    edited 2018-03-18 - 20:48:26
    kwinn wrote: »
    Roy Eltham wrote: »
    LOCKBYE is a terrible name for it. Please use one of the other suggested names for it.

    All this debugging stuff is great, but I too am worried about what this does to the timeline. I thought the Verilog had to be locked down because OnSemi was working out the synthesizing? Isn't this causing them to have to redo stuff?

    I would think that LOCKREL would be a more logical and descriptive name since you are actually releasing the lock.

    LOCKREL it is, then.

    We got our test chips yesterday and I've been going through each circuit.

    The new fractional PLL works great, but the digital counters can't keep up with the maximum VCO frequency. To remedy this, I'm making a slight schematic change to be able to mux the VCO output directly to the PLL output, rather than always going through a final divide-by-two circuit. This will halve the VCO divider frequencies at the top end. This way, you'll still have a 6-bit crystal divider and a 10-bit VCO divider, but 4-bit post-VCO division will be limited to 2, 4, 6, 8, ..., 30. The old ÷32 option now selects the direct VCO output. You would use direct-VCO mode for PLL frequencies of 100MHz or higher. This allows the dividers to stay well below the apparent test-chip limit of 420MHz. For example, to generate a 160MHz clock, the dividers currently have to run at 320MHz; there will be much more margin at 160MHz. This also makes 200MHz more reasonable, as the dividers won't need to run at 400MHz.
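    The arithmetic behind this change can be sketched in Python. The sketch assumes the usual integer-N relation f_VCO = f_xtal × VCO divider ÷ crystal divider, a 6-bit crystal divider of 1..64, a 10-bit VCO divider of 1..1024, and a post-divider field where values 0..14 select ÷2..÷30 and 15 (the old ÷32 slot) selects the direct VCO output. That field encoding and the 20MHz crystal in the usage lines are assumptions for illustration.

```python
DIVIDER_FMAX_HZ = 420_000_000   # apparent test-chip divider limit


def post_divisor(pppp: int) -> int:
    """Assumed encoding: 0..14 -> /2, /4, ... /30; 15 (old /32) -> direct VCO."""
    if not 0 <= pppp <= 15:
        raise ValueError("post-VCO field is 4 bits")
    return 1 if pppp == 15 else 2 * (pppp + 1)


def pll_frequencies(xtal_hz: int, xdiv: int, vmul: int, pppp: int) -> tuple[int, int]:
    """Return (VCO frequency, PLL output frequency) in Hz."""
    if not 1 <= xdiv <= 64:
        raise ValueError("crystal divider is 6 bits (assumed 1..64)")
    if not 1 <= vmul <= 1024:
        raise ValueError("VCO divider is 10 bits (assumed 1..1024)")
    vco_hz = xtal_hz * vmul // xdiv
    return vco_hz, vco_hz // post_divisor(pppp)


def divider_has_margin(vco_hz: int, fmax_hz: int = DIVIDER_FMAX_HZ) -> bool:
    """True if the post-VCO dividers run below the observed limit."""
    return vco_hz < fmax_hz


if __name__ == "__main__":
    xtal = 20_000_000   # assumed crystal frequency
    print(pll_frequencies(xtal, 1, 16, 0))    # old way: VCO 320MHz, /2 -> 160MHz
    print(pll_frequencies(xtal, 1, 8, 15))    # direct: VCO 160MHz -> 160MHz
    print(pll_frequencies(xtal, 1, 20, 0))    # 200MHz via /2 needs a 400MHz VCO
```

    The point of the direct mode shows up in the first two lines: the same 160MHz output needs the dividers to run at half the speed.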

    I've made a simple schematic change and now I'll make the layout change. I'll send the new GDS cell to On Semi, along with a new netlist, and they will rerun DRC and LVS, which takes just a minute.

    It's way easier to do this myself than trying to engage other people. Cheaper and faster, too.
  • jmg Posts: 14,013
    edited 2018-03-18 - 21:42:09
    cgracey wrote: »
    We got our test chips yesterday and I've been going through each circuit.

    Great :)

    Does the Xtal Osc work OK, and up to what Xtal MHz?
    If you drive it from an AC-coupled 0.8V 'clipped sine' (actually a low-pass-filtered square wave), what MHz can that clock to?
    Cheap, high-stability GPS oscillators (±500ppb) are mostly clipped sine, in the 10~30MHz region.

    e.g. the TG2016 (1.7~3.63V, 1.5mA max) from EPSON shows up at well under $1/1k in 32MHz, 30MHz, 27MHz, 26MHz, 16.368MHz and 16.384MHz; others show 19.2MHz, etc.


    Verical lists the 7Q-19.200MBV-T TXC Corporation Oscillator VC-TCXO 19.2MHz ±2.5ppm (Tol) ±0.5ppm (Stability) 247+: $0.2531 (and also the 7L-26.000MBS-T)

    cgracey wrote: »
    The new fractional PLL works great, but the digital counters can't keep up with the maximum VCO frequency. To remedy this, I'm making a slight schematic change to be able to mux the VCO output directly to the PLL output, rather than always going through a final divide-by-two circuit. This will halve the VCO divider frequencies at the top end. This way, you'll still have a 6-bit crystal divider and a 10-bit VCO divider, but 4-bit post-VCO division will be limited to 2, 4, 6, 8, ..., 30. The old ÷32 option now selects the direct VCO output. You would use direct-VCO mode for PLL frequencies of 100MHz or higher. This allows the dividers to stay well below the apparent test-chip limit of 420MHz. For example, to generate a 160MHz clock, the dividers currently have to run at 320MHz; there will be much more margin at 160MHz. This also makes 200MHz more reasonable, as the dividers won't need to run at 400MHz.

    I'm not following the limits: where does 420MHz come from? That's 210MHz SysCLK, above target?
    Or do you mean the PLL can go above the counters' ability in its normal operating range?
    If the 10-bit VCO divider clocks straight from the VCO, that's maybe expected; usual designs have a single FF /2 first, which can clock faster than a 10-bit divider.


    How close is the VCO to a 50:50 duty cycle by design?
    One reason to always /2 is that it tolerates more variation in VCO duty cycle, since the /2 ensures the output is 50:50.

    I guess if you never use the falling edge, clock symmetry is slightly less critical ?

  • jmg Posts: 14,013
    cgracey wrote: »
    The new fractional PLL works great, but the digital counters can't keep up with the maximum VCO frequency. To remedy this, I'm making a slight schematic change to be able to mux the VCO output directly to the PLL output, rather than always going through a final divide-by-two circuit. This will halve the VCO divider frequencies at the top end. This way, you'll still have a 6-bit crystal divider and a 10-bit VCO divider, but 4-bit post-VCO division will be limited to 2, 4, 6, 8, ..., 30. The old ÷32 option now selects the direct VCO output. You would use direct-VCO mode for PLL frequencies of 100MHz or higher. This allows the dividers to stay well below the apparent test-chip limit of 420MHz. For example, to generate a 160MHz clock, the dividers currently have to run at 320MHz; there will be much more margin at 160MHz. This also makes 200MHz more reasonable, as the dividers won't need to run at 400MHz.

    Reading that more carefully, I'm not sure the change does remedy the issue; doesn't it just shift the operating point?

    If the VCO can still exceed the counter, what does that do during LOCK? If the VCO appears to suddenly halve, or stutter, does it still always eventually lock OK to the expected value?
    Does the counter need to be faster, or the VCO max lower, to ensure 'linear' PLL operation?
  • It remedies the issue Chip mentioned. What you're bringing up would be another issue.

    "We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
    By doing that, we can more accurately measure their mass, and determine whether
    scientists have systematically been underestimating how much matter they contain."
  • jmg Posts: 14,013
    evanh wrote: »
    It remedies the issue Chip mentioned. What you're bringing up would be another issue.
    Nope. The issue mentioned is "but the digital counters can't keep up with the maximum VCO frequency", and the remedy applied does not fix that issue.

  • Well, I guess we are reading it differently then. I see your quote as just part of an explanation rather than the actual problem.

    I read it as Chip being concerned with the reliability of setting the sys-clock to 200 MHz, which would have required a VCO output of 400 MHz, right at the counter's limit. With the design change, the whole PLL can now run at a lower rate to get the same result, which also reduces power consumption as a bonus.
  • jmg Posts: 14,013
    evanh wrote: »
    Well, I guess we are reading it differently then. I see your quote as just part of an explanation rather than the actual problem.

    I read it as Chip being concerned with the reliability of setting the sys-clock to 200 MHz, which would have required a VCO output of 400 MHz, right at the counter's limit. With the design change, the whole PLL can now run at a lower rate to get the same result, which also reduces power consumption as a bonus.
    All of that assumes you can attain lock. However, to get to that new operating point, the VCO/PLL first has to lock, and what is not clear is whether lock can even occur when the VCO exceeds the counter's maximum frequency.
    VCO/PLL are analog in nature, and during lock, the PFD output varies until lock is achieved.

    ie "but the digital counters can't keep up with the maximum VCO frequency" remains an issue in the design.
  • It fixes the issue by not needing to go so fast, anymore. 160MHz will now only require a VCO/divider speed of 160MHz, not 320MHz.
  • cgracey Posts: 11,745
    edited 2018-03-19 - 02:17:48
    The problem is that the post-VCO dividers cannot keep up with the VCO when it goes over 420MHz. This has the effect of stopping the dividers' outputs, so that the VCO appears to be under-frequency, causing errant feedback to further increase the VCO speed.
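    The positive-feedback failure described above can be illustrated with a deliberately crude Python model: a frequency-error loop (not a real phase-frequency detector) in which the divider output reads as zero once the VCO exceeds the divider limit. The 600MHz VCO ceiling, the loop gain and the step count are made-up values; only the 420MHz divider limit comes from the thread.

```python
DIVIDER_FMAX_HZ = 420e6   # divider limit reported for the test chip
VCO_MAX_HZ = 600e6        # made-up physical ceiling for the toy VCO


def simulate_vco(start_hz: float, target_hz: float, gain: float = 0.5,
                 steps: int = 50) -> float:
    """Toy frequency-error loop showing the divider-stall lock-up."""
    vco = start_hz
    for _ in range(steps):
        # Above the divider limit its output stops toggling, so the loop
        # "measures" zero and believes the VCO is far too slow.
        measured = vco if vco <= DIVIDER_FMAX_HZ else 0.0
        vco += gain * (target_hz - measured)     # steer toward the target
        vco = min(max(vco, 0.0), VCO_MAX_HZ)     # clamp to the VCO's range
    return vco


if __name__ == "__main__":
    print(simulate_vco(100e6, 200e6))   # settles near 200MHz
    print(simulate_vco(450e6, 200e6))   # runs away and sticks at the ceiling
```

    Started below the divider limit, the loop settles on the target; started above it (as spurious fast toggling from an unsettled crystal might push it), the loop drives the VCO further up and stays there.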

    I discovered this by starting the crystal oscillator and PLL at the same time. The PLL would sometimes go haywire. I realized that while the crystal is starting up, before it develops significant amplitude, there is spurious fast toggling, sending the PLL into high-frequency lock-up.

    So, you only want to enable the PLL when you have a good reference frequency. This means allowing 5ms for the crystal to start, then enabling the PLL, and then allowing it 1ms to stabilize before switching to it.
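    The recommended order can be written down as a small Python plan that converts the 5ms and 1ms waits into cycle counts. It assumes the waits are timed on whatever clock is running before the switch; the 20MHz figure in the usage line is an assumed example, and the step names are descriptive placeholders, not register or instruction names.

```python
def startup_plan(clock_hz: int, xtal_ms: int = 5, pll_ms: int = 1) -> list[tuple[str, int]]:
    """Ordered (step, wait_cycles) pairs; waits are counted on the pre-switch clock."""
    return [
        ("enable crystal oscillator", 0),
        ("wait for crystal to start", clock_hz * xtal_ms // 1000),
        ("enable PLL", 0),   # only now is the reference clean
        ("wait for PLL to settle", clock_hz * pll_ms // 1000),
        ("switch system clock to PLL", 0),
    ]


if __name__ == "__main__":
    for step, cycles in startup_plan(20_000_000):   # assumed 20MHz pre-switch clock
        print(f"{step:30s} {cycles:>8d} cycles")
```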
  • jmg Posts: 14,013
    cgracey wrote: »
    The problem is that the post-VCO dividers cannot keep up with the VCO when it goes over 420MHz. This has the effect of stopping the dividers' outputs, so that the VCO appears to be under-frequency, causing errant feedback to further increase the VCO speed.

    I discovered this by starting the crystal oscillator and PLL at the same time. The PLL would sometimes go haywire. I realized that while the crystal is starting up, before it develops significant amplitude, there is spurious fast toggling, sending the PLL into high-frequency lock-up.

    So, you only want to enable the PLL when you have a good reference frequency. This means allowing 5ms for the crystal to start, then enabling the PLL, and then allowing it 1ms to stabilize before switching to it.

    Does that solve the issue, or merely mask it ?

    Users will want to change dividers on the fly, and not expect any lockout - they can tolerate a settling time.

    If your VCO can lock out, the outcome becomes statistical, as data like this suggests.

    This plot (Figure 3, "Lock time for the tuned IF") indicates the VCO can go to its limits during lock seek.
    http://www.analog.com/en/analog-dialogue/articles/fast-locking-high-sensitivity-tuned-if-radio.html

    Similar here:
    http://www.ti.com/lit/an/slwa069/slwa069.pdf


    To me, that means your MAX VCO and MAX counter frequencies need to be compatible, i.e. either raise the max counter frequency or lower the max VCO frequency?
  • cgracey Posts: 11,745
    edited 2018-03-19 - 04:25:41
    I could slow down the VCO by slightly increasing its gate lengths, but I can't speed up the counters.

    It would certainly be better if it couldn't get into that positive-feedback mode.

    I will increase VCO gate lengths, in addition to adding the VCO-direct mode.

    By the way, the VCO output is very close to 50% duty cycle. The only reason that I had that divide-by-2 circuit at the end of the post-VCO divider was because only every Nth high pulse comes through, causing a variably-skewed duty cycle.
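    A Python sketch of why that final divide-by-two mattered, sampling each VCO cycle as one high and one low half-period. It assumes a 50% VCO and a counter that passes one VCO high pulse out of every k; the function names are hypothetical.

```python
def vco_wave(cycles: int) -> list[int]:
    """50% duty VCO: each cycle is one high then one low half-period."""
    return [1, 0] * cycles


def pulse_pass(k: int, cycles: int) -> list[int]:
    """Counter that lets through only every k-th VCO high pulse."""
    wave = []
    for i in range(cycles):
        wave += [1 if i % k == 0 else 0, 0]
    return wave


def pulse_pass_then_toggle(k: int, cycles: int) -> list[int]:
    """Same counter followed by a divide-by-two toggle flip-flop."""
    wave, state = [], 0
    for i in range(cycles):
        if i % k == 0:
            state ^= 1          # toggle on each passed pulse
        wave += [state, state]  # hold for the whole VCO cycle
    return wave


def duty(wave: list[int]) -> float:
    return sum(wave) / len(wave)


if __name__ == "__main__":
    print(duty(vco_wave(60)))                  # VCO itself
    print(duty(pulse_pass(3, 60)))             # skewed: 1 / (2k)
    print(duty(pulse_pass_then_toggle(3, 60))) # toggled: 50%, total /6
```

    With k = 3, passing pulses alone gives about 17% duty, while adding the toggle gives 50% duty at a total division of 6, which is consistent with the even-only 2, 4, ..., 30 options.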
  • Did you observe how far off (in ps) from the ideal 50% it was?
  • Yanomani wrote: »
    Did you observe how far off (in ps) from the ideal 50% it was?

    I am going to simulate it soon.
  • IIRC, you disclosed its schematic in a post some time ago, though I couldn't find it ATM.
  • jmg Posts: 14,013
    edited 2018-03-19 - 04:41:33
    cgracey wrote: »
    I could slow down the VCO by slightly increasing its gate lengths, but I can't speed up the counters.

    It would certainly be better if it couldn't get into that positive-feedback mode.

    I will increase VCO gate lengths, in addition to adding the VCO-direct mode.

    Sounds good. How close are the measurements to the Spice predictions?

    A spectrum receiver and sniffer loop might be able to tell you how high the VCO is getting?
    cgracey wrote: »
    By the way, the VCO output is very close to 50% duty cycle. The only reason that I had that divide-by-2 circuit at the end of the post-VCO divider was because only every Nth high pulse comes through, causing a variably-skewed duty cycle.

    Is that still an issue, given this note from above: "but 4-bit post-VCO division will be limited to 2, 4, 6, 8, ..., 30" (32 = /1)?

    The /1 case has the VCO's duty cycle, and all the others are even numbers, so they can all have 50% duty?
  • By running different settings, I can tell that it fails just above 420MHz.

    Yes, duty should always be ~50%.
  • Yanomani Posts: 882
    edited 2018-03-19 - 05:01:09
    If things are not that bad, duty cycle jitter will be in the range of a few tens of ps.
  • Yanomani wrote: »
    If things are not that bad, duty cycle jitter will be in the range of a few tens of ps.

    Maybe 100ps out of a 5ns cycle.
  • evanh Posts: 8,056
    edited 2018-03-19 - 06:31:51
    jmg wrote: »
    ie "but the digital counters can't keep up with the maximum VCO frequency" remains an issue in the design.

    As I said initially, what you're worrying about is a different problem. It would have been a good idea to lead in by saying so.

    Be careful of what you're asking for here, too. Constraining the VCO too much can limit its max frequency at adverse temperatures. It wouldn't be very convenient if the PLL ran out of puff just because it had been nerfed to protect against buggy code.

  • cgracey wrote: »
    Maybe 100ps out of a 5ns cycle.

    The ring oscillator; how many inverters are in the chain?
  • Yanomani wrote: »
    If things are not that bad, duty cycle jitter will be in the range of a few tens of ps.

    Maybe 100ps out of a 5ns cycle.
    Yanomani wrote: »
    cgracey wrote: »
    Maybe 100ps out of a 5ns cycle.

    The ring oscillator; how many inverters are in the chain?

    There are 7 differential inverters in the loop.
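    For scale, the standard first-order ring-oscillator relation f = 1 / (2 × N × t_d) gives the per-stage delay implied by the numbers in this thread. This treats each of the 7 differential stages as one delay and ignores everything else, so it is only a rough estimate: a 7-stage ring oscillating at 420MHz (the divider limit) needs roughly 170ps per stage, and at 200MHz roughly 357ps. Lengthening the gates, as proposed, raises t_d and lowers the maximum frequency accordingly.

```python
def stage_delay_ps(stages: int, freq_hz: float) -> float:
    """Per-stage delay t_d for a ring of `stages` delays oscillating at freq_hz."""
    return 1e12 / (2 * stages * freq_hz)


def ring_freq_mhz(stages: int, delay_ps: float) -> float:
    """Inverse: oscillation frequency (MHz) for a given per-stage delay (ps)."""
    return 1e6 / (2 * stages * delay_ps)


if __name__ == "__main__":
    print(round(stage_delay_ps(7, 420e6)))   # ~170 ps
    print(round(stage_delay_ps(7, 200e6)))   # ~357 ps
```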
  • jmg Posts: 14,013
    evanh wrote: »
    Be careful of what you're asking for here, too. Constraining the VCO too much can limit its max frequency at adverse temperatures. It wouldn't be very convenient if the PLL ran out of puff just because it had been nerfed to protect against buggy code.

    It's rather more serious than 'buggy code', which is why Chip has indicated he is fixing it.
    Perhaps look at the links I gave, and at Chip's comment: "It would certainly be better if it couldn't get into that positive-feedback mode."

  • evanh wrote: »
    jmg wrote: »
    ie "but the digital counters can't keep up with the maximum VCO frequency" remains an issue in the design.

    As I said initially, that's a different problem that you're worrying about. Good idea to lead in by saying so.

    Be careful of what you're asking for here, too. Constraining the VCO too much can limit its max frequency at adverse temperatures. It wouldn't be very convenient if the PLL ran out of puff just because it had been nerfed to protect against buggy code.

    Yes, I am a little concerned about slowing the VCO down. At different corners, the VCO-divider speed relationship may become reversed.

    If we say the VCO frequency must be programmed for 100...200 MHz, there's no way it will overshoot to 440 MHz on a legal setting change, or even to 210 MHz, for that matter.

    I need to run some simulations to get a better understanding.
  • jmg Posts: 14,013
    edited 2018-03-19 - 07:15:12
    cgracey wrote: »
    Yes, I am a little concerned about slowing the VCO down. At different corners, the VCO-divider speed relationship may become reversed.

    That's why I asked how close this was to the Spice predictions.
    If it was going somewhere above 420MHz, and you want to be > 200MHz, that should be enough elbow room?

    The VCO & Divider will tend to largely self-track on the same die, faster process means faster VCO, but also faster Counters.
    cgracey wrote: »
    If we say the VCO frequency must be programmed for 100...200 MHz, there's no way it will overshoot to 440 MHz on a legal setting change, or even to 210 MHz, for that matter.
    ? See the plots I linked above from Analog Devices, which show that a VCO can hit its limits while attaining lock.

    It may not do this 100% of the time, but the fact that it can happen is enough to design for.
    ie the overshoot limit (below counter fMax) really needs to be designed into the VCO, rather than 'hoped for' via the control signal.


  • cgracey Posts: 11,745
    edited 2018-03-19 - 07:17:38
    TonyB_ wrote: »
    jmg wrote: »
    Cluso99 wrote: »
    One thing that concerns me is the 16KB of Hub being mapped to the top of Hub RAM. Previously it was to be dual-mapped, i.e. for 512KB, the 16KB at 496-512KB would also appear at 1008-1024KB. That gave us the ability to have a whole contiguous block of 512KB (or 256KB, etc.). Removing 16KB from the block disrupts the contiguous-block concept. We also have a disruption at the bottom end, where cog and LUT addresses interfere with the mapping. If you cannot dual-map the 16KB block, could you simply add an extra 4/8/16KB block at the end? Alternatively, why not just ignore the top address bit(s) and thereby map the 512KB into both the lower and upper 512KB spaces? Same for 256KB: map it 4x by ignoring the top 2 address bits?
    'Adding on the end' was suggested before, but that requires a new, skewed-size memory compile, and it needs another address bit.
    Allowing the area to dual-map seems like an OK thing to do; most programmers should grasp that. You would just recommend using the top copy, as the lower ones may change on 1MB variants.

    I think the full 512KB of hub RAM should always be available, unless and until the user enables debugging. I agree about dual mapping, as address conflicts between cog/LUT RAM and low hub RAM could be avoided by using the upper 512KB bank for the latter.
    jmg wrote: »
    cgracey wrote: »
    I added C and Z flags for when you read breakpoint status using 'GETBRK D WC/WZ/WCZ' in debug ISRs.

    On debug ISR entry, 'GETBRK D WC' can be used to find out if the cog has just been (re)started (C=0).
    What does the Z bit encode?

    Something else, perhaps. Why C=0 instead of C=1?

    I mentioned this a week ago (mnemonic changed since):
    TonyB_ wrote: »
    Is it possible for GETINT (now GETBRK) D WCZ to write something useful to the flags, e.g. C = D[0] and Z = (D[31:0] == 0)?

    GETBRK D WCZ returns all the pending skip bits and can be used outside debug interrupts for nested skipping. This is a great free bonus from the recent debug changes. If the GETBRK D WCZ flags are as I suggest, one can tell immediately whether the next instruction will be skipped and whether skipping has finished.

    It would be a waste of a useful flag for C to mean the same thing in both WC and WCZ.

    About the hub RAM map...

    Debugging needs to be invokable in all cases, not just accommodated when needed. For this reason, we need to have a constant memory map that always allows for the possibility of debug. This means sacrificing some RAM. We are talking about 16KB here. That's just 3% of the 512KB total.

    About nesting SKIP patterns...

    Since CALLs are allowed within SKIP sequences, a CALL-depth counter must be maintained to know when we've RETurned and can resume the stalled SKIP pattern. This 4-bit counter is reset to 0 on SKIP/SKIPF/EXECF/XBYTE, incremented on CALL/CALLPA/CALLPB, and decremented on RET/_RET_. This setup/restore activity can be spoofed by using careful instruction sequences, but it would be much better to have automatic hardware stacks for SKIP data and CALL depths. Really, though, one level of SKIP seems sufficient to me.
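    The CALL-depth rule above can be sketched as a toy Python interpreter. It is a simplified model, not cycle-accurate: one skip bit is consumed per main-line instruction only while the depth counter is 0, CALL increments the 4-bit counter, RET decrements it, and SKIP resets it. SKIPF, EXECF, XBYTE, CALLPA/CALLPB and the _RET_ prefix are not modeled separately, and instruction names other than SKIP/CALL/RET are placeholders.

```python
class SkipModel:
    """Toy model of SKIP with a 4-bit CALL-depth counter (hypothetical)."""

    def __init__(self, routines):
        self.routines = routines   # name -> list of instructions ending in ("RET",)
        self.pattern = 0           # remaining skip bits, LSB = next instruction
        self.depth = 0             # 4-bit CALL-depth counter
        self.executed = []

    def step(self, instr):
        op = instr[0]
        if self.depth == 0:
            skip_this = self.pattern & 1   # consume one bit per main-line instruction
            self.pattern >>= 1
            if skip_this:
                return
        if op == "SKIP":
            self.pattern = instr[1]
            self.depth = 0                 # counter reset on SKIP
            self.executed.append("SKIP")
        elif op == "CALL":
            self.executed.append(f"CALL {instr[1]}")
            self.depth = (self.depth + 1) & 0xF   # pattern stalls while depth > 0
            for sub in self.routines[instr[1]]:
                self.step(sub)
                if sub[0] == "RET":
                    break
        elif op == "RET":
            self.executed.append("RET")
            self.depth = (self.depth - 1) & 0xF   # back to 0 -> pattern resumes
        else:
            self.executed.append(op)

    def run(self, program):
        for instr in program:
            self.step(instr)
        return self.executed


if __name__ == "__main__":
    model = SkipModel({"sub": [("A",), ("B",), ("RET",)]})
    print(model.run([("SKIP", 0b0101), ("X",), ("CALL", "sub"),
                     ("Y",), ("Z",), ("W",)]))
```

    In this run, X and Y are skipped, while the CALL survives the pattern, its routine runs in full, and skipping resumes only after the matching RET.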