One thing that concerns me is the 16KB of hub being mapped to the top of hub RAM. Previously it was to be dual-mapped, ie for 512KB, the 16KB at 496-512KB would also appear at 1008-1024KB. That gave us the ability to have a whole contiguous block of 512KB (or 256KB, etc). Removing 16KB from the block disrupts the contiguous-block concept. We also have a disruption at the bottom end, where cog and LUT addresses interfere with the mapping. If you cannot dual-map the 16KB block, could you simply add an extra 4/8/16KB block at the end? Alternatively, why not just ignore the top address bit(s) and map the 512KB into both the lower and upper 512KB spaces? Same for 256KB - map it 4x by ignoring the 2 top address bits?
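For illustration, the mirroring being asked for is just modular address decoding - a minimal Python sketch, assuming a 1MB (20-bit) hub address space (the sizes and the decoder itself are illustrative, not the chip's actual logic):

```python
# Sketch of the proposed address mirroring: with 512KB of physical hub RAM
# in a 1MB (20-bit) address space, ignoring the top address bit makes the
# RAM appear in both the lower and upper 512KB halves. With 256KB, ignoring
# the top two bits maps it four times. (Illustrative model only.)

def hub_address(addr, ram_size, space_size=1 << 20):
    """Map a 20-bit hub address onto physical RAM by dropping top bit(s)."""
    addr %= space_size           # wrap into the 1MB space
    return addr % ram_size       # top bit(s) ignored: RAM mirrors upward

# 512KB RAM: the byte at 496KB also appears at 1008KB (496KB + 512KB)
assert hub_address(496 * 1024, 512 * 1024) == hub_address(1008 * 1024, 512 * 1024)

# 256KB RAM: mapped 4x across the 1MB space
assert len({hub_address(0x1234 + n * 256 * 1024, 256 * 1024) for n in range(4)}) == 1
```

Dual-mapping falls out for free here: dropping the top bit makes 496KB and 1008KB hit the same physical byte, and a 256KB part mirrors four times.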
'Adding on the end' was suggested before, but that requires a new, skewed-size memory compile, and it needs another address bit.
Allowing the area to dual-map seems like an OK thing to do; most programmers should grasp that - you just recommend using the top copy, as the lower ones may change on 1MB variants.
I think the full 512KB of hub RAM should always be available, unless and until the user enables debugging. I agree about dual mapping, as address conflicts between cog/LUT RAM and low hub RAM could be avoided by using the upper 512KB bank for the latter.
Is it possible for GETBRK D WCZ to write something useful to the flags, e.g. C = D[0] and Z = (D[31:0] == 0)?
GETBRK D WCZ returns all the pending skip bits and can be used outside debug interrupts for nested skipping. This is a great free bonus from the recent debug changes. If the GETBRK D WCZ flags are as I suggest, one can tell immediately whether the next instruction will be skipped and whether skipping has finished.
It would be a waste of a useful flag for C to mean the same thing in both WC and WCZ.
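A small model of the proposed flag behaviour (this is the suggestion above, not confirmed silicon):

```python
# Model of the *proposed* GETBRK D WCZ flag behaviour (a suggestion in this
# thread, not confirmed hardware): D holds the pending skip bits,
# C = D[0] (will the next instruction be skipped?) and
# Z = (D == 0) (has skipping finished?).

def getbrk_wcz(skip_bits):
    d = skip_bits & 0xFFFFFFFF
    c = d & 1                 # next instruction skipped?
    z = int(d == 0)           # skip pattern exhausted?
    return d, c, z

d, c, z = getbrk_wcz(0b1010)
# c == 0: next instruction executes; z == 0: skipping not yet finished
```

With flags defined this way, a single GETBRK tells you immediately whether the next instruction will be skipped (C) and whether skipping has finished (Z).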
This ROM is not an actual memory instance from On Semi. It is just a few lines of Verilog code that gets compiled into logic.
There are two ROMs in play - the Debug keyhole ROM is very small, just 8 longs and is compiled-verilog.
IIRC the BOOT ROM is 16k (OnSemi true ROM library?) and serially loaded into RAM at startup ? - That may be the ROM Roy meant ?
Code for that will be a binary file, so could be slightly later than the Verilog, but not much later if their scripts are well primed - a few days ?
LOCKBYE is a terrible name for it. Please use one of the other suggested names for it.
All this debugging stuff is great, but I too am worried about what this does to the timeline. I thought the verilog had to be locked down because OnSemi was working out the synthesizing? Isn't this causing them to have to redo stuff?
I would think that LOCKREL would be a more logical and descriptive name since you are actually releasing the lock.
LOCKREL it is, then.
We got our test chips yesterday and I've been going through each circuit.
The new fractional PLL works great, but the digital counters can't keep up with the maximum VCO frequency. To remedy this, I'm making a slight schematic change to be able to mux the VCO output directly to the PLL output, rather than always going through a final divide-by-two circuit. This will halve the VCO divider frequencies at the top end. This way, you'll still have a 6-bit crystal divider and a 10-bit VCO divider, but 4-bit post-VCO division will be limited to 2, 4, 6, 8, ...30. The old ÷32 option now selects the direct VCO output. You would use direct-VCO mode for PLL frequencies of 100MHz or higher. This allows the dividers to stay well below the apparent test-chip limit of 420MHz. For example, to generate a 160MHz clock, the dividers currently have to run at 320MHz; there will be much more margin at 160MHz. This also makes 200MHz more reasonable, as the dividers won't need to run at 400MHz.
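As a sanity check on the numbers in that post, here is a rough Python model of the divider chain as I read it (the 20MHz crystal value and the parameter names are my assumptions, not the real register layout):

```python
# Rough model of the described clock chain: Fvco = Fxtal / xdiv * vco_mult,
# then post-VCO division by 2, 4, ... 30, or the direct VCO output in the
# new mode (post_div = 1). The point of the change: for the same PLL output
# frequency, direct mode lets the VCO (and the counters it clocks) run at
# half the speed the old always-divide-by-two arrangement required.

def pll_out(fxtal, xdiv, vco_mult, post_div):
    """post_div in {2, 4, ..., 30}, or 1 for the direct-VCO option."""
    fvco = fxtal / xdiv * vco_mult
    return fvco, fvco / post_div

# 160MHz the old way: the VCO and dividers must run at 320MHz
fvco_old, fout_old = pll_out(20e6, 1, 16, 2)   # 320MHz VCO, 160MHz out

# 160MHz in direct-VCO mode: the VCO itself runs at only 160MHz,
# well below the apparent 420MHz test-chip limit
fvco_new, fout_new = pll_out(20e6, 1, 8, 1)    # 160MHz VCO, 160MHz out
```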
I've made a simple schematic change and now I'll make the layout change. I'll send the new GDS cell to On Semi, along with a new netlist, and they will rerun DRC and LVS, which takes just a minute.
It's way easier to do this myself than trying to engage other people. Cheaper and faster, too.
I'm not following the limits - where does 420MHz come from ? That's 210MHz SysCLK - above target ?
Or, do you mean the PLL can go above the counters ability in its normal operating range ?
If the 10b VCO divider clocks straight from the VCO, that's maybe expected - usual designs have a single FF /2 first, which can clock faster than a 10-bit divider.
How close is the VCO to 50:50 duty cycle by design ?
One reason to always /2 is that it allows more variation in VCO duty cycle, and the /2 ensures the output is 50:50.
I guess if you never use the falling edge, clock symmetry is slightly less critical ?
Reading that more carefully, I'm not sure the change does remedy the issue, it just shifts the operating point ?
If the VCO can still exceed the counter, what does that do during LOCK ? - if VCO appears to suddenly halve, or stutter, does that still eventually always lock ok, to the expected value ?
Does the counter need to be faster, or the VCO max lower, to ensure 'linear' PLL operation ?
Well, I guess we are reading it differently then. I see your quote as just part of an explanation rather than the actual problem.
I read it as Chip being concerned with reliability of setting the sys-clock to 200 MHz, which would have required a VCO output of 400 MHz - right at the counter's limit. With the design change, the whole PLL can now be run at a lower rate to get the same result, which provides reduced power consumption as a bonus.
All of that assumes you can attain lock - however, in order to get to that new operating point, the VCO/PLL has to first lock, and what is not clear is whether lock can even occur when the VCO exceeds the counter.
VCO/PLL are analog in nature, and during lock, the PFD output varies until lock is achieved.
ie "but the digital counters can't keep up with the maximum VCO frequency" remains an issue in the design.
The problem is that the post-VCO dividers cannot keep up with the VCO when it goes over 420MHz. This has the effect of stopping the divider outputs, so that the VCO appears to be under-frequency, causing errant feedback to further increase the VCO speed.
I discovered this by starting the crystal oscillator and PLL at the same time. The PLL would sometimes go haywire. I realized that while the crystal is starting up, before it develops significant amplitude, there is spurious fast toggling, sending the PLL into high-frequency lock-up.
So, you only want to enable the PLL when you have a good reference frequency. This means allowing 5ms for the crystal to start, then enabling the PLL, and then allowing it 1ms to stabilize before switching to it.
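That start-up recipe can be sketched as a sequencing routine (the enable/switch hooks are hypothetical names; the 5ms and 1ms delays are the figures from the post):

```python
import time

# Sketch of the safe clock-switch sequence described above: start the crystal,
# wait ~5ms for it to develop amplitude (so the PLL never sees the spurious
# fast toggling of a starting crystal), only then enable the PLL, wait ~1ms
# for it to stabilize, then switch the system clock over. The enable_* and
# switch_* callables are hypothetical hooks, not a real API.

def safe_pll_start(enable_crystal, enable_pll, switch_to_pll):
    enable_crystal()
    time.sleep(0.005)   # 5ms: crystal amplitude builds; reference is clean now
    enable_pll()        # PLL sees a good reference, can't chase startup noise
    time.sleep(0.001)   # 1ms: PLL stabilizes
    switch_to_pll()     # only now move the system clock onto the PLL

events = []
safe_pll_start(lambda: events.append('xtal'),
               lambda: events.append('pll'),
               lambda: events.append('switch'))
```

The ordering is the whole point: the PLL is never enabled while its reference is unreliable.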
I could slow down the VCO by slightly increasing its gate lengths, but I can't speed up the counters.
It would certainly be better if it couldn't get into that positive-feedback mode.
I will increase VCO gate lengths, in addition to adding the VCO-direct mode.
By the way, the VCO output is very close to 50% duty cycle. The only reason I had that divide-by-2 circuit at the end of the post-VCO divider was that only every Nth high pulse comes through, causing a variably skewed duty cycle.
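The duty-cycle point can be illustrated numerically with a simple model of an every-Nth-pulse divider feeding a final toggle flip-flop (an illustrative divider style, not the actual circuit):

```python
# Why a final /2 fixes duty cycle: a simple divide-by-N pulse counter emits
# one input-wide high pulse every N input cycles, so its duty is 1/N and
# varies with N. Toggling a flip-flop on that pulse train gives exactly 50%
# duty by construction (and an overall division of 2N - hence the even
# post-VCO divide values 2, 4, ... 30).

def duty_of_pulse_divider(n, cycles=1000):
    """Duty cycle of a /N divider that passes every Nth high pulse."""
    highs = sum(1 for t in range(cycles) if t % n == 0)
    return highs / cycles

def duty_after_toggle_ff(n, cycles=1000):
    """Feed the /N pulses into a toggle FF: output is high half the time."""
    state, highs = 0, 0
    for t in range(cycles):
        if t % n == 0:
            state ^= 1          # toggle on each pulse from the /N stage
        highs += state
    return highs / cycles

# A /5 pulse divider alone has ~20% duty; after the toggle FF it is ~50%.
```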
ie "but the digital counters can't keep up with the maximum VCO frequency" remains an issue in the design.
As I said initially, that's a different problem that you're worrying about. Good idea to lead in by saying so.
Be careful of what you're asking for here too. Having the VCO too constrained can limit its max frequency in adverse temperature. Wouldn't be very convenient if the PLL ran out of puff just because it had been nerf'd just to protect from buggy code.
It's rather more serious than 'buggy code', which is why Chip has indicated he is fixing it.
Perhaps if you looked at the links I gave, and read Chip's comment of "It would certainly be better if it couldn't get into that positive-feedback mode." ?
Yes, I am a little concerned about slowing the VCO down. At different corners, the VCO-divider speed relationship may become reversed.
If we say the VCO frequency must be programmed for 100...200 MHz, there's no way it will overshoot to 440 MHz on a legal setting change, or even to 210 MHz, for that matter.
I need to run some simulations to get a better understanding.
That's why I asked how close this was to Spice predictions.
If it was going to somewhere above 420MHz, and you want to be > 200MHz, that should be enough elbow room ?
The VCO & Divider will tend to largely self-track on the same die, faster process means faster VCO, but also faster Counters.
See the plots I linked above from Analog Devices, which show a VCO can hit the limits during the lock-acquisition time.
It may not do this 100% of the time, but the fact it can happen, is enough to design for.
ie the overshoot limit (below counter fMax) really needs to be designed into the VCO, rather than 'hoped for' via the control signal.
About the hub RAM map...
Debugging needs to be invokable in all cases, not just accommodated when needed. For this reason, we need to have a constant memory map that always allows for the possibility of debug. This means sacrificing some RAM. We are talking about 16KB here. That's just 3% of the 512KB total.
About nesting SKIP patterns...
Since CALLs are allowed within SKIP sequences, a CALL-depth counter must be maintained to know when we've RETurned and can resume the stalled SKIP pattern. This 4-bit counter is reset to 0 on SKIP/SKIPF/EXECF/XBYTE, incremented on CALL/CALLPA/CALLPB, and decremented on RET/_RET_. This setup/restore activity can be spoofed by using careful instruction sequences, but what would be much better would be to have automatic hardware stacks for SKIP data and CALL depths. Really, though, one level of SKIP seems sufficient to me.
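That bookkeeping can be modelled behaviourally (a sketch of the stated rules, not the RTL):

```python
# Behavioural model of the 4-bit CALL-depth counter described above: reset to
# 0 by SKIP/SKIPF/EXECF/XBYTE, incremented by CALL/CALLPA/CALLPB, decremented
# by RET/_RET_. The stalled skip pattern resumes only when depth returns to 0,
# i.e. when execution is back at the level where the pattern was armed.

class SkipTracker:
    def __init__(self):
        self.depth = 0

    def event(self, op):
        if op in ('SKIP', 'SKIPF', 'EXECF', 'XBYTE'):
            self.depth = 0                        # new pattern armed
        elif op in ('CALL', 'CALLPA', 'CALLPB'):
            self.depth = (self.depth + 1) & 0xF   # 4-bit counter wraps
        elif op in ('RET', '_RET_'):
            self.depth = (self.depth - 1) & 0xF

    @property
    def skipping_active(self):
        return self.depth == 0    # pattern only advances at depth 0

t = SkipTracker()
t.event('SKIP'); t.event('CALL')      # inside the call: pattern stalled
stalled = not t.skipping_active
t.event('RET')                         # returned: pattern resumes
resumed = t.skipping_active
```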
Something else, perhaps. Why C=0 instead of C=1?
I mentioned this a week ago (mnemonic changed since):
GETBRK D WCZ returns all the pending skip bits and can be used outside debug interrupts for nested skipping. This is a great free bonus from the recent debug changes. If the GETBRK D WCZ flags are as I suggest, one can tell immediately whether the next instruction will be skipped and whether skipping has finished.
It would be a waste of a useful flag for C to mean the same thing in both WC and WCZ.
IIRC the BOOT ROM is 16k (OnSemi true ROM library?) and serially loaded into RAM at startup ? - That may be the ROM Roy meant ?
I meant the 16k ROM that is serially read in to HUB with the boot code, etc.
Great
Does the Xtal Osc work ok - to what Xtal MHz ?
If you drive from AC coupled 0.8V 'clipped sine' (actually a LPF square wave) what MHz can that clock to ?
Cheap high stability GPS oscillators (±500ppb) are mostly clipped sine, 10~30MHz region.
eg TG2016 (1.7~3.63v, 1.5mA max) from EPSON shows up well under $1/1k, in 32MHz, 30MHz, 27MHz, 26MHz, 16.368MHz, 16.384MHz, others show 19.2MHz etc
Verical lists the TXC Corporation 7Q-19.200MBV-T oscillator (VC-TCXO, 19.2MHz, ±2.5ppm tolerance, ±0.5ppm stability) at $0.2531 for 247+ (and also the 7L-26.000MBS-T).
Does that solve the issue, or merely mask it ?
Users will want to change dividers on the fly, and not expect any lockout - they can tolerate a settling time.
If your VCO can lock out, that becomes statistical, and data like this shows it:
This plot (Figure 3, 'Lock time for the tuned IF') indicates the VCO can go to the limits during lock seek.
http://www.analog.com/en/analog-dialogue/articles/fast-locking-high-sensitivity-tuned-if-radio.html
Similar here..
http://www.ti.com/lit/an/slwa069/slwa069.pdf
To me, that means your MAX VCO and MAX counter need to be compatible, ie either raise Max Counter or lower Max VCO ?
I am going to simulate it soon.
Sounds good - how close is the measured behaviour to the Spice predictions ?
A spectrum receiver and sniffer loop might be able to tell you where the VCO is getting up to ?
Is that still an issue, given this note from above: "but 4-bit post-VCO division will be limited to 2, 4, 6, 8, ...30" (32 = /1) ?
The /1 case has VCO duty, and all others are even numbers, so all can have 50% duty ?
Yes, duty should always be ~50%.
Maybe 100ps out of a 5ns cycle.
The ring oscillator; how many inverters are in the chain?
There are 7 differential inverters in the loop.