New P2 Silicon Observations


Comments

  • evanh Posts: 6,643
    edited 2018-10-23 - 00:32:52
    cgracey wrote: »
    Think about it. It would be messy. If you set an OUT bit high, and any other cog had that same pin's DIR bit high, the pin would go high. It's nice to qualify OUT bits in cogs with local DIR bits.

    grr, not happy with that explanation. I guess it is possible to coordinate multiple cogs to cooperate on one pin but has it ever been done?

    Anything else is just buggy code.
    "There's no huge amount of massive material
    hidden in the rings that we can't see,
    the rings are almost pure ice."
  • evanh Posts: 6,643
    edited 2018-10-23 - 00:39:04
    Okay, what happens if DIR is always early then? What sort of glitches would that create ... it'll be when enabling instead of disabling ... OUT is guaranteed to be initially low in this case ...
  • Worst I can think of is the momentary low drive before a possible high state. That doesn't seem bad to me.
  • evanh wrote: »
    Worst I can think of is the momentary low drive before a possible high state. That doesn't seem bad to me.

    Right. It would correct immediately and have no lasting effect.
  • evanh wrote: »
    cgracey wrote: »
    Think about it. It would be messy. If you set an OUT bit high, and any other cog had that same pin's DIR bit high, the pin would go high. It's nice to qualify OUT bits in cogs with local DIR bits.

    grr, not happy with that explanation. I guess it is possible to coordinate multiple cogs to cooperate on one pin but has it ever been done?

    Anything else is just buggy code.

    The streamer output can be up to 32 pins. The DIR bits say which pins are really outputs. We would need a 32-bit mask register for the streamer output. Maybe that would be better. Not sure.
  • jmg Posts: 13,264
    cgracey wrote: »
    evanh wrote: »
    News to me! Maybe that should be changed.

    Think about it. It would be messy. If you set an OUT bit high, and any other cog had that same pin's DIR bit high, the pin would go high. It's nice to qualify OUT bits in cogs with local DIR bits.

    You seem to assume LOW is ok to change to, but Hi is better avoided ?
    Systems can be active low or active high, so errant pulses in either directions should be avoided.

    If DIR already disables OUT at the pin, I'm unclear on why DIR also needed to force OUT low internally. That invites exactly this type of spike issue.

  • evanh Posts: 6,643
    edited 2018-10-23 - 01:14:21
    I think I'm good with an early DIR. It's the least messing around with the prop design for sure.
  • jmg Posts: 13,264
    edited 2018-10-23 - 01:51:16
    cgracey wrote: »
    I had a talk with the engineer at ON Semi today about the P2 respin. We talked about clock-gating.

    At this point, it sounds like a big hassle to introduce clock-gating. It would slow down the project and cost more money. I think we should just go with what we've got.

    Any thoughts?

    What sort of Clock Gating did they discuss, and what can their tools manage ?

    It seemed that a single global enable per cog, which disables the PLL feed when that cog is disabled, would have no speed penalty and would give some improved power control. It is a simple upstream enable that can be qualified by a local D-FF.

    Addit: I think you can do this level of clock-distribution control in Verilog/FPGA and not need automated-tools.


    Nicer still is to have WAIT opcodes link into the clock gate (as in P1), but I can see that level of clock gating could impact timing, as that's a live-gate situation (but you do have a lead-in time, as per P1, so it might not be critical-path).

    If all of the above is outside their tools scope, then yes, run with what you have, which is not ideal, but can certainly be sold.

  • P1
    In the P1 we only had DIR & OUT (& IN). It was not possible to change the output state and the direction concurrently. Hence, no glitches.

    P2
    In P2, we also have FLTx & DRVx (etc?). Here, we can change the output state and the direction concurrently, and this results in possible glitches. If we continue to use DIR & OUT, we should have no glitches - I am in the process of proving this, and other inconsistencies we see in different pins.

    However, what if the FLTx instructions were to float the pin on one clock and set the output value one clock later? This would have no visible effect and would result in clean outputs.
    The same goes for DRVx instructions. What if the output was set on one clock, and then on the next clock the output was enabled? Yes, this would result in DRVx taking effect one clock later than the DIR instruction. But there would be no possible glitches, just nice and clean output.
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • Cluso99 Posts: 14,749
    edited 2018-10-23 - 02:08:10
    Clock Gating

    I do not believe all the current (power) use can be attributed to the lack of clock gating as such. We need further testing to characterise just where the current is being wasted.

    IMHO there are a few areas that do not seem to reduce their clocking (i.e. current) when they shouldn't really be active. This could be achieved (as jmg said) by Verilog code gating some of the clocking.

    Why does a cog need to have clocking when it's idle/stopped?

    Is the HUB RAM being activated when there is no read/write access occurring?

    These seem to be two simple things to test, and both should be easy to fix.
  • Cluso99 wrote: »
    In the P1 we only had DIR & OUT (& IN). It was not possible to change the output state and the direction concurrently. Hence, no glitches.
    You might be surprised. Prop1 will likely have the same logic that does this. It only takes a change of DIR to create the glitch. The glitch size is dependent on the relative response. Prop1 may have been tuned, or maybe Chip had tried to tune the Prop2 in early design but went the wrong way.

    It looks to be a clear cut reason now at least. The new info makes sense to me. I think Chip had discussed possibilities with OnSemi a week or so back, so was probably planning on this approach since then. Just had to fit it in his schedule.
  • jmg Posts: 13,264
    Cluso99 wrote: »
    In P2, we also have FLTx & DRVx (etc?). Here, we can change the output state and the direction concurrently, and this results in possible glitches. If we continue to use DIR & OUT, we should have no glitches - I am in the process of proving this, and other inconsistencies we see in different pins.

    Do you mean "we should have no glitches" after Chip has applied his delay patch ?
    Since DIR removes any OUT drive, I'm unclear why it should also hard-gate OUT -> LOW ?
    Any COG in idle would have both DIR and OUT low and there are other modes for open-drain operations.

    Is there a use case where having a hard-gate connection so DIR -> LOW also drives OUT->LOW allows users to do what is otherwise impossible ?
    ( I'm not sure if generating sub-sysclk glitches is a valid use-case ;) )

    Worse, if you fix the leading edge with delays, does that then mean you guarantee a glitch on the trailing edge ?
    ie when DIR is enabled again, if the user OUT is HI, the delayed out (still LOW) will drive (glitch) the pin low ?
  • Evan & jmg,
    I think you both missed the point, or perhaps I didn't explain it properly.

    P1 cannot change DIR & OUT concurrently. There is no instruction to change both at the same time.

    However, in P2, the FLTx and DRVx instructions can change both DIR & OUT concurrently (at the same time). This gives the possibility of a glitch, depending on which signal arrives first at the pin.
    It could be solved in FLTx by making the DIR change to INPUT and one clock later change the OUT value. This way a glitch is not possible because the pin is tri-stated before the output value is changed.
    It could be solved in DRVx by making the OUT value change and one clock later the DIR changed to an output. Again, this way no glitch is possible.
    Note in P2, instructions are 2 clocks, so the FLTx and DRVx cannot cause a following instruction to cause a glitch either.

    I believe what we are seeing, as has been described, is the minute propagation differences in the DIR and OUT lines. It's caused because both DIR and OUT are being changed on the same clock by the same instruction.
    If DIR is skewed to occur before OUT, then a glitch may/will occur on the DRVx instruction, but not on the FLTx instruction.
    On the other hand, if DIR is skewed to occur after OUT, then there will not be a glitch with the DRVx, but one may/will occur for FLTx.
    The glitch occurs when OUT changes polarity with the DRVx and FLTx instructions, but not if OUT remains at the same level.

    Does this now make sense???
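
As a rough illustration of the skew argument above, here is a minimal Python sketch. It is only a toy model under stated assumptions: the pad is treated as driven to OUT when DIR is 1 and floating ('Z') otherwise, the function name `pad_states` and the delay values are made up for the example, and equal delays are arbitrarily treated as DIR arriving first. It ignores the per-cog AND of OUT with DIR discussed elsewhere in the thread.

```python
def pad_states(before, after, dir_delay, out_delay):
    """Return the distinct pad states seen as DIR and OUT updates arrive.

    before/after are (dir, out) pairs. A pad state is 'Z' (floating), 0 or 1.
    dir_delay/out_delay are hypothetical arrival times at the pad.
    """
    def pad(d, o):
        return o if d else "Z"

    d, o = before
    states = [pad(d, o)]
    # Apply the two updates in order of arrival (ties: DIR first).
    events = sorted([(dir_delay, "dir"), (out_delay, "out")])
    for _, which in events:
        if which == "dir":
            d = after[0]
        else:
            o = after[1]
        s = pad(d, o)
        if s != states[-1]:
            states.append(s)
    return states


# Float a pin that was driven high, with OUT also going low (FLTL-like).
print(pad_states((1, 1), (0, 0), dir_delay=0.0, out_delay=0.2))  # DIR early
print(pad_states((1, 1), (0, 0), dir_delay=0.2, out_delay=0.0))  # DIR late

# Drive a floating pin high, with OUT previously low (DRVH-like).
print(pad_states((0, 0), (1, 1), dir_delay=0.0, out_delay=0.2))  # DIR early
print(pad_states((0, 0), (1, 1), dir_delay=0.2, out_delay=0.0))  # DIR late
```

In this model an early DIR gives a clean float but a momentary low drive on enable, and a late DIR gives the reverse. If OUT does not change polarity, no intermediate state appears in either case.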
  • cgracey wrote: »
    evanh wrote: »
    I thought the glitch also happened when OUT was held high, and only DIR being lowered.

    Ah, yes, but remember that each cog's OUT output is AND'd with its DIR output, so when the DIR output goes low, so does the OUT output.
  • Prop1 is most likely the same logic in that part.
  • evanh wrote: »
    Prop1 is most likely the same logic in that part.

    That's right. It luckily doesn't glitch.
  • cgracey wrote: »
    evanh wrote: »
    I thought the glitch also happened when OUT was held high, and only DIR being lowered.

    Ah, yes, but remember that each cog's OUT output is AND'd with its DIR output, so when the DIR output goes low, so does the OUT output.
    cgracey wrote: »
    The timing rule is that the DIR signal must not arrive at the I/O pad circuit after the OUT signal. At all corners, this is the case, with some slack, actually. DIR can be a little later, but not enough to try to qualify. If we just make a timing rule on the respin that DIR arrives no later than OUT, we should be golden.
    cgracey wrote: »
    evanh wrote: »
    News to me! Maybe that should be changed.



    Think about it. It would be messy. If you set an OUT bit high, and any other cog had that same pin's DIR bit high, the pin would go high. It's nice to qualify OUT bits in cogs with local DIR bits.

    I missed this. I didn't realise that DIR gated OUT on the P1. Guess the larger feature size prevented glitches from being seen.
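
The gating described in the quotes above (each cog's OUT qualified by its own DIR, with the cogs' outputs then ORed together) can be sketched in Python as follows. This is an illustrative model only, not the actual Verilog: `combine_pins`, the `qualify` flag and the two example cogs are hypothetical names chosen for the sketch.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit port's worth of pins


def combine_pins(cogs, qualify=True):
    """Combine per-cog (dir_bits, out_bits) pairs into pin-level DIR and OUT.

    With qualify=True each cog's OUT bits are ANDed with that cog's own
    DIR bits before being ORed with the other cogs' outputs.
    """
    pin_dir = 0
    pin_out = 0
    for dir_bits, out_bits in cogs:
        pin_dir |= dir_bits
        pin_out |= (out_bits & dir_bits) if qualify else out_bits
    return pin_dir & MASK32, pin_out & MASK32


# Cog A drives pin 0 low. Cog B has a stale OUT bit set but DIR clear.
cog_a = (0b1, 0b0)
cog_b = (0b0, 0b1)

qualified = combine_pins([cog_a, cog_b])
unqualified = combine_pins([cog_a, cog_b], qualify=False)
print(qualified, unqualified)
```

Without the local qualification, cog B's stale OUT bit would pull pin 0 high even though only cog A has the pin enabled, which is the "messy" case Chip describes.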

    It seems to me, though, that accepting a small glitch in order to guard against a programming error (the OUTs being ORed from each cog) is the wrong fix. P2 is much faster, and as we have already seen, the glitch prevented testing the tri-stated pin to check for a pull-down being present.

    I test many pins on most of my P1 boards for pull-ups to determine what board and peripherals are present.

    It's also possible that the glitches could cause erroneous clocking on some added chips. It's possible that problems could arise where we share pins, or where we are turning the line around.

    Chip, sorry but I am certainly not liking the current solution. Perhaps there is an alternate solution along the lines of what I proposed above??
  • Cluso,
    I was iffy too. But it's not as scary as it first seems:

    1: Chip is adjusting the skews to bring the DIR timing earlier. Attempting to eliminate all detectable glitches. Same as Prop1.

    2: We know the glitch vanishes if OUT clocking at the pad is enabled. So the hardware you suggested is already in the finished Prop2.

    3: OUT clocking could be enabled by default by the mask ROM. If all else fails, that's all it would take to get what you've asked for. One small software change.
  • jmg Posts: 13,264
    evanh wrote: »
    3: OUT clocking could be enabled by default by the mask ROM. If all else fails, that's all it would take to get what you've asked for. One small software change.
    That patches the boot issue, but does not fix the underlying silicon flaw.


    evanh wrote: »
    1: Chip is adjusting the skews to bring the DIR timing earlier. Attempting to eliminate all detectable glitches. Same as Prop1.

    Fiddling with skews needs extreme care, to avoid fixing one bug, and introducing another... (where you are not looking) here is an example
    DIR =================\__________________________/===================
    uOUT ===============================================================
    fOUT ===============\__________________________/====================
    Pin ================|~~~~~~~~~~~~~~~~~~~~~~~~~~~====================
                        ^--- current silicon issue
                        
    DIR =================\__________________________/===================
    uOUT ===============================================================
    dOUT ==================\__________________________/=================
    Pin =================~~~~~~~~~~~~~~~~~~~~~~~~~~~\_/=================
                         ^^ fixed                    ^^ oops, new bug
    fOUT = async forced OUT
    dOUT = delayed forced OUT (eg delay gates, or FF)
    

  • Prop1 did it this way and no one noticed. A fallback already exists.

    The only alternative I can see is force OUT clocking as hard-wired. Over-kill, imho.
  • jmg wrote: »
    evanh wrote: »
    3: OUT clocking could be enabled by default by the mask ROM. If all else fails, that's all it would take to get what you've asked for. One small software change.
    That patches the boot issue, but does not fix the underlying silicon flaw.


    evanh wrote: »
    1: Chip is adjusting the skews to bring the DIR timing earlier. Attempting to eliminate all detectable glitches. Same as Prop1.

    Fiddling with skews needs extreme care, to avoid fixing one bug, and introducing another... (where you are not looking) here is an example
    DIR =================\__________________________/===================
    uOUT ===============================================================
    fOUT ===============\__________________________/====================
    Pin ================|~~~~~~~~~~~~~~~~~~~~~~~~~~~====================
                        ^--- current silicon issue
                        
    DIR =================\__________________________/===================
    uOUT ===============================================================
    dOUT ==================\__________________________/=================
    Pin =================~~~~~~~~~~~~~~~~~~~~~~~~~~~\_/=================
                         ^^ fixed                    ^^ oops, new bug
    fOUT = async forced OUT
    dOUT = delayed forced OUT (eg delay gates, or FF)
    

    The latter "^^oops, new bug" wouldn't be a problem, as the pin would correct to its driven state in under 1ns. It's not that often that DIR changes, anyway. It's a glaring problem, though, when you float a high pin and get a negative glitch that leaves that pin in a limbo state.
  • jmg Posts: 13,264
    edited 2018-10-23 - 21:16:28
    cgracey wrote: »
    The latter "^^oops, new bug" wouldn't be a problem, as the pin would correct to its driven state in under 1ns.

    Not if it is register delayed, which is what was being suggested.
    cgracey wrote: »
    It's not that often that DIR changes, anyway. It's a glaring problem, though, when you float a high pin and get a negative glitch that leaves that pin in a limbo state.

    An unexpected glitch on a clock pin is still a problem, even if it lacks the stray C Sample&Hold aspect of the Drive-to-Float glitch, that boot bumps into.


    Summary: This is a break-before-make problem, which requires that each edge has a different avoid-overlap scheme. A simple delay-fix is not enough.
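
One way to picture a per-edge scheme is the one-clock sequencing proposed earlier in the thread: when enabling drive, settle OUT first and raise DIR a clock later; when floating, drop DIR first and change OUT a clock later. The Python below is a hedged sketch of that idea at register level. The names `bbm_steps` and `pad` are invented for the example, and it does not model the per-cog AND of OUT with DIR or any analogue pad behaviour.

```python
def pad(d, o):
    """Pad state for a (dir, out) pair: 'Z' when not driven."""
    return o if d else "Z"


def bbm_steps(current, target):
    """Split a (dir, out) change into the register values on two clocks."""
    cur_dir, cur_out = current
    tgt_dir, tgt_out = target
    if tgt_dir and not cur_dir:
        # Enabling drive: update OUT first, enable DIR one clock later.
        first = (cur_dir, tgt_out)
    elif cur_dir and not tgt_dir:
        # Floating: release DIR first, change OUT one clock later.
        first = (tgt_dir, cur_out)
    else:
        # DIR unchanged: nothing to sequence.
        first = (tgt_dir, tgt_out)
    return [first, (tgt_dir, tgt_out)]


def pad_trace(current, target):
    """Pad state before the change and on each of the two clocks."""
    return [pad(*s) for s in [current] + bbm_steps(current, target)]


print(pad_trace((0, 0), (1, 1)))  # float -> drive high
print(pad_trace((1, 1), (0, 0)))  # drive high -> float, OUT going low
```

In both traces the pad never shows a driven level other than the starting or final one, at the cost of the change completing one clock later.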
  • I had noticed in existing software that there wasn't much care about extra clocks on SPI/SD buses. The select pin syncs them up again.
  • Another variant of changing the default, instead of putting it in the ROM, could be to invert the meaning of the C config bit in the PAD_IO config. Make it mean unclocked instead of clocked.
  • evanh wrote: »
    Another variant of changing the default, instead of putting it in the ROM, could be to invert the meaning of the C config bit in the PAD_IO config. Make it mean unclocked instead of clocked.

    That could be done. The application of that fix would vary with the pin mode, though. Could be done in the core.
  • As far as I can see, a simple logic change in Verilog as I suggested above, would fix this completely.
    Add a one-clock delay to DIR when it changes to enable drive. Add a one-clock delay to OUT when DIR changes to tri-state the pin.

    BTW P1 silicon is much slower so there probably was never a glitch. Don't forget it was hand laid so this logic was probably thought about.
  • Just got word from Wendy at OnSemi that if I introduce clock gating into my Verilog code, their synthesis tools will generate the appropriate structures and clock trees. Shouldn't cause a hiccup in the respin process.

    To do clock gating, a clock-qualifier in the form of a flop Q output would be AND'd with the clock signal which would have subsequent high fan-out. Seems like that flop and AND gate could hardly be early enough in the clock tree, considering the subsequent fanout to maybe 4,500 flops per cog.
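
To make the qualifier idea concrete, here is a small Python sketch of a clock ANDed with an enable. It is a behavioural toy, not synthesis input: the waveform lists, `gated_clock` and `rising_edges` are invented for the example, the enable is assumed to change only while the clock is low, and the only figure taken from the post is the rough 4,500 flops per cog.

```python
FLOPS_PER_COG = 4500  # approximate fan-out mentioned in the post


def gated_clock(clk, enable_q):
    """AND each clock sample with the enable flop's Q output."""
    return [c & e for c, e in zip(clk, enable_q)]


def rising_edges(wave):
    """Count 0 -> 1 transitions, assuming the signal starts low."""
    edges = 0
    prev = 0
    for level in wave:
        if level and not prev:
            edges += 1
        prev = level
    return edges


# Eight clock cycles, two samples per cycle (low half, then high half).
clk = [0, 1] * 8
# Cog active for the first four cycles, then stopped.
enable_q = [1] * 8 + [0] * 8

gated = gated_clock(clk, enable_q)
ungated_events = rising_edges(clk) * FLOPS_PER_COG
gated_events = rising_edges(gated) * FLOPS_PER_COG
print(ungated_events, gated_events)
```

The count of flop clock events is only a crude proxy for dynamic power, but it shows why a stopped cog behind a gate stops toggling its downstream clock tree.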

    I'm still getting the streamer bug fixed. I had too many things too close together. Needing to separate them for sanity and provability.
  • What streamer bug? I must have missed something...

    All I remember is the pin glitches and the colorspace converter issue...
    Prop Info and Apps: http://www.rayslogic.com/
  • jmg Posts: 13,264
    Rayman wrote: »
    What streamer bug? I must have missed something...

    All I remember is the pin glitches and the colorspace converter issue...

    This one I think
    https://forums.parallax.com/discussion/comment/1450278/#Comment_1450278
  • cgracey wrote: »
    Just got word from Wendy at OnSemi that if I introduce clock gating into my Verilog code, their synthesis tools will generate the appropriate structures and clock trees. Shouldn't cause a hiccup in the respin process.

    I think the general consensus on the forum was in favour of highest possible performance.
    How might this impact the potential top MHz ?

    J