Propeller II update - BLOG

pedward · 2013-09-12 20:46

LOL, that would be an amusing irony, an ULP Propeller 2 chip by accident!

Bill Henning · 2013-09-12 20:54

Very interesting update Chip.

Hopefully running at a low voltage will do the trick and let you get into the monitor.

I wonder what implications (the process being so much faster) will have for the final clock speed...

cgracey · 2013-09-12 21:01

Bill Henning wrote: »

Very interesting update Chip.

Hopefully running at a low voltage will do the trick and let you get into the monitor.

I wonder what implications (the process being so much faster) will have for the final clock speed...

Well, what we need to do is go back to the original fab and their low-power process. By going to OnSemi, we got chips two weeks faster for only $30k, instead of $60k at TSMC, but had to go with a hotter process. If we can accomplish some good verification work with these new chips, by running at a lower voltage, we'll be confident about trying again at TSMC.

KeithE · 2013-09-12 21:02

Are split lots in the works? And do you have access to the WAT (wafer acceptance test) data just to see where your process is? It's every chip designer's worst nightmare to have hold time problems. Besides lowering the voltage could you raise the temperature? I've seen a video of you running chips in a thermal chamber.

Can John run static timing analysis for this process to see where you are theoretically? Hopefully there's an appropriate library that you can use? Maybe he can also tell you how far off you would be at a certain corner condition.

Bill Henning · 2013-09-12 21:12

If, by lowering the voltage, you can get some good verification... might it not be worthwhile to re-tune for the hotter process? Would it not result in a higher clock rate? (at the expense of more power consumption)

Of course that assumes that it would not be a huge amount of work. Mind you, if TSMC would get P2 out faster...

It sounds like you could do two shuttle runs at OnSemi for the cost of one at TSMC ... would two runs likely be enough to nail the timing issues?

It will be really interesting to see how far this batch will go with BOE disabled tomorrow.

cgracey wrote: »

Well, what we need to do is go back to the original fab and their low-power process. By going to OnSemi, we got chips two weeks faster for only $30k, instead of $60k at TSMC, but had to go with a hotter process. If we can accomplish some good verification work with these new chips, by running at a lower voltage, we'll be confident about trying again at TSMC.

cgracey · 2013-09-12 21:14

KeithE wrote: »

Are split lots in the works? And do you have access to the WAT (wafer acceptance test) data just to see where your process is? It's every chip designer's worst nightmare to have hold time problems. Besides lowering the voltage could you raise the temperature? I've seen a video of you running chips in a thermal chamber.

Can John run static timing analysis for this process to see where you are theoretically? Hopefully there's an appropriate library that you can use? Maybe he can also tell you how far off you would be at a certain corner condition.

I think if we had stuck with the intended fab process, things would have been fine today.

Redoing the synthesis work would cost a fortune, so we're going to head in the direction of using the original process. I should have had an idea that taking something as complex as that logic block, with all those tuned timing paths, and fab'ing it in a process that was outside of the original process' corners, probably wouldn't work right. I never considered hold-time, and only thought how setup-time wouldn't be an issue.

As far as John goes, the branch of Open-Silicon he worked for in Eau Claire was shut down right after our project was finished, so he went to work for Intel and is busy dealing with their stuff now.

Tomorrow, I hope, between lower voltage and temperature control, if necessary, we can get it to function as intended. That would be mighty redemptive.

jmg · 2013-09-12 21:20

cgracey wrote: »

Well, what we need to do is go back to the original fab and their low-power process. By going to OnSemi, we got chips two weeks faster for only $30k, instead of $60k at TSMC, but had to go with a hotter process. If we can accomplish some good verification work with these new chips, by running at a lower voltage, we'll be confident about trying again at TSMC.

I'm not quite following, as everything should track automatically ?

or do you have so much delay-tuning in this to maximize the speed, that it becomes more like a traveling wave design...with a definite sweet spot of operation.

Is this issue between the core and peripherals, or within the core itself?

cgracey · 2013-09-12 21:28

jmg wrote: »

I'm not quite following, as everything should track automatically ?

or do you have so much delay-tuning in this to maximize the speed, that it becomes more like a traveling wave design...with a definite sweet spot of operation.

Is this issue between the core and peripherals, or within the core itself?

That was my thinking (everything tracks), because I design our circuits where hold time is never an issue. To get more time out of a clock cycle, though, synthesis tools will play with clock delays and data path times to make more fit. So, it does become kind of like a traveling wave design that is tuned for a sweet spot of operation - that sweet spot being the area within the process corners. We fab'd outside of that area. It all has to do with what goes on inside the core, and not within our own circuitry.
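The "sweet spot" idea can be illustrated with a toy two-flop timing model. This is only a sketch: every number below is invented, and it assumes that gate delays scale with a process speed factor `k` while clock skew is a fixed, wire-dominated term that does not scale. A real design would need static timing analysis with characterized libraries.

```python
# Toy two-flop timing model. All numbers are invented for illustration.
# Assumption: gate delays scale with the process speed factor k
# (k < 1 means a faster process), while the clock skew is treated as
# a fixed wire-dominated term that does not scale.

PERIOD_NS = 6.0      # hypothetical clock period
T_CLK_TO_Q = 0.30    # launch flop clock-to-Q delay at k = 1
T_SETUP = 0.20       # capture flop setup requirement at k = 1
T_HOLD = 0.05        # capture flop hold requirement at k = 1
T_DATA_MAX = 5.00    # slowest logic path at k = 1
T_DATA_MIN = 0.20    # fastest (buffer-padded) path at k = 1
SKEW = 0.35          # capture clock arrives this much later than launch clock


def setup_slack(k):
    """Positive when the slowest path arrives before the next capture edge."""
    arrival = k * (T_CLK_TO_Q + T_DATA_MAX)
    required = PERIOD_NS + SKEW - k * T_SETUP
    return required - arrival


def hold_slack(k):
    """Positive when the fastest path does not overrun the same capture edge."""
    arrival = k * (T_CLK_TO_Q + T_DATA_MIN)
    required = SKEW + k * T_HOLD
    return arrival - required


def sweet_spot(factors):
    """Return the speed factors at which both checks pass."""
    return [k for k in factors if setup_slack(k) >= 0 and hold_slack(k) >= 0]


factors = [round(0.50 + 0.05 * i, 2) for i in range(21)]  # 0.50 .. 1.50
window = sweet_spot(factors)
print("working window:", window[0], "to", window[-1])
```

With these made-up values the design only works for `k` between 0.8 and 1.15. Too slow a process fails setup, too fast a process fails hold.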

jmg · 2013-09-12 21:37

You could do a simulation run, to check if there is a sweet spot on the process you actually used, but I guess that is not cheap ?

KeithE · 2013-09-12 21:57

jmg wrote: »

You could do a simulation run, to check if there is a sweet spot on the process you actually used, but I guess that is not cheap ?

You would typically look at this with static timing analysis for designs built out of standard cells, and if the libraries are available then it's quick and shouldn't cost much. Most of the work should have already been taken care of earlier in the project. You can run dynamic sims too just to double check that you haven't misled the various tools - although there are tools to check your constraints too. It sounds like characterized libraries aren't available, otherwise Parallax would have run static timing with them before tapeout.

Edited to add - maybe someone has a good tapeout checklist. I can't give you the one we use, but here's a simple example http://suwito.net/documents/Design_Review_Checklist.pdf

jmg · 2013-09-12 22:01

KeithE wrote: »

You would typically look at this with static timing analysis for designs built out of standard cells, and if the libraries are available then it's quick and shouldn't cost much. Most of the work should have already been taken care of earlier in the project. You can run dynamic sims too just to double check that you haven't misled the various tools - although there are tools to check your constraints too. It sounds like characterized libraries aren't available, otherwise Parallax would have run static timing with them before tapeout.

Could be worth doing, if the part works in most other areas.

Otherwise you can never be sure if you have a hard logic issue, or just a timing phantom that will go away.
A dice-roll best avoided.

Smarter to know where to place the chip to give the best possible test coverage.

Ym2413a · 2013-09-12 23:05

That's a whole lotta' craziness!
Hope the tests tomorrow can find out more.

Keep up the hard work! : ]

Cluso99 · 2013-09-13 00:03

I have been monitoring the thread all day. Thanks for all the updates. It is so fascinating to see what is happening.

I am hoping that you can get some functionality out of the chips tomorrow.

Never realised there are different processes within the same feature size.

Ym2413a · 2013-09-13 00:34

It is really neat to hear about all the little details that go into designing, testing & producing something of this complexity. : ]
I'm glued to this thread!

Ahle2 · 2013-09-13 02:22

I will try to sum up the situation.

Findings

The Flash SPI_CS (P89) signal looks correct, which can only mean that it's running the booter program.
Certain pins aren't transitioning when they should, even though the proper amount of time elapses for those instructions.
The timing looks all correct.
It seems like the cog is not executing all instructions properly because an instruction will execute to toggle a pin one way, and then another way, and one of those instructions doesn't seem to execute.
Monitor is not coming up. (because some instructions don't execute properly)
The internal RC oscillator seems to be running at 36MHz. Should be 20 MHz. (fast vs low-power process)
As Dave lowered the voltage on the test setup, he was getting more sensible behavior on the scope.

Cause of failure (current best thinking)

The chip was designed for a low-power 180nm process, but this run was fabricated using a fast 180nm process.
The difference between the intended process (low-power) and the one used (fast) will introduce hold time violations.
Hold time violations are (probably) the reason why some instructions don't execute properly.
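
For a rough sense of scale, the oscillator readings from the findings can be turned into a speed ratio. This is only back-of-the-envelope arithmetic, and it assumes (crudely) that the RC oscillator frequency tracks overall gate speed, which may not hold in practice.

```python
# Oscillator figures quoted in the findings above.
MEASURED_MHZ = 36.0
EXPECTED_MHZ = 20.0

# Assumption: oscillator frequency is a usable proxy for gate speed.
speedup = MEASURED_MHZ / EXPECTED_MHZ
delay_scale = EXPECTED_MHZ / MEASURED_MHZ

print(f"speed-up: {speedup:.2f}x, delays scaled to {delay_scale:.0%}")
# prints "speed-up: 1.80x, delays scaled to 56%"
```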

Left to test

Lower the voltage to see if that fixes the hold time violations issue.
If the "voltage fix" works, there's A LOT more to test. (we might even see Chip playing Space Invader with a propeller hat on)

/Johannes

Heater. · 2013-09-13 02:51

I need a drink.

NumPy · 2013-09-13 05:54

I would not recommend drinking while on the forums.

Rayman · 2013-09-13 06:09

I wonder if dunking it in liquid nitrogen would help...

David Betz · 2013-09-13 06:42

Rayman wrote: »

I wonder if dunking it in liquid nitrogen would help...

Are you talking about dunking the P2 chips into liquid nitrogen or yourself? :-)

Mike Green · 2013-09-13 06:51

Thanks again for sharing. "redemptive" ... I like that. After your stories about debugging the Prop 1 with the electron beam probe, I had assumed you could do something like that with at least part of the Prop 2, but, of course, with the increased density, the greater number of interconnect layers, and the complexity of the synthesized portion, that's not going to happen. This is really a "design it for the most part to just work" kind of effort. Good that you're patient and methodical.

cgracey · 2013-09-13 07:46

Mike Green wrote: »

Thanks again for sharing. "redemptive" ... I like that. After your stories about debugging the Prop 1 with the electron beam probe, I had assumed you could do something like that with at least part of the Prop 2, but, of course, with the increased density, the greater number of interconnect layers, and the complexity of the synthesized portion, that's not going to happen. This is really a "design it for the most part to just work" kind of effort. Good that you're patient and methodical.

We could use the electron beam prober on our own circuitry to good effect, but that mass of synthesized logic is a giant black box to us. The entire core comes down to "go" or "no go". Hopefully, we'll have some better luck this morning by running at lower voltage. Testing should start up again in about an hour.

David Betz · 2013-09-13 08:12

cgracey wrote: »

We could use the electron beam prober on our own circuitry to good effect, but that mass of synthesized logic is a giant black box to us. The entire core comes down to "go" or "no go". Hopefully, we'll have some better luck this morning by running at lower voltage. Testing should start up again in about an hour.

Hi Chip. Thanks for the updates! Let's hope that running at a lower voltage allows you to get further in your testing.

DaveJenson · 2013-09-13 08:35

Thanks for all the updates!
You know we are all hoping for a successful test.

Where else but Parallax could we get such open step by step reporting?!?
It's wonderful (full of wonder).

photomankc · 2013-09-13 08:56

DaveJenson wrote: »

Thanks for all the updates!
Where else but Parallax could we get such open step by step reporting?!?
It's wonderful (full of wonder).

I agree, it's absolutely fascinating to get this insight into the process. Win-lose-or-draw, I have learned a ton of things just watching this all unfold.

Good luck Chip n' Dave!

cgracey · 2013-09-13 09:57

I was thinking about this hold-time matter this morning and I think I realized something.

I've noticed for a while now that modern chips don't respond well to increased voltage like chips used to (they would run faster). I had been assuming that modern processes had somehow changed this basic 'law', but I'm thinking now that it is the modern design methodology that has changed things. In the old days, simple clock trees were deterministic and kept skew below what would cause hold-time problems. Nowadays, clock trees are automatically generated and they are quite complicated, tending towards high skew, though uncertainty limits are known, so Q-to-D paths (flipflop-to-flipflop) must be padded with buffers to ensure Q-to-D propagation times are not too small, opening the door to hold-time problems. This is economical, in a way, because you get to spend less area (and thought) on clock distribution and more on logic, with some logic being used to pad out Q-to-D propagation paths. The trouble is, though, that since hold-time problems are not eliminated by design practice, like they used to be, leaving only setup time as a concern (which manifested as a maximum operating frequency limitation), delay paths have to be implemented using expected process limitations as constraints. So, now, with modern design practice, going to a faster process will actually cause things to break, whereas before, you'd just get the benefit of increased Fmax at the cost of greater power dissipation.
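
The padding argument can be sketched numerically. The values below are all hypothetical, and the model assumes that clock skew stays fixed while gate delays (including the padding buffers) shrink on a faster process; real clock trees would scale partly as well. The 20/36 scale factor is borrowed from the oscillator readings reported earlier in this thread.

```python
import math

# All delays in nanoseconds; every value here is invented for illustration.
CLK_TO_Q = 0.30    # launch flop clock-to-Q
DATA_MIN = 0.10    # fastest unpadded Q-to-D logic path
HOLD = 0.05        # capture flop hold requirement
BUF_DELAY = 0.08   # one padding buffer at the design corner
MARGIN = 0.05      # extra hold margin the tool is asked to leave


def buffers_needed(skew):
    """Padding buffers required so the fastest path clears skew + hold + margin."""
    shortfall = skew + HOLD + MARGIN - (CLK_TO_Q + DATA_MIN)
    return max(0, math.ceil(shortfall / BUF_DELAY))


def hold_slack(skew, n_buffers, k):
    """Hold slack when gate delays are scaled by k (k < 1 is a faster process).

    Assumption: skew stays fixed while gate delays scale.
    """
    arrival = k * (CLK_TO_Q + DATA_MIN + n_buffers * BUF_DELAY)
    return arrival - (skew + k * HOLD)


FAST = 20.0 / 36.0  # delay scale suggested by the oscillator readings in this thread

for name, skew in [("hand-built tree, low skew", 0.05),
                   ("generated tree, high skew", 0.60)]:
    n = buffers_needed(skew)
    print(f"{name}: {n} buffers, "
          f"slack {hold_slack(skew, n, 1.0):+.3f} ns at design corner, "
          f"{hold_slack(skew, n, FAST):+.3f} ns on the fast process")
```

Under these assumptions the low-skew path needs no padding and keeps positive hold slack on the fast process, while the high-skew path needs four buffers and still goes negative once those buffers speed up. Note that the hold check contains no clock period term, so slowing the clock does not help.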

KeithE · 2013-09-13 10:37

There are some presentations on this sort of thing at http://www.eecs.wsu.edu/~ee587/Handouts/ for those who want some exposure - for example lecture 7 discusses clock trees and power distribution.

If you have your static timing analysis report then you can see how much hold margin you have. Then you could estimate how it would be impacted by a process shift. Maybe John already gave you a number.

Have you double checked that OnSemi can't dial in their process to match TSMC? I thought that this was common, but I don't deal at this level. I just know that we tapeout to multiple foundries in parallel since our customers require it.

Cluso99 · 2013-09-13 11:07

cgracey wrote: »

I was thinking about this hold-time matter this morning and I think I realized something.

I've noticed for a while now that modern chips don't respond well to increased voltage like chips used to (they would run faster). I had been assuming that modern processes had somehow changed this basic 'law', but I'm thinking now that it is the modern design methodology that has changed things. In the old days, simple clock trees were deterministic and kept skew below what would cause hold-time problems. Nowadays, clock trees are automatically generated and they are quite complicated, tending towards high skew, though uncertainty limits are known, so Q-to-D paths (flipflop-to-flipflop) must be padded with buffers to ensure Q-to-D propagation times are not too small, opening the door to hold-time problems. This is economical, in a way, because you get to spend less area (and thought) on clock distribution and more on logic, with some logic being used to pad out Q-to-D propagation paths. The trouble is, though, that since hold-time problems are not eliminated by design practice, like they used to be, leaving only setup time as a concern (which manifested as a maximum operating frequency limitation), delay paths have to be implemented using expected process limitations as constraints. So, now, with modern design practice, going to a faster process will actually cause things to break, whereas before, you'd just get the benefit of increased Fmax at the cost of greater power dissipation.

Guess this is one of the consequences of increased complexity. Just imagine trying to manually do an i7 x86 with >billion transistors!

Delus · 2013-09-13 11:14

Cluso99 wrote: »

Guess this is one of the consequences of increased complexity. Just imagine trying to manually do an i7 x86 with >billion transistors!

Most of those billion+ transistors are memory cells which are done by hand + quite a few copy and pastes

cgracey · 2013-09-13 12:28

Testing this morning at lower voltages had mixed results, none very promising. Dave's heading up to my place with the test chips now. Maybe this evening we'll have a better idea about what the problems are.

David Betz · 2013-09-13 12:49

cgracey wrote: »

Testing this morning at lower voltages had mixed results, none very promising. Dave's heading up to my place with the test chips now. Maybe this evening we'll have a better idea about what the problems are.

Bummer. Are you and Dave going to have home-cooked pizza tonight from one of your pizza ovens?
I wonder how much it would cost to ship one of those to NH? :-)
