Much better then, when doing a new chip design, to put on the Seven League Boots and give the rest of the industry a swift kick, yell 'try to keep up,' and set off...
What Ken described is a modestly practical, and more recognizable, upgrade for the Propeller; maybe not a piece-o-cake itself, but still far off from the option taken. His thinking is perfectly understandable. It's very difficult for a relatively small company, particularly in this business, to consistently come out with "try to keep up with us" semiconductors (certainly not in a reasonable time frame). Would some or all of his proposed upgrades garner substantive business for Parallax? Almost certainly, for they are features that people request -- or complain about their absence -- quite often.
I also support Chip's plan and realize that his visions are bigger than building business (which would be easier with incremental improvements). He's always been 7-10 years ahead of anything I could possibly envision. By the time I was doing HPLOT and VPLOT on the Apple 2 to draw simple graphics, he was making the Sound Ace. When I started to use the BASIC Stamp effectively, the SX was already available, and the Propeller is still new to me, though I've done several projects with it. The Propeller 2 is the same: another giant leap. When we're all done and gone, it'll have been a more interesting story the way he's writing it. But the story has to be written by paid authors and publishers, too, backed by a customer base who believes in what we're doing (you).
Remember, even a broken clock is right twice a day. LOL!
Honestly though, in this instance, I suspect your vision was the more correct one, everything considered. That said, I wish you guys the best of luck.
I can't help feeling that any talk of a "Prop one and a half" is a bit inappropriate at this time.
A huge amount of time and effort, and no doubt cash, has gone into the PII. It just has to be made to work.
Edit: I'm not sure how the cost of the last two runs compares to the experimental run that preceded them, but if the test run wasn't particularly expensive, perhaps another test run is in order. In it, features and aspects currently in doubt could be vetted. Meanwhile, I'd hate to see the investment already made in the P2 abandoned or shelved.
I would spend the next 2-3 weeks gathering as much 'what's functional' information as possible, to build an errata list.
There may yet be a Vdd/temperature sweet spot where more of the chip works, in the hotter process.
Did JTAG make it into the P2 silicon? What coverage does it have?
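The Vdd/temperature sweet-spot idea is usually explored with a shmoo sweep: run the same pass/fail check at every supply and temperature point and record the shape of the passing region, which then goes straight into the errata list. A minimal Python sketch, assuming a hypothetical `passes(vdd, temp_c)` hook into whatever bench actually sets the supply and chamber temperature and runs a boot check; the voltages, temperatures, and pass region below are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]  # (Vdd in volts, temperature in deg C)


def shmoo(
    vdd_points: List[float],
    temp_points: List[float],
    passes: Callable[[float, float], bool],
) -> Dict[Point, bool]:
    """Run the (hypothetical) pass/fail test at every (Vdd, temperature) point."""
    return {(v, t): passes(v, t) for v in vdd_points for t in temp_points}


def render(results: Dict[Point, bool], vdd_points: List[float],
           temp_points: List[float]) -> str:
    """Text shmoo plot: one row per temperature, '+' = pass, '.' = fail."""
    rows = []
    for t in temp_points:
        cells = "".join("+" if results[(v, t)] else "." for v in vdd_points)
        rows.append(f"{t:6.1f}C {cells}")
    return "\n".join(rows)


if __name__ == "__main__":
    vdds = [1.6, 1.7, 1.8, 1.9, 2.0]
    temps = [0.0, 25.0, 50.0, 85.0]

    # Stand-in for a real bench measurement; this pass region is made up.
    def fake_bench(vdd: float, temp_c: float) -> bool:
        return vdd >= 1.8 and temp_c <= 50.0

    print(render(shmoo(vdds, temps, fake_bench), vdds, temps))
```

Sweeping a grid rather than spot-checking a few points is what makes a narrow working window visible at all.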
Just to be clear, we ARE wearing Seven League Boots and don't plan on swapping them out for rubber galoshes. Most of us already have two (or three) pairs of SLBs at Parallax.
We're in P2 to finish it, period. Bang. Boom. Slap hands on table and high-fives....
If there's a short somewhere... Any chance that raising the voltage might burn it out?
I know some guys who put in ESD diodes backwards in a bipolar chip, and the chip worked in simulation (nobody noticed the high current). They tried to burn the chips out but weren't able to. I think they were pumping amps into them.
I mentioned before that you could potentially decap a chip and look at it with an IR sensitive camera to find the hot spots. Parallax could try doing this from the topside, but my experience is that it could be better to look from the bottom side. I don't know if looking from the bottom would be easy given the chips and breakout boards. There are companies that specialize in this sort of thing - e.g. EAG and others. Hard to say if it's worth doing or not because I didn't have to pay the bills. But it was critical for our project to identify the problem, and this did the trick.
I wonder where Parallax got the RAMs, and if there's a potential for a high current due to the process shift? Just a random thought...
Our current Propeller volume customers are in the 1-10K unit range. Their products tend to be more expensive [i.e., machines, terminals, equipment, control systems], often much greater than $500, and not in the competitive consumer electronics area. If this market holds true for P2, then I'll suggest a fairly high-priced chip no matter where it is fabricated.
That makes sense, thanks for the explanation. The idea of going with a process that lets you spin the chip at less cost and with faster turnaround will get us all real P2 chips sooner. And, if your volume customers aren't that price sensitive it seems like a good idea. Good luck!!
Another angle on this would be to check how much of a performance hit occurs if you either:
a) remove the 'traveling wave' nature of the process-tuned hold-time protection,
or
b) adjust it for the fastest process, so it works on both flows.
It could compromise the slower flow a little, as it would be excessively conservative there.
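Option (b) can be reasoned about with a small corner calculation: size one hold-fix padding delay for whichever corner needs the most, then see what that same padding costs in setup slack on the slower flow, where each delay cell stretches. A minimal sketch; the `Corner` fields and all numbers are hypothetical, and real padding would come from static timing analysis at each corner rather than hand arithmetic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Corner:
    name: str
    min_path_ns: float   # fastest data delay (clk-to-Q + logic) to the capture flop
    hold_req_ns: float   # hold requirement plus clock skew at the capture flop
    max_path_ns: float   # slowest data delay, used for the setup check
    period_ns: float     # clock period
    delay_scale: float   # how much a nominal delay cell stretches at this corner


def pad_needed(c: Corner) -> float:
    """Nominal padding delay needed for hold slack >= 0 at this corner."""
    return max(0.0, c.hold_req_ns - c.min_path_ns) / c.delay_scale


def pad_for_all(corners: List[Corner]) -> float:
    """Option (b): one padding value sized for the worst corner."""
    return max(pad_needed(c) for c in corners)


def setup_slack(c: Corner, pad_nominal_ns: float) -> float:
    """Setup slack once the padding (stretched at this corner) is added."""
    return c.period_ns - (c.max_path_ns + pad_nominal_ns * c.delay_scale)


if __name__ == "__main__":
    # Invented numbers at a 160 MHz clock (6.25 ns period).
    fast = Corner("fast", 0.20, 0.35, 4.0, 6.25, 0.8)
    slow = Corner("slow", 0.35, 0.40, 5.5, 6.25, 1.3)
    pad = pad_for_all([fast, slow])
    print(f"pad sized for both flows: {pad:.4f} ns nominal")
    print(f"slow setup slack, own pad:    {setup_slack(slow, pad_needed(slow)):.4f} ns")
    print(f"slow setup slack, shared pad: {setup_slack(slow, pad):.4f} ns")
```

The difference between the last two lines is the "excessively conservative" cost on the slower flow.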
I've read in a few sources that this is how to code a D flip-flop with an asynchronous 'clear' input. It turns out that, from power-up, it behaves as if there is no clear: res_n is already low as power rises, so there is no negative-edge event, and clk is still the whole time. The problem shows up as the I/O pins not going high-Z until clk begins cycling, which happens after the RESn pin is raised but while the internal res_n is still low. This is a headache, but not a showstopper. I need to figure out how to recode these flip-flops so that we wind up with a truly asynchronous reset, not one that in practice depends on clk toggling.
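The concern can be made concrete with a tiny Python behavioral model (not HDL). Assuming the code in question is the usual `always @(posedge clk or negedge res_n)` template, the model contrasts a purely event-driven reading of that block with what an async-clear flop cell does in silicon. Class names are hypothetical. One simulator detail is deliberately left out: in Verilog simulation an x-to-0 transition on res_n also counts as a negedge, so RTL simulation may not reproduce this power-up case.

```python
from typing import Optional


class EdgeOnlyModel:
    """Event-driven reading: the reset branch runs only when an *event*
    occurs (res_n falling or clk rising)."""

    def __init__(self) -> None:
        self.q: Optional[int] = None  # None = unknown power-up state
        self._clk, self._rn = 0, 0    # res_n is already low as power comes up

    def apply(self, clk: int, res_n: int, d: int) -> None:
        fell = self._rn == 1 and res_n == 0
        rose = self._clk == 0 and clk == 1
        if fell or rose:
            self.q = 0 if res_n == 0 else d
        self._clk, self._rn = clk, res_n


class AsyncClearModel:
    """Async-clear flop cell: Q is held low for as long as res_n is low."""

    def __init__(self) -> None:
        self.q: Optional[int] = None
        self._clk = 0

    def apply(self, clk: int, res_n: int, d: int) -> None:
        if res_n == 0:
            self.q = 0              # level-sensitive: no edge required
        elif self._clk == 0 and clk == 1:
            self.q = d
        self._clk = clk


if __name__ == "__main__":
    edge, level = EdgeOnlyModel(), AsyncClearModel()
    for _ in range(3):  # power-up: res_n stays low, clock is gated off
        edge.apply(clk=0, res_n=0, d=1)
        level.apply(clk=0, res_n=0, d=1)
    print(edge.q, level.q)
```

A real async-clear cell behaves like `AsyncClearModel`; symptoms matching `EdgeOnlyModel` suggest that something between RESn and the pins is not one of those cells.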
From the above, it sounds like you gate off the RC oscillator while reset is active and enable it on release?
That can make synchronous release of reset tricky, as the clock has not started while reset is active.
This may be something other than a synthesis syntax issue: some registers may not be async where you expected them to be, or the reset nodes may not be behaving as you expected.
Common in microcontrollers is a reset path that:
a) has a passive (non-clocked) glitch filter, so RST ignores pulses shorter than about 50 ns, and
b) does an async port reset, so that during reset the ports are in a known state before any clocks arrive.
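Those two properties can be sketched as a small Python model: a pulse-width check standing in for the passive glitch filter, and a pin-state function that floats the ports whenever an accepted reset is active, with no clock involved. The 50 ns threshold is the rough figure from the post; the function names and the choice of high-Z as the reset state are assumptions for illustration.

```python
from typing import List, Tuple, Union

Pulse = Tuple[float, float]  # (start_ns, width_ns) of a low pulse on RST


def glitch_filter(pulses: List[Pulse], min_width_ns: float = 50.0) -> List[Pulse]:
    """Passive (unclocked) filter: drop low pulses shorter than min_width_ns.

    Simplification: a real RC/Schmitt filter also delays the pulses it
    passes; this model only decides which pulses count as a reset.
    """
    return [(s, w) for s, w in pulses if w >= min_width_ns]


def port_state(t_ns: float, resets: List[Pulse], driven: int) -> Union[int, str]:
    """Async port reset: pins float ('Z') during any accepted reset pulse,
    independent of any clock; otherwise they show the driven value."""
    for start, width in resets:
        if start <= t_ns < start + width:
            return "Z"
    return driven


if __name__ == "__main__":
    raw = [(0.0, 10.0), (100.0, 200.0)]  # a 10 ns glitch, then a real reset
    accepted = glitch_filter(raw)
    print(accepted)
    print(port_state(5.0, accepted, 1), port_state(150.0, accepted, 1))
```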
3) Cog0 seems to launch, and executes part of the booter program from hub ROM, but it acts erratically. I don't know if this is due to hold-time issues resulting from a process change, or perhaps our memories are not working as designed, due to the process change.
How much of the ROM does it fetch that you can trace?
I.e., how many bytes before CS appears?
Could an opcode-decode issue result in unexpected output on another pin?
Maybe it fails on a specific opcode type, like conditionals?
Could the combination of high Icc and ROM failures indicate errors in the manual ROM patches to the RAM?
If the ROM path is toast, some test info could be produced by starting in a way that avoids ROM, i.e. running from RAM (even if random). Some chip-cracking tricks I've seen mentioned are powering up with no reset at all, and narrow or shallow dips in Vcc.
There, the goal is to start the device but sidestep the security logic.
That verilog code should be fine. You might want to look at the reset related papers here to see if they stimulate thought. http://www.sunburst-design.com/papers/
Do you think there could be some switches in the synthesis tools that would force them to interpret that Verilog description as a true asynchronous clear DFF? The way the Verilog reads, it's actually not a true async-clear DFF because it doesn't accommodate the power-up case where res_n is low from start-up.
I wonder where Parallax got the RAMs, and if there's a potential for a high current due to the process shift? Just a random thought...
I designed all the memories and Beau laid them out. That whole frame was done like that. We made a test chip a few years ago to test all that stuff and it was okay.
Do you think there could be some switches in the synthesis tools that would force them to interpret that Verilog description as a true asynchronous clear DFF? The way the Verilog reads, it's actually not a true async-clear DFF because it doesn't accommodate the power-up case where res_n is low from start-up.
It is a little inferred, but this is also a common requirement, so the inferences should be the same in all tools.
You should be able to search the reports to see what synthesis actually created.
The cell for an async-reset FF should have an easy-to-find name and nodes.
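Searching the gate-level netlist can be scripted. The sketch below assumes library naming similar to the `CLKBUFX3`/`BUFX12` cells mentioned later in the thread, where `DFFR...` flops carry an active-low asynchronous reset pin named `RN` and plain `DFF...` flops have none. Those names, the reset net name `resn`, and the simple one-instance-per-statement netlist format are assumptions to check against the actual TSMC library documentation. It lists which flops are, and are not, wired to the reset net.

```python
import re
from typing import List, Tuple

# One cell instance per statement: CELL INST ( .PIN(net), ... );
INST_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*\((.*?)\)\s*;", re.MULTILINE | re.DOTALL)
PIN_RE = re.compile(r"\.(\w+)\s*\(\s*([\w\[\]]*)\s*\)")


def classify_flops(netlist: str, reset_net: str = "resn") -> Tuple[List[str], List[str]]:
    """Split flop instances into (async reset on reset_net, no such reset).

    Assumes flop cells are named DFF* and the async clear pin is RN;
    escaped identifiers and concatenated connections are not handled.
    """
    with_reset: List[str] = []
    without_reset: List[str] = []
    for cell, inst, pins in INST_RE.findall(netlist):
        if not cell.startswith("DFF"):
            continue  # skip buffers, gates, the module header, etc.
        conns = dict(PIN_RE.findall(pins))
        if conns.get("RN") == reset_net:
            with_reset.append(inst)
        else:
            without_reset.append(inst)
    return with_reset, without_reset


if __name__ == "__main__":
    sample = """
module pins (clk, resn, d, pin);
  DFFRX1 u_cog_out ( .D(d), .CK(clk), .RN(resn), .Q(q0) );
  BUFX12 u_rst_buf ( .A(resn), .Y(resn_b) );
  DFFX1 u_pin_out ( .D(q0), .CK(clk), .Q(pin) );
endmodule
"""
    ok, missing = classify_flops(sample)
    print("async reset:", ok)
    print("no async reset:", missing)
```

The second list is the interesting one: any flop on the path to the pins that shows up there will hold its power-up value until a clock arrives. A reset net that passes through buffers (like `resn_b` above) would need to be followed through those buffers first.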
I designed all the memories and Beau laid them out. That whole frame was done like that. We made a test chip a few years ago to test all that stuff and it was okay.
Did that test device include ROM? What Icc did it give?
It is a little inferred, but this is also a common requirement, so the inferences should be the same in all tools.
You should be able to search the reports to see what synthesis actually created.
The cell for an async-reset FF should have an easy-to-find name and nodes.
Beau found some names of flops connected to resn; I'll get them from him tomorrow and look them up in the TSMC standard cell library description. What we have, anyway, are NOT async-clear DFFs, as they don't do anything until a clock is toggling.
Do you think there could be some switches in the synthesis tools that would force them to interpret that Verilog description as a true asynchronous clear DFF? The way the Verilog reads, it's actually not a true async-clear DFF because it doesn't accommodate the power-up case where res_n is low from start-up.
I've never seen this treated as anything other than an async reset by the tools. I know there are switches or pragmas for sync resets - e.g. to make sure that the reset is applied right before the D input of the flop in the cone of logic. You should be able to look in your gate level netlist to confirm. Did you scan the two reset papers for a quick overview?
Also, did you get a PG (power/ground) netlist for the core, so you can run gatesims with some power modeling? It might help in other areas.
How much of the ROM does it fetch that you can trace?
I.e., how many bytes before CS appears?
Could an opcode-decode issue result in unexpected output on another pin?
Maybe it fails on a specific opcode type, like conditionals?
Could the combination of high Icc and ROM failures indicate errors in the manual ROM patches to the RAM?
If the ROM path is toast, some test info could be produced by starting in a way that avoids ROM, i.e. running from RAM (even if random). Some chip-cracking tricks I've seen mentioned are powering up with no reset at all, and narrow or shallow dips in Vcc.
There, the goal is to start the device but sidestep the security logic.
It's hard to say if the RAM/ROM is failing, or just the core is failing. Maybe they both are. Cog0 always boots from ROM. There is no alternative possible. It only does one thing on power-up.
I have said many times, as have others, the P2 will not erode P1 sales. In fact, I believe it will enhance them by making the P1 more visible by having an ultra-big-brother.
That verilog code should be fine. You might want to look at the reset related papers here to see if they stimulate thought. http://www.sunburst-design.com/papers/
Thanks for sharing these. I read the first of those two reset papers pretty thoroughly and scanned the second, as it was similar to the first.
If we coded in a way that ensures async-clear DFFs, how did we not get them?
One thing we did was give a multicycle=2 assignment for the reset path so that on reset release on the passive edge at 20MHz, the tools wouldn't bother synthesizing an overly-aggressive reset network. Could it have construed this multicycle assignment as a license to use synchronous resets? We only intended it to be a timing relaxer (2 clocks at 160MHz = 12.5ns, which is shorter than half a cycle at 20MHz, or 25ns).
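The timing numbers in that last sentence check out; here they are as a short calculation. It only confirms the arithmetic. Whether a multicycle exception is appropriate on a reset path at all is a separate question, raised in a later reply.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


fast_period = period_ns(160.0)          # core clock
multicycle_budget = 2 * fast_period     # multicycle = 2 on the reset path
half_slow_period = period_ns(20.0) / 2  # reset released on the passive edge at 20 MHz

if __name__ == "__main__":
    print(f"160 MHz period:        {fast_period} ns")
    print(f"multicycle=2 budget:   {multicycle_budget} ns")
    print(f"half period at 20 MHz: {half_slow_period} ns")
    print(f"budget fits: {multicycle_budget < half_slow_period}")
```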
It's hard to say if the RAM/ROM is failing, or just the core is failing. Maybe they both are. Cog0 always boots from ROM. There is no alternative possible. It only does one thing on power-up.
Is there any checksum, or does it just run a small state engine that first copies XYZ bytes from the ROM area of main memory into COG0 RAM, and then starts the program counter from 0000?
Starting without reset could give different results, if it runs more random code.
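That boot question can be sketched as a model of the simplest case described: copy a fixed block from the ROM area of hub memory into cog 0 RAM, optionally check a sum, then start at address 0. Everything here is hypothetical (the 512-long cog RAM size, the 32-bit additive checksum, the function name); the thread does not say whether the P2 boot engine checks anything. The point is that, with a checksum, a corrupted ROM read would halt detectably instead of jumping into random code.

```python
from typing import List, Optional

COG_RAM_LONGS = 512  # assumption: cog RAM size in 32-bit longs


def boot_cog0(hub_rom: List[int], n_longs: int,
              expected_sum: Optional[int] = None) -> Optional[List[int]]:
    """Copy n_longs from the ROM area into cog 0 RAM.

    Returns the cog RAM image (execution would start at address 0), or
    None if a checksum was requested and does not match (boot halts).
    """
    if n_longs > COG_RAM_LONGS:
        raise ValueError("boot image larger than cog RAM")
    cog_ram = [0] * COG_RAM_LONGS
    for addr in range(n_longs):
        cog_ram[addr] = hub_rom[addr] & 0xFFFFFFFF
    if expected_sum is not None:
        total = sum(cog_ram[:n_longs]) & 0xFFFFFFFF
        if total != expected_sum:
            return None
    return cog_ram


if __name__ == "__main__":
    rom = [0x11111111, 0x22222222, 0x33333333]
    print(boot_cog0(rom, 3, expected_sum=0x66666666) is not None)  # good ROM
    bad = rom[:1] + [0x22222223] + rom[2:]                          # one flipped bit
    print(boot_cog0(bad, 3, expected_sum=0x66666666) is not None)   # halts
```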
Thanks for sharing these. I read the first of those two reset papers pretty thoroughly and scanned the second, as it was similar to the first.
If we coded in a way that ensures async-clear DFFs, how did we not get them?
One thing we did was give a multicycle=2 assignment for the reset path so that on reset release on the passive edge at 20MHz, the tools wouldn't bother synthesizing an overly-aggressive reset network. Could it have construed this multicycle assignment as a license to use synchronous resets? We only intended it to be a timing relaxer (2 clocks at 160MHz = 12.5ns, which is shorter than half a cycle at 20MHz, or 25ns).
I don't believe so, but I wonder if it's proper. I think it was jmg who mentioned synchronizing the deassertion of reset to the clock, and that's discussed in those papers. Let's assume that you're doing this. How will the reset deassertion timing be guaranteed if the tools think that it's a multicycle path? I assume that things start happening immediately after the deassertion of reset, correct? Maybe I'm not understanding, but we would use multicycle paths to indicate that a result isn't needed at the next clock. An example might be the result of a multiplier not being needed right away, as shown on this Altera page: http://www.altera.com/support/software/timequest/clock/tq-clock.html. But for async reset you would have to meet timing at every clock edge, right? Anyway, I guess you can talk to the guys who ran synthesis, static timing analysis, and so on; if they blessed this, then hopefully I'm missing something.
Was either Formality or Conformal LEC (I would prefer this if DC was used for synthesis) used to verify that the gate level netlist is logically equivalent to the RTL?
If you can run some gatesims with SDF annotation, then you can experiment a bit to see how the netlist behaves. It should be able to catch gross reset problems as well - e.g. if reset isn't meeting timing.
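Synchronizing the deassertion, as described in the reset papers, is usually done with a two-flop reset synchronizer: reset asserts asynchronously, with no clock needed, but releases only after two clock edges. The release then becomes an ordinary synchronous path that must meet its recovery/removal checks on every edge, which is the concern about a multicycle exception raised above. A Python behavioral sketch; the class and method names are hypothetical.

```python
class ResetSynchronizer:
    """Two flops with async clear, D of the first tied high.

    rst_n_out asserts (goes 0) immediately when rst_n_in goes low, and
    deasserts only on the second clock edge after rst_n_in returns high.
    """

    def __init__(self) -> None:
        # Start in the "chip running, reset released" state.
        self.rst_n_in = 1
        self.ff1 = 1
        self.ff2 = 1

    def set_input(self, rst_n_in: int) -> None:
        """Asynchronous path: a low input clears both flops, no clock needed."""
        self.rst_n_in = rst_n_in
        if rst_n_in == 0:
            self.ff1 = 0
            self.ff2 = 0

    def clock_edge(self) -> None:
        """Rising clock edge: shift a 1 in, unless still held in reset."""
        if self.rst_n_in == 0:
            return
        self.ff2, self.ff1 = self.ff1, 1

    @property
    def rst_n_out(self) -> int:
        return self.ff2


if __name__ == "__main__":
    sync = ResetSynchronizer()
    sync.set_input(0)                 # RESn pulled low; clock may be gated off
    print("during reset:", sync.rst_n_out)
    sync.set_input(1)                 # RESn released
    for edge in range(1, 3):
        sync.clock_edge()
        print(f"after edge {edge}:", sync.rst_n_out)
```

Because assertion never waits for a clock, this structure still works when the oscillator is gated off during reset; only the release depends on the clock starting.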
Do you think there could be some switches in the synthesis tools that would force them to interpret that Verilog description as a true asynchronous clear DFF? The way the Verilog reads, it's actually not a true async-clear DFF because it doesn't accommodate the power-up case where res_n is low from start-up.
In our tools it is possible to insert the exact cell you need. You would need the synthesis guys to add a "don't touch" flag to all of your fixed instances which is easy enough if you choose a good naming convention for those cells.
As you stated earlier, your Verilog looks correct. If you have access to any tools that will convert VHDL, this pattern:
    process (clk, reset)
    begin
      if (reset = '1') then
        ...
      elsif (rising_edge(clk)) then
        ...
      end if;
    end process;
has worked for us on all of our chips (university research lab), but we also have full control over our own chip synthesis.
Good luck with whatever path you choose; I know how frustrating misbehaving chips can be. I am a big Propeller fan and can't wait to see the next one in action!
David
Tracing the "RESn" signal (green in the image), I followed it to only one of several buffers (CLKBUFX3ZZ --> BUFX12 -->). The BUFX12 drove five D flip-flops and an additional 1X buffer (all of those nets are shown in pink in the attached image).
Did the synthesis guys do any pipelining for you? Could you be seeing the output of a pipeline that was only reset at the far end?
I take it it isn't something silly, like the input pad being double-flopped?
Could you be seeing the output of a pipeline that was only reset at the far end?
Delus, this question made me look in the right place and I see the problem now!
There is a set of flops that all the cogs' OUT and DIR signals pass through. I added those flops to relax timing. All 8 cogs' OUT and DIR signals get OR'd together, then go through these flops, and the outputs of these flops go to the actual pins. The thing is, I didn't think to make these flops resettable. That's why the async-cleared pin signals don't show up until reset is released and the clock starts toggling. I never noticed this on the FPGA because the main clock was never gated. This means that the chip is getting reset okay; it's just not showing up on the pins immediately.
This can be fixed by modifying our frame circuitry so that the clock is always running during reset. If we resynthesize, we'll change the Verilog to add the resets.
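The failure and both fixes can be reproduced with a small Python model of the output stage just described: the eight cogs' OUT and DIR bits are OR'd and then registered before reaching the pins. The class name, the 'X' for an unknown power-up value, and treating DIR=0 as high-Z are modeling assumptions; the cogs' own registers are taken as already cleared by the async reset, as the post reports.

```python
from typing import List, Optional


class PinOutputStage:
    """OR the cogs' OUT/DIR bits for one pin, then register them."""

    def __init__(self, resettable: bool) -> None:
        self.resettable = resettable
        self.out_q: Optional[int] = None  # None = unknown power-up contents
        self.dir_q: Optional[int] = None

    def hold_reset(self) -> None:
        """Reset held low with no clock: only async-clear flops respond."""
        if self.resettable:
            self.out_q, self.dir_q = 0, 0

    def clock_edge(self, cog_out: List[int], cog_dir: List[int], res_n: int = 1) -> None:
        """Rising clock edge: capture the OR of all cogs' bits."""
        if self.resettable and res_n == 0:
            self.out_q, self.dir_q = 0, 0  # async clear wins over the clock
            return
        self.out_q = int(any(cog_out))
        self.dir_q = int(any(cog_dir))

    def pin(self) -> str:
        if self.dir_q is None:
            return "X"  # unknown: could be driving either level
        return "Z" if self.dir_q == 0 else str(self.out_q)


if __name__ == "__main__":
    cleared = [0] * 8  # all cogs' OUT/DIR already async-cleared

    as_built = PinOutputStage(resettable=False)
    as_built.hold_reset()                                # clock gated during reset
    print("as built, reset only:", as_built.pin())       # X until a clock arrives

    clock_running = PinOutputStage(resettable=False)
    clock_running.hold_reset()
    clock_running.clock_edge(cleared, cleared, res_n=0)  # fix 1: clock runs in reset
    print("clock running in reset:", clock_running.pin())

    with_reset = PinOutputStage(resettable=True)
    with_reset.hold_reset()                              # fix 2: add resets to these flops
    print("resettable flops:", with_reset.pin())
```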
Okay. Reset problem understood. That's not a showstopper, after all.
We still have 44 mA quiescent current and possible hold-time and/or memory problems.
Thanks to all you guys for thinking about these issues.
Honestly though, in this instance, I suspect your vision was the more correct one, everything considered. That said, I wish you guys the best of luck.
In the meantime, Propeller 1 it is.
Did JTAG make it into the P2 silicon? What coverage does it have?
This detail is important, but given the full-custom split of the I/O, this is not a simple problem to fix.
What might be practical is a test pin that splits access between the core and the I/O, allowing independent hardware-based test coverage of the I/O area and the core.
I designed all the memories and Beau laid them out. That whole frame was done like that. We made a test chip a few years ago to test all that stuff and it was okay.
Only the core blob came from outside.
The ROM was implemented by changing out some 6T SRAM cells with 1T ROM cells. I don't remember what the Icc was, but it was low on the SRAMs.
It's hard to say if the RAM/ROM is failing, or just the core is failing. Maybe they both are. Cog0 always boots from ROM. There is no alternative possible. It only does one thing on power-up.
Now, where is that P2 problem???
Beau, I looked up all those flops and they do have asynchronous resets!
The chip behaves, though, exactly as if reset was first going through a clocked flop, before it enters the blob. I don't know how this can be.
I take it it isn't something silly, like the input pad being double-flopped?
Then something else must be in the paths.
Does reset also drive some FFs right at the I/O pins, in order to force the pin state?
Or do you rely on the core to define some of the mapped I/O registers before the I/O is defined?
Delus, this question made me look in the right place and I see the problem now!
You were right there!