Bodes well for multiple cogs in a DE0, even allowing for quite a few LEs for each smart pin.
I'm curious how you divvied up the 6 ALU blocks to reduce current - by critical path? Instruction coding? Instruction frequency?
By functional blocks:
add, inc, rot, logic, mul, mux
Each block's internals take about the same time to resolve, while each block varies in response time. They are all within 60% of each other's sizes. I arranged the instruction op codes so that the blocks were easy to decode. Their different response times make the final mux easy to arrange, with the slowest (add) at the top of the mux.
Right now, I'm going to make the 'add' block go faster by pre-decoding a few internal signals before the flops. This will get the adder off to a faster start and earlier finish time, as it is sticking WAY out, slowing everything down. In this design, the Z flag must be totally resolved by the end of the ALU cycle. On the Prop2, it was half-resolved, with other circuits getting the final result early in the next cycle. With these power-saving flop partitions, though, everything must be done at the end of the cycle, so the Z-test must complete AFTER the result mux. This tacks more time onto the cycle. By getting rid of SUMZ/SUMNZ (which were hardly used, according to the analyses Phil, Heater, and Roy did), I can pre-determine the bottom-level inputs to the 'add' block to make it go faster. This thing is sticking out like a sore thumb, causing the Z-test to stretch things out to 100MHz, when we were 120MHz before. I had gotten one compile to go 140MHz, but I haven't been able to repeat it.
100/120/140 MHz or MIPS? Didn't it use to be 200MHz?
New Cog ALU 1800 LEs
P1 Cog complete 1850 LEs
P2 Cog complete ~30,000 LEs
Shows the complexity and hence power of the resulting P2 design.
The shakedown seems to have resulted in something extra-ordinary!!
A very efficient design plus >200MHz likely too.
In the quest to get more instructions, I was wondering if anyone has thought about using the IF_xx bits for instructions that would make no sense if they were conditional.
For example "IF_Z NEGZ value1, value2" would not make sense, since the instruction could be assembled as just "NEG value1,value2".
Likewise "IF_NZ NEGZ value1,value2" would be assembled as just "MOV value1,value2".
So could not the IF_Z condition flag be used to indicate a totally different instruction for NEGZ and the like ?
No, you can't substitute like that. In the first example, if Z=0, then NEGZ doesn't execute at all and the D register (value1) does not get changed. NEG (without a predicate) always updates D, which may not be your intent.
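A rough way to see the point is to model the instructions in a few lines of Python. This is only a sketch: the NEGZ behavior (D = -S when Z is set, otherwise D = S) is assumed from the P1 instruction description, and the helper names `neg`, `negz`, and `if_z` are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # registers are modeled as 32-bit values

def neg(d, s, z):
    """NEG D,S: always writes -S to D (assumed semantics)."""
    return (-s) & MASK32

def negz(d, s, z):
    """NEGZ D,S: writes -S if Z is set, otherwise S (assumed semantics)."""
    return ((-s) if z else s) & MASK32

def if_z(op):
    """Wrap an instruction with the IF_Z predicate: when Z=0, D is left unchanged."""
    def predicated(d, s, z):
        return op(d, s, z) if z else d
    return predicated

if_z_negz = if_z(negz)

# With Z=1 the two forms agree, with Z=0 they do not:
same_when_z_set = if_z_negz(5, 3, 1) == neg(5, 3, 1)
d_after_if_z_negz = if_z_negz(5, 3, 0)   # D keeps its old value
d_after_plain_neg = neg(5, 3, 0)         # D is overwritten with -3
```

With Z=0 the predicated form leaves D at 5, while the unconditional NEG overwrites it, so the assembler could not silently swap one for the other.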
Even if there was, the 0.4mm pitch package is big enough already, having it too big would limit some applications I feel. Here's a footprint comparison between P1 and P2 (thermal pad in blue is approximate in size).
Correction, I have used a QFP100 with 0.65mm pitch before (not TQFP).
Don't some of the packaging options depend on the packager Chip found with the large thermal pad? It sounded like he wanted to use that packager, so it's up to what they have to offer.
Mainly bonding mapping and deciding what not to bond, as well as NREs for test and packaging setups.
Amkor show these possible pin counts, in a Thermal Pad, 14x14 package : 52/64/80/100/120/128
and the approximate pin pitches these pin counts imply are:
12.5/(128/4) = 0.390625
12.5/(120/4) = 0.416
12.5/(100/4) = 0.5 << Package Chip has chosen
12.5/(80/4) = 0.625 (0.65mm) << Might just manage 64io ?
12.5/(64/4) = 0.78125 (0.8mm) << 64 pin 0.8mm seems common Asian option
12.5/(52/4) = 0.9615
and there is also Body Size 14mm x 20mm, Pitch 0.65mm, and also a 128 pin 0.8mm
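The same arithmetic as a small Python sketch, in case anyone wants to try other pin counts. The 12.5mm usable lead span per side is the figure used in the sums above; the list of standard pitches and the "nearest pitch" matching are my own assumptions, not Amkor data.

```python
LEAD_SPAN_MM = 12.5  # usable lead span per side, as used in the sums above
STANDARD_PITCHES_MM = [0.4, 0.5, 0.65, 0.8, 1.0]  # assumed list of common QFP pitches

def implied_pitch_mm(pins, lead_span_mm=LEAD_SPAN_MM):
    """Pitch implied by spreading pins/4 leads along one side of the package."""
    return lead_span_mm / (pins / 4)

def nearest_standard_pitch(pitch_mm):
    """Closest entry in the assumed standard-pitch list."""
    return min(STANDARD_PITCHES_MM, key=lambda p: abs(p - pitch_mm))

pitches = {pins: implied_pitch_mm(pins) for pins in (128, 120, 100, 80, 64, 52)}

for pins, pitch in pitches.items():
    print(f"{pins:3d} pins: implied {pitch:.4f} mm, nearest standard {nearest_standard_pitch(pitch)} mm")
```

Note the 80 pin case rounds up to 0.65mm, which is wider than the implied 0.625mm, so it would be a tight fit on this span.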
There is a bit of a trend for Asian uC vendors (eg Renesas) to offer both 0.5mm and 0.8mm packages, which must be production-flow/handling driven.
I've seen consumer PCBs with very long pads / solder thieves, placed at 45°, on what I think was 0.8mm, that looked to have been wave soldered. Some info here: http://www.ami.ac.uk/courses/topics/0170_wsp/
Infineon also says this
[" Wave soldering an exposed-pad QFP is not possible because the package has to be attached to the PCB by SMD glue. In QFPs with exposed pad, this is not possible if the exposed pad is intended to be soldered to the PCB. Furthermore, wave soldering is only possible if the products in QFPs are qualified for wave soldering (passing the solder heat test).
Generally wave soldering of QFPs is difficult because the leads have to be soldered at all four sides of the component. Using a 45° rotated layout is recommended to allow the solder to wet the leads more easily. Wave soldering of QFPs with lead pitches of 0.65 mm or smaller may lead to excessive bridging and therefore is not recommended."]
ie ~0.8mm is wave solder min.
Looks like a common option is the 64p 0.8mm one, question is, what IO's can that give (~48?) , and is that still useful ?
I will use a stainless stencil on the .5mm Prop, not wave, and there will be a percentage of bridge fixes. .65mm is very easy. I was more asking about .65mm as an option. I will be dropping 3 SO16 wide body parts and 1 SO8 with the implementation of the new Prop, so I would be happy with the large package.
So much space, yet 25 pins crammed together. I think I'd prefer a BGA 100 10x10 array at 1mm centers, although I understand the thermal needs. To get access to the TQFP pins you will need some tiny traces at the chip, fanned out as large as the .65mm part anyway.
I will use a stainless stencil on the .5mm Prop, not wave, and there will be a percentage of bridge fixes. .65mm is very easy. I was more asking about .65mm as an option.
If adding another package, it might be smarter to gain more than just slightly better yields on manual assembly.
The 0.8mm option seems to open up wave soldering, and I find there is an 88 pin 0.8mm offering in 20mm x 20mm.
(which may be a better combination than the 28mm 128 pin 0.8mm ?)
88 pins should manage 64io quite well, or the more common 80pins fits 0.8mm at 14mm x 20mm, but has fewer Vcc pins for still 64 io.
Comments
Bodes well for multiple cogs in a DE0, even allowing for quite a few LEs for each smart pin.
I'm curious how you divvied up the 6 ALU blocks to reduce current - by critical path? Instruction coding? Instruction frequency?
That's right.
The cog allows two clocks for the ALU to settle. By that time, the result WILL be ready.
By functional blocks:
add, inc, rot, logic, mul, mux
Each block's internals take about the same time to resolve, while each block varies in response time. They are all within 60% of each other's sizes. I arranged the instruction op codes so that the blocks were easy to decode. Their different response times make the final mux easy to arrange, with the slowest (add) at the top of the mux.
Right now, I'm going to make the 'add' block go faster by pre-decoding a few internal signals before the flops. This will get the adder off to a faster start and earlier finish time, as it is sticking WAY out, slowing everything down. In this design, the Z flag must be totally resolved by the end of the ALU cycle. On the Prop2, it was half-resolved, with other circuits getting the final result early in the next cycle. With these power-saving flop partitions, though, everything must be done at the end of the cycle, so the Z-test must complete AFTER the result mux. This tacks more time onto the cycle. By getting rid of SUMZ/SUMNZ (which were hardly used, according to the analyses Phil, Heater, and Roy did), I can pre-determine the bottom-level inputs to the 'add' block to make it go faster. This thing is sticking out like a sore thumb, causing the Z-test to stretch things out to 100MHz, when we were 120MHz before. I had gotten one compile to go 140MHz, but I haven't been able to repeat it.
100/120/140 MHz or MIPS? Didn't it use to be 200MHz?
100MHz for the ALU means 100MIPS, or 200MHz clock.
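To put the three compile results on the same footing, here is a quick Python sketch of that conversion. It assumes two system clocks per ALU cycle (as stated above) and one instruction per ALU cycle; the function names are just for illustration.

```python
CLOCKS_PER_ALU_CYCLE = 2  # "the cog allows two clocks for the ALU to settle"

def system_clock_mhz(alu_rate_mhz, clocks_per_alu_cycle=CLOCKS_PER_ALU_CYCLE):
    """System clock needed to run the ALU at the given rate."""
    return alu_rate_mhz * clocks_per_alu_cycle

def mips(alu_rate_mhz):
    """Instruction rate, assuming one instruction completes per ALU cycle."""
    return alu_rate_mhz

# ALU rates mentioned in the thread, in MHz
results = {rate: (mips(rate), system_clock_mhz(rate)) for rate in (100, 120, 140)}

for rate, (m, clk) in results.items():
    print(f"ALU {rate} MHz -> {m} MIPS at a {clk} MHz clock")
```

So the 120MHz and 140MHz compiles would correspond to 240MHz and 280MHz clocks under the same assumptions.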
New Cog ALU 1800 LEs
P1 Cog complete 1850 LEs
P2 Cog complete ~30,000 LEs
Shows the complexity and hence power of the resulting P2 design.
The shakedown seems to have resulted in something extra-ordinary!!
A very efficient design plus >200MHz likely too.
Can't wait to see it in action. Just dusting off my nano board as we speak.
Not including branch instructions, right?
For example "IF_Z NEGZ value1, value2" would not make sense, since the instruction could be assembled as just "NEG value1,value2".
Likewise "IF_NZ NEGZ value1,value2" would be assembled as just "MOV value1,value2".
So could not the IF_Z condition flag be used to indicate a totally different instruction for NEGZ and the like ?
Just a thought....
Bean
No, you can't substitute like that. In the first example, if Z=0, then NEGZ doesn't execute at all and the D register (value1) does not get changed. NEG (without a predicate) always updates D, which may not be your intent.
We are not short of opcodes. However, on the earlier P2, Chip used these bits to define some instructions.
A LOT more money! (million+ instead of in the 10s-100s of thousands)
Oops, misread.
Even if there was, the 0.4mm pitch package is big enough already, having it too big would limit some applications I feel. Here's a footprint comparison between P1 and P2 (thermal pad in blue is approximate in size).
Correction, I have used a QFP100 with 0.65mm pitch before (not TQFP).
I googled and found someone selling a probe adapter for one, so I guess it does/did exist...
I have to say that 0.65 is enormously easier than 0.5 mm for me.
Don't know if it's an option, but I'd also much prefer 0.65 mm spacing.
For the old P2, maybe it didn't matter because I guess we'd mostly buy Parallax made modules anyway...
But, the new P2 has enough RAM that I think I could do cool things without the need for SDRAM.
So, I'd like to be able to solder the chip myself.
I can do 0.5 mm, but it often needs very difficult rework, needing a loupe to inspect the leads...
BTW: Why does everybody quote everything somebody says when replying? People seem pretty quote happy here...
Mainly bonding mapping and deciding what not to bond, as well as NREs for test and packaging setups.
Amkor show these possible pin counts, in a Thermal Pad, 14x14 package : 52/64/80/100/120/128
and the approximate pin pitches these pin counts imply are:
12.5/(128/4) = 0.390625
12.5/(120/4) = 0.416
12.5/(100/4) = 0.5 << Package Chip has chosen
12.5/(80/4) = 0.625 (0.65mm) << Might just manage 64io ?
12.5/(64/4) = 0.78125 (0.8mm) << 64 pin 0.8mm seems common Asian option
12.5/(52/4) = 0.9615
and there is also Body Size 14mm x 20mm, Pitch 0.65mm, and also a 128 pin 0.8mm
There is a bit of a trend for Asian uC vendors (eg Renesas) to offer both 0.5mm and 0.8mm packages, which must be production-flow/handling driven.
I've seen consumer PCBs with very long pads / solder thieves, placed at 45°, on what I think was 0.8mm, that looked to have been wave soldered. Some info here: http://www.ami.ac.uk/courses/topics/0170_wsp/
Infineon also says this
[" Wave soldering an exposed-pad QFP is not possible because the package has to be attached to the PCB by SMD glue. In QFPs with exposed pad, this is not possible if the exposed pad is intended to be soldered to the PCB. Furthermore, wave soldering is only possible if the products in QFPs are qualified for wave soldering (passing the solder heat test).
Generally wave soldering of QFPs is difficult because the leads have to be soldered at all four sides of the component. Using a 45° rotated layout is recommended to allow the solder to wet the leads more easily. Wave soldering of QFPs with lead pitches of 0.65 mm or smaller may lead to excessive bridging and therefore is not recommended."]
ie ~0.8mm is wave solder min.
Looks like a common option is the 64p 0.8mm one, question is, what IO's can that give (~48?) , and is that still useful ?
Need to figure out what X I need...
If adding another package, it might be smarter to gain more than just slightly better yields on manual assembly.
The 0.8mm option seems to open up wave soldering, and I find there is an 88 pin 0.8mm offering in 20mm x 20mm.
(which may be a better combination than the 28mm 128 pin 0.8mm ?)
88 pins should manage 64io quite well, or the more common 80pins fits 0.8mm at 14mm x 20mm, but has fewer Vcc pins for still 64 io.
http://www.pcb-3d.com/tag/qfp