LOGIC TOO BIG!!!!
cgracey
Posts: 14,152
The synthesis guy just came back and said that the logic+memories area is looking to be 72 mm2. We only have 58 mm2 of space in the middle of our huge 8.5 x 8.5 mm die. We are 24% oversized!!!
We need to shave at least 14 mm2 (72 - 58).
We have 16 instances of 8192x32 SP RAM at 1.57mm2 = ~25mm2.
We have 32 instances of 512x32 DP RAM at 0.292mm2 = ~9.3mm2.
Those RAMs total to ~34mm2.
This means our current logic is 72 - 34 = 38 mm2. That's over 3x what Prop2-Hot was with 8 cogs! The smart pins are contributing to this bloat.
Each smart pin is 1/9th the logic of a cog, so 64 of them are equivalent to 64/9 = 7 cogs. The CORDIC is equivalent to 2 cogs. So, we have 16 + 7 + 2 = 25 cogs' equivalent of logic here. This means a cog's worth of logic is about 1.5 mm2 (38 / 25).
If we cut the cogs from 16 down to 8, we save 8 * (1.5 mm2 for logic + 2 * 0.292 mm2 for 512x32 DP RAMs) = 16.7 mm2. That gets us where we need to be, by 2.7 mm2. That 2.7 mm2 will hopefully be enough spare room.
I told him to reduce the cogs from 16 to 8. That also halves the main RAM, for now, but I believe they have a 16kx32 instance we could use to keep the hub RAM at 512KB. We'll see what he comes back with.
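For anyone who wants to replay the area arithmetic above, here is a minimal Python sketch. All figures come from the post; the function and variable names are made up for illustration. Because it carries unrounded values through, it lands on about 16.6 mm2 saved and about 2.6 mm2 of margin, rather than the rounded 16.7 and 2.7 quoted above.

```python
# Area budget from the post. All values are in mm^2 and come from the text;
# the names below are hypothetical, chosen only for this sketch.
TOTAL_SYNTH = 72.0   # logic + memories area reported by synthesis
AVAILABLE = 58.0     # space available in the middle of the die
SP_RAM = 1.57        # one 8192x32 SP RAM instance
DP_RAM = 0.292       # one 512x32 DP RAM instance


def area_budget(cogs=16, cogs_removed=8, smart_pins=64, cordic_cogs=2):
    """Return the estimated savings from removing cogs, per the post's reasoning."""
    ram = 16 * SP_RAM + 32 * DP_RAM            # all RAM instances
    logic = TOTAL_SYNTH - ram                  # what is left is logic
    # Each smart pin counts as 1/9 of a cog, the CORDIC as 2 cogs.
    cog_equivalents = cogs + smart_pins / 9 + cordic_cogs
    per_cog = logic / cog_equivalents          # logic area of one cog
    # Each removed cog also frees two 512x32 DP RAMs.
    saved = cogs_removed * (per_cog + 2 * DP_RAM)
    overshoot = TOTAL_SYNTH - AVAILABLE
    return {
        "ram": ram,
        "logic": logic,
        "per_cog": per_cog,
        "saved": saved,
        "margin": saved - overshoot,
    }


budget = area_budget()
for name, value in budget.items():
    print(f"{name:8s} {value:6.2f} mm2")
```

Note that this only counts cog logic and cog DP RAMs; it leaves the hub RAM untouched, the same way the estimate in the post does.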
Comments
It is still 512 KB HUB ram, but just 8 COGs, right?
Bad news.
edit: ahh, I see 16kx32.
Mike
Really, I don't know that 8 cogs would be too few. They do a lot more than on Prop1, so 16 might have been huge overkill.
OnSemi does have a 16384 x 32 SP RAM instance we can use to keep the RAM at 512KB.
They are resynthesizing now with 8 cogs.
Was Prop2-Hot never going to have a Smartpins like feature?
It just had counters, like the Prop1.
We really need to keep all the smart pins, because they are needed to control the analog portions of each pin.
16 sounded good, but HUBEXEC and LUT ram will reduce the need for COGs bound to a single Object/Protocol as usually done on the P1.
TACHYON for example runs mostly on one COG and produces quite small wonders on a P1.
Time to write some fast P2-P2 transfer code using a couple of smart pins on each side, to make it easier to use more than one P2 on a board. The serial boot already supports multiple P2s; we just need to make it easy to use multiple P2s in a sane manner.
Mike
ONWARD!
Only 8 cogs hurts a LOT.
At least hub and smart pins remain and fingers crossed silicon >160MHz.
Have to change P2 planning a bit.
512kB better than 16 cogs for most, I think.
Now sit tight for power estimations, though I have to admit that package at 20C/watt JA is quite impressive.
Please go ahead and build it and they will come.
One could say a truly P2.
And then there are the smart pins, way more powerful than the 2 counters of a P1 COG, and 8 per COG, statistically.
Also, displaying 16 separate COGs in Blockly needs way too much horizontal space on the screen, and will be confusing...
and that 2.7mm2 could be used for a little bit more HUB, like say 640K?, 596K?, EEPROM?
Ducks and runs for cover...
Mike
512k wins no matter how many different ways you look at it!
and should work nicely with your new SPIN interpreter.
Is jmpret still there in P2? I struggled long with it, and now that I basically know how to use it, I would really miss the co-routine aspect of it. Time-slicing at will.
Mike
512K is already too small, really. 128K would be a death wish.
So you are saying it is being synthesized at 256K + 8 COGs, with a hope it could be 512K + 8 COGs?
-Phil
The 8-cog 166MHz synthesis run completed really quickly, which is always a good sign. The 16-cog synthesis effort had been running for most of the day and wasn't getting anywhere, so they killed it and ran the new 8-cog one, instead.
So, it looks like we will have 4 mm2 to spare on the die, after upping the eight hub RAMs to 16384 x 32, in order to maintain 512KB. And that's already accounting for CTS, DFT, floorplanning, and placement density.
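The hub RAM sizes being discussed follow directly from the instance counts. A quick sketch, assuming one 32-bit-wide hub RAM instance per cog (which is what the 16-to-8 halving described earlier implies); the helper name is hypothetical:

```python
BYTES_PER_WORD = 4  # each RAM word is 32 bits wide


def hub_ram_kb(instances, words_per_instance):
    """Total hub RAM in KB for a given number of equal-sized RAM instances."""
    return instances * words_per_instance * BYTES_PER_WORD // 1024


original = hub_ram_kb(16, 8192)    # 16 cogs, 8192x32 instances
halved = hub_ram_kb(8, 8192)       # 8 cogs, same instance size
restored = hub_ram_kb(8, 16384)    # 8 cogs, 16384x32 instances

print(original, halved, restored)
```

So dropping to 8 cogs with the old instances would give 256KB, and the 16384 x 32 instances bring it back to 512KB.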
One thing I have to work on, though, is the CORDIC. It was failing timing, miserably, due to K-factor correction. This involves two 40-bit adders in sixteen stages which need to become their own stages. So, the CORDIC will take 16 more clocks and need 2048 more flops and 32 more 40-bit adders (X and Y in 16 stages). I'll work on this now and have it ready for the morning. It was only reaching 80MHz, or so, but I had them set a parameter to exclude the K-factor-correction adders and then everything compiled just fine.
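For readers unfamiliar with why a CORDIC needs K-factor correction at all: each rotation stage stretches the vector slightly, so the raw result comes out scaled by a fixed gain of roughly 1.6468 that has to be divided back out. Below is a floating-point sketch of that effect. It is not the P2's implementation (which is fixed-point and pipelined); the stage count and function names here are assumptions for illustration only.

```python
import math


def cordic_gain(stages):
    """Accumulated gain K of a CORDIC rotator with the given number of stages."""
    gain = 1.0
    for i in range(stages):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain


def cordic_rotate(x, y, angle, stages=32):
    """Rotate (x, y) by angle (radians, |angle| below about 1.74).

    Returns the UNCORRECTED result, i.e. still scaled by cordic_gain(stages).
    """
    for i in range(stages):
        d = 1.0 if angle >= 0 else -1.0      # rotate toward zero residual angle
        shift = 2.0 ** -i                    # a right shift in hardware
        x, y = x - d * y * shift, y + d * x * shift
        angle -= d * math.atan(shift)
    return x, y


k = cordic_gain(32)
raw_x, raw_y = cordic_rotate(1.0, 0.0, 0.5)
# K-factor correction: scale the raw outputs back down by 1/K.
print(k, raw_x / k, raw_y / k)
```

In hardware the 1/K scaling is a constant multiply on both X and Y, which is where the extra adders mentioned above come in.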
I still like VGA. And NTSC.