
# LOGIC TOO BIG!!!!

Posts: 8,416
edited October 2
The synthesis guy just came back and said that the logic+memories area is looking to be 72 mm2. We only have 58 mm2 of space in the middle of our huge 8.5 x 8.5 mm die. We are 24% oversized!!!

We need to shave at least 14 mm2 (72 - 58).

We have 16 instances of 8192x32 SP RAM at 1.57 mm2 each = ~25 mm2.

We have 32 instances of 512x32 DP RAM at 0.292 mm2 each = ~9.3 mm2.

Those RAMs total ~34 mm2.

This means our current logic is 72 - 34 = 38 mm2. That's over 3x what Prop2-Hot was with 8 cogs! The smart pins are contributing to this bloat.
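
For anyone who wants to re-check the budget, here is a minimal sketch of the arithmetic above. The numbers are the ones quoted in this post; the variable names are just illustrative.

```python
# Area figures quoted in the post (all in mm^2); names are illustrative.
TOTAL_AREA = 72.0        # synthesized logic + memories
AVAILABLE_AREA = 58.0    # space in the middle of the die

SP_RAM_COUNT, SP_RAM_AREA = 16, 1.57     # 8192x32 single-port instances
DP_RAM_COUNT, DP_RAM_AREA = 32, 0.292    # 512x32 dual-port instances

hub_ram_area = SP_RAM_COUNT * SP_RAM_AREA      # ~25 mm^2
cog_ram_area = DP_RAM_COUNT * DP_RAM_AREA      # ~9.3 mm^2
ram_area = hub_ram_area + cog_ram_area         # ~34 mm^2
logic_area = TOTAL_AREA - ram_area             # ~38 mm^2

overshoot = TOTAL_AREA - AVAILABLE_AREA        # area that must be shaved
oversize_pct = 100.0 * overshoot / AVAILABLE_AREA

print(f"RAM area:   {ram_area:.1f} mm^2")
print(f"Logic area: {logic_area:.1f} mm^2")
print(f"Overshoot:  {overshoot:.1f} mm^2 ({oversize_pct:.0f}% oversized)")
```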

Each smart pin is 1/9th the logic of a cog, so 64 of them are equivalent to 64/9 = 7 cogs. The CORDIC is equivalent to 2 cogs. So, we have 16 + 7 + 2 = 25 cogs' equivalent of logic here. This means a cog's worth of logic is about 1.5 mm2 (38 / 25).

If we cut the cogs from 16 down to 8, we save 8 * (1.5 mm2 for logic + 2 * 0.292 mm2 for 512x32 DP RAMs) = 16.7 mm2. That gets us where we need to be, by 2.7 mm2. That 2.7 mm2 will hopefully be enough spare room.
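
The cog-equivalent estimate and the resulting saving can be sketched the same way. This follows the rounding used in the post (7 cogs for the smart pins, 25 cog-equivalents, 1.5 mm2 per cog); the names are illustrative, and carrying the unrounded values through would shift the result slightly.

```python
# Inputs quoted in the post; names are illustrative.
LOGIC_AREA = 38.0            # mm^2 of logic (72 - 34)
COGS = 16
SMART_PINS = 64
SMART_PINS_PER_COG_EQUIV = 9     # each smart pin is ~1/9th of a cog
CORDIC_COG_EQUIV = 2
DP_RAM_AREA = 0.292          # mm^2 per 512x32 DP RAM
DP_RAMS_PER_COG = 2
REQUIRED_SAVING = 14.0       # mm^2 (72 - 58)

# Rounded the same way as in the post.
smart_pin_equiv = round(SMART_PINS / SMART_PINS_PER_COG_EQUIV)   # 7
cog_equivalents = COGS + smart_pin_equiv + CORDIC_COG_EQUIV      # 25
logic_per_cog = round(LOGIC_AREA / cog_equivalents, 1)           # 1.5 mm^2

cogs_removed = 8
saving = cogs_removed * (logic_per_cog + DP_RAMS_PER_COG * DP_RAM_AREA)
margin = saving - REQUIRED_SAVING

print(f"Cog equivalents: {cog_equivalents}")
print(f"Saving: {saving:.1f} mm^2, margin: {margin:.1f} mm^2")
```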

I told him to reduce the cogs from 16 to 8. That also halves the main RAM, for now, but I believe they have a 16kx32 instance we could use to keep the hub RAM at 512KB. We'll see what he comes back with.

• Posts: 1,777
edited October 2
I hope by main RAM you mean COG RAM.

It is still 512 KB HUB ram, but just 8 COGs, right?

edit: ahh, I see 16kx32.

Mike
I am just another Code Monkey.
A determined coder can write COBOL programs in any language. -- Author unknown.
Press any key to continue, any other key to quit

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.
• Posts: 8,416
Yes. Same 512KB hub RAM, but only 8 cogs.

Really, I don't know that 8 cogs would be too few. They do a lot more than on Prop1, so 16 might have been huge overkill.
• Posts: 276
Sacrificing cogs is better than losing hub RAM. There would have been cogs doing nothing in many cases. People will have to write more efficient code! Let's hope the 16kx32 instance is available to keep to 512KB. Will the egg beater need re-doing? Looking on the bright side, maximum power consumption will be lower.
Formerly known as TonyB
• Posts: 8,416
Deja vu, I know. This is like Groundhog Day, except we aren't looping, anymore.

OnSemi does have a 16384 x 32 SP RAM instance we can use to keep the RAM at 512KB.
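
As a quick check of the capacity, here is a small sketch assuming one hub RAM instance per cog (16 instances for 16 cogs, 8 for 8 cogs). The helper name `ram_kib` is made up for illustration.

```python
def ram_kib(instances: int, words: int, bits_per_word: int = 32) -> int:
    """Total capacity in KB (1024 bytes) of `instances` RAMs of words x bits."""
    return instances * words * bits_per_word // 8 // 1024

original_16_cogs = ram_kib(16, 8192)    # sixteen 8192x32 instances
halved_8_cogs = ram_kib(8, 8192)        # eight 8192x32 instances
restored_8_cogs = ram_kib(8, 16384)     # eight 16384x32 instances

print(original_16_cogs, halved_8_cogs, restored_8_cogs)
```

So dropping to 8 cogs with the same instance halves the hub RAM to 256KB, and the 16384 x 32 instance brings it back to 512KB.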

They are resynthesizing now with 8 cogs.
• Posts: 8,416
edited October 2
Yes, power will be lower and hub latency will be reduced. This feels fine, to me.
• Posts: 8,416
The egg-beater is parameter-driven for 1/2/4/8/16 cogs. Fewer cogs also means a slight speed-up.
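
To illustrate why fewer cogs can shorten hub waits, here is a deliberately simplified model, not the actual hardware timing. It assumes the hub RAM is split into one slice per cog and that at clock `t` cog `c` is connected to slice `(c + t) % n_cogs`. Fixed pipeline clocks are ignored, and the function names are hypothetical.

```python
def wait_clocks(cog: int, slice_index: int, clock: int, n_cogs: int) -> int:
    """Clocks until `cog` is connected to `slice_index` in the assumed
    rotation, where cog c sees slice (c + t) % n_cogs at clock t."""
    return (slice_index - cog - clock) % n_cogs

def latency_stats(n_cogs: int) -> tuple[int, float]:
    """Worst-case and average wait over all slices for cog 0 at clock 0."""
    waits = [wait_clocks(0, s, 0, n_cogs) for s in range(n_cogs)]
    return max(waits), sum(waits) / len(waits)

for n in (1, 2, 4, 8, 16):
    worst, average = latency_stats(n)
    print(f"{n:2d} cogs: worst wait {worst:2d} clocks, average {average:.1f}")
```

Under these assumptions the worst-case wait for a random hub access drops from 15 clocks with 16 cogs to 7 clocks with 8 cogs.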
• Posts: 4,450
Bugger. I was thinking there was going to be room to spare at 16 Cogs.

Was Prop2-Hot never going to have a Smartpins like feature?
The Prisoner's Dilemma, in english - "Selfishness beats altruism within groups. Altruistic groups beat selfish groups." - Quoted part from 2007, D.S Wilson/E.O Wilson.
• Posts: 8,416
edited October 2
evanh wrote: »
Bugger. I was thinking there was going to be room to spare at 16 Cogs.

Was Prop2-Hot never going to have a Smartpins like feature?

It just had counters, like the Prop1.

We really need to keep all the smart pins, because they are needed to control the analog portions of each pin.
• Posts: 1,777
I think RAM is more important than COGs. Going back to 8 is sad, but better than halving the HUB.

16 sounded good, but HUBEXEC and LUT ram will reduce the need for COGs bound to a single Object/Protocol as usually done on the P1.

TACHYON for example runs mostly on one COG and produces quite small wonders on a P1.

Time to write some fast P2-to-P2 transfer code using a couple of smart pins on each side, to make it easier to use more than one P2 on a board. The serial boot already supports multiple P2s; let's just make it easy to use multiple P2s in a sane manner.

• Posts: 8,980
edited October 2
8 is fine. Truth is, 16 COGS made sense without the tasker in hot. We added events, which can do a lot. People can cram a lot into a COG now, and it can respond quick, if needed.

ONWARD!
Do not taunt Happy Fun Ball! @opengeekorg ---> Be Excellent To One Another SKYPE = acuity_doug
• Posts: 1,984
edited October 2
Arrrrrgh! :(
Only 8 cogs hurts a ~~bit~~ LOT.
At least hub and smart pins remain and fingers crossed silicon >160MHz.
Have to change P2 planning a bit.
Melbourne, Australia
• Posts: 8,980
edited October 2
I was mentally ready for this. Features were all we can eat. Of course it got big. :D
• Posts: 8,980
We need to test on 8 COGS now.
• Posts: 9,816
Well, Cluso99 got his wish for a scaled-down P2. :) Albeit in the same package. :(
Infernal Machine
• Posts: 8,416
To get 16 cogs, we'd have to settle for 128KB hub RAM.
• Posts: 8,980
That's too small to be the first one.
• Posts: 2,839
Finally, proper cull-the-engineers kind of wranglings. Must be in the home straight.

512kB better than 16 cogs for most, I think.

Now sit tight for power estimations, though I have to admit that package at 20C/watt JA is quite impressive
• Posts: 6,697
Chip, you have packed so much into this design that it is like a triple redundant system and the loss of 8 cogs won't be noticed by practically all except the most esoteric applications since you have interrupts, smart pins, hub exec etc etc etc.

Tachyon Forth - compact, fast, forthwright and interactive

Tachyon Forth News Blog
TACHYON DEMONSTRATOR
Brisbane, Australia
• Posts: 8,980
Yup
• Posts: 1,777
edited October 3
Hey, we have 8 pins per COG now, which is double the pins per COG of the P1. The COG has a LUT now, which doubles the RAM per COG compared to the P1. The P2 will run at 160 MHz, double the P1's 80 MHz. The P2 even has 16 times as much HUB RAM as the P1.

One could say a truly P2.

And then there are the smart pins, way more powerful than the 2 counters of a P1 COG, and 8 per COG, statistically.

Also, displaying 16 separate COGs in Blockly needs way too much horizontal space on the screen, and will be confusing...

and that 2.7mm2 could be used for a little bit more HUB, like say 640K?, 596K?, EEPROM?

Ducks and runs for cover...

• Posts: 8,416
edited October 3
...And the Prop2 executes instructions in two clocks, instead of the Prop1's four clocks. So, cogs execute 4x faster, when considering the 160MHz.
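
A rough sketch of that 4x figure, using the clock rates mentioned in this thread (80 MHz for Prop1, 160 MHz for Prop2). It counts only the basic clocks per instruction and ignores hub waits and any instructions that take longer.

```python
def mips(clock_mhz: float, clocks_per_instruction: int) -> float:
    """Peak millions of instructions per second for one cog."""
    return clock_mhz / clocks_per_instruction

prop1_mips = mips(80, 4)     # Prop1: 80 MHz, four clocks per instruction
prop2_mips = mips(160, 2)    # Prop2: 160 MHz, two clocks per instruction
speedup = prop2_mips / prop1_mips

print(f"Prop1: {prop1_mips:.0f} MIPS/cog, Prop2: {prop2_mips:.0f} MIPS/cog, "
      f"speed-up: {speedup:.0f}x")
```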
• Posts: 1,984
16 cogs with 128k vs 8 cogs with 512k RAM.

512k wins no matter how many different ways you look at it! :)

• Posts: 8,416
512KB enables nice VGA displays.
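
To put numbers on that, here is a small sketch of frame buffer sizes against the hub RAM options discussed in this thread. The display modes are only examples picked for illustration, not a list of supported modes.

```python
def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed for one full bitmap frame."""
    return width * height * bits_per_pixel // 8

HUB_OPTIONS_KB = (128, 256, 512)   # hub RAM sizes mentioned in the thread

# Example modes (assumed for illustration only).
modes = [(320, 240, 16), (640, 480, 8), (640, 480, 16)]

for width, height, bpp in modes:
    size = framebuffer_bytes(width, height, bpp)
    fits = [kb for kb in HUB_OPTIONS_KB if size <= kb * 1024]
    print(f"{width}x{height} @ {bpp} bpp: {size} bytes, fits in {fits} KB")
```

A 640x480 frame at 8 bits per pixel needs 307,200 bytes, which only fits in the 512KB option, and even then leaves limited room for code.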
• Posts: 1,984
cgracey wrote: »
512KB enables nice VGA displays.

and should work nicely with your new SPIN interpreter. :)
• Posts: 1,777
Having 'just' 8 COGs instead of the dreamed-of 16 will have an impact on driver code. Throwing two COGs at a problem is not as easy anymore as with an abundant 16 of them. So having a USB mouse, a keyboard AND a USB stick as a file system will quickly reduce the number of free COGs.

Is jmpret still there in P2? I struggled with it for a long time, and now that I basically know how to use it, I would really miss the co-routine aspect of it. Time-slicing at will.

• Posts: 2,156
Definitely do 512K hub with 8 cogs, not 128K hub with 16 cogs.

512K is already too small, really. 128K would be a death wish.
• Posts: 10,659
cgracey wrote: »
If we cut the cogs from 16 down to 8, we save 8 * (1.5 mm2 for logic + 2 * 0.292 mm2 for 512x32 DP RAMs) = 16.7 mm2. That gets us where we need to be, by 2.7 mm2.
That 2.7 mm2 will hopefully be enough spare room.

I told him to reduce the cogs from 16 to 8. That also halves the main RAM, for now, but I believe they have a 16kx32 instance we could use to keep the hub RAM at 512KB. We'll see what he comes back with.

So you are saying it is being synthesized at 256k + 8 COGs, with a hope it could be 512k + 8 COGs?

• Posts: 21,412
cgracey wrote:
512KB enables nice VGA displays.
Is that still a major consideration? The world is all HDMI now, and the Raspberry Pi does that for cheap. Before refactoring the P2's features to fit the die constraints ad hoc, it might be well to consider how it affects a realistic application arena.

-Phil
Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away. -Antoine de Saint-Exupery
• Posts: 8,416
edited October 3
Good news!

The 8-cog 166MHz synthesis run completed really quickly, which is always a good sign. The 16-cog synthesis effort had been running for most of the day and wasn't getting anywhere, so they killed it and ran the new 8-cog one, instead.

So, it looks like we will have 4 mm2 to spare on the die, after upping the eight hub RAMs to 16384 x 32, in order to maintain 512KB. And that's already accounting for CTS, DFT, floorplanning, and placement density.

One thing I have to work on, though, is the CORDIC. It was failing timing, miserably, due to K-factor correction. This involves two 40-bit adders in sixteen stages which need to become their own stages. So, the CORDIC will take 16 more clocks and need 2048 more flops and 32 more 40-bit adders (X and Y in 16 stages). I'll work on this now and have it ready for the morning. It was only reaching 80MHz, or so, but I had them set a parameter to exclude the K-factor-correction adders and then everything compiled just fine.
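
For readers unfamiliar with K-factor correction, here is a textbook floating-point sketch of a CORDIC rotation. It is not the P2's 40-bit pipelined design; the iteration count and all names are assumptions chosen for illustration. It only shows why the raw CORDIC output comes out scaled by a gain K of about 1.6468 and has to be corrected afterwards.

```python
import math

ITERATIONS = 40   # assumed for illustration; not taken from the P2 design

# Elementary rotation angles atan(2^-i) and the accumulated CORDIC gain K.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))

def cordic_rotate(x: float, y: float, angle: float, correct_gain: bool = True):
    """Rotate (x, y) by `angle` radians using shift-and-add style steps.

    Converges for |angle| up to about 1.74 rad. Without correction the
    result is scaled by GAIN; with correction it is divided back out.
    """
    z = angle
    for i, step in enumerate(ANGLES):
        d = 1.0 if z >= 0.0 else -1.0
        # In hardware the 2^-i multiplies are shifts, so each stage needs
        # only adders for X and Y.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * step
    if correct_gain:
        x, y = x / GAIN, y / GAIN   # the K-factor correction step
    return x, y

raw = cordic_rotate(1.0, 0.0, math.pi / 6, correct_gain=False)
fixed = cordic_rotate(1.0, 0.0, math.pi / 6)
print(f"K = {GAIN:.6f}")
print(f"raw length = {math.hypot(*raw):.6f}, corrected = {fixed}")
```

In a hardware pipeline the final division would typically be a multiply by the constant 1/K, which is the extra adder work that, per the post above, has to be moved into its own stages.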
• Posts: 8,416
cgracey wrote:
512KB enables nice VGA displays.
Is that still a major consideration? The world is all HDMI now, and the Raspberry Pi does that for cheap. Before refactoring the P2's features to fit the die constraints ad hoc, it might be well to consider how it affects a realistic application arena.

-Phil

I still like VGA. And NTSC.