What would be the advantage of a tiny P2 over a P1?
I guess it's just important for volume users.
Personally, I think I'd just always use the big chip.
I agree, these variants aren't aimed at hobbyists like myself but at commercial outfits consuming hundreds if not thousands of P2s every month.
Still, I'd like to know what the pricing will be on these. Pricing will have to be competitive with ARMs and PIC32s for folks to be interested in them. If they keep it under $5.00 in onesies and half that at 1000, they'll have a contender.
Would it be possible to fit 8KB-32KB of OTP (or even better, flash), especially on the smaller parts?
Some form of Dual-die packaging should be low cost impact. Nuvoton have parts like this.
Depends on what cells OnSemi has in whichever process is selected, but flash would usually be too slow to run code from directly.
That does open a 'boot flash in the corner' solution, like FPGA and FTDI use on their MCUs
I think Chip is talking about eliminating all the custom analog stuff for these.
This would let Treehouse have total control, I guess.
If I were Chip, I might try this reduced version first. It is pretty much guaranteed to work, if the FPGA version works. Maybe that's what he's thinking about...
They do have a PAD RING in the shuttle, (right now?) which tests the custom flows.
Once the results on that are in, the need for any 'reduced version' becomes clearer.
I need DIPs...my soldering iron and I have an agreement. If I keep it in my drawer, it will stop ruining my chips
Even the P1 is easy to solder with an iron using the old place, tack, then flow solder method, which I used in the very early days of SMD before pastes were readily available. However, if you ever get to try the paste and toaster oven method, you would never want to go back to DIP. BUT, you have to TRY it.
Notwithstanding, DIP adaptors make far more sense for the limited quantity of DIPs that would be used, as any commercial production will always use SMD.
I worked all day and got the egg-beater parameterized for 16, 8, 4, 2, or 1 set of cogs and hub RAM slices.
Now, a parameter in the top-level Verilog file selects the cog and egg-beater metrics, complete with cog FIFO depth reduction. This saves a lot of logic on smaller egg-beater configurations and also reduces their cycle latency.
This was the key ingredient needed to branch out and make reduced versions of the Prop2 with lower numbers of cogs and hub RAM instances. This will really shrink silicon areas for smaller parts. If we had kept the egg-beater at 16 spokes, we would have needed 16 hub RAM instances, as well, for even 1 cog. Now, we only need as many hub RAM instances as there are cogs. This saves 14 RAMs in a 2-cog Prop2, not to mention 14 sets of muxes and registers for 32 bits of data both ways and some address bits.
I just compiled for 1 cog / 1 hub slice, and it works! There's no waiting for the hub if you're the only cog, now.
I will keep the number of locks constant, at 16, and the timing for COGINIT/COGSTOP/CORDIC instructions will remain at 1:16. I also kept the FBLOCK size at 64 bytes, even though the granularity could have been reduced for the smaller egg-beaters. This maintains instruction compatibility. The only difference to software, aside from there being fewer cogs and smaller hub RAM, is that latencies are reduced in hub RAM accesses. Everything else acts the same.
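To make the latency point concrete, here is a toy Python model of the rotation. It is only a sketch under assumptions: the rotation rule (each cog steps to the next hub RAM slice on every clock) and all function names are illustrative, not taken from the Verilog.

```python
# Toy model of the egg-beater rotation. The rotation rule below is an
# assumption made for illustration; it is not derived from the real Verilog.

VALID_SPOKES = (16, 8, 4, 2, 1)  # configurations named in the post


def slice_for_cog(cog: int, clock: int, spokes: int) -> int:
    """Hub RAM slice that `cog` faces on a given clock (assumed rotation)."""
    if spokes not in VALID_SPOKES:
        raise ValueError(f"unsupported spoke count: {spokes}")
    if not 0 <= cog < spokes:
        raise ValueError(f"cog {cog} does not exist in a {spokes}-cog part")
    return (cog + clock) % spokes


def wait_clocks(cog: int, target_slice: int, clock: int, spokes: int) -> int:
    """Clocks until `cog` lines up with `target_slice`."""
    return (target_slice - slice_for_cog(cog, clock, spokes)) % spokes


def worst_case_wait(spokes: int) -> int:
    """Longest wait cog 0 can see for any slice, starting at clock 0."""
    return max(wait_clocks(0, s, 0, spokes) for s in range(spokes))


def hub_rams_saved(cogs: int, full_spokes: int = 16) -> int:
    """Hub RAM instances saved versus a fixed 16-spoke egg-beater."""
    return full_spokes - cogs


if __name__ == "__main__":
    for n in VALID_SPOKES:
        print(n, "spokes: worst-case wait", worst_case_wait(n), "clocks")
    print("RAMs saved in a 2-cog part:", hub_rams_saved(2))
```

In this model a lone cog never waits, and the worst-case wait shrinks with the spoke count, which matches the reduced latencies described above.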
Now all we need is a chip version with...
* 2 Cogs + 512KB Hub Ram
* 8 Cogs + 128KB Hub Ram
* Shared LUT between all adjacent cogs (not pairs!!!) - will require RD/WRLUTX instruction for access
* 64 I/O (mix of smart and normal ?)
BTW with the above, I don't think the egg-beater would be required.
Alternatively, make the 8 cogs + 128KB version 2 sets of 4 cogs + 64KB.
I don't see a need for DIP. Unlike the 1990s, we now have OSH Park and OSH Stencils.
They let you create small boards for under $5 each.
It's not hard to place parts with 0.65-0.8mm pin spacing by hand when you use a mylar paste stencil.
And that bottom GND pad can be soldered by hand from the back side if you just put a big-enough hole in the PCB to accommodate the soldering iron tip. I think a 0.100" plated-through hole would be perfect, right under the middle of the chip.
How many LEs did that 1 cog/1 hub slice take, Chip?
I don't know, but by dropping from 16 slices down to 1 (cog and hub RAM), the egg-beater ALM count went from 1,017 down to 103. Also, the flops in the cog's FIFO went from 756 down to 216 (there's a static 5 levels of FIFO that add to the spoke count).
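The two FIFO flop counts are consistent with a depth of "spokes + 5" levels. The quick check below is a sketch: the per-level figure is inferred from the quoted numbers, not stated by Chip, and the values for 8, 4, and 2 spokes are extrapolations.

```python
# Check the quoted FIFO flop counts against "static 5 levels + spoke count".
# The flops-per-level figure is inferred from the two quoted totals.

STATIC_FIFO_LEVELS = 5  # "a static 5 levels of FIFO that add to the spoke count"


def fifo_levels(spokes: int) -> int:
    """FIFO depth in levels for a given spoke count."""
    return spokes + STATIC_FIFO_LEVELS


def flops_per_level(total_flops: int, spokes: int) -> int:
    """Flops per FIFO level, assuming every level costs the same."""
    levels = fifo_levels(spokes)
    if total_flops % levels:
        raise ValueError("total does not divide evenly into levels")
    return total_flops // levels


def estimated_fifo_flops(spokes: int, per_level: int) -> int:
    """Extrapolated flop count for a configuration that was not quoted."""
    return fifo_levels(spokes) * per_level


if __name__ == "__main__":
    per_level_16 = flops_per_level(756, 16)  # 21 levels
    per_level_1 = flops_per_level(216, 1)    # 6 levels
    print(per_level_16, per_level_1)         # both quoted totals give 36
    for spokes in (8, 4, 2):
        print(spokes, "spokes:", estimated_fifo_flops(spokes, per_level_16))
```

Both quoted totals work out to 36 flops per level, so the FIFO saving is 15 levels, one per spoke removed.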
I just added some compile-time stuff to inhibit the COGNEW function from allocating cogs that don't exist in less-than-16 configurations. Those cog enable flops go away now. Cogs that don't exist will always read 'not busy', but they can't be turned on.
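The behavior described here can be sketched as a small Python model. Everything in it is hypothetical: the class and method names, the lowest-free-cog allocation order, and the `None`/`False` return values are assumptions for illustration, not the hardware's actual interface.

```python
# Toy model of cog allocation in a reduced part. Names, allocation order and
# return values are illustrative assumptions, not the real hardware interface.
from typing import Optional

MAX_COGS = 16  # cog IDs the instruction set can name


class CogAllocator:
    def __init__(self, cogs: int) -> None:
        if cogs not in (16, 8, 4, 2, 1):
            raise ValueError(f"unsupported cog count: {cogs}")
        self.cogs = cogs
        self.busy = [False] * cogs  # enable flops exist only for real cogs

    def is_busy(self, cog_id: int) -> bool:
        """Cogs that don't exist always read 'not busy'."""
        if not 0 <= cog_id < MAX_COGS:
            raise ValueError(f"invalid cog id: {cog_id}")
        return cog_id < self.cogs and self.busy[cog_id]

    def cognew(self) -> Optional[int]:
        """Start a free cog (lowest ID first, an assumption); None if none."""
        for cog_id in range(self.cogs):  # never considers missing cogs
            if not self.busy[cog_id]:
                self.busy[cog_id] = True
                return cog_id
        return None

    def coginit(self, cog_id: int) -> bool:
        """Start a specific cog; a cog that doesn't exist can't be turned on."""
        if not 0 <= cog_id < self.cogs:
            return False
        self.busy[cog_id] = True
        return True


if __name__ == "__main__":
    part = CogAllocator(2)
    print(part.cognew(), part.cognew(), part.cognew())
    print(part.is_busy(5), part.coginit(5))
```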
BTW with the above, I don't think the egg-beater would be required.
-1
Losing the egg-beater would break the streamer and block transfer instructions.
Also I believe Hubexec would suffer too.
In a 2 cog system, 1:2 hub access would translate to 1 access per instruction. So there would be little benefit in the egg-beater and FIFO die space for hub exec or hub access. In fact, delays to fill the hubexec FIFO might be detrimental.
The streamer access might be halved, which might be a consideration.
Anyway, the chip configuration I postulated is unlikely to happen.
If you were streaming hub data out to DACs or pins on every clock, you would need the egg-beater. HyperRAM will require this, for example.
In the case of hub exec, or anything that uses the FIFO for reading, the action starts as soon as the first long is read in. The FIFO only winds up filling if it's being used for less than a long per clock.
I get the sense that total hub RAM would shrink as the number of slices was reduced. Is that necessary, or could each slice be increased to keep overall size the same? Like maybe an 8 cog version with 1MB (8 128KB slices), in the existing package?
But, chip price or size aren't usually limiting factors for me...
FAST encoder quad decode? Heck, single-axis dedicated motion chips with relatively low performance, list for $30+
or SO-28, in 1.27mm pitch with the thick and large package ?
Why do you prefer that, over the TQFP32 ?
I can understand no ADC, but it could/would still have basic pull-up/pull down control ?
There's untold combinations ...
Peter, totally yes. Cluso has provided Chip with lots of configs for all the FPGA boards we're using if I'm not mistaken.
Yes, I know!!! Not all cogs are equal rubbish.
Also is that 330um / 100um figure a linear distance along the perimeter, or are you talking square microns?
If people don't want to solder to it, just add a pogo pin.
This sidebar is worth it. The design is looking "done" save for testing and bug fixes.
We should test a few of these in the near future.
I like that we bring meaningful choices to the table rather than struggle with implementing them in an all in one.
Very good Chip and PJV for exploring this.
Of course. There will now be multiple compiles for most boards.
Those are distances along the perimeter.
That's true.
All that's left is a mechanism in the hub for returning what kind of Prop2 it is:
Cogs: 16/8/4/2/1
RAM: 1024k/768k/512k/256k/128k/64k/32k/16k
Pins: 64/32/24/16
Smart pins: all/half
Something like that, anyway.
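As a rough illustration of how those four fields could be reported, here is a Python sketch that packs them into one small word. The bit layout, the class name, and the method names are all made up for this example; nothing here reflects an actual Prop2 register or ROM format.

```python
# Hypothetical packing of the four configuration fields into one word.
# The bit layout and all names are invented for illustration only.
from dataclasses import dataclass

COG_OPTIONS = (16, 8, 4, 2, 1)
RAM_KB_OPTIONS = (1024, 768, 512, 256, 128, 64, 32, 16)
PIN_OPTIONS = (64, 32, 24, 16)
SMART_PIN_OPTIONS = ("all", "half")


@dataclass(frozen=True)
class Prop2Config:
    cogs: int
    ram_kb: int
    pins: int
    smart_pins: str

    def __post_init__(self) -> None:
        # Reject any value that is not in the option lists from the post.
        if self.cogs not in COG_OPTIONS:
            raise ValueError(f"bad cog count: {self.cogs}")
        if self.ram_kb not in RAM_KB_OPTIONS:
            raise ValueError(f"bad RAM size: {self.ram_kb}")
        if self.pins not in PIN_OPTIONS:
            raise ValueError(f"bad pin count: {self.pins}")
        if self.smart_pins not in SMART_PIN_OPTIONS:
            raise ValueError(f"bad smart pin option: {self.smart_pins}")

    def encode(self) -> int:
        """Pack option indexes: cogs [2:0], RAM [5:3], pins [7:6], smart [8]."""
        return (COG_OPTIONS.index(self.cogs)
                | (RAM_KB_OPTIONS.index(self.ram_kb) << 3)
                | (PIN_OPTIONS.index(self.pins) << 6)
                | (SMART_PIN_OPTIONS.index(self.smart_pins) << 8))

    @classmethod
    def decode(cls, word: int) -> "Prop2Config":
        """Inverse of encode(); raises IndexError on an unused cog index."""
        return cls(cogs=COG_OPTIONS[word & 0b111],
                   ram_kb=RAM_KB_OPTIONS[(word >> 3) & 0b111],
                   pins=PIN_OPTIONS[(word >> 6) & 0b11],
                   smart_pins=SMART_PIN_OPTIONS[(word >> 8) & 0b1])


if __name__ == "__main__":
    cfg = Prop2Config(cogs=8, ram_kb=512, pins=64, smart_pins="all")
    print(cfg.encode(), Prop2Config.decode(cfg.encode()) == cfg)
```

Nine bits cover every combination in the list, so the same information would fit just as easily in a ROM constant as in a hub register.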
On second thought, I'll just have this data in the ROM. Objects will just do as configured, most likely, anyway. When a development host logs onto the chip, it will give this information.
This means the hardware is back to being "done" and I'll return to getting the next release ready.