The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

jmg · 2015-08-10 05:42

Cluso99 wrote: »

We cannot expect the P2 to do everything. Some things just need to be done in software. If there are glitches then either the design is wrong or the software must work around it. This is what we did on the P1, and it is what is done on every other micro.

Actually, nope. Other MCUs often have such sampling conditioning, and it is not a 'design is wrong' issue at all, but a natural consequence of cross-domain, or dual-pin, sampling.

In the USB example, because clocks are not locked, there is a narrow, but finite, window where a single clock sample, unconditioned, may give a false positive.
Likewise an ESD spike can false-trigger a single-sample pin.
The actual sampling aperture of a single D-FF will be < 1ns.

Sapieha · 2015-08-10 06:19

Hi Chip.

One design question on Timers in COG.

Is it possible to have an instruction to start timers in 2-3 different COGs in sync?

TIMSYNC COGs ---- 32-bit value specifying timers (16 COGs x 2 timers)

0 = don't change anything, 1 = restart that timer in sync
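
A minimal sketch of how such a 32-bit mask could be built. The proposal does not say which bit belongs to which timer, so the layout used here (bit index = cog * 2 + timer) is an assumption, and the function name `timsync_mask` is hypothetical.

```python
NUM_COGS = 16
TIMERS_PER_COG = 2

def timsync_mask(selected):
    """Build a 32-bit mask from (cog, timer) pairs.

    Assumed layout (not stated in the proposal): bit index = cog * 2 + timer.
    A 1 bit means "restart this timer in sync", a 0 bit means "leave it alone".
    """
    mask = 0
    for cog, timer in selected:
        if not (0 <= cog < NUM_COGS and 0 <= timer < TIMERS_PER_COG):
            raise ValueError(f"no such timer: cog {cog}, timer {timer}")
        mask |= 1 << (cog * TIMERS_PER_COG + timer)
    return mask

# Restart timer 0 of cogs 1, 2 and 5 together.
mask = timsync_mask([(1, 0), (2, 0), (5, 0)])
print(f"{mask:032b}")
```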

jmg · 2015-08-10 06:52

Sapieha wrote: »

One design question on Timers in COG.

? There are no longer any timers in COGs.

potatohead · 2015-08-10 07:15

Well yes there is, and it's the one for the timer interrupt event source.

Given that, his question could be read as, "Is there a way to start timers in different COGS in sync?"

Cluso99 · 2015-08-10 07:26

jmg wrote: »

Cluso99 wrote: »

We cannot expect the P2 to do everything. Some things just need to be done in software. If there are glitches then either the design is wrong or the software must work around it. This is what we did on the P1, and it is what is done on every other micro.

Actually, nope. Other MCUs often have such sampling conditioning, and it is not a 'design is wrong' issue at all, but a natural consequence of cross-domain, or dual-pin, sampling.

In the USB example, because clocks are not locked, there is a narrow, but finite, window where a single clock sample, unconditioned, may give a false positive.
Likewise an ESD spike can false-trigger a single-sample pin.
The actual sampling aperture of a single D-FF will be < 1ns.

This is precisely why you have to "qualify" your sample. If you have a great hardware lump then it will do the processing required. But if you are making the hardware by software representation (ie typical bit-banging) then you need to qualify all signals.
The P1 and the P2 are, by design, software-configurable devices - that is their beauty. If Chip adds hardware blocks for all the peripherals desired, then the P1/P2 will no longer be a P1/P2, and then the P2 will compete with all the other micros out there.
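
As a minimal sketch of what qualifying a sample can mean in software: the accepted pin level only changes after several consecutive identical raw samples, so a one-sample glitch is ignored. The class name `QualifiedInput` and the threshold of 3 samples are illustrative assumptions, not anything taken from the P1 or P2.

```python
class QualifiedInput:
    """Accept a new pin level only after `needed` consecutive identical samples."""

    def __init__(self, needed=3, initial=0):
        self.needed = needed
        self.state = initial       # last qualified level
        self._candidate = initial  # level currently trying to qualify
        self._run = 0              # consecutive samples seen at the candidate level

    def sample(self, raw):
        """Feed one raw sample (0 or 1) and return the qualified level."""
        if raw == self.state:
            self._run = 0          # back at the accepted level: discard the run
        elif raw == self._candidate:
            self._run += 1
        else:
            self._candidate = raw
            self._run = 1
        if self._run >= self.needed:
            self.state = self._candidate
            self._run = 0
        return self.state

pin = QualifiedInput(needed=3)
# The lone 1 in second place is treated as a glitch; the run of three 1s is accepted.
print([pin.sample(bit) for bit in [0, 1, 0, 1, 1, 1, 0]])
```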

jmg · 2015-08-10 07:38

Cluso99 wrote: »

If Chip adds hardware blocks for all the peripherals desired, then the P1/P2 will no longer be a P1/P2, and then the P2 will compete with all the other micros out there.

I'm not following the logic here.
The P2 already competes with other Micros.
Sampling Flip-Flops are a very long way from "hardware blocks for all the peripherals desired", so that point seems contrived.

evanh · 2015-08-10 14:02

Cluso is right. The Prop does not compete with anything. For sure there are weigh-ups when deciding to use a Prop but those are not commercial factors. And I don't foresee the Prop2 ever breaking into commercially minded thinking either.

The point Cluso makes about bit-bashing vs peripheral blocks is just how far away they are from each other. There's nothing contrived at all.

evanh · 2015-08-10 14:06

PS: The Stamp was never a commercially minded product either but that didn't stop it being financially successful.

Seairth · 2015-08-10 14:40

Sapieha wrote: »

One design question on Timers in COG.

Is it possible to have an instruction to start timers in 2-3 different COGs in sync?

TIMSYNC COGs ---- 32-bit value specifying timers (16 COGs x 2 timers)

0 = don't change anything, 1 = restart that timer in sync

One approach with the existing design might be to use a LOCK change event as your global sync signal. You would still need to account for slight timing discrepancies, depending on whether you are polling the LOCK change event or using an ISR.

On the other hand, if the primary idea is to slave the timing of multiple cogs to a "master" timer, you could use the LOCK as a synchronization barrier and a single timer on one of the cogs to set/reset the barrier LOCK. All of the waiting cogs would then unblock, do their thing, then wait on the LOCK again.
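
The barrier idea can be sketched on a PC, with Python threads standing in for cogs and `threading.Barrier` standing in for the LOCK. This is only an analogy under stated assumptions: the function name, worker count and round count are hypothetical, and unlike a free-running hardware timer the master here also waits until every worker has arrived.

```python
import threading

def run_barrier_demo(num_workers=3, rounds=4):
    """Release all workers together once per master 'timer tick'."""
    # One extra party for the master that stands in for the timer cog.
    barrier = threading.Barrier(num_workers + 1)
    log = [[] for _ in range(num_workers)]

    def worker(idx):
        for tick in range(rounds):
            barrier.wait()         # block until the master releases this tick
            log[idx].append(tick)  # "do their thing", then loop and wait again

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for _ in range(rounds):
        barrier.wait()             # master tick: everyone unblocks at once
    for t in threads:
        t.join()
    return log

print(run_barrier_demo())
```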

Bill Henning · 2015-08-10 15:15

How about:

INTERRUPT #%xxcccciii

where

xx = don't care (for P2)

cccc = cog number

iii = interrupt vector

Causes an interrupt to vector %iii of cog %cccc
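
A minimal sketch of packing and unpacking the proposed 9-bit operand, following the %xxcccciii field layout given above. The helper names are hypothetical, and the don't-care bits are simply left at 0.

```python
def encode_interrupt(cog, vector):
    """Pack a cog number (4 bits) and interrupt vector (3 bits) as %xxcccciii."""
    if not 0 <= cog <= 0b1111:
        raise ValueError("cog must fit in 4 bits")
    if not 0 <= vector <= 0b111:
        raise ValueError("vector must fit in 3 bits")
    return (cog << 3) | vector   # xx (don't care) bits left as 0

def decode_interrupt(operand):
    """Return (cog, vector), ignoring the two don't-care bits."""
    return (operand >> 3) & 0b1111, operand & 0b111

operand = encode_interrupt(cog=5, vector=2)
print(f"%{operand:09b}")
```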

cgracey wrote: »

Thanks for your input, Guys.

The single-stepping seems to work fine, but we need a mechanism for breaking asynchronously, so that we can stop the cog and poll the execution point. Any ideas on how that should work? I think it needs to be another cog or a pin event.

jmg · 2015-08-11 00:49

evanh wrote: »

The Prop does not compete with anything. For sure there are weigh-ups when deciding to use a Prop but those are not commercial factors. And I don't foresee the Prop2 ever breaking into commercially minded thinking either.

Wow, oh dear, - I hope Parallax have not wasted their time, given you seem to believe there is no commercial footprint for P2 designs !!

potatohead · 2015-08-11 01:47

Don't worry, there are a lot of types of commercial minds, and this feature set will make enough sense to gain a respectable following too.

Dave Hein · 2015-08-11 02:07

P2 Watch: 50 days since the end of Spring. 43 days to the beginning of Fall.

        June                  July                 August
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
   -- -- -- -- -- --            1  2  3  4                     1
-- -- -- -- -- -- --   5  6  7  8  9 10 11   2  3  4  5  6  7  8
-- -- -- -- -- -- --  12 13 14 15 16 17 18   9 10 -- -- -- -- --
21 22 23 24 25 26 27  19 20 21 22 23 24 25  -- -- -- -- -- -- --
28 29 30              26 27 28 29 30 31     -- -- -- -- -- -- --
                                            -- --

I thought interrupts were done, but people continue to suggest changes.
I hope the P2 instruction set is completed.
Is P2 Day almost here? I'm starting to wonder/wander again.
Could we get a project update from Parallax please?

potatohead · 2015-08-11 02:20

Chip seemed to signal thanks, and he's on docs and the things needed for an image.

Seems there isn't too much.
Docs, ROM, fuses, and the boot process.

We know we get one before the Smart Pin work begins.

Maybe we don't have too long now.

evanh · 2015-08-11 22:58

jmg wrote: »

evanh wrote: »

The Prop does not compete with anything. For sure there are weigh-ups when deciding to use a Prop but those are not commercial factors. And I don't foresee the Prop2 ever breaking into commercially minded thinking either.

Wow, oh dear, - I hope Parallax have not wasted their time, given you seem to believe there is no commercial footprint for P2 designs !!

I didn't say the Prop has no commercial footprint. Only that it has no commercial competitor. Just like the Stamp.

Chip and Ken have stated the Prop was and is not a commercial endeavour - not in those words but certainly in the same vein. Ken had stated once or twice that the market Parallax fills is educational. It's a whole package deal; the processor used may be important for what's being taught, but it's only a small part of the whole package.

Don't get me wrong, I'm here because I like the architectural design of the Propeller. As are many others. And I'll be sticking around for that same reason. And I very much like the Prop2 as well.

Leon · 2015-08-12 08:24

What about XMOS devices? They are direct competitors for the Propellers:

http://www.xmos.com/

evanh · 2015-08-12 10:25

The Xmos parts have similarities to the Prop, and I think recent developments have moved Prop2 towards the Xmoses. However, for the average school, I don't think the technical comparisons make many inroads.

Is there any provider of educational material using an Xmos part for there to be competition?

Heater. · 2015-08-12 11:25

Sadly I have never heard of such XMOS initiatives in education.

Leon · 2015-08-12 11:45

They provide plenty of support to teachers and students via email and forums:

https://www.xmos.com/contact/education?secure=1

http://www.xcore.com/forum/index.php

They also provide scholarships and support young (12-14) students:

http://www.xmos.com/news/press/17555

ErNa · 2015-08-13 08:06

The business model of Parallax is to provide means to educate people. The business model of XMOS is to sell semiconductors. That makes a huge difference. And as I lost a lot of money and lifetime following a company with a business model to sell semiconductors I prefer to work with a company that educates people to enable them to also use semiconductors that will suddenly see EOL.

koehler · 2015-08-13 10:55

That's plainly incorrect.

Parallax's business model is the same as that of XMOS, Microchip, Atmel or any other for-profit company: to make money.

Their focus may be geared towards the edu market, however they are not the same as something like the RPi Foundation, which is an actual charity.

They did make an attempt with Parallax Semi to move into the larger business/industrial sector at one time, which also goes against your point.
That failed, and they have decided to stick with the edu, hobby/low-volume specialty niche.

I have no doubt that Ken would be thrilled to have one of the big automakers call them up for a demo of the P2.
If that ever does happen, I expect you'd see the change of direction and pace that such an influx of $ would bring.

evanh · 2015-08-13 12:46

Yes, of course Parallax is a for-profit business, just targeted at packaged education deals is all. Even the hobby aspect is not a big earner - I believe Ken has said exactly this.

The label "commercial" in this context is as a chip design for selling as just another micro-controller. JMG was suggesting that the Prop is competing with other micro-controllers commercially. I tried to lay out how the Prop came to be, because it most certainly was not from trying to be a commercially competitive design. The architecture is just too far off from the norm for that ... someone casually checking out the price will probably not look again because they're comparing with feature rich PIC32's and the like. Eight cores doesn't mean a lot when you only want one to manage all the spec'd I/O. And 32kB of main memory is a serious cap. The Hub is cool how it works but it is also a limitation.

There may well be some volume-ish commercial customers, I wouldn't know ... maybe that's what triggered the attempt at "Parallax Semi". Doesn't change what I've said as being on the mark though.

cgracey · 2015-08-14 00:54

I've got the whole chip (minus the smart pins) compiling on the Cyclone V -A9 device now. It's using 60% of the FPGA logic.

A single-cog Prop2 compiles in 4 minutes and has an Fmax of ~120MHz on the Cyclone IV. It's about 105MHz on the Cyclone V, which always seems to be slower than the Cyclone IV.

Here's the crazy thing, though: When I do a full-chip compile with 16 cogs on the Cyclone V -A9 device, the critical paths become flop-to-flop interconnect delays, with no logic in-between. These paths connect the hub RAMs' inputs and the CORDIC's results. I think on the ASIC, this wiring delay won't be such a problem. These paths lower the full-chip Fmax to ~82MHz on the -A9. We'll probably just run it at 80MHz on the FPGA, then. I'm sure we could go to 100MHz, too, without any problems, given likely workbench temperatures.

I have paved the way for adding the block-r/w instructions and will add them next. The big impediment to implementation is out of the way, so it should be easy.

Here is a big question for you guys:

We are planning on this:

16 cogs w/ 512KB hub RAM

If things turn out overly-big silicon-wise, and we need to reduce the chip size, which of the following would be better?

16 cogs w/256KB hub RAM, lots of cogs and less RAM
-or-
8 cogs w/512KB hub RAM, fewer cogs and faster memory access

Hopefully, we can get it all in there. With 16 of these new cogs, we are using about 60% of the logic that only 8 of the P2-Hot cogs would have required.
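
The planned configuration and the two fallback options can be compared with a rough sketch. The RAM-per-cog figures are plain division. The worst-case hub wait of (cogs - 1) clocks is an assumption about how the hub rotation behaves, not a figure from this post, and the 80MHz default is the FPGA clock mentioned above. The function name `summarize` is hypothetical.

```python
def summarize(cogs, hub_kb, clock_mhz=80):
    """Return the per-cog RAM share and an assumed worst-case hub wait."""
    wait_clocks = cogs - 1   # assumption: one hub slot per cog per rotation
    return {
        "cogs": cogs,
        "hub_kb": hub_kb,
        "kb_per_cog": hub_kb // cogs,
        "worst_wait_clocks": wait_clocks,
        "worst_wait_ns": wait_clocks * 1000 / clock_mhz,
    }

for cogs, hub_kb in [(16, 512), (16, 256), (8, 512)]:
    print(summarize(cogs, hub_kb))
```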

John Abshier · 2015-08-14 01:07

I vote for 16 cogs, but expect to be in the minority unless first vote trumps.

John Abshier

Tubular · 2015-08-14 01:21

8 cogs, particularly for the faster memory access

8 cogs should also simplify the cog<>pin muxing

Cluso99 · 2015-08-14 01:30

What a dilemma!

Currently 512KB would be a minimum. There are now lots of chips with this RAM plus additional Flash. Two years ago 128KB was possibly acceptable.

However, with a more powerful P2, I tend to think more than 8 cogs would be extremely useful.

I presume it's not possible to go down to the next lower geometry given the time the P2 has taken???

mark · 2015-08-14 01:57

Tubular wrote: »

8 cogs, particularly for the faster memory access

8 cogs should also simplify the cog<>pin muxing

This is a pretty solid argument. And from what I understand of the new interrupt scheme and hubex (which would see better performance in jumps, btw), 8 cogs are pretty effective. Then when you also consider hub<->pin throughput, fast P2-to-P2 comms would be a viable solution for those who need more than 8 cogs.

potatohead · 2015-08-14 01:58

I vote 512K RAM as the priority.

We've made the COGS really useful again. Seems like this dilemma comes up at some level of COG utility.

I also vote, in advance of the question, to be conservative with the Smart Pins. There will be some things we need them to do, so that has to get done. But if it comes down to the pins trumping COGS and RAM? Make them a little less smart.

In other words, once the COGS and RAM are a lock, let's make sure they stay locked and live with what's left for Smart Pins.

As long as it's not gonna end up hot, and we don't end up doing this again, I'm good either way any of this goes, as long as it means we get a chip this go around.

evanh · 2015-08-14 02:08

Potato, you vote to keep or lose the RAM?

I'm itching to see how well HubExec works with the 16-way egg beater latencies. But I'd not want to sacrifice the RAM just to see that.

whicker · 2015-08-14 02:10

12 COGS and 32K * 12 = 384K RAM (or heck, double it for 768K).

Internal routing looks easier with 12 than with 16 anyway.

B C 1 2
A m m 3
9 m m 4
8 7 6 5
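
The layout above can be reproduced with a short sketch that walks the perimeter of a 4x4 grid. Reading "m" as the memory blocks in the middle is an assumption, and the function name `ring_layout` is hypothetical.

```python
def ring_layout(size=4):
    """Place cogs on the perimeter of a size x size grid, with 'm' inside."""
    grid = [["m"] * size for _ in range(size)]
    last = size - 1
    # Walk the perimeter clockwise from the top-left corner.
    ring = [(0, c) for c in range(size)]
    ring += [(r, last) for r in range(1, size)]
    ring += [(last, c) for c in range(last - 1, -1, -1)]
    ring += [(r, 0) for r in range(last - 1, 0, -1)]
    # The diagram starts numbering at 1 in the third cell of the top row.
    start = 2
    for i, (r, c) in enumerate(ring):
        label = (i - start) % len(ring) + 1
        grid[r][c] = format(label, "X")   # hex digits, so 10..12 become A..C
    return grid

for row in ring_layout():
    print(" ".join(row))
print(12 * 32, "KB of RAM at 32K per cog")
```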
