Observations of a multi-tasker

Heater. · 2013-09-21 11:22

David,

Yes, that is the way I have always thought about it.

The hardware threading lets you mop up a bunch of small, less demanding tasks into a single COG.

But think about that: it means you have more COGs free for the demanding, timing-critical tasks as a result!

It must be a big win. Even if the multi-threading determinism is not perfect, it gives you free COGs in which to make it so.

Still, I bet you can write a much more jitter-free and faster FullDuplexSerial, for example, using hardware scheduling than with the old JMPRET coroutine trick or the cooperative TSKSWITCH. Or what about a multi-port FDS?

David Betz · 2013-09-21 11:25

Heater. wrote: »

David,

Yes, that is the way I have always thought about it.

The hardware threading lets you mop up a bunch of small, less demanding tasks into a single COG.

But think about that: it means you have more COGs free for the demanding, timing-critical tasks as a result!

It must be a big win. Even if the multi-threading determinism is not perfect, it gives you free COGs in which to make it so.

That's a good way to think about it.

ctwardell · 2013-09-21 12:01

David Betz wrote: »

So is the conclusion to this P2 thread scheduling discussion that if you want determinism you run only one thread on a COG and if you don't care that much about determinism, you can use the threading support?

I wouldn't say it is that absolute. You could have one timing-critical thread and three less critical threads; you would just need to avoid any operations that can stall the pipeline in the three less critical threads.

C.W.
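One way to picture ctwardell's point is a toy model where the COG hands out instruction-issue slots from a fixed, repeating table. The 16-entry table, one issue per clock, and the `slot_stats` helper below are all hypothetical assumptions for illustration, not a description of the actual P2 scheduler. As long as no thread stalls the pipeline, each thread's share of the COG and its worst-case wait between slots are fixed by the table alone, which is what keeps the critical thread deterministic.

```python
from collections import Counter


def slot_stats(table):
    """For a repeating round-robin slot table (hypothetical model: one slot
    issues per clock and no task ever stalls the pipeline), return:
      shares: fraction of clocks each task receives
      gaps:   worst-case clocks between consecutive slots of each task
    """
    n = len(table)
    counts = Counter(table)
    shares = {task: c / n for task, c in counts.items()}
    gaps = {}
    for task in counts:
        positions = [i for i, t in enumerate(table) if t == task]
        # Distance to the next slot of the same task, wrapping around the
        # end of the table; a task with a single slot waits a full cycle (n).
        gaps[task] = max(
            (positions[(k + 1) % len(positions)] - p) % n or n
            for k, p in enumerate(positions)
        )
    return shares, gaps


if __name__ == "__main__":
    # Task 0 is the timing-critical thread and gets every other slot;
    # tasks 1-3 share the remaining slots.
    table = [0, 1, 0, 2, 0, 3, 0, 1, 0, 2, 0, 3, 0, 1, 0, 2]
    shares, gaps = slot_stats(table)
    for task in sorted(shares):
        print(f"task {task}: share {shares[task]:.4f}, worst gap {gaps[task]} clocks")
```

With that example table the critical task gets half the clocks and never waits more than 2 clocks, while the least-favoured background task can wait 10. A single task owning all 16 slots gets a gap of 1, i.e. the one-thread-per-COG case.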

KC_Rob · 2013-09-21 12:04

Heater. wrote: »

KC_Rob, ...

However, what you suggest, more cores, is a valid way to go. If you look at the Parallella chip from Adapteva http://www.adapteva.com/ you find this philosophy:

Very interesting! Something else new to check out, which seems an almost daily occurrence these days. Thanks for the tip!

Heater. · 2013-09-21 12:36

KC_Rob,

You might like this vid by Andreas Olofsson describing Adapteva, the problems facing the future of computer design, the parallel solutions and so on: https://www.youtube.com/watch?v=DX9OMgmedbQ

David Betz · 2013-09-21 13:49

Heater. wrote: »

KC_Rob,

You might like this vid by Andreas Olofsson describing Adapteva, the problems facing the future of computer design, the parallel solutions and so on: https://www.youtube.com/watch?v=DX9OMgmedbQ

Andreas Olofsson spoke right after Chip at the Open Hardware Summit a few weeks ago. Chip's talk was a lot more interesting. The other guy didn't really give any interesting details about his hardware.

Heater. · 2013-09-21 14:24

David,

Andreas Olofsson spoke right after Chip at the Open Hardware Summit

Oh man, what a pairing. Where is that fricken video?

You have to admire the pair of them. Both have a single-minded dedication to their design ideals. Both are attempting pretty huge and audacious creations with limited budgets and manpower.

ozpropdev · 2013-09-21 18:53

David Betz wrote: »

So is the conclusion to this P2 thread scheduling discussion that if you want determinism you run only one thread on a COG and if you don't care that much about determinism, you can use the threading support?

I think you summed it up nicely David.
We now have a choice which way to go.

Brian

ozpropdev · 2013-09-22 17:30

Hi Heater

Heater. wrote: »

KC_Rob,

You might like this vid by Andreas Olofsson describing Adapteva, the problems facing the future of computer design, the parallel solutions and so on: https://www.youtube.com/watch?v=DX9OMgmedbQ

Wow! Very interesting stuff...

Heater. wrote: »

David,
Oh man, what a pairing. Where is that fricken video?

You have to admire the pair of them. Both have a single-minded dedication to their design ideals. Both are attempting pretty huge and audacious creations with limited budgets and manpower.

I couldn't agree more.
We need more visionaries like Chip and Andreas in our world!

Brian

David Betz · 2013-09-23 04:58

David Betz wrote: »

Andreas Olofsson spoke right after Chip at the Open Hardware Summit a few weeks ago. Chip's talk was a lot more interesting. The other guy didn't really give any interesting details about his hardware.

I just watched this video and it is quite a bit more interesting than his talk at the Open Hardware Summit. Thanks for the link!

I was interested to hear his comments on the cost of taping out a chip. If I heard right, he said you could make a 28nm chip for $100k. Isn't that significantly cheaper than what Parallax is paying for 180nm chips? How do they get that low price? On the other hand, I think he said that Adapteva paid $1m for each of the chips they made.

Also, I guess they did this with a staff of 5. Just think what Parallax could have done if they could have cloned Chip and Beau! :-)

Heater. · 2013-09-23 06:29

David,

You heard correctly. I've watched that video twice now.

It was not clear to me, though. He said Adapteva were spending a million per chip, but how much of that was synthesis, shuttle runs, etc., and how much was development cost?

Given what they have been spending, I think that 100K was something of a wild guess at what could be done if you had the right team, as he said. That Parallella development board is riding on nearly one million dollars of Kickstarter pledges, and that covers only the board design, not the Epiphany chip.

On the other hand, as he has been in the chip design business for a long time, I guess he knows his way around and has contacts in the right places.

David Betz · 2013-09-23 07:01

Heater. wrote: »

David,

You heard correctly. I've watched that video twice now.

It was not clear to me, though. He said Adapteva were spending a million per chip, but how much of that was synthesis, shuttle runs, etc., and how much was development cost?

Given what they have been spending, I think that 100K was something of a wild guess at what could be done if you had the right team, as he said. That Parallella development board is riding on nearly one million dollars of Kickstarter pledges, and that covers only the board design, not the Epiphany chip.

On the other hand, as he has been in the chip design business for a long time, I guess he knows his way around and has contacts in the right places.

A big factor might be that there is no analog on the Epiphany chip. I wonder how much easier P2 would have been for Parallax if the analog had been left out? I'm certainly not suggesting that that should be done but it may explain at least some of the difference in manufacturing cost between the two chips.

Heater. · 2013-09-23 11:50

Andreas' idea with the Epiphany chip is to have a focus on the floating point unit. The core it sits in is very simple, only 32 or so instructions. Those cores might be simpler than the Props, floating point aside. Then there is 32K RAM for each core. The grid communications network is only 10% of each core area. So actually the whole thing can be very simple. Then it's stamped out 16 or 64 times on a chip.

David Betz · 2013-09-23 11:57

Heater. wrote: »

Andreas' idea with the Epiphany chip is to have a focus on the floating point unit. The core it sits in is very simple, only 32 or so instructions. Those cores might be simpler than the Props, floating point aside. Then there is 32K RAM for each core. The grid communications network is only 10% of each core area. So actually the whole thing can be very simple. Then it's stamped out 16 or 64 times on a chip.

Maybe we could emulate his simple processor on the Propeller! It might run faster than the ZPU. :-)

Phil Pilgrim (PhiPi) · 2013-09-23 12:15

I'm not sure how the core-to-core comms work on the Epiphany. On the one hand, he states that it's networked such that any core can talk to any other by writing a register. On the other hand, when he talks about connection latency, he mentions that the cores are only locally connected, i.e. to their nearest neighbors, in order to keep connections short. Under this scenario, it would seem that talking to a distant core would require multiple hops and a lot of intermediation.

-Phil
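Phil's point about distant cores needing multiple hops can be made concrete with a toy model. The sketch below assumes a square mesh where each core links only to its four nearest neighbours and packets use simple dimension-order (XY) routing; these are illustrative assumptions, not a description of the actual Epiphany network. Under that model, the hop count between two cores is just their Manhattan distance.

```python
from itertools import product


def xy_route(src, dst):
    """Nodes visited from src to dst on a 2D mesh of (row, col) cores using
    dimension-order routing: first move along the row (changing the column
    index), then along the column (changing the row index).
    Hypothetical model, not the real Epiphany router."""
    r, c = src
    dr, dc = dst
    path = [(r, c)]
    while c != dc:
        c += 1 if dc > c else -1
        path.append((r, c))
    while r != dr:
        r += 1 if dr > r else -1
        path.append((r, c))
    return path


def hops(src, dst):
    # Each step between neighbouring cores is one hop.
    return len(xy_route(src, dst)) - 1


def mean_hops(n):
    # Average hop count over all ordered (source, destination) pairs,
    # including a core sending to itself, on an n x n mesh.
    nodes = list(product(range(n), repeat=2))
    total = sum(hops(a, b) for a in nodes for b in nodes)
    return total / len(nodes) ** 2


if __name__ == "__main__":
    print("neighbour:", hops((3, 3), (3, 4)))
    print("corner to corner:", hops((0, 0), (7, 7)))
    print("average on 8x8:", mean_hops(8))
```

On a hypothetical 8x8 grid (64 cores), a neighbour is 1 hop away, a corner-to-corner message needs 14 hops, and the average over all pairs is 5.25 hops, which is why problems that decompose into nearest-neighbour exchanges suit this kind of network best.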

Heater. · 2013-09-23 12:38

Phil,

I think you are right. As far as I can tell the whole communication "grid" is implemented with logic local to each core. And all these bits of logic are plugged together to form the grid. There are no long wires as Andreas says.

So I would guess nearest-neighbor communication is best, and the whole thing is suited to problems that can be divided up that way. I have always wondered if there is a sufficiently large set of such problems or algorithms to make the whole idea worthwhile. Still, there is a video on YouTube where a research group demos a synthetic aperture radar processing algorithm running 10 times faster on the Epiphany chip than some honking great 3GHz AMD 64 beast, whilst using a hundred times less power.

Heater. · 2013-09-23 12:41

David,

Maybe we could emulate his simple processor on the Propeller! It might run faster than the ZPU

Ouch!

You might be right. As Andreas says, what compilers want is lots of registers, so he has gone for that. And no caches, just fast static RAM. Sounds like a COG to me.

David Betz · 2013-09-23 13:27

Heater. wrote: »

David,

Ouch!

You might be right. As Andreas says, what compilers want is lots of registers, so he has gone for that. And no caches, just fast static RAM. Sounds like a COG to me.

A COG with 32k of memory. :-)

KeithE · 2013-09-23 13:35

David Betz wrote: »

I just watched this video and it is quite a bit more interesting than his talk at the Open Hardware Summit. Thanks for the link!

I was interested to hear his comments on the cost of taping out a chip. If I heard right, he said you could make a 28nm chip for $100k. Isn't that significantly cheaper than what Parallax is paying for 180nm chips? How do they get that low price? On the other hand, I think he said that Adapteva paid $1m for each of the chips they made.

Also, I guess they did this with a staff of 5. Just think what Parallax could have done if they could have cloned Chip and Beau! :-)

Here's an article giving an estimated $2 million figure for development costs, assuming a multi-project wafer, and they admit that this approach does not work for consumer electronics. So you're looking at millions more to ramp it into high-volume production. You can google the mask costs for 28 nm.

http://www.adapteva.com/white-papers/a-lean-fabless-semiconductor-startup-model/

KC_Rob · 2013-09-23 16:52

David Betz wrote: »

A COG with 32k of memory. :-)

With more on the way -- he deems 32k relatively puny.

I just got around to watching the entire talk. Andreas is a smart guy, not just on the technical stuff, either; he keeps the big picture in view as well.

Heater: once again, thanks for sharing!

Phil Pilgrim (PhiPi) · 2013-09-23 19:23

KC_Rob wrote:

Andreas is a smart guy, ...

Aversion is a powerful teacher. It sounds like he got plenty of that at Analog Devices!

-Phil

Heater. · 2013-09-23 23:15

Phil,

Aversion is a powerful teacher. It sounds like he got plenty of that at Analog Devices!

Andreas' story about the AD Tiger Sharc is amazing. What was it? 100 million dollars or so and eight years to develop. And after all that, only 1% of the chip area is concerned with actually calculating anything. I wonder how well the Tiger Sharc is selling and whether they will ever break even on it.

Regarding his idea of developing a chip for 100K: he does point out that if you want to make an Apple-style SoC with a billion transistors and a 5 dollar unit price, it might cost you a billion dollars to develop. To get down to 100K he was talking about low-volume, price-insensitive applications, like 100 dollar chips. That's why the first markets they approached for the Epiphany were military.

ozpropdev · 2013-09-23 23:51

A quick look at the Tigersharc processor shows that the cheapest version is $184 in quantities of 1K+.

Heater. · 2013-09-24 00:35

ozpropdev,

A quick look at the Tigersharc processor shows that the cheapest version is $184 in quantities of 1K+.

According to Andreas Olofsson's presentation you can:
1) Spend a billion dollars on a billion-transistor chip and sell it for 5 dollars apiece in huge quantities, like an Apple SoC.
2) Spend 100K and build a chip that sells for 200 dollars in small quantities.
3) Something on a line between those two extremes.

Seems the Tigersharc is way off the curve here. Amazingly big and expensive to develop, and ending up with a massive unit price.

No wonder Andreas said he was "ticked off" by Tiger Sharc development, language which seems quite out of character for him.

KC_Rob · 2013-09-24 10:15

Heater. wrote: »

Seems the Tigersharc is way off the curve here. Amazingly big and expensive to develop, and ending up with a massive unit price.

No wonder Andreas said he was "ticked off" by Tiger Sharc development, language which seems quite out of character for him.

A huge, costly example of what can happen when you lose sight of the forest for the trees.
