Propeller II update - BLOG

Heater. · 2013-12-09 00:31

brucee,

I agree there are a lot of Rpi's probably sitting in desk drawers.

What's this all about? There is probably a lot of everything sitting in desk drawers. Why the Pi put-down?

Linux brings a huge amount of open source software. But as a lot of it is free, that is probably what some of it is worth..

That is nonsense and a gross insult to thousands of very skilled programmers who have created thousands of very high quality open source programs. Even on a Pi we have a kernel, networking, web servers, all kinds of other services, a selection of editors, high quality compilers for dozens of languages. And so on and so on. This is all very high quality and high value stuff. It runs much of the world's computing infrastructure.

Oh yeah, then there is Java, that's an odd case of being free and worth even less than its price :)

Aside: Don't forget that Free Software is not about price.

Conversely there is a lot of paid-for software that is of no value at all. It does not run in any place I would like it to. Not on a Propeller, not on a Pi, not even on my PC or tablet.

Heater. · 2013-12-09 01:02

rod1963,

Determinism has been around longer than Parallax and has been accomplished on supposedly inferior interrupt driven systems.

That depends on what meaning you are attaching to "determinism".

Interrupts are a way to share the resource of a single CPU among competing software functions and respond to external events in a timely and efficient fashion (as opposed to polling for example).

Interrupt driven systems can be totally deterministic, if you have enough CPU power to service all your interrupts and meet the appropriate deadlines.

Where this determinism falls down is when I take software function A that you have written, and software function B that someone else has written and put them together in my own project. If A and B have tight, critical timing requirements they may well not work together. I have to study A and B carefully and see how they will interact timing wise. I may have to fiddle with priorities to get things working. It may not be possible because there just is no way both A and B can respond to their interrupts simultaneously with the required latency.

The way out of this bind is multi-core. With many cores I can throw many such functions at my project and know 100% that they will work as well as if they had the whole chip to themselves.
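To make the A-and-B problem concrete, here is a minimal Python sketch. It is an illustration under stated assumptions only: the handler names, service times and deadlines are invented, each event fires once, and both events fire at the same instant. On one shared CPU, whichever handler runs second finishes too late; with one core per handler, both meet their deadlines.

```python
from dataclasses import dataclass

@dataclass
class Handler:
    name: str
    service_us: float   # hypothetical time the handler occupies a CPU
    deadline_us: float  # hypothetical limit from event to completion

def shared_cpu(handlers):
    """Completion times on ONE CPU when all events fire together.
    List order is the priority order: earlier entries run first."""
    elapsed = 0.0
    completion = {}
    for h in handlers:
        elapsed += h.service_us
        completion[h.name] = elapsed
    return completion

def dedicated_cores(handlers):
    """Completion times when every handler has a core to itself."""
    return {h.name: h.service_us for h in handlers}

def meets_deadlines(handlers, completion):
    return all(completion[h.name] <= h.deadline_us for h in handlers)

# Made-up figures: each handler alone fits its deadline comfortably.
a = Handler("A", service_us=6.0, deadline_us=8.0)
b = Handler("B", service_us=5.0, deadline_us=8.0)

a_first = meets_deadlines([a, b], shared_cpu([a, b]))       # B finishes at 11 us
b_first = meets_deadlines([b, a], shared_cpu([b, a]))       # A finishes at 11 us
own_core = meets_deadlines([a, b], dedicated_cores([a, b]))
```

With these numbers no choice of priority rescues the single-CPU case, which is the "fiddle with priorities" dead end described above.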

In this way the Propeller was revolutionary, I can't think of any other multi-core micro-controllers except the XMOS devices. (Please, nobody bring up that stupid Green Arrays thing).

In short I think "determinism" is still a bragging point. When an engineer needs to meet stringent timing requirements he can use multiple MCUs, or he can go to FPGA. The Prop offers a simpler, cheaper alternative.

Systems that are good enough to keep the space shuttle, jet aircraft and vehicles working or industrial plants running,

Not quite. The avionics and other systems I have worked on, the Boeing 777 Primary Flight Computers for example, have often included a pile of custom ASICs, FPGAs or micro-controllers exactly to handle all this high speed timing critical stuff. The main CPUs cannot be expected to do that. Filling that kind of role is the aim of XMOS and I guess Parallax.

Ramon · 2013-12-09 01:38

$0.02 about HUB slot sharing: This feature is being discussed on performance terms, but not on flexibility terms.

There is a clear trade-off between number of cogs, HUB memory size, and HUB access times (HUB bandwidth per cog). Do you remember when Chip proposed adding 4 cores, up to 12 cogs? Some people liked more cogs, and other people liked more RAM. Almost nobody would be happy with a P2 having only 2 COGs and 2 Mb HUB, or a P2 with 48 cogs and 1Kb HUB (each one waiting 48 cycles to access the HUB). This is just an example. In the end the decision was to have a P2 with 8 COGs, 512 registers and 256k HUB, and the possibility to execute code directly from HUB.

I think of HUB slot sharing as a way to decide the performance of our Propeller application without worrying too much about whether the application needs to use a lot of cogs, or uses only 2 or 4 cogs and would benefit from faster access to the HUB or shared memory. It would be extremely flexible if this feature were implemented in a way that allows us to define the number of active cogs and the number of hub cycles. If we could also turn off cogs and save energy, so much the better. I feel that this doesn't break any philosophy (*) or make any Propeller cog different from any other. It can be used or not. It is FLEXIBILITY. (But I actually don't know if it is even remotely possible, because it would need to split HUB cycles into two different HUB cycles.)
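The trade-off between cog count and HUB wait can be sketched with a toy model in Python. It assumes one HUB slot per cycle and a repeating slot table; the function name and the slot tables are hypothetical, not anything Parallax has specified.

```python
def max_gap(schedule, cog):
    """Longest wait, in slots, between two consecutive turns of `cog`
    in a repeating slot table (`schedule[i]` is the cog owning slot i)."""
    turns = [i for i, owner in enumerate(schedule) if owner == cog]
    if not turns:
        raise ValueError(f"cog {cog} never gets a slot")
    n = len(schedule)
    gaps = []
    for k, start in enumerate(turns):
        nxt = turns[(k + 1) % len(turns)]
        # A gap of 0 means the cog has a single turn: it waits a full rotation.
        gaps.append((nxt - start) % n or n)
    return max(gaps)

fair_8 = list(range(8))      # classic rotation, one slot per cog
fair_48 = list(range(48))    # the 48-cog example from the post
two_active = [0, 1] * 4      # hypothetical table: only cogs 0 and 1 active

wait_8 = max_gap(fair_8, 0)          # 8 slots
wait_48 = max_gap(fair_48, 5)        # 48 slots
wait_shared = max_gap(two_active, 1) # 2 slots
```

Under this model the flexibility being asked for amounts to choosing the slot table, and the cost is that a cog's wait is no longer fixed by the chip alone.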

Parallax would need to decide whether the effort is worth it for P2, but for sure this question will arise again on a future P3 with 16, 24 or 48 COGs.

(*) it will introduce a new philosophy: LESS is MORE. (already applied for DAC bus with good results, and recently proposed for code instructions)

Heater. · 2013-12-09 01:53

Ramon,

...or our application just only use 2 or 4 cogs and would benefit to have faster access to the HUB or shared memory.

This is true. The problem is when I take two objects, from OBEX or elsewhere, that will only work if they have double the HUB access time. When I put them together with my program that actually uses more COGs they no longer get the access they expect and my program fails.

I don't like this potential for uncertainty when mixing and matching objects in my project. It's not about applications deciding on HUB access, it's about the components requiring it.

Others may not care about this aspect of the problem.

We can call it "philosophy" but it may also present practical problems for people in the future.

Brian Fairchild · 2013-12-09 02:30

It strikes me that many of the suggested 'improvements' are slowly but surely dragging the Px away from the original Propeller concept and are, in their own way, breaking the 'determinism' of the idea.

To me, determinism is more than about just timing. It's about being able to look at a set of processes and knowing that they will just work together. It's that idea that set the original P1 apart from the interrupt driven competition.

And to me, the core Propeller concept was that constantly rotating propeller/gear/distributor sitting at the heart of the chip. You could look at a block diagram and see how the chip worked as a whole, with the COGs getting their fair share of the Hub RAM. Suddenly we now have a chip where each COG supports interleaved threads, where COGs can have unequal and competing Hub access slots, where the whole concept starts to get very cloudy.

Many of the suggested improvements are clearly driven by people's personal areas of interest.

What's missing is someone sitting down with a blank sheet of paper and working out the optimum mix of features to prevent the Px being pulled too far off course. There are a number of fixed limits determined by the chosen IC process which in turn determines things like HUB access speed and the size of various elements when laid out on the die. These are physical limits which cannot realistically be changed. There are then some decisions to be made about the number of COGs, the size of HUB RAM, the number of IO pins etc etc. All of which compete for the physical hard limits. There will be no one mix that is right for everyone, but there will be a sweet spot that brings forth a chip that remains true to the Propeller concept.

Heater. · 2013-12-09 02:52

Brian,

To me, determinism is more than about just timing. It's about being able to look at a set of processes and knowing that they will just work together. It's that idea that set the original P1 apart from the interrupt driven competition.

Exactly.

However threads don't break that P1 style determinism.

We already had threads in the P1. Have a look at FullDuplexSerial for an example.

My point is that whatever code you write that runs in a COG is the atomic unit of modularization here. It does not matter if it has threads or not or if they are hardware scheduled or not. It's just a process going on in the COG like any other.

What matters is what happens when you slam two such COG "black boxes" together in your program.
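The cooperative task switching referred to here can be modelled with Python generators. This is an analogy only, not the FullDuplexSerial code: the task names and log strings are invented, and each `yield` stands in for the point where one task hands the cog to the other.

```python
def receiver(log):
    while True:
        log.append("rx: poll for start bit")
        yield                      # hand the cog to the other task
        log.append("rx: sample bit")
        yield

def transmitter(log):
    while True:
        log.append("tx: check buffer")
        yield
        log.append("tx: shift bit out")
        yield

def run_cog(tasks, switches):
    """Resume the tasks in strict rotation, `switches` times in total."""
    for i in range(switches):
        next(tasks[i % len(tasks)])

log = []
run_cog([receiver(log), transmitter(log)], switches=4)
```

Everything in `log` happens inside one simulated cog, so nothing outside it sees any change in timing, which is the "black box" point being made.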

Brian Fairchild · 2013-12-09 03:07

Heater. wrote: »

We already had threads in the P1. Have a look at FullDuplexSerial for an example.

Indeed, I've even taken that concept and extended it to an 8-way transmit-only fixed-baudrate UART for DMX.

cgracey · 2013-12-09 03:22

Baggers wrote: »

...
Whilst everyone is modding instructions, I'd also like to add if I may, ( So sorry Ken, although this was a request from a long while back ) but one where you can write each byte from a long into each low byte of 4 longs in a quad and another instruction to write the low bytes from a quad into a single long again. No worries if it's too late or can't be added.

Baggers, could you please elaborate? I've read this several times, but I'm not getting what you are meaning. I know this has to do with pixel components. Thanks!

evanh · 2013-12-09 03:37

cgracey wrote: »

Baggers, could you please elaborate? I've read this several times, but I'm not getting what you are meaning. I know this has to do with pixel components. Thanks!

The way I read it is in the first instruction he wants the content of one long spread into the low bytes of four consecutive longs, and with the other instruction concatenate the low bytes of four consecutive longs into one long.

ozpropdev · 2013-12-09 03:39

cgracey wrote: »

Baggers, could you please elaborate? I've read this several times, but I'm not getting what you are meaning. I know this has to do with pixel components. Thanks!

evanh wrote: »

The way I read it is in the first instruction he wants the content of one long spread into the low bytes of four consecutive longs, and with the other instruction combine the low bytes of four consecutive longs into one long.

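		getbyte	x0,mylong,#0	' x0 = byte 0 (least significant) of mylong
		getbyte	x1,mylong,#1	' x1 = byte 1
		getbyte	x2,mylong,#2	' x2 = byte 2
		getbyte	x3,mylong,#3	' x3 = byte 3 (most significant)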

x0		long	0
x1		long	0
x2		long	0
x3		long	0
mylong		long	$12345678

Something like this?

Edit: And the reverse of that.
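For readers following along, the two requested operations can be modelled in Python. The function names are hypothetical, and the sketch assumes that byte 0 is the least significant byte and that the upper 24 bits of each destination long end up zero; the thread does not settle either detail.

```python
def spread_bytes(value):
    """Split a 32-bit long into four longs, one source byte in each low byte."""
    return [(value >> (8 * k)) & 0xFF for k in range(4)]

def pack_low_bytes(quad):
    """Reverse operation: combine the low bytes of four longs into one long.
    Anything above the low byte of each input long is ignored."""
    result = 0
    for k, q in enumerate(quad):
        result |= (q & 0xFF) << (8 * k)
    return result

quad = spread_bytes(0x12345678)   # [0x78, 0x56, 0x34, 0x12]
restored = pack_low_bytes(quad)   # 0x12345678
```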

cgracey · 2013-12-09 03:49

evanh wrote: »

The way I read it is in the first instruction he wants the content of one long spread into the low bytes of four consecutive longs, and with the other instruction concatenate the low bytes of four consecutive longs into one long.

Okay, Evanh and Ozpropdev. I see now. Thanks.

Ramon · 2013-12-09 04:01

Heater,

I understand your concerns about the importance of OBEX, considering that the Propeller has to do every protocol on its own. It is true that compatibility problems may arise between those objects.

That said, as long as I can use those OBEX objects that do not rely on modified HUB cycle behaviour, I consider those potential problems MINOR problems. The current P1 (or any other IC) is not free of problems when we try to use code written by other people in our own applications: such code can also cause trouble through an excessive number of pins used, or excessive HUB RAM used.

Baggers · 2013-12-09 09:04

Hi Chip, pretty much what ozpropdev said

Also another one that would be a great help is writing long to hubram masked, ie, writing each of the 4 bytes of a long to hubram, if the byte is non zero, could be on WIDE also for big sprites.
the reasoning for this is for quickly drawing multi layered backgrounds to an 8bit screen buffer for example :-

if you were writing a long to hub ram ( maybe call the instruction WRLMASK d,s ) and the source data was
0x01004680 it would only write 0x01yy4680 and the yy would be whatever was in hub ram at that location previously.

also, just had a quick thought, could you make a WRWMASK that only writes each of the 8 longs if that long is nonzero?

Baggers · 2013-12-09 09:25

Heater, Yes, it was its >4 threads I was talking about, when it would then reduce the speed of the cores.
I also hated the way it compiled code. Getting it to do a display driver was a pain: you could change a single line of code and it would recompile in a totally different way due to its "optimisation". It was nowhere near as good as PASM, or even Spin for that matter, because there you knew how long your code would take, and if you added a bit of code it wouldn't mess up the timing of the code you had already done due to an optimisation pass. I spent more time fighting the optimisation messing my asm up too, as it was also making it slower! I got so fed up with it in such a short time that I just put it in the drawer where it belonged, gathering dust, whereas the Prop 1 and 2 have always been a pleasure to code, from the first line of code to the last.

I will admit that the hardware threading that is in the P2 is undeniably a great addition.

David Betz · 2013-12-09 09:41

Baggers wrote: »

I will admit that the hardware threading that is in the P2 is undeniably a great addition.

True, it is nice; however, it breaks determinism with even two threads.

Heater. · 2013-12-09 09:57

Baggers,

Some bright marketing genius at XMOS decided to rename the "cores" as "tiles" and the hardware scheduled threads "logical cores". Which I noticed they refer to as just "cores" in their videos. Basically they are pulling a fast one in making potential customers think the single core chips have 8 cores. It's downright dishonest, not to mention stupid.

So yes, in the new XMOS lingo, after 4 logical-cores the logical-cores start to slow down. The, up to four, tiles are however totally independent.
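The slowdown past four threads can be put into a rough Python model. The assumption here, which is not stated in the thread, is that a tile issues instructions round-robin and that a single thread can use at most one cycle in four; the 400 MHz clock is a made-up figure.

```python
ISSUE_SLOTS = 4  # assumption: one thread gets at most 1/4 of the tile's cycles

def per_thread_mhz(tile_clock_mhz, active_threads):
    """Approximate instruction rate of each thread on one tile."""
    if active_threads < 1:
        raise ValueError("need at least one active thread")
    return tile_clock_mhz / max(ISSUE_SLOTS, active_threads)

rate_1 = per_thread_mhz(400, 1)   # 100.0
rate_4 = per_thread_mhz(400, 4)   # 100.0, still full speed
rate_8 = per_thread_mhz(400, 8)   # 50.0, every thread slows down
```

Under this model, adding a fifth thread changes the timing of the four that were already running, which is exactly the mix-and-match hazard discussed earlier in the thread.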

You have obviously tried to push the XMOS further than I ever did. I had decided that life was too short to be getting into yet another assembler language for XMOS. But presumably if you stick to a fixed number of threads and have some running assembler then whatever the optimizer does to C and XC code cannot affect the rate at which the assembler threads execute.

I kind of gave up the struggle when I wanted to connect the XMOS links to external devices, say a Propeller just for fun. I thought that would be a way to communicate without wasting threads on creating a UART or whatever, just use the xlink hardware, right. Well the Prop end was easy enough, with advice from forum members we soon had a nice fast xlink driver on the Propeller. Then I hit the contortions you have to go through to drive those links on the XMOS, it's hard when you have a potentially intermittent connection. Then I discovered different versions of the XMOS chips had different xlink protocols and speeds. It was all too much.

Heater. · 2013-12-09 10:02

David,

True it is nice however it breaks determinism with even two threads.

Sort of maybe. Those P2 threads are no more non-deterministic than the coroutines used on the P1 as in FullDuplexSerial.

Main thing is that those threads don't affect the timing of anything outside the COG they run on. So COG level determinism is maintained.

KC_Rob · 2013-12-09 10:07

Heater. wrote: »

Baggers,

Some bright marketing genius at XMOS decided to rename the "cores" as "tiles"

Did they get the "tiles" terminology from Tilera (or maybe it was vice versa)?

David Betz · 2013-12-09 10:09

Heater. wrote: »

David,

Sort of maybe. Those P2 threads are no more non-deterministic than the coroutines used on the P1 as in FullDuplexSerial.

Main thing is that those threads don't affect the timing of anything outside the COG they run on. So COG level determinism is maintained.

Yes, that is true although that would also be true of an XMOS "tile". However many threads you run on one tile will not affect the determinism of another tile. Is XMOS even promoting their multi-tile chips anymore? The "G" series chips are rather old. Have they been replaced with any new multi-tile chips?

Bill Henning · 2013-12-09 10:23

Now they promote based on number of "cores"

An easy way to tell how many (real) cores, aka "tiles", there are is to divide the memory by 64KB

David Betz wrote: »

Yes, that is true although that would also be true of an XMOS "tile". However many threads you run on one tile will not affect the determinism of another tile. Is XMOS even promoting their multi-tile chips anymore? The "G" series chips are rather old. Have they been replaced with any new multi-tile chips?

Baggers · 2013-12-09 11:17

Heater. wrote: »

Baggers,

Some bright marketing genius at XMOS decided to rename the "cores" as "tiles" and the hardware scheduled threads "logical cores". Which I noticed they refer to as just "cores" in their videos. Basically they are pulling a fast one in making potential customers think the single core chips have 8 cores. It's downright dishonest, not to mention stupid.

So yes, in the new XMOS lingo, after 4 logical-cores the logical-cores start to slow down. The, up to four, tiles are however totally independent.

You have obviously tried to push the XMOS further than I ever did. I had decided that life was too short to be getting into yet another assembler language for XMOS. But presumably if you stick to a fixed number of threads and have some running assembler then whatever the optimizer does to C and XC code cannot affect the rate at which the assembler threads execute.

I kind of gave up the struggle when I wanted to connect the XMOS links to external devices, say a Propeller just for fun. I thought that would be a way to communicate without wasting threads on creating a UART or whatever, just use the xlink hardware, right. Well the Prop end was easy enough, with advice from forum members we soon had a nice fast xlink driver on the Propeller. Then I hit the contortions you have to go through to drive those links on the XMOS, it's hard when you have a potentially intermittent connection. Then I discovered different versions of the XMOS chips had different xlink protocols and speeds. It was all too much.

Yeah, upon pushing it, that's when it stumbles

whereas the Prop keeps on giving its all! But yeah, I know they pulled a fast one with their tiles and cores, trying to sell it as more than it was. If only they'd been honest in the first place, instead of swapping known words for new words and replacing others with said known words, lol.

I can totally agree on the Prop side of things being way easier and the XMOS being all too much!

It probably had its uses as a chip, but for me it just didn't do anything, programming-wise or fun-wise.

cgracey · 2013-12-09 11:18

Baggers wrote: »

Hi Chip, pretty much what ozpropdev said

Also another one that would be a great help is writing long to hubram masked, ie, writing each of the 4 bytes of a long to hubram, if the byte is non zero, could be on WIDE also for big sprites.
the reasoning for this is for quickly drawing multi layered backgrounds to an 8bit screen buffer for example :-

if you were writing a long to hub ram ( maybe call the instruction WRLMASK d,s ) and the source data was
0x01004680 it would only write 0x01yy4680 and the yy would be whatever was in hub ram at that location previously.

also, just had a quick thought, could you make a WRWMASK that only writes each of the 8 longs if that long is nonzero?

That's really simple. On a hub write, I would just cancel the byte-write signals for any related data bytes that are zero. So, bytes of zero in the write data wouldn't be written, leaving what was already in hub memory alone. This could be done for WRLONG for bytes and for WRWIDE for both bytes and longs. Neat!
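The masked hub write described here is easy to model in Python. This is a behavioural sketch only: the function names are hypothetical (WRLMASK and WRWMASK were just suggested names), and the hub contents in the example are made up. The write data 0x01004680 is the value from the request above.

```python
MASK32 = 0xFFFFFFFF

def wrlong_masked(hub_long, data):
    """Write `data` over `hub_long`, skipping every data byte that is zero."""
    result = hub_long
    for k in range(4):
        byte = (data >> (8 * k)) & 0xFF
        if byte:  # a nonzero byte keeps its byte-write signal enabled
            lane = 0xFF << (8 * k)
            result = (result & ~lane & MASK32) | (byte << (8 * k))
    return result

def wrwide_masked_longs(hub_wide, data_wide):
    """Long-granular variant for 8 longs: a zero long leaves hub memory alone."""
    return [d if d != 0 else h for h, d in zip(hub_wide, data_wide)]

background = 0xAABBCCDD                                  # made-up hub contents
sprite_row = wrlong_masked(background, 0x01004680)       # 0x01BB4680
wide = wrwide_masked_longs([1, 2, 3, 4, 5, 6, 7, 8],
                           [0, 9, 0, 0, 0, 0, 0, 10])    # [1, 9, 3, 4, 5, 6, 7, 10]
```

The zero byte in the write data acts as the transparent colour, so the old background byte (0xBB here) shows through, which is what makes layered sprite drawing cheap.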

Baggers · 2013-12-09 11:22

cgracey wrote: »

That's really simple. On a hub write, I would just cancel the byte-write signals for any related data bytes that are zero. So, bytes of zero in the write data wouldn't be written, leaving what was already in hub memory alone. This could be done for WRLONG for bytes and for WRWIDE for both bytes and longs. Neat!

That will be an awesome addition, thanks Chip

PS, gonna get my board up to Coley and see if he can swap the expansion board he has onto mine, to see if it's the expansion board that's gone, and try my expansion on his board, see if it's that or the connector on the fpga board, it's a bit wobbly, so it could be that that's the issue with my setup. Either way, I'll get to the bottom of it soon

Bill Henning · 2013-12-09 11:32

Very neat - and a huge speedup for rendering sprites!

cgracey wrote: »

That's really simple. On a hub write, I would just cancel the byte-write signals for any related data bytes that are zero. So, bytes of zero in the write data wouldn't be written, leaving what was already in hub memory alone. This could be done for WRLONG for bytes and for WRWIDE for both bytes and longs. Neat!

Coley · 2013-12-09 13:39

Jim,
I have a sneaking suspicion that your DE2 is up the swanny so send me that and I will take a look, bit swamped with work at the moment but I will have a look at it if you can get it to me.

I've printed a nice plate/standoff for the expansion board so it doesn't wobble anymore ;-) I'll do one for you too if we can get it working again....

Regards,

Coley

Baggers wrote: »

That will be an awesome addition thank Chip

PS, gonna get my board up to Coley and see if he can swap the expansion board he has onto mine, to see if it's the expansion board that's gone, and try my expansion on his board, see if it's that or the connector on the fpga board, it's a bit wobbly, so it could be that that's the issue with my setup. Either way, I'll get to the bottom of it soon

Ken Gracey · 2013-12-09 13:45

Baggers wrote: »

That will be an awesome addition thank Chip

PS, gonna get my board up to Coley and see if he can swap the expansion board he has onto mine, to see if it's the expansion board that's gone, and try my expansion on his board, see if it's that or the connector on the fpga board, it's a bit wobbly, so it could be that that's the issue with my setup. Either way, I'll get to the bottom of it soon

I think you're referring to the DE2-115 I brought you two years back? If so, we should be in a position to service these pretty soon as a Terasic supplier of this particular board. I am also restarting the DE2-115 adapter board manufacturing. These two efforts should keep your FPGA boards functioning.

David Betz · 2013-12-09 13:48

Ken Gracey wrote: »

I think you're referring to the DE2-115 I brought you two years back? If so, we should be in a position to service these pretty soon as a Terasic supplier of this particular board. I am also restarting the DE2-115 adapter board manufacturing. These two efforts should keep your FPGA boards functioning.

Ken,

If you manufacture more of the DE2-115 add-on boards could you please make the board a little wider and include mounting holes so it can be attached firmly to the DE2-115 board? There are mounting holes on the DE2-115 board on either side of the connector that the add-on board plugs into. It should be pretty easy to just widen the board and drill some holes for secure mounting.

Thanks!
David

Bill Henning · 2013-12-09 14:12

+1

I have a sticky note tower underneath mine to keep it level (and the connectors safe) that I hate.

David Betz wrote: »

Ken,

If you manufacture more of the DE2-115 add-on boards could you please make the board a little wider and include mounting holes so it can be attached firmly to the DE2-115 board? There are mounting holes on the DE2-115 board on either side of the connector that the add-on board plugs into. It should be pretty easy to just widen the board and drill some holes for secure mounting.

Thanks!
David

jmg · 2013-12-09 14:13

David Betz wrote: »

If you manufacture more of the DE2-115 add-on boards could you please make the board a little wider and include mounting holes so it can be attached firmly to the DE2-115 board? There are mounting holes on the DE2-115 board on either side of the connector that the add-on board plugs into. It should be pretty easy to just widen the board and drill some holes for secure mounting.

Yes, adding mechanical support is always time well spent

Ken Gracey · 2013-12-09 14:57

jmg wrote: »

Yes, adding mechanical support is always time well spent

I paid a visit to our manufacturing to see what kind of tooling investment we have in the Terasic DE2-115 Adapter Board. Answer: not much. Though our stencil is metal it's also a one-up and there's little investment in the PnP programming due to the small number of parts, so I can easily make PCB changes. Believe it or not, this is the first time I've had a close-up look at this board - Chip handled it sorta informally so it didn't get a Parallax part number and the benefits of our formalized business tools that keep making whatever we ordered until somebody tells the machines and people to stop. I can tell this is his design because it has no mechanical mounting holes [like the original Propeller Demo Board . . .hope he doesn't read this!].

Any more changes other than widening the board to accommodate two mounting holes and hardware for .020" spacers?

What about this vertical PropPlug header? If we widen the PCB and move the header a little north then I could replace it with a right-angle connector. However, the board would need to be widened enough so that the right-angle header is entirely within the confines of the PCB perimeter (we send populated boards through a pizza cutter).

Thanks,

Ken Gracey
