
Prop II: Speculation & Details... Will it do what you want???


Comments

  • tonyp12tonyp12 Posts: 1,950
    edited 2011-05-06 12:16
    I hope the boot loader can also look for SPI flash, and not just I2C EEPROM and a file system on SD cards:
    for when you want to share boot code with 2 MB of data files,
    and I2C is too slow and a 4 GB SD card is overkill.
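    Something like this fall-through order, sketched in C (the probe functions are invented stand-ins; the real boot ROM exposes no such C API, so treat this purely as an illustration of the priority list):

        #include <stdbool.h>
        #include <stdio.h>

        /* Hypothetical media probes: names invented for illustration only.
           Stubs stand in for actual I2C/SPI/SD bus transactions. */
        static bool probe_i2c_eeprom(void) { return false; }  /* stub: none found    */
        static bool probe_spi_flash(void)  { return true;  }  /* stub: flash answers */
        static bool probe_sd_card(void)    { return false; }  /* stub */

        /* The loader falls through a priority list until one medium answers. */
        static void boot(void)
        {
            if (probe_i2c_eeprom())
                puts("booting from I2C EEPROM");
            else if (probe_spi_flash())
                puts("booting from SPI flash (the case being requested here)");
            else if (probe_sd_card())
                puts("booting from SD card");
            else
                puts("no media found: wait in the serial loader");
        }

        int main(void) { boot(); return 0; }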
  • potatoheadpotatohead Posts: 10,259
    edited 2011-05-06 12:27
    @Phil, good point! That will be a hard habit to break for me. Maybe when we see more stuff targeted to it...
  • David BetzDavid Betz Posts: 14,511
    edited 2011-05-06 12:32
    Phil Pilgrim (PhiPi) wrote: »
    The Spin interpreter has to stay in ROM. That said, we should quit calling it a "Spin interpreter." It simply is not. It's an interpreter for bytecodes that can be the target of many languages, not just Spin. For that reason, it's too useful not to stay on-chip.

    Where can I find the official documentation for the Spin VM? I would like to write code to target it but I haven't found a definitive description of the instruction set, memory layout, etc. Please don't tell me I can figure it out by reading the source code because that's not really reasonable.
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-05-06 12:51
    David,

    AFAIK, there is no official documentation for the bytecodes, although I believe unofficial bytecode lists can be found within these forum walls. Those who have written their own Spin compilers have had to resort to the "unreasonable" option of scouring the source code for clues.

    -Phil
  • David BetzDavid Betz Posts: 14,511
    edited 2011-05-06 12:54
    Phil Pilgrim (PhiPi) wrote: »
    AFAIK, there is no official documentation for the bytecodes, although I believe unofficial bytecode lists can be found within these forum walls. Those who have written their own Spin compilers have had to resort to the "unreasonable" option of scouring the source code for clues.

    I guess I'm really saying that if there is no official description of the VM then it really is a "Spin" VM. The Java VM is well documented and a number of languages target it. If we could have an officially sanctioned definition of the "Parallax VM" (my attempt to decouple the Spin language from its VM) then more languages might use it.
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-05-06 13:00
    David,

    Parallax may be in a difficult position regarding documenting the VM. I don't know for sure, but I assume they think that if they document it, they have to support it. OTOH, they would probably benefit by making it easier to port other languages to the Prop. The best middle ground might well be documentation with a non-support disclaimer.

    -Phil
  • David BetzDavid Betz Posts: 14,511
    edited 2011-05-06 13:05
    Phil Pilgrim (PhiPi) wrote: »
    Parallax may be in a difficult position regarding documenting the VM. I don't know for sure, but I assume they think that if they document it, they have to support it. OTOH, they would probably benefit by making it easier to port other languages to the Prop. The best middle ground might well be documentation with a non-support disclaimer.

    Maybe someone should assemble all of the "instruction lists" and unofficial notes on the Spin VM into a single description we could run by Chip for his blessing? It should probably be run past this forum first, since many people here know a lot about it.
  • Heater.Heater. Posts: 21,230
    edited 2011-05-06 13:22
    @potatohead
    the benefit is known implementations that focus development

    I agree. However, moving the ROM contents out to your EEPROM does not sacrifice those "known implementations".

    Imagine that when you hit F11 in your Prop Tool, your program is compiled and the Spin byte code interpreter is linked into the resulting binary for download to the Prop. When loaded into the Prop, that interpreter is loaded into a COG and runs your Spin byte codes. As it does now.

    The user need never know this is happening. There would be no difference. The known implementation would be used. Same goes for the font tables, trig tables and whatever else. It is no different than using standard libraries in many other systems.
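    Something like this, with an invented image layout (the real Propeller Tool does not package binaries this way today; this C sketch just makes the idea concrete):

        #include <stdint.h>
        #include <string.h>

        #define COG_LONGS 496                  /* a COG holds 496 instruction longs */

        /* Invented download-image layout: interpreter first, user bytecodes after. */
        typedef struct {
            uint32_t interpreter[COG_LONGS];   /* the known PASM interpreter blob   */
            uint8_t  bytecodes[4096];          /* compiler output (Spin byte codes) */
        } download_image_t;

        /* "Link" step run by the IDE on F11: staple the stock interpreter to the
           compiled program, so the boot loader only has to copy the first 496
           longs into a COG and point it at the bytecodes that follow. */
        static void link_image(download_image_t *img,
                               const uint32_t interp[COG_LONGS],
                               const uint8_t *code, size_t code_len)
        {
            memcpy(img->interpreter, interp, sizeof img->interpreter);
            memcpy(img->bytecodes, code,
                   code_len < sizeof img->bytecodes ? code_len : sizeof img->bytecodes);
        }

        int main(void)
        {
            static const uint32_t interp[COG_LONGS] = { 0 }; /* stand-in blob   */
            static const uint8_t  code[] = { 0x00 };         /* stand-in output */
            static download_image_t img;
            link_image(&img, interp, code, sizeof code);
            return 0;
        }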

    @Phil Pilgrim.
    The Spin interpreter has to stay in ROM.

    I'm sure that it will. I'm still not sure why it has to.
    I see no need for any ROM content apart from the boot loader. As I said, none of it is of any use until code is loaded and running.
    Unless the boot loader is written in Spin, that is. (Is it?)
    ...we should quit calling it a "Spin interpreter."

    You are right. That is sloppy. It's lazy usage: "Spin" the language leads to "Spin byte codes," which leads to "Spin interpreter."

    By the way, isn't it called PNut or some such? I'm sure I saw something like that in the published interpreter source code.
    It's an interpreter for bytecodes that can be the target of many languages, not just Spin. For that reason, it's too useful not to stay on-chip.

    I'm still not seeing the logic that makes the second sentence follow from the first. The interpreter is just as useful if it is loaded into a COG at boot time from your binary image.
  • potatoheadpotatohead Posts: 10,259
    edited 2011-05-06 13:24
    Interesting thoughts.

    It was built for SPIN, right? Happens to be useful for other things, which is a bonus. I can see this line of reasoning. I think I agree actually. Perhaps the next revision will be focused on more than SPIN. This one though, is SPIN.

    As for source code documentation, I'm of two minds on that. One, the code is a complete specification. Hard to use though. I do consider it the authoritative document however. We have that code in source form, and that's as good as it ever gets. All behavior can be known, tested for, etc...

    Seems to me, additional documentation, written for the purpose of using that interpreter for things other than SPIN, would require some support. Fair call.

    What is the overall value of that kind of effort, particularly when we've got zog, catalina, general LMM, SPIN in RAM, etc...?

    To me, the high value is having SPIN itself. Porting other things to that system won't bring a significant speed boost, over say, LMM. And the large body of SPIN code trumps other languages using that byte code, rapidly opening the door to the question, "Why not just use LMM, PASM, etc... or just SPIN?"

    What can we port to the byte code engine that would be of high value?
  • potatoheadpotatohead Posts: 10,259
    edited 2011-05-06 13:31
    @heater, I'm struggling with this, having bounced back and forth over the years.

    On one hand, I agree with you, but for the fact that it is software: once it's software that can change, it will change. With this crowd, do we expect anything less? (And I mean that in the very best of ways.)

    So then, if it's loaded at boot from the boot device, it's gonna see fragmentation, and that speaks to a more general-purpose CPU, not a microcontroller-type device.

    Frankly, I would like to see that direction taken at Prop III, where there is no HUB on chip. Then we have a CPU, large address space, and all the structure needed to really extend and scale the device.

    Prop II really is just a nice step up from Prop I. Say SPIN was loaded as part of the EEPROM image on Prop I. That could have been an option, if Chip were not worried about protecting that code at the time. Later on, he saw the same value many of us did, and released it.

    So say that never happened, and it was just stuffed into the EEPROM.

    Wouldn't there now be Cluso's SPIN, Phil's SPIN, etc???

    And how would that have impacted the boot strapping we all contributed to over the years?

    I'm not sure ROM is a bad idea, given these things.

    Also, it seems to me the circuit-size differences mean ROM isn't that expensive. There seem to be more downsides to a load-it-all approach than upsides right now.
  • David BetzDavid Betz Posts: 14,511
    edited 2011-05-06 13:34
    potatohead wrote: »
    One, the code is a complete specification. Hard to use though. I do consider it the authoritative document however. We have that code in source form, and that's as good as it ever gets. All behavior can be known, tested for, etc...
    In a technical sense you are correct. I suppose you could also say that, even if we didn't have the source code, the binary of the Spin VM would be the complete specification. This is the problem I have with a lot of open source projects as well: sometimes (not always), the availability of source code is used as a reason not to provide decent documentation. While it is true, for instance, that Linux source is available so you can fix any bug you find in Linux, for practical purposes this isn't possible for 99% of the people who use Linux, and so having the source code brings them no additional value.

    I think it is fair to say that the Spin VM is only supposed to execute compiled Spin code on the Propeller 1, and no other guarantees are given or implied. The VM for the Propeller 2 will undoubtedly be different and incompatible, and also intended solely for executing compiled Spin code. This is fine, and I have no problem with it, as long as we don't pretend that the VM is a general-purpose execution environment intended to be used by multiple languages. As you say, there is no real need to target the Spin VM. There are many other alternatives.
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-05-06 13:36
    potatohead wrote:
    It was built for SPIN, right? Happens to be useful for other things, which is a bonus. I can see this line of reasoning. I think I agree actually. Perhaps the next revision will be focused on more than SPIN. This one though, is SPIN.
    No. The VM was designed to take optimal advantage of the Propeller hardware features, as was Spin. A correlation does not necessarily entail causality. :)

    -Phil
  • potatoheadpotatohead Posts: 10,259
    edited 2011-05-06 13:50
    @David: Yes. I hear that loud and clear.

    The way I always sorted things out was it wasn't so important that any one user could make use of the source code, only that it be possible for them to do so. Lots of ways to get use value out of source code.

    One is to just use it! That's done all the time. Another is to modify / change / improve it. That's done all the time too, though by far fewer people. Often, people who contribute have time to fill a role, and that's it.

    Documenting things is another value. It's often the case that the code works as intended, but was never fully specified. A documentation project could have as much value as the code did, but only to those who would make use of it. The dynamics on docs are the same as code, with just the user profile being different.

    In the case of SPIN, the spec was to implement SPIN on the Prop to expose the chip to people in a simple and effective way. Nailed it!

    So then, how many people would target the byte codes, given those documents exist? That's the question I asked above, and it's always the core question for docs, IMHO. When I think about it, I don't find very many use cases that make sense, that don't also make sense for LMM, etc...

    And supporting docs isn't a whole lot different from supporting code.

    There is a similar thing in play with the video generator. It was built and works as designed, but I am not entirely sure the ways it's being used today were even on the table when it was built. Repeated calls for documentation have been made, each with solid reasons. In each case it's difficult, because the default cases are pretty easy and the more intriguing ones are not, which makes good documentation hard. It's quite possible to produce a very solid set of docs, yet still have people say, "that was not in the documentation," and they would say so because the system was designed to be enabling, not the means to an end.

    What I mean by that is how the generators get used is a function of what they do, plus what we discover about that. This is a function of software defined systems. The more grey matter applied over time, the more they do, and often they do things in ways not even on the drawing board early on.

    The SPIN interpreter is the same way. We know the core of what it does, but over time, have found a lot of use cases that lie outside what was planned. How do we document those without first having done the innovation required to realize the use case in the first place?

    Secondly, a list of the byte codes and behavior is possible, but only after running a full test suite on the thing, right? All that was necessary to realize the prop was to vet SPIN against the planned use cases. Anything else is more or less up to those that decide to blaze that trail.

    I guess I am saying I question the value of labor-intensive documentation efforts without high-value use cases attached to them, and I can see no such case regarding the SPIN byte code engine. The video generators were mentioned not to start another discussion on that, but just as a parallel: we are only now exploring some of their more technical boundaries. None of that was planned early on, meaning the docs required to understand that behavior before attempting it would be a very large and expensive project that may well not pay off.

    If we had a set of docs that would allow better faster ports to the SPIN byte code engine, what would be ported and why and how does that compare to say, LMM, etc...?
  • Sal AmmoniacSal Ammoniac Posts: 213
    edited 2011-05-06 13:50
    Leon wrote: »
    The 16-bit PICs were designed to execute high-level languages like C efficiently.

    Leon,

    I don't think you're ever going to convince anyone as to the validity of your statement. Even given the nature of the kludges needed to make C work at all on the Prop, it still doesn't seem to sink in that the chip was never designed to execute C efficiently and that even something like a PIC will run rings around it running C, and a $3 ARM will blow it completely out of the water. I don't see this changing on the Prop II either, given what I know about its architecture.

    I've given up trying to use the Prop as an application processor; instead, I'm using it as an I/O processor (where it shines) and sticking with ARMs for the application code in my projects.
  • Heater.Heater. Posts: 21,230
    edited 2011-05-06 13:52
    potatohead,
    Wouldn't there now be Cluso's SPIN, Phil's SPIN, etc???

    Good point. Yes, there is room for things to get messy.

    However, there might also be a Heater's ZOG interpreter for GCC, a RossH PASM kernel for Catalina, an ICC PASM kernel for C, something for Forth or BASIC, or whatever. How sweet would that be? None of those systems even want or need Spin. This fracturing happens anyway, just at a higher level. Think of the added features in the HomeSpun and BST Spin compilers that can make code incompatible with the Propeller Tool.

    Those four ROM blocks on the die image posted earlier sure look like they eat a lot of space that could otherwise be used for active circuitry, rather than sitting there dead, wasting space, as they do when I run C under ZOG or Catalina.
  • potatoheadpotatohead Posts: 10,259
    edited 2011-05-06 13:53
    @Phil, yes. Figures a Perl programmer would get right to the meat of that.

    So yeah, I can't disagree. However, it is SPIN. I do not believe any plans to use it in a general way were made until after we worked with the chip for a while. And I've got the same question for you that I put to David: what would be ported that would have enough value to warrant doing the work to document it as a general-purpose byte code engine?

    (Which makes my case for SPIN, barring some good answer that could have been known back at the time of development.)
  • David BetzDavid Betz Posts: 14,511
    edited 2011-05-06 13:59
    potatohead wrote: »
    I guess I am saying I question the value of labor intensive documentation efforts, without high value use cases attached to them, and I can see no such case regarding the SPIN byte code engine.
    I completely agree with you and hence I think we should just regard it as the "Spin VM" rather than thinking of it as a general target platform. My point was just that *if* we want to think of it as a general target we should have documentation at the level of what is supplied for the Java VM. Since we don't care to use it for anything other than running compiled Spin (which it does extremely well!), there is no need for that documentation.
  • potatoheadpotatohead Posts: 10,259
    edited 2011-05-06 14:05
    @heater

    Well, yes. But... What active circuitry? And how does the value of that balance with some known reference things everybody can count on absolutely?

    It's not that doing it that way is without merit. Totally hear you on that.

    The overall value equation doesn't add up for me. Having the reference tools means code focus, and that's a pretty high-value thing, particularly given how different the Prop is. We need some of that code focus, because it adds value that helps overcome the differences, which otherwise stand as an objection to adopting the chip in the first place.

    @Sal, Leon: I don't see your statements as fair. If you set PASM as the expectation for execution speed, then C looks a bit weak on the Prop. However, that's a poor expectation, as is the one where C equals applications. There are a lot of things written in C, but then again, there will be more and more things not written in C over time too.

    The prop is a concurrent multi-processor, and because of that isn't a simple, high speed linear compute chip. When tasks are done in parallel, Props smoke a lot of other CPUs, which is why we are here.

    Now, the view of the COG as micro-code puts C, etc. into perspective. Prop I is a bit slow at this, and that's largely because the virtual-machine model wasn't really thought through at the time of development. On this next iteration of the Prop it is known, and that will matter. Execution speed on Prop II is going to be potent, making LMM very possible and practical, basically extending the COG-as-peripheral model to C with few trade-offs.

    A quick look at what Ross has done tells me that C can be bent to work on a Propeller, and work well. A single COG running C code LMM style can be augmented with COGS doing things as needed, placing the PASM speed where it's needed, and the slower bits where needed. LMM, XMM, PASM can deliver great performance, where the application profile is considered and the resources of the device are applied appropriately.

    That's the trouble. It's not just compile 'n run in most cases, and that is a direct artifact of how the Prop operates. Lose that, and you don't have a prop.

    Over time, the efforts needed to make better and easier use of the chip features will continue to reduce the impact of this problem. New CPU designs need boot strapping. If they don't, then they simply are not new, just variations on the same old same old.

    A look back, compared to now, reveals a lot has been done. C today can perform as SPIN does today, with PASM filling in just as it does for SPIN, and that's how the chip operates today. Good alignment.

    Prop II will execute LMM considerably better, and will have the on-chip RAM resources needed to do bigger-scale things. In short, the "can't do applications" objection will absolutely go away.

    @David: totally.
  • Heater.Heater. Posts: 21,230
    edited 2011-05-06 14:20
    potatohead,
    But... What active circuitry?

    No idea. A while back Chip was asking the forum if he could drop the lock hardware. Obviously he felt pushed for space or had some other functionality in mind that could use the space.

    But as an example: what about those super-high-speed hardware serial links that could be used to build chains or arrays of Propellers on a PCB?
    Many here dream about having more memory; others want 16 COGs, not 8. Being able to build multi-Prop systems in a standard way with high-speed channels would satisfy those power-hungry users, especially if Spin and the Prop Tool could encompass all the code for all those connected Props in your one single application, compiled with F11 and transparently downloaded to all of them.

    I can dream a bit some times...
  • Sal AmmoniacSal Ammoniac Posts: 213
    edited 2011-05-06 14:28
    potatohead wrote: »
    @Sal, Leon: I don't see your statements as fair. If you set the expectation that PASM is the expectation for execution speed, then C looks a bit weak on the Prop. However, that's a poor expectation, as is the one where C equals applications. There are a lot of things written in C, but then again, there will be more and more things not written in C over time too.

    It depends on your perspective. Like it or not, the professional embedded development community uses C as its dominant language (with C++ in second place), and that is not going to change anytime soon. C support is what they expect, and if it's not there, or not efficient or straightforward to use, they'll look elsewhere. It looks like Parallax is going after this market with their launch of Parallax Semiconductor, but without strong C support, I don't see them making much headway.
    The prop is a concurrent multi-processor, and because of that isn't a simple, high speed linear compute chip. When tasks are done in parallel, Props smoke a lot of other CPUs, which is why we are here.

    That is true, and that's a strong selling point of the Propeller as we're all aware. On the other hand, there are architectures like the unmentionable 'X' chip that is also multicore, but was designed from the ground up to run C efficiently and it supports the type of debug environment (Eclipse/JTAG) that professional embedded developers have come to expect and demand.

    It's also a mistake to assume the rest of the industry is going to stand still with respect to multicore processors. It won't be long before many MCU processor architectures are multicore just like most PC processors are today.
  • AribaAriba Posts: 2,685
    edited 2011-05-06 14:41
    Heater. wrote:
    Imagine that when you hit F11 in your Prop Tool your program is compiled and the Spin byte code interpreter code is linked into the resulting binary for download to the Prop. When loaded into the Prop that interpreter is loaded to a COG and runs your Spin byte codes. As it does now.

    The user need never know this is happening. There would be no difference. The known implementation would be used. Same goes for the font tables, trig tables and whatever else. It is no different than using standard libraries in many other systems.

    @Phil Pilgrim.

    The Spin interpreter has to stay in ROM.
    I'm sure that it will. I'm still not sure why it has to.
    I see no need for any ROM content apart from the boot loader. As I said, none of it is of any use until code is loaded and running....

    The reason is very simple: ROM cells take about 1/6 the die space of RAM cells on the chip!
    So if you don't implement the Spin VM in ROM, you only free up enough area for an additional 496/6 = 83 longs of hub RAM.
    Then, when the compiler has to add the Spin VM to the code, the code becomes 496 longs longer, and this code goes into hub RAM, so in the end you have lost 496 - 83 = 413 longs of hub RAM.
    And don't say that the RAM space can be reused for data later: in that case you couldn't start any other Spin cog from your code.
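    Spelling that arithmetic out (taking the 1/6 area ratio and the 496-long interpreter as given):

        \[
        \underbrace{\frac{496}{6}}_{\approx\, 83 \text{ longs of RAM gained by dropping the ROM}}
        \;-\;
        \underbrace{496}_{\text{longs of hub RAM the VM now occupies}}
        \;=\; -413 \text{ longs net}
        \]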

    Andy
  • Heater.Heater. Posts: 21,230
    edited 2011-05-06 14:48
    Ariba,

    OK. Can't fault that logic.
  • davidsaundersdavidsaunders Posts: 1,559
    edited 2011-05-06 14:51
    Sal Ammoniac wrote: »
    On the other hand, there are architectures like the unmentionable 'X' chip that is also multicore, but was designed from the ground up to run C efficiently and it supports the type of debug environment (Eclipse/JTAG) that professional embedded developers have come to expect and demand.
    The 'X' is only available up to 4 cores; the Prop and Prop II are 8-core. As to C support on the Prop II, this will be simple and transparent to the programmer. The architecture will allow LMM code to run at nearly half native speed; couple this with GCC, and you can have > 60 MIPS per cog C that looks to the programmer as if it were in a linear address space. Because of the Prop II architecture it should be possible to have a 60 MIPS per cog XMM C compiler that is just as transparent, so long as your app can sacrifice 2 cogs, effectively giving you a 6-core CPU at 60+ MIPS per core and up to 4 GB of external memory.

    In short, the Prop II will fulfill this need plus some.
  • potatoheadpotatohead Posts: 10,259
    edited 2011-05-06 14:51
    Well, let's say LMM runs very fast on Prop II. Catalina, if nothing else, is going to be effective. Seems to me, that's going to open the door for C.

    Given the changes we've seen, namely increased throughput between hub and COG, the REP instruction, and some auto-increment/decrement/indexing capability, LMM kernels are going to be quite fast. A Propeller used in that way is going to look a lot like a linear compute chip, and it could even be interrupted, with some minor speed trade-off in the LMM kernel.
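    For anyone who hasn't met LMM: the kernel is a tiny loop resident in a COG that fetches compiled instructions from big hub RAM and executes them one at a time. Below is a rough C model of that fetch-execute shape, with invented opcodes (a real kernel is a few lines of PASM that executes each fetched long in place, not a C switch):

        #include <stdint.h>
        #include <stdio.h>

        /* Toy model of an LMM (Large Memory Model) kernel. Opcodes are
           invented purely to show the fetch-execute loop; real LMM fetches
           native PASM longs and runs them directly. */
        enum { OP_HALT = 0, OP_ADDI = 1, OP_PRINT = 2 };

        static const uint32_t hub_ram[] = {   /* "compiled" program in hub RAM */
            (OP_ADDI  << 24) | 5,
            (OP_ADDI  << 24) | 7,
            (OP_PRINT << 24),
            (OP_HALT  << 24),
        };

        int main(void)
        {
            uint32_t pc = 0, acc = 0;
            for (;;) {
                uint32_t instr = hub_ram[pc++];  /* rdlong instr, pc / add pc, #4 */
                switch (instr >> 24) {           /* "execute" the fetched long    */
                case OP_ADDI:  acc += instr & 0x00FFFFFF; break;
                case OP_PRINT: printf("acc = %u\n", acc);  break;
                case OP_HALT:  return 0;
                }
            }
        }

    Faster hub access and REP shrink the per-iteration overhead of exactly this loop, which is where the "near half native speed" estimates in this thread come from.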

    Software-defined silicon, then. It's not too unreasonable to see an interrupt-capable kernel, or kernels running on a few COGs, do exactly what other chips do. Or... not! That's the beauty of it.

    As for headway, some perspective is in order. It is not necessary for Parallax to become the next Microchip. It is only necessary that the share needed to support the propeller ecosystem at a nice profit be realized.

    Apple has done that for years, and today has extended that to be very profitable, leveraging many core differentiators over time to build a very solid business that continues to hold a moderate overall PC share.

    The same is possible for Parallax, and the fact that they are a private company is actually a significant competitive advantage here.

    For that to occur, it's only necessary for a small subset of the overall potential users of the technology to see the value of the chip.

    The focus then is on margin and making the most of the differentiators, and communicating that clearly, so that the prospects who would see the competitive advantage do so, and adopt the tech successfully.

    It is not possible to do something that is actually new, without also doing the boot-strapping needed to enable successful adoption of the technology. Parallax sees that, and is making the efforts necessary. From there, it's just a matter of time before we know whether or not the new effort is successful enough to be longer term viable.

    I suspect it is, and again Parallax being a private company means the variance on what is possible and practical is considerable, and not directly comparable to others in this tech space, who do not operate on the same basis.

    Seen this movie a lot of times. Parallax is actually well positioned here, and that's all we need to know as users right now. Our futures are fairly secure, with a lot of upside possible.
  • David BetzDavid Betz Posts: 14,511
    edited 2011-05-06 14:55
    davidsaunders wrote: »
    The 'X' is only available up to 4 cores; the Prop and Prop II are 8-core.
    Each of the 'X' processor cores supports 8 hardware threads. That means you can have 32 threads running on a 4-core chip.
  • LeonLeon Posts: 7,620
    edited 2011-05-06 15:10
    Each thread is 50 or 100 MIPS giving 400 MIPS per core.

    Multiple chips can be interconnected via very fast XLinks for very high-performance deterministic parallel systems.
  • Sal AmmoniacSal Ammoniac Posts: 213
    edited 2011-05-06 15:14
    davidsaunders wrote: »
    The 'X' is only available up to 4 cores; the Prop and Prop II are 8-core.

    Each XMOS core supports 8 hardware threads. Even if you restrict it to 4 to make sure you get full speed per thread, you still have 4 x 4 = 16 hardware threads and 1600 MIPS on a 4-core chip.
    davidsaunders wrote: »
    As to C support on the Prop II, this will be simple and transparent to the programmer. The architecture will allow LMM code to run at nearly half native speed; couple this with GCC, and you can have > 60 MIPS per cog C that looks to the programmer as if it were in a linear address space. Because of the Prop II architecture it should be possible to have a 60 MIPS per cog XMM C compiler that is just as transparent, so long as your app can sacrifice 2 cogs, effectively giving you a 6-core CPU at 60+ MIPS per core and up to 4 GB of external memory.

    In short, the Prop II will fulfill this need plus some.

    But it won't have hardware debugging, which is a requirement of most professional developers.
  • Heater.Heater. Posts: 21,230
    edited 2011-05-06 15:22
    OK. At the risk of life and limb, I am going to make the Prop-XMOS comparison here.

    Let's take the Prop II's projected performance and the current XMOS XS1-L1 for comparison.
    These two are likely of similar price.
              Cores  RAM   I/O   Max H/W  MIPS/Thread        Price
                           Pins  Threads
    XS1-L1      1    64K    36      8     50                 $7.50
    XS1-L1      1    64K    64      8     50                 $7.50
    Prop II     8    128K   92      8     160 (40 MIPS LMM)  $12
    Prop I      8    32K    32      8     20  (4 MIPS ???)   $8
    Notes:
    Approximate prices. The Prop II price is unknown, of course.
    XS1 MIPS can go to 100 when 4 or fewer threads are used.
    I have not counted Prop HUB ROM, COG RAM, or XS-1 OTP.
    I have now included the 64-pin X chip.


    Hmm... Is Prop II too little, too late?

    Looks to me like it's going to stand up very well. More pins, more RAM, good turn of speed. Even LMM C will be in the running.
  • markaericmarkaeric Posts: 282
    edited 2011-05-06 15:35
    At what point would increased hub access frequency show little real-world benefit? With the Prop II, we'll be able to read/write 4 longs per hub cycle, have access to the hub every 8 clocks, and execute 1 instruction per clock (for the most part). Aside from code that doesn't synchronize to the hub (code that tries to read/write the hub at random times, but as quickly as possible), at what point does hub access stop being a bottleneck for LMM kernels or other high-hub-throughput code? I know the hub model is itself a bottleneck compared to cog RAM, but given that it won't change, would hub access at a rate greater than once every 8 clocks really be worthwhile? What about every 4?

    A practical approach to increasing hub throughput without requiring some bandwidth-allocation mechanism (and the associated problems it can create when using objects from the OBEX) was on my mind several days ago. The solution I came up with basically breaks hub RAM into 8 banks (or more if there were more cogs) with sequential addressing, where bank A is accessed with address range $0000 to $0FA0, bank B $0FA1 to $1F40, etc. (assuming 4K banks with a total of 32K RAM). Each cog then contains its own "inverse hub" that gives it access to one bank of memory every clock, in sequential order. Example: cog0 starts with access to bank A, while cog1 has access to bank B... cog7 has access to bank H; on the next clock, cog0 has access to bank H, cog1 has access to bank A, and so on. This ensures all cogs always have the same bandwidth and can give a cog access to anywhere in memory within a maximum of 8 clocks. The biggest problem I can see is that it makes it a bit harder to allocate memory for large buffers, but that should be possible with careful programming and capable programming tools.
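    A quick toy simulation of that rotation (my own sketch of the idea, not anything planned for the chip). With bank = (cog - clock) mod 8, cog0 sees bank A, then H, then G, and so on, matching the example above, while every cog keeps constant bandwidth:

        #include <stdio.h>

        #define NUM_COGS  8
        #define NUM_BANKS 8

        /* Toy model of the "inverse hub": each clock, the cog-to-bank mapping
           rotates by one step, so every cog reaches any given bank within 8
           clocks and all cogs always have equal bandwidth. */
        static int bank_for_cog(int cog, int clock)
        {
            return ((cog - clock) % NUM_BANKS + NUM_BANKS) % NUM_BANKS;
        }

        int main(void)
        {
            for (int clock = 0; clock < NUM_BANKS; clock++) {
                printf("clock %d:", clock);
                for (int cog = 0; cog < NUM_COGS; cog++)
                    printf("  cog%d->bank%c", cog, 'A' + bank_for_cog(cog, clock));
                putchar('\n');
            }
            return 0;
        }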
  • LeonLeon Posts: 7,620
    edited 2011-05-06 15:39
    Heater,

    But by the time the Prop II comes out, XMOS will probably be shipping their next generation devices, delivering 1000 MIPS/core (a guess on my part) with a lot more RAM.