Why is Spin an Interpreted Language? — Parallax Forums

Why is Spin an Interpreted Language?

bgp Posts: 34
edited 2013-09-22 05:57 in Propeller 1
I am new to microcontroller programming, but have been doing software development for a long time.

I'm curious why Spin was designed as an interpreted language, rather than compiled. I've written several programs in Spin so far and am still learning the language, but I haven't come across anything which would seem to be impossible or highly impractical to implement if it were to be directly compiled.

My first question was why Spin existed in the first place, rather than using a language like C. However, after learning a bit more I see some significant advantages to having a new language (my thought on any new language is always "dude, if you're going to make a new language, please have a good reason"). Things like having control of the IO pins built into the language instead of using functions or macros in C, and the fact that you don't want C coders jumping in there and trying to do malloc() all over the place like they would on a PC with lots of memory. It also gives the ability to totally control various optimization points, since the same team making the language is making the hardware.

And clearly PASM has its purpose and that makes sense.

But specifically on the point of interpreted vs. compiled for Spin - I guess I just don't see the point in having it interpreted. I would think writing a Spin -> PASM converter would be a not-infeasible task and a good overall approach. Was it simpler to implement as interpreted? Easier to keep compatibility and still move forward by having the bytecode as an intermediate layer? Or maybe higher level concepts like "objects" made more sense to implement with a bytecode layer? Something else?

Just interested in thoughts on that from those informed of such things.

Comments

  • Bill Henning Posts: 6,445
    edited 2013-09-20 18:30
    PASM is limited to a maximum of 512 instructions in a cog (well, 496 if not using the shadow registers), and each instruction takes four bytes.

    Chip designed Spin to fit the Propeller nicely (and it does), and to allow writing larger programs (byte code programs take less space) with assembly language drivers/code as needed. Think of Spin as high-level "glue" for fast-as-possible pasm code.

    Since the advent of LMM it has been possible to write a Spin -> LMM compiler; however, compiling to byte code allows much larger (if slower) programs.
  • Mike Green Posts: 23,101
    edited 2013-09-20 18:45
    A few other details:
    1) The instruction set of the Propeller does not have index registers. To access cog arrays, you have to use instruction modification. Similarly, access to bytes is relatively expensive. It's much better to have the code to do these things only once in the cog and access this code through an interpreter mechanism (see the sketch just after this list).
    2) Multiply and divide have to be done by subroutine. There are no instructions to do these. Like #1, using an interpreter is a much better way to include these operations in your program than subroutine calls or repeating the code when needed.
    3) PropBasic has an option to produce PASM. It's a very simple compiler. Have a look at the PASM code it generates for simple statements and expressions.
    4) Most programs have only small portions that can be improved by using something like PASM, and you don't know which portions until the program is working and you can analyze it for usage patterns. Using interpreted code is actually a very useful balance of space and speed when small pieces (like time-dependent I/O drivers or a floating point library) are rewritten in PASM.
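    To make #1 concrete, here is a minimal sketch of the instruction-modification trick for indexing a cog array (my own illustration, not code from any Parallax source; my_array, index, temp and value are invented names):

            mov     temp, #my_array   ' cog address of the array's first element
            add     temp, index       ' add the element index (cog addresses count longs)
            movs    :read, temp       ' patch the source field of the read instruction below
            nop                       ' one spacer instruction so the patch clears the pipeline
    :read   mov     value, 0-0        ' the 0-0 source is replaced by the patched address

    An interpreter keeps one copy of a sequence like this in the cog and reuses it for every indexed access, instead of each compiled statement carrying its own copy.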
  • jazzed Posts: 11,803
    edited 2013-09-20 19:14
    Propeller-GCC can produce native COG code programs. We have some examples of it. Almost anything else is interpreted except PASM and PropBasic native programs. Those tiny COGs are challenging, but having 8 of them is much better than most single CPU solutions with big memory address spaces.

    Bill's answers were pretty good.
  • Heater. Posts: 21,230
    edited 2013-09-21 01:04
    To my mind the fundamental reasons for Spin being compiled to byte code come down to the Propeller architecture.

    The Propeller architecture is such that its CPUs (COGs) can only execute native instructions from their 512 LONGS of local memory space. To execute code stored in the larger HUB memory it has to be fetched with special instructions (RDBYTE, RDWORD, RDLONG) into the COG.

    That is to say you have to write a fetch/execute loop in PASM that runs in the COG, working on code fetched from HUB. Basically an interpreter or virtual machine.

    The obvious way to do this is to compile your high level language to byte codes and run them like the Java VM. Of course that VM or interpreter has to fit within 512 LONGS of COG memory so the byte codes had better be very simple and efficient to interpret and execute.
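    To give a feel for what such a byte code interpreter core looks like, here is a rough dispatch-loop sketch (a common pattern I'm assuming for illustration, not how the real Spin interpreter is laid out; op, pcurr and handlers are invented names):

    interp  rdbyte  op, pcurr         ' fetch the next byte code from HUB
            add     pcurr, #1         ' advance the byte code program counter
            shl     op, #2            ' assume each handler occupies 4 cog longs
            add     op, #handlers     ' compute the cog address of that handler
            jmp     op                ' dispatch; each handler ends with jmp #interp

    The whole dispatcher is a handful of longs, which is why a byte code machine fits so comfortably in a COG alongside its handlers.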

    Now, one could compile a "normal" language like C to those byte codes. But I think Chip wanted something more approachable by beginners to programming and that made use of all those COGs very easy.

    Further, there is a technique for fetching PASM instructions, instead of byte codes, from HUB and executing them in a very tight loop in the COG, the so-called Large Memory Model (LMM). This allows running native code stored in HUB at about one quarter full speed. And that allows languages like C to be compiled to big HUB-resident binaries. All three of the Propeller compilers do this: ImageCraft, Catalina, propgcc.

    Problem was the LMM technique was not discovered until some time after the Prop was in production! Thank you Bill Henning. Also the resulting binaries are rather large. You can get a lot more functionality into the Prop with Spin than C compiled to LMM. Although the LMM will be faster.

    The fact that Spin is interpreted and therefore rather slow does not matter so much on the Prop because, for the parts of your code that need to be speedy, we have PASM that can run in its own COG very fast. PASM is about the easiest assembly language to use ever and it is integrated into the Spin language and compiler in such a way as to make its use very easy and natural. The best of both worlds.

    Given all that, we see there are very good reasons for inventing Spin and its byte codes. It is a very elegant solution to the problems the Prop architecture presents.

    Edit: Many argued, myself included, that building a C compiler to generate native code that executes directly in COG at full speed would be a big effort for little benefit. The space available is too small, the instruction set does not suit C, no index registers, no stack operations, etc. As it happened the propgcc team created an in-COG code generator as one of the first steps to targeting GCC at the Propeller. Turns out it is actually useful: you can write in-COG programs in C. More amazingly, you can have your HUB-resident C code "cached" in and out of COG so that tight loops run at full speed.
  • pedward Posts: 1,642
    edited 2013-09-21 13:30
    SPIN is a compiled and bytecode-interpreted language. The code density is about 4x that of raw machine code; however, the speed is about 40x slower than raw machine code. The primary advantage of this design is having a high level language that uses a sparse amount of resources at runtime in exchange for absolute speed.
  • bgp Posts: 34
    edited 2013-09-21 18:38
    When I posted the question I didn't quite track with the difference between HUB memory and COG memory - just that alone makes a huge difference in how things would be organized. Also the code density point, with the assembly instructions all being dwords and SPIN being bytecode. And it is also very true that in many cases the performance cost imposed by the interpreter is not a problem (in a cog that spends most of its time waiting for input, for example), and worth it in terms of the other advantages.

    I guess my only other question would be why was the hardware designed like that (instructions only runnable from one cog's memory [to prevent having to lock memory?], and dword fixed-length machine code instructions). I assume for simplicity/efficiency, but I don't claim to know even a fraction of the factors that go into microcontroller design, so will leave it at that.

    But from the above discussion that definitely helps clarify a number of the factors involved. Awesome. It does make a lot of sense and seems quite powerful, especially given the constraints involved in something this small.
  • Mike Green Posts: 23,101
    edited 2013-09-21 19:13
    The cogs are fairly complete functional units including their own memory so they can function independently and simultaneously. HUB memory is treated somewhat like an I/O device in that special instructions are used to access it via the HUB logic which handles the sharing. The instructions are 32-bit because that's the basic memory and computational unit for the cog. 32-bit arithmetic is handy for a lot of the anticipated uses of a cog. With 32-bits, you can have a good RISC (reduced instruction set computer) with two addresses supplied and one assumed (2nd source same as destination). You can have a conditional execution mask, optional result writing and plenty of opcodes.
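    For reference, as I remember it from the Propeller datasheet (worth verifying there), the 32 bits break down roughly like this:

            bits 31..26   INSTR   6-bit opcode
            bits 25..22   ZCRI    write-Z, write-C, write-result, immediate-source flags
            bits 21..18   COND    the conditional-execution mask mentioned above
            bits 17..9    DEST    9-bit destination register address
            bits 8..0     SRC     9-bit source register address or immediate value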
  • pedward Posts: 1,642
    edited 2013-09-21 22:30
    To have simultaneous access to the HUB memory would require something like 16 ports and make the muxing logic grow out of control (use up a lot of silicon). The Propeller 2 has 4 ports IIRC, Chip could clarify.
  • bgp Posts: 34
    edited 2013-09-21 22:48
    Mike Green wrote: »
    The cogs are fairly complete functional units including their own memory so they can function independently and simultaneously. HUB memory is treated somewhat like an I/O device in that special instructions are used to access it via the HUB logic which handles the sharing. The instructions are 32-bit because that's the basic memory and computational unit for the cog. 32-bit arithmetic is handy for a lot of the anticipated uses of a cog. With 32-bits, you can have a good RISC (reduced instruction set computer) with two addresses supplied and one assumed (2nd source same as destination). You can have a conditional execution mask, optional result writing and plenty of opcodes.

    Gotcha, I'll read up more on instruction encoding, but I can see that a lot of stuff gets packed in there and allows for good performance by having everything you need to execute the instruction in one shot.
    pedward wrote: »
    To have simultaneous access to the HUB memory would require something like 16 ports and make the muxing logic grow out of control (use up a lot of silicon).

    Yeah, I get the idea, makes sense.

    --

    Thanks everyone!
  • bgp Posts: 34
    edited 2013-09-21 23:12
    Heater. wrote: »
    Further, there is a technique for fetching PASM instructions, instead of byte codes, from HUB and executing them in a very tight loop in the COG, the so-called Large Memory Model (LMM). This allows running native code stored in HUB at about one quarter full speed. And that allows languages like C to be compiled to big HUB-resident binaries. All three of the Propeller compilers do this: ImageCraft, Catalina, propgcc.

    Problem was the LMM technique was not discovered until some time after the Prop was in production! Thank you Bill Henning. Also the resulting binaries are rather large. You can get a lot more functionality into the Prop with Spin than C compiled to LMM. Although the LMM will be faster.

    @Heater - This is an interesting concept, and looks like it allows one to "bolt on" an execution model that is more akin to what you'd see being done on a traditional x86 processor - by emulating it with this simple copy-and-execute layer. I wonder if this approach will become more common - being a good way to get performance and still have the code residing in the (more spacious) HUB memory. Although the other side of this is that (afaict) the reason it is necessary is the cruft that comes along with the C runtime library and whatever other overhead exists from that environment (which Spin avoids by being purpose-built for its task).
  • Heater. Posts: 21,230
    edited 2013-09-22 00:57
    bgp,

    The Large Memory Model (LMM) is actually quite a subtle and ingenious thing that relies on some almost unique features of the Propeller MCU in order to work.
    Let's see:

    1) All instructions are 32 bits wide, no exceptions. That means you can have a very small, simple loop that fetches instructions from HUB and runs them. It never has to worry about how much to fetch.

    2) Prop instructions never set the flags (condition codes like zero, carry) unless you indicate that they should. That means that whatever the fetch/execute loop does it will never mess up the flags as set by the actual user program you are trying to run. No need to save/restore flags in that loop.

    3) Again, because all instructions are the same size, when you fetch them you can drop them into a slot in the very loop that is doing the fetch/execute and they will be run as it goes around (see the sketch after this list). No need to save them in some data area, then jump to execute them, then jump back to the loop.

    4) As you see from the architecture docs, COGs take turns to access HUB memory. They can only access HUB every 16 clocks or four instructions (I think). Turns out you can make that fetch/execute loop hit a HUB access slot pretty much every time by unrolling the loop. This gives you a fetch/execute engine running at one quarter COG speed. Not bad.

    5) Perhaps there is some other Prop feature that helps LMM I have forgotten here.
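    Putting 1 to 4 together, the core of an LMM kernel is only a handful of instructions. A minimal sketch of the idea (not any particular compiler's actual kernel; the register names are made up):

    lmm_loop
            rdlong  lmm_instr, lmm_pc ' fetch one native instruction from HUB (point 1)
            add     lmm_pc, #4        ' advance the HUB program counter
    lmm_instr
            nop                       ' the fetched instruction is executed right here (point 3)
            jmp     #lmm_loop         ' go fetch the next one

    lmm_pc  long    0                 ' set to the HUB address of the LMM code before launch

    Real kernels unroll this loop a few times, as in point 4, so the rdlong keeps hitting the COG's HUB access slot, and jumps within LMM code are typically done by loading a new value into lmm_pc rather than with a native jmp.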

    As you see LMM is quite a trick. No wonder nobody thought of it until way after the prop was released. It is Bill Henning we have to thank for that.

    It could become more popular. We already have three C compilers for the Prop that use LMM. I have used LMM in part of an emulation of the Z80 processor that allows one to run CP/M on the Prop. I have also made a toy high level language that generates LMM binaries. LMM is not just about the C language. You can write LMM programs in assembler if you like.

    LMM gives you the ability to have bigger programs than fit in COG at an execution speed that is one quarter of native COG speed. That's perhaps ten times faster than the Spin byte codes.

    I would not worry about the cruft of the C run time and its libraries. I have often used C code with no standard run time support or C libraries. This is common practice among users of small MCUs in embedded systems who are short on space. If you only want to flash an LED you don't need to include printf!

    Besides, compilers like Catalina and propgcc offer very small run times and customized versions of standard lib functions that are very small if you need them.
  • Bill Henning Posts: 6,445
    edited 2013-09-22 05:57
    (Thanks Heater)

    bgp:

    Here is the original thread where I announced LMM:

    http://forums.parallax.com/showthread.php/89640-ANNOUNCING-Large-memory-model-for-Propeller-assembly-language-programs!

    My original intent was to keep the LMM kernel as small as possible, in order to keep most of the cog memory available for FCACHE .. which would allow demand loading code that loops (and fits in the FCACHE area) so that it would run at almost full cog speed. Later I played with the idea of a DCACHE area (for caching global data), but that did not really help, and an FLIB (demand-loaded library), which did work well (for small libraries that contain code that loops - i.e. str*, mem*).
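    For anyone curious, the FCACHE idea can be sketched roughly like this (my reconstruction of the concept, not the actual kernel code; fcache_area, count and d_one are invented names, and lmm_pc is the LMM program counter from the loop sketched earlier in the thread):

    fcache_op
            movd    :copy, #fcache_area ' aim the copy at the start of the cog buffer
            rdlong  count, lmm_pc       ' length of the loop body, in longs
            add     lmm_pc, #4
    :copy   rdlong  0-0, lmm_pc         ' copy one native instruction from HUB into the buffer
            add     lmm_pc, #4
            add     :copy, d_one        ' advance the copy's destination field by one register
            djnz    count, #:copy
            jmp     #fcache_area        ' run the cached loop at full native cog speed

    d_one   long    1 << 9              ' adds 1 to the destination field of an instruction

    The cached block ends by jumping back into the LMM loop, so only code that loops enough to pay for the copy is worth caching - which is exactly the str*/mem* sort of routine mentioned above.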

    Besides ImageCraft C, Catalina and PropGCC, and your test compiler, Bean's PropBasic can also generate LMM code - and I am pretty sure there are some other compilers that target it for internal use and testing.