Re: Why no SPIN compile to assembly.

potatohead · 2008-01-04 02:14

Edit: @VIRAND -- Thought it might work in its own thread!

Assembly runs in the COG only, unless one runs a VM to interpret assembly instructions in the HUB memory. The SPIN interpreter fits into a COG, and it runs a SPIN program that exists in the HUB memory space.

So, SPIN, compiled to assembly, would be limited to small programs that fit into COGS.

Or...

More development would have to happen to enable LMM code (hub-resident assembly code), run in a fashion similar to how SPIN code is currently executed. An assembly kernel, running on a COG, would handle executing the LMM code, just as the SPIN interpreter does for SPIN byte code.
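To make the kernel idea a bit more concrete, here is a toy Python model. It is only a sketch: the opcode names, the tuple encoding and the `run_lmm` helper are invented for illustration and are not real PASM. The list stands in for HUB memory, and the loop stands in for the small kernel that would sit in a COG, fetching one instruction at a time and executing it.

```python
# Toy model of an LMM-style kernel. The "hub" list holds the program; the loop
# plays the role of the cog-resident kernel. Opcodes are hypothetical.

def run_lmm(hub):
    acc = 0   # stands in for a cog register
    pc = 0    # index of the next instruction in hub memory
    while True:
        op, arg = hub[pc]   # fetch from hub memory
        pc += 1
        if op == "mov":
            acc = arg
        elif op == "add":
            acc += arg
        elif op == "jmp":
            pc = arg        # branches only change the hub program counter
        elif op == "halt":
            return acc
        else:
            raise ValueError(f"unknown opcode: {op}")

program = [
    ("mov", 5),
    ("add", 3),
    ("jmp", 4),     # skip the next instruction
    ("add", 100),
    ("add", 2),
    ("halt", 0),
]
result = run_lmm(program)
print(result)
```

The point of the sketch is that the program can be as long as HUB memory allows, because only the fetch loop has to fit in the COG.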

This is also the reason why no "in-line" assembly constructs are possible within SPIN. There is no context for them.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!

Post Edited (potatohead) : 1/4/2008 3:33:53 AM GMT

Paul Baker · 2008-01-04 02:15

Does this post belong in another thread, or are you spawning OT matter into its own thread?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer

Parallax, Inc.

potatohead · 2008-01-04 02:17

It was an attempt to answer a maybe out of scope question in the benchmarking thread. If it's not successful, it can be tossed!


hippy · 2008-01-04 03:36

The original question posed elsewhere ...

VIRAND said...
I just started wondering... Spin interpreter has no terminal console, so why is it even there?
Why doesn't PropIDE assemble Spin into PASM instead of interpreted bytecodes if it does the same thing much slower?

PASM doesn't have a terminal console either so why is PASM there ? In fact, why does the Propeller chip even exist ?

The answer is that neither PASM nor Spin need to have a terminal console to perform useful work and it is easy enough to add one if required ( it's usually easiest with Spin ).

The Propeller Tool could compile to PASM rather than bytecode but then it could only compile programs small enough to fit into a Cog's memory. By compiling to bytecode much larger programs can be created and executed.

The trade-off is: PASM - small programs but fast, Spin - larger programs but slower.

VIRAND · 2008-01-05 00:01

Sorry if I asked a pest question.

Spin implies to me that it uses cogs, when in the case of long programs it may not,
especially most programs I've written thus far. It seems to me that ...

Ok, I just realized something, if cogs have their own extra 2K memory apart from the 32K.

... it seemed to me that otherwise a long program that wouldn't fit in one cog would
shut down after starting up the next cog where the following code is, unless it must
loop back across the boundary of a cog. I guess it doesn't work that way at all.

Someone wrote a short 'trace program' (as I anachronistically call it) which runs in a
cog and pulls instructions out of hub ram and single step executes them, which is not
interpretation and should only be a little slower than normal PASM execution in a cog.
I don't know what the trace program was called except perhaps a virtual machine that
runs native code. Ahh, yes, it was called LARGE MEMORY MODEL (LMM).

Again sorry for upsetting with my question about SPIN. I use SPIN much more than PASM,
but just wondered why it interprets instead of assembles. I understand the answer given
is that it's more convenient for large programs.

Maybe off-topic again, but related: someone wrote an on-Propeller assembler here. I can't get the keyboard
to work, so I'll find that topic and ask how, expecting a problem with the different pins used
by my keyboard possibly being incompatible with their keyboard driver because they aren't adjacent.
Sorry for wasting space here thinking out loud about this. Propeller is awesome and I don't want
anyone mad at me for simple misunderstandings. One thing that's awesome is so much can be on
the chip, and perhaps the reason it isn't is because people got excited about that after it was done.

It's probably coincidence but I might have posted a couple years ago about the propeller that when
I heard how fast it was going to be, it should make its own video. Nevertheless I was surprised to
find it has 8 cogs which each have that capability just before I started using Propellers. Peace and Joy!

hippy · 2008-01-05 00:37

VIRAND said...
Propeller is awesome and I don't want anyone mad at me for simple misunderstandings.

Do not worry about that, I am sure no one is mad at you.

Spin works in a similar manner to Large Memory Model. The small Spin Interpreter resides in the Cog ( like that 'trace program' ) and interprets the Spin Bytecode from outside. It is a very elegant solution but can suffer from limited execution speed.
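As a rough illustration of "a small interpreter in the Cog, bytecode outside", here is a minimal stack-machine sketch in Python. The opcode values and the `interpret` function are made up for this example and are not the real Spin bytecodes; the only thing it tries to show is how byte-sized instructions are fetched from a separate memory and executed one at a time.

```python
# Hypothetical one-byte opcodes; these are NOT the real Spin bytecodes.
HALT, PUSH, ADD, MUL = 0x00, 0x01, 0x02, 0x03

def interpret(hub: bytes) -> int:
    stack = []
    pc = 0
    while True:
        op = hub[pc]
        pc += 1
        if op == PUSH:
            stack.append(hub[pc])   # one-byte immediate follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op:#04x}")

# (2 + 3) * 4 as a byte string living in "hub" memory
bytecode = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
answer = interpret(bytecode)
print(len(bytecode), answer)
```

Every bytecode costs a decode step in the interpreter loop, which is where the limited execution speed comes from.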

deSilva · 2008-01-05 01:46

VIRAND - in fact your question is extremely clever. So this could well be a reason to become "mad"

The unusual architecture of the propeller can be looked upon in at least two ways:

(A) It is a computer with 8 very flexible, nano-programmable "functional units". A standard computer (von Neumann) contains dedicated functional units such as: "control unit/instruction decoding", "I/O", "ALU", "floating point unit"; modern processors also have many of them so they can sometimes perform more than one instruction per clock ("superscalar"). From this point of view it is absolutely natural to define a (virtual) machine language that is executed by a control unit. This Nano Program is called SPIN-interpreter, but you can easily (as proven by Hippy and at least one FORTH author) write your own. SPIN does not depend on other functional units. It behaves as if all processing in a computer were performed by the "control unit" alone, which would be an extreme waste of resources!
An extended version of such an "interpreter" would of course delegate some lengthy operations to another COG. This is especially useful for complex I/O operations.

However, scheduling of concurrent work is no easy task, neither in real processors nor in the Propeller Environment. So the "SPIN-way" is this: you must point the interpreter to such possibilities explicitly by COGNEW instructions.

Putting it all together: Something like SPIN - a virtual machine - is very natural for an architecture such as the Propeller. I think Hippy's recent activities have shown that the power and beauty of the Propeller is hardly explored: Defining your own optimized VM for your specific application!

The concept behind the LMM model is to use a VM 90% similar to the "Nano Code" of the functional units (PASM). This concept profits from their generic speed in the case of that 90% match.

It is still unclear - and a very exciting question! - how large the average speed-up wrt SPIN (or another "interpreted" language such as FORTH) will be. Anyhow you will pay a large penalty wrt memory, as a PASM instruction is rather redundant (this can easily be measured by "zipping" a binary file full of 32k machine code).

(B) A complementary (and more conventional) view shows the Propeller as 8 fully staffed microcomputers plus a large (as it were) fast (as it were) accessible secondary memory. This is the "non VM approach".

A "real program" must fit into one COG. Period. If not you have to use overlay techniques, which however becomes difficult when the overlay loading logic has to reside side by side with the overlays themselves in a limited memory. The second challenge is linear data memory access.
This could be done manually in machine language with support from a clever IDE - in fact deSilva is working on that from time to time

The memory penalty is the same as with LMM, the speed is also unclear as the wait for "overlay load operations" can add up to the same order of magnitude as the LMM overhead. There are more awkward constraints e.g. only tight loops will be possible...

As you see this is all very political, compromise upon compromise.
ImageCraft has decided - and most likely for good reasons - to use the LMM concept. After some provisos I think this is a wise move: they will profit directly from the enhancements of the Propeller II.

It is most important to be able to run programs from a rich library environment at competitive speed on the Propeller.

Post Edited (deSilva) : 1/5/2008 3:06:27 AM GMT

potatohead · 2008-01-05 03:44

No worries VIRAND. I just fired off the thread figuring the discussion would land here and all would be good. Was actually curious to see where it went, as I find this particular topic interesting.


ImageCraft · 2008-01-10 01:53

I am not a SPIN person (sorry, C and asm usually) so I may be wrong, but most interpreted languages can be compiled. For the Propeller, obviously you will need some sort of LMM kernel.

CardboardGuru · 2008-01-10 08:28

There's no theoretical reason why Spin can't be compiled. But the resulting code would be at least 4 times the size due to PASM instructions being 32 bits vs byte code for Spin. And with Prop programs often hitting the 32K ceiling, this is a significant issue.

The same goes for C vs spin. There's a speed/size trade off. C code will be faster than Spin. But you'll be able to squeeze more functionality into 32K with Spin than C.

Unless someone does a byte code version of C. Which might come close to the size/speed balance of Spin.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Help to build the Propeller wiki - propeller.wikispaces.com
Play Defender - Propeller version of the classic game
Prop Room Robotics - my web store for Roomba spare parts in the UK

hippy · 2008-01-10 14:15

It would be interesting to know what the code density of Spin bytecode is compared against the equivalent in PASM but I don't think that's going to be at all easy to assess.

Spin bytecode opcodes are extremely well optimised with most bytecodes being single or two byte instructions and most operations being reasonably complex. Even after taking the fact that Spin is a stack machine architecture out of the equation, PASM requires two longs to load a constant greater than $1FF and 'register indirect' requires a not inconsiderable sequence of PASM instructions. Something like 'a[ b ] := true' would usually be three bytes of bytecode but I estimate around six PASM instructions / Cog memory cells, a ratio of 1:8. Some Spin constructs, mainly branches, give better ratios and can drop as low as 1:2, maybe lower. My feeling is that the average ratio is 1:6.
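The arithmetic behind that 1:8 figure can be written out as a small Python sketch. The function names are invented for illustration; the inputs (three bytes of bytecode, six PASM longs, the $1FF limit for a single-long constant) are the estimates given above, not measurements.

```python
BYTES_PER_LONG = 4   # a PASM instruction or Cog memory cell is one 32-bit long

def density_ratio(bytecode_bytes, pasm_longs):
    """How many times larger the PASM version is than the bytecode version."""
    return (pasm_longs * BYTES_PER_LONG) / bytecode_bytes

def longs_to_load_constant(value):
    """One long if the constant fits the immediate range up to $1FF, else two."""
    return 1 if value <= 0x1FF else 2

ratio = density_ratio(bytecode_bytes=3, pasm_longs=6)
print(f"1:{ratio:g}")
print(longs_to_load_constant(0x1FF), longs_to_load_constant(0x200))
```

The same helper can be fed other estimates, for instance a branch that takes two bytes of bytecode and one PASM long.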

Compared to a traditional microprocessor architecture the Propeller could be said to be lacking; just 496 in-Cog instructions for maximum processing speed with a notable lack of register indirect instructions, 8K instructions using LMM at around four-times slower speed, 16K with a Thumb-style LMM and another reduction of speed, or 24K ( my guesstimate ) of Spin instructions at the slowest speed. That doesn't mean the Propeller is as lacking as it may sound in practice, and the Prop II will help resolve some issues. The unchanged Cog design will mean the Propeller will never be as fast as it could be with a larger direct executable memory, but that doesn't mean it's necessarily less suitable in the real world either.
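Those capacity figures follow from simple division, sketched below in Python. The 32K hub size and the 496 in-Cog slots are the figures used in this thread; the names are invented for the example, and the results are upper bounds that ignore the space needed for data, stack and any kernel.

```python
HUB_BYTES = 32 * 1024   # hub RAM, the "32K" referred to in the thread
COG_SLOTS = 496         # in-Cog instruction slots, figure quoted above

def max_instructions(bytes_per_instruction):
    """Upper bound on instructions that fit in hub RAM at a fixed size each."""
    return HUB_BYTES // bytes_per_instruction

capacity = {
    "in-Cog PASM": COG_SLOTS,
    "LMM (32-bit)": max_instructions(4),
    "Thumb-style LMM (16-bit)": max_instructions(2),
}

# Average bytes per Spin instruction implied by the 24K guesstimate
implied_spin_bytes = HUB_BYTES / (24 * 1024)

for name, slots in capacity.items():
    print(f"{name:26s} {slots:6d}")
print(f"implied Spin average: {implied_spin_bytes:.2f} bytes/instruction")
```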

Where any C ( or other ) compiler targets itself for generated code, and consequently its usefulness and value, remains to be seen. Some may be looking for compilers which can generate fastest in-Cog code to avoid coding in PASM, others may be looking for compilers which can generate large applications and are less interested in speed. A compiler which can do both is the ideal.

deSilva · 2008-01-10 18:30

hippy said...
It would be interesting to know what the code density of Spin bytecode is compared against the equivalent in PASM but I don't think that's going to be at all easy to assess.

It is extremely simple: I gave the recipe some time ago: Just ZIP it! But look out that you zip the code and not all padding in the binary file.

..

hippy said...
Some may be looking for compilers which can generate fastest in-Cog code to avoid coding in PASM, others may be looking for compilers which can generate large applications and are less interested in speed. A compiler which can do both is the ideal.

For me the most important thing with C is that you can finally tap the wealth of algorithms collected for 40 years now in C.
The second important thing is that you can finally use data structures.

hippy · 2008-01-10 20:06

True, a straight-forward size comparison gives the answer ( I'm not sure about the purpose behind the zipping you suggest ) but only when the Spin and PASM are equivalents, and then there's the debate about what exactly is equivalent. Something in-Cog may be short and sweet because it doesn't do all that Spin does; add on a stack to allow the same, for example, and the ratio goes up. These are the same inherent problems as with benchmarking each.

deSilva · 2008-01-11 23:35

LZW encoding is a very effective compression not only for text but also for machine-code-like files. For a larger file (>10k raw data) the quotient of compressed file / raw file will give you a quite accurate metric of the "information density". (1-compr/raw is the "redundancy")

A well written algorithm in a well compiled language for a well designed machine architecture should have no redundancy at all.
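A minimal sketch of that metric in Python, assuming the standard `zlib` module as the compressor. Note that zlib implements DEFLATE rather than LZW, so it is a stand-in for whatever ZIP tool is actually used; the `redundancy` helper and the sample data are made up for the example.

```python
import random
import zlib

def redundancy(data: bytes) -> float:
    """1 - compressed/raw, the "redundancy" as defined above."""
    if not data:
        raise ValueError("need some data to measure")
    compressed = zlib.compress(data, 9)
    return 1.0 - len(compressed) / len(data)

size = 16 * 1024
padding = bytes(size)   # all zeros, like padding in a binary file
rng = random.Random(0)  # fixed seed so the run is repeatable
noise = bytes(rng.getrandbits(8) for _ in range(size))

print(f"padding: {redundancy(padding):.3f}")
print(f"noise:   {redundancy(noise):.3f}")
```

Zero padding scores close to 1 and random bytes close to 0 (possibly slightly negative because of container overhead), which is why only the code, not the padding, should go into the measurement.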

hippy · 2008-01-12 02:58

I still don't see the validity in practice.

If some PASM code happens to generate a lot of zero bits in each 32-bit instruction, that should get tremendously well compressed compared to the same PASM code which compresses worse simply because different register addresses were used which mashes up the 'lots of zero bits' advantage.

If two identical ( other than address location ) PASM programs don't give the same "information density" result then the whole theory seems to fall to pieces. It's measuring compressibility not anything else.

deSilva · 2008-01-12 03:17

Both of your quoted versions of assembly code should compress similarly.

But I see some pitfalls here. A well known means for "information hiding" is randomization. A compiler can easily and unknowingly generate "noise" by - e.g. - always selecting different registers for intermediate results rather than always using the same - free - register. This will look like a more complex algorithm to a compressor. But I think such things will average out when the program consists of some thousand instructions...
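The register pitfall can be shown with a small experiment. This is only a sketch under assumptions: the 32-bit word layout below is loosely modelled on a PASM-like instruction (opcode in the top bits, 9-bit destination and source fields) and is not meant as an exact encoding, the opcode and register numbers are arbitrary, and zlib (DEFLATE) stands in for the zip tool.

```python
import random
import struct
import zlib

def encode(opcode, dest, src):
    """Pack a hypothetical instruction word: opcode<<26 | dest<<9 | src."""
    return struct.pack("<I", (opcode << 26) | (dest << 9) | src)

def build(count, pick_dest):
    """The same instruction repeated, differing only in the dest register."""
    return b"".join(encode(0b101000, pick_dest(), 0x1F0) for _ in range(count))

rng = random.Random(1)  # fixed seed for a repeatable run
fixed = build(2000, lambda: 0x1E0)                        # always the same register
varied = build(2000, lambda: rng.randrange(0x1E0, 0x1F0)) # one of 16 registers

fixed_size = len(zlib.compress(fixed, 9))
varied_size = len(zlib.compress(varied, 9))
print(fixed_size, varied_size)
```

Both byte strings describe the same work, yet the one with varied registers compresses worse, so the compressor is measuring the register choice as well as the algorithm.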

Similar issues will occur with preset mass data... I would much like to know how the sine tables in ROM would compress; they obviously carry nearly no mentionable information at all...

VIRAND · 2008-01-12 20:42

Ok, I see... Unlike 8-bit systems, 32-bit instructions are inefficient. It's sort of like the "microcode" in CISC chips.
Since 32K was plenty for 8-bit machines, it might be a good project to make and use a 6502 emulator in PASM for example,
since 6502 had not too many instructions and incredible programs were written for it in 32K,
such as on Apple II, C64, Atari, etc.

In other words, coding in Assembler on 8 bit systems often made smaller faster programs
than in High Level Language, but compact code is Not a given advantage in this case of PASM vs. Spin.

hippy · 2008-01-13 00:47

Yes, I'd say that was about right. Propeller 32-bit instructions are very efficient in what they do but not efficient in terms of memory use. The Spin Interpreter is an equivalent of a 6502 emulator written in PASM.
