Spin vs. C Speed

Sal Ammoniac · 2016-08-15 20:19

Is code written in C any faster than code written in Spin on the Propeller?

I've written an SPI master driver in Spin and the maximum transfer rate I can get is 14K bps. If I recode this in C can I get faster rates?

(I know I can get faster rates in PASM, but all I'm looking for here is a quick and dirty SPI routine to test some slave code that I've written for another platform.)

kwinn · 2016-08-15 20:24

Sal Ammoniac wrote: »

Is code written in C any faster than code written in Spin on the Propeller?

I've written an SPI master driver in Spin and the maximum transfer rate I can get is 14K bps. If I recode this in C can I get faster rates?

(I know I can get faster rates in PASM, but all I'm looking for here is a quick and dirty SPI routine to test some slave code that I've written for another platform.)

C should be faster.

David Betz · 2016-08-15 20:28

Sal Ammoniac wrote: »

Is code written in C any faster than code written in Spin on the Propeller?

I've written an SPI master driver in Spin and the maximum transfer rate I can get is 14K bps. If I recode this in C can I get faster rates?

(I know I can get faster rates in PASM, but all I'm looking for here is a quick and dirty SPI routine to test some slave code that I've written for another platform.)

C using the COG memory model will be much faster than Spin but you can only build very small programs since they have to fit in 512 longs of COG memory.

C using the LMM memory model will also be faster than Spin but you won't be able to fit as much code in 32k of hub memory since LMM code is less dense than Spin bytecodes.

C using the CMM memory model will be about the same speed as Spin and use about the same amount of memory.

Peter Jakacki · 2016-08-15 22:49

For any Q&D tests just use Tachyon Forth. The SPI speed is 4 MHz and it is all high-level and interactive.

To set up the SPI pins with SCK on 0, MOSI on 1, MISO on 2, and SS on 3, just type:
&03.02.01.00 SPIPINS

To write 8 bits at a time, say $1A type:
$1A SPIWRB
and to release SS, which is automatically enabled on any transfer, type SPICE

A snippet of code that writes an SPI long to the WIZnet W5500 chip:

pub L! ( long wizadr -- )
    SPIWR16 DROP                        --- write address
    wrc SPIWRB DROP
    SPIWR SPIWR SPIWR SPIWR DROP        --- write 4 bytes (long)
    SPICE
    ;

DavidZemon · 2016-08-15 23:25

Sal Ammoniac wrote: »

Is code written in C any faster than code written in Spin on the Propeller?

I've written an SPI master driver in Spin and the maximum transfer rate I can get is 14K bps. If I recode this in C can I get faster rates?

(I know I can get faster rates in PASM, but all I'm looking for here is a quick and dirty SPI routine to test some slave code that I've written for another platform.)

I like kwinn's answer, short and sweet. All of the different models available via PropGCC are faster than Spin.

However, since you're just looking to test some slave code on another chip, you might be better off switching libraries within your current language rather than switching languages (and therefore toolchains). This post mentions that the Propeller Tool ships with an assembly version of the SPI library, meaning you wouldn't have to code it up, so it should be very easy to test with. Look for SPI_ASM.spin.

And finally, if you're going to jump to PropGCC, PropWare's SPI library will run at 4 MHz in CMM mode. An example of writing to another microcontroller and expecting an echo response would look like this.

David Betz · 2016-08-16 00:56

DavidZemon wrote: »

I like kwinn's answer, short and sweet. All of the different models available via PropGCC are faster than Spin.

I'm not sure this is entirely accurate. I've seen examples posted showing Spin being faster than CMM in at least a few cases. However, CMM is usually at least a little faster.

Heater. · 2016-08-16 08:37

Spin: Very slow.

C: Faster but too big.

C with CMM: Perhaps faster and smaller. Who knows actually.

Forth: SPIWR SPIWR SPIWR SPIWR DROP ......

So PASM it is.

David Betz · 2016-08-16 09:02

Heater. wrote: »

Spin: Very slow.

C: Faster but too big.

C with CMM: Perhaps faster and smaller. Who knows actually.

Forth: SPIWR SPIWR SPIWR SPIWR DROP ......

So PASM it is.

However, PASM is also too big. What then? Well, I guess the answer has to be a combination of more than one of the above. Spin+PASM has certainly been used effectively in many projects, and CMM+PASM could probably be used effectively as well. I believe it is possible to combine assembly with Tachyon too, although it is probably not necessary there as often, since Tachyon itself is quite fast compared with Spin or C (unless you use the COG memory model with C). Anyway, I realize that you weren't completely serious in your post. I agree that there is no clear choice if you want to use a single language in your program.

Peter Jakacki · 2016-08-16 09:43

Tachyon: timing how long it takes to erase 1,024 longs with SPI.

0 1024 LAP FOR SPIWR SPIWR SPIWR SPIWR NEXT LAP .LAP 11.061ms ok

or 2.7us/byte which at this speed is not much different from PASM.

David Betz · 2016-08-16 09:50

Peter Jakacki wrote: »
Tachyon: timing how long it takes to erase 1,024 longs with SPI.
0 1024 LAP FOR SPIWR SPIWR SPIWR SPIWR NEXT LAP .LAP 11.061ms ok
or 2.7us/byte which at this speed is not much different from PASM.

There is no doubt that that is both fast and powerful. Unfortunately, it is also opaque to many people who haven't yet "drunk the Kool-Aid". :-)

David Betz · 2016-08-16 10:25

Peter Jakacki wrote: »
Tachyon: timing how long it takes to erase 1,024 longs with SPI.
0 1024 LAP FOR SPIWR SPIWR SPIWR SPIWR NEXT LAP .LAP 11.061ms ok
or 2.7us/byte which at this speed is not much different from PASM.

What do LAP and .LAP do? I can't find them in the Tachyon glossary.

Edit: Never mind. I found them on a second attempt. I have no idea why my first search failed. Sorry for the false alarm!

Peter Jakacki · 2016-08-16 11:27

David Betz wrote: »
Peter Jakacki wrote: »
Tachyon: timing how long it takes to erase 1,024 longs with SPI.
0 1024 LAP FOR SPIWR SPIWR SPIWR SPIWR NEXT LAP .LAP 11.061ms ok
or 2.7us/byte which at this speed is not much different from PASM.
There is no doubt that that is both fast and powerful. Unfortunately, it is also opaque to many people who haven't yet "drunk the Kool-Aid". :-)

Yes, once indoctrinated, most people find it difficult to accept new ideas or different ways of doing things. Imagine if people were indoctrinated in "Forth" first!!

David Betz · 2016-08-16 12:09

Peter Jakacki wrote: »
David Betz wrote: »
Peter Jakacki wrote: »
Tachyon: timing how long it takes to erase 1,024 longs with SPI.
0 1024 LAP FOR SPIWR SPIWR SPIWR SPIWR NEXT LAP .LAP 11.061ms ok
or 2.7us/byte which at this speed is not much different from PASM.
There is no doubt that that is both fast and powerful. Unfortunately, it is also opaque to many people who haven't yet "drunk the Kool-Aid". :-)
Yes, once indoctrinated, most people find it difficult to accept new ideas or different ways of doing things. Imagine if people were indoctrinated in "Forth" first!!

That would certainly help. Also, I would think the power of Tachyon on the Propeller might encourage some people to give it a try even though it might be a steep learning curve. In fact, I guess it has. There seems to be a lively Tachyon discussion here on the forums. I guess you did something right! :-)

Dave Hein · 2016-08-16 12:49

Has anybody ever written a C to Forth converter? Then we could get the best of both worlds -- a high level language combined with the efficient Tachyon kernel.

David Betz · 2016-08-16 12:55

Dave Hein wrote: »

Has anybody ever written a C to Forth converter? Then we could get the best of both worlds -- a high level language combined with the efficient Tachyon kernel.

I suspect that once you implement the C calling convention you will have lost most of the efficiency of Tachyon. A lot of its efficiency seems to be due to very clever programming that avoids lots of stack operations.

Dave Hein · 2016-08-16 13:11

Yes, that would be a problem with converting C to Forth. However, it should be possible to develop a HLL that efficiently translates into Forth. But the compiler would need to be able to translate HLL statements into code that avoided growing or shrinking the stack size. It does seem like it would be difficult to write a compiler that could mimic a clever human programmer.

David Betz · 2016-08-16 13:14

Dave Hein wrote: »

Yes, that would be a problem with converting C to Forth. However, it should be possible to develop a HLL that efficiently translates into Forth. But the compiler would need to be able to translate HLL statements into code that avoided growing or shrinking the stack size. It does seem like it would be difficult to write a compiler that could mimic a clever human programmer.

It's certainly beyond my ability.

ersmith · 2016-08-16 13:45

The answer depends on "which Spin" and "which C"? Traditionally Spin is implemented via the Spin bytecode interpreter in the ROM, which is indeed quite a bit slower than the LMM code that most C compilers for the Propeller produce. However, the fastspin compiler is a different implementation of the Spin language which compiles to PASM (either COG or LMM mode) and is generally somewhere in between Catalina and PropGCC in terms of speed (although results vary -- for some cases it can outperform all the C compilers, in other cases it may underperform all the C compilers).

ersmith · 2016-08-16 13:52

David Betz wrote: »

C using the CMM memory model will be about the same speed as Spin and use about the same amount of memory.

My experience is that CMM code is about twice as fast as Spin but about 10-20% larger. This is a generalization, of course, individual programs will differ, but I've run a lot of benchmarks and the result holds in most cases (for CMM programs compiled with -Os, that is). CMM also offers you the choice of compiling with FCACHE mode (-O2) and depending on the program that can provide a huge speedup.

MJB · 2016-08-16 14:01

David Betz wrote: »
Peter Jakacki wrote: »
Tachyon: timing how long it takes to erase 1,024 longs with SPI.
0 1024 LAP FOR SPIWR SPIWR SPIWR SPIWR NEXT LAP .LAP 11.061ms ok
or 2.7us/byte which at this speed is not much different from PASM.
What do LAP and .LAP do? I can't find them in the Tachyon glossary.

Edit: Never mind. I found them on a second attempt. I have no idea why my first search failed. Sorry for the false alarm!

and if it is not in the glossary, just search the source - that's what I do anyhow ;-)

ersmith · 2016-08-16 14:18

Since the question is about SPI speed, here are the results from the SPI benchmark on the Multi-Language Benchmark thread. I've used a recent version of GCC, so performance is better than on the original thread. This is time to send 512 bytes out an SPI wire with no throttling at all.

GCC COG:        114692 cycles
fastspin COG:   139348 cycles
GCC LMM:        229680 cycles
fastspin LMM:   721136 cycles
GCC CMM:       4325824 cycles
openspin:     11732816 cycles

GCC LMM benefits here from FCACHE, which fastspin hasn't implemented yet.

D.P · 2016-08-16 16:06

Funny, Peter presents the answer but it's in the wrong flavor of kool-aid.

The problem could have already been solved and the OP could have moved on.

So heater is right, just a few { ** * -> & } and the problem will magically be solved, sheesh.

David Betz · 2016-08-16 16:26

D.P wrote: »

Funny, Peter presents the answer but it's in the wrong flavor of kool-aid.

The problem could have already been solved and the OP could have moved on.

So heater is right, just a few { ** * -> & } and the problem will magically be solved, sheesh.

But the OP asked about Spin and C, not Forth. Not that there is anything wrong with Forth. It's just that it is off topic for this thread.

David Betz · 2016-08-16 16:28

ersmith wrote: »

David Betz wrote: »

C using the CMM memory model will be about the same speed as Spin and use about the same amount of memory.

My experience is that CMM code is about twice as fast as Spin but about 10-20% larger. This is a generalization, of course, individual programs will differ, but I've run a lot of benchmarks and the result holds in most cases (for CMM programs compiled with -Os, that is). CMM also offers you the choice of compiling with FCACHE mode (-O2) and depending on the program that can provide a huge speedup.

Yes, that's my experience as well. I said "about the same speed" since I remember someone posting code that was actually faster in Spin. Of course, I can't recall the thread so maybe I should just keep my mouth shut! :-)

Sal Ammoniac · 2016-08-16 16:41

I'll give GCC a try tonight and see what kind of improvement I get. I'm just using the Propeller for some quick and dirty testing as it won't be able to generate an SPI stream fast enough to test the max speed the slave can do even if I write it in PASM (the slave is implemented on an FPGA and if the timing holds it looks like it'll be able to do about 200 Mbps.)

As for Forth--I'm not a Kool-Aid drinker (as someone else on this thread so eloquently put it.) I've tried to like Forth over the years but have never been able to get my head around it. One of my best friends calls it "that hippy language".

JasonDorie · 2016-08-16 16:51

In my own experience, CMM-mode C/C++ has been as much as 2x faster than Spin. It benefits from both the cache and an optimizer, which original Spin does not have.

ersmith · 2016-08-16 20:40

Sal Ammoniac wrote: »

I'll give GCC a try tonight and see what kind of improvement I get. I'm just using the Propeller for some quick and dirty testing as it won't be able to generate an SPI stream fast enough to test the max speed the slave can do even if I write it in PASM (the slave is implemented on an FPGA and if the timing holds it looks like it'll be able to do about 200 Mbps.)

If your code is already written in Spin, why not give fastspin a try? fastspin will take your existing code and compile it to LMM that's way faster than the Spin interpreter.

If you're already a C programmer and prefer to use that, you can use spin2cpp (which is very closely related to fastspin) to convert your Spin code to C as a starting point.

ersmith · 2016-08-16 20:45

Here's an example of how fastspin converts SPIN to PASM (and ultimately to binary -- you don't have to look at the PASM at all, it's all automatic, but it's there if you want to see what the compiler did).

Original SPI code:

con
  DO = 0
  CLK = 1

pub spiout(value)
 value ><= 8            'MSB first
 repeat 8               '8 bits
   outa[DO] := value
   outa[CLK] := 1
   value >>= 1
   outa[CLK] := 0

Compiled PASM equivalent:

_spiout
        mov     _spiout__mask_0001, #1
        rev     arg1, #24
        mov     _spiout__idx__0002, #8
L__0021
        shr     arg1, #1 wc
        muxc    OUTA, _spiout__mask_0001
        or      OUTA, #2
        andn    OUTA, #2
        djnz    _spiout__idx__0002, #L__0021
_spiout_ret
        ret

Sal Ammoniac · 2016-08-17 03:28

Here are some results of a simple SPI master coded in C and built using SimpleIDE v1.0.2 (RC2) with optimization set to "-O2 Speed":

CMM: 1.33 Mbps
LMM: 1.33 Mbps
COG: 1.43 Mbps

I'm surprised to see no difference between the CMM and LMM results and am also surprised to see the COG model results so slow.

These results are much better than the 14 Kbps I was getting with Spin, but still disappointing.

Electrodude · 2016-08-17 03:33

There's probably no difference between CMM and LMM because they're both loaded into fcache only once, and then they run at full native speed out of fcache after that.

DavidZemon · 2016-08-17 03:58

You can get up to 4 Mbps (burst, not sustained) with PropWare. I haven't measured sustained and am not in a position to do so this instant. Do let me know if you try though, otherwise I'll try to give it a go tomorrow.
