Several Newbie Questions

Chuck Davis · 2015-03-02 10:45

I'm new to the Propeller (although I tried it for a while many years ago). I was wondering if someone could answer some minor questions for me:

1) On most 32-bit processors, 32-bit operations are faster than byte or word operations. Is this true with the SPIN language? I.e., given that memory is not an issue, should I prefer to use long operands as much as possible?

2) Does the SPIN interpreter fit entirely into a cog, or does it use some sort of overlay scheme? If the latter, are there any performance implications we should know about?

3) Is there a rough number of some sort available for SPIN interpreter performance (instructions per second)? Obviously this depends on complexity; I'm just looking for a ballpark... At what point do I have to resort to PASM?

4) Isn't it possible to have memory collisions, even when updating a single variable, if two cogs are trying to access it? Examples seem to talk about multiple variables, but even an operation like X=X+1 should be a problem if multiple cogs are sharing the variable.

5) Just a small gripe: why in the world is the assignment operator := instead of just =? This is the source of dozens of recompiles, especially when moving back and forth between SPIN and C.

Love the system! Once I get used to it I'll probably love it more....

Chuck

Dave Hein · 2015-03-02 10:56

1) Byte, word and long operations all run at the same speed. The only speed difference is when accessing the first 8 long variables on the stack (locals) or in the VAR section: there are compact instructions for longs in those cases.
2) Yes, the Spin interpreter fits entirely in a cog.
3) The Spin interpreter runs around 1 MIPS. You use PASM when Spin isn't fast enough.
4) Yes, an operation such as X:=X+1 could collide if two cogs are doing it. That's why the Prop supports locks.
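5) Spin uses := because that's what Chip wanted. There's no other reason. When going from C to Spin, watch out for >= and <=. They do something different in Spin: they are the assignment forms of > and < (x >= y means x := x > y), while "equal or greater" and "equal or less" are written => and =<.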
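To make point 5 concrete, here is a small C model of those Spin operators. It assumes Spin's TRUE is -1 (all 32 bits set) and FALSE is 0, and follows the Spin operator table as described above; the function names are made up for illustration and are not part of any library.

```c
#include <stdint.h>

/* Spin booleans: TRUE is -1 (all 32 bits set), FALSE is 0. */
#define SPIN_TRUE  ((int32_t)-1)
#define SPIN_FALSE ((int32_t)0)

/* Spin "x =< y" and "x => y": the comparisons C spells <= and >=. */
int32_t spin_eq_or_less(int32_t x, int32_t y)    { return x <= y ? SPIN_TRUE : SPIN_FALSE; }
int32_t spin_eq_or_greater(int32_t x, int32_t y) { return x >= y ? SPIN_TRUE : SPIN_FALSE; }

/* Spin "x <= y" is an assignment: x := x < y. */
void spin_lt_assign(int32_t *x, int32_t y) { *x = (*x < y) ? SPIN_TRUE : SPIN_FALSE; }

/* Spin "x >= y" is an assignment: x := x > y. */
void spin_gt_assign(int32_t *x, int32_t y) { *x = (*x > y) ? SPIN_TRUE : SPIN_FALSE; }
```

So typing `x >= 10` out of C habit does not just compare; it overwrites `x` with 0 or -1.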

Heater. · 2015-03-02 11:29

Spin, being interpreted from bytecodes at run time, is so slow that I would not worry about the difference between 8-, 16- or 32-bit access, if indeed there is any. Certainly a COG can read/write any of those at the same speed.

By the way, how is 8-bit access on any 32-bit CPU slower? Any pointers to examples?

If these things are an issue for your application, you are going to want to use PASM.

Yes, multiple COGs accessing the same data can be an issue. Locks can resolve this, but you will find that most applications get away without them. If you have one writer and one reader, you can get by without locks. The FullDuplexSerial object, with its input and output cyclic buffers, is a classic example of that.
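A C sketch of the one-writer/one-reader idea, in the spirit of FullDuplexSerial's buffers rather than a translation of it: the producer only ever writes `head`, the consumer only ever writes `tail`, so neither side needs a lock. The 16-byte size and all names are assumptions for illustration. It relies on each index update being a single store that the other side sees whole and in order, as hub long writes on the Propeller are; on a cached multicore CPU you would want C11 atomics instead of `volatile`.

```c
#include <stdint.h>
#include <stdbool.h>

#define RB_SIZE 16u              /* must be a power of two */
#define RB_MASK (RB_SIZE - 1u)

typedef struct {
    volatile uint8_t  buf[RB_SIZE];
    volatile uint32_t head;      /* written only by the producer */
    volatile uint32_t tail;      /* written only by the consumer */
} ring_t;

/* Producer side. Returns false when full; one slot is kept empty
   so that "full" and "empty" can be told apart. */
bool rb_put(ring_t *rb, uint8_t byte)
{
    uint32_t next = (rb->head + 1u) & RB_MASK;
    if (next == rb->tail)
        return false;                     /* full */
    rb->buf[rb->head] = byte;             /* store the data first ... */
    rb->head = next;                      /* ... then publish it by moving head */
    return true;
}

/* Consumer side. Returns false when empty. */
bool rb_get(ring_t *rb, uint8_t *out)
{
    if (rb->tail == rb->head)
        return false;                     /* empty */
    *out = rb->buf[rb->tail];             /* read the data first ... */
    rb->tail = (rb->tail + 1u) & RB_MASK; /* ... then free the slot */
    return true;
}
```

In use, one cog would only call `rb_put` and the other only `rb_get`; that ownership rule is what makes the lock unnecessary.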

":=" comes from the ancient and long-forgotten Pascal programming language, the language used to create the Propeller Tool. I guess it's what Chip likes to use.

Those other operators are some weird aberration nobody can explain. :)

ksltd · 2015-03-02 12:52

1) No difference in performance between byte, word or long accesses, either via the Spin interpreter or with the underlying assembly language instructions.
2) Yes
3) No
4) Yes; use of locks is required for synchronization as the memory subsystem has no read-modify-write primitives
5) C has perhaps the worst syntax and most ambiguous semantics of any language, ever. The use of := for assignment predates "C" by many, many years. The use of the == operator, and the errors that result when one uses = in a value context, are absurd shortcomings of both C and C++. The better question is why Spin uses == for its equality operator instead of simply =.
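To see why point 4 matters even for a single `X := X + 1`, here is a deterministic C simulation (not real concurrency) of two cogs each doing a read, an add, and a write on one shared hub variable. The interleaving is hand-picked to show the lost update. The lock is modeled loosely on LOCKSET/LOCKCLR semantics (LOCKSET sets the lock and returns its previous state); the function and variable names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Shared "hub" state. */
static int32_t shared_x;
static bool    lock_bit;

/* Model of LOCKSET: set the lock and return what it was before. */
static bool lock_set(void)   { bool was = lock_bit; lock_bit = true; return was; }
/* Model of LOCKCLR: release the lock. */
static void lock_clear(void) { lock_bit = false; }

/* No lock: both cogs read before either writes back. */
int32_t race_without_lock(void)
{
    shared_x = 0;
    int32_t a = shared_x;        /* cog A reads 0 */
    int32_t b = shared_x;        /* cog B reads 0 */
    shared_x = a + 1;            /* cog A writes 1 */
    shared_x = b + 1;            /* cog B writes 1: A's increment is lost */
    return shared_x;
}

/* With a lock: the owner finishes its read-modify-write before
   the other cog may start its own. */
int32_t race_with_lock(void)
{
    shared_x = 0;
    lock_bit = false;

    if (lock_set())              /* cog A: lock was free, A now owns it */
        return -1;               /* cannot happen in this scripted order */
    int32_t a = shared_x;        /* cog A reads 0 */

    bool b_must_wait = lock_set();   /* cog B: lock already set, B spins */

    shared_x = a + 1;            /* cog A writes 1 */
    lock_clear();                /* cog A releases */

    while (b_must_wait)
        b_must_wait = lock_set();    /* cog B retries until it owns the lock */
    int32_t b = shared_x;        /* cog B reads 1 */
    shared_x = b + 1;            /* cog B writes 2 */
    lock_clear();                /* cog B releases */

    return shared_x;
}
```

Two increments without the lock can leave the value at 1; with the lock it ends at 2.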

pjv · 2015-03-02 14:41

Dave Hein wrote: »

3) The Spin interpreter runs around 1 MIPS. You use PASM when Spin isn't fast enough.

Dave;

I'm not sure where these numbers come from. My experience is that Spin instructions are much slower than that, often in the 20 to 50 usec range.

For a Prop running at 80 MHz, I measure these times for the following code:

     repeat              ' 11 usec
        out ^= 1         '  8 usec
        out ^= 1         '  8 usec
        out ^= 1         '  8 usec
        out ^= 1         '  8 usec

So, a very simple instruction is around 10 usec; 10 times what you are saying.

What gives here?

Cheers,

Peter (pjv)

Mike Green · 2015-03-02 15:06

@pjv,
The Spin bytecodes take on the order of a couple of microseconds each to execute. Each line of the Spin source you cited compiles to several of them. It's like stating instruction execution times but giving examples in C source code: you have to look at the compiler-generated code.

Dave Hein · 2015-03-02 15:18

I thought about saying 500 KIPS instead of 1 MIPS, but I was being generous. I timed "i := 1" and it takes 228 cycles, or 2.85 usec. "i := 1" requires two bytecodes, one to load the value of 1 on the stack, and the second one to store the value to "i". So that's 1.425 usec per instruction, which works out to 700 KIPS. Some bytecodes take longer to execute, so 500 KIPS might be fair. REPEAT and ^= are extended instructions, so they will take a little longer. I don't think there are many instructions that take 20 to 50 usec. Even a multiply takes only 15 usec.
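Dave's arithmetic, written as a small C helper so the same calculation can be re-run for other measurements. The 80 MHz clock and the 228-cycle, two-bytecode figures come from the post; the function names are made up for illustration.

```c
#include <stdint.h>

#define CLKFREQ_HZ 80000000.0   /* 80 MHz system clock, as in the thread */

/* Convert a measured cycle count to microseconds. */
double cycles_to_us(uint32_t cycles)
{
    return cycles / (CLKFREQ_HZ / 1e6);   /* 80 cycles per microsecond */
}

/* Bytecodes per second, given the cycles measured for one statement
   and the number of bytecodes that statement compiles to. */
double bytecodes_per_sec(uint32_t cycles, unsigned bytecodes)
{
    double us_per_bytecode = cycles_to_us(cycles) / bytecodes;
    return 1e6 / us_per_bytecode;
}
```

For "i := 1", `cycles_to_us(228)` gives 2.85 us and `bytecodes_per_sec(228, 2)` gives roughly 701,754, the "700 KIPS" above. That is a rate for this particular pair of bytecodes, not a general figure.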

Chuck Davis · 2015-03-02 15:18

pjv wrote: »

So, a very simple instruction is around 10 usec; 10 times what you are saying.

What gives here?

I ran a dumb test:

     repeat 1_000_000

and got 16.3 seconds, or about 61,000 loop iterations per second.

Other basics:
Counter++ : about 40,000 per second
Counter := Counter + 1 : about 21,400 per second (pays to use ++, I guess)
Counter := Counter * 3 : about 10,700 per second

Of course if you have 8 cogs going, you can multiply by 8, if you can divide your work up that way.
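Chuck's figures as arithmetic: a loop count divided by elapsed time gives a per-statement rate (which still includes the loop's own overhead), and the eight-cog figure is only an upper bound that assumes the work splits perfectly. The function names are hypothetical.

```c
/* Operations per second from a timed loop. */
double ops_per_sec(double iterations, double seconds)
{
    return iterations / seconds;
}

/* Best case if the work divides evenly across `cogs` cogs (8 on a P1);
   real programs rarely scale this cleanly. */
double ops_per_sec_ncogs(double iterations, double seconds, unsigned cogs)
{
    return ops_per_sec(iterations, seconds) * cogs;
}
```

For the empty loop, `ops_per_sec(1e6, 16.3)` is about 61,350 per second, and `ops_per_sec_ncogs(1e6, 16.3, 8)` is about 490,800 per second at best.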

As far as byte vs. long goes, I read somewhere that (at least in C) the compiler has to convert the byte to a long, do the math, and convert it back to a byte. Thus the idea that it's faster to just use longs in the first place.

Just for fun, I ran the above test for Counter := Counter + 1 with Counter as a byte instead of a long, and it only ran at 15,200 instructions per second, so apparently there is some overhead.

pjv · 2015-03-02 15:19

Hi Mike;

Thanks for the comments.

I do understand that the SPIN interpreter executes instructions internally at the native speed of the processor: 20 MIPS (at an 80 MHz clock). It takes several, perhaps many, of those to carry out a SPIN code statement. So whenever I thought of the speed of the SPIN "language", I was not thinking of the sub-statement level, as that is not terribly meaningful, at least to me. I'm interested in how fast SPIN will execute the code I'm writing, and I presumed that was the nature of the original poster's question, as he was considering using assembler.

Cheers,

Peter (pjv)

pjv · 2015-03-02 15:49

Hi Dave and Chuck;

Thanks for your input.

I really don't have a very good understanding of the complexities that SPIN deals with in executing bytecodes, or for that matter how many of those it takes to carry out any particular SPIN "instruction", or what an "extended instruction" is. So to me it's all a matter of how fast I can wiggle the port bits, or how fast I can achieve a certain result.

So I repeated my small test using a simple assignment statement instead of the previous XOR, and got the following:

    repeat              ' 6.8 usec
       outa := 1        ' 4.6 usec
       outa := 0        ' 4.6 usec
       outa := 1        ' 4.6 usec
       outa := 0        ' 4.6 usec

That certainly is a bunch faster. Still, when writing SPIN code, all I can do is allow 20-ish usec per relatively simple statement, and 100 or more for complex ones. In the end, I need a scope to see what is really going on, as I have no way to "cycle count" or "bytecode count" in SPIN. I'm just not at that level.
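Cycle counting is possible from Spin itself, and is presumably how Dave got his 228-cycle figure: read the free-running 32-bit system counter before and after the code and subtract. The C sketch below shows only the arithmetic, with hypothetical names. Unsigned 32-bit subtraction gives the right answer even if the counter wraps between the two reads, and the cost of the measurement itself (found by timing an empty section) should be taken off.

```c
#include <stdint.h>

/* Elapsed cycles between two reads of a free-running 32-bit counter.
   Unsigned subtraction is modulo 2^32, so a wrap between reads is harmless.
   `overhead` is the cycles an empty measurement takes (hypothetical input). */
uint32_t elapsed_cycles(uint32_t start, uint32_t end, uint32_t overhead)
{
    uint32_t raw = end - start;
    return raw > overhead ? raw - overhead : 0;  /* never go below zero */
}
```

For example, a start of 0xFFFFFF00 and an end of 0x00000010 still yield 0x110 (272) cycles, even though the counter rolled over in between.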

Cheers,

Peter (pjv)

DavidZemon · 2015-03-02 16:01

Welcome Chuck! I hope you're enjoying the Propeller as much as the rest of us!

If you're already familiar with C, you might feel more at home with PropGCC - the port of GCC 4.6 for the Propeller.

Heater. wrote: »

By the way, how is 8 bit access on a any 32 bit CPU slower? Any pointers to examples.

The only example I've seen where X-bit access is slower than 2X-bit access comes from an assignment such as "mySmallVar = myBigVar;" - the compiler will mask off the extra bits, requiring an extra instruction or two.
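A small C illustration of that narrowing step, assuming 8-bit `uint8_t` values held in 32-bit registers: the arithmetic happens at full width, and storing back into the byte variable needs a truncation, which on a 32-bit machine may cost an extra AND or zero-extend instruction (whether it does depends on the compiler and target). The function names are made up.

```c
#include <stdint.h>

/* Assigning a 32-bit value to an 8-bit variable keeps only the low 8 bits;
   compilers typically implement this with a mask or a zero-extension. */
uint8_t narrow(uint32_t big)
{
    return (uint8_t)big;
}

/* Incrementing a byte counter: C promotes to int for the add,
   then the store wraps the result back into 0..255. */
uint8_t bump(uint8_t counter)
{
    return (uint8_t)(counter + 1);
}
```

`narrow(0x12345678)` keeps only 0x78, and `bump(255)` wraps to 0. That wrap is the "convert it back to a byte" step Chuck mentioned.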

ksltd · 2015-03-02 16:17

The concept of MIPS makes no sense when talking about a p-code interpreter. But if you insist, the execution time of p-code ops is sufficiently varied that everyone can be right.

NWCCTV · 2015-03-02 16:59

SwimDude0614 wrote: »

If you're already familiar with C, you might feel more at home with PropGCC

Or maybe even SimpleIDE.

DavidZemon · 2015-03-02 18:07

Oh yes. I didn't mean to exclude SimpleIDE in that. In my head, "PropGCC" is an umbrella that encompasses SimpleIDE, manual invocation of propeller-elf-gcc, makefiles, CMake, PropWare, etc. Anything involving PropGCC in any way.
