performance SPIN/ASM

mynet43 · 2007-09-30 18:26

Rarely have I seen so many negative comments come out of a simple speed test :)

I think we all know the Propeller has limitations; they're well documented. But within those limitations, I feel it's a fantastic processor.

The languages are easy to learn and well documented.

Even Spin is fast enough to do a respectable amount of work. I'm working on a rocket tracking program that samples two ADC ports 128 times/sec and runs the readings through a Kalman filter to calculate altitude, velocity and acceleration, which it does very well using the floating point routines. I was prepared to write these in assembly language, but it turns out that Spin is fast enough and leaves enough spare time to log a considerable amount of data, as well as to detect liftoff and apogee, fire stages at the right times and deploy the parachute.
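For readers wondering what such a filter involves, here is a minimal sketch of a three-state (altitude, velocity, acceleration) constant-acceleration Kalman filter running at 128 samples/s. It is not Jim's code, and it is in Python for readability rather than Spin. It assumes a single altitude measurement per sample (already converted from the ADC readings), and the noise parameters `r`, `q_accel` and `p0` are placeholder values that would need tuning.

```python
"""Constant-acceleration Kalman filter sketch (hypothetical, not Jim's code).

State: [altitude, velocity, acceleration]. Assumes one altitude
measurement per sample at 128 samples/s; noise values are placeholders.
"""

DT = 1.0 / 128.0  # sample period for 128 samples/s


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


class AltitudeKalman:
    def __init__(self, r=1.0, q_accel=1.0, p0=100.0):
        self.x = [0.0, 0.0, 0.0]  # altitude, velocity, acceleration
        self.p = [[p0 if i == j else 0.0 for j in range(3)] for i in range(3)]
        self.r = r  # altitude measurement variance (assumed)
        # Process noise only on acceleration, so the estimate can follow
        # changes such as motor burnout. The value is a tuning knob.
        self.q = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, q_accel * DT]]
        # Constant-acceleration motion model over one sample period.
        self.f = [[1.0, DT, 0.5 * DT * DT], [0.0, 1.0, DT], [0.0, 0.0, 1.0]]

    def predict(self):
        self.x = [sum(self.f[i][k] * self.x[k] for k in range(3)) for i in range(3)]
        fpft = matmul(matmul(self.f, self.p), transpose(self.f))
        self.p = [[fpft[i][j] + self.q[i][j] for j in range(3)] for i in range(3)]

    def update(self, z):
        # H = [1, 0, 0]: only altitude is measured, so S is a scalar and
        # no matrix inverse is needed.
        s = self.p[0][0] + self.r
        k = [self.p[i][0] / s for i in range(3)]
        innovation = z - self.x[0]
        self.x = [self.x[i] + k[i] * innovation for i in range(3)]
        row0 = list(self.p[0])
        # P = (I - K H) P
        self.p = [[self.p[i][j] - k[i] * row0[j] for j in range(3)] for i in range(3)]

    def step(self, z):
        """Process one altitude sample; return (altitude, velocity, acceleration)."""
        self.predict()
        self.update(z)
        return tuple(self.x)
```

Each sample costs one predict and one scalar update, so the per-sample work stays small; feed it one altitude value every 1/128 s with `kf.step(z)`.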

Debugging is easy. With the built-in VGA and TV display support it's easy to track data flow, even in assembly language.

I'm now designing my third board for the Propeller. It's a 1" x 2" board, small enough for robotics, rockets or other control functions. On this little board I have: a Prop Plug connector, VGA out, TV out, 3 voltage references, 4 servo ports, 8 ADC ports, a microSD socket, 2 power ports plus a number of I/O pins. I think I'd be hard pressed to do this with any other little processor.

It's really interesting to hear the opinions of the other people on this forum, many of whom are extremely smart and experienced. We should all be able to say what we think.

I prefer to think positive, enjoy the Propeller we have, and try to be patient (not easy) waiting for the next Propeller.

All food for thought.

Jim

deSilva · 2007-09-30 18:40

I shall stop my contributions to this thread with the following remarks to Bill:

Bill Henning said...

(A) SPIN is MUCH faster than Basic stamps
(B) Due to its capabilities, we keep forgetting... but the propeller is a MICROCONTROLLER
(C) Depends on what you are doing. For a controller, it's OK. For a general purpose computer, you are right.

(ad A) Most microcontrollers are programmed in C; in the AVR community BASCOM is also very popular. Both give blazing speed from a high-level language on a 20 MHz processor. This is MY reference.

(ad B) The notion of a microcontroller is changing. Some time ago it meant $1 PICs, and they do have their merits and will live forever, like the 8051 and 6805. But demand from $1 MP3 players and $1 cell phones has raised the standards.

(ad C) ATMEL has just increased the flash memory of their 8-bit AVR model from 128 to 256 kB. You need not only masses of code but also masses of tables. Don't say: "But you can add an SD card!" Of course I can. I can add one to any micro.

And eagerly looking forward to your LMM!

mirror · 2007-09-30 22:33

Here's what I'm doing with my "toy":

- Sample 24 analog channels at 1000 samples per second
- Sample 8 digital channels at 1000 samples per second - pulse width, period, number of pulses, relative phase of pulses
- Store ALL sampled information to SD card
- Communicate with host PC over Ethernet connection
- Handle bi-directional debugging port (RS232)
- Two extra RS232 telemetry ports (TX only)

These features are all contained in ONE toy!

Some of this has been hard to write, but once written, the primary benefit that I see in the Propeller is that it is deterministic!!!

I don't get the HUB bottleneck. Communication from a cog to main memory is 1 long every 16 clocks at 80 MHz; that's 20 million bytes per second for each cog! That's 160 Mbit/s, 60% faster than 100 Mbit/s Ethernet!! That's fast enough for me.
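mirror's arithmetic can be checked in a few lines. This assumes every 16-clock hub window is actually used for a 4-byte long transfer, so it is a peak figure; any missed window lowers it.

```python
CLOCK_HZ = 80_000_000     # system clock quoted in the post
CLOCKS_PER_WINDOW = 16    # one hub window per cog every 16 clocks
BYTES_PER_LONG = 4


def peak_hub_bytes_per_second(clock_hz=CLOCK_HZ, clocks_per_window=CLOCKS_PER_WINDOW):
    """Peak hub bandwidth for one cog moving one long per hub window."""
    return clock_hz // clocks_per_window * BYTES_PER_LONG


per_cog = peak_hub_bytes_per_second()       # 20,000,000 bytes/s
per_cog_mbit = per_cog * 8 / 1_000_000      # 160.0 Mbit/s
vs_ethernet = per_cog_mbit / 100.0 - 1.0    # about 0.6, i.e. 60% above 100 Mbit/s
```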

Forget about making a super fast individual cog (large memory model stuff); it's not the point. Learn how to get those cogs to interoperate.

From my experience so far, knowing only Spin will leave you somewhat crippled when it comes to using the Propeller. You need Spin *and* assembly if you want to become a singing monk. (Singing 7, of course, because the cog IDs go from 0 to 7.)

Sapieha · 2007-09-30 23:43

Hi mirror.

The tables alone in my system take 32 KB.
SD is too slow to scan them on the fly.
And the system program has RAM + ROM of 48.

It's very optimized code, and the issue is not how fast the Propeller is (it is fast enough for me) but how much RAM it has.

PS. I can't use Spin in my system (it is too slow); I must write my "interpreter" in assembly.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nothing is impossible, there are only different degrees of difficulty.

Sapieha

Post Edited (Sapieha) : 9/30/2007 11:49:10 PM GMT

mirror · 2007-10-01 00:00

Sapieha,

I agree, the huge challenge in my application has been the RAM.

I have about 15 FIFO buffers tying my system together. The biggest challenge is making all the FIFO buffers the right size. Some of them are relatively huge (1-2 Kbytes), but others are quite small (16 bytes for the RS232 port).
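A minimal sketch of the kind of FIFO being described: a fixed-size single-producer/single-consumer ring buffer. The power-of-two capacity, the free-running head/tail counters and the `high_water` statistic are assumptions for illustration; a high-water mark is one simple way to see how deep each buffer actually gets, which helps with sizing. On the Propeller the counters would be 32-bit longs in hub RAM, and because `head - tail` is then computed modulo 2^32 they can wrap safely.

```python
class Fifo:
    """Fixed-size single-producer/single-consumer ring buffer (sketch)."""

    def __init__(self, capacity):
        # Power-of-two capacity lets indices wrap with a mask instead of a divide.
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self.buf = [0] * capacity
        self.mask = capacity - 1
        self.head = 0        # total items written (only the producer changes it)
        self.tail = 0        # total items read (only the consumer changes it)
        self.high_water = 0  # deepest fill level seen; useful for sizing

    def __len__(self):
        return self.head - self.tail

    def put(self, item):
        """Append an item; return False if the buffer is full."""
        if len(self) > self.mask:
            return False
        self.buf[self.head & self.mask] = item
        self.head += 1
        self.high_water = max(self.high_water, len(self))
        return True

    def get(self):
        """Remove and return the oldest item, or None if empty."""
        if self.head == self.tail:
            return None
        item = self.buf[self.tail & self.mask]
        self.tail += 1
        return item
```

Logging `high_water` for each buffer during a realistic run gives a measured depth to size against, instead of a guess.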

There are times that I've thought about splitting this task down the middle:
1) Input processing in one propeller
2) SD card, ethernet and RS232 comms in another propeller.

I'm using Rokicki's low-level SD driver (at the moment), but I'm not using the FAT layer; it is too slow. The SD card is not removable in my system, so it doesn't need to be PC compatible.

Sapieha · 2007-10-01 00:13

Hi mirror.

In my system I have 400 motors of 0.5 kW to control, plus all the switches on all of them, plus analog sensors.

Sapieha

Post Edited (Sapieha) : 10/1/2007 1:11:13 AM GMT

hippy · 2007-10-01 01:42

Are we not just getting bogged down in, 'the Propeller does not suit my application' ?

If we want to contrast the Propeller with other options, and identify any 'weaknesses' when set against others, that means knowing what can do the job, what does suit the application. Then we can compare the two fairly: price, support chips needed, cost of development tools and so on. To talk vaguely of the Propeller not being good enough is neither meaningful nor useful.

A Propeller does what it can, and does not do what it cannot. The Propeller is a Propeller, not something else. I see no problem with that, as it is no different to every other device which exists.

It would be nice if our chosen tool did all the jobs we'd ever like it to but that would be a notable first. It doesn't seem very constructive to keep saying the Propeller isn't suited to everything; that's a given fact. So those are probably my final words on this topic.

OzStamp · 2007-10-01 02:36

Anybody that makes silly negative comments in the very forum run by people that have
developed a sensational product... is really just putting themselves in the gutter..

Not only that.. it can be commercially very damaging..
So next time "turn brain on first ... compile... recheck maybe.. type and run "

You do not have to be smart to realize that the Propeller is great
You just need to think slightly differently.. (put a different hat on, as they say)
Some people are just stuck in that "Old fashioned .. I feel comfortable in that zone.. way"

Show me another US$12.95 (low-volume price) chip that has as much punch as the Propeller...

cheers
Ron OZ

Ken Peterson · 2007-10-01 03:05

Dedicated....

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

The more I know, the more I know I don't know. Is this what they call Wisdom?

Sapieha · 2007-10-01 08:56

Hi OzStamp.

If everyone were satisfied with everything, development would stop.

Sapieha

OzStamp · 2007-10-01 09:17

Hi Sap..

I think I understand what you're trying to say..

To suggest or have different thoughts about a certain product is fine.

But to be so negative and call a product a joke is really stupid.
Don't push me in the gutter ..

That's like going to the Microchip website/forum and openly posting comments there
saying how pathetic their little PIC chips are..
Now that would open a bigger can of worms as many millions of people
use these little rippers... (1 Propeller has eight baby PICs + more)
So what the hell is wrong with that..

So there is a place and a time...every body is entitled to voice their opinion
but try to be thoughtful of the people who have spent so much of their time and money
to come up with a truly remarkable product..

I voiced my concern about this pathetic comment because 3 people sent me PMs
and made me aware of that particular post.. so it's not just me..

Let's move on and be kind 2 each other.
Post some useful stuff and stop nagging.. to all of us.. I have better things to do.
I look at this Forum many times a day ..time permitting as I enjoy going thru the posts

Take care Sapieha.. where are you located..?

Ronald OZ

Sapieha · 2007-10-01 09:29

Hi OzStamp.

There are many (pathetic chips).
And I find the Propeller fine, but only for small projects.
The only problem I see in it is too little RAM in one cog for more complex systems.

" where are you located..? " Sweden

PS. I never said "joke".

Sapieha

Post Edited (Sapieha) : 10/1/2007 9:45:41 AM GMT

hippy · 2007-10-01 13:34

Back to Assembler, Large Memory Model, and the 32K instruction limit ...

BradC said...
Oh yeah, I forgot about the word size of assembler.. yes, this would be cutting oneself off at the knees..

Thinking about it further, if one were prepared to sacrifice a Cog to pre-emptive paging, the notion of using external EEPROM may not be too bad.

There's more than one concept of Large Memory Model, and my view has been of a simplistic paging one, allowing only JMP and CALL between pages. Minimise the 'kernel' and maximise the paging size. With the Hub split into pages it may well be possible to have the kernel simply check that the right page is there before going ahead and loading, with only minimal overhead when it is. It stuffs my concept of a CALL storing the return address in Hub rather than Cog when doing an inter-page call, but that could be overcome by using an inter-page call stack or, less efficiently, by vectoring CALLs as jumps to CALLs with jumps back again in locked, non-swappable pages.
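A toy model of the paging scheme described above, to make the control flow concrete. It is a hypothetical sketch, not hippy's kernel: pages are Python lists, the op names (`emit`, `farjmp`, `farcall`, `farret`, `halt`) are invented, and "loading a page" is just a counter. It shows the two points from the post: the kernel reloads only when the target page is not already resident, and inter-page return addresses live on a separate call stack rather than in the cog.

```python
def run_paged(pages, start_page=0, max_steps=1000):
    """Interpret a toy paged program; return (output, page_loads).

    pages: dict of page number -> list of ops. Ops (invented for this sketch):
      ("emit", value)            append value to the output
      ("farjmp", page, offset)   jump, possibly to another page
      ("farcall", page, offset)  call; return address goes on the hub call stack
      ("farret",)                return to the address on top of that stack
      ("halt",)
    """
    cog_page = None    # page currently copied into cog RAM
    loads = 0          # times the kernel had to copy a page in
    call_stack = []    # inter-page return addresses, kept outside the cog
    page, pc = start_page, 0
    out = []
    for _ in range(max_steps):
        if page != cog_page:          # kernel: is the right page resident?
            cog_page = page
            loads += 1
        op, *args = pages[page][pc]
        if op == "emit":
            out.append(args[0])
            pc += 1
        elif op == "farjmp":
            page, pc = args
        elif op == "farcall":
            call_stack.append((page, pc + 1))
            page, pc = args
        elif op == "farret":
            page, pc = call_stack.pop()
        elif op == "halt":
            return out, loads
        else:
            raise ValueError(f"unknown op {op!r}")
    raise RuntimeError("step limit reached")
```

A call from page 0 into page 1 and back costs three page loads (start, call, return), while a far jump that stays on the same page costs none.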

Bill is right (above) that it's hard to do any of this without a dedicated toolset for the job. An ideal tool would take a linear PASM program of any length, re-factor it and make it entirely LMM compliant behind the scenes. Short-term reality is likely to be macro commands and assemblers tuned to LMM and those commands. I headed that way with my own assembler but cannot even get it to boot my PASM code. With more experience under my belt I'll be going back to that.

It will be interesting to see how ImageCraft's C Compiler does its stuff.

Added :

Then there's the middle ground: a new language, compiler and Cog interpreter which is well suited to that language and can deliver increased performance over Spin. Perhaps a better balance between blazingly fast PASM and somewhat slower Spin.

Post Edited (hippy) : 10/1/2007 1:43:09 PM GMT

BradC · 2007-10-01 16:27

hippy said...
Back to Assembler, Large Memory Model, and the 32K instruction limit ...

BradC said...
Oh yeah, I forgot about the word size of assembler.. yes, this would be cutting oneself off at the knees..

Thinking about it further, if one were prepared to sacrifice a Cog to pre-emptive paging, the notion of using external EEPROM may not be too bad.

Funny you should mention that.. last night I scratched up a basic concept for exactly that.. The thought was to use the hub RAM more like an L2 cache and the cog RAM as an L1 cache, with all of the program stored externally.. it's quite a neat idea, and with the right planning and memory technology could be an interesting _prop_osition
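A rough model of the L1/L2 idea, useful for estimating how often a given page-reference pattern would go out to slow external memory. Assumptions not in the thread: cog RAM (L1) holds exactly one page, hub RAM (L2) holds `hub_pages` pages with least-recently-used replacement, and the model only counts which level satisfied each reference.

```python
from collections import OrderedDict


class TwoLevelPageCache:
    """Cog RAM as a one-page L1, hub RAM as an LRU L2, program stored externally."""

    def __init__(self, hub_pages):
        self.hub = OrderedDict()   # page -> True, ordered oldest -> newest use
        self.hub_pages = hub_pages
        self.cog_page = None
        self.stats = {"cog": 0, "hub": 0, "external": 0}

    def touch(self, page):
        """Record a reference to `page` and update both cache levels."""
        if page == self.cog_page:
            self.stats["cog"] += 1
            return
        if page in self.hub:
            self.stats["hub"] += 1
            self.hub.move_to_end(page)        # mark as most recently used
        else:
            self.stats["external"] += 1       # slow fetch from external storage
            self.hub[page] = True
            if len(self.hub) > self.hub_pages:
                self.hub.popitem(last=False)  # evict least recently used page
        self.cog_page = page                  # copy the page into cog RAM
```

Replaying a recorded page trace through `touch()` with different `hub_pages` values shows how much hub RAM the scheme needs before external fetches stop dominating.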

Ken Peterson · 2007-10-01 16:41

I've been thinking of putting tracks on my Saturn so I can excavate my back yard with it.


potatohead · 2007-10-01 16:58

(big grin here ken)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!

Fred Hawkins · 2007-10-01 17:40

mirror said...
I don't get the HUB bottleneck, communication from COG to Main memory is 1 longword every 16 clocks at 80 MHz, that's 20 million bytes per second for each cog! It's 60% faster than 100Mbps Ethernet!! That's fast enough for me.

Worth saying again.

The HUB bottleneck may exist only in programmers' minds after being dunned with warnings that the HUB can take 7..22 cycles. So my question for you: are there any tricks (interleaving, say) for getting this kind of throughput? And are there any good techniques for managing writes and reads in a way that keeps the 16-cycle heartbeat ticking along?

potatohead · 2007-10-01 17:47

I put a coupla three instructions between HUB ops and it seems to run consistently. When doing video, HUB misses show up as sparkles on the screen. The higher pixel timings more or less highlight that condition straight away. I've gotten the higher timings with the above interleaves.

BradC · 2007-10-01 18:12

potatohead said...
I put a coupla three instructions between HUB ops and it seems to run consistently.

My calcs (and testing) show that three instructions between wr/rd to/from the hub should push the timing over the 16-cycle edge and cause each hub op to be spaced at 32-cycle intervals rather than 16. By this I mean

wrlong
nop
nop
wrlong
nop
nop
wrlong

will be no faster than
wrlong
wrlong
wrlong

While
wrlong
nop
nop
nop
wrlong
nop
nop
nop
wrlong

Should in theory be twice as slow.

Am I misinterpreting what you said about interleaving instructions?
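The arithmetic behind BradC's claim can be checked with a small model. It assumes this cog's hub window comes round every 16 clocks, an aligned RDLONG/WRLONG effectively costs 8 clocks (the manual's 7 to 22 clock range includes the wait for the window), and every other instruction costs 4 clocks. These figures are modelling assumptions, chosen because they reproduce the thread's observation that two instructions fit between hub operations; the window phase of a particular cog shifts every time equally and does not change the spacing.

```python
HUB_PERIOD = 16  # clocks between this cog's hub windows
HUB_OP = 8       # assumed effective clocks for an aligned RDLONG/WRLONG
PLAIN_OP = 4     # clocks for an ordinary instruction (NOP, MOV, SHL, ...)


def finish_time(program):
    """Clock at which a straight-line sequence of "hub"/"op" entries finishes."""
    t = 0
    for instr in program:
        if instr == "hub":
            t = -(-t // HUB_PERIOD) * HUB_PERIOD  # stall until the next window
            t += HUB_OP
        elif instr == "op":
            t += PLAIN_OP
        else:
            raise ValueError(f"unknown entry {instr!r}")
    return t


def hub_spacing(pad):
    """Steady-state clocks between hub ops with `pad` ordinary instructions between them."""
    unit = ["hub"] + ["op"] * pad
    return finish_time(unit * 3) - finish_time(unit * 2)
```

Under this model `[hub_spacing(p) for p in range(8)]` is `[16, 16, 16, 32, 32, 32, 32, 48]`: zero, one or two instructions between hub ops all give 16-clock spacing, and three does double it, as BradC expected.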

potatohead · 2007-10-01 18:20

No.

It's two, if I can get away with it. Two is the best. Sometimes I'll do three or more just because it's better to waste the time and be getting something done, than not, or order of ops forces the matter. All depends on what has to happen in the loop. If there is a branch, it gets more complex. I'll generally get the loop running, then crank the pixel clock, so that I can see the misses, then either cull instructions, combine them, move them, etc... until it runs nicely.

Another thing I do is get a loop running, then start adding nops until I see the miss, so I've an idea how close it's running to the edge.

BradC · 2007-10-01 18:30

Yeah, by my math, it's either 2 or 6. If you can't fit it in 2, you may as well jam 6 in there as the system will sit and wait for 25 cycles anyway.
<notes that down for future reference>

I agree with the nop padding.. I do that too, but if you are doing more than 3 nops, then a mov x, cnt | waitcnt x, y can give you much finer accounting.
I can increase it a cycle at a time until it either misbehaves or (as in most of my loops) misses the waitcnt target and locks up for ~1 minute.

Nice to know how much breathing room you have available :)
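The "2 or 6" rule can be derived from simple timing assumptions: hub windows every 16 clocks, 8 clocks for an aligned hub operation, 4 clocks for any other instruction (modelling assumptions, not figures quoted in the thread). With m instructions between two hub operations, the spacing T(m) and the most instructions that fit in k windows are:

```latex
T(m) = 16\left\lceil \frac{8 + 4m}{16} \right\rceil,
\qquad
8 + 4m \le 16k \;\iff\; m \le 4k - 2
```

So m = 2, 6 and 10 fill 16, 32 and 48 clocks exactly. At 32-clock spacing the time between one hub operation finishing and the next window is 32 - 8 = 24 clocks, room for six instructions; with only three in there, 12 of those clocks are spent waiting.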

potatohead · 2007-10-01 18:37

"3 nops, then a mov x, cnt | waitcnt x, y can give you much finer accounting."

Nice!

Added to playbook.

BradC · 2007-10-01 18:52

oops.. there is an add in there.. if you don't add at least #5 it'll die

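mov x, cnt      ' capture the current system counter
add x, #5       ' the target must still be ahead of cnt when waitcnt runs
waitcnt x, y    ' wait until cnt = x, then x := x + y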

But yeah.. I live by that one.. :)
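Why a missed target costs "~1 minute": WAITCNT waits for the 32-bit system counter to equal the target, so a target that has already gone by is only reached after the counter wraps. A small model of that, assuming an 80 MHz clock. Exactly when the hardware samples CNT relative to the MOV and ADD is not modelled, so this does not tell you the minimum safe ADD value.

```python
CLOCK_HZ = 80_000_000
CNT_MODULUS = 1 << 32  # CNT is a free-running 32-bit counter


def waitcnt_stall(now, target):
    """Clocks spent waiting if the comparison starts at `now` (simplified model).

    WAITCNT waits until CNT equals the target, so a target that has
    already gone by is only reached again after CNT wraps around.
    """
    return (target - now) % CNT_MODULUS


def next_target(target, delta):
    """What WAITCNT x, y leaves in x afterwards: x + y, wrapping at 32 bits."""
    return (target + delta) % CNT_MODULUS


missed = waitcnt_stall(now=1_000, target=999)   # target passed by one clock
lockup_seconds = missed / CLOCK_HZ              # about 53.7 s: the "~1 minute"
```

The automatic `x := x + y` is also what makes `waitcnt x, y` convenient in periodic loops: each wait is measured from the previous target rather than from whenever the loop body happened to finish.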

deSilva · 2007-10-01 19:41

I wonder why you don't COUNT the ticks? There is really no need for ANY speculation.

This is one of the best features of the Prop!

mirror · 2007-10-01 21:46

When I'm designing tight assembler loops with a number of hub operations, I first write the code so that it's logically correct, and then I reorganise it to interleave 2 instructions between every hub operation.

I don't see ANY reason whatsoever to use "NOP"s. You gain absolutely nothing, and it costs 2 assembler instructions in the process. To re-iterate:

wrlong
wrlong

will run just as fast / slow as

wrlong
nop
nop
wrlong

The real benefit is when you do something like:

wrlong
instr a
instr b
wrlong

then you get to do some useful work with that otherwise idle time.

There are valid places to use NOP instructions to solve timing issues, but "synchronising" hub instructions is NOT one of those cases. I've yet to use a NOP anywhere.

deSilva · 2007-10-01 22:55

There was an example the other day:
Polling 32-bit words asynchronously from a VERY fast ADC, at around 6 MHz.
You had around 3 instructions per sample, but not exactly; you were "a little bit" too fast at 80 MHz.
We unrolled the loop of 3x32 instructions and identified two or three places to insert a NOP to delay the polling...
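One way to decide where such NOPs go is error diffusion in the style of Bresenham's line algorithm: walk through the unrolled loop and insert a NOP before any poll that would otherwise run half a NOP (2 clocks) or more ahead of its ideal time. This is a sketch of that idea, not the method from the linked analysis; the 6.5 MHz rate used below is a hypothetical figure, the loop-back jump at the end of the unrolled block is ignored, and only code that runs too fast is handled.

```python
def nop_positions(clock_hz, sample_hz, instrs_per_sample=3, samples=32,
                  clocks_per_instr=4):
    """Pick where to put NOPs in an unrolled polling loop (sketch).

    Keeps each poll less than half a NOP ahead of its ideal time
    k * period. Returns the sample indices that get a NOP in front of them.
    """
    period = clock_hz / sample_hz                 # ideal clocks per sample
    body = instrs_per_sample * clocks_per_instr   # clocks per unpadded sample
    positions = []
    for k in range(samples):
        code_time = k * body + len(positions) * clocks_per_instr
        # If this poll would be too early, delay it with one or more NOPs.
        while k * period - code_time >= clocks_per_instr / 2:
            positions.append(k)
            code_time += clocks_per_instr
    return positions
```

With these assumptions `nop_positions(80_000_000, 6_500_000)` returns `[7, 20]`: two NOPs in 32 samples, in line with the "two or three" above.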

It is a very instructive thread with a brilliant analysis by deSilva

http://forums.parallax.com/showthread.php?p=671376

Post Edited (deSilva) : 10/1/2007 11:07:31 PM GMT

Fred Hawkins · 2007-10-01 23:20

Considering: on one hand, deSilva head, on the other, Mike Green head after his coronation* thread...

And the winner is:

*albeit unasked for

Seriously, thanks to all, even deS, for the WRLONG/RDLONG timings.

BradC · 2007-10-02 03:22

mirror said...
When I'm designing tight assembler loops with a number of hub operations, I first write the code so that it's logically correct, and then I reorganise it to interleave 2 instructions between every hub operation.

I don't see ANY reason whatsoever to use "NOP"s. You gain absolutely nothing, and it costs 2 assembler instructions in the process. To re-iterate:

No, I think perhaps I was not as clear as I might have been. I don't use nops; they were just there as examples of how you can interleave instructions without slowing the code execution down.

I could also have used mov, shl, or any other instruction. nop was just easier to type :)

mirror · 2007-10-02 04:14

BradC said...

mirror said...
When I'm designing tight assembler loops with a number of hub operations, I first write the code so that it's logically correct, and then I reorganise it to interleave 2 instructions between every hub operation.

I don't see ANY reason whatsoever to use "NOP"s. You gain absolutely nothing, and it costs 2 assembler instructions in the process. To re-iterate:

No, I think perhaps I was not as clear as I might have been. I don't use nops; they were just there as examples of how you can interleave instructions without slowing the code execution down.

I could also have used mov, shl, or any other instruction. nop was just easier to type :)

That's OK. I guess I don't want to see new programmers inserting NOPs because they think they're needed to make it work right - there are other processors that do need such instructions to keep the instruction pipeline working correctly. On the Propeller, those instructions are not needed for correct operation.

In only one case is an extra instruction necessary for correct operation: if you're using self-modifying code to change an instruction that is about to be executed, you need at least one other instruction between the modifying instruction and the modified one, because the next instruction has already been fetched.
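The reason for that extra instruction is the cog's instruction prefetch: by the time one instruction executes, the next one has already been fetched, so overwriting it has no effect on this pass. A toy model of that behaviour (one-instruction prefetch only, no timing; the op names are invented for illustration):

```python
HALT = ("halt",)


def run_with_prefetch(program):
    """Toy cog with a one-instruction prefetch; returns the emitted values.

    While instruction pc executes, instruction pc + 1 has already been
    fetched, so overwriting pc + 1 only matters the next time it is fetched.
    Ops (invented for this sketch): ("emit", v), ("patch", addr, new_op), ("halt",).
    """
    mem = list(program)
    out = []
    pc = 0
    fetched = mem[0]
    while True:
        current = fetched
        # Prefetch the next instruction *before* the current one executes.
        fetched = mem[pc + 1] if pc + 1 < len(mem) else HALT
        if current[0] == "emit":
            out.append(current[1])
        elif current[0] == "patch":
            mem[current[1]] = current[2]
        elif current[0] == "halt":
            return out
        else:
            raise ValueError(f"unknown op {current!r}")
        pc += 1
```

Patching the very next instruction leaves the old version running on this pass; with one instruction in between, the new version takes effect.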

deSilva · 2007-10-02 06:04

If in doubt, consult deSilva's Tutorial, especially Sidetrack F
