performance SPIN/ASM - Page 2 — Parallax Forums

performance SPIN/ASM

Comments

  • mynet43mynet43 Posts: 644
    edited 2007-09-30 18:26
    Rarely have I seen so many negative comments come out of a simple speed test :)

    I think we all know the Propeller has limitations, they're well documented. But within those limitations, I feel it's a fantastic processor.

    The languages are easy to learn and well documented.

    Even Spin is fast enough to do a respectable amount of work. I'm working on a rocket tracking program that samples two ADC ports 128 times/sec and runs these through a Kalman filter to calculate altitude, velocity and acceleration, which it does very well, using the floating point routines. I was prepared to write these in assembly language, but it turns out that Spin is fast enough and leaves enough extra time to log a considerable amount of data, as well as to detect liftoff and apogee, fire stages at the right times and deploy the parachute.

    Debugging is easy. With built-in VGA and TV display, it's easy to track data flow, even in assembly language.

    I'm now designing my third board for the Propeller. It's a 1" x 2" board, small enough for robotics, rockets or other control functions. On this little board, I have: prop plug conn, vga out, tv out, 3 voltage references, 4 servo ports, 8 adc ports, uSD socket, 2 power ports plus a number of I/O pins. I think I'd be hard pressed to do this with any other little processor.

    It's really interesting to hear the opinions of the other people on this forum, many of whom are extremely smart and experienced. We should all be able to say what we think.

    I prefer to think positive, enjoy the Propeller we have, and try to be patient (not easy) waiting for the next Propeller.

    All food for thought.

    Jim
  • deSilvadeSilva Posts: 2,967
    edited 2007-09-30 18:40
    I shall stop my contributions to this thread with the following remarks to Bill:
    Bill Henning said...

    (A) SPIN is MUCH faster than Basic stamps
    (B) Due to its capabilities, we keep forgetting... but the propeller is a MICROCONTROLLER
    (C) Depends on what you are doing. For a controller, its ok. For a general purpose computer, you are right.

    (ad A) Most microcontrollers are programmed in C; among the AVR community BASCOM is very popular. All give blazing fast speed from a high level language with a 20 MHz processor. This is MY reference.

    (ad B) The notion of a microcontroller is changing. It was $1 PICs some time ago, and they do have their merits and will live forever, like the 8051 and 6805. But due to demand from $1 MP3 players and $1 cell phones, the standards have grown.

    (ad C) ATMEL has just increased the FLASH memory of their 8-bit AVR model from 128 to 256 kB. You need not only masses of code but also masses of tables. Don't say: "But you can add an SD card!" Of course I can. I can add it to any micro.


    And eagerly looking forward to your LMM!
  • mirrormirror Posts: 322
    edited 2007-09-30 22:33
    Here's what I'm doing with my "toy":

    - Sample 24 analog channels at 1000 samples per second
    - Sample 8 digital channels at 1000 samples per second - pulse width, period, number of pulses, relative phase of pulses
    - Store ALL sampled information to SD card
    - Communicate with host PC over Ethernet connection
    - Handle bi-directional debugging port (RS232)
    - Two extra RS232 telemetry ports (TX only)

    These features are all contained in ONE toy!

    Some of this has been hard to write, but once written, the primary benefit that I see in the propeller is that it is deterministic!!!

    I don't get the HUB bottleneck: communication from COG to main memory is 1 longword every 16 clocks at 80 MHz, which is 20 million bytes per second for each cog! At 160 Mbps, it's 60% faster than 100Mbps Ethernet!! That's fast enough for me.
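
    The arithmetic behind those figures can be checked with a minimal Python sketch. It assumes only the numbers quoted in this post (80 MHz clock, one 32-bit long per 16-clock hub window); the function name `hub_throughput` is made up for illustration, and the result is a peak rate, not a measurement.

```python
def hub_throughput(clock_hz=80_000_000, clocks_per_window=16, bytes_per_long=4):
    """Peak per-cog hub transfer rate, assuming one long moves in every hub window."""
    longs_per_sec = clock_hz / clocks_per_window   # one long per hub window
    bytes_per_sec = longs_per_sec * bytes_per_long
    bits_per_sec = bytes_per_sec * 8
    return longs_per_sec, bytes_per_sec, bits_per_sec


longs, byte_rate, bit_rate = hub_throughput()
print(f"{longs / 1e6:.0f} M longs/s, {byte_rate / 1e6:.0f} MB/s, {bit_rate / 1e6:.0f} Mbit/s")
print(f"ratio to 100 Mbit/s Ethernet: {bit_rate / 100e6:.2f}x")
```

    A real loop only reaches this rate if every hub window is actually used, which is what the interleaving discussion later in the thread is about.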

    Forget about making a super fast individual cog (large memory model stuff), it's not the point. Learn how to get those cogs to interoperate.

    From my experience so far, knowing Spin alone will leave you somewhat crippled when it comes to using the Propeller. You need Spin *and* assembly if you want to become a singing monk. (Singing 7 of course - because the cog IDs go from 0 to 7).

  • SapiehaSapieha Posts: 2,964
    edited 2007-09-30 23:43
    Hi mirror.

    The tables alone in my system take 32KB.
    SD is too slow to scan on the fly.
    And the system program has RAM + ROM 48

    The code is very optimized, and the issue is not how fast the Propeller is (it is fast enough for me) but how much RAM it has.

    PS. I can't use Spin in my system (it is too slow); I must code my "interpreter" in ASM code.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Nothing is impossible, there are only different degrees of difficulty.

    Sapieha

    Post Edited (Sapieha) : 9/30/2007 11:49:10 PM GMT
  • mirrormirror Posts: 322
    edited 2007-10-01 00:00
    Sapieha,

    I agree, the huge challenge in my application has been the RAM.

    I have about 15 FIFO buffers tying my system together. The biggest challenge is to make all the FIFO buffers the right size. Some of them are relatively huge (1-2 Kbytes), but others are quite small (16 bytes for the RS232 port).

    There are times that I've thought about splitting this task down the middle:
    1) Input processing in one Propeller
    2) SD card, Ethernet and RS232 comms in another Propeller.

    I'm using Rokicki's low level SD driver (at the moment), but I'm not using the FAT stuff - it is too slow. The SD card is not removable from my system, so it doesn't need to be PC compatible.

  • SapiehaSapieha Posts: 2,964
    edited 2007-10-01 00:13
    Hi mirror.

    In my system I have 400 engines of 0.5 kW each to control, plus all the switches on all of them, plus analog sensors.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Nothing is impossible, there are only different degrees of difficulty.

    Sapieha

    Post Edited (Sapieha) : 10/1/2007 1:11:13 AM GMT
  • hippyhippy Posts: 1,981
    edited 2007-10-01 01:42
    Are we not just getting bogged down in, 'the Propeller does not suit my application' ?

    If we want to contrast the Propeller with other options, want to identify any 'weaknesses' when set against others, it means knowing what can do the job, does suit the application. Then we can compare the two fairly; price, support chips necessary, cost of development tools and so on. To talk vaguely of the Propeller not being good enough isn't very meaningful nor useful.

    A Propeller does what it can, and does not do what it cannot. The Propeller is a Propeller, not something else. I see no problem with that, as it is no different to every other device which exists.

    It would be nice if our chosen tool did all the jobs we'd ever like it to but that would be a notable first. It doesn't seem very constructive to keep saying the Propeller isn't suited to everything; that's a given fact. So those are probably my final words on this topic.
  • OzStampOzStamp Posts: 377
    edited 2007-10-01 02:36
    Anybody that makes silly negative comments in the very forum run by people that have
    developed a sensational product... is really just putting themselves in the gutter..

    Not only that.. it can be commercially very damaging..
    So next time "turn brain on first ... compile... recheck maybe.. type and run "

    You do not have to be smart to realize that the Propeller is great
    You just need to think slightly different.. (put a dif hat on as they say)
    Some people are just stuck in that "Old fashioned .. I feel comfortable in that zone.. way"

    Show me another $12.95 US( low volume price) dollar chip that has as much punch as the Propeller...

    cheers
    Ron OZ
  • Ken PetersonKen Peterson Posts: 806
    edited 2007-10-01 03:05
    Dedicated.... :)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    The more I know, the more I know I don't know. Is this what they call Wisdom?
  • SapiehaSapieha Posts: 2,964
    edited 2007-10-01 08:56
    Hi OzStamp.

    If everyone were satisfied with everything, development would stop.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Nothing is impossible, there are only different degrees of difficulty.

    Sapieha
  • OzStampOzStamp Posts: 377
    edited 2007-10-01 09:17
    Hi Sap..

    I think I understand what you're trying to say..

    To suggest or have different thoughts about a certain product is fine.

    But to be so negative and call a product a joke is really stupid.
    Don't push me in the gutter ..

    That's like going to the Microchip website/forum and openly posting comments there
    saying how pathetic their little PIC chips are..
    Now that would open a bigger can of worms as many millions of people
    use these little rippers...( 1 Propeller has eight baby Pics + more )
    So what the hell is wrong with that..

    So there is a place and a time...every body is entitled to voice their opinion
    but try to be thoughtful of the people that have spent so much of their time and money
    to come up with a truly remarkable product..

    I voiced my concern re this pathetic comment as 3 people emailed me PM
    and made me aware of the particular post.. so it's not just me ..

    Let's move on and be kind 2 each other.
    Post some useful stuff and stop nagging.. to all of us.. I have better things to do.
    I look at this Forum many times a day ..time permitting as I enjoy going thru the posts

    Take care Sapieha .. where are you located..?

    Ronald OZ
  • SapiehaSapieha Posts: 2,964
    edited 2007-10-01 09:29
    Hi OzStamp.

    There are many (pathetic chips).
    And I find the Propeller fine, but only for small projects.
    And the only problem I see in it is too little RAM in one COG for a more complex system.

    " where are you located..? " Sweden

    PS. I never said "Joke"

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Nothing is impossible, there are only different degrees of difficulty.

    Sapieha

    Post Edited (Sapieha) : 10/1/2007 9:45:41 AM GMT
  • hippyhippy Posts: 1,981
    edited 2007-10-01 13:34
    Back to Assembler, Large Memory Model, and the 32K instruction limit ...
    BradC said...
    Oh yeah, I forgot about the word size of assembler.. yes, this would be cutting oneself off at the knees..

    Thinking about it further, if one were prepared to sacrifice a Cog to pre-emptive paging, the notion of using external Eeprom may not be too bad.

    There's more than one concept of Large Memory Model and my view has been of a simplistic paging one, allowing only JMP and CALL between pages. Minimise the 'kernel' and maximise the paging size. With the Hub split into pages it may well be possible to have the kernel simply check the right page is there before going ahead and loading. Only minimal overhead when it is. It stuffs my concept of a CALL storing the return address in Hub rather than Cog when doing an inter-page call but that could be overcome by using an inter-page call stack or, less efficiently by vectoring CALLs as jumps to CALLs with jumps back again in locked, non-swappable pages.

    Bill is right ( above ), that it's hard to do any of this without a dedicated toolset for the job. An ideal tool would take a linear PASM program of any length, re-factor it and make it entirely LMM compliant behind the scenes. Short term reality is likely to be macro commands and assemblers tuned to LMM and those commands. I headed that way with my own Assembler but cannot even get it to boot my PASM code. With more experience under my belt I'll be going back to that.

    It will be interesting to see how ImageCraft's C Compiler does its stuff.

    Added :

    Then there's the middle ground; a new language, compiler and Cog Interpreter which is well suited to that language and can deliver increased performance over Spin. Perhaps a better balance between blazingly fast PASM and somewhat slower Spin.

    Post Edited (hippy) : 10/1/2007 1:43:09 PM GMT
  • BradCBradC Posts: 2,601
    edited 2007-10-01 16:27
    hippy said...
    Back to Assembler, Large Memory Model, and the 32K instruction limit ...
    BradC said...
    Oh yeah, I forgot about the word size of assembler.. yes, this would be cutting oneself off at the knees..

    Thinking about it further, if one were prepared to sacrifice a Cog to pre-emptive paging, the notion of using external Eeprom may not be too bad.

    Funny you should mention that.. last night I scratched up a basic concept for exactly that.. Thought process was to use the HUB RAM more like an L2 cache, with the COG RAM as an L1 cache,
    having all the program stored externally.. it's quite a neat idea, and with the right planning and memory technology could be an interesting _prop_osition
  • Ken PetersonKen Peterson Posts: 806
    edited 2007-10-01 16:41
    I've been thinking of putting tracks on my Saturn so I can excavate my back yard with it. :)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    The more I know, the more I know I don't know. Is this what they call Wisdom?
  • potatoheadpotatohead Posts: 10,261
    edited 2007-10-01 16:58
    (big grin here ken)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Wiki: Share the coolness!
  • Fred HawkinsFred Hawkins Posts: 997
    edited 2007-10-01 17:40
    mirror said...
    I don't get the HUB bottleneck, communication from COG to Main memory is 1 longword every 16 clocks at 80 MHz, that's 20 million bytes per second for each cog! It's 60% faster than 100Mbps Ethernet!! That's fast enough for me.
    Worth saying again.

    The HUB bottleneck may exist only in programmers' minds after being dunned with warnings that the HUB can take 7..22 cycles. So my question to you: are there any tricks (interleaving, say) for getting this kind of throughput? And are there any good techniques for managing writes and reads in a way that keeps the 16 cycle heartbeat ticking along?
  • potatoheadpotatohead Posts: 10,261
    edited 2007-10-01 17:47
    I put a coupla three instructions between HUB ops and it seems to run consistently. When doing video, HUB misses show up as sparkles on the screen. The higher pixel timings more or less highlight that condition straight away. I've gotten the higher timings with the above interleaves.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Wiki: Share the coolness!
  • BradCBradC Posts: 2,601
    edited 2007-10-01 18:12
    potatohead said...
    I put a coupla three instructions between HUB ops and it seems to run consistently.

    My calcs (and testing) show that three instructions between wr/rd to/from the hub should push the timing over the 16 cycle edge and cause each
    hub op to be spaced at 32 cycle intervals rather than 16. By this I mean

    wrlong
    nop
    nop
    wrlong
    nop
    nop
    wrlong

    will be no faster than
    wrlong
    wrlong
    wrlong

    While
    wrlong
    nop
    nop
    nop
    wrlong
    nop
    nop
    nop
    wrlong

    Should in theory be twice as slow.

    Am I misinterpreting what you said about interleaving instructions?
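
    The three cases above can be reproduced with a rough Python model. This is only a sketch under stated assumptions, not a measurement: each cog gets a hub window every 16 clocks, a hub op that is already aligned to its window costs 8 clocks, and every ordinary instruction costs 4 clocks. The constants and the function name `hub_op_spacing` are made up for illustration.

```python
CLOCKS_PER_WINDOW = 16  # assumed: each cog gets one hub window every 16 clocks
HUB_OP_CLOCKS = 8       # assumed cost of a hub op that is already aligned
PLAIN_OP_CLOCKS = 4     # assumed cost of an ordinary instruction (nop, mov, ...)


def hub_op_spacing(instructions_between, hub_ops=4):
    """Clocks between successive hub ops when N plain instructions sit between them."""
    t = 0           # the first hub op starts exactly on a hub window
    starts = []
    for _ in range(hub_ops):
        # A hub op stalls until the next hub window (round t up to a multiple of 16).
        t = -(-t // CLOCKS_PER_WINDOW) * CLOCKS_PER_WINDOW
        starts.append(t)
        # After the hub op, the interleaved instructions run back to back.
        t += HUB_OP_CLOCKS + instructions_between * PLAIN_OP_CLOCKS
    return starts[-1] - starts[-2]


for n in range(8):
    print(f"{n} instructions between hub ops -> one hub op every {hub_op_spacing(n)} clocks")
```

    Under these assumptions, zero to two interleaved instructions all give one hub op per 16 clocks, and the third pushes the loop to the next window at 32 clocks.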
  • potatoheadpotatohead Posts: 10,261
    edited 2007-10-01 18:20
    No.

    It's two, if I can get away with it. Two is the best. Sometimes I'll do three or more just because it's better to waste the time and be getting something done, than not, or order of ops forces the matter. All depends on what has to happen in the loop. If there is a branch, it gets more complex. I'll generally get the loop running, then crank the pixel clock, so that I can see the misses, then either cull instructions, combine them, move them, etc... until it runs nicely.

    Another thing I do is get a loop running, then start adding nops until I see the miss, so I've an idea how close it's running to the edge.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Wiki: Share the coolness!
  • BradCBradC Posts: 2,601
    edited 2007-10-01 18:30
    Yeah, by my math, it's either 2 or 6. If you can't fit it in 2, you may as well jam 6 in there as the system will sit and wait for 25 cycles anyway.
    <notes that down for future reference>

    I agree with the nop padding.. I do that too, but if you are doing more than 3 nops, then a mov x, cnt | waitcnt x, y can give you much finer accounting.
    I can increase it a cycle at a time until it either misbehaves or (as in most of my loops) misses the waitcnt target and locks up for ~1 minute.
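
    The "~1 minute" figure is consistent with a missed waitcnt having to wait for the 32-bit system counter to wrap all the way around. A minimal Python check, assuming the 80 MHz clock used elsewhere in this thread (the function name `wrap_seconds` is made up for illustration):

```python
def wrap_seconds(clock_hz=80_000_000, counter_bits=32):
    """Time for a free-running counter of the given width to wrap once."""
    return 2 ** counter_bits / clock_hz


print(f"{wrap_seconds():.1f} s until the counter comes back around")
```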

    Nice to know how much breathing room you have available :)
  • potatoheadpotatohead Posts: 10,261
    edited 2007-10-01 18:37
    "3 nops, then a mov x, cnt | waitcnt x, y can give you much finer accounting."

    Nice!

    Added to playbook.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Wiki: Share the coolness!
  • BradCBradC Posts: 2,601
    edited 2007-10-01 18:52
    oops.. there is an add in there.. if you don't add at least #5 it'll die

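    mov     x, cnt      ' x := current system counter
    add     x, #5       ' push the target slightly into the future
    waitcnt x, y        ' wait until cnt = x, then x := x + y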

    But yeah.. I live by that one.. :)
  • deSilvadeSilva Posts: 2,967
    edited 2007-10-01 19:41
    I wonder why you don't COUNT the ticks? There is really no need for ANY speculation :)
    This is one of the best features of the Prop!
  • mirrormirror Posts: 322
    edited 2007-10-01 21:46
    When I'm designing tight assembler loops with a number of hub operations, I first write the code so that it's logically correct, and then I reorganise it to interleave 2 instructions between every hub operation.

    I don't see ANY reason whatsoever to use "NOP"s. You gain absolutely nothing, and it costs 2 assembler instructions in the process. To re-iterate:

    wrlong
    wrlong

    will run just as fast / slow as

    wrlong
    nop
    nop
    wrlong

    The real benefit is when you do something like:

    wrlong
    instr a
    instr b
    wrlong

    then you get to do some useful work with that otherwise idle time.

    There are valid places to use NOP instructions to solve timing issues, but "synchronising" hub instructions is NOT one of those cases. I've yet to use a NOP anywhere.
  • deSilvadeSilva Posts: 2,967
    edited 2007-10-01 22:55
    There was an example the other day:
    Polling 32-bit words asynchronously from a VERY fast ADC, around 6 MHz
    You had around 3 instructions, but not exactly; you were "a little bit" too fast @ 80 MHz
    We unrolled the loop of 3x32 instructions and identified two or three places where to insert a NOP to delay the polling...

    It is a very instructive thread with a brilliant analysis by deSilva ;)

    http://forums.parallax.com/showthread.php?p=671376

    Post Edited (deSilva) : 10/1/2007 11:07:31 PM GMT
  • Fred HawkinsFred Hawkins Posts: 997
    edited 2007-10-01 23:20
    Considering: on one hand, deSilva head, on the other, Mike Green head after his coronation* thread...

    And the winner is:


    *albeit unasked for

    Seriously, thanks to all, even deS, for the WRLONG/RDLONG timings.
  • BradCBradC Posts: 2,601
    edited 2007-10-02 03:22
    mirror said...
    When I'm designing tight assembler loops with a number of hub operations, I first write the code so that it's logically correct, and then I reorganise it to interleave 2 instructions between every hub operation.

    I don't see ANY reason whatsoever to use "NOP"s. You gain absolutely nothing, and it costs 2 assembler instructions in the process. To re-iterate:

    No, I think perhaps I was not as clear as I may have been. I don't use nops, they were just there as examples as to how you interleave instructions
    without slowing the code execution down.

    I could also have used mov, shl, or any other instruction. nop was just easier to type :)
  • mirrormirror Posts: 322
    edited 2007-10-02 04:14
    BradC said...
    mirror said...
    When I'm designing tight assembler loops with a number of hub operations, I first write the code so that it's logically correct, and then I reorganise it to interleave 2 instructions between every hub operation.

    I don't see ANY reason whatsoever to use "NOP"s. You gain absolutely nothing, and it costs 2 assembler instructions in the process. To re-iterate:

    No, I think perhaps I was not as clear as I may have been. I don't use nops, they were just there as examples as to how you interleave instructions
    without slowing the code execution down.

    I could also have used mov, shl, or any other instruction. nop was just easier to type :)
    That's OK. I guess I don't want to see new programmers inserting NOPs because they think they're needed to make it work right - there are other processors that also need instructions to keep the instruction pipe working correctly. Obviously those instructions are not needed for correct operation.

    In only 1 case is an extra instruction necessary for correct operation: if you're using self-modifying code to change an instruction that is to be executed, then you need an extra cycle of settling before using that instruction.
  • deSilvadeSilva Posts: 2,967
    edited 2007-10-02 06:04
    If in doubt, consult deSilva's Tutorial, especially Sidetrack F :)