Performance boost from spin to asm? — Parallax Forums

Performance boost from spin to asm?

__red__ Posts: 470
edited 2011-06-02 17:46 in Propeller 1
Admittedly I'm three days into actually using my Propeller, so please forgive the greenness of my question:
repeat
  outa[SHLD]~                    ' Load 'em up!
  outa[CLK]~~                    ' Get clock in correct position for...
  outa[SHLD]~~                   ' Clock transitions now shift data.
  v1 := v1 << 1 + ina[SER]       ' First bit is already in position.
  repeat (32)
    outa[CLK]~~                  ' Transit up and shift
    v1 := v1 << 1 + ina[SER]     ' Read bit
    outa[CLK]~                   ' Drop the clock in preparation for next cycle
  v2 := v1
Given the simplicity of the above code, would re-writing it in asm gain me any significant performance increase? It's currently taking 3.1ms to execute the outer loop. Given that my application requires me to read 260 keys instead of just 32 I need to be able to poll a fair amount on one cog before deploying to more.

lsa.jpg


Thanks,



Red

Comments

  • Mike Green Posts: 23,101
    edited 2011-02-15 11:31
    I'm guessing, but it looks like you're trying to read a chain of 74HC165s.

    Yes, you'll get a tremendous speed increase by redoing this in assembly language. There are some very fast techniques for doing this using a cog counter to generate the clock, but a straightforward inner loop will take about 5 instructions to shift in a bit from the 74HC165, then another few instructions to handle each long that results and gets stored in main (hub) memory as well as managing the memory addresses and counts. You're talking about roughly 10us per long (32 bits).
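
    For illustration only (this is not Mike's code), here is a minimal sketch of such a straightforward loop. It assumes an 80 MHz system clock from a 5 MHz crystal, and SH/LD, CLK and SER on P25, P26 and P27, matching the pin masks posted further down the thread. NLONGS, keybuf and all label names are made up for the example, and the clock pulse widths have not been checked against the 74HC165 datasheet.

    ```spin
    CON
      _clkmode = xtal1 + pll16x                   ' assumed: 5 MHz crystal x 16 = 80 MHz
      _xinfreq = 5_000_000
      NLONGS   = 9                                ' assumed: 9 x 32 = 288 bits, enough for 260 keys

    VAR
      long  keybuf[NLONGS]                        ' hub buffer filled by the PASM cog

    PUB start
      cognew(@entry, @keybuf)                     ' PAR in the new cog = address of keybuf

    DAT
                  org     0
    entry         or      dira, dirmask           ' CLK and SH/LD become outputs
    :scan         mov     hubptr, par             ' start of the hub buffer
                  mov     longcnt, #NLONGS
                  andn    outa, clkpin            ' CLK low
                  andn    outa, loadpin           ' SH/LD low: latch the parallel inputs
                  or      outa, loadpin           ' SH/LD high: enable shifting
    :nextlong     mov     bitcnt, #32
    :bit          test    serpin, ina     wc      ' C := state of SER (INA is the source operand)
                  rcl     keys, #1                ' shift C into bit 0
                  or      outa, clkpin            ' rising edge: the 74HC165 shifts
                  andn    outa, clkpin            ' CLK low again
                  djnz    bitcnt, #:bit           ' 5 instructions per bit
                  ' 32 bits x 5 instructions x 4 clocks = 640 clocks = 8 us at 80 MHz
                  wrlong  keys, hubptr            ' store the finished long in hub RAM
                  add     hubptr, #4              ' next long in the buffer
                  djnz    longcnt, #:nextlong
                  jmp     #:scan                  ' take another snapshot

    dirmask       long    %00000110_00000000_00000000_00000000   ' P25, P26
    loadpin       long    %00000010_00000000_00000000_00000000   ' P25
    clkpin        long    %00000100_00000000_00000000_00000000   ' P26
    serpin        long    %00001000_00000000_00000000_00000000   ' P27
    hubptr        res     1
    longcnt       res     1
    bitcnt        res     1
    keys          res     1
    ```

    Once start has been called, the cog rescans continuously; keybuf[0] holds the first 32 bits shifted in, with the first bit read ending up in bit 31.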
  • __red__ Posts: 470
    edited 2011-02-15 14:48
    Mike,

    Thank you for the reply and yes, you're right - I am reading from a chain of 74HC165s.

    I've seen code to do the clock like you said but I'm fearful that my reading may lose sync with the counter so I've avoided that method.

    Looks like I have some reading to do wrt asm. 3.1ms to 10us would be well worth the time invested in learning asm.

    Thanks,



    Red
  • Mike Green Posts: 23,101
    edited 2011-02-15 16:06
    Look at some of the Object Exchange objects that manage SPI devices using assembly language. Look at the low level SPI read routine in the Winbond Flash Memory I/O object in the Object Exchange.
  • __red__ Posts: 470
    edited 2011-06-02 08:01
    Mike Green wrote: »
    Look at some of the Object Exchange objects that manage SPI devices using assembly language. Look at the low level SPI read routine in the Winbond Flash Memory I/O object in the Object Exchange.

    So, I've written the following which appears to work very well in Gear. Looking forward to testing it on real hardware when I get home tonight:
    DAT
                  org 0
    entrypoint    or DIRA, DIRMASK                  ' Set pins to stun
    :outerloop    andn OUTA, LOADPIN                ' Set LD to low
                  andn OUTA, CLKPIN                 ' Set clockpin low
                  or OUTA, LOADPIN                  ' Set Load high
                  mov bitcntr, #32                  ' We're reading 32 bits...
    :loop         andn OUTA, CLKPIN                 ' Set CLK low (yes, this is redundant on the first pass)

                  ror INA, #28 NR,WC                ' C should contain value of SER (27)
                  rcl KEYSTATE, #1                  ' C should now be pushed into LSB of keystate
                  or OUTA, CLKPIN                   ' Set CLK high
                  djnz bitcntr, #:loop              ' Around the loop we go
                  jmp #:outerloop                   ' ... and back for another snapshot
    
    DIRMASK       long      %00000110_00000000_00000000_01111111
    LOADPIN       long      %00000010_00000000_00000000_00000000
    CLKPIN        long      %00000100_00000000_00000000_00000000  
    SERPIN        long      %00001000_00000000_00000000_00000000
    BITCNTR       long      $0
    
    Probably not the perfect code (like I should be using a counter instead of bit-banging myself) but I'm happy with the start that I've made :-)

    Thanks again,



    Red
  • Mike Green Posts: 23,101
    edited 2011-06-02 08:06
    In the future, please do not use the [ QUOTE ] tags for including code, particularly Propeller code. Use [ CODE ] tags instead for small code fragments or use the Attachment Manager (use the Go Advanced button) when posting a reply with code included.
  • __red__ Posts: 470
    edited 2011-06-02 08:26
    I was actually re-formatting it within a code block while you were posting.

    Done!

    Of course, now through my browsing I'm reading the "Tricks and Traps" pdf and it looks like
    ror INA, #28 NR,WC
    
    may be invalid, even with the NR flag.


    edit:

    That "Tricks and Traps" document saved my life again, it seems:

    "This is because, no matter
    how many positions are shifted, the carry (if written) always gets the initial
    value of bit 0 for right shifts/rotates or bit 31 for left shifts/rotates"
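
    As a hedged illustration of that rule, the fragment below shows two ways to get the SER pin into C. It assumes SER is on P27, as in SERPIN above; tmp is a hypothetical scratch register, and this is a fragment to drop into a loop, not a complete cog program.

    ```spin
    DAT
                  ' Trap: with WC, a shift or rotate copies the ORIGINAL bit 0 (right)
                  ' or bit 31 (left) into C, whatever the shift count. INA in the
                  ' destination field also reads shadow RAM rather than the pins.
                  '       ror     ina, #28        nr, wc      ' does not test P27

                  ' One instruction: AND the pins with a one-bit mask, C := parity of the result
                  test    serpin, ina     wc      ' C := 1 if P27 is high
                  rcl     keystate, #1            ' shift C into bit 0 of keystate

                  ' Shift-based equivalent, using a scratch register
                  mov     tmp, ina                ' INA as source reads the real pins
                  shr     tmp, #27                ' P27 moves down to bit 0
                  shr     tmp, #1         wc      ' C := bit 0 before this shift, i.e. P27

    serpin        long    %00001000_00000000_00000000_00000000   ' P27
    keystate      long    0
    tmp           res     1
    ```

    The test form is the shorter of the two and leaves both operands unchanged, which is why it is the usual choice for sampling a single pin.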
  • JasonDorie Posts: 1,930
    edited 2011-06-02 17:46
    I think I read at one point that there is a very approximate 300:1 ratio of time spent executing an operation in SPIN vs executing that same operation in PASM. I realize that the two are very different, but SPIN has a significant amount of overhead because it is interpreted, so unless you're doing something that is essentially atomic in SPIN (like a multiply), it'll always be an order of magnitude faster in PASM.

    Given that your 3.1ms loop becomes 10us, or 310x faster, that seems about right.