Increasing Speed of 'Propeller SPI Engine v1.2'

Dantes Legacy · 2012-03-16 09:19

I have recently swapped over from Beau Schwabe's Spin SPI Engine Object to the Assembly SPI Engine (http://obex.parallax.com/objects/431/) and noticed a great increase in speed on my SPI bus. I know that this is mainly due to the fact that the SPI functions take a lot less time to do in assembly. I have been tinkering with the assembly slightly to speed things up further and have had some success.

Has anyone integrated the counters into the clocking of this object to speed things up even further? I mean something similar to how SPI is handled in both FSRW (http://obex.parallax.com/objects/92/) and SD-MMC_FATEngine (http://obex.parallax.com/objects/619/), which were developed for SD card access in particular.

Unfortunately, my experience with ASM is significantly less than that with SPIN. So I thought I might throw it out to the experts and see what we can come up with!

I was hoping to change the following:
(Taking the MSBPRE_ function as an example which is used by the SHIFTIN_ function to shift in up to 32 bits of data)

MSBPRE_                                                 ''     Receive Data MSBPRE
MSBPRE_Sin    test    t1,             ina     wc        ''          Read Data Bit into 'C' flag
              rcl     t3,             #1                ''          rotate "C" flag into return value
              call    #PreClock                         ''          Send clock pulse
              djnz    t4,             #MSBPRE_Sin       ''          Decrement t4 ; jump if not Zero
              jmp     #Update_SHIFTIN                   ''     Pass received data to SHIFTIN receive variable
''t1 is used for DataPin mask
''t2 is used for the ClockPin mask
''t3 is used to hold DataValue SHIFTIN/SHIFTOUT
''t4 is used to hold # of Bits

To a version using the counters to clock the data in like so:
(CTRA and CTRB are initialised at start of program just like the SD-MMC_FATEngine object)

              movi    frqa,                  #%0000_0001_0                ' Start the clock - read 1 .. 32 bits.
MSBPRE_Sin    waitpne t2,                    t2                     ' Get bit.
              rcl     t3,                    #1                           '
              waitpeq t2,                    t2                     '
              test    t1,                    ina wc                       '
               
              djnz    t4,                    #MSBPRE_Sin                 ' Loop until done.
              jmp     #Update_SHIFTIN
              mov     frqa,                  #0                                     '
              rcl     t3,                    #1
''t1 is used for DataPin mask
''t2 is used for the ClockPin mask
''t3 is used to hold DataValue SHIFTIN/SHIFTOUT
''t4 is used to hold # of Bits

Am I on the right track by editing the SPI ASM object like so? Any tips would be very much appreciated as I am not that experienced in ASM.

I figure that once I understand the concepts of changing this function to shift in data, I will be able to edit the rest myself.

kuroneko · 2012-03-19 20:31

Before this gets completely forgotten, the change looks OK. However, since you assemble the last bit after the loop has finished you want to move the jump to the update function after said assembly. What's your speed target here? Depending on that value you may have to look for a different approach (e.g. get rid of the waitpxx insns and know when the clock transitions).

Also, there is a bug in 1.2 re: MSBFIRST_ the shr t5, #1 should really be a ror t5, #1.
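
A minimal Python sketch of why the ror matters. It assumes (this is not shown in the thread) that MSBFIRST_ builds its bit mask by shifting 1 left by the bit count and then moving it back one place. On the Propeller, shift and rotate instructions use only the low 5 bits of the count, so a 32-bit transfer shifts left by 0. The helper names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def shl(value, count):
    """32-bit shift left; only the low 5 bits of count are used."""
    return (value << (count & 31)) & MASK32

def shr(value, count):
    """32-bit logical shift right; only the low 5 bits of count are used."""
    return (value & MASK32) >> (count & 31)

def ror(value, count):
    """32-bit rotate right; only the low 5 bits of count are used."""
    count &= 31
    value &= MASK32
    return ((value >> count) | (value << (32 - count))) & MASK32

def msb_mask(nbits, step_back):
    """Assumed mask setup: 1 << nbits, then one place back to the right."""
    t5 = shl(1, nbits)          # for nbits = 32 this is a shift by 0
    return step_back(t5, 1)     # shr loses the bit, ror wraps it to bit 31

for nbits in (8, 16, 32):
    print(nbits, hex(msb_mask(nbits, shr)), hex(msb_mask(nbits, ror)))
```

Under that assumption both variants agree for 1 to 31 bits; only the 32-bit case differs, where shr leaves an empty mask and ror gives $8000_0000.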

Dantes Legacy · 2012-03-20 02:03

kuroneko wrote: »

Before this gets completely forgotten, the change looks OK. However, since you assemble the last bit after the loop has finished you want to move the jump to the update function after said assembly. What's your speed target here? Depending on that value you may have to look for a different approach (e.g. get rid of the waitpxx insns and know when the clock transitions).

Also, there is a bug in 1.2 re: MSBFIRST_ the shr t5, #1 should really be a ror t5, #1.

Thanks for the info kuroneko (that's 'black cat' if my very limited knowledge of Japanese serves me right?).

I'll fix that MSBFIRST_ bug now. I see what you mean about the jump being in the wrong place too. The target speeds I am hoping to get are anywhere >= 10MHz clocking frequency if at all possible. I haven't measured it correctly with an oscilloscope, but before editing the ASM, I was achieving speeds of about 1 MHz (rough estimation).

Glad to know that my first serious attempt at ASM is going well so far.

So you're saying the code should look like this instead:

MSBPRE_                                                 ''     Receive Data MSBPRE
              movi    frqa,                  #%0000_0001_0                ' Start the clock - read 1 .. 32 bits.
MSBPRE_Sin    waitpne t2,                    t2                     ' Get bit.
              rcl     t3,                    #1                           '
              waitpeq t2,                    t2                     '
              test    t1,                    ina wc                       '
               
              djnz    t4,                    #MSBPRE_Sin                 ' Loop until done.
              mov     frqa,                  #0                                     '
              rcl     t3,                    #1              
              jmp     #Update_SHIFTIN
''t1 is used for DataPin mask
''t2 is used for the ClockPin mask
''t3 is used to hold DataValue SHIFTIN/SHIFTOUT
''t4 is used to hold # of Bits
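
To see why the position of the jump matters, here is a small Python model of the loop. Each pass rotates the previously sampled bit into t3 and then samples the next one, so the last sample is still sitting in the carry flag when the loop ends. The function name is hypothetical, and the model assumes the carry flag is clear on entry, which the listing does not guarantee.

```python
MASK32 = 0xFFFFFFFF

def shift_in(bits, final_rcl):
    """Model of the MSBPRE_Sin loop: rcl first, then sample into carry."""
    t3 = 0       # receive register
    carry = 0    # ASSUMPTION: carry is clear before the first rcl
    for bit in bits:
        t3 = ((t3 << 1) | carry) & MASK32    # rcl  t3, #1
        carry = bit                          # test t1, ina wc
    if final_rcl:
        t3 = ((t3 << 1) | carry) & MASK32    # rcl after the loop, before the jmp
    return t3

sent = [1, 0, 1, 1, 0, 0, 1, 0]              # $B2, MSB first
print(hex(shift_in(sent, final_rcl=True)))   # all eight bits arrive
print(hex(shift_in(sent, final_rcl=False)))  # last bit is lost
```

If the carry flag is not clear on entry, the first rcl rotates a stale bit in above the data, so clearing it before the loop (or masking the result) is worth considering for transfers shorter than 32 bits.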

kuroneko · 2012-03-20 02:21

Dantes Legacy wrote: »

So you're saying the code should look like this instead:

Yes. Note that I didn't pay too much attention to whether the clock polarity is correct. The way you have it now samples data immediately after a rising clock edge (so it looks more like a POST CLOCK example). Depending on the mode you're using, you may have to rotate stuff around a bit to move the sample point to the right place. I'm sure you'll figure it out.

Regarding speed, 10Mbit/s^A is not a problem, and neither is 20Mbit/s^A if you're prepared to throw some more resources at it. Also, sending stuff is easy; it's the receiving bit which is tricky.

^A assuming 80MHz clock (8 cycles/bit and 4 cycles/bit respectively)
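
The footnote arithmetic, as a short Python sketch. The 80 MHz clock and the cycles-per-bit figures come from the footnote; the instruction budget relies on most Propeller assembly instructions taking 4 clock cycles. The function name is made up for illustration.

```python
CLOCK_HZ = 80_000_000          # system clock from the footnote
CYCLES_PER_INSTRUCTION = 4     # typical for most PASM instructions

def bit_rate(clock_hz, cycles_per_bit):
    """Bits per second for a given number of clock cycles per bit."""
    return clock_hz // cycles_per_bit

for cycles_per_bit in (8, 4):
    rate = bit_rate(CLOCK_HZ, cycles_per_bit)
    budget = cycles_per_bit // CYCLES_PER_INSTRUCTION
    print(f"{cycles_per_bit} cycles/bit -> {rate} bit/s, "
          f"{budget} instruction(s) per bit")
```

So 10 Mbit/s leaves room for two instructions per bit (for example a test plus an rcl), while 20 Mbit/s leaves room for only one.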

Dantes Legacy · 2012-03-20 02:56

I wasn't paying that much attention to the clock polarity myself either. I'd better get working on that and make sure it is correct. If I could get 20 Mbit/s, I would be extremely happy!

Thanks for your help so far. It's good to know I am on the right track.

jmg · 2012-03-20 03:01

kuroneko wrote: »

Regarding speed, 10Mbit/s^A is not a problem, and neither is 20Mbit/s^A if you're prepared to throw some more resources at it. Also, sending stuff is easy; it's the receiving bit which is tricky.

I can see getting burst rates to those numbers, with some preparatory housekeeping, however that would lower the average rates.

If the Prop had even slightly smarter Cog-Pin mapping, all of this could be done faster.

E.g. imagine if RL INA and RL OUTA worked to read-and-shift a single IO bit per opcode.

Smarter Cog-Pin mapping control would also eliminate mask operations inside a loop - they could be moved outside the loop, in a config step.

Perhaps in Prop II ?

Mark_T · 2012-03-20 06:21

This thread seems relevant: http://forums.parallax.com/showthread.php?113722-EDITED-fast-SPI-out-1-bit-per-instruction

Dantes Legacy · 2012-03-20 06:49

jmg wrote: »

I can see getting burst rates to those numbers, with some preparatory housekeeping, however that would lower the average rates.

Yeah, I have used a very good NCO burst object before which helped me understand the functions of CTRx, PHSx and FRQx very well. (Tracy Allen's 'NCOBurst.spin' on OBEX) I did find that the signal can get a bit unstable at top speeds though. I found it hard to implement it into an SPI engine successfully so I figured ASM was my best option.

Mark_T wrote: »

This thread seems relevant: http://forums.parallax.com/showthread.php?113722-EDITED-fast-SPI-out-1-bit-per-instruction

It does indeed! I see this must be where the latest version of the fsrw object was in development. The code in the first post is a very well explained example that should help. Thanks Mark_T!

Mark_T · 2012-03-20 07:39

Dantes Legacy wrote: »

Yeah, I have used a very good NCO burst object before which helped me understand the functions of CTRx, PHSx and FRQx very well. (Tracy Allen's 'NCOBurst.spin' on OBEX) I did find that the signal can get a bit unstable at top speeds though. I found it hard to implement it into an SPI engine successfully so I figured ASM was my best option.

It does indeed! I see this must be where the latest version of the fsrw object was in development. The code in the first post is a very well explained example that should help. Thanks Mark_T!

Well I just googled for "fast spi propeller" or something similar...

Dantes Legacy · 2012-03-20 08:16

So I have had some success with using the 'WriteSPI' and 'ReadSPI' functions from Kye's SD-MMC_FATEngine Object and have decided to implement them into the code for the Assembly SPI Object. At the moment, all I am interested in is using the MSBFIRST mode when using the SHIFTOUT function to write out data and the MSBPRE mode when using the SHIFTIN function (If I remember correctly, that makes it SPI Mode '0'). So I decided to use the 'WriteSPI' and 'ReadSPI' functions for this as they include two different speed options (and are well laid out). The problem is that I am not sure if I am shifting the bits left or right correctly in order to read and write to my SPI network.

Does anyone have any pointers on the following code? I know I am doing something wrong, but I'm not sure where yet.

DAT           org
'  
'' SPI Engine - main loop
'
''t1 is used for DataPin mask
''t2 is used for the ClockPin mask
''t3 is used to hold DataValue SHIFTIN/SHIFTOUT
''t4 is used to hold # of Bits
'*********************************************************************************************************************************
' //////////////////////Initialization/////////////////////////////////////////////////////////////////////////////////////////
init          mov     ctra,                  clockCounterSetup            ' Setup counter modules.
              mov     ctrb,                  dataInCounterSetup    ' ctra & ctrb have been set accordingly at setup.
              mov     SPITiming,             #1         '' 1 = Slow speeds, 0 = fast speeds
              jmp     #loop                                   
'*********************************************************************************************************************************
loop          rdlong  t1,par          wz                ''wait for command (Read a LONG. t1 becomes the PAR registers address. WZ specifies that the Z flag should be set (1) if PAR is zero.)
        if_z  jmp     #loop                             ''If Z is set, jump to 'loop' address (return to start)
              movd    :arg,#arg0                        ''get 5 arguments ; arg0 to arg4 (Move 'Dpin' destination field to ':arg'
              mov     t2,t1                             ''    │ (datapin mask(t1) is stored in clockpin mask(t2))
              mov     t3,#5                             ''<───┘ (DataValue become MSBFIRST)
:arg          rdlong  arg0,t2                           ''(Dpin becomes destination address of clockpin mask)
              add     :arg,d0                           ''(Add 0x200 to arg)
              add     t2,#4                             ''(Add LSBFIRST to clockpin mask) 
              djnz    t3,#:arg                          ''
              mov     address,t1                        ''preserve address location for passing
                                                        ''variables back to Spin language.
              wrlong  zero,par                          ''zero command to signify command received
              ror     t1,#16+2                          ''lookup command address
              add     t1,#jumps
              movs    :table,t1
              rol     t1,#2
              shl     t1,#3
:table        mov     t2,0
              shr     t2,t1
              and     t2,#$FF
              jmp     t2                                ''jump to command
jumps         byte    0                                 ''0
              byte    SHIFTOUT_                         ''1
              byte    SHIFTIN_                          ''2
              byte    NotUsed_                          ''3
NotUsed_      jmp     #loop
'################################################################################################################
'tested OK
SHIFTOUT_                                               ''SHIFTOUT Entry
              mov     t4,             arg3      wz      ''     Load number of data bits
    if_z      jmp     #Done                             ''     '0' number of Bits = Done
              mov     t1,             #1        wz      ''     Configure DataPin
              shl     t1,             arg0
              muxz    outa,           t1                ''          PreSet DataPin LOW
              muxnz   dira,           t1                ''          Set DataPin to an OUTPUT
              mov     t2,             #1        wz      ''     Configure ClockPin
              shl     t2,             arg1              ''          Set Mask             
              test    ClockState,     #1        wc      ''          Determine Starting State
    if_nc     muxz    outa,           t2                ''          PreSet ClockPin LOW
    if_c      muxnz   outa,           t2                ''          PreSet ClockPin HIGH              
              muxnz   dira,           t2                ''          Set ClockPin to an OUTPUT
'*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^
              sub     _MSBFIRST,      arg2    wz,nr     ''     Detect MSBFIRST mode for SHIFTOUT
    if_z      jmp     #WriteSPI             
'*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^
              jmp     #loop                             ''     Go wait for next command
'------------------------------------------------------------------------------------------------------------------------------
SHIFTIN_                                                ''SHIFTIN Entry
              mov     t4,             arg3      wz      ''     Load number of data bits
    if_z      jmp     #Done                             ''     '0' number of Bits = Done
              mov     t1,             #1        wz      ''     Configure DataPin
              shl     t1,             arg0
              muxz    dira,           t1                ''          Set DataPin to an INPUT
              mov     t2,             #1        wz      ''     Configure ClockPin
              shl     t2,             arg1              ''          Set Mask             
              test    ClockState,     #1        wc      ''          Determine Starting State
    if_nc     muxz    outa,           t2                ''          PreSet ClockPin LOW
    if_c      muxnz   outa,           t2                ''          PreSet ClockPin HIGH              
              muxnz   dira,           t2                ''          Set ClockPin to an OUTPUT
'*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^
              sub     _MSBPRE,        arg2    wz,nr     ''     Detect MSBPRE mode for SHIFTIN
    if_z      jmp     #ReadSPI
'*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^
              jmp     #loop                             ''     Go wait for next command
         
'------------------------------------------------------------------------------------------------------------------------------
'tested OK
Update_SHIFTIN
              mov     t1,             address           ''     Write data back to Arg4
              add     t1,             #16               ''          Arg0 = #0 ; Arg1 = #4 ; Arg2 = #8 ; Arg3 = #12 ; Arg4 = #16
              wrlong  t3,             t1
              add     t1,             #4                ''          Point t1 to Flag ... Arg4 + #4
              wrlong  zero,           t1                ''          Clear Flag ... indicates SHIFTIN data is ready
              jmp     #loop                             ''     Go wait for next command
'------------------------------------------------------------------------------------------------------------------------------
' 'ReadSPI' & 'WriteSPI' taken from Kye's SD-MMC_FATEngine Object. 
' Variables changed to work with Beau Schwabe's Propeller SPI Engine Object
' /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
'                       Read SPI
' /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
readSPI                 mov     t4,             #8                           ' Setup counter to read in 1 - 32 bits.
                        mov     t3,             #0 wc                        '
readSPIAgain            mov     phsa,           #0                           ' Start clock low.
                        tjnz    SPITiming,      #readSPISpeed                '
' //////////////////////Slow Reading///////////////////////////////////////////////////////////////////////////////////////////
                        movi    frqa,           #%0000_0001_0                ' Start the clock - read 1 .. 32 bits.
readSPILoop             test    t1,             ina wc                       '
                        rcl     t3,             #1                           '
                        waitpne t2,             t2                           ' Get bit.
                        waitpeq t2,             t2                           '
                        djnz    t4,             #readSPILoop                 ' Loop until done.
                        jmp     #readSPIFinish                                      '
' //////////////////////Fast Reading///////////////////////////////////////////////////////////////////////////////////////////
readSPISpeed            movi    frqa,           #%0010_0000_0                ' Start the clock - read 8 bits.
                        test    t1,             ina wc                       ' Read in data.
                        rcl     t3,             #1                           '
                        test    t1,             ina wc                       '
                        rcl     t3,             #1                           '
                        test    t1,             ina wc                       '
                        rcl     t3,             #1                           '
                        test    t1,             ina wc                       '
                        rcl     t3,             #1                           '
                        test    t1,             ina wc                       '
                        rcl     t3,             #1                           '
                        test    t1,             ina wc                       '
                        rcl     t3,             #1                           '
                        test    t1,             ina wc                       '
                        rcl     t3,             #1                           '
                        test    t1,             ina wc                       '
' //////////////////////Finish Up//////////////////////////////////////////////////////////////////////////////////////////////
readSPIFinish           mov     frqa,           #0                           ' Stop the clock.
                        rcl     t3,             #1                           '
                        cmpsub  t4,             #8                           ' Read in any remaining bits.
                        tjnz    t4,             #readSPIAgain                '
                        jmp     #Update_SHIFTIN
'readSPI_ret             ret                                                 ' Return. Leaves the clock high.
' /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
'                       Write SPI
' /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
writeSPI                rdbyte  phsb,           t3        ' Set phsb as byte to be written
                        mov     t4,             #8                           ' Setup counter to write out 1 - 32 bits.
                        ror     phsb,           t4                           '
writeSPIAgain           mov     phsa,           #0                           ' Start clock low.
                        tjnz    SPITiming,      #writeSPISpeed               '
' //////////////////////Slow Writing//////////////////////////////////////////////////////////////////////////////////////////
                        movi    frqa,           #%0000_0001_0                ' Start the clock - write 1 .. 32 bits.
writeSPILoop            shl     phsb,           #1                           '
                        waitpeq t2,             t2                           ' Set bit.
                        waitpne t2,             t2                           '
                        
                        djnz    t4,             #writeSPILoop                ' Loop until done.
                        jmp     #writeSPIFinish                              '
' //////////////////////Fast Writing//////////////////////////////////////////////////////////////////////////////////////////
writeSPISpeed           movi    frqa,           #%0100_0000_0                ' Write out data.
                        shl     phsb,           #1                           '
                        shl     phsb,           #1                           '
                        shl     phsb,           #1                           '
                        shl     phsb,           #1                           '
                        shl     phsb,           #1                           '
                        shl     phsb,           #1                           '
                        shl     phsb,           #1                           '
' //////////////////////Finish Up//////////////////////////////////////////////////////////////////////////////////////////////
writeSPIFinish          mov     frqa,           #0                           ' Stop the clock.
                        cmpsub  t4,             #8                           ' Write out any remaining bits.
                        shl     phsb,           #1                           '
                        tjnz    t4,             #writeSPIAgain               '
                        neg     phsb,           #1                           '
                        jmp     #loop
'writeSPI_ret            ret                                                         ' Return. Leaves the clock low.
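
For reference, a Python sketch of the clock rates that the three movi frqa values in the listing would produce. It assumes an 80 MHz system clock, CTRA set up in NCO mode by clockCounterSetup (not shown in the post), and the lower 23 bits of frqa being zero. movi writes its 9-bit immediate into bits 31..23 of the destination, and an NCO output toggles at CLKFREQ * FRQA / 2^32. The function names are made up for illustration.

```python
CLOCK_HZ = 80_000_000   # ASSUMPTION: 80 MHz system clock

def movi_to_frq(imm9):
    """Value of frqa after 'movi frqa, #imm9', assuming bits 22..0 are zero."""
    return (imm9 & 0x1FF) << 23

def nco_hz(frq, clock_hz=CLOCK_HZ):
    """Output frequency of a counter in NCO mode (assumed mode for CTRA)."""
    return clock_hz * frq // 2**32

settings = {
    "slow read/write": 0b0000_0001_0,
    "fast read":       0b0010_0000_0,
    "fast write":      0b0100_0000_0,
}

for name, imm9 in settings.items():
    print(f"{name:16s} frqa=${movi_to_frq(imm9):08X} -> {nco_hz(movi_to_frq(imm9))} Hz")
```

Under these assumptions the slow path clocks at 312.5 kHz, the fast read at 10 MHz (8 cycles per bit, one test plus one rcl) and the fast write at 20 MHz (4 cycles per bit, one shl).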
