Increasing Speed of 'Propeller SPI Engine v1.2'
I have recently switched from Beau Schwabe's Spin SPI Engine object to the assembly SPI Engine (http://obex.parallax.com/objects/431/) and noticed a large increase in speed on my SPI bus. I know this is mainly because the SPI functions take far less time to execute in assembly. I have been tinkering with the assembly to speed things up further and have had some success.
I was wondering whether anyone has integrated the counters into the clocking of this object to speed things up even further. This would be similar to how SPI is implemented in both FSRW (http://obex.parallax.com/objects/92/) and SD-MMC_FATEngine (http://obex.parallax.com/objects/619/), which were developed specifically for SD card access.
Unfortunately, my experience with ASM is much more limited than my experience with Spin, so I thought I would throw it out to the experts and see what we can come up with!
I was hoping to change the following:
(Taking the MSBPRE_ function as an example; SHIFTIN_ uses it to shift in up to 32 bits of data.)
MSBPRE_                                   '' Receive Data MSBPRE
MSBPRE_Sin    test    t1, ina     wc      '' Read Data Bit into 'C' flag
              rcl     t3, #1              '' rotate "C" flag into return value
              call    #PreClock           '' Send clock pulse
              djnz    t4, #MSBPRE_Sin     '' Decrement t4 ; jump if not Zero
              jmp     #Update_SHIFTIN     '' Pass received data to SHIFTIN receive variable
''t1 is used for DataPin mask
''t2 is used for the ClockPin mask
''t3 is used to hold DataValue SHIFTIN/SHIFTOUT
''t4 is used to hold # of Bits
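In the MSBPRE_ loop above, each pass samples the data pin into C and rotates it into the bottom of t3. The first bit received therefore ends up as the most significant of the t4 bits. The small C model below makes that bit order easy to check. The function names are hypothetical, it assumes t3 starts at zero, and it ignores the clock timing that PreClock handles.

```c
#include <assert.h>
#include <stdint.h>

/* "rcl t3, #1" without wc: shift t3 left by one and move the C flag
   into bit 0 (C itself is not updated). */
static uint32_t rcl1(uint32_t value, int carry)
{
    return (value << 1) | (uint32_t)(carry & 1);
}

/* Model of the MSBPRE_ loop for `bits` bits. levels[i] is the data-pin
   level seen by the i-th "test t1, ina wc" (hypothetical input data). */
uint32_t msbpre_model(const int *levels, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint32_t t3 = 0;                      /* assumption: t3 starts cleared */
    for (unsigned i = 0; i < bits; i++) {
        int c = levels[i] ? 1 : 0;        /* test t1, ina wc */
        t3 = rcl1(t3, c);                 /* rcl t3, #1 */
        /* call #PreClock: the clock pulse would happen here */
    }
    return t3;
}
```

For example, sampling 1, 0, 1, 1 gives 0b1011 (11), and the first sample is the MSB.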
To a version using the counters to clock the data in like so:
(CTRA and CTRB are initialised at the start of the program, just like in the SD-MMC_FATEngine object.)
movi frqa, #%0000_0001_0 ' Start the clock - read 1 .. 32 bits.
MSBPRE_Sin waitpne t2, t2 ' Get bit.
rcl t3, #1 '
waitpeq t2, t2 '
test t1, ina wc '
djnz t4, #MSBPRE_Sin ' Loop until done.
jmp #Update_SHIFTIN
mov frqa, #0 '
rcl t3, #1
''t1 is used for DataPin mask
''t2 is used for the ClockPin mask
''t3 is used to hold DataValue SHIFTIN/SHIFTOUT
''t4 is used to hold # of Bits
Am I on the right track by editing the SPI ASM object like so? Any tips would be very much appreciated as I am not that experienced in ASM.
I figure that once I understand the concepts of changing this function to shift in data, I will be able to edit the rest myself.
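The counter version relies on CTRA generating the SPI clock by itself. The C sketch below assumes CTRA is in NCO mode, as in the SD-MMC_FATEngine approach mentioned above. In that mode FRQA is added to PHSA on every system clock, and the clock pin follows PHSA bit 31. It also assumes FRQA is zero before the movi, so its low 23 bits are clear. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "movi frqa, #imm": write the 9-bit immediate into bits 31..23
   and leave bits 22..0 unchanged. */
uint32_t movi_model(uint32_t reg, uint32_t imm9)
{
    assert(imm9 <= 0x1FFu);
    return (reg & 0x007FFFFFu) | (imm9 << 23);
}

/* Simulate an NCO counter for `cycles` system clocks: PHSA += FRQA every
   clock and the pin follows PHSA bit 31. Returns the number of rising
   edges seen on the pin. */
unsigned nco_rising_edges(uint32_t frqa, uint32_t phsa, unsigned cycles)
{
    unsigned edges = 0;
    int pin = (int)(phsa >> 31);
    for (unsigned i = 0; i < cycles; i++) {
        phsa += frqa;                     /* wraps modulo 2^32, like PHSA */
        int next = (int)(phsa >> 31);
        if (next && !pin)
            edges++;
        pin = next;
    }
    return edges;
}
```

With movi frqa, #%0000_0001_0 (0x002), FRQA becomes 0x0100_0000 (2^24). The clock then completes one cycle every 2^32 / 2^24 = 256 system clocks, about 312.5 kHz at 80 MHz, so 1024 clocks produce 4 rising edges.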

Comments
Also, there is a bug in v1.2 in MSBFIRST_: the shr t5, #1 should really be ror t5, #1.
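Here is why ror matters. Propeller shift and rotate instructions use only the low 5 bits of the shift count, so shifting by 32 is the same as shifting by 0. The C model below assumes MSBFIRST_ builds its mask as mov t5, #1 / shl t5, t4 and then applies the shr t5, #1 mentioned above. That preceding sequence is an assumption, not quoted from v1.2, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Propeller shift/rotate instructions use only the low 5 bits of the count. */
static uint32_t p_shl(uint32_t v, uint32_t n) { return v << (n & 31u); }
static uint32_t p_shr(uint32_t v, uint32_t n) { return v >> (n & 31u); }
static uint32_t p_ror(uint32_t v, uint32_t n)
{
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Assumed v1.2 mask setup: mov t5, #1 / shl t5, t4 / shr t5, #1 */
uint32_t msb_mask_shr(uint32_t bits)
{
    assert(bits >= 1 && bits <= 32);
    return p_shr(p_shl(1u, bits), 1u);
}

/* Suggested fix: same sequence, but ror t5, #1 instead of shr */
uint32_t msb_mask_ror(uint32_t bits)
{
    assert(bits >= 1 && bits <= 32);
    return p_ror(p_shl(1u, bits), 1u);
}
```

For 1 to 31 bits both versions give the same mask, because the bit rotated out of position 0 is always zero. Only the 32-bit case differs: shr yields 0, while ror wraps the 1 around to bit 31.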
Thanks for the info, kuroneko (that's 'black cat', if my very limited Japanese serves me right?).
I'll fix that MSBFIRST_ bug now. I see what you mean about the jump being in the wrong place, too. I am hoping for a clock frequency of 10 MHz or more, if at all possible. I haven't measured it properly with an oscilloscope, but before editing the ASM I estimate I was getting about 1 MHz.
Glad to know that my first serious attempt at ASM is going well so far.
So you're saying the code should look like this instead:
MSBPRE_                                   '' Receive Data MSBPRE
              movi    frqa, #%0000_0001_0 ' Start the clock - read 1 .. 32 bits.
MSBPRE_Sin    waitpne t2, t2              ' Get bit.
              rcl     t3, #1              '
              waitpeq t2, t2              '
              test    t1, ina     wc      '
              djnz    t4, #MSBPRE_Sin     ' Loop until done.
              mov     frqa, #0            '
              rcl     t3, #1
              jmp     #Update_SHIFTIN
''t1 is used for DataPin mask
''t2 is used for the ClockPin mask
''t3 is used to hold DataValue SHIFTIN/SHIFTOUT
''t4 is used to hold # of Bits
Regarding speed, 10 Mbit/s* is not a problem, and neither is 20 Mbit/s* if you're prepared to throw some more resources at it. Also, sending is easy; it's the receiving part that is tricky.
* Assuming an 80 MHz system clock (8 cycles/bit and 4 cycles/bit, respectively).
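The footnote can be checked with a little arithmetic. At 80 MHz, 10 Mbit/s leaves 8 system clocks per bit and 20 Mbit/s leaves 4. In NCO mode the counter's output period is 2^32 / FRQA, so those periods need FRQA = 2^29 and 2^30. For comparison, the fast paths in the code posted further down spend a test/rcl pair per received bit (two 4-clock instructions, 8 clocks). They spend a single shl phsb, #1 per sent bit (4 clocks). The C sketch below works out the matching movi immediates. Its function names are hypothetical, and it assumes FRQA's low 23 bits are zero, as they are after mov frqa, #0.

```c
#include <assert.h>
#include <stdint.h>

/* System clocks available per bit at a given bit rate. */
uint32_t cycles_per_bit(uint32_t clkfreq, uint32_t bitrate)
{
    assert(bitrate > 0 && clkfreq % bitrate == 0);
    return clkfreq / bitrate;
}

/* FRQA for an NCO whose output period is `period` system clocks
   (period = 2^32 / FRQA); restricted to powers of two so it is exact. */
uint32_t frqa_for_period(uint32_t period)
{
    assert(period >= 2 && (period & (period - 1u)) == 0);
    return (uint32_t)((UINT64_C(1) << 32) / period);
}

/* 9-bit immediate for "movi frqa, #imm" (FRQA bits 31..23). */
uint32_t movi_immediate(uint32_t frqa)
{
    assert((frqa & 0x007FFFFFu) == 0);
    return frqa >> 23;
}
```

For 8 clocks per bit this gives 0x040 (%0010_0000_0), and for 4 clocks per bit 0x080 (%0100_0000_0). These are the immediates used in readSPISpeed and writeSPISpeed in the code posted further down.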
Thanks for your help so far. It's good to know I am on the right track.
I can see burst rates reaching those numbers with some preparatory housekeeping; however, that housekeeping would lower the average rate.
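The following rough C calculation shows why housekeeping drags the average below the burst rate. The 32-clock overhead in the usage example is purely hypothetical. The real figure depends on how many instructions run between bursts (counter restart, loop bookkeeping, hub writes).

```c
#include <assert.h>

/* Average bit rate when each burst of `bits` bits, clocked at
   `cycles_per_bit` system clocks per bit, is followed by `overhead`
   clocks of housekeeping. */
double average_bitrate(double clkfreq, unsigned bits,
                       unsigned cycles_per_bit, unsigned overhead)
{
    assert(bits > 0 && cycles_per_bit > 0);
    double cycles = (double)bits * cycles_per_bit + overhead;
    return clkfreq * bits / cycles;
}
```

With 8 bits at 8 clocks per bit and no overhead, the result at 80 MHz is exactly 10 Mbit/s. Adding a hypothetical 32 clocks per byte drops it to about 6.67 Mbit/s.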
If the Prop had even slightly smarter Cog-Pin mapping, all of this could be done faster.
e.g. imagine if RL INA and RL OUTA worked to read-and-shift a single I/O bit per opcode.
Smarter Cog-Pin mapping control would also eliminate mask operations inside a loop; they could be moved outside the loop, into a config step.
Perhaps in Prop II?
It does indeed! I see this must be where the latest version of the fsrw object was developed. The code in the first post is a very well-explained example that should help. Thanks, Mark_T!
Well, I just googled for "fast spi propeller" or something similar...
Does anyone have any pointers on the following code? I know I am doing something wrong, but I'm not sure where yet.
DAT
              org
'
'' SPI Engine - main loop
'
''t1 is used for DataPin mask
''t2 is used for the ClockPin mask
''t3 is used to hold DataValue SHIFTIN/SHIFTOUT
''t4 is used to hold # of Bits
'****************************************************************************
' ////////////////////// Initialization //////////////////////////////////////
init          mov     ctra, clockCounterSetup       ' Setup counter modules.
              mov     ctrb, dataInCounterSetup      ' ctra & ctrb have been set accordingly at setup.
              mov     SPITiming, #1                 '' 1 = Slow speeds, 0 = fast speeds
              jmp     #loop
'****************************************************************************
loop          rdlong  t1,par        wz              ''wait for command (Read a LONG. t1 becomes the PAR registers address. WZ specifies that the Z flag should be set (1) if PAR is zero.)
        if_z  jmp     #loop                         ''If Z is set, jump to 'loop' address (return to start)
              movd    :arg,#arg0                    ''get 5 arguments ; arg0 to arg4 (Move 'Dpin' destination field to ':arg'
              mov     t2,t1                         '' │ (datapin mask(t1) is stored in clockpin mask(t2))
              mov     t3,#5                         ''───┘ (DataValue become MSBFIRST)
:arg          rdlong  arg0,t2                       ''(Dpin becomes destination address of clockpin mask)
              add     :arg,d0                       ''(Add 0x200 to arg)
              add     t2,#4                         ''(Add LSBFIRST to clockpin mask)
              djnz    t3,#:arg                      ''
              mov     address,t1                    ''preserve address location for passing
                                                    ''variables back to Spin language.
              wrlong  zero,par                      ''zero command to signify command received
              ror     t1,#16+2                      ''lookup command address
              add     t1,#jumps
              movs    :table,t1
              rol     t1,#2
              shl     t1,#3
:table        mov     t2,0
              shr     t2,t1
              and     t2,#$FF
              jmp     t2                            ''jump to command
jumps         byte    0                             ''0
              byte    SHIFTOUT_                     ''1
              byte    SHIFTIN_                      ''2
              byte    NotUsed_                      ''3
NotUsed_      jmp     #loop
'############################################################################
'tested OK
SHIFTOUT_                                           ''SHIFTOUT Entry
              mov     t4, arg3      wz              '' Load number of data bits
        if_z  jmp     #Done                         '' '0' number of Bits = Done
              mov     t1, #1        wz              '' Configure DataPin
              shl     t1, arg0
              muxz    outa, t1                      '' PreSet DataPin LOW
              muxnz   dira, t1                      '' Set DataPin to an OUTPUT
              mov     t2, #1        wz              '' Configure ClockPin
              shl     t2, arg1                      '' Set Mask
              test    ClockState, #1 wc             '' Determine Starting State
        if_nc muxz    outa, t2                      '' PreSet ClockPin LOW
        if_c  muxnz   outa, t2                      '' PreSet ClockPin HIGH
              muxnz   dira, t2                      '' Set ClockPin to an OUTPUT
'*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^
              sub     _MSBFIRST, arg2 wz,nr         '' Detect MSBFIRST mode for SHIFTOUT
        if_z  jmp     #WriteSPI
'*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^
              jmp     #loop                         '' Go wait for next command
'----------------------------------------------------------------------------
SHIFTIN_                                            ''SHIFTIN Entry
              mov     t4, arg3      wz              '' Load number of data bits
        if_z  jmp     #Done                         '' '0' number of Bits = Done
              mov     t1, #1        wz              '' Configure DataPin
              shl     t1, arg0
              muxz    dira, t1                      '' Set DataPin to an INPUT
              mov     t2, #1        wz              '' Configure ClockPin
              shl     t2, arg1                      '' Set Mask
              test    ClockState, #1 wc             '' Determine Starting State
        if_nc muxz    outa, t2                      '' PreSet ClockPin LOW
        if_c  muxnz   outa, t2                      '' PreSet ClockPin HIGH
              muxnz   dira, t2                      '' Set ClockPin to an OUTPUT
'*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^
              sub     _MSBPRE, arg2 wz,nr           '' Detect MSBPRE mode for SHIFTIN
        if_z  jmp     #ReadSPI
'*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^
              jmp     #loop                         '' Go wait for next command
'----------------------------------------------------------------------------
'tested OK
Update_SHIFTIN mov    t1, address                   '' Write data back to Arg4
              add     t1, #16                       '' Arg0 = #0 ; Arg1 = #4 ; Arg2 = #8 ; Arg3 = #12 ; Arg4 = #16
              wrlong  t3, t1
              add     t1, #4                        '' Point t1 to Flag ... Arg4 + #4
              wrlong  zero, t1                      '' Clear Flag ... indicates SHIFTIN data is ready
              jmp     #loop                         '' Go wait for next command
'----------------------------------------------------------------------------
' 'ReadSPI' & 'WriteSPI' taken from Kye's SD-MMC_FATEngine Object.
' Variables changed to work with Beau Schwabe's Propeller SPI Engine Object
' ////////////////////////////////////////////////////////////////////////////
' Read SPI
' ////////////////////////////////////////////////////////////////////////////
readSPI       mov     t4, #8                        ' Setup counter to read in 1 - 32 bits.
              mov     t3, #0        wc              '
readSPIAgain  mov     phsa, #0                      ' Start clock low.
              tjnz    SPITiming, #readSPISpeed      '
' ////////////////////// Slow Reading ////////////////////////////////////////
              movi    frqa, #%0000_0001_0           ' Start the clock - read 1 .. 32 bits.
readSPILoop   test    t1, ina       wc              '
              rcl     t3, #1                        '
              waitpne t2, t2                        ' Get bit.
              waitpeq t2, t2                        '
              djnz    t4, #readSPILoop              ' Loop until done.
              jmp     #readSPIFinish                '
' ////////////////////// Fast Reading ////////////////////////////////////////
readSPISpeed  movi    frqa, #%0010_0000_0           ' Start the clock - read 8 bits.
              test    t1, ina       wc              ' Read in data.
              rcl     t3, #1                        '
              test    t1, ina       wc              '
              rcl     t3, #1                        '
              test    t1, ina       wc              '
              rcl     t3, #1                        '
              test    t1, ina       wc              '
              rcl     t3, #1                        '
              test    t1, ina       wc              '
              rcl     t3, #1                        '
              test    t1, ina       wc              '
              rcl     t3, #1                        '
              test    t1, ina       wc              '
              rcl     t3, #1                        '
              test    t1, ina       wc              '
' ////////////////////// Finish Up ///////////////////////////////////////////
readSPIFinish mov     frqa, #0                      ' Stop the clock.
              rcl     t3, #1                        '
              cmpsub  t4, #8                        ' Read in any remaining bits.
              tjnz    t4, #readSPIAgain             '
              jmp     #Update_SHIFTIN
'readSPI_ret  ret                                   ' Return. Leaves the clock high.
' ////////////////////////////////////////////////////////////////////////////
' Write SPI
' ////////////////////////////////////////////////////////////////////////////
writeSPI      rdbyte  phsb, t3                      ' Set phsb as byte to be written
              mov     t4, #8                        ' Setup counter to write out 1 - 32 bits.
              ror     phsb, t4                      '
writeSPIAgain mov     phsa, #0                      ' Start clock low.
              tjnz    SPITiming, #writeSPISpeed     '
' ////////////////////// Slow Writing ////////////////////////////////////////
              movi    frqa, #%0000_0001_0           ' Start the clock - write 1 .. 32 bits.
writeSPILoop  shl     phsb, #1                      '
              waitpeq t2, t2                        ' Set bit.
              waitpne t2, t2                        '
              djnz    t4, #writeSPILoop             ' Loop until done.
              jmp     #writeSPIFinish               '
' ////////////////////// Fast Writing ////////////////////////////////////////
writeSPISpeed movi    frqa, #%0100_0000_0           ' Write out data.
              shl     phsb, #1                      '
              shl     phsb, #1                      '
              shl     phsb, #1                      '
              shl     phsb, #1                      '
              shl     phsb, #1                      '
              shl     phsb, #1                      '
              shl     phsb, #1                      '
' ////////////////////// Finish Up ///////////////////////////////////////////
writeSPIFinish mov    frqa, #0                      ' Stop the clock.
              cmpsub  t4, #8                        ' Write out any remaining bits.
              shl     phsb, #1                      '
              tjnz    t4, #writeSPIAgain            '
              neg     phsb, #1                      '
              jmp     #loop
'writeSPI_ret ret                                   ' Return. Leaves the clock low.
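The write path uses a trick carried over from the SD-MMC_FATEngine code. The outgoing value is parked in PHSB so that bit 31 drives the data pin, and each shl phsb, #1 presents the next bit. The C model below assumes, as that trick requires, that CTRB is in NCO mode on the data pin with FRQB = 0, so PHSB only changes when the code writes it. The model checks bit order only, not timing against the clock edges. It also takes the value directly rather than fetching it from hub memory with rdbyte, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Propeller-style rotate right: only the low 5 bits of the count are used. */
static uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Model of "ror phsb, t4" followed by one "shl phsb, #1" per bit.
   out[i] receives the data-pin level (PHSB bit 31) for the i-th bit sent. */
void phsb_write_model(uint32_t value, unsigned bits, int *out)
{
    assert(bits >= 1 && bits <= 32);
    uint32_t phsb = ror32(value, bits);   /* ror phsb, t4: MSB lands in bit 31 */
    for (unsigned i = 0; i < bits; i++) {
        out[i] = (int)(phsb >> 31);       /* pin level = PHSB[31] */
        phsb <<= 1;                       /* shl phsb, #1: next bit into bit 31 */
    }
}
```

With t4 = 8, as in the posted code, the byte 0xA5 comes out as 1, 0, 1, 0, 0, 1, 0, 1 (MSB first). For 32 bits, the rotate count wraps to 0 and the value is already in position.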