Increasing Speed of 'Propeller SPI Engine v1.2'



Dantes Legacy
03-16-2012, 05:19 PM
I have recently swapped over from Beau Schwabe's Spin SPI Engine Object to the Assembly SPI Engine (http://obex.parallax.com/objects/431/) and noticed a great increase in speed on my SPI bus. I know that this is mainly due to the fact that the SPI functions take a lot less time to do in assembly. I have been tinkering with the assembly slightly to speed things up further and have had some success.

I was wondering if anyone had integrated the use of counters into the clocking of this object to speed things up even further? Similar to how the SPI is programmed for both FSRW (http://obex.parallax.com/objects/92/) and SD-MMC_FATEngine (http://obex.parallax.com/objects/619/), which are developed for SD card programming in particular.

Unfortunately, my experience with ASM is significantly less than that with SPIN. So I thought I might throw it out to the experts and see what we can come up with!

I was hoping to change the following:
(Taking the MSBPRE_ function as an example which is used by the SHIFTIN_ function to shift in up to 32 bits of data)


MSBPRE_                                   '' Receive Data MSBPRE
MSBPRE_Sin    test    t1, ina wc          '' Read Data Bit into 'C' flag
              rcl     t3, #1              '' rotate "C" flag into return value
              call    #PreClock           '' Send clock pulse
              djnz    t4, #MSBPRE_Sin     '' Decrement t4 ; jump if not Zero
              jmp     #Update_SHIFTIN     '' Pass received data to SHIFTIN receive variable
''t1 is used for DataPin mask
''t2 is used for the ClockPin mask
''t3 is used to hold DataValue SHIFTIN/SHIFTOUT
''t4 is used to hold # of Bits


To a version using the counters to clock the data in like so:
(CTRA and CTRB are initialised at start of program just like the SD-MMC_FATEngine object)


movi frqa, #%0000_0001_0 ' Start the clock - read 1 .. 32 bits.
MSBPRE_Sin waitpne t2, t2 ' Get bit.
rcl t3, #1 '
waitpeq t2, t2 '
test t1, ina wc '

djnz t4, #MSBPRE_Sin ' Loop until done.
jmp #Update_SHIFTIN
mov frqa, #0 '
rcl t3, #1
''t1 is used for DataPin mask
''t2 is used for the ClockPin mask
''t3 is used to hold DataValue SHIFTIN/SHIFTOUT
''t4 is used to hold # of Bits


Am I on the right track by editing the SPI ASM object like so? Any tips would be very much appreciated as I am not that experienced in ASM.

I figure that once I understand the concepts of changing this function to shift in data, I will be able to edit the rest myself.

kuroneko
03-20-2012, 04:31 AM
Before this gets completely forgotten, the change looks OK. However, since you assemble the last bit after the loop has finished you want to move the jump to the update function after said assembly. What's your speed target here? Depending on that value you may have to look for a different approach (e.g. get rid of the waitpxx insns and know when the clock transitions).
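
A rough Python model (not Propeller code) of why the final rcl has to run before the jump: in the counter-clocked loop, rcl executes before test, so each pass stores the bit sampled on the previous pass, and the last sampled bit is still sitting in C when the loop exits. The name shift_in and the assumption that C and t3 both start out clear are illustrative only; they are not part of the SPI Engine.

```python
def shift_in(bits, final_rcl=True):
    """Model the loop order 'rcl t3, #1' followed by 'test t1, ina wc'."""
    t3 = 0      # receive register (t3 in the listing); assumed to start at 0
    carry = 0   # C flag; assumed clear on entry
    for sample in bits:
        t3 = ((t3 << 1) | carry) & 0xFFFFFFFF   # rcl t3, #1 (stores previous bit)
        carry = sample                          # test t1, ina wc (samples new bit)
    if final_rcl:
        t3 = ((t3 << 1) | carry) & 0xFFFFFFFF   # rcl t3, #1 after the loop
    return t3


incoming = [1, 0, 1, 0, 0, 1, 0, 1]             # 0xA5 sent MSB first
with_final_rcl = shift_in(incoming, final_rcl=True)
without_final_rcl = shift_in(incoming, final_rcl=False)
print(hex(with_final_rcl), hex(without_final_rcl))
```

In this model, jumping to Update_SHIFTIN before the final rcl returns 0x52 instead of 0xA5, because the last bit never leaves the C flag.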

Also, there is a bug in 1.2 re: MSBFIRST_ the shr t5, #1 should really be a ror t5, #1.
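
A small Python sketch of the difference between the two instructions, under the assumption that MSBFIRST_ builds its mask as "1 shifted left by the bit count, then moved one place back to the right" (the thread does not show that code, so treat the construction as a guess). The helper names are made up for the example. The Propeller uses only the low five bits of a shift count, so a count of 32 behaves like a count of 0.

```python
MASK32 = 0xFFFFFFFF


def shl(value, count):
    # Only the low five bits of the count are used, as on the Propeller.
    return (value << (count & 31)) & MASK32


def shr(value, count):
    return (value & MASK32) >> (count & 31)


def ror(value, count):
    count &= 31
    value &= MASK32
    return ((value >> count) | (value << (32 - count))) & MASK32


def msb_mask(bits, fixup):
    # Assumed construction: 1 shl bits, then one place to the right.
    return fixup(shl(1, bits), 1)


for bits in (8, 16, 32):
    print(bits, hex(msb_mask(bits, shr)), hex(msb_mask(bits, ror)))
```

For 1 to 31 bits both versions give the same mask. For a full 32-bit transfer the shr version ends up with an empty mask in this model, while ror wraps the bit around to $8000_0000.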

Dantes Legacy
03-20-2012, 10:03 AM
Thanks for the info kuroneko (that's 'black cat' if my very limited knowledge of Japanese serves me right?).

I'll fix that MSBFIRST_ bug now. I see what you mean about the jump being in the wrong place too. The target speed I am hoping for is a clock frequency of 10 MHz or more, if at all possible. I haven't measured it properly with an oscilloscope, but before editing the ASM I was achieving roughly 1 MHz.

Glad to know that my first serious attempt at ASM is going well so far.

So you're saying the code should look like this instead:


MSBPRE_ '' Receive Data MSBPRE
movi frqa, #%0000_0001_0 ' Start the clock - read 1 .. 32 bits.
MSBPRE_Sin waitpne t2, t2 ' Get bit.
rcl t3, #1 '
waitpeq t2, t2 '
test t1, ina wc '

djnz t4, #MSBPRE_Sin ' Loop until done.
mov frqa, #0 '
rcl t3, #1
jmp #Update_SHIFTIN
''t1 is used for DataPin mask
''t2 is used for the ClockPin mask
''t3 is used to hold DataValue SHIFTIN/SHIFTOUT
''t4 is used to hold # of Bits

kuroneko
03-20-2012, 10:21 AM
So you're saying the code should look like this instead:
Yes. Note that I didn't pay too much attention to whether the clock polarity is correct. The way you have it now samples data immediately after a rising clock edge (so it looks more like a POST CLOCK example). Depending on the mode you're using, you may have to rotate things around a bit to move the sample point to the right place. I'm sure you'll figure it out.

Regarding speed, 10 Mbit/s [A] is not a problem, and neither is 20 Mbit/s [A] if you're prepared to throw some more resources at it. Also, sending is easy; it's the receiving that is tricky.

[A] Assuming an 80 MHz clock (8 cycles/bit and 4 cycles/bit respectively).
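
The figures above are simple division, sketched here in Python (not Propeller code). The 80 MHz clock comes from the footnote; the 4 clocks per instruction figure is the usual timing for most Propeller 1 instructions and is stated here as an assumption, as are the function names.

```python
CLKFREQ_HZ = 80_000_000     # assumed system clock, as in the footnote
CYCLES_PER_INSTRUCTION = 4  # typical for most Propeller 1 instructions


def bit_rate(cycles_per_bit, clkfreq_hz=CLKFREQ_HZ):
    """Bits per second for a given clock budget per bit."""
    return clkfreq_hz // cycles_per_bit


def instructions_per_bit(target_bits_per_second, clkfreq_hz=CLKFREQ_HZ):
    """How many ordinary instructions fit into one bit time."""
    cycles = clkfreq_hz // target_bits_per_second
    return cycles // CYCLES_PER_INSTRUCTION


print(bit_rate(8), bit_rate(4))
print(instructions_per_bit(10_000_000), instructions_per_bit(20_000_000))
```

So 10 Mbit/s leaves room for two instructions per bit (for example a test plus an rcl), and 20 Mbit/s leaves room for only one, which is why receiving is the hard part.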

Dantes Legacy
03-20-2012, 10:56 AM
I wasn't paying that much attention to the clock polarity myself either. I'd better get working on that and make sure it is correct. If I could get 20 Mbit/s, I would be extremely happy!

Thanks for your help so far. It's good to know I am on the right track.

jmg
03-20-2012, 11:01 AM
Regarding speed, 10 Mbit/s is not a problem, and neither is 20 Mbit/s if you're prepared to throw some more resources at it. Also, sending is easy; it's the receiving that is tricky.


I can see getting burst rates to those numbers, with some preparatory housekeeping, however that would lower the average rates.

If the Prop had even slightly smarter Cog-Pin mapping, all of this could be done faster.

E.g., imagine if RL INA and RL OUTA worked to read-and-shift a single IO bit per opcode.

Smarter Cog-Pin mapping control would also eliminate mask operations inside a loop; they could be moved outside the loop, into a config step.

Perhaps in Prop II ?

Mark_T
03-20-2012, 02:21 PM
This thread seems relevant: http://forums.parallax.com/showthread.php?113722-EDITED-fast-SPI-out-1-bit-per-instruction

Dantes Legacy
03-20-2012, 02:49 PM
I can see getting burst rates to those numbers, with some preparatory housekeeping, however that would lower the average rates.

Yeah, I have used a very good NCO burst object before which helped me understand the functions of CTRx, PHSx and FRQx very well. (Tracy Allen's 'NCOBurst.spin' on OBEX) I did find that the signal can get a bit unstable at top speeds though. I found it hard to implement it into an SPI engine successfully so I figured ASM was my best option.


This thread seems relevant: http://forums.parallax.com/showthread.php?113722-EDITED-fast-SPI-out-1-bit-per-instruction
It does indeed! I see this must be where the latest version of the fsrw object was being developed. The code in the first post is a very well explained example that should help. Thanks Mark_T!

Mark_T
03-20-2012, 03:39 PM
Well I just googled for "fast spi propeller" or something similar...

Dantes Legacy
03-20-2012, 04:16 PM
So I have had some success with using the 'WriteSPI' and 'ReadSPI' functions from Kye's SD-MMC_FATEngine Object and have decided to implement them into the code for the Assembly SPI Object. At the moment, all I am interested in is using the MSBFIRST mode when using the SHIFTOUT function to write out data and the MSBPRE mode when using the SHIFTIN function (If I remember correctly, that makes it SPI Mode '0'). So I decided to use the 'WriteSPI' and 'ReadSPI' functions for this as they include two different speed options (and are well laid out). The problem is that I am not sure if I am shifting the bits left or right correctly in order to read and write to my SPI network.

Does anyone have any pointers on the following code? I know I am doing something wrong, but I'm not sure where yet.



DAT org
'
'' SPI Engine - main loop
'
''t1 is used for DataPin mask
''t2 is used for the ClockPin mask
''t3 is used to hold DataValue SHIFTIN/SHIFTOUT
''t4 is used to hold # of Bits
'******************************************************************************************************************************
' //////////////////////Initialization/////////////////////////////////////////////////////////////////////////////////////////
init mov ctra, clockCounterSetup ' Setup counter modules.
mov ctrb, dataInCounterSetup ' ctra & ctrb have been set accordingly at setup.
mov SPITiming, #1 '' NOTE: as coded, tjnz SPITiming takes the fast path when nonzero and the slow path when 0
jmp #loop
'******************************************************************************************************************************
loop rdlong t1,par wz ''wait for command (read the command long at the hub address held in PAR into t1; WZ sets Z if the value read is zero)
if_z jmp #loop ''no command yet (Z set): keep polling
movd :arg,#arg0 ''get 5 arguments ; arg0 to arg4 (point the destination field of ':arg' at arg0)
mov t2,t1 '' │ (t2 = hub pointer to the argument list)
mov t3,#5 ''───┘ (t3 = loop counter, five arguments)
:arg rdlong arg0,t2 ''(read one argument from hub into the current argN)
add :arg,d0 ''(adding 0x200 bumps the destination field by one: next argN)
add t2,#4 ''(advance the hub pointer by one long)
djnz t3,#:arg ''(repeat until all five arguments are read)
mov address,t1 ''preserve address location for passing
''variables back to Spin language.
wrlong zero,par ''zero command to signify command received
ror t1,#16+2 ''lookup command address
add t1,#jumps
movs :table,t1
rol t1,#2
shl t1,#3
:table mov t2,0
shr t2,t1
and t2,#$FF
jmp t2 ''jump to command
jumps byte 0 ''0
byte SHIFTOUT_ ''1
byte SHIFTIN_ ''2
byte NotUsed_ ''3
NotUsed_ jmp #loop
'##############################################################################################################
'tested OK
SHIFTOUT_ ''SHIFTOUT Entry
mov t4, arg3 wz '' Load number of data bits
if_z jmp #Done '' '0' number of Bits = Done
mov t1, #1 wz '' Configure DataPin
shl t1, arg0
muxz outa, t1 '' PreSet DataPin LOW
muxnz dira, t1 '' Set DataPin to an OUTPUT
mov t2, #1 wz '' Configure ClockPin
shl t2, arg1 '' Set Mask
test ClockState, #1 wc '' Determine Starting State
if_nc muxz outa, t2 '' PreSet ClockPin LOW
if_c muxnz outa, t2 '' PreSet ClockPin HIGH
muxnz dira, t2 '' Set ClockPin to an OUTPUT
'*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^* ^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^* ^*^*^*^*^*^*^*^*^*^
sub _MSBFIRST, arg2 wz,nr '' Detect MSBFIRST mode for SHIFTOUT
if_z jmp #WriteSPI
'*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^* ^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^* ^*^*^*^*^*^*^*^*^*^
jmp #loop '' Go wait for next command
'------------------------------------------------------------------------------------------------------------------------------
SHIFTIN_ ''SHIFTIN Entry
mov t4, arg3 wz '' Load number of data bits
if_z jmp #Done '' '0' number of Bits = Done
mov t1, #1 wz '' Configure DataPin
shl t1, arg0
muxz dira, t1 '' Set DataPin to an INPUT
mov t2, #1 wz '' Configure ClockPin
shl t2, arg1 '' Set Mask
test ClockState, #1 wc '' Determine Starting State
if_nc muxz outa, t2 '' PreSet ClockPin LOW
if_c muxnz outa, t2 '' PreSet ClockPin HIGH
muxnz dira, t2 '' Set ClockPin to an OUTPUT
'*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^* ^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^* ^*^*^*^*^*^*^*^*^*^
sub _MSBPRE, arg2 wz,nr '' Detect MSBPRE mode for SHIFTIN
if_z jmp #ReadSPI
'*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^* ^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^*^* ^*^*^*^*^*^*^*^*^*^
jmp #loop '' Go wait for next command

'------------------------------------------------------------------------------------------------------------------------------
'tested OK
Update_SHIFTIN
mov t1, address '' Write data back to Arg4
add t1, #16 '' Arg0 = #0 ; Arg1 = #4 ; Arg2 = #8 ; Arg3 = #12 ; Arg4 = #16
wrlong t3, t1
add t1, #4 '' Point t1 to Flag ... Arg4 + #4
wrlong zero, t1 '' Clear Flag ... indicates SHIFTIN data is ready
jmp #loop '' Go wait for next command
'------------------------------------------------------------------------------------------------------------------------------
' 'ReadSPI' & 'WriteSPI' taken from Kye's SD-MMC_FATEngine Object.
' Variables changed to work with Beau Schwabe's Propeller SPI Engine Object
' /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
' Read SPI
' /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
readSPI mov t4, #8 ' Setup counter to read in 1 - 32 bits.
mov t3, #0 wc '
readSPIAgain mov phsa, #0 ' Start clock low.
tjnz SPITiming, #readSPISpeed '
' //////////////////////Slow Reading///////////////////////////////////////////////////////////////////////////////////////////
movi frqa, #%0000_0001_0 ' Start the clock - read 1 .. 32 bits.
readSPILoop test t1, ina wc '
rcl t3, #1 '
waitpne t2, t2 ' Get bit.
waitpeq t2, t2 '
djnz t4, #readSPILoop ' Loop until done.
jmp #readSPIFinish '
' //////////////////////Fast Reading///////////////////////////////////////////////////////////////////////////////////////////
readSPISpeed movi frqa, #%0010_0000_0 ' Start the clock - read 8 bits.
test t1, ina wc ' Read in data.
rcl t3, #1 '
test t1, ina wc '
rcl t3, #1 '
test t1, ina wc '
rcl t3, #1 '
test t1, ina wc '
rcl t3, #1 '
test t1, ina wc '
rcl t3, #1 '
test t1, ina wc '
rcl t3, #1 '
test t1, ina wc '
rcl t3, #1 '
test t1, ina wc '
' //////////////////////Finish Up//////////////////////////////////////////////////////////////////////////////////////////////
readSPIFinish mov frqa, #0 ' Stop the clock.
rcl t3, #1 '
cmpsub t4, #8 ' Read in any remaining bits.
tjnz t4, #readSPIAgain '
jmp #Update_SHIFTIN
'readSPI_ret ret ' Return. Leaves the clock high.
' /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
' Write SPI
' /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
writeSPI rdbyte phsb, t3 ' Set phsb as byte to be written
mov t4, #8 ' Setup counter to write out 1 - 32 bits.
ror phsb, t4 '
writeSPIAgain mov phsa, #0 ' Start clock low.
tjnz SPITiming, #writeSPISpeed '
' //////////////////////Slow Writing//////////////////////////////////////////////////////////////////////////////////////////
movi frqa, #%0000_0001_0 ' Start the clock - write 1 .. 32 bits.
writeSPILoop shl phsb, #1 '
waitpeq t2, t2 ' Set bit.
waitpne t2, t2 '

djnz t4, #writeSPILoop ' Loop until done.
jmp #writeSPIFinish '
' //////////////////////Fast Writing//////////////////////////////////////////////////////////////////////////////////////////
writeSPISpeed movi frqa, #%0100_0000_0 ' Write out data.
shl phsb, #1 '
shl phsb, #1 '
shl phsb, #1 '
shl phsb, #1 '
shl phsb, #1 '
shl phsb, #1 '
shl phsb, #1 '
' //////////////////////Finish Up//////////////////////////////////////////////////////////////////////////////////////////////
writeSPIFinish mov frqa, #0 ' Stop the clock.
cmpsub t4, #8 ' Write out any remaining bits.
shl phsb, #1 '
tjnz t4, #writeSPIAgain '
neg phsb, #1 '
jmp #loop
'writeSPI_ret ret ' Return. Leaves the clock low.
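
For reference, here is a Python sketch (not Propeller code) of the clock rates implied by the three movi frqa constants in the listing. It rests on several assumptions: ctra is in NCO mode (clockCounterSetup is not shown here), the system clock is 80 MHz, and the low 23 bits of frqa are zero when movi runs, since movi only replaces bits 31..23. The function names are made up for the example.

```python
CLKFREQ_HZ = 80_000_000   # assumed system clock


def frq_from_movi(field):
    """movi writes its 9-bit source into bits 31..23 of the destination.
    Assumes the remaining bits of frqa are zero (e.g. after mov frqa, #0)."""
    return (field & 0x1FF) << 23


def nco_hz(frq, clkfreq_hz=CLKFREQ_HZ):
    """NCO mode outputs bit 31 of PHSx, which accumulates FRQx every clock."""
    return clkfreq_hz * frq / 2**32


slow_hz = nco_hz(frq_from_movi(0b0000_0001_0))        # slow read/write loops
fast_read_hz = nco_hz(frq_from_movi(0b0010_0000_0))   # unrolled test/rcl pairs
fast_write_hz = nco_hz(frq_from_movi(0b0100_0000_0))  # unrolled shl phsb

print(slow_hz, fast_read_hz, fast_write_hz)
```

Under those assumptions the slow paths clock at 312.5 kHz, the fast read at 10 MHz (8 clocks per bit, one test plus one rcl) and the fast write at 20 MHz (4 clocks per bit, one shl).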