Increasing Speed of 'Propeller SPI Engine v1.2'
I have recently swapped over from Beau Schwabe's Spin SPI Engine Object to the Assembly SPI Engine (http://obex.parallax.com/objects/431/) and noticed a great increase in speed on my SPI bus. I know that this is mainly due to the fact that the SPI functions take a lot less time to do in assembly. I have been tinkering with the assembly slightly to speed things up further and have had some success.
I was wondering if anyone had integrated the use of counters into the clocking of this object to speed things up even further, similar to how the SPI is programmed in both FSRW (http://obex.parallax.com/objects/92/) and SD-MMC_FATEngine (http://obex.parallax.com/objects/619/), which were developed for SD card access in particular.
Unfortunately, my experience with ASM is significantly less than that with SPIN. So I thought I might throw it out to the experts and see what we can come up with!
I was hoping to change the following:
(Taking the MSBPRE_ function as an example which is used by the SHIFTIN_ function to shift in up to 32 bits of data)
MSBPRE_                                       '' Receive Data MSBPRE
MSBPRE_Sin    test    t1, ina wc              '' Read Data Bit into 'C' flag
              rcl     t3, #1                  '' rotate "C" flag into return value
              call    #PreClock               '' Send clock pulse
              djnz    t4, #MSBPRE_Sin         '' Decrement t4 ; jump if not Zero
              jmp     #Update_SHIFTIN         '' Pass received data to SHIFTIN receive variable

''t1 is used for DataPin mask
''t2 is used for the ClockPin mask
''t3 is used to hold DataValue SHIFTIN/SHIFTOUT
''t4 is used to hold # of Bits
To a version using the counters to clock the data in like so:
(CTRA and CTRB are initialised at start of program just like the SD-MMC_FATEngine object)
              movi    frqa, #%0000_0001_0     ' Start the clock - read 1 .. 32 bits.
MSBPRE_Sin    waitpne t2, t2                  ' Get bit.
              rcl     t3, #1                  '
              waitpeq t2, t2                  '
              test    t1, ina wc              '
              djnz    t4, #MSBPRE_Sin         ' Loop until done.
              jmp     #Update_SHIFTIN
              mov     frqa, #0                '
              rcl     t3, #1

''t1 is used for DataPin mask
''t2 is used for the ClockPin mask
''t3 is used to hold DataValue SHIFTIN/SHIFTOUT
''t4 is used to hold # of Bits
Am I on the right track by editing the SPI ASM object like so? Any tips would be very much appreciated as I am not that experienced in ASM.
I figure that once I understand the concepts of changing this function to shift in data, I will be able to edit the rest myself.
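As a rough check on what clock rate the movi frqa, #%0000_0001_0 line would produce, here is a small Python sketch of the arithmetic. It assumes CTRA is set up in NCO mode and that the system clock is 80 MHz; neither is stated above, so treat both as assumptions. The helper names movi and nco_hz are made up for this illustration.

```python
CLKFREQ_HZ = 80_000_000  # assumed system clock, not stated in the post


def movi(dest, imm9):
    """Model PASM MOVI: write a 9-bit value into bits 31..23, keep bits 22..0."""
    return (dest & 0x007FFFFF) | ((imm9 & 0x1FF) << 23)


def nco_hz(frq, clkfreq_hz=CLKFREQ_HZ):
    """NCO output frequency: PHS gains FRQ every clock and the pin follows PHS[31]."""
    return clkfreq_hz * frq // 2**32


frqa = movi(0, 0b0000_0001_0)   # same immediate as in the posted code
print(hex(frqa))                # 0x1000000
print(2**32 // frqa)            # 256 system clocks per SPI clock period
print(nco_hz(frqa))             # 312500
```

Under those assumptions the clock would run at about 312.5 kHz, so a larger FRQA value would be needed to get anywhere near 10 MHz.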
Comments
Also, there is a bug in 1.2 in MSBFIRST_: the shr t5, #1 should really be a ror t5, #1.
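To see why shr versus ror could matter, here is a small Python model. It assumes the MSB mask in MSBFIRST_ is built by loading 1, shifting it left by the bit count, and then shifting right by one; that sequence is my assumption and is not quoted in this thread. What is certain is that PASM shifts use only the low 5 bits of the count, so a shift by 32 behaves like a shift by 0.

```python
MASK32 = 0xFFFFFFFF


def shl(value, count):
    """PASM-style SHL: 32-bit result, only the low 5 bits of count are used."""
    return (value << (count & 31)) & MASK32


def shr(value, count):
    """PASM-style SHR on a 32-bit value."""
    return (value & MASK32) >> (count & 31)


def ror(value, count):
    """PASM-style ROR: bits shifted out on the right re-enter on the left."""
    count &= 31
    value &= MASK32
    return ((value >> count) | (value << (32 - count))) & MASK32


def msb_mask(bits, use_ror):
    """Hypothetical mask setup: mov t5,#1 / shl t5,bits / then shr or ror t5,#1."""
    t5 = shl(1, bits)
    return ror(t5, 1) if use_ror else shr(t5, 1)


print(hex(msb_mask(8, use_ror=False)))   # 0x80
print(hex(msb_mask(8, use_ror=True)))    # 0x80
print(hex(msb_mask(32, use_ror=False)))  # 0x0
print(hex(msb_mask(32, use_ror=True)))   # 0x80000000
```

In this model the two instructions agree for 1 to 31 bits and differ only for a full 32-bit transfer, where shr leaves an empty mask.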
Thanks for the info kuroneko (that's 'black cat' if my very limited knowledge of Japanese serves me right?).
I'll fix that MSBFIRST_ bug now. I see what you mean about the jump being in the wrong place too. The target speeds I am hoping to get are anywhere >= 10MHz clocking frequency if at all possible. I haven't measured it correctly with an oscilloscope, but before editing the ASM, I was achieving speeds of about 1 MHz (rough estimation).
Glad to know that my first serious attempt at ASM is going well so far.
So you're saying the code should look like this instead:
Regarding speed, 10 Mbit/s [A] is not a problem, and neither is 20 Mbit/s [A] when you're prepared to throw some more resources at it. Also, sending stuff is easy; it's the receiving bit which is tricky.
[A] Assuming an 80 MHz clock (8 cycles/bit and 4 cycles/bit respectively).
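The footnote's numbers follow directly from dividing the system clock by the cycles spent per bit. A quick Python check, using the 80 MHz figure from the footnote (bit_rate is just an illustrative helper name):

```python
CLKFREQ_HZ = 80_000_000  # system clock assumed in the footnote


def bit_rate(cycles_per_bit, clkfreq_hz=CLKFREQ_HZ):
    """Return the SPI bit rate in bit/s for a given number of system clocks per bit."""
    return clkfreq_hz // cycles_per_bit


print(bit_rate(8))  # 10000000
print(bit_rate(4))  # 20000000
```

Since most PASM instructions take 4 clocks, 8 cycles/bit leaves room for two instructions per bit and 4 cycles/bit for only one.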
Thanks for your help so far. It's good to know I am on the right track.
I can see getting burst rates to those numbers with some preparatory housekeeping; however, that would lower the average rates.
If the Prop had even slightly smarter Cog-Pin mapping, all of this could be done faster.
E.g. imagine if RL INA and RL OUTA worked to read-and-shift a single IO bit per opcode.
Smarter Cog-Pin mapping control would also eliminate mask operations inside a loop: they could be moved outside the loop, into a config step.
Perhaps in Prop II ?
It does indeed! I see this must be where the latest version of the fsrw object was in development. The code in the first post is a very well explained example that should help. Thanks Mark_T!
Well I just googled for "fast spi propeller" or something similar...
Does anyone have any pointers on the following code? I know I am doing something wrong, but I'm not sure where yet.