This is great. I can now push the whole sector read code into assembly and hopefully get better performance.
BTW: This CRC code helped speed up the eMMC Spin code a little. But, I'm still seeing a series of single block reads as being much slower than a big multiblock read. Maybe that's just the way it is? I'm not seeing anything in my code that would account for this...
Comments
I tried all sorts of reversals and inversions and just couldn't get crcbit or crcnib to work. Didn't think of shifting the poly.
Maybe the code in the earlier posts would have worked too if I had done only 5 bytes instead of 5 and 7/8 bytes...
It's so bizarre now how that 5-7/8 byte code actually works... Those last seven zeros must clean up something not done right earlier...
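One possible reading of the "5 bytes" versus "5-7/8 bytes" puzzle, sketched in plain C rather than PASM2: a CRC can be computed in an "augmented" form, where the message is followed by as many zero bits as the CRC is wide, or in a "direct" form that folds each message bit into the feedback and needs no padding. The sketch below assumes the CRC in question is the 7-bit SD/MMC command CRC (polynomial x^7 + x^3 + 1, initial value 0), which the thread does not state outright. The function names are made up for illustration and are not taken from anyone's posted code.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC7_POLY 0x09u  /* x^7 + x^3 + 1 with the x^7 term implied (assumed polynomial) */

/* One step of the augmented form: the input bit enters at the bottom of the
   register, and the polynomial is applied when a 1 falls off the top. */
static uint8_t crc7_aug_step(uint8_t reg, unsigned in_bit)
{
    unsigned top = (reg >> 6) & 1u;
    reg = (uint8_t)(((reg << 1) | in_bit) & 0x7Fu);
    if (top)
        reg ^= CRC7_POLY;
    return reg;
}

/* Augmented form: message bits MSB first, then 7 zero bits
   (the extra "7/8 of a byte") to flush the register. */
uint8_t crc7_augmented(const uint8_t *msg, size_t len)
{
    uint8_t reg = 0;
    for (size_t i = 0; i < len; i++)
        for (int b = 7; b >= 0; b--)
            reg = crc7_aug_step(reg, (msg[i] >> b) & 1u);
    for (int z = 0; z < 7; z++)
        reg = crc7_aug_step(reg, 0);
    return reg;
}

/* Direct form: the input bit is XORed with the top register bit to form the
   feedback, so no trailing zero bits are needed. */
uint8_t crc7_direct(const uint8_t *msg, size_t len)
{
    uint8_t reg = 0;
    for (size_t i = 0; i < len; i++)
        for (int b = 7; b >= 0; b--) {
            unsigned fb = ((reg >> 6) ^ (msg[i] >> b)) & 1u;
            reg = (uint8_t)((reg << 1) & 0x7Fu);
            if (fb)
                reg ^= CRC7_POLY;
        }
    return reg;
}
```

Both functions give $4A for CMD0 ($40 00 00 00 00), which becomes the familiar $95 once shifted left and given its stop bit. If the earlier code was of the augmented kind, that would explain why it only came out right with the seven trailing zeros.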
http://chibios.sourceforge.net/docs3/hal/hal__mmc__spi_8c_source.html
BTW: This CRC code helped speed up the eMMC Spin code a little. But, I'm still seeing a series of single block reads as being much slower than a big multiblock read. Maybe that's just the way it is? I'm not seeing anything in my code that would account for this...
The crcnib will be a huge help there, I hope.
If not, you could use a table driven approach from LUTRAM perhaps. Should still be quite fast.
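A rough sketch of the table-driven idea, in plain C so it can be checked on a PC before porting. It assumes the 7-bit SD/MMC command CRC (polynomial x^7 + x^3 + 1, initial value 0) and keeps the CRC left-aligned in a byte, so the polynomial is used shifted up one bit ($12 instead of $09). On the P2 the 256-entry table could live in LUT RAM; the names `crc7_table`, `crc7_table_init` and `crc7_table_frame_byte` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 256-entry table of left-aligned CRC7 values (CRC in bits 7..1, bit 0 clear). */
static uint8_t crc7_table[256];

/* Build the table once, using the polynomial shifted left by one bit. */
void crc7_table_init(void)
{
    for (unsigned i = 0; i < 256; i++) {
        uint8_t c = (uint8_t)i;
        for (int b = 0; b < 8; b++)
            c = (uint8_t)((c & 0x80u) ? ((c << 1) ^ 0x12u) : (c << 1));
        crc7_table[i] = c;
    }
}

/* One table lookup per message byte. The result is returned as the final
   command byte: CRC7 in bits 7..1 with the stop bit set in bit 0. */
uint8_t crc7_table_frame_byte(const uint8_t *msg, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = crc7_table[crc ^ msg[i]];
    return (uint8_t)(crc | 1u);
}
```

After calling `crc7_table_init()` once, the five bytes of CMD0 ($40 00 00 00 00) should give $95 and CMD8 ($48 00 00 01 AA) should give $87, the usual sixth bytes for those commands.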
@Rayman
I suggest you look at my SD code I've just released if you want to get it running fast. It works in COG or LUT.
It should drop into your code easily.
For the CRC routine: if you supply $80 as the extra last byte (the one that gets replaced by the CRC) instead of $00, I believe you can run the CRC routine over the whole thing and drop the last statement that fiddles the routine to use 7 bits and then does another fix. Haven't had time to try it yet.
I've optimised Andy's code a little:
You can't call it optimized if there's an RDxxxx in the hot loop
Oi, it's doubled the speed!
Also, instead of doing
sub len,#1
and then testing the condition code, you can use the handy DJZ instruction to exit the loop early.
Actually, if you have an efficient way to block read variables into Flexspin's Fcache I'm all ears.
EDIT: I guess allocating the space is where I need the help. Rather than the block copy.
You can declare an array and block-read into that (only works in Spin source). Or just use RDFAST and friends for something like this (only works in ORG/END mode, not ORGH/END or ASM/ENDASM)
I thought all arrays went to hubRAM/Stack. There would need to be a notably small size limit for cogRAM based.
Huh, that now allows use of RDLONG ...
Yes, something like that. In C it always puts it on the stack, I think. In Spin it can go to cogRAM, where it can be used for inline ASM block reads. pr0 through pr7 also always exist and can be used for the same purpose.
Assuming code in cog or LUT RAM, no interrupts and no long crossings, all the RDLONGs after the first take 9 cycles, the minimum possible.
Neat. I see how that works now. The other instructions add up to an even 32 ticks, 4 whole hub rotations. And with the RDLONG slice address incrementing each pass, that makes it fit the minimum of 9 ticks to execute.
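The arithmetic behind that can be checked with a toy model in C. It is a simplification, not a cycle-accurate description of the P2: it assumes 8 hub slices selected by address bits 4..2, that the slice a cog can reach advances by one every tick, and that a RDLONG costs the 9-tick minimum plus however many ticks it waits for its slice. Function names and the phase convention are made up for illustration.

```c
#include <stdint.h>

#define HUB_SLICES 8u   /* assumption: 8 hub RAM slices, consecutive longs in consecutive slices */
#define RDLONG_MIN 9u   /* minimum RDLONG cost quoted in the thread */

/* Cost of a RDLONG issued at tick `now` for byte address `addr`:
   the minimum plus the wait until the wanted slice comes around. */
static unsigned rdlong_ticks(unsigned now, uint32_t addr)
{
    unsigned slice = (unsigned)((addr >> 2) % HUB_SLICES);
    unsigned wait = (slice + HUB_SLICES - now % HUB_SLICES) % HUB_SLICES;
    return RDLONG_MIN + wait;
}

/* Run `passes` loop iterations, each one RDLONG followed by `other` ticks of
   other instructions, with the address advancing one long per pass.
   Returns the worst RDLONG cost seen after the first pass. */
unsigned worst_rdlong_after_first(unsigned start_tick, uint32_t addr,
                                  unsigned other, unsigned passes)
{
    unsigned now = start_tick;
    unsigned worst = 0;
    for (unsigned i = 0; i < passes; i++) {
        unsigned cost = rdlong_ticks(now, addr);
        if (i > 0 && cost > worst)
            worst = cost;
        now += cost + other;
        addr += 4;
    }
    return worst;
}
```

With `other` = 32 the loop period is 9 + 32 = 41 ticks, which is 1 more than a multiple of 8, so each pass arrives just as the next slice does and every RDLONG after the first costs 9 in this model, whatever the starting phase. With `other` = 34 the same model settles at 15 ticks per RDLONG.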