Stuck again porting assembly from p1 to p2 (fun with bit banging spi)
Greg LaPolla
I am working on porting a touchscreen driver from the P1 to the P2. I think the test line in the following code is causing an issue, but I'm not completely sure.
spibyte         mov     count, #8
.loop           rcl     op, #1          wc
                waitx   t
                add     t, bitdel
                drvh    SCK_PIN
                muxc    outa, MOSI_PIN
                waitx   t
                add     t, bitdel
                drvl    SCK_PIN
                test    MISO_PIN, ina   wc
                rcl     resultword, #1
                djnz    count, #.loop
                ret
Comments
I find that reading and writing on the same clock are rarely if ever needed. I prefer to write when I write and read when I read.
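To make the "write when I write, read when I read" split concrete, here is a small host-side Python sketch. It is only a model of the bit ordering, not P2 code: the function names are made up for illustration, and it assumes MSB-first transfers, which is what the rcl-based loop in the question implies.

```python
def spi_write_byte(value):
    """Return the 8 MOSI levels for one byte, most significant bit first."""
    return [(value >> bit) & 1 for bit in range(7, -1, -1)]


def spi_read_byte(miso_levels):
    """Assemble one byte from sampled MISO levels; the first sample is bit 7."""
    result = 0
    for level in miso_levels:
        # Shift the previous bits up and bring the new sample in at bit 0,
        # the same direction as the rcl in the original loop.
        result = ((result << 1) | (level & 1)) & 0xFF
    return result
```

Keeping the two directions in separate routines means each one only has to get a single pin's timing right.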
The P2 has much better instructions for pins and a range of pins but also you need to wait for the signal to propagate through before testing.
Here's my read routine.
Firstly, there is a touchscreen driver that @MRoberts added to my LCD driver a week or so ago.
This also contains a routine to write SPI data to the LCD.
Next, be aware that you are sampling before the clock that precedes the toggle because of the internal clock buffering in the P2. Here is a timing snippet from my SD driver. It's unravelled for speed but you should get the idea. You can also look at the SD boot (hubexec) code in the P2 ROM.
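As a rough way to picture that buffering, here is a toy Python model that runs on a PC. The latency figure is a hypothetical parameter, not a number from the P2 documentation, and input_view is just an illustrative name: it shows what a pin test would read on each clock if the input path lags the driven level by a fixed number of clocks.

```python
def input_view(driven, latency, idle=0):
    """Return the level a pin test would read on each clock.

    driven[i] is the level on the pin during clock i. `latency` is an
    assumed number of clocks before that level is visible to the code;
    before anything has propagated, the `idle` level is read.
    """
    visible = driven[:max(len(driven) - latency, 0)]
    return [idle] * min(latency, len(driven)) + visible
```

For example, with an assumed latency of 2, input_view([0, 1, 1, 1, 1, 1], 2) gives [0, 0, 0, 1, 1, 1]: an edge driven on clock 1 is not seen until clock 3, so a test issued right after the drive still reads the old level.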
There's also a diagram in Chip's datasheet that shows the clock alignment. I've done one
@Cluso - I would always qualify the statement "it's unrolled for speed" by adding "because this is hubexec code". Not everyone is keen to run time-critical code in hubexec mode, though, and if it were in a cog then a REP would loop it with zero overhead.
No! It's running in cog or LUT, I cannot recall which, but it's certainly not hubexec. It's contained totally within the cog, except for the mailbox of 4 longs and the sector buffer of 512 longs.
@Cluso - so there is no need to unroll it if you use a REP. Oh, I see, is that a REP on the whole sector then? Most of that time is spent waiting for the sector to be ready, and the only real way of speeding it up overall is to use multi-block read mode so that there is no real delay between contiguous sectors.
You cannot speed up the SD card, but once it's ready to go, this made a big difference. Yes, multi-sector would help if what you're reading is large blocks. It's dependent on the code, and that's doable. I've just not implemented the multi-sector command 23?
Every bit is being clocked in 4 instructions instead of 6-10 followed by extra clocks between every byte, and a saving of a few clocks * 8 * 512 adds up quickly.
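To put rough numbers on "a few clocks * 8 * 512", here is a back-of-envelope Python sketch. It assumes 2 system clocks per instruction, which is typical for P2 cog instructions, and a 200 MHz system clock picked purely for illustration. It also ignores the extra clocks between bytes, so it understates the saving.

```python
SECTOR_BYTES = 512
BITS_PER_BYTE = 8


def clocks_saved_per_sector(old_instr_per_bit, new_instr_per_bit,
                            clocks_per_instr=2):
    """Clocks saved over one 512-byte sector by a shorter per-bit loop."""
    bits_per_sector = SECTOR_BYTES * BITS_PER_BYTE
    saved_instr_per_bit = old_instr_per_bit - new_instr_per_bit
    return saved_instr_per_bit * clocks_per_instr * bits_per_sector


def clocks_to_microseconds(clocks, sysclk_hz):
    """Convert a clock count to microseconds at the given system clock."""
    return clocks * 1_000_000 / sysclk_hz
```

Under those assumptions, going from 6 to 4 instructions per bit saves 16384 clocks per sector (about 82 us at 200 MHz), and going from 10 to 4 saves 49152 clocks.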
Once the card gets going it's quite fast. The massive delays are in accessing the first sector, that is, one which is not just the next one in line.
Technically, a file "could" be fragmented, but even if it were, and I find that that's not the case, there are still 64 sectors in one 32kB cluster and those sectors can be read sequentially in multi-block mode. In fact, even though I don't have any files fragmented, I always use 64kB clusters, which is the maximum that FAT32 itself permits.
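The cluster arithmetic is easy to check on a PC. This Python sketch assumes 512-byte sectors, and the function names are only illustrative. The second function is a pessimistic bound that treats every cluster as if it sat somewhere else on the card, so each one needs its own multi-block read.

```python
SECTOR_BYTES = 512


def sectors_per_cluster(cluster_bytes):
    """Number of 512-byte sectors in one cluster."""
    return cluster_bytes // SECTOR_BYTES


def multiblock_reads_needed(file_bytes, cluster_bytes):
    """Worst case for a fully fragmented file: one read per cluster."""
    return -(-file_bytes // cluster_bytes)  # ceiling division
```

So a 32kB cluster is 64 sectors and a 64kB cluster is 128, and even a fully fragmented 100,000-byte file on 32kB clusters would need at most 4 multi-block reads.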