@evanh said:
PPS: Regarding those block read/write routines, they partly depend on registers being loaded along with the code. If Fcache is shifted to LUTRAM you'd have to move such data into locals instead.
Yeah I know all about it, that's the tricky part. But we have 32 registers to play with and there are only up to 9 parameters I think. We can get them into C register variables using high level assignments from the source structure.
Also I've been making this multi-instance so your globals are no longer present. Even though I am only exposing the SD mode driver host interface as a single slot initially, I see no reason to fix it to just one interface.
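For anyone following along, here is a rough sketch of what I mean by per-instance state with the parameters pulled into locals. This is not the actual driver code: the struct, its field names and `sd_block_setup()` are all made up for illustration, and whether the locals actually land in cog registers is up to the compiler.

```c
#include <stdint.h>

/* Hypothetical per-instance state. Field names are illustrative only,
   not taken from the real driver. */
typedef struct {
    uint32_t pin_clk;   /* SD clock pin number */
    uint32_t pin_cmd;   /* SD command pin number */
    uint32_t pin_dat0;  /* first SD data pin number */
    uint32_t clk_div;   /* sysclock divider for the SD clock */
} sd_instance;

/* Hypothetical stand-in for the entry of a block read/write routine.
   The instance fields are copied into locals with plain assignments,
   so nothing depends on globals being loaded along with the code. */
static uint32_t sd_block_setup(const sd_instance *sd)
{
    uint32_t p_clk = sd->pin_clk;
    uint32_t p_cmd = sd->pin_cmd;
    uint32_t p_dat = sd->pin_dat0;
    uint32_t div   = sd->clk_div;

    /* A real routine would hand these locals to inline assembly.
       Here they are just packed into one word so the result can be
       seen to differ per instance. */
    return (p_clk << 24) | (p_cmd << 16) | (p_dat << 8) | div;
}
```

Two `sd_instance` structs with different pin numbers go through the same routine without sharing any state, which is the whole point of dropping the globals.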
I don't think there are any true globals in that Flex driver. Things like the pin numbers are contained to the instance by FlexC's C++ struct __using() wrapper. Only a DAT section would be global.
Also, in case you missed my previous edit, those register constants could, instead of creating locals, be placed in with the presets that are fast copied into the RES section.
I've got a working driver with a small optimisation for size, saving 148 bytes. It is just dealing with command response start bit search. Nothing else. It clocks and tests one bit at a time instead of relying on a reconfigured clock, which is borrowed from a similar solution I used for CRC response collection in the block write routine. The code is tidier because it no longer has to deal with variable lag effects.
It doesn't affect block read/write performance at all, but when comparing command-only performance the existing v1.14 solution is 13% faster than this one.
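To illustrate the one-bit-at-a-time idea in isolation, here is a minimal host-side sketch in C. It is a simulation, not the driver: `sim_cmd_line`, `clock_and_sample()` and `find_start_bit()` are hypothetical names, and the recorded level array stands in for pulsing the SD clock and reading the CMD pin. The only protocol facts assumed are that CMD idles high and a response begins with a low start bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the CMD line: a recorded list of the level
   that would be sampled after each SD clock pulse. */
typedef struct {
    const uint8_t *levels;  /* one sampled CMD level (0 or 1) per pulse */
    size_t count;           /* number of recorded samples */
    size_t pos;             /* next sample to hand out */
} sim_cmd_line;

/* Simulates "issue one clock pulse, then test the CMD pin".
   Once the recording runs out the line reads as idle (high). */
static int clock_and_sample(sim_cmd_line *line)
{
    if (line->pos >= line->count)
        return 1;
    return line->levels[line->pos++] & 1;
}

/* Clock and test one bit at a time until the low start bit shows up.
   Returns the number of pulses issued, including the one that sampled
   the start bit, or -1 if max_clocks pulses pass without one. */
static int find_start_bit(sim_cmd_line *line, int max_clocks)
{
    for (int n = 1; n <= max_clocks; n++) {
        if (clock_and_sample(line) == 0)
            return n;
    }
    return -1;
}
```

Because every sample is tied to its own clock pulse, there is no lag compensation to work out, at the cost of a slower search loop.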
@evanh said:
... It is just dealing with command response start bit search. Nothing else. ...
It looks like I can go further with this approach. It seems I can also remove all rx lead-ins and their associated rate settings, replacing them with a simpler WAITX + XINIT located after the coupled DIRH + WYPIN that starts the SD clock.
I had thought I'd covered this ground thoroughly very early on, but I must have overlooked something there. Best guess is that, having found I had no choice when it came to the tx side of things, I'd assumed rx wasn't going to be any easier and just duplicated the tx approach.
PS: To be able to transmit at any divider rate, on demand, from sysclock/2 and up, it needs exact sysclock tick adjustability for timing alignment between streamer and smartpin. And there is a bad overlap with the SD clock start in the small dividers without using a spacing lead-in to precisely push out the streamer timing to match the smartpin. Here's an example of the required tx sequence:
setxfrq lnco // sysclock/1 for lead-in timing
dirl p_clk // reset clock-gen smartpin to align with streamer
...
xinit m_calign, #0 // lead-in delay at sysclock/1
setq v_nco // data transfer rate, takes effect on XZERO below (buffered command)
xzero m_cmd, pb // place first 32 bits in tx buffer, aligned to clock via lead-in delay
dirh p_clk // clock timing starts here, first clock pulse occurs in smartpin's second period
wypin #6*8+2, p_clk // SD clock pulses, 6 bytes + 2 pulses to kick off a response
...
xzero m_cmd, pb // place remaining 16 bits in buffer for when first 32 bits completes
And I have the rx routine, after the start bit is found, done basically the same way but with a measured delay, for the frequency-dependent rx latencies, added to that lead-in.
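The counts in the tx sequence above work out as follows. This is only a sketch of the arithmetic in C; the constant and function names are mine, not the driver's.

```c
/* Hypothetical names for the counts used in the tx sequence. */
enum {
    SD_CMD_BYTES      = 6,                 /* command frame length in bytes */
    SD_CMD_BITS       = SD_CMD_BYTES * 8,  /* 48 bits on the CMD line */
    SD_KICK_PULSES    = 2,                 /* extra pulses to kick off a response */
    FIRST_XZERO_BITS  = 32                 /* bits queued by the first XZERO */
};

/* Clock pulses requested from the smartpin, matching WYPIN #6*8+2. */
static int sd_cmd_clock_pulses(void)
{
    return SD_CMD_BITS + SD_KICK_PULSES;
}

/* Bits left over for the second XZERO once the first 32 are queued. */
static int sd_cmd_remaining_bits(void)
{
    return SD_CMD_BITS - FIRST_XZERO_BITS;
}
```

So the smartpin is asked for 50 pulses in total, and the second XZERO carries the remaining 16 bits of the 48-bit frame.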
PPS: Now it looks like, for rx routine, I can actually get away with the following:
dirl p_clk // reset clock-gen smartpin to align with streamer
...
dirh p_clk // clock timing starts here
wypin clocks, p_clk // first clock pulse outputs during second clock period
waitx rxlag
xinit m_resp, #0 // rx response, aligned to SD clock
...
No need to set up the smartpin rate, nor even the streamer rate, as they are both unchanged from the command issuing.
Comments
I've updated the Object Exchange with v1.14. The zip file now includes four examples and improved documentation - https://obex.parallax.com/obex/sdsd-cc/
Anyone got objections to locking in the size-optimised version?
I've also removed the redundant CMD6 SWITCH_FUNC switchover to High-Speed interface. Binary file size reduced by another 128 bytes.
I had left it in early on in the hope of it helping with enabling other features, which it never did, and had kind of forgotten about it.