Do you think it is easy for Parallax to do the same? They finally gave up on using a custom layout and are using an affiliate company of OnSemi. Every simple thing that needs to be added to the design (I2C, SPI, DDR, ...) means money. HUGE MONEY.
Yes and no - the logic is designed in Verilog, and post #59 in this thread shows a SD block driver in hardware.
I'll admit full SD in silicon may be stretching Parallax (mainly due to the SD variances mentioned in this thread), but with Verilog & FPGAs it is relatively easy to check:
a) the resource impact
b) test with real SD cases
Code above in #59 is ~ 550 lines and uses a 5 bit state engine, plus buffers.
Even smaller and easier is proper native QuadSPI memory support.
Intelligent address handling with wait would add a 24b counter, a 24b (maybe 32b option?) compare, and a few bits of state.
I think that is what NUC505 does, to allow simpler memory mapping of SPI space.
(yes, it is slower than on-die memory)
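As a rough behavioural model of the "intelligent address handling" idea above (a Python sketch rather than Verilog, and not taken from the NUC505 or any Parallax design): the cell keeps a 24-bit counter holding the next sequential flash address and compares each requested address against it. On a match the open read burst simply continues; on a mismatch the cell holds the CPU in wait and re-issues the read. The class name and the clock costs are placeholder assumptions, and real parts will differ.

```python
ADDR_MASK = (1 << 24) - 1  # 24-bit address counter, as suggested above


class MappedQspiModel:
    """Toy model of a memory-mapped quad-SPI read port (hypothetical)."""

    # Placeholder cost, in SPI clocks, of starting a new read burst
    # (command + address + dummy clocks). Assumed value, not from a datasheet.
    SETUP_CLOCKS = 22
    # One byte is two nibbles when four data lines are used.
    CLOCKS_PER_BYTE = 2

    def __init__(self, flash: bytes):
        self.flash = flash
        self.next_addr = None   # no read burst is open yet
        self.wait_clocks = 0    # total clocks the CPU was held in wait

    def read_byte(self, addr: int) -> int:
        addr &= ADDR_MASK
        if addr != self.next_addr:
            # Compare failed: the burst must be restarted at the new address.
            self.wait_clocks += self.SETUP_CLOCKS
        self.wait_clocks += self.CLOCKS_PER_BYTE
        # The counter advances so the next sequential read is a "hit".
        self.next_addr = (addr + 1) & ADDR_MASK
        return self.flash[addr % len(self.flash)]


cell = MappedQspiModel(bytes(range(256)))
sequential = [cell.read_byte(a) for a in (0x10, 0x11, 0x12)]
clocks_after_sequential = cell.wait_clocks   # one setup, three bytes
cell.read_byte(0x80)                         # a jump pays the setup cost again
clocks_after_jump = cell.wait_clocks
```

The point of the model is only that sequential fetches cost a couple of clocks each, while every jump pays the full setup cost again.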
Code above in #59 is ~ 550 lines and uses a 5 bit state engine, plus buffers.
It can be done more simply. I didn't do that because it worked for me and then I didn't need this driver anymore. It can be simplified by writing an SPI driver first and then sending commands to it. Instead of the state machine there could be a simple CPU-like sequencer. This may reduce both the line count and the LE count.
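The layering described above could look roughly like this: a byte-level SPI transfer at the bottom, and on top of it a small table of commands stepped through by a tiny sequencer instead of a hand-written state machine. This is a Python sketch, not the Verilog from #59; `spi_xfer`, `INIT_PROGRAM` and `run_program` are hypothetical names, and the table is deliberately truncated (a real init sequence continues past CMD8). The 6-byte command framing and CRC7 follow SD SPI mode.

```python
def crc7(data: bytes) -> int:
    """Bit-serial CRC7 (polynomial x^7 + x^3 + 1) used on SD commands."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):
            inbit = (byte >> bit) & 1
            msb = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if inbit ^ msb:
                crc ^= 0x09
    return crc


def sd_frame(cmd: int, arg: int) -> bytes:
    """Build a 6-byte command frame: start bits + index, 32-bit arg, CRC7 + stop bit."""
    body = bytes([0x40 | (cmd & 0x3F)]) + arg.to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 1])


# "Program" for the sequencer: (command index, argument, expected R1, extra response bytes)
INIT_PROGRAM = [
    (0, 0x00000000, 0x01, 0),  # CMD0: go to idle state
    (8, 0x000001AA, 0x01, 4),  # CMD8: interface condition, R1 plus 4 more bytes
]


def run_program(program, spi_xfer):
    """Step through the table. Returns None on success, else the failing step index."""
    for step, (cmd, arg, expected_r1, extra) in enumerate(program):
        for b in sd_frame(cmd, arg):
            spi_xfer(b)
        r1 = 0xFF
        for _ in range(8):              # poll for the response byte
            r1 = spi_xfer(0xFF)
            if r1 != 0xFF:
                break
        for _ in range(extra):          # drain the rest of a longer response
            spi_xfer(0xFF)
        if r1 != expected_r1:
            return step
    return None
```

In hardware the table would be a small ROM and `run_program` a counter plus a few compare bits, which is the kind of reduction in lines and LEs suggested above.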
JMG, I think this has been discussed before. Please take some time to do a Google search on SD SPI mode, the SD 4-bit interface, SD patent licenses and royalties, and how much a Verilog RTL/obfuscated SD host would cost.
You can do SD SPI on only 550 lines? What is the speed? The Elm-chan website has some performance data comparing SD over SPI, SD with a host controller, and CF/ATAPI: http://elm-chan.org/fsw/ff/img/rwtest.png
You can do SD SPI on only 550 lines? What is the speed?
Note that the code in post #59 is not mine; it is from pik33. See also his reply in #63.
As I have already said above, SD support is not as 'cornerstone important' as proper native QuadSPI memory support, but do not put SD in the 'too hard' basket too early.
There will be plenty of testing done with the P2 FPGA and:
* 24b_qSPI eg W25Q16DV sub 50c
* 32b_qSPI.DTR eg N25Q256 ~ $2
* SD in various mixes of HW (verilog) and SW.
It is possible those tests will reveal some additions to the QuadSPI peripheral that can improve SD performance.
Personally, I like the code usage simplicity of a QuadSPI cell that includes qSPI memory mapping option. (for transparent READ from Flash)
(This adds a Wait and Counter/compare/State to the cell)
Only? I think there are too many lines there. But the priority was that it had to work before the deadline, so I didn't care about code length.
This is a block driver. It can initialize the SD card, read a sector and write a sector, and it has a watchdog, so if something goes wrong the CPU can reset the whole thing and try again.
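From the CPU side, using a block driver with a watchdog might look like the retry loop below. This is only a sketch: `SimulatedBlockDriver` and its `reset`, `start_read`, `status` and `buffer` members are hypothetical stand-ins for whatever registers the real hardware exposes, and the simulation just pretends the watchdog fired a given number of times.

```python
class SimulatedBlockDriver:
    """Stand-in for the hardware block driver (hypothetical interface)."""

    def __init__(self, hang_first_n: int = 0):
        self.hangs_left = hang_first_n  # how many reads will "hang"
        self.resets = 0
        self.buffer = bytes(512)
        self._state = "idle"

    def reset(self):
        self.resets += 1
        self._state = "idle"

    def start_read(self, lba: int):
        if self.hangs_left > 0:
            self.hangs_left -= 1
            self._state = "timeout"     # the watchdog expired
        else:
            self.buffer = bytes([lba & 0xFF]) * 512
            self._state = "done"

    def status(self) -> str:
        return self._state


def read_sector(drv, lba: int, max_attempts: int = 3) -> bytes:
    """Read one sector, resetting the driver and retrying after a watchdog timeout."""
    for _ in range(max_attempts):
        drv.start_read(lba)
        while drv.status() not in ("done", "timeout"):
            pass                        # real code would poll a status register
        if drv.status() == "done":
            return drv.buffer
        drv.reset()                     # something went wrong: reset and try again
    raise TimeoutError(f"sector {lba} failed after {max_attempts} attempts")
```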
To keep things simple, it does not work with standard-capacity SD cards; all my SD cards are SDHC, so I have no way to debug that case.
SD initialization is done with a slow clock, then it switches to full speed. Originally that was 25 MHz, but in the presentation machine it ran from the 800x600 VGA pixel clock (40 MHz), so the buffer read/write speed was somewhat below 40 Mbps because of overhead.
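One of the differences that an SDHC-only driver gets to skip: block read/write commands such as CMD17 take a byte address on standard-capacity cards, but a 512-byte block number on SDHC/SDXC cards. A minimal sketch of that difference (the function name is made up for illustration):

```python
BLOCK_SIZE = 512


def block_cmd_argument(lba: int, high_capacity: bool) -> int:
    """Return the 32-bit argument for a single-block read/write of sector `lba`."""
    # SDHC/SDXC: the argument is the block number itself.
    # Standard capacity: the argument is a byte address.
    arg = lba if high_capacity else lba * BLOCK_SIZE
    if not 0 <= arg <= 0xFFFFFFFF:
        raise ValueError("address does not fit in the 32-bit command argument")
    return arg
```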
SD 4-bit mode can be done too, but there is this awful 4-bit CRC thing which is difficult to implement.
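For reference, the "4-bit CRC thing" is a separate CRC16 (polynomial x^16 + x^12 + x^5 + 1, initial value 0) on each of the four data lines, so the data has to be de-interleaved bit by bit. A Python sketch of that idea follows; the function names are illustrative, and it assumes the usual ordering where the high nibble of each byte goes out first with DAT3 carrying the most significant bit of the nibble.

```python
def crc16_step(crc: int, bit: int) -> int:
    """Advance a CRC16 (poly 0x1021) by one input bit."""
    msb = (crc >> 15) & 1
    crc = (crc << 1) & 0xFFFF
    if msb ^ bit:
        crc ^= 0x1021
    return crc


def crc16_per_line(block: bytes) -> list:
    """Return [crc_dat0, crc_dat1, crc_dat2, crc_dat3] for a data block."""
    crcs = [0, 0, 0, 0]
    for byte in block:
        # High nibble is transferred first, then the low nibble.
        for nibble in (byte >> 4, byte & 0x0F):
            for line in range(4):
                # Bit `line` of the nibble travels on DAT<line>.
                crcs[line] = crc16_step(crcs[line], (nibble >> line) & 1)
    return crcs
```

In hardware this is four identical 16-bit shift registers, each fed from its own data pin, so the per-line form is arguably simpler in logic than it is in software.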
After the driver fills the buffer, the CPU has to read it, which will of course slow the process.
Maybe I will add this driver to my P1V retromachine to get faster SD access than I have now (the P1 software driver there can read at up to about 7 Mbps @ 114 MHz).
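To put the clock figures above in perspective, here is a small back-of-the-envelope calculation for 1-bit SPI: a 512-byte sector is 4096 bit clocks, and anything spent on commands, waiting for the card, or the CPU copying the buffer lowers the effective rate. The `overhead_clocks` parameter is a hypothetical knob, not a measured value.

```python
SECTOR_BITS = 512 * 8  # payload bits in one sector


def sector_time_us(clock_hz: float, overhead_clocks: int = 0) -> float:
    """Time to move one sector over 1-bit SPI, in microseconds."""
    return (SECTOR_BITS + overhead_clocks) / clock_hz * 1e6


def effective_mbps(clock_hz: float, overhead_clocks: int = 0) -> float:
    """Payload throughput in Mbit/s (bits per microsecond)."""
    return SECTOR_BITS / sector_time_us(clock_hz, overhead_clocks)


raw_25 = sector_time_us(25e6)           # about 163.84 us per sector at 25 MHz
raw_40 = sector_time_us(40e6)           # about 102.4 us per sector at 40 MHz
with_overhead = effective_mbps(40e6, overhead_clocks=1024)  # about 32 Mbps
```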