Simple_SpiFlash - how to use higher clock rates with P2 Edge?
I need non-volatile memory in my project. The P2 Edge has an onboard flash chip, and there is an object for it in OBEX: https://github.com/parallaxinc/propeller/tree/master/libraries/community/p2/All/Simple_SpiFlash
The object and its demo work at their default clock speed of 180 MHz. However, I cannot use 180 MHz because I have an HDMI screen attached at a 28.5 MHz pixel clock (800x480), which needs a 285 MHz system clock. The flash driver doesn't work at that speed (it doesn't even work at 190 MHz).
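Where the 285 MHz comes from: the P2's HDMI output serializes 10 TMDS bits per pixel, one bit per sysclock, so the system clock has to be 10x the pixel clock. The small host-side Python sketch below shows that arithmetic. It also shows the resulting SPI clock, assuming the driver's default sysclock/4 SPI ratio. Nothing here runs on the P2.

```python
# Host-side arithmetic only; nothing here runs on the P2.
TMDS_BITS_PER_PIXEL = 10  # P2 HDMI/DVI: one TMDS bit per sysclock


def hdmi_sysclock_mhz(pixel_clock_mhz: float) -> float:
    """System clock needed when each pixel takes 10 sysclocks to serialize."""
    return pixel_clock_mhz * TMDS_BITS_PER_PIXEL


def spi_clock_mhz(sysclock_mhz: float, divider: int) -> float:
    """SPI clock produced by dividing sysclock by `divider`."""
    return sysclock_mhz / divider


sysclk = hdmi_sysclock_mhz(28.5)
print(f"required sysclock: {sysclk:.1f} MHz")
for f in (180.0, 190.0, sysclk):
    print(f"{f:6.1f} MHz -> SCK {spi_clock_mhz(f, 4):.2f} MHz at sysclock/4")
```

Even at 285 MHz, sysclock/4 gives about 71 MHz of SPI clock. That is still below the 133 MHz rating of the Winbond W25Q128JV fitted to the P2 Edge.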
I tried slowing the SPI clock rate:
wxpin #4,#spi_ck 'set timebase to 4 (instead of 2) clocks per transition
and adding NOPs in the read/write procedures:
'(...)
        nop
        nop
        shl     cmd,#1 wc
        outc    #spi_di
        nop
        nop
        shl     cmd,#1 wc
        outc    #spi_di
'(...)
I tried a timebase of 3 with 1 NOP, and a timebase of 4 with 2 NOPs. The modified code still works, but again only up to 180 MHz and no higher. Running above 180 MHz (I tried 190 MHz) corrupts the flash. After that, the P2 can no longer boot from it, and the flash cannot be reprogrammed until the board is power-cycled.
Is there a flash object that can run at 285 MHz on a P2 Edge module? Or, failing that, any hints on how to modify this object to run at that frequency?
Next, I will try a very slow SPI clock, using a loop instead of the unrolled code.
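Before turning the unrolled code into a loop, it helps to be clear about what each `shl cmd,#1 wc` / `outc #spi_di` pair does. SHL moves the top bit of cmd into C, and OUTC drives C onto MOSI, so bits leave MSB first. Below is a Python model of that as a loop. It assumes the command byte has been left-justified in the 32-bit cmd register. That is an assumption; check how Simple_SpiFlash actually prepares cmd.

```python
def shift_out_msb_first(value: int, nbits: int, reg_bits: int = 32) -> list:
    """Model of repeated 'shl cmd,#1 wc' + 'outc #spi_di' in a loop.

    The value is left-justified in a reg_bits-wide register (assumption).
    Each SHL moves the top bit into C; OUTC would then drive C onto MOSI.
    """
    mask = (1 << reg_bits) - 1
    cmd = (value << (reg_bits - nbits)) & mask   # left-justify the value
    bits = []
    for _ in range(nbits):                       # loop instead of unrolled code
        carry = (cmd >> (reg_bits - 1)) & 1      # bit that SHL moves into C
        cmd = (cmd << 1) & mask
        bits.append(carry)                       # OUTC drives this on MOSI
    return bits


print(shift_out_msb_first(0x9F, 8))  # 0x9F = JEDEC ID command
```

On the P2 itself, a loop adds a DJNZ (or similar) to every bit. Those extra sysclocks per bit have to be matched by the SPI clock timebase, just like added NOPs.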
Comments
Yeah, that all gets a little tricky; more than one factor comes into play at high speed on the Prop2. I'll have a look at the code... Oh, writes shouldn't be the problem. It's the reads that get into trouble...
Well, one possible cause of your trouble: the SPI clock rate in that code is sysclock/4. You can't add NOPs without also adjusting the SPI-to-sysclock ratio by changing the X value in
wxpin #2,#spi_ck
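To see why the NOP count and the X value have to move together, here is a rough clock budget. It assumes every instruction in the unrolled sequence (SHL, OUTC, NOP) takes 2 sysclocks. It also assumes X sets sysclocks per SCK transition, as the comment on `wxpin` says, so one SCK period is 2*X sysclocks.

```python
CLOCKS_PER_INSTR = 2  # assumption: SHL, OUTC and NOP each take 2 sysclocks


def clocks_per_bit(nops: int) -> int:
    """Sysclocks per bit in the unrolled code: SHL + OUTC + `nops` NOPs."""
    return CLOCKS_PER_INSTR * (2 + nops)


def sck_period_clocks(x: int) -> int:
    """Sysclocks per SCK period if X is sysclocks per clock transition."""
    return 2 * x


for x, nops in ((2, 0), (3, 1), (4, 2), (4, 0)):
    bit, sck = clocks_per_bit(nops), sck_period_clocks(x)
    status = "in step" if bit == sck else "MISMATCH"
    print(f"X={x}, {nops} NOP(s): {bit} clocks/bit vs SCK period {sck}: {status}")
```

Under these assumptions, both combinations tried in the first post (X=3 with 1 NOP, X=4 with 2 NOPs) keep data and clock in step. That would suggest the ratio alone is not what breaks above 180 MHz.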
Hmm, the datasheet says the Winbond W25Q128JV is good for a 133 MHz SPI clock, so it shouldn't have trouble at just 50 MHz. I suspect the problem is elsewhere, such as the read timing...
Yeah, in Spi_RdByte(), the waitx #4 will need to be increased as the clock rate is raised. Its value compensates for the various latencies between the clock output generated by the Prop2 and the data coming back from the flash. Some of these latencies are a fixed number of sysclocks and some are fixed propagation times, and they all stack up on the read data timing. The bulk of them are within the Prop2. None of them affect the write data timing from a software perspective.
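The following is a rough model of why the `waitx` compensation has to grow with sysclock. Part of the round-trip read latency is a fixed number of sysclocks (register stages inside the Prop2), and part is fixed in nanoseconds (pads, board traces, the flash's clock-to-output time). Only the nanosecond part grows when counted in sysclocks. The 3-clock and 12 ns figures are placeholders chosen for illustration, not measured P2 or W25Q128JV values.

```python
import math

# Hypothetical latency split, for illustration only (not measured values)
FIXED_CLOCKS = 3   # register stages inside the Prop2, counted in sysclocks
FIXED_NS = 12.0    # pad, trace and flash clock-to-output delay, in nanoseconds


def round_trip_clocks(sysclock_mhz: float,
                      fixed_clocks: int = FIXED_CLOCKS,
                      fixed_ns: float = FIXED_NS) -> int:
    """Read latency in whole sysclocks; the nanosecond part is rounded up."""
    ns_per_clock = 1000.0 / sysclock_mhz
    return fixed_clocks + math.ceil(fixed_ns / ns_per_clock)


for f in (180.0, 285.0, 350.0):
    print(f"{f:5.0f} MHz: about {round_trip_clocks(f)} sysclocks of read latency")
```

With these made-up numbers, a compensation tuned at 180 MHz would be one or two sysclocks short at 285 or 350 MHz. That shortfall is the kind of drift that makes a fixed `waitx #4` sample the wrong bit.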
So, because most of the latencies exist within the Prop2, using a smartpin in synchronous serial receive mode can eliminate much of the frustration. That is not because the smartpin knows anything clever about the timing, but because of how it is clocked: it uses inputB as its clock source. This means the SPI clock, generated by another smartpin, has to make its way out to a physical pin and then all the way back into the rx smartpin.
In doing so, the clock input to the rx smartpin is internally delayed by the same amounts as the data pin. Both the internal outgoing and internal incoming latencies cancel out, which leaves only the external flash latencies to deal with.
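The cancellation can be written out as simple delay bookkeeping. In this Python sketch, the bit-bashed read measures data arrival against the internal moment SCK was commanded. The smartpin receiver measures it against a clock that has itself travelled out to the pin and back. It assumes, as described above, that the clock and data input paths are delayed equally. The delay values are hypothetical.

```python
def data_minus_clock_ns(out_ns: float, ext_ns: float, in_ns: float,
                        clock_looped_back: bool) -> float:
    """How late read data arrives relative to the clock the receiver uses.

    Bit-bashed read: the reference is the internal moment SCK was commanded,
    so every delay in the loop counts.
    Smartpin rx clocked via inputB from the SCK pin: the clock has also gone
    out to the pin and back in, so the internal out/in delays cancel
    (assuming clock and data input paths are delayed equally).
    """
    data_arrival = out_ns + ext_ns + in_ns
    clock_arrival = (out_ns + in_ns) if clock_looped_back else 0.0
    return data_arrival - clock_arrival


# Hypothetical delays in ns: internal out, external (pins + flash), internal in
OUT, EXT, IN = 5.0, 8.0, 3.0
print("bit-bashed :", data_minus_clock_ns(OUT, EXT, IN, False), "ns")
print("smartpin rx:", data_minus_clock_ns(OUT, EXT, IN, True), "ns")
```

In the smartpin case, only the external term is left, so changing sysclock no longer shifts the sample point by the Prop2's internal delays.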
I thought JonnyMac's SPI driver on GitHub worked at any clkfreq, since its start method takes a parameter for the SPI bus speed. Might be worth a try.
The slower the SPI clock (the larger the sysclock ratio), the more leeway there is before the compensation needs adjusting for different sysclock speeds. I'd guess that's Jon's solution. Or he used smartpins for both clock and data.
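The leeway argument can be put in numbers. For a fixed read latency, this Python sketch computes what fraction of one SPI bit period the latency eats. The 20 ns latency is a hypothetical figure for illustration only.

```python
def latency_fraction(sysclock_mhz: float, divider: int, latency_ns: float) -> float:
    """Fraction of one SPI bit period consumed by a fixed read latency."""
    bit_period_ns = divider * 1000.0 / sysclock_mhz
    return latency_ns / bit_period_ns


LATENCY_NS = 20.0  # hypothetical total read latency, illustration only

for divider in (4, 8, 16):
    fractions = [latency_fraction(f, divider, LATENCY_NS) for f in (180.0, 285.0)]
    print(f"sysclock/{divider}: "
          + ", ".join(f"{fr:.2f} of a bit" for fr in fractions)
          + " at 180 / 285 MHz")
```

The change between 180 and 285 MHz in each row halves every time the divider doubles. When the fraction approaches or passes 1, a sample point that is fixed in sysclocks lands in the wrong bit.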
Oh, I think this was my fault. It was one of my early projects with the P2, and I was happy that it worked at all. I never tested it at higher clock speeds. Mixing a smart pin clock with bit-banged data is probably not the best idea when it comes to timing, and it's difficult to make it work without an oscilloscope.
I now have better SPI routines that use smart pins for both clock and data (from the LCD driver project). Today I'm busy repairing my P&P machine, but I'll try to have a look at it tomorrow.
I did some experimenting with the flash for a beginner's presentation on SPI. I think the code attached here is the best of that work. That said, I'm not as clever as most forum members, so this may not be the best approach. It does use smart pins for clock, data in, and data out.
This one reads the flash at up to clkfreq/6 (and it works at any clock).
But I cannot get erase and write to work.
To be debugged.
It's working now.
I used an object from here: https://www.parallax.com/p2-edge-16-mb-flash-reader-and-explorer-utility/
It doesn't like "full optimization" in FlexProp; it started working when I set optimization to normal.
Also, the "flash explorer" demo doesn't get along with FlexProp's terminal: all of its output appears on the first line. This can be fixed by adding tx(10) and tx(13) (LF and CR) where needed.
After comparing the listings of the code compiled with full optimization and with normal optimization, the only difference I found in the erase code (the only fragment that didn't work) is:
- with normal optimization, the wait function is called
- with full optimization, the wait function is inlined
Adding waitus(1) before calling wait() solved the problem.
Maybe there's something to be learned from the way the P2 handles loading the flash for a new program. I've attached Chip's flash loader.
My demo programs have a terminal mode called T_ANSI that enables use with the FlexProp terminal.
The problem: it seems the optimized FlexProp code calls (or rather inlines) wait() too soon after erasing. There are only about 6 clock cycles between these commands, and it seems the flash chip doesn't get enough time to process the erase command. Adding a one-microsecond delay between erase and wait solved the problem. All procedures (read, erase and write) now seem to work with FlexProp at 285 MHz with full optimization.
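One way to picture this failure is through the usual erase-then-poll sequence. After an erase, a wait routine typically polls Status Register-1 (opcode 0x05 on Winbond parts) until the BUSY bit (bit 0) clears. Whether Simple_SpiFlash's wait() works exactly this way is an assumption. The toy Python model below further assumes, purely for illustration, that a status read issued too soon after the erase is not decoded and reads back as idle, so the poll returns at once. The 50 ns threshold and the MockFlash class are hypothetical. The thread only establishes that about 6 sysclocks was too short and 1 µs was enough.

```python
CMD_READ_STATUS1 = 0x05   # Winbond "Read Status Register-1" opcode
STATUS_BUSY = 0x01        # bit 0 of Status Register-1: erase/program running


class MockFlash:
    """Toy model, NOT a real W25Q128JV.

    A status read issued less than `min_gap_ns` after the erase command is
    assumed not to be decoded and reads back 0x00. Otherwise the chip
    reports BUSY for `erase_polls` reads and then reports ready.
    """

    def __init__(self, min_gap_ns: float, erase_polls: int):
        self.min_gap_ns = min_gap_ns
        self.polls_left = erase_polls   # erase assumed to have just started

    def read_status(self, gap_ns: float) -> int:
        if gap_ns < self.min_gap_ns:
            return 0x00                 # not decoded: looks idle
        if self.polls_left:
            self.polls_left -= 1
            return STATUS_BUSY
        return 0x00


def wait_ready(flash: MockFlash, first_gap_ns: float, poll_gap_ns: float = 1000.0) -> int:
    """Poll Status Register-1 until BUSY clears; return the number of reads."""
    reads, gap = 0, first_gap_ns
    while True:
        reads += 1
        if not flash.read_status(gap) & STATUS_BUSY:
            return reads
        gap = poll_gap_ns


SYSCLK_MHZ = 285.0
six_clocks_ns = 6 * 1000.0 / SYSCLK_MHZ               # roughly 21 ns
print(wait_ready(MockFlash(50.0, 3), six_clocks_ns))  # 1: returned too early
print(wait_ready(MockFlash(50.0, 3), 1000.0))         # 4: really waited
```

The fix reported in the thread corresponds to the second call. A guard delay before the first status read gives the chip time to accept the erase before polling starts.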
Tomorrow I will try to add this to the rest of the project and check if it still works.
In an ideal world, the SFDP (Serial Flash Discoverable Parameters) tables would be read to compute the required delays for different operations, assuming the flash supports SFDP (hopefully most do nowadays). Otherwise, a hard-coded delay that works for one device might not work for another...
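A first step toward that is locating the parameter tables. On Winbond parts, Read SFDP is opcode 0x5A, followed by a 24-bit address and 8 dummy clocks. The bytes it returns start with the JESD216 header: the "SFDP" signature, the revision, and a zero-based count of parameter headers. Each parameter header gives a table ID, revision, length in DWORDs and a 24-bit pointer. The Basic Flash Parameter Table (ID 0xFF00) is where JESD216 keeps erase types and, in later revisions, typical erase and program times. The Python parser below is a sketch of the header walk only. The example bytes are synthetic, not read from a real chip.

```python
def parse_sfdp_header(buf: bytes):
    """Parse the JESD216 SFDP header and parameter headers.

    `buf` is assumed to hold the bytes returned by Read SFDP (0x5A) starting
    at SFDP address 0. Returns ((major, minor), [table, ...]) where each table
    is (parameter_id, major, minor, length_in_dwords, byte_pointer).
    """
    if buf[0:4] != b"SFDP":
        raise ValueError("SFDP signature not found")
    minor, major, nph = buf[4], buf[5], buf[6]
    tables = []
    for i in range(nph + 1):                     # NPH is zero-based
        ph = buf[8 + 8 * i: 16 + 8 * i]
        if len(ph) < 8:
            raise ValueError("buffer too short for parameter headers")
        param_id = (ph[7] << 8) | ph[0]          # ID MSB in byte 7, LSB in byte 0
        pointer = ph[4] | (ph[5] << 8) | (ph[6] << 16)
        tables.append((param_id, ph[2], ph[1], ph[3], pointer))
    return (major, minor), tables


BASIC_FLASH_PARAMS_ID = 0xFF00  # JEDEC Basic Flash Parameter Table

# Synthetic example bytes (not from a real chip): SFDP 1.6, one table
example = b"SFDP" + bytes([6, 1, 0, 0xFF]) + bytes([0x00, 6, 1, 16, 0x80, 0, 0, 0xFF])
print(parse_sfdp_header(example))
```

A driver would then read `length_in_dwords * 4` bytes at the table pointer and decode the timing fields it needs. That decoding depends on the JESD216 revision the chip reports.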
I finally got the Simple_SpiFlash code debugged. It now works from 25 to 350 MHz.
Unfortunately, the smart pin synchronous serial transmit mode is not easy to handle at all. The flip-flops in the signal path cause skew between SCLK and MOSI: one set sits between the clock source and the shift register, another between the shift register and the data output. There are some tricks to compensate for this delay using the de-glitch filters. But it all becomes very complex, because some tricks that work at low sysclock frequencies fail at higher ones. A lot of IF and CASE statements would be required to handle all ranges.
So I gave up and limited SCLK to sysclock/8. Faster SPI clocks make no sense here, because the rest of the Spin2 code is so slow that the performance increase would hardly be noticed. And it's called "Simple_SpiFlash", so I don't expect anyone to use it for streaming video. It's meant for simple things like storing calibration data or user settings.
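A quick check of what sysclock/8 means across the 25 to 350 MHz range the debugged driver supports. The 133 MHz limit is the W25Q128JV figure quoted earlier in the thread. The sketch is host-side arithmetic only.

```python
RATED_SCLK_MHZ = 133.0  # W25Q128JV rating quoted earlier in the thread
DIVIDER = 8             # SCLK = sysclock / 8, as in the debugged driver


def sclk_mhz(sysclock_mhz: float, divider: int = DIVIDER) -> float:
    """SPI clock for a given sysclock and divider."""
    return sysclock_mhz / divider


for f in (25.0, 180.0, 285.0, 350.0):
    s = sclk_mhz(f)
    verdict = "within rating" if s <= RATED_SCLK_MHZ else "too fast"
    print(f"{f:5.0f} MHz sysclock -> SCLK {s:7.3f} MHz ({verdict})")
```

At the top of the range, that is 43.75 Mbit/s, or roughly 5.5 MB/s raw on a single data line, before any Spin2 overhead.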
Ya, I'd forgotten about the tx smartpin. The problem that plagues rx bit-bashing is the same one that plagues the tx smartpin: https://forums.parallax.com/discussion/comment/1529968/#Comment_1529968