How is p2gcc coming along? I would love to program the P2 in C++ with all the bells and whistles (but I would only use a subset that is known to perform well on MCUs, of course).
/Johannes
One thing I really liked with @lonesock's SD SPI driver is the read ahead and write behind feature.
Sadly, I lost all my files for the RAISD project (4 SDs on one QuickStart) through a messed-up Windows Update. I had managed to use @lonesock's SD SPI block driver inside @Kye's Fat_Engine, and both (Fsrw and Fat_Engine) could do transfers of around 1,200 kbytes/second on a P1 at 80 MHz.
Mike
@lonesock You're still here! That's great. Feel free to take over my part in the FSRW for P2 project.
Well, I still exist, but unfortunately I haven't had much time for prop development. [8^(
...waitct() is specifying an interval to pause for...
In Chip's Spin2, waitct() is waiting for a target value to be matched or passed.
This is from the Spin2 interpreter:
' a: POLLCT(tick)   (15 longs)
' b: WAITCT(tick)
'
pwct            getct   w               'a m    a: POLLCT(tick)
                cmpm    w,x     wc      'a m    m: WAITCT(tick)
        if_c    jmp     #pwct           '| m
        _ret_   popa    x               '| m    (continued in op_rel)
Still, I thought I should check, so I wrote this little test snippet:
repeat
  t := getct()
  waitct(100)
  t := getct() - t - 40
  term.dec(x++)
  term.tx(term.TAB)
  term.dec(t)
  term.tx(13)
  waitms(1000)
It looks like those are appropriate deltas reported. I haven't actually done much reading of the spin docs myself. The spin code is pre-written here so I just fixed what was getting errors. My focus is the assembly generally.
Comments
Remember I had different versions of Tachyon running on the P2 FPGA on my DE2-115 board in 2015 with multiple SD cards and FAT32.
I was even using it in a project based on the P2 BeMicro CV-A9 FPGA early in 2016, in the faint hope that P2 silicon would soon be available.
The TAQOZ ROM code does this same high-speed multiblock read that I do in TAQOZ RELOADED where I read in excess of 3MB/s at 300MHz.
Here is where I ask TAQOZ to "print DISK" information:
@ntosme2 was working on a GCC port for P2 last year and had some success but it has gone quiet and I think he needed more help/work to progress further... this thread is the one discussing it.
https://forums.parallax.com/discussion/comment/1478650/#Comment_1478650
Hey Mike,
It's been a while, and I saw an email notification on this thread. I still happen to have your old code with me. Not sure if it helps, but let me zip it up and forward it to you via email.
Jonathan
There are a lot of improvements that can be made. Things like "release" and timeout and read-ahead, write-behind for example...
It works well enough for what I need, but there's a lot that could be done to make it better.
I do pop into the forum every once in a while to try to keep tabs on the P2's progress, but I am certainly not well versed enough in the P2 internals to do the kind of code dev that would be considered elegant. I will try to get back in, but I don't know when that will be, sorry!
thanks,
Jonathan
Got the demo fixed up enough to work in both Pnut and FlexSpin. It had seriously failed syntax in modern Pnut. Lots of fixes all round. Also streamlined the bashed code to get both reliable operation and more speed.
It consistently times out on the first of the 20 mount attempts. Haven't tried to determine why. Otherwise it seems reliable at reading. The demo doesn't do any writing, so that's not tested at all.
PS: I only just started looking at it - https://forums.parallax.com/discussion/comment/1530713/#Comment_1530713
EDIT: Oops, file missing from the zip ... fixed.
EDIT2: bashed bug fixed
EDIT3: "fast_bashed" with +1 to trailing SPI clock pulses
EDIT4: "fast_bashed" ACMD41 fixed
Oh, bother, while fast_bashed works with both compilers, the initial cleaned up bashed compiles but doesn't fully work with Pnut. Hmm, quite a lot of little changes between the two as well as the major streamlining ...
EDIT: Solved! One last syntax fix, which I'd also dealt with in fast_bashed. All the REP instructions had to be changed from REP #label to REP @label. Interestingly, it doesn't create any errors or warnings in Pnut, and of course # works without issue in FlexSpin.
Question about your use of waitct() here (from the sdspi_fast_bashed object).
In this case it would wait for that specific value (100) in the system counter. Am I missing something?
Jon,
waitct() must be an object, not the built-in waitcnt()... right?
Doh... Spin2
Yeah, it's called waitct() in Spin2. The supporting method is called getct() in Spin2.
getct() retrieves the absolute counter value but waitct() is specifying an interval to pause for. So waitct() method compiles directly to the single instruction WAITX. Or at least it does in FlexSpin on Prop2.
Lol, I see that's still got the GETCT assembly in there. The original FSRW code was ported long before Spin2 was fully developed. Methods like getct() weren't part of Spin2 early on.
I had a look around the forum last night and found Rayman's effort that includes a helper cog for the bit-bashing - https://forums.parallax.com/discussion/171619/fsrw-usd-card-read-write-for-p2-with-2-4-mb-s-read-speed/p1
A quick patch to his "sdspi_asm_mb.spin2" for my debug method and it worked with my edition of FSRW and test3 demo code. And I immediately noted it doesn't have that initial timeout I mentioned above. So maybe a good time to investigate ...
EDIT: Grr, another difference between Pnut and FlexSpin: Can't directly reference a child object's variables in Pnut. Only methods and constants apparently.
EDIT2: Testing with a second SD card I've found I've trimmed off too many of the trailing SPI clock pulses. I'm pretty certain I'd made it the required 8 pulses but one extra is needed to make a Sandisk Extreme happy. See attachments above.
In Chip's Spin2, waitct() is waiting for a target value to be matched or passed.
This is from the Spin2 interpreter:
Still, I thought I should check, so I wrote this little test snippet:
And got this result:
It looks like those are appropriate deltas reported. I haven't actually done much reading of the spin docs myself. The spin code is pre-written here so I just fixed what was getting errors. My focus is the assembly generally.
Regarding that interpreter snippet, x isn't defined there. The docs say x should be 100, but the behaviour suggests it is a snap of CT+100.
Oh, I see where you were looking now Jon. You'd followed me in looking at the mount process. I've now replaced all the waitct() lines with waitus() or waitms(). The CT timing wasn't really suited to an arbitrary clock frequency anyway.
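For anyone following the interval-versus-target question, here is a rough host-side model in Python (not Spin2, and not taken from any of the drivers in this thread). It assumes a free-running 32-bit counter and a "matched or passed" test done on the sign of the wrapped difference, which is how I read the CMPM/IF_C loop quoted above. The names ct_reached and ticks_for_us are made up for the illustration.

```python
MASK32 = 0xFFFFFFFF

def ct_reached(now: int, target: int) -> bool:
    """True once a 32-bit counter value `now` has matched or passed `target`.

    The comparison looks at the sign bit of the wrapped difference, so it
    keeps working when the counter rolls over from $FFFFFFFF to 0.
    """
    return ((now - target) & MASK32) < 0x80000000

def ticks_for_us(microseconds: int, clkfreq: int) -> int:
    """Convert a delay in microseconds to counter ticks for a given clock."""
    return (microseconds * clkfreq) // 1_000_000

if __name__ == "__main__":
    start = 0xFFFFFFF0                   # counter close to rollover
    target = (start + 100) & MASK32      # "interval" use: a snapshot plus 100
    print(ct_reached(0xFFFFFFF8, target))    # still waiting
    print(ct_reached(0x00000060, target))    # passed, despite the rollover
    # A bare 100 is an absolute target: whether it counts as "already passed"
    # depends entirely on where the counter happens to be.
    print(ct_reached(0x12345678, 100), ct_reached(5, 100))
    print(ticks_for_us(10, 200_000_000))     # ticks in 10 us at 200 MHz
```

The tick conversion is the part that breaks with a hard-coded count: the same number of ticks is a different delay at every clock frequency, which is why waitus() and waitms() are the safer choice in the mount code.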
Well, I seem to have found the key difference between the two low level mounting methods. The parameter for ACMD41 was different. I've copied the parameter from sdspi_asm_mb.spin2 into sdspi_fast_bashed.spin2 which has stopped the timeouts.
Ha, doing a bit of reading, it looks like what was happening was the dodgy ACMD41 was requesting the card change signalling voltage, but such an operation needs a following CMD11 in sequence to execute the voltage switch. Since the CMD11 never gets sent, the card presumably fails to respond to what comes next. Hence the timeout.
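To make the ACMD41 parameter less of a magic number, here is a small Python sketch of how I understand the argument bits from the SD physical layer spec: HCS at bit 30, XPC at bit 28, S18R (the 1.8 V switch request) at bit 24, and the voltage window in bits 23:15. The helper names are invented, and the values printed are only examples, not the actual parameters from either driver file.

```python
HCS = 1 << 30               # host supports high-capacity (SDHC/SDXC) cards
XPC = 1 << 28               # power control hint (SDXC)
S18R = 1 << 24              # request switch to 1.8 V signalling
VOLTAGE_WINDOW = 0x00FF8000  # OCR voltage window, bits 23:15

def acmd41_arg(hcs=True, s18r=False, xpc=False, window=0):
    """Assemble a 32-bit ACMD41 argument from named flags."""
    arg = window & VOLTAGE_WINDOW
    if hcs:
        arg |= HCS
    if xpc:
        arg |= XPC
    if s18r:
        arg |= S18R
    return arg

def describe(arg):
    """Decode an ACMD41 argument back into its flags."""
    return {
        "hcs": bool(arg & HCS),
        "xpc": bool(arg & XPC),
        "s18r": bool(arg & S18R),
        "window": arg & VOLTAGE_WINDOW,
    }

if __name__ == "__main__":
    print(f"${acmd41_arg():08X}")            # HCS only
    print(f"${acmd41_arg(s18r=True):08X}")   # HCS plus a voltage switch request
    print(describe(0x41000000))
```

If a parameter decodes with s18r set, that would match the symptom described: the card expects a CMD11 to follow and the driver never sends one.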
A small tweak to FSRW code and I've got pread() going almost as fast as FastBlocksRead(). pread() was doing a seemingly unneeded block copy for each and every read block.
Both methods are now heavily limited by a block request delay, imposed by the SD card, that I suspect always occurs when initiating a read. The seemingly obvious way to improve it is to change from using the single-block command, CMD17, to using the multi-block command, CMD18. I think that'll be next on my list of to-do's.
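As a rough illustration of the CMD17 versus CMD18 difference, this Python sketch builds the 6-byte SPI command frames (start bits plus index, 32-bit argument, CRC7 plus end bit) and counts how many read requests each approach issues. It assumes block addressing (SDHC/SDXC) and a CMD12 to stop the multi-block transfer; sd_command and read_plan are names made up for the example, not functions from FSRW.

```python
def crc7(data: bytes) -> int:
    """CRC7 as used on SD command frames (polynomial x^7 + x^3 + 1)."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):
            in_bit = (byte >> bit) & 1
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if in_bit ^ top:
                crc ^= 0x09
    return crc

def sd_command(index: int, arg: int) -> bytes:
    """Build a 6-byte SPI-mode command frame."""
    body = bytes([0x40 | (index & 0x3F)]) + (arg & 0xFFFFFFFF).to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 1])

def read_plan(start_block: int, count: int, multi: bool) -> list:
    """List the command frames needed to read `count` consecutive blocks."""
    if multi:
        # One CMD18 starts the stream, CMD12 stops it after the last block.
        return [sd_command(18, start_block), sd_command(12, 0)]
    # One CMD17 per block, each paying the card's request latency.
    return [sd_command(17, start_block + i) for i in range(count)]

if __name__ == "__main__":
    print(sd_command(0, 0).hex())             # the familiar CMD0 frame
    print(len(read_plan(1000, 64, multi=False)), "commands with CMD17")
    print(len(read_plan(1000, 64, multi=True)), "commands with CMD18 + CMD12")
```

The point is only the count: with CMD17 the request delay is paid once per block, while with CMD18 it should be paid once per run of blocks.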
I appreciate you doing this, Evan. I always use FSRW in my P1 projects -- having it for the P2 will be helpful.
Nice challenge. And get to learn about SD cards at the same time.
Think I've blundered earlier with the 8 trailing clock pulses too. I had thought they were literally just a tail, there was something in the PDF about that. But in reality it looks like they're the 16-bit CRC clocks. Even when CRC is disabled, the 16 bit spaces for it still exist.
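Since those 16 clocks carry the data CRC, here is a minimal Python sketch of checking it on the host side. It assumes the usual CRC16-CCITT (polynomial $1021, initial value 0) over the 512 data bytes, with the CRC sent high byte first; block_ok is a made-up helper name, and binascii.crc_hqx from the standard library is used only as a cross-check.

```python
import binascii

def crc16_ccitt(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC16-CCITT (poly 0x1021, init 0), as used on SD data blocks."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def block_ok(raw: bytes) -> bool:
    """Check a 514-byte read: 512 data bytes followed by the 16-bit CRC."""
    data, received = raw[:512], int.from_bytes(raw[512:514], "big")
    return crc16_ccitt(data) == received

if __name__ == "__main__":
    data = bytes(range(256)) * 2                        # a dummy 512-byte block
    raw = data + crc16_ccitt(data).to_bytes(2, "big")
    print(block_ok(raw))                                # intact block
    print(block_ok(b"\x00" + raw[1:]))                  # first byte corrupted
    print(crc16_ccitt(data) == binascii.crc_hqx(data, 0))
```

Even with CRC checking disabled on the card, the two CRC bytes still have to be clocked out after every block, so the reader has to consume 514 bytes per block either way.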
Oooo, nearly 5 MB/s overclocked SPI. It work!
Hehe, and that's not even the Sandisk.
I'm struggling to see any real difference between "Default Speed" and "High Speed". It seems to be identical except for the obvious higher rate. I've found the setting process: issue CMD6 with value $80000001 for the parameter. There are optional extra bits for changing the drive strength and current limit, but I doubt that's needed or recommended.
The timings are tighter but that's a natural side effect of upping the clock rate anyway. Nothing else changes as far as I can see.
PS: My above overclocking is basically in High Speed mode already. Just not set as such in the mode register.
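For reference, a small Python sketch of how the CMD6 parameter breaks down, as I understand the spec: bit 31 selects check (0) or switch (1), and each of the six function groups gets a 4-bit field, with group 1 (access mode) in bits 3:0, group 3 being drive strength and group 4 the current/power limit. A field value of $F is meant to leave that group unchanged, while 0 selects its default function. The function names here are invented for the example.

```python
NO_CHANGE = 0xF   # field value that leaves a function group as it is

def cmd6_arg(switch: bool, functions: dict, default: int = NO_CHANGE) -> int:
    """Build a CMD6 argument. `functions` maps group number (1..6) to a
    4-bit function code; groups not listed get `default`."""
    arg = 0x80000000 if switch else 0
    for group in range(1, 7):
        code = functions.get(group, default) & 0xF
        arg |= code << (4 * (group - 1))
    return arg

def decode_cmd6(arg: int) -> dict:
    """Split a CMD6 argument into its mode bit and per-group function codes."""
    return {
        "switch": bool(arg & 0x80000000),
        "groups": [(arg >> (4 * g)) & 0xF for g in range(6)],  # group 1 first
    }

if __name__ == "__main__":
    print(decode_cmd6(0x80000001))                 # the value quoted above
    print(f"${cmd6_arg(True, {1: 1}):08X}")        # high speed, others untouched
    print(f"${cmd6_arg(True, {1: 1}, default=0):08X}")
```

So $80000001 reads as "switch group 1 to function 1 (High Speed) and put the other groups on their defaults", and the variant with $F in the unused fields should change the access mode only.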
High Speed mode is just that - the card advertises that you may clock it faster and then you just do that. Or just don't bother checking, pretty much all cards can handle high speed (and often can be overclocked beyond that to 100 MHz - UHS-capable cards generally can because why wouldn't they? Though at that point I'd make sure you're actually computing CRCs)
hmmm ...