Well, the first attempt is working after some toe stubbing. Pretty painless in the end. I just had to incorporate the new set_crset() into the rxlag calibration routine. That has left a now-defunct rxlag parameter in the rxcmd() routine, which will need to be cut from the calling routines. Only the calibrator routine actually sets rxlag dynamically.
struct cmdresp_t {
uint32_t clk, // pr0
cmd, // pr1
mleadin, // pr2
mresp, // pr3
mnco; // pr4
};
static struct cmdresp_t crset;
static void set_crset( uint32_t clk_div, uint32_t rxlag )
{
crset.clk = pins.clk;
crset.cmd = pins.cmd;
crset.mleadin = X_IMM_32X1_1DAC1 | 8 + rxlag; // store start(S)-bit ahead of clock pulse
crset.mresp = X_1P_1DAC1_WFBYTE | pins.cmd<<17 | X_PINS_ON | X_ALT_ON; // + bit count
crset.mnco = 0x8000_0000UL / clk_div + (0x8000_0000UL % clk_div > 0 ? 1 : 0); // data transfer rate, round up upon a non-zero remainder
}
static uint32_t rxcmd( uint32_t len, uint32_t rxlag )
{
// Advantages - can operate up to sysclock/2 clock divider
// - suitable for rxlag calibration
// - precise and quick completion
// - no pin-input routing
// Disadvantage - slowest search for start bit, response transition delay
// - Fcache copy-to-cogRAM delay at beginning
#ifdef CLK_STEP
uint32_t i = (len << 4) + 2; // clock steps for response + 2 to kick off data/busy phase (parentheses needed: + binds tighter than <<)
#else
uint32_t i = (len << 3) + 1; // clock pulses for response + 1 to kick off data/busy phase (parentheses needed: + binds tighter than <<)
#endif
uint32_t latency = 64; // max of 64 clocks for response to start, Ncr (SD spec 4.12.4)
__asm {
loc pa, #crset // fast copy from structure
setq #4
rdlong pr0, pa // fast copy to registers pr0..pr4
}
pr3 |= len<<3; // add bit count to DMA mode word, crset.mresp
__asm volatile { // "volatile" enforces use of FCACHE
waitse1 // wait for end of command tx clocks
//drvh #56 // diag
// locate response start-bit
outh pr1 // when driven, momentarily force CMD pin high before start-bit search
loc pa, #resp // response buffer address in hubRAM, static location
wrfast lnco, pa // engage the FIFO
setxfrq lnco // set sysclock/1 for lead-in timing
fltl pr1 // pin drive release before SD card takes over
.startbit // Careful, this loop's instruction ordering is very touchy!
#ifdef CLK_STEP
wypin #2, pr0 // one clock pulse
#else
wypin #1, pr0 // one clock pulse
#endif
testp pr1 wc // sample before the clock pin transitions
waitse1 // wait for each clock done
if_c djnz latency, #.startbit // max of 64 clocks for response to start, Ncr (SD spec 4.12.4)
dirl pr0 // S-bit has passed, T-bit is present
// start-bit found, now read the response (NOTE: start(S)-bit is part of first byte)
if_nc xinit pr2, #0 // lead-in delay from here at sysclock/1
if_nc setq pr4 // streamer transfer rate (takes effect with buffered command below)
if_nc xzero pr3, #0 // rx buffered-op aligned with clock via lead-in
dirh pr0 // clock timing starts here
if_nc wypin i, pr0 // first clock pulse outputs during second clock period
// twiddle thumbs
if_nc waitxfi // wait for rx data completion before resuming hubexec
ret
lnco long 0x8000_0000UL
}
if( latency )
latency = 64 - latency;
#ifdef _DIAG
__asm {
wrword latency, ptrb++
}
#endif
return latency;
}
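The mnco line in set_crset() is a ceiling division of 0x8000_0000 (the sysclock/1 rate used for lnco in the listing) by the clock divider. Here is a minimal sketch of just that arithmetic in portable C, handy for checking values on a desktop compiler. The helper name nco_round_up() is made up for illustration and is not part of the driver:

```c
#include <stdint.h>

// Hypothetical helper (not from the driver): ceiling of 0x80000000 / clk_div,
// mirroring the round-up used for crset.mnco. clk_div must be non-zero.
static uint32_t nco_round_up(uint32_t clk_div)
{
    const uint32_t full_rate = 0x80000000UL;   // value the listing uses for sysclock/1
    // Integer divide, then add 1 only when there is a non-zero remainder.
    return full_rate / clk_div + (full_rate % clk_div > 0 ? 1 : 0);
}
```

With this, a divider of 2 gives exactly 0x40000000, while a divider of 3 rounds up to 0x2AAAAAAB rather than truncating to 0x2AAAAAAA.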
The only nag about using the PRx registers is that they are really allotted for user-program-level use, whereas I'm now using them for a driver in a way that an end developer will be unaware of. Someone will eventually hit a conflict when wanting to use those PRx registers for another purpose.
I think there's a way to allocate "global" variables into cogRAM... is it just the register keyword? The downside is that those registers will stay allocated at all times in all cogs.
@ersmith Maybe register in a local var definition should force all such variables to exist in exactly the declared order in cogRAM?
Okay, a couple of improvements with one stone: used an enum to create a matching, ordered set of names instead of the jumble of PRx numbers, which had repeatedly caused bugs.
That in turn made it straightforward to assign any range of register numbers. So, in effect, PR8..PR15 are now used, which puts them outside of the documented Spin2 PR0..PR7 range. So I guess that's a good enough solution there too.
static struct cmd_parms_t {
uint32_t p_clk, p_cmd, m_leadin, m_nco, m_se2, busydelay, m_ca;
} cmdset;
enum {
r_clk = 0x1e8, r_cmd, r_leadin, r_nco, r_se2, r_bdelay, r_ca
};
static void sdcmd( uint32_t cmd, uint32_t arg )
{
uint32_t lat;
__asm const {
loc pa, #cmdset // fast copy from structure
setq #sizeof(cmdset)/4 - 1
rdlong r_clk, pa // fast copy to registers PR0..PR6
wypin r_nco, r_clk // lots of clocks
getct lat
setse2 #0 // cancel triggering before reuse
setse2 r_se2 // trigger on low level - DAT0 busy
...
@evanh Sorry for the delay in responding, I'm away from my computer right now. Using an array or struct makes a lot of sense, since it will keep the compiler from worrying so much about what parts are used or not, and will keep things together. You can declare some variables inline in the assembly. I'm not sure using values above PR7 will work in bytecode, the interpreter may need some of that area.
I'll carry on on the current path until you get more time. I've got three routines changed over so far. One of them already needed 8 registers. I fear more will be needed for the so far untouched data block routines.
@evanh said:
I'll carry on on the current path until you get more time. I've got three routines changed over so far. One of them already needed 8 registers. I fear more will be needed for the so far untouched data block routines.
For variables needed only in the asm, you can always declare them as "long" in the inline assembly.
Yep, that's what I'm converting from. The existing test code relies heavily on compile-time parameters producing lots of constants.
There are two big contributors to the need in the production code. One is the large number of runtime-settable parameters, including pin assignment. The other is the large combination of mode words that those parameters are applied to, the settable clock divider being the most widely spread parameter.
At this stage, I'm writing SD mode as a separate driver. I've copied the whole fatfs subdirectory and modified the pin setting parameters in particular. Got that part to work on this attempt. On my first try some months back, I'd missed some code paths that were still referencing the original fatfs.
There'll be a comparison in the end, I think. The low-level routines managing the streamer ops for 4-bit SD mode are a lot bulkier than the old 1-bit SPI-based code. It's not just the streamer management; the whole mechanism of dealing with start bits and waiting for timeouts adds a layer. Not to mention the CRCs, of course.
I don't know how much of a difference it'll be, but it might be worth keeping and refining SPI mode to be as sleek as possible.
I'm teetering on adding CRC checks to the bulk read data as well. I've had a situation here where one card has a problem seating its DAT0 pin, and that results in lots of timeouts. But if it was, say, DAT1, then the driver would be clueless until it got CRC errors on writing data. Who knows, maybe that is happening and I'm yet to catch it.
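For reference, a minimal bitwise sketch of the CRC16 that SD data blocks use (polynomial x^16 + x^12 + x^5 + 1, i.e. 0x1021, initial value 0, MSB first). This is plain portable C for checking results, not the driver's code; the name sd_crc16() is made up, and a real driver would more likely use a table or hardware assist:

```c
#include <stdint.h>
#include <stddef.h>

// Hypothetical helper: CRC-16 with polynomial 0x1021, initial value 0,
// processed MSB first, no final XOR.
static uint16_t sd_crc16(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;
    for (size_t n = 0; n < len; n++) {
        crc ^= (uint16_t)((uint16_t)buf[n] << 8);   // bring next byte into the top of the CRC
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

Note this treats the data as a single serial bit stream. In 4-bit SD mode each DAT line carries its own CRC16 over the bits sent on that line, so the block would have to be de-interleaved per line (or the CRC computed four lanes at a time) before comparing.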
@evanh At some point I'd like to re-write the fatfs code to sit on top of a BlockDevice (or something similar), the way the littlefs code does. That way we can use fatfs with any device for which we have a driver.
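A rough sketch of what sitting on top of a block device could look like in C: the filesystem layer only ever calls read/write hooks, and each driver fills in its own. Every name here (blockdev_t, ram_read, ram_write, ramdisk) is hypothetical and is not the FlexProp or littlefs API; a RAM array stands in for a real SD or flash driver:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define BLK_SIZE 512u

// Hypothetical interface: the filesystem would call only these hooks.
typedef struct blockdev {
    int (*read)(struct blockdev *dev, uint32_t lba, uint8_t *dst);        // returns 0 on success
    int (*write)(struct blockdev *dev, uint32_t lba, const uint8_t *src); // returns 0 on success
    uint32_t nblocks;   // device size in blocks
    void *ctx;          // driver-private state
} blockdev_t;

// Example backend: plain RAM standing in for a real device driver.
static int ram_read(blockdev_t *dev, uint32_t lba, uint8_t *dst)
{
    if (lba >= dev->nblocks) return -1;            // out of range
    memcpy(dst, (uint8_t *)dev->ctx + (size_t)lba * BLK_SIZE, BLK_SIZE);
    return 0;
}

static int ram_write(blockdev_t *dev, uint32_t lba, const uint8_t *src)
{
    if (lba >= dev->nblocks) return -1;            // out of range
    memcpy((uint8_t *)dev->ctx + (size_t)lba * BLK_SIZE, src, BLK_SIZE);
    return 0;
}

static uint8_t ram_store[4 * BLK_SIZE];
static blockdev_t ramdisk = { ram_read, ram_write, 4, ram_store };
```

A caller would go through the hooks only, e.g. ramdisk.write(&ramdisk, 2, buf) followed by ramdisk.read(&ramdisk, 2, buf), so swapping in a different driver means swapping the struct, not the filesystem code.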
Flexprop is not seeing my P1 WX boards. When I use the find ports option I get a yellow line with text I cannot read as it is too small.
That disappears and I get a download error saying that it cannot find a P1 on the port. The GUI is set for P1 as per instructions from @ersmith.
It was working two days ago without issues. Found the serial terminal that automatically opens up. All was good two days ago.
What is the yellow line with text? It stays on for approx 20 sec and disappears.
Restarted the program several times. Rebooted the computer and it worked. Is there something I am missing?
Why does Flexprop fade out when the cursor is hovered over it while the serial terminal is in operation? A message in the upper left corner says "flexprop not responding". I have to kill the terminal before I can go back to the Flexprop screen to, for example, make a modification to the code.
I would like to open the "xband motion detector" obj in Flexprop. How is that done? See below image in the OBJ section.
Also, I am attempting to put Flexprop on the task bar and cannot get it to do that. Is that because it is standalone and not installed on the drive? Windows 11 Pro.
Thanks.
Martin
@pilot0315 : The only yellow text that comes up in Flexprop are some tooltips describing the full path of files when you hover over the tabs, so it's probably not related to your communication problems.
You said everything was working earlier, so the obvious thing to do is to try to figure out what changed. Have you changed cables or boards? Is your wx board directly connected via serial, or is it using wifi? If wifi, has anything changed in the network? Maybe the board has been given a new IP address by your router.
To open a file in Flexprop go to the "File" menu and select "Open File..." then use the dialog box to find the file you want to look at.
I just found out that FlexC has no saturation when converting a float number to int32_t but instead returns 0 if the number is too big for the 32 bit signed range. I don't know if it is defined somewhere what should happen in this case. I just was used to the behaviour of gnu C where very large numbers give $7FFFFFFF and large negative numbers give $80000000 as result.
It's explicitly defined as undefined by the ISO standard:
6.3.1.4 Real floating and integer
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
@ManAtWork : I think integer overflow is technically "undefined behavior" but I agree saturating the result is more useful. I think the current 0 output is just what the Spin library I was using for floats did. I'll change it in the next release.
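In the meantime, code that needs saturation regardless of compiler can clamp before the cast, which avoids the undefined case altogether. A minimal sketch; the function name float_to_i32_sat() is made up, and returning 0 for NaN is an arbitrary choice rather than anything the standard or FlexC specifies:

```c
#include <stdint.h>

// Hypothetical helper: float to int32_t with saturation, so the cast itself
// is only ever applied to values that fit in the 32-bit signed range.
static int32_t float_to_i32_sat(float f)
{
    if (f != f)                          // NaN compares unequal to itself
        return 0;                        // arbitrary choice for NaN
    if (f >= 2147483648.0f)              // 2^31, exactly representable as a float
        return INT32_MAX;                // $7FFFFFFF
    if (f <= -2147483648.0f)             // -2^31, exactly representable as a float
        return INT32_MIN;                // $80000000
    return (int32_t)f;                   // in range: truncates toward zero
}
```

So float_to_i32_sat(1e20f) gives INT32_MAX and float_to_i32_sat(-1e20f) gives INT32_MIN, while in-range values truncate toward zero as usual.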
I've got a problem with suppressing warnings about register assignment in inline pasm2. If I add a -0, e.g. 118-0, then the first warning at the .c compiling level goes away, as expected, but it still leaves two remaining warnings at the subsequent .p2asm assembling level. E.g.: sdmode_rdwrblks.p2asm:625: warning: First operand to xinit is a constant used without #; is this correct? If so, you may suppress this warning by putting -0 after the operand
The -0 is not carried over to that file.
PS: If it helps, I'm actually using enums to give labels to the cogRAM addresses. It's quite the effort to add a couple of hundred -0 covering each occurrence, so I wouldn't mind if this approach was eliminated altogether.
EDIT: I guess there is a switch to simply disable the warning. I think I'd be okay doing that. I don't seem to be making the mistake of leaving off #s.
EDIT2: Ohhh, just found the extern register int *interpreter_pc; example. My MD reader is a little broken and wasn't displaying the description ... EDIT3: Hmm, no, this isn't ideal either as each set of parameters gobbles additional registers.
I'm really wanting to tack a settable parameter set to the Fcache'd code, so it's loaded along with the code.
Currently I'm overwriting PR8..15 with the relevant parameters just before each loading of Fcache. It's fine for testing purposes but I wasn't planning on keeping those PRx locations in the finished code.
I've found a way to get extern register working. It still hogs extra registers though so still not ideal.
Replacing all the enums with one concise set of #defines allows using the existing labels and at the same time eliminates all those warnings I was getting:
@evanh : Maybe put your parameter block inside the inline assembly, and write it back at the end manually? I'm imagining something like:
__asm__ volatile {
jmp #.over
.data
a long 1
b long $bbcc
c long 0
.over
' do stuff, making sure to preserve ptrb
....
' now write back the (modified) parameter block
' plus the jmp, but that's easier than fixing it up
setq #3 ' 4 words total
wrlong $0-0, ptrb
Yep, that will work. I'd thought about doing exactly that a long time back, funnily, but it wasn't on my radar recently. I think I'd written it off as too wasteful to be copying a bunch of zeros only to be immediately overwritten.
EDIT: Err, this idea has a trap. I'm regularly doing prep work in an __asm const {} just ahead of the __asm volatile {}. And the prep uses the pin number parameters in particular. I kind of have to copy the parameters before activating Fcache.
I guess I'm very much treating Fcache as an overlay mechanism.
Oh, and the SD command tx code is not using Fcache at all. It's one long __asm const {} with everything unrolled so it can use streamer immediate mode and still keep well ahead of the sysclock/2 bit rate. And because the streamer isn't using the FIFO there, the subsequent C code has time to load up the response handler while the command bits are still being paced out by the streamer.
It still requires 7 config parameters plus a timeout over and above the command and its argument.
You could just load code into the FCACHE area manually from an asm const block (will probably be faster than the automatic FCACHE loader...). But I'm not sure if you can switch the inline assembler into cog mode. You could load from global/DAT assembly, but that doesn't know about local variables.
Hmm, yeah, nah, too much trouble around locals for sure. I'll continue with what I've got for the moment. Burning a few permanent "extern register"s shouldn't hurt.
Oh, C really does not have an equivalent to Spin2's REG keyword (AST_SPRREF).
Duplicating fatfs involved the following steps:
Add 1 line in include/sys/vfs.h
struct vfs *_vfs_open_sdsdcard(int pclk, int pcmd, int pdat0, int ppow, int pled) _IMPL("filesys/sdfatfs/fatfs_vfs.c");
Edit 5 lines in include/filesys/sdfatfs/fatfs_vfs.c
_vfs_open_sdsdcard(int pclk, int pcmd, int pdat0, int ppow, int pled)
struct __using("filesys/sdfatfs/fatfs.cc") *FFS;
__builtin_printf("open sdcard: using pins: %d %d %d %d %d\n", pclk, pcmd, pdat0, ppow, pled);
pmask = (1ULL << pclk) | (1ULL << pcmd) | (15ULL << pdat0) | (1ULL << ppow) | (1ULL << pled);
r = FFS->disk_setpins(drv, pclk, pcmd, pdat0, ppow, pled);
Delete the second function from include/filesys/sdfatfs/fatfs_vfs.c
#ifndef _SDISKIO_DEFINED
#define _SDISKIO_DEFINED
DRESULT disk_setpins (BYTE pdrv, int pclk, int pcmd, int pdat0, int ppow, int pled) _IMPL("sdmm.cc");
Begin the redesign of include/filesys/sdfatfs/sdmm.cc ...
Never mind
Made an independent version of fatfs long ago... Don't remember exactly why though... Maybe it was for the eMMC chip...
Maybe this was from when @ersmith was still adding FatFS to FlexProp. Was a long time ago anyway, maybe 2020.
Looks like derived from AVR code...
Go to the installation folder and right-click on flexprop.exe, then select Pin to taskbar. FWIW, I am running Windows 10 Pro.
@JonnyMac
Forgot about that simple trick. Thanks.
I am going to send you a personal note. Please just read it.
Thanks