flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler

Wuerfel_21 · 2024-07-21 13:31

Oh, C really does not have an equivalent to Spin2's REG keyword (AST_SPRREF).

evanh · 2024-07-21 14:45

Well, the first attempt is working after some toe stubbing. Pretty painless in the end. Just had to incorporate the new set_crset() into the rxlag calibration routine. That leaves a now defunct rxlag parameter in the rxcmd() routine, which will need to be cut from the calling routines. Only the calibrator routine actually sets rxlag dynamically.

struct cmdresp_t {
    uint32_t    clk,        // pr0
                cmd,        // pr1
                mleadin,    // pr2
                mresp,      // pr3
                mnco;       // pr4
};

static struct cmdresp_t  crset;


static void  set_crset( uint32_t clk_div, uint32_t rxlag )
{
    crset.clk = pins.clk;
    crset.cmd = pins.cmd;
    crset.mleadin = X_IMM_32X1_1DAC1 | 8 + rxlag;    // store start(S)-bit ahead of clock pulse
    crset.mresp = X_1P_1DAC1_WFBYTE | pins.cmd<<17 | X_PINS_ON | X_ALT_ON;    // + bit count
    crset.mnco = 0x8000_0000UL / clk_div + (0x8000_0000UL % clk_div > 0 ? 1 : 0);  // data transfer rate, round up upon a non-zero remainder
}


static uint32_t  rxcmd( uint32_t len, uint32_t rxlag )
{
// Advantages - can operate up to sysclock/2 clock divider
//            - suitable for rxlag calibration
//            - precise and quick completion
//            - no pin-input routing
// Disadvantage - slowest search for start bit, response transition delay
//              - Fcache copy-to-cogRAM delay at beginning
#ifdef CLK_STEP
    uint32_t  i = (len << 4) + 2;    // clock steps for response + 2 to kick off data/busy phase
#else
    uint32_t  i = (len << 3) + 1;    // clock pulses for response + 1 to kick off data/busy phase
#endif
    uint32_t  latency = 64;    // max of 64 clocks for response to start, Ncr (SD spec 4.12.4)

    __asm {
        loc pa, #crset  // fast copy from structure
        setq    #4
        rdlong  pr0, pa    // fast copy to registers pr0..pr4
    }
    pr3 |= len<<3;   // add bit count to DMA mode word, crset.mresp

    __asm volatile {    // "volatile" enforces use of FCACHE
        waitse1    // wait for end of command tx clocks
//drvh  #56    // diag

// locate response start-bit
        outh    pr1    // when driven, momentarily force CMD pin high before start-bit search
        loc pa, #resp    // response buffer address in hubRAM, static location
        wrfast  lnco, pa    // engage the FIFO
        setxfrq lnco    // set sysclock/1 for lead-in timing
        fltl    pr1    // pin drive release before SD card takes over

.startbit  // Careful, this loop's instruction ordering is very touchy!
#ifdef CLK_STEP
        wypin   #2, pr0    // one clock pulse
#else
        wypin   #1, pr0    // one clock pulse
#endif
        testp   pr1   wc    // sample before the clock pin transitions
        waitse1      // wait for each clock done
    if_c    djnz    latency, #.startbit    // max of 64 clocks for response to start, Ncr (SD spec 4.12.4)
        dirl    pr0    // S-bit has passed, T-bit is present 

// start-bit found, now read the response (NOTE: start(S)-bit is part of first byte)
    if_nc   xinit   pr2, #0    // lead-in delay from here at sysclock/1
    if_nc   setq    pr4        // streamer transfer rate (takes effect with buffered command below)
    if_nc   xzero   pr3, #0     // rx buffered-op aligned with clock via lead-in
        dirh    pr0    // clock timing starts here
    if_nc   wypin   i, pr0    // first clock pulse outputs during second clock period

        // twiddle thumbs

    if_nc   waitxfi    // wait for rx data completion before resuming hubexec
        ret

lnco        long    0x8000_0000UL
    }

    if( latency )
        latency = 64 - latency;
#ifdef _DIAG
    __asm {
        wrword  latency, ptrb++
    }
#endif

    return latency;
}
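
The round-up used for crset.mnco in set_crset() can be checked off-target. What follows is only a minimal host-side sketch: nco_for_divider() is a hypothetical helper name, not part of the driver, and the constant is written without the underscore separator so that a standard C compiler accepts it. It assumes clk_div is never zero.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the crset.mnco expression:
   0x80000000 / clk_div, rounded up when the division leaves a remainder.
   The caller must pass a non-zero clk_div. */
static uint32_t nco_for_divider(uint32_t clk_div)
{
    const uint32_t full = 0x80000000UL;
    return full / clk_div + (full % clk_div != 0 ? 1u : 0u);
}
```

For a divider of 3 the plain quotient is 0x2AAAAAAA with a remainder of 2, so the rounded value is 0x2AAAAAAB. Power-of-two dividers leave no remainder and pass through unchanged.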

evanh · 2024-07-21 14:53

The only nag about using PRx registers is these are really allotted for user program level use. Whereas I'm now using them for a driver in a way that an end developer will be unaware of. Someone will eventually conflict with wanting to use those PRx registers for another purpose.

Wuerfel_21 · 2024-07-21 15:11

I think there's a way to allocate "global" variables into cog ram... is it just the register keyword? Downside is that those registers will stay allocated at all times in all cogs.
@ersmith Maybe register in a local var definition should force all such variables to exist in exactly the declared order in cogRAM?

evanh · 2024-07-22 03:49

Okay, a couple of improvements with one stone:

Used enum to create a matching ordered name set, instead of the jumble of PRx numbers which had repeatedly caused bugs.
Which in turn made it straightforward to assign any range of register numbers. So, in effect, PR8..PR15 are now used, which puts them outside of the documented Spin2 PR0..PR7 range. So I guess that's a good enough solution there now too.

static struct cmd_parms_t {
    uint32_t  p_clk, p_cmd, m_leadin, m_nco, m_se2, busydelay, m_ca;
} cmdset;

enum {
      r_clk = 0x1e8, r_cmd, r_leadin, r_nco, r_se2,  r_bdelay, r_ca
};


static void  sdcmd( uint32_t cmd, uint32_t arg )
{
    uint32_t  lat;

    __asm const {
        loc pa, #cmdset  // fast copy from structure
        setq    #sizeof(cmdset)/4 - 1
        rdlong  r_clk, pa    // fast copy to registers r_clk..r_ca

        wypin   r_nco, r_clk    // lots of clocks
        getct   lat
        setse2  #0    // cancel triggering before reuse
        setse2  r_se2    // trigger on low level - DAT0 busy
        ...
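
Since the block copy relies on the struct fields and the enum names staying in the same order and count, a compile-time check can catch drift between the two. This is a sketch only, assuming a C11 compiler. The r_end sentinel and the PARM_LONGS / SETQ_COUNT names are hypothetical additions for illustration, and whether flexspin accepts _Static_assert in this position is not something the thread confirms.

```c
#include <stdint.h>

struct cmd_parms_t {
    uint32_t p_clk, p_cmd, m_leadin, m_nco, m_se2, busydelay, m_ca;
};

enum {
    r_clk = 0x1e8, r_cmd, r_leadin, r_nco, r_se2, r_bdelay, r_ca,
    r_end                       /* hypothetical sentinel: one past the last register */
};

/* Number of longs in the parameter block, and the SETQ operand (count - 1). */
enum {
    PARM_LONGS = sizeof(struct cmd_parms_t) / sizeof(uint32_t),
    SETQ_COUNT = PARM_LONGS - 1
};

/* Fails to compile if the struct and the register names drift apart. */
_Static_assert(r_end - r_clk == PARM_LONGS,
               "register names must match struct fields");
```

With seven fields, SETQ_COUNT comes out as 6, matching the sizeof(cmdset)/4 - 1 operand in the listing.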

ersmith · 2024-07-22 11:34

@evanh Sorry for the delay in responding, I'm away from my computer right now. Using an array or struct makes a lot of sense, since it will keep the compiler from worrying so much about what parts are used or not, and will keep things together. You can declare some variables inline in the assembly. I'm not sure using values above PR7 will work in bytecode, the interpreter may need some of that area.

evanh · 2024-07-22 11:53

I'll carry on on the current path until you get more time. I've got three routines changed over so far. One of them already needed 8 registers. I fear more will be needed for the so far untouched data block routines.

ersmith · 2024-07-22 13:30

@evanh said:
I'll carry on on the current path until you get more time. I've got three routines changed over so far. One of them already needed 8 registers. I fear more will be needed for the so far untouched data block routines.

For variables needed only in the asm, you can always declare them as "long" in the inline assembly.

evanh · 2024-07-22 22:51

Yep, That's what I'm converting from. The existing test code relies heavily on compile time parameters producing lots of constants.

There are two big contributors to the need in the production code. One is the large number of runtime settable parameters, including pin assignment. The other is the large combination of mode words that those parameters are applied to. The settable clock divider is the most spread around parameter.

evanh · 2024-07-22 23:27

At this stage, I'm writing SD mode as a separate driver. I've copied the whole fatfs subdirectory and modified the pin setting parameters in particular. Got that part to work this attempt. I'd missed some coded paths that were still referencing original fatfs on my first try some months back.

There'll be a comparison in the end I think. The low level routines managing the streamer ops for 4-bit SD mode are a lot bulkier than the old 1-bit SPI based code. It's not just the streamer management, the whole dealing with start bits and wait for timeouts mechanism adds a layer. Not to mention the CRCs of course.

I don't know how much of a difference it'll be but it might be worth keeping and refining SPI mode to be as sleek as possible.

I'm teetering on adding CRC checks to the bulk read data as well. I've had a situation here where one card has a problem seating its DAT0 pin and that results in lots of timeouts but if it was, say, DAT1 then the driver would be clueless until it got CRC errors on writing data. Who knows, maybe that is happening and I'm yet to catch it.
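
For reference, SD data blocks are protected by a CRC16 with polynomial x^16 + x^12 + x^5 + 1 and an initial value of zero. Below is a plain bitwise sketch in portable C, not the driver's code. crc16_ccitt() is a hypothetical name, and it covers a single serial bit stream. In 4-bit mode each DAT line carries its own CRC16 over its own bits, so a real check would need to run per line.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-16, polynomial 0x1021, initial value 0, MSB first. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);   /* bring next byte into the top */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

The usual check string "123456789" gives 0x31C3 with these parameters, which is a quick way to validate any faster table-driven or hardware-assisted version against this one.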

evanh · 2024-07-23 09:13

Duplicating fatfs involved the following steps:

Copy include/filesys/fatfs/* to include/filesys/sdfatfs/*
Add 1 line in include/sys/vfs.h
- struct vfs *_vfs_open_sdsdcard(int pclk, int pcmd, int pdat0, int ppow, int pled) _IMPL("filesys/sdfatfs/fatfs_vfs.c");
Edit 5 lines in include/filesys/sdfatfs/fatfs_vfs.c
- _vfs_open_sdsdcard(int pclk, int pcmd, int pdat0, int ppow, int pled)
- struct __using("filesys/sdfatfs/fatfs.cc") *FFS;
- __builtin_printf("open sdcard: using pins: %d %d %d %d %d\n", pclk, pcmd, pdat0, ppow, pled);
- pmask = (1ULL << pclk) | (1ULL << pcmd) | (15ULL << pdat0) | (1ULL << ppow) | (1ULL << pled);
- r = FFS->disk_setpins(drv, pclk, pcmd, pdat0, ppow, pled);
Delete the second function from include/filesys/sdfatfs/fatfs_vfs.c

struct vfs *
_vfs_open_sdcard()
{
    return _vfs_open_sdcardx(61, 60, 59, 58);
}

Edit 3 lines in include/filesys/sdfatfs/diskio.h
- #ifndef _SDISKIO_DEFINED
- #define _SDISKIO_DEFINED
- DRESULT disk_setpins (BYTE pdrv, int pclk, int pcmd, int pdat0, int ppow, int pled) _IMPL("sdmm.cc");

Begin the redesign of include/filesys/sdfatfs/sdmm.cc ...
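
The edited pmask line in the steps above reserves four consecutive data pins starting at pdat0 plus the four single pins. A small host-side sketch of the same expression, with sd_pin_mask() as a hypothetical wrapper name:

```c
#include <stdint.h>

/* Hypothetical helper: builds the 64-bit pin mask from the edited pmask line.
   DAT0..DAT3 are assumed to be four consecutive pins starting at pdat0. */
static uint64_t sd_pin_mask(int pclk, int pcmd, int pdat0, int ppow, int pled)
{
    return (1ULL << pclk) | (1ULL << pcmd) | (15ULL << pdat0)
         | (1ULL << ppow) | (1ULL << pled);
}
```

The ULL suffixes matter here: pin numbers on the P2 go up to 63, and a plain int shift would overflow for anything at or above bit 31.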

pilot0315 · 2024-07-25 03:51

Never mind

ersmith · 2024-07-25 14:08

@evanh At some point I'd like to re-write the fatfs code to sit on top of a BlockDevice (or something similar), the way the littlefs code does. That way we can use fatfs with any device for which we have a driver.
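
As an illustration of the layering idea only, here is a rough sketch of a sector-based block device interface with a RAM-backed implementation. Every name in it (blockdev_t, ram_read, ram_write, ramdisk) is hypothetical and it is not flexspin's actual BlockDevice API. The point is just that a filesystem layer calling through such function pointers would not care whether the storage is an SD card in SPI mode, SD mode, or something else.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define RAM_SECTORS 8u

/* Hypothetical interface, not flexspin's actual BlockDevice API. */
typedef struct blockdev {
    int (*read)(struct blockdev *dev, uint32_t sector, uint8_t *buf, uint32_t count);
    int (*write)(struct blockdev *dev, uint32_t sector, const uint8_t *buf, uint32_t count);
    uint32_t sector_count;
    void *ctx;                  /* driver-private state */
} blockdev_t;

/* Returns non-zero when [sector, sector + count) lies inside the device. */
static int range_ok(const blockdev_t *dev, uint32_t sector, uint32_t count)
{
    return sector < dev->sector_count && count <= dev->sector_count - sector;
}

static int ram_read(blockdev_t *dev, uint32_t sector, uint8_t *buf, uint32_t count)
{
    if (!range_ok(dev, sector, count))
        return -1;
    memcpy(buf, (uint8_t *)dev->ctx + sector * SECTOR_SIZE, count * SECTOR_SIZE);
    return 0;
}

static int ram_write(blockdev_t *dev, uint32_t sector, const uint8_t *buf, uint32_t count)
{
    if (!range_ok(dev, sector, count))
        return -1;
    memcpy((uint8_t *)dev->ctx + sector * SECTOR_SIZE, buf, count * SECTOR_SIZE);
    return 0;
}

static uint8_t ram_store[RAM_SECTORS * SECTOR_SIZE];
static blockdev_t ramdisk = { ram_read, ram_write, RAM_SECTORS, ram_store };
```

A disk_read()-style routine in the filesystem glue would then reduce to a call such as ramdisk.read(&ramdisk, sector, buf, count), with the device chosen at mount time.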

Rayman · 2024-07-25 17:28

Made an independent version of fatfs long ago…. Don’t remember exactly why though…. Maybe it was for the emmc chip…

Rayman · 2024-07-25 17:39

Maybe this was from when @ersmith was still adding FatFS to FlexProp. Was a long time ago anyway, maybe 2020.
Looks like derived from AVR code...

pilot0315 · 2024-07-29 05:06

Flexprop is not seeing my P1 wx boards, when I use the find ports option I get a yellow line with text I cannot read as it is too small.
That disappears and I get a download error saying that it cannot find a P1 on the port. The gui is set for P1 as per instructions from @ersmith.
Was working two days ago without issues. Found the serial terminal that automatically opens up. All was good two days ago.
What is the yellow line with text? It stays on for approx 20 sec and disappears.
Restarted the program several times. Rebooted computer and it worked. Is there something I am missing?
Why does flexprop fade out when the cursor is hovered over it while the serial terminal is in operation? A message in the upper left corner says "flexprop not responding". I have to kill the terminal before I can go back to the flexprop screen to, for example, make a modification to the code.
Would like to open the "xband motion detector" obj in flexprop. How is that done? See below image in the OBJ section.

Also, I am attempting to put Flexprop on the task bar and cannot get it to do that. Is that because it is standalone and not installed on the drive? Windows 11 Pro.
Thanks.
Martin

JonnyMac · 2024-07-29 14:24

Also, I am attempting to put Flexprop on the task bar and cannot get it to do that.

Go to the installation folder and right-click on flexprop.exe, then select Pin to taskbar. FWIW, I am running Windows 10 Pro.

pilot0315 · 2024-07-29 17:51

@JonnyMac

Forgot about that simple trick. Thanks.
I am going to send you a personal note. Please just read it.
Thanks

ersmith · 2024-07-30 19:08

@pilot0315 : The only yellow text that comes up in Flexprop are some tooltips describing the full path of files when you hover over the tabs, so it's probably not related to your communication problems.

You said everything was working earlier, so the obvious thing to do is to try to figure out what changed. Have you changed cables or boards? Is your wx board directly connected via serial, or is it using wifi? If wifi, has anything changed in the network? Maybe the board has been given a new IP address by your router.

To open a file in Flexprop go to the "File" menu and select "Open File..." then use the dialog box to find the file you want to look at.

ManAtWork · 2024-08-01 15:33

I just found out that FlexC has no saturation when converting a float number to int32_t but instead returns 0 if the number is too big for the 32 bit signed range. I don't know if it is defined somewhere what should happen in this case. I just was used to the behaviour of gnu C where very large numbers give $7FFFFFFF and large negative numbers give $80000000 as result.

Wuerfel_21 · 2024-08-01 19:10

It's explicitly defined as undefined by the ISO standard:

6.3.1.4 Real floating and integer

When a finite value of real floating type is converted to an integer type other than _Bool, the
fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part
cannot be represented by the integer type, the behavior is undefined.

ersmith · 2024-08-01 19:31

@ManAtWork : I think integer overflow is technically "undefined behavior" but I agree saturating the result is more useful. I think the current 0 output is just what the Spin library I was using for floats did. I'll change it in the next release.
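
Until the library behaviour changes, or for code that has to be portable across compilers, the saturation can be made explicit before the cast. A minimal sketch in standard C: float_to_i32_sat() is a hypothetical helper name, and returning 0 for NaN is a choice made in this sketch, not something the standard or the thread prescribes.

```c
#include <stdint.h>

/* Hypothetical helper: converts float to int32_t with explicit saturation,
   so the cast itself is only ever applied to in-range values. */
static int32_t float_to_i32_sat(float x)
{
    if (x != x)                      /* NaN compares unequal to itself */
        return 0;
    if (x >= 2147483648.0f)          /* 2^31 and above */
        return INT32_MAX;
    if (x <= -2147483648.0f)         /* -2^31 and below */
        return INT32_MIN;
    return (int32_t)x;               /* in range: truncates toward zero */
}
```

Both limits are exact powers of two, so they are representable as float and the comparisons are exact. Values in between truncate toward zero as described in 6.3.1.4.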

evanh · 2024-08-17 15:13

I got a problem with suppressing warnings about register assignment in inline pasm2. If I add a -0, eg: 118-0 then the first warning at the .c compiling level goes away, as expected, but it still leaves two remaining warnings at the subsequent .p2asm assembling level. Eg: sdmode_rdwrblks.p2asm:625: warning: First operand to xinit is a constant used without #; is this correct? If so, you may suppress this warning by putting -0 after the operand
The -0 is not carried over to that file.

PS: If it helps, I'm actually using enums to give labels to the cogRAM addresses. It's quite the effort to add a couple of hundred -0 covering each occurrence, so I wouldn't mind if this approach was eliminated altogether.

EDIT: I guess there is a switch to simply disable the warning. I think I'd be okay doing that. I don't seem to be making the mistake of leaving off #s.

EDIT2: Ohhh, just found the extern register int *interpreter_pc; example. My MD reader is a little broken and wasn't displaying the description ... EDIT3: Hmm, no, this isn't ideal either as each set of parameters gobbles additional registers.

evanh · 2024-08-18 06:37

I'm really wanting to tack a settable parameter set to the Fcache'd code, so it's loaded along with the code.

Currently I'm overwriting PR8..15 with the relevant parameters just before each loading of Fcache. It's fine for testing purposes but I wasn't planning on keeping those PRx locations in the finished code.

evanh · 2024-08-18 08:48

I've found a way to get extern register working. It still hogs extra registers though so still not ideal.

Replacing all the enums with one concise set of #defines allows using the existing labels and at the same time eliminates all those warnings I was getting:

extern register uint32_t 
    *r_param1, *r_param2, *r_param3, *r_param4,
    *r_param5, *r_param6, *r_param7, *r_param8;

#  define  r_clk      r_param1
#  define  r_cmd      r_param2
#  define  r_leadin   r_param3
#  define  r_nco      r_param4
#  define  r_se2      r_param5
#  define  r_bdelay   r_param6
#  define  r_ca       r_param7

#  define  r_resp     r_param6
#  define  r_clkr     r_param7
#  define  r_clkr2    r_param7

#  define  r_pdat     r_param2
#  define  r_mdat     r_param6
#  define  r_clkdiv   r_param7

#  define  r_crc      r_param7
#  define  r_leadin2  r_param8

ersmith · 2024-08-18 19:13

@evanh : Maybe put your parameter block inside the inline assembly, and write it back at the end manually? I'm imagining something like:

__asm__ volatile {
        jmp #.over
.data
a       long 1
b       long $bbcc
c       long 0
.over
        ' do stuff, making sure to preserve ptrb
        ....
        ' now write back the (modified) parameter block
        ' plus the jmp, but that's easier than fixing it up
        setq #3 ' 4 words total
        wrlong $0-0, ptrb

evanh · 2024-08-19 00:40

Yep, that will work. I'd thought about doing exactly that a long time back, funnily, but it wasn't on my radar recently. I think I'd written it off as too wasteful to be copying a bunch of zeros only to be immediately overwritten.

EDIT: Err, this idea has a trap. I'm regularly doing prep work in an __asm const {} just ahead of the __asm volatile {}. And the prep uses the pin number parameters in particular. I kind of have to copy the parameters before activating Fcache.

I guess I'm very much treating Fcache as an overlay mechanism.

evanh · 2024-08-19 11:54

Oh, and the SD command tx code is not using Fcache at all. It's one long __asm const {} with everything unrolled so it can use streamer immediate mode and still keep well ahead of sysclock/2 bit rate. And not using the FIFO for the streamer, the subsequent C code then has time to load up the response handler while the command bits are still being paced out by the streamer.

It still requires 7 config parameters plus a timeout over and above the command and its argument.

Wuerfel_21 · 2024-08-19 14:28

You could just load code into the FCACHE area manually from an asm const block (will probably be faster than the automatic FCACHE loader...). But I'm not sure if you can switch the inline assembler into cog mode. You could load from global/DAT assembly, but that doesn't know about local variables.

evanh · 2024-08-19 23:05

Hmm, yeah, nah, too much trouble around locals for sure. I'll continue with what I've got for the moment. Burning a few permanent "extern register"s shouldn't hurt.
