Discussing SD Drivers (depreciated fsrw)

ersmith · 2019-03-20 22:09

cheezus wrote: »
I'm working on a "template" cog engine, mostly for my own understanding. What I have so far is this :

PUB start_explicit( DO, CLK, DI, CS ) : card_type

  stop
  long[@cmd] := "i"                         'set flag to know if the started cog has read its parameters
  long[@cmd][1] := @DO
...

Ugh! Please don't do this. I know it was popular in Spin1 to save time by passing pointers to the parameters that happened to be on the stack, but it really kills performance in fastspin (it'll work, but it forces the compiler to store local variables on the stack instead of in registers, which is much slower). It also makes porting the code to other languages harder, because not all of them will pass variables on the stack -- many, like fastspin, will pass them in registers instead.

You've also got the mailbox in the DAT section, which means it's shared by all instances of the object. If you wanted to drive two different cards, with two different COGs, you'd need to have a lock on the mailbox so they don't conflict. Whereas if the mailbox is a local variable of the object this wouldn't be an issue. Maybe it doesn't matter for the SD card driver, but you mentioned wanting to use this as a template for other objects, so it might come up there.

So I would suggest that the more natural way to write this is:

VAR
  long mbox[10] ' or whatever size
...
PUB start_explicit(DO, CLK, DI, CS) : card_type
  stop
  mbox[0] := "i"
  mbox[1] := DO
  mbox[2] := CLK
  mbox[3] := DI
  mbox[4] := CS
 ...
  cog := cognew(@asm_start, @mbox)

That'll also simplify your PASM startup code, because you won't have to juggle multiple pointers (you just have the one to mbox), and I think makes everything easier to understand, both for the human reader and for the compiler.

Regards,
Eric

cheezus · 2019-03-20 22:29

Thanks for those tips Eric!

As to passing parameters, I'll need to figure out the best way to handle this. I use this trick because there are many times I would like to pass more initialization info to the cog than I have parameters. I never did get my head around the "pre-load" trick used in the P1 and always did it this way. I guess I could use several commands for initialization, need to think on this more.

Regarding the mailbox in the dat section, I also have a commented out VAR section if I wanted multiple instances running at the same time. I will have one platform with 2 sd cards so this will be something to think about. That hardware also has a shared SPI bus so was thinking about providing some SPI methods for accessing those devices as well. I think it's best to start out "generic" so maybe others can find it helpful as well.

ke4pjw · 2019-03-21 03:04

Attached is a sdrw test for the eval board. Confirmed to work consistently on my P2ES. You may want to consider adding this to the RC3 distribution. It was the file @Rayman included in his, with additions for CR + LF in the terminal. Thank you @cheezus !

cheezus · 2019-03-21 08:35

So I'm looking at the docs, trying the RTFM manual way.. I think I'm close to testing the improved read, using smartpins. I'm sure there's something still wrong though. I'm sure the experts can see where I'm going with this though.

pri read : r  | c, o        '   Read eight bits from the card.

        c := clk
        o := do

    asm
    
    dirl    c
    wrpin   sp_async_tx,c 
    wxpin   ##$9,c          'set base period 
    dirh    c

    dirl    o
    wrpin   sp_clk_cfg,o    'sync tx
    wxpin   #%1_00111,o     'stop/start mode
    dirh    o               'enable smartpin
    
'    wypin   txdata,o        'data, need to do anything?
    wypin   #8,c           'start clock, tx data '' ! should this be 8 or 16?

busy    testp   c wc
    if_nc   jmp #busy

    rdpin   pa,c ' needed?
    rdpin   r,o
     
    dirl    o
    dirl    c
    '' next part per chip's smart pin documentation 
    REV r              ' Set MSB First
    TRIML r,#7

    endasm

cheezus · 2019-07-25 19:59

I've been working on this a bit more, trying to prove concepts using inline asm before trying to put them in their own cog. Spent way too much time trying to get the smartpin read working without any luck but I was able to get free-running clock gen for init clocks working. The one "improvement" I've been focusing on is converting the ReadBlock function to inline asm. I need to do the same for the WriteBlock, which should improve performance a few percent. It's not 100% asm yet, I got lazy..


pub readblock(n, b) | c, i, o, outv, r
'
'   Read a single block.  The "n" passed in is the
'   block number (blocks are 512 bytes); the b passed
'   in is the address of 512 blocks to fill with the
'   data.
'
        c := clk
        o := do
        i := di
     
        DRVL_(cs)

        starttime := cnt

    asm
        rep     #.end_read0, #8
        drvl    c
        drvh    c   
.end_read0

            mov     outv, #$40+17 
            rol     outv,    #24        
            rep     #.end_send0, #8
            drvl    c
            rol     outv,       #1      wc
            drvc    i
            drvh    c 
.end_send0     
            drvh    i   
            
            mov     outv, n
            rep     #.end_send1, #32
            drvl    c
            rol     outv,       #1      wc
            drvc    i
            drvh    c 
.end_send1     
            drvh    i   
            
            mov     outv, #$87 
            rol     outv,    #24        
            rep     #.end_send2, #8
            drvl    c
            rol     outv,       #1      wc
            drvc    i
            drvh    c 
.end_send2     
            drvh    i      
    endasm 

        readresp
        readresp   
   
        repeat sectorsize
            asm
                mov     r,          #0
                rep     #.end_read, #8
                drvl    c
                waitx   #14   '' !! 13 safe for up to 160 mhz, 14 for 320mhz, 57@320mhz external
                testp   o               wc  
                drvh    c   
                rcl     r,          #1
            .end_read
            endasm
          byte[b++] := r
        
        asm
            rep     #.end_read2, #16
            drvl    c
            drvh    c   
        .end_read2
        endasm

   return endcmd

I've also mostly converted CMD to inline ASM. Again proving things work before burying in a cog!

pri cmd(op, parm)  |  c, i, o, outv
'
'   Send a full command sequence, and get and
'   return the response.  We make sure cs is low,
'   send the required eight clocks, then the
'   command and parameter, and then the CRC for
'   the only command that needs one (the first one).
'   Finally we spin until we get a result.

        DRVL_(cs)

        c := clk
        o := do
        i := di


    starttime := cnt

    asm
            rep     #.end_read0, #8
        drvl    c
        drvh    c   
.end_read0

            mov     outv, op
            add     outv,   #$40 
            rol     outv,   #24        
            rep     #.end_send0, #8
            drvl    c
            rol     outv,       #1      wc
            drvc    i
            drvh    c 
.end_send0     
            drvh    i   
        
            mov     outv, parm
            rep     #.end_send1, #32
            drvl    c
            rol     outv,       #1      wc
            drvc    i
            drvh    c 
.end_send1     
            drvh    i   
            
            cmp     op, #0              wz
    if_z    mov     outv, #$95  
    if_nz   mov     outv, #$87 
            
            rol     outv,    #24        
            rep     #.end_send2, #8
            drvl    c
            rol     outv,       #1      wc
            drvc    i
            drvh    c 
.end_send2     
            drvh    i   
                         
                         
    endasm 

    return readresp

I'm really hoping to get much closer to sysclock /4 or maybe /8, the waitx #14 looks really ugly, lol

I have a few ??? I'm still trying to understand, such as I'm running @320mhz as max clock but seeing that 290mhz is where things top out.. I should probably start testing 290mhz clock??? Although once things are solid at the 160mhz "standard" clock, I haven't had a problem running up @320.

Also, since I need to slow things down for LA scoping I've been putting the PLL out of spec in all kinds of strange ways and things are surprisingly stable. Is there a recommended pll setup for 8mhz or 16mhz?

One thing I'd really like to do is figure out how to "tune" that waitx automatically. I tested using the SD card on my hardware and I had to push this to almost 60. I'm assuming ringing of cable and lots of PCB trace, I'll be shortening the cable eventually, hopefully this helps.

cheezus · 2019-07-26 00:56

I finally started making progress, realized I was making a few dumb mistakes. Forgivable since I've had a bit of a break. I've been able to get the smartpins working, although it's still not quite right. Just missing a waitx somewhere I'm pretty sure, since removing the debugging code causes things to stop working. I'd investigate further but pretty sure I'm going to lose power soon. Once the weather calms down I'll get back on it. Now that it's working I can fine tune things as needed.

    
    sp_sync_rx  = %0000_0011_000_0000000000000_00_11101_0  '' +3, inv 
    sp_sync_cfg = %1_00111

pri read : r  | c, o        '   Read eight bits from the card.

        c := clk
        o := do

    asm
    
        dirl    o
        wrpin   #sp_sync_rx,o       'sync tx
        wxpin   #sp_sync_cfg,o      'stop/start mode
        dirh    o                   'enable smartpin
        akpin   o
        
        
        waitx   #12
        
        
        rep     #.end_rd, #8
            drvl    c
            waitx   #12
            drvh    c 
            waitx   #2  
            
            
.end_rd       
        
        waitx   #32
        
        rdpin   r, o

        rev     r
        and     r, #$ff

        dirl    o
        
 
    endasm
     
            
    te.tx(r)            ' debugging, remove breaks

I'm glad I've got it working and I'll figure out what I'm doing wrong shortly.

Then getting smartpin TX and clock working!

cheezus · 2019-07-26 08:39

I'm just going to drop this working version here, in case.. Smartpin RX is working and I've got the clock for RX using a smartpin as well. I just need to get the smartpin TX working and then see about writing an SPI cog. Much work to be done but take one victory at a time!

con
    sp_srx      =   %0000_0011_000_0000000000000_00_11101_0  '' +3, inv 
    sp_srx_c    =   %1_00111
    
    sp_clk      =   %1000_0000_000_0000000000000_01_00101_0

var
   long spi_bitcycles

.. 'init code does this
        spi_bitcycles :=((clkfreq / spi_clk_max) /2 )+1   ' needs to be inverse of freq / max 
        bt := ((clkfreq / 400_000 ) / 2 ) +1  '' UNDER 400k    

    asm
            wrpin   #sp_srx,    o       'sync tx
            wxpin   #sp_srx_c,  o       'stop/start mode          
            dirl    c
            wrpin   #sp_clk,    c
            wxpin   bt,         c       'set base period (*2) ''2.5 400kmax init
            dirh    c                          
            wypin   #initclks,  c       'start clock, tx data n-1?
    .busy   testp   c               wc
    if_nc   jmp     #.busy
            drvh    csn                   
            wypin   #initclks,  c       'start clock, tx data n-1?
            rdpin   pa,c
    .busy2  testp   c               wc
    if_nc   jmp     #.busy2
            wrpin   #0,         c                        
   endasm

...

pri read   | c, o, t, bc      '   Read eight bits from the card
        c   := clk
        o   := do
        bc  := spi_bitcycles        
    asm      
            dirl    c
            wrpin   #sp_clk,    c           ' set clk pin mode
            wxpin   bc,         c           ' set clk base period 
            dirh    c                       ' enable sp clk 
            dirh    o                       ' enable sp di
            wypin   #byteclks,  c           ' start clock
            
    .busy   testp   c                   wc
    if_nc   jmp     #.busy

            rdpin   t,          o           ' get data
            dirl    o                       ' disable smartpin di  
            wrpin   #0,         c           ' clear clk sp config                            
            rev     t                           
            and     t,          #$ff                     
    endasm
'    term.tx(t)
    return t

Once the TX is using smartpins, I won't need to disable the clock pin at the end of the read method. Really liking these smartpins. What a cool chip, once you get past that initial learning curve!

cheezus · 2019-07-29 14:28

I think I've got the smartpins thing good. I'm still not 100% sure I understand the timings but it everything seems pretty solid. I've still got several things to do, one I noticed in testing is default the max spi clock to sysclock /2 in cases where sysclock is too low to meet max spi clock. I also need to generalize the bpin config to allow other sd card configurations. I'll need to wrap my head around this next but I wanted to get it working. You can invert the clock polarity by commenting out + sp_inv_pin on the clock config line. (I use default high because of pullups and pin sharing)

@"Peter Jakacki" I can't remember if the p2d2 uses the same pins as the p2ES. If not, you will probably need to change the smartpin config lines for SP_STX and SP_SRX. Let me know if you need any help.

I have a bunch of work left to do before hiding the spi away in it's own cog. I think I'm getting close, finally!

{{
    SDSPI driver for P2-ES FastSpin/InlineASM RC3
    Cheezus Slice (Joe Heinz)  -   July - 2019
    Thanks to all those out there in fourm land!
    Original spin version by Radical Eye Software Copyright 2008
}}
'CON   Baudrate = 115200    'baud rate for serial communications
con
   sectorsize = 512
   sectorshift = 9
   spi_clk_max = 20_000_000     ' 20mhz safe, 25 seems to work though?
   '' NOTE, SYSCLOCK needs to be 2x spi_clk_max for now
   
    '%AAAA_BBBB_FFF_PPPPPPPPPPPPP_TT_MMMMM_0
    'ppp        =''xxxx_CIO_HHH_LLL 
    sp_inv_pin  = %0000_001_000_000 << 8

  'smartpin modes                    TT_MMMMM_0
    sp_stx      =   (%0010 << 24) + %01_11100_0     ' +2      
    sp_srx      =   (%0011 << 24) + %00_11101_0     ' +3
    sp_srx_c    =   %0_00111   
    sp_clk      =   %01_00101_0 + sp_inv_pin ' inv clock right way
                                                            
    initclks    =    (20*(2*8)) ' send 20 bytes to init card
    byteclks    =   (2*8)  
   
var
   long di, do, clk, cs, starttime, sdhc, di_mask, do_mask, clk_mask, do_pin
   long spi_bitcycles

'obj   term: "SimpleSerial.spin2"           ''-P2 TERM


pri send(outv) | c, i      '   Send eight bits, then raise di.

        i := di
        c := clk
        asm
                SHL outv,           #32-8       ' Set MSB First
                REV outv                        ' Set MSB First
                wypin   outv,       i           ' data !!
                rdpin   pa,         c
                wypin   #byteclks,  c           'start clock, tx data    
        .busy   testp               c   wc
        if_nc   jmp     #.busy    
                wypin   #$FF,       i
endasm
     
pri read   | c, o, t     '   Read eight bits from the card.

        c   := clk
        o   := do      
    asm    
            rdpin   pa,         c
            wypin   #byteclks,  c           ' start clock            
    .busy   testp   c                   wc
    if_nc   jmp     #.busy
            rdpin   t,          o           ' get data
            rev     t                           
            and     t,          #$ff                                  
    endasm
    return t    

pub start_explicit(iDO, iCLK, iDI, iCS)| t, c,csn,o,i,  bt    'RJA adding in some things to make work
    '' globals
    do := iDO
    clk := iCLK
    di := iDI
    cs := iCS
    spi_bitcycles :=((clkfreq / spi_clk_max) /2 )+1  
        
    '' locals
    c := clk
    csn := cs
    o   := do
    i   := di
    bt := spi_bitcycles         
        
    asm           
            drvh    csn 
              
            dirl    c
            wrpin   #sp_clk,    c
            wxpin   bt,         c       'set base period (*2) ''2.5 400kmax init
            dirh    c    
            
            dirl    i
            wrpin   #sp_stx,    i       'sync tx
            wxpin   #%1_00111,  i       'stop/start mode
            dirh    i
            wypin   #$FF,   i
            
            dirl    o
            wrpin   #sp_srx,    o       'sync rx
            wxpin   #sp_srx_c,  o       'stop/start mode    
            dirh    o
        
            wypin   #initclks,  c       'start clock, tx data n-1?            
    .busy   testp   c               wc
    if_nc   jmp     #.busy
            
            wypin   #byteclks,  c       'start clock, tx data n-1?
            rdpin   pa,         c                                     
    .busy2  testp   c               wc
    if_nc   jmp     #.busy2

    endasm
   

    starttime := t
     
    cmd(0, 0)
    drvh_(cs)               ' Deselect the card to terminate a command.

    cmd(8, $1aa)
    read
    read
    read
    read
    drvh_(cs)               ' Deselect the card to terminate a command.  

    repeat
            cmd(55, 0)
            t := cmd(41, $4000_0000)
            drvh_(cs)           ' Deselect the card to terminate a command.            
            if t <> 1
                quit
    if t
            return -40'abort -40 ' could not initialize card

    cmd(58, 0)
    sdhc := (read >> 6) & 1
    read
    read
    read
    drvh_(cs)               ' Deselect the card to terminate a command.
    if sdhc == 0    ' if not hc   
      return -40    ' fail mount 
           
    return sdhc +1              ' return card type
                      
pri readresp | r
'' made return timeout to caller CHZ 2-2019
'   Read eight bits, and loop until we
'   get something other than $ff.
'
   repeat
      if (r := read) <> $ff
         return r
      if checktime == -41
        return -41
 
     
pri busy | r        '   Wait until card stops returning busy
'' made return timeout to caller CHZ 2-2019

'
   repeat
      if (r := read)
         return r
      if checktime == -41
        return -41


pri checktime       '   Did we go over our time limit yet?

   if cnt - starttime > clkfreq
      return -41'abort -41 ' Timeout during read


pri cmd(op, parm) 
'
'   Send a full command sequence, and get and
'   return the response.  We make sure cs is low,
'   send the required eight clocks, then the
'   command and parameter, and then the CRC for
'   the only command that needs one (the first one).
'   Finally we spin until we get a result.

    DRVL_(cs)
    
    starttime := cnt
    
    read
    send($40+op)
    send(parm >> 24)
    send(parm >> 16)
    send(parm >> 8)
    send(parm << 0)
    if (op == 0)
        send($95)
    else
        send($87)   
    return readresp
    
pri endcmd      ' Deselect the card to terminate a command.
        drvh_(cs)           ' Deselect the card to terminate a command.

PUB stop                    ' RJA adding in some things

PUB release   
       
pub start(basepin)

  result := start_explicit(basepin, basepin+1, basepin+2, basepin+3)

pub readblock(n, b) | c, i, o, outv, r
'
'   Read a single block.  The "n" passed in is the
'   block number (blocks are 512 bytes); the b passed
'   in is the address of 512 blocks to fill with the
'   data.
'
   starttime := cnt
   cmd(17, n)
   readresp
   repeat sectorsize
      byte[b++] := read
   read
   read
   return endcmd

pub writeblock(n, b)
'
'   Write a single block.  Mirrors the read above.
'
   starttime := cnt
   cmd(24, n)
'   cmd(24, doSDHC(n))  
   send($fe)
   repeat sectorsize
      send(byte[b++])
   read
   read
   if ((readresp & $1f) <> 5)
      return -42'abort -42
   busy
   return endcmd

{{
'  Permission is hereby granted, free of charge, to any person obtaining
'  a copy of this software and associated documentation files
'  (the "Software"), to deal in the Software without restriction,
'  including without limitation the rights to use, copy, modify, merge,
'  publish, distribute, sublicense, and/or sell copies of the Software,
'  and to permit persons to whom the Software is furnished to do so,
'  subject to the following conditions:
'
'  The above copyright notice and this permission notice shall be included
'  in all copies or substantial portions of the Software.
'
'  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
'  EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
'  MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
'  IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
'  CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
'  TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
'  SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
}}

cheezus · 2019-07-30 19:12

Pretty sure I'm getting close now. This is completely untested and I need to double check the init sequence a few times before I even try to debug this. I just wanted to post this in case someone cares to take a quick look for obvious errors. You know how it is when you stare at code too long and start missing things.

CON 'Constants section 
   spi_clk_max = 20_000_000     ' 20mhz safe  
   initclks    =    (20*(2*8)) ' send 20 bytes to init card
   byteclks    =    (2*8)  
   
VAR 
    byte cog 
    byte do, clk, di, cs

PUB HOOK    ' test compile
start_explicit( 3, 2, 1, 0 )

PUB Stop
    if cog
        cogstop(cog-1)
    cog := 0
    
PUB GetCog
    return cog

PUB start_explicit( _DO, _CLK, _DI, _CS ) : card_type | domask, dimask, sbt
    '' save pins for later?
    do  := _DO
    clk := _CLK
    di  := _DI
    cs  := _CS
    '' prepare smartpin clock mask and bit time
    domask := getmask(clk, do)              ' bpin mask (clk)
    dimask := getmask(clk, di)   
    sbt :=((clkfreq / spi_clk_max) /2 )+1   ' SPI bit time
    if sbt < 2                              ' make sure at least sysclock /2
        sbt := 2
    stop
    long[@cmd] := "i"                         'set flag to know if the started cog has read its parameters
    long[@cmd][1] := do | (clk << 8) + (di << 16) + (cs << 24) 
    long[@cmd][2] := domask + (dimask << 8)     ' clock masks
    long[@cmd][3] := sbt                        ' spi bit time
    long[@cmd][4] := clkfreq                    '1s
    long[@cmd][5] := clkfreq / 100_000_000      '1us
    cog := cognew(@_asm_start,@cmd) + 1
    if cog
        repeat until long[@cmd] <> "i"  ' wait until cog is done reading parameter
        card_type := long[@cmd][1]       ' arg1 returns card type or returns with ptr to error from low level driver
    else                                  ' we don't have a cog?  
        return               
PUB ReadBlock(n, b)
    return DoCommand("r", n, b, 0)

PUB WriteBlock(n, b)
    return DoCommand("w", n, b, 0)
    
PRI DoCommand(_cmd, _arg1, _arg2, _arg3)
    if cmd <> 0
      return @cogbusymsg '' busy
    arg1 := _arg1
    arg2 := _arg2
    arg3 := _arg3
    cmd  := _cmd    
    repeat until cmd <> _cmd
    return err    
    
PRI getmask(clk, data) | t
    t := clk - data
    if ( || t ) <> t        
      t := ( || t ) + %0100     
        
DAT org 
_asm_start      mov     ptr_to_mbox,    ptra        ' save mbox address for later
                '' get parameters from spin
do_init         mov     ptra,       ptr_to_mbox ' return ptra back to mbox    
                rdlong  tmp,        ptra++      ' pins long               
                mov     pinDO,      tmp
                and     pinDO,      #$ff               
                mov     pinCLK,     tmp        
                shr     pinCLK,     #8
                and     pinCLK,     #$ff               
                mov     pinDI,      tmp        
                shr     pinDI,      #16
                and     pinDI,      #$ff               
                mov     pinCS,      tmp        
                shr     pinCS,      #24
                and     pinCS,      #$ff                               
                rdlong  tmp,        ptra++      'mask long
                mov     sp_doclk,   tmp
                and     sp_doclk,   #$f
                mov     sp_diclk,   tmp
                shr     sp_diclk,   #4
                and     sp_diclk,   #$f                              
                rdlong  sp_bt,      ptra++      ' timing long  
                rdlong  delay1s,    ptra++      ' 1s long
                rdlong  delay1us,   ptra        ' 1us long 
                mov     ptra,       ptr_to_mbox ' return ptra back to mbox                                                     
                '' setup pins
                drvh    pinCS               
                dirl    pinCLK      '' setup Clock pin
                mov     tmp,        sp_clk      
                add     tmp,        sp_inv_pin  ' invert clock   
                wrpin   tmp,        pinCLK      
                wxpin   sp_bt,      pinCLK      ' set base period (*2) 
                dirh    pinCLK                  ' enable clk                                                
                dirl    pinDI       '' setup DI pin                  
                mov     tmp,        sp_stx      ' save mode, bpin might change
                add     tmp,        sp_diclk    ' setup bpin
                wrpin   tmp,        pinDI       ' sync tx
                wxpin   sp_stx_c,   pinDI       ' stop/start mode
                dirh    pinDI                   ' enable smartpin
                wypin   #$FF,       pinDI       ' prime shifter                                                            
                dirl    pinDO       '' setup DO pin                
                mov     tmp,        sp_srx      ' save mode, bpin might change
                add     tmp,        sp_doclk    ' setup bpin
                wrpin   tmp,        pinDO       ' sync rx
                wxpin   sp_stx_c,   pinDO       ' 8 bit, pre   
                dirh    pinDO                   ' enable pin                                   
                '' card init                           
                drvh    pinCS                   ' set cs high
                wypin   #initclks,  pinCLK      ' set init clocks
                getct   timeout                 ' setup 1s timeout 
                addct1  timeout,    delay1s     ' in case card does not respond            
                mov     sd_cmd,     #0          'cmd 0,0
                mov     sd_parm,    #0
                call    #wait_clock             ' wait for smartpins to finish                
                call    #do_sdcmd
                drvh    pinCS                   ' set cs high
                                   
                mov     sd_cmd,     #8          ' cmd 8 $1aa
                mov     sd_parm,    #$1AA
                call    #do_sdcmd
                call    #read_byte                   ' read 32 bit
                call    #read_byte
                call    #read_byte
                call    #read_byte
                drvh    pinCS                   ' set cs high
                
.cmd55          mov     err_out,    ##-40
                mov     sd_cmd,     #55         'cmd 55,0
                mov     sd_parm,    #0
                call    #do_sdcmd
                mov     sd_cmd,     #41         'cmd 55,0
                mov     sd_parm,    ##$4000_0000'
                call    #do_sdcmd
                drvh    pinCS                   ' set cs high
                cmp     data_out,   #$01    wz  ' 01 = idle                             
        if_z    jmp     #.cmd55
                
                mov     sd_cmd,     #58          'cmd 0,0
                mov     sd_parm,    #0
                call    #do_sdcmd
                
                call    #read_byte
                and     data_in,    #40
                cmp     data_in,    #0      wz
        if_z    mov     adrShift,   #9
        if_nz   mov     adrShift,   #0
                call    #read_byte
                call    #read_byte
                call    #read_byte
                drvh    pinCS                   ' set cs high
                mov     err_out,    #0
                
                mov     sd_cmd,     #16          'cmd 0,0
                mov     sd_parm,    ##$200
                call    #do_sdcmd
                drvh    pinCS                   ' set cs high
do_init_ret     ret                               
                               
get_cmd         rdlong  cmd_in,     ptra    wz  '
                '' timeout and release card after 10s
        if_z    jmp     #get_cmd                'no cmd so handle time                               
                rdlong  arg1_in,    ++ptra      ' 
                rdlong  arg2_in,    ++ptra      ' 
                rdlong  arg3_in,    ++ptra      '
                rdlong  arg4_in,    ++ptra      '
               
                cmp     cmd_in,     #"r"    wz
        if_z    jmp     #read_a
                cmp     cmd_in,     #"w"    wz
        if_z    jmp     #write_b
'                cmp     cmd_in,     #"z"    wz
'        if_z    jmp    #release
                cmp     cmd_in,     #"i"    wz
        if_z    call    #do_init
        
               '' do commands
end_cmd         wrlong  err_out,    ++ptra                                 
                mov     cmd_in,     #0
                mov     ptra,       ptr_to_mbox ' return ptra back to mbox
                wrlong  cmd_in,     ptr_to_mbox
                jmp     #get_cmd                
     
read_a          '' arg1 is sd ptr, arg2 is hub
                mov     sd_cmd,     #17          ' cmd 8 $1aa
'                shr     adrShift,   arg1_in    ' non-hc compatibility
                mov     sd_parm,    arg1_in
                call    #do_sdcmd
                call    #read_resp
                mov     ptrb,       arg2_in
                rep     @.endread,  sectorsize 
                call    #read_byte
                wrbyte  data_in,    ptrb++        
.endread                
                call    #read_byte
                call    #read_byte
                drvh    pinCS                   ' set cs high                
read_a_ret  ret

write_b         '' arg1 is sd ptr, arg2 is hub
                mov     sd_cmd,     #24          ' cmd 8 $1aa
'                shr     adrShift,   arg1_in    ' non-hc compatibility
                mov     sd_parm,    arg1_in
                call    #do_sdcmd
                mov     data_out,   #$fe
                call    #send_byte
                mov     ptrb,       arg2_in
                rep     @.endsend,  sectorsize 
                rdbyte  data_out,   ptrb++ 
                call    #send_byte                     
.endsend                
                call    #read_byte
                call    #read_byte
                call    #read_resp
                and     data_in,    #$1f
                cmp     data_in,    #5      wz
    if_nz       mov     err_out,    ##-42
                call    #sd_busy                
                drvh    pinCS                   ' set cs high         
  
write_b_ret     ret    
                                                
do_sdcmd   ' OP is sd_cmd, parm = sd_parm   
                drvl    pinCS      
                getct   timeout                 ' setup 1s timeout 
                addct1  timeout,    delay1s                          
                call    #read_byte   
                mov     data_out,   sd_cmd
                add     data_out,   #$40
                call    #send_byte                  
                mov     data_out,   sd_parm
                shr     data_out,   #24
                and     data_out,   #$ff
                call    #send_byte                   
                mov     data_out,   sd_parm
                shr     data_out,   #16
                and     data_out,   #$ff
                call    #send_byte                 
                mov     data_out,   sd_parm
                shr     data_out,   #8
                and     data_out,   #$ff
                call    #send_byte                                 
                mov     data_out,   sd_parm
                and     data_out,   #$ff
                call    #send_byte
                cmp     sd_cmd,     #0          wz
    if_z        mov     data_out,   #$95 
    if_nz       mov     data_out,   #$87
                call    #send_byte  
                call    #read_resp                    
do_sdcmd_ret    ret    
                      
send_byte       SHL     data_out,   #32-8       ' Set MSB First
                REV     data_out                ' Set MSB First
                wypin   data_out,   pinDI       ' data !!
                rdpin   pa,         pinCLK
                wypin   #byteclks,  pinCLK      ' start clock, tx data    
                call    #wait_clock      
                wypin   #$FF,       pinDI
send_byte_ret   ret

read_byte       rdpin   pa,         pinCLK
                wypin   #byteclks,  pinCLK      ' start clock            
                call    #wait_clock
                rdpin   data_in,    pinDO       ' get data
                rev     data_in                           
                and     data_in,    #$ff                    
read_byte_ret   ret

sd_busy         call    #read_byte   
                cmp     data_in,    #0          wz
   if_z         call    #check_time        
sd_busy_ret     ret

read_resp       call    #read_byte   
                cmp     data_in,    #$ff        wz
   if_z         call    #check_time             
read_resp_ret   ret                
                
check_time      pollct1                         wc
   if_c         mov     err_out,    ##-41
                drvh    pinCS                   ' set cs high
'   if_c         jmp     #timed_out           
check_time_ret  ret                  

wait_clock      testp   pinCLK                  wc
        if_nc   jmp     #wait_clock
wait_clock_ret  ret               
                
{=== Smart pin constants ===}
sp_clk      long    %01_00101_0 
sp_stx      long    %01_11100_0          
sp_srx      long    %00_11101_0
sp_stx_c    long    %1_00111
sp_srx_c    long    %0_00111   
sp_inv_pin  long    %0000_001_000_000 << 8

adrShift    long    0    ' will be 0 for SDHC, 9 for MMC & SD
sectorsize  long    512
' loaded on init
{=== Pin Variables ===}
pinDO       long    0    ' do
pinCLK      long    0    ' clk
pinDI       long    0    ' di
pinCS       long    0    ' chip select
{===== Smartpin Clock Variables =====}'' smartpins stuffs
sp_doclk    long    0
sp_diclk    long    0
sp_bt       long    0
{===== Command Mailbox =====}
ptr_to_mbox res 1
cmd_in      res 1
arg1_in     res 1
arg2_in     res 1
arg3_in     res 1
arg4_in     res 1
err_out     res 1
{=== Timing Variables ===}
delay1s     res 1   ' clkfreq = 1 s 
delay1us    res 1   ' clkfreq / 100_000_000
timeout     res 1  
{=== Temp Variables ===} 
tmp         res 1     ' this may get used in all subroutines...don't use except in lowest 
data_in     res 1    
data_out    res 1
sd_cmd      res 1
sd_parm     res 1

'buff        res 299

FIT 496

dat         '' debug codes
cogbusymsg  byte    "SPI Busy",0
badpins     byte    "pins out of range",0

dat         '' singleton spin mailbox
alignl
cmd     long 0
arg1    long 0
arg2    long 0
arg3    long 0
arg4    long 0
err     long 0
{
var    '' multi spin mailbox
long  cmd, arg1, arg2, arg3, arg4, err 
}

AJL · 2019-07-30 23:24

I couldn't get your code and my comments to fit in one post, so only code relevant to my comments follows.

In the asm init section, I don't see where do_init is used separately from _asm_start. Moving the just saved pointer back isn't really needed is it?
Rearranging your code could allow the use of a setq+ rdlong construct to read the whole parameter block in one action. It could save some pointer maths and the code reduction will allow for the extra long needed (tmp2). Using getnib, getbyte, and getword instead of shift and mask for the slice and dice of the packed parameters is worth investigating too.

In the clock pin setup you add two constants before the WRPIN. Why not just a single constant?

read_byte destroys the contents of data_in on each call, so where you call it multiple times in sequence are you only interested in the last byte received?

I've run out of time, but that's a start.

cheezus wrote: »

Pretty sure I'm getting close now. This is completely untested and I need to double check the init sequence a few times before I even try to debug this. I just wanted to post this in case someone cares to take a quick look for obvious errors. You know how it is when you stare at code too long and start missing things.

DAT org 
_asm_start      mov     ptr_to_mbox,    ptra        ' save mbox address for later
                '' get parameters from spin
do_init         mov     ptra,       ptr_to_mbox ' return ptra back to mbox    
                rdlong  tmp,        ptra++      ' pins long               
                mov     pinDO,      tmp
                and     pinDO,      #$ff               
                mov     pinCLK,     tmp        
                shr     pinCLK,     #8
                and     pinCLK,     #$ff               
                mov     pinDI,      tmp        
                shr     pinDI,      #16
                and     pinDI,      #$ff               
                mov     pinCS,      tmp        
                shr     pinCS,      #24
                and     pinCS,      #$ff                               
                rdlong  tmp,        ptra++      'mask long
                mov     sp_doclk,   tmp
                and     sp_doclk,   #$f
                mov     sp_diclk,   tmp
                shr     sp_diclk,   #4
                and     sp_diclk,   #$f                              
                rdlong  sp_bt,      ptra++      ' timing long  
                rdlong  delay1s,    ptra++      ' 1s long
                rdlong  delay1us,   ptra        ' 1us long 
                mov     ptra,       ptr_to_mbox ' return ptra back to mbox                                                     
                '' setup pins
                drvh    pinCS               
                dirl    pinCLK      '' setup Clock pin
                mov     tmp,        sp_clk      
                add     tmp,        sp_inv_pin  ' invert clock   
                wrpin   tmp,        pinCLK      
                wxpin   sp_bt,      pinCLK      ' set base period (*2) 
                dirh    pinCLK                  ' enable clk                                                
                dirl    pinDI       '' setup DI pin                  
                mov     tmp,        sp_stx      ' save mode, bpin might change
                add     tmp,        sp_diclk    ' setup bpin
                wrpin   tmp,        pinDI       ' sync tx
                wxpin   sp_stx_c,   pinDI       ' stop/start mode
                dirh    pinDI                   ' enable smartpin
                wypin   #$FF,       pinDI       ' prime shifter                                                            
                dirl    pinDO       '' setup DO pin                
                mov     tmp,        sp_srx      ' save mode, bpin might change
                add     tmp,        sp_doclk    ' setup bpin
                wrpin   tmp,        pinDO       ' sync rx
                wxpin   sp_stx_c,   pinDO       ' 8 bit, pre   
                dirh    pinDO                   ' enable pin                                   
                '' card init                           
                drvh    pinCS                   ' set cs high
                wypin   #initclks,  pinCLK      ' set init clocks
                getct   timeout                 ' setup 1s timeout 
                addct1  timeout,    delay1s     ' in case card does not respond            
                mov     sd_cmd,     #0          'cmd 0,0
                mov     sd_parm,    #0
                call    #wait_clock             ' wait for smartpins to finish                
                call    #do_sdcmd
                drvh    pinCS                   ' set cs high
                                   
                mov     sd_cmd,     #8          ' cmd 8 $1aa
                mov     sd_parm,    #$1AA
                call    #do_sdcmd
                call    #read_byte                   ' read 32 bit
                call    #read_byte
                call    #read_byte
                call    #read_byte
                drvh    pinCS                   ' set cs high
                
.cmd55          mov     err_out,    ##-40
                mov     sd_cmd,     #55         'cmd 55,0
                mov     sd_parm,    #0
                call    #do_sdcmd
                mov     sd_cmd,     #41         'cmd 55,0
                mov     sd_parm,    ##$4000_0000'
                call    #do_sdcmd
                drvh    pinCS                   ' set cs high
                cmp     data_out,   #$01    wz  ' 01 = idle                             
        if_z    jmp     #.cmd55
                
                mov     sd_cmd,     #58          'cmd 0,0
                mov     sd_parm,    #0
                call    #do_sdcmd
                
                call    #read_byte
                and     data_in,    #40
                cmp     data_in,    #0      wz
        if_z    mov     adrShift,   #9
        if_nz   mov     adrShift,   #0
                call    #read_byte
                call    #read_byte
                call    #read_byte
                drvh    pinCS                   ' set cs high
                mov     err_out,    #0
                
                mov     sd_cmd,     #16          'cmd 0,0
                mov     sd_parm,    ##$200
                call    #do_sdcmd
                drvh    pinCS                   ' set cs high
do_init_ret     ret                               
                               
get_cmd         rdlong  cmd_in,     ptra    wz  '
                '' timeout and release card after 10s
        if_z    jmp     #get_cmd                'no cmd so handle time                               
                rdlong  arg1_in,    ++ptra      ' 
                rdlong  arg2_in,    ++ptra      ' 
                rdlong  arg3_in,    ++ptra      '
                rdlong  arg4_in,    ++ptra      '
               
                cmp     cmd_in,     #"r"    wz
        if_z    jmp     #read_a
                cmp     cmd_in,     #"w"    wz
        if_z    jmp     #write_b
'                cmp     cmd_in,     #"z"    wz
'        if_z    jmp    #release
                cmp     cmd_in,     #"i"    wz
        if_z    call    #do_init
        
               '' do commands
end_cmd         wrlong  err_out,    ++ptra                                 
                mov     cmd_in,     #0
                mov     ptra,       ptr_to_mbox ' return ptra back to mbox
                wrlong  cmd_in,     ptr_to_mbox
                jmp     #get_cmd                
     
read_a          '' arg1 is sd ptr, arg2 is hub
                mov     sd_cmd,     #17          ' cmd 8 $1aa
'                shr     adrShift,   arg1_in    ' non-hc compatibility
                mov     sd_parm,    arg1_in
                call    #do_sdcmd
                call    #read_resp
                mov     ptrb,       arg2_in
                rep     @.endread,  sectorsize 
                call    #read_byte
                wrbyte  data_in,    ptrb++        
.endread                
                call    #read_byte
                call    #read_byte
                drvh    pinCS                   ' set cs high                
read_a_ret  ret

write_b         '' arg1 is sd ptr, arg2 is hub
                mov     sd_cmd,     #24          ' cmd 8 $1aa
'                shr     adrShift,   arg1_in    ' non-hc compatibility
                mov     sd_parm,    arg1_in
                call    #do_sdcmd
                mov     data_out,   #$fe
                call    #send_byte
                mov     ptrb,       arg2_in
                rep     @.endsend,  sectorsize 
                rdbyte  data_out,   ptrb++ 
                call    #send_byte                     
.endsend                
                call    #read_byte
                call    #read_byte
                call    #read_resp
                and     data_in,    #$1f
                cmp     data_in,    #5      wz
    if_nz       mov     err_out,    ##-42
                call    #sd_busy                
                drvh    pinCS                   ' set cs high         
  
write_b_ret     ret    
                                                
do_sdcmd   ' OP is sd_cmd, parm = sd_parm   
                drvl    pinCS      
                getct   timeout                 ' setup 1s timeout 
                addct1  timeout,    delay1s                          
                call    #read_byte   
                mov     data_out,   sd_cmd
                add     data_out,   #$40
                call    #send_byte                  
                mov     data_out,   sd_parm
                shr     data_out,   #24
                and     data_out,   #$ff
                call    #send_byte                   
                mov     data_out,   sd_parm
                shr     data_out,   #16
                and     data_out,   #$ff
                call    #send_byte                 
                mov     data_out,   sd_parm
                shr     data_out,   #8
                and     data_out,   #$ff
                call    #send_byte                                 
                mov     data_out,   sd_parm
                and     data_out,   #$ff
                call    #send_byte
                cmp     sd_cmd,     #0          wz
    if_z        mov     data_out,   #$95 
    if_nz       mov     data_out,   #$87
                call    #send_byte  
                call    #read_resp                    
do_sdcmd_ret    ret    
                      
send_byte       SHL     data_out,   #32-8       ' Set MSB First
                REV     data_out                ' Set MSB First
                wypin   data_out,   pinDI       ' data !!
                rdpin   pa,         pinCLK
                wypin   #byteclks,  pinCLK      ' start clock, tx data    
                call    #wait_clock      
                wypin   #$FF,       pinDI
send_byte_ret   ret

read_byte       rdpin   pa,         pinCLK
                wypin   #byteclks,  pinCLK      ' start clock            
                call    #wait_clock
                rdpin   data_in,    pinDO       ' get data
                rev     data_in                           
                and     data_in,    #$ff                    
read_byte_ret   ret

sd_busy         call    #read_byte   
                cmp     data_in,    #0          wz
   if_z         call    #check_time        
sd_busy_ret     ret

read_resp       call    #read_byte   
                cmp     data_in,    #$ff        wz
   if_z         call    #check_time             
read_resp_ret   ret                
                
check_time      pollct1                         wc
   if_c         mov     err_out,    ##-41
                drvh    pinCS                   ' set cs high
'   if_c         jmp     #timed_out           
check_time_ret  ret                  

wait_clock      testp   pinCLK                  wc
        if_nc   jmp     #wait_clock
wait_clock_ret  ret               
                
{=== Smart pin constants ===}
sp_clk      long    %01_00101_0 
sp_stx      long    %01_11100_0          
sp_srx      long    %00_11101_0
sp_stx_c    long    %1_00111
sp_srx_c    long    %0_00111   
sp_inv_pin  long    %0000_001_000_000 << 8

adrShift    long    0    ' will be 0 for SDHC, 9 for MMC & SD
sectorsize  long    512
' loaded on init
{=== Pin Variables ===}
pinDO       long    0    ' do
pinCLK      long    0    ' clk
pinDI       long    0    ' di
pinCS       long    0    ' chip select
{===== Smartpin Clock Variables =====}'' smartpins stuffs
sp_doclk    long    0
sp_diclk    long    0
sp_bt       long    0
{===== Command Mailbox =====}
ptr_to_mbox res 1
cmd_in      res 1
arg1_in     res 1
arg2_in     res 1
arg3_in     res 1
arg4_in     res 1
err_out     res 1
{=== Timing Variables ===}
delay1s     res 1   ' clkfreq = 1 s 
delay1us    res 1   ' clkfreq / 100_000_000
timeout     res 1  
{=== Temp Variables ===} 
tmp         res 1     ' this may get used in all subroutines...don't use except in lowest 
data_in     res 1    
data_out    res 1
sd_cmd      res 1
sd_parm     res 1

'buff        res 299

FIT 496

dat         '' debug codes
cogbusymsg  byte    "SPI Busy",0
badpins     byte    "pins out of range",0

dat         '' singleton spin mailbox
alignl
cmd     long 0
arg1    long 0
arg2    long 0
arg3    long 0
arg4    long 0
err     long 0
{
var    '' multi spin mailbox
long  cmd, arg1, arg2, arg3, arg4, err 
}

cheezus · 2019-07-30 23:59

@AJL, Thanks for taking a look. I've got a bunch of problems in my code above... The first I found was I was missing a dummy rdlong at the beginning of my init. I also am packing the clock pin masks wrong. There's also something ?else? wrong because I'm outputting clock but still not getting data.

Re. init -
This is one I'm not 100% sure how I want to do. I'd like to be able to change sysclock or pins by restarting the cog (it's called from get_cmd) I could probably get away with one entry point but was even thinking about separating the timing stuff from pin stuff but going to leave it like this for now I think.

re. setq + rdlong -
This is one improvement I'm going to look at once I have things working. That goes for the getnib / setnib instructions, just not quite there yet. Trying to stay as close to what I know..

The extra constant for the clock pin inverts it and I plan on having a switch somewhere to control the polarity of the clock, as well as set default max SPI clock. Just haven't been able to get my head around that yet.

Read_byte trashes data_out and most of the time I'm not interested in any of the data.

cheezus · 2019-07-31 06:40

I'm stumbling my way through, found all kinds of errors.. I'm stuck on the conversion of readblock… It took me a while to figure out what is going on and it seems like I'm trashing the call stack? I believe it's because of my use of CALLs and JMPs. I'm missing the obvious and correct way of conditional repeat. Spin code is this.

pri readresp | r
'   Read eight bits, and loop until we
'   get something other than $ff.
'
   repeat
      if (r := read) <> $ff
         return r
      if checktime == -41
        return -41

When I was trying to convert it to ASM I had a feeling this wouldn't work.

read_resp       call    #read_byte   
                cmp     data_in,    #$ff        wz
   if_z         call    #check_time      
   if_z         jmp     #read_resp       
read_resp_ret   ret

I'm completely stuck right now and i'm pretty sure this is the problem.

cheezus · 2019-07-31 07:39

Scratch that, seems the problem is REP is not working

I noticed this while trying to use the output of fastspin to help guide me. For some reason, this is not repeating 512 times, just falling through. I know there was a bug with the REP instruction but I can't remember what it was exactly. I think I'm just doing something wrong.

…
                mov     ptrb,       arg2_in
       
                rep     @.endread,  ##512 
                call    #read_byte                
                wrbyte  data_in,    ptrb++                       
.endread                
                call    #read_byte
                call    #read_byte
...

*edit
CONFIRMED. I tried a couple different things but couldn't get rep to work??? I have it working inline ASM but not in actual asm! RTFM!!

I was able to get it working and now the SD card mounts!

.loop           call    #read_byte                
                wrbyte  data_in,    ptrb++ 
                djnz    count,       #.loop

I need to cleanup code and do some testing to make sure things are stable but I think I'm close!!

Peter Jakacki · 2019-07-31 07:41

You can't jump or call within a rep

REP works in hub memory, as well, but executes a hidden jump to get back to the top of the repeated instructions.

Any branch within the repeating instruction block will cancel REP activity. Interrupts will be ignored during REP looping.

cheezus · 2019-07-31 08:01

Peter Jakacki wrote: »

You can't jump or call within a rep

REP works in hub memory, as well, but executes a hidden jump to get back to the top of the repeated instructions.

Any branch within the repeating instruction block will cancel REP activity. Interrupts will be ignored during REP looping.

DUH!!! Thanks Peter, feeling kinda silly now

All these new instructions, all new gotchas!

evanh · 2019-07-31 08:02

That's a normal REP rule. The CALL counts as an exit branch from the REP mechanism. Another rule is any exit branching can't be the last instruction of the REP block. Another rule is interrupts are put on hold until the REP completes/exits.

The flaw (Corrected in the final silicon) is to do with a group of branching instructions that test the hardware events mechanism, eg: JCT1.

cheezus · 2019-07-31 08:40

Thanks guys! I'm getting close, just got to nail down block writes! Once I get that done I'll try to clean up code and post. A quick benchmark (since I don't have writes yet and can't run the full test) for mounting a card is 4.57ms bit-bashed inline asm, 4.04ms inline asm smartpins and 2.33ms smartpins in their own cog. These were all @160mhz sysclock and 20mhz spi clock. Note, my LA can't keep up with 20mhz SPI.

Interestingly enough, the bit-bashed command packet is 2nd to complete although it's the all out loser in final times.

Bit - bashed SPI

Smartpins inline

Smartpins cog

*edit
Got block writes working so was able to come up with a full set of test times. To mount the card, read and write 2m.

inline asm bit-bashed-19.5s
inline asm smart-pin-14s
cog smart-pin-10.9s

I'll give the code a quick cleanup, now that I've got everything working including clock polarity. There's a lot of places to improve still. I'd like to figure out a way to determine max spi clock at init, although I really can't wrap my head around any way besides brute-force init card and increase speed till it fails. Maybe I'm missing something obvious as usual. Really happy I have this working though.

*final edit
I did some code cleanup and have just about everything where I want it. I'm posting a NEW RC3 here and once things are tested further I'll update the first post.

cheezus · 2019-08-07 00:15

Still testing RC3 posted above. I've got a bunch of work left to do but it's fairly functional AFAIK. I'm wondering if anyone can give me a reason why it would be better to return a negative error code, as opposed to aborting? I'm probably going to re-write with aborts, as V1 used. Just wondering if anyone has a good argument against aborts?

Rayman · 2019-08-07 00:25

I don’t like abort
Seems like unnecessary complexity to me

cheezus · 2019-08-07 01:11

Rayman wrote: »

I don’t like abort
Seems like unnecessary complexity to me

For some cases I can see how it could be considered unnecessary but I'm working on the Touchscreen OS again and aborts seem really helpful for SD cards. The ability to point to an error message, instead of needing to do a lookup on an error code. Six of one, half dozen of the other... I guess it could be done with conditional compile and would be the best, because a lot of P1 apps expect negative return type errors from FSRW. I really like how verbose Kye's SD driver is, plus it would be nice to not need to rewrite the ENTIRE touch.spin.

As I said, I really like Kye's driver and have pretty much talked myself into trying to get the genius piece of code working. One of the functions that is very essential to the OS is the ability to launch a file from SD card and it seems that @Cluso99 's on-chip SD could do this with little additional code. I just haven't wrapped my head around that yet. In the past I had commented out large portions of Kye's driver to free up memory but now that we have enough hubram I'm hoping to be able to do BIG THINGS much easier.

Perhaps the title of this thread should be changed to cover both FSRW and KYE's SD?

Cluso99 · 2019-08-07 09:09

Kye’s SD driver is great. It’s easy to follow. I added a few extras such as returning the physical sector etc.
From a users perspective the names are too long, but makes understanding what its doing a breeze.

This is one of the reasons i wanted to get the Spin1 interpreter working with P2. But i got sidetracked near the end with work and haven’t managed to get back to it yet.

FWIW the ROM SD code, with the addition of write, and tweeking the actual spi routine, this should slot into Kye’s code nicely.

cheezus · 2019-08-07 18:27

Cluso99 wrote: »

Kye’s SD driver is great. It’s easy to follow. I added a few extras such as returning the physical sector etc.
From a users perspective the names are too long, but makes understanding what its doing a breeze.

This is one of the reasons i wanted to get the Spin1 interpreter working with P2. But i got sidetracked near the end with work and haven’t managed to get back to it yet.

FWIW the ROM SD code, with the addition of write, and tweeking the actual spi routine, this should slot into Kye’s code nicely.

Kye's driver is great and the Touch OS is based largely on your OS based on Kye's SD driver

I really wish I could wrap my head around how to interface cleanly with the ROM SD code. I know I'm probably just making it harder than I need to.
The main reason I didn't focus on using the ROM code was needing to understand SPI on the P2 a little better.

I'm stuck needing a driver for the resistive touch ADC XPT2046. Going to try modifying SDSPI for this. Once I get that done I'll be able to test a lot more, although I'm sure there will be a ton of debugging and bugfixes.

One of the things that looks really promising is LUT sharing. IE, read block from sd using CogN into shared LUT and use CogN+1 to push that block to display (or ram/flash). CogN could be reading the next block while N+1 is writing the current block. Seems like this could save a TON of hubcycles over the way things are done on the P1. I don't see myself getting around to this any time soon since I'm not sure how other people would be able to use such a specific driver.

I'm hoping to try using the rom to run a file, how many times can I get this one wrong? lol

I updated the 1st post with the current RC and some RC notes. I really wish I was better at documentation but until I get some feedback from people using this it seems pointless. I'm guessing right now there isn't much need for a P2 FSRW or Kyes, but that will change soon hopefully!

Really hoping the new silicon doesn't break things too much, hopefully worst case is change smartpin clock from sysclock /2 to /3 or /4. Time will tell though!

Cluso99 · 2019-08-07 20:22

@cheezus,
Note the ES ROM has some fixes in the respin ROM to better recover from the SD DO problem which fails to release the DO drive from the SD card. This means the SD ROM code positions change in the new ROM. The Serial Monitor/Debugger did not change position so this is fine to call.

Second thing to note is that the SD ROM code (routine) to drive the SPI code is deliberately not fast to ensure it works reliably from hubexec. It has send/receive combined ie it shifts in (reads) as it shifts out (writes).

The code in the new ROM is way better in the DO error handling, so this is what I will be using as a base for my SD (if you or others haven't done it when I need it), with just the write sector added, and the SPI send/receive routine replaced with a faster version.

BTW there is IIRC a single further fix for the DO that did not make it into the new ROM, so I will need to dig it out. I am overseas for another 2 weeks so that will have to wait.

cheezus · 2019-08-07 20:36

Cluso99 wrote: »

@cheezus,
Note the ES ROM has some fixes in the respin ROM to better recover from the SD DO problem which fails to release the DO drive from the SD card. This means the SD ROM code positions change in the new ROM. The Serial Monitor/Debugger did not change position so this is fine to call.

Second thing to note is that the SD ROM code (routine) to drive the SPI code is deliberately not fast to ensure it works reliably from hubexec. It has send/receive combined ie it shifts in (reads) as it shifts out (writes).

The code in the new ROM is way better in the DO error handling, so this is what I will be using as a base for my SD (if you or others haven't done it when I need it), with just the write sector added, and the SPI send/receive routine replaced with a faster version.

BTW there is IIRC a single further fix for the DO that did not make it into the new ROM, so I will need to dig it out. I am overseas for another 2 weeks so that will have to wait.

I've looked at the first ROM code quite a bit. It's beautiful, but I was planning on having a cog just for the SD driver so hubexec is what got me confused. I was also aware of the changes for the respin, another reason I haven't worked on this. I would have liked to build a driver around it but figured you would be doing one (using hubexec) so focusing on something that could be stuffed in a cog like P1. FSRW is a bare minimum driver and so my focus on early development. Soon I'll need a way to run a precompiled binary and if I can figure out how to copy the filename to the right place in hub and jmp into the hubexec code I'll be really happy.

Cluso99 · 2019-08-07 22:21

I don't have time to verify just now, but this is what you need to know

                   '+-------[ SD: Initialise/Locate/Load/Run a file from SD ]---------------------+ <--- SD: init/locate/load/run a file --->
                   '+ On Entry:                                                                   +
                   '+      fname[3]:       filename[11] 8+3 without "."  (will be $0 terminated)  +
                   '+ Call Format:                                                                +
                   '+              CALL    #@_Run_SDfile                   '                      + < call: init/locate/load/run a file >
                   '+ On Return:                                                                  +
                   '+      "NZ" if error, else does not return                                    +
                   '+-----------------------------------------------------------------------------+
                   
                   '+-----------------------------------------------------------------------------+
fc560     fdb00040 _Start_SDcard   call    #@_SDcard_Init                  ' initialise & read CSD/CID
fc564     f6079a00                 mov     skiprun,          #0            ' load/run MBR/VOL code
fc568     adb00174         if_e    call    #@_readMBR                      ' read MBR/VOL/FSI/FAT
fc56c     adb002a0         if_e    call    #@_readDIR                      ' read directory for filenames
                   ''              mov     skiprun,          #0            ' load/run <file> (already 0)
fc570     adb0038c         if_e    call    #@_readFILE                     ' read/load/run the file
fc574     fd800100                 JMP     #try_serial                     ' failed: so go back and try serial
                   '+-----------------------------------------------------------------------------+
fc578     fdb00028 _Run_SDfile     call    #@_SDcard_Init                  ' initialise & read CSD/CID
                   ''              mov     skiprun,          #1            ' do not load/run MBR/VOL code
fc57c     adb0015c         if_e    call    #@_readMBR1                     ' read MBR/VOL/FSI/FAT (don't run)
fc580     adb002bc         if_e    call    #@_searchDIR                    ' search dir for <fname>
fc584     f6079a00                 mov     skiprun,          #0            ' load/run <file>
fc588     adb00374         if_e    call    #@_readFILE                     ' read/load/run the file
fc58c     fd64002d                 RET                                     ' return "NZ" = failed, "Z" if loaded ok
                   '+-----------------------------------------------------------------------------+
fc590     fdb00010 _Load_SDfile    call    #@_SDcard_Init                  ' initialise & read CSD/CID
                   ''              mov     skiprun,          #1            ' do not load/run MBR/VOL code
fc594     adb00144         if_e    call    #@_readMBR1                     ' read MBR/VOL/FSI/FAT (don't run)
fc598     adb002a4         if_e    call    #@_searchDIR                    ' search dir for <fname>
                   ''              mov     skiprun,          #1            ' do not load/run <file> (already 1)
fc59c     adb00360         if_e    call    #@_readFILE                     ' read/load the file
fc5a0     fd64002d                 RET                                     ' return "NZ" = failed, "Z" if loaded ok
                   '+-----------------------------------------------------------------------------+

and

                   DAT
                   ''============[ COG VARIABLES - SD BOOT]========================================
fc560 1c0                          org     $1C0              ' place the variables in cog $1C0-$1DF
                   
fc560 1c0          cmdout          res     1                       ' The 8b CMDxx | $40
fc560 1c1          cmdpar          res     1                       ' The 32b parameters
fc560 1c2          cmdcrc          res     1                       ' The 8b CRC (must be valid for CMD0 & CMD8)
fc560 1c3          cmdpar2         res     1                       ' SDV1=$0, SDV2=$40000000
fc560 1c4          cmdtype         res     1                       ' reply is R1=1, R3=3, R7=7, else 0
fc560 1c5          reply           res     1                       ' R1 reply (moved to replyR1 when R3/R7 32b reply here)
fc560 1c6          replyR1         res     1                       ' R1 reply (8b saved when R3/R7 32b reply follows)
fc560 1c7          dataout         res     1                       ' 8/32 bit data being shifted out
fc560 1c8          bytescnt        res     1                       ' #bytes to send/recv
fc560 1c9          bitscnt         res     1                       ' #bits to be shifted in/out
fc560 1ca          ctr1            res     1
fc560 1cb          timeout         res     1                       ' = starttime + delay
fc560 1cc          spare           res     1
fc560 1cd          skiprun         res     1                       ' 1= skip load/run mbr/vol & load/no-run fname
                                                                   '\ 1=SDV1, 2=SDV2(byte address), 3=SDHC/SDV2(block address)
fc560 1ce          blocksh         res     1                       '/ block shift 0/9 bits
fc560 1cf          clustersh       res     1                       ' sectors/cluster SHL 'n' bits
                   
fc560 1d0          vol_begin       res     1 '$0000_2000           ' Ptn0: first sector of PTN
fc560 1d1          fsi_begin       res     1 '$0000_2001           ' Ptn0:       sector of file system info
fc560 1d2          fat_begin       res     1 '$0000_3122           ' Ptn0: first sector of FAT table
fc560 1d3          dir_begin       res     1' $0000_4000           ' Ptn0: first sector of DATA is DIR table
fc560 1d4          dat_begin       res     1 '$0000_4580 $0000_54C0' Ptn0: first sector of file's DATA
fc560 1d5          ptn_size        res     1 '$0008_0000           '       file-size 32KB = 64<<9 sectors
                   
fc560 1d6          _bufad          res     1
fc560 1d7          _blocknr        res     1
fc560 1d8          _sectors        res     1
fc560 1d9          _entries        res     1
fc560 1da          bufad           res     1                       ' ptr sector buffer
fc560 1db          blocknr         res     1                       ' sector#
fc560 1dc          fname           res     3                       ' 8+3+1
fc560 1df          _hubdata        res     1

In your code...

cog $1C0-$1DF is required for use by the SD ROM driver

Place the filename (8+3+$0) into 3 longs at fname (cog $1DC) (remember little endian so reverse each group of 4 characters!)
So "filename.xtn" would become "elif" + "eman" + ($0,"ntx")

Call _Load_SDfile
or
Call _Run_SDfile

cheezus · 2019-08-11 00:21

I'm working on this again and I think I'm getting ready for another release. I need to verify all my changes are correct first but it looks much nicer now that I'm using the getbyte instructions. There's a couple uglies left maybe someone can point to the CLEAN way to do this.

... ' ex1

send_byte       SHL     data_out,   #32-8       ' Set MSB First
                REV     data_out                ' Set MSB First
…

… 'ex2
                rev     data_in                           
                and     data_in,    #$ff                    
read_byte_ret   ret

I also need to use all the new test instructions but I think I can get those

SPI smartpin setup looks like this now

PUB start_explicit( _DO, _CLK, _DI, _CS ) : card_type | domask, dimask
    stop
    long[@cmd] := "i"   'set flag to know if the started cog has read its parameters
    long[@cmd][1] :=  _DO | (_CLK << 8) + (_DI << 16) + ( _CS << 24)' pins 
    long[@cmd][2] := getmask(_CLK, _DO) + (getmask(_CLK, _DI) << 4) ' clock masks
    long[@cmd][2] += spi_clk_inv << 8                                 ' clock invert
    long[@cmd][3] := GetBitTime(spi_clk_max)                        ' spi bit time
    long[@cmd][4] := GetBitTime(400_000)                            '400k
    cog := cognew(@_asm_start,@cmd) + 1
    if cog
        repeat until long[@cmd] <> "i"  ' wait until cog is done reading parameter
        card_type := long[@cmd][4]       ' arg5 returns card type or returns with ptr to error from low level driver


DAT org 
_asm_start      mov     ptr_to_mbox,    ptra        ' save mbox address for later
                '' get parameters from spin
do_init         mov     ptra,       ptr_to_mbox     ' return ptra back to mbox  
                rdlong  tmp,        ptra++          ' command long 0                             
                rdlong  tmp,        ptra++          ' pins long    1                
                getbyte pinDO,      tmp,    #0
                getbyte pinCLK,     tmp,    #1        
                getbyte pinDI,      tmp,    #2        
                getbyte pinCS,      tmp,    #3                                      
                rdlong  tmp,        ptra++          'mask long      2                  
                getnib  sp_doclk,   tmp,    #0
                getnib  sp_diclk,   tmp,    #1 
                getnib  sp_inv_pin, tmp,    #2                               
                shl     sp_doclk,   #24                
                shl     sp_diclk,   #24       
                shl     sp_inv_pin, #14         
                rdlong  sp_bt,      ptra++          ' timing long   3                
                rdlong  sp_init_bt, ptra            ' 400k          4                
                rdlong  delay1s,    #$14            ' clkfreq from hub                  
                '' setup pins
                drvh    pinCS                               
                dirl    pinCLK      '' setup Clock pin
                mov     tmp,        sp_clk                      
                add     tmp,        sp_inv_pin  ' invert clock   
                wrpin   tmp,        pinCLK      
                wxpin   sp_init_bt, pinCLK      ' set base period (*2) 
                dirh    pinCLK                  ' enable clk                                                                
                dirl    pinDI       '' setup DI pin                  
                mov     tmp,        sp_stx      ' save mode, bpin might change
                add     tmp,        sp_diclk    ' setup bpin
                wrpin   tmp,        pinDI       ' sync tx
                wxpin   sp_stx_c,   pinDI       ' stop/start mode
                dirh    pinDI                   ' enable smartpin
                wypin   #$FF,       pinDI       ' prime shifter                                                                            
                dirl    pinDO       '' setup DO pin                
                mov     tmp,        sp_srx      ' save mode, bpin might change
                add     tmp,        sp_doclk    ' setup bpin
                wrpin   tmp,        pinDO       ' sync rx
                wxpin   sp_stx_c,   pinDO       ' 8 bit, pre   
                dirh    pinDO                   ' enable pin                                                   
                '' card init                           
                drvh    pinCS                   ' set cs high
                wypin   sectorsize, pinCLK      ' set init clocks
                getct   timeout                 ' setup 1s timeout 
                addct1  timeout,    delay1s     ' in case card does not respond            
                call    #wait_clock             ' wait for smartpins to finish
                
                dirl    pinCLK      '' setup Clock pin
                wxpin   sp_bt, pinCLK      ' set base period (*2) 
                dirh    pinCLK                  ' enable clk

I think the fifo could help in the readbyte and writebyte but haven't wrapped my head around that yet.

I'm ready to try getting this runfile thing sorted and I THINK understand things pretty well. I'm about to start renaming some of my variables to match the convention laid out by @Cluso99. Not that there's a shortage of cogram yet with 237 longs left...

The other thing that I've been trying to wrap my head around is the LUT sharing.. Seems that YModem would really benefit from being able to load the LUT with 1k from the SD and then the ASync serial cog could just use a shared LUT bypassing the hub. I would also like to do this with my RAM/Display driver i'm working on and it seems probable, it would need some convention to be set with the objects that share the adjacent cog.

AJL · 2019-08-11 03:26

Packing the mask long into nibbles instead of bytes doesn't really gain anything.
How about something like this using bytes:

DAT org 
_asm_start      mov     ptr_to_mbox,    ptra        ' save mbox address for later
                '' get parameters from spin
do_init         mov     ptra,       ptr_to_mbox     ' return ptra back to mbox  
                rdlong  tmp,        ptra++          ' command long 0; not used
                rdbyte pinDO,      ptra++         ' pins long 1
                rdbyte pinCLK,     ptra++
                rdbyte pinDI,       ptra++
                rdbyte pinCS,      ptra++
                rdlong sp_doclk,        ptra              ' mask long 2
                getbyte  sp_diclk,   sp_doclk,    #1 
                getbyte  sp_inv_pin, sp_doclk,    #2                               
                shl     sp_doclk,   #24                
                shl     sp_diclk,   #24       
                shl     sp_inv_pin, #14         
                rdlong  delay1s,    #$14            ' clkfreq from hub                  
                '' setup pins

Slightly smaller memory footprint, and saves double handling the data when it isn't necessary.

cheezus · 2019-08-11 05:40

AJL wrote: »

Packing the mask long into nibbles instead of bytes doesn't really gain anything.
How about something like this using bytes:

DAT org 
_asm_start      mov     ptr_to_mbox,    ptra        ' save mbox address for later
                '' get parameters from spin
do_init         mov     ptra,       ptr_to_mbox     ' return ptra back to mbox  
                rdlong  tmp,        ptra++          ' command long 0; not used
                rdbyte pinDO,      ptra++         ' pins long 1
                rdbyte pinCLK,     ptra++
                rdbyte pinDI,       ptra++
                rdbyte pinCS,      ptra++
                rdlong sp_doclk,        ptra              ' mask long 2
                getbyte  sp_diclk,   sp_doclk,    #1 
                getbyte  sp_inv_pin, sp_doclk,    #2                               
                shl     sp_doclk,   #24                
                shl     sp_diclk,   #24       
                shl     sp_inv_pin, #14         
                rdlong  delay1s,    #$14            ' clkfreq from hub                  
                '' setup pins

Slightly smaller memory footprint, and saves double handling the data when it isn't necessary.

I'm packing into nibbles so I have extra bits later if I need them. Not real sure HOW to use them yet so it might be a waste. I was thinking about different ways I could do this and I really wanted it to all fit into the mailbox. I finally got the init sequence the way I wanted, with all of the parameters being loaded by the normal command loop.

DAT org 
[code]
DAT     
                org                                                     
_asm_start      mov     ptr_to_mbox,    ptra                ' save mbox address for later
                '' get parameters from spin 
get_cmd         rdlong  cmd_in,         ptra    wz          '
                '' timeout and release card after 10s       
        if_z    jmp     #get_cmd                            ' no cmd so handle time                               
                rdlong  arg1_in,        ++ptra               
                rdlong  arg2_in,        ++ptra               
                rdlong  arg3_in,        ++ptra                
                rdlong  err_out,        ++ptra              ' dummy read to align err pointer
                '' decode command                           
                cmp     cmd_in,         #"i"    wz          
        if_z    jmp     #do_init    
                cmp     cmd_in,         #"c"    wz          
        if_z    call    #card_init                       
                cmp     cmd_in,         #"r"    wz          
        if_z    jmp     #read_a                             
                cmp     cmd_in,         #"w"    wz          
        if_z    jmp     #write_b           
'                cmp     cmd_in,         #"z"    wz          
'        if_z    jmp    #release                                                                      
                mov     err_out,        #5                  ' err = no command                 '               
               '' do commands                               
end_cmd         drvh    pinCS                               ' set cs high
                wrlong  err_out,        ptra                                                     
                mov     cmd_in,         #0                  ' signal done to spin driver                 
                mov     ptra,           ptr_to_mbox         ' return ptra back to mbox
                wrlong  cmd_in,         ptr_to_mbox         
                jmp     #get_cmd                            
                                                                                                                                        

do_init         getbyte pinDO,          arg1_in,    #0      
                getbyte pinCLK,         arg1_in,    #1        
                getbyte pinDI,          arg1_in,    #2        
                getbyte pinCS,          arg1_in,    #3                                                   
                getnib  sp_doclk,       arg2_in,    #0      
                getnib  sp_diclk,       arg2_in,    #1      
                getnib  sp_inv_pin,     arg2_in,    #2                
                shl     sp_doclk,       #24                 
                shl     sp_diclk,       #24                                
                shl     sp_inv_pin,     #14                 
                mov     sp_bt,          arg3_in             ' get max clock
                mov     sp_init_bt,     err_out             ' get 400k clock
                mov     err_out,        #0                  '  err = no error               
                rdlong  delay1s,        #$14                ' clkfreq from hub                                                                 
                dirl    pinCLK      '' setup Clock pin                 
                mov     spare,          ##sp_clk            ' transition           
                add     spare,          sp_inv_pin          ' invert clock 
                wrpin   spare,          pinCLK              ' set clock mode
                wxpin   sp_init_bt,     pinCLK              ' set base period (*2) 
                dirh    pinCLK                              ' enable clk     
                dirl    pinDI       '' setup DI pin                  
                mov     spare,          ##sp_stx            ' sync tx
                add     spare,          sp_diclk            ' setup bpin
                wrpin   spare,          pinDI               ' set di mode
                wxpin   #sp_stx_c,      pinDI               ' set di config
                dirh    pinDI                               ' enable smartpin
                wypin   #$FF,           pinDI               ' prime shifter                                                                            
                dirl    pinDO       '' setup DO pin                
                mov     spare,          ##sp_srx            ' sync rx
                add     spare,          sp_doclk            ' setup bpin
                wrpin   spare,          pinDO               ' set do mode
                wxpin   #sp_srx_c,      pinDO               ' set do config   
                dirh    pinDO                               ' enable pin                                
                call    #card_init                          
                jmp     #end_cmd

I'm pretty happy with this although the memory footprint is a little wasteful, with cluso's variables in cog. I could probably squeeze out a few cog longs but it's getting there.

I keep thinking about ways to deal using the LUT and I've got a proposal. Instead of passing a positive number for the hub address for block read/write, if you pass a negative number it will be loaded to LUT. This would give us 4 pages, I just need to figure out how to handle loading cog2 and doing something with the data. Since the LUT can't be accessed in bytes like the cogram can, i'm not sure how much more efficient packing bytes into longs and copying to LUT will be. I think that's the next step though.

I did separate the SD card init from pins init, to allow mounting of cards without resetting pins. I need to setup commands for this, as well as handle release properly. I also want to implement an auto-release timeout too, just need to try to implement it.

msrobots · 2019-08-11 09:53

shorten this up.

    long[@cmd] := "i"   'set flag to know if the started cog has read its parameters
    long[@cmd][1] :=  _DO | (_CLK << 8) + (_DI << 16) + ( _CS << 24)' pins 
    long[@cmd][2] := getmask(_CLK, _DO) + (getmask(_CLK, _DI) << 4) ' clock masks
    long[@cmd][2] += spi_clk_inv << 8                                 ' clock invert
    long[@cmd][3] := GetBitTime(spi_clk_max)                        ' spi bit time
    long[@cmd][4] := GetBitTime(400_000)                            '400k

=

    cmd := "i"   'set flag to know if the started cog has read its parameters
    cmd[1] :=  _DO | (_CLK << 8) + (_DI << 16) + ( _CS << 24)' pins 
    cmd[2] := getmask(_CLK, _DO) + (getmask(_CLK, _DI) << 4) ' clock masks
    cmd[2] += spi_clk_inv << 8                                 ' clock invert
    cmd[3] := GetBitTime(spi_clk_max)                        ' spi bit time
    cmd[4] := GetBitTime(400_000)                            '400k

Mike

cheezus · 2019-08-11 10:09

I'm going to release RC4 here, it's a major code cleanup. I tried to use all the instructions where applicable. Things clean up nicely this way. I know that with 3 operand instructions vs 2 regular instructions it's only readability but still, I LOVE P2ASM!!!!
I'm also including Ymodem with this release, although it appears receive is broken. I think it's baud related, or maybe it's a buffer overflow issue since smartpin serial isn't buffered? I'd think that would show up as a problem sending a file TO the sd card?? I'll debug that as soon as I'm done with this SD driver

I updated the above code blurbs to reflect the new versions. Now that I've got cog variables cleaned up a bit and I'm getting ready to try to run a file I have a couple final questions. First to verify the file that is renamed and placed on card is the .EEPROM file created? (?duh?) Most important, if I create a constants section with the rom hook addresses... what's the proper JMP convention? I'm SOOOO lost on that part!

DAT
…
…
        jmp     #$FC578     'jump into rom code?
…
…

CON P2_REVa_ROM_ADDRESS_TABLE
  _Start_SDcard    = $FC560    
  _Run_SDfile      = $FC578    
  _SDcard_Init     = $FC5A4    
  readFILE         = $FC900    
  search_dir       = $FC840    

  _reset_booter    = $FCA78    
  _Start_Monitor   = $FCA88    
  _Enter_Monitor   = $FCA8C    
  _Redo_Monitor    = $FCA9C    
  _SerialAddr      = $FCAA8    
  _SerialBaud      = $FCAB0    
  _SerialInit      = $FCAB8    

  _HubTxCR         = $FCAE4    
  _HubTxRev        = $FCAEC    
  _HubTx           = $FCAF0    
  _HubRx           = $FCB10    
  _HubHexRev       = $FCB24    
  _HubHex8         = $FCB28    
  _HubHex          = $FCB2C    
  _HubTxStrVer     = $FCB9C    
  _HubTxString     = $FCBA4    
  _HubListA2H      = $FCBC4    
  _HubList         = $FCBC8    
  _RdLongCogHub    = $FCF34    
  _HubRxStrMon     = $FCCC4    
  _HubRxString     = $FCCCC    
  _HubMonitor      = $FCD78    

  _Download        = $FCEA4    
  _ParseHex        = $FCF68

I'm also thinking about the whole pushing data to LUT. I guess as far as raw performance goes it would help to use the FIFO, so I should probably start reading up on that... LUTS seems like the way to go, although if you have to pack / unpack bytes it might not be as efficient? It would be wasteful to only get one block read/write into luts and I think it would also kinda defeat the purpose?!?

*edit

Thanks for the suggestion @msrobots, here's the updated version. I made the change to the release real quick.

PUB start_explicit( _DO, _CLK, _DI, _CS ) : card_type | domask, dimask
    stop
    cmd := "i"   'set flag to know if the started cog has read its parameters
    arg1 :=  _DO | (_CLK << 8) + (_DI << 16) + ( _CS << 24)' pins     
    arg2 := getmask(_CLK, _DO) + (getmask(_CLK, _DI) << 4) ' clock masks
    arg2 += spi_clk_inv << 8                               ' clock invert
    arg3 := GetBitTime(spi_clk_max)                        ' spi bit time
    err := GetBitTime(400_000 )                            '400k
    cog := cognew(@_asm_start,@cmd) + 1
    if cog
        repeat until cmd <> "i"     ' wait until cog is done reading parameter

Still looking for "fixes" for these -

... ' ex1

send_byte       SHL     data_out,   #32-8       ' Set MSB First
                REV     data_out                ' Set MSB First
…

… 'ex2
                rev     data_in                           
                and     data_in,    #$ff                    
read_byte_ret   ret

Discussing SD Drivers (depreciated fsrw)

Comments