Stuck again porting assembly from p1 to p2 (fun with bit banging spi) — Parallax Forums

Greg LaPolla Posts: 323
edited 2021-02-11 05:21 in Propeller 2

I am working on porting a touchscreen driver. I think the TEST line in the following code is causing an issue, but I'm not completely sure.

spibyte         mov     count, #8
.loop
                rcl     op, #1  wc
                waitx   t
                add     t, bitdel
                drvh    SCK_PIN
                muxc    outa, MOSI_PIN
                waitx   t
                add     t, bitdel
                drvl    SCK_PIN
                test    MISO_PIN, ina  wc
                rcl     resultword, #1
                djnz    count, #.loop
                ret
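A porting detail that may matter here: on the P1, WAITCNT t, bitdel waits for the counter to reach the absolute value t and then adds bitdel, but on the P2 WAITX D stalls for a relative 2 + D clocks. Translated as WAITX t followed by ADD t, bitdel, every wait becomes longer than the one before, and if t was seeded from the system counter (the setup code isn't shown) the first wait alone could be very long. The Python sketch below only models the resulting stalls; the starting value and bitdel are hypothetical numbers. On the P2 a constant WAITX bitdel, or GETCT with ADDCT1/WAITCT1 for absolute timing, avoids the drift. Separately, DRVH SCK_PIN takes a pin number while TEST MISO_PIN, INA and MUXC OUTA, MOSI_PIN need a bit mask, so the same style of constant can't serve both; TESTP and DRVC take pin numbers.

```python
# Model of the WAITX timing in the ported loop (Python, not PASM2).
# Assumption: t0 and bitdel are hypothetical clock counts.


def p2_waitx_delays(t0: int, bitdel: int, waits: int) -> list[int]:
    """Stall (in clocks) of each WAITX t when t grows by bitdel.

    P2 WAITX D stalls for 2 + D clocks relative to the current time,
    so ADD t, bitdel after each wait lengthens every following wait.
    """
    delays = []
    t = t0
    for _ in range(waits):
        delays.append(2 + t)  # waitx t
        t += bitdel           # add   t, bitdel
    return delays


def p2_fixed_delays(bitdel: int, waits: int) -> list[int]:
    """Stalls produced by a constant WAITX bitdel instead."""
    return [2 + bitdel] * waits


if __name__ == "__main__":
    # One byte = 8 bits x 2 waits per bit in the posted loop.
    growing = p2_waitx_delays(t0=100, bitdel=100, waits=16)
    fixed = p2_fixed_delays(bitdel=100, waits=16)
    print("growing:", growing[:4], "... total", sum(growing))
    print("fixed:  ", fixed[:4], "... total", sum(fixed))
```

With these placeholder values the last of the 16 waits in one byte is more than 15 times the first (1602 versus 102 clocks), so SCK slows down steadily through each byte.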

Comments

  • Peter Jakacki Posts: 10,193
    edited 2021-02-11 05:43

    I find that reading and writing on the same clock is rarely, if ever, needed. I prefer to write when I write and read when I read.
    The P2 has much better instructions for single pins and pin ranges, but you also need to wait for the signal to propagate through before testing it.

    Here's my read routine.

    ' SPIRD ( dat -- dat+rd )
    SPIRD           rep     @sre1,#8        ' 8 bits
                    outnot  sck             ' clock (low high)
                    waitx   sdly
                    outnot  sck
    RWAIT           waitx   sdly            ' sdly = 0 default  
                    nop
                    testp   miso wc         ' read data from card
                    rcl     a,#1            ' shift in msb first
    sre1            ret
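To make the bit flow of SPIRD explicit, here is a small Python model, not P2 code: each pass clocks SCK, copies MISO into C with TESTP miso WC, and RCL a,#1 rotates C into bit 0 of a. After 8 passes the received byte sits in the low 8 bits and the previous contents of a have moved up 8 bits, which matches the ( dat -- dat+rd ) stack comment. The simulated device simply presents one bit per clock, MSB first; edge timing and the propagation delay Peter mentions are deliberately not modelled.

```python
# Python model of the SPIRD bit loop (illustration only, not PASM2).
MASK32 = 0xFFFF_FFFF  # P2 registers are 32 bits wide


def rcl1(value: int, carry: int) -> tuple[int, int]:
    """RCL value,#1: shift left one bit, C enters bit 0, old bit 31 leaves."""
    carry_out = (value >> 31) & 1
    return ((value << 1) | carry) & MASK32, carry_out


def spird(a: int, device_byte: int) -> int:
    """Shift 8 bits from a simulated SPI device into a, MSB first."""
    for bit in range(7, -1, -1):
        # outnot sck / waitx / outnot sck: one clock pulse (timing not modelled)
        c = (device_byte >> bit) & 1  # testp miso wc
        a, _ = rcl1(a, c)             # rcl a,#1
    return a


if __name__ == "__main__":
    print(hex(spird(0x0000_0012, 0x34)))  # 0x1234
```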
    
  • Cluso99 Posts: 18,069

    Firstly, there is a touchscreen driver that @MRoberts added to my LCD driver a week or so ago.
    This also contains a routine to write SPI data to the LCD.

    Next, be aware that because of the internal clock buffering in the P2, the pin state you sample is a few clocks older than the instruction order suggests, so you are effectively sampling before the clock that precedes the toggle. Here is a timing snippet from my SD driver. It's unrolled for speed, but you should get the idea. You can also look at the SD boot (hubexec) code in the P2 ROM.

    There's also a diagram in Chip's datasheet that shows the clock alignment. I've done one

    'read 512 bytes: wkg but we are sampling 3 clocks after CLK=H and 2 clocks before CLK=L   48714 clocks (was 194742)
    ' Note: From testing at 200MHz there are 7 clocks between OUTx and TESTP and 8 clocks between OUTx and TESTB
    ' Note: Both TESTP or TESTB alternatives shown :)
                    WRFAST  #0,               PTRB           ' use streamer /fifo
                    rep     @.reprd,          sdx_bytecnt    ' read 512 bytes (or 16 bytes if CSD/CID)
    '               outl    #sdx_ck                          ' CLK=0  (already 0 first time)
    .s7             outh    #sdx_ck                          ' CLK=1
                    mov     sdx_reply,        #0             ' clear reply
                    outl    #sdx_ck                          ' CLK=0
                    nop
    .s6             outh    #sdx_ck                          ' CLK=1
                    testp   #sdx_do                      wc  '\ read input bit7:  sample on/after prior CLK rising edge
    '               testb   inb,             #sdx_do-32  wc  '/ read input bit7:  sample on/after prior CLK rising edge
                    outl    #sdx_ck                          ' CLK=0
                    rcl     sdx_reply,        #1             ' accum DO input bit 7
    .s5             outh    #sdx_ck                          ' CLK=1
                    testp   #sdx_do                      wc  '\ read input bit6:  sample on/after prior CLK rising edge
    '               testb   inb,             #sdx_do-32  wc  '/ read input bit6:  sample on/after prior CLK rising edge
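The clock counts in the header comment of that snippet can be turned into time and per-bit cost. A rough Python check, assuming the 200 MHz sysclock mentioned in the comments (the clock counts themselves don't depend on that choice) and averaging over all 4096 bits, so per-byte overhead is included:

```python
# Back-of-envelope check of the figures in the SD read comment.
SYSCLK_HZ = 200_000_000      # assumption: 200 MHz, as in the test note
BITS_PER_SECTOR = 512 * 8    # one 512-byte sector


def summarize(clocks: int) -> dict[str, float]:
    """Time for one sector read and average clocks spent per bit."""
    return {
        "us": clocks / SYSCLK_HZ * 1e6,
        "clocks_per_bit": clocks / BITS_PER_SECTOR,
    }


new = summarize(48_714)    # unrolled version
old = summarize(194_742)   # earlier version
speedup = 194_742 / 48_714

if __name__ == "__main__":
    print(f"new: {new['us']:.2f} us, {new['clocks_per_bit']:.2f} clocks/bit")
    print(f"old: {old['us']:.2f} us, {old['clocks_per_bit']:.2f} clocks/bit")
    print(f"speedup: {speedup:.2f}x")
```

That works out to roughly 243.6 µs versus 973.7 µs per sector, about 11.9 versus 47.5 clocks per bit, a speedup of almost exactly 4x.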
    
  • @Cluso - I would always qualify that statement that "it's unrolled for speed" by saying "because this is hubexec code". But not everyone is keen to run time critical code in hubexec mode so if it was in cog then a REP will loop it with zero overhead.

  • Cluso99 Posts: 18,069
    edited 2021-02-11 10:42

    @"Peter Jakacki" said:
    @Cluso - I would always qualify that statement that "it's unrolled for speed" by saying "because this is hubexec code". But not everyone is keen to run time critical code in hubexec mode so if it was in cog then a REP will loop it with zero overhead.

    No! It's running in cog or LUT (I can't recall which), but it's certainly not hubexec. It's contained totally within the cog, except for the mailbox of 4 longs and the 512-byte sector buffer.

  • @Cluso - so there's no need to unroll it if you use a REP. Oh, I see, is that a REP over the whole sector then? Most of the time is spent waiting for the sector to be ready, and the only real way of speeding it up overall is to use multi-block read mode, so there is no real delay between contiguous sectors.
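The multi-block argument can be put as a simple cost model. In SD SPI mode a single-block read (CMD17) pays the card's access latency for every sector, whereas a multi-block read (CMD18, terminated with CMD12 STOP_TRANSMISSION) pays it once and then streams sectors. The latency and transfer figures below are hypothetical placeholders, not measurements, and the small per-block gap inside a multi-block read is an optional parameter.

```python
# Hypothetical cost model for reading n contiguous sectors (times in us).


def single_block_total(n: int, access_us: float, transfer_us: float) -> float:
    """CMD17 per sector: every sector pays the access latency."""
    return n * (access_us + transfer_us)


def multi_block_total(n: int, access_us: float, transfer_us: float,
                      gap_us: float = 0.0) -> float:
    """One CMD18: latency once, then n transfers with optional inter-block gaps."""
    return access_us + n * transfer_us + max(n - 1, 0) * gap_us


if __name__ == "__main__":
    # Placeholder numbers: 500 us access latency, 250 us to clock out one sector.
    n = 64  # e.g. one 32 kB cluster of 512-byte sectors
    print(single_block_total(n, 500, 250))  # 48000
    print(multi_block_total(n, 500, 250))   # 16500.0
```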

  • Cluso99 Posts: 18,069
    edited 2021-02-11 12:08

    You cannot speed up the SD card, but once it's ready to go, this made a big difference. Yes, multi-sector reads would help if what you're reading is large blocks. It's dependent on the code, and that's doable; I've just not implemented the multi-sector command (23?).

    When every bit is clocked in 4 instructions instead of 6-10, followed by extra clocks between every byte, a few clocks * 8 * 512 adds up quickly.

    Once the card gets going it's quite fast. The massive delays come from accessing a first sector that isn't just the next one in line.
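The "a few clocks * 8 * 512" remark is easy to quantify: overhead paid per bit is multiplied by 4096 for one sector, and overhead paid per byte by 512. A sketch, assuming 2 clocks per instruction (the usual cog-execution cost on the P2) and comparing the quoted 6 and 10 instructions per bit against 4:

```python
# Extra clocks added to one 512-byte sector by per-bit and per-byte overhead.


def extra_clocks_per_sector(extra_per_bit: int, extra_per_byte: int = 0,
                            bytes_per_sector: int = 512) -> int:
    return bytes_per_sector * (8 * extra_per_bit + extra_per_byte)


CLOCKS_PER_INSTR = 2  # assumption: typical cog-exec instruction time on the P2

if __name__ == "__main__":
    for instrs_per_bit in (6, 10):  # range quoted in the post, versus 4
        extra = (instrs_per_bit - 4) * CLOCKS_PER_INSTR
        print(instrs_per_bit, extra_clocks_per_sector(extra))
```

Under that assumption the slower loops cost an extra 16384 to 49152 clocks per sector before any per-byte overhead is counted.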

  • @Cluso99 said:
    You cannot speed up the SD card, but once it's ready to go, this made a big difference. Yes, multi-sector reads would help if what you're reading is large blocks. It's dependent on the code, and that's doable; I've just not implemented the multi-sector command (23?).

    When every bit is clocked in 4 instructions instead of 6-10, followed by extra clocks between every byte, a few clocks * 8 * 512 adds up quickly.

    Once the card gets going it's quite fast. The massive delays come from accessing a first sector that isn't just the next one in line.

    Technically, a file "could" be fragmented, but even if it were (and I find that's not the case), there are still 64 sectors in one 32kB cluster, and those sectors can be read sequentially in multi-block mode. In fact, even though I don't have any fragmented files, I always use 64kB clusters, which is the maximum that FAT32 itself permits.
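The cluster arithmetic behind this, with 512-byte sectors: a 32 kB cluster holds 64 sectors and a 64 kB cluster holds 128, so even a fragmented file consists of runs of at least that many sectors that a multi-block read can stream. The helper names are illustrative only.

```python
# Sector/cluster arithmetic for a FAT volume with 512-byte sectors.
SECTOR_BYTES = 512


def sectors_per_cluster(cluster_bytes: int) -> int:
    return cluster_bytes // SECTOR_BYTES


def clusters_for_file(file_bytes: int, cluster_bytes: int) -> int:
    """Clusters needed to hold a file (ceiling division)."""
    return -(-file_bytes // cluster_bytes)


if __name__ == "__main__":
    print(sectors_per_cluster(32 * 1024))            # 64
    print(sectors_per_cluster(64 * 1024))            # 128
    print(clusters_for_file(1_000_000, 64 * 1024))   # 16
```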
