PNut appears faster than FastSpin in certain cases — Parallax Forums

Rayman Posts: 13,892
edited 2020-05-23 16:53 in Propeller 2
I'm not sure how this can be, but PNut appears to be slightly faster than FastSpin in certain cases.
Seen this twice now.

One was an FSRW test of reading in a BMP (fsrw_test2.spin2). PNut appeared to be just a hair faster.
And now, I'm loading up hyperram from uSD in a loop like this:
    repeat i from 1 to 480
        sd.pread(pBuffer,640)         
        WriteHyperRamRow(pBuffer,640,i)  '(pData,nBytes,Row)  'copy nBytes from pData to Row "Row" on hyperRam.  nBytes cannot exceed 1024 (?)
        'waitus(50)
        WaitReady() 'need to either wait for writing to complete or at least about 50 us so sd doesn't overwrite before complete

Was working fine in FastSpin.
But with PNut, I had to insert a delay or call the WaitReady() function, or else the drivers stomp on each other.

I have no idea how this can be possible though...
Everything I know says FastSpin should always be faster...

Comments

  • We have no insight into what's behind those function calls. Can you grab some timer ticks (getct) and measure?
  • Rayman Posts: 13,892
    That is what fsrw_test2.spin2 does...
  • whicker Posts: 749
    edited 2020-05-23 17:00
    I mean between statements in your posted spin2 code. To determine why that waitus might be needed. And then compare the results between the two compilers.
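
    A minimal sketch of that measurement, assuming t0, t1 and t2 are extra local longs (hypothetical names) and that getct() returns the 32-bit system counter under both compilers:

        repeat i from 1 to 480
            t0 := getct()
            sd.pread(pBuffer,640)
            t1 := getct()
            WriteHyperRamRow(pBuffer,640,i)
            t2 := getct()
            ' t1 - t0 = ticks spent in the SD read
            ' t2 - t1 = ticks until WriteHyperRamRow returns (not necessarily until the write completes)
            WaitReady()

    Logging the two differences for a few rows under each compiler should show which call accounts for the gap.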
  • Rayman Posts: 13,892
    The only thing I can think of that could cause this is that business about the FIFO having to reload on a jump when in HUBEXEC mode...
  • Rayman Posts: 13,892
    edited 2020-05-23 17:32
    Actually, maybe I should try FastSpin with the -O2 compiler option for optimization...

    Hmm... My fsrw_test2 code doesn't work with -O2. I think it's fine anyway.
  • Fundamentally what I'm hearing is that the WriteHyperRAMRow( ) function is potentially returning before the write completes.

    So it's acting more like "set parameters for a write command"?
  • Rayman Posts: 13,892
    That's what was happening... Took me a while to figure out. Thought something was wrong with FSRW for a minute...
    I added a WaitReady() function to make it blocking:
            WaitReady() 'need to either wait for writing to complete or at least about 50 us so sd doesn't overwrite before complete
    
  • The inline assembly runs in hubexec in fastspin and COG in PNut. Since you seem to make heavy use of inline assembly that probably explains the difference. I am looking at ways to force code into COG or LUT.
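
    For reference, "inline assembly" here means an ORG/END block inside a Spin2 method. A minimal sketch, with AddOne as a purely illustrative method that is not from the code in this thread:

        PUB AddOne(x) : r
            org
                mov     r, x        ' locals and results are visible to the inline PASM as registers
                add     r, #1
            end

    Per the above, a block like this would run from hub memory (hubexec) when built with fastspin and from COG RAM when built with PNut, which is where the timing difference would come from.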
  • Rayman Posts: 13,892
    Right, that's probably what is going on...
  • @Rayman, you could have the write command "buffer" as in wait for the previous command to complete. Otherwise trigger command and continue on immediately. Doing that can hide the call overhead.

    Write1 - write info entered into command buffer and return. Write starts immediately.
    Write2 - previous write still not done but buffer is available, overwrite command buffer with this write info, return.
    Write3 - command buffer still full. Wait until command buffer available and then write to command buffer and return.
    Write4 - etc.
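
    A minimal sketch of the simplest form of this (wait for the previous write on entry, then trigger and return), assuming WaitReady() returns immediately when no write is in flight. WriteRowDeferred is a hypothetical wrapper name, not part of the posted driver:

        PUB WriteRowDeferred(pData, nBytes, row)
            WaitReady()                            ' blocks only if the previous write is still running
            WriteHyperRamRow(pData, nBytes, row)   ' start this write and return without waiting

    The caller still must not overwrite pData until that write finishes, so with a single buffer the next sd.pread can still stomp on it.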
  • If you want to avoid the wait entirely, pingpong between two buffers:
    repeat i from 1 to 480 step 2
            sd.pread(pBuffer1,640)         
            WriteHyperRamRow(pBuffer1,640,i)
            sd.pread(pBuffer2,640)         
            WriteHyperRamRow(pBuffer2,640,i+1)
    
  • The next fastspin will copy ORG/END blocks into LUT before executing them.
  • Rayman Posts: 13,892
    edited 2020-05-23 23:08
    That’s neat. When will it do that? At startup?
  • Ariba Posts: 2,682
    ersmith wrote: »
    The next fastspin will copy ORG/END blocks into LUT before executing them.

    Would it also be possible to disable any PASM optimization for such ORG - END blocks?
    There may be a reason that I write the PASM code in a way that is not the fastest, and the compiler/optimizer cannot know what I had in mind.

    Andy
  • Ariba wrote: »
    ersmith wrote: »
    The next fastspin will copy ORG/END blocks into LUT before executing them.

    Would it also be possible to disable any PASM optimization for such ORG - END blocks?

    Yes, that's already done in github.