PNut appears faster than FastSpin in certain cases

Rayman · 2020-05-23 16:38

I'm not sure how this can be, but PNut appears to be slightly faster than FastSpin in certain cases.
Seen this twice now.

One was an FSRW test of reading in a BMP (fsrw_test2.spin2). PNut appeared to be just a hair faster.
And now, I'm loading up hyperram from uSD in a loop like this:

    repeat i from 1 to 480
        sd.pread(pBuffer,640)         
        WriteHyperRamRow(pBuffer,640,i)  '(pData,nBytes,Row)  'copy nBytes from pData to Row "Row" on hyperRam.  nBytes cannot exceed 1024 (?)
        'waitus(50)
        WaitReady() 'need to either wait for writing to complete or at least about 50 us so sd doesn't overwrite before complete

Was working fine in FastSpin.
But with PNut, I had to insert a delay or call the WaitReady() function, which wasn't needed with FastSpin, or else the drivers stomp on each other.

I have no idea how this can be possible though...
Everything I know says FastSpin should always be faster...

whicker · 2020-05-23 16:49

We have no insight into what's behind those function calls. Can you grab some timer ticks (getct) and measure?

Rayman · 2020-05-23 16:53

That is what fsrw_test2.spin2 does...

whicker · 2020-05-23 16:59

I mean between statements in your posted spin2 code. To determine why that waitus might be needed. And then compare the results between the two compilers.
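
A rough sketch of what that measurement might look like, assuming the same sd object, pBuffer, WriteHyperRamRow() and WaitReady() as in the loop posted above. The method name TimeLoad and the t* variables are made up for illustration; only getct() is a built-in.

```spin2
PUB TimeLoad() | i, t0, t1, t2, t3, tRead, tWrite, tWait
    ' Hypothetical helper: accumulates system-counter ticks per stage of the loop.
    tRead := 0
    tWrite := 0
    tWait := 0
    repeat i from 1 to 480
        t0 := getct()
        sd.pread(pBuffer, 640)
        t1 := getct()
        WriteHyperRamRow(pBuffer, 640, i)
        t2 := getct()
        WaitReady()
        t3 := getct()
        tRead  += t1 - t0       ' ticks spent in the SD read
        tWrite += t2 - t1       ' ticks until WriteHyperRamRow returns
        tWait  += t3 - t2       ' extra ticks until the write has actually finished
    ' Print tRead, tWrite and tWait with whatever serial output each build uses.
```

Building this with both compilers and comparing the three totals should show which stage differs. A large tWait would suggest the write call returns before the transfer is done.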

Rayman · 2020-05-23 17:02

The only thing I can think of that could cause this is that business about the FIFO having to reload on a jump when in HUBEXEC mode...

Rayman · 2020-05-23 17:23

Actually, maybe I should try FastSpin with the -O2 compiler option for optimization...

Hmm... My fsrw_test2 code doesn't work with -O2. I think it's fine anyway.

whicker · 2020-05-23 18:04

Fundamentally what I'm hearing is that the WriteHyperRamRow() function is potentially returning before the write completes.

So it's acting more like a "set parameters for the write command" call?

Rayman · 2020-05-23 18:43

That's what was happening... Took me a while to figure out. Thought something was wrong with FSRW for a minute...
I added a WaitReady() function to make it blocking:

        WaitReady() 'need to either wait for writing to complete or at least about 50 us so sd doesn't overwrite before complete

ersmith · 2020-05-23 18:53

The inline assembly runs in hubexec in fastspin and COG in PNut. Since you seem to make heavy use of inline assembly that probably explains the difference. I am looking at ways to force code into COG or LUT.

Rayman · 2020-05-23 18:55

Right, that's probably what is going on...

whicker · 2020-05-23 21:51

@Rayman, you could have the write command "buffer", as in wait only if the previous command hasn't completed yet. Otherwise trigger the command and continue on immediately. Doing that can hide the call overhead.

Write1 - write info entered into command buffer and return. Write starts immediately.
Write2 - previous write still not done but buffer is available, overwrite command buffer with this write info, return.
Write3 - command buffer still full. Wait until command buffer available and then write to command buffer and return.
Write4 - etc.
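
Something like the following could implement that on the Spin side. It is only a sketch: the cmd array, the QueueWriteRow name, and the convention that the driver cog zeroes cmd[0] once it has picked up a command are all assumptions, not how the actual HyperRAM driver works.

```spin2
VAR
    long cmd[3]     ' hypothetical one-entry command buffer:
                    ' cmd[0] = pData (0 means free), cmd[1] = nBytes, cmd[2] = row

PUB QueueWriteRow(pData, nBytes, row)
    repeat while cmd[0]         ' wait only while the previous command has not been picked up
    cmd[1] := nBytes
    cmd[2] := row
    cmd[0] := pData             ' written last; nonzero tells the driver cog a command is ready
```

Note that this only protects the command parameters. The data at pData still must not be overwritten until the driver has finished reading it, so the driver would either have to hold cmd[0] nonzero until the data has been copied out, or the caller would need more than one data buffer.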

Wuerfel_21 · 2020-05-23 22:07

If you want to avoid the wait entirely, pingpong between two buffers:

    repeat i from 1 to 480 step 2
        sd.pread(pBuffer1,640)
        WriteHyperRamRow(pBuffer1,640,i)
        sd.pread(pBuffer2,640)
        WriteHyperRamRow(pBuffer2,640,i+1)

ersmith · 2020-05-23 23:06

The next fastspin will copy ORG/END blocks into LUT before executing them.

Rayman · 2020-05-23 23:08

That’s neat. When will it do that? At startup?

Ariba · 2020-05-24 08:35

ersmith wrote: »

The next fastspin will copy ORG/END blocks into LUT before executing them.

Would it also be possible to disable any PASM optimization for such ORG - END blocks?
There may be a reason why I don't write the PASM code in the fastest way, and the compiler/optimizer cannot know what I had in mind.

Andy

ersmith · 2020-05-24 15:07

Ariba wrote: »

ersmith wrote: »

The next fastspin will copy ORG/END blocks into LUT before executing them.

Would it also be possible to disable any PASM optimization for such ORG - END blocks?

Yes, that's already done in github.
