P2 cogexec vs hubexec vs overlay timings

Curious to see just how cogexec vs hubexec vs overlay timings would stack up, I ran some tests...
Overlay loads the hub block into cog and executes from cog.

Here is the routine that I tested. The last five instructions form a small loop that runs 'loops' times, where 1 = no looping (the djnz falls straight through).
              getct     ctr1
              call      #olay             ' *** cogexec ***
              getct     ctr2
        .............
              rdlong    y,olayptr                       ' set known hub rotation
              getct     ctr1
              call      #hub_test                       ' *** hubexec test ***
              getct     ctr2
        .............
              rdlong    y,olayptr2                      ' set known hub rotation
              getct     ctr1
              setq      #hub_test2_end - hub_test2 -1   ' *** overlay length to load
              rdlong    olay, olayptr        
              call      #olay
              getct     ctr2a
        .............
hub_test
              mov       z, loopctr              ' 1
.loop         add       x, #1                   ' 2
              add       x, #1                   ' 3
              add       x, #1                   ' 4
              add       x, #1                   ' 5
              add       x, #1                   ' 6
              add       x, #1                   ' 7
              add       x, #1                   ' 8
              add       x, #1                   ' 9
              add       x, #1                   '10

              add       x, #1                   ' 1
              add       x, #1                   ' 2
              add       x, #1                   ' 3
              add       x, #1                   ' 4
              add       x, #1                   ' 5
              add       x, #1                   ' 6
              add       x, #1                   ' 7
              add       x, #1                   ' 8
              add       x, #1                   ' 9
              add       x, #1                   '10

              add       x, #1                   ' 1
              add       x, #1                   ' 2
              add       x, #1                   ' 3
              add       x, #1                   ' 4
              add       x, #1                   ' 5
.loop2        add       x, #1                   ' 6
              add       x, #1                   ' 7
              add       x, #1                   ' 8
              add       x, #1                   ' 9
              djnz      z, #.loop2               '10

              ret
hub_test_end
And here are the results (clocks in hex, with the decimal values in parentheses for selected rows)...
loops:  cogexec  hubexec  overlay  
0A:     000000B2 00000128 000000DE (178 296 222)
09:     000000A6 00000110 000000D2 
08:     0000009A 000000F8 000000C6 
07:     0000008E 000000E0 000000BA 
06:     00000082 000000C8 000000AE 
05:     00000076 000000B0 000000A2 (118 176 162)
04:     0000006A 00000098 00000096 
03:     0000005E 00000080 0000008A ( 94 128 138)
02:     00000052 00000068 0000007E 
01:     00000046 00000051 00000072 (70 81 114)
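
As a sanity check, the cogexec column can be reproduced by counting the instructions in the routine. This is only a rough sketch in Python: the per-instruction clock counts are my assumption of the usual P2 cogexec timings (2 clocks for mov/add/getct, 4 clocks for call, ret and a taken djnz), they are not stated anywhere in this post, and cogexec_clocks is just an illustrative name.

```python
# Hypothetical cycle model for the cogexec case. The clock counts below are
# assumed P2 cogexec timings, not figures taken from the post itself.
GETCT = 2        # one getct falls inside the measured interval
CALL = 4         # call #olay
RET = 4          # ret
SIMPLE = 2       # mov / add
DJNZ_TAKEN = 4   # djnz when it branches back to .loop2
DJNZ_FALL = 2    # djnz on the final pass, falling through to ret

ADDS_FIRST_PASS = 28   # add instructions executed once, top to bottom
ADDS_IN_LOOP = 4       # add instructions between .loop2 and djnz


def cogexec_clocks(loops: int) -> int:
    """Predicted getct-to-getct clocks for a given loop count (1 = no loop)."""
    extra_passes = loops - 1
    return (GETCT + CALL + SIMPLE                 # getct, call, mov z
            + ADDS_FIRST_PASS * SIMPLE            # straight-line adds
            + extra_passes * (DJNZ_TAKEN + ADDS_IN_LOOP * SIMPLE)
            + DJNZ_FALL + RET)


# Compare the model against three rows of the measured cogexec column.
measured = {1: 0x46, 5: 0x76, 10: 0xB2}
for loops, clocks in measured.items():
    print(loops, cogexec_clocks(loops), clocks)
```

If those assumed timings are right, the model gives 70 clocks for one pass plus 12 clocks for every extra loop, which lines up with the measured cogexec numbers.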
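So roughly there is an 11 to 12 clock overhead for hubexec over cogexec for each pass through the loop (11 clocks at 1 loop, 118 clocks at 10 loops).
The overlay costs a fixed 44 clocks over cogexec for the load and then runs at cogexec speed, so it only beats hubexec from 4 loops upward (150 vs 152 clocks).
So overlays don't seem worth the trouble unless there are a lot of loops.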
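
For anyone who wants to play with the numbers, here is a small Python sketch that works out the overheads and the hubexec/overlay break-even point straight from the table. The clock counts are copied from the results above; the function names are just made up for illustration.

```python
# Clock counts copied from the results table: loops -> (cogexec, hubexec, overlay)
results = {
    10: (0xB2, 0x128, 0xDE),
    9: (0xA6, 0x110, 0xD2),
    8: (0x9A, 0xF8, 0xC6),
    7: (0x8E, 0xE0, 0xBA),
    6: (0x82, 0xC8, 0xAE),
    5: (0x76, 0xB0, 0xA2),
    4: (0x6A, 0x98, 0x96),
    3: (0x5E, 0x80, 0x8A),
    2: (0x52, 0x68, 0x7E),
    1: (0x46, 0x51, 0x72),
}


def hub_overhead(loops: int) -> int:
    """Extra clocks hubexec takes compared with cogexec."""
    cog, hub, _ = results[loops]
    return hub - cog


def overlay_overhead(loops: int) -> int:
    """Extra clocks the overlay (load + run) takes compared with cogexec."""
    cog, _, olay = results[loops]
    return olay - cog


def break_even() -> int:
    """Smallest measured loop count at which the overlay beats hubexec."""
    return min(n for n, (_, hub, olay) in results.items() if olay < hub)


for n in sorted(results):
    print(f"{n:2d} loops: hubexec +{hub_overhead(n):3d}, overlay +{overlay_overhead(n):3d}")
print("overlay beats hubexec from", break_even(), "loops")
```

This only covers the loop counts actually measured, so anything beyond 10 loops would be an extrapolation.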