P2 Taqoz V2.8: Placing Assembler Routines into LUT RAM — Parallax Forums

Christof Eb. Posts: 1,106
edited 2023-06-09 08:53 in Forth

Long assembler routines without jumps run fast even when they are executed from HUB RAM. Small routines do not, because the microcache does not help them and they have to wait for a HUB access window.
Taqoz has cogmod, which can execute code from COG RAM, but that space is very limited and is better used for register variables, which are much faster than RDLONG.

In the upper part of LUT RAM there seems to be plenty of space. This part also cannot be accessed directly in assembler, so let's put routines there.

{   
   LutModulesA.fth

   09.06.23 Christof Eberspaecher 

}

!polls \ clear multitasking

IFDEF *Tests*   
    oldorgT org
    FORGET *Tests*   }
pub *Tests*     PRINT" P2 Test" ;

0 bytes oldorgT

: LutMod_Library ; \ ====================================================

long lutPC
168 lutPC ! \ depends on usage of the L-stack, which starts at $80 (128 decimal)

long startPC   long endPC

FORTH ASSEMBLER
: getStart \ directive: get the start of a routine
   _pc @ startPC !
;
: getEnd \ directive: get the end of a routine
   _pc @ endPC !
;
FORTH

: mov2Lut \ move the last routine to LUT
   startPC @
   begin
      dup @ lutPC @ lut!
      lutPC ++
      4 +
   dup endPC @ => until
   drop
;

\ Example: ===========================================================

create$ incTos lutPC @ 512 + @words CPA W! \ place before the code
code (incTos)
   getStart ' assembler directive
   _ret_ add a,#1
   getEnd   ' assembler directive
end
mov2Lut \ place after the code

\ TAQOZ# 1 incTos . --- 2  ok


create$ fiboLutAsm lutPC @ 512 + @words CPA W!
code fiboAsm ( n1 -- f ) '\ n2=n1'th fibonacci number
   getStart
       mov xx,#0
       mov yy,#1
       mov zz,#0
       FOR:
          mov xx,yy
          mov yy,zz
          add zz,xx
       NEXT: a
   _ret_    mov a,zz
   getEnd
end
mov2Lut

46 lap fiboAsm lap .lap
46 lap fiboLutAsm lap .lap
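
The listing above copies each routine to LUT without checking whether it still fits. A minimal sketch of two helper words that report the sizes involved might look like this. The names routineLongs and lutFree are hypothetical and not part of Taqoz. The sketch assumes that LUT RAM holds 512 longs and that nothing else uses the longs from lutPC upward, which the post only suspects. It uses nothing beyond the variables defined above and ordinary Forth arithmetic, and has not been tested on hardware.

```forth
\ Sketch only: routineLongs and lutFree are hypothetical helper names.
\ Assumption: LUT RAM is 512 longs and everything above lutPC is free.

: routineLongs ( -- n ) \ size in longs of the routine marked by getStart/getEnd
   endPC @ startPC @ - 4 /
;

: lutFree ( -- n ) \ longs left between lutPC and the end of LUT
   512 lutPC @ -
;

\ Usage: type this after "end" and before mov2Lut to compare both numbers
routineLongs . lutFree .
```

If the first number is larger than the second, the routine would run past the end of LUT and should not be copied.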

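Because the code is first assembled for its HUB RAM location and only then copied to LUT, watch out for absolute and relative jumps: their targets are computed for the HUB location and may not be valid once the routine runs from LUT.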

I recommend using the version _BOOT_P2.BIX from Taqoz.zip at https://sourceforge.net/projects/tachyon-forth/files/TAQOZ/binaries/ .

In a small Fibonacci benchmark, execution from LUT is more than twice as fast as execution from HUB (1,184 versus 488 cycles, a factor of about 2.4):
TAQOZ# 46 lap fiboAsm lap .lap --- 1,184 cycles= 5,920ns @200MHz ok
TAQOZ# 46 lap fiboLutAsm lap .lap --- 488 cycles= 2,440ns @200MHz ok

Have fun! Christof
