P2 Taqoz V2.8: Placing Assembler Routines into LUT RAM
Long assembler routines without jumps are fast even when they are executed from HUB RAM. Small routines are not, because the microcache does not help them and they have to wait for the HUB access window.
Taqoz has cogmod, which can execute code from COG RAM, but that space is very limited and is better used for register variables, which are very much faster than reading a HUB variable with RDLONG.
The upper part of the LUT seems to have plenty of free space. This part also cannot be accessed directly in assembler, so let's use it for code.
{
LutModulesA.fth
09.06.23 Christof Eberspaecher
}
!polls \ clear multitasking
IFDEF *Tests*
oldorgT org
FORGET *Tests* }
pub *Tests* PRINT" P2 Test" ;
0 bytes oldorgT
: LutMod_Library ; \ ====================================================
long lutPC
168 lutPC ! \ depends on L-stack usage; the L-stack starts at $80 (128 decimal)
long startPC long endPC
FORTH ASSEMBLER
: getStart \ directive: get the start of a routine
_pc @ startPC !
;
: getEnd \ directive: get the end of a routine
_pc @ endPC !
;
FORTH
: mov2Lut \ move the last routine to LUT
startPC @
begin
dup @ lutPC @ lut!
lutPC ++
4 +
dup endPC @ => until
drop
;
\ Example: ===========================================================
create$ incTos lutPC @ 512 + @words CPA W! \ place before the code
code (incTos)
getStart ' assembler directive
_ret_ add a,#1
getEnd ' assembler directive
end
mov2Lut \ place after the code
\ TAQOZ# 1 incTos . --- 2 ok
create$ fiboLutAsm lutPC @ 512 + @words CPA W!
code fiboAsm ( n1 -- f ) '\ n2=n1'th fibonacci number
getStart
mov xx,#0
mov yy,#1
mov zz,#0
FOR:
mov xx,yy
mov yy,zz
add zz,xx
NEXT: a
_ret_ mov a,zz
getEnd
end
mov2Lut
46 lap fiboAsm lap .lap
46 lap fiboLutAsm lap .lap
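To make the copy step and the address arithmetic easier to follow, here is a small Python model of what mov2Lut and the `lutPC @ 512 +` expression do. It is only a sketch that runs on a PC, not Taqoz code: the names `mov_to_lut`, `exec_address` and the dictionary used as HUB memory are my own assumptions for illustration. The one hardware fact it relies on is that LUT RAM is executed at addresses $200..$3FF, which is why 512 is added to the LUT index.

```python
# Host-side model of mov2Lut (illustrative names, not part of Taqoz).
LUT_EXEC_BASE = 0x200  # LUT RAM is executed at addresses $200..$3FF


def mov_to_lut(hub, lut, start_pc, end_pc, lut_pc):
    """Copy the longs in hub[start_pc .. end_pc) to lut, starting at lut_pc.

    hub maps HUB byte addresses to 32-bit longs; lut is a list of 512 longs.
    Returns the updated lut_pc (the next free LUT index).
    """
    addr = start_pc
    while True:                  # begin ... until: the body runs at least once
        lut[lut_pc] = hub[addr]  # dup @ lutPC @ lut!
        lut_pc += 1              # lutPC ++  (LUT is indexed in longs)
        addr += 4                # 4 +       (HUB is addressed in bytes)
        if addr >= end_pc:       # dup endPC @ => until
            break
    return lut_pc


def exec_address(lut_index):
    """Execution address of a LUT index, as in 'lutPC @ 512 +'."""
    return LUT_EXEC_BASE + lut_index


if __name__ == "__main__":
    hub = {0x1000: 0xAAAA0001, 0x1004: 0xBBBB0002}  # a made-up two-long routine
    lut = [0] * 512
    next_free = mov_to_lut(hub, lut, 0x1000, 0x1008, 168)
    print(hex(exec_address(168)), next_free)
```

The model also shows why create$ has to come before the code and mov2Lut after it: the execution address is taken from lutPC before the copy advances it.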
Because the code is first assembled at a HUB RAM address and only afterwards copied to LUT, take care with absolute and relative jumps! An absolute jump target, for example, still refers to the HUB copy.
I recommend using the _BOOT_P2.BIX version from Taqoz.zip at https://sourceforge.net/projects/tachyon-forth/files/TAQOZ/binaries/ .
In a little Fibonacci benchmark, execution from LUT is more than twice as fast as execution from HUB (about 2.4 times):
TAQOZ# 46 lap fiboAsm lap .lap --- 1,184 cycles= 5,920ns @200MHz ok
TAQOZ# 46 lap fiboLutAsm lap .lap --- 488 cycles= 2,440ns @200MHz ok
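The figures above can be cross-checked on a PC. The following Python sketch is not Taqoz code and its function names are my own: `cycles_to_ns` converts the cycle counts at the 200 MHz clock shown in the output, and `fibo_model` replays the xx/yy/zz register shuffle of the loop with 32-bit wrap-around. It assumes the loop body runs exactly n times for n >= 1; what the real FOR: loop does for n = 0 is not modelled.

```python
# Host-side cross-check of the benchmark (illustrative names, not Taqoz).

def cycles_to_ns(cycles, clock_hz=200_000_000):
    """Convert clock cycles to whole nanoseconds at the given clock."""
    return cycles * 1_000_000_000 // clock_hz


def fibo_model(n):
    """Replay the register loop of fiboAsm for n >= 1 iterations."""
    xx, yy, zz = 0, 1, 0
    for _ in range(n):
        xx = yy                       # mov xx,yy
        yy = zz                       # mov yy,zz
        zz = (zz + xx) & 0xFFFFFFFF   # add zz,xx (32-bit register)
    return zz


if __name__ == "__main__":
    hub_cycles, lut_cycles = 1184, 488          # from the TAQOZ output above
    print(cycles_to_ns(hub_cycles), cycles_to_ns(lut_cycles))
    print(round(hub_cycles / lut_cycles, 2))    # speed-up of LUT over HUB
    print(fibo_model(46))                       # value the model predicts
```

The speed-up comes out at roughly 2.4, and the model's 46th Fibonacci number still fits in a 32-bit register.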
Have fun! Christof
