P2 Taqoz V2.8: Placing Assembler Routines into LUT RAM
Long assembler routines without jumps are fast even when executed from HUB. Small routines are not so fast there, because the microcache does not help them and they have to wait for the HUB access window.
Taqoz has cogmod, which can execute code from COG RAM, but the space there is very limited and is better used for register variables, which are much faster than RDLONG.
The upper part of LUT seems to have plenty of free space. This part also cannot be accessed directly in assembler, so let's use it for code.
{
LutModulesA.fth 09.06.23 Christof Eberspaecher
}
!polls \ clear multitasking

IFDEF *Tests*
   oldorgT org
   FORGET *Tests*
}
pub *Tests* PRINT" P2 Test" ;
0 bytes oldorgT

: LutMod_Library ;
\ ====================================================
long lutPC 168 lutPC ! \ depends on usage of L-stack, which starts at $80 128dez
long startPC
long endPC

FORTH ASSEMBLER
: getStart \ directive: get the start of a routine
   _pc @ startPC ! ;

: getEnd \ directive: get the end of a routine
   _pc @ endPC ! ;

FORTH

: mov2Lut \ move the last routine to LUT
   startPC @
   begin
      dup @ lutPC @ lut!
      lutPC ++
      4 +
      dup endPC @ =>
   until
   drop
;

\ Example: ===========================================================

create$ incTos lutPC @ 512 + @words CPA W! \ place before the code
code (incTos)
getStart ' assembler directive
   _ret_ add a,#1
getEnd ' assembler directive
end
mov2Lut \ place after the code

\ TAQOZ# 1 incTos . --- 2 ok

create$ fiboLutAsm lutPC @ 512 + @words CPA W!
code fiboAsm ( n1 -- f ) '\ n2=n1'th fibonacci number
getStart
   mov xx,#0
   mov yy,#1
   mov zz,#0
   FOR:
      mov xx,yy
      mov yy,zz
      add zz,xx
   NEXT: a
   _ret_ mov a,zz
getEnd
end
mov2Lut

46 lap fiboAsm lap .lap
46 lap fiboLutAsm lap .lap
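To make the address arithmetic in mov2Lut easier to follow, here is a small host-side model in Python. It is only a sketch and not Taqoz code. It assumes my reading of the listing above: `lut!` takes ( long index -- ), `=>` means "greater than or equal", and endPC points just past the last instruction. The names `hub`, `lut` and `mov_to_lut` are made up for this illustration.

```python
# Host-side model of mov2Lut (illustration only, not Taqoz code).
LUT_EXEC_BASE = 512  # the "512 +" in the listing: LUT code executes at $200 and up


def mov_to_lut(hub, lut, start_pc, end_pc, lut_pc):
    """Copy the longs at HUB byte addresses start_pc .. end_pc-1 into LUT.

    hub: dict byte address -> 32-bit long (hypothetical stand-in for HUB RAM)
    lut: list of 512 longs (hypothetical stand-in for LUT RAM)
    Returns the next free LUT index.
    """
    addr = start_pc
    while True:              # begin ... until always runs the body once
        lut[lut_pc] = hub[addr]
        lut_pc += 1          # lutPC ++ : LUT is indexed in longs
        addr += 4            # 4 +      : HUB is addressed in bytes
        if addr >= end_pc:   # dup endPC @ => until
            break
    return lut_pc


# Three fake instruction longs assembled at HUB address $1000.
hub = {0x1000: 0x11111111, 0x1004: 0x22222222, 0x1008: 0x33333333}
lut = [0] * 512
first_slot = 168                       # initial lutPC from the listing
exec_addr = first_slot + LUT_EXEC_BASE # address stored in the word header
next_pc = mov_to_lut(hub, lut, 0x1000, 0x100C, first_slot)

print(exec_addr, next_pc)  # 680 171
```

The point of the model: the HUB side steps in bytes (4 per instruction) while the LUT side steps in longs (1 per instruction), and the header created before the code already points at LUT index + 512, which is where the routine ends up after the copy.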
Because the code is first assembled for its HUB RAM location and only then copied to LUT, you have to be careful with absolute and relative jumps: their targets were calculated for the HUB location.
I recommend using the version _BOOT_P2.BIX from Taqoz.zip at https://sourceforge.net/projects/tachyon-forth/files/TAQOZ/binaries/ .
In a small Fibonacci benchmark, execution from LUT is more than twice as fast as execution from HUB:
TAQOZ# 46 lap fiboAsm lap .lap --- 1,184 cycles= 5,920ns @200MHz ok
TAQOZ# 46 lap fiboLutAsm lap .lap --- 488 cycles= 2,440ns @200MHz ok
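For anyone who wants to check the numbers, a few lines of Python on the PC reproduce the conversion from cycles to nanoseconds and the resulting speed ratio. This is just arithmetic on the two .lap results above. The helper name `cycles_to_ns` is made up for this sketch.

```python
CLOCK_HZ = 200_000_000  # 200 MHz, as printed by .lap


def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to nanoseconds at the given clock."""
    return cycles * 1_000_000_000 // clock_hz


hub_cycles, lut_cycles = 1184, 488   # fiboAsm (HUB) and fiboLutAsm (LUT)
hub_ns = cycles_to_ns(hub_cycles)
lut_ns = cycles_to_ns(lut_cycles)
speedup = hub_cycles / lut_cycles

print(hub_ns, lut_ns, round(speedup, 2))  # 5920 2440 2.43
```

So for this routine the LUT version needs roughly 41 % of the cycles of the HUB version.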
Have fun! Christof