PropBASIC LMM JumpTable

For my latest PropBASIC project (a VGA, sound, and controller interface for the ZX81) I needed a jump table for some LMM code.
With a little inline assembly, I got it working.
This is for LMM code ONLY!!!
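regIndex = regIndex MAX 34 ' Only 35 entries in jump table (indexes 0 to 34)
' Jump Table
ASM
MOV __param1,regIndex ' Work on a copy so regIndex is left unchanged
ADD __param1,#1 ' Skip RDLONG instruction
SHL __param1,#2 ' 4 bytes per long
ADD __param1,__PC ' __PC points at the RDLONG below at this moment
RDLONG __PC,__param1 ' Load the table entry into __PC, which performs the jump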
LONG @@@LSB0
LONG @@@LSB1
LONG @@@LSB2
LONG @@@LSB3
LONG @@@LSB4
LONG @@@LSB5
LONG @@@LSB6
LONG @@@LSB7
LONG @@@LSB8
LONG @@@LSB9
LONG @@@LSB10
LONG @@@LSB11
LONG @@@LSB12
LONG @@@LSB13
LONG @@@LSB14
LONG @@@LSB15
LONG @@@LSB16
LONG @@@LSB17
LONG @@@LSB18
LONG @@@LSB19
LONG @@@LSB20
LONG @@@LSB21
LONG @@@LSB22
LONG @@@LSB23
LONG @@@LSB24
LONG @@@LSB25
LONG @@@LSB26
LONG @@@LSB27
LONG @@@LSB28
LONG @@@LSB29
LONG @@@LSB30
LONG @@@LSB31
LONG @@@LSB32
LONG @@@LSB33
LONG @@@LSBHandled
ENDASM
!!! NOTICE The three @ characters in each LONG line must be written together with no spaces between them (@@@). The forum doesn't display them correctly. !!!
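For anyone who wants to check the address arithmetic without a Propeller at hand, here is a rough Python model of the five instructions above. It is only a sketch: the hub addresses (RDLONG_ADDR, HANDLER_BASE, HANDLER_STRIDE) and the helper names build_hub and jump_target are made up for illustration, and it assumes __PC already points at the RDLONG when the ADD executes, which is what the "Skip RDLONG instruction" comment implies.

```python
import struct

TABLE_ENTRIES = 35       # LSB0..LSB33 plus LSBHandled
RDLONG_ADDR = 0x100      # hypothetical hub address of the RDLONG instruction
HANDLER_BASE = 0x1000    # hypothetical hub address of the first handler
HANDLER_STRIDE = 0x10    # hypothetical spacing between handlers


def build_hub():
    """Build a fake hub RAM image with the jump table right after the RDLONG."""
    hub = bytearray(0x2000)
    for i in range(TABLE_ENTRIES):
        entry_addr = RDLONG_ADDR + 4 + 4 * i
        # The Propeller stores longs little-endian.
        struct.pack_into("<L", hub, entry_addr, HANDLER_BASE + HANDLER_STRIDE * i)
    return hub


def jump_target(hub, reg_index):
    """Mirror the inline assembly and return the new __PC value."""
    param1 = min(reg_index, TABLE_ENTRIES - 1)  # regIndex MAX 34
    param1 += 1                                 # ADD __param1,#1 (skip the RDLONG)
    param1 <<= 2                                # SHL __param1,#2 (4 bytes per long)
    param1 += RDLONG_ADDR                       # ADD __param1,__PC
    (new_pc,) = struct.unpack_from("<L", hub, param1)  # RDLONG __PC,__param1
    return new_pc


if __name__ == "__main__":
    hub = build_hub()
    for index in (0, 1, 34, 200):
        print(index, hex(jump_target(hub, index)))
```

An out-of-range index such as 200 is clamped to 34, so it lands on the last entry (LSBHandled in the real table).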
Bean
Comments
Bean
Basically, the main LMM loop stays the same, but the LMM_JUMP routine that handles branches checks for short backward branches. If it finds one, it loads the LMM instructions from the destination up to the current jump into a COG cache area and jumps into the cache, where the code can run at full COG speed.
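To make the decision inside LMM_JUMP a bit more concrete, here is a small Python sketch of the check described above. It is not the attached source: plan_jump and CACHE_LONGS are hypothetical names, the 128-long limit simply reuses the cache size quoted with the benchmark figures, and counting the jump instruction itself as part of the cached span is my assumption.

```python
CACHE_LONGS = 128  # assumed COG cache size, in instructions (longs)


def plan_jump(jump_addr, dest_addr, cache_longs=CACHE_LONGS):
    """Decide how a branch at hub address jump_addr to dest_addr is handled.

    Returns ("cache", start_addr, count) when the loop body fits in the cache,
    otherwise ("hub", dest_addr, 0) for an ordinary LMM jump.
    """
    if dest_addr > jump_addr:
        # Forward branch: nothing to cache, keep executing from hub memory.
        return ("hub", dest_addr, 0)
    # Instructions from the destination through the jump itself (assumption).
    count = (jump_addr - dest_addr) // 4 + 1
    if count <= cache_longs:
        return ("cache", dest_addr, count)
    # Backward branch, but the loop body is too long for the cache.
    return ("hub", dest_addr, 0)


if __name__ == "__main__":
    print(plan_jump(0x140, 0x100))   # short backward branch
    print(plan_jump(0x100, 0x140))   # forward branch
    print(plan_jump(0x1000, 0x100))  # long backward branch
```

In the real kernel the "cache" case would copy that many longs into COG memory and then jump there; the sketch only models the decision.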
I haven't tried it with PropBASIC specifically, but it's a general LMM feature so it should work with any compiler. The current source code (as tested in fastspin) is attached.
Performance of Heater's fftbench benchmark (Spin version, compiled with fastspin):
plain LMM: 233538 us
unrolled LMM: 154896 us
auto-cache LMM: 58332 us (128 instruction COG cache)
compiler FCACHE: 63019 us (128 instruction COG cache)
In this particular case the auto-cache LMM beats the compiler. With smaller COG caches the compiler FCACHE does better, if I remember my testing correctly. I've still left the compiler FCACHE as the default in fastspin, since I've already done the compiler work for it, but for future projects I may switch to auto-cache LMM instead.
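To put the fftbench figures in relative terms, a few lines of Python are enough. The timings are copied from the numbers above; the speedup helper is just an illustrative name.

```python
# Timings in microseconds, copied from the fftbench figures above.
timings_us = {
    "plain LMM": 233538,
    "unrolled LMM": 154896,
    "auto-cache LMM": 58332,
    "compiler FCACHE": 63019,
}


def speedup(baseline, variant):
    """How many times faster `variant` is than `baseline`."""
    return timings_us[baseline] / timings_us[variant]


if __name__ == "__main__":
    for name in timings_us:
        print(f"{name:16s} {speedup('plain LMM', name):.2f}x vs plain LMM")
```

By this measure the auto-cache LMM is about 4.0 times faster than plain LMM and roughly 8% faster than the compiler FCACHE at this cache size.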