Taking a fresh look at Tachyon since I first implemented it in 2012, I got to thinking: what would happen if I used 16 bits instead of 8 bits for the opcode? I know it takes up twice as much memory for the simpler functions, and as soon as I encode a literal it takes up 4 bytes. Originally even high-level calls were only 2 bytes long since they used a vector table, but V3 cut back on the preset table so many calls are now 3 bytes anyway.
Nonetheless I started playing with some code so that a 16-bit opcode could still efficiently call PASM code while only wasting one extra cycle. Addresses above the usable cog range of 0..$1C0(ish) would automatically call a high-level function, which means it would push the IP (instruction pointer) onto the return stack and then replace it with the new address. This is good as I don't need to waste a 16-bit opcode plus address to tell it to call a function, and it seems a bit like a conventional 16-bit address Forth.
But it's an opcode interpreter and not a dumb address interpreter, so any value above hub RAM (that is, with bit 15 set) is interpreted as a 15-bit literal and pushed onto the stack. That's great, and it's even more compact and efficient than bytecode because there is now no need to use up cog code for fast constants either (in addition to eliminating the absolute and vectored CALL code).
0018(0000) 0C 16 BC 04 | doNEXT rdword instr,IP 'read word code instruction
001C(0001) 02 18 FC 81 | add IP,#2 wc 'advance IP to next wordcode (clears the carry too!)
0020(0002) 09 16 7C 2A | shr instr,#9 nr,wz ' cog or hub?
0024(0003) 0B 00 28 5C | if_z jmp instr 'execute the code by directly indexing the first 512 longs in cog
0028(0004) 0F 16 7C 2A | shr instr,#15 nr,wz ' embedded 15-bit literal?
002C(0005) 09 00 54 5C | if_nz jmp #PUSH15 ' push this literal without having to do a call/return
0030(0006) 0A 14 FC 5C | call #SAVEIP ' otherwise this is an address of high-level word code
0034(0007) 0B 18 BC A0 | mov IP,instr ' so after saving the IP, load it with new address
0038(0008) 00 00 7C 5C | jmp #doNEXT
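For anyone who doesn't read PASM, the dispatch in the listing can be modelled on a PC in Python. This is only a sketch under assumptions: hub memory is a bytearray, the stacks are Python lists, and the names `rdword`, `step`, `dstack` and `rstack` are mine, not Tachyon's. PUSH15 is not shown in the listing, so I assume it keeps the low 15 bits.

```python
# Host-side model of the doNEXT loop in the listing (a sketch, not Tachyon source).
# Assumptions: little-endian hub words, list-based stacks, hypothetical names.

def rdword(hub, addr):
    """Read a 16-bit little-endian word from hub memory, like PASM rdword."""
    return hub[addr] | (hub[addr + 1] << 8)

def step(hub, ip, dstack, rstack):
    """Execute one wordcode and return (new_ip, cog_address_or_None)."""
    instr = rdword(hub, ip)
    ip += 2                              # add IP,#2
    if instr >> 9 == 0:                  # shr instr,#9 nr,wz : below 512, so cog code
        return ip, instr                 # the cog would jump to this address
    if instr >> 15 != 0:                 # shr instr,#15 nr,wz : bit 15 set, so literal
        dstack.append(instr & 0x7FFF)    # PUSH15 (assumed to keep the low 15 bits)
        return ip, None
    rstack.append(ip)                    # SAVEIP: remember where to return to
    return instr, None                   # mov IP,instr : continue in the called word
```

Feeding it a literal, then a call, then a cog opcode shows the three paths: the literal lands on the data stack, the call leaves the old IP on the return stack, and the cog opcode comes back as an address to jump to.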
There is also bit 0, which is not required for high-level hub calls as all hub code is word aligned, but I may use it to tell the code interpreter not to bother pushing the IP onto the return stack, in effect causing it to jump.
Runtime branch optimization is already implemented in Tachyon as a branch stack for DO/LOOP and FOR/NEXT. BEGIN was tried that way too but left out due to cog memory constraints. So I could reimplement this, at least for BEGIN, so that it is actually a compiled codeword that pushes the IP onto the branch stack, which is then used by WHILE/REPEAT/UNTIL/AGAIN.
One thing that bothers me is that an IF/ELSE branch at present requires one bytecode plus one displacement byte, but with DAWN that could double the memory usage.
!Idea - since the top of hub RAM is needed for buffers we wouldn't be calling code there, so how about I encode $7Fxx as an IF plus forward displacement and do similar for ELSE as $7Exx? Maybe I could do the same for BEGIN loops, which have a negative displacement, so that UNTIL is $7Dxx and REPEAT/AGAIN is $7Cxx. If I make the displacement signed and word aligned I just need two "opcodes". Wow! Way better now.
Also, any literal > $7FFF needs to be encoded as 32 bits plus the opcode word = 6 bytes total (vs 3-5 in bytecode). I may just have to have -1 as a unique opcode. The compiler can still make use of double-opcode macros too (e.g. NIP=SWAP,DROP).
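To make the branch-stack idea concrete, here is a rough Python model of BEGIN with UNTIL and AGAIN. It is a sketch only: the function names and the list used as a branch stack are hypothetical, and WHILE/REPEAT are left out.

```python
# Sketch of BEGIN ... UNTIL / AGAIN using a branch stack (hypothetical names).

def do_begin(ip, bstack):
    """BEGIN: ip already points past the BEGIN wordcode, so save it as the loop start."""
    bstack.append(ip)
    return ip

def do_until(ip, dstack, bstack):
    """UNTIL: loop back while the flag is zero, otherwise leave the loop."""
    flag = dstack.pop()
    if flag == 0:
        return bstack[-1]    # back to the saved loop start, no displacement needed
    bstack.pop()             # loop finished, so discard the loop start
    return ip

def do_again(ip, bstack):
    """AGAIN: always loop back to the saved loop start."""
    return bstack[-1]
```

The point of the design is that the loop words carry no address or displacement at all; the cost is the cog code and stack space needed for the branch stack.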
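A quick Python sketch of how the two-opcode form might be encoded and decoded. The $7Exx (conditional) and $7Fxx (unconditional) assignment follows the opcode map later in this post; everything else is an assumption on my part, in particular that the displacement is an 8-bit signed word count taken relative to the wordcode after the branch, and the function names are hypothetical.

```python
COND_JUMP = 0x7E00   # conditional relative jump
JUMP = 0x7F00        # unconditional relative jump

def encode_branch(base, next_ip, target):
    """Build a branch wordcode; next_ip is the address of the wordcode after the branch."""
    delta = target - next_ip
    if delta % 2:
        raise ValueError("targets must be word aligned")
    words = delta // 2
    if not -128 <= words <= 127:
        raise ValueError("displacement does not fit in one wordcode")
    return base | (words & 0xFF)

def decode_branch(opcode, next_ip):
    """Return the branch target for a $7Exx/$7Fxx wordcode."""
    words = opcode & 0xFF
    if words & 0x80:
        words -= 0x100           # sign-extend the 8-bit displacement
    return next_ip + 2 * words
```

Under these assumptions a single wordcode reaches -128..+127 words, and the same two opcodes serve IF/ELSE (positive displacement) and UNTIL/REPEAT/AGAIN (negative displacement).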
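The size trade-off can be sketched as a compiler helper in Python. The opcode values `LONGLIT` and `MINUS1` are placeholders I made up for illustration (no cog addresses have been assigned for them), and `compile_literal` is a hypothetical name.

```python
import struct

LONGLIT = 0x0010   # placeholder cog address of a "fetch 32-bit literal" routine
MINUS1 = 0x0011    # placeholder cog address of a routine that pushes -1

def compile_literal(n):
    """Return the bytes a compiler might emit for the 32-bit literal n."""
    n &= 0xFFFFFFFF                             # treat n as 32-bit two's complement
    if n <= 0x7FFF:
        return struct.pack("<H", 0x8000 | n)    # embedded 15-bit literal, 2 bytes
    if n == 0xFFFFFFFF:
        return struct.pack("<H", MINUS1)        # dedicated -1 opcode, 2 bytes
    return struct.pack("<HI", LONGLIT, n)       # opcode word + 32-bit value, 6 bytes
```

So anything up to $7FFF, which covers every hub address, costs 2 bytes, and only larger values pay the 6-byte price.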
$0000..$01FF Directly call PASM cog code at this address
$0200..$7BFE Call wordcode subroutine (push IP)
$7E00..$7EFF Conditional relative +/- word jump
$7F00..$7FFF Relative +/- word jump
$8000..$FFFF Embedded 15-bit unsigned literal
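The map above translates directly into a small classifier, shown here in Python as something a host-side disassembler could use. It is a sketch: the function name and the result labels are mine, and any value not listed in the map is simply reported as unassigned.

```python
def classify(op):
    """Classify a 16-bit wordcode according to the opcode map."""
    if op <= 0x01FF:
        return ("cog", op)                  # direct call of PASM cog code
    if op <= 0x7BFE:
        return ("call", op)                 # call wordcode subroutine (push IP)
    if 0x7E00 <= op <= 0x7EFF:
        return ("branch_if", op & 0xFF)     # conditional relative jump, raw displacement
    if 0x7F00 <= op <= 0x7FFF:
        return ("branch", op & 0xFF)        # relative jump, raw displacement
    if op >= 0x8000:
        return ("literal", op & 0x7FFF)     # embedded 15-bit unsigned literal
    return ("unassigned", op)               # not listed in the map
```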
Variables and constants will no longer sit in code space, where they required an aligned opcode and had to be called. Instead they will simply be compiled as an embedded or long literal using only the information found in the header, which is a record in the dictionary. Variables are mostly read, and before use they need an appropriate fetch (C@, W@, @), although with the lexing of source code we could take a variable such as myvar and simply say myvar@ or something similar for the correct code to be compiled. Same again for myvar! too.
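Here is roughly how "code-less" variables could compile, sketched in Python. The `Header` record, its field names and `compile_symbol` are hypothetical; the fetch words are the ones named above, and C!, W! and ! are the usual Forth store counterparts. Access words are left as names rather than opcodes since no opcode values exist yet.

```python
from dataclasses import dataclass

@dataclass
class Header:
    name: str
    address: int   # hub address of the variable's storage
    width: int     # size in bytes: 1, 2 or 4

FETCH = {1: "C@", 2: "W@", 4: "@"}
STORE = {1: "C!", 2: "W!", 4: "!"}

def compile_symbol(token, dictionary):
    """Translate myvar, myvar@ or myvar! into a literal plus an optional access word."""
    suffix = token[-1] if token[-1] in "@!" else ""
    hdr = dictionary[token[:-1] if suffix else token]
    code = [0x8000 | hdr.address]       # hub address as an embedded 15-bit literal
    if suffix == "@":
        code.append(FETCH[hdr.width])   # fetch of the width recorded in the header
    elif suffix == "!":
        code.append(STORE[hdr.width])   # store of the width recorded in the header
    return code
```

Nothing is called at runtime to reach the variable: the header supplies the address and width at compile time and the result is just a literal followed by the right access word.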
Here's just a very quick and incomplete rundown of changes and features -
Faster than bytecode - most operations do not need to read any additional hub code
Direct access to all cog code rather than having to page into the upper 256 longs.
More cog space to include critical functions (elimination of fast constants and call types etc)
Embedded 15-bit literals - handles all hub addresses in one operation
Elimination of call vector table
Fast internal data stack - still with top 4 registers directly addressable.
Separate dataspace (code/data/dictionary)
Symbols such as variables and constants are "code-less" and simply compile a literal inline without the need to call code.
All calls (or jumps) only ever consume a 16-bit opcode (vs bytecode+16-bit address)
Text input is "executed" character by character via a 128-word lookup table, rather than word by word as in earlier Tachyon (or line by line as in other Forths)
Numbers are built digit by digit rather than processed as a string. (expect floating point to be encoded too)
Syntactical preprocessing and optimization (embedded symbols/parenthesis/braces)
Length of variable encoded into header for correct width access and indexing.
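The character-by-character input handling and digit-by-digit number building from the list above can be illustrated with a toy Python lexer. It is a sketch under assumptions: decimal only, whitespace as the only delimiter, and the class and method names are hypothetical.

```python
class Lexer:
    """Toy lexer: every input character indexes a 128-entry handler table."""

    def __init__(self):
        self.number = 0    # value built so far from the digits seen
        self.digits = 0    # how many digits are in the current token
        self.word = ""     # characters of the current token
        self.out = []      # finished tokens
        self.table = [self.other] * 128        # one handler per 7-bit ASCII code
        for c in "0123456789":
            self.table[ord(c)] = self.digit
        for c in " \t\r\n":
            self.table[ord(c)] = self.delimiter

    def digit(self, ch):
        self.number = self.number * 10 + (ord(ch) - ord("0"))   # build as we go
        self.digits += 1
        self.word += ch

    def other(self, ch):
        self.word += ch

    def delimiter(self, ch):
        if self.word:
            is_number = self.digits == len(self.word)   # token was all digits
            self.out.append(self.number if is_number else self.word)
        self.number, self.digits, self.word = 0, 0, ""

    def feed(self, text):
        for ch in text:
            self.table[ord(ch) & 0x7F](ch)     # "execute" the character
        return self.out
```

By the time the delimiter arrives the number is already complete, so there is no separate string-to-number conversion pass.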
I've also been playing with directly handling C-style syntax; it's quite interesting what can be accomplished while being able to freely mix both styles (and interactively).
This is a work in progress, and I am not really starting from scratch as I have the whole Tachyon kernel to overhaul. However I will update this thread with my incremental results for those who are interested.