I've started working on the new Spin.
At first, it's going to be interpreted bytecode. In-line assembly will be allowed, though. Also, every assembly instruction, except branches and context-dependent stuff like ALTI and AUGS, has a procedure form in Spin, making all hardware functions readily accessible. It actually takes a huge load off Spin, by circumventing the need to re-dress lots of functions. If you look at the spreadsheet linked to in the "Prop2 FPGA Files!!!" thread, you can see how these procedures work. They will get people rapidly acquainted with how the actual instructions work. There are local bit variables, CF and ZF, which get used and updated by the PASM-instruction procedures.
For the run-time data and call stacks, the LUT will be used. The user can set how much of it is left available for his own use. This means that there is no need to declare a stack in hub space. It's now implied. The stack is limited to at most 512 longs, but that's going to be fine for programs that don't call themselves recursively.
To fetch code bytes, 'RDBYTE b,PTRA++' could be used, but I came up with a faster way of doing it, while keeping the FIFO free for the user:
' Get new block, then bytes via 'calld b_ret,b_call'

new_block       setq    #15
new_long        alts    block_long,#block_base  'get byte %xxxx00
                alts    block_long,#block_base  'get byte %xxxx01
                alts    block_long,#block_base  'get byte %xxxx10
                alts    block_long,#block_base  'get byte %xxxx11
                incmod  block_long,#15  wc      'another long?
        if_nc   jmp     #new_long
                add     block_addr,#64          'read next block

block_base      res     16
block_long      res     1
block_addr      res     1
b_call          res     1
b_ret           res     1
That only needs one instruction (a CALLD) per byte fetch, and it takes 12 clocks, plus another ~50 clocks every 64th byte. By comparison, the simple RDBYTE takes 9..24 clocks.
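To make the block-fetch idea and the clock arithmetic concrete, here is a rough behavioral model in Python. It is not PASM and not the actual interpreter code; the class name, the function name, and the byte-order assumption (low byte of each long first) are mine. It only mirrors what the listing shows: a 16-long (64-byte) buffer that is refilled when it runs out, with block_addr stepping by 64 each time.

```python
LONGS_PER_BLOCK = 16                      # matches 'block_base res 16'
BYTES_PER_BLOCK = LONGS_PER_BLOCK * 4     # matches 'add block_addr,#64'


class BlockByteFetcher:
    """Hypothetical model: hands out code bytes from a 64-byte local buffer."""

    def __init__(self, hub, start_addr=0):
        self.hub = hub                    # bytes object standing in for hub RAM
        self.block_addr = start_addr      # address of the next block to read
        self.block = b""                  # local copy of the current block
        self.pos = BYTES_PER_BLOCK        # "empty" so the first fetch refills
        self.refills = 0                  # how many block reads have happened

    def fetch(self):
        """Return the next byte, reading a fresh block every 64th byte."""
        if self.pos == BYTES_PER_BLOCK:
            end = self.block_addr + BYTES_PER_BLOCK
            self.block = self.hub[self.block_addr:end]
            self.block_addr = end
            self.pos = 0
            self.refills += 1
        value = self.block[self.pos]
        self.pos += 1
        return value


def average_clocks(per_byte=12, refill=50, block_bytes=BYTES_PER_BLOCK):
    """Average cost per byte, using the figures quoted in the post."""
    return per_byte + refill / block_bytes


if __name__ == "__main__":
    hub = bytes(range(192))
    fetcher = BlockByteFetcher(hub)
    first = [fetcher.fetch() for _ in range(130)]
    print(first[:4], fetcher.refills)     # [0, 1, 2, 3] 3
    print(average_clocks())               # 12.78125
```

With the quoted numbers, the refill overhead averages out to well under one clock per byte (50/64), so the fetch stays close to a flat 12 clocks, versus the 9..24 clock spread of RDBYTE.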