@cgracey said:
I am glad you are using the interpreter. So much thought and effort has gone into it. I think, initially, it was around 4KB. Now it's just a little over 6KB and does WAY more. I am always thinking about how it could be improved and how future silicon could make it faster.
Careful, you'll end up with something that looks a lot like an x86 processor
(in the sense that they optimize for being able to run a set of complex[citation needed] bytecodes really quickly)
I'd guess the biggest cycle contribution on P2 is just grabbing data from the stack in hub RAM at 9..16 cycles cost, which is sort of a general pain point for anything that can't easily be reformulated into using the FIFO or block transfers. The actual 6-cycle cost of entering the bytecode handler is probably negligible.
Maybe the trick would be to keep the "live" expression stack as a separate entity in cogRAM. Would have to have the compiler work around it if an expression tree ever gets deeper than 16 or so values...
Stephen Moraco had Claude Opus analyze it to make suggestions for how to make it more efficient. It discovered, by looking at the compiler, too, that NEXT/QUIT are always followed by a pop bytecode, so the NEXT/QUIT bytecodes could also do the pop, automatically. The thing is, those are seldom used and it's probably not worth blowing up the interpreter for, unless there is a very sympathetic way to do it.
Since when would there be a dedicated NEXT/QUIT bytecode? Aren't NEXT/QUIT just jumps (with appropriate pops to clear loop state in front of them)? Don't let yourself get brainrotted by the AI slop, you're too good for that.
Ha, yes, I think they are just jumps. I wasn't at my computer, so I was kind of making stuff up like AI does. The whole thing has gotten so complex that it's hard for me to remember all the details.
I have thought the same things about putting the stack in cog RAM, but the checks for overflows and underflows would take about as long as reading or writing the hub. So, I've kind of come to the conclusion that there needs to be some kind of stack virtualizer that works kind of like the FIFO. Your idea of having the compiler track stack depth is good, but I think we would need extra bytecodes to handle instances where the stack must be flushed or loaded.
I just remembered that the RDLUT/WRLUT instructions can take PTRA/PTRB expressions, just like RDLONG/WRLONG can. RDLUT takes 3 clocks, whereas RDLONG takes 9-16 clocks. WRLUT takes 2 clocks, whereas WRLONG takes 3-10 clocks. So, the LUT could be used as a faster stack, but then there'd be less room for code. Stack-based machines are easy to design. Working out variable lifetimes for good compilation is a lot more complicated.
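To make the comparison concrete, here is a rough PASM2 sketch of what a LUT-resident expression stack could look like, using the clock counts quoted above. This is only an illustration under my own assumptions: the register name `x` and the choice of PTRB as the LUT stack pointer are hypothetical, not taken from the actual interpreter.

```
' Hypothetical LUT-based expression stack, PTRB as stack pointer.
' Push: write x to LUT[PTRB], post-increment -- 2 clocks.
        wrlut   x, ptrb++
' Pop: pre-decrement, read LUT[PTRB] into x -- 3 clocks.
        rdlut   x, --ptrb

' The equivalent hub-RAM stack accesses, for comparison:
        wrlong  x, ptra++       ' 3-10 clocks
        rdlong  x, --ptra       ' 9-16 clocks
```

The trade-off is exactly as stated: each long of LUT used for the stack is a long unavailable for LUT-resident code or lookup data, since the LUT is only 512 longs per cog.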
Flexspin puts local variables in cog RAM, except when they are referenced by address; then they go on the stack.
As such, when writing code, I explicitly create local copies of globals when I know a function uses them repeatedly.
PS: I wouldn't know if the interpreter uses cog RAM automatically for locals. Correct me if so.