@cgracey said:
I am glad you are using the interpreter. So much thought and effort has gone into it. I think, initially, it was around 4KB. Now it's just a little over 6KB and does WAY more. I am always thinking about how it could be improved and how future silicon could make it faster.
Careful, you'll end up with something that looks a lot like an x86 processor (in the sense that they optimize for being able to run a set of complex[citation needed] bytecodes really quickly).
I'd guess the biggest cycle contribution on P2 is just grabbing data from the stack in hub RAM at a cost of 9..16 cycles, which is a general pain point for anything that can't easily be reformulated to use the FIFO or block transfers. The actual 6-cycle cost of entering the bytecode handler is probably negligible.
Maybe the trick would be to keep the "live" expression stack as a separate entity in cogRAM. Would have to have the compiler work around it if an expression tree ever gets deeper than 16 or so values...
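To make the "deeper than 16 or so values" condition concrete, here is a minimal sketch of how a compiler could measure the peak expression-stack depth of a tree. The tree representation, the function names, and the limit of 16 are assumptions for illustration, not anything taken from the Spin2 compiler.

```python
# Hypothetical expression-tree model: a leaf is a string, an operator node is
# a tuple (op, left, right). None of these names come from the Spin2 compiler.

COG_STACK_LIMIT = 16  # the "16 or so" figure from the post, used as an assumption


def stack_depth(node):
    """Peak number of live stack slots when the tree is evaluated
    left to right on a stack machine."""
    if not isinstance(node, tuple):
        return 1  # a leaf pushes exactly one value
    _op, left, right = node
    # While the right operand is evaluated, the left result is still live.
    return max(stack_depth(left), 1 + stack_depth(right))


def fits_in_cog_stack(node, limit=COG_STACK_LIMIT):
    """True if the whole expression can stay in the small cog-resident stack."""
    return stack_depth(node) <= limit


# a + (b * (c - d)): a, b, c and d are all live at the deepest point.
right_nested = ("+", "a", ("*", "b", ("-", "c", "d")))
# ((a + b) + c) + d: never more than two values live at once.
left_nested = ("+", ("+", ("+", "a", "b"), "c"), "d")

print(stack_depth(right_nested))  # 4
print(stack_depth(left_nested))   # 2
```

Only right-nested (or heavily parenthesized) expressions grow the depth, so under this model the fallback path would rarely be needed.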
Stephen Moraco had Claude Opus analyze it to make suggestions for how to make it more efficient. It discovered, by looking at the compiler, too, that NEXT/QUIT are always followed by a pop bytecode, so the NEXT/QUIT bytecodes could also do the pop, automatically. The thing is, those are seldom used and it's probably not worth blowing up the interpreter for, unless there is a very sympathetic way to do it.
Since when would there be a dedicated NEXT/QUIT bytecode? Aren't NEXT/QUIT just jumps (with appropriate pops to clear loop state in front of them)? Don't let yourself get brainrotted by the AI slop, you're too good for that.
Comments
Ha, yes, I think they are just jumps. I wasn't at my computer, so I was kind of making stuff up like AI does. The whole thing has gotten so complex that it's hard for me to remember all the details.
I have thought the same things about putting the stack in cog RAM, but the checks for overflows and underflows would take about as long as reading or writing the hub. So, I've come to the conclusion that there needs to be some kind of stack virtualizer that works like the FIFO. Your idea of having the compiler track stack depth is good, but I think we would need extra bytecodes to handle instances where the stack must be flushed or loaded.
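As a rough illustration of the trade-off described above, here is a toy software model of a small fast stack window that spills to, and refills from, a slower backing store. The class and its names are hypothetical; it models neither the P2 hardware nor the actual interpreter. It only shows where the overflow and underflow checks sit and how often the slow path is taken.

```python
class SpillingStack:
    """Toy model: a small fast window backed by a slower, unbounded store."""

    def __init__(self, window_size=16):
        self.window_size = window_size
        self.window = []        # stands in for cog RAM (fast)
        self.backing = []       # stands in for hub RAM (slow)
        self.slow_accesses = 0  # how many times the slow store was touched

    def push(self, value):
        if len(self.window) == self.window_size:     # overflow check
            self.backing.append(self.window.pop(0))  # spill the oldest entry
            self.slow_accesses += 1
        self.window.append(value)

    def pop(self):
        if not self.window:                          # underflow check
            if not self.backing:
                raise IndexError("pop from empty stack")
            self.window.append(self.backing.pop())   # refill one entry
            self.slow_accesses += 1
        return self.window.pop()


shallow = SpillingStack(window_size=4)
for v in range(3):
    shallow.push(v)
print([shallow.pop() for _ in range(3)], shallow.slow_accesses)

deep = SpillingStack(window_size=4)
for v in range(6):
    deep.push(v)
print([deep.pop() for _ in range(6)], deep.slow_accesses)
```

In this model the two checks run on every push and pop, which is the overhead the post worries about. A compiler that knows the depth in advance could skip them and emit explicit flush or load operations only where needed.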
Flexspin puts local variables in cogRAM, except when they are referenced; then they go on the stack.
As such, when writing code, I explicitly create local copies of globals when I know there is repeated use in a function.
PS: I wouldn't know if the interpreter uses cogRAM automatically for locals. Correct me if so.
I just remembered that the RDLUT/WRLUT instructions can take PTRA/PTRB expressions, just like RDLONG/WRLONG can. RDLUT takes 3 clocks, whereas RDLONG takes 9-16 clocks. WRLUT takes 2 clocks, whereas WRLONG takes 3-10 clocks. So, the LUT could be used as a faster stack, but then there'd be less room for code. Stack-based machines are easy to design. Working out variable lifetimes for good compilation is a lot more complicated.
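For a sense of scale, the clock counts quoted above can be added up for one push followed by one pop. This is only arithmetic on those figures; the table and function names are made up for the example, and it ignores every other instruction the interpreter executes.

```python
# Clock counts quoted in the post above, as (min, max) per instruction.
CLOCKS = {
    "RDLONG": (9, 16),
    "WRLONG": (3, 10),
    "RDLUT": (3, 3),
    "WRLUT": (2, 2),
}


def push_pop_cost(write_op, read_op, pairs=1):
    """Return (best, worst) clocks for `pairs` push+pop round trips."""
    w_lo, w_hi = CLOCKS[write_op]
    r_lo, r_hi = CLOCKS[read_op]
    return pairs * (w_lo + r_lo), pairs * (w_hi + r_hi)


hub = push_pop_cost("WRLONG", "RDLONG")
lut = push_pop_cost("WRLUT", "RDLUT")
print("hub stack:", hub)  # (12, 26)
print("LUT stack:", lut)  # (5, 5)
```

By these numbers a LUT-based stack would cost 5 clocks per round trip against 12 to 26 for the hub, before accounting for the code space given up in the LUT.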
Released version 0.55.1
Bugfix release to restore the resizable terminal window. As a "bonus", the window shows its width and height in characters while resizing.
Also updated the offscreen window check. I'm not sure if this fixes the issues with the window position (I can't reproduce them here), but it implements a slightly better check.
The update for PNut v55 is at a good point, but it needs some more testing.
Thank you, Marco. Just to show you what I'm seeing, this is the upper-left offset from my screen to the Spin Tools. I think I used the Maximize button yesterday -- could that be affecting what Spin Tools is seeing for size and position?
Ahhh.... yes, the maximized window... there is a subtle difference between Windows and Linux: for some reason Windows sets negative x and y for a maximized window, which makes the check believe the title bar is offscreen and reset the position.
Sorry, I need to do that better. Maybe I should also keep track of the maximized state, since it isn't restored exactly as it should be.
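A minimal sketch of the idea, assuming the saved settings hold the window bounds plus a maximized flag. Spin Tools itself is not written in Python, and every name, the margin, the default position, and the example coordinates here are illustrative assumptions rather than the real implementation.

```python
from dataclasses import dataclass


@dataclass
class WindowState:
    """Hypothetical saved window settings."""
    x: int
    y: int
    width: int
    height: int
    maximized: bool = False


def title_bar_on_screen(x, y, width, screen_w, screen_h, margin=32):
    """True if the top edge is inside the screen and at least `margin`
    pixels of the title bar's width are visible."""
    visible_left = max(x, 0)
    visible_right = min(x + width, screen_w)
    return 0 <= y < screen_h and visible_right - visible_left >= margin


def restore_position(saved, screen_w, screen_h, default=(100, 100)):
    """Return (x, y, maximized) to apply when reopening the window."""
    if saved.maximized:
        # Bounds recorded while maximized may be negative, so they are not
        # validated; the window is simply maximized again.
        return default[0], default[1], True
    if title_bar_on_screen(saved.x, saved.y, saved.width, screen_w, screen_h):
        return saved.x, saved.y, False
    return default[0], default[1], False  # really offscreen: reset


# Example values only: a maximized window reporting a small negative origin.
maximized = WindowState(-8, -8, 1936, 1056, maximized=True)
normal = WindowState(200, 150, 800, 600)
print(restore_position(maximized, 1920, 1080))
print(restore_position(normal, 1920, 1080))
```

Without the maximized flag, the same negative origin fails the title bar check and the position is reset, which matches the behavior described above.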
The PNut v55 update is nearly done. I need to do some more tests, and I think I can do a release over the weekend or on Monday. If that's not a big problem, I'll include the fix in the complete release.
As always, Marco, thank you for your efforts and for making Spin Tools better with every release.
Marco, how did it go with the new bytecodes?
Good so far. I have implemented the compressed bytecodes in all places. The most difficult were the post effects associated with bitfields, but I think I got them.
I have recompiled all the tests I have with PNut v55 and they all pass.
I need to do a few more tests to be sure I haven't missed something critical, and then it is done.
Okay. Sounds good! Thank you for doing this.