David Betz,
I think that ship already sailed with the self modifying and all the ALT* instructions. This is a neat code density trick, and with a little smarts an editor could show the executed snippets in a tooltip or some form of expanded view mode. Heck, we might even be able to let you edit the "expanded" form and have it auto-generate the condensed form.
Maybe I am biased, since I have to deal with templates, lambdas, and what not in C++ all the time, and those are equally hard (if not far worse in some cases) to parse and debug. I agree that it will make PASM2 a bit harder to read and debug when it has them, but I think the benefits far outweigh that cost.
In my opinion it is way worse than lambdas and templates. Having a bit mask determine which instructions are executed means you can't look at code and figure out what it does without finding every use of it and examine the bit mask. And if you modify the code you have to make sure all of the bit masks still work as intended or modify them. It's one of the ugliest features I've ever seen in an instruction set. Just my opinion of course.
Well, we can now get ultimate compactness AND highest speed:
skip ##%001111110111100 'execute jmp/rfbyte/add/rdlong/pusha
jmp #rw_mem
...then this executes, pieced together from the rw_mem code...
rfbyte m
add m,dbase
rdlong x,m
_ret_ pusha x 'only the four instructions we wanted, zero time overhead!
SKIP uses bits in D, LSB first, to selectively cancel instructions in the order they occur in memory, starting with the next instruction. It works through branches, too. And it doesn't just cancel instructions. In cog-exec mode, it actually increments the PC by 1..8, not just by 1, in order to get to the next instruction, saving time. It works in hub-exec mode by just cancelling instructions, always stepping by 4, to not reset the FIFO.
so skip takes the 15 bits in the above example as an immediate into D&S ? or generates an AUGD for skip patterns > 9 up to 32 bits/instructions?
so when the pattern is in a register we can really assemble it dynamically ... so we can write self modifying code with it ;-) ??
wow
The fact that SKIP continues to apply after a branch makes me a little nervous. I know that the above code example made use of it, but is that really necessary? From what I'm seeing it could have also been done with the SKIP being executed immediately before the instructions it applies to. True, this would require a little extra setup and a 2-clock penalty if the first instruction is skipped. But it also localizes what SKIP can affect. Similarly, imagine trying to troubleshoot code that doesn't seem to be executing because a previously called block happened to set a SKIP before returning.
What happens if there's a second skip when one is already in effect?
My guess is it simply reloads the skip queue, but it would be rare that one skip would be 'alive' when hitting the next ?
I think also no-more-ones clears everything too.
SKIP uses bits in D, LSB first, to selectively cancel instructions in the order they occur in memory, starting with the next instruction. It works through branches, too. And it doesn't just cancel instructions. In cog-exec mode, it actually increments the PC by 1..8, not just by 1, in order to get to the next instruction, saving time. It works in hub-exec mode by just cancelling instructions, always stepping by 4, to not reset the FIFO.
This is pretty cool ( but I would say that, as I've suggested a SKIP before )
Skip is also useful for streaming memory, as in XIP designs.
Condition codes can be used for some simplest skip effects, but this is more powerful.
Questions:
- Is there a 9-bit immediate version, or does it always cost 64b opcode ?
- Example shows Const, text says D, so I guess both #Immed, and Reg versions exist ?
- If there is a 9-bit version, does it make sense to nudge the 8-quanta, to 9 ?
- How does this behave crossing code boundaries ?
- Is there a SKIP #0 that cancels any queued Skip ?
I can also see a use in Libraries, where you have skip-based preambles, and skip-based returns for handling differing memory regions, and different types. The SKIP field is passed as a type param
I still think it might be better to just rearchitect the byte code VM instruction set to fit into COG/LUT RAM. Are you attempting to keep the byte codes the same as the ones used by the P1 VM?
I think the byte-codes are already changed for P2, as the capabilities are very different.
Usually the most-used byte code would pack into COG and LUT, but Hub exec allows the rare ones to reside in HUB.
The main issue with overflow into HUB, I see as management of HUB memory maps, so there is strong pressure to avoid this, but it is no longer a brick wall.
With a software based Spin, there was talk about a Spin compiler that extracted and packed the used calls only.
'Protecting users from themselves' is one chestnut to solve when taking that path
At the very least I would've done it by moving the skip value into a known memory location (or pushing to the stack) then calling the function, and have that function pull the skip register from the stack and apply it. As Seairth says, it would limit the scope. Alternately, if any RET instruction killed it, that would also suffice, as that would avoid the push/pop for the setup, but still limit the scope.
It looks clever but also a source of virtually unreadable code. Is it really faster than the one-routine-per-instruction implementation if you put those routines in hub memory? How about paring down the instruction set so you don't need so many cases?
You don't want those routines in the hub because the FIFO will always be slowing down the RDxxxx/WRxxxx instructions. This allows for smallest code, actually just the bare set of instructions needed to build any variant routine, and then a mask for the SKIP instruction.
I still think it might be better to just rearchitect the byte code VM instruction set to fit into COG/LUT RAM. Are you attempting to keep the byte codes the same as the ones used by the P1 VM?
The interpreter will fit in the cog and lut, no problem. The byte codes are new, compared to Spin1.
This is going to allow for some really obfuscated assembly code...
What happens if there's a second skip when one is already in effect?
There are techniques for handling that. I will show the interpreter, as it progresses. It's almost a new way of programming, where you throw all the instructions you'll need into a loose sequence, and then at runtime, skip those you don't want. You can come up with all kinds of permutations.
SKIP uses bits in D, LSB first, to selectively cancel instructions in the order they occur in memory, starting with the next instruction. It works through branches, too. And it doesn't just cancel instructions. In cog-exec mode, it actually increments the PC by 1..8, not just by 1, in order to get to the next instruction, saving time. It works in hub-exec mode by just cancelling instructions, always stepping by 4, to not reset the FIFO.
This is pretty cool ( but I would say that, as I've suggested a SKIP before )
Skip is also useful for streaming memory, as in XIP designs.
Condition codes can be used for some simplest skip effects, but this is more powerful.
Questions:
- Is there a 9-bit immediate version, or does it always cost 64b opcode ?
- Example shows Const, text says D, so I guess both #Immed, and Reg versions exist ?
- If there is a 9-bit version, does it make sense to nudge the 8-quanta, to 9 ?
- How does this behave crossing code boundaries ?
- Is there a SKIP #0 that cancels any queued Skip ?
I can also see a use in Libraries, where you have skip-based preambles, and skip-based returns for handling differing memory regions, and different types. The SKIP field is passed as a type param
SKIP {#}D
It's LSB-first, so leading 0's end the skipping in cases of 9-bit #D values.
SKIP #0 would end any skipping pattern.
This is maybe a lot simpler than I conveyed. And the really cool thing is that when executing from cog/lut, the PC is stepped by 1...8 to get to the next instruction to execute, making things really fly. In cases where steps greater than 8 are needed, a step of 8 will occur, but that instruction will be cancelled, taking two clocks. I'll see about increasing the max step size to 16. It would take more logic, but mainly more settling time.
What happens if there's a second skip when one is already in effect?
My guess is it simply reloads the skip queue, but it would be rare that one skip would be 'alive' when hitting the next ?
I think also no-more-ones clears everything too.
Something else: a branch is seen as just another skippable instruction. If your SKIP list allows a branch, the next bits above will apply to the instruction stream at the new address. This code is all protected from interrupts, until the SKIP queue has no more 1's in it. This is a whole new way to program.
It is a new way to program, and I think it needs a new way of presenting things to make it less confusing (especially when branches are tossed into the mix).
This is going to allow for some really obfuscated assembly code...
What happens if there's a second skip when one is already in effect?
There are techniques for handling that. I will show the interpreter, as it progresses. It's almost a new way of programming, where you throw all the instructions you'll need into a loose sequence, and then at runtime, skip those you don't want. You can come up with all kinds of permutations.
Any new SKIP overrides the current SKIP.
You may have created something even more to be avoided than GOTO! :-)
This is maybe a lot simpler than I conveyed. And the really cool thing is that when executing from cog/lut, the PC is stepped by 1...8 to get to the next instruction to execute, making things really fly. In cases where steps greater than 8 are needed, a step of 8 will occur, but that instruction will be cancelled, taking two clocks. I'll see about increasing the max step size to 16. It would take more logic, but mainly more settling time.
16 is worth checking, but if there is an opcode-change at 9, it would make sense to check a 9-base ?
- only slightly more logic than 8, and I think enables the more compact form, without an unexpected step ?
Something else: a branch is seen as just another skippable instruction. If your SKIP list allows a branch, the next bits above will apply to the instruction stream at the new address. This code is all protected from interrupts, until the SKIP queue has no more 1's in it. This is a whole new way to program.
How exactly is 'protected' applied - are interrupts disabled ? (sounds risky)
Does RET clear Skip ? - If not, a SKIP #0 could always be used, but that pushes up code size.
If SKIP is going to survive branches, then in the case of RET, you could reuse the _ret_ flag to modify SKIP behavior. I'd recommend RET cancels SKIP by default (the safe option), but can keep the skip active with %0000. Maybe alias as RETSKIP or something.
In my opinion it is way worse than lambdas and templates. Having a bit mask determine which instructions are executed means you can't look at code and figure out what it does without finding every use of it and examine the bit mask. And if you modify the code you have to make sure all of the bit masks still work as intended or modify them. It's one of the ugliest features I've ever seen in an instruction set. Just my opinion of course.
Random-bit patterns are unlikely to be used by a compiler, and the skip-list management would make it tricky to use in Libraries, tho I could see benefits for compact libraries.
I could see if-else block management, with some helpers in the tool flows. Compilers could possibly manage that, but assemblers would ideally need a means to generate the skip-mask based on labels.
' psuedo listing code of no-jumps if-else
if_z skip #%011111000 ' built from if-labels
if_nz skip #%000001111 ' built from else-labels
' 5 lines of If
' 4 lines of Else
You do NOT want programmers squinting at bit masks, and trying to count lines of code.
Is a double opcode, one or two skip bits ?
It is a new way to program, and I think it needs a new way of presenting things to make it less confusing (especially when branches are tossed into the mix).
This looks more like an ASM-only use, and the P2 Assembler support is already at the poor end of the scale.
There is a risk the P2 becomes an i860 ?
Funny you should say that. The i860 also came to my mind.
Has anyone tried to program that thing in assembler? I have not, but I did sit through an hour long presentation by an Intel guy about i860 assembly language programming when it was launched.
What with it's parallel streams of integer and float operations, written on the same source line, and it's pipelining that caused the results of one instruction to come out in the destination register of another instruction 4 lines down the code, it was obviously impossible to work with. The guy giving the presentation did not show how to reach max flops performance on the i860 and admitted that Intel itself did not know how to do it for the FFT case he was discussing. That's OK "The compilers will take care of it".
I could see if-else block management, with some helpers in the tool flows. Compilers could possibly manage that, but assemblers would ideally need a means to generate the skip-mask based on labels.
' psuedo listing code of no-jumps if-else
if_z skip #%011111000 ' built from if-labels
if_nz skip #%000001111 ' built from else-labels
' 5 lines of If
' 4 lines of Else
See? You already got it wrong! It should be:
' psuedo listing code of no-jumps if-else
if_z skip ##%1111000001 ' built from if-labels
if_nz skip #%11111 ' built from else-labels
' 5 lines of If
' 4 lines of Else
The LSB in the first skip could actually be either zero or one in this case, since the second skip is predicated by the if_nz. This variant might be a bit more clear, though:
if_z skip ##%1111000001 ' built from if-labels
skip #%11111 ' built from else-labels
' 5 lines of If
' 4 lines of Else
Note the use of AUGD. You would have to keep the total number of instructions in the if/else blocks to 8 to avoid that.
I could see if-else block management, with some helpers in the tool flows. Compilers could possibly manage that, but assemblers would ideally need a means to generate the skip-mask based on labels.
' psuedo listing code of no-jumps if-else
if_z skip #%011111000 ' built from if-labels
if_nz skip #%000001111 ' built from else-labels
' 5 lines of If
' 4 lines of Else
You do NOT want programmers squinting at bit masks, and trying to count lines of code.
Is a double opcode, one or two skip bits ?
Thinking some more on this, the above example 'skips a skip', so is more compact, but is harder to read.
Another more general and easier to read/general form could be like this
' psuedo listing code of no-jumps if-else, simpler serial design, auto-label mask generate.
if_nz skip skip_if_end ' uses Name_else, Name_end to auto-build skip masks
' 1-32 lines of If (< 9 lines is faster)
skip Skip_end ' ** PASM generated, if Skip_else present
Skip_else: ' optional Skip, not needed with no else body
' optional 1-32 lines of Else (< 9 lines is faster)
Skip_end: 'mandatory end label
Well, yes - I did not drill to LSB-first detail, but see already the other form I posted, using Assembler to fill-out common skip masks. That would be a more common method.
Another hard-to-find bug that will likely be caused with SKIP: accounting for implied AUGx instructions.
Yes, see my question above about 'Is a double opcode, one or two skip bits ?'
I think skip can only count 32b chunks, and so the labels are the best way to manage any double opcodes.
I see this as a similar problem to REP, where first versions had a complicated offset-line-count, but it was fixed to be a simpler
REP label
structure.
Random Skip patterns would always be possible, but would not be applied in most general code.
Only cases like Chip's example of complex-code-packing would use complex skips, and even those could have software helpers.
Comments
I second Roy's comments. This is a seriously great optimization.
We can present it in ways that help people.
No way this is a bad idea.
so when the pattern is in a register we can really assemble it dynamically ... so we can write self modifying code with it ;-) ??
wow
Good point, certainly, now the opcode modifies D, it needs a rename to reflect that.
My guess is it simply reloads the skip queue, but it would be rare that one skip would be 'alive' when hitting the next ?
I think also no-more-ones clears everything too.
This is pretty cool ( but I would say that, as I've suggested a SKIP before )
Skip is also useful for streaming memory, as in XIP designs.
Condition codes can be used for some simplest skip effects, but this is more powerful.
Questions:
- Is there a 9-bit immediate version, or does it always cost 64b opcode ?
- Example shows Const, text says D, so I guess both #Immed, and Reg versions exist ?
- If there is a 9-bit version, does it make sense to nudge the 8-quanta, to 9 ?
- How does this behave crossing code boundaries ?
- Is there a SKIP #0 that cancels any queued Skip ?
I can also see a use in Libraries, where you have skip-based preambles, and skip-based returns for handling differing memory regions, and different types. The SKIP field is passed as a type param
Usually the most-used byte code would pack into COG and LUT, but Hub exec allows the rare ones to reside in HUB.
The main issue with overflow into HUB, I see as management of HUB memory maps, so there is strong pressure to avoid this, but it is no longer a brick wall.
With a software based Spin, there was talk about a Spin compiler that extracted and packed the used calls only.
'Protecting users from themselves' is one chestnut to solve when taking that path
The interpreter will fit in the cog and lut, no problem. The byte codes are new, compared to Spin1.
There are techniques for handling that. I will show the interpreter, as it progresses. It's almost a new way of programming, where you throw all the instructions you'll need into a loose sequence, and then at runtime, skip those you don't want. You can come up with all kinds of permutations.
Any new SKIP overrides the current SKIP.
SKIP {#}D
It's LSB-first, so leading 0's end the skipping in cases of 9-bit #D values.
SKIP #0 would end any skipping pattern.
This is maybe a lot simpler than I conveyed. And the really cool thing is that when executing from cog/lut, the PC is stepped by 1...8 to get to the next instruction to execute, making things really fly. In cases where steps greater than 8 are needed, a step of 8 will occur, but that instruction will be cancelled, taking two clocks. I'll see about increasing the max step size to 16. It would take more logic, but mainly more settling time.
Something else: a branch is seen as just another skippable instruction. If your SKIP list allows a branch, the next bits above will apply to the instruction stream at the new address. This code is all protected from interrupts, until the SKIP queue has no more 1's in it. This is a whole new way to program.
- only slightly more logic than 8, and I think enables the more compact form, without an unexpected step ?
Does RET clear Skip ? - If not, a SKIP #0 could always be used, but that pushes up code size.
I could see if-else block management, with some helpers in the tool flows. Compilers could possibly manage that, but assemblers would ideally need a means to generate the skip-mask based on labels.
You do NOT want programmers squinting at bit masks, and trying to count lines of code.
Is a double opcode, one or two skip bits ?
This looks more like an ASM-only use, and the P2 Assembler support is already at the poor end of the scale.
There is a risk the P2 becomes an i860 ?
That's a bit of a stretch... i860... LMAO.
Has anyone tried to program that thing in assembler? I have not, but I did sit through an hour long presentation by an Intel guy about i860 assembly language programming when it was launched.
What with it's parallel streams of integer and float operations, written on the same source line, and it's pipelining that caused the results of one instruction to come out in the destination register of another instruction 4 lines down the code, it was obviously impossible to work with. The guy giving the presentation did not show how to reach max flops performance on the i860 and admitted that Intel itself did not know how to do it for the FFT case he was discussing. That's OK "The compilers will take care of it".
Yeah, sure. The chip flopped.
See? You already got it wrong! It should be:
The LSB in the first skip could actually be either zero or one in this case, since the second skip is predicated by the if_nz. This variant might be a bit more clear, though:
Note the use of AUGD. You would have to keep the total number of instructions in the if/else blocks to 8 to avoid that.
Thinking some more on this, the above example 'skips a skip', so is more compact, but is harder to read.
Another more general and easier to read/general form could be like this
Presumably, the AUGS would be executed, the OR would be skipped and the AND would be executed. Would the outstanding AUG apply to the AND?
Well, yes - I did not drill to LSB-first detail, but see already the other form I posted, using Assembler to fill-out common skip masks. That would be a more common method.
Yes, see my question above about 'Is a double opcode, one or two skip bits ?'
I think skip can only count 32b chunks, and so the labels are the best way to manage any double opcodes.
I see this as a similar problem to REP, where first versions had a complicated offset-line-count, but it was fixed to be a simpler
REP label
structure.
Random Skip patterns would always be possible, but would not be applied in most general code.
Only cases like Chip's example of complex-code-packing would use complex skips, and even those could have software helpers.
Good point, worth testing this type of case. It would not usually be deliberate code, but it is possible.
You are free to not use it.