_ret_ only works if the internal call-stack pointer is not at the bottom; otherwise it's ignored and is therefore a NOP.
So as long as you have not done a CALL in your code, the %0000 condition works as a NOP, and only inside subroutines is it a _ret_.
Andy
I could add a 3-bit counter for this purpose, but for the benefit, it might only cause more confusion. There would need to be a way to read it, in case something like an 'abort' was done and normal stack usage resumed, but at an offset. This stack is pure flops and muxes. Not much to it:
I think the interpreter can be made even faster by looking up a CALL address, JMPing to it, and then an implied RETurn brings execution back to the top of the loop. This will require pre-stuffing the stack with the top-of-loop address, so that infinite RETurns keep it returning to the top of the loop. When the stack pops, the top value gets copied down. Once pre-stuffed, it will forever deliver the same return address. Any intervening CALLs only temporarily affect the bottom levels.
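The pre-stuffing idea is easy to check with a toy model. This is only a Python sketch, not the actual hardware: the class name `HwStack` and the address `LOOP` are hypothetical, the depth of 8 is inferred from the 3-bit counter mentioned above, and it assumes that on a pop the deepest entry keeps its value while the others shift toward the pop end.

```python
class HwStack:
    """Toy model of a fixed-depth hardware call stack (assumed depth: 8)."""

    def __init__(self, depth=8):
        self.slots = [0] * depth          # slots[0] is the next value to pop

    def push(self, value):
        # Everything shifts one level deeper; the deepest entry is lost.
        self.slots = [value] + self.slots[:-1]

    def pop(self):
        value = self.slots[0]
        # Entries shift back; the deepest entry is copied, not cleared.
        self.slots = self.slots[1:] + [self.slots[-1]]
        return value


LOOP = 0x010                              # hypothetical top-of-loop address
stack = HwStack()
for _ in range(8):                        # pre-stuff every level
    stack.push(LOOP)

returns = [stack.pop() for _ in range(20)]   # far more RETs than levels
stack.push(0x123)                         # an intervening CALL...
nested = [stack.pop(), stack.pop()]       # ...is popped first, then LOOP again
```

In this model every one of the 20 pops yields `LOOP`, and a nested call only disturbs the stack until its own return has been popped.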
There's just something wrong with 0 not being NOP...
Hope you're not thinking about changing that...
NOP can be anything, but there is benefit in having Zeroed data (0x00) and blank Flash (0xff) as at least predictable operations.
In many CPUs, 0xff fetches as a simple (almost NOP) opcode until the PC wraps to the reset vector. That is tolerable, provided that the wrap to the reset vector is predictable.
What does a P2 do when it runs over the top?
(I guess in P2 there are 3 tops... TopOfCog, TopOfLUT, TopOfHub?)
Wouldn't it actually be better if zero was something like "software reset"? Do you really want code that wanders off into uninitialized memory to just keep running? Wouldn't it be better for an embedded system if it reset to a known state?
That's an excellent point. Possibly better would be to trigger a debug interrupt. Instead of a debugger, startup code could set the debug interrupt to call cogstop or whatever other behavior is desirable (cogatn a watchdog cog, for instance).
At which point, yeah, just make NOP an alias for an existing instruction.
There is a software/system standard, called IEC60730, that should be considered in P2 design steps.
It covers many of the above points.
Note items like "detecting access to unused ROM areas", "RAM parity error" (unclear whether that one is software or hardware), "RAM guard" and "SFR guard".
Other common items are things like being able to confirm a clock is present before you switch to it. Many parts also have a Missing Clock Detector reset watchdog, essentially a simple monostable that fires if no edges are seen after some tens of microseconds (usually enough microseconds to allow 32kHz and LF Osc clocking).
With two clocks already present, could a P2 add this fairly easily?
This might be why:
"From a hardware design point of view, unmapped areas of a bus are often designed to return zeroes; since the NOP slide behavior is often desirable, it gives a bias to coding it with the all-zeroes opcode."
I can't remember how fuse programming works, but that's one thing to worry about. You don't want to accidentally trigger fuse programming due to running garbage code or due to a brownout. In a brownout some instructions may work and others may not, so you wouldn't want to reach the fuse programming code because of a missed branch. You can search the literature for examples.
Okay, I made $00000000 into NOP. I agree that NOP should be a simple value. Nothing is simpler than 0.
This means that if anyone ever wrote this:
_RET_ ROR 0,0
...then nothing would happen.
It makes no sense to rotate a register by itself anyway, though it could act as some kind of state machine. That anyone would want to do that on register 0 and also do a _RET_ is an even more remote possibility. So, this should be a good NOP. Note that it needs to be $0000_0000, exactly. Any set bits in the low-order bits will make it something other than a NOP.
0 is also a common NOP value, and is what RAM usually clears to on init (relevant on parts that can run code from RAM).
That leaves blank code space, of 0xff. What exactly does P2 do now on landing in 0xff and then wrapping the PC?
Are we going to lose the code size gains or performance increase that you initially talked about?
Nope. 0 is still NOP, like it was before. That's all. Hopefully nobody ever has a bug because '_ret_ ror 0,0' isn't working. Chances are about zero.
$FFFFFFFF is AUGD #$7FFFFF. It will sail through a bunch of those without doing anything.
This interpreter takes at least 14 clocks per bytecode, but is faster than the one posted above in cases where more than one instruction must be executed, as no more branches are needed:
'
'
' Interpreter loop
'
                rep     #1,#8                   'pre-stuff stack with loop address
                push    #loop                   'all _ret_'s will jump to loop
loop            rfbyte  p                       'get bytecode
                altgb   p,#ibase                'ready to lookup byte
                getbyte x                       'lookup byte
                jmp     x                       'jump to snippet, _ret_ returns to loop
ibase           byte    con_n1, con_0,  con_1,  con_2,  con_3,  con_4,  con_7,  con_8
                byte    con_15, con_16, con_31, con_32, con_bp, con_bn, con_wp, con_wn
                byte    con_lg, con_bx
con_n1  _ret_   pusha   _FFFFFFFF               'constant -1
con_0   _ret_   pusha   #0
con_1   _ret_   pusha   #1
con_2   _ret_   pusha   #2
con_3   _ret_   pusha   #3
con_4   _ret_   pusha   #4
con_7   _ret_   pusha   #7
con_8   _ret_   pusha   #8
con_15  _ret_   pusha   #15
con_16  _ret_   pusha   #16
con_31  _ret_   pusha   #31
con_32  _ret_   pusha   #32
con_bp          rfbyte  x                       'positive byte constant
        _ret_   pusha   x
con_bn          rfbyte  x                       'negative byte constant
                or      x,_FFFFFF00             'set the upper 24 bits
        _ret_   pusha   x
con_wp          rfword  x                       'positive word constant
        _ret_   pusha   x
con_wn          rfword  x                       'negative word constant
                or      x,_FFFF0000             'set the upper 16 bits
        _ret_   pusha   x
con_lg          rflong  x                       'long constant
        _ret_   pusha   x
con_bx          rfbyte  y                       'decoded constant
                decod   x,y                     'x = 1 << y[4:0]
                testb   y,#5            wc      'bit 5 set: subtract 1
        if_c    sub     x,#1
                testb   y,#6            wc      'bit 6 set: invert
        if_c    not     x
        _ret_   pusha   x
It also only takes one byte per instruction for the address lookup, though the address is limited to $FF.
The LUT could be used as either an address table or an initial jump location. Then you'd get all the address bits, or faster single-instruction snippets.
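For anyone who doesn't have DECOD and TESTB in their head, here is what the con_bx snippet computes, written out as a Python sketch. The function name `con_bx` simply mirrors the label in the listing; the model assumes DECOD uses only the low five bits of y and that results wrap to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def con_bx(y):
    """Model of the con_bx snippet: build a constant from one descriptor byte."""
    x = 1 << (y & 0x1F)          # decod x,y      : x = 2^(y[4:0])
    if y & 0x20:                 # testb y,#5 wc  : bit 5 selects "minus one"
        x = (x - 1) & MASK32     # if_c sub x,#1
    if y & 0x40:                 # testb y,#6 wc  : bit 6 selects inversion
        x ^= MASK32              # if_c not x
    return x
```

So one descriptor byte can express powers of two, bit masks, and their inverses: in this model `con_bx(0x04)` is 16, `con_bx(0x24)` is 15, and `con_bx(0x44)` is $FFFFFFEF.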
..
That leaves blank code space, of 0xff - what exactly does P2 do now, on landing into 0xff, and then wrapping PC ?
$FFFFFFFF is AUGD #$7FFFFF. It will sail through a bunch of those without doing anything.
OK so far, but what happens next? Does the PC wrap to the reset opcode, and how does that change with a primed AUGD pending?
It depends what's in memory, I guess. A primed AUGD with all the bits set would append 23 ones above the next 9-bit #D.
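To make the 23 + 9 bit arithmetic concrete, here is a small Python sketch. The helper name `augmented_d` is hypothetical, and it assumes the augment value sits in the low 23 bits of the AUGD instruction word, which is consistent with $FFFFFFFF reading as AUGD #$7FFFFF.

```python
def augmented_d(aug_word, d9):
    """Combine a pending AUGD with the next instruction's 9-bit #D field."""
    upper23 = aug_word & 0x7FFFFF          # low 23 bits of the AUGD instruction
    return (upper23 << 9) | (d9 & 0x1FF)   # 23 + 9 = 32-bit immediate
```

With a run of $FFFFFFFF words pending, the next #D of 5 would therefore be seen as $FFFFFE05 rather than 5 under this model.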
So does the PC wrap from 512k to 00, or does the PC increment to 512k+1 while the memory wraps to read 00?
What if the P2 memory ends up a size other than 512k (since that will be based on final, routed die room)? What will the not-physically-present memory read as?
Can the opcode at 00 be anything, depending on the user's code?
That's just too much to think about. The programmer should never allow things to get out of control, anyway, and if they do get out of control, the degree to which things crash is mostly uncontrollable.
So what is the final outcome of the instruction set changes? Since $00000000 was one of the values for NOP before, is the only real change that the %0000 execution condition now means _RET_?
That's right.
Instead of $0xxxxxxx all being NOP's, now $00000000 is a dedicated NOP.
Condition-field value %0000 now means _RET_, which only RETurns if a branch is not also occurring because of the actual instruction.
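The rule as stated can be written down as a tiny classifier. This is a Python sketch only: the function name `classify` is hypothetical, and it assumes the condition field is the top four bits of the 32-bit instruction word, which is what "$0xxxxxxx" suggests.

```python
def classify(word):
    """Classify a 32-bit instruction word by the rule described above."""
    word &= 0xFFFFFFFF
    if word == 0x00000000:
        return "NOP"                 # only the exact all-zero word
    if (word >> 28) == 0b0000:
        return "_RET_"               # condition %0000: execute, then return
    return "other condition"
```

Note that `classify(0x00000001)` is no longer a NOP in this model: it is an instruction that executes with an implied return.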
Of course you cannot cover all possible crashes, but you can design in some protections.
Most serious MCU vendors already address IEC60730 coverage that I linked above (as just one example).
That is all about designing in simple protections, and they all do add up.
For many customers, if they do not see IEC60730 mentioned, they will simply choose another controller.
'too much to think about' can quickly become "I wish I had thought about that"
Thanks! I notice you've already updated the instruction spreadsheet.
Out of curiosity, is the trailing underline necessary for "_ret_"? Could it be just "_ret"? Actually, I'd prefer a different syntax altogether, but I realize it needs to be mutually exclusive of the "if_" predicates. Maybe "iret" (for immediate return or implied return), though that might be confused with return-from-interrupt. Or maybe just make it "ret" and change the actual instruction to "retc" (return from call, making it more consistent with the other RET instructions).
But it is not an immediate return when attached to a conditional instruction, which makes iret confusing.
Maybe cret for conditional return, if you really need a modifier.
Given the context, though, can ret alone not do, since ret + opcode is distinct from ret alone on a line?
It is placed in the conditional column, and where it tags a following opcode, the intent is clear.
I'd also be happy with clearer code and very slightly higher IQ in the tools, so that this is supported:
djnz x,#blank
ret
hsync
xcont m_bs,#0 'horizontal sync
xzero m_sn,#1
xcont m_bv,#0
ret
Macros already expand in assembler list files, so this is similar to that.
The result is code that is very easy to read and safe to edit, and it will be more tolerant of HLL-generated ASM.
I had thought about that approach too. It's certainly more natural. But it would also complicate the assembler, which I was trying to avoid. I guess the thing is that the current "_ret_" looks like a kludge, and I was hoping for something a bit more natural-looking.
I do hope there is some way of getting rid of the horribly ugly _underscores_.
Although I'm not keen on putting too many smarts into an assembler language either. Not past aliasing some instructions. If I write an instruction I should get an instruction.
I had thought about that approach too. It's certainly more natural. But it would also complicate the assembler....
Only very slightly..
When any RET is found, the assembler knows if it has a tag label.
If yes, it keeps the ret form, whilst if there are 2 linear opcodes, the ret flips to be a condition code patch on the prior opcode.
The LST file clearly shows what was done. It can even add a comment as such.
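As a rough illustration of how small that assembler pass could be, here is a Python sketch run over the hsync example from earlier in the thread. It is not how any existing P2 assembler works: the function name `fold_ret` and the (label, instruction) tuple format are hypothetical, and it simply refuses to fold onto an instruction that already carries an if_ condition, since the two are mutually exclusive.

```python
def fold_ret(lines):
    """lines: list of (label_or_None, instruction_text) in source order."""
    out = []
    for label, op in lines:
        prev = out[-1] if out else None
        can_fold = (
            op == "ret"
            and label is None                      # a labelled ret stays a real ret
            and prev is not None
            and not prev["ret"]                    # prior opcode not already tagged
            and prev["op"] != "ret"
            and not prev["op"].startswith("if_")   # condition slot already in use
        )
        if can_fold:
            prev["ret"] = True                     # patch the prior opcode's condition
            continue
        out.append({"label": label, "op": op, "ret": False})
    return [
        " ".join(p for p in (e["label"], "_ret_" if e["ret"] else None, e["op"]) if p)
        for e in out
    ]


source = [
    (None, "djnz x,#blank"),
    (None, "ret"),
    ("hsync", "xcont m_bs,#0"),
    (None, "xzero m_sn,#1"),
    (None, "xcont m_bv,#0"),
    (None, "ret"),
]
listing = fold_ret(source)
```

Both unlabelled rets disappear from the listing and reappear as a `_ret_` tag on the instruction before them, which is what the LST file would then show.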
So you never use macros then ?
Or include files ?
Or generic calls ?
Or conditional code ?
There are already many cases where any exact ASM source-line-to-bytes rule is broken.
Since the approach I suggest (which does get rid of the horribly ugly _underscores_) is not mutually exclusive, you are of course free to use the _ret_ form. A user taste choice.
Comments
I could add a 3-bit counter for this purpose, but for the benefit, it might only cause more confusion. There would need to be a way to read it, in case something like an 'abort' was done and normal stack usage resumed, but at an offset. This stack is pure flops and muxes. Not much to it:
I think the interpreter can be made even faster by looking up a CALL address, JMPing to it, and then an implied RETurn brings execution back to the top of the loop. This will require pre-stuffing the stack with the top-of-loop address, so that infinite RETurns keep it returning to the top of the loop. When the stack pops, the top value gets copied down. Once pre-stuffed, it will forever deliver the same return address. Any intervening CALLs only temporarily affect the bottom levels.
If it were something else, it would likely be >512 and then need an extra register to hold it, right?
In many CPUs, 0xff fetches as a simple (almost NOP) opcode until the PC wraps to the reset vector. That is tolerable, provided that the wrap to the reset vector is predictable.
What does a P2 do when it runs over the top?
(I guess in P2 there are 3 tops... TopOfCog, TopOfLUT, TopOfHub?)
See below.
There is a software/system standard, called IEC60730, that should be considered in P2 design steps.
It covers many of the above points.
Here is an example of the IEC60730 coverage a recent MCU release claims
http://www.lapis-semi.com/en/company/news/news2017/r201702_1.html
Note items like "detecting access to unused ROM areas", "RAM parity error" (unclear whether that one is software or hardware), "RAM guard" and "SFR guard".
Other common items are things like being able to confirm a clock is present before you switch to it. Many parts also have a Missing Clock Detector reset watchdog, essentially a simple monostable that fires if no edges are seen after some tens of microseconds (usually enough microseconds to allow 32kHz and LF Osc clocking).
With two clocks already present, could a P2 add this fairly easily?
See above, - A software reset, or watchdog, could occur anytime, how does the present stack clear manage that ?
Looks to me that 0 is pretty popular.
This might be why:
"From a hardware design point of view, unmapped areas of a bus are often designed to return zeroes; since the NOP slide behavior is often desirable, it gives a bias to coding it with the all-zeroes opcode."
whatever that means...
I'm in favor of this.
It had the same opcode as "if_never ROR 0,0". The important part was the "if_never". It just happened to be that ROR was instruction %0000000.
compiles to a list file like this