I'm good with how things are now, but would prefer it if REP worked in hubexec also. Having it silently fail is bad, in my opinion.
You know that REP in hub exec might provide ZERO increase in performance over DJNZ, as you'd often be waiting for the instruction FIFO to start reloading. It would be good just for cog exec and hub exec compatibility, though.
For 9-bit relative branches:
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.
This means the assembler needs to detect when the branch instruction (that has the 9-bit relative) and the target instruction are not aligned and throw an error.
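A minimal sketch of the decode and the assembler-side check described above, in Python for illustration. The function names are hypothetical, and measuring the offset from the instruction after the branch is an assumption, not something stated in the thread. The alignment test itself does not depend on that assumption, since adding 4 does not change the two low address bits.

```python
def sign_extend_9(field: int) -> int:
    """Interpret the low 9 bits of `field` as a signed two's-complement value."""
    field &= 0x1FF
    return field - 0x200 if field & 0x100 else field


def hub_branch_offset(field: int) -> int:
    """Byte offset in hub exec: sign-extend the 9-bit field, then shift left two bits."""
    return sign_extend_9(field) << 2


def encode_rel9(branch_addr: int, target_addr: int) -> int:
    """Return the 9-bit field for a hub exec relative branch, or raise ValueError.

    Assumption: the offset is relative to the instruction following the branch.
    """
    delta = target_addr - (branch_addr + 4)
    if delta % 4 != 0:
        # Branch and target do not share the same two low address bits.
        raise ValueError("branch and target are not mutually aligned")
    longs = delta // 4
    if not -256 <= longs <= 255:
        raise ValueError("target is out of range for a 9-bit relative branch")
    return longs & 0x1FF
```

For example, `encode_rel9(0x1000, 0x1008)` yields a field of 1, while a target of `0x1006` is rejected because the left shift can never produce an offset that is not a multiple of four.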
Yes, user and library code that was developed intending to "run in either" would need an align directive in the Assembler that would also enable checking.
There could be packer-features (db.dw etc) enabled at the same time too, if those help portability.
Maybe this directive is best on by default, and advanced users can disable it if they wish?
I don't think performance is the concern when running anything in hub exec mode.
In the case of REP, I think it's more about the fact that it's concise and self-documenting. If I want to do exactly 8 loops of a short set of instructions, rep is simpler and cleaner than setting up a temporary counter register with an 8 and using DJNZ.
However, I do think getting REP working in hub exec is low on the priority list. I'd much rather see smart pins before I see REP made universal.
No, this has nothing to do with making the code relocatable to cog/lut memory. Because of the left shift, the target instruction *must* be aligned to the branch instruction. But, because instruction alignment could change at any time, simply by doing the sort of thing Chip was demonstrating above with send_string, mutual instruction alignment must always be validated.
Chip,
Shouldn't
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.
be
In the case of hub exec, the 9 bits are sign-extended to 18 bits and then shifted left two bits for the offset.
Ah, I see your point, that this 9-bit Reloc is a super-set problem, but the align directive I mentioned would work for this for HUB-only, as well as for "run in either" code.
I would expect the assembler to always check that the target is valid and reachable, which covers being in range, and also to check that the 2 LSBs match on a 9-bit Reloc?
I think the Assembler can just emulate the REP in hubexec mode.
All you need is a reserved cog register for the loop counter. At the REP instruction, the Assembler sets the loop counter to the number of repeats; at the position where the REP ends, the Assembler inserts a DJNZ.
This may be much easier than finding a Verilog solution.
Andy
A good idea. The code size would change, but the source is the same.
If there is no spare register available, the process would need to generate an error.
That would make it possible to assemble the same PASM for hub or cog, but it wouldn't fix the problem of binary compatibility.
Correct, but it is better than a 'fail silently' alternative & it does allow single-source code.
Chip is looking into the Verilog, but it may prove too complex, making this a back-up idea.
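The assembler-side emulation suggested above could look roughly like the following Python sketch. Everything here is hypothetical: the tuple representation of instructions, the `("REP", body_len, count)` form, the generated label names, and the `emulate_rep` helper are assumptions made for illustration, not the real assembler's internals or the actual REP operand encoding.

```python
def emulate_rep(program, spare_regs):
    """Rewrite each ("REP", body_len, count) into MOV + body + DJNZ.

    `program` is a list of instruction tuples (a made-up representation).
    `spare_regs` lists cog registers the assembler may reserve as loop counters.
    """
    out = []
    label_id = 0
    i = 0
    while i < len(program):
        ins = program[i]
        if ins[0] != "REP":
            out.append(ins)
            i += 1
            continue
        _, body_len, count = ins
        if not spare_regs:
            # No register can be reserved, so the assembler must report an error.
            raise ValueError("no spare register available for REP emulation")
        body = program[i + 1 : i + 1 + body_len]
        if len(body) < body_len:
            raise ValueError("REP body runs past the end of the program")
        reg = spare_regs[0]
        label = f".rep{label_id}"
        label_id += 1
        out.append(("MOV", reg, f"#{count}"))    # load the loop counter
        out.append(("LABEL", label))             # start of the repeated block
        out.extend(body)
        out.append(("DJNZ", reg, f"#{label}"))   # decrement and loop back
        i += 1 + body_len
    return out
```

As noted in the replies, the source stays the same but the emitted code grows by two instructions per REP, so this gives single-source portability rather than binary compatibility.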
That's the same result, n'est-ce pas?
True.
But there are an extra 2 MSBs here which can be ignored. They could cause a bug when the Verilog gets converted to gates in the final chip; we just don't know. May as well get it correct now.
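The equivalence of the two descriptions can be checked exhaustively over all 512 field values. This is a small Python sketch, assuming the final offset is kept to 20 bits (an assumption implied by the "20 bits" wording but not spelled out in the thread); the helper names are made up for illustration.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a `from_bits`-wide value to `to_bits`, as an unsigned bit pattern."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        # Fill every bit above the original width with ones.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def offset_via_20(field: int) -> int:
    """Extend to 20 bits, shift left two, then drop the 2 bits that overflow."""
    return (sign_extend(field, 9, 20) << 2) & 0xFFFFF


def offset_via_18(field: int) -> int:
    """Extend to 18 bits, then shift left two; the result already fits in 20 bits."""
    return sign_extend(field, 9, 18) << 2


all_equal = all(offset_via_20(f) == offset_via_18(f) for f in range(512))
```

Both routes give the same 20-bit pattern. The 20-bit version simply carries two extra sign bits that are shifted out of range, which are the extra MSBs mentioned above.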
For 9-bit relative branches:
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.
This means the assembler needs to detect when the branch instruction (that has the 9-bit relative) and the target instruction are not aligned and throw an error.
Yep. It does that.
I'm going to add a new rule to the assembler that no relative jumps are allowed to go between cog/lut space and hub space. It's not that they wouldn't work - it just seems like really bad practice that could set someone up for a rude awakening when they someday start to relocate code at run time.
Disallowing seems blunt, if it does actually work, so maybe a warning is ok.
There may be places where this is useful ?
Certainly a generic JMP or CALL between memory segments could default to absolute, and use relative within a segment sub section.
If there are also over-rule opcodes like (eg) RCALL and ACALL, that gives the user control.
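The two positions here (a hard error versus a warning) can be sketched as a single check with a selectable policy. This is a hypothetical Python illustration: the space names, the `policy` parameter, and the `check_relative_jump` helper are assumptions, and the real assembler may classify addresses quite differently.

```python
COG_LUT_SPACES = {"cog", "lut"}
ALL_SPACES = COG_LUT_SPACES | {"hub"}


def check_relative_jump(src_space: str, dst_space: str, policy: str = "error"):
    """Flag a relative jump that crosses between cog/lut space and hub space.

    Returns None when the jump stays on one side of the boundary.
    With policy="warn" a warning string is returned instead of raising.
    """
    for space in (src_space, dst_space):
        if space not in ALL_SPACES:
            raise ValueError(f"unknown memory space: {space!r}")
    if policy not in ("error", "warn"):
        raise ValueError(f"unknown policy: {policy!r}")
    crosses = (src_space in COG_LUT_SPACES) != (dst_space in COG_LUT_SPACES)
    if not crosses:
        return None
    message = f"relative jump from {src_space} space to {dst_space} space"
    if policy == "error":
        raise ValueError(message)
    return "warning: " + message
```

A jump between cog and lut passes under either policy, since both sit on the same side of the boundary being discussed.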
Well, yes, that's how the Verilog was written. I just didn't think to explain it that way.
I don't like the idea of emulating REP in hubexec mode. In cog mode it's an instruction. In hubexec mode it's multiple instructions using an implicit memory location. I wouldn't mind a separate mnemonic that does what was suggested, emulating in hubexec mode or using the REP instruction in cog mode. Assembly macros do that sort of thing all the time. I subscribe to the principle of "least surprise" and this violates that.
? I'm not following ?
How do you propose "a separate mnemonic that does what was suggested, emulating in hubexec mode"
I'll get REP working in hub exec. I think I know what to do now and it's not that complex, at all. I just have to get the whole system running again after all these other changes, which are mainly in the assembler, and then I'll try getting REP going in hub exec.
On the other hand, how bad would it be if we got rid of REP altogether? (Yes, another radical thought. There are no sacred cows here!)
Using REP for tight polling loops is no longer as important now that we have interrupts. Using REP for tight output loops may also be less important with smart pins. Remember that REP was added long before either of these concepts were included. It may be that REP is no longer as important as it once was. I'm not saying that it's not useful, but just less useful than it originally was.
(Edit: okay, maybe there are sacred cows here. I just don't know that REP is one of them.)
We haven't got the performance that P2-Hot had.
Every cycle we can squeeze out of a tight loop helps.
Please don't take this the wrong way but maybe the current design is getting near the end in terms of increased performance based on choices that were made earlier.
REP isn't anything that can't be replaced with a couple of lines of code. Sure, it's a nice thought but if it doesn't fit with everything else... ditch it.
Who here wouldn't love to have a P2 in its current form without REP?
? - "replaced with a couple of lines of code" is already an admission the alternative is larger and slower.
It works now, why remove something that is smaller and faster ?
I've been thinking about this notion of having code be binary portable between hub and cog space. The more I dwell on it, the more I think it's a bad habit to encourage. I'm not quite sure how to express my thoughts on this, though. So, instead I'll just throw out stuff in semi-random order and hope that you all get the gist.
* Timing is not the same in cog and hub exec modes, except for very small snippets.
* For code that's good enough to run in hub exec mode, it seems highly unlikely that there'd be a reason to run it in a cog instead.
* For code that's tuned to run in a cog, it seems highly unlikely that you'd run the code in the hub.
* The notion of binary portability is one where the code is assembled in one context (hub or cog), but loaded into the other context at run time. Since the code could have been assembled in the intended context to start with, the only reason you wouldn't do so is because of some sort of runtime conditional. But in light of the fact that hub and cog code have different qualities, I don't know what such a conditional would be.
* Except for extremely contrived (and trivial) examples, I'm hard-pressed to find a counter-example.
Maybe I'm just not being imaginative enough, but I suspect I'm really just seeing something that will be a fringe feature. And when fringe features start to dictate language/processor design (e.g. relative addressing behavior), I get concerned that we are putting the cart before the horse (or whatever your favorite idiom is). I'm not saying we should change anything at this point. Just food for thought...
It works now, why remove something that is smaller and faster ?
Because that still requires an increase in size and complexity in the circuitry. As will adding support for it in hub exec mode.
Comments
Here's the latest instruction set. Only thing new is JMPREL.
You know that REP in hub exec might provide ZERO increase in performance over DJNZ, as you'd often be waiting for the instruction FIFO to start reloading. It would be good just for cog exec and hub exec compatibility, though.
I had to make it so it silently fails in hub exec by not engaging. Before that, it blew things sky-high.
I'll get REP working in hub exec. I think I know what to do now and it's not that complex, at all. I just have to get the whole system running again after all these other changes, which are mainly in the assembler, and then I'll try getting REP going in hub exec.
If so, might be worth the effort of making it work both ways...
Hope this all works out, would be really nice...
Please don't take this the wrong way but maybe the current design is getting near the end in terms of increased performance based on choices that were made earlier.
REP isn't anything that can't be replaced with a couple of lines of code. Sure, it's a nice thought but if it doesn't fit with everything else... ditch it.
Who here wouldn't love to have a P2 in its current form without REP?
Sandy
And Chip may well add it too.