The addressing conundrum

cgracey · 2015-10-02 02:34

I've got the new relative branch behavior and JMPREL implemented, but I still need to make some assembler changes.

Here's the latest instruction set. Only thing new is JMPREL.

instructions.txt

cgracey · 2015-10-02 02:37

Roy Eltham wrote: »

Chip,
+1 for JMPREL D

I'm good with how things are now, but would prefer it if REP worked in hubexec also. Having it silently fail is bad, in my opinion.

You know that REP in hub exec might provide ZERO increase in performance over DJNZ, as you'd often be waiting for the instruction FIFO to start reloading. It would be good just for cog exec and hub exec compatibility, though.

cgracey · 2015-10-02 02:45

Roy Eltham wrote: »

Chip,
+1 for JMPREL D

I'm good with how things are now, but would prefer it if REP worked in hubexec also. Having it silently fail is bad, in my opinion.

I had to make it so it silently fails in hub exec by not engaging. Before that, it blew things sky-high.

Seairth · 2015-10-02 02:48

cgracey wrote: »

For 9-bit relative branches:
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.

This means the assembler needs to detect when the branch instruction (that has the 9-bit relative) and the target instruction are not aligned and throw an error.
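The check described here can be sketched in a few lines. This is only an illustration, not the actual assembler: the function names are made up, and it assumes the offset is taken relative to the address of the instruction after the branch, which the thread does not state. The alignment test itself does not depend on that assumption, since adding 4 to the base leaves the two low bits unchanged.

```python
def encode_rel9(branch_addr, target_addr):
    """Return the 9-bit field for a hub-exec relative branch, or raise ValueError.

    Hypothetical helper: assumes byte addresses and an offset measured
    from the instruction following the branch (branch_addr + 4).
    """
    base = branch_addr + 4            # assumption, not confirmed in the thread
    delta = target_addr - base
    if delta % 4 != 0:
        # The hardware shifts the field left by two, so the two low
        # address bits of branch and target must match.
        raise ValueError("branch and target are not mutually aligned")
    longs = delta // 4
    if not -256 <= longs <= 255:
        raise ValueError("target is outside the 9-bit relative range")
    return longs & 0x1FF


def decode_rel9(field):
    """Sign-extend the 9-bit field and shift left two bits (byte offset)."""
    signed = field - 0x200 if field & 0x100 else field
    return signed << 2
```

For example, a branch at $1000 to a target at $1010 encodes as 3, while a branch at $1001 to the same target is rejected because the low two bits differ.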

jmg · 2015-10-02 02:55

Seairth wrote: »

cgracey wrote: »

For 9-bit relative branches:
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.

This means the assembler needs to detect when the branch instruction (that has the 9-bit relative) and the target instruction are not aligned and throw an error.

Yes, user and library code that was developed intending to "run in either" would need an align directive in the Assembler that would also enable checking.
There could be packer-features (db.dw etc) enabled at the same time too, if those help portability.

Maybe this directive is best on-by-default and advanced users can disable, if they wish ?

Seairth · 2015-10-02 02:56

cgracey wrote: »

Roy Eltham wrote: »

Chip,
+1 for JMPREL D

I'm good with how things are now, but would prefer it if REP worked in hubexec also. Having it silently fail is bad, in my opinion.

You know that REP in hub exec might provide ZERO increase in performance over DJNZ, as you'd often be waiting for the instruction FIFO to start reloading. It would be good just for cog exec and hub exec compatibility, though.

I don't think performance is the concern when running anything in hub exec mode.

In the case of REP, I think it's more about the fact that it's concise and self-documenting. If I want to do exactly 8 loops of a short set of instructions, rep is simpler and cleaner than setting up a temporary counter register with an 8 and using DJNZ.

However, I do think getting REP working in hub exec is low on the priority list. I'd much rather see smart pins before I see REP made universal.

Seairth · 2015-10-02 03:01

jmg wrote: »

Seairth wrote: »

cgracey wrote: »

For 9-bit relative branches:
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.

This means the assembler needs to detect when the branch instruction (that has the 9-bit relative) and the target instruction are not aligned and throw an error.

Yes, user and library code that was developed intending to "run in either" would need an align directive in the Assembler that would also enable checking.
There could be packer-features (db.dw etc) enabled at the same time too, if those help portability.

Maybe this directive is best on-by-default and advanced users can disable, if they wish ?

No, this has nothing to do with making the code relocatable to cog/lut memory. Because of the left shift, the target instruction *must* be aligned to the branch instruction. But, because instruction alignment could change at any time, simply by doing the sort of thing Chip was demonstrating above with send_string, mutual instruction alignment must always be validated.

Cluso99 · 2015-10-02 03:03

Chip,
Shouldn't
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.
be
In the case of hub exec, the 9 bits are sign-extended to 18 bits and then shifted left two bits for the offset.

Seairth · 2015-10-02 03:12

Cluso99 wrote: »

Chip,
Shouldn't
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.
be
In the case of hub exec, the 9 bits are sign-extended to 18 bits and then shifted left two bits for the offset.

That's the same result, n'est-ce pas?
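The equivalence can be checked exhaustively over all 512 field values. The sketch below assumes the resulting offset is truncated to a 20-bit address; the helper names are illustrative only.

```python
MASK20 = (1 << 20) - 1


def sext(value, from_bits, to_bits):
    """Sign-extend a from_bits-wide value and return it as a to_bits-wide pattern."""
    sign = 1 << (from_bits - 1)
    signed = (value & (sign - 1)) - (value & sign)
    return signed & ((1 << to_bits) - 1)


def offset_via_20(field):
    # Extend to 20 bits, shift left two, keep the low 20 bits
    # (the top two bits fall off the end).
    return (sext(field, 9, 20) << 2) & MASK20


def offset_via_18(field):
    # Extend to 18 bits, shift left two: exactly 20 bits, nothing discarded.
    return sext(field, 9, 18) << 2


# Every possible 9-bit field yields the same 20-bit offset either way.
same = all(offset_via_20(f) == offset_via_18(f) for f in range(512))
```

The two descriptions differ only in the two top bits that the 20-bit version computes and then drops.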

jmg · 2015-10-02 03:14

Seairth wrote: »

jmg wrote: »

Seairth wrote: »

cgracey wrote: »

For 9-bit relative branches:
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.

This means the assembler needs to detect when the branch instruction (that has the 9-bit relative) and the target instruction are not aligned and throw an error.

Yes, user and library code that was developed intending to "run in either" would need an align directive in the Assembler that would also enable checking.
There could be packer-features (db.dw etc) enabled at the same time too, if those help portability.

Maybe this directive is best on-by-default and advanced users can disable, if they wish ?

No, this has nothing to do with making the code relocatable to cog/lut memory. Because of the left shift, the target instruction *must* be aligned to the branch instruction. But, because instruction alignment could change at any time, simply by doing the sort of thing Chip was demonstrating above with send_string, mutual instruction alignment must always be validated.

Ah, I see your point, that this 9-bit Reloc is a super-set problem, but the align directive I mentioned would work for this for HUB-only, as well as for "run in either" code.

I would expect ASM would always check the target is valid & reachable, which covers inside range, and also would check the 2 LSBs match on 9-bit Reloc ?

Ariba · 2015-10-02 03:18

I think the Assembler can just emulate the REP in hubexec mode.
All you need is a reserved cog register for the loop counter. At the REP instruction the Assembler sets the loopcounter to the number of repeats, at the position where the REP ends, the Assembler inserts a DJNZ.

This may be much easier than finding a Verilog solution.

Andy
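A rough sketch of what such an assembler pass might look like, using a toy instruction list rather than real PASM source. Everything here is hypothetical: the tuple format, the ("REP", count, length) operand layout, the register name rep_cnt, and the generated label names are stand-ins, and nested REP blocks are not handled.

```python
def emulate_rep(program, counter_reg="rep_cnt"):
    """Rewrite each ("REP", count, length) into MOV + body + DJNZ.

    program is a list of tuples such as ("ADD", "x", "#1").
    counter_reg is the cog register reserved for the loop counter.
    """
    out = []
    i = 0
    label_id = 0
    while i < len(program):
        ins = program[i]
        if ins[0] == "REP":
            _, count, length = ins
            body = program[i + 1 : i + 1 + length]
            if len(body) < length:
                raise ValueError("REP block runs past end of program")
            label = f".rep{label_id}"
            label_id += 1
            out.append(("MOV", counter_reg, f"#{count}"))   # load loop counter
            out.append(("LABEL", label))                    # top of the loop
            out.extend(body)                                # original block
            out.append(("DJNZ", counter_reg, f"#{label}"))  # loop until zero
            i += 1 + length
        else:
            out.append(ins)
            i += 1
    return out


demo = emulate_rep([
    ("REP", 8, 2),
    ("ADD", "x", "#1"),
    ("SHL", "y", "#1"),
    ("NOP",),
])
```

In this toy form the one-instruction REP becomes two real instructions (MOV and DJNZ), so the emitted code is one instruction longer per REP block.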

jmg · 2015-10-02 03:30

Ariba wrote: »

I think the Assembler can just emulate the REP in hubexec mode.
All you need is a reserved cog register for the loop counter. At the REP instruction the Assembler sets the loopcounter to the number of repeats, at the position where the REP ends, the Assembler inserts a DJNZ.

This may be much easier than finding a Verilog solution.

A good idea. The code size would change, but the source is the same.
If there is no spare register available, the process would need to generate an error.

Electrodude · 2015-10-02 03:37

Ariba wrote: »

I think the Assembler can just emulate the REP in hubexec mode.
All you need is a reserved cog register for the loop counter. At the REP instruction the Assembler sets the loopcounter to the number of repeats, at the position where the REP ends, the Assembler inserts a DJNZ.

This may be much easier than finding a Verilog solution.

Andy

That would make it possible to assemble the same PASM for hub or cog, but it wouldn't fix the problem of binary compatibility.

jmg · 2015-10-02 03:44

Electrodude wrote: »

That would make it possible to assemble the same PASM for hub or cog, but it wouldn't fix the problem of binary compatibility.

Correct, but it is better than a 'fail silently' alternative & it does allow single-source code.
Chip is looking into the Verilog, but it may prove too complex, making this a back-up idea.

Cluso99 · 2015-10-02 03:49

Seairth wrote: »

Cluso99 wrote: »

Chip,
Shouldn't
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.
be
In the case of hub exec, the 9 bits are sign-extended to 18 bits and then shifted left two bits for the offset.

That's the same result, n'est-ce pas?

True.
But the 20-bit version carries an extra 2 MSBs that can be ignored. That could still cause a bug when the Verilog gets converted to gates in the final chip; we just don't know. May as well get it correct now.

cgracey · 2015-10-02 04:26

Seairth wrote: »

cgracey wrote: »

For 9-bit relative branches:
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.

This means the assembler needs to detect when the branch instruction (that has the 9-bit relative) and the target instruction are not aligned and throw an error.

Yep. It does that.

I'm going to add a new rule to the assembler that no relative jumps are allowed to go between cog/lut space and hub space. It's not that they wouldn't work - it just seems like really bad practice that could set someone up for a rude awakening when they someday start to relocate code at run time.
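As a rough illustration of such a rule, the sketch below rejects a relative jump whose source and target fall in different address spaces. The boundary value is an assumption (cog/lut below $400, hub at $400 and above) and is not given in this thread, so it is left as a parameter; the function names are hypothetical.

```python
HUB_START = 0x400  # assumption: cog/lut occupy addresses below this


def space_of(addr, hub_start=HUB_START):
    """Classify an address as cog/lut space or hub space."""
    return "hub" if addr >= hub_start else "cog/lut"


def check_relative_jump(branch_addr, target_addr, hub_start=HUB_START):
    """Raise ValueError if a relative jump crosses between cog/lut and hub."""
    src = space_of(branch_addr, hub_start)
    dst = space_of(target_addr, hub_start)
    if src != dst:
        raise ValueError(
            f"relative jump from {src} space to {dst} space is not allowed"
        )
```

A jump from $010 to $020 passes, as does one from $1000 to $1010, while a jump from $3F0 to $410 is rejected.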

jmg · 2015-10-02 04:37

cgracey wrote: »

Yep. It does that.

I'm going to add a new rule to the assembler that no relative jumps are allowed to go between cog/lut space and hub space. It's not that they wouldn't work - it just seems like really bad practice that could set someone up for a rude awakening when they someday start to relocate code at run time.

Disallowing seems blunt, if it does actually work, so maybe a warning is ok.

There may be places where this is useful ?

Certainly a generic JMP or CALL between memory segments could default to absolute, and use relative within a segment sub section.

If there are also override opcodes (e.g. RCALL and ACALL), that gives the user control.

cgracey · 2015-10-02 04:43

Cluso99 wrote: »

Chip,
Shouldn't
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.
be
In the case of hub exec, the 9 bits are sign-extended to 18 bits and then shifted left two bits for the offset.

Well, yes, that's how the Verilog was written. I just didn't think to explain it that way.

Mike Green · 2015-10-02 04:45

I don't like the idea of emulating REP in hubexec mode. In cog mode it's an instruction. In hubexec mode it's multiple instructions using an implicit memory location. I wouldn't mind a separate mnemonic that does what was suggested, emulating in hubexec mode or using the REP instruction in cog mode. Assembly macros do that sort of thing all the time. I subscribe to the principle of "least surprise" and this violates that.

jmg · 2015-10-02 04:56

Mike Green wrote: »

I don't like the idea of emulating REP in hubexec mode. In cog mode it's an instruction. In hubexec mode it's multiple instructions using an implicit memory location. I wouldn't mind a separate mnemonic that does what was suggested, emulating in hubexec mode or using the REP instruction in cog mode. Assembly macros do that sort of thing all the time. I subscribe to the principle of "least surprise" and this violates that.

? I'm not following ?
How do you propose "a separate mnemonic that does what was suggested, emulating in hubexec mode"

cgracey · 2015-10-02 05:07

Mike Green wrote: »

I don't like the idea of emulating REP in hubexec mode. In cog mode it's an instruction. In hubexec mode it's multiple instructions using an implicit memory location. I wouldn't mind a separate mnemonic that does what was suggested, emulating in hubexec mode or using the REP instruction in cog mode. Assembly macros do that sort of thing all the time. I subscribe to the principle of "least surprise" and this violates that.

I'll get REP working in hub exec. I think I know what to do now and it's not that complex, at all. I just have to get the whole system running again after all these other changes, which are mainly in the assembler, and then I'll try getting REP going in hub exec.

Rayman · 2015-10-02 12:35

Is REP the only thing that is now different between Hub and cog exec for binary compatibility?

If so, might be worth the effort of making it work both ways...

Hope this all works out, would be really nice...

Seairth · 2015-10-02 13:02

On the other hand, how bad would it be if we got rid of REP altogether? (yes, another radical thought. there are no sacred cows here!)

Using REP for tight polling loops is no longer as important now that we have interrupts. Using REP for tight output loops may also be less important with smart pins. Remember that REP was added long before either of these concepts was included. It may be that REP is no longer as important as it once was. I'm not saying that it's not useful, just that it's less useful than it originally was.

(Edit: okay, maybe there are sacred cows here. I just don't know that REP is one of them.)

potatohead · 2015-10-02 13:38

Well, not everyone will use an interrupt. They may also choose fast loops for walking data of various kinds.

ozpropdev · 2015-10-02 13:48

We haven't got the performance that P2-Hot had.
Every cycle we can squeeze out of a tight loop all helps.

Alexander (Sandy) Hapgood · 2015-10-02 18:13

ozpropdev wrote: »

We haven't got the performance that P2-Hot had.
Every cycle we can squeeze out of a tight loop all helps.

Please don't take this the wrong way but maybe the current design is getting near the end in terms of increased performance based on choices that were made earlier.

REP isn't anything that can't be replaced with a couple of lines of code. Sure, it's a nice thought but if it doesn't fit with everything else... ditch it.

Who here wouldn't love to have a P2 in its current form without REP?

Sandy

potatohead · 2015-10-02 19:58

Why ditch it? Works fine, and is not needed for HUB.

And Chip may well add it too.

jmg · 2015-10-02 20:15

Alexander (Sandy) Hapgood wrote: »

REP isn't anything that can't be replaced with a couple of lines of code. Sure, it's a nice thought but if it doesn't fit with everything else... ditch it.

? - "replaced with a couple of lines of code" is already an admission the alternative is larger and slower.
It works now, why remove something that is smaller and faster ?

Seairth · 2015-10-02 21:36

I've been thinking about this notion of having code be binary portable between hub and cog space. The more I dwell on it, the more I think it's a bad habit to encourage. I'm not quite sure how to express my thoughts on this, though. So, instead I'll just throw out stuff in semi-random order and hope that you all get the gist.

* Timing is not the same in cog and hub exec modes, except for very small snippets.
* For code that's good enough to run in hub exec mode, it seems highly unlikely that there'd be a reason to run it in a cog instead.
* For code that's tuned to run in a cog, it seems highly unlikely that you'd run the code in the hub.
* The notion of binary portability is one where the code is assembled in one context (hub or cog), but loaded into the other context at run time. Since the code could have been assembled in the intended context to start with, the only reason you wouldn't do so is because of some sort of runtime conditional. But in light of the fact that hub and cog code have different qualities, I don't know what such a conditional would be.
* Except for extremely contrived (and trivial) examples, I'm hard-pressed to find a counter-example.

Maybe I'm just not being imaginative enough, but I suspect I'm really just seeing something that will be a fringe feature. And when fringe features start to dictate language/processor design (e.g. relative addressing behavior), I get concerned that we are putting the cart before the horse (or whatever your favorite idiom is). I'm not saying we should change anything at this point. Just food for thought...

Seairth · 2015-10-02 21:40

jmg wrote: »

Alexander (Sandy) Hapgood wrote: »

REP isn't anything that can't be replaced with a couple of lines of code. Sure, it's a nice thought but if it doesn't fit with everything else... ditch it.

? - "replaced with a couple of lines of code" is already an admission the alternative is larger and slower.
It works now, why remove something that is smaller and faster ?

Because that still requires an increase in size and complexity in the circuitry. As will adding support for it in hub exec mode.
