...Again, I don't see any need to support unaligned hub execution...
Here is an example of why unaligned hub code is important:
call @send_string
db 13,13,"The time is ",0
mov val,hours
call @send_decimal2
call @send_string
db ':',0
mov val,minutes
call @send_decimal2
call @send_string
db " and the date is ",0
...
You can do things like that, which is way better than having to get pointers to data located elsewhere.
But you can also do it with immediates:
call #send_string
db 13,13,"The time is ",0
mov val,hours
call #send_decimal2
call #send_string
db ':',0
mov val,minutes
call #send_decimal2
call #send_string
db " and the date is ",0
...
Edit: I can't imagine ever coding that up that way. I would gather all the strings up into a data area and have labels to them that would get passed to send_string. Makes for simpler code that is easier to change and edit in the future.
That is the more common approach, and how HLL would do it, but Chip's code is much smaller and elegantly simple.
Yes that would be kind of cool but you could still do this even without byte addressing if you pad the text to the next long boundary.
That would need a variant of db*, to append packers, and then send_string code needs to be packer-aware, or use round-up for the return address patch.
Maybe even a RET variant that uses bytepointer as a param, and knows to do the round-up ?
* a db variant that would be useful here, could be
nstr 13,13,"The time is " // c String insert, pack append null
lstr 13,13,"The time is " // Pascal/Basic string insert, length prefix, pack append
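A minimal sketch of the padding idea, written in Python rather than PASM. `pack_nstr` is a hypothetical helper modeling what an `nstr`-style directive might emit (string bytes, a null terminator, then zero padding up to the next long boundary), and `round_up_long` models the return-address round-up. Neither name comes from a real tool.

```python
def round_up_long(addr: int) -> int:
    """Round a byte address up to the next multiple of 4 (a long boundary)."""
    return (addr + 3) & ~3


def pack_nstr(data: bytes) -> bytes:
    """Append a null terminator, then zero-pad to a long boundary."""
    raw = data + b"\x00"
    return raw + b"\x00" * (round_up_long(len(raw)) - len(raw))


# 14 string bytes + 1 terminator = 15 bytes, padded to 16.
packed = pack_nstr(b"\r\rThe time is ")
```

With padding like this, the instruction after the string always starts on a long boundary, at the cost of up to three wasted bytes per string.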
jmg,
Actually, Chip's code is not smaller at all. It's just arranged differently. The actual code of the send_string function would be smaller in my variant, since it wouldn't need to manipulate the return address.
I guess my variant having the labels added makes the source code a tiny amount larger, but I wouldn't call that much larger.
Honestly, for this kind of formatted string stuff I would not do it piecemeal like that at all, I'd implement something like printf.
But you can also do it with immediates:
call #send_string
db 13,13,"The time is ",0
mov val,hours
call #send_decimal2
call #send_string
db ':',0
mov val,minutes
call #send_decimal2
call #send_string
db " and the date is ",0
...
And that would work the same, would it not?
Also, what is "db"?
Woops! That 'db' should be 'byte'. It looked so much like x86 code that I shifted into it.
Of course, you could use '#' instead of '@'. The point was to show that you could have subroutines that pop the return address, read some variable number of data, and then return right after the data.
I guess my variant having the labels added makes the source code a tiny amount larger, but I wouldn't call that much larger.
You need a label for every string, and that label in HUB likely needs a 32b load which is 8 bytes every string.
"a tiny amount" ? - depends on how you define tiny!
db is Data Byte, a common assembler directive for entering strings and arrays of bytes.
Many assemblers also have
db,dw,dd,dq, for increasingly larger 'pieces'
eg
DD 0,0C800F800H,19990101,4711,'ABCD',3.1415927
call #send_string
db 13,13,"The time is ",0
mov val,hours
call #send_decimal2
call #send_string
db ':',0
mov val,minutes
call #send_decimal2
call #send_string
db " and the date is ",0
...
And that would work the same, would it not?
even better, is to do what (all?) other assemblers do
call send_string
db 13,13,"The time is ",0
mov val,hours
call send_decimal2
call send_string
db ':',0
mov val,minutes
call send_decimal2
call send_string
db " and the date is ",0
...
That code is readable to someone coming from almost any other MCU and the labels are auto-managed by the tools.
Sorry, but I don't see the necessity to mix strings and code. But hey, if you want to do that then why not just make the compiler pad to the next long.
I am certain the simplest model is...
Instruction Model
- All addresses in LONG format
- All addresses long-aligned
COG/LUT Memory Model
- All data and register addresses in LONG format
- All data and register addresses long-aligned
HUB Memory Model
- All data addresses in BYTE format
- All data addresses byte-aligned
- Accessible using SETQ and RD/WR-LONG/WORD/BYTE
The models are simple.
The instructions remain the same in both hub-exec and cog-exec modes.
The only break from long addresses and long aligned is for HUB Memory access which is in bytes.
Special instructions are used to access the hub.
I guess my variant having the labels added makes the source code a tiny amount larger, but I wouldn't call that much larger.
You need a label for every string, and that label in HUB likely needs a 32b load which is 8 bytes every string.
"a tiny amount" ? - depends on how you define tiny!
Sigh...
So, yes, it's one extra instruction to load the address of the string before calling the function (2 if it needs 32bit). Chip's code has to use extra instructions in the function to get the address of the string, and then change it and put it back for the return to work. So for a few strings, it's a wash. Mine does get larger with more strings, but only a small amount.
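The size trade-off described here can be put into rough numbers. This is only a sketch: it assumes every instruction is one 4-byte long, ignores the string bytes themselves (they are the same in both styles), and treats the number of extra instructions inside the inline-data version of send_string as a free parameter, since the thread does not give an exact count.

```python
INSTR_BYTES = 4  # every instruction is one 32-bit long


def inline_cost(n_strings: int, extra_fn_instrs: int) -> int:
    """Inline-data style: one call per string, plus extra instructions
    inside send_string to fix up the return address (paid once)."""
    return n_strings * INSTR_BYTES + extra_fn_instrs * INSTR_BYTES


def pointer_cost(n_strings: int, load_instrs: int = 1) -> int:
    """Pointer style: load_instrs address loads plus one call per string."""
    return n_strings * (load_instrs + 1) * INSTR_BYTES


def break_even(extra_fn_instrs: int, load_instrs: int = 1) -> float:
    """Number of strings at which both styles cost the same."""
    return extra_fn_instrs / load_instrs
```

Under these assumptions the two styles tie after a handful of strings, and the pointer style grows by one or two longs per additional string.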
That code example doesn't work in COG/LUT space at all. For COG/LUT space you'd have to do it the way I said, where the strings are in hub, the code in cog/lut, and use labels to address the strings. If you code it that way to start with then the code can work in either hub or cog space.
Again, I wouldn't do any of this in this way at all in real code.
In any case, allowing code to be unaligned in hub space is what's causing us to have binary incompatibility between cog/hub spaces.
I've been thinking very hard about what Cluso99 is saying, since he's being so adamant. And I see the ramifications for compiler makers.
I think I have a solution that will please everyone.
Consider that ANY code that is going to run in both cog and hub must use relative jumps within itself, as absolute execution addresses are different between modes.
And here is the whole problem with cog exec vs. hub exec: In cog exec, the PC steps by 1, whereas in hub exec it must step by 4. This creates different relative address encodings which make binaries incompatible between cog exec and hub exec modes.
Well, what if we assembled those 20-bit relative addresses in cog code as shifted left by two bits? This will give them the same expanse as hub code. Then, whenever we are in cog exec, we always shift relative addresses down by two bits before adding them to the PC. Now, the same binary will run in both modes.
Does anyone see a problem with this?
No problem, but that's because I'm familiar with the P1 way; new users will probably find it hard to wrap their head around (as I probably did oh so many years ago that I can't remember). Why not just extend it a little further and have everything in bytes (COG, LUT, HUB, & PC in all exec modes) like everyone else (and GCC) does ? This would have binary compatibility in all modes (except for REP) and allow non-long-aligned hub execution.
I know the >>2 was confusing before, but this should be taken care of by the compiler, not by us. I.e., an absolute immediate instruction address (in bytes) will be parsed by the compiler, and if it meets the criteria for COGexec, it will take bits 2-10 and put it in the instruction encoding; if not, it will take bits 0-8 for the instruction and bits 9-19 for the ALTDS. If we want to specify ## so the compiler doesn't do it behind our back (or instruct it to warn us if we put in a number larger than $7ff with a single #), we can, but we don't need to.
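A rough Python model of the bit selection proposed here. The `COG_LIMIT` threshold of $800 is an assumption inferred from the "$7ff" remark, and `split_address` is a made-up name; this illustrates the proposal only and does not describe any actual assembler.

```python
COG_LIMIT = 0x800  # assumption: byte addresses below this map to cog registers


def split_address(byte_addr: int):
    """Split a byte address the way the proposal describes."""
    if byte_addr < COG_LIMIT:
        # Cog exec: bits 2-10 become the 9-bit long index.
        return ("cog", (byte_addr >> 2) & 0x1FF)
    # Otherwise: bits 0-8 go in the instruction, bits 9-19 in the prefix.
    low9 = byte_addr & 0x1FF
    high11 = (byte_addr >> 9) & 0x7FF
    return ("hub", low9, high11)
```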
In any case, I don't think it's more confusing than having data tables in hub and cog/lut that need to be corrected by >>2 to be consistent (i.e., Cog_ptr++; Hub_ptr+=4, which I have messed up many times), and will disappear if everything is in bytes, along with .cog_ram/.pasm.
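The pointer mismatch mentioned here can be illustrated with a small Python model. Cog memory is modeled as a list with one long per address, hub memory as little-endian bytes with one byte per address. The variable names are illustrative only.

```python
import struct

table = [10, 20, 30]
cog_ram = list(table)                  # one long per address
hub_ram = struct.pack("<3I", *table)   # one byte per address, little-endian

cog_ptr, hub_ptr = 0, 0
cog_vals, hub_vals = [], []
for _ in table:
    cog_vals.append(cog_ram[cog_ptr])
    hub_vals.append(struct.unpack_from("<I", hub_ram, hub_ptr)[0])
    cog_ptr += 1   # cog pointers step by 1 per long
    hub_ptr += 4   # hub pointers step by 4 per long
```

Stepping the hub pointer by 1 instead of 4 would read overlapping, misaligned longs, which is the mistake the post describes.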
It seems relative addressing should be the standard inside self-contained blocks.
Absolute addressing should be for hub <--> cog.
We need to figure out what the rules should be, because it would be awfully nice to make #/@ optional.
call @send_string
byte "The time is ",0
mov val,hours
call @send_decimal2
call @send_string
byte ':',0
mov val,minutes
call @send_decimal2
call @send_string
byte " and the date is ",0
...
send_string pop ptra
:loop rdbyte chr,ptra++ wz
if_z jmp ptra
call @send_chr
jmp @:loop
Note the fluidity with which registers (PTRA in this case) can interact with your code address. If we got rid of the PC's full 20 bits in their current arrangement, in order to do a long-index PC, you couldn't do things like this. We'd need special instructions to massage over the two-bit differences.
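For readers who do not know PASM, here is a Python model of what the send_string routine above does. It is a sketch, not a translation: hub memory is modeled as a bytes object, the 4-byte `CALL` and `NEXT` values are placeholders standing in for real instructions, and the function returns the collected text instead of sending characters.

```python
def send_string(mem: bytes, ret_addr: int):
    """Pop the return address, read a zero-terminated string starting
    there, and return the text plus the address to resume at."""
    ptr = ret_addr
    out = bytearray()
    while True:
        ch = mem[ptr]
        ptr += 1                      # like rdbyte chr,ptra++
        if ch == 0:
            return bytes(out), ptr    # resume right after the terminator
        out.append(ch)


CALL = b"\xAA\xAA\xAA\xAA"            # placeholder for the call instruction
NEXT = b"\xBB\xBB\xBB\xBB"            # placeholder for the next instruction
mem = CALL + b"The time is \x00" + NEXT
text, resume = send_string(mem, len(CALL))
```

Here the string plus terminator is 13 bytes, so execution resumes at byte address 17, which is not a multiple of 4. That is the unaligned hub execution the example relies on.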
db is Data Byte, a common assembler directive for entering strings and arrays of bytes.
Many assemblers also have
db,dw,dd,dq, for increasingly larger 'pieces'
eg
DD 0,0C800F800H,19990101,4711,'ABCD',3.1415927
Yes, I knew what it was in other assemblers. But not in PASM, which is why I asked.
I totally get Chip's slippage though. I've done that more than a few times myself when working in multiple languages at the same time.
even better, is to do what (all?) other assemblers do
call send_string
db 13,13,"The time is ",0
mov val,hours
call send_decimal2
call send_string
db ':',0
mov val,minutes
call send_decimal2
call send_string
db " and the date is ",0
...
That code is readable to someone coming from almost any other MCU and the labels are auto-managed by the tools.
It seems relative addressing should be the standard inside self-contained blocks.
We need to figure out what the rules should be, because it would be awfully nice to make #/@ optional.
Most assemblers use the smallest opcode, usually, that is relative by default.
eg CALL and JMP are the generic 'use the smallest'
Some have a different call to force far or absolute - eg FCALL or LCALL
and can force small(relative) with SJMP or RJMP
That can help with forward labels on assemblers that limit passes.
In all cases, if the ASM does find a label is 'too far' an error is generated.
Chip,
Agreed that relative addressing should be the default inside self-contained blocks of code, and absolute addressing between separate blocks (doesn't matter if they are hub <--> cog or within the same space). For example, we may be compiling code that is relocatable or replaceable. So we need to be able to override the default.
...In any case, allowing code to be unaligned in hub space is what's causing us to have binary incompatibility between cog/hub spaces.
I've already implemented the shifts for relative branches.
For 9-bit relative branches:
In the case of cog exec, the 9 bits are sign-extended to 20 bits for the offset.
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.
For 20-bit relative branches:
In the case of cog exec, they are arithmetically shifted right by two bits for the offset.
In the case of hub exec, they are used as-is.
So, do we not have binary portability (with the current exception of REP) for relocatable code?
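The four cases above can be written out as a small Python model. It follows the description in this post only and has not been checked against hardware; the function names are made up for illustration.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def rel9_offset(field: int, hub_exec: bool) -> int:
    """9-bit relative branch: as-is in cog exec, shifted left 2 in hub exec."""
    off = sign_extend(field, 9)
    return off << 2 if hub_exec else off


def rel20_offset(field: int, hub_exec: bool) -> int:
    """20-bit relative branch: as-is in hub exec, arithmetic >>2 in cog exec."""
    off = sign_extend(field, 20)
    return off if hub_exec else off >> 2   # >> on Python ints is arithmetic
```

For the same encoded field, the hub exec offset is four times the cog exec offset, so a branch moves by the same number of instructions in either mode.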
Try it and see ?
Binary portability (with the current exception of REP) for relocatable code, would be nice.
You mentioned before some issues with relative branches not being portable ?
I think a change in alignment is the killer, so tools could have an align-mode that avoided/warned on any change inside a code block.
Things like db could have an align variant, to allow users to code mostly 'long-legal'.
With 32-bit opcodes, we pack full 20-bit addresses for JMP/CALL/CALLA/CALLB/CALLD/LOC into single instructions. So, there is no way to get smaller branches. The only quirk is the D,@ branches which are 9-bit relative. Those only reach -256..+255 instructions, but they also perform operations on D.
Relative addressing works from anywhere to anywhere. It's just that in the case of relocatable code, you wouldn't want to have relative branches coming into or going out of your code, since you're moving around. Those external addresses would either have to be absolute or in registers at run time.
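An assembler could enforce the -256..+255 reach of the 9-bit relative branches with a check along these lines. This is a hypothetical sketch: `encode_rel9` is not a real tool function, and it assumes the distance is measured in instructions from the branch itself, which the post does not specify.

```python
def encode_rel9(from_index: int, to_index: int) -> int:
    """Return the 9-bit field for a branch between two instruction
    indices, or raise if the target is out of reach."""
    distance = to_index - from_index   # assumption: relative to the branch
    if not -256 <= distance <= 255:
        raise ValueError(f"target is {distance} instructions away; "
                         "9-bit relative reach is -256..+255")
    return distance & 0x1FF            # two's-complement 9-bit field
```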
What about non-relative branches? Do they not work differently in hub vs cog space?
If I have a function that calls other functions at absolute addresses, can that function be run in hub space and also copied to cog space and run there too? The functions it calls don't move, just the function itself.
I've already implemented the shifts for relative branches.
For 9-bit relative branches:
In the case of cog exec, the 9 bits are sign-extended to 20 bits for the offset.
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.
For 20-bit relative branches:
In the case of cog exec, they are arithmetically shifted right by two bits for the offset.
In the case of hub exec, they are used as-is.
So, do we not have binary portability (with the current exception of REP) for relocatable code?
Try it and see ?
Binary portability (with the current exception of REP) for relocatable code, would be nice.
You mentioned before some issues with relative branches not being portable ?
I think a change in alignment is the killer, so tools could have an align-mode that avoided/warned on any change inside a code block.
Things like db could have an align variant, to allow users to code mostly 'long-legal'.
What I'm implementing now fixes the old problem of relative addresses winding up differently between cog and hub code. They'll be the same now. The only caveat for hub code to run in the cog will be that its instructions are relatively long-aligned, so that they go into the cog's long registers properly. Oh, and don't use REP in hub code - but I'm going to see about changing that, just for compatibility's sake.
What about non-relative branches? Do they not work differently in hub vs cog space?
If I have a function that calls other functions at absolute addresses, can that function be run in hub space and also copied to cog space and run there too? The functions it calls don't move, just the function itself.
No, absolute branches are absolute. They will work from anywhere.
Only relative branches work differently, but nobody would probably notice, because things would just work as expected in either domain.
The only caveat for hub code to run in the cog will be that its instructions are relatively long-aligned, so that they go into the cog's long registers properly.
It should be easy to have the tools check that, if users can tag a code area as aligned.
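Such a tool check could be as simple as the following Python sketch. It assumes the tool already knows the byte address of each instruction in a block that the user has tagged as aligned. The function name is hypothetical.

```python
def misaligned_instructions(block_start: int, instr_addrs: list) -> list:
    """Return the instruction addresses that are not a whole number of
    longs away from the start of the block."""
    return [a for a in instr_addrs if (a - block_start) % 4 != 0]


# A block that is long-aligned relative to its own start passes,
# even if the block itself starts at an odd hub address.
ok = misaligned_instructions(0x401, [0x401, 0x405, 0x409])

# A 13-byte inline string after the first instruction shifts the next one.
bad = misaligned_instructions(0x400, [0x400, 0x411])
```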
Of course, you could use '#' instead of '@'. The point was to show that you could have subroutines that pop the return address, read some variable number of data, and then return right after the data.
Sorry. I mixed up two different conversations. I think your example code was addressing the question of unaligned instructions. I was focusing on it with respect to the conversation about relative addressing.
It seems relative addressing should be the standard inside self-contained blocks.
Absolute addressing should be for hub <--> cog.
We need to figure out what the rules should be, because it would be awfully nice to make #/@ optional.
I agree. The issue (that I focused on) with your example above is that I do not consider the send_string and send_decimal2 subroutines to be part of the same self-contained block. Hence the use of immediate addressing.
I've already implemented the shifts for relative branches.
For 9-bit relative branches:
In the case of cog exec, the 9 bits are sign-extended to 20 bits for the offset.
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.
For 20-bit relative branches:
In the case of cog exec, they are arithmetically shifted right by two bits for the offset.
In the case of hub exec, they are used as-is.
So, do we not have binary portability (with the current exception of REP) for relocatable code?
Ah... well. Ignore my last comments, then. We'll have to try it out and see how it feels.
When might we expect the next image to be released?
Comments
But you can also do it with immediates:
And that would work the same, would it not?
Also, what is "db"?
That is the more common approach, and how HLL would do it, but Chip's code is much smaller and elegantly simple.
That would need a variant of db*, to append packers, and then send_string code needs to be packer-aware, or use round-up for the return address patch.
Maybe even a RET variant that uses bytepointer as a param, and knows to do the round-up ?
* a db variant that would be useful here, could be
Actually, Chip's code is not smaller at all. It's just arranged differently. The actual code of the send_string function would be smaller in my variant, since it wouldn't need to manipulate the return address.
I guess my variant having the labels added makes the source code a tiny amount larger, but I wouldn't call that much larger.
Honestly, for this kind of formatted string stuff I would not do it piecemeal like that at all, I'd implement something like printf.
Woops! That 'db' should be 'byte'. It looked so much like x86 code that I shifted into it.
Of course, you could use '#' instead of '@'. The point was to show that you could have subroutines that pop the return address, read some variable number of data, and then return right after the data.
"a tiny amount" ? - depends on how you define tiny!
db is Data Byte, a common assembler directive for entering strings and arrays of bytes.
Many assemblers also have
db,dw,dd,dq, for increasingly larger 'pieces'
eg
DD 0,0C800F800H,19990101,4711,'ABCD',3.1415927
even better, is to do what (all?) other assemblers do
That code is readable to someone coming from almost any other MCU and the labels are auto-managed by the tools.
I am certain the simplest model is...
Instruction Model
- All addresses in LONG format
- All addresses long-aligned
COG/LUT Memory Model
- All data and register addresses in LONG format
- All data and register addresses long-aligned
HUB Memory Model
- All data addresses in BYTE format
- All data addresses byte-aligned
- Accessible using SETQ and RD/WR-LONG/WORD/BYTE
The models are simple.
The instructions remain the same in both hub-exec and cog-exec modes.
The only break from long addresses and long aligned is for HUB Memory access which is in bytes.
Special instructions are used to access the hub.
So, yes, it's one extra instruction to load the address of the string before calling the function (2 if it needs 32bit). Chip's code has to use extra instructions in the function to get the address of the string, and then change it and put it back for the return to work. So for a few strings, it's a wash. Mine does get larger with more strings, but only a small amount.
That code example doesn't work in COG/LUT space at all. For COG/LUT space you'd have to do it the way I said, where the strings are in hub, the code in cog/lut, and use labels to address the strings. If you code it that way to start with then the code can work in either hub or cog space.
Again, I wouldn't do any of this in this way at all in real code.
In any case, allowing code to be unaligned in hub space is what's causing us to have binary incompatibility between cog/hub spaces.
No problem, but that's because I'm familiar with the P1 way; new users will probably find it hard to wrap their head around (as I probably did oh so many years ago that I can't remember). Why not just extend it a little further and have everything in bytes (COG, LUT, HUB, & PC in all exec modes) like everyone else (and GCC) does ? This would have binary compatibility in all modes (except for REP) and allow non-long-aligned hub execution.
I know the >>2 was confusing before, but this should be taken care of by the compiler, not by us. I.e., an absolute immediate instruction address (in bytes) will be parsed by the compiler, and if it meets the criteria for COGexec, it will take bits 2-10 and put it in the instruction encoding; if not, it will take bits 0-8 for the instruction and bits 9-19 for the ALTDS. If we want to specify ## so the compiler doesn't do it behind our back (or instruct it to warn us if we put in a number larger than $7ff with a single #), we can, but we don't need to.
In any case, I don't think it's more confusing than having data tables in hub and cog/lut that need to be corrected by >>2 to be consistent (i.e., Cog_ptr++; Hub_ptr+=4, which I have messed up many times), and will disappear if everything is in bytes, along with .cog_ram/.pasm.
It seems relative addressing should be the standard inside self-contained blocks.
Absolute addressing should be for hub <--> cog.
We need to figure out what the rules should be, because it would be awfully nice to make #/@ optional.
Note the fluidity with which registers (PTRA in this case) can interact with your code address. If we got rid of the PC's full 20 bits in their current arrangement, in order to do a long-index PC, you couldn't do things like this. We'd need special instructions to massage over the two-bit differences.
Yes, I knew what it was in other assemblers. But not in PASM, which is why I asked.
I totally get Chip's slippage though. I've done that more than a few times myself when working in multiple languages at the same time.
Does relative addressing not work across the boundary, or is it more of a limited-reach issue?
Most assemblers use the smallest opcode, usually, that is relative by default.
eg CALL and JMP are the generic 'use the smallest'
Some have a different call to force far or absolute - eg FCALL or LCALL
and can force small(relative) with SJMP or RJMP
That can help with forward labels on assemblers that limit passes.
In all cases, if the ASM does find a label is 'too far' an error is generated.
Agreed that relative addressing should be the default inside self-contained blocks of code, and absolute addressing between separate blocks (doesn't matter if they are hub <--> cog or within the same space). For example, we may be compiling code that is relocatable or replaceable. So we need to be able to override the default.
I've already implemented the shifts for relative branches.
For 9-bit relative branches:
In the case of cog exec, the 9 bits are sign-extended to 20 bits for the offset.
In the case of hub exec, the 9 bits are sign-extended to 20 bits and then shifted left two bits for the offset.
For 20-bit relative branches:
In the case of cog exec, they are arithmetically shifted right by two bits for the offset.
In the case of hub exec, they are used as-is.
So, do we not have binary portability (with the current exception of REP) for relocatable code?
Try it and see ?
Binary portability (with the current exception of REP) for relocatable code, would be nice.
You mentioned before some issues with relative branches not being portable ?
I think a change in alignment is the killer, so tools could have an align-mode that avoided/warned on any change inside a code block.
Things like db could have an align variant, to allow users to code mostly 'long-legal'.
With 32-bit opcodes, we pack full 20-bit addresses for JMP/CALL/CALLA/CALLB/CALLD/LOC into single instructions. So, there is no way to get smaller branches. The only quirk is the D,@ branches which are 9-bit relative. Those only reach -256..+255 instructions, but they also perform operations on D.
Relative addressing works from anywhere to anywhere. It's just that in the case of relocatable code, you wouldn't want to have relative branches coming into or going out of your code, since you're moving around. Those external addresses would either have to be absolute or in registers at run time.
What about non-relative branches? Do they not work differently in hub vs cog space?
If I have a function that calls other functions at absolute addresses, can that function be run in hub space and also copied to cog space and run there too? The functions it calls don't move, just the function itself.
What I'm implementing now fixes the old problem of relative addresses winding up differently between cog and hub code. They'll be the same now. The only caveat for hub code to run in the cog will be that its instructions are relatively long-aligned, so that they go into the cog's long registers properly. Oh, and don't use REP in hub code - but I'm going to see about changing that, just for compatibility's sake.
No, absolute branches are absolute. They will work from anywhere.
Only relative branches work differently, but nobody would probably notice, because things would just work as expected in either domain.
JMPREL D
It will add D to the PC for zero-based jump tables.
It will use D<<2 when in hub exec mode.
This will provide a 'case' mechanism which will work in portable binaries.
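A Python model of the JMPREL behavior as described here. Two details are assumptions not stated in the post: that D is added to the address of the instruction following JMPREL, and that the result wraps at 20 bits.

```python
def jmprel_target(pc_next: int, d: int, hub_exec: bool) -> int:
    """Model of JMPREL D: add D to the PC, using D<<2 in hub exec."""
    step = d << 2 if hub_exec else d
    return (pc_next + step) & 0xFFFFF   # assumption: 20-bit PC wrap


# A jump table placed right after JMPREL: entry D is D instructions
# ahead in both modes, since hub instructions are 4 bytes apart.
cog_target = jmprel_target(0x010, 2, hub_exec=False)
hub_target = jmprel_target(0x400, 2, hub_exec=True)
```

In both modes the jump lands on the third table entry (D = 2), which is what makes a 'case' table portable between cog exec and hub exec.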
It should be easy to have the tools check that, if users can tag a code area as aligned.
That's nice to cover, but the tools can catch a REP in the wrong place, if this is too difficult.
Sorry. I mixed up two different conversations. I think your example code was addressing the question of unaligned instructions. I was focusing on it with respect to the conversation about relative addressing.
But, since I am on that topic...
I agree. The issue (that I focused on) with your example above is that I do not consider the send_string and send_decimal2 subroutines to be part of the same self-contained block. Hence the use of immediate addressing.
IMHO you are at least headed in the right direction.
But I still believe this is more complicated than it needs to be, all for the sake of being able to run non-long-aligned in hubexec (only).
BTW forget REP if it is too difficult. It's easy to work around if we need portable code between hubexec and cogexec.
Ah... well. Ignore my last comments, then. We'll have to try it out and see how it feels.
When might we expect the next image to be released?
That sounds cool
Case tables are very useful things.
Hopefully, within 24 hours.
+1 for JMPREL D
I'm good with how things are now, but would prefer it if REP worked in hubexec also. Having it silently fail is bad in my opinion.