This is all pretty simple. We just need to be sure we are presenting it all optimally, through assembler syntax and documentation.
In my opinion, it's really not. It seems to me that several people who have been heavily involved in these conversations thought that "@" worked like I did (which, based on your comments, was wrong). This may have something to do with the last week's worth of syntax tweaks, but I also think it has to do with the added complexity that hub exec has brought.
Or maybe it's just me who wasn't getting it. Either way, I think the original post helps ensure we are all moving forward with the same understanding.
Thanks, but Chip said "@label is superfluous when used on labels that were declared under ORGH"
& IIRC also COG/HUB boundary crossing was always in absolute mode ?
Can you expand the above for more than one COG, using real code a user might run.
Is this in response to JMP #4? If so, notice that we are forcing the hub addressing of an ORG label. As a result, it looks to the assembler like we are doing a hub-to-hub JMP, not a hub-to-cog.
Even if you inserted a COGINIT, all of the interpretations above would still be the same. Some of the numbers would be different, but that's it. Do you have an example in mind?
Add another COG block into that example, and there is real potential for a new user to get totally lost.
In most cases, a COG segment would be expected to be copied to and run in the COG, but it is still possible to call the original, in hub, in hubexec mode.
The use of # on labels removes any segment checking, and even a COG-COG call would assemble OK on the examples above.
To my instincts, that is just dangerous: the tools should encapsulate code areas better than that.
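To make those cases concrete, here is a rough sketch in the ORG/ORGH syntax being discussed. The label names are invented and it has not been run through PNut; it only tries to show how the JMPs above resolve, and why # on its own lets a cross-segment jump assemble.

        orgh                            ' hub-addressed (ORGH) section
hub_loop
        jmp     #hub_loop               ' hub-to-hub: the immediate is a hub byte address
        jmp     #@hub_loop              ' same thing: @ is superfluous on an ORGH label
        jmp     #cog_loop               ' hub-to-cog: the immediate is the 9-bit cog address
        jmp     #@cog_loop              ' forcing the hub address of an ORG label: the assembler
                                        '   now sees a hub-to-hub jump into the hub copy of the
                                        '   cog code, not a hub-to-cog jump

        org                             ' cog-addressed (ORG) section, the image for one COG
cog_loop
        jmp     #hub_loop               ' cog-to-hub: boundary crossing, an absolute hub address
        jmp     #cog_loop               ' cog-to-cog within the same image

        org                             ' a second COG image
cog2_entry
        jmp     #cog_loop               ' also assembles: # does no segment checking, so this
                                        '   COG-COG jump into the other image is accepted even
                                        '   though the two images never share a COG at run time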
Consider that PNut, as it is, is just an assembler right now. It's not performing any organizing of its own, yet.
Maybe gas is an alternative? Reading up on gas shows it also has some shortcomings (e.g. one pass, not great at call/jmp resizing), and yasm claims to improve on that.
It would be great to somehow get cog exec addresses to the top of hub space, but there seems to be no good way.
I assume instruction offsets in COG, LUT and HUB space are absolute addresses allocated when the instruction is assembled. To force COG-EXEC code to the top of HUB RAM, these and any references to them will need to be relative.
' example code
        orgh                              ' in HUB-space
        coginit 1, cog_code_in_hub_space  ' cog_code_in_hub_space is a relative address
                                          '   in the HUB-space model
        ' do something here, blah, blah, blah

        org
cog_code_in_hub_space                     ' this is relative in the HUB-space model,
                                          '   absolute for the COG-space model, as we deal
                                          '   in instructions (longs)
        nop                               ' first instruction in COG-space
        nop                               ' second instruction in COG-space
' end of code for PNUT
After assembling the code, because the addresses are relative in HUB-space, the COG-EXEC code can be relocated to the top of HUB RAM and all relative references can be fixed up.
Edited to add:
These relative/absolute addresses exist only within PNUT when assembling, in order to relocate the COG-EXEC image.
If we keep pnut simple and clean, we get this chip bootstrapped.
Maybe pnut level of functionality ends up as the on chip tool. Simple, dirty, etc...
That is a fine goal for that use case.
From there, we get an open spin type port and that can be a considerably better tool in terms of checks, etc... additionally, many things happen!
Spin2 gets done, gcc and friends do too.
That level of capability is what the P1 tools look like; a lot was done on those, and it was not hard to do either.
And on that note... harvesting pasm...
If it is fun and easy to do, there will be pasm just like on p1. The division of labor is simple. The people who make that code make the most of it when they have a good time doing it.
Other people come along and want that code for their tools and they can do the work to use it.
Seems fair enough given a lot of that will happen because it is fun to do. If it is not fun, closer to work, lots of people won't bother, making for a small harvest.
Just think about that.
Anyway, pnut is the low level, on chip tool, and we need it to get the booter, crypto, monitor, and assembler in chip done.
From there, expectations can and should be raised for the masses and those with pro type needs.
And the cases are very different. When Baggers and I do fun video and sprite drivers, for example, that needs to be fun, or count on it not happening. The less we have to know about the tools, and the leaner and more flexible they are, the better.
That is fun.
Eric Ball and I did a bunch of stuff too. Same dynamics. We had fun thinking about video signals and writing PASM for them. With too many requirements, that just does not happen, and we all lose.
Making it portable, work with gcc, etc... isn't fun, and all of that work is valid, but also a drain. I myself don't have the time, but I'm also happy to put code out there for others who do have the time.
Look at the PASM in gcc discussion here right now. I want no part of that. I can get a whole cool driver done in the time it takes to resolve all of that.
This keep-it-simple-and-clean discussion matters in terms of what there actually will be to work with, and I know I speak for many who put good, high value stuff in the OBEX.
Having code to work with is a nice problem to have. The pasm authors do their part, others can add value and do theirs.
Anyway, we are very early, boot strapping. More and better tools to come.
There isn't anything about this addressing that can't be parsed and better managed when we get to the next level.
At this one, the small lean nature is what makes this happen. There are the minimum number of dependencies, and a rapid cycle.
I would not be testing if I had to go and get gas, deal with a binary loader, and all the complexity that goes along. There is no need at present for anything like that.
As it is right now, one can know near nothing, type a few lines, hit the button, and it works. If we lose that, I'm gone. There are better things to do. Let's keep the lowest level that way.
Cog at 0 makes sweet use of the source and destination 9 bit addresses. That keeps cogs simple and fast.
Moving them trades that away for some low hub RAM gained as code space, which we can easily put data in anyway. For newbies, and everyone in general, having the cog simple and fast really matters.
I really don't want them moved, for those reasons. I don't see it as a net gain, and it makes moving P1 code and P1 skills over difficult. Not worth it, IMHO.
What would the benefit be of putting cogram in upper hubram? It would make addressing even more complicated than it is now, with the same location of cogram having different addresses depending on if you're executing it or using it as data.
The 1K of executable hub space it might gain doesn't matter. Nobody will ever write a program that uses (512K - 1K) for hubexec without running out of RAM first for lack of data space (which can go in that first 1K).
This relocation approach has some appeal, because there are two addresses at play during build.
Seairth's example above shows this, and when one adds 16 COGs into the mix, it comes down to confusion over which address users are thinking about.
There is the transport address, which is where every byte packs into loader memory before the device launches COGs, and then there is the run-time address, for COG and HUB.
Internal to a loaded COG, run-time addresses are 9 bits; calls into HUB are absolute 22-bit values.
If the tools managed and reported the addresses with a COG prefix (for example c1_, c3_), then COG values read more easily as COG-relative, and hub addresses, which may scatter across many COGs as 'owners', also clearly relate to their COG.
The report files are easier to read, and there is a final pass of a 'loader packer' that takes each COG's [COG+HUB] memory and packs those images into the loader binary.
The sparse image the assembler creates in the first pass is simply packed in the loader export (and a final loader map reported, but that is not what the user focuses on).
Overall, the tools use more memory, but it is still under 2 MB.
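Purely as a hypothetical sketch of that prefix idea (the c1_/c2_ prefixes are the proposal, not existing PNut syntax, and the labels, instructions and address comments are invented for illustration):

        org                             ' image for COG 1
c1_entry
        mov     c1_count, #0            ' c1_count resolves to a 9-bit cog register at run time
c1_loop
        call    #c1_hub_func            ' call into hub: an absolute hub address at run time
        jmp     #c1_loop                ' stay in the cog; c1_count below is data, not code
c1_count
        long    0

        orgh                            ' hub-exec routine 'owned' by COG 1
c1_hub_func
        ret                             ' its run-time and transport addresses are both hub
                                        '   byte addresses, fixed when the loader packer
                                        '   places this image in the loader binary

        org                             ' image for COG 2
c2_entry
        jmp     #c2_entry               ' its run-time address is again a small 9-bit cog value;
                                        '   only the transport address (where these longs sit in
                                        '   the loader binary) distinguishes it from COG 1's code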
I thought COG code ran in a COG, and has no from-HUB access opcodes ?
Or do you mean code that might run in two places ?
Okay. I've updated the first post. Hopefully, I've captured this well enough.
Examples-of-use are always good....
& IIRC also COG/HUB boundary crossing was always in absolute mode ?
That is exactly what both JMP #2 and #6 are showing (as compared to JMP #1 and #5, respectively).
https://github.com/yasm/yasm/wiki/Faq
http://yasm.tortall.net/
The move immediate relative opcode suggestion resolved some of the issues of doing this.
What issues remain ?