And how can this work if you have more than one hubcode subroutine?
You will need to modify the ptrb for every hub-call, which is ugly, slow and needs a lot of unnecessary cog code.
It would all be so much easier if we could get the real hub address with @ in DAT blocks.
Andy
It's not that bad, at all.
Remember that a block of ORGH code is self-referencing via relative addressing. LOC can be used to get the absolute address of a data table within hub code, and branches are all relative. You just need to jump to the hub code, then it can take care of itself.
The case of needing TWO disparate addresses, like in the code above, is exceptional and requires either an extra parameter or a simple PASM computation.
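This is a toy Python model of the point that relatively addressed ORGH code takes care of itself. It is not the real P2 instruction encoding. It assumes a byte-addressed hub, 4-byte instructions, and branch offsets measured from the instruction after the branch. `encode_relative`, `decode_relative` and `assemble_block` are hypothetical names.

```python
# Toy model of relative branches (assumptions: byte-addressed hub, 4-byte
# instructions, offset measured from the next instruction; not the P2 bit layout).
INSTR_BYTES = 4


def encode_relative(branch_addr: int, target_addr: int) -> int:
    """Offset a relative branch stores: target minus next-instruction address."""
    return target_addr - (branch_addr + INSTR_BYTES)


def decode_relative(branch_addr: int, offset: int) -> int:
    """Address a relative branch reaches when it actually sits at branch_addr."""
    return branch_addr + INSTR_BYTES + offset


def assemble_block(branch_off: int, target_off: int, base: int) -> int:
    """Encode one branch in a block assembled for load address `base`."""
    return encode_relative(base + branch_off, base + target_off)


# The same block assembled for two different hub addresses...
off_low = assemble_block(branch_off=0x20, target_off=0x08, base=0x00400)
off_high = assemble_block(branch_off=0x20, target_off=0x08, base=0x12000)
print(off_low, off_high)  # -28 -28: identical encoding, nothing to patch

# ...and wherever it is loaded, the branch lands on the intended label.
print(hex(decode_relative(0x12000 + 0x20, off_low)))  # 0x12008
```

The stored offset does not depend on the load address, so the block can move without patching. An absolute address, by contrast, changes with `base`. That is the value the rest of the thread argues about, and the value LOC is said above to recover at run time.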
And how can this work if you have more than one hubcode subroutine?
You will need to modify the ptrb for every hub-call, which is ugly, slow and needs a lot of unnecessary cog code.
It would all be so much easier if we could get the real hub address with @ in DAT blocks.
Andy
Agreed on getting the real hub address (by whatever means) in DAT.
As far as multiple hub code subroutines, I think one possible solution (which I haven't actually tried) could be:
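CON
  _clkfreq = 80_000_000

PUB Start()
  coginit(16, @cogcode, 0)                      'new cog in cog-exec; PTRB = @cogcode

DAT                 org                         'cog code starts at $000

cogcode             mov     initaddr, ptrb      'PTRB = hub address cogcode was loaded from
                    add     initaddr, #@initcode-@cogcode   'use ## if the offset exceeds 511
                    call    initaddr            'hub-exec init fills in hub1addr/hub2addr
.loop               drvnot  #60
                    ...
                    call    hub1addr            'indirect call through a cog register
                    ...
                    call    hub2addr
                    ...
                    jmp     #.loop

'variables follow the code, so cogcode really is the long loaded into cog $000
hub1addr            long    0
hub2addr            long    0
initaddr            long    0

                    orgh                        'hub-exec code

initcode            mov     hub1addr, ptrb
                    add     hub1addr, ##@hub1-@cogcode      '# was missing; ## allows any offset
                    mov     hub2addr, ptrb
                    add     hub2addr, ##@hub2-@cogcode
                    ret

hub1                drvnot  #56
                    waitx   ##20_000_000
                    ret

hub2                drvnot  #57
                    waitx   ##10_000_000
                    ret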
This would need one long in cog ram for each subroutine + 4 longs for the initialization, and would avoid potential issues with constantly modifying ptrb. And this allows for having a label on the call statement rather than a reference to ptrb, and is no slower once initialized. This isn't perfect, as you have to remember to call the subroutine address (without the #) instead of referencing the actual label via call #label.
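The initialization above comes down to one addition per subroutine. Below is a small Python sketch of that arithmetic. It assumes, as the code does, that PTRB holds the hub address cogcode was loaded from. `hub_call_table` and the offset values are made up for illustration.

```python
# Sketch of the arithmetic initcode performs (hypothetical helper name).
# Assumptions: PTRB = hub address cogcode was loaded from; `offsets` holds the
# assembler's object-relative @label values (made-up numbers below).


def hub_call_table(ptrb: int, offsets: dict, base_label: str = "cogcode") -> dict:
    """Absolute hub address of each label: PTRB + (@label - @cogcode)."""
    base = offsets[base_label]
    return {name: ptrb + (off - base)
            for name, off in offsets.items() if name != base_label}


offsets = {"cogcode": 0x010, "initcode": 0x040, "hub1": 0x054, "hub2": 0x060}
table = hub_call_table(ptrb=0x02A30, offsets=offsets)
print({name: hex(addr) for name, addr in table.items()})
# {'initcode': '0x2a60', 'hub1': '0x2a74', 'hub2': '0x2a80'}
```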
I consider this a workaround.
There should be an automatic way of populating these addresses. The IBM 1130 I'm emulating could handle relocatable code in 4k words of core memory, can't we do the same now?
I'd just like to see cog/lut/hub addressing worked out so that it's consistent regardless of which tool you use.
If my object only works with PNut and your object only works with FastSpin, due to differences in addressing memory, then how do I use both at the same time?
For now I think I will use the padding suggestion.
Are you guys making this a lot more complicated than it needs to be?
Once you are in a block of ORGH code, you can jump to any label within, because it's all relatively addressed. You just need to jump to the entry point from Spin2.
If you need to access a data table, you can get its address using LOC.
If you need to do a jump table, use JMPREL followed by a list of JMP instructions. Regular jumps are coded plainly (JMP #address) since ORGH-to-ORGH jumps are all relative.
To convince you of this, can you describe a use case of concern that I can demonstrate for you?
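Here is a rough Python model of the JMPREL jump-table pattern described above, offered as an assumption-labeled sketch. It takes JMPREL D to skip D instructions, counted from the instruction right after JMPREL. Each step is 4 bytes in hub-exec and a single long in cog-exec. `jmprel_target` is a hypothetical helper.

```python
# Toy model of a JMPREL jump table. Assumption: JMPREL D moves D instructions
# past the instruction that follows it, so D=0 lands on the first JMP below.
# Each instruction is 4 bytes in hub-exec and 1 long-address in cog-exec.


def jmprel_target(jmprel_addr: int, index: int, hubexec: bool = True) -> int:
    step = 4 if hubexec else 1
    return jmprel_addr + step + index * step


# Hub layout: JMPREL at $1000, then a list of JMP #handlerN instructions.
jump_list = ["JMP #handler0", "JMP #handler1", "JMP #handler2", "JMP #handler3"]
base = 0x1000
for i in range(len(jump_list)):
    slot = jmprel_target(base, i)
    print(hex(slot), jump_list[(slot - (base + 4)) // 4])
```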
Chip,
There are many times where you do need to know the absolute address of hub code.
It works now in PASM when you call hubexec routines from cog. The same really should be possible for spin2 too.
Just saying it's not necessary is totally missing the point. It's a micro, and we want total control of it. Not knowing the actual hub address is a design flaw IMHO, and I'm not alone. I don't care if it requires another pass if it cannot be resolved in the current passes.
We went thru many hoops in P1 to get this. That, and conditional compilation were the two things that gave Brad and Mark the inspiration to write bst and homespun because PropTool was never going to give us these features.
Chip,
I'm already **** off that P1 PASM and Spin is so vastly different that conversion to P2 is a total nightmare. It's just another nail in the coffin as far as I'm concerned.
To make absolute addressing work as you'd like in Spin2 is a complicated matter that goes against the grain of how everything is working. I will probably not be making such a change. Personally, I like the way it works, as is. To me, it's quite a minor issue. And I think the long-term benefit of having code not tied to absolute addresses is going to be useful for loading code dynamically in an operating-system context. Relative addressing exists for reasons which are not currently appreciated, given the state of tools and frameworks.
As far as differences between P1 and P2 silicon and Spin languages go, I think there are valid reasons for these, too. Making everything compatible would have introduced a lot of constraints that wouldn't have been very fun to live with.
Weird result in the Spin2 doc v34Q when saved as Word docx...
I see this (missing underscores in the long section)
Billions LONG 1 000 000 000 'Billions is a long symbol that points to 1_000_000_000
but when I copied and pasted, the underscores actually do exist; they're just not visible in Word.
Billions LONG 1_000_000_000 'Billions is a long symbol that points to 1_000_000_000
Does anyone else see this?? It will certainly confuse some people
Just found a second example
Double Binary    %%21                      ● Double binary values start with '%%' and use digits '0'..'3'
                 %%01_23                   ● Underscores '_' are allowed after the first digit for placeholding
                 %%3333_2222_1111_0000
in the %%01_23 the underscore is missing yet in the following line %%3333_2222_1111_0000 the underscores are shown.
I checked the binary from a cut and paste and the underscore is shown correctly as $5F.
Postedit:
The plot thickens...
In the Operators table the rendering of the text is wrong. I only see the lower half of each character. Seems Word is failing to render this document correctly
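For reference, the double-binary rules quoted from the doc can be checked with a short parser: a leading '%%', digits '0'..'3', and underscores allowed after the first digit. `parse_double_binary` is a hypothetical helper, not part of any Spin2 tool.

```python
def parse_double_binary(literal: str) -> int:
    """Parse a double-binary literal such as '%%01_23' (hypothetical helper).

    Rules as quoted from the doc: starts with '%%', digits '0'..'3', and
    underscores allowed after the first digit as placeholders.
    """
    if not literal.startswith("%%"):
        raise ValueError("double binary must start with '%%'")
    body = literal[2:]
    if not body or body[0] not in "0123":
        raise ValueError("first character after '%%' must be a digit 0..3")
    digits = body.replace("_", "")
    if any(c not in "0123" for c in digits):
        raise ValueError("only digits 0..3 and '_' are allowed")
    return int(digits, 4)


for lit in ("%%21", "%%01_23", "%%3333_2222_1111_0000"):
    print(lit, hex(parse_double_binary(lit)))
# %%21 0x9
# %%01_23 0x1b
# %%3333_2222_1111_0000 0xffaa5500
```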
To make absolute addressing work as you'd like in Spin2 is a complicated matter that goes against the grain of how everything is working. I will probably not be making such a change. Personally, I like the way it works, as is. To me, it's quite a minor issue. And I think the long-term benefit of having code not tied to absolute addresses is going to be useful for loading code dynamically in an operating-system context. Relative addressing exists for reasons which are not currently appreciated, given the state of tools and frameworks.
As far as differences between P1 and P2 silicon and Spin languages go, I think there are valid reasons for these, too. Making everything compatible would have introduced a lot of constraints that wouldn't have been very fun to live with.
Chip, we'll just agree to disagree.
I already know how difficult it is to convert P1 PASM. I am about to try converting some P1 Spin to P2. For starters, I'll be converting my P1 OS, which also contains a very large piece of Spin written by Kye (the FAT driver; I have already rewritten the PASM code that it requires). Currently I compile (for P1) using Homespun, as PropTool cannot handle various pieces.
Should I find there is no fun anymore, then I have another (non prop) project I've been wanting to do for years.
Here's what I'm running into (sorry if I ramble about this):
I've got a P1 cog that uses all but about 3-4 of the available longs. It uses tricks like using INB for zero (which obviously won't work for P2).
I'm trying to move the code over to the P2, with the addition of some debugging routines. So I don't really have any space to add these routines w/o using either lut or hub memory.
At this time I'm getting the debugger initialization code in the hub but would also like to have some other callable routines in the hub for various debugging purposes. So I want to be able to jump back and forth between cog and hub execution (basically cog calling hub routines, not the other way around).
Using a cog variable for the pointer for each subroutine certainly works, in principle. But in practice, what I have are cog calls that look like this:
call #cog_routine
And hub calls that look like this:
call hub_routine_addr
That distinction of # vs. no # drives me crazy. 10 years of P1 PASM has drilled into me that the # is required. And to have to use cog memory for hub addresses isn't any fun either.
I guess I just need to use the lut instead of the hub for these routines. Maybe that's where I should have started.
But it seems like a shame to not be able to take advantage of all of that hub ram and not worry about running out of code space...
Maybe try moving the whole thing into hubexec. Assuming timing isn't that critical, this will need some additional work to split off the working variables because they still generally will want to be cog registers.
PS: lutram is excellent for indexed 32-bit tables and buffers.
I already know how difficult it is to convert P1 PASM. I am about to try converting some P1 Spin to P2. For starters, I'll be converting my P1 OS, which also contains a very large piece of Spin written by Kye (the FAT driver; I have already rewritten the PASM code that it requires). Currently I compile (for P1) using Homespun, as PropTool cannot handle various pieces.
Ray, fastspin can compile P1 Spin for the P2, so you could use that. It would save you some conversion.
Chip,
Some suggestions for the Spin2 Documentation v34Q (shows as v34M)...
In the Spin2 docs, the following could do with an explanation of little-endian storage: the long $12345678 is actually stored in memory lowest byte first, i.e. $78 $56 $34 $12.
symbol4 BYTE 'symbol4 is a byte symbol that points to $78
symbol5 WORD 'symbol5 is a word symbol that points to $5678
symbol6 LONG 'symbol6 is a long symbol that points to $12345678
LONG $12345678 'long value $12345678
In this section, it's worth mentioning that the file data gets inserted at this point:
FileDat FILE "Filename" 'include binary file, FileDat is a byte symbol that points to file
Add additional comment
Str0 BYTE "Monkeys",0 'strings with symbols
to:
Str0 BYTE "Monkeys",0 'strings with symbols (strings are 0=null terminated)
Can an optional byte/word/long value be included?
I do this sometimes to look thru code in a dump, especially to see if it's being overwritten.
Anyway LONG $xxxxxxxx[$-$nn] can achieve this now, and it works for cog/lut/hub.
ORGF $040 'fill to cog address $040 with zeros (no symbol allowed before ORGF)
Chip,
A question...
DAT ORGH 'begin a hub-exec program (no symbol allowed before ORGH)
'COGINIT(32+16, @IncPins, 0) will launch this program in a free cog
IncPins MOV DIRA,#$FF 'In Spin2, IncPins is the 'MOV' instruction (long)
Loop ADD OUTA,#1 'In Spin2, @IncPins is the hub address of the 'MOV' instruction
JMP #Loop 'In PASM, Loop is the hub address ($00404) of the 'ADD' instruction
If the JMP #Loop uses a relative address, is the relative offset inserted into the instruction the same for cog-exec and hub-exec?
I.e., could the same hub-exec code run correctly if loaded into the cog?
For consistency, might FIT become FITH for hub?
FIT $2000 'test to make sure hub address has not exceeded $2000
I believe ORGH $FC000 will also fill hub with zeros up to $FC000.
Do we need an ORGHF for fill, and use ORGH to not fill ???
ORGHF $400 'set hub/hub-exec mode and fill hub memory with zeros to $400
This would permit us to compile a P2 PASM block of code for a specific hub location and just output that block of code to a file. It (the PASM binary) could then be included in another program purely as a block of code using the FILE command. The included "file" could be positioned directly at the hub address where it would execute from (and was compiled for), or it could be moved in software to the hub address where it was designed to reside.
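Here is a minimal Python sketch of what a zero-filling ORGH does to the output image. It assumes that "fill" simply means padding with $00 bytes up to the given hub address. `fill_to` is hypothetical, and the 4-byte `blob` stands in for a block of code assembled for $400 and brought in with FILE.

```python
# Sketch of ORGH-style zero fill on an output image (fill_to is hypothetical).
# Assumption: "fill" means padding the image with $00 bytes up to the address.


def fill_to(image: bytearray, hub_addr: int) -> None:
    """Pad `image` with zero bytes so the next byte lands at `hub_addr`."""
    if len(image) > hub_addr:
        raise ValueError(f"image already extends past ${hub_addr:X}")
    image.extend(bytes(hub_addr - len(image)))


image = bytearray(b"\x11" * 0x30)          # earlier content
fill_to(image, 0x400)                      # like ORGH $400 with zero fill
blob = bytes.fromhex("78563412")           # stand-in for code assembled for $400
image += blob                              # e.g. what a FILE include would drop in
print(hex(len(image)), image[0x400:0x404].hex())  # 0x404 78563412
```

A non-filling ORGH would skip `fill_to` and let the blob follow the earlier content directly. The code would then only run correctly once it is copied to $400 in software.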
In the spin2 docs the following could do with an explanation about little endian that in fact the long $12345678 is actually stored in memory lowest byte first so $78 $56 $34 $12
symbol4 BYTE 'symbol4 is a byte symbol that points to $78
symbol5 WORD 'symbol5 is a word symbol that points to $5678
symbol6 LONG 'symbol6 is a long symbol that points to $12345678
LONG $12345678 'long value $12345678
That's more of a problem with how we document the numbers. Little endian should be written as they really are - least significant first. In this case LONG $87654321.
In the spin2 docs the following could do with an explanation about little endian that in fact the long $12345678 is actually stored in memory lowest byte first so $78 $56 $34 $12
symbol4 BYTE 'symbol4 is a byte symbol that points to $78
symbol5 WORD 'symbol5 is a word symbol that points to $5678
symbol6 LONG 'symbol6 is a long symbol that points to $12345678
LONG $12345678 'long value $12345678
That's more of a problem with how we document the numbers. Little endian should be written as they really are - least significant first. In this case LONG $87654321.
First, your typo...
LONG $87654321
reverses nibbles, so you meant
LONG $78563412
But no, because when you say LONG $12345678, that is what you mean and want, and that is what will be in the register when it gets loaded. But look at hub as bytes, using a hub address of $400:
ORGH $400
LONG $12345678
will actually put the following into hub...
$400: $78
$401: $56
$402: $34
$403: $12
Now with Chips example...
symbol4 = symbol5 = symbol6 = hub address $400
so the byte at $400 is $78
and the word at $400 is $5678 (because little endian reverses the bytes in rdword)
and the long at $400 is $12345678 (again because little endian reverses the bytes in rdlong)
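The same layout can be reproduced with Python's `struct` module using `<` (little-endian) formats. `rdbyte`, `rdword` and `rdlong` below are Python stand-ins for the P2 instructions, not real APIs.

```python
import struct

hub = bytearray(0x404)                           # toy hub RAM
struct.pack_into("<I", hub, 0x400, 0x12345678)   # ORGH $400 / LONG $12345678


# Python stand-ins for RDBYTE/RDWORD/RDLONG on a little-endian hub
def rdbyte(addr: int) -> int:
    return hub[addr]


def rdword(addr: int) -> int:
    return struct.unpack_from("<H", hub, addr)[0]


def rdlong(addr: int) -> int:
    return struct.unpack_from("<I", hub, addr)[0]


print([hex(b) for b in hub[0x400:0x404]])    # ['0x78', '0x56', '0x34', '0x12']
print(hex(rdbyte(0x400)), hex(rdword(0x400)), hex(rdlong(0x400)))  # 0x78 0x5678 0x12345678
```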
Maybe try moving the whole thing into hubexec. Assuming timing isn't that critical, this will need some additional work to split off the working variables because they still generally will want to be cog registers.
PS: lutram is excellent for indexed 32-bit tables and buffers.
Wmosscrop, I think this is good advice. If it is going to be a problem to go between cog and hub, keep your app in a single ORGH block and use hubexec. Otherwise, you would probably need to make that table you were mulling over.
What you were doing could be simplified, somewhat, like this:
PUB Start()
  coginit(16, @cogcode, @@0)                    'ptra = object base (@@0)

DAT                 org

cogcode             rep     #2,#8               'update jump addresses
                    altd    ptr,ptrmod
                    add     0-0,ptra

.loop               call    sub0
                    call    sub1
                    call    sub2
                    call    sub3
                    call    sub4
                    call    sub5
                    call    sub6
                    call    sub7
                    jmp     #.loop

sub0                long    @sub0_
sub1                long    @sub1_
sub2                long    @sub2_
sub3                long    @sub3_
sub4                long    @sub4_
sub5                long    @sub5_
sub6                long    @sub6_
sub7                long    @sub7_
delay               long    2_000_000
ptr                 long    0
ptrmod              long    1<<9 + sub0

                    orgh

sub0_               drvnot  #56
            _ret_   waitx   delay
sub1_               drvnot  #57
            _ret_   waitx   delay
sub2_               drvnot  #58
            _ret_   waitx   delay
sub3_               drvnot  #59
            _ret_   waitx   delay
sub4_               drvnot  #60
            _ret_   waitx   delay
sub5_               drvnot  #61
            _ret_   waitx   delay
sub6_               drvnot  #62
            _ret_   waitx   delay
sub7_               drvnot  #63
            _ret_   waitx   delay
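To see what the REP/ALTD/ADD prologue does, here is a Python model of it. It uses the listing's own values: ptrmod = 1<<9 + sub0, ptr starting at 0, and PTRA = @@0. It assumes that ALTD sets the next D field to ptr[8:0] + S[8:0] and then adds S[17:9] to ptr. The @subN_ offsets are made up.

```python
# Python model of the REP/ALTD/ADD prologue (a sketch, not PASM).
# From the listing: ptrmod = 1<<9 + sub0, so ALTD's S[8:0] = address of sub0
# and S[17:9] = 1; ptr starts at 0 and PTRA = object base (@@0).


def relocate(cog: list, first: int, count: int, ptra: int) -> None:
    ptr = 0
    for _ in range(count):                      # REP #2,#8 -> count = 8
        d = (ptr + first) & 0x1FF               # ALTD: next D field = ptr + ptrmod[8:0]
        ptr += 1                                # ALTD: ptr += ptrmod[17:9]
        cog[d] = (cog[d] + ptra) & 0xFFFFFFFF   # ADD 0-0,ptra


cog = [0] * 512
SUB0 = 12                                       # rep, altd, add, 8 calls, jmp -> sub0 at $00C
offsets = [0x100 + 8 * i for i in range(8)]     # made-up @sub0_..@sub7_ (2 instructions each)
cog[SUB0:SUB0 + 8] = offsets
relocate(cog, first=SUB0, count=8, ptra=0x3000)
print([hex(v) for v in cog[SUB0:SUB0 + 8]])     # 0x3100 .. 0x3138
```

Only the eight table entries change. `delay` at SUB0 + 8 is left alone, which is why the repeat count must match the number of subN registers.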
If you want to call with #address, the following should work, but it's not tested.
This code modifies the hub addresses in Spin before it starts the cog.
I fully understand. Humans have been taught to read most-significant-first, but that's not the way little-endian works. So we concoct a strange mix to make it more readable ... and the result is confusion on the address boundaries because the display order is all jumbled.
evanh,
The bits in the bytes do not change order between big and little endian, nor do the nibbles. Just the order of the bytes in words, longs, etc. change order. I assume you know this, and are just arguing for some unrealistic world where we would display bytes nibble swapped on little endian systems.
Also, endian-ness only really matters when converting to a stream of bytes, either when you stick it in memory or when you send it in a packet. When you operate on long values in registers, it's always the same on big or little endian. Shift up moves bits towards the MSB, and shift down moves them towards the LSB. Arguing for your display mechanism goes completely counter to that.
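A quick Python check of the point about registers versus memory: the shift produces the same number on any host. Byte order only appears when the value is serialized with `struct`.

```python
import struct

value = 0x12345678
shifted = (value << 8) & 0xFFFFFFFF      # shift up: bits move toward the MSB
print(hex(shifted))                      # 0x34567800 on any host

# Byte order only shows up once the long is laid out as bytes (memory, packets).
print(struct.pack("<I", shifted).hex())  # little-endian bytes: 00785634
print(struct.pack(">I", shifted).hex())  # big-endian bytes:    34567800
```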
Maybe try moving the whole thing into hubexec. Assuming timing isn't that critical, this will need some additional work to split off the working variables because they still generally will want to be cog registers.
PS: lutram is excellent for indexed 32-bit tables and buffers.
Yes, but that then precludes the use of XBYTE (which needs the FIFO used by hubexec), which I definitely want to use. It has potential for saving quite a bit of code in this particular cog.
What I don't understand is why the process used to build and load the binary into the P2 can't keep track of addresses. Isn't that the point of an assembler/loader pair? They already "know" that the referenced location is in the hub and what locations reference that location. At some point the absolute location of the referenced instruction has to be determined so that the code can be placed in the hub.
evanh,
The bits in the bytes do not change order between big and little endian, nor do the nibbles. Just the order of the bytes in words, longs, etc. change order. ...
That's just a manifestation of data formats in use. Namely, the base addressable unit of bytes.
Fundamentally, the root is still how we read numbers - Which is same as big-endian formats.
Yes, but that then precludes the use of XBYTE (which needs the FIFO used by hubexec), which I definitely want to use. It has potential for saving quite a bit of code in this particular cog.
Maybe try moving the whole thing into hubexec. Assuming timing isn't that critical, this will need some additional work to split off the working variables because they still generally will want to be cog registers.
PS: lutram is excellent for indexed 32-bit tables and buffers.
Yes, but that then precludes the use of XBYTE (which needs the FIFO used by hubexec), which I definitely want to use. It has potential for saving quite a bit of code in this particular cog.
What I don't understand is why the process used to build and load the binary into the P2 can't keep track of addresses. Isn't that the point of an assembler/loader pair? They already "know" that the referenced location is in the hub and what locations reference that location. At some point the absolute location of the referenced instruction has to be determined so that the code can be placed in the hub.
As objects are assembled into the top-level hierarchy, they are compiled bottom-up. At the highest level of the hierarchy, all sub-objects are brought in as binary blobs. All that is known about them is their PUB names and index values. The PUB names and index values of the sub-objects' objects have already been dealt with in prior compilations and now nothing more is known about them. They are just part of the sub-object blob. At each level of compilation, all levels of sub-objects are scanned for redundancies and the redundant instances are removed and things are reconnected without them. It would take a different approach to resolve all run-time addresses at compile time.
Chip,
If you made a new operator (@@@ for example), then when you compile it you do it similar to how @ would be compiled except you save the offset to the compiled bytecode in a fixup table and you force the bytecode compiled size to be always the same for this operator. This fixup table would need to be held in the blobs similar to the pub name and index values. Then when including child blobs during compiling you use their fixup tables to adjust their @@@ offsets by their offset with respect to the current compiling object, then you adjust the fixup table to be relative to the current object and add it to its fixup table. At the end you have one big fixup table that you can then use to do the final fixup of all @@@ offsets to be their absolute hub address. I think it might work out. If you even understand what I am trying to explain. It might be tricky to keep the fixup tables correct when distilling, but maybe not?
I tried a bunch to get @@@ working in OpenSpin way back whenever, and I failed to think of this idea above back then. If you don't poke it full of holes with things I missed, then I might get the gumption to implement it in OpenSpin.
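Below is a Python sketch of the fixup-table scheme described above, using hypothetical data structures. It is not PNut or OpenSpin code. It also ignores the removal of redundant objects during distillation, which is the part flagged as possibly tricky.

```python
# Sketch of the @@@ fixup-table idea (hypothetical structures; distillation
# of redundant objects is not modelled).
from dataclasses import dataclass, field


@dataclass
class Blob:
    code: bytearray
    fixups: list = field(default_factory=list)  # byte offsets of 4-byte @@@ slots


def _add_to_slot(code: bytearray, off: int, delta: int) -> None:
    val = int.from_bytes(code[off:off + 4], "little") + delta
    code[off:off + 4] = (val & 0xFFFFFFFF).to_bytes(4, "little")


def include_child(parent: Blob, child: Blob) -> int:
    """Append child; rebase its @@@ slots and fixup offsets to parent-relative."""
    at = len(parent.code)
    code = bytearray(child.code)
    for off in child.fixups:
        _add_to_slot(code, off, at)
    parent.code += code
    parent.fixups += [off + at for off in child.fixups]
    return at


def final_fixup(top: Blob, load_addr: int) -> None:
    """Turn every @@@ slot into an absolute hub address."""
    for off in top.fixups:
        _add_to_slot(top.code, off, load_addr)


# Child: 4 bytes of code, then a @@@ slot pointing at its own offset 2.
child = Blob(bytearray(b"\xAA\xBB\xCC\xDD") + (2).to_bytes(4, "little"), [4])
top = Blob(bytearray(16))
include_child(top, child)
final_fixup(top, 0x400)
print(top.fixups, hex(int.from_bytes(top.code[20:24], "little")))  # [20] 0x412
```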
... Shift up moves bits towards the MSB, and shift down moves them towards LSB. Arguing for your display mechanism goes completely counter to that.
It is a little tongue in cheek. It's not like I expect people to rewrite all documentation and software to fix the blunder. It'd be much better to fix it architecturally instead.
...
As objects are assembled into the top-level hierarchy, they are compiled bottom-up. At the highest level of the hierarchy, all sub-objects are brought in as binary blobs. All that is known about them is their PUB names and index values. The PUB names and index values of the sub-objects' objects have already been dealt with in prior compilations and now nothing more is known about them. They are just part of the sub-object blob. At each level of compilation, all levels of sub-objects are scanned for redundancies and the redundant instances are removed and things are reconnected without them. It would take a different approach to resolve all run-time addresses at compile time.
But when you have compiled the whole code, you know the position of each object. Just remember these positions in a table (only 1 long per object) and then start the whole compilation again. This time you add the remembered offset to every hub-label when you store it into the symbol table.
If there is a chance that an object's code size changes because the symbol values are now bigger, then store the hub labels with an offset of $400 or so in the first compilation.
This solution would need very few changes in the existing compiler. Yes, the compile time will double, but that's a minor issue.
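Here is a toy Python model of the two-pass idea. It assumes every object's size is identical in both passes, which the $400 offset trick is meant to ensure. `compile_pass`, the object sizes and `hub_base` are hypothetical.

```python
# Toy model of the two-pass suggestion (hypothetical; assumes each object's
# size is identical in both passes).


def compile_pass(objects, hub_base, positions=None):
    """objects: list of (name, size, labels) with labels = {label: offset in object}.
    Returns (symbols, layout). Pass 1 (positions=None): object-relative hub labels.
    Pass 2: each label gets its object's remembered position added."""
    symbols, layout, pos = {}, {}, hub_base
    for name, size, labels in objects:
        layout[name] = pos
        shift = 0 if positions is None else positions[name]
        for label, off in labels.items():
            symbols[f"{name}.{label}"] = off + shift
        pos += size
    return symbols, layout


objects = [("top", 0x40, {"main": 0x00}),
           ("serial", 0x80, {"tx": 0x10}),
           ("fat", 0x200, {"mount": 0x24})]
_, layout = compile_pass(objects, hub_base=0x400)                     # pass 1
symbols, _ = compile_pass(objects, hub_base=0x400, positions=layout)  # pass 2
print({k: hex(v) for k, v in symbols.items()})
# {'top.main': '0x400', 'serial.tx': '0x450', 'fat.mount': '0x4e4'}
```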
Comments
if it's really an issue, you might try FastSpin. It's all absolute addressing there.
Add note line:
Does this make sense, and is it possible???
EDIT: To write the dump as true little-endian everywhere would be:
Do you understand little endian?
You seem to be confusing memory addresses with data, and getting nibbles mixed as well.
I am referring to an example Chip has given in his docs. What he said is correct. I just think it deserves a brief further explanation.
But when you have compiled the whole code, you know the position of each object. Just remember these positions in a table (only 1 long per object) and then start the whole compilation again. This time you add the remembered offset to every hub-label when you store it into the symbol table.
If there is a chance that an object's code size changes because the symbol values are now bigger, then store the hub labels with an offset of $400 or so in the first compilation.
This solution would need very few changes in the existing compiler. Yes, the compile time will double, but that's a minor issue.
Andy