Thanks, but the bug that Praxis found is very worrisome. I knew it was a theoretical possibility, but I thought the probability of it actually occurring was infinitesimally small, like the probability that the Large Hadron Collider will create a black hole when it's turned on tomorrow and suck us all into oblivion. So of course Praxis had to find it. Guess that means we're doomed tomorrow.
The problem boils down to this: the silly compiler wants to generate code to push a number that is an offset into the object (case statements and lookup/down do this). The offset happens to be 1022, which requires a three-byte sequence. Unfortunately, three bytes here will move the target from 1022 to 1023. OK, let's push 1023. Well, that happens to be a two-byte sequence, so now the target moves back to 1022... and we're stuck in a loop.
There's a similar potential problem with jumps that take variable-length operands. Anyone have any experience in this area? I'm going to look at the code proptool generates here and try to figure out how it does it.
I'm pretty sure that if I make it so that variable-length sequences never shrink, the process will eventually converge, but I wonder if it will match proptool...
Easy.. generate all jumps as 2 byte addresses, then do a recursive reduce on all the addresses.
I can generally generate the optimum code in 2 reduction passes, but if you are going for 100% bytecode compatibility then watch out as the Parallax compiler does not tighten in one particular instance.
mpark said...
I knew it was a theoretical possibility, but I thought the probability of it actually occurring was infinitesimally small, like the probability that the Large Hadron Collider will create a black hole when it's turned on tomorrow and suck us all into oblivion.
I'd wait until tomorrow to try to fix it then. Who knows? You may not have to!
BradC, tell me more. Start big and try to shrink? How are you recursively reducing? I have a linear data structure I just iterate over, so I don't have much opportunity for recursion. I'm probably doing it wrong. Well, obviously I am.
And what's the "one particular instance"? Don't be coy, man, you're killing me!
Edit: Wait, do you mean "jump 63" being compiled with the two-byte offset $80 $3f instead of just $3f?
I downloaded the Mono binary for the MacOS and it runs the compiler. I haven't tried to compile anything yet, maybe later in the week. Now on to make some scripts to run this from my text editor (TextWrangler) along with running the Python downloader.
mpark said...
Oh, hippy, I've always wondered: how do you draw all those nice ASCII box diagrams? They're very helpful.
MS-DOS text editor and practice. I can knock them out quicker than using a graphics package. Helps to use an editor which converts tabs to spaces.
mpark said...
The problem boils down to this: the silly compiler wants to generate code to push a number that is an offset into the object (case statements and lookup/down do this). The offset happens to be 1022, which requires a three-byte sequence. Unfortunately, three bytes here will move the target from 1022 to 1023. OK, let's push 1023. Well, that happens to be a two-byte sequence, so now the target moves back to 1022... and we're stuck in a loop.
There's a similar potential problem with jumps that take variable-length operands. Anyone have any experience in this area? I'm going to look at the code proptool generates here and try to figure out how it does it.
I'm pretty sure that if I make it so that variable-length sequences never shrink, the process will eventually converge, but I wonder if it will match proptool...
Have the same problem with the LCC bytecode converter assembly process, and in fact it's a problem for all forward references in any assembler with variable length operands. Older assemblers/chips got round that by ( like 6800 ) using BRA and JMP so fixed but different size for NEAR and FAR. Here there's a problem.
My solution ( theory, not implemented yet ) is to first pass compile/assemble with worst case sized destination address ( ie long ) then a further pass through to determine where a smaller destination size could be used and shrink the code, re-assemble from that point on. Repeat until it's the smallest size.
The advantage is that after initial code generation the code will run even if bloated so it's not dependent on this phase of optimisation/compression working. Trying to start small and grow big is harder. This way you shouldn't get stuck in a something shrinks then has to grow again loop.
That change a one-byte 1024 to a two-byte 1022 is a problem still. The best solution is to optimise from the inwards code outwards on the fly rather than as a separate poet-processing pass ( although inner code can be found in that pass ). I was going to mark code in blocks once I'd reached a minimal solution and then never optimise within those blocks again.
You can probably do a two phase processing - generate with worse case address size, shrink to minimum is first phase, reduce to packed numbers in second phase.
To match how the Prop Tool does it is going to require studying it. Not creating exact same code isn't a problem IMO although desirable. There will be a few cases where program code works on PropTool won't work using homespun but that should be a rarity.
A lot depends perhaps on how you've implemented your compiler and how you do the assembly, on the fly or as a later pass, whether linear or as chunks of pre-assembled image returned from called functions. To get the same you may need a re-design.
First thing is to decide how important exact code generation matching is. Obviously people will compare your .eeprom with PropTool .eeprom so maybe a warning message that code may be different will be useful so people are expecting differences if seen. A listing will also help as you can flag in that where code may diverge. Anyone interested will probably fire-up a disassembler to check the two codes, most people probably won't care.
BradC, tell me more. Start big and try to shrink? How are you recursively reducing? I have a linear data structure I just iterate over, so I don't have much opportunity for recursion. I'm probably doing it wrong. Well, obviously I am.
And what's the "one particular instance"? Don't be coy, man, you're killing me!
Edit: Wait, do you mean "jump 63" being compiled with the two-byte offset $80 $3f instead of just $3f?
No, when referencing a STRING() constant, the address is compiled absolute zero referenced 2 bytes even if one is required. There is no recursive reduction on it.
I don't use a linear array, I have a stack of tokens and simply repeatedly adjust any of them that have variable length addresses.
I start by allocating 4 byte constants and 2 byte addresses then just shrink them pass-by pass until it matches the output of the parallax compiler.
I thought it was strange that STRING uses a zero based address and does not reduce.. all the others do.
I've not encountered the problem you see with the 1023-1024 address boundary, but I'm trying to reproduce it just in case..
I'm sure you've hit the weird lsb in some conditions of float parsing when the mantissa is past the accuracy. I still can't figure out how to reliably adjust it to accurately reproduce the results from the Parallax compiler. Next step is going to be IDA I guess.. no better way to figure out the weird bits..
I've never written a compiler before. It's great fun [noparse]:)[/noparse]
Thanks, but the bug that Praxis found is very worrisome. I knew it was a theoretical possibility, but I thought the probability of it actually occurring was infinitesimally small, like the probability that the Large Hadron Collider will create a black hole when it's turned on tomorrow and suck us all into oblivion. So of course Praxis had to find it. Guess that means we're doomed tomorrow.
They say tomorrow never comes[noparse]:)[/noparse]
In terms of recursion, that means finding the two closest jmp and label pairs ( that is with no other jmp and label between them ) ...
jmp label
:
label
Compact the code there to a minimum and then move outwards ignoring the inner which is already compacted, compact any further inner jmp-label pairs before compacting the whole. It doesn't have to be recursive but that's perhaps the easiest way to do it. By 'jmp', any bytecode/opcode with a reference to a later destination address.
If I remember correctly, the algorithm used successfully in the past for optimizing variable length operand addresses is to generate the worst case initially, then make multiple passes through the instructions shortening any addresses that can be shortened until two passes result in no changes. Since you always shorten the program, the process will terminate.
Thanks for the pointers. I've updated the first post with version 0.13. This corrects two problems found in heater's con_test.spin and the unhandled exception that hippy found.
It also compiles Praxis's (well, really Cluso99's) clusodebugger_251. I go from small addresses to big, never shortening, the reverse of what everyone suggested, not to be contrary, but because I'm lazy and just set all addresses to 0 to begin with. I estimate the probability that this approach will fail to be about the same as the probability that all the air molecules in this room will spontaneously jump to one side... what the?... can't... breathe... Whoa, that was weird.
Brad, you should definitely look at clusodebugger_251. You'll see Proptool push 1023 with 39 03 ff instead of the shorter 37 29. Also, yes, I've encountered the unreduced string addresses. And the floating-point LSB discrepancy (in fact it's what causes the difference in heater's second test). I've decided not to worry about floating-point stuff. But what is IDA?
Hippy, I've pulled all _SPACE stuff for now. Unfortunately the design (or lack thereof) of homespun makes doing what you suggest rather more difficult than it should be. I'll try again.
@Praxis: I am using the RamInterpreter to exercise the clusodebugger. I can run 50,000 traces after which I stop it. However, I have introduced a bug (not believed to be in the debug section that I released, but in the section I have added for spin debuging, and also for exercising the code.
I have been writing code comparisons, but keeping them in sync, together with other likely bugs is moking it overly complex. I am going to sleep on it tonight.
Michaels program is great - just waiting for a listing of my code to print out (not listing the interpreter).
mpark said...
Hippy, I've pulled all _SPACE stuff for now. Unfortunately the design (or lack thereof) of homespun makes doing what you suggest rather more difficult than it should be. I'll try again.
To do the same manually ( to reserve a single byte ) in PropTool the code would be this -
PRI ThisNeverGetsUsed
PUB Main
rest of code
Could you not fake the presence of a PRI just before the first ever real PRI/PUB of the main program ? It doesn't really matter if it takes up / wastes some space in the object link table as a pointer to a method which will never get called. Of course, rather than generate a one byte 'return' you'll want to generate N zeroes depending on what _SPACE is.
In the PropTool to reserve more than one byte, just keep adding 'return', one per byte.
Just ideas so no need of a reply and take your time and decide what your priorities are. It's amazing enough as it is.
Brad, you should definitely look at clusodebugger_251. You'll see Proptool push 1023 with 39 03 ff instead of the shorter 37 29.
Oh, oh oh.. I remember now..
Proptool _never_ seems to use mask constants for address constants. I've actually had to add a flag to my constant generator to disable mask constant optimisation when used to generate address constants...
There are a few other peculiar behaviors I seem to recall, but I need to dig them out of the code as I've not documented them anywhere..
Not surprising really.. it makes the problem you describe about the shrinking variables harder to hit as it never gets smaller if the variable increases.
Maybe I should never say _never_.. but I've seen plenty of times when my code would generate a maskable constant and proptool generates a standard constant.
mpark said...
Hippy -- I don't think that will really work in Proptool. PRIs end up after PUBs in memory. Or maybe I'm still not getting it.
Sorry you're right. PRI placed after PUB in image not just in the object link table. You could use a faked PUB PreMain then just makes sure the PINIT at $000C points to the second PUB.
Why do you even need to do that? DAT comes before PUB and you can just declare arbitrary amounts of blank space in the DAT portion. Am I missing something basic here?
A Faked PUB method will work also. You don't _have_ to start from the first pub method, just make sure pcurr points to the method you do want to start from and the stack base is correctly configured.
This request may be a little unusual but if its implemented then I can add inheritance to DOL (I know, it needs other work first...). In propTool all your spin variables get rearranged to long,word,byte order. This is done to save space but if it wasn't done than I can call another object with a var offset of 0. That basically means that if all the variables are copied from the parent object at compile time and placed in the start of the subobject file then you will be able to do inheritance fairly simply.
So, all that needs to be added to the compiler is a switch to disable the variable sorting. (and a lot of work by me to fix DOL :-( ). Everything else should be able to be done by a preprocessor.
Mike Green said...
The DAT sections and the PUB/PRI sections don't necessarily appear in the binary in the order in which they appeared in the source program
No indeed.. DAT always comes first long aligned after the object method table. then PUB then PRI..
so I would have thought that for what Hippy wanted, assigning some blank DAT space in the top object would have been precisely what he was after.
stevenmess2004 said...
This request may be a little unusual but if its implemented then I can add inheritance to DOL (I know, it needs other work first...). In propTool all your spin variables get rearranged to long,word,byte order. This is done to save space but if it wasn't done than I can call another object with a var offset of 0. That basically means that if all the variables are copied from the parent object at compile time and placed in the start of the subobject file then you will be able to do inheritance fairly simply.
So, all that needs to be added to the compiler is a switch to disable the variable sorting. (and a lot of work by me to fix DOL :-( ). Everything else should be able to be done by a preprocessor.
Don't forget the variable sorting is also done to guarantee alignment.
long aaa
byte bbb, ccc, ddd
word eee
long fff
would waste a byte after ddd and 2 bytes after eee
Additionally by having all the longs up front, variable access to the first 8 of them uses less bytecode than a direct address..
It makes sense if you look at it from another angle.
BradC, tell me more. Start big and try to shrink? How are you recursively reducing? I have a linear data structure I just iterate over, so I don't have much opportunity for recursion. I'm probably doing it wrong. Well, obviously I am.
I have not read the whole thread and I may misunderstand what's being discussed here, but if you are talking about branch length optimization, the optimal code is to generate short branches and then iterate to lengthen as needed. Something like this:
change = true;
while (change) {
change = false;
for all short jump instructions
if (the target is out-of-reach)
change instr to long jmp
change = false;
}
There was a paper back in the 60s that demonstrates this generates optimal code in more cases than shortening long jmps. The implementation is left as an exercise You'll need to track the "PC" of the instr, and also a link to the target.
BradC said...
Don't forget the variable sorting is also done to guarantee alignment.
long aaa
byte bbb, ccc, ddd
word eee
long fff
would waste a byte after ddd and 2 bytes after eee
Additionally by having all the longs up front, variable access to the first 8 of them uses less bytecode than a direct address..
It makes sense if you look at it from another angle.
I'm aware of that, however, my scheme depends on the variables being in the same order. You could implement it something like this
'In parent object
word aaa
long bbb
'compiles to
long bbb
word aaa
'in sub-object
PVAR 'new block type to tell the compiler that these variables are special and should be put at the start of the var section
word aaa
long bbb
VAR
word ccc
long ddd
'compiles to
long bbb
word aaa
long ddd
word ccc
If you did that then I would still get the desired effect and waste a maximum of three bytes which shouldn't be too much. I would have to write a pre-processor to copy the required variables into the sub-object but that should be easy.
Mike Green said...
The DAT sections and the PUB/PRI sections don't necessarily appear in the binary in the order in which they appeared in the source program
No indeed.. DAT always comes first long aligned after the object method table. then PUB then PRI..
so I would have thought that for what Hippy wanted, assigning some blank DAT space in the top object would have been precisely what he was after.
DAT
long 0[noparse][[/noparse]_space]
ought to do it nicely.
That only works when DAT is in the top-most object ( main program ).
DAT comes after the object link table of the object it's in so if a sub-object wants to reserve
space at the very top of image there's no way for it to be done without editing the main
program to add that. The _SPACE in a sub-object must reserve space at the top of the image,
_SPACE should be accumulative. How multiple sub-objects all wanting to use that space sort
out what they use would need to be considered but that's no different to how multiple objects
share other memory they wish to reserve for themselves.
That only works when DAT is in the top-most object ( main program ).
DAT comes after the object link table of the object it's in so if a sub-object wants to reserve
space at the very top of image there's no way for it to be done without editing the main
program to add that. The _SPACE in a sub-object must reserve space at the top of the image,
_SPACE should be accumulative. How multiple sub-objects all wanting to use that space sort
out what they use would need to be considered but that's no different to how multiple objects
share other memory they wish to reserve for themselves.
Oh.. I see now.. ugh that is positively bletcherous.. Doable though.. as you compile all the child objects first you can just tot up all the _space requests and offset the top DAT block by that amount. It's not actually difficult. I guess it becomes awkward when you start having multiple declarations of child objects though.
does
OBJ
fred[noparse][[/noparse]6] : "freddo"
.. reserve 6 times the _SPACE requirement that fred has or just the one?
Comments
The problem boils down to this: the silly compiler wants to generate code to push a number that is an offset into the object (case statements and lookup/down do this). The offset happens to be 1022, which requires a three-byte sequence. Unfortunately, three bytes here will move the target from 1022 to 1023. OK, let's push 1023. Well, that happens to be a two-byte sequence, so now the target moves back to 1022... and we're stuck in a loop.
There's a similar potential problem with jumps that take variable-length operands. Anyone have any experience in this area? I'm going to look at the code proptool generates here and try to figure out how it does it.
I'm pretty sure that if I make it so that variable-length sequences never shrink, the process will eventually converge, but I wonder if it will match proptool...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Michael Park
PS, BTW, and FYI:
To search the forum, use search.parallax.com (do not use the Search button).
Check out the Propeller Wiki: propeller.wikispaces.com/
Post Edited (mpark) : 9/10/2008 4:47:46 AM GMT
I can generally generate the optimum code in 2 reduction passes, but if you are going for 100% bytecode compatibility then watch out as the Parallax compiler does not tighten in one particular instance.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
-Phil
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Still some PropSTICK Kit bare PCBs left!
BradC, tell me more. Start big and try to shrink? How are you recursively reducing? I have a linear data structure I just iterate over, so I don't have much opportunity for recursion. I'm probably doing it wrong. Well, obviously I am.
And what's the "one particular instance"? Don't be coy, man, you're killing me!
Edit: Wait, do you mean "jump 63" being compiled with the two-byte offset $80 $3f instead of just $3f?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Michael Park
PS, BTW, and FYI:
To search the forum, use search.parallax.com (do not use the Search button).
Check out the Propeller Wiki: propeller.wikispaces.com/
Post Edited (mpark) : 9/10/2008 6:41:38 AM GMT
Have the same problem with the LCC bytecode converter assembly process, and in fact it's a problem for all forward references in any assembler with variable length operands. Older assemblers/chips got round that by ( like 6800 ) using BRA and JMP so fixed but different size for NEAR and FAR. Here there's a problem.
My solution ( theory, not implemented yet ) is to first pass compile/assemble with worst case sized destination address ( ie long ) then a further pass through to determine where a smaller destination size could be used and shrink the code, re-assemble from that point on. Repeat until it's the smallest size.
The advantage is that after initial code generation the code will run even if bloated so it's not dependent on this phase of optimisation/compression working. Trying to start small and grow big is harder. This way you shouldn't get stuck in a something shrinks then has to grow again loop.
That change a one-byte 1024 to a two-byte 1022 is a problem still. The best solution is to optimise from the inwards code outwards on the fly rather than as a separate poet-processing pass ( although inner code can be found in that pass ). I was going to mark code in blocks once I'd reached a minimal solution and then never optimise within those blocks again.
You can probably do a two phase processing - generate with worse case address size, shrink to minimum is first phase, reduce to packed numbers in second phase.
To match how the Prop Tool does it is going to require studying it. Not creating exact same code isn't a problem IMO although desirable. There will be a few cases where program code works on PropTool won't work using homespun but that should be a rarity.
A lot depends perhaps on how you've implemented your compiler and how you do the assembly, on the fly or as a later pass, whether linear or as chunks of pre-assembled image returned from called functions. To get the same you may need a re-design.
First thing is to decide how important exact code generation matching is. Obviously people will compare your .eeprom with PropTool .eeprom so maybe a warning message that code may be different will be useful so people are expecting differences if seen. A listing will also help as you can flag in that where code may diverge. Anyone interested will probably fire-up a disassembler to check the two codes, most people probably won't care.
No, when referencing a STRING() constant, the address is compiled absolute zero referenced 2 bytes even if one is required. There is no recursive reduction on it.
I don't use a linear array, I have a stack of tokens and simply repeatedly adjust any of them that have variable length addresses.
I start by allocating 4 byte constants and 2 byte addresses then just shrink them pass-by pass until it matches the output of the parallax compiler.
I thought it was strange that STRING uses a zero based address and does not reduce.. all the others do.
I've not encountered the problem you see with the 1023-1024 address boundary, but I'm trying to reproduce it just in case..
I'm sure you've hit the weird lsb in some conditions of float parsing when the mantissa is past the accuracy. I still can't figure out how to reliably adjust it to accurately reproduce the results from the Parallax compiler. Next step is going to be IDA I guess.. no better way to figure out the weird bits..
I've never written a compiler before. It's great fun [noparse]:)[/noparse]
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
They say tomorrow never comes[noparse]:)[/noparse]
Compact the code there to a minimum and then move outwards ignoring the inner which is already compacted, compact any further inner jmp-label pairs before compacting the whole. It doesn't have to be recursive but that's perhaps the easiest way to do it. By 'jmp', any bytecode/opcode with a reference to a later destination address.
It also compiles Praxis's (well, really Cluso99's) clusodebugger_251. I go from small addresses to big, never shortening, the reverse of what everyone suggested, not to be contrary, but because I'm lazy and just set all addresses to 0 to begin with. I estimate the probability that this approach will fail to be about the same as the probability that all the air molecules in this room will spontaneously jump to one side... what the?... can't... breathe... Whoa, that was weird.
Brad, you should definitely look at clusodebugger_251. You'll see Proptool push 1023 with 39 03 ff instead of the shorter 37 29. Also, yes, I've encountered the unreduced string addresses. And the floating-point LSB discrepancy (in fact it's what causes the difference in heater's second test). I've decided not to worry about floating-point stuff. But what is IDA?
Hippy, I've pulled all _SPACE stuff for now. Unfortunately the design (or lack thereof) of homespun makes doing what you suggest rather more difficult than it should be. I'll try again.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Michael Park
PS, BTW, and FYI:
To search the forum, use search.parallax.com (do not use the Search button).
Check out the Propeller Wiki: propeller.wikispaces.com/
Nice work, my thought for testing the clusodebugger was to give your compiler something meaty to chew on.
Cheers
I have been writing code comparisons, but keeping them in sync, together with other likely bugs is moking it overly complex. I am going to sleep on it tonight.
Michaels program is great - just waiting for a listing of my code to print out (not listing the interpreter).
To do the same manually ( to reserve a single byte ) in PropTool the code would be this -
Could you not fake the presence of a PRI just before the first ever real PRI/PUB of the main program ? It doesn't really matter if it takes up / wastes some space in the object link table as a pointer to a method which will never get called. Of course, rather than generate a one byte 'return' you'll want to generate N zeroes depending on what _SPACE is.
In the PropTool to reserve more than one byte, just keep adding 'return', one per byte.
Just ideas so no need of a reply and take your time and decide what your priorities are. It's amazing enough as it is.
Oh, oh oh.. I remember now..
Proptool _never_ seems to use mask constants for address constants. I've actually had to add a flag to my constant generator to disable mask constant optimisation when used to generate address constants...
There are a few other peculiar behaviors I seem to recall, but I need to dig them out of the code as I've not documented them anywhere..
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
Brad -- Never? That is surprising.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Michael Park
PS, BTW, and FYI:
To search the forum, use search.parallax.com (do not use the Search button).
Check out the Propeller Wiki: propeller.wikispaces.com/
Not surprising really.. it makes the problem you describe about the shrinking variables harder to hit as it never gets smaller if the variable increases.
Maybe I should never say _never_.. but I've seen plenty of times when my code would generate a maskable constant and proptool generates a standard constant.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
Sorry you're right. PRI placed after PUB in image not just in the object link table. You could use a faked PUB PreMain then just makes sure the PINIT at $000C points to the second PUB.
A Faked PUB method will work also. You don't _have_ to start from the first pub method, just make sure pcurr points to the method you do want to start from and the stack base is correctly configured.
Why can't you use DAT?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
So, all that needs to be added to the compiler is a switch to disable the variable sorting. (and a lot of work by me to fix DOL :-( ). Everything else should be able to be done by a preprocessor.
No indeed.. DAT always comes first long aligned after the object method table. then PUB then PRI..
so I would have thought that for what Hippy wanted, assigning some blank DAT space in the top object would have been precisely what he was after.
DAT
long 0[noparse][[/noparse]_space]
ought to do it nicely.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
Don't forget the variable sorting is also done to guarantee alignment.
long aaa
byte bbb, ccc, ddd
word eee
long fff
would waste a byte after ddd and 2 bytes after eee
Additionally by having all the longs up front, variable access to the first 8 of them uses less bytecode than a direct address..
It makes sense if you look at it from another angle.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
I have not read the whole thread and I may misunderstand what's being discussed here, but if you are talking about branch length optimization, the optimal code is to generate short branches and then iterate to lengthen as needed. Something like this:
change = true;
while (change) {
change = false;
for all short jump instructions
if (the target is out-of-reach)
change instr to long jmp
change = false;
}
There was a paper back in the 60s that demonstrates this generates optimal code in more cases than shortening long jmps. The implementation is left as an exercise You'll need to track the "PC" of the instr, and also a link to the target.
Hope this helps.
// richard
If you did that then I would still get the desired effect and waste a maximum of three bytes which shouldn't be too much. I would have to write a pre-processor to copy the required variables into the sub-object but that should be easy.
Let's aim for a completely binary compatible compiler first [noparse]:)[/noparse]
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
That only works when DAT is in the top-most object ( main program ).
DAT comes after the object link table of the object it's in so if a sub-object wants to reserve
space at the very top of image there's no way for it to be done without editing the main
program to add that. The _SPACE in a sub-object must reserve space at the top of the image,
_SPACE should be accumulative. How multiple sub-objects all wanting to use that space sort
out what they use would need to be considered but that's no different to how multiple objects
share other memory they wish to reserve for themselves.
Oh.. I see now.. ugh that is positively bletcherous.. Doable though.. as you compile all the child objects first you can just tot up all the _space requests and offset the top DAT block by that amount. It's not actually difficult. I guess it becomes awkward when you start having multiple declarations of child objects though.
does
OBJ
fred[noparse][[/noparse]6] : "freddo"
.. reserve 6 times the _SPACE requirement that fred has or just the one?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!