Z80 -> PASM recompiler
Ale
Posts: 2,363
Hei all interested (and not so interested parties)
I was wondering about a re-compiler that would take a compiled program, analyze it and then emit Propeller asm. The idea is that such a program would try to optimize the code and not just do a one-for-one substitution. The main point is flags: they would be calculated only when needed. If an add follows another add, then the flags for the first instruction do not need to be calculated, and so on. Of course there should be a termination rule for how far ahead we look for their use.
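To make the flag idea concrete, here is a minimal Python sketch of a backward pass over one straight-line block. It is only an illustration under some assumptions: the names Insn and flags_needed are made up, all flags are treated as a single unit (real 8085 instructions affect individual flags differently), and the "termination rule" is simply to assume the flags are still wanted at the end of the block.

```python
from dataclasses import dataclass

@dataclass
class Insn:
    text: str                   # source mnemonic, for display only
    writes_flags: bool = False  # instruction overwrites the flags
    reads_flags: bool = False   # instruction depends on the flags

def flags_needed(block, live_at_exit=True):
    """Return one bool per instruction: must its flag result be computed?

    Walks the block backwards. `live` means "some later instruction
    reads the flags before they are overwritten again".
    """
    needed = [False] * len(block)
    live = live_at_exit  # conservative termination rule at the block end
    for i in range(len(block) - 1, -1, -1):
        insn = block[i]
        if insn.writes_flags:
            needed[i] = live   # only compute flags if someone reads them
            live = False       # earlier flag values are dead past here
        if insn.reads_flags:
            live = True        # whoever wrote the flags before us matters
    return needed

block = [
    Insn("ADD B", writes_flags=True),
    Insn("ADD C", writes_flags=True),
    Insn("MOV D,A"),
    Insn("JZ somewhere", reads_flags=True),
]
print(flags_needed(block))  # [False, True, False, False]
```

Only the second ADD has to produce flags here, so on the Propeller only that one would need the wz/wc effects.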
I would start with some ROM code (the TRS-80 Model 100 to be precise). I know it is for an 8085, but...
Any thoughts ?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Visit some of my articles at Propeller Wiki:
MATH on the propeller propeller.wikispaces.com/MATH
pPropQL: propeller.wikispaces.com/pPropQL
pPropQL020: propeller.wikispaces.com/pPropQL020
OMU for the pPropQL/020 propeller.wikispaces.com/OMU
pPropellerSim - A propeller simulator for ASM development sourceforge.net/projects/ppropellersim
Comments
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
The main use of this technique was when companies just had to upgrade their computers because they were getting old and no longer supported and they had all these punch card decks of compiled programs without any matching source code. The newer computers weren't fast enough to do interpretation and still get reasonable throughput, but they could translate the program from one instruction set to another. The newer computers had several times the amount of storage available and the programs easily doubled or quadrupled in size, but they did work (mostly). It was not unusual for translated programs to require some hand "tuning" where the translator couldn't handle some code sequences and someone would have to look at the original machine code alongside of the translated code to figure out what was needed.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
The first problem that comes to mind is this: given a binary blob of code, how do you know which bytes of it are instructions to be translated and which bytes are just data?
You can't just translate everything, or your data, constants, strings etc. are going to be mangled.
This is a problem already with just simple disassemblers.
One way around this is to write a simulator for the 8085 which runs the code. Obviously, in normal non-self-modifying code it never executes the data parts. As it runs, it emits the assembler mnemonics of the executed instructions, along with their addresses, into a file containing the resulting disassembled listing.
Now, assuming you have exercised all code paths through the program when you disassemble it with that simulator, you have the complete program in ASM. Then you write an assembler that generates PASM instructions (LMM) from that 8085 assembly language.
What about all the data, constants, strings etc.? Somehow your 8085 to PASM assembler has to know where to put all of these when generating the PASM.
So far all we know is that the bytes of the original binary blob which were not executed during the simulation are either:
a) Some kind of data area
b) Code that did not get exercised that time (bad)
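The tracing simulator and the leftover bytes can be sketched roughly like this in Python. It is a toy, not a real 8085 core: it knows only four opcodes, and the names OPS, trace and ROM are invented for the example. The point is just to show the bookkeeping: every byte fetched as part of an instruction is marked, and whatever is left unmarked is either data or code that was never reached.

```python
# Tiny subset of the 8085: opcode -> (mnemonic, length in bytes)
OPS = {0x00: ("NOP", 1), 0x3E: ("MVI A", 2), 0xC3: ("JMP", 3), 0x76: ("HLT", 1)}

def trace(mem, pc=0, max_steps=1000):
    """Run `mem` from `pc`, recording executed bytes and a listing."""
    executed = set()  # addresses of all bytes that belonged to an instruction
    listing = {}      # address -> disassembled text
    for _ in range(max_steps):
        name, size = OPS[mem[pc]]  # KeyError on an opcode outside the subset
        executed.update(range(pc, pc + size))
        if name == "JMP":
            target = mem[pc + 1] | (mem[pc + 2] << 8)  # little-endian address
            listing[pc] = f"JMP {target:04X}h"
            pc = target
            continue
        if name == "MVI A":
            listing[pc] = f"MVI A,{mem[pc + 1]:02X}h"
        else:
            listing[pc] = name
        if name == "HLT":
            break
        pc += size
    return executed, listing

# MVI A,01h / JMP 0008h / three data bytes ("HI", 0) / HLT
ROM = bytes([0x3E, 0x01, 0xC3, 0x08, 0x00, 0x48, 0x49, 0x00, 0x76])
executed, listing = trace(ROM)
unexecuted = sorted(set(range(len(ROM))) - executed)
print(unexecuted)  # [5, 6, 7]
```

Here the three unexecuted bytes really are data, but the trace alone cannot tell that apart from case b).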
Somehow we have to analyse the disassembled code to see how it references that stuff, to figure out where the data was and recreate it in the new PASM version.
This of course all breaks down when we find code that makes use of jump tables in fixed memory locations which will no longer be where they should be.
Then we just assemble the finished thing with BST. Easy !
Now, what happens if someone has self-modifying code in the original program? Don't forget a boot loader or CP/M loading a program is an example of self-modifying code. Or just moving code from ROM to RAM to gain speed when executed. The CP/M BIOS we use actually applies binary patches to the main CP/M files before it starts them !
When I came to that thought a year or two ago, when someone else here asked about Z80 to PASM translation, I realized my mind was not big enough to accommodate the problem :)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Everything you say applies, but:
The way I thought about it includes a code analyzer, because that way we can optimize and we avoid the "what is code" question. (I did part of this some 10 years ago, just the code analyzer.) The constant pool is something I haven't found a suitable answer for yet. Self-modifying code is another point that I have left out for the time being, because I want to test with code that I know runs from ROM.
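As a rough illustration of what such an analyzer could do (this is not the analyzer mentioned above, just a generic sketch), the Python below follows control flow statically from an entry point instead of running the program. The tables SIZES, BRANCHES and STOPS and the function find_code are made-up names covering only a handful of 8085 opcodes, and computed jumps are not followed at all.

```python
# opcode -> instruction length, for a small 8085 subset
SIZES = {0x00: 1, 0x3E: 2, 0x76: 1, 0xC2: 3, 0xC3: 3, 0xC9: 1, 0xCA: 3, 0xCD: 3}
BRANCHES = {0xC2, 0xC3, 0xCA, 0xCD}  # JNZ, JMP, JZ, CALL: carry a target address
STOPS = {0x76, 0xC3, 0xC9}           # HLT, JMP, RET: execution does not fall through

def find_code(mem, entry=0):
    """Return the set of byte addresses reachable as code from `entry`."""
    code = set()    # every byte that is part of a reachable instruction
    starts = set()  # instruction start addresses already decoded
    work = [entry]  # addresses still to explore
    while work:
        pc = work.pop()
        # Decoding silently stops on unknown opcodes or out-of-range addresses.
        while pc not in starts and 0 <= pc < len(mem) and mem[pc] in SIZES:
            op = mem[pc]
            size = SIZES[op]
            if pc + size > len(mem):
                break
            starts.add(pc)
            code.update(range(pc, pc + size))
            if op in BRANCHES:
                work.append(mem[pc + 1] | (mem[pc + 2] << 8))
            if op in STOPS:
                break
            pc += size
    return code

# CALL 0007h / HLT / data "ABC" / MVI A,02h / RET
ROM = bytes([0xCD, 0x07, 0x00, 0x76, 0x41, 0x42, 0x43, 0x3E, 0x02, 0xC9])
code = find_code(ROM)
data = sorted(set(range(len(ROM))) - code)
print(data)  # [4, 5, 6]
```

Note that 41h is also the opcode for MOV B,C, so the data bytes would disassemble happily if decoded linearly. Following the flow avoids that, at the price of missing anything reached only through a computed jump.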
I still have to think a bit more about the problems before I start coding something (which is where the question of language comes into play). Ideally I'd like it to run on the Hive or similar (TriBlade, ...), but that means coding loads of LMM code. (I have already modified pPropellerSim to support LMM code: long jumps, calls and so on.)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
BUT that assumes you have already created the code analyser, which must answer that question as part of its analysis.
That's the part that hurts my head anyway :)
Now what about the problem of jump tables:
Each entry is 3 bytes: a call plus an address. They are used by taking the address of the table, adding the required function offset times 3, and calling the resulting address.
Rewriting that table in PASM is bound to mess up the structure. Then your translator has to know how to fix the dispatch mechanism accordingly in every place it occurs.
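To spell out the arithmetic and one possible way a translator might cope, here is a small Python sketch. Everything in it is an assumption for illustration: the helper names, the table location 0010h, the use of the JMP opcode C3h in the entries, and especially the relocation map, which stands in for whatever the translator knows about where each original routine ended up.

```python
ENTRY_SIZE = 3  # one opcode byte plus a 16-bit little-endian address

def old_dispatch_address(table_base, function_number):
    """Address the original code jumps to: base + 3 * function number."""
    return table_base + function_number * ENTRY_SIZE

def parse_table(mem, table_base, count):
    """Read the target address out of each 3-byte table entry."""
    targets = []
    for n in range(count):
        at = old_dispatch_address(table_base, n)
        targets.append(mem[at + 1] | (mem[at + 2] << 8))
    return targets

def rebuild_table(targets, relocation):
    """Map each original target to its translated address, indexed by function number."""
    return [relocation[t] for t in targets]

# Hypothetical image with a three-entry table at 0010h
mem = bytearray(0x20)
mem[0x10:0x19] = bytes([0xC3, 0x00, 0x01,   # function 0 -> 0100h
                        0xC3, 0x20, 0x01,   # function 1 -> 0120h
                        0xC3, 0x50, 0x01])  # function 2 -> 0150h

targets = parse_table(mem, 0x10, 3)
relocation = {0x0100: 0x400, 0x0120: 0x480, 0x0150: 0x540}  # made-up new addresses
new_table = rebuild_table(targets, relocation)
print([hex(t) for t in targets])  # ['0x100', '0x120', '0x150']
print(old_dispatch_address(0x10, 2))  # 22
```

The translated dispatch would then index new_table by function number instead of multiplying by 3, which only works if the translator can recognise every place the original code does that multiplication.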
CP/M likes tables like that. They, or similar, probably come up in all kinds of code.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.