PASM Disassembler released
Shazz
Posts: 52
As advertised, here is my weekend-coded PASM disassembler. Java 6 JRE required.
tmpstore.free.fr/propeller/PASMDisassembler_v0.6.zip
tmpstore.free.fr/propeller/PASM.syn.
Example of usage :
java -jar "PASMDisassembler.jar" turbulence.eeprom -o turbulence.asm
java -jar .\dist\PASMDisassembler.jar -h -s 0x001C -e 0x0D0 -o turbulence.eeprom.pasm turbulence.eeprom
WARNING : don't use it to decompile SPIN code, only PASM !
If you find any issues, please tell me: I did not have time to write all the JUnit tests covering every opcode & condition (JUnit with propAsm included)
I'll publish the sources for the curious soon. Update: source included.
Not fully finished (no label generation, stuff like that) but really usable. Full EEPROM header will be processed in a later version. Update: done.
And as a free gift, a TextPad PASM syntax highlighter.
Post Edited (Shazz) : 5/1/2009 3:47:18 PM GMT
Comments
Nevertheless here is an idea to give some more information:
By scanning the bytecode for at least the cognew/coginit instruction you would get a list of possible entry points. But as data might have the same bit pattern, this is only a hint, and you still have to decide whether the given addresses really could be entry points.
To do it perfectly, a combined disassembler and Spin decompiler is needed. And there you still have a lot of problems! Because of self-modifying code it's not easy to decide whether the section you are currently looking at is VAR, DAT or PUB.
Why do you output the source with 8 digits sometimes? The source address has 9 bits and will always fit into 3 digits.
Ok, the goal of this tool was to be a PASM disassembler only (so not to handle EEPROM with Spin code) as I wrote it not for fun but to disassemble Linus' turbulence demo (which is PASM only if I'm not wrong, everything is done "manually"). So yes it won't help for SPIN, that could be limited I agree.... but writing a spin decompiler is really another job I don't plan to do.
Good point, I can try to find the coginit target addresses and create dummy labels (if the code is not self-modified afterwards).
Yes... that confirms why I don't want to do it :)
Because there are multiple things I still don't understand about the Propeller:
- for example, with an immediate value > 511 (so larger than the 9-bit source field), where is the value stored? I guess the assembler resets the i flag and replaces the value with an address, right? As I was not sure, I kept the 9 bits for the destination, which is useless...
- is this instruction decoded correctly: 011111 0100 1010 000000000 011010011 ($7d2800d3) == IF_Z MUXNZ WC NR $0, $D3 ???
- if in the specs an instruction has 001i for its effects and I try to assemble this instruction with the NR flag, is it valid? How will it be assembled?
OK, you're right, I can test it myself... but I'd like to have the theoretical answer first :)
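For readers following the bit patterns quoted in this post, here is a minimal Python sketch of how a 32-bit PASM long splits into the five fields written out above (6-bit opcode, 4-bit ZCRI, 4-bit condition, 9-bit destination, 9-bit source). The names `Fields` and `split_fields` are made up for illustration and are not taken from the disassembler's source.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    opcode: int  # bits 31..26
    zcri: int    # bits 25..22 (Z, C, R, I)
    cond: int    # bits 21..18
    dest: int    # bits 17..9
    src: int     # bits 8..0


def split_fields(word: int) -> Fields:
    """Split one 32-bit PASM long into its raw fields (no mnemonic lookup)."""
    return Fields(
        opcode=(word >> 26) & 0x3F,
        zcri=(word >> 22) & 0xF,
        cond=(word >> 18) & 0xF,
        dest=(word >> 9) & 0x1FF,
        src=word & 0x1FF,
    )


f = split_fields(0x7D2800D3)
# Print the fields in the same grouping as the post uses.
print(f"{f.opcode:06b} {f.zcri:04b} {f.cond:04b} {f.dest:09b} {f.src:09b}")
```

With $7d2800d3 this reproduces the grouping 011111 0100 1010 000000000 011010011, so the destination and source each fit in three hex digits.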
Post Edited (Shazz) : 4/20/2009 12:25:04 PM GMT
bignum LONG $12345678
and don't use the hash:
mov somewhere, bignum
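The reason behind this advice is that the source field is only 9 bits wide, so a literal such as $12345678 cannot be encoded with `#` and has to live in its own long. A rough Python sketch of the packing, assuming the usual field order (opcode, Z, C, R, I, condition, destination, source); the function `encode` is hypothetical and does not come from propasm or the Propeller Tool:

```python
def encode(opcode, z, c, r, i, cond, dest, src):
    """Pack PASM fields into one 32-bit long; reject values that do not fit."""
    if not 0 <= dest <= 0x1FF:
        raise ValueError("destination must fit in 9 bits")
    if not 0 <= src <= 0x1FF:
        # An immediate (or register address) above 511 cannot be encoded.
        raise ValueError("source must fit in 9 bits")
    return (
        (opcode & 0x3F) << 26
        | (z & 1) << 25
        | (c & 1) << 24
        | (r & 1) << 23
        | (i & 1) << 22
        | (cond & 0xF) << 18
        | dest << 9
        | src
    )


# MOV (opcode 101000) with WZ, WC, NR, an immediate source of 0, always executed.
print(hex(encode(0b101000, 1, 1, 0, 1, 0b1111, 0, 0)))

try:
    encode(0b101000, 0, 0, 1, 1, 0b1111, 0, 0x12345678)
except ValueError as err:
    print("rejected:", err)
```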
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
You have covered the TEST instruction using the ZCR field. Most PASM readers would expect a symbolic notation WZ,WC,NR instead of ZCR bits.
The PASM actually starts at cell 0 with a jump over the other header fields. Being able to parse the header and the coginit sequence minimally will go a long way to helping people understand the entry point. For example, header yields $18 as spin start.
Init·0018
Spin·0018·PUSH·····$00000000·/·+0
Spin·0019·PUSH·····$00000000·/·+0
Spin·001A·PUSH·····$00000000·/·+0
Spin·001B·COGISUB··$00000000·/·+0
Asmb·0000·Starts
NR just means the instruction will not modify the destination. This explains why AND and TEST have the same instruction code. It also helps to understand the first SHR :)
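One way a disassembler could turn raw ZCR bits into the symbolic form is sketched below in Python. It assumes the tool knows, per opcode, whether the result is written by default; only the AND/TEST pair mentioned here is handled, and the helper names are invented for this example.

```python
def effect_suffix(zcri: int, default_r: int) -> str:
    """Render WZ/WC plus WR or NR, but only when R differs from the default."""
    z, c, r = (zcri >> 3) & 1, (zcri >> 2) & 1, (zcri >> 1) & 1
    parts = []
    if z:
        parts.append("WZ")
    if c:
        parts.append("WC")
    if r != default_r:
        parts.append("WR" if r else "NR")
    return ", ".join(parts)


def and_or_test(zcri: int) -> str:
    """Opcode 011000 reads as AND when R is set and as TEST when it is clear."""
    r = (zcri >> 1) & 1
    name = "AND" if r else "TEST"
    # The mnemonic already expresses R, so no WR/NR is printed for it.
    return f"{name} {effect_suffix(zcri, default_r=r)}".rstrip()


print(and_or_test(0b1100))
print(and_or_test(0b0010))
print(effect_suffix(0b0100, default_r=1))
```

The last call uses the ZCRI bits 0100 with a default-written result, which gives the WC, NR pair discussed earlier in the thread.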
I thought having the "i bit" set means that the instruction has an immediate source value. The destination cannot be immediate, can it? It is not possible to have an immediate value or register cell > 511.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
Propalyzer: Propeller PC Logic Analyzer
http://forums.parallax.com/showthread.php?p=788230
Post Edited (jazzed) : 4/20/2009 2:56:41 PM GMT
Yep, I realize the header is really helpful, at least to understand the relative addresses used in any instruction ! So I will add the header parsing (but I need first to understand all fields, not so clear : PBASE, VBASE, SBASE, PINIT, SINIT,.... cogisub ?)
then I guess it should help me to disassemble $04c4b400 as :
- the clock frequency (header0)
- IF_NC_AND_NZ RDWORD $05a, #$0 (in the program space)
- myVar long 80_000_000 (in the var space)
I can check that using the VBASE, right ?
Moreover I just found the propasm assembler, it will help me do some testing!
And thanks for the immediate tip, I did not think that an immediate > 511 would cause an assembler error :) Somewhat strange... I would have imagined that in this case the assembler would translate it (as assemblers did on the 68000, for example, with short and long branches).
And OK, I'll change the ZCR to full words :)
And (and and and..) I did not notice that about TEST and AND! Right!
And I'll modify the stuff to show special registers (CNT, DIRA, OUTA...)
I want Linus depacker source code ! :D
Something tells us you'll be the one to figure it out first..
k.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
New ICON coming, gotta wait for the INK to heal, now we have colour!
@Jazzed : thanks for the header. By the way, can you briefly explain to me how the values (taken from the header) will impact the code (like code sections and so on)?
Moreover I have added a few things (special registers,...). I'll release it as soon as it passes the 151 tests (taken from Hippy's proplist).
So I use Bill's propasm to assemble a listing on demand, then directly disassemble it and check that the result matches the listing. And I got a first surprise:
In my mind, CMPSUB doesn't affect R by default (as in the specs), so you have to specify WR. But Bill's assembler generates CMPSUB $123, $45 WR ($e0be4645) by default. Is that the normal behavior? Is it a fix done by Bill to give sense to this instruction?
CMPSUB is an odd one. Normally compare instructions don't affect the destination register by default. This one does (check prop manual or quickref). So I guess the WR has been added to make it absolutely clear that the destination is written. As it is the default behaviour for this instruction you don't have to.
I wonder if this should have been named SUBGE (subtract if greater or equal)?
HTH
Post Edited (kuroneko) : 4/22/2009 8:20:02 AM GMT
No, not yet, I'm focusing on my disassembler first :) (and I have few hours to do it), but if you can, I'll be more than pleased :)
About CMPSUB, I read in the manual:
ZCR=000 and Result: Not Written, so CMPSUB does not affect the result, no?
And in the manual: "If the WR effect is specified, the result, if any, is written to Value1."
So either I'm lost or propasm doesn't assemble as described.
SUBGE would have been more... understandable, yes :) I can disassemble it like that :D
See discussion here, the manual is in fact still wrong (sorry about that, I usually look stuff up in the QR or in the prop datasheet). The quick reference guide does say that WR is default behaviour (prop tool 1.2.5).
Damn! I thought the Errata was for 1.0 (and the manual is 1.01)!!! So I did not read it... Thanks Kuroneko... I'll update my stuff.
(Should I understand that you have already reverse-engineered Linus' demo ???)
CMPSX $123, $45
propasm :
%110111 0000 1111 100100011 001000101 (SUBSX $123, $45 NR)
expected from specs:
%110001 0000 1111 100100011 001000101
any good reason ? CMPSX is similar to SUBSX if NR... but is it an optimization ?
NEGNC $123, $45 -> -s if C==0
propasm :
%101011 0010 1111 100100011 001000101 (ABSNEG -> -s )
expected from specs:
%101101 0010 1111 100100011 001000101
any good reason ? NEGNC is similar to ABSNEG if WC... but is it an optimization ?
So it's more an optimization of understandability for programmers who already know CMPSX and SUBSX from other CPUs. If you did not have both assembler instructions it would be confusing, and they would have to learn that one does the same as the other, the only difference being the NR and WC flags.
But my disassembler is validated, first step done.
So, the same question remains: can someone briefly explain to me how the header values (s/v/p/base, s/p/init) will impact the code (like code / const / data sections and so on)?
Post Edited (Shazz) : 4/22/2009 3:40:58 PM GMT
Hopefully a little more description of the headers will help ... not much else is necessary to
understand about spin to get a grip of the pasm. You're not writing a spin disassembler right?
' clkfreq is obviously not important to set in eeprom image
00 b400 ' clkfreq low
02 04c4 ' clkfreq hi
' key item: checksum is critical otherwise eeprom won't be loaded
04 ca6f ' sum byte, clkmode byte
06 0010 ' (obj) object start addr
08 005c ' (vars) variables start addr
0A 0088 ' (stk) stack start addr
' key item: place where spin bytecode starts
0C 002c ' (PUB) obchain first PUB method start
0E 008c ' (isp) initial stack pointer value
The only values that impact loading and PASM are checksum, clockmode, and first PUB start.
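As an illustration of the layout listed above, here is a small Python sketch that unpacks those 16 header bytes. The sample bytes are the little-endian form of the values in the listing; placing the clkmode byte at offset 4 and the checksum byte at offset 5 is my reading of the `ca6f` word, and the dictionary keys are informal labels borrowed from the comments, not official field names.

```python
import struct

# One long, two bytes, five words: 16 bytes in total, all little-endian.
HEADER = struct.Struct("<IBBHHHHH")


def parse_header(image: bytes) -> dict:
    """Unpack the first 16 bytes of an EEPROM image into named values."""
    clkfreq, clkmode, checksum, obj, var, stk, pub, isp = HEADER.unpack_from(image, 0)
    return {
        "clkfreq": clkfreq,
        "clkmode": clkmode,
        "checksum": checksum,
        "obj": obj,    # object start address
        "vars": var,   # variables start address
        "stk": stk,    # stack start address
        "pub": pub,    # first PUB method start (where Spin bytecode begins)
        "isp": isp,    # initial stack pointer value
    }


sample = bytes.fromhex("00b4c404 6fca 1000 5c00 8800 2c00 8c00")
for name, value in parse_header(sample).items():
    print(f"{name:8} ${value:X}")
```

For these bytes the clkfreq long comes out as $04C4B400, which is 80_000_000.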
First PUB start is important because Spin starts PASM in a COG ....
In the turbulence case, only 4 bytecodes are required PUSH#0, PUSH#0, PUSH#0, COGINIT.
In spin, it would look like this:
in decoding the Spin to find the CogInits to find the start of PASM code and in decoding PASM
or DAT 'long @label' constants, though I'm not sure there'd be enough semantic information to
even identify a constant as such.
For details I have on the header ... http://forums.parallax.com/attachment.php?attachmentid=48698
I disagree on the NEGNC/ABSNEG similarity. ABSNEG always gives you -|s| (no flag dependency), while NEGNC gives you either s or -s depending on the carry flag. Most importantly, -|s| is not always equal to -s.
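To make the difference concrete, here is a short Python model of the two results on 32-bit values, following the descriptions given in this thread (ABSNEG yields -|s|; NEGNC yields -s when C is 0 and s otherwise). Flag outputs are not modelled, and the function names are just for this sketch.

```python
MASK = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def absneg(s: int) -> int:
    """-|s|, independent of any flag."""
    return -abs(to_signed(s)) & MASK


def negnc(s: int, c: int) -> int:
    """-s when the carry flag is clear, s unchanged when it is set."""
    return (to_signed(s) if c else -to_signed(s)) & MASK


minus_five = -5 & MASK
print(hex(absneg(5)), hex(negnc(5, 0)))                    # same for positive s
print(hex(absneg(minus_five)), hex(negnc(minus_five, 0)))  # differ for negative s
```

For a negative source the two disagree even with C clear, which is why NEGNC cannot simply be assembled as ABSNEG.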
If you're simply saying that propasm assembles NEGNC as ABSNEG then that's wrong.
I'm up to the point where it does something useful (i.e. I/O setup etc.) but haven't had time for a closer look yet.
1. the header. OK, I thought some of the data could be useful for disassembling the PASM. So if I understood correctly, it only matters for SPIN, so I don't mind (except for information).
So in this case, let's assume this dumb code :
so currently it is badly disassembled because I don't recognize data (the 4 longs) :
so mask2 long $00FF_0000 and cps2 long 80_000_000 are disassembled as RDBYTE and RDWORD...
because I don't know that the code section has ended and that I should treat them as data and not instructions.
Moreover, if for code clarity I want to replace MOV DIRA, $002 by MOV DIRA, data1
and set data1 to $00000008 (or $00000018 depending on the code section), how can I do that ????
I mean the ORG directives are gone, of course, so in this example how can I know there are 2 code sections and 2 data sections without the original source code ?
2. the "strange" propasm behavior
Kuroneko, with WR it is assembled like that by propasm :
CMPSX $123, $45 WC
%110111 0100 1111 100100011 001000101 => (SUBSX $123, $45 WC NR)
so that's wrong, isn't it, if I understand your comment correctly: subsx :dst, :src wc,nr ' does not set carry, carry is only set for a signed underflow
Then I tried to assemble :
NEGNC $123, $45 WC -> s as C==1
%101011 0110 1111 100100011 001000101 (ABSNEG -> -s)
So it seems that this is REALLY wrong too....
3. Linus' demo
Eh eh eh :) Of course I'm continuing my little work (first a good disassembly listing is needed!) but I hope you'll share your discoveries with us! :D
I don't quite get this part: WC -> s as C==1. Specifying WC doesn't make C==1, C has to be set as a pre-requisite (input) for this instruction, i.e. just by looking at this instruction (out of context) you can't tell what C is.
Attached is a copy of the hub RAM after the initial loader has finished (relevant changes are from 0x00D4 .. 0x0473, that's cog 0x035 .. 0x11C).
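The hub and cog ranges quoted here line up if each cog cell is one long (4 bytes) and the cog image begins at hub address 0, which is what the numbers in this thread suggest for turbulence. A tiny Python sketch of that conversion; the `base` parameter is an assumption added for images whose PASM starts elsewhere:

```python
def hub_to_cog(hub_addr: int, base: int = 0) -> int:
    """Map a hub byte address to a cog cell index (one cell per 4-byte long)."""
    return (hub_addr - base) // 4


def cog_to_hub(cell: int, base: int = 0) -> int:
    """Map a cog cell index back to the hub address of its first byte."""
    return base + cell * 4


for addr in (0x00D4, 0x0473, 0x0098, 0x024C):
    print(f"hub ${addr:04X} -> cog ${hub_to_cog(addr):03X}")
```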
first: "This demo has to be stored on EEPROM. It uses the entire RAM for effects and loads code over I²C as needed".... eheh !!! Nice, so you have the effect loader depacked! I wonder how you dumped the hub RAM at this point (did you write a kind of loader similar to Chip's original bootloader?)
Btw, I thought that Linus would have erased the hub RAM after the initial EEPROM loading... so it seems he simply overwrites the data first.
Good work! I'll try to see what happens in this section..
EDIT : not easy to follow, but I see 2 COGINIT & COGSTOP, some code related to the display (VSCL, VCFG...), some exchange with the I/Os ...
second : about WC <> C==1, because I really wondered... As I'm too used to instructions which have effects (WC, WZ..), I was not sure whether instructions whose behavior depends on the flags read the prior flag state or use the effect... So thanks for the correction.
Post Edited (Shazz) : 4/23/2009 11:14:27 AM GMT
Can you post an update of your disassembler? Ta
Post Edited (kuroneko) : 4/23/2009 11:17:42 AM GMT
tmpstore.free.fr/propeller/PASMDisassembler_v0.3.zip
so to use it go to dist folder and type for example in command line mode :
java -jar PASMDisassembler.jar -s 0x040 -e 0x0480 turbulence.pass_1.eeprom > listing.asm
So... you understood the depacking algo (the loader, right ?) and then wrote your own depacker in C ???? Right ??? Gosh.... Could you just tell me where in the original EEPROM the depacker code starts and ends ?
So maybe you can fix the disassembly errors :)
If there aren't too many errors, I'll comment this piece of code asap.
Btw, when you say "cog 0x007 .. 0x034", what does it mean ? Where the code is coginit'ed to be executed ? (I have to understand better what happens during the coginit, address translation and stuff like that... basically how the COG code differs from the original hub RAM data)
Post Edited (Shazz) : 4/23/2009 11:47:17 AM GMT
BTW, why don't you disassemble this one? It's perfectly normal code ...
Post Edited (kuroneko) : 4/23/2009 12:31:33 PM GMT
$00000090 -> data1 LONG $c24
$00000098 (cell $26)-> code target for cell $2e JMP #$026 .. ok but when you jump to $026, what are you doing ? really RDBYTE $002, $024 ?
I'll fix the SHR, dunno why I print WR...
about
$0000024c $00000093 $a37c0000 LONG $a37c0000
010001 0101 1100 100011000 001000000
it should be :
IF_C MAXS $118,#$40 NC WC, right ?
But for me it is not a normal instruction, I thought that if an instruction has WR by default you cannot have NC.... But while writing this I realize that's dumb... otherwise NC would be useless...
So OK, I'll fix that asap :)
0xA37C0000 = 101000.1101.1111.000000000.000000000 = mov $0, #0 wc, wz, nr
BTW, what's NC?
Post Edited (kuroneko) : 4/23/2009 12:53:57 PM GMT
1. the loader: OK, I got mixed up.... 0x026 is OK, but what does the LONG $7d2800d3 between the 2 hub RAM r/w instructions mean ? Can you declare some data in the middle of code ? (I'm not used to that.. but why not...)
2. Even if I mentally disassembled $a37c0000 badly, the issue was the same: I thought that NR was illegal for an instruction which has WR by default... fixed in the next version.
So for the moment, if any effect is added to the instruction, I will print all the effects to avoid confusion, even if they are implied (OK, I'm lazy).
And NC.... could have been No Carry Set, but as no instruction writes C by default.. OK, it doesn't exist :)
3. Question: would it make sense in the disassembler to replace all source/dest $001, $002... by var1, var2,.. and add var1, var2 labels at cells ORG+$001, ORG+$002... ?
Same for MOV, JUMP, CALL.. replace #$xxx by labels ?
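One possible shape for the jump and call part of this idea, sketched in Python: scan the longs for the JMP/CALL opcode (010111) with the immediate bit set, and give each target cell a generated name. This is only an outline under that assumption. The naming scheme `label_xxx` is invented, and self-modified jumps or RET (which encodes as a jump to #0 that is patched at run time) would need special handling in a real tool.

```python
JMPRET_OPCODE = 0b010111  # shared by JMP, CALL and RET


def jump_labels(words):
    """Collect immediate jump/call targets and assign generated label names."""
    labels = {}
    for word in words:
        opcode = (word >> 26) & 0x3F
        immediate = (word >> 22) & 1
        if opcode == JMPRET_OPCODE and immediate:
            target = word & 0x1FF  # 9-bit source field holds the cog cell
            labels.setdefault(target, f"label_{target:03x}")
    return labels


# $5C7C0026 is JMP #$026; $A37C0000 is a MOV and contributes no label.
print(jump_labels([0x5C7C0026, 0xA37C0000]))
```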
Post Edited (Shazz) : 4/23/2009 1:59:39 PM GMT
tmpstore.free.fr/propeller/PASMDisassembler_v0.4.zip
issue fixed with effects.
Question : For the loader, where is the coginit ? And so the hub address ? How do you know it starts at 0x07 and finishes at 0x034 ? Is it managed by Chip's bootloader ? By default coginit(0,0,?,?) ? So all the EEPROM content (at least the beginning), including the header, was copied into the cog ? Then cells 0 to 6 are used for storing stuff ?
ex :
Turbulence loader :
Post Edited (Shazz) : 4/23/2009 4:18:39 PM GMT