Clarification on P2 OpCode Identification.

__red__ · 2018-10-13 20:22

I'm trying to better understand the P2ASM OpCodes and am confused by the block on lines #35-42 and #43-50.

Picking one example:
EEEE 0100000 CZI DDDDDDDDD SSSSSSSSS TESTB D,{#}S WC/WZ Test bit S[4:0] of D, write to C/Z. C/Z = D[S[4:0]].
vs
EEEE 0100000 CZI DDDDDDDDD SSSSSSSSS BITL D,{#}S {WCZ} Bit S[4:0] of D = 0, C,Z = D[S[4:0]].

How does one differentiate between the two before performing the action in Column F of the spreadsheet?

Thanks,

Red

ozpropdev · 2018-10-13 21:04

Take note of the combinations of WC, WZ and WCZ.

This table might help.

EEEE 0100000 00I DDDDDDDDD SSSSSSSSS        BITL    D,S/#
EEEE 0100000 01I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       WZ
EEEE 0100000 10I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       WC
EEEE 0100000 11I DDDDDDDDD SSSSSSSSS        BITL    D,S/#       WCZ

EEEE 0100001 00I DDDDDDDDD SSSSSSSSS        BITH    D,S/#
EEEE 0100001 01I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       WZ
EEEE 0100001 10I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       WC
EEEE 0100001 11I DDDDDDDDD SSSSSSSSS        BITH    D,S/#       WCZ

EEEE 0100010 00I DDDDDDDDD SSSSSSSSS        BITC    D,S/#
EEEE 0100010 01I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       ANDZ
EEEE 0100010 10I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       ANDC
EEEE 0100010 11I DDDDDDDDD SSSSSSSSS        BITC    D,S/#       WCZ

EEEE 0100011 00I DDDDDDDDD SSSSSSSSS        BITNC   D,S/#
EEEE 0100011 01I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       ANDZ
EEEE 0100011 10I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       ANDC
EEEE 0100011 11I DDDDDDDDD SSSSSSSSS        BITNC   D,S/#       WCZ

EEEE 0100100 00I DDDDDDDDD SSSSSSSSS        BITZ    D,S/#
EEEE 0100100 01I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       ORZ
EEEE 0100100 10I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       ORC
EEEE 0100100 11I DDDDDDDDD SSSSSSSSS        BITZ    D,S/#       WCZ

EEEE 0100101 00I DDDDDDDDD SSSSSSSSS        BITNZ   D,S/#
EEEE 0100101 01I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       ORZ
EEEE 0100101 10I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       ORC
EEEE 0100101 11I DDDDDDDDD SSSSSSSSS        BITNZ   D,S/#       WCZ

EEEE 0100110 00I DDDDDDDDD SSSSSSSSS        BITRND  D,S/#
EEEE 0100110 01I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       XORZ
EEEE 0100110 10I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       XORC
EEEE 0100110 11I DDDDDDDDD SSSSSSSSS        BITRND  D,S/#       WCZ

EEEE 0100111 00I DDDDDDDDD SSSSSSSSS        BITNOT  D,S/#
EEEE 0100111 01I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       XORZ
EEEE 0100111 10I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       XORC
EEEE 0100111 11I DDDDDDDDD SSSSSSSSS        BITNOT  D,S/#       WCZ

EEEE 1101011 00L DDDDDDDDD 001000000        DIRL    D/#
EEEE 1101011 01L DDDDDDDDD 001000000        TESTP   D/#         WZ
EEEE 1101011 10L DDDDDDDDD 001000000        TESTP   D/#         WC
EEEE 1101011 11L DDDDDDDDD 001000000        DIRL    D/#         WCZ

EEEE 1101011 00L DDDDDDDDD 001000001        DIRH    D/#
EEEE 1101011 01L DDDDDDDDD 001000001        TESTPN  D/#         WZ
EEEE 1101011 10L DDDDDDDDD 001000001        TESTPN  D/#         WC
EEEE 1101011 11L DDDDDDDDD 001000001        DIRH    D/#         WCZ

EEEE 1101011 00L DDDDDDDDD 001000010        DIRC    D/#
EEEE 1101011 01L DDDDDDDDD 001000010        TESTP   D/#         ANDZ
EEEE 1101011 10L DDDDDDDDD 001000010        TESTP   D/#         ANDC
EEEE 1101011 11L DDDDDDDDD 001000010        DIRC    D/#         WCZ

EEEE 1101011 00L DDDDDDDDD 001000011        DIRNC   D/#
EEEE 1101011 01L DDDDDDDDD 001000011        TESTPN  D/#         ANDZ
EEEE 1101011 10L DDDDDDDDD 001000011        TESTPN  D/#         ANDC
EEEE 1101011 11L DDDDDDDDD 001000011        DIRNC   D/#         WCZ

EEEE 1101011 00L DDDDDDDDD 001000100        DIRZ    D/#
EEEE 1101011 01L DDDDDDDDD 001000100        TESTP   D/#         ORZ
EEEE 1101011 10L DDDDDDDDD 001000100        TESTP   D/#         ORC
EEEE 1101011 11L DDDDDDDDD 001000100        DIRZ    D/#         WCZ

EEEE 1101011 00L DDDDDDDDD 001000101        DIRNZ   D/#
EEEE 1101011 01L DDDDDDDDD 001000101        TESTPN  D/#         ORZ
EEEE 1101011 10L DDDDDDDDD 001000101        TESTPN  D/#         ORC
EEEE 1101011 11L DDDDDDDDD 001000101        DIRNZ   D/#         WCZ

EEEE 1101011 00L DDDDDDDDD 001000110        DIRRND  D/#
EEEE 1101011 01L DDDDDDDDD 001000110        TESTP   D/#         XORZ
EEEE 1101011 10L DDDDDDDDD 001000110        TESTP   D/#         XORC
EEEE 1101011 11L DDDDDDDDD 001000110        DIRRND  D/#         WCZ

EEEE 1101011 00L DDDDDDDDD 001000111        DIRNOT  D/#
EEEE 1101011 01L DDDDDDDDD 001000111        TESTPN  D/#         XORZ
EEEE 1101011 10L DDDDDDDDD 001000111        TESTPN  D/#         XORC
EEEE 1101011 11L DDDDDDDDD 001000111        DIRNOT  D/#         WCZ
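The rule in this table can be written down as a small decoder: the C and Z bits alone choose between the two mnemonics. CZ = 00 or 11 gives the bit-modify form (11 meaning WCZ); CZ = 01 or 10 gives a TEST form, with the low op bit selecting the inverted (N) test and the remaining op bits selecting W/AND/OR/XOR. The DIRx/TESTP rows follow the same pattern, except the three op bits come from the low bits of S (001000xxx). A minimal Python sketch, assuming the usual P2 field layout (condition 31-28, opcode 27-21, C/Z/I 20-18, D 17-9, S 8-0); all names are hypothetical:

```python
BIT_NAMES = ["BITL", "BITH", "BITC", "BITNC", "BITZ", "BITNZ", "BITRND", "BITNOT"]
DIR_NAMES = ["DIRL", "DIRH", "DIRC", "DIRNC", "DIRZ", "DIRNZ", "DIRRND", "DIRNOT"]
TEST_PREFIX = ["W", "AND", "OR", "XOR"]   # indexed by op3 >> 1


def pick_mnemonic(op3, c, z, modify_names, test_base):
    """Return (mnemonic, flag_suffix) for one row of the table."""
    if c == z:
        # CZ = 00 or 11: the bit-modify form; 11 is spelled WCZ.
        return modify_names[op3], ("WCZ" if c else "")
    # CZ = 01 or 10: a TEST form; an odd op3 selects the inverted test.
    name = test_base + ("N" if op3 & 1 else "")
    flag = TEST_PREFIX[op3 >> 1] + ("C" if c else "Z")
    return name, flag


def decode_flag_group(word):
    """Name a 0100xxx bit-op long or a 1101011/001000xxx pin long, else None."""
    op = (word >> 21) & 0x7F
    c = (word >> 20) & 1
    z = (word >> 19) & 1
    s = word & 0x1FF
    if (op >> 3) == 0b0100:
        return pick_mnemonic(op & 7, c, z, BIT_NAMES, "TESTB")
    if op == 0b1101011 and (s >> 3) == 0b001000:
        return pick_mnemonic(s & 7, c, z, DIR_NAMES, "TESTP")
    return None
```

For example, `decode_flag_group(0xF4080000)` (opcode 0100000, C=0, Z=1) yields `("TESTB", "WZ")`.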

__red__ · 2018-10-13 22:18

This looks like exactly the information I was looking for. Where did you find it? The official docs I was referring to don't have the CZI defined like that.

(Oh, I guess it could be worked out as a logic puzzle: you won't have a TESTB that doesn't write a flag, or a BITL that writes only one of the flags.)

Thanks ozpropdev!

Helps a ton.

I hope to have something to share soonish that people may be amused by.

ozpropdev · 2018-10-13 22:23

I had to build this table when I was implementing a full disassembler in my debugger.
Glad it was helpful.

Cluso99 · 2018-10-14 00:56

I posted a shortened instruction summary in the tips and traps reference that also shows how the instructions fit into a map.

__red__ · 2018-10-14 04:39

I went looking for that and was unable to find it. Could I trouble you for a direct link please?

ozpropdev · 2018-10-14 04:46

Try this
http://forums.parallax.com/discussion/comment/1448159/#Comment_1448159

__red__ · 2018-10-14 04:51

Thanks, you guys are awesome.

__red__ · 2018-10-15 07:40

So along similar lines:

EEEE 0110001   CZ 0   DDDDDDDDD  DDDDDDDDD   NOT     D        {WC/WZ/WCZ}
EEEE 0110001   CZ I   DDDDDDDDD  SSSSSSSSS   NOT     D,{#}S   {WC/WZ/WCZ}

Typically the I flag would determine if the # was present or not.

How can I decode the difference between D, S and D, #S?

Cluso99 · 2018-10-15 07:51

__red__ wrote: »
So along similar lines:
EEEE 0110001   CZ 0   DDDDDDDDD  DDDDDDDDD   NOT     D        {WC/WZ/WCZ}
EEEE 0110001   CZ I   DDDDDDDDD  SSSSSSSSS   NOT     D,{#}S   {WC/WZ/WCZ}
Typically the I flag would determine if the # was present or not.

How can I decode the difference between D, S and D, #S?

The difference is that when an immediate source is used, I = 1.

For the first case here, NOT D {wc,wz,wcz}, the compiler will actually use S=D and I=0,
i.e. NOT D,D {wc,wz,wcz}
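What Cluso99 describes can be sketched as an encoder: when only D is given, the same register number is written into the S field and I stays 0, so NOT D and NOT D,D produce the same long. The sketch assumes EEEE = %1111 means "no condition" (the unconditional fd... words later in this thread fit that); `encode_not` and the constants are hypothetical names:

```python
NOT_OPCODE = 0b0110001   # bits 27-21 from "EEEE 0110001 CZI DDDDDDDDD SSSSSSSSS"
COND_ALWAYS = 0b1111     # assumed encoding of "no condition" in EEEE


def encode_not(d, s=None, imm=False, wc=False, wz=False, cond=COND_ALWAYS):
    """Encode NOT D,{#}S; with no S given, encode the alias NOT D as NOT D,D."""
    if s is None:
        # Alias form: the source is register D itself, so S = D and I = 0.
        s, imm = d, False
    if not (0 <= d <= 0x1FF and 0 <= s <= 0x1FF):
        raise ValueError("D and S must fit in 9 bits")
    return ((cond << 28) | (NOT_OPCODE << 21)
            | (int(wc) << 20) | (int(wz) << 19) | (int(imm) << 18)
            | (d << 9) | s)
```

Because both spellings encode identically, a disassembler cannot tell them apart and has to pick one spelling to print.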

evanh · 2018-10-15 07:52

The first one is an alias in the docs:

.	Math and Logic	EEEE 0110001 CZI DDDDDDDDD SSSSSSSSS	NOT     D,{#}S   {WC/WZ/WCZ}	Get !S into D. D = !S. C = !S[31]. *
alias	Math and Logic	EEEE 0110001 CZ0 DDDDDDDDD DDDDDDDDD	NOT     D        {WC/WZ/WCZ}	Get !D into D. D = !D. C = !D[31]. *

My best guess is it's when S==D, hence the weird register naming.

__red__ · 2018-10-15 08:15

Thanks, guys. I guess I wasn't clear.

So if I = 0, NOT D. That's clear.

It's the difference between
NOT D, S
and
NOT D, #S

Maybe it's my understanding of the DDDDDDDDD DDDDDDDDD?

I thought that meant an 18-bit number. Is the implication that it should be two 9-bit numbers that are always equal?
If so, what happens if I=1 and they are not?

Cluso99 · 2018-10-15 08:44

#S represents an immediate 9-bit number, zero-extended to 32 bits, with the I bit = 1. It is stored in the instruction's nine SSSSSSSSS bits.
S represents the 9-bit address of a cog register that holds the 32-bit value, with the I bit = 0.
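In decoder terms, the same nine S bits are read either way; only the I bit decides whether they are the value itself or a register address. A minimal Python sketch with hypothetical function names; it deliberately ignores AUGS prefixes and the PTRx forms that some hub-memory instructions use:

```python
def format_source(word: int) -> str:
    """Render the S operand as #$x (immediate, I=1) or $x (register, I=0)."""
    s = word & 0x1FF                 # SSSSSSSSS, bits 8-0
    if (word >> 18) & 1:             # I bit
        return f"#${s:x}"
    return f"${s:x}"


def source_value(word: int, cog_ram: list) -> int:
    """Return the 32-bit value the instruction would see as S."""
    s = word & 0x1FF
    if (word >> 18) & 1:
        return s                     # zero-extended: bits 31-9 are 0
    return cog_ram[s] & 0xFFFFFFFF   # contents of cog register s
```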

evanh · 2018-10-15 09:35

Red,
It's all one instruction. The alias mnemonic is just a tidy way of showing register D being both source and destination. The second set of nine D's is a notational illusion: those are still the nine S bits, just holding a copy of the D register number. I=0 because that specific case only makes sense with direct register addressing.

evanh · 2018-10-15 09:47

Actually, the instruction is a bit unusual in that it is effectively a member of the move instruction group. Unlike many RISC architectures, the Prop is a two-operand architecture, so it generally has to duplicate values with explicit move instructions. The gain from this trade-off is the larger general register set.

evanh · 2018-10-15 10:34

[deleted]

__red__ · 2018-10-15 14:38

Interestingly, I just noticed that p2dump actually dumps it in the form D, D in those cases.

As of this morning I was at 79% coverage, so I'm making good progress.

My current puzzle is decoding jmp/call/calld arguments so that I end up with the same result as p2dump.

I've read the docs, so I understand that cog RAM is addressed in longs and hub RAM in bytes, and that apparently I have to advance the PC by 1 (or 4) when decoding relative calls.

About 5% of them end up with results out in the wilderness, so I just need to put more time into the problem.

My validation is a literal text comparison between the two tools' output, so every whitespace character matters. (The disadvantage of this approach is that any bugs in p2dump will be explicitly replicated in mine.)
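A strict comparison like this can be sketched in a few lines of Python: pair the two listings line by line and report every line that is not identical, whitespace included. The function names and the percentage metric are hypothetical (not necessarily how the coverage figure above is computed); the standard library's `difflib` is an alternative if a readable diff is wanted:

```python
from itertools import zip_longest


def compare_listings(reference: str, candidate: str):
    """List (line_no, ref_line, cand_line) for every line that differs at all."""
    mismatches = []
    pairs = zip_longest(reference.splitlines(), candidate.splitlines())
    for n, (ref, cand) in enumerate(pairs, start=1):
        if ref != cand:              # exact match: one extra space is a failure
            mismatches.append((n, ref, cand))
    return mismatches


def coverage_percent(reference: str, candidate: str) -> float:
    """Share of lines that match exactly (missing lines count as mismatches)."""
    total = max(len(reference.splitlines()), len(candidate.splitlines()))
    if total == 0:
        return 100.0
    return 100.0 * (total - len(compare_listings(reference, candidate))) / total
```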

I de-lurked my project on Twitter yesterday for those following me there. Right now all my code is in a testing namespace.

I'll post a project thread when I'm near 100% coverage and moved into a real namespace.

Dave Hein · 2018-10-15 14:58

p2dump doesn't make any attempt at using aliases. It will print out the disassembled instruction with both D and S even if they match. I haven't found any errors in the output from p2dump, but there's always a chance that there are some. I have validated the output from p2dump by running it through p2asm, and verifying that the resulting binary matches the original binary. I've even run megabytes of random data through p2dump, and get back the same data after running it through p2asm.

If you look in the util/verify directory you will see a few files that I use to do the verification. The script runall tests 33 different spin2 files, and runrand tests 10 Mbytes of random data. BTW, when the disassembler encounters data that produces invalid or ambiguous instructions it will just generate a LONG value with the original data.

__red__ · 2018-10-15 15:29

Thanks for all the help.

I'm impatiently waiting for the end of the work day so I can try to get to 100%.

I'm doing my development on a Chromebook which adds a second level of fun to the builds.

__red__ · 2018-10-15 18:09

p2dump                                            | disasm
------------------------------------------------------------------------------------------------
0130 ad900010 if_z         jmp     #$+5           | 0130 ad900010 if_z         jmp     #$+$14
0150 5d900008 if_nz        jmp     #$+3           | 0150 5d900008 if_nz        jmp     #$+$c
0430 fd900000              jmp     #$+4             0430 fd900000              jmp     #$+4
0454 fd63ec2c              jmp     $1f6             0454 fd63ec2c              jmp     $1f6
0534 fd63ec2c              jmp     $1f6             0534 fd63ec2c              jmp     $1f6
073c fd90013c              jmp     #$+$140          073c fd90013c              jmp     #$+$140

Yup, confirmed. Apparently the algorithm for converting the encoded value to an address differs below 0x400, which makes sense, as that's where the cog RAM/hub RAM boundary is.
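The rule can be recovered from the relative-jump rows above. This Python sketch reproduces the p2dump column for those rows, assuming bits 27-21 = %1101100 with bit 20 (R) set identify a relative JMP #A and that bits 19-0 hold the offset. The two formulas (below $400: divide by 4 and add one long for the PC increment; from $400 up: add 4 bytes) and the decimal-below-10 display are inferred from these rows only, not from the official docs, and backward (negative) offsets are left out because their display format isn't shown here. Passing `hub_start=0` applies the hub rule everywhere, which reproduces the disasm column. Names are hypothetical:

```python
JMP_A_OPCODE = 0b1101100   # bits 27-21 of ad900010, 5d900008, fd900000, fd90013c
HUB_START = 0x400          # p2dump's default cog/hub boundary


def fmt_offset(n: int) -> str:
    """Small offsets print in decimal, larger ones as $hex, matching the listings."""
    return str(n) if n < 10 else f"${n:x}"


def jmp_rel_operand(addr: int, word: int, hub_start: int = HUB_START) -> str:
    """Render the operand of a relative JMP #A located at byte address addr."""
    if ((word >> 21) & 0x7F) != JMP_A_OPCODE or not ((word >> 20) & 1):
        raise ValueError("not a relative JMP #A")
    a = word & 0xFFFFF                       # 20-bit A field
    if a & 0x80000:
        raise ValueError("backward jumps are not covered by this sketch")
    if addr < hub_start:
        delta = a // 4 + 1                   # cogexec (inferred): longs, PC +1
    else:
        delta = a + 4                        # hubexec (inferred): bytes, PC +4
    return f"#$+{fmt_offset(delta)}"
```

For the row at 0130, `jmp_rel_operand(0x130, 0xad900010)` gives p2dump's `#$+5`, while `hub_start=0` gives the disasm column's `#$+$14`.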

__red__ · 2018-10-15 18:42

.

evanh · 2018-10-15 18:51

Oh, yes, yep. Addressing inside cogs is in longword sized steps. Whereas in hubram it's byte sized steps.

So, cogram ($200 longs) + lutram (another $200 longs) = 4 kB, but $400 bytes in hubram addressing is only 1 kB.

Therefore, hubexec operates from 1 kB up. cogexec below that.

__red__ · 2018-10-15 18:51

Sorry - I wanted to verify against the latest master before I asked about this.

I see this in the output:

07b4 fb041d28              long    $fb041d28        | 07b4 fb041d28              rdlong  $e, #$128
07e0 fb041d2c              long    $fb041d2c        | 07e0 fb041d2c              rdlong  $e, #$12c
0824 fb041d30              long    $fb041d30        | 0824 fb041d30              rdlong  $e, #$130
0890 fb041b34              long    $fb041b34        | 0890 fb041b34              rdlong  $d, #$134

I'm not saying mine is correct (it's likely not), but I'm not sure what to make of the results from p2dump either.

__red__ · 2018-10-15 18:53

evanh wrote: »

Therefore, hubexec operates from 1 kB up. cogexec below that.

In the end, the most complicated part of working out that equation was whether the opcode needed you to increment the PC when you did the divide. Some did, some didn't.

__red__ · 2018-10-15 18:56

Found the answer, sorry.

when the disassembler encounters data that produces invalid or ambiguous instructions it will just generate a LONG value with the original data.

Dave Hein · 2018-10-15 19:59

p2dump generates a LONG because it determines that $fb041d28 cannot be disassembled into an instruction that can be assembled back to the same value. Using an immediate S value greater than $FF is invalid with rdlong. The ninth bit is used to indicate a pointer reference. This instruction would normally be disassembled to "rdlong $e, ptra[8]", but the post-update bit is set without the update bit being set. So p2dump will default to generating a LONG instruction.
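The one rule described here can be sketched as a check that runs before the normal rdlong decode. The sketch assumes RDLONG is opcode %1011000 (which matches the four words above) and that an immediate S with bit 8 set is a pointer expression laid out as %1 S U P NNNNN (S = PTRB select, U = update, P = post-modify, NNNNN = index). With that assumed layout, $128 decodes to PTRA, index 8, P set and U clear, which agrees with the description above. Names are hypothetical, and this covers only this single rule, not p2dump's full validation:

```python
RDLONG_OPCODE = 0b1011000   # bits 27-21 of fb041d28 and the other three words


def ptr_fields(s: int) -> dict:
    """Split an immediate S with bit 8 set into the assumed %1SUPNNNNN fields."""
    return {
        "ptr": "ptrb" if (s >> 7) & 1 else "ptra",
        "update": (s >> 6) & 1,
        "post": (s >> 5) & 1,
        "index": s & 0x1F,          # raw 5-bit field; sign handling not shown
    }


def rdlong_fallback(word: int):
    """Return a LONG line if this rdlong can't round-trip, else None."""
    if ((word >> 21) & 0x7F) != RDLONG_OPCODE:
        return None
    imm = (word >> 18) & 1
    s = word & 0x1FF
    if imm and s > 0xFF:            # ninth bit set: pointer expression
        f = ptr_fields(s)
        if f["post"] and not f["update"]:
            # Post-update bit without the update bit: no source text assembles
            # back to this exact value, so emit the raw data instead.
            return f"long    ${word:08x}"
    return None
```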

I use the same disassembler code in spinsim, and it will generate a "rdlong $e, ptra[8]" because it uses more relaxed rules. However, I should flag the instruction as questionable in spinsim.

By default, p2dump assumes that the first $400 bytes run in cog memory, and anything after that runs in hub RAM. This boundary can be adjusted using the -hub option. If you run p2dump without the -data option it will generate DAT, ORG and ORGH instructions, which allow the output file to be assembled by p2asm.

__red__ · 2018-10-21 22:35

Odd question.
I see that PTRA and PTRB are defined as pointers in HUB RAM.

I also see that the memory map has COG registers up to 0x1EF, COG LUT at 0x200-0x3F0, and HUB RAM at 0x400-0xFFFFF.

What happens if I try to jmp to ptra and ptra < 0x400?

ozpropdev · 2018-10-21 22:46

A jump to any address below $400 will be a cog/lut address.

Cluso99 · 2018-10-22 11:17

0x000-0x1FF is COG where 0x1F0-0x1FF are special registers
0x200-0x3FF is LUT
0x00000-0x00FFF is HUB but code cannot execute from here (HUB is addressed as bytes)
0x01000-0xFFFFF is HUB where code (hubexec) can be executed from
Note: in the current P2 silicon, HUB is only 512KB, and the top 16KB is also mapped to the top of the 1MB address range

evanh · 2018-10-22 11:40

Cluso,
hubexec starts at $400
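Putting ozpropdev's and evanh's answers together, a branch target can be classified by its value alone, which also answers the PTRA question: a value below $400 lands in cog or LUT RAM (long addresses), and anything from $400 up is a hub byte address. A small Python sketch with a hypothetical function name; it uses evanh's $400 boundary and Cluso99's 0x1F0-0x1FF special-register range, and doesn't model the smaller hub RAM of real silicon:

```python
def exec_region(addr: int) -> str:
    """Classify a 20-bit execution address (longs below $400, bytes above)."""
    if not 0 <= addr <= 0xFFFFF:
        raise ValueError("P2 addresses are 20 bits")
    if addr < 0x1F0:
        return "cog"                      # general-purpose cog registers
    if addr < 0x200:
        return "cog special register"     # 0x1F0-0x1FF
    if addr < 0x400:
        return "lut"                      # 0x200-0x3FF
    return "hub"                          # hubexec, byte addressed


# evanh's arithmetic: $400 long addresses cover 4 kB, $400 hub bytes only 1 kB.
COG_PLUS_LUT_BYTES = (0x200 + 0x200) * 4   # 4096
HUB_BELOW_400_BYTES = 0x400                # 1024
```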

__red__ · 2018-10-27 04:01

I think this is a mistake in the instructions_v??.txt in the fpga package.

EEEE 1101011 00L DDDDDDDDD 000000111        LOCKREL D/#         {WC}

We can't have WC if the encoding forces C to zero, right?
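The check can be made mechanical by looking at what an encoding pattern puts in the C slot (bit 20, the first of the three CZI characters): WC needs C = 1, so a literal 0 there means that form can't carry WC. A hypothetical Python helper, assuming 32-character patterns written like the lines in this thread:

```python
def flag_slots(pattern: str) -> tuple:
    """Return the characters a 32-bit opcode pattern puts in the C and Z slots."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected a 32-character pattern")
    # Character 0 is bit 31, so bit n sits at index 31 - n.
    return bits[31 - 20], bits[31 - 19]


def wc_possible(pattern: str) -> bool:
    """WC needs C = 1, which a literal '0' in the C slot rules out."""
    c, _ = flag_slots(pattern)
    return c != "0"
```

For the LOCKREL line above, `flag_slots` returns `("0", "0")`, so `wc_possible` is `False` even though the line lists `{WC}`.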
