LMM2 - Propeller 2 LMM experiments (50-80 LMM2 MIPS @ 160MHz)

cgracey · 2012-12-13 20:25

When a D register is given in REPD, the number of repeats will be 0 if D=0, which still means the code executes ONCE. If D=1, it will repeat once, making TWO executions of the looped code.

cgracey · 2012-12-13 20:27

David Betz wrote: »

Yes, but how do I know if it is actually repeating? If the REPS instruction were treated as a NOP the code would still work. Under what circumstances will that last JMP instruction be executed normally?

It will execute after 511 passes through the repeating code.

David Betz · 2012-12-13 20:29

In case you're interested, here is the disassembly of the fibo function which is where most of the time is spent.

000015d4 <_fibo>:
    15d4:	4cdafca0 			mov	1b4 <__TMP0>, #76 
    15d8:	52b4fc1c 			jmpret	168 <__LMM_PUSHM_ret>, #148 <__LMM_PUSHM> 
    15dc:	001afca0 			mov	34 <r13>, #0 
    15e0:	0018bca0 			mov	30 <r12>, 0 <__bootrom> 
    15e4:	001cbca0 			mov	38 <r14>, 0 <__bootrom> 
    15e8:	011c7c87 			cmp	38 <r14>, #1 wz, wc
    15ec:	1c22f880 	if_be		brs	1608 <_fibo+0x34> 
    15f0:	0e00bca0 			mov	0 <__bootrom>, 38 <r14> 
    15f4:	0100fc84 			sub	0 <__bootrom>, #1 
    15f8:	021cfc84 			sub	38 <r14>, #2 
    15fc:	4b007c1c 			jmp	#12c <__LMM_CALL> 
    1600:	d4150000 			nop
    1604:	001abc80 			add	34 <r13>, 0 <__bootrom> 
    1608:	2422fc84 			brs 	15e4 <_fibo+0x10> 
    160c:	0c00bca0 			mov	0 <__bootrom>, 30 <r12> 
    1610:	0100fc60 			and	0 <__bootrom>, #1 
    1614:	0d00bc80 			add	0 <__bootrom>, 34 <r13> 
    1618:	4fdafca0 			mov	1b4 <__TMP0>, #79 
    161c:	5cbefc1c 			jmpret	17c <__LMM_POPRET_ret>, #170 <__LMM_POPRET>

Ariba · 2012-12-13 20:32

David Betz wrote: »

Yes, but how do I know if it is actually repeating? If the REPS instruction were treated as a NOP the code would still work. Under what circumstances will that last JMP instruction be executed normally?

You should get the same benchmark result as with the jmpd version (same instruction count), or only a bit slower because of the additional jump after 511 repeats.

David Betz · 2012-12-13 20:37

Ariba wrote: »

You should get the same benchmark result as with the jmpd version (same instruction count), or only a bit slower because of the additional jump after 511 repeats.

Well, if that's the case, then something must be wrong. I'll keep looking. Thanks!

Ariba · 2012-12-13 20:39

REPS #511,#5 encodes to: $0DC3FC44 with the PNUT assembler.

So this should do the same:

__LMM_loop
        long $0DC3FC44
        nop
          rdlongc L_ins1,pc
          add pc,#4
          nop
L_ins1    nop
          nop
        jmp #__LMM_loop
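
As a cross-check on that encoding, here is a small Python sketch (not from the original posts) that packs the REPS fields using the layout cgracey gives later in the thread (`000011 n11 1 nnnn nnnnnnnnn 001iiiiii`) together with PNut's apparent subtract-one convention. The function name is hypothetical, and treating the lone `n` bit as the most significant bit of the 14-bit count is an assumption.

```python
def encode_reps_pnut(loops: int, instructions: int) -> int:
    """Pack REPS #loops,#instructions as PNut appears to (each field holds value - 1)."""
    if not 1 <= loops <= 1 << 14:
        raise ValueError("loops must be 1..16384")
    if not 1 <= instructions <= 64:
        raise ValueError("instructions must be 1..64")
    n = loops - 1          # 14-bit count field, 0 means one pass
    i = instructions - 1   # 6-bit block-size field, 0 means one instruction
    word = 0b000011 << 26            # opcode group
    word |= ((n >> 13) & 1) << 25    # assumed: top bit of the 14-bit count
    word |= 0b11 << 23               # '11' in these two bits marks REPS
    word |= 1 << 22
    word |= ((n >> 9) & 0xF) << 18   # four count bits where CCCC normally sits
    word |= (n & 0x1FF) << 9         # low nine count bits
    word |= 0b001 << 6               # S-field prefix for REPD/REPS
    word |= i
    return word


print(hex(encode_reps_pnut(511, 5)))  # 0xdc3fc44
```

The result matches the $0DC3FC44 that PNut produced, which is consistent with both fields being stored as value minus one.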

David Betz · 2012-12-13 20:46

Ariba wrote: »

REPS #511,#5 encodes to: $0DC3FC44 with the PNUT assembler.

That's odd. The 4 in the last digit seems like it should mean 4 instructions not 5.
I have $0dc3fe48 for REPS #511,#8.

cgracey · 2012-12-13 20:46

In all cases of REPS/REPD, all values (constants and register contents) are such that 0 means 1, on upwards. If you put %000000 in the instructions-to-repeat field, it means 1. %111111 means 64. Same deal with the loop counts: 0 means 1 (because of the initial fall-through), while $FFFF_FFFF would mean $1_0000_0000 block executions.
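
The "0 means 1" rule can be checked against the two REPS words quoted above. The Python sketch below is not from the original posts; it pulls the count and block-size fields out of a REPS word using the layout `000011 n11 1 nnnn nnnnnnnnn 001iiiiii` and adds one to each. The helper name is hypothetical, and the position of the top count bit is an assumption.

```python
def decode_reps_fields(word: int) -> dict:
    """Extract the raw REPS fields and the counts the hardware would act on."""
    n_field = (((word >> 25) & 1) << 13) | (((word >> 18) & 0xF) << 9) | ((word >> 9) & 0x1FF)
    i_field = word & 0x3F
    return {
        "n_field": n_field,
        "i_field": i_field,
        "passes": n_field + 1,        # 0 means one pass through the block
        "instructions": i_field + 1,  # 0 means a one-instruction block
    }


for word in (0x0DC3FC44, 0x0DC3FE48):
    print(hex(word), decode_reps_fields(word))
```

Under these assumptions $0DC3FC44 decodes to 511 passes of 5 instructions, while $0DC3FE48 decodes to 512 passes of 9 instructions, one more than intended in each field.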

Sapieha · 2012-12-13 20:47

Hi Chip.

Can you verify this?

Table 12: Extended Miscellaneous Flow Control Instructions

Machine Code                             | Mnemonic | Operand | Operation
---------------------------------------------------------------------------------------------------------------------------
000011 zcn 1 cccc nnnnnnnnn 0100iiiii    | REPD     | D/#n,i  | Delayed repeat of the following i (0-31) instructions the
                                         |          |         | value in register D (0-511) or n (0-511) times.
                                         |          |         | The pipeline causes a delay of three instructions before
                            ????         |          |         | the repeated set of instructions begins to execute.
000011 zcn 1 cccc nnnnnnnnn 0100iiiii    | REPS     | D/#n,i  | Repeat of the following i (0-31) instructions the
                                         |          |         | value in register D (0-511) or n (0-511) times.
000011 zcn 1 cccc nnnnnnnnn 010100000    | NOPX     | D/#n    | Repeat the NOP instruction the value in register D (0-511)
                                         |          |         | or n (0-511) times.
000011 zcn 1 cccc dddnnnnnn 011101011    | SETSKIP  | D/#n    | Executes up to the next 32 instructions as NOPs described by the
                                         |          |         | set bit pattern of a register D (0-511) or literal N (0-63).

There are two repeat instructions:

REPS #loops,#ins - executes early in the pipeline, uses a 14-bit constant, needs only one spacer instruction
REPD D,#ins -      executes late in the pipeline, uses D or a 9-bit constant, needs three spacer instructions,
                   if D is $1FF then infinite repeat

If REPS is used by a task that has at least one other task between its own time slots, no spacers are needed.
If REPD is used by a task that has at least three other tasks between its own time slots, no spacers are needed.

For infinite repeat, you must do REPD $1FF,#x. When the hardware sees register address $1FF, that means infinite.
When a D register is given in REPD, the number of repeats will be 0 if D=0, which still means the code executes ONCE. If D=1, it will repeat once, making TWO executions of the looped code.

Fast loading from HUB to COG ram can be done with just a few instructions:

 Load 64 longs from HUB memory (@PTRA) into COG-$100

    REPS    #64,#1
    SETINDA    #$100
    RDLONGC    INDA++,PTRA++

This way, you can load as much or as little as you please, to wherever in the COG you'd like. Then, you can jump to it.

David Betz · 2012-12-13 20:48

cgracey wrote: »

In all cases of REPS/REPD, all values (constants and register contents) are such that 0 means 1, on upwards. If you put %000000 in the instructions-to-repeat field, it means 1. %111111 means 64. Same deal with the loop counts: 0 means 1 (because of the initial fall-through), while $FFFF_FFFF would mean $1_0000_0000 block executions.

Ah, then that's my problem. Off by one in both fields. Thanks!!!!

David Betz · 2012-12-13 20:57

David Betz wrote: »

Ah, then that's my problem. Off by one in both fields. Thanks!!!!

Ugh. With that fix the LMM loop doesn't work at all. I think I must be too tired to think anymore today. I'll look at this again tomorrow.

David Betz · 2012-12-13 21:04

David Betz wrote: »

Ugh. With that fix the LMM loop doesn't work at all. I think I must be too tired to think anymore today. I'll look at this again tomorrow.

However, this loop works with better results:

__LMM_loop
        reps #511,#8
        nop
        rdlongc L_ins1,pc
        add pc,#4
        nop
L_ins1  nop
        rdlongc L_ins2,pc
        add pc,#4
        nop
L_ins2  nop
        jmp #__LMM_loop

Results:

fibo(26) = 121393 (890ms) (53425690 ticks)
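
A quick sanity check on those numbers, as a Python sketch that is not part of the original post. It confirms the expected Fibonacci value and derives the tick rate implied by the reported figures, assuming the tick count is in system clock cycles and the millisecond figure is rounded.

```python
def fib(n: int) -> int:
    """Iterative Fibonacci with fib(0) = 0 and fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


ticks = 53425690      # reported tick count
elapsed_ms = 890      # reported elapsed time

implied_mhz = ticks / (elapsed_ms / 1000) / 1e6

print(fib(26))                 # 121393
print(round(implied_mhz, 2))   # roughly 60 MHz
```

So the benchmark result itself is correct, and the timing figures are consistent with a clock of about 60 MHz on the test setup rather than the 160 MHz in the thread title.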

Ariba · 2012-12-13 21:04

Yeah, for me it's also time to go to bed - it's 6 AM here...

Andy

Sapieha · 2012-12-13 21:06

Hi Andy.

It's the same time here.

But I won't be sleeping for a while yet.

Ariba wrote: »

Yeah, for me it's also time to go to bed - it's 6 AM here...

Andy

David Betz · 2012-12-13 21:07

Ariba wrote: »

Yeah, for me it's also time to go to bed - it's 6 AM here...

Andy

I guess I'm a wimp. It's only midnight here.

David Betz · 2012-12-13 21:28

cgracey wrote: »

In all cases of REPS/REPD, all values (constants and register contents) are such that 0 means 1, on upwards. If you put %000000 in the instructions-to-repeat field, it means 1. %111111 means 64. Same deal with the loop counts: 0 means 1 (because of the initial fall-through), while $FFFF_FFFF would mean $1_0000_0000 block executions.

I've been thinking about this and it seems that PNut.exe must subtract 1 from the value supplied for the immediate before encoding it into the instruction because Andy gave the following example:

REPS #511,#5 encodes to: $0DC3FC44 with the PNUT assembler.

This seems like it might be awkward if the same rule is applied to the REPD instruction since it can take the repeat count in a register. Will the register's contents also be decremented by 1 before being used as the repeat count? If not then the value you need to load into the register for the D form will have to be different than the value you supply as a constant in the immediate form for the same loop count.

cgracey · 2012-12-13 22:09

Sapieha wrote: »

Hi Chip.

Can You verify this...

Here are the current encodings:

000011 Z00 1 CCCC 111111111 001iiiiii		REPD	#i			(infinite repeat)
000011 Z00 1 CCCC nnnnnnnnn 001iiiiii		REPD	D,#i
000011 Z01 1 CCCC nnnnnnnnn 001iiiiii		REPD	#n,#i
000011 n11 1 nnnn nnnnnnnnn 001iiiiii		REPS	#n,#i

I'm going to try to write the documentation for this right now and post it in an hour, or so. As of now, PNUT.EXE doesn't support the 'REPD #i' syntax for infinite repeat, so you must type 'REPD $1FF,#i'.
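
To make the three REPD forms concrete, here is a Python sketch (not from the original post) that packs a REPD word from raw field values according to the encodings listed above. The function names are hypothetical; leaving the Z bit at 0 and defaulting CCCC to %1111 ("always") are assumptions, and no subtract-one adjustment is applied.

```python
ALWAYS = 0b1111  # CCCC pattern for unconditional execution


def encode_repd(field: int, i_field: int, immediate: bool, cond: int = ALWAYS) -> int:
    """Pack a REPD word from raw field values.

    field   : 9-bit register address, or 9-bit constant when immediate=True
    i_field : 6-bit block-size field (0 means one instruction)
    """
    if not (0 <= field <= 0x1FF and 0 <= i_field <= 0x3F and 0 <= cond <= 0xF):
        raise ValueError("field out of range")
    word = 0b000011 << 26          # opcode group
    # Bit 25 (Z) is left at 0 as an assumption; bit 24 stays 0 for REPD.
    word |= int(immediate) << 23   # 0 = count in register D, 1 = constant #n
    word |= 1 << 22
    word |= cond << 18             # CCCC
    word |= field << 9             # register address or constant
    word |= 0b001 << 6             # S-field prefix shared by REPD and REPS
    word |= i_field
    return word


def is_infinite_repd(word: int) -> bool:
    """True for the register form with address $1FF, which means infinite repeat."""
    is_rep = ((word >> 26) == 0b000011
              and ((word >> 22) & 1) == 1
              and ((word >> 6) & 0b111) == 0b001)
    register_form = ((word >> 23) & 0b11) == 0b00
    return is_rep and register_form and ((word >> 9) & 0x1FF) == 0x1FF


print(is_infinite_repd(encode_repd(0x1FF, 7, immediate=False)))  # True
print(is_infinite_repd(encode_repd(0x1FF, 7, immediate=True)))   # False
```

The second call shows why the distinction matters: with the immediate bit set, $1FF is just a count, not the infinite-repeat marker.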

cgracey · 2012-12-13 22:13

David Betz wrote: »

I've been thinking about this and it seems that PNut.exe must subtract 1 from the value supplied for the immediate before encoding it into the instruction because Andy gave the following example:

This seems like it might be awkward if the same rule is applied to the REPD instruction since it can take the repeat count in a register. Will the register's contents also be decremented by 1 before being used as the repeat count? If not then the value you need to load into the register for the D form will have to be different than the value you supply as a constant in the immediate form for the same loop count.

That's right, the way the constant is interpreted and the way the register's contents are interpreted for REPD loops is the same: 0 -> 1 loop (actually, no loops - just the initial fall through).

Sapieha · 2012-12-13 22:15

Hi Chip.

Thanks

cgracey wrote: »
Here are the current encodings:
000011 Z00 1 CCCC 111111111 001iiiiii        REPD    #i            (infinite repeat)
000011 Z00 1 CCCC nnnnnnnnn 001iiiiii        REPD    D,#i
000011 Z01 1 CCCC nnnnnnnnn 001iiiiii        REPD    #n,#i
000011 n11 1 nnnn nnnnnnnnn 001iiiiii        REPS    #n,#i
I'm going to try to write the documentation for this right now and post it in an hour, or so. As of now, PNUT.EXE doesn't support the 'REPD #i' syntax for infinite repeat, so you must type 'REPD $1FF,#i'.

Cluso99 · 2012-12-14 01:13

cgracey wrote: »
Here are the current encodings:
000011 Z00 1 CCCC 111111111 001iiiiii        REPD    #i            (infinite repeat)
000011 Z00 1 CCCC nnnnnnnnn 001iiiiii        REPD    D,#i
000011 Z01 1 CCCC nnnnnnnnn 001iiiiii        REPD    #n,#i
000011 n11 1 nnnn nnnnnnnnn 001iiiiii        REPS    #n,#i
I'm going to try to write the documentation for this right now and post it in an hour, or so. As of now, PNUT.EXE doesn't support the 'REPD #i' syntax for infinite repeat, so you must type 'REPD $1FF,#i'.

Chip: How can you use 6 "i" bits? With that many, the other instructions cannot be decoded.

000011 z0n 1 cccc nnnnnnnnn 0100iiiii    REPD    D/#n,#i
000011 n11 1 nnnn nnnnnnnnn 0100iiiii    REPS    #n,#i
000011 zcn 1 cccc nnnnnnnnn 010100000    NOPX    D/#n
000011 zcn 1 cccc nnnnnnnnn 010100001    SETZC   D/#n,#i
000011 zcn 1 cccc dnnnnnnnn 010100010    SETSPA  D/#n
000011 zcn 1 cccc dnnnnnnnn 010100011    SETSPB  D/#n
                               ^ highlighted in red in the original post: 0 for REPD/REPS, 1 for the rest


etc...

This is my writeup (it lost a bit of formatting in the paste)...

REPD  D/#n,#i   Repeat D/#n loops of following #i[4..0] instructions.      Delay(3)
                Jump terminates. REPD $1FF,#i = infinite.                  (r=1 for #n)
                Note: Compiler subtracts 1 from #n & #i since 0->1 execution/instr
                Warning: The instructions will be executed D+1 times!

REPS  #n,#i     Repeat #n[13..0] loops of following #i[4..0] instructions.  Delay(3)
                Jump terminates.
                Note: Compiler subtracts 1 from #n & #i since 0->1 execution/instr

cgracey · 2012-12-14 01:36

Cluso99 wrote: »

Chip: How can you use 6 "i" bits? With that many, the other instructions cannot be decoded.

Opcodes got changed around since that document was written. Here are the new encodings:

instruction					mnem	operand
-------------------------------------------------------------------------------------------------
000000 ZC0 I CCCC DDDDDDDDD SUPIIIIII		WRBYTE	D,S/PTR		(waits for hub)
000000 Z01 I CCCC DDDDDDDDD SUPIIIIII		RDBYTE	D,S/PTR		(waits for hub)
000000 Z11 I CCCC DDDDDDDDD SUPIIIIII		RDBYTEC	D,S/PTR		(waits for hub if cache miss)

000001 ZC0 I CCCC DDDDDDDDD SUPIIIIII		WRWORD	D,S/PTR		(waits for hub)
000001 Z01 I CCCC DDDDDDDDD SUPIIIIII		RDWORD	D,S/PTR		(waits for hub)
000001 Z11 I CCCC DDDDDDDDD SUPIIIIII		RDWORDC	D,S/PTR		(waits for hub if cache miss)

000010 ZC0 I CCCC DDDDDDDDD SUPIIIIII		WRLONG	D,S/PTR		(waits for hub)
000010 Z01 I CCCC DDDDDDDDD SUPIIIIII		RDLONG	D,S/PTR		(waits for hub)
000010 Z11 I CCCC DDDDDDDDD SUPIIIIII		RDLONGC	D,S/PTR		(waits for hub if cache miss)

000011 ZCR 0 CCCC DDDDDDDDD SSSSSSSSS		COGINIT	D,S		(waits for hub)


000011 ZCR 1 CCCC DDDDDDDDD 000000000		CLKSET	D		(waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000001		COGID	D		(waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000010	      ( COGINIT	D )		(waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000011		COGSTOP	D		(waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000100		LOCKNEW	D		(waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000101		LOCKRET	D		(waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000110		LOCKSET	D		(waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000111		LOCKCLR	D		(waits for hub)

000011 ZCR 1 CCCC 000000000 000001000		CACHEX
000011 ZCR 1 CCCC 000000001 000001000		CLRACCA
000011 ZCR 1 CCCC 000000010 000001000		CLRACCB
000011 ZCR 1 CCCC 000000011 000001000		CLRACCS
000011 ZCR 1 CCCC 000000101 000001000		FITACCA			(waits for mac)
000011 ZCR 1 CCCC 000000110 000001000		FITACCB			(waits for mac)
000011 ZCR 1 CCCC 000000111 000001000		FITACCS			(waits for mac)

000011 ZC0 1 CCCC DDDDDDDDD 000001001		SNDSER	D		(waits for tx if !wc)
000011 ZC1 1 CCCC DDDDDDDDD 000001001		RCVSER	D		(waits for rx if !wc)

000011 ZCR 1 CCCC DDDDDDDDD 000001010		PUSHZC	D
000011 ZCR 1 CCCC DDDDDDDDD 000001011		POPZC	D

000011 ZCR 1 CCCC DDDDDDDDD 000001100		SUBCNT	D		(subtracts D from cnt[31:0], then cntl if same thread)
000011 ZC0 1 CCCC DDDDDDDDD 000001101		PASSCNT	D		(loops if (cnt[31:0] - D) msb set)
000011 ZC1 1 CCCC DDDDDDDDD 000001101		GETCNT	D		(gets cnt[31:0], then cntl if same thread)
000011 ZCR 1 CCCC DDDDDDDDD 000001110		GETACCA	D		(gets acca[31:0], then acca[63:32], waits for mac)
000011 ZCR 1 CCCC DDDDDDDDD 000001111		GETACCB	D		(gets accb[31:0], then accb[63:32], waits for mac)

000011 ZCR 1 CCCC DDDDDDDDD 000010000		GETLFSR	D
000011 ZCR 1 CCCC DDDDDDDDD 000010001		GETTOPS	D		(GETTOPS wc,nr = POLVID wc)
000011 ZCR 1 CCCC DDDDDDDDD 000010010		GETPTRA	D
000011 ZCR 1 CCCC DDDDDDDDD 000010011		GETPTRB	D

000011 ZCR 1 CCCC DDDDDDDDD 000010100		GETPIX	D		(waits two clocks)
000011 ZCR 1 CCCC DDDDDDDDD 000010101		GETSPD	D
000011 ZCR 1 CCCC DDDDDDDDD 000010110		GETSPA	D
000011 ZCR 1 CCCC DDDDDDDDD 000010111		GETSPB	D

000011 ZCR 1 CCCC DDDDDDDDD 000011000		POPAR	D
000011 ZCR 1 CCCC DDDDDDDDD 000011001		POPBR	D
000011 ZCR 1 CCCC DDDDDDDDD 000011010		POPA	D
000011 ZCR 1 CCCC DDDDDDDDD 000011011		POPB	D
000011 ZCR 1 CCCC 000000000 000011100		RETA
000011 ZCR 1 CCCC 000000000 000011101		RETB
000011 ZCR 1 CCCC 000000000 000011110		RETAD
000011 ZCR 1 CCCC 000000000 000011111		RETBD

000011 ZCR 1 CCCC DDDDDDDDD 000100000		DECOD2	D
000011 ZCR 1 CCCC DDDDDDDDD 000100001		DECOD3	D
000011 ZCR 1 CCCC DDDDDDDDD 000100010		DECOD4	D
000011 ZCR 1 CCCC DDDDDDDDD 000100011		DECOD5	D
000011 ZCR 1 CCCC DDDDDDDDD 000100100		BLMASK	D
000011 ZCR 1 CCCC DDDDDDDDD 000100101		NOT	D
000011 ZCR 1 CCCC DDDDDDDDD 000100110		ONECNT	D		(waits one clock)
000011 ZCR 1 CCCC DDDDDDDDD 000100111		ZERCNT	D		(waits one clock)
000011 ZCR 1 CCCC DDDDDDDDD 000101000		INCPAT	D		(waits three clocks)
000011 ZCR 1 CCCC DDDDDDDDD 000101001		DECPAT	D		(waits three clocks)
000011 ZCR 1 CCCC DDDDDDDDD 000101010		BINGRY	D
000011 ZCR 1 CCCC DDDDDDDDD 000101011		GRYBIN	D		(waits one clock)
000011 ZCR 1 CCCC DDDDDDDDD 000101100		MERGEW	D
000011 ZCR 1 CCCC DDDDDDDDD 000101101		SPLITW	D
000011 ZCR 1 CCCC DDDDDDDDD 000101110		SEUSSF	D
000011 ZCR 1 CCCC DDDDDDDDD 000101111		SEUSSR	D

000011 ZCR 1 CCCC DDDDDDDDD 000110000		GETMULL	D		(waits for mul if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110001		GETMULH	D		(waits for mul if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110010		GETDIVQ	D		(waits for div if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110011		GETDIVR	D		(waits for div if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110100		GETSQRT	D		(waits for sqrt if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110101		GETQX	D		(waits for cordic if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110110		GETQY	D		(waits for cordic if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110111		GETQZ	D		(waits for cordic if !wc)

000011 ZCR 1 CCCC DDDDDDDDD 000111000		GETPHSA	D		(GETPHSA wc,nr = POLCTRA wc)
000011 ZCR 1 CCCC DDDDDDDDD 000111001		GETPHZA	D		(clears phsa)
000011 ZCR 1 CCCC DDDDDDDDD 000111010		GETCOSA	D
000011 ZCR 1 CCCC DDDDDDDDD 000111011		GETSINA	D

000011 ZCR 1 CCCC DDDDDDDDD 000111100		GETPHSB	D		(GETPHSB wc,nr = POLCTRB wc)
000011 ZCR 1 CCCC DDDDDDDDD 000111101		GETPHZB	D		(clears phsb)
000011 ZCR 1 CCCC DDDDDDDDD 000111110		GETCOSB	D
000011 ZCR 1 CCCC DDDDDDDDD 000111111		GETSINB	D

000011 Z00 1 CCCC 111111111 001iiiiii		REPD	#i		(infinite repeat)
000011 Z0N 1 CCCC nnnnnnnnn 001iiiiii		REPD	D/#n,#i
000011 n11 1 nnnn nnnnnnnnn 001iiiiii		REPS	#n,#i

000011 ZCN 1 CCCC nnnnnnnnn 01000----		<empty>

000011 ZCN 1 CCCC nnnnnnnnn 01001tttt		JMPTASK	D/#n,#t

000011 ZCN 1 CCCC nnnnnnnnn 010100000		NOPX	D/#n		(waits)
000011 ZCN 1 CCCC nnnnnnnnn 010100001		SETZC	D/#n		(d[1:0] into z/c via wz/wc)
000011 ZCN 1 CCCC Dnnnnnnnn 010100010		SETSPA	D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100011		SETSPB	D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100100		ADDSPA	D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100101		ADDSPB	D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100110		SUBSPA	D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100111		SUBSPB	D/#n

000011 ZCN 1 CCCC nnnnnnnnn 010101000		PUSHAR	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101001		PUSHBR	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101010		PUSHA	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101011		PUSHB	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101100		CALLA	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101101		CALLB	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101110		CALLAD	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101111		CALLBD	D/#n

000011 ZCN 1 CCCC SUPIIIIII 010110000		WRQUAD	D/PTR		(waits for hub)
000011 Z0N 1 CCCC SUPIIIIII 010110001		RDQUAD	D/PTR		(waits for hub)
000011 Z1N 1 CCCC SUPIIIIII 010110001		RDQUADC	D/PTR		(waits for hub if cache miss)
000011 ZCN 1 CCCC nnnnnnnnn 010110010		SETPTRA	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110011		SETPTRB	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110100		ADDPTRA	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110101		ADDPTRB	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110110		SUBPTRA	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110111		SUBPTRB	D/#n

000011 ZCN 1 CCCC nnnnnnnnn 010111000		SETPIX	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111001		SETPIXU	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111010		SETPIXV	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111011		SETPIXZ	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111100		SETPIXA	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111101		SETPIXR	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111110		SETPIXG	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111111		SETPIXB	D/#n

000011 Z0N 1 CCCC nnnnnnnnn 011000000		SETMULU	D/#n
000011 Z1N 1 CCCC nnnnnnnnn 011000000		SETMULA	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000001		SETMULB	D/#n
000011 Z0N 1 CCCC nnnnnnnnn 011000010		SETDIVU	D/#n		(loads [31:0], then [63:32])
000011 Z1N 1 CCCC nnnnnnnnn 011000010		SETDIVA	D/#n		(loads [31:0], then [63:32])
000011 ZCN 1 CCCC nnnnnnnnn 011000011		SETDIVB	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000100		SETSQRH	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000101		SETSQRL	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000110		SETQI	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000111		SETQZ	D/#n

000011 ZCN 1 CCCC nnnnnnnnn 011001000		QLOG	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011001001		QEXP	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011001010		SETF	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011001011		SETTASK	D/#n

000011 ZCN 1 CCCC DDDDDDDnn 011001100		CFGDAC0	D/#n
000011 ZCN 1 CCCC DDDDDDDnn 011001101		CFGDAC1	D/#n
000011 ZCN 1 CCCC DDDDDDDnn 011001110		CFGDAC2	D/#n
000011 ZCN 1 CCCC DDDDDDDnn 011001111		CFGDAC3	D/#n

000011 ZCN 1 CCCC nnnnnnnnn 011010000		SETDAC0	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011010001		SETDAC1	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011010010		SETDAC2	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011010011		SETDAC3	D/#n

000011 ZCN 1 CCCC Dnnnnnnnn 011010100		CFGDACS	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011010101		SETDACS	D/#n

000011 ZCN 1 CCCC DDnnnnnnn 011010110		GETP	D/#n		(pin into !z/c via wz/wc)
000011 ZCN 1 CCCC DDnnnnnnn 011010111		GETNP	D/#n		(pin into z/!c via wz/wc)

000011 ZCN 1 CCCC DDnnnnnnn 011011000		OFFP	D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011001		NOTP	D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011010		CLRP	D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011011		SETP	D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011100		SETPC	D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011101		SETPNC	D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011110		SETPZ	D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011111		SETPNZ	D/#n

000011 ZCN 1 CCCC DDDDDnnnn 011100000		SETCOG	D/#n
000011 ZCN 1 CCCC DDDnnnnnn 011100001		SETMAP	D/#n
000011 Z0N 1 CCCC nnnnnnnnn 011100010		SETQUAD	D/#n
000011 Z1N 1 CCCC nnnnnnnnn 011100010		SETQUAZ	D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100011		SETPORT	D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100100		SETPORA	D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100101		SETPORB	D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100110		SETPORC	D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100111		SETPORD	D/#n

000011 ZCN 1 CCCC nnnnnnnnn 011101000		SETXCH	D/#n
000011 ZCN 1 CCCC DDDnnnnnn 011101001		SETXFR	D/#n
000011 ZCN 1 CCCC DDDDDDDDD 011101010		SETSER	D/#n
000011 ZCN 1 CCCC DDDnnnnnn 011101011		SETSKIP	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011101100		SETVID	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011101101		SETVIDY	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011101110		SETVIDI	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011101111		SETVIDQ	D/#n

000011 ZCN 1 CCCC nnnnnnnnn 011110000		SETCTRA	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110001		SETWAVA	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110010		SETFRQA	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110011		SETPHSA	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110100		ADDPHSA	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110101		SUBPHSA	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110110		SYNCTRA			(waits for ctra)
000011 ZCN 1 CCCC nnnnnnnnn 011110111		CAPCTRA

000011 ZCN 1 CCCC nnnnnnnnn 011111000		SETCTRB	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111001		SETWAVB	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111010		SETFRQB	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111011		SETPHSB	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111100		ADDPHSB	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111101		SUBPHSB	D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111110		SYNCTRB			(waits for ctrb)
000011 ZCN 1 CCCC nnnnnnnnn 011111111		CAPCTRB

000011 ZCR 1 CCCC DDDDDDDDD 1000bbbbb		ISOB	D,b
000011 ZCR 1 CCCC DDDDDDDDD 1001bbbbb		NOTB	D,b
000011 ZCR 1 CCCC DDDDDDDDD 1010bbbbb		CLRB	D,b
000011 ZCR 1 CCCC DDDDDDDDD 1011bbbbb		SETB	D,b
000011 ZCR 1 CCCC DDDDDDDDD 1100bbbbb		SETBC	D,b
000011 ZCR 1 CCCC DDDDDDDDD 1101bbbbb		SETBNC	D,b
000011 ZCR 1 CCCC DDDDDDDDD 1110bbbbb		SETBZ	D,b
000011 ZCR 1 CCCC DDDDDDDDD 1111bbbbb		SETBNZ	D,b

000100 000 I CCCC DDDDDDDDD SSSSSSSSS		SETACCA	D,S
000100 010 I CCCC DDDDDDDDD SSSSSSSSS		SETACCB	D,S
000100 100 I CCCC DDDDDDDDD SSSSSSSSS		MACA	D,S
000100 110 I CCCC DDDDDDDDD SSSSSSSSS		MACB	D,S
000100 ZC1 I CCCC DDDDDDDDD SSSSSSSSS		MUL	D,S		(waits one clock)

000101 000 I CCCC DDDDDDDDD SSSSSSSSS		MOVF	D,S
000101 010 I CCCC DDDDDDDDD SSSSSSSSS		QSINCOS	D,S
000101 100 I CCCC DDDDDDDDD SSSSSSSSS		QARCTAN	D,S
000101 110 I CCCC DDDDDDDDD SSSSSSSSS		QROTATE	D,S
000101 ZC1 I CCCC DDDDDDDDD SSSSSSSSS		SCL	D,S		(waits one clock)

000110 ZCR I CCCC DDDDDDDDD SSSSSSSSS		ENC	D,S
000111 ZCR I CCCC DDDDDDDDD SSSSSSSSS		JMPRET	D,S

001000 ZCR I CCCC DDDDDDDDD SSSSSSSSS		ROR	D,S
001001 ZCR I CCCC DDDDDDDDD SSSSSSSSS		ROL	D,S
001010 ZCR I CCCC DDDDDDDDD SSSSSSSSS		SHR	D,S
001011 ZCR I CCCC DDDDDDDDD SSSSSSSSS		SHL	D,S
001100 ZCR I CCCC DDDDDDDDD SSSSSSSSS		RCR	D,S
001101 ZCR I CCCC DDDDDDDDD SSSSSSSSS		RCL	D,S
001110 ZCR I CCCC DDDDDDDDD SSSSSSSSS		SAR	D,S
001111 ZCR I CCCC DDDDDDDDD SSSSSSSSS		REV	D,S

010000 ZCR I CCCC DDDDDDDDD SSSSSSSSS		MINS	D,S
010001 ZCR I CCCC DDDDDDDDD SSSSSSSSS		MAXS	D,S
010010 ZCR I CCCC DDDDDDDDD SSSSSSSSS		MIN	D,S
010011 ZCR I CCCC DDDDDDDDD SSSSSSSSS		MAX	D,S
010100 ZCR I CCCC DDDDDDDDD SSSSSSSSS		MOVS	D,S
010101 ZCR I CCCC DDDDDDDDD SSSSSSSSS		MOVD	D,S
010110 ZCR I CCCC DDDDDDDDD SSSSSSSSS		MOVI	D,S
010111 ZCR I CCCC DDDDDDDDD SSSSSSSSS		JMPRETD	D,S

011000 ZCR I CCCC DDDDDDDDD SSSSSSSSS		AND	D,S
011001 ZCR I CCCC DDDDDDDDD SSSSSSSSS		ANDN	D,S
011010 ZCR I CCCC DDDDDDDDD SSSSSSSSS		OR	D,S
011011 ZCR I CCCC DDDDDDDDD SSSSSSSSS		XOR	D,S
011100 ZCR I CCCC DDDDDDDDD SSSSSSSSS		MUXC	D,S
011101 ZCR I CCCC DDDDDDDDD SSSSSSSSS		MUXNC	D,S
011110 ZCR I CCCC DDDDDDDDD SSSSSSSSS		MUXZ	D,S
011111 ZCR I CCCC DDDDDDDDD SSSSSSSSS		MUXNZ	D,S

100000 ZCR I CCCC DDDDDDDDD SSSSSSSSS		ADD	D,S
100001 ZCR I CCCC DDDDDDDDD SSSSSSSSS		SUB	D,S
100010 ZCR I CCCC DDDDDDDDD SSSSSSSSS		ADDABS	D,S
100011 ZCR I CCCC DDDDDDDDD SSSSSSSSS		SUBABS	D,S
100100 ZCR I CCCC DDDDDDDDD SSSSSSSSS		SUMC	D,S
100101 ZCR I CCCC DDDDDDDDD SSSSSSSSS		SUMNC	D,S
100110 ZCR I CCCC DDDDDDDDD SSSSSSSSS		SUMZ	D,S
100111 ZCR I CCCC DDDDDDDDD SSSSSSSSS		SUMNZ	D,S

101000 ZCR I CCCC DDDDDDDDD SSSSSSSSS		MOV	D,S
101001 ZCR I CCCC DDDDDDDDD SSSSSSSSS		NEG	D,S
101010 ZCR I CCCC DDDDDDDDD SSSSSSSSS		ABS	D,S
101011 ZCR I CCCC DDDDDDDDD SSSSSSSSS		ABSNEG	D,S
101100 ZCR I CCCC DDDDDDDDD SSSSSSSSS		NEGC	D,S
101101 ZCR I CCCC DDDDDDDDD SSSSSSSSS		NEGNC	D,S
101110 ZCR I CCCC DDDDDDDDD SSSSSSSSS		NEGZ	D,S
101111 ZCR I CCCC DDDDDDDDD SSSSSSSSS		NEGNZ	D,S

110000 ZCR I CCCC DDDDDDDDD SSSSSSSSS		CMPS	D,S
110001 ZCR I CCCC DDDDDDDDD SSSSSSSSS		CMPSX	D,S
110010 ZCR I CCCC DDDDDDDDD SSSSSSSSS		ADDX	D,S
110011 ZCR I CCCC DDDDDDDDD SSSSSSSSS		SUBX	D,S
110100 ZCR I CCCC DDDDDDDDD SSSSSSSSS		ADDS	D,S
110101 ZCR I CCCC DDDDDDDDD SSSSSSSSS		SUBS	D,S
110110 ZCR I CCCC DDDDDDDDD SSSSSSSSS		ADDSX	D,S
110111 ZCR I CCCC DDDDDDDDD SSSSSSSSS		SUBSX	D,S

111000 ZCR I CCCC DDDDDDDDD SSSSSSSSS		SUBR	D,S
111001 ZCR I CCCC DDDDDDDDD SSSSSSSSS		CMPSUB	D,S
111010 ZCR I CCCC DDDDDDDDD SSSSSSSSS		INCMOD	D,S
111011 ZCR I CCCC DDDDDDDDD SSSSSSSSS		DECMOD	D,S

111000 000 I BBAA DDDDDDDDD SSSSSSSSS		SETINDx	D,S		(SETINDA S   / SETINDB D   / SETINDS D,S)
111001 000 I 0B0A DDDDDDDDD SSSSSSSSS		FIXINDx	D,S		(FIXINDA D,S / FIXINDB D,S / FIXINDS D,S)
111010 000 I CCCC DDDDDDDDD SSSSSSSSS		CFGPINS	D,S		(waits for alt)
111011 000 I CCCC DDDDDDDDD SSSSSSSSS		WAITVID	D,S		(waits for vid)

111100 00R I CCCC DDDDDDDDD SSSSSSSSS		IJZ	D,S
111100 01R I CCCC DDDDDDDDD SSSSSSSSS		IJZD	D,S
111100 10R I CCCC DDDDDDDDD SSSSSSSSS		IJNZ	D,S
111100 11R I CCCC DDDDDDDDD SSSSSSSSS		IJNZD	D,S

111101 00R I CCCC DDDDDDDDD SSSSSSSSS		DJZ	D,S
111101 01R I CCCC DDDDDDDDD SSSSSSSSS		DJZD	D,S
111101 10R I CCCC DDDDDDDDD SSSSSSSSS		DJNZ	D,S
111101 11R I CCCC DDDDDDDDD SSSSSSSSS		DJNZD	D,S

111110 000 I CCCC DDDDDDDDD SSSSSSSSS		TJZ	D,S
111110 010 I CCCC DDDDDDDDD SSSSSSSSS		TJZD	D,S
111110 100 I CCCC DDDDDDDDD SSSSSSSSS		TJNZ	D,S
111110 110 I CCCC DDDDDDDDD SSSSSSSSS		TJNZD	D,S

111110 001 I CCCC DDDDDDDDD SSSSSSSSS		JP	D,S
111110 011 I CCCC DDDDDDDDD SSSSSSSSS		JPD	D,S
111110 101 I CCCC DDDDDDDDD SSSSSSSSS		JNP	D,S
111110 111 I CCCC DDDDDDDDD SSSSSSSSS		JNPD	D,S

111111 0CR I CCCC DDDDDDDDD SSSSSSSSS		WAITCNT	D,S		(waits for cnt32, +cnt64 if wc)
111111 1C0 I CCCC DDDDDDDDD SSSSSSSSS		WAITPEQ	D,S		(waits for pins, +cnt32 if wc)
111111 1C1 I CCCC DDDDDDDDD SSSSSSSSS		WAITPNE	D,S		(waits for pins, +cnt32 if wc)
-------------------------------------------------------------------------------------------------


ZCR		effects
-------------------------------------------------------------------------------------------------
000		nz, nc, nr
001		nz, nc, wr
010		nz, wc, nr
011		nz, wc, wr
100		wz, nc, nr
101		wz, nc, wr
110		wz, wc, nr
111		wz, wc, wr


CCCC	condition		(easier-to-read list)
-------------------------------------------------------------------------------------------------
0000	never			1111	always			(default)
0001	nc  &  nz		1100	if_c						if_b
0010	nc  &  z		0011	if_nc						if_ae
0011	nc			1010	if_z						if_e
0100	 c  &  nz		0101	if_nz						if_ne
0101	nz			1000	if_c_and_z		if_z_and_c
0110	 c  <> z		0100	if_c_and_nz		if_nz_and_c
0111	nc  |  nz		0010	if_nc_and_z		if_z_and_nc
1000	 c  &  z		0001	if_nc_and_nz		if_nz_and_nc		if_a
1001	 c  =  z		1110	if_c_or_z		if_z_or_c		if_be
1010	 z			1101	if_c_or_nz		if_nz_or_c
1011	nc  |  z		1011	if_nc_or_z		if_z_or_nc
1100	 c			0111	if_nc_or_nz		if_nz_or_nc
1101	 c  |  nz		1001	if_c_eq_z		if_z_eq_c
1110	 c  |  z		0110	if_c_ne_z		if_z_ne_c
1111	always			0000	never
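
Reading the left-hand column, each CCCC bit enables one combination of the C and Z flags: bit 0 is nc & nz, bit 1 is nc & z, bit 2 is c & nz, bit 3 is c & z. A minimal Python sketch of that lookup follows (not from the original post; the function name is hypothetical):

```python
def condition_met(cccc: int, c: bool, z: bool) -> bool:
    """Evaluate a 4-bit condition field: the bit at index 2*C + Z decides."""
    return bool((cccc >> (2 * int(c) + int(z))) & 1)


IF_BE = 0b1110  # if_c_or_z in the list above

for c in (False, True):
    for z in (False, True):
        print(f"c={int(c)} z={int(z)} -> {condition_met(IF_BE, c, z)}")
```

With `if_be` the instruction is skipped only when both flags are clear, which matches the "c | z" entry for %1110.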

CCCC	inda/indb - CCCC=1111 after first stage of pipeline if inda/indb used (indx=inda/indb)
-------------------------------------------------------------------------------------------------
xx00	source indx
xx01	source indx++
xx10	source indx--
xx11	source ++indx

00xx	destination indx
01xx	destination indx++
10xx	destination indx--
11xx	destination ++indx


I	SSSSSSSSS	source operand
-------------------------------------------------------------------------------------------------
0	SSSSSSSSS	register
1	#SSSSSSSSS	immediate, zero-extended


	DDDDDDDDD	destination operand
-------------------------------------------------------------------------------------------------
	DDDDDDDDD	register

Cluso99 · 2012-12-14 02:22

Thanks Chip. Back to fix my instruction table

Roy Eltham · 2012-12-14 03:01

I had REPS and REPD swapped in my head.

Ok, so you can use 16,383 instead of 511.

Ariba wrote: »

Roy

I think this is only the case with REPD 511,#xx.
But REPD needs 3 delay slots before the loop, so it is slower when the LMM support functions jump back to the LMM loop.

Andy

David Betz · 2012-12-14 03:56

cgracey wrote: »

That's right, the way the constant is interpreted and the way the register's contents are interpreted for REPD loops is the same: 0 -> 1 loop (actually, no loops - just the initial fall through).

Yes, I understand that the constant encoded in the instruction and the register value are treated in the same way. My point though has to do with how this is handled in the assembler. It seems that in PNut, if you give the parameter #5, it gets encoded into the instruction as $4. That means that the following two sequences do the same thing:

    repd #5, #10

r0 long 0

...

    mov r0, #4
    repd r0, #10

This seems a bit odd to me.
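
The asymmetry can be spelled out with a few lines of Python. This is only a sketch with hypothetical helper names; it assumes PNut subtracts one from immediates (as the $0DC3FC44 example suggests) and that the hardware treats an encoded constant and a register value identically (as cgracey states).

```python
def pnut_field(immediate: int) -> int:
    """Value PNut is assumed to store for an immediate operand written as #immediate."""
    return immediate - 1


def executions(raw_value: int) -> int:
    """Times the block runs for a raw field or register value (0 means once)."""
    return raw_value + 1


# repd #5,#10 under PNut: the count field holds 4, so the block runs 5 times.
print(executions(pnut_field(5)))   # 5

# mov r0,#4 / repd r0,#10: the register holds 4 at run time, so again 5 times.
print(executions(4))               # 5
```

So the source-level constant and the register value needed for the same loop count differ by one, which is the oddity described above.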

David Betz · 2012-12-14 13:18

I finally had a chance to add REPD to gas so was able to run the following LMM loop. Note that gas does not subtract one from the immediate values so #7 will repeat 8 instructions.

__LMM_loop
        repd #$1ff,#7
        nop
        nop
        nop
        rdlongc L_ins1,pc
        add pc,#4
        nop
L_ins1  nop
        rdlongc L_ins2,pc
        add pc,#4
        nop
L_ins2  nop
        jmp #__LMM_loop

Result:

fibo(26) = 121393 (916ms) (54997036 ticks)
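
For comparison with the REPS loop result posted earlier in the thread (53425690 ticks), the relative cost of this REPD version can be worked out directly. The Python sketch below is not part of the original post and uses only the two reported tick counts.

```python
reps_ticks = 53425690   # REPS #511,#8 loop, reported earlier in the thread
repd_ticks = 54997036   # REPD #$1ff,#7 loop, reported above

extra_ticks = repd_ticks - reps_ticks
slowdown = extra_ticks / reps_ticks

print(extra_ticks)          # 1571346
print(f"{slowdown:.2%}")    # about 2.9% slower
```

That is a modest penalty, in line with the earlier remark that REPD costs extra delay slots each time a support function jumps back into the LMM loop.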

Cluso99 · 2012-12-14 16:41

REPD & REPS:
Thinking about this some more, could the instruction mnemonics be changed to REP1D and REP1S (I would prefer REP1Q for quick, or F for fast), with the compiler not subtracting 1 from # values?
Then just place a warning that the REP1x instructions execute the block 1 + the given repeat count times.

evanh · 2012-12-14 18:12

Chip,

What are the chances of wangling the REP instructions to initialise in parallel alongside the first instructions in the loop, letting them take the place of the spacer instructions? I.e., if there are more than three instructions in the loop, then no pre-loop spacers are needed for a REPD.

I see this as possible, although probably too late for the Prop2.

Of course this would likely make REPS redundant.

EDIT: Reworded.
