P2 Tricks, Traps & Differences between P1 (Reference Material Only)

Peter Jakacki · 2020-02-07 07:47

ozpropdev wrote: »
SPAN IO pins feature

From the docs

DIRx/OUTx/FLTx/DRVx can now work on a span of pins (+D[10:6] pins).
Prior SETQ overrides D[10:6].

WRPIN/WXPIN/WYPIN/AKPIN can now work on a span of pins (+S[10:6] pins).
Prior SETQ overrides S[10:6].

Be aware that this feature cannot cross the PinA/PinB boundary and
wraps within the 32 pin group.

For example the following code configures pins 30,31,0,1 not 30,31,32,33
		setq	#3
		wrpin	##%011_00_00000_0,#30 '150k pulldown
		setq	#3
		drvl	#30 

The way I read it and taking into account the fact that SETQ would be needed to specify more than 32 pins, I took it that it did span, but alas I tested it and it did not.

Check pin states where h and l are input states and H and L are output states, then from pin 30 set 8 pins low and check.

TAQOZ# .io ---  0:hlhhhhhh 8:hhhhhhhh 16:hhhhhhhh 24:hhhhhhhh 32:hlhhhhhh 40:hhhhhhhh 48:lllllhhh 56:LhlLLLHL ok
TAQOZ# 30 8 PINS LOW ---  ok
TAQOZ# .io ---  0:LLLLLLhh 8:hhhhhhhh 16:hhhhhhhh 24:hhhhhhLL 32:hlhhhhhh 40:hhhhhhhh 48:lllllhhh 56:LhlLLLHL ok

So SETQ cannot be used to specify more than 32 pins anyway.
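
To make the wrap concrete, here is a minimal Python model of which pins a span touches. It is only a sketch of the behavior reported above; `span_pins` is a hypothetical helper name, and it assumes the span count is the 5-bit "additional pins" field and that wrapping stays inside the 32-pin group of the base pin.

```python
def span_pins(base_pin, extra_pins):
    """Model of a P2 pin span: base pin plus `extra_pins` additional pins.

    Assumption: the span wraps inside the base pin's 32-pin group
    (P0..P31 or P32..P63) and never crosses into the other group.
    """
    group = base_pin & 0x20        # 0 for P0..P31, 32 for P32..P63
    offset = base_pin & 0x1F       # position inside the group
    count = (extra_pins & 0x1F) + 1
    return [group | ((offset + i) & 0x1F) for i in range(count)]


print(span_pins(30, 3))   # the SETQ #3 / DRVL #30 example
print(span_pins(30, 7))   # the "30 8 PINS LOW" test
```

The second call gives pins 30, 31 and 0 to 5, which matches the `.io` dump above where P0..P5 and P30..P31 went low.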

EDIT: This is also a perfect way to sync smartpins. I tried setting P24 for 8 pins as a PIN specifier and then set them to NCO mode at 1 MHZ. They were all perfectly synch'd.

24 8 PINS PIN 1 MHZ

Rayman · 2020-02-07 15:09

ozpropdev wrote: »
SPAN IO pins feature

From the docs

DIRx/OUTx/FLTx/DRVx can now work on a span of pins (+D[10:6] pins).
Prior SETQ overrides D[10:6].

WRPIN/WXPIN/WYPIN/AKPIN can now work on a span of pins (+S[10:6] pins).
Prior SETQ overrides S[10:6].

Be aware that this feature cannot cross the PinA/PinB boundary and
wraps within the 32 pin group.

For example the following code configures pins 30,31,0,1 not 30,31,32,33
		setq	#3
		wrpin	##%011_00_00000_0,#30 '150k pulldown
		setq	#3
		drvl	#30 

This is unfortunate. This must be a bug in the Verilog.
Wonder if Chip knows about this...

cgracey · 2020-02-07 17:25

It was not possible to involve both A and B ports in the pin span, since the data-forwarding circuitry only handles one 32-bit register. The bit span wraps, as well.

So, it's not a bug, but it's not really a feature, either.

Rayman · 2020-02-07 18:17

This is a case where I'd wonder about violating the RISC philosophy and having this instruction take 4 (or more) clock cycles, so as to give the expected result.

cgracey · 2020-02-07 20:43

> @Rayman said:
> This is a case where I'd wonder about violating the RISC philosophy and having this instruction take 4 (or more) clock cycles, so as to give the expected result.

We could have done it all in 2 clocks, but it would have grown the data-forwarding circuit immensely. Didn't seem like it was worth doing.

Rayman · 2020-02-07 21:28

I see the spreadsheet notes this behavior for some instructions.
But, not for BITH, BITL, etc.
Are they the same way?

scratch that... Obviously not the same type of instruction...

I guess this is another thing that should be added to the docs at some point...

msrobots · 2020-02-07 21:30

P3 will need a 64-bit register, but then we might get more than 64 pins and the problem is just moved.

But since all pins are equal, unlike on other MCUs, one can design the pinout on the board to avoid the overlap.

kwinn · 2020-02-08 04:53

Rayman wrote: »
ozpropdev wrote: »
SPAN IO pins feature

From the docs

DIRx/OUTx/FLTx/DRVx can now work on a span of pins (+D[10:6] pins).
Prior SETQ overrides D[10:6].

WRPIN/WXPIN/WYPIN/AKPIN can now work on a span of pins (+S[10:6] pins).
Prior SETQ overrides S[10:6].

Be aware that this feature cannot cross the PinA/PinB boundary and
wraps within the 32 pin group.

For example the following code configures pins 30,31,0,1 not 30,31,32,33
		setq	#3
		wrpin	##%011_00_00000_0,#30 '150k pulldown
		setq	#3
		drvl	#30 
This is unfortunate. This must be a bug in the Verilog.
Wonder if Chip knows about this...

Rayman wrote: »

This is a case where I'd wonder about violating the RISC philosophy and having this instruction take 4 (or more) clock cycles, so as to give the expected result.

IMHO this is not a bug, but the standard (and quite logical) way things have been done on any computer, microprocessor, or microcontroller I have worked with. Having such a feature would not only make the logic circuitry more complicated, it would also open up a whole new set of potential bugs.

Rayman · 2020-02-09 20:12

Think I just found a bug in a code related to SHR...

I assumed that SHR with a source value >31 would result in destination:=0.
But, seems this type of instruction only looks at the lower 5 bits.

Wouldn't it be better if SHR with source>31 made result zero?

Peter Jakacki · 2020-02-09 22:52

Rayman wrote: »

Think I just found a bug in a code related to SHR...

I assumed that SHR with a source value >31 would result in destination:=0.
But, seems this type of instruction only looks at the lower 5 bits.

Wouldn't it be better if SHR with source>31 made result zero?

There are many instructions that have unused source bits but I would not call that a bug. To have it handle the unused bits requires extra logic and do you really want it to work that way anyway?

I just realized that I had marked the source fields of these instructions in my formatted copy of the instruction sheet as xxxxSSSSS, but that is only correct for immediate mode, or more correctly only 5 bits of the source data.

ersmith · 2020-02-10 01:02

Rayman wrote: »

Think I just found a bug in a code related to SHR...

I assumed that SHR with a source value >31 would result in destination:=0.
But, seems this type of instruction only looks at the lower 5 bits.

Wouldn't it be better if SHR with source>31 made result zero?

If it's a bug, it's a bug shared by x86, ARM, and RISC-V -- it seems to be pretty standard in microprocessors to use only the lower 5 bits of the shift amount. I agree that it's unfortunate (it would make more sense to output 0 for values > 31) but the P2 is in good company there.
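
A small Python model of the behavior under discussion, in case it helps when porting. This is a sketch only: `shr_p2` mimics a shift that looks at the lower 5 bits of the count, and `shr_saturating` is a hypothetical wrapper showing the explicit check software needs if it wants zero for counts above 31.

```python
MASK32 = 0xFFFFFFFF


def shr_p2(value, count):
    """Logical shift right that only uses the lower 5 bits of count."""
    return (value & MASK32) >> (count & 0x1F)


def shr_saturating(value, count):
    """Hypothetical helper: what the caller expected (0 for count > 31)."""
    return 0 if count > 31 else shr_p2(value, count)


print(hex(shr_p2(0x80000000, 32)))          # count 32 behaves like count 0
print(hex(shr_saturating(0x80000000, 32)))  # explicit check gives 0
```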

evanh · 2020-02-10 01:41

It's only a bug when buggy software trips up on it.

Cluso99 · 2020-02-10 02:50

Rayman wrote: »

Think I just found a bug in a code related to SHR...

I assumed that SHR with a source value >31 would result in destination:=0.
But, seems this type of instruction only looks at the lower 5 bits.

Wouldn't it be better if SHR with source>31 made result zero?

I am pretty confident that this is the same behaviour on P1.

Wuerfel_21 · 2020-02-10 12:11

ersmith wrote: »

If it's a bug, it's a bug shared by x86, ARM, and RISC-V -- it seems to be pretty standard in microprocessors to use only the lower 5 bits of the shift amount. I agree that it's unfortunate (it would make more sense to output 0 for values > 31) but the P2 is in good company there.

Hitachi/Renesas/whoever-owns-it-this-week SuperH processors (If you know nothing about them, just know that narrow loads/stores are sign-extended by default to be scared of them for life) do something like that with their SHLD instructions - if you shift left by -32, you get zero... (yet it still only looks at the bottom five bits and the sign bit, so -33 is the same as -1). This whacky nonsense, in addition to the fact that that is the only dynamic shift instruction (aside from SHAD, which does arithmetic shifts, you just get fixed 1/2/8/16 bit shifts and a 1-bit rotate, oof) makes bitwise ops real slow on those, lol.

JonnyMac · 2020-02-27 17:28

Spin2: >| changed to ENCOD
...and with an operational difference. In the P1, >| returns the highest bit set plus one, or 0 if the tested value is 0. In the P2, ENCOD returns the highest bit set, and returns 0 even if the tested value is 0.

Val (bin) P1      P2
----      --      --
   0       0       0
   1       1       0
  10       2       1
  11       2       1
 100       3       2
 101       3       2
 110       3       2
 111       3       2
1000       4       3
1001       4       3
1010       4       3
1011       4       3
1100       4       3
1101       4       3
1110       4       3
1111       4       3
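
The table can be reproduced with a short Python model. This is a sketch, not the Spin implementation; `p1_encode` and `p2_encod` are hypothetical names standing in for the P1 `>|` operator and the P2 `ENCOD` operator as described above.

```python
def p1_encode(value):
    """Model of Spin1 `>|`: highest set bit plus one, 0 for an input of 0."""
    return (value & 0xFFFFFFFF).bit_length()


def p2_encod(value):
    """Model of Spin2 ENCOD: index of the highest set bit, 0 for an input of 0."""
    return max((value & 0xFFFFFFFF).bit_length() - 1, 0)


for v in (0b0, 0b1, 0b10, 0b111, 0b1000, 0b1111):
    print(f"{v:>4b}  P1={p1_encode(v)}  P2={p2_encod(v)}")
```

Note that adding 1 to the ENCOD result only matches the P1 value for non-zero inputs; for 0 the P1 operator returns 0, so ported code that relies on that case needs its own check.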

JonnyMac · 2020-02-27 17:47

Spin2 : Return value(s) must be declared.
In the P1 we could do this in a method:

  return 5

or

  result := 5

The P2 doesn't have a default return value, hence it must be declared. To emulate the single return value of the P1, update the method declaration with an explicit return value.

pub my_method() : result

Now return will work as in the P1.

Rayman · 2020-02-27 18:47

Here are the changes I had to make to Graphics.spin to make it work in Fastspin on P2:

'MINS-->FGES
'MAXS-->FLES
'MOVS-->SETS
':-->. (many places)
'SETD-->SETD2 'SETD is an instruction in P2
'cmps wc,wr --> subs wc
'command := cmd << 16 + argptr --> command := cmd << 24 + argptr
'Using QROTATE to fix missing sin table
'Used MUL to speed multiply

The command part is due to the larger address space.
A lot of P1 code assumed address was only in the lower word.
But now, 20-bits are used for address (the upper 12 bits are ignored by the hardware).
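
Here is a rough Python illustration of why the shift had to change. It is a sketch under assumptions: the helper names are hypothetical, the decode step is not taken from Graphics.spin, and the parentheses make explicit that the shift is applied before the add.

```python
def pack_p1_style(cmd, argptr):
    """P1-style packing: command above a 16-bit pointer."""
    return ((cmd << 16) + argptr) & 0xFFFFFFFF


def pack_p2_style(cmd, argptr):
    """P2-style packing: command in the top byte, pointer in the low 24 bits."""
    return ((cmd << 24) + argptr) & 0xFFFFFFFF


def unpack_p2_style(command):
    """Hypothetical decode: top byte is the command, the rest is the pointer."""
    return command >> 24, command & 0xFFFFFF


ptr = 0x12345                      # a hub address above 64 KB
old = pack_p1_style(3, ptr)
print(old >> 16)                   # command field is corrupted by the pointer
print(unpack_p2_style(pack_p2_style(3, ptr)))
```

With a pointer above $FFFF the old layout lets address bits spill into the command field, while the 24-bit shift leaves room for the full 20-bit hub address.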

Also, found that PTRA now takes the place of PAR in coginit passing of a pointer from hub to cog...

Rayman · 2020-02-27 18:48

Also, fixed the jump table like this:

                        'Trying new approach to jump table (suggested by Wuerfel_21)
                        'ALTGB D, S will select byte #D from table starting at S
                        'Getbyte D brings that byte into D
                        'JMP to that D (and not #D) takes you where you need to go.
                        
                        altgb   t1,#jumps
                        getbyte t1
                        jmp     t1
                        

long
jumps                   byte    0                       '0
                        byte    setup_                  '1
                        byte    color_                  '2
                        byte    width_                  '3
                        byte    plot_                   '4
                        byte    line_                   '5
                        byte    arc_                    '6
                        byte    vec_                    '7
                        byte    vecarc_                 '8
                        byte    pix_                    '9
                        byte    pixarc_                 'A
                        byte    text_                   'B
                        byte    textarc_                'C
                        byte    textmode_               'D
                        byte    fill_                   'E
                        byte    loop                    'F

Rayman · 2020-02-27 18:55

Also, when moving graphics to hubexec I noticed that:

1. Had to leave self-modifying PASM in the cog. These two lines were operated on by SETS:

DAT 'NeededInCog  -This part doesn't work in hubexec because target of sets needs to be in cog
NeededInCog

ToShifts
shift0                 shl     mask0,#0                'position slice
shift1                 shr     mask1,#0
                        jmp     #DoneShifts

2. Things like TJZ jumps back into cog don't work; had to replace them with CMP and JMP instructions.

Cluso99 · 2020-02-27 19:40

Rayman wrote: »
Here are the changes I had to make to Graphics.spin to make it work in Fastspin on P2:
'MINS-->FGES
'MAXS-->FLES
'MOVS-->SETS
':-->. (many places)
'SETD-->SETD2 'SETD is an instruction in P2
'cmps wc,wr --> subs wc
'command := cmd << 16 + argptr --> command := cmd << 24 + argptr
'Using QROTATE to fix missing sin table
'Used MUL to speed multiply
The command part is due to the larger address space.
A lot of P1 code assumed address was only in the lower word.
But now, 20-bits are used for address (the upper 12 bits are ignored by the hardware).

Also, found that PTRA now takes the place of PAR in coginit passing of a pointer from hub to cog...

And PTRA can be modified, whereas PAR was a fixed value that remained while the cog was active.

JonnyMac · 2020-02-28 16:00

Spin2 : || (absolute in P1) is now abs.

To get the absolute value:

  x := abs y

The P2 || operator is now logical or.

Rayman · 2020-02-28 16:48

Here is the Spin2 operators text file from Pnut.

I wonder if Eric will change Fastspin over to this at some point...

Big one for me is that >= is finally in the correct order (the way you say it).

Rayman · 2020-02-28 17:04

Coincidentally, I just got bit by this in the current Fastspin with Spin1 operators:

    if mx<-128
      mx:=-128

That "<-" turns out to be bitwise rotate left in Spin1 and not less than a negative number.
Guess it was good to change that...
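
For anyone wondering what that test actually evaluated, here is a Python sketch. It assumes the rotate uses only the lower 5 bits of the count (so rotating by 128 is a rotate by 0); the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, count):
    """32-bit rotate left; assumption: only the lower 5 bits of count are used."""
    count &= 0x1F
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32


def spin1_reading(mx):
    """How `if mx<-128` is read with Spin1 operators: (mx rotated left by 128) <> 0."""
    return rol32(mx, 128) != 0


def intended(mx):
    """What was meant: mx less than -128."""
    return mx < -128


for mx in (0, 5, -200):
    print(mx, spin1_reading(mx), intended(mx))
```

Under that assumption the condition is true for every non-zero mx, which is why writing it as `mx < -128` (with spaces) matters when Spin1 operators are in effect.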

JonnyMac · 2020-02-28 19:39

Here is the simplified list of P2 operators with P1 comparisons/changes.

Spin Operators

*  New operator in P2
** Behavioral change in P2

P2              P1              Description
-----------------------------------------------------------------
++ (pre)        ++              Pre-increment
-- (pre)        --              Pre-decrement
?? (pre)        ?      **       XORO32, iterate and return pseudo-random

++ (post)       ++              Post-increment
-- (post)       --              Post-decrement
!! (post)                       Post-logical NOT
!  (post)                       Post-bitwise NOT
\  (post)                       Post-set
~  (post)       ~               Post-set to 0
~~ (post)       ~~              Post-set to -1

!               !               Bitwise NOT, 1's complement
-               -               Negation, 2's complement
ABS             ||     *        Absolute value
ENCOD           >|     **       Encode MSB, 31..0
DECOD           |<     *        Decode, 1 << (x & $1F)
BMASK                           Bitmask, (2 << (x & $1F)) - 1
ONES                            Count ones
SQRT            ^^     *        Square root of unsigned x
QLOG                            Unsigned to logarithm
QEXP                            Logarithm to unsigned

>>              >>              Shift right, insert 0's
<<              <<              Shift left, insert 0's
SAR             ~>     *        Shift right, insert MSB's
ROR             ->     *        Rotate right
ROL             <-     *        Rotate left
REV             ><     *        Reverse y LSBs of x and zero-extend
ZEROX                           Zero-extend above bit y
SIGNX           ~, ~~  **       Sign-extend from bit y

&               &               Bitwise AND
^               ^               Bitwise XOR
|               |               Bitwise OR

*               *               Signed multiply
/               /               Signed divide, return quotient
+/                              Unsigned divide, return quotient
//              //              Signed divide, return remainder
+//                             Unsigned divide, return remainder
SCA                             Unsigned scale (x * y) >> 32
SCAS                            Signed scale (x * y) >> 30
FRAC                            Unsigned fraction {x, 32'b0} / y

+               +               Add
-               -               Subtract

#>                              Ensure x >= y, signed
<#                              Ensure x <= y, signed

ADDBITS                         Make bitfield, (x & $1F) | (y & $1F) << 5
ADDPINS                         Make pinfield, (x & $3F) | (y & $1F) << 6

<               <               Signed less than                (returns 0 or -1)
+<                              Unsigned less than              (returns 0 or -1)
<=              =<     *        Signed less than or equal       (returns 0 or -1)
+<=                             Unsigned less than or equal     (returns 0 or -1)
==              ==              Equal                           (returns 0 or -1)
<>              <>              Not equal                       (returns 0 or -1)
>=              =>     *        Signed greater than or equal    (returns 0 or -1)
+>=                             Unsigned greater than or equal  (returns 0 or -1)
>               >               Signed greater than             (returns 0 or -1)
+>                              Unsigned greater than           (returns 0 or -1)
<=>                             Signed comparison          (<,=,> returns -1,0,1)

!!, NOT         not    *        Logical NOT  (x == 0,            returns 0 or -1)
&&, AND         and    *        Logical AND  (x <> 0 AND y <> 0, returns 0 or -1)
^^, XOR         xor    **       Logical XOR  (x <> 0 XOR y <> 0, returns 0 or -1)
||, OR          or     **       Logical OR   (x <> 0 OR  y <> 0, returns 0 or -1)

? :                    *        If x <> 0 then choose y, else choose z

:=              :=     **       Set var(s) to x
                                P2: v1,v2,... := x,y,... ' set v1 to x, v2 to y, etc. '_' = ignore


Complex math functions
---------------------------------------------------------------------------------------------------
var_x,var_y := ROTXY(x,y,t)     Rotate cartesian (x,y) by t and assign resultant (x,y)
var_r,var_t := XYPOL(x,y)       Convert cartesian (x,y) to polar and assign resultant (r,t)
var_x,var_y := POLXY(r,t)       Convert polar (r,t) to cartesian and assign resultant (x,y)
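
A few of the formulas in the table are easy to check with a Python model. This is a sketch that simply transcribes the expressions listed above for ADDPINS, ADDBITS and BMASK; the lowercase function names are stand-ins, not Spin2 itself.

```python
def addpins(base, extra):
    """ADDPINS: (x & $3F) | (y & $1F) << 6, base pin plus additional pins."""
    return (base & 0x3F) | ((extra & 0x1F) << 6)


def addbits(base, extra):
    """ADDBITS: (x & $1F) | (y & $1F) << 5, base bit plus additional bits."""
    return (base & 0x1F) | ((extra & 0x1F) << 5)


def bmask(x):
    """BMASK: (2 << (x & $1F)) - 1, kept to 32 bits."""
    return ((2 << (x & 0x1F)) - 1) & 0xFFFFFFFF


print(addpins(24, 7))      # 8 pins starting at P24
print(hex(bmask(7)))       # mask covering bits 7..0
```

So a pin field for eight pins starting at P24 is `24 ADDPINS 7`: the second operand is the number of additional pins, not the total.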

SaucySoliton · 2020-02-29 03:39

Floating Point Constants
I think this applies to both P1 and P2, but it bit me a few days ago. It's convenient to use floating point math in the CON section, but it's very unlikely that your program wants IEEE754 floating point values. When processed as an integer, the value will be much different than you expect.

This snippet, from a VGA driver bundled with PNut, is fine. The second version shows the problem and the third shows the fix.

fpix		= 40_000_000
...
		qfrac	##fpix,pa

fpix		= 40_000_000.0      ' PROBLEM
...
		qfrac	##fpix,pa   ' PROBLEM - IEEE754 value inserted where P2 most likely expects an integer

fpix		= 40_000_000.0
...
		qfrac	##round(fpix),pa  ' FIXED use round() or trunc() to convert float constant to integer

Should there be a warning about this? On fastspin, round() or trunc() on an integer constant causes an error.
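
To see how different the value is, this Python sketch prints the bit pattern of 40_000_000.0 as a 32-bit IEEE 754 single, which is what would be used as the integer operand in the PROBLEM case. `float_bits` is a hypothetical helper name.

```python
import struct


def float_bits(value):
    """Return the 32-bit IEEE 754 single-precision pattern of value as an integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


bits = float_bits(40_000_000.0)
print(hex(bits), bits)             # the pattern read as an integer
print(round(40_000_000.0))         # what round() gives instead
```

The pattern is $4C18_9680, which as an integer is about 1.28 billion rather than 40 million.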

evanh · 2020-02-29 09:27

There's another difference with these types of constant definitions. Pnut auto-promotes to floats; Fastspin does not and will complain when they are mixed without explicit casts.

JonnyMac · 2020-02-29 18:10

Spin2: ~ and ~~ sign extensions replaced with SIGNX

  P1 : signedLong := ~signedByte
  P2 : signedLong := signedByte signx 7

  P1 : signedLong := ~~signedWord
  P2 : signedLong := signedWord signx 15
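
For reference, a Python sketch of what sign-extending from a given bit means. `signx` here is a model of the operator as described in the operator list (sign-extend from bit y), not the Spin2 implementation.

```python
def signx(value, bit):
    """Sign-extend value from bit position `bit` (0..31); returns a Python int."""
    bit &= 0x1F
    width = bit + 1
    value &= (1 << width) - 1            # keep bits bit..0
    if value & (1 << bit):               # sign bit set: value is negative
        return value - (1 << width)
    return value


print(signx(0xFF, 7))       # byte $FF read as signed
print(signx(0x8000, 15))    # word $8000 read as signed
print(signx(0x7F, 7))       # positive values pass through
```

The second operand is the bit number of the sign bit, so a byte uses 7 and a word uses 15.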

JonnyMac · 2020-03-03 06:26

Spin2: >< (reverse operator) replaced with REV
REV behaves differently, as well. For example, this snippet of code preps a value for a bit-banged SPI output in the P1.

  if (mode == LSBFIRST)
    outbits ><= 32                                              ' flip bits, align lsb to bit31
  else
    outbits <<= (32-bits)                                       ' align msb to bit31

The modification for running on the P2 is:

  if (mode == LSBFIRST)
    outbits rev= 31                                             ' flip bits, align lsb to bit31
  else
    outbits <<= (32-bits)                                       ' align msb to bit31

Note that the value used in the P2 is the last bit to be preserved; REV will reverse the bits between 0 and the target, then clear everything above the target to 0.
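
A Python sketch of that behavior, for checking values by hand. `rev` is a model of the description above (reverse bits 0 through the target bit, zero everything above), not the actual operator.

```python
def rev(value, top_bit):
    """Reverse bits 0..top_bit of value and clear all bits above top_bit."""
    top_bit &= 0x1F
    result = 0
    for i in range(top_bit + 1):
        if value & (1 << i):
            result |= 1 << (top_bit - i)
    return result


print(hex(rev(0x00000001, 31)))   # lsb ends up in bit 31
print(bin(rev(0b0011, 3)))        # only the low nibble is reversed
print(bin(rev(0xF1, 3)))          # bits above the target are cleared
```

With a target of 31 the whole long is flipped, which is what the LSBFIRST case needs.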

SaucySoliton · 2020-03-09 23:10

P1 Style Indirect Addressing

1) MOVS and MOVD are renamed to SETS and SETD respectively. This is noted in the 2nd post in the thread.

2) The P2's pipeline requires an additional instruction between the SETS/SETD and the instruction that uses it.

There is an instruction timing diagram in the "Assembly Language" section of the Documentation.
1. Ib read ' SETD
2. Db,Sb read
3. Ic read, Ra write ' First delay instruction
4. Dc,Sc read
5. Id read, Rb write ' SETD is writing result here, reading the same location does not result in new data
6. Dd,Sd read
7. Ie read, Rc write ' Can read Rb now

                        setd      .loop,#Temp_Data                              
                        nop                                   ' Seems like P2 needs an additional delay
                        add       t3,#1                       ' Address the next data register
.loop                   wrbyte    0-0,t3                      ' Write the data bytes into hub memory
                        add       .loop,bit_9
                        add       t3,#1                       ' Address the next data register
                        djnz      t2,#.loop

This code snippet is from the CAN bus object. My P2 port is working pretty well now.

JonnyMac · 2020-03-12 22:51

Trap: waitms() and waitus() are limited to a delay of 2^31 / clkfreq seconds. In my 200MHz test that works out to 10_737 milliseconds for waitms(), and 10_737_419 microseconds for waitus().
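
The limit is easy to work out for other clock speeds. This Python sketch just evaluates 2^31 / clkfreq in the two units; the function names are hypothetical and the results are floored to whole units.

```python
def max_wait_ms(clkfreq):
    """Longest delay in whole milliseconds for a 2^31-tick limit."""
    return (1 << 31) * 1_000 // clkfreq


def max_wait_us(clkfreq):
    """Longest delay in whole microseconds for a 2^31-tick limit."""
    return (1 << 31) * 1_000_000 // clkfreq


for freq in (20_000_000, 200_000_000, 300_000_000):
    print(freq, max_wait_ms(freq), max_wait_us(freq))
```

At 200 MHz this gives 10_737 ms and roughly 10.7 million microseconds, in line with the figures above; a faster clock shortens the limit proportionally.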

For long delays I put these methods into my jm_timer.spin2 object.

pub pause(ms) | t0, tixms

'' Delay in milliseconds

  org
                        getct   t0                              ' snapshot counter
                        sub     t0, ##592                       ' fix call overhead
                        rdlong  tixms, #$44                     ' get clkfreq
                        qdiv    tixms, ##1_000                  ' get ticks/ms
                        getqx   tixms
                        rep     #2, ms                          ' delay
                        addct1  t0, tixms
                        waitct1
  end


pub pause_us(us) | t0, tixus

'' Delay in microseconds
'' -- for low speed system frequency, use waitus()

  org
                        getct   t0                              ' snapshot counter
                        sub     t0, ##560                       ' fix call overhead
                        rdlong  tixus, #$44                     ' get clkfreq
                        qdiv    tixus, ##1_000_000              ' get ticks/us
                        getqx   tixus
                        rep     #2, us                          ' delay
                        addct1  t0, tixus
                        waitct1
  end

Edits:
-- fixed opening statement for clarity
-- updated delay routines to be frequency independent
