P2 Tricks, Traps & Differences between P1 (Reference Material Only)

Comments

  • Peter Jakacki Posts: 8,980
    edited 2020-02-07 - 09:06:35
    ozpropdev wrote: »
    SPAN IO pins feature

    From the docs
    DIRx/OUTx/FLTx/DRVx can now work on a span of pins (+D[10:6] pins).
    Prior SETQ overrides D[10:6].

    WRPIN/WXPIN/WYPIN/AKPIN can now work on a span of pins (+S[10:6] pins).
    Prior SETQ overrides S[10:6].

    Be aware that this feature cannot cross the PinA/PinB boundary and
    wraps within the 32 pin group.

    For example the following code configures pins 30,31,0,1 not 30,31,32,33
    		setq	#3
    		wrpin	##%011_00_00000_0,#30 '150k pulldown
    		setq	#3
    		drvl	#30 
    

    The way I read it and taking into account the fact that SETQ would be needed to specify more than 32 pins, I took it that it did span, but alas I tested it and it did not.

    Check pin states where h and l are input states and H and L are output states, then from pin 30 set 8 pins low and check.
    TAQOZ# .io ---  0:hlhhhhhh 8:hhhhhhhh 16:hhhhhhhh 24:hhhhhhhh 32:hlhhhhhh 40:hhhhhhhh 48:lllllhhh 56:LhlLLLHL ok
    TAQOZ# 30 8 PINS LOW ---  ok
    TAQOZ# .io ---  0:LLLLLLhh 8:hhhhhhhh 16:hhhhhhhh 24:hhhhhhLL 32:hlhhhhhh 40:hhhhhhhh 48:lllllhhh 56:LhlLLLHL ok
    

    So SETQ cannot be used to specify more than 32 pins anyway.

    EDIT: This is also a perfect way to sync smartpins. I tried setting P24 for 8 pins as a PIN specifier and then set them to NCO mode at 1 MHz. They were all perfectly synced.
    24 8 PINS PIN 1 MHZ
    
  • ozpropdev wrote: »
    SPAN IO pins feature

    From the docs
    DIRx/OUTx/FLTx/DRVx can now work on a span of pins (+D[10:6] pins).
    Prior SETQ overrides D[10:6].

    WRPIN/WXPIN/WYPIN/AKPIN can now work on a span of pins (+S[10:6] pins).
    Prior SETQ overrides S[10:6].

    Be aware that this feature cannot cross the PinA/PinB boundary and
    wraps within the 32 pin group.

    For example the following code configures pins 30,31,0,1 not 30,31,32,33
    		setq	#3
    		wrpin	##%011_00_00000_0,#30 '150k pulldown
    		setq	#3
    		drvl	#30 
    

    This is unfortunate. This must be a bug in the Verilog.
    Wonder if Chip knows about this...
  • cgracey Posts: 12,643
    edited 2020-02-07 - 17:26:11
    It was not possible to involve both A and B ports in the pin span, since the data-forwarding circuitry only handles one 32-bit register. The bit span wraps, as well.

    So, it's not a bug, but it's not really a feature, either.
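
The wrap-around described above can be modelled in a few lines. This is a Python sketch of the behaviour, not P2 code. The helper name `span_pins` is made up for illustration, and it assumes a span is the base pin plus `extra` additional pins, wrapping inside whichever 32-pin port holds the base pin. The port B case is inferred by symmetry from the docs quote and was not tested in this thread.

```python
def span_pins(base, extra):
    """Model which pins a span touches: `base` plus `extra` additional pins.

    Assumption: the pin number wraps within the 32-pin port that contains
    `base` (port A = P0..P31, port B = P32..P63), as described in the thread.
    """
    port = base & 0x20  # 0 for port A, 32 for port B
    return [port | ((base + i) & 0x1F) for i in range(extra + 1)]


print(span_pins(30, 3))  # the SETQ #3 / DRVL #30 example
print(span_pins(30, 7))  # the TAQOZ "30 8 PINS LOW" test
print(span_pins(62, 3))  # port B, assumed to wrap the same way
```

The first two calls reproduce the results reported above: pins 30, 31, 0, 1 for the PASM example, and pins 30, 31, 0..5 for the TAQOZ test.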
  • This is a case where I'd wonder about violating the RISC philosophy and having this instruction take 4 (or more) clock cycles, so as to give the expected result.
  • cgracey Posts: 12,643
    edited 2020-02-07 - 20:44:15
    > @Rayman said:
    > This is a case where I'd wonder about violating the RISC philosophy and having this instruction take 4 (or more) clock cycles, so as to give the expected result.

    We could have done it all in 2 clocks, but it would have grown the data-forwarding circuit immensely. Didn't seem like it was worth doing.
  • Rayman Posts: 10,470
    edited 2020-02-08 - 11:19:15
    I see the spreadsheet notes this behavior for some instructions.
    But, not for BITH, BITL, etc.
    Are they the same way?


    scratch that... Obviously not the same type of instruction...

    I guess this is another thing that should be added to the docs at some point...
  • P3 will need a 64-bit register, but then we might get more than 64 pins and the problem is just moved.

    But since all pins are equal, unlike on other MCUs, one can design the pinout on the board to avoid the overlap.
  • Rayman wrote: »
    ozpropdev wrote: »
    SPAN IO pins feature

    From the docs
    DIRx/OUTx/FLTx/DRVx can now work on a span of pins (+D[10:6] pins).
    Prior SETQ overrides D[10:6].

    WRPIN/WXPIN/WYPIN/AKPIN can now work on a span of pins (+S[10:6] pins).
    Prior SETQ overrides S[10:6].

    Be aware that this feature cannot cross the PinA/PinB boundary and
    wraps within the 32 pin group.

    For example the following code configures pins 30,31,0,1 not 30,31,32,33
    		setq	#3
    		wrpin	##%011_00_00000_0,#30 '150k pulldown
    		setq	#3
    		drvl	#30 
    

    This is unfortunate. This must be a bug in the Verilog.
    Wonder if Chip knows about this...
    Rayman wrote: »
    This is a case where I'd wonder about violating the RISC philosophy and having this instruction take 4 (or more) clock cycles, so as to give the expected result.

    IMHO this is not a bug, but the standard (and quite logical) way things have been done on any computer, microprocessor, or microcontroller I have worked with. Having such a feature would not only make the logic circuitry more complicated, it would also open up a whole new set of potential bugs.
  • Think I just found a bug in code related to SHR...

    I assumed that SHR with a source value >31 would result in destination:=0.
    But, seems this type of instruction only looks at the lower 5 bits.

    Wouldn't it be better if SHR with source>31 made result zero?
  • Rayman wrote: »
    Think I just found a bug in a code related to SHR...

    I assumed that SHR with a source value >31 would result in destination:=0.
    But, seems this type of instruction only looks at the lower 5 bits.

    Wouldn't it be better if SHR with source>31 made result zero?

    There are many instructions that have unused source bits but I would not call that a bug. To have it handle the unused bits requires extra logic and do you really want it to work that way anyway?

    I just realized that I had marked the source fields of these instructions in my formatted copy of the instruction sheet as xxxxSSSSS, but that is only correct for immediate mode, or more correctly only 5 bits of the source data.
  • Rayman wrote: »
    Think I just found a bug in a code related to SHR...

    I assumed that SHR with a source value >31 would result in destination:=0.
    But, seems this type of instruction only looks at the lower 5 bits.

    Wouldn't it be better if SHR with source>31 made result zero?

    If it's a bug, it's a bug shared by x86, ARM, and RISC-V -- it seems to be pretty standard in microprocessors to use only the lower 5 bits of the shift amount. I agree that it's unfortunate (it would make more sense to output 0 for values > 31) but the P2 is in good company there.
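
To make the difference concrete, here is a small Python model of the two behaviours being discussed. It is only a sketch: `shr_low5` mimics what the thread reports for SHR (only the lower 5 bits of the count are used), and `shr_saturating` is the hypothetical alternative Rayman expected. Both function names are invented for this example.

```python
MASK32 = 0xFFFFFFFF


def shr_low5(value, count):
    """Shift right using only the lower 5 bits of the count (reported P2 behaviour)."""
    return (value & MASK32) >> (count & 0x1F)


def shr_saturating(value, count):
    """Hypothetical alternative: any count above 31 yields zero."""
    return 0 if count > 31 else (value & MASK32) >> count


for count in (1, 31, 32, 33):
    print(count, hex(shr_low5(0x80000000, count)), hex(shr_saturating(0x80000000, count)))
```

Code that relies on a count above 31 clearing the result needs an explicit check before the shift.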
  • It's only a bug when buggy software trips up on it. ;)
  • Rayman wrote: »
    Think I just found a bug in a code related to SHR...

    I assumed that SHR with a source value >31 would result in destination:=0.
    But, seems this type of instruction only looks at the lower 5 bits.

    Wouldn't it be better if SHR with source>31 made result zero?

    I am pretty confident that this is the same behaviour on P1.
  • Wuerfel_21 Posts: 604
    edited 2020-02-10 - 12:12:21
    ersmith wrote: »
    If it's a bug, it's a bug shared by x86, ARM, and RISC-V -- it seems to be pretty standard in microprocessors to use only the lower 5 bits of the shift amount. I agree that it's unfortunate (it would make more sense to output 0 for values > 31) but the P2 is in good company there.

    Hitachi/Renesas/whoever-owns-it-this-week SuperH processors (If you know nothing about them, just know that narrow loads/stores are sign-extended by default to be scared of them for life) do something like that with their SHLD instructions - if you shift left by -32, you get zero... (yet it still only looks at the bottom five bits and the sign bit, so -33 is the same as -1). This whacky nonsense, in addition to the fact that that is the only dynamic shift instruction (aside from SHAD, which does arithmetic shifts, you just get fixed 1/2/8/16 bit shifts and a 1-bit rotate, oof) makes bitwise ops real slow on those, lol.
  • JonnyMac Posts: 6,586
    edited 2020-02-27 - 17:32:32
    Spin2: >| changed to ENCOD
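    ...and with an operational difference. In the P1, >| returns the position of the highest set bit plus one, or 0 if the tested value is 0. In the P2, ENCOD returns the position of the highest set bit, and also returns 0 when the tested value is 0, so on the P2 an input of 0 cannot be told apart from an input of 1.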
    Val       P1      P2
    ----      --      --
       0       0       0
       1       1       0
      10       2       1
      11       2       1
     100       3       2
     101       3       2
     110       3       2
     111       3       2
    1000       4       3
    1001       4       3
    1010       4       3
    1011       4       3
    1100       4       3
    1101       4       3
    1110       4       3
    1111       4       3
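
The table can be reproduced with a short Python model. This is a sketch of the two behaviours as described in this post, not the Spin implementation; the function names `p1_encode` and `p2_encod` are made up for illustration.

```python
def p1_encode(value):
    """Model of the Spin1 >| operator: highest set bit position plus one, 0 for 0."""
    return (value & 0xFFFFFFFF).bit_length()


def p2_encod(value):
    """Model of Spin2 ENCOD: highest set bit position, and 0 for an input of 0."""
    v = value & 0xFFFFFFFF
    return v.bit_length() - 1 if v else 0


for val in range(16):
    print(f"{val:>4b}  {p1_encode(val):>2}  {p2_encod(val):>2}")
```

For non-zero inputs, porting usually amounts to adding 1 to the ENCOD result; a zero input needs its own test.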
    

  • JonnyMac Posts: 6,586
    edited 2020-02-27 - 17:48:05
    Spin2 : Return value(s) must be declared.
    In the P1 we could do this in a method:
      return 5
    
    or
      result := 5
    
    The P2 doesn't have a default return value, hence it must be declared. To emulate the single return value of the P1, update the method declaration with an explicit return value.
    pub my_method() : result
    
    Now return will work as in the P1.
  • Rayman Posts: 10,470
    edited 2020-02-27 - 18:55:50
    Here are the changes I had to make to Graphics.spin to make it work in Fastspin on P2:
    'MINS-->FGES
    'MAXS-->FLES
    'MOVS-->SETS
    ':-->. (many places)
    'SETD-->SETD2 'SETD is an instruction in P2
    'cmps wc,wr --> subs wc
    'command := cmd << 16 + argptr --> command := cmd << 24 + argptr
    'Using QROTATE to fix missing sin table
    'Used MUL to speed multiply
    

    The command part is due to the larger address space.
    A lot of P1 code assumed address was only in the lower word.
    But now, 20-bits are used for address (the upper 12 bits are ignored by the hardware).

    Also, found that PTRA now takes the place of PAR in coginit passing of a pointer from hub to cog...
  • Also, fixed the jump table like this:
                            'Trying new approach to jump table (suggested by Wuerfel_21)
                            'ALTGB D, S will select byte #D from table starting at S
                            'Getbyte D brings that byte into D
                            'JMP to that D (and not #D) takes you where you need to go.
                            
                            altgb   t1,#jumps
                            getbyte t1
                            jmp     t1
                            
    
    long
    jumps                   byte    0                       '0
                            byte    setup_                  '1
                            byte    color_                  '2
                            byte    width_                  '3
                            byte    plot_                   '4
                            byte    line_                   '5
                            byte    arc_                    '6
                            byte    vec_                    '7
                            byte    vecarc_                 '8
                            byte    pix_                    '9
                            byte    pixarc_                 'A
                            byte    text_                   'B
                            byte    textarc_                'C
                            byte    textmode_               'D
                            byte    fill_                   'E
                            byte    loop                    'F
    
  • Also, when moving graphics to hubexec I noticed that:

    1. Had to leave self-modifying PASM in the cog. These two lines were operated on by SETS:
    DAT 'NeededInCog  -This part doesn't work in hubexec because target of sets needs to be in cog
    NeededInCog
    
    ToShifts
    shift0                 shl     mask0,#0                'position slice
    shift1                 shr     mask1,#0
                            jmp     #DoneShifts
    

    2. Things like TJZ jumps back into the cog don't work; I had to replace them with CMP and JMP instructions.
  • Rayman wrote: »
    Here are the changes I had to make to Graphics.spin to make it work in Fastspin on P2:
    'MINS-->FGES
    'MAXS-->FLES
    'MOVS-->SETS
    ':-->. (many places)
    'SETD-->SETD2 'SETD is an instruction in P2
    'cmps wc,wr --> subs wc
    'command := cmd << 16 + argptr --> command := cmd << 24 + argptr
    'Using QROTATE to fix missing sin table
    'Used MUL to speed multiply
    

    The command part is due to the larger address space.
    A lot of P1 code assumed address was only in the lower word.
    But now, 20-bits are used for address (the upper 12 bits are ignored by the hardware).

    Also, found that PTRA now takes the place of PAR in coginit passing of a pointer from hub to cog...
    And PTRA can be modified, whereas PAR was a fixed value that remained while the cog was active.
  • JonnyMac Posts: 6,586
    edited 2020-02-28 - 16:00:57
    Spin2 : || (absolute in P1) is now abs.

    To get the absolute value:
      x := abs y
    

    The P2 || operator is now logical or.
  • Rayman Posts: 10,470
    edited 2020-02-28 - 16:49:59
    Here is the Spin2 operators text file from Pnut.

    I wonder if Eric will change Fastspin over to this at some point...

    Big one for me is that >= is finally in the correct order (the way you say it).
  • Rayman Posts: 10,470
    edited 2020-02-28 - 17:04:32
    Coincidentally, I just got bit by this in the current Fastspin with Spin1 operators:
        if mx<-128
          mx:=-128
    

    That "<-" turns out to be bitwise rotate left in Spin1 and not less than a negative number.
    Guess it was good to change that...
  • JonnyMac Posts: 6,586
    edited 2020-03-22 - 02:19:08
    Here is the simplified list of P2 operators with P1 comparisons/changes.
    Spin Operators
    
    *  New operator in P2
    ** Behavioral change in P2
    
    P2              P1              Description
    -----------------------------------------------------------------
    ++ (pre)        ++              Pre-increment
    -- (pre)        --              Pre-decrement
    ?? (pre)        ?      **       XORO32, iterate and return pseudo-random
    
    ++ (post)       ++              Post-increment
    -- (post)       --              Post-decrement
    !! (post)                       Post-logical NOT
    !  (post)                       Post-bitwise NOT
    \  (post)                       Post-set
    ~  (post)       ~               Post-set to 0
    ~~ (post)       ~~              Post-set to -1
    
    !               !               Bitwise NOT, 1's complement
    -               -               Negation, 2's complement
    ABS             ||     *        Absolute value
    ENCOD           >|     **       Encode MSB, 31..0
    DECOD           |<     *        Decode, 1 << (x & $1F)
    BMASK                           Bitmask, (2 << (x & $1F)) - 1
    ONES                            Count ones
    SQRT                            Square root of unsigned x
    QLOG                            Unsigned to logarithm
    QEXP                            Logarithm to unsigned
    
    >>              >>              Shift right, insert 0's
    <<              <<              Shift left, insert 0's
    SAR             ~>     *        Shift right, insert MSB's
    ROR             ->     *        Rotate right
    ROL             <-     *        Rotate left
    REV             ><     *        Reverse y LSBs of x and zero-extend
    ZEROX                           Zero-extend above bit y
    SIGNX           ~, ~~  **       Sign-extend from bit y
    
    &               &               Bitwise AND
    ^               ^               Bitwise XOR
    |               |               Bitwise OR
    
    *               *               Signed multiply
    /               /               Signed divide, return quotient
    +/                              Unsigned divide, return quotient
    //              //              Signed divide, return remainder
    +//                             Unsigned divide, return remainder
    SCA                             Unsigned scale (x * y) >> 32
    SCAS                            Signed scale (x * y) >> 30
    FRAC                            Unsigned fraction {x, 32'b0} / y
    
    +               +               Add
    -               -               Subtract
    
    #>                              Ensure x => y, signed
    <#                              Ensure x <= y, signed
    
    ADDBITS                         Make bitfield, (x & $1F) | (y & $1F) << 5
    ADDPINS                         Make pinfield, (x & $3F) | (y & $1F) << 6
    
    <               <               Signed less than                (returns 0 or -1)
    +<                              Unsigned less than              (returns 0 or -1)
    <=              =<     *        Signed less than or equal       (returns 0 or -1)
    +<=                             Unsigned less than or equal     (returns 0 or -1)
    ==              ==              Equal                           (returns 0 or -1)
    <>              <>              Not equal                       (returns 0 or -1)
    >=              =>     *        Signed greater than or equal    (returns 0 or -1)
    +>=                             Unsigned greater than or equal  (returns 0 or -1)
    >               >               Signed greater than             (returns 0 or -1)
    +>                              Unsigned greater than           (returns 0 or -1)
    <=>                             Signed comparison          (<,=,> returns -1,0,1)
    
    !!, NOT         not             Logical NOT  (x == 0,            returns 0 or -1)
    &&, AND         and             Logical AND  (x <> 0 AND y <> 0, returns 0 or -1)
    ^^, XOR         xor             Logical XOR  (x <> 0 XOR y <> 0, returns 0 or -1)
    ||, OR          or              Logical OR   (x <> 0 OR  y <> 0, returns 0 or -1)
    
    ? :                             If x <> 0 then choose y, else choose z
    
    :=              :=     *        Set var(s) to x
                                    P2: v1,v2,... := x,y,... ' set v1 to x, v2 to y, etc. '_' = ignore
    
    
    Complex math functions
    ---------------------------------------------------------------------------------------------------
    var_x,var_y := ROTXY(x,y,t)     Rotate cartesian (x,y) by t and assign resultant (x,y)
    var_r,var_t := XYPOL(x,y)       Convert cartesian (x,y) to polar and assign resultant (r,t)
    var_x,var_y := POLXY(r,t)       Convert polar (r,t) to cartesian and assign resultant (x,y)
    
  • Floating Point Constants
    I think this applies to both P1 and P2, but it bit me a few days ago. It's convenient to use floating point math in the CON section, but it's very unlikely that your program wants IEEE754 floating point values. When processed as an integer, the value will be much different than you expect.

    This snippet from a VGA driver bundled with PNut is fine, because fpix is an integer constant. The second and third snippets show the problem and the fix.
    fpix		= 40_000_000
    ...
    		qfrac	##fpix,pa
    
    fpix		= 40_000_000.0      ' PROBLEM
    ...
    		qfrac	##fpix,pa   ' PROBLEM - IEEE754 value inserted where P2 most likely expects an integer
    
    fpix		= 40_000_000.0
    ...
    		qfrac	##round(fpix),pa  ' FIXED use round() or trunc() to convert float constant to integer 
    

    Should there be a warning about this? On fastspin, round() or trunc() on an integer constant causes an error.
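
To see how different the value is, the Python sketch below prints the 32-bit pattern of a single-precision 40_000_000.0 next to the intended integer. It assumes the constant is stored as an IEEE754 single-precision value, as described above; `float32_bits` is a helper name invented for this example.

```python
import struct


def float32_bits(x):
    """Return the raw 32-bit pattern of x stored as an IEEE754 single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


fpix = 40_000_000.0
bits = float32_bits(fpix)

print(f"float bit pattern read as integer: {bits} (${bits:08X})")
print(f"intended integer value:            {round(fpix)}")
```

The bit pattern is what an instruction such as QFRAC would receive if the float constant were used directly, which is why round() or trunc() is needed.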
  • evanh Posts: 9,005
    edited 2020-02-29 - 09:28:01
    There's another difference with these types of constant definitions. Pnut auto-promotes to floats; Fastspin does not and will complain when integers and floats are mixed without explicit casts.

  • JonnyMac Posts: 6,586
    edited 2020-02-29 - 18:16:07
    Spin2: ~ and ~~ sign extensions replaced with SIGNX
      P1 : signedLong := ~signedByte
      P2 : signedLong := signedByte signx 7
    
      P1 : signedLong := ~~signedWord
      P2 : signedLong := signedWord signx 15
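
For anyone checking ported values off-chip, here is a Python model of what `x signx bit` is described as doing: sign-extend from the given bit. It is a sketch only. The function name `signx` mirrors the operator, and the result is returned as a signed Python integer rather than a 32-bit pattern.

```python
def signx(value, bit):
    """Sign-extend `value` from bit position `bit` (7 for a byte, 15 for a word)."""
    width = bit + 1
    v = value & ((1 << width) - 1)  # keep only bits 0..bit
    if v & (1 << bit):              # sign bit set: value is negative
        v -= 1 << width
    return v


print(signx(0xFF, 7))     # byte $FF
print(signx(0x7F, 7))     # byte $7F
print(signx(0x8000, 15))  # word $8000
```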
    


  • Spin2: >< (reverse operator) replaced with REV
    REV behaves differently, as well. For example, this snippet of code preps a value for a bit-banged SPI output in the P1.
      if (mode == LSBFIRST)
        outbits ><= 32                                              ' flip bits, align lsb to bit31
      else
        outbits <<= (32-bits)                                       ' align msb to bit31
    
    The modification for running on the P2 is:
      if (mode == LSBFIRST)
        outbits rev= 31                                             ' flip bits, align lsb to bit31
      else
        outbits <<= (32-bits)                                       ' align msb to bit31
    
    Note that the value used in the P2 is the last bit to be preserved; REV will reverse the bits between 0 and the target, then clear everything above the target to 0.
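
A Python model of REV as described in this post may help when checking ported code. It is a sketch based on that description (reverse bits 0 through the target bit, clear everything above), not on the compiler source; the function name `rev` simply mirrors the operator.

```python
def rev(value, top_bit):
    """Reverse bits 0..top_bit of `value` and clear every bit above top_bit."""
    width = (top_bit & 0x1F) + 1
    result = 0
    for i in range(width):
        if value & (1 << i):
            result |= 1 << (width - 1 - i)
    return result


print(hex(rev(0x00000001, 31)))  # lsb moves to bit 31
print(bin(rev(0b0001, 3)))       # 4-bit reverse
print(bin(rev(0xF1, 3)))         # bits above the target are cleared
```

With a target of 31 this matches the intent of the P1 `><= 32` line: the whole long is flipped and the lsb lands in bit 31.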
  • P1 Style Indirect Addressing

    1) MOVS and MOVD are renamed to SETS and SETD respectively. This is noted in the 2nd post in the thread.

    2) The P2's pipeline requires an additional instruction between the SETS/SETD and the instruction that uses it.

    There is an instruction timing diagram in the "Assembly Language" section of the Documentation.
    1. Ib read ' SETD
    2. Db,Sb read
    3. Ic read, Ra write ' First delay instruction
    4. Dc,Sc read
    5. Id read, Rb write ' SETD is writing result here, reading the same location does not result in new data
    6. Dd,Sd read
    7. Ie read, Rc write ' Can read Rb now
                            setd      .loop,#Temp_Data                              
                            nop                                   ' Seems like P2 needs an additional delay
                            add       t3,#1                       ' Address the next data register
    .loop                   wrbyte    0-0,t3                      ' Write the data bytes into hub memory
                            add       .loop,bit_9
                            add       t3,#1                       ' Address the next data register
                            djnz      t2,#.loop
    

    This code snippet is from the CAN bus object. My P2 port is working pretty well now.
  • JonnyMac Posts: 6,586
    edited 2020-03-13 - 20:49:35
    Trap: waitms() and waitus() are limited to a delay of 2^31 / clkfreq seconds. In my 200MHz test that works out to 10_737 milliseconds for waitms(), and 10_737_419 microseconds for waitus().
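
The limit follows directly from the 2^31 / clkfreq figure and can be checked for other clock speeds with a few lines of Python. This is plain arithmetic, not P2 code, and `max_wait_ms` is a helper name made up for this example.

```python
def max_wait_ms(clkfreq):
    """Longest whole-millisecond delay that fits in 2^31 system clock ticks."""
    return (2**31 * 1000) // clkfreq


for mhz in (20, 160, 200, 300):
    print(f"{mhz} MHz: {max_wait_ms(mhz * 1_000_000)} ms")
```

The faster the system clock, the shorter the longest single waitms() call.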

    For long delays I put these methods into my jm_timer.spin2 object.
    pub pause(ms) | t0, tixms
    
    '' Delay in milliseconds
    
      org
                            getct   t0                              ' snapshot counter
                            sub     t0, ##592                       ' fix call overhead
                            rdlong  tixms, #$44                     ' get clkfreq
                            qdiv    tixms, ##1_000                  ' get ticks/ms
                            getqx   tixms
                            rep     #2, ms                          ' delay
                            addct1  t0, tixms
                            waitct1
      end
    
    
    pub pause_us(us) | t0, tixus
    
    '' Delay in microseconds
    '' -- for low speed system frequency, use waitus()
    
      org
                            getct   t0                              ' snapshot counter
                            sub     t0, ##560                       ' fix call overhead
                            rdlong  tixus, #$44                     ' get clkfreq
                            qdiv    tixus, ##1_000_000              ' get ticks/us
                            getqx   tixus
                            rep     #2, us                          ' delay
                            addct1  t0, tixus
                            waitct1
      end
    

    Edits:
    -- fixed opening statement for clarity
    -- updated delay routines to be frequency independent