P2 Tricks, Traps & Differences between P1 (Reference Material Only) - Page 4 — Parallax Forums



Comments

  • Dave Hein Posts: 6,289
    edited 2020-03-13 02:51
    Jon, I think Roy is just saying that your statement about the maximum limit for milliseconds has a typo. Instead of
    -- waitms() is limited to 2^31 / clkfreq milliseconds (e.g., 10737 ms at 200MHz)
    
    it should be
    -- waitms() is limited to 2^31 / (clkfreq/1_000) milliseconds (e.g., 10737 ms at 200MHz)
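
    If a longer delay is needed, one workaround is to split it into chunks that each stay under the limit. This is a minimal Spin2 sketch, assuming waitms() behaves as described above; wait_long_ms and chunk are made-up names, not part of any library.

    ```spin2
    pub wait_long_ms(ms) | chunk
      ' Largest single delay, in ms, that still fits in 2^31 - 1 clock ticks
      chunk := $7FFF_FFFF / (clkfreq / 1_000)
      repeat while ms > chunk
        waitms(chunk)                       ' full-size chunk
        ms -= chunk
      if ms > 0
        waitms(ms)                          ' remainder
    ```

    The loop overhead between chunks adds a little extra time, so this suits coarse delays rather than precise timing.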
    
  • Cluso99 Posts: 17,686
    edited 2020-05-17 11:01
    Spin v Spin2: LOCKSET v LOCKTRY

    In spin we did
    repeat while(lockset(LockID))
    
    whereas in spin2
    repeat while not (locktry(LockID))
    
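    Note the reversed status return, i.e. the NOT is required: lockset returned the lock's previous state (true if it was already taken), whereas locktry returns true when the lock has been acquired.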

    Postedit: Fixed the second line (remove -1 and rename)
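
    For context, a complete acquire/release cycle in Spin2 might look like the sketch below. lock_demo and lockid are illustrative names; locknew, locktry, lockrel and lockret are the built-in Spin2 lock methods.

    ```spin2
    var
      long  lockid                                  ' illustrative name

    pub lock_demo()
      lockid := locknew()                           ' check out one of the hardware locks
      if lockid < 0
        return                                      ' none free
      repeat until locktry(lockid)                  ' same as: repeat while not locktry(lockid)
      ' ... critical section: touch the shared resource here ...
      lockrel(lockid)                               ' release so other cogs can take it
      lockret(lockid)                               ' hand the lock back to the pool
    ```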
  • Cluso99 wrote: »
    Spin v Spin2: LOCKSET v LOCKTRY

    In spin we did
    repeat while(lockset(LockID))
    
    whereas in spin2
    repeat while not (locktry(cardLockID - 1))
    
    Note the reversed status return ie the NOT requirement

    Isn't there a REPEAT UNTIL that is the same as REPEAT WHILE NOT (except potentially faster)?
  • Rayman Posts: 11,934
    Don't use REP in HUBEXEC if you care about speed.

    Just discovered (or maybe rediscovered) that the SD card reading code, FSRW, was twice as fast when I unrolled a REP loop in inline assembly.
    The P2 documentation says this: "REP works in hub memory, as well, but executes a hidden jump to get back to the top of the repeated instructions."
    What it doesn't say is that this makes it slow...
  • Cluso99 Posts: 17,686
    Rayman wrote: »
    Don't use REP in HUBEXEC if you care about speed.

    Just discovered (or maybe rediscovered) that the SD card reading code, FSRW, was twice as fast when I unrolled a REP loop in inline assembly.
    The P2 documentation says this: "REP works in hub memory, as well, but executes a hidden jump to get back to the top of the repeated instructions."
    What it doesn't say is that this makes it slow...
    It's not just REP: any jump, call or ret in hubexec causes a FIFO reload and a pause to sync with the hub egg-beater, and that is what makes hubexec slower than cog or LUT exec.
  • Cluso99 Posts: 17,686
    edited 2020-05-25 03:07
    I/O PIN TIMING

    Warning:
    Please be aware that the document "Parallax Propeller 2 Documentation 2019-09-13 v33 (Rev B silicon)" appears to be incorrect on this point.

    At 200MHz, a TESTP instruction following an OUTx instruction appears to require a minimum of 7 clocks in between (waitx #5).

    At 200MHz, a TEST instruction following an OUTx instruction appears to require a minimum of 8 clocks in between (waitx #6).

    Further clarification has been requested.
  • REP was only meant for cog exec but Chip made it execute a jmp if you tried to use it in hubexec, simply for source code compatibility. I find hubexec is almost or just as fast as cog exec if you have a large linear section of code to execute. Just one jump or looping will slow it all down again.
    However, if you need loops and still want them fast but can't permanently dedicate cog/LUT memory to them, you can use a SETQ + RDLONG at the start of your hubexec code to copy the loop into cog RAM (or SETQ2 + RDLONG for LUT RAM) and then jump to it.
    This is something I certainly do for my upscaler from 320x240 to 640x480 in my video player.
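
    A minimal sketch of that copy-and-run idea, not taken from the video player. It assumes a PASM-only program (so that @label yields an absolute hub address), and every label, the cog address $100 and the loop body are placeholders.

    ```spin2
    dat
                    org       0
    entry           jmp       #\hub_code                    ' leave cog exec, continue in hubexec

                    orgh      $400                          ' hubexec code must sit at $400 or above
    hub_code        setq      #(fast_end - fast_loop) - 1   ' number of longs to copy, minus 1
                    rdlong    fast_loop, ##@fast_loop       ' block-copy the hub image into cog RAM
                    call      #\fast_loop                   ' run the loop from cog RAM, then return here
    .halt           jmp       #.halt                        ' park

                    org       $100                          ' placeholder cog address for the loop
    fast_loop       rep       @.done, #16                   ' no hidden jump when executed from cog RAM
                    add       total, #1                     ' placeholder loop body
    .done           ret                                     ' back to the hubexec caller
    total           long      0
    fast_end
    ```

    The one-off cost of the block copy is paid back only if the loop runs enough iterations, so this is worth it for hot loops rather than for code that runs once.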
  • Cluso99 Posts: 17,686
    edited 2020-06-16 07:52
    Just fell into this trap for loading LUT.

    You must make sure that the rdlong address uses a HUB address and not a LUT address.
    Note the positioning of the labels _hub_lut_begin and _USER_LUT_BEGIN with respect to ORGH and ORG $200,
    or the use of @
    '+-------[ Load LUT code ??? ]-------------------------------------------------+
                  setq2     ##_USER_LUT_END-_USER_LUT_BEGIN-1 '\ load LUT
    '             rdlong    0, ##_USER_LUT_BEGIN              '/   <-- uses a LUT address (wrong)
                  rdlong    0, ##_hub_LUT_BEGIN               '/   <-- uses a hub address (correct)
    '             rdlong    0, ##@_USER_LUT_BEGIN             '/   <-- uses a hub address (correct)
    '+-----------------------------------------------------------------------------+
    .....
                            orgh
    _hub_lut_begin
                            org     $200                            ' LUT
    _USER_LUT_BEGIN
    
    go_lut        drvl      #57
                  mov       lmm_x,      #"L"
                  call      #\_hubTx
                  ret
    
    _USER_LUT_END
    
    Postedit: added version using @ per the following post
  • You can always use "@_USER_LUT_BEGIN" if you want the hub address of "_USER_LUT_BEGIN", rather than creating a new label for this.
  • Cluso99 Posts: 17,686
    edited 2020-11-17 01:25
    Just noticed this really neat piece of code from JonnyMac in his jm_fullduplexserial.spin2

    It shows how to easily pass a set of parameters from a program (Spin2 here, but it doesn't need to be) to a PASM driver. These parameters can be picked up with two PASM instructions, setq and rdlong.
    Also neat is the reading and writing of the head and tail parameters using the ptra[n] offset.
    Lastly, note the use of incmod to increment the head and tail parameters.

    All these are examples of efficient use of the new P2 instruction set.

    However, I would change one thing. I think the "org" should be an "org 0". Mostly it will not matter, but I think there are some possibilities where it may give incorrect results. So I have taken the liberty of modifying this line.

    Here is an extract of the relevant sections
    var
    
      long  cog                                                     ' cog flag/id
    
      long  rxp                                                     ' rx smart pin
      long  txp                                                     ' tx smart pin
      long  rxhub                                                   ' hub address of rxbuf
      long  txhub                                                   ' hub address of txbuf
    
      long  rxhead                                                  ' rx head index
      long  rxtail                                                  ' rx tail index
      long  txhead                                                  ' tx head index
      long  txtail                                                  ' tx tail index
    
      long  txdelay                                                 ' ticks to transmit one byte
    
      byte  rxbuf[BUF_SIZE]                                         ' buffers
      byte  txbuf[BUF_SIZE]
    
    .....
    
    pub start(rxpin, txpin, mode, baud) : result | baudcfg, spmode
      .....
      cog := coginit(COGEXEC_NEW, @uart_mgr, @rxp) + 1              ' start uart manager cog
    
      return cog
    
    
    dat { smart pin uart/buffer manager }
    
                    org       0
    
    uart_mgr        setq      #4-1                                  ' get 4 parameters from hub
                    rdlong    rxd, ptra
    
    
    uart_main       testb     rxd, #31                      wc      ' rx in use?
        if_nc       call      #rx_serial
    
                    testb     txd, #31                      wc      ' tx in use?
        if_nc       call      #tx_serial
    
                    jmp       #uart_main
    
    
    rx_serial       testp     rxd                           wc      ' anything waiting?
        if_nc       ret
    
                    rdpin     t3, rxd                               ' read new byte
                    shr       t3, #24                               ' align lsb
                    mov       t1, p_rxbuf                           ' t1 := @rxbuf
                    rdlong    t2, ptra[4]                           ' t2 := rxhead
                    add       t1, t2
                    wrbyte    t3, t1                                ' rxbuf[rxhead] := t3
                    incmod    t2, #(BUF_SIZE-1)                     ' update head index
        _ret_       wrlong    t2, ptra[4]                           ' write head index back to hub
    
    
    tx_serial       rdpin     t1, txd                       wc      ' check busy flag
        if_c        ret                                             '  abort if busy
    
                    rdlong    t1, ptra[6]                           ' t1 = txhead
                    rdlong    t2, ptra[7]                           ' t2 = txtail
                    cmp       t1, t2                        wz      ' byte(s) to tx?
        if_e        ret
    
                    mov       t1, p_txbuf                           ' start of tx buffer
                    add       t1, t2                                ' add tail index
                    rdbyte    t3, t1                                ' t3 := txbuf[txtail]
                    wypin     t3, txd                               ' load into sp uart
                    incmod    t2, #(BUF_SIZE-1)                     ' update tail index
        _ret_       wrlong    t2, ptra[7]                           ' write tail index back to hub
    
    
    ' --------------------------------------------------------------------------------------------------
    
    rxd             res       1                                     ' receive pin
    txd             res       1                                     ' transmit pin
    p_rxbuf         res       1                                     ' pointer to rxbuf
    p_txbuf         res       1                                     ' pointer to txbuf
    
    t1              res       1                                     ' work vars
    t2              res       1
    t3              res       1
    
                    fit       472
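
    For anyone coming from the P1, here is roughly what one of the incmod lines above replaces. This is an illustrative longhand equivalent rather than code from the driver, reusing the t2 and BUF_SIZE names from the extract.

    ```spin2
                    cmp       t2, #(BUF_SIZE-1)             wz      ' already at the last index?
        if_e        mov       t2, #0                                ' yes: wrap back to zero
        if_ne       add       t2, #1                                ' no: step to the next index
    ```

    Unlike this longhand, incmod leaves the flags alone unless you ask for them, which is one more reason the single instruction is handy.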
    
    Please, if you want to comment, use this thread
    forums.parallax.com/discussion/167812/p2-tricks-traps-differences-between-p1-discussion/p1?new=1
    and keep this thread for the actual tricks and traps. Thanks.
  • However, I would change one thing. I think the "org" should be an "org 0". Mostly it will not matter, but I think there are some possibilities where it may give incorrect results. So I have taken the liberty of modifying this line.
    Thanks, Ray. I will update my version.
  • evanh Posts: 10,859
    edited 2020-12-14 19:32
    I've started working on implementing a FIR filter to complement the Sinc filters in the smartpins. First part of constructing the table of taps is done and tested. In the process I've learnt a couple of things about using the ALTI prefixing instruction.

    First: I hadn't quite got it that this instruction can manipulate multiple pointers at once. This is achieved like a SIMD operation. Because cogRAM is addressed in just 9 bits, there is room in a single 32-bit register to hold multiple cogRAM pointers, and so that's exactly what can be done.

    Second: A more minor detail: The %RRR bits in the control word have two modes: One for manipulating the subsequent result register address and the other is a special case for re-encoding the subsequent opcode.

    Anyway, here's a compact use of ALTI with %RRR to fill the FIR tap table in cogRAM:
    ' Build FIR table
    '--------------------------------------------------------------
    .step		qfrac	#1, #firsize			'calculate step angle
    		getqx	.step				'retrieve the calculated incremental angle of each step
    		sub	s_angle, .step			'one step back from 180 deg to move off the zero value tap
    		qrotate	s_mag, s_angle			'first cordic op for filling FIR table
    
    .tabloop
    		sub	s_angle, .step		wc	'angle step, assumes stepping backward from 180 deg to 0 deg
    		getqx	pa				'retrieve the calculated cosine
    	if_nc	qrotate	s_mag, s_angle			'begin next cordic op
    
    		alti	.firp1, #%111_000_000		'first half of table, post-incrementing index, start to middle
    		add	pa, s_mag			'offset cosine to sit on top the zero
    		alti	.firp2, #%110_000_000		'mirrored second half of table, post-decrementing index, end to middle
    		add	pa, s_mag			'ditto
    	if_nc	jmp	#.tabloop
    '--------------------------------------------------------------
    		...
    		...
    
    .firp1		long	firtab<<19
    .firp2		long	(firsize-1+firtab)<<19
    
    s_angle		long	$8000_0000			'start angle ($8000_0000 == 180 deg)
    s_mag		long	$7fff_ffff / firsize		'FIR table magnitude
    
    firtab		res	firsize
    firbuf		res	firsize
    

    EDIT: Quality improvement by making QROTATE conditional execution to eliminate extraneous case
    EDIT2: Automate setting of table magnitude (s_mag)

  • evanh Posts: 10,859
    A neat attribute of the above use of ALTI is that it not only provides the needed register indirection but, because the manipulation is done through %RRR, it also gives the ADD instruction a true third operand. The PA and S_MAG operands are still the ALU inputs as specified in the ADD instruction, unaffected by the prefixing ALTI.

  • Cluso99 Posts: 17,686
    Fastest way to clear COG or LUT RAM
      SETQ    #length-1       ' use SETQ for COG and SETQ2 for LUT
      RDLONG  where,##$80000  ' clear cog or lut
    

    This works because the unmapped HUB RAM area, $80000-$FBFFF, does indeed read as zeroes.
    Note the 16KB from $7C000-$7FFFF is dual mapped to $FC000-$FFFFF.

    Thanks @Wuerfel_21 :)
  • Cluso99 wrote: »
    Fastest way to clear COG or LUT RAM
      SETQ    #length-1       ' use SETQ for COG and SETQ2 for LUT
      RDLONG  where,##$80000  ' clear cog or lut
    

    This works because the unmapped HUB RAM area, $80000-$FBFFF, does indeed read as zeroes.
    Note the 16KB from $7C000-$7FFFF is dual mapped to $FC000-$FFFFF.

    Thanks @Wuerfel_21 :)

    Of course that won't work if we ever get a 1MB P2 :(
  • Even then, loading the old software wouldn't change that high RAM, so it'd still work. I guess ##$FB800 is slightly more future-proof
  • Cluso99 Posts: 17,686

    With more people now using the P2 I thought it was worth bumping this thread.

    For discussions, please use the discussion version of this thread linked in the first post. Thanks.

  • Here's something that makes sense when you think about it, but isn't explicitly said anywhere in the documentation: an ALTx instruction won't work if more than 7 instructions are skipped (with SKIPF/EXECF/XBYTE) immediately after it.

  • evanh Posts: 10,859
    edited 2021-03-14 06:03

    Hehe, it won't be quite that. I found this in the hardware doc:

    like SKIP, but fast due to PC steps of 1..8

    The way that text is laid out in the document suggests it came from the instruction sheet. I suspect it got trimmed out of the instruction sheet at some stage.

    The incremental limit of 8 does seem an unneeded burden. There must have been a reason why. I didn't follow the skipping conversations when it was developed so I don't know why myself.

    EDIT: Oh, it's just the ALTx that fails. Yeah, makes complete sense because the ALTx would then be prefixing a cancelled instruction.

  • Cluso99 Posts: 17,686
    edited 2021-03-14 08:05

    Free skips are limited to 7/8 instructions. IIRC, after 7 skips a clock (or 2?) needs to be inserted as the skip continues.
    I am unsure of the impact of ALT instructions. Whether they are treated any differently to other instructions Chip will need to answer.

    To continue discussion, please use the discussion thread (link in top post).

  • Wuerfel_21 Posts: 1,611
    edited 2021-03-16 20:31

    Here's another funny one:
    When porting P1 code, make sure that when translating a MOVS that is being used to modify a jump instruction, you turn that jump into an absolute one (jmp #\whatever instead of jmp #whatever)
    (This of course only works if the address is still a cog address)
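
    A small sketch of what that translation can look like, assuming everything stays in cog RAM. SETS is the P2 instruction for writing the S field, where the P1 used MOVS. The labels are placeholders, and the two spacer instructions are a conservative assumption about how far ahead the patch must happen.

    ```spin2
    dat
                    org       0
    dispatch        sets      .jump, #handler_b             ' patch bits 8..0 of the jump (P1: movs)
                    nop                                     ' spacers so the patched long is fetched
                    nop                                     '   after the write (count is an assumption)
    .jump           jmp       #\handler_a                   ' absolute form: low 9 bits hold the cog address

    handler_a       jmp       #handler_a                    ' placeholder: original target
    handler_b       jmp       #handler_b                    ' placeholder: patched-in target
    ```

    With a plain jmp #handler_a the assembler is free to encode a relative offset, so overwriting the low 9 bits with an absolute cog address would send the jump somewhere unintended.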
