FlexSpin: is mixed HUB/LUT exec possible? - Page 2 — Parallax Forums


Comments

  • TonyB_ Posts: 2,193
    edited 2023-09-11 18:42

    @ManAtWork said:

    @evanh said:
    Tony used a technical term there - "Direct Addressing" is the way an instruction is encoded for fetching operand data. Any of the 512 cogRAM addresses can be directly specified in both the encoded S and D fields of an instruction. But, for lutRAM and hubRAM, because of the extra indexing options in the RDLUT/WRLUT instructions, only the first 256 addresses can be directly specified this way and only in the S field.

    I'm pretty sure that this does not affect my code if I place the instructions in LUT ram and the data in COG ram. Jumps and branches, REP etc. are PC relative so they still work in LUT ram.

    Executing instructions in LUT RAM that access data in cog RAM is fine. Note that some branch instructions, indicated by "Call/Jump to S**" in the spreadsheet, use signed-relative addressing in the #S form, which means you cannot branch from everywhere in cog RAM to anywhere in LUT RAM or vice versa unless you use the register form S, which holds an absolute address. Usually, though, you can move code by hand so that the branch target is within the PC +/- 256 range for #S.
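
    As a rough way to check the reach described above, here is a small Python model (the thread's own code is PASM, which cannot be run off-target). It assumes cog RAM occupies long addresses $000-$1FF and LUT RAM $200-$3FF, and that the 9-bit signed #S offset is counted from the instruction after the branch; verify that last point against the P2 documentation before relying on it. `rel_offset` and `fits_rel9` are hypothetical helper names.

```python
# Hypothetical reach check for a relative "#S" branch in cog/LUT exec.
# Assumptions (not from the thread): cog RAM = $000-$1FF, LUT RAM = $200-$3FF,
# and the signed 9-bit offset is measured from the instruction after the branch.

COG_LUT_END = 0x3FF  # last LUT long address in cog/LUT exec space


def rel_offset(pc: int, target: int) -> int:
    """Offset a relative #S branch at long address pc would need to encode."""
    if not (0 <= pc <= COG_LUT_END and 0 <= target <= COG_LUT_END):
        raise ValueError("pc and target must be cog/LUT long addresses")
    return target - (pc + 1)


def fits_rel9(pc: int, target: int) -> bool:
    """True if the offset fits a 9-bit signed field (-256..+255)."""
    return -256 <= rel_offset(pc, target) <= 255
```

    For example, a branch at $1F0 in cog RAM can reach $210 in LUT RAM (offset 31), while a branch at $010 cannot reach $300 (offset 751) with #S and needs the register form instead.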

  • evanh Posts: 16,024
    edited 2023-09-11 22:34

    @ManAtWork said:
    I'm pretty sure that this does not affect my code if I place the instructions in LUT ram and the data in COG ram. Jumps and branches, REP etc. are PC relative so they still work in LUT ram.

    Correct. Addressing modes are about data accesses, not code fetching.
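
    To make the distinction concrete, here is a sketch of the operand limits for data accesses. It assumes (check the P2 documentation) that for RDLUT/WRLUT and hub reads/writes an immediate #S with bit 8 set is reinterpreted as a PTRA/PTRB expression, which is why plain immediates stop at $FF, while a cog register operand can name any of the 512 cog longs. The function names are hypothetical.

```python
# Hypothetical encoder for the 9-bit operand fields used in data accesses.
# Assumption: for RDLUT/WRLUT and hub RDxxxx/WRxxxx, an immediate #S with
# bit 8 set is a PTRA/PTRB expression, so plain immediates only cover $00-$FF.

FIELD_MASK = 0x1FF  # D and S fields are 9 bits wide


def encode_reg(addr: int) -> int:
    """Direct cog register address in D or S: any of the 512 cog longs."""
    if not 0 <= addr <= FIELD_MASK:
        raise ValueError(f"cog register ${addr:X} out of range")
    return addr


def encode_ptr_immediate(addr: int) -> int:
    """Plain immediate #S address for RDLUT/WRLUT or a hub read/write."""
    if not 0 <= addr <= 0xFF:
        raise ValueError(f"#${addr:X} needs a register or PTRx, not an immediate")
    return addr
```

    Instruction fetches from LUT RAM never pass through these fields, which is why code placed in LUT RAM is unaffected.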

  • ManAtWork Posts: 2,178
    edited 2023-09-12 07:21

    @pik33 said:
    Instead of rdword/add inx/rdword to get 2 samples to interpolate you can use rdlong and getword. rdlong does not have to be long aligned in P2. That saves one hub read, and the cost is one getword and one clock penalty for unaligned long read.
    mul/add/shr maybe can be sped up using scas: scas returns an already shifted result in Q

    Good point. Reading the two table entries at once also has the advantage that I can swap the two MULs which eliminates one of the SUBRs. The first two MULs can't be substituted by SCA because of the odd shifting of #9 but the third can.

                  mov    inx,theta
                  testb  inx,#31 wz                 ' bit 31 -> Z
                  shl    inx,#2 wc                  ' bit 30 -> C
            if_c  not    inx                        ' mirror 2nd and 4th quadrant
                  getbyte ipol,inx,#2               ' fraction -> interpolate
                  shr    inx,#24                    ' MSB -> table index
                  shl    inx,#1
                  add    inx,adrTable
                  rdlong sin1,inx                   ' fetch two consecutive table entries
                  getword sin2,sin1,#1              ' 2nd table entry
                  mul    sin2,ipol
                  subr   ipol,#$100
                  mul    sin1,ipol
                  add    sin1,sin2
                  shr    sin1,#9                    ' 8 bits + /2 for average of sin1+sin2
                  sca    sin1,ampl
                  negz   sin1                       ' flip 3rd and 4th quadrant
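
    For checking the arithmetic off-target, here is a step-by-step Python model of the routine above. It assumes a hypothetical 257-entry quarter-wave table of 16-bit unsigned values (the extra entry lets index 255 read its neighbour), little-endian hub memory, and the P2 behaviour that MUL uses only the low 16 bits of each operand and SCA passes (D[15:0] * S[15:0]) >> 16 to the next instruction. Placing the table at a non-long-aligned address exercises the unaligned RDLONG that pik33 mentions.

```python
import math

MASK32 = 0xFFFF_FFFF


def make_table_bytes(pad_front=0):
    """Hypothetical 257-entry quarter-wave sine table, 16-bit unsigned, little-endian."""
    words = [round(0xFFFF * math.sin(math.pi / 2 * i / 256)) for i in range(257)]
    return bytes(pad_front) + b"".join(w.to_bytes(2, "little") for w in words)


def rdlong_le(mem, addr):
    """32-bit little-endian read; addr need not be a multiple of 4."""
    return int.from_bytes(mem[addr:addr + 4], "little")


def sine_model(theta, ampl, mem, adr_table):
    """Model of the PASM sine routine; returns the signed, ampl-scaled result."""
    inx = theta & MASK32
    z = (inx >> 31) & 1                    # testb inx,#31 wz
    c = (inx >> 30) & 1                    # shl inx,#2 wc: last bit out is bit 30
    inx = (inx << 2) & MASK32
    if c:
        inx = ~inx & MASK32                # if_c not inx: mirror quadrants 2 and 4
    ipol = (inx >> 16) & 0xFF              # getbyte ipol,inx,#2: 8-bit fraction
    inx = ((inx >> 24) << 1) + adr_table   # shr #24 / shl #1 / add adrTable
    sin1 = rdlong_le(mem, inx)             # rdlong: entries i and i+1 at once
    sin2 = sin1 >> 16                      # getword sin2,sin1,#1
    sin2 = (sin2 & 0xFFFF) * ipol          # mul sin2,ipol (low 16 bits only)
    ipol = 0x100 - ipol                    # subr ipol,#$100
    sin1 = (sin1 & 0xFFFF) * ipol          # mul ignores the upper word of sin1
    sin1 = (sin1 + sin2) >> 9              # add sin1,sin2 / shr sin1,#9
    scaled = ((sin1 & 0xFFFF) * (ampl & 0xFFFF)) >> 16  # sca sin1,ampl
    return -scaled if z else scaled        # negz: negate in quadrants 3 and 4
```

    With `ampl = $FFFF` the output comes out at roughly 32767 * sin(theta), about half of `ampl`, because `shr #9` divides by 512 rather than 256.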
    
  • @ManAtWork said:

    @evanh said:
    Tony used a technical term there - "Direct Addressing" is the way an instruction is encoded for fetching operand data. Any of the 512 cogRAM addresses can be directly specified in both the encoded S and D fields of an instruction. But, for lutRAM and hubRAM, because of the extra indexing options in the RDLUT/WRLUT instructions, only the first 256 addresses can be directly specified this way and only in the S field.

    I'm pretty sure that this does not affect my code if I place the instructions in LUT ram and the data in COG ram. Jumps and branches, REP etc. are PC relative so they still work in LUT ram.

    I recall in some cases you might have to use absolute addresses when calling LUTRAM from LUTRAM. I had something like that in my memory driver which had to call code in LUT RAM and I think I had to put in the #\ to make it work. Not sure if this was an assembler limitation at the time or something else.

    Code in LUT RAM:

    r_resume_burst              getnib  b, request, #0          ' a b c d e    get bank parameter LUT address
                                rdlut   b, b                    ' a b c d e    get bank limit/mask
                                bmask   mask, b                 ' | | | d e    build mask for addr
          <snip>
    w_burst                     mov     orighubsize, count      '  a b c | e        save original hub size
    w_locked_fill               cmp     count, #1 wz            '  a b c d |   g    optimization for single transfers
                                shl     count, a                '  a b c | |   |    scale into bytes
                                tjz     count, #nowrite_lut     '  a b c d e   |    check for any bytes to write
    w_resume_burst              mov     c, count                '  a b c d e f g h  get the number of bytes to write
                                call    #\r_resume_burst        '  a b c d e f g h  get per bank limit and read delay info  <----- NOTE: #\ syntax
    
  • @rogloh said:
    I recall in some cases you might have to use absolute addresses when calling LUTRAM from LUTRAM. I had something like that in my memory driver which had to call code in LUT RAM and I think I had to put in the #\ to make it work. Not sure if this was an assembler limitation at the time or something else.


    This is related to skipping and applies to both cog RAM exec and LUT RAM exec. From the doc:

    Special SKIPF Branching Rules

    Within SKIPF sequences where CALL/CALLPA/CALLPB are used to execute subroutines in which skipping will be suspended until after RET, all CALL/CALLPA/CALLPB immediate (#) branch addresses must be absolute in cases where the instruction after the CALL/CALLPA/CALLPB might be skipped. This is not possible for CALLPA/CALLPB but CALL can use '#\address' syntax to achieve absolute immediate addressing. CALL/CALLPA/CALLPB can all use registers as branch addresses, since they are absolute.
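
    The rule is mechanical enough to check with a script. A minimal lint sketch, assuming each entry in `instrs` is a single instruction with no label or `if_xx` prefix, index 0 is the first instruction after SKIPF, and `patterns` lists every skip pattern the block can be entered with; `relative_call_hazards` is a hypothetical name, not an existing tool.

```python
CALLS = {"call", "callpa", "callpb"}


def relative_call_hazards(instrs, patterns):
    """Flag CALL/CALLPA/CALLPB with a relative #address whose next instruction may be skipped.

    instrs[n] is the n-th instruction after SKIPF (PASM text; ' starts a comment).
    patterns lists every skip pattern the block can run with (bit n = 1 skips instrs[n]).
    """
    hazards = []
    for n, text in enumerate(instrs):
        code = text.split("'", 1)[0].strip()          # drop trailing PASM comment
        parts = code.split(None, 1)
        if len(parts) < 2 or parts[0].lower() not in CALLS:
            continue
        operand = parts[1].split(",")[-1].split()     # last operand, minus wc/wz
        if not operand:
            continue
        target = operand[0]
        relative = target.startswith("#") and not target.startswith("#\\")
        next_may_skip = any((p >> (n + 1)) & 1 for p in patterns)
        if relative and next_may_skip:
            hazards.append((n, code))
    return hazards
```

    A `call #\label` passes, a `call #label` is flagged only when some pattern can skip the instruction after it, and register-addressed calls are never flagged.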

  • IMHO, SKIPF is evil. >:) I've used it only once. In some cases it can really save a lot of instructions, but it is so error-prone, especially when jumps, calls or branches are involved, that it has to be used with great care.

  • I think SKIPF is the best feature of P2. When used more than trivially, it really helps if skip patterns can be created automatically. For complicated code that's essential. https://forums.parallax.com/discussion/171125/skip-patterns-generated-automatically

    Example:

    ' ROTATES & SHIFTS (8086 emulator)
    
    '                               previous sign bit
    'ROLb           '''a            C
    'RORb           ''b                     6
    'RCLb           ''c             C
    'RCRb           ''d                     6
    'SHLb           ''e             C
    'SHRb           ''f                     6
    'SETMOb         ''g             C                       not "SALb"
    'SARb           ''h                     6
    
    'ROLw           ''A             C
    'RORw           ''B                     14
    'RCLw           ''C             C
    'RCRw           ''D                     14
    'SHLw           ''E             C
    'SHRw           ''F                     14
    'SETMOw         ''G             C                       not "SALw"
    'SARw           ''H                     14
    
    'combined byte/word version
    'count = rotate/shift bit count, > 0
    'width_msb = msb = 7 or 15
    'nc,nz
    
                    movbyts dest,#%%1010                    '|||||||| ABCDE|G|
                    movbyts dest,#%%0000                    'abcde|g| ||||||||
                    signx   dest,width_msb                  '|||||||h |||||||H
                    fle     count,#16                       '||||efgh ||||EFGH      shift 16 max
                    testb   F,#C_bit                wc      '||cd|||| ||CD||||      c = old CF
                    modz    _set                    wz      '||cd|||| ||||||||      z/nz if b/w
    'operation
                    rol     dest,count              wc      'a||||||| A|||||||      c = CF
                    ror     dest,count              wc      '|b|||||| |B||||||      c = CF
                    call    #\RCL_dest_count                '||c||||| ||C|||||      c = CF
                    call    #\RCR_dest_count                '|||d|||| |||D||||      c = CF
                    shl     dest,count              wc      '||||e|g| ||||E|G|      c = CF
                    shr     dest,count              wc      '|||||f|h |||||F|H      c = CF
                    not     dest,#0                         '||||||g| ||||||G|      = -1
    'flags
                    andn    F,#AF_CF_mask                   'abcdefgh ABCDEFGH      write AF = 0; CF = 0 if gG
                    bitc    F,#C_bit                        'abcdef|h ABCDEF|H      write CF
                    testb   dest,#6                 wc      '|b|d|f|h ||||||||
                    testb   dest,#14                wc      '|||||||| |B|D|F|H
                    testb   dest,width_msb          xorc    'abcdefgh ABCDEFGH      c = OF
                    bitc    F,#V_bit                        'abcdefgh ABCDEFGH      write VF
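
    One way to generate such patterns automatically is to read them straight off the column comments. This sketch assumes each comment string has one column per operation in the order abcdefghABCDEFGH (the space between the byte and word groups is ignored), that a letter means "execute" and '|' means "skip", and that a set bit n in a SKIPF pattern skips the n-th instruction after SKIPF. `skip_pattern` is a hypothetical helper, not the tool from the linked thread.

```python
OPS = "abcdefghABCDEFGH"  # column order used in the comments of the listing


def skip_pattern(columns, op):
    """Build a SKIPF pattern for one operation from per-instruction column strings.

    columns[n] is the marker string of the n-th instruction after SKIPF;
    bit n of the result is 1 when that instruction must be skipped.
    """
    if len(columns) > 32:
        raise ValueError("a SKIPF pattern covers at most 32 instructions")
    col = OPS.index(op)
    pattern = 0
    for n, marks in enumerate(columns):
        mark = marks.replace(" ", "")[col]
        if mark == "|":
            pattern |= 1 << n          # skip this instruction for op
        elif mark != op:
            raise ValueError(f"instruction {n}: expected {op!r} or '|', got {mark!r}")
    return pattern
```

    Fed only the first three instructions of the listing (`movbyts`, `movbyts`, `signx`), it returns 0b101 for `a`, 0b011 for `h` and 0b110 for `A`. Deriving the patterns from the same comments that document the code keeps the two from drifting apart.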
    
  • Yes, SKIPF can be helpful for interpreter- and emulator-type applications, where the data flow is always similar but there are many different cases of what to do with the data. But for normal linear, procedural programming I just don't need it very often.

    Unfortunately, for my VFD application, things are much more complicated than I first thought. The signals are noisy, so I have to apply filtering. That in itself is no problem, because the main control loop runs at least 20 times slower than the interrupts processing the ADC samples. But I have to apply the filtering after the Park transformation, because filtering a rotating vector causes phase lag and jitter if the timing is not synchronized. After the Park transformation the phase relation is quasi-static, or at least changes at a much slower rate.

    So I have to do all the vector math in the interrupt service routine, and I definitely need the CORDIC. The main control loop then becomes rather trivial: just a few lines of code that can be called from the motion control loop, which also runs once per millisecond and hopefully has some CPU time left over. That means I can use a dedicated cog running PASM for the VFD instead of having to combine interrupts with compiled Spin code.
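
    A minimal Python sketch of why filtering after the Park transformation avoids the phase problem: balanced phase currents that rotate with the rotor angle become constant d/q values, which a low-pass filter can smooth without lag. It assumes an amplitude-invariant Clarke transform, a first-order IIR filter and made-up parameters (`phi`, `dtheta`, `k`); on the P2 the rotation itself would be done with the CORDIC (e.g. QROTATE) rather than `math.cos`/`math.sin`.

```python
import math


def clarke(ia, ib):
    """Amplitude-invariant Clarke transform (assumes ia + ib + ic = 0)."""
    return ia, (ia + 2 * ib) / math.sqrt(3)


def park(alpha, beta, theta):
    """Rotate the stationary (alpha, beta) vector into the rotating d/q frame."""
    c, s = math.cos(theta), math.sin(theta)
    return alpha * c + beta * s, beta * c - alpha * s


class LowPass:
    """First-order IIR low-pass filter: y += k * (x - y)."""

    def __init__(self, k):
        self.k, self.y = k, 0.0

    def update(self, x):
        self.y += self.k * (x - self.y)
        return self.y


def filtered_dq(steps=2000, amp=1.0, phi=0.3, dtheta=2 * math.pi / 50, k=0.02):
    """Feed balanced phase currents through Clarke/Park, then filter d and q."""
    fd, fq = LowPass(k), LowPass(k)
    theta = 0.0
    for _ in range(steps):
        ia = amp * math.cos(theta + phi)
        ib = amp * math.cos(theta + phi - 2 * math.pi / 3)
        d, q = park(*clarke(ia, ib), theta)   # constant amp*cos(phi), amp*sin(phi)
        fd.update(d)
        fq.update(q)
        theta += dtheta
    return fd.y, fq.y
```

    With `phi = 0.3`, `filtered_dq()` settles at d = cos(0.3) and q = sin(0.3). Filtering `alpha` and `beta` directly with the same `k` would instead give attenuated, phase-shifted sinusoids.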
