The unofficial P2 documentation project - Page 11 — Parallax Forums

The unofficial P2 documentation project

Comments

  • Seairth Posts: 2,474
    edited 2013-04-28 08:34
    cgracey wrote: »
    It's my lack of experience in actually programming DSP that caused me to come up with this overly-complex solution to what was a simple problem. If we revise the die, we'll improve this mechanism.

    I suspect your experience is still better than most. Nevertheless, for anyone interested in DSP, I strongly recommend reading The Scientist and Engineer's Guide to Digital Signal Processing (http://www.dspguide.com, PDF version found at http://www.dspguide.com/pdfbook.htm). It's very accessible. It won't give code, but it should give enough understanding to write the code.
  • timx8 Posts: 6
    edited 2013-04-28 10:27
    This is very cool stuff. So of the CORDIC unit, big multiplier, big divider, and big square-rooter, do these live in independent hardware, or do they share resources? Could an ambitious coder run all 4 simultaneously? The fast MUL/SCL/MAC instructions I'm assuming are independent of these?
  • Kye Posts: 2,200
    edited 2013-04-28 15:21
    Yeah, I believe they are all separate state machines.
  • cgracey Posts: 14,206
    edited 2013-04-28 20:42
    Kye wrote: »
    Yeah, I believe they are all separate state machines.

    That's right. They can all run at the same time.
  • potatohead Posts: 10,261
    edited 2013-04-28 21:26
    !!!

    Wow. We have very luxurious math in PASM now. So far, I've used them a few times. Haven't interleaved ops yet, but obviously it's an option. I find myself working shifts and adds, only to remember that we've got fast math now. Fun!
  • pedward Posts: 1,642
    edited 2013-04-28 22:04
    Chip, I'd like to see documentation on the register remapping, could you detail this?
  • MJB Posts: 1,235
    edited 2013-04-29 01:00
    Seairth wrote: »
    I suspect your experience is still better than most. Nevertheless, for anyone interested in DSP, I strongly recommend reading The Scientist and Engineer's Guide to Digital Signal Processing (http://www.dspguide.com, PDF version found at http://www.dspguide.com/pdfbook.htm). It's very accessible. It won't give code, but it should give enough understanding to write the code.

    The PDFs are a little tedious to find, but if you start here:
    http://www.dspguide.com/CH1.PDF
    then it is CH1 .. CH34.PDF

    I didn't find it in one file - which would be around 15MB
  • Seairth Posts: 2,474
    edited 2013-04-29 06:20
    MJB wrote: »
    The PDFs are a little tedious to find, but if you start here:
    http://www.dspguide.com/CH1.PDF
    then it is CH1 .. CH34.PDF

    I didn't find it in one file - which would be around 15MB

    That's true. Each chapter is a separate PDF. I used an online tool to merge them into one file. I'd put that up here, but I don't think it falls under "permissible use".
  • cgracey Posts: 14,206
    edited 2013-04-29 08:03
    pedward wrote: »
    Chip, I'd like to see documentation on the register remapping, could you detail this?

    I'll work on that today. I should have it done by this evening.
  • User Name Posts: 1,451
    edited 2013-04-29 12:46
    potatohead wrote: »
    We have very luxurious math in PASM now.

    Seriously. I've never seen such luxury in all my years of embedded design. :)

    At one point I wondered why only one cog would fit in a Cyclone IV 4C22. Now I know why. ;) There's really an extraordinary amount of logic packed in the P2.
  • pedward Posts: 1,642
    edited 2013-04-29 13:21
    User Name wrote: »
    Seriously. I've never seen such luxury in all my years of embedded design. :)

    At one point I wondered why only one cog would fit in a Cyclone IV 4C22. Now I know why. ;) There's really an extraordinary amount of logic packed in the P2.

    Did you think Chip has really been foolin' around for the last 5 years? ;)
  • Sapieha Posts: 2,964
    edited 2013-04-30 07:15
    Hi Chip.

    You said you would post information on COG-to-COG communication via the internal port, and on remapping of COG registers.

    How is that going?
  • cgracey Posts: 14,206
    edited 2013-04-30 08:11
    Sapieha wrote: »
    Hi Chip.

    You said you would post information on COG-to-COG communication via the internal port, and on remapping of COG registers.

    How is that going?

    I'm still working on the register remapping. After that I'll cover the Port D cog-to-cog communication.
  • Sapieha Posts: 2,964
    edited 2013-04-30 08:37
    Hi Chip.

    Thanks
  • cgracey Posts: 14,206
    edited 2013-05-01 13:35
    Okay. Here are the latest docs, which now include register remapping:

    Prop2_Docs.txt

    Here's the new section:
    REGISTER REMAPPING
    ------------------
    
    The SETMAP instruction is used to remap a 2^n-sized block of registers starting at $000, so
    that direct accesses to those registers are redirected to one of a series of
    identically-sized blocks, which are also laid out upward from $000. This feature allows a
    single program to run multiple instances of itself by having unique sets of
    statically-addressable registers which switch according to either INDA or the current task.
    
    When using remapping, you must locate your program code above the highest block of
    registers that the bottom-most block can be remapped to. For example, if you select 8
    blocks of 16 registers but use only 6 of those blocks, your program code must not start
    below register 96 (6*16), so that it does not encroach on the registers that are the
    targets of remapping.
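
The placement rule above is simple arithmetic. As a minimal sketch (the helper name `min_code_start` is made up for illustration and is not part of any Parallax tool), the lowest safe code address is the register count per block times the number of blocks actually used:

```python
def min_code_start(regs_per_block: int, blocks_used: int) -> int:
    """Lowest register address at which program code may start.

    Everything below this address can be a target of remapping.
    """
    return regs_per_block * blocks_used


# The example from the text: 8 blocks of 16 registers selected, 6 in use.
print(min_code_start(16, 6))       # 96

# The 4-blocks-of-4-registers layout used by the example programs.
print(hex(min_code_start(4, 4)))   # 0x10
```

The second result matches the example listings, where the first long after the four 4-register blocks sits at $010.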
    
    Here is the SETMAP instruction:
    
        SETMAP  D/#n            - Configure register remapping to %M_BBB_RRR
    
            %M = mode
    
                %0 = INDA selects the block
                %1 = task number selects the block
    
            %BBB = block count
    
                %000 = 1 block          remapping disabled for %000
                %001 = 2 blocks         remapping enabled for %001..%111
                %010 = 4 blocks
                %011 = 8 blocks
                %100 = 16 blocks
                %101 = 32 blocks
                %110 = 64 blocks
                %111 = 128 blocks
    
            %RRR = register count
    
                %000 = 1 register       remap $000
                %001 = 2 registers      remap $000..$001
                %010 = 4 registers      remap $000..$003
                %011 = 8 registers      remap $000..$007
                %100 = 16 registers     remap $000..$00F
                %101 = 32 registers     remap $000..$01F
                %110 = 64 registers     remap $000..$03F
                %111 = 128 registers    remap $000..$07F
    
    
    The new mapping scheme will be in effect on the third instruction after SETMAP. After that,
    changes to INDA or the task number will have an immediate effect on block selection. The
    remapping mechanism only works with hard-coded D and S addresses, not via INDA and INDB
    accesses.
    
    Below is an elaboration of all uniquely-useful remapping schemes:
    
    
                                      S/D addresses
    %M_BBB_RRR    blocks regs      initial -> remapped       block selector
    -----------------------------------------------------------------------------
    %x_000_xxx    1      x               <same>
    
    %0_001_000    2      1      %000000000 -> %00000000P     P = INDA[0]
    %0_001_001    2      2      %00000000X -> %0000000PX
    %0_001_010    2      4      %0000000XX -> %000000PXX     (2 threads)
    %0_001_011    2      8      %000000XXX -> %00000PXXX
    %0_001_100    2      16     %00000XXXX -> %0000PXXXX
    %0_001_101    2      32     %0000XXXXX -> %000PXXXXX
    %0_001_110    2      64     %000XXXXXX -> %00PXXXXXX
    %0_001_111    2      128    %00XXXXXXX -> %0PXXXXXXX
    
    %0_010_000    4      1      %000000000 -> %0000000PP     PP = INDA[1..0]
    %0_010_001    4      2      %00000000X -> %000000PPX
    %0_010_010    4      4      %0000000XX -> %00000PPXX     (4 threads)
    %0_010_011    4      8      %000000XXX -> %0000PPXXX
    %0_010_100    4      16     %00000XXXX -> %000PPXXXX
    %0_010_101    4      32     %0000XXXXX -> %00PPXXXXX
    %0_010_110    4      64     %000XXXXXX -> %0PPXXXXXX
    %0_010_111    4      128    %00XXXXXXX -> %PPXXXXXXX
    
    %0_011_000    8      1      %000000000 -> %000000PPP     PPP = INDA[2..0]
    %0_011_001    8      2      %00000000X -> %00000PPPX
    %0_011_010    8      4      %0000000XX -> %0000PPPXX     (8 threads)
    %0_011_011    8      8      %000000XXX -> %000PPPXXX
    %0_011_100    8      16     %00000XXXX -> %00PPPXXXX
    %0_011_101    8      32     %0000XXXXX -> %0PPPXXXXX
    %0_011_110    8      64     %000XXXXXX -> %PPPXXXXXX
    
    %0_100_000    16     1      %000000000 -> %00000PPPP     PPPP = INDA[3..0]
    %0_100_001    16     2      %00000000X -> %0000PPPPX
    %0_100_010    16     4      %0000000XX -> %000PPPPXX     (16 threads)
    %0_100_011    16     8      %000000XXX -> %00PPPPXXX
    %0_100_100    16     16     %00000XXXX -> %0PPPPXXXX
    %0_100_101    16     32     %0000XXXXX -> %PPPPXXXXX
    
    %0_101_000    32     1      %000000000 -> %0000PPPPP     PPPPP = INDA[4..0]
    %0_101_001    32     2      %00000000X -> %000PPPPPX
    %0_101_010    32     4      %0000000XX -> %00PPPPPXX     (32 threads)
    %0_101_011    32     8      %000000XXX -> %0PPPPPXXX
    %0_101_100    32     16     %00000XXXX -> %PPPPPXXXX
    
    %0_110_000    64     1      %000000000 -> %000PPPPPP     PPPPPP = INDA[5..0]
    %0_110_001    64     2      %00000000X -> %00PPPPPPX
    %0_110_010    64     4      %0000000XX -> %0PPPPPPXX     (64 threads)
    %0_110_011    64     8      %000000XXX -> %PPPPPPXXX
    
    %0_111_000    128    1      %000000000 -> %00PPPPPPP     PPPPPPP = INDA[6..0]
    %0_111_001    128    2      %00000000X -> %0PPPPPPPX
    %0_111_010    128    4      %0000000XX -> %PPPPPPPXX     (128 threads)
    
    %1_001_000    2      1      %000000000 -> %00000000T     T = bit 0 of the task number
    %1_001_001    2      2      %00000000X -> %0000000TX
    %1_001_010    2      4      %0000000XX -> %000000TXX     (2 tasks)
    %1_001_011    2      8      %000000XXX -> %00000TXXX
    %1_001_100    2      16     %00000XXXX -> %0000TXXXX
    %1_001_101    2      32     %0000XXXXX -> %000TXXXXX
    %1_001_110    2      64     %000XXXXXX -> %00TXXXXXX
    %1_001_111    2      128    %00XXXXXXX -> %0TXXXXXXX
    
    %1_010_000    4      1      %000000000 -> %0000000TT     TT = task number
    %1_010_001    4      2      %00000000X -> %000000TTX
    %1_010_010    4      4      %0000000XX -> %00000TTXX     (4 tasks)
    %1_010_011    4      8      %000000XXX -> %0000TTXXX
    %1_010_100    4      16     %00000XXXX -> %000TTXXXX
    %1_010_101    4      32     %0000XXXXX -> %00TTXXXXX
    %1_010_110    4      64     %000XXXXXX -> %0TTXXXXXX
    %1_010_111    4      128    %00XXXXXXX -> %TTXXXXXXX
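
The table can be condensed into a small software model. The sketch below is only a Python illustration of the rows listed above, not a description of the silicon: `decode_setmap` and `remap` are hypothetical names, and the model assumes the selector (INDA or the task number) is simply masked to the number of block-select bits and placed above the register-offset bits, as the `P`/`T` columns suggest. It does not try to handle combinations the table omits.

```python
def decode_setmap(value: int):
    """Split a 7-bit %M_BBB_RRR value into (mode, block_count, regs_per_block)."""
    mode = (value >> 6) & 1                 # 0 = INDA selects, 1 = task number selects
    blocks = 1 << ((value >> 3) & 0b111)    # %BBB -> 1, 2, 4 ... 128 blocks
    regs = 1 << (value & 0b111)             # %RRR -> 1, 2, 4 ... 128 registers
    return mode, blocks, regs


def remap(addr: int, value: int, selector: int) -> int:
    """Return the register reached by a hard-coded S/D address.

    selector is INDA in mode 0 or the task number in mode 1.
    """
    _mode, blocks, regs = decode_setmap(value)
    if blocks == 1 or addr >= regs:
        return addr                         # remapping disabled, or address outside block 0
    block = selector & (blocks - 1)         # low bits of the selector pick the block
    return block * regs + addr


# %0_010_010: 4 blocks of 4 registers, selected by INDA[1..0].
cfg = 0b0_010_010
print(decode_setmap(cfg))          # (0, 4, 4)
print(hex(remap(0x002, cfg, 0x013)))   # 0xe  (offset 2 in block 3)
print(hex(remap(0x010, cfg, 0x013)))   # 0x10 (outside the remapped range)
```

With INDA at $013, the low two bits select block 3, so an access to $002 lands on $00E.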
    
    
    Here is an example program which uses remapping with multi-threading:
    
    DAT             org
    
    period          long    2-1             '$000, thread 0   (20 longs initially execute as NOPs)
    time            long    0               '$001, thread 0
    pin_x           long    0               '$002, thread 0
    pin_y           long    1               '$003, thread 0
    
                    long    4-1             '$000, thread 1
                    long    0               '$001, thread 1
                    long    2               '$002, thread 1
                    long    3               '$003, thread 1
    
                    long    8-1             '$000, thread 2
                    long    0               '$001, thread 2
                    long    4               '$002, thread 2
                    long    5               '$003, thread 2
    
                    long    16-1            '$000, thread 3
                    long    0               '$001, thread 3
                    long    6               '$002, thread 3
                    long    7               '$003, thread 3
    
    pc              long    loop[4]         '$010..$013, all threads start at loop
    
                    setmap  #%0_010_010     'remap 4 blocks of 4 regs by INDA[1..0]
                    fixinda #pc+3,#pc       'set INDA to cycle through blocks and threads
                    nop                     'allow SETMAP 3 clocks to take effect
    
    loop            tasksw                  'switch to next thread
                    incmod  time,period wc  'increment time and reset if period reached (C=1)
            if_c    notp    pin_x           'if period reached, toggle pin_x
                    setpc   pin_y           'if period reached, pin_y high
                    jmp     #loop           '(4 threads executing same code with unique variables)
    
    
    Here is an example program which uses remapping with multi-tasking:
    
    DAT             org
    
    period          long    2-1             '$000, task 0   (16 longs initially execute as NOPs)
    time            long    0               '$001, task 0
    pin_x           long    0               '$002, task 0
    pin_y           long    1               '$003, task 0
    
                    long    4-1             '$000, task 1
                    long    0               '$001, task 1
                    long    2               '$002, task 1
                    long    3               '$003, task 1
    
                    long    8-1             '$000, task 2
                    long    0               '$001, task 2
                    long    4               '$002, task 2
                    long    5               '$003, task 2
    
                    long    16-1            '$000, task 3
                    long    0               '$001, task 3
                    long    6               '$002, task 3
                    long    7               '$003, task 3
    
    
                    setmap  #%1_010_010     'remap 4 blocks of 4 regs by task
                    settask #%11_10_01_00   'set all 4 tasks in motion
                    jmptask #loop,#%1111    'herd tasks to loop
    
    
    loop            incmod  time,period wc  'increment time and reset if period reached (C=1)
            if_c    notp    pin_x           'if period reached, toggle pin_x
                    setpc   pin_y           'if period reached, pin_y high
                    jmp     #loop           '(4 tasks executing same code with unique registers)
    
  • pedward Posts: 1,642
    edited 2013-05-01 15:36
    Wow, I didn't realize the P2 had both threading and multi-tasking.

    Threading appears to be cooperative multi-tasking, yielding control of the COG when the loop is finished, whereas the multi-tasking appears to be more like temporal multi-threading.

    TASKSW only yields control of the main COG after a single section of code runs, executing only one PC at a time.

    SETTASK allows for up to 4 PCs to be executing simultaneously, but at different pipeline stages, so each PC moves forward in lockstep with another.

    TASKSW is useful for applications where you have either very time sensitive, or blocking code that you want to run, where other tasks don't have hard realtime demands.

    SETTASK is useful for applications where you need hard realtime in multiple threads at once, but at the expense of only using non-blocking, non-flushing instructions.
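
The contrast drawn above can be pictured with a purely conceptual Python sketch. It is not a model of the P2 pipeline: the function names are made up, the four-instruction body is borrowed from the example loop, and the strict round-robin slot order is an assumption.

```python
from itertools import cycle, islice

BODY = ("incmod", "notp", "setpc", "jmp")   # loop body from the example programs


def cooperative(threads: int, switches: int):
    """TASKSW style: a thread runs its whole section, then hands over."""
    trace = []
    for tid in islice(cycle(range(threads)), switches):
        trace += [(tid, op) for op in BODY]
    return trace


def interleaved(tasks: int, slots: int):
    """SETTASK style: one instruction per time slot, rotating through the tasks."""
    pcs = [0] * tasks                        # each task keeps its own position
    trace = []
    for tid in islice(cycle(range(tasks)), slots):
        trace.append((tid, BODY[pcs[tid]]))
        pcs[tid] = (pcs[tid] + 1) % len(BODY)
    return trace


print(cooperative(2, 1))   # [(0, 'incmod'), (0, 'notp'), (0, 'setpc'), (0, 'jmp')]
print(interleaved(2, 4))   # [(0, 'incmod'), (1, 'incmod'), (0, 'notp'), (1, 'notp')]
```

In the first trace one thread owns the cog until it yields; in the second, every task advances by one instruction per slot.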
  • cgracey Posts: 14,206
    edited 2013-05-01 15:50
    I just added some details to the latest docs in post #316, in case anyone already grabbed them.
  • Seairth Posts: 2,474
    edited 2013-05-01 19:07
    cgracey wrote: »
    I just added some details to the latest docs in post #316, in case anyone already grabbed them.

    The attached document states the following for TASKSW: "Instructions trailing TASKSWD are in the next thread". However, this would seem to contradict the way that the other xxxD instructions seem to work (i.e. trailing instructions that are already in the pipeline are associated with the code that's *before* the jump, not after). If TASKSW is conceptually different this way (the documentation is correct), I suggest emphasizing that in the document.
  • cgracey Posts: 14,206
    edited 2013-05-01 20:26
    Seairth wrote: »
    The attached document states the following for TASKSW: "Instructions trailing TASKSWD are in the next thread". However, this would seem to contradict the way that the other xxxD instructions seem to work (i.e. trailing instructions that are already in the pipeline are associated with the code that's *before* the jump, not after). If TASKSW is conceptually different this way (the documentation is correct), I suggest emphasizing that in the document.

    The reason is that TASKSWD is (I think) 'JMPRETD INDA,++INDA WZ, WC', and when INDA gets incremented, the next instruction has the remapped registers already pointing to the next thread's register block, and the flags have been saved and updated as well. So, the thread context has switched and those trailing instructions are in the next thread.

    I'll make sure this is documented better. Thanks for pointing this out.
  • Seairth Posts: 2,474
    edited 2013-05-01 21:34
    The threading example makes my brain hurt, which might explain why it looks "wrong" to me. When that code runs, do you actually end up with an initial four switches that basically do nothing but fix up the PC array? Would this also work:
    DAT             org
    
    period          long    2-1             '$000, thread 0   (20 longs initially execute as NOPs)
    time            long    0               '$001, thread 0
    pin_x           long    0               '$002, thread 0
    pin_y           long    1               '$003, thread 0
    
                    long    4-1             '$000, thread 1
                    long    0               '$001, thread 1
                    long    2               '$002, thread 1
                    long    3               '$003, thread 1
    
                    long    8-1             '$000, thread 2
                    long    0               '$001, thread 2
                    long    4               '$002, thread 2
                    long    5               '$003, thread 2
    
                    long    16-1            '$000, thread 3
                    long    0               '$001, thread 3
                    long    6               '$002, thread 3
                    long    7               '$003, thread 3
    
    pc              long    thread[4]       '$010..$013, all threads start at thread
    
                    setmap  #%0_010_010     'remap 4 blocks of 4 regs by INDA[1..0]
                    fixinda #pc+3,#pc       'set INDA to cycle through blocks and threads
                    nop                     'allow SETMAP 3 clocks to take effect
    
    loop            tasksw                  'switch to next thread
    thread          incmod  time,period wc  'increment time and reset if period reached (C=1)
            if_c    notp    pin_x           'if period reached, toggle pin_x
                    setpc   pin_y           'if period reached, pin_y high
                    jmp     #loop           '(4 threads executing same code with unique variables)
    

    My reasoning here is that the pc array will contain the addresses for the thread label (not loop), and TASKSW (rather, JMPRET) is going to load that address from the next array element while storing PC+1 (with PC value being the address of the TASKSW instruction) in the current array element (which is always the same address as the thread label).

    Or do I have this all wrong?
  • cgracey Posts: 14,206
    edited 2013-05-01 22:14
    That code will do the same thing as my example. It will just execute the loop's body code before the first TASKSW for each thread.

    My example would have been a lot richer if I showed a more complex program (instead of a loop) which made conditional branches with TASKSW's placed throughout. That way, the PC array wouldn't always contain the same values, but possibly a different return address for each thread, most of the time.
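
The difference between the two listings can be traced with a toy Python model of the PC array. It is only a sketch under stated assumptions: TASKSW is taken to store PC+1 in the current array element, advance to the next element with wrap-around, and jump to the address found there (as described earlier in the thread); INDA is assumed to start at the first element; and the address $017 for `loop` is derived from the listing layout (16 data longs, 4 PC longs, 3 setup instructions). The function name `run` is made up.

```python
LOOP = 0x17          # address of TASKSW in the listings (derived from the layout)
THREAD = LOOP + 1    # first instruction of the loop body


def run(initial_pc: int, bodies: int = 4, threads: int = 4):
    """Return (switches before the first body run, threads in body-run order)."""
    pc = [initial_pc] * threads    # models the 'pc' array at $010..$013
    cur = 0                        # models INDA[1..0]; assumed to start at thread 0
    addr = LOOP                    # execution falls through from NOP into TASKSW
    switches = 0
    first_body_after = None
    order = []
    while len(order) < bodies:
        if addr == LOOP:           # TASKSW: save return address, advance, jump
            pc[cur] = LOOP + 1
            cur = (cur + 1) % threads
            addr = pc[cur]
            switches += 1
        else:                      # body runs (INCMOD..JMP), then JMP #loop
            if first_body_after is None:
                first_body_after = switches
            order.append(cur)
            addr = LOOP
    return first_body_after, order


print(run(LOOP))     # (4, [0, 1, 2, 3])  array initialised to 'loop'
print(run(THREAD))   # (1, [1, 2, 3, 0])  array initialised to 'thread'
```

Under these assumptions, initialising the array to `loop` costs four switches that only fix up the array, while initialising it to `thread` reaches the loop body after a single switch. After that, both versions settle into the same steady state.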
  • Sapieha Posts: 2,964
    edited 2013-05-01 22:33
    Hi Chip.

    Nice - Thanks

    cgracey wrote: »
    Okay. Here are the latest doc's which now include register remapping:

    Prop2_Docs.txt

    Here's the new section:
    REGISTER REMAPPING
    ------------------
    
    The SETMAP instruction is used to remap a 2^n-sized block of registers starting at $000, so
    that direct accesses to those registers will be redirected to a range of identically-sized
    blocks, which also build from $000. This feature allows a single program to run multiple
    instances of itself by having unique sets of statically-addressable registers which switch
    according to either INDA or the current task.
    
    When using remapping, you must locate your program code above the last used block of
    registers which the bottom-most block of registers will be remapped to. For example, if you
    select 8 blocks of 16 registers, but are only using 6 of those blocks, your program code
    must not start below register 96 (6*16), to avoid encroaching into the registers which are
    going to be the recipients of remapping.
    
    Here is the SETMAP instruction:
    
        SETMAP  D/#n            - Configure register remapping to %M_BBB_RRR
    
            %M = mode
    
                %0 = INDA selects the block
                %1 = task number selects the block
    
            %BBB = block count
    
                %000 = 1 block          remapping disabled for %000
                %001 = 2 blocks         remapping enabled for %001..%111
                %010 = 4 blocks
                %011 = 8 blocks
                %100 = 16 blocks
                %101 = 32 blocks
                %110 = 64 blocks
                %111 = 128 blocks
    
            %RRR = register count
    
                %000 = 1 register       remap $000
                %001 = 2 registers      remap $000..$001
                %010 = 4 registers      remap $000..$003
                %011 = 8 registers      remap $000..$007
                %100 = 16 registers     remap $000..$00F
                %101 = 32 registers     remap $000..$01F
                %110 = 64 registers     remap $000..$03F
                %111 = 128 registers    remap $000..$07F
    
    
    The new mapping scheme will be in effect on the third instruction after SETMAP. After that,
    changes to INDA or the task number will have an immediate effect on block selection. The
    remapping mechanism only works with hard-coded D and S addresses, not via INDA and INDB
    accesses.
    
    Below is an elaboration of all uniquely-useful remapping schemes:
    
    
                                      S/D addresses
    %M_BBB_RRR    blocks regs      initial -> remapped       block selector
    -----------------------------------------------------------------------------
    %x_000_xxx    1      x               <same>
    
    %0_001_000    2      1      %000000000 -> %00000000P     P = INDA[0]
    %0_001_001    2      2      %00000000X -> %0000000PX
    %0_001_010    2      4      %0000000XX -> %000000PXX     (2 threads)
    %0_001_011    2      8      %000000XXX -> %00000PXXX
    %0_001_100    2      16     %00000XXXX -> %0000PXXXX
    %0_001_101    2      32     %0000XXXXX -> %000PXXXXX
    %0_001_110    2      64     %000XXXXXX -> %00PXXXXXX
    %0_001_111    2      128    %00XXXXXXX -> %0PXXXXXXX
    
    %0_010_000    4      1      %000000000 -> %0000000PP     PP = INDA[1..0]
    %0_010_001    4      2      %00000000X -> %000000PPX
    %0_010_010    4      4      %0000000XX -> %00000PPXX     (4 threads)
    %0_010_011    4      8      %000000XXX -> %0000PPXXX
    %0_010_100    4      16     %00000XXXX -> %000PPXXXX
    %0_010_101    4      32     %0000XXXXX -> %00PPXXXXX
    %0_010_110    4      64     %000XXXXXX -> %0PPXXXXXX
    %0_010_111    4      128    %00XXXXXXX -> %PPXXXXXXX
    
    %0_011_000    8      1      %000000000 -> %000000PPP     PPP = INDA[2..0]
    %0_011_001    8      2      %00000000X -> %00000PPPX
    %0_011_010    8      4      %0000000XX -> %0000PPPXX     (8 threads)
    %0_011_011    8      8      %000000XXX -> %000PPPXXX
    %0_011_100    8      16     %00000XXXX -> %00PPPXXXX
    %0_011_101    8      32     %0000XXXXX -> %0PPPXXXXX
    %0_011_110    8      64     %000XXXXXX -> %PPPXXXXXX
    
    %0_100_000    16     1      %000000000 -> %00000PPPP     PPPP = INDA[3..0]
    %0_100_001    16     2      %00000000X -> %0000PPPPX
    %0_100_010    16     4      %0000000XX -> %000PPPPXX     (16 threads)
    %0_100_011    16     8      %000000XXX -> %00PPPPXXX
    %0_100_100    16     16     %00000XXXX -> %0PPPPXXXX
    %0_100_101    16     32     %0000XXXXX -> %PPPPXXXXX
    
    %0_101_000    32     1      %000000000 -> %0000PPPPP     PPPPP = INDA[4..0]
    %0_101_001    32     2      %00000000X -> %000PPPPPX
    %0_101_010    32     4      %0000000XX -> %00PPPPPXX     (32 threads)
    %0_101_011    32     8      %000000XXX -> %0PPPPPXXX
    %0_101_100    32     16     %00000XXXX -> %PPPPPXXXX
    
    %0_110_000    64     1      %000000000 -> %000PPPPPP     PPPPPP = INDA[5..0]
    %0_110_001    64     2      %00000000X -> %00PPPPPPX
    %0_110_010    64     4      %0000000XX -> %0PPPPPPXX     (64 threads)
    %0_110_011    64     8      %000000XXX -> %PPPPPPXXX
    
    %0_111_000    128    1      %000000000 -> %00PPPPPPP     PPPPPPP = INDA[6..0]
    %0_111_001    128    2      %00000000X -> %0PPPPPPPX
    %0_111_010    128    4      %0000000XX -> %PPPPPPPXX     (128 threads)
    
    %1_001_000    2      1      %000000000 -> %00000000T     T = bit 0 of the task number
    %1_001_001    2      2      %00000000X -> %0000000TX
    %1_001_010    2      4      %0000000XX -> %000000TXX     (2 tasks)
    %1_001_011    2      8      %000000XXX -> %00000TXXX
    %1_001_100    2      16     %00000XXXX -> %0000TXXXX
    %1_001_101    2      32     %0000XXXXX -> %000TXXXXX
    %1_001_110    2      64     %000XXXXXX -> %00TXXXXXX
    %1_001_111    2      128    %00XXXXXXX -> %0TXXXXXXX
    
    %1_010_000    4      1      %000000000 -> %0000000TT     TT = task number
    %1_010_001    4      2      %00000000X -> %000000TTX
    %1_010_010    4      4      %0000000XX -> %00000TTXX     (4 tasks)
    %1_010_011    4      8      %000000XXX -> %0000TTXXX
    %1_010_100    4      16     %00000XXXX -> %000TTXXXX
    %1_010_101    4      32     %0000XXXXX -> %00TTXXXXX
    %1_010_110    4      64     %000XXXXXX -> %0TTXXXXXX
    %1_010_111    4      128    %00XXXXXXX -> %TTXXXXXXX
    
    
    Here is an example program which uses remapping with multi-threading:
    
    DAT             org
    
    period          long    2-1             '$000, thread 0   (20 longs initally execute as NOPs)
    time            long    0               '$001, thread 0
    pin_x           long    0               '$002, thread 0
    pin_y           long    1               '$003, thread 0
    
                    long    4-1             '$000, thread 1
                    long    0               '$001, thread 1
                    long    2               '$002, thread 1
                    long    3               '$003, thread 1
    
                    long    8-1             '$000, thread 2
                    long    0               '$001, thread 2
                    long    4               '$002, thread 2
                    long    5               '$003, thread 2
    
                    long    16-1            '$000, thread 3
                    long    0               '$001, thread 3
                    long    6               '$002, thread 3
                    long    7               '$003, thread 3
    
    pc              long    loop[4]         '$010..$013, all threads start at loop
    
                    setmap  #%0_010_010     'remap 4 blocks of 4 regs by INDA[1..0]
                    fixinda #pc+3,#pc       'set INDA to cycle through blocks and threads
                    nop                     'allow SETMAP 3 clocks to take effect
    
    loop            tasksw                  'switch to next thread
                    incmod  time,period wc  'increment time and reset if period reached (C=1)
            if_c    notp    pin_x           'if period reached, toggle pin_x
                    setpc   pin_y           'if period reached, pin_y high
                    jmp     #loop           '(4 threads executing same code with unique variables)
    
    
    Here is an example program which uses remapping with multi-tasking:
    
    DAT             org
    
    period          long    2-1             '$000, task 0   (16 longs initally execute as NOPs)
    time            long    0               '$001, task 0
    pin_x           long    0               '$002, task 0
    pin_y           long    1               '$003, task 0
    
                    long    4-1             '$000, task 1
                    long    0               '$001, task 1
                    long    2               '$002, task 1
                    long    3               '$003, task 1
    
                    long    8-1             '$000, task 2
                    long    0               '$001, task 2
                    long    4               '$002, task 2
                    long    5               '$003, task 2
    
                    long    16-1            '$000, task 3
                    long    0               '$001, task 3
                    long    6               '$002, task 3
                    long    7               '$003, task 3
    
    
                    setmap  #%1_010_010     'remap 4 blocks of 4 regs by task
                    settask #%11_10_01_00   'set all 4 tasks in motion
                    jmptask #loop,#%1111    'herd tasks to loop
    
    
    loop            incmod  time,period wc  'increment time and reset if period reached (C=1)
            if_c    notp    pin_x           'if period reached, toggle pin_x
                    setpc   pin_y           'write C to pin_y (high only when period reached)
                    jmp     #loop           '(4 tasks executing same code with unique registers)
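The remapped multi-tasking example above can be followed with a small behavioral model. This is only a sketch under stated assumptions: `remap`, `loop_iteration`, `pins` and `toggles` are hypothetical names, SETMAP is assumed to redirect registers $000..$003 to block `task*4`, INCMOD is assumed to wrap to 0 and set C when `time` equals `period`, and the four tasks are assumed to take strict turns, one loop pass each.

```python
# Register file: 16 longs laid out as in the DAT section of the example.
regs = [
    2 - 1, 0, 0, 1,     # task 0: period, time, pin_x, pin_y
    4 - 1, 0, 2, 3,     # task 1
    8 - 1, 0, 4, 5,     # task 2
    16 - 1, 0, 6, 7,    # task 3
]
PERIOD, TIME, PIN_X, PIN_Y = 0, 1, 2, 3   # addresses every task uses in its code


def remap(task, addr):
    """Assumed SETMAP effect: 4 blocks of 4 registers, block selected by task."""
    return task * 4 + addr


def loop_iteration(task, regs, pins):
    """Run one pass of 'loop' for one task and return the C flag."""
    period = regs[remap(task, PERIOD)]
    time = regs[remap(task, TIME)]
    # INCMOD (assumed): wrap to 0 with C=1 when time == period, else increment
    if time == period:
        regs[remap(task, TIME)] = 0
        c = True
    else:
        regs[remap(task, TIME)] = time + 1
        c = False
    if c:
        pins[regs[remap(task, PIN_X)]] ^= 1   # NOTP pin_x: toggle
    pins[regs[remap(task, PIN_Y)]] = int(c)   # SETPC pin_y: pin = C
    return c


pins = [0] * 8
toggles = [0] * 4
for _ in range(16):             # 16 loop passes per task
    for task in range(4):       # simplified round-robin scheduling
        if loop_iteration(task, regs, pins):
            toggles[task] += 1

print(toggles)
```

All four tasks run the same code with the same register names, yet each one counts against its own period and drives its own pins, which is the point of combining SETMAP with tasks.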
    
  • BEEPBEEP Posts: 58
    edited 2013-05-02 01:28
    Prop2_Docs_130501.pdf

    Edit:
    File deleted.
  • cgraceycgracey Posts: 14,206
    edited 2013-05-02 14:00
    Okay. I got the inter-cog exchange documented:

    Prop2_Docs.txt

    Here's the new part:
    INTER-COG EXCHANGE
    ------------------
    
    The fourth I/O port of each cog (PIND/DIRD) is implemented as a 32-bit inter-cog data
    exchange, instead of 32 external I/O pins.
    
    Each cog outputs 32 bits of data via PIND, with the actual output being the logical-AND of
    the PIND and DIRD registers. Each cog can select which of the other cogs' PIND outputs are
    going to be gated into its own PIND inputs, on a per-byte basis.
    
    The only control over the inter-cog exchange is each cog's PIND input filter.
    
    The SETXCH instruction is used to set the PIND input filter:
    
        SETXCH  D/#n            - Set PIND input filter to %DDDDDDDD_CCCCCCCC_BBBBBBBB_AAAAAAAA
    
            %DDDDDDDD = filter for PIND input bits 31..24
    
                %xxxxxxx1 = cog 0's PIND[31..24] output will be OR'd into PIND[31..24] input
                %xxxxxx1x = cog 1's PIND[31..24] output will be OR'd into PIND[31..24] input
                %xxxxx1xx = cog 2's PIND[31..24] output will be OR'd into PIND[31..24] input
                %xxxx1xxx = cog 3's PIND[31..24] output will be OR'd into PIND[31..24] input
                %xxx1xxxx = cog 4's PIND[31..24] output will be OR'd into PIND[31..24] input
                %xx1xxxxx = cog 5's PIND[31..24] output will be OR'd into PIND[31..24] input
                %x1xxxxxx = cog 6's PIND[31..24] output will be OR'd into PIND[31..24] input
                %1xxxxxxx = cog 7's PIND[31..24] output will be OR'd into PIND[31..24] input
    
            %CCCCCCCC = filter for PIND input bits 23..16
    
                %xxxxxxx1 = cog 0's PIND[23..16] output will be OR'd into PIND[23..16] input
                %xxxxxx1x = cog 1's PIND[23..16] output will be OR'd into PIND[23..16] input
                %xxxxx1xx = cog 2's PIND[23..16] output will be OR'd into PIND[23..16] input
                %xxxx1xxx = cog 3's PIND[23..16] output will be OR'd into PIND[23..16] input
                %xxx1xxxx = cog 4's PIND[23..16] output will be OR'd into PIND[23..16] input
                %xx1xxxxx = cog 5's PIND[23..16] output will be OR'd into PIND[23..16] input
                %x1xxxxxx = cog 6's PIND[23..16] output will be OR'd into PIND[23..16] input
                %1xxxxxxx = cog 7's PIND[23..16] output will be OR'd into PIND[23..16] input
    
            %BBBBBBBB = filter for PIND input bits 15..8
    
                %xxxxxxx1 = cog 0's PIND[15..8] output will be OR'd into PIND[15..8] input
                %xxxxxx1x = cog 1's PIND[15..8] output will be OR'd into PIND[15..8] input
                %xxxxx1xx = cog 2's PIND[15..8] output will be OR'd into PIND[15..8] input
                %xxxx1xxx = cog 3's PIND[15..8] output will be OR'd into PIND[15..8] input
                %xxx1xxxx = cog 4's PIND[15..8] output will be OR'd into PIND[15..8] input
                %xx1xxxxx = cog 5's PIND[15..8] output will be OR'd into PIND[15..8] input
                %x1xxxxxx = cog 6's PIND[15..8] output will be OR'd into PIND[15..8] input
                %1xxxxxxx = cog 7's PIND[15..8] output will be OR'd into PIND[15..8] input
    
            %AAAAAAAA = filter for PIND input bits 7..0
    
                %xxxxxxx1 = cog 0's PIND[7..0] output will be OR'd into PIND[7..0] input
                %xxxxxx1x = cog 1's PIND[7..0] output will be OR'd into PIND[7..0] input
                %xxxxx1xx = cog 2's PIND[7..0] output will be OR'd into PIND[7..0] input
                %xxxx1xxx = cog 3's PIND[7..0] output will be OR'd into PIND[7..0] input
                %xxx1xxxx = cog 4's PIND[7..0] output will be OR'd into PIND[7..0] input
                %xx1xxxxx = cog 5's PIND[7..0] output will be OR'd into PIND[7..0] input
                %x1xxxxxx = cog 6's PIND[7..0] output will be OR'd into PIND[7..0] input
                %1xxxxxxx = cog 7's PIND[7..0] output will be OR'd into PIND[7..0] input
    
    
    To input only cog 0's 32-bit output, you would use the filter value $01_01_01_01. To input
    the logical-OR of cog 0's and cog 1's 32-bit outputs, you would use $03_03_03_03. In most
    programming cases, it may be desirable to just see one other cog's full 32-bit output in
    your PIND input, but many other arrangements are possible. For example, by using 8-bit or
    16-bit fields with SETF/MOVF to transfer data piecewise from several cogs, a final cog can
    read the aggregate 32-bit result.
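The filter rules above can be checked with a short model. This is a sketch, not hardware-accurate code: `pind_input` and the sample `outputs` values are made up for illustration, and each entry of `outputs` is assumed to already be that cog's PIND AND DIRD.

```python
def pind_input(filter_value, cog_outputs):
    """Model the PIND input a cog sees for a given SETXCH filter.

    filter_value: 32-bit filter, %DDDDDDDD_CCCCCCCC_BBBBBBBB_AAAAAAAA
    cog_outputs:  list of 8 ints, each cog's 32-bit output (PIND AND DIRD)
    """
    result = 0
    for byte in range(4):                        # byte 0 is field A (bits 7..0)
        shift = 8 * byte
        field = (filter_value >> shift) & 0xFF   # one enable bit per cog
        for cog in range(8):
            if field & (1 << cog):
                # OR this cog's byte into the same byte lane of the input
                result |= cog_outputs[cog] & (0xFF << shift)
    return result


# Hypothetical outputs for cogs 0..7
outputs = [0x11223344, 0x000000F0, 0, 0, 0, 0, 0, 0xAABBCCDD]

only_cog0 = pind_input(0x01010101, outputs)     # 0x11223344
cog0_or_cog1 = pind_input(0x03030303, outputs)  # 0x112233F4
mixed = pind_input(0x80800101, outputs)         # 0xAABB3344

print(hex(only_cog0), hex(cog0_or_cog1), hex(mixed))
```

The `mixed` case shows a per-byte arrangement: the top 16 bits come from cog 7 and the bottom 16 bits from cog 0.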
    
    After SETXCH, PIND can be read for newly-filtered data on the third clock:
    
            SETXCH  #$00000001      'change filter
            MOV     X,PIND          'data from old filter
            MOV     X,PIND          'data from old filter
            MOV     X,PIND          'data from new filter
    
    
    Likewise, data written to PIND becomes readable through PIND on the third clock.
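The SETXCH timing shown in the listing can be pictured with a toy model. It is only a sketch: the class `ExchangeReader`, its `LATENCY` constant, and the idea of one PIND read per clock are assumptions taken from the three MOV lines, not a description of the real pipeline.

```python
class ExchangeReader:
    """Toy model: a filter change is seen by the third read after SETXCH."""

    LATENCY = 2  # reads that still use the old filter (assumed from the listing)

    def __init__(self, filter_value=0):
        self.active = filter_value
        self.pending = []  # (reads_remaining, new_filter) pairs

    def setxch(self, new_filter):
        """Queue a filter change; it does not take effect immediately."""
        self.pending.append((self.LATENCY, new_filter))

    def read(self):
        """Return the filter in effect for this read, then advance one slot."""
        still_waiting = []
        for remaining, value in self.pending:
            if remaining == 0:
                self.active = value          # delay elapsed: new filter applies
            else:
                still_waiting.append((remaining - 1, value))
        self.pending = still_waiting
        return self.active


reader = ExchangeReader(0x02020202)   # hypothetical old filter
reader.setxch(0x01010101)             # change filter
trace = [reader.read() for _ in range(4)]
print([hex(f) for f in trace])        # old, old, new, new
```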
    
    The PIND port does not connect to CTRA, CTRB, XFR, or SER, but it does support the
    following pin instructions, as if it were a regular I/O port:
    
        GETP/GETNP                                      - pin reads
        OFFP/NOTP/CLRP/SETP/SETPC/SETPNC/SETPZ/SETPNZ   - pin writes
        JP/JPD/JNP/JNPD                                 - pin branches
    
  • SapiehaSapieha Posts: 2,964
    edited 2013-05-02 14:01
    Hi Chip.

    BIG Thanks.



    
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-05-02 14:13
    Thanks Chip!

    Sapieha just messaged me that you added the PIND docs... I am now digesting it.
    
  • jazzedjazzed Posts: 11,803
    edited 2013-05-02 14:27
    Chip, what is SER? I see brief references to SNDSER and RCVSER in the doc, but nothing else.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-05-02 14:51
    Steve,

    SNDSER & RCVSER are Chip's new inter-prop communications instructions. Last I heard was:

    - 32-bit buffered input and output (1 long buffer, I think)
    - three lines needed: TXD, RXD, CLK
    - can block on read/write, or poll for completion
    - may need shared crystal between the props
    - 1 bit per clock cycle (original plan was PLL'd higher)

    Many of us are waiting for more info :-)
  • pedwardpedward Posts: 1,642
    edited 2013-05-02 15:00
    The SER runs at clock speed, without any specific control, last I heard. Similar to how SDRAM runs at clock rate.

    The synthesis was too muddy to make an independent clock work.
  • jazzedjazzed Posts: 11,803
    edited 2013-05-02 15:14
    Wish it was SERDES. I begged, and begged, and begged.