Is a P2 Edge "Multi" Board feasible? - Page 2 — Parallax Forums


Comments

  • RossH Posts: 5,462
    edited 2023-03-11 09:13

    Okay! My simple Prop-to-Prop cables arrived this week, so I have connected a P2_EVAL to a P2_EDGE. Here is my initial test program: simple but reliable synchronous transfers over a 32-bit parallel bus between two or more Propellers at 32MB/s ...

    /*
     * Program to test how fast and reliably bus read/writes can be done using a 
     * simple synchronous parallel bus connecting two or more Propellers. 
     *
     * There can be only one sender cog at a time, but there can be multiple
     * receiver cogs. The intention is to start one sender cog on one Propeller, and 
     * multiple receiver cogs on the other Propellers. This program supports
     * four receivers on a single Propeller.
     *
     * The Propellers must be connected pin to pin on pins 00 .. 31 (i.e. port A). 
     * If you have more than two Propellers connected, you could start additional 
     * receiver cogs on the other Propellers.
     *
     * This is a synchronous transfer, so how much data can be transferred in
     * one block depends on how closely the respective Propeller clocks are
     * synchronized. The crystal accuracy is typically +/-0.5 PPM, so two
     * Propeller clocks may differ by up to 1 PPM, i.e. one clock in every
     * 1,000,000. At 25 clocks per long, this means the clocks may be out of
     * sync by up to one clock after 40,000 longs. A suitable maximum size for
     * a single synchronous transfer might therefore be 20,000 longs, which is
     * why this value is used by this program. Larger transfers can be
     * performed by doing multiple smaller transfers (which is the point of
     * this test program!).
     *
     * The program uses P2 NATIVE PASM, so it must be compiled in P2 NATIVE mode
     * (which is the default mode for the Propeller 2). 
     *
     * To maximize available cogs, add -C NO_MOUSE and -C NO_FLOAT (and -C SIMPLE
     * if using a serial HMI). However, in this simple-minded program, buffer
     * space is likely to be the limiting factor, not cogs.
     *
     *  To build as a sender, add -C SENDER
     *
     *  To build as a receiver, add -C RECEIVER
     *
     * For example, compile with a command like:
     *
     *    catalina -p2 -lci p2_bus.c -o sender -C NO_MOUSE -C NO_FLOAT -C SENDER
     * or
     *    catalina -p2 -lci p2_bus.c -o receiver -C NO_MOUSE -C NO_FLOAT -C RECEIVER
     *
     * Then load and execute with commands like:
     *
     *    payload sender -PN -i
     *    payload receiver -PM -i
     *
     * where N & M are the Propeller ports to use for the sender and receiver,
     * respectively (add more 'receiver' commands if there are more than two
     * Propellers on the bus).
     */
    
    #if !defined (__CATALINA_SENDER) && !defined(__CATALINA_RECEIVER)
    #error EITHER SENDER OR RECEIVER MUST BE DEFINED!
    #endif
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <catalina_cog.h>
    #include <catalina_plugin.h>
    
    #define XFER_DELAY 7             // clocks per long is 18+this (7 works!)
    
    #define NUM_LONGS  20000         // number of longs to transfer (20000 works!)
    
    #define STACK_SIZE 500           // size of cog stack (stdio requires 500)
    
    #define TOTAL_ERRORS_ONLY        // report only the first mismatch per transfer
    
    static unsigned long start = 0;  // clock count used to synchronize cogs
    static int lock = 0;             // lock to protect I/O
    
    static unsigned long *send_buff; // data to be sent
    static unsigned long *rcv1_buff; // data received (1)
    static unsigned long *rcv2_buff; // data received (2)
    static unsigned long *rcv3_buff; // data received (3)
    static unsigned long *rcv4_buff; // data received (4)
    
    /*
     * sync - synchronize multiple cogs to start on a specific clock count
     *
     *    'start' should be set to a clock count some time in the 
     *            future - e.g. _cnt() + _clockfreq() for one second
     */
    int sync(unsigned long start) {
       return PASM (
          " getct  r0\n"
          " sub    r2, r0\n"
          " waitx  r2\n"
          " getct  r0\n"
       );
    }
    
    /*
     * send - pasm code to write a number of longs to the 32 bit bus
     *        Note: sending starts immediately, and then sends a new long
     *              every 'time'+18 ticks.
     *        Note: loads the code into LUT RAM and executes it there.
     *
     *    'time' (passed in r4) is the time between writes in clocks (+18)
     *    'buff' (passed in r3) is an array holding longs to send
     *    'size' (passed in r2) is the size of the array
     */
    int send(int time, void *buff, int size) {
       return PASM (
          // set pins 00 .. 31 as outputs
          "        mov     outa, #0\n"
          "        or      dira,##$FFFFFFFF\n"
    
          // load LUT RAM:
          "        setq2   #(send_end - send_start - 1)\n"
          "        rdlong  0, ##@send_start\n"
    
          // jump to code in LUT RAM:
          "        jmp     #send_start\n" 
    
          // code to be executed in LUT RAM:
          "        org     $200\n"
          "send_start\n"
          "        getct   r0\n"             // LUT: 2 (clocks)
          "\n"
          "        rep     #7, r2\n"         // LUT: 2
          "        addct1  r0, #18\n"        // LUT: 2
          "        rdlong  r1, r3\n"         // LUT: 9 .. 16
          "        waitct1\n"                // LUT: 2
          "        add     r3, #4\n"         // LUT: 2
          "        mov     outa, r1\n"       // LUT: 2
          "        addct1  r0, r4\n"         // LUT: 2
          "        waitct1\n"                // LUT: 2
          "\n"
          "        addct1  r0, r4\n"
          "        addct1  r0, #18\n"
          "        waitct1\n"
          "        mov     outa, #0\n"
          "        getct   r0\n"
          "        jmp     #send_cont\n"
          "send_end\n" 
    
          // resume Hub Execution:
          "       orgh\n"
          "send_cont\n" 
       );
    }
    
    /*
     * sender - send an array of longs to one or more receivers
     *
     *    'buff' is an array of NUM_LONGS longs.
     */
    void sender(void *buff) {
       unsigned int started, stopped, total;
       int me = _cogid();
    
       started = _cnt();
       stopped = send(XFER_DELAY, buff, NUM_LONGS);
       total   = stopped - started;
       ACQUIRE(lock);
       printf("send (cog %d) took %u clocks (%u per long)\n\n", 
               me, total, total/NUM_LONGS);
       RELEASE(lock);
       while(1); // don't exit
    }
    
    /*
     * recv - pasm code to read a number of longs from the 32 bit bus. 
     *        Note: receiving starts as soon as any non-zero value is
     *              detected on the bus (that value is stored as the first
     *              long), and then a new long is read every 'time'+18 ticks.
     *        Note: loads the code into LUT RAM and executes it there.
     *
     *    'time' (passed in r4) is the time between reads in clocks (+18)
     *    'buff' (passed in r3) is an array to hold longs received
     *    'size' (passed in r2) is the size of the array
     */
    int recv(int time, void *buff, int size) {
       return PASM (
          // set pins 00 .. 31 as inputs
          "        andn    dira,##$FFFFFFFF\n"
    
          // load LUT RAM:
          "        setq2   #(recv_end - recv_start - 1)\n"
          "        rdlong  0, ##@recv_start\n"
    
          // jump to code in LUT RAM:
          "        jmp     #recv_start\n"
    
          // code to be executed in LUT RAM:
          "        org     $200\n"
          "recv_start\n"
          "        mov     r1, ina\n"        // LUT: 2 (clocks)
          "        cmp     r1, #0 wz\n"      // LUT: 2
          " if_z   jmp     #recv_start\n"    // LUT: 4
          "        getct   r0\n"             // LUT: 2 
          "\n"
          "        rep     #7, r2\n"         // LUT: 2
          "        addct1  r0, #18\n"        // LUT: 2
          "        wrlong  r1, r3\n"         // LUT: 3 .. 10
          "        waitct1\n"                // LUT: 2
          "        add     r3, #4\n"         // LUT: 2
          "        addct1  r0, r4\n"         // LUT: 2
          "        waitct1\n"                // LUT: 2
          "        mov     r1, ina\n"        // LUT: 2 
          "\n"
          "        getct   r0\n"
          "        jmp     #recv_cont\n"
          "recv_end\n"
    
          // resume Hub Execution:
          "        orgh\n"
          "recv_cont\n"
       );
    }
    
    /*
     * receiver - receive an array of longs from the 32 bit bus
     *
     *    'buff' is an array of NUM_LONGS longs.
     */
    void receiver(void *buff) {
       unsigned int started, stopped;
       int me = _cogid();
    
       started = sync(start);
       stopped = recv(XFER_DELAY, buff, NUM_LONGS);
       ACQUIRE(lock);
       printf("recv (cog %d) started at clock 0x%08x\n", me, started);
       RELEASE(lock);
       while(1); // don't exit
    }
    
    
    void main(void) {
       unsigned long i;
       long send_stack[STACK_SIZE];
       long rcv1_stack[STACK_SIZE];
       long rcv2_stack[STACK_SIZE];
       long rcv3_stack[STACK_SIZE];
       long rcv4_stack[STACK_SIZE];
       int send_cog;
       int rcv1_cog;
       int rcv2_cog;
       int rcv3_cog;
       int rcv4_cog;
       int errors = 0;
       int transfers = 0;
    
       // assign a lock to be used to avoid plugin contention
       lock = _locknew();
    
       // give the vt100 emulator (if used) a chance to start
    #ifdef __CATALINA_VT100
       _waitms(500);
    #endif
    
       // allocate the arrays (we allocate them all whether we are a sender
       // or a receiver)
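       send_buff = malloc(NUM_LONGS*sizeof(unsigned long));
       rcv1_buff = malloc(NUM_LONGS*sizeof(unsigned long));
       rcv2_buff = malloc(NUM_LONGS*sizeof(unsigned long));
       rcv3_buff = malloc(NUM_LONGS*sizeof(unsigned long));
       rcv4_buff = malloc(NUM_LONGS*sizeof(unsigned long));
       if ((send_buff == NULL) || (rcv1_buff == NULL) || (rcv2_buff == NULL)
       ||  (rcv3_buff == NULL) || (rcv4_buff == NULL)) {
          // five buffers of NUM_LONGS longs may not fit in Hub RAM
          printf("Cannot allocate buffers - reduce NUM_LONGS\n");
          while(1); // don't exit
       }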
    
       // initialize the arrays
       for (i = 0; i < NUM_LONGS; i++) {
          // receivers start on the first non-zero long, and completion is
          // detected by a non-zero last long, so use non-zero data
          send_buff[i] = i+1;
          rcv1_buff[i] = 0;
          rcv2_buff[i] = 0;
          rcv3_buff[i] = 0;
          rcv4_buff[i] = 0;
       }
    
    #ifdef __CATALINA_SENDER
    
       // set all bus pins to zero
       PASM(
          "       mov     outa, #0\n"
          "       or      dira,##$FFFFFFFF\n"
       );
    
       // start ONE sender
       k_clear();
       ACQUIRE(lock);
       printf("SENDER (Clock = %lu Hz)\n\n", _clockfreq());
       printf("Start the receiver cogs in the Receiver program,\n");
       printf("then press any key to start sender cog\n");
       RELEASE(lock);
       k_wait();
    
       // keep transferring forever
       while (1) {
          ACQUIRE(lock);
          printf("\nStarting sender cog ...\n\n");
          RELEASE(lock);
          send_cog = _cogstart_C(&sender, send_buff, send_stack, STACK_SIZE);
    
          // give the sender a chance to send (3 seconds is generous!)
          _waitms(3000);
          ACQUIRE(lock);
          printf("... done\n");
          RELEASE(lock);
    
          // cancel the sender (we restart it again for each transfer)
          _cogstop(send_cog);
    
          // update and print statistics
          transfers++;
          ACQUIRE(lock);
          printf("\nTotal %d transfers\n",transfers); 
          RELEASE(lock);
       }
    
    #endif
    
    #ifdef __CATALINA_RECEIVER
    
       ACQUIRE(lock);
       printf("RECEIVER (Clock = %lu Hz)\n\n", _clockfreq());
       printf("Press any key to start receiver cogs, then start the sender\n");
       printf("cog in the sender program\n");
       RELEASE(lock);
       k_wait();
       ACQUIRE(lock);
       printf("Starting receiver cogs ...\n\n");
       RELEASE(lock);
    
       while (1) {
          // set a start time for the receiver cogs to use in the sync function
          start = _cnt() + _clockfreq(); // set start time for +1 seconds
    
          // start FOUR receiver cogs
          rcv1_cog = _cogstart_C(&receiver, rcv1_buff, rcv1_stack, STACK_SIZE);
          rcv2_cog = _cogstart_C(&receiver, rcv2_buff, rcv2_stack, STACK_SIZE);
          rcv3_cog = _cogstart_C(&receiver, rcv3_buff, rcv3_stack, STACK_SIZE);
          rcv4_cog = _cogstart_C(&receiver, rcv4_buff, rcv4_stack, STACK_SIZE);
          ACQUIRE(lock);
          printf("... done\n");
          RELEASE(lock);
    
          // give the receivers a chance to receive
          _waitms(1000);
    
          // wait till ALL receivers complete (i.e. keep waiting while any
          // receiver has not yet written its last long)
          while (
              (rcv1_buff[NUM_LONGS - 1] == 0)
           || (rcv2_buff[NUM_LONGS - 1] == 0)
           || (rcv3_buff[NUM_LONGS - 1] == 0)
           || (rcv4_buff[NUM_LONGS - 1] == 0)
          ) {
              _waitms(1000);
          }
    
          // terminate the receiver cogs (we restart them again for each transfer)
          _cogstop(rcv1_cog);
          _cogstop(rcv2_cog);
          _cogstop(rcv3_cog);
          _cogstop(rcv4_cog);
    
          // check the results
          ACQUIRE(lock);
          printf("\nChecking data ...\n");
          for (i = 0; i < NUM_LONGS; i++) {
             // check receiver 1 got the correct data
             if (send_buff[i] != rcv1_buff[i]) {
                printf("send[%3lu]=0x%08lX != rcv1[%3lu]=0x%08lX\n",  
                       i, send_buff[i], i, rcv1_buff[i]);
                _waitms(5);
                errors++;
    #ifdef TOTAL_ERRORS_ONLY
                // only report the first mismatch in each transfer
                break;
    #endif
             }
             // check receiver 2 got the correct data
             if (send_buff[i] != rcv2_buff[i]) {
                printf("send[%3lu]=0x%08lX != rcv2[%3lu]=0x%08lX\n",  
                       i, send_buff[i], i, rcv2_buff[i]);
                _waitms(5);
                errors++;
    #ifdef TOTAL_ERRORS_ONLY
                // only report the first mismatch in each transfer
                break;
    #endif
             }
             // check receiver 3 got the correct data
             if (send_buff[i] != rcv3_buff[i]) {
                printf("send[%3lu]=0x%08lX != rcv3[%3lu]=0x%08lX\n",  
                       i, send_buff[i], i, rcv3_buff[i]);
                _waitms(5);
                errors++;
    #ifdef TOTAL_ERRORS_ONLY
                // only report the first mismatch in each transfer
                break;
    #endif
             }
             // check receiver 4 got the correct data
             if (send_buff[i] != rcv4_buff[i]) {
                printf("send[%3lu]=0x%08lX != rcv4[%3lu]=0x%08lX\n",  
                       i, send_buff[i], i, rcv4_buff[i]);
                _waitms(5);
                errors++;
    #ifdef TOTAL_ERRORS_ONLY
                // only report the first mismatch in each transfer
                break;
    #endif
             }
          }
    
          // update and print statistics
          transfers++;
          printf("Total errors = %d (from %d transfers)\n", errors, transfers); 
          RELEASE(lock);
    
          // re-initialize the arrays for the next transfer
          for (i = 0; i < NUM_LONGS; i++) {
             rcv1_buff[i] = 0;
             rcv2_buff[i] = 0;
             rcv3_buff[i] = 0;
             rcv4_buff[i] = 0;
          }
       }
    
    #endif
    
    }
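The loops in send() and recv() advance their timing target by 18 clocks plus 'time' (XFER_DELAY) for every long, so the raw bus rate depends only on the system clock. A minimal host-side sketch of that arithmetic; the 200 MHz figure in the usage note is an assumption for illustration, since the program prints the actual value from _clockfreq():

```c
#include <assert.h>
#include <stdint.h>

/* Clocks per long: the REP loops in send()/recv() wait 18 clocks plus
 * 'time' (XFER_DELAY in the program above) for each long. */
static uint32_t clocks_per_long(uint32_t xfer_delay)
{
    uint32_t cpl = 18u + xfer_delay;
    assert(cpl >= 18u);            /* catch unsigned wraparound */
    return cpl;
}

/* Raw throughput in bytes per second of a 32-bit (4-byte) bus,
 * ignoring any gaps between synchronous blocks. */
static uint64_t bus_bytes_per_sec(uint64_t sysclk_hz, uint32_t xfer_delay)
{
    return (sysclk_hz / clocks_per_long(xfer_delay)) * 4u;
}
```

For example, at an assumed 200 MHz with XFER_DELAY = 7, bus_bytes_per_sec(200000000, 7) gives 8,000,000 longs, or 32,000,000 bytes, per second; the gaps between blocks lower the sustained rate.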
    
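The block-size reasoning in the header comment can be written out. If each crystal is within +/-0.5 PPM, the two clocks can differ by up to 1 PPM, i.e. one clock of slip per 1,000,000 clocks. A sketch of that calculation, using parts per billion to stay in integer arithmetic; the function names and the halving safety margin are illustrative choices that mirror the program's 20,000-long block:

```c
#include <assert.h>
#include <stdint.h>

/* Longs that can be sent before two free-running clocks drift apart by
 * one clock. 'tol_ppb_each' is each crystal's tolerance in parts per
 * billion; in the worst case one crystal is fast and the other slow,
 * so the relative error is twice that. */
static uint32_t longs_before_one_clock_slip(uint32_t tol_ppb_each,
                                            uint32_t clocks_per_long)
{
    assert(tol_ppb_each > 0 && clocks_per_long > 0);
    uint64_t clocks_per_slip = 1000000000ull / (2ull * tol_ppb_each);
    return (uint32_t)(clocks_per_slip / clocks_per_long);
}

/* Illustrative block size: half the slip limit, as a safety margin. */
static uint32_t suggested_block_longs(uint32_t tol_ppb_each,
                                      uint32_t clocks_per_long)
{
    return longs_before_one_clock_slip(tol_ppb_each, clocks_per_long) / 2u;
}
```

With tol_ppb_each = 500 and 25 clocks per long this gives 40,000 and 20,000 longs. A later post in this thread reports failures at 20,000 longs, so the real clocks may be less closely matched than this assumes.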
    
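sync() computes its wait as the target count minus the current count (getct, sub, waitx). Because that subtraction is modulo 2^32, it still works if the system counter wraps between the two values. A C sketch of the same arithmetic; the helper names are hypothetical, and the check for a target that has already passed is an addition the PASM does not make:

```c
#include <assert.h>
#include <stdint.h>

/* Clocks remaining until 'target' on a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so this stays correct when the
 * counter rolls over from 0xFFFFFFFF to 0 before 'target' is reached. */
static uint32_t clocks_until(uint32_t target, uint32_t now)
{
    return target - now;
}

/* Nonzero if 'target' has already passed, assuming the two counts are
 * less than 2^31 clocks apart. In that case the subtraction above gives
 * a huge value, and a WAITX on it would stall for nearly 2^32 clocks. */
static int target_has_passed(uint32_t target, uint32_t now)
{
    return clocks_until(target, now) >= 0x80000000u;
}

/* Delay for a WAITX-style wait; the caller must choose a future target
 * (the program uses _cnt() + _clockfreq(), one second ahead). */
static uint32_t sync_delay(uint32_t target, uint32_t now)
{
    assert(!target_has_passed(target, now));
    return clocks_until(target, now);
}
```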

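The receivers begin on the first non-zero value they see on the bus, which is why the test data must be non-zero. Below is a simplified host-side model of that start rule, not P2 code: the bus is an array of per-clock samples, the sender holds each long for a fixed period, and the receiver polls for a non-zero value and then samples once per period. It ignores clock skew, propagation delay and polling jitter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model: bus[t] is the value on the bus at clock t. From
 * clock 'first', the sender holds each data[k] for 'period' clocks;
 * at all other times the bus reads zero. */
static void drive_bus(uint32_t *bus, size_t bus_len, size_t first,
                      const uint32_t *data, size_t n, size_t period)
{
    for (size_t t = 0; t < bus_len; t++)
        bus[t] = 0;
    for (size_t k = 0; k < n; k++) {
        for (size_t c = 0; c < period; c++) {
            size_t t = first + k * period + c;
            if (t < bus_len)
                bus[t] = data[k];
        }
    }
}

/* Receiver model: poll until the bus is non-zero, store that value as
 * the first long, then sample once every 'period' clocks.
 * Returns the number of longs captured. */
static size_t receive_bus(const uint32_t *bus, size_t bus_len,
                          uint32_t *out, size_t n, size_t period)
{
    size_t t = 0;
    size_t got = 0;
    assert(period > 0);
    while (t < bus_len && bus[t] == 0)
        t++;                        /* wait for the first non-zero long */
    while (got < n && t < bus_len) {
        out[got++] = bus[t];
        t += period;
    }
    return got;
}
```

With a first long of zero, the modeled receiver starts one long late and every captured value is shifted by one position.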
    More to come!

  • RossH Posts: 5,462

    I've updated my "Propeller2Propeller" bus test program (now called p2p.c) so that a Propeller can be a 'sender', a 'receiver', or a 'transceiver' that alternates between sending and receiving (which makes it a more realistic test).

    Also, I had some failures when transferring synchronous blocks of 20,000 longs, so I have wound it back to 10,000 longs at a time for the moment. I think this is due to slight differences between the Propeller clocks. Larger transfers will need to be done in multiple synchronous blocks anyway, but eventually I might add some code to auto-detect the maximum block size so the user doesn't need to configure it.
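Splitting a large transfer into synchronous blocks, as described above, is mostly bookkeeping. A sketch, assuming a hypothetical per-block callback that stands in for one send() or recv() call plus whatever re-synchronization each block needs; the callback type and function names are illustrative and not part of p2p.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one synchronous block transfer.
 * Returns 0 on success. */
typedef int (*block_xfer_fn)(uint32_t *buff, size_t nlongs, void *ctx);

/* Move 'total' longs as a series of blocks of at most 'block' longs, so
 * clock drift only has to stay within tolerance inside each block.
 * Returns the number of blocks used, or -1 if any block fails. */
static long xfer_in_blocks(uint32_t *buff, size_t total, size_t block,
                           block_xfer_fn xfer, void *ctx)
{
    long blocks = 0;
    assert(block > 0);
    for (size_t done = 0; done < total; done += block) {
        size_t n = (total - done < block) ? (total - done) : block;
        if (xfer(buff + done, n, ctx) != 0)
            return -1;
        blocks++;
    }
    return blocks;
}

/* Example callback for host testing: just counts the longs. */
static int count_longs(uint32_t *buff, size_t nlongs, void *ctx)
{
    (void)buff;
    *(size_t *)ctx += nlongs;
    return 0;
}
```

For example, 25,000 longs in blocks of 10,000 becomes three transfers of 10,000, 10,000 and 5,000 longs.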

    The next step is to add a higher level protocol to allow multiple Propellers to share the bus and send and receive without bus contention.

    Also, I intend to make it configurable whether the P2P bus is 8, 16 or 32 bits wide.
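For an 8 or 16 bit bus, each 32-bit long would take 4 or 2 bus cycles. A sketch of splitting a long into bus-width chunks and reassembling it; sending the least significant chunk first is an assumption, not something the post specifies:

```c
#include <assert.h>
#include <stdint.h>

/* Bus cycles needed to move one 32-bit long over an 8, 16 or 32 bit bus. */
static unsigned cycles_per_long(unsigned bus_bits)
{
    assert(bus_bits == 8 || bus_bits == 16 || bus_bits == 32);
    return 32u / bus_bits;
}

/* Chunk 'i' (0 = least significant) of 'value' for the given bus width. */
static uint32_t bus_chunk(uint32_t value, unsigned bus_bits, unsigned i)
{
    uint32_t mask = (bus_bits == 32) ? 0xFFFFFFFFu
                                     : (((uint32_t)1 << bus_bits) - 1u);
    assert(i < cycles_per_long(bus_bits));
    return (value >> (i * bus_bits)) & mask;
}

/* Reassemble a long from its chunks (the inverse of bus_chunk). */
static uint32_t bus_join(const uint32_t *chunks, unsigned bus_bits)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < cycles_per_long(bus_bits); i++)
        value |= chunks[i] << (i * bus_bits);
    return value;
}
```

Note that with a narrow bus a single chunk can be zero even when the long is not, which matters for any start-on-non-zero rule like the one recv() uses.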

    Ross.

  • evanh Posts: 15,912
    edited 2023-03-12 05:02

    PASM(" andn dira, ##$FFFFFFFF\n");

    can be PASM(" and dira, #0\n");
    or PASM(" mov dira, #0\n");

    Likewise, the instruction or dira, ##$FFFFFFFF

    can be not dira, #0
    or neg dira, #1
    or bmask dira, #31
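The equivalences above can be checked on a host. A sketch using C models of NOT, NEG, BMASK and ANDN; these p2_ helpers are hypothetical names written from the instruction definitions (BMASK sets the low S[4:0]+1 bits), not part of Catalina or any library:

```c
#include <assert.h>
#include <stdint.h>

/* Host-side models of the P2 instructions discussed above. */
static uint32_t p2_not(uint32_t s) { return ~s; }              /* NOT   D,S */
static uint32_t p2_neg(uint32_t s) { return (uint32_t)0 - s; } /* NEG   D,S */
static uint32_t p2_bmask(uint32_t s)                           /* BMASK D,S */
{
    return ((uint32_t)2 << (s & 31u)) - 1u;   /* S[4:0]+1 low bits set */
}
static uint32_t p2_andn(uint32_t d, uint32_t s) { return d & ~s; } /* ANDN D,S */

/* Check that the suggested replacements give the same DIRA values. */
static void check_dira_equivalences(uint32_t dira)
{
    /* clearing all 32 bits: andn ##$FFFFFFFF == and #0 == mov #0 */
    assert(p2_andn(dira, 0xFFFFFFFFu) == (dira & 0u));
    assert(p2_andn(dira, 0xFFFFFFFFu) == 0u);
    /* setting all 32 bits: or ##$FFFFFFFF == not #0 == neg #1 == bmask #31 */
    assert((dira | 0xFFFFFFFFu) == p2_not(0));
    assert(p2_not(0) == p2_neg(1));
    assert(p2_neg(1) == p2_bmask(31));
}
```

The shorter forms avoid the ## prefix, which costs an extra AUGS instruction.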

  • RossH Posts: 5,462

    @evanh said:

    PASM(" andn dira, ##$FFFFFFFF\n");

    can be PASM(" and dira, #0\n");
    or PASM(" mov dira, #0\n");

    or dira, ##$FFFFFFFF

    can be not dira, #0
    or neg dira, #1
    or bmask dira, #31

    True, but eventually I will allow 8, 16 or 32-bit bus configurations using any combination of the 4 bytes in the port, so I was keeping it straightforward: you set the corresponding bit to 1 to include that bit in the bus. This also means I can eventually define just one mask to represent the bus bits and use it everywhere.
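A minimal sketch of the single-mask idea, assuming (for illustration only) that the configuration is given as a bitmap of byte lanes, where bit n selects pins 8n to 8n+7 of port A. The function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Build the bus mask from a 4-bit lane selection: bit n of 'lanes'
 * includes pins 8n .. 8n+7 of the port in the bus. */
static uint32_t bus_mask_from_lanes(unsigned lanes)
{
    uint32_t mask = 0;
    assert(lanes <= 0xFu);
    for (unsigned n = 0; n < 4; n++)
        if (lanes & (1u << n))
            mask |= (uint32_t)0xFF << (8 * n);
    return mask;
}

/* Bus width in bits: the number of set bits in the mask. */
static unsigned bus_width(uint32_t mask)
{
    unsigned bits = 0;
    while (mask != 0) {
        mask &= mask - 1u;   /* clear the lowest set bit */
        bits++;
    }
    return bits;
}

/* The same mask drives or releases the bus, mirroring
 * "or dira, mask" and "andn dira, mask". */
static uint32_t bus_drive(uint32_t dira, uint32_t mask)   { return dira | mask; }
static uint32_t bus_release(uint32_t dira, uint32_t mask) { return dira & ~mask; }
```

Selecting lanes 0 and 2 (0x5), for example, gives the 16-bit mask 0x00FF00FF, and pins outside the mask are left untouched by bus_drive() and bus_release().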

  • evanh Posts: 15,912
    edited 2023-03-12 08:15

    There is also dirl #basepin | 7<<6 and dirh #basepin | 7<<6 for eight consecutive pins. And this also works for port B, although you can't straddle both ports in one op.

    PS: The previous 32-bit ops can also be dirl #0 | 31<<6 and dirh #0 | 31<<6 respectively.

  • RossH Posts: 5,462

    @evanh said:
    There is also dirl #basepin | 7<<6 and dirh #basepin | 7<<6 for eight consecutive pins. And this also works for port B, although you can't straddle both ports in one op.

    PS: The previous 32-bit ops can also be dirl #0 | 31<<6 and dirh #0 | 31<<6 respectively.

    The P2 has more possibilities than I have had hot dinners! :)

  • evanh Posts: 15,912

    Oops, those "PS:" instructions don't encode without ##: with a plain # immediate, #7<<6 is the largest pin-count field that fits.
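To make the encoding concrete: the DIRL/DIRH operand combines the base pin (bits 5..0) with the pin count minus one (bits 10..6), and a plain # immediate has only 9 bits (0..511), so 7<<6 (eight pins) is the largest count that fits without ##. A host-side sketch of that arithmetic; the function names are hypothetical, and the same-port check reflects the "can't straddle both ports" remark earlier in the thread:

```c
#include <assert.h>
#include <stdint.h>

/* Encode a pin field as used by DIRL/DIRH: base pin in bits 5..0 and
 * (pin count - 1) in bits 10..6, for 1 to 32 pins. */
static uint32_t pin_field(unsigned base, unsigned count)
{
    assert(base < 64u && count >= 1u && count <= 32u);
    return (uint32_t)base | ((uint32_t)(count - 1u) << 6);
}

/* Nonzero if the field fits a plain # immediate (9 bits, 0..511);
 * anything larger needs ## so the assembler can extend it. */
static int fits_short_immediate(uint32_t field)
{
    return field <= 511u;
}

/* Nonzero if every pin lies in one 32-pin port (A: 0..31, B: 32..63). */
static int within_one_port(unsigned base, unsigned count)
{
    return (base / 32u) == ((base + count - 1u) / 32u);
}
```

So dirh #0 | 31<<6 needs the operand 1984, which only assembles as dirh ##0 | 31<<6.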
