Okay! My simple Prop-to-Prop cables arrived this week, so I have connected a P2_EVAL to a P2_EDGE, and here is my initial test program - simple but reliable synchronous transfers using a 32-bit parallel bus between two or more Propellers at 32 MB/s ...
/*
* Program to test how fast and reliably bus read/writes can be done using a
* simple synchronous parallel bus connecting two or more Propellers.
*
* There can be only one sender cog at a time, but can be multiple receiver
* cogs. The intention is to start one sender cog on one Propeller, and
* multiple receiver cogs on the other Propellers. This program supports
* four receivers on a single Propeller.
*
* The Propellers must be connected pin to pin on pins 00 .. 31 (i.e. port A).
* If you have more than two Propellers connected, you could start additional
* receiver cogs on the other Propellers.
*
* This is a synchronous transfer, so how much data can be transferred depends
* on how closely the respective Propeller clocks are synchronized. The crystal
* accuracy is typically +/-0.5 PPM. At 25 clocks per long, this means the
* clocks may be out of sync by up to one clock after 40,000 longs. A suitable
* maximum size for a single synchronous transfer might therefore be 20,000
* longs, which is why this value is used by this program. Larger transfers
* can be performed by doing multiple smaller transfers (which is the point
* of this test program!).
*
* The program uses P2 NATIVE PASM, so it must be compiled in P2 NATIVE mode
* (which is the default mode for the Propeller 2).
*
* To maximize available cogs, add -C NO_MOUSE and -C NO_FLOAT (and -C SIMPLE
* if using a serial HMI). However in this simple-minded program, buffer space
* is likely to be the limiting factor, not cogs.
*
* To build as a sender, add -C SENDER
*
* To build as a receiver, add -C RECEIVER
*
* For example, compile with a command like:
*
* catalina -p2 -lci p2_bus.c -o sender -C NO_MOUSE -C NO_FLOAT -C SENDER
* or
* catalina -p2 -lci p2_bus.c -o receiver -C NO_MOUSE -C NO_FLOAT -C RECEIVER
*
* Then load and execute with commands like:
*
* payload sender -PN -i
* payload receiver -PM -i
*
* where N & M are the Propeller ports to use for the sender and receiver,
* respectively (add more 'receiver' commands if there are more than two
* Propellers on the bus).
*/
#if !defined (__CATALINA_SENDER) && !defined(__CATALINA_RECEIVER)
#error EITHER SENDER OR RECEIVER MUST BE DEFINED!
#endif
#include <stdio.h>
#include <stdlib.h>
#include <catalina_cog.h>
#include <catalina_plugin.h>
#define XFER_DELAY 7 // clocks per long is 18+this (7 works!)
#define NUM_LONGS 20000 // number of longs to transfer (20000 works!)
#define STACK_SIZE 500 // size of cog stack (stdio requires 500)
#define TOTAL_ERRORS_ONLY // print only total errors, not details
static unsigned long start = 0; // clock count used to synchronize cogs
static int lock = 0; // lock to protect I/O
static unsigned long *send_buff; // data to be sent
static unsigned long *rcv1_buff; // data received (1)
static unsigned long *rcv2_buff; // data received (2)
static unsigned long *rcv3_buff; // data received (3)
static unsigned long *rcv4_buff; // data received (4)
/*
* sync - synchronize multiple cogs to start on a specific clock count
*
* 'start' should be set to a clock count some time in the
* future - e.g. _cnt() + _clockfreq() for one second
*/
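int sync(unsigned long start) {
   return PASM (
      " getct r0\n"      // r0 = current clock count
      " sub r2, r0\n"    // r2 = 'start' minus now = clocks still to wait
      " waitx r2\n"      // wait out the remaining clocks
      " getct r0\n"      // return the clock count we actually started at
   );
}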
/*
* send - pasm code to write a number of longs to the 32 bit bus
* Note: sending starts immediately, and then sends a new long
* every 'time'+18 ticks.
* Note: loads the code into LUT RAM and executes it there.
*
* 'time' (passed in r4) is the time between writes in clocks (+18)
* 'buff' (passed in r3) is an array holding longs to send
* 'size' (passed in r2) is the size of the array
*/
int send(int time, void *buff, int size) {
   return PASM (
      // set pins 00 .. 31 as outputs
      " mov outa, #0\n"
      " or dira,##$FFFFFFFF\n"
      // load LUT RAM:
      " setq2 #(send_end - send_start - 1)\n"
      " rdlong 0, ##@send_start\n"
      // jump to code in LUT RAM:
      " jmp #send_start\n"
      // code to be executed in LUT RAM:
      " org $200\n"
      "send_start\n"
      " getct r0\n"           // LUT: 2 (clocks)
      "\n"
      " rep #7, r2\n"         // LUT: 2
      " addct1 r0, #18\n"     // LUT: 2
      " rdlong r1, r3\n"      // LUT: 9 .. 16
      " waitct1\n"            // LUT: 2
      " add r3, #4\n"         // LUT: 2
      " mov outa, r1\n"       // LUT: 2
      " addct1 r0, r4\n"      // LUT: 2
      " waitct1\n"            // LUT: 2
      "\n"
      " addct1 r0, r4\n"
      " addct1 r0, #18\n"
      " waitct1\n"
      " mov outa, #0\n"
      " getct r0\n"
      " jmp #send_cont\n"
      "send_end\n"
      // resume Hub Execution:
      " orgh\n"
      "send_cont\n"
   );
}
/*
* sender - send an array of longs to one or more receivers
*
* 'buff' is an array of NUM_LONGS longs.
*/
void sender(void *buff) {
   unsigned int started, stopped, total;
   int me = _cogid();
   started = _cnt();
   stopped = send(XFER_DELAY, buff, NUM_LONGS);
   total = stopped - started;
   ACQUIRE(lock);
   // the clock counts are unsigned, so print them with %u
   printf("send (cog %d) took %u clocks (%u per long)\n\n",
          me, total, total/NUM_LONGS);
   RELEASE(lock);
   while(1); // don't exit
}
/*
* recv - pasm code to read a number of longs from the 32 bit bus.
* Note: receiving starts 'time' clock ticks after any non-zero
* value is detected on the bus, and then reads a new long
* every 'time'+18 ticks.
* Note: loads the code into LUT RAM and executes it there.
*
* 'time' (passed in r4) is the time between reads in clocks (+18)
* 'buff' (passed in r3) is an array to hold longs received
* 'size' (passed in r2) is the size of the array
*/
int recv(int time, void *buff, int size) {
   return PASM (
      // set pins 00 .. 31 as inputs
      " andn dira,##$FFFFFFFF\n"
      // load LUT RAM:
      " setq2 #(recv_end - recv_start - 1)\n"
      " rdlong 0, ##@recv_start\n"
      // jump to code in LUT RAM:
      " jmp #recv_start\n"
      // code to be executed in LUT RAM:
      " org $200\n"
      "recv_start\n"
      " mov r1, ina\n"        // LUT: 2 (clocks)
      " cmp r1, #0 wz\n"      // LUT: 2
      " if_z jmp #recv_start\n" // LUT: 4
      " getct r0\n"           // LUT: 2
      "\n"
      " rep #7, r2\n"         // LUT: 2
      " addct1 r0, #18\n"     // LUT: 2
      " wrlong r1, r3\n"      // LUT: 3 .. 10
      " waitct1\n"            // LUT: 2
      " add r3, #4\n"         // LUT: 2
      " addct1 r0, r4\n"      // LUT: 2
      " waitct1\n"            // LUT: 2
      " mov r1, ina\n"        // LUT: 2
      "\n"
      " getct r0\n"
      " jmp #recv_cont\n"
      "recv_end\n"
      // resume Hub Execution:
      " orgh\n"
      "recv_cont\n"
   );
}
/*
* receiver- receive an array of longs from the 32 bit bus
*
* 'buff' is an array of NUM_LONGS longs.
*/
void receiver(void *buff) {
   unsigned int started, stopped;
   int me = _cogid();
   started = sync(start);
   stopped = recv(XFER_DELAY, buff, NUM_LONGS);
   ACQUIRE(lock);
   printf("recv (cog %d) started at clock 0x%08x\n", me, started);
   RELEASE(lock);
   while(1); // don't exit
}
void main(void) {
   unsigned long i;
   long send_stack[STACK_SIZE];
   long rcv1_stack[STACK_SIZE];
   long rcv2_stack[STACK_SIZE];
   long rcv3_stack[STACK_SIZE];
   long rcv4_stack[STACK_SIZE];
   int send_cog;
   int rcv1_cog;
   int rcv2_cog;
   int rcv3_cog;
   int rcv4_cog;
   int errors = 0;
   int transfers = 0;
   // assign a lock to be used to avoid plugin contention
   lock = _locknew();
   // give the vt100 emulator (if used) a chance to start
#ifdef __CATALINA_VT100
   _waitms(500);
#endif
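   // allocate the arrays (we allocate them all whether we are a sender
   // or a receiver)
   send_buff = malloc(NUM_LONGS*4);
   rcv1_buff = malloc(NUM_LONGS*4);
   rcv2_buff = malloc(NUM_LONGS*4);
   rcv3_buff = malloc(NUM_LONGS*4);
   rcv4_buff = malloc(NUM_LONGS*4);
   // buffer space is the limiting factor, so check the allocations worked
   if (!send_buff || !rcv1_buff || !rcv2_buff || !rcv3_buff || !rcv4_buff) {
      printf("Cannot allocate buffers - reduce NUM_LONGS\n");
      while(1); // don't continue without buffers
   }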
   // initialize the arrays
   for (i = 0; i < NUM_LONGS; i++) {
      send_buff[i] = i+1; // can be anything, but must be non-zero
      rcv1_buff[i] = 0;
      rcv2_buff[i] = 0;
      rcv3_buff[i] = 0;
      rcv4_buff[i] = 0;
   }
#ifdef __CATALINA_SENDER
   // set all bus pins to zero
   PASM(
      " mov outa, #0\n"
      " or dira,##$FFFFFFFF\n"
   );
   // start ONE sender
   k_clear();
   ACQUIRE(lock);
   printf("SENDER (Clock = %lu Hz)\n\n", _clockfreq());
   printf("Start the receiver cogs in the Receiver program,\n");
   printf("then press any key to start sender cog\n");
   RELEASE(lock);
   k_wait();
   // keep transferring forever
   while (1) {
      ACQUIRE(lock);
      printf("\nStarting sender cog ...\n\n");
      RELEASE(lock);
      send_cog = _cogstart_C(&sender, send_buff, send_stack, STACK_SIZE);
      // give the sender a chance to send (3 seconds is generous!)
      _waitms(3000);
      ACQUIRE(lock);
      printf("... done\n");
      RELEASE(lock);
      // cancel the sender (we restart it again for each transfer)
      _cogstop(send_cog);
      // update and print statistics
      transfers++;
      ACQUIRE(lock);
      printf("\nTotal %d transfers\n",transfers);
      RELEASE(lock);
   }
#endif
#ifdef __CATALINA_RECEIVER
   ACQUIRE(lock);
   printf("RECEIVER (Clock = %lu Hz)\n\n", _clockfreq());
   printf("Press any key to start receiver cogs, then start the sender\n");
   printf("cog in the sender program\n");
   RELEASE(lock);
   k_wait();
   ACQUIRE(lock);
   printf("Starting receiver cogs ...\n\n");
   RELEASE(lock);
   while (1) {
      // set a start time for the receiver cogs to use in the sync function
      start = _cnt() + _clockfreq(); // set start time for +1 seconds
      // start FOUR receiver cogs
      rcv1_cog = _cogstart_C(&receiver, rcv1_buff, rcv1_stack, STACK_SIZE);
      rcv2_cog = _cogstart_C(&receiver, rcv2_buff, rcv2_stack, STACK_SIZE);
      rcv3_cog = _cogstart_C(&receiver, rcv3_buff, rcv3_stack, STACK_SIZE);
      rcv4_cog = _cogstart_C(&receiver, rcv4_buff, rcv4_stack, STACK_SIZE);
      ACQUIRE(lock);
      printf("... done\n");
      RELEASE(lock);
      // give receiver a chance to receive
      _waitms(1000);
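      // wait till ALL receivers complete (a receiver is still busy while
      // the last long of its buffer is zero, so keep waiting if ANY is zero)
      while (
            (rcv1_buff[NUM_LONGS - 1] == 0)
         || (rcv2_buff[NUM_LONGS - 1] == 0)
         || (rcv3_buff[NUM_LONGS - 1] == 0)
         || (rcv4_buff[NUM_LONGS - 1] == 0)
      ) {
         _waitms(1000);
      }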
      // terminate the receiver cogs (we restart them again for each transfer)
      _cogstop(rcv1_cog);
      _cogstop(rcv2_cog);
      _cogstop(rcv3_cog);
      _cogstop(rcv4_cog);
      // check the results
      ACQUIRE(lock);
      printf("\nChecking data ...\n");
      for (i = 0; i < NUM_LONGS; i++) {
         // check receiver 1 got the correct data
         if (send_buff[i] != rcv1_buff[i]) {
            errors++; // count the mismatch whether or not details are wanted
            printf("send[%3lu]=0x%08lX != rcv1[%3lu]=0x%08lX\n",
                   i, send_buff[i], i, rcv1_buff[i]);
            _waitms(5);
#ifdef TOTAL_ERRORS_ONLY
            // if only counting total errors, we are done - we don't
            // report each mismatch
            break;
#endif
         }
         // check receiver 2 got the correct data
         if (send_buff[i] != rcv2_buff[i]) {
            errors++;
            printf("send[%3lu]=0x%08lX != rcv2[%3lu]=0x%08lX\n",
                   i, send_buff[i], i, rcv2_buff[i]);
            _waitms(5);
#ifdef TOTAL_ERRORS_ONLY
            // if only counting total errors, we are done - we don't
            // report each mismatch
            break;
#endif
         }
         // check receiver 3 got the correct data
         if (send_buff[i] != rcv3_buff[i]) {
            errors++;
            printf("send[%3lu]=0x%08lX != rcv3[%3lu]=0x%08lX\n",
                   i, send_buff[i], i, rcv3_buff[i]);
            _waitms(5);
#ifdef TOTAL_ERRORS_ONLY
            // if only counting total errors, we are done - we don't
            // report each mismatch
            break;
#endif
         }
         // check receiver 4 got the correct data
         if (send_buff[i] != rcv4_buff[i]) {
            errors++;
            printf("send[%3lu]=0x%08lX != rcv4[%3lu]=0x%08lX\n",
                   i, send_buff[i], i, rcv4_buff[i]);
            _waitms(5);
#ifdef TOTAL_ERRORS_ONLY
            // if only counting total errors, we are done - we don't
            // report each mismatch
            break;
#endif
         }
      }
      // update and print statistics
      transfers++;
      printf("Total errors = %d (from %d transfers)\n", errors, transfers);
      RELEASE(lock);
      // re-initialize the arrays for the next transfer
      for (i = 0; i < NUM_LONGS; i++) {
         rcv1_buff[i] = 0;
         rcv2_buff[i] = 0;
         rcv3_buff[i] = 0;
         rcv4_buff[i] = 0;
      }
   }
#endif
}
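As a rough cross-check of the figures in the listing, the transfer rate follows directly from the loop timing: one long (4 bytes) goes over the bus every 18 + XFER_DELAY clocks. The sketch below is plain host-side C rather than Catalina code. The function names are made up for illustration, and any clock frequency passed in is an assumption (the test program prints the real one via _clockfreq()).

```c
#include <stdint.h>

/* Clocks between successive longs: the fixed loop cost of 18 clocks quoted
 * in the listing's comments, plus the configurable XFER_DELAY. */
uint32_t p2p_clocks_per_long(uint32_t xfer_delay) {
    return 18u + xfer_delay;
}

/* Throughput in bytes per second for a 32-bit bus (4 bytes per long).
 * clock_hz is whatever system clock both Propellers are running at. */
uint64_t p2p_bytes_per_second(uint32_t clock_hz, uint32_t xfer_delay) {
    return (uint64_t)clock_hz * 4u / p2p_clocks_per_long(xfer_delay);
}
```

With XFER_DELAY = 7 this gives 25 clocks per long. At an assumed 200 MHz clock, p2p_bytes_per_second(200000000, 7) works out to 32,000,000 bytes per second; at an assumed 180 MHz it would be 28,800,000.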
I've updated my "Propeller2Propeller" bus test program (now called p2p.c) to add the ability for a Propeller to either be a 'sender', a 'receiver', or a 'transceiver' which alternates between sending and receiving (which makes it a more realistic test).
Also, I had some failures with transferring synchronous blocks of 20,000 longs so I have wound it back to 10,000 longs at a time for the moment. I think this is due to slight differences between the Propeller clocks. Larger transfers will need to be done in multiple synchronous blocks anyway, but eventually I might add some code to auto detect the maximum block size so the user doesn't need to configure it.
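The block-size question can be put into numbers with a simple linear drift model: if the two clocks differ by some fraction, the accumulated error grows with every clock of the transfer. The sketch below is host-side C under that assumption only; it ignores start-up alignment error and jitter, and the function names and the parts-per-billion unit are illustrative choices, not part of p2p.c.

```c
#include <stdint.h>

/* Longs that can be sent before two free-running clocks drift apart by
 * drift_clocks, given their relative mismatch in parts per billion.
 * Returns UINT64_MAX when there is no mismatch to accumulate. */
uint64_t p2p_max_longs(uint32_t mismatch_ppb, uint32_t clocks_per_long,
                       uint32_t drift_clocks) {
    if (mismatch_ppb == 0 || clocks_per_long == 0) {
        return UINT64_MAX;
    }
    return (uint64_t)drift_clocks * 1000000000ull
           / ((uint64_t)mismatch_ppb * clocks_per_long);
}

/* Number of synchronous blocks needed to move total_longs when each block
 * holds at most block_longs (rounds up; 0 if block_longs is 0). */
uint32_t p2p_num_blocks(uint32_t total_longs, uint32_t block_longs) {
    if (block_longs == 0) {
        return 0;
    }
    return (uint32_t)(((uint64_t)total_longs + block_longs - 1) / block_longs);
}
```

Taking the header comment's +/-0.5 PPM per crystal as a worst-case mismatch of 1 PPM (1000 ppb), p2p_max_longs(1000, 25, 1) gives the 40,000 longs quoted there. A 25,000 long transfer in blocks of 10,000 needs p2p_num_blocks(25000, 10000) = 3 blocks.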
The next step is to add a higher level protocol to allow multiple Propellers to share the bus and send and receive without bus contention.
Also, I intend to make it configurable whether the P2P bus is 8, 16 or 32 bits wide.
PASM(" andn dira,##$FFFFFFFF\n");
can be
PASM(" and dira, #0\n");
or
PASM(" mov dira, #0\n");
and
or dira, ##$FFFFFFFF
can be
not dira, #0
or
neg dira, #1
or
bmask dira, #31
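The suggested replacements for setting all 32 direction bits can be checked with a small host-side model of what each instruction computes from its source operand. This is only a sketch in portable C: the function names are invented here, and each one models just the result value, not flags or timing.

```c
#include <stdint.h>

/* NOT D,S : D = bitwise complement of S */
uint32_t p2_not(uint32_t s) {
    return ~s;
}

/* NEG D,S : D = two's complement negation of S */
uint32_t p2_neg(uint32_t s) {
    return (uint32_t)0 - s;
}

/* BMASK D,S : D = LSB-justified mask of (S[4:0] + 1) one bits */
uint32_t p2_bmask(uint32_t s) {
    return (uint32_t)((UINT64_C(2) << (s & 31u)) - 1u);
}
```

p2_not(0), p2_neg(1) and p2_bmask(31) all produce $FFFFFFFF. One difference from the OR form is worth keeping in mind: OR only adds bits to the register, while these three overwrite it, which amounts to the same thing only when every bit is being set.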
True, but eventually I will allow for 8, 16 or 32 bit bus configurations using any combination of the 4 bytes in the port, so I was keeping it straightforward - i.e. you set the corresponding bit to 1 to include that bit in the bus. Also, this means I can eventually just define one mask to represent the bus bits and use it everywhere.
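The single-mask idea could look something like the following. This is a guess at the shape only, written as host-side C: the lane selector, the function names and the way the mask is applied to a direction register value are all assumptions made for illustration, not anything from p2p.c.

```c
#include <stdint.h>

/* Build a bus mask from a 4-bit lane selector: bit n of 'lanes' set means
 * byte n of the port (pins 8n .. 8n+7) is part of the bus. */
uint32_t p2p_bus_mask(unsigned lanes) {
    uint32_t mask = 0;
    unsigned n;
    for (n = 0; n < 4; n++) {
        if (lanes & (1u << n)) {
            mask |= (uint32_t)0xFFu << (8u * n);
        }
    }
    return mask;
}

/* Bus width in bits is simply the number of one bits in the mask. */
unsigned p2p_bus_width(uint32_t mask) {
    unsigned bits = 0;
    while (mask != 0) {
        bits += mask & 1u;
        mask >>= 1;
    }
    return bits;
}

/* The same mask then serves both directions: OR it in to drive the bus
 * pins (as "or dira, mask" does), clear it to release them ("andn"). */
uint32_t p2p_dir_outputs(uint32_t dir, uint32_t mask) { return dir | mask; }
uint32_t p2p_dir_inputs(uint32_t dir, uint32_t mask)  { return dir & ~mask; }
```

For example, p2p_bus_mask(0xF) is $FFFFFFFF (the current 32-bit bus), while p2p_bus_mask(0x5) is $00FF00FF, a 16-bit bus on bytes 0 and 2 that leaves the other pins untouched.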
There is also dirl #basepin | 7<<6 and dirh #basepin | 7<<6 for eight consecutive pins. And this also works for port B, although you can't straddle both ports in one op.
PS: The previous 32-bit ops can also be dirl #0 | 31<<6 and dirh #0 | 31<<6 respectively.
@evanh said:
There is also dirl #basepin | 7<<6 and dirh #basepin | 7<<6 for eight consecutive pins. And this also works for port B, although you can't straddle both ports in one op.
PS: The previous 32-bit ops can also be dirl #0 | 31<<6 and dirh #0 | 31<<6 respectively.
The P2 has more possibilities than I have had hot dinners!
Oops, those "PS:" don't encode without ##. #7<<6 is the largest immediate.
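The encoding limit is easy to see by building the pin-field operand by hand: the base pin sits in bits 5..0 and the number of additional pins in bits 10..6, while a plain # immediate only holds 9 bits. The helper names below are invented for this host-side C sketch.

```c
#include <stdint.h>
#include <stdbool.h>

/* Pin-field operand as used by dirl/dirh: base pin in bits 5..0 and
 * (count - 1) additional pins in bits 10..6. Expects count in 1..32. */
uint32_t p2_pin_field(unsigned base_pin, unsigned count) {
    return (base_pin & 0x3Fu) | (((count - 1u) & 0x1Fu) << 6);
}

/* A '#' immediate is 9 bits wide (0..511); anything larger needs '##'. */
bool p2_fits_short_immediate(uint32_t value) {
    return value <= 0x1FFu;
}
```

Eight pins from pin 0 gives p2_pin_field(0, 8) = 448, which fits a short immediate, and even with base pin 63 the value is 511. All 32 pins gives p2_pin_field(0, 32) = 1984, which does not fit, hence the ## on the 32-bit forms.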