Simple question about SimpleIDE and fdserial @ 115200 baud

KeithE · 2016-03-09 21:45

I have a simple question about using a QuickStart board to pass through serial data. The code below isn't fast enough to keep up at 115 kbaud, so some serial corruption can occur. Is there a simple change that could make it fast enough? I believe that the default memory model is CMM, and the optimization is -Os Size. Maybe changing those settings is enough, but perhaps it requires a change away from fdserial? I haven't been using Propellers much recently but I know that fdserial has been a subject of discussion over the years. I'm working with some remote engineers who have never used Propellers so SimpleIDE seemed like a good choice. I can see if I can come up with a good way to detect serial corruption (e.g. getting statistics based on file transfers to an embedded device's U-Boot), and then try various solutions. Nothing else is running on the Propeller, so all resources are available. Using cogs inefficiently is o.k. if it makes this easier to get working - e.g. doesn't require installing a new serial object.

For anyone wondering why I would do this, it started out with different baud rates on the host and device sides, to deal with an embedded device that used a non-standard baud rate. Once communications were established, the device was reprogrammed to work at 115 kbaud, and it was just convenient to leave the QuickStart in place for some testing. But this testing is automated and hits the serial port a lot harder than the initial traffic did. The host side uses the on-board FTDI chip to connect to USB.

#include "simpletools.h"
#include "fdserial.h"

int main() {
  fdserial *host;
  fdserial *device;
  int c;
  
  simpleterm_close();
  host   = fdserial_open(31, 30, 0, 115200);
  device = fdserial_open(23, 22, 0, 115200);

  while(1)
  {
    c = fdserial_rxCheck(host);
    if (c != -1) {
      fdserial_txChar(device, c);
    }
          
    c = fdserial_rxCheck(device);
    if (c != -1) {
      fdserial_txChar(host, c);
    }      
  }  
}
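
For a rough sense of the timing budget in the loop above, here is a small sketch that works out how many system clocks are available per byte. It assumes an 80 MHz system clock and 8N1 framing (10 bits on the wire per byte); neither is stated in the post, and clocks_per_byte is a hypothetical helper, not part of any library.

```c
#include <stdint.h>

/* Hypothetical helper: system clocks available per serial frame.
   bits_per_frame is 10 for 8N1 (start bit + 8 data bits + stop bit). */
static inline uint32_t clocks_per_byte(uint32_t clk_hz,
                                       uint32_t baud,
                                       uint32_t bits_per_frame)
{
  /* Widen to 64 bits so clk_hz * bits_per_frame cannot overflow. */
  return (uint32_t)(((uint64_t)clk_hz * bits_per_frame) / baud);
}
```

Under those assumptions, clocks_per_byte(80000000, 115200, 10) comes to 6944, so when traffic flows both ways at full rate, one pass of the loop has to finish within roughly that many clocks to keep up on average.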

JasonDorie · 2016-03-09 22:04

Recompiling as XMM would likely help a lot. CMM is faster than Spin by anywhere from 2x to 4x, but FDSerial itself has a lot of internal overhead, which will slow it down.

Since you know that your transmit rate is limited to 115,200 by the source, you could re-implement fdserial_txChar to skip buffer checks, and that would speed things up a lot.

Here's the internals of txChar:

int fdserial_txChar(fdserial *term, int txbyte)
{
  int rc = -1;
  volatile fdserial_st* fdp = (fdserial_st*) term->devst;
  volatile char* txbuf = (volatile char*) fdp->buffptr + FDSERIAL_BUFF_MASK+1;

  while(fdp->tx_tail == ((fdp->tx_head+1) & FDSERIAL_BUFF_MASK))
      ; // wait for queue to be empty
  txbuf[fdp->tx_head] = txbyte;
  fdp->tx_head = (fdp->tx_head+1) & FDSERIAL_BUFF_MASK;
  if(fdp->mode & FDSERIAL_MODE_IGNORE_TX_ECHO)
      rc = fdserial_rxChar(term); // why not rxcheck or timeout ... this blocks for char
  return rc;
}

If you locally cache the FDP pointer in your own code, you can directly manipulate the buffer yourself, instead of calling the function to do it for you, and that saves a function call with all the push/pop state save/restore that comes with it, and the pointer de-reference. If you cache the buffer pointer, you avoid the dereference and the add. Remove the check for the ignore_echo too.

And then, after all that, you could skip the buffer wait. You'll end up with this:

#include "simpletools.h"
#include "fdserial.h"

void MyTxChar( volatile fdserial_st * fdp, volatile char * txbuf , int txbyte )
{
  // Remove or comment these two lines to skip the buffer wait
  while(fdp->tx_tail == ((fdp->tx_head+1) & FDSERIAL_BUFF_MASK))
      ; // wait until the queue has room for one more byte

  // put the new character in the transmit buffer
  txbuf[fdp->tx_head] = txbyte;

  // update the buffer head index
  fdp->tx_head = (fdp->tx_head+1) & FDSERIAL_BUFF_MASK;
}

int main() {
  fdserial *host;
  fdserial *device;
  int c;

  simpleterm_close();
  host   = fdserial_open(31, 30, 0, 115200);
  device = fdserial_open(23, 22, 0, 115200);

  // buffptr is a member of fdserial_st, so it is reached through the cached Fdp pointer
  volatile fdserial_st* deviceFdp= (fdserial_st*) device->devst;
  volatile char* deviceBuf= (volatile char*) deviceFdp->buffptr + FDSERIAL_BUFF_MASK+1;

  volatile fdserial_st* hostFdp= (fdserial_st*) host->devst;
  volatile char* hostBuf= (volatile char*) hostFdp->buffptr + FDSERIAL_BUFF_MASK+1;


  while(1)
  {
    c = fdserial_rxCheck(host);
    if (c != -1) {
      MyTxChar(deviceFdp, deviceBuf, c);
    }
          
    c = fdserial_rxCheck(device);
    if (c != -1) {
      MyTxChar(hostFdp, hostBuf, c);
    }      
  }  
}

I haven't compiled this, but I have done exactly this on my own to get transmit times to be quicker. You can do similar things with the receive code - the rxCheck call is basically just an "is the head the same as the tail" check, so putting that inline would eliminate a bunch of overhead too.
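
The "head the same as the tail" idea for the receive side can be sketched as below. This is a minimal, host-testable model, not the fdserial source: ring_t, RING_MASK, ring_rx_check and ring_push are all hypothetical names standing in for the receive half of fdserial_st, FDSERIAL_BUFF_MASK and the driver cog, and the real field layout should be checked against fdserial.h before adapting it.

```c
#define RING_MASK 63  /* hypothetical size mask; stands in for FDSERIAL_BUFF_MASK */

/* Hypothetical stand-in for the receive half of fdserial_st. */
typedef struct {
  volatile int  rx_head;             /* advanced by the producer (driver cog) */
  volatile int  rx_tail;             /* advanced by the consumer (main loop)  */
  volatile char buf[RING_MASK + 1];  /* power-of-two ring buffer              */
} ring_t;

/* Inline receive check: returns the next byte (0..255), or -1 if empty. */
static inline int ring_rx_check(ring_t *r)
{
  int c = -1;
  if (r->rx_tail != r->rx_head) {              /* head != tail: data waiting */
    c = (unsigned char) r->buf[r->rx_tail];    /* keep 0xFF distinct from -1 */
    r->rx_tail = (r->rx_tail + 1) & RING_MASK; /* advance with wrap-around   */
  }
  return c;
}

/* Test-only helper that plays the role of the driver cog filling the buffer. */
static inline void ring_push(ring_t *r, char byte)
{
  r->buf[r->rx_head] = byte;
  r->rx_head = (r->rx_head + 1) & RING_MASK;
}
```

In the pass-through loop, the body of ring_rx_check would be written directly against the cached pointers, which avoids the function call and the repeated pointer dereference in the same way as MyTxChar does for transmit.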

Ariba · 2016-03-10 00:21

If the Propeller just needs to pass the signals through, you can set up two hardware counters in a feedback mode. In feedback mode, a counter outputs the inverted state of its APIN input on its BPIN output.
Because the feedback output is always inverted, each direction needs an extra intermediate pin and two counters in series, so that the signal that comes out is not inverted.
That takes 4 counters in total, and therefore a second cog. Here is an untested example:

#include "propeller.h"

volatile int stack[50];

void cog2(void *par)                 // cogstart expects a void (*)(void *) entry point
{
   DIRA = (1<<30) + (1<<17);         // outputs
   CTRA = (9<<26) + (17<<9) + 23;    // device RX -> /P17
   CTRB = (9<<26) + (30<<9) + 17;    // /P17 -> host TX
   while(1) ;
}

int main()
{
   cogstart(&cog2,0,(void*)stack,sizeof(stack)); // start second cog for 2 more counters (stack size is given in bytes)
   DIRA = (1<<22) + (1<<16);         // outputs
   CTRA = (9<<26) + (16<<9) + 31;    // host RX -> /P16
   CTRB = (9<<26) + (22<<9) + 16;    // /P16 -> device TX
   while(1) ;
}

The cogs just wait in a loop; all the work is done by the counters. The sampling frequency is the clock frequency (80 MHz), and there is only one clock cycle of delay (12.5 ns).
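
The magic numbers in the counter setup can be unpacked with a small helper. This is only a sketch: ctr_feedback and CTR_MODE_POS_FEEDBACK are hypothetical names, and it assumes the usual Propeller 1 CTRx layout, with the mode in bits 30..26, BPIN in bits 14..9 and APIN in bits 5..0, where mode 9 (%01001) is the POS detector with feedback.

```c
#include <stdint.h>

#define CTR_MODE_POS_FEEDBACK 9u   /* %01001: POS detector with feedback */

/* Hypothetical helper: build a CTRx value that drives bpin with the
   inverted state of apin. */
static inline uint32_t ctr_feedback(uint32_t apin, uint32_t bpin)
{
  return (CTR_MODE_POS_FEEDBACK << 26)   /* CTRMODE field, bits 30..26 */
       | ((bpin & 0x3Fu) << 9)           /* BPIN field, bits 14..9     */
       | (apin & 0x3Fu);                 /* APIN field, bits 5..0      */
}
```

With this, the device-to-host path reads as ctr_feedback(23, 17) followed by ctr_feedback(17, 30), which makes the double inversion through P17 easier to see than the raw shifts.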

Andy

JasonDorie · 2016-03-10 00:37

That's pretty damn clever.

DavidZemon · 2016-03-10 00:59

JasonDorie wrote: »

Recompiling as XMM would likely help a lot.

I think you mean LMM

KeithE · 2016-03-10 02:38

Thanks guys - I figured LMM as well. There are a couple of typos in the fdserial example but they are pretty obvious. e.g. device->buffptr should be deviceFdp->buffptr

I can try Ariba's solution too, because for this case it's really just emulating a wire. And supporting higher baud rates would be great.

I'm playing around with binary kermit transfers first such that I can validate performance improvements. I'm starting with an FTDI cable to show that it's reliable, and then will switch back to the Propeller. (I think that someone here bought some counterfeit FTDI cables from Amazon too - nothing related to Parallax, but it just adds to the confusion)

JasonDorie · 2016-03-10 05:39

DavidZemon wrote: »

JasonDorie wrote: »

Recompiling as XMM would likely help a lot.

I think you mean LMM

Ooops - Yes, that is exactly what I meant.

KeithE · 2016-03-10 18:00

Andy - thank you for posting that code. This method is working just fine. Thanks to Jason as well; I may use those ideas in other cases where the Propeller needs to do something with the data.
