cogserial - fullduplex smart serial using interrupt — Parallax Forums

msrobots Posts: 3,701
edited 2019-02-19 02:50 in Propeller 2
This is the first running Version, not really optimized.

It is a replacement for smartserial in spin2gui. It runs on its own cog and uses the LUT as send and receive buffer.

Currently I am storing one byte per long, which gives a 255-byte send buffer and a 255-byte receive buffer. This will be optimized soon to give either 4 times the buffer size, or 2 times the buffer size each for two pairs of send/receive. Not sure yet.
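To illustrate the planned optimization, here is a minimal Python sketch of a ring buffer that packs four bytes into each 32-bit long instead of one. It is only a host-side model, not the driver's PASM: the class name `PackedRing` and the 256-long region size are assumptions for illustration, and one byte slot is left unused to tell "full" from "empty" (which would match the 255-byte figure above, though the post does not say so).

```python
class PackedRing:
    """Ring buffer storing four bytes per 32-bit long (hypothetical model)."""

    def __init__(self, longs=256):
        self.mem = [0] * longs      # stands in for a region of LUT longs
        self.size = longs * 4       # number of byte slots
        self.head = 0               # next byte index to write
        self.tail = 0               # next byte index to read

    def put(self, byte):
        nxt = (self.head + 1) % self.size
        if nxt == self.tail:        # one slot stays empty to mark "full"
            return False
        idx, lane = divmod(self.head, 4)
        shift = lane * 8
        cleared = self.mem[idx] & ~(0xFF << shift) & 0xFFFFFFFF
        self.mem[idx] = cleared | ((byte & 0xFF) << shift)
        self.head = nxt
        return True

    def get(self):
        if self.tail == self.head:  # nothing buffered
            return None
        idx, lane = divmod(self.tail, 4)
        byte = (self.mem[idx] >> (lane * 8)) & 0xFF
        self.tail = (self.tail + 1) % self.size
        return byte
```

With 256 longs this model holds 1023 bytes rather than 255, at the cost of a shift and mask on every access.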

I am using an interrupt for receiving, but have not yet been able to use an interrupt for sending, so sends are done by the cog itself.

It has an extended start method, StartEx, where one can also provide the smartpin selector mode and baud rate for each send and receive channel. Thus it is possible to send at, say, 115_200 and receive at 230_400.

Besides the Spin stub, the cogSerial driver needs a 4-long mailbox to communicate; like JDserial, it is easily usable from assembler. Or it will be.

For some #@$@#%$@ reason I cannot get fastspin to start my cog with a mailbox address as parameter, so I have to manually patch the needed parameters into the DAT block in my start method before loading the COG.

There is still some work to do to make it nicer, but as is it works.

I have no clue how fast it will work; I am just testing it from fastspin's terminal right now.

Enjoy!

Mike

Comments

  • msrobots wrote: »
    Besides the Spin stub, the cogSerial driver needs a 4-long mailbox to communicate; like JDserial, it is easily usable from assembler. Or it will be.

    For some #@$@#%$@ reason I cannot get fastspin to start my cog with a mailbox address as parameter, so I have to manually patch the needed parameters into the DAT block in my start method before loading the COG.

    I don't think this is fastspin's fault. I've tested passing parameters to assembly functions and it seems to work. Could you provide some more context? I've attached a demo program to show how parameter passing works. It looks pretty similar to what you did, although I don't think you have any synchronization. Perhaps there's a timing problem?

    Also, I notice you did a coginit(1,...) rather than cognew(...). Is there a reason you hard-coded the COG to 1?

    Thanks,
    Eric
  • hmm, the coginit is just a leftover of some tests, should be a cognew.

    I am not sure why parameter passing through ptra does not work. Is there maybe a problem with using @varname in sub-objects?

    I sort of got stuck there and just patched the values to get it running, will examine further today.

    At least it is working

    Enjoy!

    Mike
  • msrobots wrote: »
    I am not sure why parameter passing through ptra does not work. Is there maybe a problem with using @varname in sub-objects?
    Nope, I moved the remotecog method into its own object (with the mbox variable in that object) and it worked fine.

    If you're still having trouble with mailboxes post the mailbox version of your code and we can take a look at it (many eyes make bugs easier to find :) ).

  • I appear to be having the same problem with ptra. I may try the same with @varname.

    Is there any documentation for fastspin2?
  • ke4pjw wrote: »
    I appear to be having the same problem with ptra. I may try the same with @varname.
    Could you post your code? Is it much different from the example I posted earlier (which does work, I've tested it).
    Is there any documentation for fastspin2?

    In the fastspin docs/ folder, which gets copied to the spin2gui doc/ folder. For Spin specifically there's a spin.md. Mainly it covers differences between "standard" Spin and fastspin.

  • msrobots Posts: 3,701
    edited 2019-01-24 22:40
    ersmith wrote: »
    msrobots wrote: »
    Besides the Spin stub, the cogSerial driver needs a 4-long mailbox to communicate; like JDserial, it is easily usable from assembler. Or it will be.

    For some #@$@#%$@ reason I cannot get fastspin to start my cog with a mailbox address as parameter, so I have to manually patch the needed parameters into the DAT block in my start method before loading the COG.

    I don't think this is fastspin's fault. I've tested passing parameters to assembly functions and it seems to work. Could you provide some more context? I've attached a demo program to show how parameter passing works. It looks pretty similar to what you did, although I don't think you have any synchronization. Perhaps there's a timing problem?

    Also, I notice you did a coginit(1,...) rather than cognew(...). Is there a reason you hard-coded the COG to 1?

    Thanks,
    Eric

    I had to read your post and code twice but still no joy. I am not sure about synchronization and I do not see what dummy is doing. Maybe I am still too tired after night shift.

    The only difference I can make out is that you are starting a Spin cog and I am starting an assembler cog. I have some time right now and will investigate further.

    Edit - I see now what you did for synchronization, need to think about it.

    Enjoy!

    Mike
  • ersmith Posts: 5,900
    edited 2019-01-24 23:32
    msrobots wrote: »
    I had to read your post and code twice but still no joy. I am not sure about synchronization and I do not see what dummy is doing. Maybe I am still too tired after night shift.
    The "dummy" is the variable in the first rdlong in pasmtest? That doesn't do anything, it's just skipping over the initial sync value. I threw together the example pretty quickly so I didn't add very many comments, sorry. But basically the mailbox has 3 longs. The first one (read as "dummy") always starts as a non-zero value. The assembler code writes 0 back to it to indicate that it has finished reading the mailbox and the Spin can proceed. The second mbox entry is a pointer to a variable. The third is the value to place in the variable. The assembly code reads those 3 mailbox entries, then writes the value to the pointer, then writes 0 back to the first element of the mailbox. The Spin code waits for that first mailbox entry to be 0 (to indicate that the assembly code has finished with the mailbox contents).

    The writing 0 back to mbox[0] could happen at any time after the assembly has read everything it needs to out of the mailbox. It's just a device to make sure that the Spin code doesn't get ahead of the assembly and overwrite the mailbox while the PASM is still using it. In P1 Spin nobody much worried about that because Spin was so much slower than PASM. But with fastspin it can matter, because the Spin COG is itself running PASM "under the hood", so it's just about as fast as the PASM COG.
    The only difference I can make out is that you are starting a Spin cog and I am starting an assembler cog. I have some time right now and will investigate further.

    I think you'll have to look a bit closer. The pasmtest.spin2 code I posted above is in fact starting assembler (the PASM code it starts is in the DAT section and starts with the label "asmfunc"). I have started Spin code in other demos. But actually in fastspin there's really not much difference -- it's all assembly code once the compiler is done with converting the Spin into PASM.
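The three-long mailbox handshake described above can be modelled on a PC. This is a minimal Python sketch, not the pasmtest.spin2 code: a thread stands in for the PASM cog, a list stands in for the three hub longs, and the names `mbox`, `hub`, `pasm_cog`, and `spin_start` are invented for illustration.

```python
import threading
import time

mbox = [0, 0, 0]   # [sync flag, target "address", value]; models 3 hub longs
hub = {}           # stand-in for hub RAM: address -> value


def pasm_cog():
    """Models the assembly side: read the mailbox, do the work, clear sync."""
    dummy = mbox[0]                 # first long is only the sync value, skipped
    ptr, value = mbox[1], mbox[2]   # read everything needed from the mailbox
    hub[ptr] = value                # write the value through the pointer
    mbox[0] = 0                     # signal: mailbox contents are no longer needed


def spin_start(ptr, value):
    """Models the Spin side: fill the mailbox, start the cog, wait for the ack."""
    mbox[1], mbox[2] = ptr, value
    mbox[0] = 1                     # sync starts out non-zero
    worker = threading.Thread(target=pasm_cog)
    worker.start()
    while mbox[0] != 0:             # do not touch the mailbox until it is released
        time.sleep(0.001)
    worker.join()
    return hub[ptr]
```

The wait loop in `spin_start` is the point of the exercise: without it the caller could overwrite, or in the case of a local array destroy, the mailbox while the other side is still reading it.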
  • @ersmith,

    YES, this was exactly part of my problem.

    But mostly I stumbled over a difference between P1 and P2.

    I wasn't interested in the value behind ptra, but in its address, so I did not do a rdlong cogptr, ptra but tried a mov cogptr, ptra.

    And that did not work at all. And I still don't know why.

    But your version got me to a very clean approach. I create a local array in my start function and populate it with the parameters my cog needs. Then I set a sync value, start the cog, and wait until it has read the parameters and cleared my sync value, so that the start function can return and destroy the local array.

    The nice thing is I currently need just 4 longs in the HUB for my mailbox, but have 6 parameters to feed at start. Using a local array for the start parameters solves this problem also.

    So there is no @%#@@#@% problem with fastspin; the problem was sitting 2 feet away from the monitor.

    I need to clean up the code a bit and will post a nicer version.

    Thanks,

    Mike
  • Hmmm, doing more testing I did find a bug in fastspin 3.9.15 in doing coginit() of a Spin (or BASIC) method. The memory for _clkmode and _clkfreq was overlapping some of the initialization code, which could sometimes cause the Spin method on the new COG to not start up properly. This bug was introduced when I added the -H and -E flags, so it wasn't in 3.9.14. It shouldn't affect PASM coginit though, only Spin coginit (and BASIC's "cpu" keyword applied to BASIC functions).

    I'll have a fix for it in the next release.
  • @ersmith,

    everybody here is complaining about the missing tools development; I, on the other hand, basically have to reload all my tools every week or so because you and @"Dave Hein" and @Rayman are pushing out changes faster than I can update my tools.

    I think my problem was that I assumed I can read ptra at start of my program just like I used to on the P1

    on the P1 I can do
    		rdlong myvar, par
    
    		or
    
    		mov myptr, par
    		rdlong myvar, myptr
    
    on the P2 it does not work like this, and I think it should
    
    If I do a
    
    		rdlong myvar, ptra
    it works
    but trying to do
    
    		mov myptr, ptra
    		rdlong myvar, myptr
    
    fails for reasons I do not understand
    
    

    But as stated before, your approach with a local array looks much cleaner and works.

    Attached now is a slightly better version.

    Enjoy!

    Mike
  • Progress. By changing cognew to coginit, my code now has a pulse.

    okay := cog := coginit(1,@loop,@command) + 1
    ' okay := cog := cognew(@loop, @command) + 1

    I don't know why that would make any difference.

    Thoughts?
  • ersmith wrote: »
    Hmmm, doing more testing I did find a bug in fastspin 3.9.15 in doing coginit() of a Spin (or BASIC) method. The memory for _clkmode and _clkfreq was overlapping some of the initialization code, which could sometimes cause the Spin method on the new COG to not start up properly. This bug was introduced when I added the -H and -E flags, so it wasn't in 3.9.14. It shouldn't affect PASM coginit though, only Spin coginit (and BASIC's "cpu" keyword applied to BASIC functions).

    I'll have a fix for it in the next release.

    Do you know if there is a similar issue with cognew?
  • Here is the next version of cogserial.

    Fighting with the interrupts, I decided to switch strategy. The driver now supports two pairs of full duplex serial channels, but you can use it as a single driver with just one pair. If two pairs of full duplex serial channels are used, the buffer size for each channel is halved: currently 128 bytes per channel if 2 ports are used, or 256 bytes per channel if just one port is used.

    The driver uses int1 for serial receive on rx1, int2 for serial receive on rx2, and int3, running on a timer, for checking transmit status and transmitting both output buffers.

    Right now I just run int3 every 500 sysclocks. This is just a test. I need to calculate the interval from the bit rate of the faster transmit channel (which I basically have), so that the interrupt fires about twice as often as needed to keep up with the fastest TX. But right now it is 500 sysclocks. I have the numbers but haven't done the math yet.

    This is work in progress, but a lot of fun for me. I need to think about a test harness running on other cogs to really stress this thing.

    But currently I have a 2 port full duplex serial buffered driver running in a cog, just needing 8 longs in HUB for communicating. This is fun...

    Mike
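The int3 interval calculation mentioned above might look something like the following. This is a sketch under stated assumptions, not the driver's actual math: it assumes 10 bits per character (8N1), takes "twice as often as needed" to mean two polls per character time on the fastest TX channel, and the function name `tx_poll_interval` is invented for illustration.

```python
def tx_poll_interval(clkfreq, baud_rates, bits_per_char=10, polls_per_char=2):
    """Sysclocks between TX-service interrupts, sized for the fastest channel."""
    fastest = max(baud_rates)
    # Sysclocks needed to shift out one character at the fastest baud rate.
    clocks_per_char = clkfreq * bits_per_char // fastest
    # Poll several times per character so the transmitter never sits idle long.
    return max(1, clocks_per_char // polls_per_char)


# Example with a 180 MHz sysclock (the clock used in the tests later in the thread).
interval = tx_poll_interval(180_000_000, [115_200, 230_400])
```

Under these assumptions the fixed 500-sysclock interval corresponds to a fastest TX rate of about 1.8 Mbaud at 180 MHz; slower channels could use a much longer interval.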
  • msrobots wrote: »
    I think my problem was that I assumed I can read ptra at start of my program just like I used to on the P1

    on the P1 I can do
    		rdlong myvar, par
    
    		or
    
    		mov myptr, par
    		rdlong myvar, myptr
    
    on the P2 it does not work like this, and I think it should
    

    Exactly the same thing (with ptra instead of par) *should* work on P2 -- there's no reason ptra should be treated any differently from any other register. Do you have time to play around with it a bit and figure out what's going on? Or to post the "broken" code? Most likely it's a typo or something in your code, but it would be nice to rule out a compiler or (heaven forbid) hardware bug.
  • ke4pjw wrote: »
    Progress. By changing cognew to coginit, my code now has a pulse.

    okay := cog := coginit(1,@loop,@command) + 1
    ' okay := cog := cognew(@loop, @command) + 1

    I don't know why that would make any difference.

    Thoughts?

    Is "loop" a Spin function or PASM code? If it's a Spin function then you could be running into the fastspin 3.9.15 bug I mentioned earlier; it's a memory corruption kind of thing that affects both cognew and coginit, so small code changes that seem unrelated can cause it to trigger. If "loop" is PASM code then that's not the problem.

    Otherwise coginit and cognew are pretty much the same (cognew is translated to coginit with a special first parameter that says "allocate a COG" instead of requiring a specific COG).

  • msrobots,
    Thanks for doing that code and with very good documentation. I can read it better than the original FDSR.
    I am trying to get waitcnt(clkfreq+cnt) to work.
    I assume that there is a slightly different way in this version of spin.
    I am using spin2gui. The c version uses this: waitcnt(getcnt() + CLKFREQ/2);
  • evanh Posts: 15,126
    I don't know what fastspin supports but the prop2 has WAITX instruction for doing that same function of just pausing for certain number of clocks. You don't fetch the CNT value then, just supply pause duration only.
  • I am working with msrobot's cogserial and it is in spin2.
    I will try the waitx. Will it work in a spin2 file?
  • ersmith Posts: 5,900
    edited 2019-01-27 22:09
    pilot0315 wrote: »
    msrobots,
    Thanks for doing that code and with very good documentation. I can read it better than the original FDSR.
    I am trying to get waitcnt(clkfreq+cnt) to work.
    I assume that there is a slightly different way in this version of spin.
    I am using spin2gui. The c version uses this: waitcnt(getcnt() + CLKFREQ/2);

    waitcnt(clkfreq+cnt) will work in fastspin and spin2gui for both P1 and P2 processors. The compiler will automatically translate it to whatever you need (a waitcnt instruction on P1, and waitct1 on P2).

    (This is for Spin code, of course. If you are writing PASM or PASM2 then you have to do the translation to waitct1 or waitx yourself).
  • I have seen what is called inline pasm. Not familiar with it. Is there a simple way to put the waitx using inline?
  • I figured it out. Been a while since I coded in spin.
    Thanks
  • ersmith wrote: »
    ke4pjw wrote: »
    Progress. By changing cognew to coginit, my code now has a pulse.

    okay := cog := coginit(1,@loop,@command) + 1
    ' okay := cog := cognew(@loop, @command) + 1

    I don't know why that would make any difference.

    Thoughts?

    Is "loop" a Spin function or PASM code? If it's a Spin function then you could be running into the fastspin 3.9.15 bug I mentioned earlier; it's a memory corruption kind of thing that affects both cognew and coginit, so small code changes that seem unrelated can cause it to trigger. If "loop" is PASM code then that's not the problem.

    Otherwise coginit and cognew are pretty much the same (cognew is translated to coginit with a special first parameter that says "allocate a COG" instead of requiring a specific COG).

    Yes, loop is the mailbox monitor in PASM. Very strange. I have no idea why coginit works and cognew does not.
  • msrobots Posts: 3,701
    edited 2019-01-29 03:54
    OK, some progress made.

    I am now using fastspin's feature of providing default values for parameters. That reduced the needed Spin code a lot. Wonderful, thanks @ersmith.

    I also did a lot of commenting to keep track of what it is supposed to do, and as far as I can see it does.

    The concept here is the full duplex driver, running in its own cog, is buffering 1 or 2 serial full duplex connections using interrupts and smart pins.
    Rx for both channels is bound to int 1 and 2, Tx for both channels uses int 3 and the cog itself just takes care of the mailbox to serve the calling program.
    The driver supports async access to both pairs of rx/tx and actually reads and writes the result itself to hub so the calling cog just needs to send off commands.

    You do not need to use two ports, if not enabled the second pair will not be used.

    A couple more days and I can slap an MIT license on it and put my name on the top. Right now it needs more documentation...

    Enjoy!

    Mike
  • Now I am stress testing the driver and finding some issues with smartpins.

    I have one main cog using one serial driver to talk to the terminal. (2 COGS)
    via mailbox I start a testrunner COG running tests on a second serial driver COG (2 COGS)
    I also have an echo COG running a third serial driver COG. This one reads its RX and writes to its TX.

    The testrunner clears a RAM buffer, transfers 16K of ROM over serial and back into a RAM buffer, and then compares the buffer with the ROM to see whether it has done its job correctly.

    If I run the buffered driver talking to itself with 2 SPs (RX1 listening to TX1) it runs up to and fails at 90085400 baud. seems OK.
    If I run the buffered driver talking to itself with 4 SPs (RX1 listening to TX1 and RX2 listening to TX2) it runs up to and fails at 90085400 baud. seems OK.

    Now I activate the echo server in between. So the testrunner sends on TX1, the echoserver receives on his RX1, sends out on his TX1 and the testrunner receives on RX1.

    And now it fails already at 921600 baud with one channel and when using both channels it fails already at 460800.

    All tests are running at 180 MHz and using SPs 0-7.

    What I do not understand is the drastic reduction of transfer speed when using the echo server in between.

    the main file to run is testserial.spin2 that will use/include the other files.

    Maybe someone can look at it if I made some stupid mistake. Or test the driver with some other tool, cogserial.spin2 uses cogserialpasm.spin2.

    I simply do not understand why echo is so slow.

    Help needed,

    Mike
  • jmg Posts: 15,140
    msrobots wrote: »
    ...
    If I run the buffered driver talking to itself with 2 SPs (RX1 listening to TX1) it runs up to and fails at 90085400 baud. seems OK.
    If I run the buffered driver talking to itself with 4 SPs (RX1 listening to TX1 and RX2 listening to TX2) it runs up to and fails at 90085400 baud. seems OK.
    you mean 90MBd ? Nice to see those speeds hit :)
    A RX up to SysCLK /2 is not going to be practical between two P2's that are not phase locked.

    msrobots wrote: »
    Now I activate the echo server in between. So the testrunner sends on TX1, the echoserver receives on his RX1, sends out on his TX1 and the testrunner receives on RX1.

    And now it fails already at 921600 baud with one channel and when using both channels it fails already at 460800.

    All tests are running at 180Mhz and using SPs 0-7.

    What I do not understand is the drastic reduction of transfer speed when using the echo server inbetween.
    I simply do not understand why echo is so slow.
    How does it fail ? - are early bytes ok, and later ones fail ?
    Can you add a char counter to each stage, and check those after a run ?
    I've found char counters a great cross check for serial stress testing.
    I also send blocks of "U" and check the MHz with a frequency/edge counter - then compare that with the expected baud rates.
    This finds (usually undocumented) creepage issues in the links. eg Sometimes, extra stop bits are added at high baud rates.

  • msrobots Posts: 3,701
    edited 2019-02-03 01:19
    - you mean 90MBd ?
    yes, I increment by 115200 and it fails at 90085400, so 90 could work.
    and it does that full duplex with two pairs of RX and TX.

    - A RX up to SysCLK /2 is not going to be practical between two P2's that are not phase locked.

    Phase locked: that might be part of the problem. The delay shows up when two different COGs are reading the same smart pins. To be clear, the RX smart pins always read the pin next to them, which is driven as a TX smart pin. TX and RX each have their own smart pin, but RX reads the neighboring pin so that I do not have to put resistors between the pins.

    - how does it fail.

    good question. I clear a 16K buffer then send the ROM content async , receive it async, wait for completion of write and read, then compare buffer with rom, if not equal, fail.

    I also have time-outs on RX if nothing is there, but I am not sure yet if they even hit.

    -Can you add a char counter to each stage, and check those after a run ?

    I think I can try that to see when it fails.

    -I also send blocks of "U" and check the MHz with a frequency/edge counter - then compare that with the expected baud rates.
    -This finds (usually undocumented) creepage issues in the links. eg Sometimes, extra stop bits are added at high baud rates.

    Could you maybe try that? I do not have the equipment to do so, and that might be the case here.

    But my primary guess is that I have a stupid typo somewhere, checking rxWhatever instead of txWhatever.

    EDIT: ahh - I forgot

    When transmitting is successful I 'lose' about 15-90 sysclocks per byte when comparing the set baud rate with the sysclocks used. But that is the call overhead and seems to be quite constant.
    When failing this goes up to 300.

    Enjoy!

    Mike

  • jmg Posts: 15,140
    msrobots wrote: »
    phase locked, that might be part of the problem, the delay shows up when two different COGs reading the same smartpins, or to be clear there the RX smart pins are always reading the pin next to them driven as TX smart pin. TX and RX has each a own smart pin, but rx reads the pin next to it for not having to put resistors between the pins.
    I guess you could try jumpering pins on the headers, to avoid the next-pin mapping, as a check, but I would not expect such a large effect from sampling shifts.
    ie If you go from 90MBd to under 1MBd that's a massive drop.
    IIRC Chip has reported samples-per-bit in the order of 3-4-5 are needed for true ASYNC, (ie between two separate clocked P2's ) and many MCUs have x8 sample UART modes.
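To put numbers on the sampling argument, here is a small Python sketch relating sysclock, baud rate, and samples per bit. It is plain arithmetic, assuming the 180 MHz sysclock quoted in the test description; the function names are invented for illustration.

```python
def clocks_per_bit(clkfreq, baud):
    """Sysclocks available per serial bit at the given baud rate."""
    return clkfreq / baud


def max_async_baud(clkfreq, samples_per_bit):
    """Highest baud rate that still leaves the given number of samples per bit."""
    return clkfreq // samples_per_bit


CLKFREQ = 180_000_000

loopback_limit = clocks_per_bit(CLKFREQ, 90_085_400)   # just under 2 sysclocks per bit
echo_limit = clocks_per_bit(CLKFREQ, 921_600)          # roughly 195 sysclocks per bit
async_limits = {n: max_async_baud(CLKFREQ, n) for n in (3, 4, 5, 8)}
```

At 3 to 5 samples per bit the ceiling works out to 60, 45, and 36 MBd at this clock, so a failure at 921600 baud, with about 195 sysclocks per bit, is hard to attribute to sampling alone.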

  • jmg wrote: »
    msrobots wrote: »
    phase locked, that might be part of the problem, the delay shows up when two different COGs reading the same smartpins, or to be clear there the RX smart pins are always reading the pin next to them driven as TX smart pin. TX and RX has each a own smart pin, but rx reads the pin next to it for not having to put resistors between the pins.
    I guess you could try jumpering pins on the headers, to avoid the next-pin mapping, as a check, but I would not expect such a large effect from sampling shifts.
    ie If you go from 90MBd to under 1MBd that's a massive drop.
    IIRC Chip has reported samples-per-bit in the order of 3-4-5 are needed for true ASYNC, (ie between two separate clocked P2's ) and many MCUs have x8 sample UART modes.

    Yes, I was considering skipping the smart pin reading the pin next to it and jumpering the pins instead, but I am afraid to just jumper them. I need some resistors; I don't want to fry pins. And all my electronics stuff is still in boxes, since I moved recently.

    But since reading next pin works with a single COG, it should work with two COGS except that it doesn't.

    And you are exactly right, going from 90 down to 1 does not really make sense. It never hangs on TX but hangs on RX as far as I could see.

    IIRC Chip has reported samples-per-bit in the order of 3-4-5 are needed for true ASYNC, (ie between two separate clocked P2's ) and many MCUs have x8 sample UART modes.

    my RXcheck times out after 100_000 cycles so about 800_000 sysclocks, TX1 and TX2 using int3, RX1 int1, RX2 int2.

    maybe putting TXes on int1?

    I am grasping at straws right now.

    Enjoy!

    Mike
  • jmg Posts: 15,140
    msrobots wrote: »
    ..
    But since reading next pin works with a single COG, it should work with two COGS except that it doesn't.

    One thing that does/can change, with 2 COGs vs one, is the relative opcode phase, since opcodes are 2 sysclks.
    ie Talking-to-self would always be opcode-phase-locked, but talking to another might be off by half an opcode time. Maybe that matters ?


    Can you add an extra stop bit to Tx ? That can give more tolerance to creep, and it may change the failure frequency.


  • First of all good call with - how does it fail

    because it is different. The one pair (RX1/TX1) version fails with a timeout on RX1, but the two pair version (RX1/TX1 and RX2/TX2) fails with buffer check wrong.

    Not sure why, but at least some hint.

    As for 2 stop bits, that might be worth a try. I just don't know how to do that with smart pins; I must read a bit about that.

    as for being off 1 or 2 clocks, I do not think that this would explain 90Mbit vs 10Mbit

    The current version produces this when using just one rx/tx pair and the echo server:
    running at baud 691200
      45061683 - PASS - 639204 - 146
      45061619 - PASS - 639204 - 146
      45061723 - PASS - 639204 - 146
    
    running at baud 806400
    -100297363 - FAIL - 6865 - -8353 - timeout rx1
    - 37563451 - FAIL - 6764 - -4524 - timeout rx1
    -100292443 - FAIL - 6865 - -8353 - timeout rx1
    
    And this when using two pairs (rx1/tx1 and rx2/tx2), with the echo server also using two pairs:
    running at baud 345600
      86027293 - PASS - 334821 - 42
      86027373 - PASS - 334821 - 42
      86027389 - PASS - 334821 - 42
    
    running at baud 460800
    -201026197 - FAIL - 7034 - -16175 - buffer 2 does not check
    - 65026469 - FAIL - 6808 - -7874 - buffer 2 does not check
    -201026493 - FAIL - 7034 - -16175 - buffer 2 does not check
    

    The first number is the sysclocks taken for the test (negative on errors).
    The number after PASS is the effective baud rate including code overhead, and the third number is the deviation in sysclocks per byte caused by that overhead.


    leaving the echo COG out and just running smartpins in one COG:

    on the top end I seem not to outrun the SPs, but the processing code

    First number: sysclocks taken; second: effective baud rate; third: deviation, i.e. gaps between chars in sysclocks.
    running at baud 80640000
       4097059 - PASS - 7031250 - 228
       4097059 - PASS - 7031250 - 228
       4096963 - PASS - 7031250 - 228
    
    running at baud 80755200
       4097059 - PASS - 7031250 - 228
       4097123 - PASS - 7031250 - 228
       4096915 - PASS - 7031250 - 228
    
    

    same goes for the two port code.

    I am still digging here...

    Enjoy!

    Mike
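For anyone cross-checking the PASS rows above, the effective baud column can be reproduced from the sysclock column. This Python sketch is inferred from the posted figures, not taken from the test code: it assumes 16,000 bytes per run (the post only says "16K"), 10 bits per byte, a 180 MHz sysclock, and integer division at each step. The third column is not reproduced here.

```python
def effective_baud(sysclocks, clkfreq=180_000_000, nbytes=16_000, bits_per_byte=10):
    """Effective baud rate from total sysclocks (all defaults are assumptions)."""
    clocks_per_byte = sysclocks // nbytes          # integer sysclocks per byte
    return clkfreq * bits_per_byte // clocks_per_byte


echo_one_pair = effective_baud(45061683)    # row from the 691200 baud run
echo_two_pairs = effective_baud(86027293)   # row from the 345600 baud run
single_cog = effective_baud(4097059)        # row from the 80640000 baud run
```

Under these assumptions the single-COG runs come out at 256 sysclocks per byte regardless of the configured baud rate, which fits the remark that the processing code, not the smart pins, is the limit at the top end.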