I have begun my Propeller 2 journey!

RossH · 2023-04-25 08:46

Hello @enorton

I have uploaded a second beta release to address the issue I found in the 8 port serial driver. It was not really a bug as such - it looks like it was by design. If the receive buffer ever filled up, any new character overwrote the oldest character in the buffer, which effectively discarded the entire buffer and started over. This behavior doesn't make much sense to me, so I have modified it so that new characters are discarded if there is no space in the buffer to store them. Previously received characters are never discarded.

This makes the 8 port serial driver match the behavior of the 2 port serial driver. It should make no difference to a program if the receive buffer is never allowed to completely fill up.

Ross.

enorton · 2023-04-25 23:34

@RossH said:

@enorton said:

I just tested one of my products and the buffer function shows the maximum buffer size and when the buffer is used it subtracts from it. The way you have your buffers set up is I assume increments by how many are sent or available and the same for receive buffer. Is this correct? I had to build some boards today and have not been able to devote time to development today. I may get back on the project tomorrow afternoon and will take a closer look at how all of this works and report back to you.

What you describe is true of the s8_txcheck() function for the tx buffer (but note that s8_rxcheck() does not implement the corresponding function for the rx buffer - it does not return a count, it returns either -1 or the next character in the rx buffer).

The new functions are s8_txcount() and s8_rxcount(), which return the number of characters in the respective buffer i.e. from 0 to the buffer length - 1.

For example, the ex_wait.c test program fills the tx buffer, waits a few milliseconds, and then fetches the value of s8_txcount(). Changing the argument of the _waitms() function call (which is 5 by default) allows more or less time for characters to be sent, so you will see the count change. Here are the values I see for various wait times:

_waitms(0)  : 410
_waitms(5)  : 297
_waitms(10) : 184
_waitms(15) : 71
_waitms(20) : 0

To see the value of s8_rxcount() change, simply press a few keys faster than the test program prints them, and the rx count will keep rising, then fall again when you stop pressing keys. It should never go higher than the buffer size even if you just hold a key down forever, but I just noticed while writing this answer that it wraps back to zero, which is a bug!

I will fix that and post a new version soon.

Ross.

I think there is a miscommunication about what I need. The buffer size check is like so:

#define RX_BUFFER_SIZE 8192

uint16_t check_rx_buffer_size(void)
{
    uint16_t rtail = serial_rx_buffer_tail; // Copy to limit multiple reads of the volatile
    if (serial_rx_buffer_head >= rtail) { return (RX_BUFFER_SIZE - (serial_rx_buffer_head - rtail)); }
    return (rtail - serial_rx_buffer_head - 1);
}

When the program runs, the RX buffer check returns 8192 when idle. When the machine is in use, the value decreases (for example 8012, 7910, etc.). The main program checks how much free space the RX buffer has and then sends that many more bytes to keep the buffer full at all times; otherwise the system slows down. I hope this helps to clarify what I need. Without this buffer check, I cannot proceed to full-speed testing to verify the P2 chip will work for what I want it to do.

enorton · 2023-04-25 23:36

I guess I could do something like this: (RX_BUFFER_SIZE - s8_rxcount(UART1))

RossH · 2023-04-26 00:15

@enorton said:
I guess I could do something like this: (RX_BUFFER_SIZE - s8_rxcount(UART1))

Yes. I didn't implement sX_rxcheck() to correspond with sX_txcheck() because that name had already been used for something else in some of the original Spin drivers, and I maintain compatibility with Spin where possible. So I implemented sX_rxcount() and sX_txcount() instead. Use a #define:

#define rxcheck(PORT) (RX_BUFFER_SIZE - s8_rxcount(PORT))

Ross.

enorton · 2023-04-27 14:48

Hi Ross,

Ok, I have a question. I remember a little while back you mentioned something along the lines of the serial ports being processed by a separate cog. Is this true? Are the functions sX_tx() and sX_rx() passed to a "serial" cog for further processing? I finally was able to get one of the motors moving, but when data starts getting heavy it exhibits the same behavior as all the other microcontrollers I used before the P2 chip. If serial data IS processed by another cog, this is good, and I may need to do some other trickery here to figure things out.

Is there a way to convert the serial code to C code so that I can change things? I'd like to convert the module you created to custom functions I set up in a new cog to mess with the buffers and also change the RX buffer check to my liking. I really want the P2 chip to work and not being able to tweak things is driving me crazy.

On another note, I think the math functions may be a little too slow. I noticed some slow processing of sqrt() function. Can I pass the sqrt() function to the CORDIC for faster processing? Can other math functions be passed to the CORDIC? Either way, it is slower than what I am used to. The Nuvoton chip I used previously has an onboard FPU that processed all of the ARM math functions and when finished it would pass the results back to the application. I want to do something similar here if possible. I noticed you said the -lmc option uses a COG for math processing. Is this true for all of the math functions or some? Can you elaborate?

RossH · 2023-04-27 23:21

@enorton said:
Ok, I have a question. I remember a little while back you mentioned something along the lines of the serial ports being processed by a separate cog. Is this true? Are the functions sX_tx() and sX_rx() passed to a "serial" cog for further processing? I finally was able to get one of the motors moving, but when data starts getting heavy it exhibits the same behavior as all the other microcontrollers I used before the P2 chip. If serial data IS processed by another cog, this is good, and I may need to do some other trickery here to figure things out.

Yes, every Catalina "plugin" represents one or more cogs separate to the cog running the main program. In the case of the serial drivers, all the serial data handling is done by one cog, which interacts with the main cog via one or more buffers in Hub RAM. The main program speed, and the speed of accessing the serial data should not be affected by either the size of the serial buffers or number of characters in them. If it is, then the issue is likely to be in your program rather than the serial plugin or the interface code.

Is there a way to convert the serial code to C code so that I can change things? I'd like to convert the module you created to custom functions I set up in a new cog to mess with the buffers and also change the RX buffer check to my liking. I really want the P2 chip to work and not being able to tweak things is driving me crazy.

The interface code is C code, but the serial code is PASM. Look in source/lib/catalina_serial8 for the C interface code, and target_p2/MultiPortSerial.pasm for the PASM code. If you want to create your own custom interface library, just copy the C library code to a local directory called libserial8, build it as a local library, and then compile your program with -lserial8 as normal - Catalina always looks for a local library first.

On another note, I think the math functions may be a little too slow. I noticed some slow processing of sqrt() function. Can I pass the sqrt() function to the CORDIC for faster processing? Can other math functions be passed to the CORDIC? Either way, it is slower than what I am used to. The Nuvoton chip I used previously has an onboard FPU that processed all of the ARM math functions and when finished it would pass the results back to the application. I want to do something similar here if possible. I noticed you said the -lmc option uses a COG for math processing. Is this true for all of the math functions or some? Can you elaborate?

Yes, -lm uses a software floating point library written in C, whereas -lma, -lmb and -lmc use a separate cog to implement a PASM co-processor - this architecture is a legacy of the Propeller 1 days, which had no CORDIC processor built in. The -lmc co-processor uses the CORDIC functions and should be faster, so if you are using -lm, then try using -lmc instead. If this is still not fast enough, you can either locate another C or CORDIC floating point library, or implement the specific CORDIC functions you need in inline PASM taken from the CORDIC implementation in target/Catalina_Float32_C_Plugin.spin2 - this would require a little more work, but would give you the fastest solution.

Ross.

Electrodude · 2023-04-28 03:12

@RossH If you already have the architecture to support native software FP in C via -lm, why don't you also have a CORDIC version that runs from within the same cog? The P2 has none of the overhead that makes a coprocessor cog desirable on the P1.

RossH · 2023-04-28 03:49

@Electrodude said:
@RossH If you already have the architecture to support native software FP in C via -lm, why don't you also have a CORDIC version that runs from within the same cog? The P2 has none of the overhead that makes a coprocessor cog desirable on the P1.

Easy enough to do - all the CORDIC implementations are already written in PASM for the co-processor. It would only need a few tweaks to turn this code into a stand-alone set of PASM functions that can be called from C wrapper functions. I can add it to my "todo" list, but that list is already mighty long, and it keeps growing faster than I can take things off it!

Ross.

RossH · 2023-04-29 07:39

@Electrodude, @enorton

I just did a quick time trial of the software only floating point option vs the non-cordic plugin option vs the cordic plugin option. I used a mix of floating point operations (e.g. log, sin, tan, pow, exp etc).

The results (as I expected) are that using the cordic plugin (-lmc) is fastest. The non-cordic plugins (-lma and -lmb) take about twice as long, and the software only option (-lm) takes over five times as long.

As for a cordic software only solution, I would expect it to be faster still, but perhaps not by very much. While you avoid the overhead of sending each request off to another cog, you do have the overhead that it would be executing in HUBEXEC mode rather than COGEXEC mode, and also that it would need to be made both interrupt and thread safe, which is already implemented for all plugins. But the worst thing would be that all the temporary variables these operations need would have to either be in Hub RAM or on the stack (which amounts to the same thing) instead of in Cog RAM. So in the end the overall speed increase might be disappointing.

Ross.

P.S. I just realized that I uploaded the second beta release but forgot to change the folder name - you can find it here.

enorton · 2023-04-30 15:12

Thank you Ross

enorton · 2023-05-02 15:04

Hi Ross,

I am using the -lmc version and I have found it is fast enough for my purposes. I do apologize for thinking this was not the case. There are so many things going on in my firmware it's hard to pinpoint where the problems are coming from sometimes.

On another note, I had to use two more cogs to implement separate serial transmit and receive functions to get the firmware to behave correctly. The original serial cog is OK but does not perform as I need it to. I tried the rxcount function you created for me, and it isn't quite doing what I'd expect, so I created a workaround. I am using serial8 and, on top of that, two extra cogs for transmit and receive to handle the odd buffering checks I need to do, and so far it seems to work great. The only issue I am trying to figure out is how to make the buffer sizes larger for serial8. I might be blind, but I do not see anywhere I can set a buffer size for the serial ports in use. If you can enlighten me how to change this would be great. Right now, I have set up an 8192-byte receive buffer and a 4096-byte transmit buffer with the two extra cogs. I'd like to match these sizes for the serial8 module if possible.

RossH · 2023-05-02 21:39

@enorton said:
If you can enlighten me how to change this would be great. Right now, I have set up an 8192-byte receive buffer and a 4096-byte transmit buffer with the two extra cogs. I'd like to match these sizes for the serial8 module if possible

Just close the ports (if they have been configured to open automatically) and then re-open them with custom buffers.

The program demos\serial8_test_serial8_count.c gives an example.

enorton · 2023-05-03 04:34

Thanks Ross, problem sorted.
