Strange behaviour, my fault or compiler bug?

ManAtWork · 2024-05-07 09:30

I'm still having problems with that EtherCAT stuff. All I get is grey hair. This time it's the driver for the slave device controller. The P2 communicates via a really complicated DPRAM interface which includes several command and data registers and a FIFO that is connected to an SPI or QSPI bus. So the DPRAM is not memory mapped into normal address space; instead I have to write the address and length of the memory I need to access into registers and poll how many words are available in the FIFO. Then I can burst read the FIFO and wait/check for available data again. I think I have solved this, although the data sheet doesn't say how to work around bugs in the hardware and the original driver code from Microchip was quite buggy (it didn't check whether the data in the FIFO was ready and instead relied on the MCU being slow enough not to cause the buffer to run empty).
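For readers following along, the poll-then-burst access described above can be sketched roughly as follows. This is a small host-side simulation, not the LAN9252 driver: every name here (sim_dpram, sim_words_available, fifo_read_words and so on) is hypothetical, and the simulated FIFO simply releases a few words per poll so the loop has something to wait for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the chip: a tiny DPRAM whose FIFO releases
   only a few 32-bit words per poll. Not taken from the real driver. */
static uint32_t sim_dpram[64];
static unsigned sim_next, sim_remaining, sim_ready;

/* Corresponds to writing address and length into the command registers. */
static void sim_start_read(unsigned first_word, unsigned word_count)
{
    sim_next = first_word;
    sim_remaining = word_count;
    sim_ready = 0;
}

/* Each poll makes up to 3 more words available, at most 16 queued. */
static unsigned sim_words_available(void)
{
    unsigned grant = sim_remaining < 3 ? sim_remaining : 3;
    if (sim_ready + grant > 16) grant = 16 - sim_ready;
    sim_ready += grant;
    sim_remaining -= grant;
    return sim_ready;
}

static uint32_t sim_pop_word(void)
{
    assert(sim_ready > 0);   /* popping an empty FIFO is the bug to avoid */
    sim_ready--;
    return sim_dpram[sim_next++];
}

/* Read word_count words, never popping more than the FIFO reports ready.
   A real driver would probably also want a timeout on the outer loop. */
void fifo_read_words(uint32_t *dst, unsigned first_word, unsigned word_count)
{
    sim_start_read(first_word, word_count);
    while (word_count > 0) {
        unsigned avail = sim_words_available();      /* poll */
        if (avail > word_count) avail = word_count;
        while (avail-- > 0) {                        /* burst read */
            *dst++ = sim_pop_word();
            word_count--;
        }
    }
}
```

With sim_dpram filled beforehand, a call such as fifo_read_words(out, 5, 10) copies words 5 through 14 in several small bursts. The assert in sim_pop_word stands for the check the original driver reportedly skipped.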

Anyway, my code is now working as long as I only test one operation at a time. But as I add more code things start to get weird. If I add code to the end of main() I suddenly get crashes but not in the added code but somewhere near the beginning.

I thought it's a problem of corrupted memory like a buffer overflow writing past array boundaries. So I added guards to my buffers. The following code works perfectly...

enum {_clkfreq = 200_000_000, _xtlfreq = 25_000_000};
enum {pinEscBase = 48};

#include "LAN9252_driver.h"
#include <stdio.h>
#include <string.h> // memset

static uint8_t zeroes[260];
static uint8_t buffer[260] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};

void PrintBuffer ()
{
    printf ("buffer = \n");
    for (int i=0; i<256; i+= 16)
    {
        for (int j=0; j<16; j++) printf ("%02x ", buffer[i+j]);
        printf ("\n");
    }
}

void main ()
{
    PDI_Init (pinEscBase);
    // direct register access test
    printf ("byte-order = %x\n", PDIReadLAN9252DirectReg (LAN9252_BYTE_ORDER_REG));
    int t= 0;
    // indirect register access (ECat core) test
    SPIReadRegUsingCSR (&t, 0, 4); // <- crashes here
    printf ("type = %x\n", t); // <- print does not appear
    // DPRAM + FIFO write test
    memset (zeroes, 0, 256);
    zeroes[256]= 0xaa;
    buffer[256]= 0x55; // add guards

    SPIWritePDRamRegister (zeroes, 0x1000, 256); // clear
    SPIWritePDRamRegister (buffer, 0x106f, 15); // misaligned write
    memset (buffer, 1, 256);
    SPIReadPDRamRegister (buffer, 0x1000, 256); // read back
    PrintBuffer ();

    // DPRAM + FIFO read test
    for (int i=0; i<256; i++) buffer[i]= i;
    SPIWritePDRamRegister (buffer, 0x1000, 256); // fill
    memset (buffer, 0, 256); // clear
    SPIReadPDRamRegister (buffer+3, 0x1031, 15); // misaligned read 
    PrintBuffer ();
    printf ("Z256=%x B256=%x\n", zeroes[256], buffer[256]);// check guards
    while (1) {};
}

... and prints out exactly what I expected:

Cog0  INIT $0000_0000 $0000_0000 load
Cog0  INIT $0000_0404 $0000_0000 load
byte-order = 87654321
type = 2c0
Wavail=16 cnt=256
Wavail=13 cnt=192
Wavail=15 cnt=140
Wavail=13 cnt=80
Wavail=15 cnt=28
Wavail=16 cnt=15
Ravail=5
Ravail=16
Ravail=13
Ravail=15
Ravail=13
Ravail=2
buffer =
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 
02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Wavail=16 cnt=256
Wavail=13 cnt=192
Wavail=15 cnt=140
Wavail=13 cnt=80
Wavail=15 cnt=28
Ravail=4
buffer =
00 00 00 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d
3e 3f 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Z256=aa B256=55

But when I comment out the last printf(), it crashes at the line marked above: SPIReadRegUsingCSR (&t, 0, 4); // <- crashes here.

Any ideas how I can debug this? I think I can rule out the following possible causes:

timing: code that isn't yet executed cannot affect the timing of previous actions
single writes to wrong addresses: if some of the data went to wrong indexes there would be gaps in the expected output (as long as I don't write twice)
continuously writing past the end of the buffer: the guards are still there

I could post the complete code but it only runs if you have the hardware (LAN9252) connected so it wouldn't be of much use. BTW I use flexspin V6.9.4.

evanh · 2024-05-07 10:27

Could be out of heap or stack problems. The main heap is small by default. Do you launch Spin or C into other cogs?

Wuerfel_21 · 2024-05-07 10:43

@ManAtWork said:
I could post the complete code but it only runs if you have the hardware (LAN9252) connected so it wouldn't be of much use. BTW I use flexspin V6.9.4.

Didn't you send me one like this? I'd have to find it again. I was meaning to ask for docs on it, anyways.

ManAtWork · 2024-05-07 12:31

@evanh said:
Could be out of heap or stack problems. The main heap is small by default. Do you launch Spin or C into other cogs?

Are you sure? I thought the main stack is all the remaining hub RAM by default. I don't launch any cogs and this test program doesn't use the heap (malloc/free).

@Wuerfel_21 said:
Didn't you send me one like this? I'd have to find it again. I was meaning to ask for docs on it, anyways.

I've sent you one of the general purpose Ethernet accessory boards. It has a simple PHY chip (LAN8720) and only one RJ45 connector. The one I use here is a special EtherCAT slave controller (LAN9252) with two RJ45 jacks.

The driver for the Ethernet PHY is in the OBEX and the "docs" can be found here:
https://forums.parallax.com/discussion/174351/rmii-ethernet-interface-driver-software
https://forums.parallax.com/discussion/174402/ethernet-phy-accessory-board

Wuerfel_21 · 2024-05-07 12:51

I've sent you one of the general purpose Ethernet accessory boards. It has a simple PHY chip (LAN8720) and only one RJ45 connector. The one I use here is a special EtherCAT slave controller (LAN9252) with two RJ45 jacks.

Ah, I just remembered LAN-something-something ~

ManAtWork · 2024-05-07 12:58

I forgot to mention that Eric said there is a bug in flexspin 6.9.4:

Thank you for the bug report! The root cause is that flexspin didn't know that unions and structures that contain byte arrays need to be placed in memory rather than in registers. That is fixed in version 6.9.5, which is on github.

But this caused a crash of the compiler itself, and I think it doesn't have anything to do with the above problems, as plain byte arrays should work.

rogloh · 2024-05-07 13:04

Perhaps try taking &t from a global variable rather than from a local variable that is on the stack? Just to see if there are any differences in behaviour. The only thing that removing the printf (which triggers the bug) changes is the memory layout.

// indirect register access (ECat core) test
SPIReadRegUsingCSR (&t, 0, 4); // <- crashes here

ManAtWork · 2024-05-07 13:23

With t declared as a global variable it always crashes, with or without the printf().
However, if I comment out everything below // DPRAM + FIFO read test it works again. There must be a serious memory problem.

ersmith · 2024-05-07 13:27

@ManAtWork have you tried compiling with different optimization settings? Most compiler bugs are in the optimizer, so -O0 is less likely to trigger them.

Another thing that might be worth checking is trying it with bytecode output (-2nu instead of -2), which will use an entirely different code generation.

You mentioned that removing the printf causes earlier code to crash. If instead of printing buffer[256] and zeroes[256] you do something else with them (writing them to a dummy register or something like that) does the code still work?

ManAtWork · 2024-05-07 13:39

With the last printf() commented out (used to crash) I get these results with different optimizations:

full optimization: also crashes but later
no optimization: works perfectly until the end
default optimization: crashes at SPIReadRegUsingCSR() (same as above)
size optimization: works perfectly until the end
print or BRK debug makes no difference
Edit: Bytecode always works regardless of the optimization level

ManAtWork · 2024-05-07 13:50

BTW, does the keyword "volatile" have any effect on the Propeller with flexspin? I thought that the compiler automatically places variables in hub memory if I request an address of it (with & in C or @ in Spin), or at least tries to do so and throws an error if it's not possible. But I may be wrong...

ersmith · 2024-05-08 13:30

@ManAtWork said:
With the last printf() commented out (used to crash) I get these results with different optimizations:

full optimization: also crashes but later

no optimization: works perfectly until the end

default optimization: crashes at SPIReadRegUsingCSR() (same as above)

size optimization: works perfectly until the end

print or BRK debug makes no difference

Edit: Bytecode always works regardless of the optimization level

That does sound suspiciously like an optimizer bug in the assembly language output. I'll try to figure it out. In the meantime, it sounds like you can work around it?

@ManAtWork said:
BTW, does the keyword "volatile" have any effect on the Propeller with flexspin? I thought that the compiler automatically places variables in hub memory if I request an address of it (with & in C or @ in Spin), or at least tries to do so and throws an error if it's not possible. But I may be wrong...

flexspin parses "volatile" and keeps track of it, but doesn't actually do anything with it yet. It is still useful documentation for readers of the code.

ManAtWork · 2024-05-10 11:25

@ersmith said:
That does sound suspiciously like an optimizer bug in the assembly language output. I'll try to figure it out. In the meantime, it sounds like you can work around it?

Yes, I think I can work around it by compiling with -O0. My problem is that I've almost lost confidence in everything I do. The majority of bugs are my own fault but the hardware often does not behave as expected and if I can't trust the compiler it adds another level of trouble and my motivation to do anything drops.

Would it help to send you the hardware? A few $ don't matter and it's far easier than to write a simulated driver for testing that doesn't require special hardware.

flexspin parses "volatile" and keeps track of it, but doesn't actually do anything with it yet. It is still useful documentation for readers of the code.

Ah, good to know.

ersmith · 2024-05-10 16:24

@ManAtWork said:

@ersmith said:
That does sound suspiciously like an optimizer bug in the assembly language output. I'll try to figure it out. In the meantime, it sounds like you can work around it?

Yes, I think I can work around it by compiling with -O0. My problem is that I've almost lost confidence in everything I do. The majority of bugs are my own fault but the hardware often does not behave as expected and if I can't trust the compiler it adds another level of trouble and my motivation to do anything drops.

I'm sorry. It might be wise for you to port all of your code to C, and then you could use a completely different compiler (like Catalina, riscvp2, or p2llvm) as a backup. Or, you could port it to Spin2 and then have the official Spin2 compiler as a backup.

Would it help to send you the hardware? A few $ don't matter and it's far easier than to write a simulated driver for testing that doesn't require special hardware.

I've recently started a new job and so unfortunately I don't have much time to work on Parallax things. So it might not help, my main constraint is time.

I did look at the generated code and it looks OK -- the code with and without the problematic printf is identical (except for the printf calls of course), which surprised me. I thought something about the printf might be causing the optimizer to change the code. Since it isn't, this makes me less convinced that it's an optimizer bug (although of course that's still a possibility). I'm now wondering about a memory overflow or bad pointer.

When you say the code "crashes", I presume you mean it hangs and never comes back? Are you able to insert some LED on/off code to figure out where it first goes wrong?

Regards,
Eric
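The LED idea from the post above could look something like this. It is only a sketch under assumptions: led_write is a hypothetical hook that here just records the value so the code also runs on a PC; on the P2 it would drive real pins with whatever pin functions the toolchain provides. Keep in mind that adding these calls still changes the code layout, so the symptom may move.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hook: on real hardware this would set LED pins.
   Here it only stores the pattern so the sketch runs anywhere. */
static uint8_t led_state;
static void led_write(uint8_t pattern) { led_state = pattern; }

/* Show the id of the step being entered before doing anything else. */
void checkpoint(uint8_t id)
{
    assert(id != 0);        /* 0 is reserved for "nothing reached yet" */
    led_write(id);
}

/* Demo: three steps; failing_step simulates the step that hangs.
   Returns what the LEDs would show when execution stops. */
uint8_t demo_sequence(int failing_step)
{
    for (int step = 1; step <= 3; step++) {
        checkpoint((uint8_t)step);         /* mark step as entered */
        if (step == failing_step) break;   /* stand-in for a hang */
    }
    return led_state;
}
```

If the LEDs stay at 2, the program entered step 2 and never reached step 3. In the test program from the first post, the checkpoints would go before PDI_Init, before SPIReadRegUsingCSR, and so on.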

evanh · 2024-05-11 01:27

@ManAtWork said:

@evanh said:
Could be out of heap or stack problems. The main heap is small by default. Do you launch Spin or C into other cogs?

Are you sure? I thought the main stack is all the remaining hub RAM by default. I don't launch any cogs and this test program doesn't use the heap (malloc/free).

There are certain libraries that do use the heap anyway. Not that I can remember what Eric had said now though .... EDIT: ah, file/dir access does.

It's only a kByte or two by default. To increase it, create an enum, e.g. HEAPSIZE = 8800
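Written out in the same enum style as the clock settings in the first post, that would presumably be the following. It is assumed here that the declaration belongs in the top-level C file of the project; 8800 is simply the value from the post above, not a recommendation.

```c
/* Assumption: placed in the top-level C file of a flexspin project.
   8800 bytes is only the example value from the thread. */
enum { HEAPSIZE = 8800 };
```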

Rayman · 2024-05-11 01:32

I always increase heap size

evanh · 2024-05-11 01:49

I think that's something that needs structurally changed in Flexspin. Stacks should be defined on a per-task/cog basis. The heap should be singular, allocated via a lock, and is all remaining space.

And stacks would actually be allocated from the heap.

Electrodude · 2024-05-11 03:50

@evanh said:
I think that's something that needs structurally changed in Flexspin. Stacks should be defined on a per-task/cog basis. The heap should be singular, allocated via a lock, and is all remaining space.

And stacks would actually be allocated from the heap.

How difficult would it be to do the thing fancy malloc implementations do, where cogs hold on to some of their own recently-freed allocations and first try to service allocations from there, and then only bother with locking when those are all used up or not big enough?
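As a rough illustration of the idea in the post above (not anything flexspin actually implements), a per-cog cache in front of a locked shared heap could be sketched like this. All names are hypothetical, malloc/free stand in for the shared heap, and the "lock" is only a counter so the effect can be observed on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 4
#define BLOCK_SIZE  64   /* the cache only handles blocks up to this size */

/* One per cog; only its owner touches it, so it needs no lock. */
typedef struct {
    void *slot[CACHE_SLOTS];
    int   count;
} CogCache;

static int lock_count;   /* how often the simulated shared lock was taken */

/* Stand-ins for the shared, locked heap. */
static void *shared_alloc(size_t size)
{
    lock_count++;        /* a real version would acquire a lock here */
    return malloc(size);
}

static void shared_free(void *p)
{
    lock_count++;
    free(p);
}

void *cog_alloc(CogCache *c, size_t size)
{
    if (size <= BLOCK_SIZE && c->count > 0)
        return c->slot[--c->count];            /* fast path, no lock */
    if (size < BLOCK_SIZE) size = BLOCK_SIZE;  /* keep small blocks reusable */
    return shared_alloc(size);
}

/* The caller passes the size it originally asked for. */
void cog_free(CogCache *c, void *p, size_t size)
{
    assert(p != NULL);
    if (size <= BLOCK_SIZE && c->count < CACHE_SLOTS)
        c->slot[c->count++] = p;               /* keep it for this cog */
    else
        shared_free(p);
}
```

Allocating, freeing and allocating again on the same cog takes the lock only once, because the second request is served from the cog's own cache. The price is memory that sits idle in one cog's cache while another cog might need it.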

ManAtWork · 2024-05-11 09:05

I don't think it has anything to do with heap or stack overflow issues. As I said, this program doesn't use malloc() at all and I don't start a separate cog with its own (limited) stack. There are no recursive calls and the memory usage of local variables is quite low.

I'm not 100% sure but I think there are only two possible reasons.
1. A bug in my code that leads to corrupted memory (invalid pointers, uninitialized variables...)
2. A problem with the compiler like overlapping or misaligned memory for variables
Statistically, 1 is more likely and I don't want to point the finger at Eric. I think it's my job to debug this. The problem is how. Without a single step debugger the only way is adding printf()s or other ways of state/progress output. But when the length of the code influences the behaviour it gets difficult.

@ersmith said:
I'm sorry. It might be wise for you to port all of your code to C, and then you could use a completely different compiler (like Catalina, riscvp2, or p2llvm) as a backup. Or, you could port it to Spin2 and then have the official Spin2 compiler as a backup.

That's not really an option. I haven't managed to get a hello-world compiled with Catalina and I fear riscvp2 and p2llvm are even more "for hackers only". Having to port all code to C would mean existing drivers are useless and I'm completely on my own. Propeller Tool has been announced EOL by Parallax. PNut is still an option but only for very small projects. The possibility to mix existing Spin code with C is so ingenious that I don't want to miss it. I'm committed to Flexspin and I would rather quit using the P2 than use a different compiler (at least for my big projects).

That leads to the question of whether exactly this feature might be the problem in this case. If there was a serious problem with the compiler, somebody else would have found it earlier. I think the ability of Flexspin to compile Spin and C (separately) is well tested. But maybe the way I use structs and pass parameters from C to Spin triggers some very special case nobody has run into before.

I could port the main code to Spin and compile it with PNut. But that doesn't prove anything. It also runs well with Flexspin if I compile to byte code (-2nu).

evanh · 2024-05-11 11:15

@ManAtWork said:
Statistically, 1 is more likely and I don't want to point the finger at Eric. I think it's my job to debug this. The problem is how. Without a single step debugger the only way is adding printf()s or other ways of state/progress output. But when the length of the code influences the behaviour it gets difficult.

I've also had odd super-touchy conditions like you're having but I really can't remember what those turned out to be any longer. I'll keep poking around my old code to see if I can find anything ...

That's not really an option. I haven't managed to get a hello-world compiled with Catalina and I fear riscvp2 and p2llvm are even more "for hackers only". Having to port all code to C would mean existing drivers are useless and I'm completely on my own....

Catalina does have support for Spin/Pasm along the lines of driver objects I think. It's maybe not drop-in though, but rather one has to adapt it. I'm guessing.

iseries · 2024-05-11 11:20

I have been there, banging my head against the wall trying to figure out why this simple code is not working. Adding print statements sometimes hides the problem by slowing down the code or causes other problems.

I also look at the assembly code that is generated to see if it looks right. Very helpful.

In most cases I find it's a timing issue where a variable does not have the value I think it should.

I think mixing Spin and C is a bad idea. Two different worlds.

I always build my own library functions.

Mike

evanh · 2024-05-11 12:09

@evanh said:
I've also had odd super-touchy conditions like you're having but I really can't remember what those turned out to be any longer. I'll keep poking around my old code to see if I can find anything ...

I remembered one where I was helping out and, after long futile exercises in refining his code, it turned out there was actually a compiler bug that affected only the Windoze build of the compiler!
https://forums.parallax.com/discussion/comment/1548932/#Comment_1548932

evanh · 2024-05-11 12:57

Oh, damn, and this one - https://forums.parallax.com/discussion/comment/1541440/#Comment_1541440
Very reminiscent of what you're going through right now. Hate to say it but that was a compiler bug too.

Rayman · 2024-05-11 13:08

Looking at the assembly output is a good way to see what might be going on

Rayman · 2024-05-11 13:12

I’ve seen array initialization being a problem in the distant past. So I’d try something else with buffer initialization with less than full number of elements…

ManAtWork · 2024-05-11 14:54

@iseries said:
I have been there, banging my head against the wall trying to figure out why this simple code is not working. Adding print statements sometimes hides the problem by slowing down the code or causes other problems.

In most cases I find it's a timing issue where a variable does not have the value I think it should.

Yes, the code had timing issues. The original LAN9252 driver code from Microchip didn't check for the number of available data in the FIFO but instead assumed that the MCU was always slower than the FIFO being filled or written. I think I've fixed this but it might be still wrong. But in that case it should result in wrong values but no crashes or hangs. Hangs or infinite loops could be caused by timing issues but it hangs in the first call to a driver function that doesn't use the FIFO. Adding a printf() to the end cannot affect timing of an earlier call.

I think mixing Spin and C is a bad idea. Two different worlds.

Maybe... but it's tempting. I have to check if something goes wrong while passing data from one world to the other. I'll also add guards to all data structures I use as arguments to driver calls.

I always build my own library functions.

My plan was to make the slave bug-for-bug compatible with the original C code to find out why my master implementation doesn't work as it should. But I have to admit that's probably "moving the church around the village", one complexity level too much.
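The plan above to guard every data structure passed to a driver call could be sketched as below. This is an illustration with made-up names (GuardedBuffer, guard_init, guard_intact), not code from the project. It puts guard bytes on both sides of the payload, so underruns are caught too, but like the single guard byte in the first post it only notices contiguous overruns, not stray writes elsewhere. Since an earlier post quotes a flexspin 6.9.4 bug with structures containing byte arrays, this assumes 6.9.5 or later.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GUARD_BYTES 4
#define GUARD_VALUE 0xA5

/* Payload with guard bytes in front and behind, so writes running off
   either end of data[] can be detected afterwards. */
typedef struct {
    uint8_t front[GUARD_BYTES];
    uint8_t data[256];
    uint8_t back[GUARD_BYTES];
} GuardedBuffer;

void guard_init(GuardedBuffer *g)
{
    assert(g != NULL);
    memset(g->front, GUARD_VALUE, GUARD_BYTES);
    memset(g->data, 0, sizeof g->data);
    memset(g->back, GUARD_VALUE, GUARD_BYTES);
}

/* Returns 1 while both guards are untouched, 0 once either was hit. */
int guard_intact(const GuardedBuffer *g)
{
    for (int i = 0; i < GUARD_BYTES; i++)
        if (g->front[i] != GUARD_VALUE || g->back[i] != GUARD_VALUE)
            return 0;
    return 1;
}
```

The driver call would get g.data as its buffer argument, and guard_intact(&g) would be checked right after each call, which narrows down which call did the damage.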

rogloh · 2024-05-13 00:45

If you have found that the optimizer is affecting the outcome one thing you might be able to do is to turn on and off the optimizer for different blocks of code to try to narrow down to the offending method, then look at the assembly differences in both cases. This may help pinpoint it faster.

IIRC @ersmith's flexspin compiler has the capability to enable/disable it for individual methods, but I can't recall the syntax and couldn't locate it in the docs (perhaps it's undocumented?). It was like a comment on the method line or something. I once had code that needed to disable it in specific methods.

You could also try to enable different named optimizer features one by one until you find the one that breaks it which may be useful too. See general.md in the spin2cpp docs for what each one does.

EDIT: Google found the syntax in an old post of mine (and it was in fact buried within general.md under "Per-function control of optimizations", my mistake)
https://forums.parallax.com/discussion/comment/1540024/#Comment_1540024

RossH · 2024-05-15 07:44

I haven't managed to get a hello-world compiled with Catalina

Why not?

catalina -p2 -lci hello_world.c
payload -i hello_world

Should work on any Propeller 2. If you have a Propeller 1, omit the -p2 option.

evanh · 2024-05-15 08:27

Just had a go at using Catalina. The supplied binaries complain that it requires GLIBC_2.34 ... I seem to have v2.31. Attempting to build Catalina using ./build_all in catalina/source nets the same error ...

gcc: error: ../catalina/awka-0.7.5/lib/libawka.a: No such file or directory
../catalina/awka: /lib/i386-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ../catalina/awka)

PS: I'm currently using Kubuntu 20.04 with HWE but am planning on moving to 24.04 soon.

RossH · 2024-05-15 09:07

@evanh said:
Just had a go at using Catalina. The supplied binaries complain that it requires GLIBC_2.34 ... I seem to have v2.31. Attempting to build Catalina using ./build_all in catalina/source nets the same error ...
gcc: error: ../catalina/awka-0.7.5/lib/libawka.a: No such file or directory
../catalina/awka: /lib/i386-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ../catalina/awka)
PS: I'm currently using Kubuntu 20.04 with HWE but am planning on moving to 24.04 soon.

The provided Linux binaries will probably only work on recent Ubuntu releases, but building Catalina on Linux is fairly easy. See the BUILD.TXT document for details.

Wuerfel_21 · 2024-05-15 11:53

@RossH said:

The provided Linux binaries will probably only work on recent Ubuntu releases, but building Catalina on Linux is fairly easy. See the BUILD.TXT document for details.

If you're building Linux binaries for distribution, you should really make them fully static where possible. It doesn't end up much bigger and doesn't ever have this dumbass glibc version issue. You install musl-tools from APT and then use musl-gcc -static -fno-pie as your compiler/linker.
