Strange behaviour, my fault or compiler bug?
I'm still having problems with that EtherCAT stuff. All I get is grey hair. This time it's the driver for the slave device controller. The P2 communicates via a really complicated DPRAM interface which includes several command and data registers and a FIFO that is connected to an SPI or QSPI bus. So the DPRAM is not memory mapped into the normal address space; instead I have to write the address and length of the memory I need to access into registers and poll how many words are available in the FIFO. Then I can burst read the FIFO and wait/check for available data again. I think I have solved this, although the data sheet doesn't say how to work around bugs in the hardware and the original driver code from Microchip was quite buggy (it didn't check for the data in the FIFO to be ready and instead relied on the MCU being slow enough not to cause the buffer to run empty).
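The poll-then-burst pattern described above can be sketched on a host machine without the LAN9252. This is only an illustration under stated assumptions: `SimFifo`, `fifo_words_available`, `fifo_pop` and `fifo_read_words` are hypothetical names, and the simulated FIFO simply queues a fixed number of words per status poll. It is not the real driver or the chip's register interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the chip's FIFO (not the real LAN9252 API):
   'fill' more words become readable on every status poll, capped at the
   FIFO depth. */
enum { FIFO_DEPTH = 16 };

typedef struct {
    const uint32_t *src;   /* data the simulated device will deliver */
    size_t remaining;      /* words the device has not yet queued    */
    size_t avail;          /* words currently readable in the FIFO   */
    size_t fill;           /* words queued per status poll           */
} SimFifo;

/* Poll the "status register": the device queues some words, then reports
   how many can be read right now. */
static size_t fifo_words_available(SimFifo *f)
{
    size_t room = FIFO_DEPTH - f->avail;
    size_t n = f->fill < room ? f->fill : room;
    if (n > f->remaining) n = f->remaining;
    f->avail += n;
    f->remaining -= n;
    return f->avail;
}

/* Pop one word. Reading an empty FIFO is exactly the bug to avoid,
   so it is treated as a programming error here. */
static uint32_t fifo_pop(SimFifo *f)
{
    assert(f->avail > 0);
    f->avail--;
    return *f->src++;
}

/* Read up to 'count' words, never taking more than the FIFO reports as
   ready. Gives up after 'max_polls' status polls so a dead device cannot
   hang the caller. Returns the number of words actually read. */
size_t fifo_read_words(SimFifo *f, uint32_t *dst, size_t count,
                       size_t max_polls)
{
    size_t done = 0;
    while (done < count && max_polls-- > 0) {
        size_t ready = fifo_words_available(f);
        if (ready > count - done) ready = count - done;
        for (size_t i = 0; i < ready; i++)   /* burst read */
            dst[done++] = fifo_pop(f);
    }
    return done;
}
```

The poll limit is a design choice of this sketch: it turns a silent hang into a short read that the caller can detect and report.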
Anyway, my code is now working as long as I only test one operation at a time. But as I add more code things start to get weird. If I add code to the end of main() I suddenly get crashes but not in the added code but somewhere near the beginning.
I thought it's a problem of corrupted memory like a buffer overflow writing past array boundaries. So I added guards to my buffers. The following code works perfectly...
enum {_clkfreq = 200_000_000, _xtlfreq = 25_000_000};
enum {pinEscBase = 48};

#include "LAN9252_driver.h"
#include <stdio.h>

static uint8_t zeroes[260];
static uint8_t buffer[260] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};

void PrintBuffer ()
{
  printf ("buffer = \n");
  for (int i=0; i<256; i+= 16)
  {
    for (int j=0; j<16; j++) printf ("%02x ", buffer[i+j]);
    printf ("\n");
  }
}

void main ()
{
  PDI_Init (pinEscBase);
  // direct register access test
  printf ("byte-order = %x\n", PDIReadLAN9252DirectReg (LAN9252_BYTE_ORDER_REG));
  int t= 0;
  // indirect register access (ECat core) test
  SPIReadRegUsingCSR (&t, 0, 4); // <- crashes here
  printf ("type = %x\n", t); // <- print does not appear
  // DPRAM + FIFO write test
  memset (zeroes, 0, 256);
  zeroes[256]= 0xaa; buffer[256]= 0x55; // add guards
  SPIWritePDRamRegister (zeroes, 0x1000, 256); // clear
  SPIWritePDRamRegister (buffer, 0x106f, 15); // misaligned write
  memset (buffer, 1, 256);
  SPIReadPDRamRegister (buffer, 0x1000, 256); // read back
  PrintBuffer ();
  // DPRAM + FIFO read test
  for (int i=0; i<256; i++) buffer[i]= i;
  SPIWritePDRamRegister (buffer, 0x1000, 256); // fill
  memset (buffer, 0, 256); // clear
  SPIReadPDRamRegister (buffer+3, 0x1031, 15); // misaligned read
  PrintBuffer ();
  printf ("Z256=%x B256=%x\n", zeroes[256], buffer[256]);// check guards
  while (1) {};
}
... and prints out exactly what I expected:
Cog0 INIT $0000_0000 $0000_0000 load Cog0 INIT $0000_0404 $0000_0000 load byte-order = 87654321 type = 2c0 Wavail=16 cnt=256 Wavail=13 cnt=192 Wavail=15 cnt=140 Wavail=13 cnt=80 Wavail=15 cnt=28 Wavail=16 cnt=15 Ravail=5 Ravail=16 Ravail=13 Ravail=15 Ravail=13 Ravail=2 buffer = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Wavail=16 cnt=256 Wavail=13 cnt=192 Wavail=15 cnt=140 Wavail=13 cnt=80 Wavail=15 cnt=28 Ravail=4 buffer = 00 00 00 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Z256=aa B256=55
But when I comment out the last printf(), it crashes at the line marked "crashes here", i.e. the call SPIReadRegUsingCSR (&t, 0, 4);
Any ideas how I can debug this? I think I can rule out the following possible causes:
- timing: code that isn't yet executed cannot affect the timing of previous actions
- single writes to wrong addresses: if some of the data went to wrong indexes there should be gaps in the expected output (as long as I don't write twice)
- continuously writing past the end of the buffer: the guards are still there
I could post the complete code but it only runs if you have the hardware (LAN9252) connected so it wouldn't be of much use. BTW I use flexspin V6.9.4.
Comments
Could be out of heap or stack problems. The main heap is small by default. Do you launch Spin or C into other cogs?
Didn't you send me one like this? I'd have to find it again. I was meaning to ask for docs on it, anyways.
Are you sure? I thought the main stack is all the remaining hub RAM by default. I don't launch any cogs and this test program doesn't use the heap (malloc/free).
I've sent you one of the general purpose Ethernet accessory boards. It has a simple PHY chip (LAN8720) and only one RJ45 connector. The one I use here is a special EtherCAT slave controller (LAN9252) with two RJ45 jacks. (comparison)
The driver for the Ethernet PHY is in the OBEX and the "docs" can be found here:
https://forums.parallax.com/discussion/174351/rmii-ethernet-interface-driver-software
https://forums.parallax.com/discussion/174402/ethernet-phy-accessory-board
Ah, I just remembered LAN-something-something ~
I forgot to mention that Eric said there is a bug in flexspin 6.9.4
But this caused a crash of the compiler itself, and I think it doesn't have anything to do with the above problems as plain byte arrays should work.
Perhaps try doing the &t from a global variable address rather than a local variable that is on the stack? Just to see if there are any differences in behaviour. The only thing the removal of the printf that triggers the bug is doing is adjusting memory layout.
// indirect register access (ECat core) test
SPIReadRegUsingCSR (&t, 0, 4); // <- crashes here
With t declared as a global variable it always crashes, with or without the printf().
However, if I comment out everything below the line // DPRAM + FIFO read test it works again. There must be a serious memory problem.
@ManAtWork have you tried compiling with different optimization settings? Most compiler bugs are in the optimizer, so -O0 is less likely to trigger them.
Another thing that might be worth checking is trying it with bytecode output (-2nu instead of -2), which will use an entirely different code generation.
You mentioned that removing the printf causes earlier code to crash. If instead of printing buffer[256] and zeroes[256] you do something else with them (writing them to a dummy register or something like that), does the code still work?
With the last printf() commented out (used to crash) I get these results with different optimizations:
BTW, does the keyword "volatile" have any effect on the Propeller with flexspin? I thought that the compiler automatically places variables in hub memory if I request an address of it (with & in C or @ in Spin), or at least tries to do so and throws an error if it's not possible. But I may be wrong...
That does sound suspiciously like an optimizer bug in the assembly language output. I'll try to figure it out. In the meantime, it sounds like you can work around it?
flexspin parses "volatile" and keeps track of it, but doesn't actually do anything with it yet. It is still useful documentation for readers of the code.
Yes, I think I can work around it by compiling with -O0. My problem is that I've almost lost confidence in everything I do. The majority of bugs are my own fault but the hardware often does not behave as expected and if I can't trust the compiler it adds another level of trouble and my motivation to do anything drops.
Would it help to send you the hardware? A few $ don't matter and it's far easier than to write a simulated driver for testing that doesn't require special hardware.
Ah, good to know.
I'm sorry. It might be wise for you to port all of your code to C, and then you could use a completely different compiler (like Catalina, riscvp2, or p2llvm) as a backup. Or, you could port it to Spin2 and then have the official Spin2 compiler as a backup.
I've recently started a new job and so unfortunately I don't have much time to work on Parallax things. So it might not help, my main constraint is time.
I did look at the generated code and it looks OK -- the code with and without the problematic printf is identical (except for the printf calls of course), which surprised me. I thought something about the printf might be causing the optimizer to change the code. Since it isn't, this makes me less convinced that it's an optimizer bug (although of course that's still a possibility). I'm now wondering about a memory overflow or bad pointer.
When you say the code "crashes", I presume you mean it hangs and never comes back? Are you able to insert some LED on/off code to figure out where it first goes wrong?
Regards,
Eric
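A minimal sketch of the LED/progress-marker idea, written so it runs on any host. The names `checkpoint`, `led_write`, `run_steps` and `last_checkpoint_reached` are hypothetical, and `led_write` is only a stub; on real hardware it would be replaced by whatever pin call the toolchain provides. The early return in `run_steps` merely stands in for the hang.

```c
#include <assert.h>
#include <stdint.h>

/* Last checkpoint reached; 0 means "nothing reached yet". Declared
   volatile so that compilers which honour the qualifier keep the store. */
static volatile uint32_t last_checkpoint;

/* Hypothetical LED hook. On real hardware this would drive a pin; here it
   only records the value so the sketch runs anywhere. */
static uint32_t led_state;
static void led_write(uint32_t on) { led_state = on; }
uint32_t led_is_on(void) { return led_state; }

/* Record progress and toggle the LED, so a hang can be located by the
   last marker that made it out. */
static void checkpoint(uint32_t id)
{
    assert(id != 0);             /* 0 is reserved for "not started" */
    last_checkpoint = id;
    led_write(id & 1u);          /* odd markers = on, even = off */
}

/* Example sequence of four steps; 'fail_at' selects the step that
   "hangs" (simulated by returning early). */
int run_steps(int fail_at)
{
    for (int step = 1; step <= 4; step++) {
        checkpoint((uint32_t)step);
        if (step == fail_at) return -1;   /* stand-in for the hang */
    }
    return 0;
}

uint32_t last_checkpoint_reached(void) { return last_checkpoint; }
```

Compared with printf(), a marker like this adds very little code and no library calls, which may matter when the failure seems to depend on what else is linked in.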
There are certain libraries that do use the heap anyway. Not that I can remember what Eric had said now though .... EDIT: ah, file/dir access does.
It's only a kByte or two by default. To increase it, create an enum, e.g.:
HEAPSIZE = 8800
I always increase heap size
I think that's something that needs structurally changed in Flexspin. Stacks should be defined on a per-task/cog basis. The heap should be singular, allocated via a lock, and is all remaining space.
And stacks would actually be allocated from the heap.
How difficult would it be to do the thing fancy malloc implementations do, where cogs hold on to some of their own recently-freed allocations and first try to service allocations from there, and then only bother with locking when those are all used up or not big enough?
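The caching scheme asked about above could look roughly like this. It is a host-side sketch under stated assumptions, not a proposal for flexspin's actual allocator: `CogCache`, `cache_alloc`, `cache_free` and `cache_locked_calls` are hypothetical names, plain malloc()/free() stand in for the shared heap, and a counter marks the places where a real implementation would have to take the lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { CACHE_SLOTS = 4 };

/* One of these per cog: a handful of recently freed blocks. */
typedef struct {
    void  *ptr[CACHE_SLOTS];   /* cached blocks, NULL if slot is empty */
    size_t size[CACHE_SLOTS];  /* size recorded when the block was freed */
} CogCache;

/* Counts trips through the shared allocator, i.e. the places where a
   real implementation would need the lock. */
static unsigned locked_calls;

static void *shared_alloc(size_t n) { locked_calls++; return malloc(n); }
static void  shared_free(void *p)   { locked_calls++; free(p); }

unsigned cache_locked_calls(void) { return locked_calls; }

/* Try the cog's own cache first; fall back to the shared heap only when
   no cached block is big enough. */
void *cache_alloc(CogCache *c, size_t n)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->ptr[i] != NULL && c->size[i] >= n) {
            void *p = c->ptr[i];
            c->ptr[i] = NULL;
            return p;                  /* served without locking */
        }
    }
    return shared_alloc(n);
}

/* Keep the block in the cog's cache if a slot is free; otherwise hand it
   back to the shared heap. 'n' is the size the caller asked for. */
void cache_free(CogCache *c, void *p, size_t n)
{
    assert(p != NULL);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->ptr[i] == NULL) {
            c->ptr[i] = p;
            c->size[i] = n;
            return;                    /* cached without locking */
        }
    }
    shared_free(p);                    /* cache full */
}
```

One simplification to be aware of: a large block reused for a small request is later recorded at the smaller size, so the sketch trades some wasted capacity for not needing block headers.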
I don't think it has anything to do with the heap or stack overflow issues. As I said, this program doesn't use malloc() at all and I don't start a separate cog with its own (limited) stack. There are no recursive calls and the memory usage of local variables is quite low.
I'm not 100% sure but I think there are only two possible reasons.
1. A bug in my code that leads to corrupted memory (invalid pointers, uninitialized variables...)
2. A problem with the compiler like overlapping or misaligned memory for variables
Statistically, 1 is more likely and I don't want to point the finger at Eric. I think it's my job to debug this. The problem is how. Without a single-step debugger the only way is adding printf()s or other ways of state/progress output. But when the length of the code influences the behaviour it gets difficult.
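To test the second hypothesis (overlapping or misaligned variables) without a debugger, the addresses of the suspect objects can be checked at run time. The helpers below are a small sketch with hypothetical names (`ranges_overlap`, `is_aligned`); they only compare addresses and make no assumption about how any particular compiler lays out memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [a, a+alen) and [b, b+blen) share at least one byte. */
bool ranges_overlap(const void *a, size_t alen, const void *b, size_t blen)
{
    uintptr_t a0 = (uintptr_t)a;
    uintptr_t b0 = (uintptr_t)b;
    if (alen == 0 || blen == 0) return false;   /* empty ranges never overlap */
    return a0 < b0 + blen && b0 < a0 + alen;
}

/* True if p is a multiple of 'align', which must be a power of two. */
bool is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

In the test program above one would, for example, check that `zeroes`, `buffer` and the local `t` do not overlap pairwise, and that `&t` is 4-byte aligned, and report the result through whatever channel still works.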
That's not really an option. I haven't managed to get a hello-world compiled with Catalina and I fear riscvp2 and p2llvm are even more "for hackers only". Having to port all code to C would mean existing drivers are useless and I'm completely on my own. Propeller Tool has been announced EOL by Parallax. PNut is still an option but only for very small projects. The possibility to mix existing Spin code with C is so ingenious that I don't want to miss it. I'm committed to Flexspin and I would rather quit using the P2 than use a different compiler (at least for my big projects).
That leads to the question of whether exactly this feature might be the problem in this case. If there were a serious problem with the compiler, somebody else would have found it earlier. I think the ability of Flexspin to compile Spin and C (separately) is well tested. But maybe the way I use structs and pass parameters from C to Spin triggers some very special case nobody has run into before.
I could port the main code to Spin and compile it with PNut. But that doesn't prove anything. It also runs well with Flexspin if I compile to byte code (-2nu).
I've also had odd super-touchy conditions like you're having but I really can't remember what those turned out to be any longer. I'll keep poking around my old code to see if I can find anything ...
Catalina does have support for Spin/Pasm along the lines of driver objects I think. It's maybe not drop-in though, but rather one has to adapt it. I'm guessing.
I have been there, banging my head against the wall trying to figure out why this simple code is not working. Adding print statements sometimes hides the problem by slowing down the code or causes other problems.
I also look at the assembly code that is generated to see if looks right. Very helpful.
In most cases I find it's a timing issue where a variable does not have the value I think it should.
I think mixing Spin and C is a bad idea. Two different worlds.
I always build my own library functions.
Mike
I remembered one where I was helping out and, after long futile exercises in refining his code, it turned out there was actually a compiler bug that affected only the Windoze build of the compiler!
https://forums.parallax.com/discussion/comment/1548932/#Comment_1548932
Oh, damn, and this one - https://forums.parallax.com/discussion/comment/1541440/#Comment_1541440
Very reminiscent of what you're going through right now. Hate to say it but that was a compiler bug too.
Looking at the assembly output is a good way to see what might be going on
I’ve seen array initialization being a problem in the distant past. So I’d try something else with buffer initialization with less than full number of elements…
Yes, the code had timing issues. The original LAN9252 driver code from Microchip didn't check for the number of available data in the FIFO but instead assumed that the MCU was always slower than the FIFO being filled or written. I think I've fixed this but it might be still wrong. But in that case it should result in wrong values but no crashes or hangs. Hangs or infinite loops could be caused by timing issues but it hangs in the first call to a driver function that doesn't use the FIFO. Adding a printf() to the end cannot affect timing of an earlier call.
Maybe... but it's tempting. I have to check if something goes wrong while passing data from one world to the other. I'll also add guards to all data structures I use as arguments to driver calls.
My plan was to make the slave bug-for-bug compatible with the original C code to find out why my master implementation doesn't work as it should. But I have to admit that's probably "moving the church around the village", one complexity level too much.
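Guards around data structures passed to driver calls could be sketched as follows. This is an illustration with hypothetical names (`GuardedBuf`, `guarded_init`, `guarded_check`) and an arbitrary 4-byte pattern. Compared with a single guard byte after the array, a multi-byte pattern on both sides also catches underruns and writes that happen to store the same single value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { GUARD_LEN = 4, PAYLOAD_LEN = 256 };

/* Arbitrary pattern, chosen only to be unlikely in real data. */
static const uint8_t GUARD_PATTERN[GUARD_LEN] = {0xde, 0xad, 0xbe, 0xef};

/* Payload sandwiched between two guard zones. All members are byte
   arrays, so the compiler inserts no padding between them. */
typedef struct {
    uint8_t front[GUARD_LEN];
    uint8_t data[PAYLOAD_LEN];
    uint8_t back[GUARD_LEN];
} GuardedBuf;

/* Fill both guard zones and clear the payload. */
void guarded_init(GuardedBuf *b)
{
    assert(b != NULL);
    memcpy(b->front, GUARD_PATTERN, GUARD_LEN);
    memset(b->data, 0, PAYLOAD_LEN);
    memcpy(b->back, GUARD_PATTERN, GUARD_LEN);
}

/* Returns 0 if both guards are intact, 1 if the front guard is damaged,
   2 if the back guard is damaged, 3 if both are. */
int guarded_check(const GuardedBuf *b)
{
    int damaged = 0;
    if (memcmp(b->front, GUARD_PATTERN, GUARD_LEN) != 0) damaged |= 1;
    if (memcmp(b->back,  GUARD_PATTERN, GUARD_LEN) != 0) damaged |= 2;
    return damaged;
}
```

The driver would be handed `b.data`, and `guarded_check(&b)` would be called after each driver call so that the first call which damages a guard can be identified.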
If you have found that the optimizer is affecting the outcome one thing you might be able to do is to turn on and off the optimizer for different blocks of code to try to narrow down to the offending method, then look at the assembly differences in both cases. This may help pinpoint it faster.
IIRC I believe that @ersmith flexspin compiler had that capability to enable/disable it for different methods but can't recall the syntax and couldn't locate it in the docs (perhaps it's undocumented?), it was like a comment on the method line or something. I once had code that needed to disable it in specific methods.
You could also try to enable different named optimizer features one by one until you find the one that breaks it which may be useful too. See general.md in the spin2cpp docs for what each one does.
EDIT: Google found the syntax in an old post of mine (and it was in fact buried within general.md under "Per-function control of optimizations", my mistake)
https://forums.parallax.com/discussion/comment/1540024/#Comment_1540024
Why not?
Should work on any Propeller 2. If you have a Propeller 1, omit the -p2 option.
Just had a go at using Catalina. The supplied binaries complain that it requires GLIBC_2.34 ... I seem to have v2.31. Attempting to build Catalina using ./build_all in catalina/source nets the same error ...
PS: I'm currently using Kubuntu 20.04 with HWE but am planning on moving to 24.04 soon.
The provided Linux binaries will probably only work on recent Ubuntu releases, but building Catalina on Linux is fairly easy. See the BUILD.TXT document for details.
If you're building Linux binaries for distribution, you should really make them fully static where possible. It doesn't end up much bigger and doesn't ever have this dumbass glibc version issue. You install musl-tools from APT and then use musl-gcc -static -fno-pie as your compiler/linker.