flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler

Wuerfel_21 · 2024-05-12 23:23

You don't need inline ASM for that, there's a __builtin_bswap32 intrinsic

evanh · 2024-05-13 06:33

I doubt that would be very happy with a byte array.

Wuerfel_21 · 2024-05-13 06:54

@evanh said:
I doubt that would be very happy with a byte array.

__builtin_bswap32(*(long*)&byte_array[n])

evanh · 2024-05-13 09:40

Damn! I'm giggling that that works. And it compiles to the same optimised inline code as my generalised function does. Even the same order of local vars is used.
What is that ... referenced byte array item -> cast to pointer to longword -> dereferenced back to a regular longword variable.
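A minimal sketch of that chain, for checking on a host compiler that provides __builtin_bswap32 (GCC and Clang do). The helper names load_swapped32 and load_reversed32 are hypothetical, and memcpy stands in for the *(long*)&byte_array[n] cast so that the sketch makes no assumption about alignment or aliasing rules on the host:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load the 4 bytes starting at buf[n] as one
   32-bit word, then reverse its byte order. memcpy replaces the
   pointer cast from the post above. */
uint32_t load_swapped32(const uint8_t *buf, size_t n)
{
    uint32_t v;
    memcpy(&v, &buf[n], sizeof v);   /* native-endian 32-bit load */
    return __builtin_bswap32(v);     /* swap byte order */
}

/* Reference version without the intrinsic: reverse the 4 bytes
   first, then do the same native-endian load. */
uint32_t load_reversed32(const uint8_t *buf, size_t n)
{
    uint8_t tmp[4];
    uint32_t v;
    for (int i = 0; i < 4; i++)
        tmp[i] = buf[n + 3 - i];     /* byte-reversed copy */
    memcpy(&v, tmp, sizeof v);
    return v;
}
```

Both functions should return the same value for any offset n that leaves 4 bytes available in the array.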

__deets__ · 2024-05-26 09:52

@ersmith said:

The code is difficult to share (at least half a dozen files), but if needed I can of course do that.

If it's just a matter of figuring out which files to share, flexspin has a --zip option to create a zip of the project. Just compile as you normally would but add a --zip to the command line, and it should produce a .zip file with everything in it.

I tried that, but it didn't work. This is the command I tried:
Scratch that, it works using flexspin. But not flexcc, which I was using. Is there a reason not to include that in the C frontend? However, you'll find the result attached. It's a bit confusing that it uses the .binary file for this. After reading the source code it appears I should be able to redirect that, but that didn't work and gave me a "no such file or directory" error.

Besides the warnings, a new issue has cropped up that's more concerning. I have a tight loop reading a bunch of I2C sensors. Usual loop times are around 3ms, but every now and then we are getting > 10ms for them. After a few hours of debugging, thinking I might have driver issues (some of the sensors have variable acquisition times depending on oversampling rates etc.), it turns out the sensor read timing is always the same. But between reading my sensor in a function and returning from it, I incur 7ms or so. Looking into the generated assembly, I see calls to a GC. That looks scary. I need determinism, one of the big advantages of the platform for me. Is this a probable culprit, and what can I do to reduce or, better, eliminate any impact? My goal is a 100Hz sampling rate including IMU data that needs a stable delta t.

__deets__ · 2024-05-26 09:58

Ok, after reading the documentation I see I can force GC, which I now do and I'm getting stable readings. However I'm deeply confused as to where I accrue garbage to begin with, I'm not using any memory primitives. Can you point out where I'm doing that, and how to prevent it?

ersmith · 2024-05-26 12:39

@deets said:
Ok, after reading the documentation I see I can force GC, which I now do and I'm getting stable readings. However I'm deeply confused as to where I accrue garbage to begin with, I'm not using any memory primitives. Can you point out where I'm doing that, and how to prevent it?

It's because you're returning some large structures from functions, e.g.:

flight_sensors_data_t flight_sensors_read_data(flight_sensors_data_t* previous)
{
  flight_sensors_data_t res;
  ...
  return res;
}

The original data in res is on the stack, so in order to return it it has to be copied into long term storage, i.e. allocated on the heap. Some more sophisticated C++ compilers are able to optimize this into a copy into the caller's data, but flexcc can't do that. You can do it manually though by changing the signature to something like:

void flight_sensors_read_data(flight_sensors_data_t *previous, flight_sensors_data_t *ret_p)
{
   ret_p->whatever = stuff;
   ...
}

There's a similar pattern in bm1422_single_shot_read -- you can tell by looking for occurrences of _gc_alloc_managed in the generated .p2asm file.
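A small self-contained sketch of the out-parameter pattern described above. The struct layout, the name sensors_read_data, and the placeholder arithmetic are all hypothetical stand-ins, since the real flight_sensors_data_t is not shown in the thread:

```c
#include <stdint.h>

/* Hypothetical stand-in for the real flight_sensors_data_t. */
typedef struct {
    int32_t  accel[3];
    int32_t  gyro[3];
    uint32_t timestamp_ms;
} sensors_data_t;

/* Fills *ret_p in place. Nothing is returned by value, so the
   function itself gives the compiler no struct to copy out. */
void sensors_read_data(const sensors_data_t *previous, sensors_data_t *ret_p)
{
    for (int i = 0; i < 3; i++) {
        /* Placeholder arithmetic where the real sensor reads would go. */
        ret_p->accel[i] = previous->accel[i] + 1;
        ret_p->gyro[i]  = previous->gyro[i] - 1;
    }
    /* Assumes a 10 ms sample period (the 100Hz goal mentioned above). */
    ret_p->timestamp_ms = previous->timestamp_ms + 10;
}
```

The caller owns both structs (for example two static buffers that swap roles each iteration), so their storage is decided once rather than on every call.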

evanh · 2024-05-26 13:46

Exhibit A above: C++ has always had a reputation of bloat from poorly coded programs.

Wuerfel_21 · 2024-05-26 14:29

@ersmith said:
The original data in res is on the stack, so in order to return it it has to be copied into long term storage, i.e. allocated on the heap. Some more sophisticated C++ compilers are able to optimize this into a copy into the caller's data, but flexcc can't do that.

That's not an optimization as much as it is the nature of the ABI in use.

__deets__ · 2024-05-26 15:56

@ersmith thanks, that gives me something to look out for and improve. Maybe worth a warning?

Wuerfel_21 · 2024-05-26 17:33

Hmm, if changing it to passing a destination pointer is too much work, it should at least manually free the heap object when it's done with it.

__deets__ · 2024-06-08 14:24

I'm back with a rather weird behavior: I had problems producing proper CAN frames with my system. A Pi Pico based project with the exact same code and CAN hardware produced proper frames, but my P2-based frames always had the remote request bit set, even though that's explicitly set to false. After a lot of failed attempts at solving this I discovered today that the problem is some frame data corruption. I attached the whole project; the problem can be reproduced without the actual hardware (all pertinent calls are commented out).

This is the code that sets up a CAN frame:

  uint8_t data[5] = {0xabU, 0xcdU, 0xefU, 0x12U, 0x00U};
  can_frame_t my_tx_frame;
  can_make_frame(&my_tx_frame, false, 0x321, sizeof(data), data, false);
  printf("remote: %d\n", my_tx_frame.remote);
  uint32_t queued_ok = 0;

Notice the queued_ok-variable. That one is being incremented for each sent frame.

Now in a loop I do this (removed commented out code):

  while (true) {
    printf("queued ok: %d\n", queued_ok);
    printf("remote: %d\n", my_tx_frame.remote);
    queued_ok++;
    can_frame_get_data(&my_tx_frame)[4]+= 2; // Update last byte of frame payload

The print results of this are

remote: 0
queued ok: 0
remote: 0
queued ok: 1
remote: 1
queued ok: 2
remote: 2
queued ok: 3
remote: 3
queued ok: 4
remote: 4
queued ok: 5
remote: 5
queued ok: 6
remote: 6
queued ok: 7
remote: 7
queued ok: 8
remote: 8
queued ok: 9
remote: 9

So even though I don't touch the remote-flag in the my_tx_frame struct, it still is in sync with queued_ok.

Any suggestions as to what could be the cause here?

ersmith · 2024-06-08 14:45

@deets you don't mention which version of flexspin / flexcc you're using, nor on what platform (is it Windows, Mac, or Linux)? I've tried several different versions of flexspin, ranging from 6.9.2 to 6.9.8, both Windows and Linux versions, and they all print 0 for the remote on every iteration. Are you using a really old version of flexspin?

__deets__ · 2024-06-08 14:58

Sorry, I somehow assumed that was part of the ZIP or so.

deets@singlemalt:~/Dropbox/shared-stuff/FAR-Nova/FARduino/SW$ 
deets@singlemalt:~/Dropbox/shared-stuff/FAR-Nova/FARduino/SW$ /opt/flexspin/bin/flexspin --version
Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2024 Total Spectrum Software Inc. and contributors
Version 6.9.1-HEAD-v6.9.1 Compiled on: Apr 21 2024

Not that old.

Edit: but older than what you are using. I shall update my compiler and see what happens.

__deets__ · 2024-06-08 15:04

Ok, I upgraded to 6.9.4, and the problem is gone. That's the latest that is referred to in the flexprop repository (that I use to build and install). Would you suggest going to master or building directly?

ersmith · 2024-06-08 23:08

@deets said:
Ok, I upgraded to 6.9.4, and the problem is gone. That's the latest that is referred to in the flexprop repository (that I use to build and install). Would you suggest going to master or building directly?

6.9.4 is pretty recent, although there was a struct bug fixed in 6.9.5 that might be worth getting. You can get compiler binaries from https://github.com/totalspectrum/spin2cpp/releases; if you're not using FlexProp, the .zip files there will have pretty much all you need, except for loadp2, which doesn't change very often.

If you do want to build from source you should probably use the spin2cpp release/v6.9 branch, as that's considered "stable". The 7.0 version is still a work in progress and has some pretty major changes to Spin, although not so much for C.

I hope to make a new FlexProp release Real Soon Now, but it's held up waiting for my new code signing key to be approved.

evanh · 2024-06-08 23:25

@ersmith said:
I hope to make a new FlexProp release Real Soon Now, but it's held up waiting for my new code signing key to be approved.

So that needs a submission for each binary release? Tedious! Oh, and does that incur a fresh fee each time too then?

ersmith · 2024-06-09 00:11

@evanh said:

@ersmith said:
I hope to make a new FlexProp release Real Soon Now, but it's held up waiting for my new code signing key to be approved.

So that needs a submission for each binary release? Tedious! Oh, and does that incur a fresh fee each time too then?

No, just every year. But I changed vendor (the old one increased prices) and it's taking a while to get the process through. I should have started earlier, before the old one expired, but I was hoping the original vendor would try to keep me by offering a discount. This whole code signing thing is a real pain on Windows. It's relatively easier (and cheaper) on Mac; if only Apple's signatures were recognized by Microsoft. Sigh.

evanh · 2024-06-09 00:22

@ersmith said:
... if only Apple's signatures were recognized by Microsoft. Sigh.

That is bad. Grounds for anticompetitive behaviour.

evanh · 2024-07-06 13:09

I bumped into a possible issue with for loops not counting correctly with unsigned integers a while back. I haven't double checked it of late. This is more a reminder note for me at the moment.

evanh · 2024-07-07 11:39

@evanh said:
I bumped into a possible issue with for loops not counting correctly with unsigned integers a while back.

Right, yep, bug is still there. The attached only loops 3 of the specified 4 loops. But if the unsigned rc variable is changed to a signed variable then the program loops the correct 4 times.

PS: Not surprisingly, -O1,~loop-basic fixes it.

ersmith · 2024-07-07 16:51

Thanks @evanh. That bug should be fixed in the github sources now (it was a missing check for the unsigned version of the <= operator).
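The attached test program is not reproduced in the thread. As a rough sketch of the kind of check involved, assuming an inclusive <= loop (the operator named in the fix), the two hypothetical functions below should report the same iteration count on a correct compiler:

```c
/* Count iterations of an inclusive loop with an unsigned counter. */
unsigned count_unsigned(unsigned last)
{
    unsigned n = 0;
    for (unsigned rc = 1; rc <= last; rc++)
        n++;
    return n;
}

/* Same loop with a signed counter, for comparison. */
int count_signed(int last)
{
    int n = 0;
    for (int rc = 1; rc <= last; rc++)
        n++;
    return n;
}
```

With last set to 4, both are expected to return 4. A mismatch between them would point at the loop optimisation rather than at the program, which fits the note above that -O1,~loop-basic avoided the problem.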

avsa242 · 2024-07-07 17:51

I know the official syntax hasn't been 100% set in stone and this was all just recently added, but should it be possible to declare an array in a (spin/2) structure's member?
e.g.,

con

    mystruct( byte member_array[5], word member2, long member3 )

This produces an error when building:

error: syntax error, unexpected '[', expecting ')' or ','

Thanks!

ersmith · 2024-07-08 10:27

@avsa242 said:
I know the official syntax hasn't been 100% set in stone and this was all just recently added, but should it be possible to declare an array in a (spin/2) structure's member?

Yes, it should be possible. I've fixed this in the spin2cpp github now. If you don't build flexspin yourself, a work-around until the next binary release is to use object syntax for the structs, like:

obj
   mystruct = "mystruct.spin2"

with mystruct.spin2 containing

var
   byte member_array[5]
   word member2
   long member3

Thanks,
Eric

evanh · 2024-07-21 06:59

Feature request:
I'm looking for a way to use SETQ + RDLONG to load an ordered group, or array, of local variables either before or within fcache'd pasm2 code.

I've actually been doing it with the existing smartpin based SD card driver and didn't know it wasn't really supported. Because that particular code is only dealing with a couple of registers it isn't causing any bugs. But now I'm wanting more like 8 or more of these so I've discovered that the optimiser, presumably, is discarding or reordering entries in the locals list.

eg:

{
    uint32_t  pclk, pcmd, pdat0, pdat, pled, ppow;

    __asm {
        loc pa, #pins
        setq    #5
        rdlong  pclk, pa
    }
    printf("pins %d %d %d %d %d\n", pclk, pcmd, pdat0, ppow, pled);
}

The copying of the data is fine. But because pdat is unused the optimiser swaps it with ppow, resulting in ppow containing the value that was meant to be in pdat.

PS: pins is a structure:

typedef struct pins_t {
    uint32_t  clk, cmd, dat0, dat, led, pow, clkdiv, rxlag;
} pins_t;

static pins_t  pins;
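For context on the ordering problem: C fixes the order of struct members and array elements, but says nothing about how separately declared locals are laid out. A portable sketch of the same copy, using a local array instead of six named locals, is below. PINS_INDEX and pins_copy_first are hypothetical names, and the sketch makes no claim about whether flexspin would keep such an array in cog registers:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct pins_t {
    uint32_t  clk, cmd, dat0, dat, led, pow, clkdiv, rxlag;
} pins_t;

/* Index, in longs, of a member within pins_t. Assumes no padding
   between the uint32_t members. */
#define PINS_INDEX(member) (offsetof(pins_t, member) / sizeof(uint32_t))

/* Copy the first `count` longs of *src into dst[], in member order.
   This is the portable counterpart of the SETQ + RDLONG block read. */
void pins_copy_first(const pins_t *src, uint32_t *dst, size_t count)
{
    memcpy(dst, src, count * sizeof(uint32_t));
}
```

After pins_copy_first(&pins, regs, 6), regs[PINS_INDEX(pow)] holds the pow value even if some other element, such as the one for dat, is never read.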

evanh · 2024-07-21 07:41

I suppose there is the option of utilising registers $1e0..$1ef, or Spin2's PR0..PR7 ... ah, how does one address cogRAM numerically from C?

Wuerfel_21 · 2024-07-21 12:32

PRx is the official way of doing it. In C they should be available (like all other registers) as _PR0 etc. There really should be a better way to make use of block read/write, since especially RDxxxx is such a slow instruction.

evanh · 2024-07-21 13:03

@Wuerfel_21 said:
PRx is the official way of doing it. In C they should be available (like all other registers) as _PR0 etc. There really should be a better way to make use of block read/write, since especially RDxxxx is such a slow instruction.

Can't just printf( "%x",*(&pr7+1) ); or whatever to get to PR8 equivalent. That doesn't equate to a register.

Wuerfel_21 · 2024-07-21 13:06

oh, there is no PR8

evanh · 2024-07-21 13:20

exactly, yet there are 16 registers allotted there. PR0 maps to $1e0.
