FlexProp: a complete programming system for P2 (and P1)

ersmith · 2023-02-03 18:07

@Ariba : Seems like this would go well with riscvp2 (https://forums.parallax.com/discussion/170295/riscvp2-a-c-and-c-compiler-for-p2/p1) (https://github.com/totalspectrum/riscvp2). It probably wouldn't be too hard to have it prepend the RISC-V JIT header to the generated code, and then we'd have a self-hosted compiler for P2 !

Ariba · 2023-02-04 09:19

@ersmith said:
@Ariba : Seems like this would go well with riscvp2 (https://forums.parallax.com/discussion/170295/riscvp2-a-c-and-c-compiler-for-p2/p1) (https://github.com/totalspectrum/riscvp2). It probably wouldn't be too hard to have it prepend the RISC-V JIT header to the generated code, and then we'd have a self-hosted compiler for P2 !

Yeah, that was the plan.
But this C compiler seems to be very limited. Not even divide and modulo are supported. I like the intermediate code representation, but it also shows that the generated code is totally unoptimized.
The biggest problem is that the source code, the intermediate code and the output all need to be in memory at the same time. While this is not a problem on a PC, on the P2 it means everything has to fit into 512 kB, together with the compiler itself. So only small programs can be compiled on the P2.
Here is the original source of this RISC-V / ARM C compiler:
https://github.com/mausimus/rvcc

Andy

__deets__ · 2023-02-04 10:32

I'm trying flexcc for the first time, and hit a bump in the road.

I'm trying to emulate a vblank signal using the following code:

void emulate_vblank(void* arg)
{
  unsigned t = getcnt() + CLKFREQ / 50;
  vblank_time = CLKFREQ / (50 * (750 / 42)); // Should be ~94117, is 0
  for(;;)
  {
    waitcnt(t);
    setpin(VBLANK_EMULATOR_PIN, 1);
    t += CLKFREQ / 50;
    waitcnt(getcnt() + vblank_time);
    setpin(VBLANK_EMULATOR_PIN, 0);
  }
}

void main()
{
  vblank_time = 0;
  fds.start(RX_PIN, TX_PIN, 0, SERIAL_BPS);
  fds.str("hello, world!\r\n");
  static long vblank_stack[32];
  cogstart(emulate_vblank, 0, &vblank_stack, 32);
  for(;;)
  {
    fds.dec(vblank_time);
    fds.str("\r\n");
  }
}

I tested with gcc and a simple test-program:

#include <stdio.h>
#define CLKFREQ 80000000


int main(int argc, char *argv[])
{
  unsigned int vblank_time = CLKFREQ / (50 * (750 / 42));
  printf("vblank_time: %ul\n", vblank_time);
  return 0;
}

That works as advertised and produces

vblank_time: 94117l

Version:

deets@singlemalt:/tmp$ /opt/flexspin/bin/flexcc --version
FlexC compiler (c) 2011-2023 Total Spectrum Software Inc. and contributors
Version 5.9.26-HEAD-v5.9.26-1-g21bb446d Compiled on: Jan 21 2023

ersmith · 2023-02-04 12:46

@deets : I can't tell from your snippet why your program isn't working. I tried something similar (the complete program is attached) and it prints out 94117 as expected. Perhaps something is wrong with the serial object you're using? Or maybe something in the part of the program you didn't post is conflicting?

Here's the program:

#include <stdio.h>
#include <propeller.h>
#define VBLANK_EMULATOR_PIN 16

unsigned vblank_time;

void emulate_vblank(void* arg)
{
  unsigned t = getcnt() + CLKFREQ / 50;
  vblank_time = CLKFREQ / (50 * (750 / 42)); // Should be ~94117, is 0
  for(;;)
  {
    waitcnt(t);
    _pinh(VBLANK_EMULATOR_PIN);
    t += CLKFREQ / 50;
    waitcnt(getcnt() + vblank_time);
    _pinl(VBLANK_EMULATOR_PIN);
  }
}

void main()
{
  vblank_time = 0;
  printf("hello, world!\r\n");
  static long vblank_stack[32];
  cogstart(emulate_vblank, 0, &vblank_stack, 32);
  for(;;)
  {
    printf("vblank_time = %u\r\n", vblank_time);
    _waitms(1000);
  }
}

And here is the output when run on a P1:

Propeller Version 1 on /dev/ttyUSB4
Loading foo.binary to hub memory
2892 bytes sent                  
Verifying RAM ... OK
[ Entering terminal mode. Type ESC or Control-C to exit. ]
hello, world!
vblank_time = 94117
vblank_time = 94117
vblank_time = 94117
vblank_time = 94117
vblank_time = 94117

One thing that's always worth trying when a multi-cog program doesn't work is to increase the size of the stack given to the other cogs; 32 longs is a bit on the small side for many things.

__deets__ · 2023-02-04 12:55

Hm. Weird. This is the current state, I'm actively working on it (so it's a tiny bit more complex now).

I'm also targeting P1, if that wasn't clear from context somehow.

#include <propeller.h>
#include <string.h>

struct __using("FullDuplexSerial.spin") fds;

#define TX_PIN 6
#define RX_PIN 7
#define SERIAL_BPS 115200
#define VBLANK_EMULATOR_PIN 26
#define INPUT_BUFFER_SIZE 127

int vblank_cog = -1;
const char* INPUT_DELIMITERS = " ";

void emulate_vblank(void* arg)
{
  unsigned t = getcnt() + CLKFREQ / 50;
  for(;;)
  {
    waitcnt(t);
    setpin(VBLANK_EMULATOR_PIN, 1);
    t += CLKFREQ / 50;
    waitcnt(getcnt() + CLKFREQ / (50 * (750 / 42)));
    setpin(VBLANK_EMULATOR_PIN, 0);
  }
}

void vblank_toggle()
{
  static long vblank_stack[64];
  if(vblank_cog == -1)
  {
      vblank_cog = cogstart(emulate_vblank, 0, &vblank_stack, 64);
  }
  else
  {
    cogstop(vblank_cog);
    vblank_cog = -1;
  }
}

void parse_command(char* input_buffer)
{
  char* command = strtok(input_buffer, INPUT_DELIMITERS);
  if(command && strlen(command) == 1)
  {
    switch(command[0])
    {
    case 'v':
      vblank_toggle();
      break;
    }
  }
}

void modeline()
{
  fds.str("#MODE:");
  fds.tx(vblank_cog == -1 ? '-' : 'v');
  fds.str("\r\n");
}

void main()
{
  fds.start(RX_PIN, TX_PIN, 0, SERIAL_BPS);
  fds.str("hello, world!\r\n");
  char input_buffer[INPUT_BUFFER_SIZE];
  memset(input_buffer, 0, sizeof(input_buffer));
  int input_pos = 0;

  for(;;)
  {
    int c = fds.rxcheck();
    if(c != -1)
    {
      switch(c)
      {
      case '\r':
        // zero-terminate the string.
        input_buffer[input_pos] = 0;
        parse_command(input_buffer);
        memset(input_buffer, 0, sizeof(input_buffer));
        input_pos = 0;
        modeline();
        break;
      default:
        input_buffer[input_pos] = c;
        input_pos = (input_pos + 1) % INPUT_BUFFER_SIZE;
      }
    }
  }
}

This is the generated pasm:

_emulate_vblank
    mov COUNT_, #1
    call    #pushregs_
    mov local01, cnt
    rdlong  muldiva_, #0
    mov muldivb_, #50
    call    #unsdivide_
    add local01, muldivb_
    call    #LMM_FCACHE_LOAD
    long    (@@@LR__0021-@@@LR__0020)
LR__0020
    mov arg01, local01
    waitcnt arg01, #0
    or  outa, imm_67108864_
    or  dira, imm_67108864_
    rdlong  muldiva_, #0
    mov muldivb_, #50
    call    #unsdivide_
    add local01, muldivb_
    mov arg01, cnt
    mov muldiva_, result1
    mov muldivb_, imm_850_
    call    #unsdivide_
    add arg01, muldivb_
    waitcnt arg01, #0
    andn    outa, imm_67108864_
    or  dira, imm_67108864_
    jmp #LMM_FCACHE_START + (LR__0020 - LR__0020)
LR__0021
    mov sp, fp
    call    #popregs_
_emulate_vblank_ret
    call    #LMM_RET

What I find surprising: given that rdlong muldiva_, #0 appears to load CLKFREQ, why is result1 used for the second calculation?

Wuerfel_21 · 2023-02-04 15:00

@deets said:
What I find surprising: given that rdlong muldiva_, #0 appears to load CLKFREQ, why is result1 used for the second calculation?

Known bug, fixed in 5.9.27 I think. (@ersmith you forgot to push the tag for that version btw. Also the bugfix isn't mentioned in the changelog.)

EDIT: No actually, this was fixed by the current bleeding edge P1 multiply changes (FindNextRead was forwarding results across mul/div calls, but results were treated as dead after the call)

Though for a different reason the current git master segfaults on your sample code unless the loop-basic optflag is disabled... You win some, you lose some.

ersmith · 2023-02-04 16:57

@deets : try the latest code on GitHub; it should have your issue (and the crash that Ada noticed) fixed.

__deets__ · 2023-02-11 13:16

I'm stumbling over a problem that I can't work out. I'm working on a generic ringbuffer-implementation in C and created a little test framework. Part of that is asserts and utility functions printing via FullDuplexSerial (all on P1).

It worked nicely so far, but now I'm stumped. I tried creating a generic ringbuffer_dump-implementation, but invoking this kicks the P1 into nirvana.

These are the relevant portions of the code (I can share it all, but it's a bit much for a posting and not yet on GitHub):

// serial.h
#ifndef SERIAL_H
#define SERIAL_H
typedef struct __using("FullDuplexSerial.spin") fds_t;
#endif // SERIAL_H
// main.c
void main()
{
  g_fds.start(RX_PIN, TX_PIN, 0, SERIAL_BPS); // defined as global constant
  #ifdef TEST
  ringbuffer_tests(&g_fds);
  #endif
}
// ringbuffer.c
void ringbuffer_dump(ringbuffer_t*, fds_t* fds)
{
  fds->str("-----\r\n");   // <-- Here is the problem. Commenting this out makes the code run just fine.
}
void ringbuffer_tests(fds_t* fds)
{
  fds->str("ringbuffer_tests:begin\r\n");
  ringbuffer_dump(0, fds);
}

Any suggestion as to what I'm missing here with passing fds two function calls deep?

ersmith · 2023-02-11 19:27

@deets : The immediate problem is that FlexC isn't correctly parsing:

void ringbuffer_dump(ringbuffer_t*, fds_t* fds)

It's leaving the first parameter off altogether. That should have thrown a warning or error somewhere along the way, so I'll try to figure out what's going wrong.
In the meantime, if you provide a dummy variable name for the missing parameter it should work:

void ringbuffer_dump(ringbuffer_t* r_dummy, fds_t* fds)
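
For reference, the workaround can be reduced to a small host-side sketch. The types below are hypothetical stand-ins (the real ringbuffer_t and fds_t are not shown in the thread), and dump_count is an invented field used only to observe the call. As far as I know, C standards before C23 require every parameter in a function definition to have a name, so naming the unused one is also the portable spelling:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the thread's ringbuffer_t and fds_t. */
typedef struct { int head, tail; } demo_ringbuffer_t;
typedef struct { int dump_count; } demo_fds_t;

/* Every parameter is named, including the one the body never uses. */
static void ringbuffer_dump_demo(demo_ringbuffer_t *r_dummy, demo_fds_t *fds)
{
    (void)r_dummy;      /* explicitly mark the parameter as unused */
    fds->dump_count++;  /* stands in for fds->str("-----\r\n") */
}
```

Calling ringbuffer_dump_demo(NULL, &fds) on a zero-initialized demo_fds_t should leave dump_count at 1; the (void) cast is only there to keep compilers from warning about the unused argument.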

ersmith · 2023-02-11 19:40

@ersmith said:
@deets : The immediate problem is that FlexC isn't correctly parsing:
void ringbuffer_dump(ringbuffer_t*, fds_t* fds)
It's leaving the first parameter off altogether. That should have thrown a warning or error somewhere along the way, so I'll try to figure out what's going wrong.
In the meantime, if you provide a dummy variable name for the missing parameter it should work:
void ringbuffer_dump(ringbuffer_t* r_dummy, fds_t* fds)

That should be fixed now in the latest github code.

iseries · 2023-02-14 16:57

I am playing around with low power on the P2 and have a question about timing.

I run the P2 at 200 MHz and want to put it in low power mode for a while:

int main(int argc, char** argv)
{

   //Slow speed low power mode
    _clkset(_clkmode &0xfffd, _clkfreq/10000);

    _pinh(20);

    while (1)
    {
        _pinl(56);
        _pinl(57);
        sleep(1);
        _pinh(56);
        sleep(1);
    }
}

This code works, but the version using _waitms(1000) does not.
With the P2 at 20 kHz it uses a lot less power, and I should be able to run on batteries for a lot longer.

Mike

ersmith · 2023-02-15 16:21

@iseries : Are you sure that _clkmode & 0xfffd is producing the correct mode for the new frequency? That looks a little dubious to me. Note that the frequency argument is basically only informative, it's the clock mode that really changes the hardware.

iseries · 2023-02-15 16:37

The way I see it, _waitms, which is a Spin program, uses __clkfreq_var, whereas sleep and usleep use _clkfreq.

The program should blink the LED once per second, which it does with the sleep function; with _waitms it waits a really long time.

Mike

ersmith · 2023-02-15 19:01

@iseries said:
The way I see it, _waitms, which is a Spin program, uses __clkfreq_var, whereas sleep and usleep use _clkfreq.

The program should blink the LED once per second, which it does with the sleep function; with _waitms it waits a really long time.

Mike

Which version of flexspin / flexprop are you using? When I ran your test program with a recent version, I got exactly the same results (LED blinking at 1 Hz) whether I used sleep() or _waitms(1000). However, I did see it waiting a long time with _waitus(1000000), probably because the clock frequency is below 1 MHz.

iseries · 2023-02-15 19:55

You're right, the program works. I must have forgotten to divide the frequency.

Now I want to switch from rcslow back to the normal 200 MHz, and this stops the processor cold.

_clkset(_clkmode | 0x02, 200000000);

Mike

iseries · 2023-02-15 21:37

Found it. The _clkmode variable is truncated as if for the P1 and not the P2. It only returns the byte value (0xfb).

Using a clock mode of 0x010413fb works just fine.

Mike

ersmith · 2023-02-15 21:56

@iseries : Ah, the type of "_clkmode" was always byte. It should be long in P2. Fixed now in the github sources, thanks.
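
The truncation is easy to reproduce off-target. This is a host-side sketch in plain C, not FlexProp's library code; mode_through_byte is a hypothetical helper that mimics storing a 32-bit mode word in a byte-sized variable, using the value 0x010413fb from the post above:

```c
#include <stdint.h>

/* Hypothetical helper: what survives when a 32-bit clock mode word
   is stored in a byte-wide variable and then read back. */
static uint32_t mode_through_byte(uint32_t full_mode)
{
    uint8_t stored = (uint8_t)full_mode;  /* only bits 7..0 are kept */
    return stored;                        /* upper 24 bits are gone */
}
```

With an input of 0x010413fb this returns 0xfb, matching what Mike observed, so any expression built from the truncated value would hand _clkset a mode with the upper 24 bits missing.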

iseries · 2023-02-17 15:29

These floating-point functions don't seem to work?

#include <stdio.h>
#include <string.h>
#include <propeller.h>

char data[] = "123.45";

int main(int argc, char** argv)
{
    int i;
    float f;

    f = atof(data);
    f = strtof(data);

    while (1)
    {
        _pinl(56);
        _pinl(57);
        sleep(1);
        _pinh(56);
        _waitms(1000);
    }
}

Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2023 Total Spectrum Software Inc. and contributors
Version 6.0.0-beta-v5.9.28-59-g0db93322 Compiled on: Feb 15 2023
D:/Documents/MyProjects/P2/TestCode.c:20: error: unknown identifier atof used in function call
D:/Documents/MyProjects/P2/TestCode.c:20: error: Unknown symbol atof
D:/Documents/MyProjects/P2/TestCode.c:21: error: unknown identifier strtof used in function call
D:/Documents/MyProjects/P2/TestCode.c:21: error: Unknown symbol strtof

Mike

ersmith · 2023-02-17 16:36

@iseries: I just checked in implementations for atof and strtof. Remember to #include <stdlib.h> to get their definitions, and also note that strtof has two parameters (the second is a pointer to the end of the converted string).
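
A minimal sketch of the two calls as standard C defines them (shown as host-side code; parse_float and parse_float_simple are hypothetical wrapper names, not part of any library):

```c
#include <stdlib.h>

/* Hypothetical wrapper around strtof: returns the parsed value and
   reports how many characters of the input were consumed. */
static float parse_float(const char *text, size_t *consumed)
{
    char *end = NULL;
    float value = strtof(text, &end);  /* end is set to the first unparsed char */
    if (consumed != NULL)
        *consumed = (size_t)(end - text);
    return value;
}

/* Hypothetical wrapper around atof: atof returns a double and gives
   no indication of where parsing stopped. */
static float parse_float_simple(const char *text)
{
    return (float)atof(text);
}
```

For the string "123.45" from the test program, parse_float should consume all 6 characters; if nothing can be converted, strtof returns 0 and leaves the end pointer at the start of the string, which is how a caller can tell "0" from a parse failure.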

iseries · 2023-02-17 17:01

That fixed it, works great.

Thank You,

Mike

RS_Jim · 2023-02-21 13:44

@ersmith
Eric, what happens if I compile a program to a bin file under Proptool, and then load the bin via flexprop?
Will it have the same functionality as if compiled and loaded under proptool?
Jim

ersmith · 2023-02-21 18:36

@RS_Jim said:
@ersmith
Eric, what happens if I compile a program to a bin file under Proptool, and then load the bin via flexprop?
Will it have the same functionality as if compiled and loaded under proptool?
Jim

What do you mean by "functionality"?

As far as everything that runs on the P2 is concerned, the P2 doesn't know (or care) how the binary was compiled, the results will be identical. But the code running on the PC is different, obviously, which means that PC specific things like debug windows may work differently or not at all in the two environments.

RS_Jim · 2023-02-21 19:56

@ersmith
Eric,
What I want to do is compile a program under Proptool that uses "REGEDIT" which you have said will never happen in flexprop, and upload the binary with flexprop. Right now, I cannot upload from Proptool under wine as I have not been able to get it to see and connect to the serial ports. Because I can run flexprop in Linux, I have that capability there. My thought was to compile my program under Proptool and upload it to the P2 using flexprop. At that point the only thing that I need is the terminal program which will run fine under flexprop. I am still working on getting proptool to run completely under wine, but I am not there yet. Eventually, I will rewrite the program containing the isr to put that code in another cog, but I wanted to observe the program functioning before I went through the effort of rewriting it to run in its own cog.
Thanks for all you do.
Jim

dgately · 2023-02-21 23:59

@RS_Jim said:
@ersmith
Eric,
What I want to do is compile a program under Proptool that uses "REGEDIT" which you have said will never happen in flexprop, and upload the binary with flexprop. Right now, I cannot upload from Proptool under wine as I have not been able to get it to see and connect to the serial ports. Because I can run flexprop in Linux, I have that capability there. My thought was to compile my program under Proptool and upload it to the P2 using flexprop.

Why not just try it? Seems like a test that will take about a minute...

dgately

iseries · 2023-02-23 11:55

Having problems with some code:

I think _waitms and _waitus are broken.

_waitms(500) waits about half a microsecond. Actually, everything below 1000 waits that long.

I don't do spin but it looks like:

ms = m * freq / 1000;
us = m * freq / 1000000;
_waitx(m);

Mike

ersmith · 2023-02-23 12:54

@iseries : thanks for the heads up, that should be fixed in the github sources now.

iseries · 2023-02-23 13:26

I was scratching my head for a while trying to figure out why the same code on the P1 was working and not on the P2.

Question, could you not get rid of the 64 bit divide if you divided the frequency first?

m:= m * (freq / 1000)

Mike

ersmith · 2023-02-24 11:59

@iseries said:
I was scratching my head for a while trying to figure out why the same code on the P1 was working and not on the P2.

Question, could you not get rid of the 64 bit divide if you divided the frequency first?
m:= m * (freq / 1000)
Mike

I didn't want to assume anything about the frequency. For the 1000 (ms) case it's not so bad, most frequencies will be a multiple of 1000, but for 1000000 (us) it's not always true that the frequency is a multiple of 1 MHz, or indeed even above 1 MHz.
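
To make the trade-off concrete, here is a host-side sketch in C. It is not the FlexProp implementation; the function names are hypothetical, m is the requested delay, freq the clock frequency in Hz, and unit is 1000 for milliseconds or 1000000 for microseconds:

```c
#include <stdint.h>

/* Multiply first in 32 bits: the product m * freq can wrap around. */
static uint32_t wait_cycles_32(uint32_t m, uint32_t freq, uint32_t unit)
{
    return m * freq / unit;
}

/* Multiply first with a 64-bit intermediate: exact, but it needs the
   64-bit divide the question asks about. */
static uint32_t wait_cycles_64(uint32_t m, uint32_t freq, uint32_t unit)
{
    return (uint32_t)((uint64_t)m * freq / unit);
}

/* Divide first: stays in 32 bits, but freq / unit is truncated, so it
   is only exact when freq is a multiple of unit. */
static uint32_t wait_cycles_div_first(uint32_t m, uint32_t freq, uint32_t unit)
{
    return m * (freq / unit);
}
```

With m = 500, freq = 200000000 and unit = 1000, the 64-bit and divide-first versions both give 100000000 cycles, while the 32-bit multiply wraps and gives a much smaller number. With freq = 20000 and unit = 1000000 (the 20 kHz low-power case), dividing first yields 0 cycles for any m, whereas the 64-bit version gives 20000 cycles for a one-second wait. At 1.5 MHz a 1000 us wait comes out as 1000 cycles instead of 1500 when dividing first.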

ersmith · 2023-02-26 17:44

@iseries : I've checked in some more improvements to _waitms and _waitus. They're now much more accurate; on P2 they're within ~43 cycles of the correct wait time, while on P1 they're unfortunately less accurate (only to within ~600 cycles).

RS_Jim · 2023-02-28 00:03

@ersmith : What is the latest version?
Jim
