Comments
@Ariba : Seems like this would go well with riscvp2 (https://forums.parallax.com/discussion/170295/riscvp2-a-c-and-c-compiler-for-p2/p1) (https://github.com/totalspectrum/riscvp2). It probably wouldn't be too hard to have it prepend the RISC-V JIT header to the generated code, and then we'd have a self-hosted compiler for P2!
Yeah, that was the plan.
But this C compiler seems to be very limited. Not even divide and modulo are supported. I like the intermediate code representation, but it also shows that the generated code is totally unoptimized.
The biggest problem is that the source code, the intermediate code and the output all need to be in memory at the same time. While this is not a problem on a PC, on the P2 it means everything has to fit into 512 kB, together with the compiler itself. So only small programs can be compiled on the P2.
Here is the original source of this RISC-V / ARM C compiler:
https://github.com/mausimus/rvcc
Andy
I'm trying flexcc for the first time, and hit a bump in the road.
I try to emulate a vblank signal using the following code:
    void emulate_vblank(void* arg)
    {
        unsigned t = getcnt() + CLKFREQ / 50;
        vblank_time = CLKFREQ / (50 * (750 / 42)); // Should be ~94117, is 0
        for(;;)
        {
            waitcnt(t);
            setpin(VBLANK_EMULATOR_PIN, 1);
            t += CLKFREQ / 50;
            waitcnt(getcnt() + vblank_time);
            setpin(VBLANK_EMULATOR_PIN, 0);
        }
    }

    void main()
    {
        vblank_time = 0;
        fds.start(RX_PIN, TX_PIN, 0, SERIAL_BPS);
        fds.str("hello, world!\r\n");
        static long vblank_stack[32];
        cogstart(emulate_vblank, 0, &vblank_stack, 32);
        for(;;)
        {
            fds.dec(vblank_time);
            fds.str("\r\n");
        }
    }
I tested with gcc and a simple test-program:
    #include <stdio.h>

    #define CLKFREQ 80000000

    int main(int argc, char *argv[])
    {
        unsigned int vblank_time = CLKFREQ / (50 * (750 / 42));
        printf("vblank_time: %ul\n", vblank_time);
        return 0;
    }
That works out as advertised, produces
vblank_time: 94117l
Version:
    deets@singlemalt:/tmp$ /opt/flexspin/bin/flexcc --version
    FlexC compiler (c) 2011-2023 Total Spectrum Software Inc. and contributors
    Version 5.9.26-HEAD-v5.9.26-1-g21bb446d Compiled on: Jan 21 2023
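For anyone following the arithmetic, here is a minimal host-side sketch, assuming the 80 MHz clock from the gcc test above. 750 / 42 is an integer division and truncates to 17, so the expression divides by 50 * 17 = 850. The function names are made up for illustration. The second variant only shows what the value would be if the ratio were not truncated early; nobody in the thread proposes it.

```c
#include <stdint.h>

/* Expression as written in the post: 750 / 42 truncates to 17,
   so this divides the clock frequency by 50 * 17 = 850. */
static uint32_t vblank_ticks_as_written(uint32_t clkfreq)
{
    return clkfreq / (50 * (750 / 42));
}

/* Hypothetical reordering for comparison: clkfreq / (50 * 750 / 42)
   rewritten as clkfreq * 42 / (50 * 750), multiplying first in 64 bits
   so the ratio is not truncated before it is used. */
static uint32_t vblank_ticks_untruncated(uint32_t clkfreq)
{
    return (uint32_t)(((uint64_t)clkfreq * 42u) / (50u * 750u));
}
```

With clkfreq = 80000000 the first function returns 94117 and the second returns 89600.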
@deets : I can't tell from your snippet why your program isn't working. I tried something similar (the complete program is attached) and it prints out 94117 as expected. Perhaps something is wrong with the serial object you're using? Or maybe something in the part of the program you didn't post is conflicting?
Here's the program:
    #include <stdio.h>
    #include <propeller.h>

    #define VBLANK_EMULATOR_PIN 16

    unsigned vblank_time;

    void emulate_vblank(void* arg)
    {
        unsigned t = getcnt() + CLKFREQ / 50;
        vblank_time = CLKFREQ / (50 * (750 / 42)); // Should be ~94117, is 0
        for(;;)
        {
            waitcnt(t);
            _pinh(VBLANK_EMULATOR_PIN);
            t += CLKFREQ / 50;
            waitcnt(getcnt() + vblank_time);
            _pinl(VBLANK_EMULATOR_PIN);
        }
    }

    void main()
    {
        vblank_time = 0;
        printf("hello, world!\r\n");
        static long vblank_stack[32];
        cogstart(emulate_vblank, 0, &vblank_stack, 32);
        for(;;)
        {
            printf("vblank_time = %u\r\n", vblank_time);
            _waitms(1000);
        }
    }
And here is the output when run on a P1:
    Propeller Version 1 on /dev/ttyUSB4
    Loading foo.binary to hub memory
    2892 bytes sent
    Verifying RAM ... OK
    [ Entering terminal mode. Type ESC or Control-C to exit. ]
    hello, world!
    vblank_time = 94117
    vblank_time = 94117
    vblank_time = 94117
    vblank_time = 94117
    vblank_time = 94117
One thing that's always worth trying when a multi-cog program doesn't work is to increase the size of the stack given to the other cogs; 32 longs is a bit on the small side for many things.
Hm. Weird. This is the current state, I'm actively working on it (so it's a tiny bit more complex now).
I'm also targeting P1, if that wasn't clear from context somehow.
    #include <propeller.h>
    #include <string.h>

    struct __using("FullDuplexSerial.spin") fds;

    #define TX_PIN 6
    #define RX_PIN 7
    #define SERIAL_BPS 115200
    #define VBLANK_EMULATOR_PIN 26
    #define INPUT_BUFFER_SIZE 127

    int vblank_cog = -1;
    const char* INPUT_DELIMITERS = " ";

    void emulate_vblank(void* arg)
    {
        unsigned t = getcnt() + CLKFREQ / 50;
        for(;;)
        {
            waitcnt(t);
            setpin(VBLANK_EMULATOR_PIN, 1);
            t += CLKFREQ / 50;
            waitcnt(getcnt() + CLKFREQ / (50 * (750 / 42)));
            setpin(VBLANK_EMULATOR_PIN, 0);
        }
    }

    void vblank_toggle()
    {
        static long vblank_stack[64];
        if(vblank_cog == -1)
        {
            vblank_cog = cogstart(emulate_vblank, 0, &vblank_stack, 64);
        }
        else
        {
            cogstop(vblank_cog);
            vblank_cog = -1;
        }
    }

    void parse_command(char* input_buffer)
    {
        char* command = strtok(input_buffer, INPUT_DELIMITERS);
        if(command && strlen(command) == 1)
        {
            switch(command[0])
            {
            case 'v':
                vblank_toggle();
                break;
            }
        }
    }

    void modeline()
    {
        fds.str("#MODE:");
        fds.tx(vblank_cog == -1 ? '-' : 'v');
        fds.str("\r\n");
    }

    void main()
    {
        fds.start(RX_PIN, TX_PIN, 0, SERIAL_BPS);
        fds.str("hello, world!\r\n");
        char input_buffer[INPUT_BUFFER_SIZE];
        memset(input_buffer, 0, sizeof(input_buffer));
        int input_pos = 0;
        for(;;)
        {
            int c = fds.rxcheck();
            if(c != -1)
            {
                switch(c)
                {
                case '\r':
                    // zero-terminate the string.
                    input_buffer[input_pos] = 0;
                    parse_command(input_buffer);
                    memset(input_buffer, 0, sizeof(input_buffer));
                    input_pos = 0;
                    modeline();
                    break;
                default:
                    input_buffer[input_pos] = c;
                    input_pos = (input_pos + 1) % INPUT_BUFFER_SIZE;
                }
            }
        }
    }
This is the generated pasm:
    _emulate_vblank
            mov     COUNT_, #1
            call    #pushregs_
            mov     local01, cnt
            rdlong  muldiva_, #0
            mov     muldivb_, #50
            call    #unsdivide_
            add     local01, muldivb_
            call    #LMM_FCACHE_LOAD
            long    (@@@LR__0021-@@@LR__0020)
    LR__0020
            mov     arg01, local01
            waitcnt arg01, #0
            or      outa, imm_67108864_
            or      dira, imm_67108864_
            rdlong  muldiva_, #0
            mov     muldivb_, #50
            call    #unsdivide_
            add     local01, muldivb_
            mov     arg01, cnt
            mov     muldiva_, result1
            mov     muldivb_, imm_850_
            call    #unsdivide_
            add     arg01, muldivb_
            waitcnt arg01, #0
            andn    outa, imm_67108864_
            or      dira, imm_67108864_
            jmp     #LMM_FCACHE_START + (LR__0020 - LR__0020)
    LR__0021
            mov     sp, fp
            call    #popregs_
    _emulate_vblank_ret
            call    #LMM_RET
What I find surprising: seeing how rdlong muldiva_, #0 appears to load CLKFREQ, why is result1 used for the second calculation?
Known bug, fixed in 5.9.27 I think. (@ersmith you forgot to push the tag for that version btw. Also the bugfix isn't mentioned in the changelog.).
EDIT: No actually, this was fixed by the current bleeding edge P1 multiply changes (FindNextRead was forwarding results across mul/div calls, but results were treated as dead after the call)
Though for a different reason the current git master segfaults on your sample code without disabling the loop-basic optflag... You win some, you lose some.
@deets try the latest code in github, it should have your issue (and the crash that Ada noticed) fixed.
I'm stumbling over a problem that I can't work out. I'm working on a generic ringbuffer-implementation in C and created a little test framework. Part of that is asserts and utility functions printing via FullDuplexSerial (all on P1).
It worked nicely so far, but now I'm stumped. I tried creating a generic ringbuffer_dump implementation, but invoking it kicks the P1 into nirvana.
These are the relevant portions of the code (I can share it all, but it's a bit much for a posting and not yet on github):
    // serial.h
    #ifndef SERIAL_H
    #define SERIAL_H
    typedef struct __using("FullDuplexSerial.spin") fds_t;
    #endif // SERIAL_H

    // main.c
    void main()
    {
        g_fds.start(RX_PIN, TX_PIN, 0, SERIAL_BPS); // defined as global constant
    #ifdef TEST
        ringbuffer_tests(&g_fds);
    #endif
    }

    // ringbuffer.c
    void ringbuffer_dump(ringbuffer_t*, fds_t* fds)
    {
        fds->str("-----\r\n"); // <-- Here is the problem. Commenting this out makes the code run just fine.
    }

    void ringbuffer_tests(fds_t* fds)
    {
        fds->str("ringbuffer_tests:begin\r\n");
        ringbuffer_dump(0, fds);
    }
Any suggestion as to what I'm missing here with passing fds two function calls deep?
@deets : The immediate problem is that FlexC isn't correctly parsing:
void ringbuffer_dump(ringbuffer_t*, fds_t* fds)
It's leaving the first parameter off altogether. That should have thrown a warning or error somewhere along the way, so I'll try to figure out what's going wrong.
In the meantime, if you provide a dummy variable name for the missing parameter it should work:
void ringbuffer_dump(ringbuffer_t* r_dummy, fds_t* fds)
That should be fixed now in the latest github code.
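For readers who want to try the call pattern on a PC first, here is a rough host-side sketch of the workaround. It assumes a stand-in for the serial object: fds_stub_t and fds_str are hypothetical and simply collect text in a buffer, whereas the real code uses the FlexC struct __using("FullDuplexSerial.spin") type and fds->str(). The point is only that every parameter in the definition has a name, and that the object pointer is passed two calls deep.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side stand-in for the serial object: it appends
   every string to a buffer instead of sending it over a UART. */
typedef struct fds_stub {
    char   log[128];
    size_t used;
} fds_stub_t;

typedef struct ringbuffer ringbuffer_t; /* opaque, not needed for the sketch */

static void fds_str(fds_stub_t *fds, const char *s)
{
    size_t n = strlen(s);
    if (fds->used + n < sizeof fds->log) {
        memcpy(fds->log + fds->used, s, n + 1); /* copy including the NUL */
        fds->used += n;
    }
}

/* Workaround from the thread: give the unused parameter a dummy name. */
static void ringbuffer_dump(ringbuffer_t *r_dummy, fds_stub_t *fds)
{
    (void)r_dummy; /* not used yet; silences unused-parameter warnings */
    fds_str(fds, "-----\r\n");
}

static void ringbuffer_tests(fds_stub_t *fds)
{
    fds_str(fds, "ringbuffer_tests:begin\r\n");
    ringbuffer_dump(NULL, fds);
}
```

Calling ringbuffer_tests on a zero-initialized fds_stub_t leaves both strings in log, in order.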
I am playing around with low power on the P2 and have a question about timing.
I run the P2 at 200 MHz and want to put it in low power mode for a while:
    int main(int argc, char** argv)
    {
        //Slow speed low power mode
        _clkset(_clkmode &0xfffd, _clkfreq/10000);
        _pinh(20);
        while (1)
        {
            _pinl(56);
            _pinl(57);
            sleep(1);
            _pinh(56);
            sleep(1);
        }
    }
This code works, but using _waitms(1000) instead of sleep(1) does not.
With the P2 at 20 kHz it uses a lot less power and I should be able to run on batteries for a lot longer.
Mike
@iseries : Are you sure that _clkmode & 0xfffd is producing the correct mode for the new frequency? That looks a little dubious to me. Note that the frequency argument is basically only informative, it's the clock mode that really changes the hardware.
The way I see it, _waitms, which is a Spin routine, uses __clkfreq_var, whereas sleep and usleep use _clkfreq.
The program should blink the LED once per second, which it does with the sleep function; with _waitms it waits a really long time.
Mike
Which version of flexspin / flexprop are you using? When I ran your test program with a recent version, I got exactly the same results (LED blinking at 1 Hz) whether I used sleep() or _waitms(1000). However, I did see it waiting a long time with _waitus(1000000), probably because the clock frequency is below 1 MHz.
You're right, the program works. I must have forgotten to divide the frequency.
Now I want to switch from rcslow back to the normal 200 MHz, and this stops the processor cold.
_clkset(_clkmode | 0x02, 200000000);
Mike
Found it. The _clkmode variable is truncated as if for the P1 and not the P2. It only returns the byte value (0xfb).
Using a clock mode of 0x010413fb works just fine.
Mike
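To make the truncation concrete, here is a small host-side sketch using the two values quoted above (0x010413fb and 0xfb). The helper names are invented for illustration and say nothing about how the FlexC runtime stores the mode. It also shows that a 16-bit mask such as 0xfffd clears everything above bit 15 of a 32-bit mode value.

```c
#include <stdint.h>

/* What remains of a 32-bit clock mode when it is kept in a byte-sized variable. */
static uint8_t mode_as_byte(uint32_t mode)
{
    return (uint8_t)mode;
}

/* The "switch back" expression from the thread, applied to the truncated value. */
static uint32_t set_bit1_on_byte(uint8_t byte_mode)
{
    return (uint32_t)byte_mode | 0x02u;
}

/* The low-power expression from the thread, applied to the full 32-bit value. */
static uint32_t mask_fffd(uint32_t mode)
{
    return mode & 0xfffdu;
}
```

With mode = 0x010413fb, mode_as_byte gives 0xfb, set_bit1_on_byte(0xfb) is still 0xfb (the upper 24 bits are gone), and mask_fffd gives 0x000013f9.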
@iseries : Ah, the type of "_clkmode" was always byte. It should be long in P2. Fixed now in the github sources, thanks.
These floating point functions don't seem to work?
    #include <stdio.h>
    #include <string.h>
    #include <propeller.h>

    char data[] = "123.45";

    int main(int argc, char** argv)
    {
        int i;
        float f;
        f = atof(data);
        f = strtof(data);
        while (1)
        {
            _pinl(56);
            _pinl(57);
            sleep(1);
            _pinh(56);
            _waitms(1000);
        }
    }
    Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2023 Total Spectrum Software Inc. and contributors
    Version 6.0.0-beta-v5.9.28-59-g0db93322 Compiled on: Feb 15 2023
    D:/Documents/MyProjects/P2/TestCode.c:20: error: unknown identifier atof used in function call
    D:/Documents/MyProjects/P2/TestCode.c:20: error: Unknown symbol atof
    D:/Documents/MyProjects/P2/TestCode.c:21: error: unknown identifier strtof used in function call
    D:/Documents/MyProjects/P2/TestCode.c:21: error: Unknown symbol strtof
Mike
@iseries: I just checked in implementations for atof and strtof. Remember to #include <stdlib.h> to get their definitions, and also note that strtof has two parameters (the second is a pointer to the end of the converted string).
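As a quick illustration of the two-parameter form, here is a sketch in standard C using the same "123.45" string as the test program. parse_float is a made-up helper name; it just wraps strtof and reports how many characters were consumed via the end pointer.

```c
#include <stdlib.h>

/* Parse a float with strtof and report how many characters were used.
   The second strtof argument receives a pointer to the first character
   after the converted number. */
static float parse_float(const char *text, size_t *consumed)
{
    char *end = NULL;
    float value = strtof(text, &end);
    *consumed = (size_t)(end - text);
    return value;
}
```

For "123.45" the whole string is consumed (6 characters). Passing NULL as the second argument to strtof is also allowed when the end position is not needed.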
That fixed it, works great.
Thank You,
Mike
@ersmith
Eric, what happens if I compile a program to a bin file under Proptool, and then load the bin via flexprop?
Will it have the same functionality as if compiled and loaded under proptool?
Jim
What do you mean by "functionality"?
As far as everything that runs on the P2 is concerned, the P2 doesn't know (or care) how the binary was compiled, the results will be identical. But the code running on the PC is different, obviously, which means that PC specific things like debug windows may work differently or not at all in the two environments.
@ersmith
Eric,
What I want to do is compile a program under Proptool that uses "REGEDIT" which you have said will never happen in flexprop, and upload the binary with flexprop. Right now, I cannot upload from Proptool under wine as I have not been able to get it to see and connect to the serial ports. Because I can run flexprop in Linux, I have that capability there. My thought was to compile my program under Proptool and upload it to the P2 using flexprop. At that point the only thing that I need is the terminal program which will run fine under flexprop. I am still working on getting proptool to run completely under wine, but I am not there yet. Eventually, I will rewrite the program containing the isr to put that code in another cog, but I wanted to observe the program functioning before I went through the effort of rewriting it to run in its own cog.
Thanks for all you do.
Jim
Why not just try it! Seems like a test that will take about a minute...
dgately
Having problems with some code:
I think _waitms and _waitus are broken.
_waitms(500) waits about half a microsecond. Actually, everything below 1000 waits that long.
I don't do spin but it looks like:
    ms = m * freq / 1000;
    us = m * freq / 1000000;
    _waitx(m);
Mike
@iseries : thanks for the heads up, that should be fixed in the github sources now.
I was scratching my head for a while trying to figure out why the same code on the P1 was working and not on the P2.
Question, could you not get rid of the 64-bit divide if you divided the frequency first?
m := m * (freq / 1000)
Mike
I didn't want to assume anything about the frequency. For the 1000 (ms) case it's not so bad, most frequencies will be a multiple of 1000, but for 1000000 (us) it's not always true that the frequency is a multiple of 1 MHz, or indeed even above 1 MHz.
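The trade-off can be sketched on a PC. This is not the FlexC library code, only an illustration with invented function names: the all-32-bit product wraps for large arguments, the 64-bit intermediate does not, and dividing the frequency first loses everything when the frequency is below the divisor (for example the 20 kHz low-power clock mentioned earlier in the thread).

```c
#include <stdint.h>

/* All 32-bit: ms * freq wraps around once the product exceeds 2^32 - 1. */
static uint32_t ms_to_ticks_32bit(uint32_t ms, uint32_t freq)
{
    return (ms * freq) / 1000u;
}

/* 64-bit intermediate: the product cannot wrap for 32-bit inputs. */
static uint32_t ms_to_ticks_wide(uint32_t ms, uint32_t freq)
{
    return (uint32_t)(((uint64_t)ms * freq) / 1000u);
}

/* Same idea for microseconds. */
static uint32_t us_to_ticks_wide(uint32_t us, uint32_t freq)
{
    return (uint32_t)(((uint64_t)us * freq) / 1000000u);
}

/* Divide-first variant: avoids the wide multiply, but freq / 1000000
   is 0 for any clock below 1 MHz and inexact for non-multiples. */
static uint32_t us_to_ticks_divide_first(uint32_t us, uint32_t freq)
{
    return us * (freq / 1000000u);
}
```

At 200 MHz, ms_to_ticks_wide(500, 200000000) gives 100000000 ticks while the 32-bit version gives a wrapped, much smaller value. At 20 kHz, us_to_ticks_wide(1000000, 20000) gives 20000 ticks while the divide-first version gives 0.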
@iseries : I've checked in some more improvements to _waitms and _waitus. They're now much more accurate: on P2 they're within ~43 cycles of the correct wait time; on P1 they're unfortunately less accurate (only to within ~600 cycles or so).
@ersmith : What is the latest version?
Jim