@msrobots said:
I seem to have some problem alike this, but the other way around.
I need absolute JMPs inside COG ram and FlexSpin seems to code relative JMPs. Any way to force absolute JMPs (just JMP, CALL not needed) inside COG code out of a normal DAT section, not inline.
@ersmith Is it planned for the near future to implement the byte/word/long array DEBUG commands? They are very useful when dealing with buffers. Or the byte_array can also be used to reverse little-endian to big-endian longs.
@ersmith said:
I think they're already handled in -gbrk debugging. Doing them in plain -g debugging is a little hard, because in that version DEBUG gets translated into printf, and there's no printf format for printing arrays.
Unfortunately, I found out that I can't really use -gbrk debugging when Spin2 and C are combined in one project. The brk debugging of Spin2 seems to override printf output if it uses up too much of the available bandwidth. Printf output is then suppressed completely.
Do you have an example of that that you could share? Both printf and debug are using the same smartpin code, so I don't see why either would be able to override the other.
@ersmith You have to see this one to believe it. This hangs the compiler with an "out of memory" error after about 15 seconds:
dim i as ubyte
i = 10
print i
And this compiles fine:
dim x as ubyte
x = 10
print x
Here is the error:
"D:/Flex2Gui/flexgui/bin/flexspin" -2 -l --tabs=3 -D_BAUD=230400 -O0 --charset=utf8 -I "D:/Flex2Gui/flexgui/include" "D:/Flex2Gui/flexgui/P2 Libs/compilercrashtest2.bas"
Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2023 Total Spectrum Software Inc. and contributors
Version 5.9.24 Compiled on: Jan 7 2023
FATAL ERROR: out of memory
compilercrashtest2.bas
fmt.c
vfs.c
child process exited abnormally
Finished at Sat Jan 21 16:47:50 2023
I have reinstalled Flex twice, rebooted the machine, etc. It's really bizarre! Did "i" get taken out of the lexicon?
EDIT: This behavior is the same between 5.9.24 and 5.9.26
Thanks, Jeff. This was a really strange bug. It is basically a conflict between the "i" variable at top level and an "i" in one of the library functions, caused by a bug in how nested structures are implemented in C (which is used in that library). It only manifests when there is no optimization. It'll be fixed in the next release. In the meantime, you can avoid it by turning on any optimization at all, which in fact I would recommend since -O0 produces pretty terrible code.
@evanh said:
Eric/Ada,
Is there a good reason why printf() is stuttery?
This outputs non-stop to the terminal:
while(1)
puts( " 1 2 3 4 5 6" );
But this has regular pauses:
while(1)
printf( " 1 2 3 4 5 6\n" );
PS: I'm testing at 4 MHz sysclock.
Are you sure this isn't a terminal artifact of some kind (like temporal aliasing of the output)? I'm not actually seeing much difference on my system. The two calls do go through quite different paths (the default built-in printf gets expanded inline, and has a lock to allow multiple COGs to use it at once) so certainly the timings are different, but as long as only 1 COG at a time is using it it should be able to output pretty steadily.
Definitely not due to PC/USB side, which is Linux here. I can see the TX LED (P62) bursting on the Eval Board. Besides, puts() is fine.
@ersmith said:
The two calls do go through quite different paths (the default built-in printf gets expanded inline, and has a lock to allow multiple COGs to use it at once) so certainly the timings are different, but as long as only 1 COG at a time is using it it should be able to output pretty steadily.
After checking at 200 MHz, the 4 MHz sysclock is a big factor. Maybe it's simply the async overheads that come with such signalling. Certainly only one cog operating ... well at least I've not explicitly started any.
@evanh said:
At 4 MHz, it's really really obvious just by eye-balling. No need for a scope.
You're right -- even with the output redirected to /dev/null I can see the tx pin go off briefly every few seconds. This is utterly bizarre, there's no reason I can think of that _fmtstr (the function ultimately called by printf in this case) should ever pause.
EDIT: Found -D_NO_LOCKIO which is definitely removing the top level calls for locks (removing 7 of 16 instructions from top loop, including 3 of 5 calls) ... but doesn't fix the stuttering.
EDIT2: Comparing printf() against puts() in the two .p2asm files, it looks like a notable difference is printf() calls __system___gettxfunc while puts() doesn't ... which calls __system___make_methodptr ... which calls __system___gc_alloc_managed ...
Quite honestly the use of the GC heap for basic features is the worst thing about flexspin.
GC stutter on a deterministic real time system? Who'd have thunk.
Oh, yes, I encountered this while writing the player. I have one frame (20 ms at 50 Hz) to do things: display this, display that, read from the file, and I saw the animated things stuttering. I checked my procedures' execution times: 11-12 ms, so a lot of time left. So what's the problem? The problem was "print". A simple print can sometimes take a lot of time (more than one frame) because of the garbage collector. So no more print, and no more built-in string functions in the player's main loop. I have inttostr/inttohex and outtextxy in my driver; they don't use any heap. Problem solved.
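For anyone wanting to try the same approach, here is a minimal sketch of a heap-free integer-to-decimal conversion in C. It is not the inttostr from the driver mentioned above; the name int_to_dec and its signature are made up for illustration. It only writes into a buffer supplied by the caller, so it should never touch the allocator.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: writes the decimal form of `value` into `buf`
   (NUL-terminated) and returns the number of characters written, or 0
   if `buf` is too small. Uses only stack storage, no heap. */
static size_t int_to_dec(int32_t value, char *buf, size_t size)
{
    char tmp[10];  /* a 32-bit magnitude has at most 10 decimal digits */
    size_t n = 0;

    /* Take the magnitude in unsigned arithmetic so INT32_MIN is handled. */
    uint32_t mag = (value < 0) ? 0u - (uint32_t)value : (uint32_t)value;

    /* Produce digits least-significant first. */
    do {
        tmp[n++] = (char)('0' + mag % 10u);
        mag /= 10u;
    } while (mag != 0u);

    size_t len = n + (value < 0 ? 1u : 0u);
    if (len + 1u > size)
        return 0;  /* no room for the digits, sign and terminator */

    size_t pos = 0;
    if (value < 0)
        buf[pos++] = '-';
    while (n > 0)
        buf[pos++] = tmp[--n];  /* reverse into the caller's buffer */
    buf[pos] = '\0';
    return len;
}
```

The caller owns the buffer (a 12-byte array is enough for any 32-bit value), so the worst-case execution time depends only on the number of digits.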
Is there a way to manually tell Flex to suspend GC during critical sections of code? I know you can manually trigger GC, so it seems a “holdoff” should be possible too?
GC only runs when a heap allocation would otherwise fail. You can help that by explicitly freeing allocations (which is possible even if you got them from the managed allocator function).
Aargh, yes, I forgot that &v->function uses the memory allocator to create a method pointer. I've added a cache in the vfs_file_t struct for this, and at least the default printf should be better now. Full printf still seems to have some memory allocation going on somewhere, but full printf is a beast anyway.
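A rough sketch of the caching idea in plain C, for illustration only. This is not FlexSpin's actual library code: file_t, tx_handle, make_handle and get_tx are hypothetical names, and malloc stands in for the managed allocator. The point is just that the handle is created on first use and reused afterwards, so repeated output calls stop allocating.

```c
#include <stdio.h>
#include <stdlib.h>

typedef int (*putc_fn)(int c);

/* Stand-in for a heap-allocated method pointer (hypothetical). */
typedef struct {
    putc_fn fn;
} tx_handle;

/* Counts allocations so the effect of the cache can be observed. */
static int alloc_count = 0;

/* Hypothetical allocator-backed constructor; malloc replaces the
   managed allocator here. Returns NULL on allocation failure. */
static tx_handle *make_handle(putc_fn fn)
{
    tx_handle *h = malloc(sizeof *h);
    if (h != NULL) {
        h->fn = fn;
        alloc_count++;
    }
    return h;
}

/* Hypothetical file object with a slot for the cached handle. */
typedef struct {
    putc_fn    putc_impl;
    tx_handle *cached_tx;  /* NULL until first use */
} file_t;

/* Allocates the handle once, then returns the cached copy. */
static tx_handle *get_tx(file_t *f)
{
    if (f->cached_tx == NULL)
        f->cached_tx = make_handle(f->putc_impl);
    return f->cached_tx;
}
```

With a file_t initialised as { putchar, NULL }, calling get_tx() in a loop allocates on the first call only.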
Thank you Eric. It wasn't critical, but my prints are working smoothly now. Unlike Pik, the only reason I noticed the issue was because I was minimising power consumption to limit temperature gradient. That, and the lines printed were multi-part, so there were many opportunities per line to observe the stutter part way across.
Eric,
Couple of things have recently caught my attention:
- RDPIN val,pin in .p2asm files are always preceded by a MOV val,#0. RDPIN fills all 32 bits of a register and thereby has no need for the preceding initialising MOV.
- I tend to prefer signed maths for metronomic event timing, as opposed to basic inline delays. It allows a simple subtract and compare against timeout when cycling in a loop. Functions like _cnt() and _getms() are suited to this so I always end up typecasting them to int32_t.
@evanh said:
Couple of things have recently caught my attention:
- RDPIN val,pin in .p2asm files are always preceded by a MOV val,#0. RDPIN fills all 32 bits of a register and thereby has no need for the preceding initialising MOV.
The default for instructions is to assume they use their DST register, and RDPIN didn't have any override for that. I've added one now.
I tend to prefer signed maths for metronomic event timing, as opposed to basic inline delays. It allows a simple subtract and compare against timeout when cycling in a loop. Functions like _cnt() and _getms() are suited to this so I always end up typecasting them to int32_t.
I think for many uses (e.g. comparing them) unsigned arithmetic makes more sense, though. Whatever choice I made someone won't like it.
@ersmith said:
I think for many uses (e.g. comparing them) unsigned arithmetic makes more sense, though. Whatever choice I made someone won't like it.
The problem with counters used for timing is they roll over rather frequently. Doing absolute compares creates bugs. So robust solutions end up using a rolling relative compare across 50% of integer range - which works much smoother done as a signed compare.
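A minimal sketch of that rolling relative compare in C, assuming a free-running 32-bit tick counter. The function names are made up for illustration. Converting the unsigned difference to int32_t is implementation-defined in C, but it behaves as two's complement on the usual compilers.

```c
#include <stdint.h>
#include <stdbool.h>

/* True once `now` has reached or passed `deadline`, even if the 32-bit
   counter rolled over in between. Valid as long as the two values are
   less than 2^31 ticks apart (50% of the integer range). */
static bool deadline_reached(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}

/* Absolute compare, shown only for contrast: it gives the wrong
   answer when the deadline has wrapped but the counter has not. */
static bool deadline_reached_naive(uint32_t now, uint32_t deadline)
{
    return now >= deadline;
}
```

For example, with a deadline of 0x00000010 (computed just before rollover) and a counter at 0xFFFFFFF8, the signed version reports "not yet" while the absolute compare fires 24 ticks early.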
The most basic timer is the incremental countdown timer that ticks once each iteration and terminates at zero. It gets used a lot in tight inline delays because it can be dedicated to the timing. This works well as unsigned as it allows 100% of the integer range for timing ... But it is a brittle approach both at speed and when performing other functions concurrently where more than one tick may occur at each check - The zero termination can be missed and it gets messy to resolve such a missed case. So, I avoid using this method generally.
I guess a variation of the countdown timer is to add an offset to zero (and the initial start count) that covers any potential jitter. I admit I've not tried this approach.
Okay, I've managed to keep the comparison as a positive simply by splitting out the event setting from pre-compare to post-compare. Which means the compare is no longer at zero crossing. Here's the new working code:
uint32_t time, timer1 = _getms();
uint32_t timer2 = timer1;
while(1) {
    rc = sdcmd( 0x100|41, 1<<30 | 1<<20 ); // indicate HCS capable and 3v3 voltage
    if( rc>>31 ) // valid Ready bit, card has switched from "idle" to "ready" state
        break;
    do {
        time = _getms();
    } while( time - timer2 < 40 ); // pause for 40 ms, SD spec 4.4
    timer2 += 40;
    if( time - timer1 > 1000 ) // SD spec 4.2.3
        break;
}
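The two ideas in the loop above (unsigned difference for elapsed time, and advancing the event time by the nominal period rather than resetting it to "now") can be pulled into a small helper. This is only a sketch; interval_elapsed is a hypothetical name, and the counter values would come from something like _getms().

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true when at least `period` ticks have passed since `*base`.
   The subtraction is modulo 2^32, so it stays correct across a counter
   rollover. On success `*base` advances by exactly `period`, so a late
   check does not push later events back (no accumulated drift). */
static bool interval_elapsed(uint32_t now, uint32_t *base, uint32_t period)
{
    if ((uint32_t)(now - *base) < period)
        return false;
    *base += period;
    return true;
}
```

Because the compare is "elapsed >= period" rather than "counter == target", a check that arrives a few ticks late still triggers.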
It still cannot assign a function result to a variable if the function's result is a class that has more than 3 elements.
The test code:
class a
dim b as integer
dim c as integer
dim d as integer
' dim e as integer
end class
print f().b
function f() as a
dim g as a
g.b=23456
print "Returned: "; h().b
g = h()
print "Assigned:"; g.b
return g
end function
function h() as a
dim i as a
i.b=2
return i
end function
The class a has now 3 elements. 4th element is commented out.
The result is good:
( Entering terminal mode. Press Ctrl-] or Ctrl-Z to exit. )
Returned: 2
Assigned:2
2
However, if I uncomment the 4th element, the result is
Returned: 2
Assigned:23456
23456
The first print is correct, the function h() set field b to 2
The assignment g=h() didn't work. g.b should have changed from 23456 to 2, but it didn't.
The print from the main program got 23456 as a returned value and printed it.
Assign doesn't work if the class has >3 elements and there is a function result on the right side.
... and the difference in listings is huge between 3 and 4 fields in the class. In the working case (3 elements) the function h() was inlined. In the not working case (4 elements) the function h() was called. A default optimization was used.
Comments
JMP #\address
the \ forces absolute
Aah, thank you @TonyB_
now my error makes more sense...
Mike
Eric/Ada,
Is there a good reason why printf() is stuttery?
This outputs non-stop to the terminal:
But this has regular pauses:
PS: I'm testing at 4 MHz sysclock.
Are you sure this isn't a terminal artifact of some kind (like temporal aliasing of the output)? I'm not actually seeing much difference on my system. The two calls do go through quite different paths (the default built-in printf gets expanded inline, and has a lock to allow multiple COGs to use it at once) so certainly the timings are different, but as long as only 1 COG at a time is using it it should be able to output pretty steadily.
If you're on Windows (not sure about other OSes), the loadp2 terminal is kind of awful and will sometimes do weird things if you hammer it with data.
Definitely not due to PC/USB side, which is Linux here. I can see the TX LED (P62) bursting on the Eval Board. Besides, puts() is fine.
After checking at 200 MHz, the 4 MHz sysclock is a big factor. Maybe it's simply the async overheads that come with such signalling. Certainly only one cog operating ... well at least I've not explicitly started any.
If you capture the serial data on a scope and look at the inter byte gaps for stutters you could see if there is a real P2 timing difference.
At 4 MHz, it's really really obvious just by eye-balling. No need for a scope.
You're right -- even with the output redirected to /dev/null I can see the tx pin go off briefly every few seconds. This is utterly bizarre, there's no reason I can think of that _fmtstr (the function ultimately called by printf in this case) should ever pause.
Is there a compile switch for removing the locks?
EDIT: Found -D_NO_LOCKIO which is definitely removing the top level calls for locks (removing 7 of 16 instructions from top loop, including 3 of 5 calls) ... but doesn't fix the stuttering.
EDIT2: Comparing printf() against puts() in the two .p2asm files, it looks like a notable difference is printf() calls __system___gettxfunc while puts() doesn't ... which calls __system___make_methodptr ... which calls __system___gc_alloc_managed ...
Garbage collection sounds ominous!
Quite honestly the use of the GC heap for basic features is the worst thing about flexspin.
GC stutter on a deterministic real time system? Who'd have thunk.
Oh, yes, I encountered this while writing the player. I have one frame (20 ms at 50 Hz) to do things: display this, display that, read from the file, and I saw the animated things stuttering. I checked my procedures' execution times: 11-12 ms, so a lot of time left. So what's the problem? The problem was "print". A simple print can sometimes take a lot of time (more than one frame) because of the garbage collector. So no more print, and no more built-in string functions in the player's main loop. I have inttostr/inttohex and outtextxy in my driver; they don't use any heap. Problem solved.
Is there a way to manually tell Flex to suspend GC during critical sections of code? I know you can manually trigger GC, so it seems a “holdoff” should be possible too?
I assume GC was added to support Basic.
GC only runs when a heap allocation would otherwise fail. You can help that by explicitly freeing allocations (which is possible even if you got them from the managed allocator function).
Aargh, yes, I forgot that &v->function uses the memory allocator to create a method pointer. I've added a cache in the vfs_file_t struct for this, and at least the default printf should be better now. Full printf still seems to have some memory allocation going on somewhere, but full printf is a beast anyway.
Thank you Eric. It wasn't critical, but my prints are working smoothly now. Unlike Pik, the only reason I noticed the issue was because I was minimising power consumption to limit temperature gradient. That, and the lines printed were multi-part, so there were many opportunities per line to observe the stutter part way across.
Eric,
Couple of things have recently caught my attention:
- RDPIN val,pin in .p2asm files are always preceded by a MOV val,#0. RDPIN fills all 32 bits of a register and thereby has no need for the preceding initialising MOV.
- I tend to prefer signed maths for metronomic event timing, as opposed to basic inline delays. It allows a simple subtract and compare against timeout when cycling in a loop. Functions like _cnt() and _getms() are suited to this so I always end up typecasting them to int32_t.
The default for instructions is to assume they use their DST register, and RDPIN didn't have any override for that. I've added one now.
I think for many uses (e.g. comparing them) unsigned arithmetic makes more sense, though. Whatever choice I made someone won't like it.
You forgot to add it as having side effects.
Thanks, I've fixed that now.
The problem with counters used for timing is they roll over rather frequently. Doing absolute compares creates bugs. So robust solutions end up using a rolling relative compare across 50% of integer range - which works much smoother done as a signed compare.
The most basic timer is the incremental countdown timer that ticks once each iteration and terminates at zero. It gets used a lot in tight inline delays because it can be dedicated to the timing. This works well as unsigned as it allows 100% of the integer range for timing ... But it is a brittle approach both at speed and when performing other functions concurrently where more than one tick may occur at each check - The zero termination can be missed and it gets messy to resolve such a missed case. So, I avoid using this method generally.
I guess a variation of the countdown timer is to add an offset to zero (and the initial start count) that covers any potential jitter. I admit I've not tried this approach.
Okay, I've managed to keep the comparison as a positive simply by splitting out the event setting from pre-compare to post-compare. Which means the compare is no longer at zero crossing. Here's the new working code:
I tried the latest version of the compiler:
Version 6.2.0 Compiled on: Jul 10 2023
It still cannot assign a function result to a variable if the function's result is a class that has more than 3 elements.
The test code:
The class a has now 3 elements. 4th element is commented out.
The result is good:
However, if I uncomment the 4th element, the result is
The first print is correct, the function h() set field b to 2
The assignment g=h() didn't work. g.b should have changed from 23456 to 2, but it didn't.
The print from the main program got 23456 as a returned value and printed it.
Assign doesn't work if the class has >3 elements and there is a function result on the right side.
Simple assigning, like this:
works, giving a proper printed result 98765
... and the difference in listings is huge between 3 and 4 fields in the class. In the working case (3 elements) the function h() was inlined. In the not working case (4 elements) the function h() was called. A default optimization was used.