@Ariba: Thanks Andy! Your code showed up 3 bugs in fastspin:
(1) SEND with only one plain string argument was being mis-handled (if there was more than one argument everything was fine)
(2) lookup/lookdown as an argument to another lookup/lookdown was also being mis-handled
(3) CLKFREQ was being mis-interpreted as _clkfreq, so differing definitions of this in different objects caused problems
All of these are fixed in github now. I hope to have a new binary release soon, but some of the file I/O changes have caused major internal changes which require some more testing.
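If you want to check bug (2) against your own code, here is a small Python model of Spin's LOOKUP/LOOKDOWN semantics. Both are 1-based and return 0 when the index is out of range or the value isn't found. This is only a reference model for working out what a nested call should return, not fastspin's implementation. The example lists are made up, and ranges such as 1..5 as well as the zero-based LOOKUPZ/LOOKDOWNZ are not modelled.

```python
def lookup(index, values):
    """Spin LOOKUP: 1-based element at index, or 0 if index is out of range."""
    if 1 <= index <= len(values):
        return values[index - 1]
    return 0


def lookdown(value, values):
    """Spin LOOKDOWN: 1-based position of the first match, or 0 if not found."""
    for position, candidate in enumerate(values, start=1):
        if candidate == value:
            return position
    return 0


# Nested use (the case in bug 2): the inner result becomes the outer index.
inner = lookdown(30, [10, 20, 30])
outer = lookup(inner, [100, 200, 300, 400])
print(inner, outer)  # prints "3 300"
```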
@Rayman: perhaps we could figure out a better solution if we knew the application. What did you want to use wrfast for? Could you substitute setq+wrlong? Could the code run in another COG?
@Rayman,
Cog $000-$132 are free in spin and there are also the defined registers PR0-PR7. The docs are incorrect.
Currently I am using $100-$10f (originally required $20 registers - will move these to $110-$11F soon) for SD and $120-$12F for serial/monitor.
The best way to improve SD performance is to move the read/write bytes(512) routine, together with the send/recv routine, into cog. But remember that the card can signal busy for a significant time before it is ready for the read or write.
It's for reading bytes from SD card... Yes, I could start another cog to do it.
Guess that's what I'll have to do...
You don't have to use wrfast to read bytes from an SD card. If you're finding wrlong too slow, one solution would be to buffer some data in COG memory and then use SETQ+WRLONG to write it out. I think $1e0-$1f0 in COG memory is free in fastspin. If you use -O2 to compile then tight loops will be copied to LUT before execution, so there's that too. If you don't use -O2 then LUT is free.
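Here are some numbers for the SETQ+WRLONG suggestion. SETQ n makes the following WRLONG copy n+1 consecutive COG longs to hub, so a 512-byte SD sector is 128 longs. The Python sketch below only does that arithmetic, not PASM, and it is not fastspin's code. It assumes a hypothetical 16-long COG buffer, the size of the $1e0-$1ef area mentioned later in the thread. That gives 8 bursts with SETQ #15 each. Bytes are packed little-endian, which is how the P2 stores them in a long.

```python
import struct

CHUNK_LONGS = 16  # hypothetical COG buffer size (e.g. the 16 longs at $1E0-$1EF)


def pack_longs(data: bytes) -> list:
    """Pack bytes into little-endian 32-bit longs, the way the P2 stores them."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    return list(struct.unpack("<%dI" % (len(data) // 4), data))


def plan_bursts(sector: bytes):
    """Yield (hub_byte_offset, setq_value, longs) for each SETQ+WRLONG burst."""
    longs = pack_longs(sector)
    for start in range(0, len(longs), CHUNK_LONGS):
        chunk = longs[start:start + CHUNK_LONGS]
        # SETQ n followed by WRLONG transfers n+1 longs, so n = len(chunk) - 1
        yield start * 4, len(chunk) - 1, chunk


sector = bytes(range(256)) * 2  # 512 example bytes
for offset, setq_value, chunk in plan_bursts(sector):
    print("hub +%3d: SETQ #%d, first long $%08X" % (offset, setq_value, chunk[0]))
```

A bigger buffer (e.g. in LUT when it is free) would simply mean fewer, longer bursts.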
@Rayman,
Cog $000-$132 are free in spin and there are also the defined registers PR0-PR7. The docs are incorrect.
That's only in Chip's spin. fastspin uses most of COG memory. I left $1e0-$1f0 free for compatibility with the ROM, thinking that Chip would do the same, but apparently he had other ideas.
Speaking of compatibility, will FastSpin support COGATN(cogmask) like Parallax Spin2 does? I had a browse through the doc folder of spin2cpp on github but didn't see this listed.
I am trying to write my HyperRAM driver API in a way that works with either the FastSpin or the PNut toolchain. I think most of it should be compatible, as I'm not doing anything fancy, but I'm not sure about doing a COGATN operation. If this is the only thing that is missing or differs between the two toolchains, is there some other way to make the source code compatible (maybe conditional compilation, etc.)?
Update: Even though COGATN is not documented, it seems to be allowed in FastSpin and it compiles into the code. Is this sort of keyword documented anywhere, showing which parts of Chip's PNut-based Spin2 syntax are also supported by FastSpin?
That's only in Chip's spin. fastspin uses most of COG memory. I left $1e0-$1f0 free for compatibility with the ROM, thinking that Chip would do the same, but apparently he had other ideas.
What???
I am using cog $100-$10F and $120-$12F with spin and fastspin without problems other than the bug I discovered.
@Ariba: Thanks Andy! Your code showed up 3 bugs in fastspin:
(1) SEND with only one plain string argument was being mis-handled (if there was more than one argument everything was fine)
(2) lookup/lookdown as an argument to another lookup/lookdown was also being mis-handled
(3) CLKFREQ was being mis-interpreted as _clkfreq, so differing definitions of this in different objects caused problems
All of these are fixed in github now. I hope to have a new binary release soon, but some of the file I/O changes have caused major internal changes which require some more testing.
Thank you Eric.
I was able to make a workable version, taking these 3 bugs into account.
If you use -O2 to compile then tight loops will be copied to LUT before execution, so there's that too. If you don't use -O2 then LUT is free.
Unfortunately, compiling with full optimization never works for me. Also, the fixed test2.spin2 code, which is quite small, only outputs some error messages that flash by so quickly I can't read them.
Update: Even though COGATN is not documented, it seems to be allowed in FastSpin and it compiles into the code. Is this sort of keyword documented anywhere, showing which parts of Chip's PNut-based Spin2 syntax are also supported by FastSpin?
Almost all of PNut's Spin2 syntax is supported on P2; the tricky part is what's supported on P1, and that's what the documentation is focused on. The parts that aren't compatible are documented in the "Compatibility with Spin2" section of the most recent documentation.
What???
I am using cog $100-$10F and $120-$12F with spin and fastspin without problems other than the bug I discovered.
That's just by luck then. fastspin uses COG memory starting at $0 and growing towards the end, with the intention (as I mentioned) of leaving room at the end for compatibility with the ROM.
In practice the first 16 longs of COG memory are used for initialization code that may be safely overwritten. I'll probably extend this to 32 longs, and also formalize that the end area of $1e0-$1ef is reserved for application use as well.
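Here is a quick way to check a register range against the layout described in this post. The two free regions are taken from the post: the first 16 longs are used only by start-up code, and $1e0-$1ef is left for the application. Anything else should be assumed to belong to compiled code, which is why $100-$10f only worked by luck. This is a hypothetical helper, not part of fastspin. The region list would need updating if the reservation changes (e.g. to 32 longs).

```python
# Regions described in the thread for fastspin on P2 (subject to change).
STARTUP_SCRATCH = range(0x000, 0x010)  # first 16 longs: init code, may be overwritten
APP_RESERVED = range(0x1E0, 0x1F0)     # $1E0-$1EF: intended for application use


def is_free(first: int, last: int) -> bool:
    """True if COG registers first..last lie entirely inside one free region."""
    if first > last:
        raise ValueError("first must not exceed last")
    return any(first in r and last in r for r in (STARTUP_SCRATCH, APP_RESERVED))


for first, last in [(0x100, 0x10F), (0x120, 0x12F), (0x1E0, 0x1EF)]:
    status = "free" if is_free(first, last) else "may collide with compiled code"
    print("$%03X-$%03X: %s" % (first, last, status))
```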
Almost all of PNut's Spin2 syntax is supported on P2; the tricky part is what's supported on P1, and that's what the documentation is focused on. The parts that aren't compatible are documented in the "Compatibility with Spin2" section of the most recent documentation.
Cheers. Thanks Eric! I'm hoping it will be fully PNUT and FastSpin compatible when it's all done and all your hard work will have helped achieve that too.
I'm not sure setq+wrlong will work, but I can try. Unfortunately the part of the Documentation that covers this appears to have been removed for some reason! There's also nothing on WRFAST now...
If I could copy the code to LUT and run it, that would be great.
Guess I could jump into it from an inline assembly block. But, how to get back? Maybe just RETA?
What would be really nice is an ASM_LUT directive that works like ASM but copies the code into LUT for me...
BTW: I think I'm seeing that you can't use ptra in inline assembly. Or at least, you need to restore it if you use it.
Is it safe to use ptrb, PA, PB?
That's just by luck then. fastspin uses COG memory starting at $0 and growing towards the end, with the intention (as I mentioned) of leaving room at the end for compatibility with the ROM.
In practice the first 16 longs of COG memory are used for initialization code that may be safely overwritten. I'll probably extend this to 32 longs, and also formalize that the end area of $1e0-$1ef is reserved for application use as well.
It would be more beneficial to leave the common cog area free. Chip uses cog $132 onwards, except for 8 longs called PR0-PR7, but I can't recall their location at the moment.
Here's something that may be a bug, depending on how you look at it...
My inline assembly block reader for SD starts out like the below.
I copied this to regular assembly and it gave an error that 512 is too big. Had to use ##
But, strange thing is that the inline code works... So compiler must be automatically fixing...
asm
'Read in 512 bytes
mov y,#512 '#bytes in a block
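The standalone assembler's error comes from the 9-bit immediate limit. A #value operand has to fit in the instruction's 9-bit field, so only 0-511 are accepted. 512 needs ##, and the assembler then emits an AUGS/AUGD prefix that supplies the upper 23 bits. The Python helper below is only a hypothetical illustration of that rule for unsigned constants. It is not how fastspin checks operands.

```python
IMM_BITS = 9                   # width of a PASM2 #immediate field
IMM_MAX = (1 << IMM_BITS) - 1  # 511


def immediate_prefix(value: int) -> str:
    """Return the prefix an unsigned constant needs as a PASM2 immediate."""
    if not 0 <= value <= 0xFFFF_FFFF:
        raise ValueError("constant does not fit in 32 bits")
    # Values above 511 need ##, which adds an AUGS/AUGD instruction
    return "#" if value <= IMM_MAX else "##"


for n in (511, 512):
    print("mov y, %s%d" % (immediate_prefix(n), n))
```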
The use of constants without "#" causes compiler warnings suggesting to add "-0" after the operand. But if I do exactly that, the compiler erroneously inserts a false "#". (I added some #defines because I thought it had something to do with the preprocessor, but that's not the case.)
BTW, I was quite surprised as I saw that my assembly code gets optimized. I fear that could be dangerous in some cases when I expect predictable timing. I know, I can suppress it by choosing "no optimization" but that's only possible globally. What if I want optimization to the high-level code and no optimization to assembly? Is there a #pragma to switch it on/off locally?
And one more... Fastspin crashes (segmentation fault) if I load "Foo.c" into FlexGui and hit the compile button. Of course this is nonsense and cannot produce a valid binary. There is no main() and there are probably other problems. But the compiler shouldn't crash.
Edit: I think it has something to do with the body of Bar() being empty.
BTW, I was quite surprised as I saw that my assembly code gets optimized. I fear that could be dangerous in some cases when I expect predictable timing. I know, I can suppress it by choosing "no optimization" but that's only possible globally. What if I want optimization to the high-level code and no optimization to assembly? Is there a #pragma to switch it on/off locally?
Instead of expressing fear, come up with a case where the optimizer actually messed things up, and I'm sure ersmith can fix it. Not saying it's perfect but let's identify them.
You'll want the assembly optimizer on, because everything gets converted to assembly and thus there is some very deep inspection going on to eliminate unnecessary calls and variables.
I'm not saying I don't trust optimizations or that I don't like them. Most of the time it's very good to have them. I just like to have control over them. I make enough errors on my own, so I spend a lot of time debugging. If I had to look at the .p2asm file each time because I worried that the compiler might have modified my code, it'd take twice as long.
And yes, I've seen that the optimization is amazingly clever. It inlines functions that are called only once or that are small enough that the stack setup and cleanup isn't worth the overhead. It keeps track of which registers contain which variables and eliminates unnecessary moves. So I definitely don't want to switch it off (globally).
I tried changing the inline assembly from # to @ with rep, but that gives an error saying it must be an immediate value...
Guess I'll just have to take note and hope to remember...
Or just don't use "rep" in inline assembly. There's no real reason to write inline assembly for loops and branches, I think what the compiler generates for these is already pretty good (e.g. if you write REPEAT in Spin you'll get a "rep" generated in most cases). High level code is much easier to read and write, so it's probably good to use it when you can!
fastspin 4.1.9 is available now from github. The changes are:
- Added an error for out of range immediate values in inline asm
- Fixed REG[] dereferencing
- Fixed a problem with passing pointers to pointers to functions
- Fixed a problem with subclasses in system module
- Fixed SIGNX and ZEROX operators to work like Spin2
- Fixed conflict of _clkfreq with CLKFREQ
- Fixed a problem with SEND of simple strings
- Fixed a problem with heap initialization in -O2
- Fixed some issues with register operands in inline assembly
- Implemented mount() and getcwd()
- Implemented OPEN FOR in BASIC
- Implemented "public" and "private" keywords in C++ class declarations (they are currently ignored though).
- Reserved some COG memory in P2 mode
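For reference, here is the Spin2 meaning behind the SIGNX/ZEROX fix. x ZEROX b keeps bits 0..b and clears the rest. x SIGNX b treats bit b as the sign bit and copies it into all higher bits. The small Python model below works on 32-bit longs and takes the bit number from its low 5 bits, as the P2 SIGNX/ZEROX instructions do. It is a sketch of the semantics, not fastspin's code.

```python
MASK32 = 0xFFFF_FFFF


def zerox(x: int, bit: int) -> int:
    """Keep bits 0..bit of x and clear all higher bits."""
    bit &= 31
    return x & ((1 << (bit + 1)) - 1)


def signx(x: int, bit: int) -> int:
    """Sign-extend x from bit 'bit', returning a 32-bit two's-complement long."""
    bit &= 31
    value = zerox(x, bit)
    if value & (1 << bit):
        value |= MASK32 & ~((1 << (bit + 1)) - 1)  # copy the sign bit upward
    return value


def as_signed(long_value: int) -> int:
    """Interpret a 32-bit long as a signed integer."""
    return long_value - (1 << 32) if long_value & 0x8000_0000 else long_value


print(hex(zerox(0x1FF, 7)), as_signed(signx(0xFF, 7)), as_signed(signx(0x7F, 7)))
```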
I just tried putting a RET in there to see if it would reload the fifo on exit, but that doesn't appear to work...
Tried to see if I could jmp somewhere to force the fifo to reload at the end, but that gives an "operation too complex" error...
Maybe I could jmp to cogexec at the end of inline assembly and from there jmp back into hubexec?
Ok, that won't work...
But, is there room in cog ram for some subroutine that can be called from hubexec?
@ManAtWork: Thanks for the bug reports, I've got fixes in github. To disable optimization use "__asm const" rather than "__asm"
But, that first "#" needs to be a "@" for pasm2...
Took me a while to figure out why some inline code I copied and pasted from @cheezus didn't work...
Thanks, that's perfect.