P2 Experiments- P1 source to P2 Fastspin
cheezus
Posts: 298
in Propeller 2
So I've played with TaqOZ a bit and now I'm wanting to try converting P1 source to P2. I ran my touchscreen desktop app through fastspin and got what looks like a complete ASM output. I haven't gone through all of it yet but it looks like it should run. I'm wondering if there are any known gotchas I should watch out for? I'm finishing up new hardware now, waiting on a couple final things. Progress has been slow over the holidays but I'm getting ready for another big push after the new year.
Thanks for any advice!
Comments
I am using Windows 7.
1) I heard about issues with loadp2.exe so I grabbed and compiled the version from here:
https://github.com/davehein/p2gcc
I placed the newly compiled loadp2.exe version 0.007 into the /bin folder of spin2gui
2) When running spin2gui, I had to open the menu labeled "Commands" and select "Configure Commands..."
This brings up a window called "Executable Paths"
I pressed the [P2 Defaults] button, but had to change the text in "Run Command" to this path:
Press OK to close the "Configure Commands" window.
3) I opened up the sample file blink2.bas. Changed the first lines to this, and saved:
4) Pressed the "Compile & Run" button.
Got this in the terminal, because I'm using -v for loadp2 (verbose mode):
5) Now I'm stuck. It looks like fastspin inserts a hubset #255 into the asm code. From various other comments, this isn't going to work on the P2-EVAL board.
Here is the beginning of the .lst file it generates for blink2.bas, with me having changed the pin numbers of led1 and led2.
https://forums.parallax.com/discussion/comment/1459371/#Comment_1459371
When changing the .p2asm file to .spin2 as you described, and adding the _CLOCKFREQ constants from that message, I think I had to go down further below to set the clock frequency as well in the COG_BSS_START section, if present?
I'm really impressed with fastspin, I need to clean up the project so it will compile for P1. I'm still not clear on the translation of DAT sections. It LOOKS like they have been parsed to P2asm correctly (of course there may be issues of timing or something?) but it can't be THAT simple can it?
That is what normally happens. Assembly boils down to just being a set of directives to assemble a binary file. It may or may not be instructions. Could just be data.
I'm glad it's working for you. fastspin doesn't try to do any translation of P1 asm to P2 asm, so you're probably just getting lucky if your DAT section has much legacy P1 assembly code in it. The P2 instruction set is "mostly" a superset of P1, and I have kept a few P1 assembly conventions (e.g. fastspin accepts "wc,wz" as well as "wcz"), so some P1 asm will compile OK. But in many cases you'll need to put an "#ifdef __P2__" in the DAT section with the P2 specific assembly.
The Spin code (outside of DAT) should mostly translate fine, so the more you can put in Spin the better. Performance shouldn't be an issue at all, fastspin is compiling to hubexec code on P2, so your P2 compiled Spin code will probably run faster than hand-crafted P1 assembly.
The one thing that I'm looking for in a toolchain is conditional compilation. If I understand correctly fastspin has a preprocessor, with built in directives?
I still haven't tried to run any code on the P2 yet, I really wanted to get something working on the p1 first. I finally got the Spin p1 version working and before going to bed I thought I'd try fastspin and... well it sure lives up to its name. It went from some 10s of seconds to draw a 320x240 blank screen to at least a refresh a second? This is with no work to optimize the spin code for speed. I'd LIKE to create an object that works for P1 and P2 and it seems doable but there's a lot of gotchas with the current codebase.
Is there a "fastspin" manual I'm missing?
Thanks as always to all the forum members here, you are awesome!
Yes. See the docs/spin.md file in the fastspin or spin2gui release. It's a basic subset of the C preprocessor, with #define, #ifdef / #else / #endif, and #include. There are some predefined symbols, like __P2__ if compiling for the P2 (again, documented in the spin.md file; that's a GitHub mark-down file, so readable as plain text).
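For anyone who hasn't used these directives before, here is a minimal sketch of #define, #ifdef / #else / #endif. It is written in C syntax, since the preprocessor is described above as a subset of the C one. `__P2__` is the predefined symbol named above; `LED_PIN`, `TARGET_NAME`, the pin numbers and the two helper functions are made up for illustration, so check docs/spin.md for the real list of symbols.

```c
/* Sketch of #define / #ifdef / #else / #endif as described above.
   __P2__ is the predefined symbol named in the post; everything else
   (LED_PIN, TARGET_NAME, the pin values, the helpers) is hypothetical. */
#ifdef __P2__
#define LED_PIN 56            /* assumed pin for a P2 board */
#define TARGET_NAME "P2"
#else
#define LED_PIN 16            /* assumed pin for a P1 board */
#define TARGET_NAME "P1"
#endif

/* Returns the name of the target this file was built for. */
const char *target_name(void) { return TARGET_NAME; }

/* Returns the LED pin selected at compile time. */
int led_pin(void) { return LED_PIN; }
```

Built without `__P2__` defined, the #else branch is the one that gets compiled, so the same source can serve both chips.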
makes for an easier read.
Seems fastspin doesn't like the copy of fsrw I've been using, giving me an error that's got me scratching my head right now.
"fsrw_Ray.Spin(568) error: internal error : expected statement list, got 87"
So I go looking for an 87 in the code and can't find a dec, hex or binary... But line 568 is a blank line so I thought maybe something in the last function isn't properly formatted or something. I'm sure it's something simple really.
I was finally able to trim down Touch.Spin enough to compile the desktop app. I'm still rewriting the ASM driver in Spin... Got a long way to go and starting to wonder if I should just bypass this step and jump to P2asm. I wish I had the time and "resources" to devote to serious development... But, almost 3 year old twin boys who think playing with wires and cords is almost as amazing as beating each other with plastic baseball bats; a cat that thinks jumper wires are the best thing since hair ties and feathers; as well as preparing for a move all slow the process.
btw, I still haven't rtfm :P
*edit, added sources i'm trying to build, duh
In this case it looks like fastspin messed up compiling the line: in the mount(basepin) function in fsrw_Ray.spin. I'll try to sort out what's wrong, but for now probably removing the \ will get it going.
One of the things I noticed after I posted was touch.spin compiled program size comes out to 35580 bytes? @msrobots mentioned running out of memory and this is entirely possible. I've been trying to shave longs by moving all large constants to a DAT block. Not sure if this is the best way to go...
Yes, unfortunately fastspin binaries are quite a bit bigger than regular Spin ones, since they compile to PASM rather than to compressed bytecode.
Moving large constants to a DAT block probably won't make much difference; the compiler tries to keep only one copy of each large immediate constant. But it won't hurt either, and maybe you'll notice some things the compiler missed. I kind of doubt you'll get 3K back though.
I have some ideas on how to allow fastspin to compress instructions (like PropGCC's CMM mode), but that isn't going to happen in the near future.
When I started playing with this I wasn't even sure it would work but now I'm starting to see potential. Part of the reason for these experiments is I'm trying to understand ways to optimize as well as restructure things. As it stands now, there's a lot of code that only runs once like display inits. This is one place I've always thought things could be improved. I've thought about putting the init code into CSV files on the SD and just parse through a file for display init but it really seems best to have the display init code in EEPROM in case there's something wrong with the SD I can still init the display and print debugging info.
For a while I used a small program to init the display, start the SD and chain into another program that actually loaded the desktop. From here each program could be chained to and when finished kicks back to the EEPROM. This would check the SRAM to see if the desktop attributes were still in ram and if so skip loading the desktop and just display the pre-loaded desktop. The return to desktop from a running app is instant, which is nice because loading an 800x480 image from SD seems like it takes forever!
There were several problems with doing things this way but the biggest annoyance, other than loading the desktop for the first time, was switching between displays. The previous hardware design didn't use the LCD /RD and simply pulled up to 3v3. The new hardware should allow me to determine which controller is plugged in, instead of changing a line in the source code and recompiling the entire package. The biggest problem I'm having is figuring out how to handle 2 different resolutions, as for a long time the 320x240 "package" was completely separate from the 800x480.
Getting back the 3k is going to be tricky but with 1mw of SRAM at least I have some place to stuff data between programs run off SD. It's very interesting looking at the spin code and then seeing the pasm output. I haven't really noticed your compiler "slacking off" anywhere and the moving large constants is just to satisfy the FIT directive. I think I may have made some mistakes and need to double check pointer vs register. I'm also thinking that display init data could be overwritten for buffers if I could wrap my head around that mess.
I was having trouble with some piece of code overflowing the rambuffer so that's 4x larger than it needs to be. There's plenty of places where this code can be optimized, I'm just trying to understand the best ways to optimize for P1 and looking for ideas in the P2 rewrite.
At this point things look pretty simple, put all the display specific stuff in a file. But every time I try to break things down it seems too complicated. In this example, DISPLAY_RS is controlled by its pin, but on the previous hardware (do I really have to support it? probably not?) it's controlled by a latch. Reset_Display, Set_Counter, and Set_Latch are new and totally untested.
Does anyone have tips for the best way to manage a HUGE program, that's meant to run across multiple hardware? There's already quite a few pieces to this puzzle and it seems like I'm either going to end up with lots of little files, or one REALLY long one. Ifdefs really add to character count and I still don't quite get the syntax.
As always, any advice appreciated!
In general I would try to factor out common code between your hardware into one object, and have the hardware specific parts in other objects (something like "common.spin", "driver1.spin", and "driver2.spin"). That's how I did my VGA driver: there's a common file that has all the text output routines, separate modules for setting up different modes (1024x768, 800x600, and so on) and then another common low level VGA driver that actually bangs the bits on the hardware.
#ifdef is a really powerful helper. I regularly do things like starting files with: and then within the code having bits like: To assist with debugging or for easily turning features on and off during development. For example when I'm done debugging I just comment out the "#define DEBUG" line, and the debug code is no longer compiled.
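The snippets that went with this post are missing from this copy of the thread. A minimal sketch of the pattern it describes, in C syntax, might look like the following. Only the "#define DEBUG" line comes from the post; `DEBUG_PRINT`, `debug_lines` and `init_display` are hypothetical names.

```c
#include <stdio.h>

/* Comment this line out to drop all debug code from the build. */
#define DEBUG

/* Hypothetical counter so the effect is visible without a terminal. */
static int debug_lines = 0;

#ifdef DEBUG
#define DEBUG_PRINT(msg) do { printf("debug: %s\n", (msg)); debug_lines++; } while (0)
#else
#define DEBUG_PRINT(msg) do { } while (0)
#endif

/* Hypothetical init routine with debug traces inside it. */
int init_display(void)
{
    DEBUG_PRINT("init_display entered");
    /* ... real initialization would go here ... */
    DEBUG_PRINT("init_display done");
    return debug_lines;   /* number of debug lines emitted so far */
}
```

With the #define commented out, both `DEBUG_PRINT` calls expand to nothing and `init_display` returns 0.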
For big things like switching hardware or processors, I would suggest that ifdef be used in large chunks, ideally at the highest level to control including whole objects rather than inside of functions. So don't write: That leads to a rat's nest of #ifdefs and is hard to read (in my opinion). Instead I would do something like: and then use "p.do_output", "p.do_input", etc.
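The examples from this post are also missing here. As a rough C sketch of the idea: both drivers expose the same calls, a single #ifdef picks one, and the rest of the code only ever talks to `p`. The names `do_output` and `do_input` come from the post; `port_driver`, `USE_DRIVER2`, `poll_and_echo` and the stand-in driver bodies are assumptions for illustration.

```c
/* Hypothetical hardware interface: both drivers expose the same calls. */
typedef struct {
    const char *name;
    int (*do_output)(int value);
    int (*do_input)(void);
} port_driver;

/* Stand-in "driver1"; in a real project this would live in its own file. */
static int d1_output(int value) { return value; }
static int d1_input(void)       { return 1; }

/* Stand-in "driver2". */
static int d2_output(int value) { return value * 2; }
static int d2_input(void)       { return 2; }

static const port_driver driver1 = { "driver1", d1_output, d1_input };
static const port_driver driver2 = { "driver2", d2_output, d2_input };

/* The only #ifdef: pick the driver once, at the highest level. */
#ifdef USE_DRIVER2
static const port_driver *const p = &driver2;
#else
static const port_driver *const p = &driver1;
#endif

/* Callers never mention the hardware variant. */
int poll_and_echo(void) { return p->do_output(p->do_input()); }
```

In Spin the same selection would sit in the OBJ section rather than in a table of function pointers, but the shape is the same: one conditional at the top, none inside the functions.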
In general I would definitely urge abstracting things and writing lots of little functions over writing large complicated functions. The smaller the pieces you break things up into, the easier the code will be to read and the less likely it is that bugs will creep in. Don't worry about creating even tiny functions -- in fact fastspin will happily inline any small functions (ones that are just a few lines), and at optimization level -O2 it will inline any function that's only used once, so it's OK to write your code in a very modular fashion.
The other tip I would have is to try to make things data driven where you're doing the same thing over and over but with different parameters. So for example your last function consists basically of a long list of things like: I'd change this into a display list instead:
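The before and after code from this post is missing here too. A minimal C sketch of a data-driven init list could look like this. The register numbers and values are placeholders, not taken from any real controller, and `write_reg` is a stand-in that only records what it was asked to do.

```c
#include <stdint.h>
#include <stddef.h>

/* One entry of a hypothetical display-init list: a register and its value. */
typedef struct { uint8_t reg; uint16_t value; } init_entry;

/* Placeholder values only; a real list would come from the controller datasheet. */
static const init_entry init_list[] = {
    { 0x01, 0x0000 },
    { 0x02, 0x0123 },
    { 0x03, 0x4567 },
};

/* Stand-in for the real register write; records calls so the sketch is testable. */
static unsigned writes_done = 0;
static uint32_t checksum = 0;
static void write_reg(uint8_t reg, uint16_t value)
{
    writes_done++;
    checksum += (uint32_t)reg + value;
}

/* Walks the list once instead of repeating the same call with new arguments. */
unsigned run_init_list(const init_entry *list, size_t count)
{
    for (size_t i = 0; i < count; i++)
        write_reg(list[i].reg, list[i].value);
    return writes_done;
}
```

Supporting a second controller then means adding a second table rather than a second copy of the function, which should also help with code size.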
I've made a lot of progress and am hopefully close to having something to show. I did a FB live video of the 'cardboard box notUnIpod' messing with the fonts. Now that I have a working FSRW I've been testing, I'm trying to get Ymodem working and getting close. Right now I'm banging my head against a wall because numbers.spin doesn't seem to work right. I took a quick look and can't see what would break toStr. I tried hijacking the number functions from std_text_routines.spinh but it's ugly...
Could use some sage advice since I'm a bit stuck and have several drivers that are getting close but still need to be debugged!
I had to fiddle with FSRW to make it return a file's size while walking through the directory and once I got that working the only thing missing (other than CHAIN) is damn ToStr. I think numbers.spin would be an ESSENTIAL have, as well as some string stuffs. I admit to being super lazy in this regard, but that's the point of libraries right??
I will try to narrow some specifics down, it seems that fromStr was working although I didn't realize I was using it at the time. I have noticed some bugs with operator precedence and will keep that in mind, as I have recently. I'm very liberal with my bracketing, mostly for human readability but libraries... I haven't played with optimization settings either so that could help narrow it down. I also noticed a bug with return values and will try to get a detail of what's working and what's not.
I've got a ton of code to debug myself and half the time when I find what could be a bug, it's from debugging my buggy code. Separating the two can be difficult in the moment and I rarely get a chance to go back and reproduce the bug, sorry.
One thing I'm noticing is std_text_routines.spinh has some interesting code that works as an include (pretty sure) but not directly.
I haven't tested this yet but here's what I was thinking of as a workaround for now:
I've almost got ymodem working and that will help with testing significantly. I was sending small files okay but I think the serial port on my laptop is causing a problem with the longer transfers. 2 steps forward, 2 lightyears to go.
I've tried a bunch of different ways of doing this but it's as if p never increments? I don't get it!
*edit
ughhhh, it's a sign problem... I think I have a fix..
*edit
Still not sure why numbers.spin isn't working but it's been broken for quite some time now iirc. I actually wanted to use Kye's string engine in Ymodem for a while because I think it would simplify handling user input. I'm making progress again though and can deal with the signed / unsigned issue since the only place it showed up is in the final total of the directory. I might need to change this to display total kb on disk instead of bytes but shouldn't break too much, other than the transfers of very large files that are probably impractical to transfer this way anyway.
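The post doesn't say exactly where the sign problem comes from. Assuming it is a 32-bit byte total that goes past 2^31 - 1 and then gets printed as a signed number, a sketch of the two workarounds mentioned (print as unsigned, or report kilobytes) could look like this in C. `u32_to_dec` and `bytes_to_kb` are hypothetical helpers.

```c
#include <stdint.h>

/* Writes v as unsigned decimal into buf (needs at least 11 bytes); returns buf. */
char *u32_to_dec(uint32_t v, char *buf)
{
    char tmp[10];
    int n = 0;
    do {                       /* collect digits, least significant first */
        tmp[n++] = (char)('0' + v % 10u);
        v /= 10u;
    } while (v != 0);
    int i = 0;
    while (n > 0)              /* reverse into the caller's buffer */
        buf[i++] = tmp[--n];
    buf[i] = '\0';
    return buf;
}

/* Alternative from the post: report kilobytes so the number stays small. */
uint32_t bytes_to_kb(uint32_t bytes) { return bytes >> 10; }
```

Reporting kilobytes keeps a total of up to 4 GB well inside the positive signed range, so a signed-only decimal routine would still print it correctly.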
std_text_routines.spinh isn't intended to work stand-alone, but rather converts any object with a "tx" method into one that also has "dec", "hex", and so on. A simple kind of decimal to string capability can be built with this by creating an object where "tx" just stores a character into a string:
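The example object from this post is missing in this copy. To show the idea only, here is a C sketch in which "tx" stores characters into a buffer and "dec" is built purely on top of "tx". This is not the code from std_text_routines.spinh; `str_sink`, `sink_tx`, `sink_dec` and `dec_to_str` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical string sink: tx stores each character instead of sending it. */
typedef struct { char *buf; size_t cap; size_t len; } str_sink;

static void sink_tx(str_sink *s, char c)
{
    if (s->len + 1 < s->cap) {     /* keep room for the terminator */
        s->buf[s->len++] = c;
        s->buf[s->len] = '\0';
    }
}

/* dec is written purely in terms of tx, the way the post describes. */
static void sink_dec(str_sink *s, int value)
{
    unsigned int u = (unsigned int)value;
    if (value < 0) {
        sink_tx(s, '-');
        u = 0u - u;                /* magnitude, safe even for the most negative int */
    }
    char digits[12];
    int n = 0;
    do { digits[n++] = (char)('0' + u % 10u); u /= 10u; } while (u != 0);
    while (n > 0)                  /* emit most significant digit first */
        sink_tx(s, digits[--n]);
}

/* Convenience wrapper: format value into buf and return buf. */
char *dec_to_str(int value, char *buf, size_t cap)
{
    str_sink s = { buf, cap, 0 };
    if (cap > 0) buf[0] = '\0';
    sink_dec(&s, value);
    return buf;
}
```

Because `sink_dec` only ever calls `sink_tx`, pointing "tx" at a serial port instead of a buffer would reuse the same formatting code unchanged.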
This is one that caught me up several times. One of the things I hate about "black box" object usage is that code like this is really hard to understand. When things break it becomes nearly impossible to debug. Kye's string engine works great, although it only handles signed decimal. I'm sure there's a few other formats that may get tricky, I'll have to see when I get there. Right now it looks a little ugly but it's working.
Now I'm to the place of testing the memory chips and was hoping to use the old XMM cache test to verify but now I have to wrap my head around not having movd / movs.
I'm happy that things are progressing and everything seems to work good so far.
SRW_CHZrc3 works really well and the smartpin spi code seems very stable. There's a lot of possible optimizations but I think the next thing to work on is LUT sharing. I've got a build of Ymodem that can only send to the SD card, receive is broken but I think it has something to do with the smartpin serial?? Haven't really debugged this yet but the one nice thing I did notice is the limiting factor now is the serial, with transfer speed constant over any sysclock setting (as long as it's able to keep up with the baud). I've also been running 460800 baud, and had a few tests complete at 921600 baud, although I think self heating and sysclock >240mhz is a problem right now.
I'm also able to turn my tft lcd on and display some text. Started working on cog code for that, although SRAM is the next thing I need to verify. I've got some SPI code that should read the touch adc but have not even tried testing yet. Things are progressing though and it's amazing how fast this chip is!
(1) As noted above, the order of evaluation of things like x[Digit++] |= Digit was different between openspin and fastspin; fastspin followed GCC's convention (the Digit++ on the LHS got evaluated before the Digit on the RHS) but openspin did it the other way around.
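To make the difference in (1) concrete, here is a small C model of x[Digit++] |= Digit under each order. It uses explicit temporaries because the one-line form has no defined evaluation order in C. The function names are made up for illustration.

```c
/* Models x[Digit++] |= Digit under the two orders described above. */

/* Index side first (the order the post attributes to fastspin at the time). */
void or_index_first(unsigned x[], int *digit)
{
    int index = (*digit)++;        /* left side: use old value, then increment */
    x[index] |= (unsigned)*digit;  /* right side sees the incremented value */
}

/* Value side first (the order the post attributes to openspin). */
void or_value_first(unsigned x[], int *digit)
{
    unsigned value = (unsigned)*digit;  /* right side read before the increment */
    int index = (*digit)++;
    x[index] |= value;
}
```

Starting from Digit = 2 and a zeroed array, the first version stores 3 in x[2] and the second stores 2, which is the kind of mismatch that would throw off a digit-by-digit conversion.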
(2) Numbers.spin relies on "x/0" to equal 0, whereas fastspin's divide routine returned -1 for this.
I think relying on the value of division by 0 is particularly ugly, but I'll change fastspin to match the original Spin since perhaps other code relies on this too.
Thanks. fastspin has always done this for Spin code, and so that wasn't an issue. I suppose one could argue that making "x/0" be 0 ties in to it being "false", but that's tenuous, since valid division like "1/2" also produces 0. Anyway, since division by 0 is undefined there's no particular harm in making the result 0 like Spin does, although many processors do return $FFFFFFFF (maximum 32 bit integer) for that case, as it's the natural output of the division algorithm when used with a divisor of 0.
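To see why $FFFFFFFF falls out naturally, here is a C sketch of a plain shift-and-subtract divide. It is not fastspin's actual routine, just the textbook algorithm, with a hypothetical wrapper that returns 0 for a zero divisor as described above for Spin.

```c
#include <stdint.h>

/* Bit-by-bit restoring division. With d == 0 the comparison always
   succeeds, so every quotient bit is set and the result is 0xFFFFFFFF. */
uint32_t raw_udiv(uint32_t n, uint32_t d)
{
    uint64_t rem = 0;              /* 64 bits so the shift cannot overflow */
    uint32_t q = 0;
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((n >> i) & 1u);
        if (rem >= d) {
            rem -= d;
            q |= (uint32_t)1 << i;
        }
    }
    return q;
}

/* Wrapper matching the behavior the post describes for original Spin. */
uint32_t spin_udiv(uint32_t n, uint32_t d)
{
    return d == 0 ? 0u : raw_udiv(n, d);
}
```

Read as a signed 32-bit value, $FFFFFFFF is the -1 mentioned in (2).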
compiles to
That's broken my hex to ascii method currently.
I'm also having problems with something like this.
Again, I haven't had a chance to track this down and have OUT hard-coded to 0 for now. I tried a few different things and not quite sure what's going on.
Solved with a small wait, seems to be a race condition with the hardware? Need to track that down later when there's no ribbon cable.
I've got a LOT of bugs in my own code to find still, I'm sure. And I still have a long way to go, but my project is starting to show signs of life. Hopefully @ersmith has some answers.