P2 Experiments - P1 source to P2 Fastspin

So I've played with TaqOZ a bit and now I want to try converting P1 source to P2. I ran my touchscreen desktop app through fastspin and got what looks like complete ASM output. I haven't gone through all of it yet, but it looks like it should run. Are there any known gotchas I should watch out for? I'm finishing up new hardware now and waiting on a couple of final things. Progress has been slow over the holidays, but I'm getting ready for another big push after the new year.

Thanks for any advice!

Comments

  • whicker Posts: 413
    edited 2018-12-31 - 06:29:18
    Here is what I'm encountering with Fastspin:

    I am using Windows 7.

    1) I heard about issues with loadp2.exe so I grabbed and compiled the version from here:
    https://github.com/davehein/p2gcc

    I placed the newly compiled loadp2.exe version 0.007 into the /bin folder of spin2gui


    2) When running spin2gui, I had to open the menu labeled "Commands" and select "Configure Commands..."
    This brings up a window called "Executable Paths"
    I pressed the [P2 Defaults] button, but had to change the text in "Run Command" to this path:
    cmd.exe /c start %D/bin/loadp2 %B -CHIP -t -v
    
    Press OK to close the configure commands window.

    3) I opened up the sample file blink2.bas. Changed the first lines to this, and saved:
    const LED1 = 56
    const LED2 = 57
    

    4) Pressed the "Compile & Run" button.
    Got this in the terminal, because I'm using -v for loadp2 (verbose mode):
    Searching serial ports for a P2
    P2 version A found on serial port com9
    Setting clock_mode to 10c1f08
    Setting user_baud to 115200
    Loading C:/Propeller/spin2gui/samples/blink2.binary - 1440 bytes
    C:/Propeller/spin2gui/samples/blink2.binary loaded
    [ Entering terminal mode.  Press ESC to exit. ]
    

    5) Now I'm stuck. It looks like fastspin inserts a hubset #255 into the asm code. From various other comments, this isn't going to work on the P2-EVAL board.

    Here is the beginning of the .lst file it generates for blink2.bas, with me having changed the pin numbers of led1 and led2.
    00000                 | con
    00000                 | 	led1 = 56
    00000                 | 	led2 = 57
    00000                 | dat
    00000 000             | 	org	0
    00000 000 
    00000 000             | entry
    00000 000 00F00FF2    | 	cmp	ptra, #0 wz
    00004 001 1800905D    |  if_ne	jmp	#spininit
    00008 002 30F003F6    | 	mov	ptra, objptr
    0000c 003 00F007F1    | 	add	ptra, #0
    00010 004 00FE65FD    | 	hubset	#255
    00014 005 D404C0FD    | 	calla	#@_program
    

  • twm47099 Posts: 777
    edited 2018-12-31 - 07:13:13
    I rename the .p2asm file to .spin2, open the .spin2 file, change the hubset #255 to hubset #0, and then press Compile & Run on the .spin2 file. That runs the P2-ES on RCFAST (20MHz). There are changes you can make to the .spin2 file to change the clock speed:
    https://forums.parallax.com/discussion/comment/1459371/#Comment_1459371
  • whicker Posts: 413
    edited 2018-12-31 - 07:52:10
    Thanks @twm47099 !!!

    When changing the .p2asm file to .spin2 as you described and adding the _CLOCKFREQ constants from that message, I think I also had to go further down and set the clock frequency in the COG_BSS_START section, if present?
    | COG_BSS_START
    | 	fit	496
    | 	orgh	$400
    | 	'SEEMS TO MAKE A DIFFERENCE? First long is CPU Clock?
    | 	long	_CLOCKFREQ	'was long 80000000
    | 	...
    | hubentry
    
  • So @cheezus, one gotcha: it looks like fastspin currently does not generate correct code for setting the CPU clock frequency.
  • Try commenting out the "hubset #$ff". By default, loadp2 will set up the system clock to 80 MHz. If you want the user program to set up the clock, then you have to specify the -SINGLE option to loadp2.
  • There's a new fastspin / spin2gui release that should work much better. It no longer sets the clock automatically; you'll have to either have loadp2 do that or insert an explicit clkset(mode, freq) at the beginning of your Spin or BASIC code.
  • Thanks guys! I've been working on getting the new hardware ready, turns out I'm short on pin headers :| I'll have to figure out something to keep me busy while I wait.


    I'm really impressed with fastspin, I need to clean up the project so it will compile for P1. I'm still not clear on the translation of DAT sections. It LOOKS like they have been parsed to P2asm correctly (of course there may be issues of timing or something?) but it can't be THAT simple can it?
  • potatohead Posts: 9,698
    edited 2018-12-31 - 23:19:04
    Why not?

    That is what normally happens. Assembly boils down to just being a set of directives to assemble a binary file. It may or may not be instructions. Could just be data.
  • cheezus wrote: »
    I'm really impressed with fastspin, I need to clean up the project so it will compile for P1. I'm still not clear on the translation of DAT sections. It LOOKS like they have been parsed to P2asm correctly (of course there may be issues of timing or something?) but it can't be THAT simple can it?

    I'm glad it's working for you. fastspin doesn't try to do any translation of P1 asm to P2 asm, so you're probably just getting lucky if your DAT section has much legacy P1 assembly code in it. The P2 instruction set is "mostly" a superset of P1, and I have kept a few P1 assembly conventions (e.g. fastspin accepts "wc,wz" as well as "wcz"), so some P1 asm will compile OK. But in many cases you'll need to put an "#ifdef __P2__" in the DAT section with the P2 specific assembly.

    The Spin code (outside of DAT) should mostly translate fine, so the more you can put in Spin the better. Performance shouldn't be an issue at all: fastspin compiles to hubexec code on P2, so your P2 compiled Spin code will probably run faster than hand-crafted P1 assembly.
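
    A minimal sketch of the "#ifdef __P2__" pattern in a DAT section, assuming fastspin's preprocessor and a simple LED toggler. The label names, the delay value, and P1 pin 16 are made up for illustration; pin 56 is the P2 LED pin used earlier in this thread. Treat it as a starting point rather than a drop-in driver:

    ```spin
    CON
    #ifdef __P2__
      LED_PIN = 56                          ' P2 LED pin used earlier in this thread
    #else
      LED_PIN = 16                          ' hypothetical P1 pin, adjust for your board
    #endif

    PUB start
      cognew(@toggle, 0)                    ' launch the DAT code below in a new cog

    DAT
                  org     0
    toggle
    #ifdef __P2__
                  ' P2 branch: dedicated pin and wait instructions
                  drvnot  #LED_PIN          ' make the pin an output and invert its state
                  waitx   ##10_000_000      ' wait roughly 10,000,000 clocks
                  jmp     #toggle
    #else
                  ' P1 branch: classic DIRA / OUTA / WAITCNT code
                  or      dira, ledmask     ' make the pin an output
                  mov     time, cnt         ' start from the current system counter
                  add     time, delay
    p1loop        waitcnt time, delay       ' wait for the target, then advance it by delay
                  xor     outa, ledmask     ' toggle the pin
                  jmp     #p1loop
    ledmask       long    1 << LED_PIN      ' single-bit mask for the LED pin
    delay         long    10_000_000
    time          res     1
    #endif
    ```

    Only one branch is assembled for a given target, so the P1-only registers and instructions never reach the P2 assembler, and the reverse.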
  • So having spent some more time writing spin code, compiling through fastspin and looking at the output I'm even more impressed than I was initially. I've still been using the proptool for the bulk of editing but I think it's time for me to look at some of the other editors. I've always used proptool and it's just that comfortable place to develop for the p1. There's a lot of nice offerings, I've just got to be willing to get out of my comfort zone.

    The one thing that I'm looking for in a toolchain is conditional compilation. If I understand correctly fastspin has a preprocessor, with built in directives?

    I still haven't tried to run any code on the P2 yet, I really wanted to get something working on the p1 first. I finally got the Spin p1 version working and before going to bed I thought I'd try fastspin and... well it sure lives up to its name. It went from some 10s of seconds to draw a 320x240 blank screen to at least a refresh a second? This is with no work to optimize the spin code for speed. I'd LIKE to create an object that works for P1 and P2 and it seems doable but there's a lot of gotchas with the current codebase.

    Is there a "fastspin" manual I'm missing?

    Thanks as always to all the forum members here, you are awesome!
  • cheezus wrote: »
    I've still been using the proptool for the bulk of editing but I think it's time for me to look at some of the other editors. I've always used proptool and it's just that comfortable place to develop for the p1.
    It's definitely worth checking out @Rayman 's SpinEdit, it's got an interface very similar to proptool but can use openspin or fastspin to do the compiling.
    The one thing that I'm looking for in a toolchain is conditional compilation. If I understand correctly fastspin has a preprocessor, with built in directives?

    Yes. See the docs/spin.md file in the fastspin or spin2gui release. It's a basic subset of the C preprocessor, with #define, #ifdef / #else / #endif, and #include. There are some predefined symbols, like __P2__ if compiling for the P2 (again, documented in the spin.md file; that's a GitHub mark-down file, so readable as plain text).
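
    A small sketch of how that can look, assuming the #define / #ifdef / #else / #endif subset described above. BIG_LCD and the constant names are hypothetical, chosen only for illustration:

    ```spin
    ' Hypothetical build switch: remove the next line to build for the smaller panel
    #define BIG_LCD

    CON
    #ifdef BIG_LCD
      LCD_WIDTH  = 800
      LCD_HEIGHT = 480
    #else
      LCD_WIDTH  = 320
      LCD_HEIGHT = 240
    #endif

    PUB pixelcount
      ' Number of pixels on whichever panel was selected at compile time
      return LCD_WIDTH * LCD_HEIGHT
    ```

    Because the choice is made at compile time, the constants for the unselected panel take up no space in the binary.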

  • I've been converting the markdown (.md) files to PDF for my own use when I grab a new release.
    Makes for an easier read.
  • cheezus Posts: 126
    edited 2019-02-09 - 06:02:13
    Okay, I did it again! Still working on P1 source..

    Seems fastspin doesn't like the copy of fsrw I've been using, giving me an error that's got me scratching my head right now.
    "fsrw_Ray.Spin(568) error: internal error : expected statement list, got 87"

    So I go looking for an 87 in the code and can't find a dec, hex or binary... But line 568 is a blank line so I thought maybe something in the last function isn't properly formatted or something. I'm sure it's something simple really.
    pub nextfile(fbuf) | i, t, at, lns
    {{
    '   Find the next file in the root directory and extract its
    '   (8.3) name into fbuf.  Fbuf must be sized to hold at least
    '   13 characters (8 + 1 + 3 + 1).  If there is no next file,
    '   -1 will be returned.  If there is, 0 will be returned.
    }}
       repeat
          if (bufat => bufend)
             t := pfillbuf
             if (t < 0)
                return t
             if (((floc >> SECTORSHIFT) & ((1 << clustershift) - 1)) == 0)
                fclust++
          at := @buf + bufat
          if (byte[at] == 0)
             return -1
          bufat += DIRSIZE
          if (byte[at] <> $e5 and (byte[at][$0b] & $18) == 0)
             lns := fbuf
             repeat i from 0 to 10
                byte[fbuf] := byte[at][i]
                fbuf++
                if (byte[at][i] <> " ")
                   lns := fbuf
                if (i == 7 or i == 10)
                   fbuf := lns
                   if (i == 7)
                      byte[fbuf] := "."
                      fbuf++
             byte[fbuf] := 0
             fsize2:= brlong(at+$1c)
             return 0
                 
    

    I was finally able to trim down Touch.Spin enough to compile the desktop app. I'm still rewriting the ASM driver in Spin... Got a long way to go and starting to wonder if I should just bypass this step and jump to P2asm. I wish I had the time and "resources" to devote to serious development... But, almost 3 year old twin boys who think playing with wires and cords is almost as amazing as beating each other with plastic baseball bats; a cat that thinks jumper wires are the best thing since hair ties and feathers; as well as preparing for a move all slow the process.


    btw, I still haven't rtfm :P

    *edit, added sources i'm trying to build, duh
  • An "internal error" is pretty much always a bug in fastspin: it indicates that some sanity check that I added to the code failed.

    In this case it looks like fastspin messed up compiling the line:
      \sdspi.readblock(0, @buf)
    
    in the mount(basepin) function in fsrw_Ray.spin. I'll try to sort out what's wrong, but for now probably removing the \ will get it going.


  • Thanks as always for taking a look at that.

    One of the things I noticed after I posted was touch.spin compiled program size comes out to 35580 bytes? @msrobots mentioned running out of memory and this is entirely possible. I've been trying to shave longs by moving all large constants to a DAT block. Not sure if this is the best way to go...
  • cheezus wrote: »
    Thanks as always for taking a look at that.

    One of the things I noticed after I posted was touch.spin compiled program size comes out to 35580 bytes? @msrobots mentioned running out of memory and this is entirely possible. I've been trying to shave longs by moving all large constants to a DAT block. Not sure if this is the best way to go...

    Yes, unfortunately fastspin binaries are quite a bit bigger than regular Spin ones, since they compile to PASM rather than to compressed bytecode.

    Moving large constants to a DAT block probably won't make much difference; the compiler tries to keep only one copy of each large immediate constant. But it won't hurt either, and maybe you'll notice some things the compiler missed. I kind of doubt you'll get 3K back though :(.

    I have some ideas on how to allow fastspin to compress instructions (like PropGCC's CMM mode), but that isn't going to happen in the near future.
  • ersmith wrote: »
    Yes, unfortunately fastspin binaries are quite a bit bigger than regular Spin ones, since they compile to PASM rather than to compressed bytecode.

    Moving large constants to a DAT block probably won't make much difference; the compiler tries to keep only one copy of each large immediate constant. But it won't hurt either, and maybe you'll notice some things the compiler missed. I kind of doubt you'll get 3K back though :(.

    I have some ideas on how to allow fastspin to compress instructions (like PropGCC's CMM mode), but that isn't going to happen in the near future.

    When I started playing with this I wasn't even sure it would work, but now I'm starting to see potential. Part of the reason for these experiments is that I'm trying to understand ways to optimize as well as restructure things. As it stands now, there's a lot of code that only runs once, like display inits. This is one place I've always thought things could be improved. I've thought about putting the init code into CSV files on the SD and just parsing through a file for display init, but it really seems best to have the display init code in EEPROM, so that if there's something wrong with the SD I can still init the display and print debugging info.

    For a while I used a small program to init the display, start the SD and chain into another program that actually loaded the desktop. From here each program could be chained to and when finished kicks back to the EEPROM. This would check the SRAM to see if the desktop attributes were still in ram and if so skip loading the desktop and just display the pre-loaded desktop. The return to desktop from a running app is instant, which is nice because loading an 800x480 image from SD seems like it takes forever!

    There were several problems with doing things this way but the biggest annoyance, other than loading the desktop for the first time, was switching between displays. The previous hardware design didn't use the LCD /RD and simply pulled up to 3v3. The new hardware should allow me to determine which controller is plugged in, instead of changing a line in the source code and recompiling the entire package. The biggest problem I'm having is figuring out how to handle 2 different resolutions, as for a long time the 320x240 "package" was completely separate from the 800x480.

    Getting back the 3k is going to be tricky, but with 1mw of SRAM at least I have some place to stuff data between programs run off SD. It's very interesting looking at the spin code and then seeing the pasm output. I haven't really noticed your compiler "slacking off" anywhere, and moving the large constants is just to satisfy the FIT directive. I think I may have made some mistakes and need to double check pointer vs register. I'm also thinking that display init data could be overwritten for buffers if I could wrap my head around that mess.
        sdbuffer                    byte 0[1024]                   ' 1024-byte buffer for the SD card interface; also used to send one row (320 x 3 = 960 bytes max)
        buffer2                     byte 0[512]                    ' 512-byte general-purpose hub buffer
        rambuffer                   word 0[1024]                   ' originally 512 bytes (256 words), easier to use with words; enlarged to 1024 words after an overflow
    

    I was having trouble with some piece of code overflowing the rambuffer so that's 4x larger than it needs to be. There's plenty of places where this code can be optimized, I'm just trying to understand the best ways to optimize for P1 and looking for ideas in the P2 rewrite.
  • If you're running out of COG memory (and yes, having lots of large constants will do that :( ), have you tried disabling the code cache with --fcache=0? That'll reclaim a fair chunk of COG memory pretty easily, albeit with a bit of a hit to performance.