fastspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler

Comments

  • JRoarkJRoark Posts: 102
    edited 2019-09-14 - 03:28:50
    PINTOGGLE() vs OUTPUT()
    I was going back through some of the code examples I posted previously just to make sure nothing had re-broken before I moved on, and I discovered that the output timing using both OUTPUT() and PINTOGGLE() has changed for the worse.
    const pin = 17
    direction(pin) = output
    do
        output(pin) = not(output(pin))   ' 454.551 khz @ 80 mhz clock
    loop
    
    One version ago, the code above used to toggle the pin at 5.003 MHz. Now it's running at about 1/10th the original speed.
    const pin = 17
    direction(pin) = output
    do
        pintoggle(pin)                   ' 113.638 khz @ 80 mhz clock
    loop
    
    The code above used to toggle the pin at 192.310 kHz.

    It may be that the compiler is having to generate more (slower) code now due to some other fix, and if so, we will need to make the best of it. But that 10x slow-down on the OUTPUT-based example seems sorta extreme and worthy of a few minutes of noodling.

    Documentation
    You might consider putting a document version ID and last edited date/time either on the first page of the doc, or as a footer on each page. And while we're on the subject of docs, @ersmith if you'd like to make the documentation a collaborative effort, I'd be happy to jump-in and flesh out some of this stuff for you. You would, of course, retain editorial control and final approval. I'd just act as a shaven ape and bang some more of this stuff in. :smile:
  • JRoark wrote: »
    PINTOGGLE() vs OUTPUT()
    I was going back through some of the code examples I posted previously just to make sure nothing had re-broken before I moved on, and I discovered that the output timing using both OUTPUT() and PINTOGGLE() has changed for the worse.
    const pin = 17
    direction(pin) = output
    do
        output(pin) = not(output(pin))   ' 454.551 khz @ 80 mhz clock
    loop
    
    One version ago, the code above used to toggle the pin at 5.003 MHz. Now it's running at about 1/10th the original speed.
    Are you sure you used the same optimization settings both times? With the default optimization (-O1) I get the following code in the listing file for the main program:
    001a4 061             | _program
    001a4 061 5C EC BF 68 | 	or	dira, imm_131072_
    001a8 062 43 AC FC 5C | 	call	#LMM_FCACHE_LOAD
    001ac 063 08 00 00 00 | 	long	(@@@LR__0002-@@@LR__0001)
    001b0 064             | ' do
    001b0 064             | LR__0001
    001b0 064 5C E8 BF 6C | 	xor	outa, imm_131072_
    001b4 065 66 00 7C 5C | 	jmp	#LMM_FCACHE_START + (LR__0001 - LR__0001)
    001b8 066             | LR__0002
    001b8 066             | _program_ret
    001b8 066 3B 84 FC 5C | 	call	#LMM_RET
    
    The toggle loop looks pretty much optimal: it's two instructions per iteration, and running from COG memory (FCACHE) so I would expect it to be changing state every 8 processor cycles, or 16 processor cycles for a complete pin cycle; at 80 MHz that would correspond to a 5 MHz toggle rate, which is what you saw before.
    const pin = 17
    direction(pin) = output
    do
        pintoggle(pin)                   ' 113.638 khz @ 80 mhz clock
    loop
    
    The code above used to toggle the pin at 192.310 kHz.
    With fastspin 3.9.32 I would expect that to do much better now, I think around 24 processor cycles per pin cycle or about 3.333 MHz. (It's a little slower than the first example because pintoggle() also sets the direction, so the inner loop has 3 instructions instead of 2).
    Documentation
    You might consider putting a document version ID and last edited date/time either on the first page of the doc, or as a footer on each page. And while we're on the subject of docs, @ersmith if you'd like to make the documentation a collaborative effort, I'd be happy to jump-in and flesh out some of this stuff for you. You would, of course, retain editorial control and final approval. I'd just act as a shaven ape and bang some more of this stuff in. :smile:

    Putting a version ID is an excellent idea. I'll have to automate that (as I have just done in github for the spin2gui version) because otherwise I'll mess it up as I have been doing with the spin2gui version :).

    I'd be very happy indeed to accept any improvements at all to fastspin/spin2gui, including in the documentation. Thanks!
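The cycle counts in the reply above can be checked with a few lines of arithmetic. This is a minimal sketch in Python (not fastspin code), assuming every instruction in the cached loop takes 4 clock cycles, which is what "two instructions ... every 8 processor cycles" implies. The function name `toggle_rate_hz` is made up for illustration.

```python
# Rough toggle-rate estimate for a tight loop running from COG memory.
# Assumption: each instruction takes 4 clock cycles (implied by the reply,
# not measured here).

def toggle_rate_hz(clock_hz, instructions_per_loop, cycles_per_instruction=4):
    """Return the pin frequency when each loop iteration toggles the pin once."""
    cycles_per_toggle = instructions_per_loop * cycles_per_instruction
    cycles_per_period = 2 * cycles_per_toggle  # one high phase plus one low phase
    return clock_hz / cycles_per_period

CLOCK_HZ = 80_000_000

output_rate = toggle_rate_hz(CLOCK_HZ, 2)     # xor + jmp
pintoggle_rate = toggle_rate_hz(CLOCK_HZ, 3)  # one extra instruction for the direction

print(f"output():    {output_rate / 1e6:.3f} MHz")     # 5.000 MHz
print(f"pintoggle(): {pintoggle_rate / 1e6:.3f} MHz")  # 3.333 MHz
```

Both estimates are close to the optimized figures reported in the next post.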
  • JRoarkJRoark Posts: 102
    edited 2019-09-14 - 17:17:06
    ersmith wrote: »
    Are you sure you used the same optimization settings both times? With the default optimization (-O1) I get the following code in the listing file for the main program:
    001a4 061             | _program
    001a4 061 5C EC BF 68 | 	or	dira, imm_131072_
    001a8 062 43 AC FC 5C | 	call	#LMM_FCACHE_LOAD
    001ac 063 08 00 00 00 | 	long	(@@@LR__0002-@@@LR__0001)
    001b0 064             | ' do
    001b0 064             | LR__0001
    001b0 064 5C E8 BF 6C | 	xor	outa, imm_131072_
    001b4 065 66 00 7C 5C | 	jmp	#LMM_FCACHE_START + (LR__0001 - LR__0001)
    001b8 066             | LR__0002
    001b8 066             | _program_ret
    001b8 066 3B 84 FC 5C | 	call	#LMM_RET
    
    The toggle loop looks pretty much optimal: it's two instructions per iteration, and running from COG memory (FCACHE) so I would expect it to be changing state every 8 processor cycles, or 16 processor cycles for a complete pin cycle; at 80 MHz that would correspond to a 5 MHz toggle rate, which is what you saw before.
    const pin = 17
    direction(pin) = output
    do
        pintoggle(pin)                   ' 113.638 khz @ 80 mhz clock
    loop
    
    The code above used to toggle the pin at 192.310 kHz.
    With fastspin 3.9.32 I would expect that to do much better now, I think around 24 processor cycles per pin cycle or about 3.333 MHz. (It's a little slower than the first example because pintoggle() also sets the direction, so the inner loop has 3 instructions instead of 2).
    Documentation
    You might consider putting a document version ID and last edited date/time either on the first page of the doc, or as a footer on each page. And while we're on the subject of docs, @ersmith if you'd like to make the documentation a collaborative effort, I'd be happy to jump-in and flesh out some of this stuff for you. You would, of course, retain editorial control and final approval. I'd just act as a shaven ape and bang some more of this stuff in. :smile:

    Putting a version ID is an excellent idea. I'll have to automate that (as I have just done in github for the spin2gui version) because otherwise I'll mess it up as I have been doing with the spin2gui version :).

    I'd be very happy indeed to accept any improvements at all to fastspin/spin2gui, including in the documentation. Thanks!

    You nailed it, Eric. I did indeed use different optimization options during the compiles, and the differences are pretty significant. The following results were taken from a stock FLIP module running at 80 MHz:

    Using PINTOGGLE():
    const pin = 17
    direction(pin) = output
    
    do
        pintoggle(pin)              ' No optimization:      113.638 kHz
                                    ' Default optimization: 3.33337 MHz
                                    ' Full optimization:    3.33337 MHz
    loop
    

    Using OUTPUT():
    const pin = 17
    direction(pin) = output
    
    do
        output(pin) = not (output(pin))   ' No optimization:      454.551 kHz
                                          ' Default optimization: 5.0005 MHz
                                          ' Full optimization:    5.0005 MHz
    loop
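
Working backwards from the measured frequencies gives a rough clock count per loop iteration at each optimization level. This is a small sketch in Python (not fastspin code), assuming one pin toggle per loop iteration and an exact 80 MHz clock. `cycles_per_toggle` is an illustrative name.

```python
# Convert measured pin frequencies into approximate clocks per loop iteration.
# Assumptions: exact 80 MHz clock, one toggle per iteration.

CLOCK_HZ = 80_000_000

def cycles_per_toggle(pin_hz, clock_hz=CLOCK_HZ):
    """One pin period is two toggles, so clocks per toggle = clock / (2 * f)."""
    return clock_hz / (2 * pin_hz)

# Frequencies in Hz, taken from the measurements above.
measurements = {
    "pintoggle, no optimization": 113_638,
    "pintoggle, default optimization": 3_333_370,
    "output, no optimization": 454_551,
    "output, default optimization": 5_000_500,
}

results = {name: round(cycles_per_toggle(hz)) for name, hz in measurements.items()}

for name, cycles in results.items():
    print(f"{name}: about {cycles} clocks per toggle")
```

On these numbers the unoptimized loops spend roughly 352 and 88 clocks per iteration, against 12 and 8 once optimization is enabled.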
    

    Documentation
    On the subject of documentation, how do you prefer submissions? Email? Or is there a way to edit the document itself remotely? Is this in an MSWord compatible format? If so, I'm thinking my first project should be getting a Table of Contents, an Index, a copyright, etc happening.

    It might be a good thing to explain the concept of LMM ("Large Memory Model") just a bit, along with a brief discussion of where it came from, why it was needed, and what it means on the Propeller family. This was way before my time, but apparently a nod to @Bill Henning is in order, per this thread: https://forums.parallax.com/discussion/89640/announcing-large-memory-model-for-propeller-assembly-language-programs. The term "LMM" gets used quite a lot in the forums, and a bit in the BASIC reference doc, but it isn't defined, so someone coming to this BASIC dialect from a non-C background may get lost on what LMM means.
  • RaymanRayman Posts: 9,716
    edited 2019-09-14 - 18:48:13
    I'm trying to move this code from ASM to Spin2 with ASM and have a problem...
    It seems that hubexec code no longer works right...

    When I move this hubexec code back into the cog, it works...

    Here's the example where I am trying to send serial output instructions to another cog via a mailbox.
    I've made the main cog's code very simple for troubleshooting:
    DAT     ''MainEntry    Main Cog
    'orgh
                    org 0
                    
    MainEntry
                    mov     tx_out,#"A"
                    call    #OutputCharSub  
                    waitx   ##100000000
    testing                
                    jmp     #MainEntry
    


    OutputCharSub works when inside the cog, but not in hubexec… It's also very simple:
    DAT 'OutputCharSub
    OutputCharSub   'Output character in tx_out
                  wrbyte    tx_out,##Mailbox2'tx_hexTarget 'set byte to send 
                  wrbyte    #1,tx_target 'send command #1
    OutputCharWait
                  rdbyte    tx_out,tx_target
                  cmp       tx_out,#0 wcz
            if_nz jmp       #OutputCharWait              
                  ret
    

    It appears that tx_out winds up with a different value than what I give it before the call.
    If I change tx_out to ptra, then it works...
    There is also a problem with tx_hexTarget that I've fixed here by hard coding the destination...

    Seems that registers don't work right when in hubexec when I change from all ASM to Spin+ASM.


    Any idea what's going on? This is with the latest version of fastspin.
    I use a "DAT" label to start all of my subroutines. That's not a problem is it?
    Prop Info and Apps: http://www.rayslogic.com/
  • RaymanRayman Posts: 9,716
    edited 2019-09-14 - 19:12:47
    Ok, this is really weird... If I use a different register, it works...
    Could it be that the name "tx_" does something in hubexec?

    Ok, it's really strange... If I change from "tx_out" to "n3" just in these two places, it works.
    But, if I globally replace "tx_out" with "n3", it doesn't work...
    Prop Info and Apps: http://www.rayslogic.com/
  • Rayman,
    Look at the listing - it’s your friend :)

    I don’t see the ORGH $400 in your example.
    The problem is likely the jump/call addresses are wrong.

    Forcing cog or hub addresses is a mess. We will need to sort this out soon before there is a lot of code done.
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • You can’t do that in spin mode, right?
    Prop Info and Apps: http://www.rayslogic.com/
  • There is an orgh before the hub exec
    Prop Info and Apps: http://www.rayslogic.com/
  • JRoark wrote: »
    Documentation
    On the subject of documentation, how do you prefer submissions? Email? Or is there a way to edit the document itself remotely? Is this in an MSWord compatible format? If so, I'm thinking my first project should be getting a Table of Contents, an Index, a copyright, etc happening.
    At the moment the document source code is checked in to GitHub, in the doc/ directory. It's in GitHub Markdown format, which is a human-readable ASCII text file. I use the pandoc program to convert that to .pdf for the release. So for example doc/basic.md has the documentation for the BASIC language that fastspin supports.

    I'm pretty sure pandoc can generate a table of contents automatically. Actually it must be doing something like that already, since my PDF reader is showing an outline, but perhaps it could also explicitly put the table in the printed text.

    As for submission formats, GitHub pull requests work well, but e-mail or pretty much anything else would work fine too.

    Thanks,
    Eric
  • Rayman wrote: »
    I'm trying to move this code from ASM to Spin2 with ASM and have a problem...
    It seems that hubexec code no longer works right...

    When I move this hubexec code back into the cog, it works...

    Here's the example where I am trying to send serial output instructions to another cog via a mailbox.
    I've made the main cog's code very simple for troubleshooting:
    DAT     ''MainEntry    Main Cog
    'orgh
                    org 0
                    
    MainEntry
                    mov     tx_out,#"A"
                    call    #OutputCharSub  
                    waitx   ##100000000
    testing                
                    jmp     #MainEntry
    


    OutputCharSub works when inside the cog, but not in hubexec… It's also very simple:
    DAT 'OutputCharSub
    OutputCharSub   'Output character in tx_out
                  wrbyte    tx_out,##Mailbox2'tx_hexTarget 'set byte to send 
                  wrbyte    #1,tx_target 'send command #1
    OutputCharWait
                  rdbyte    tx_out,tx_target
                  cmp       tx_out,#0 wcz
            if_nz jmp       #OutputCharWait              
                  ret
    

    It appears that tx_out winds up with a different value than what I give it before the call.
    If I change tx_out to ptra, then it works...
    There is also a problem with tx_hexTarget that I've fixed here by hard coding the destination...

    Seems that registers don't work right when in hubexec when I change from all ASM to Spin+ASM.


    Any idea what's going on? This is with the latest version of fastspin.
    I use a "DAT" label to start all of my subroutines. That's not a problem is it?

    Just to be clear: is this code running in another COG? You didn't show where the tx_out variable is declared. The main Spin COG uses COG memory internally, and probably will get unhappy if you try to put code and/or data into its COG. I should probably set aside a reserved area for user code, but I haven't done that yet.

    Also, as @msrobots found above, there is a bug in the detection of hub labels in 3.9.31 (and earlier) in hubexec code being used as a Spin object. It's fixed in github now, I hope, but if you're using the built release it's probably best to put an explicit "@" in front of any labels that you want to use in HUB memory, e.g.:
        wrbyte tx_out, ##@Mailbox2
    
    The bug doesn't affect pure PASM code, only code that's been mixed with Spin in some way.
  • RaymanRayman Posts: 9,716
    edited 2019-09-14 - 22:56:04
    The mailbox is not the issue, it's the tx_out…

    If I add a register, like say "n3", and replace tx_out with n3 in these two places, the code works.
    But, if I do a global replacement of tx_out with n3, it stops working again...

    Something is very wrong with registers in hubexec space...

    There are 3 or 4 cogs running in this code, this one's code starts last.
    The subroutine is at around $2000 in HUB memory...

    Also, if I move the subroutine back into cog memory, it works...
    Prop Info and Apps: http://www.rayslogic.com/
  • RaymanRayman Posts: 9,716
    edited 2019-09-14 - 23:01:33
    Here's the code in question. It's big and messy.
    But, if you comment out the two Spin lines, the code works and is rigged to output an "A" character.
    If you leave the Spin in, it outputs some other character...
    Prop Info and Apps: http://www.rayslogic.com/
  • Rayman wrote: »
    The mailbox is not the issue, it's the tx_out…

    If I add a register, like say "n3", and replace tx_out with n3 in these two places, the code works.
    But, if I do a global replacement of tx_out with n3, it stops working again...

    Something is very wrong with registers in hubexec space...
    No, it isn't the registers, it is the hub labels. The registers are being corrupted because some accesses to HUB memory (including some subroutine calls) are being compiled incorrectly -- the compiler doesn't realize the labels are in HUB and treats them as COG. This generally causes all kinds of corruption and hard to track down problems.

    Here's a fixed fastspin binary that doesn't have that problem and which will compile your example at least to the point where it prints A's on the serial port.
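
For readers wondering why a COG/HUB mix-up matters so much: on the P2, the address alone decides which memory code executes from. Below is a minimal Python sketch of that rule, assuming the usual P2 execution map (cog registers below $200, LUT from $200 to $3FF, hub from $400 up). `exec_region` is an illustrative helper, not part of fastspin, and it says nothing about how the compiler bug itself worked.

```python
# Classify a P2 execution address by memory region.
# Assumption: standard P2 map (cog $000-$1FF, LUT $200-$3FF, hub $400 and up).

def exec_region(addr):
    """Return 'cog', 'lut', or 'hub' for a non-negative execution address."""
    if addr < 0:
        raise ValueError("address must be non-negative")
    if addr < 0x200:
        return "cog"
    if addr < 0x400:
        return "lut"
    return "hub"

# $2000 is roughly where the subroutine in this thread was reported to live.
for addr in (0x000, 0x1FF, 0x200, 0x400, 0x2000):
    print(f"${addr:05X} -> {exec_region(addr)}")
```

A label that belongs in the "hub" range but is handled as if it were in the "cog" range ends up pointing at an unrelated register, which fits the symptoms described above.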
  • Thanks! I'll try it soon...
    Prop Info and Apps: http://www.rayslogic.com/
  • Ok, this test does now work. But, the overall code still doesn't work...
    Prop Info and Apps: http://www.rayslogic.com/
  • RaymanRayman Posts: 9,716
    edited 2019-09-15 - 12:07:54
    Here it is with the test loop removed. It should update the VGA screen and send ".0065E8FF" at 115200 baud at the screen update rate of ~11 Hz or so.
    But, it doesn't draw the screen. It does seem to be outputting something over serial... I see rx light flash, but there are no characters in terminal window...

    I guess I'll just proceed to break this up into the individual pieces in several spin files and see if that helps...
    Prop Info and Apps: http://www.rayslogic.com/
  • RaymanRayman Posts: 9,716
    edited 2019-09-15 - 17:00:02
    I've split this code up into a .spin2 file for each cog's code.

    The problem is with calls to hubexec code. Something is very wrong...
    JMP to hubexec doesn't work either... Maybe the address is being calculated wrongly?
    Prop Info and Apps: http://www.rayslogic.com/
  • Here is maybe a minimal program that shows the problem.
    With the "orgh" removed it flashes the P56 led.

    Doesn't work when the "orgh" is included.
    Prop Info and Apps: http://www.rayslogic.com/
  • Maybe I figured it out... Was looking at the USB code to see how garryj did it...

    The hubexec call works when written like this:
    call    #\@HubExecTest
    
    Prop Info and Apps: http://www.rayslogic.com/
  • Actually, it only toggles the LED ~5 times and then goes off the rails...
    Looks like I need the #\@ in the hubexec loop too, like this:
    orgh
    
    HubExecTest
    testing 
     
                    waitx   ##100000000
                    drvNOT    #56
                   
                    jmp     #\@testing
    
    Prop Info and Apps: http://www.rayslogic.com/
  • Something even stranger... Hubexec calls put in after this first one mess up the code, unless they are fixed with "#\@".
    How can code after the test call mess things up?
    Prop Info and Apps: http://www.rayslogic.com/
  • RaymanRayman Posts: 9,716
    edited 2019-09-15 - 18:38:46
    I'm starting to get it working...
    Looks like all calls or jumps to hubexec need the "#\@".
    Inside hubexec, it appears you need "#\@" to jump forward but not backward...
    Prop Info and Apps: http://www.rayslogic.com/
  • I think I'm seeing that "LOC" doesn't work the same in Spin2 as it does in ASM...
    Had to change:
    loc    ptrb,#OffscreenBufferAddress
    
    to
    mov     ptrb,##OffscreenBufferAddress
    
    Prop Info and Apps: http://www.rayslogic.com/
  • Rayman wrote: »
    I think I'm seeing that "LOC" doesn't work the same in Spin2 as it does in ASM...
    Had to change:
    loc    ptrb,#OffscreenBufferAddress
    
    to
    mov     ptrb,##OffscreenBufferAddress
    

    Sorry, this is another version of the Spin hub recognition problem (it's also why you were having to put #\@ in branches). Here's a newer version of fastspin that should fix it. Thanks for testing and reporting this; most of my Spin object tests have been with COG and LUT, and the hubexec examples were too simple to show up the issues.

  • Thanks. I'll try it. I'm amazed garryj was able to push through...
    Prop Info and Apps: http://www.rayslogic.com/
  • Rayman wrote: »
    Thanks. I'll try it. I'm amazed garryj was able to push through...

    I think his objects didn't use hubexec. Indeed, you're somewhat of a pioneer here (with all the hazards that come with that): I think up until now most Spin programmers have followed the traditional P1 model of putting PASM code in COG (or perhaps LUT) and writing all the HUB code in Spin. But definitely being able to use HUB for some of the PASM is a useful feature, so thanks for your patience and for helping to debug this!

  • Peter and I have written hubexec code - it's in the ROM :smiley:
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • ersmithersmith Posts: 3,423
    edited 2019-09-16 - 11:12:34
    Cluso99 wrote: »
    Peter and I have written hubexec code - it's in the ROM :smiley:

    Of course, and I have written lots of hubexec code too. Sorry I was unclear, I was referring to writing hubexec PASM code for Spin objects, not standalone PASM code. fastspin has to treat these cases differently, because the hubexec code in an object needs to be relocated to wherever the object ends up in memory, whereas in plain PASM we always know all the addresses from the beginning.
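
The relocation point can be pictured with a toy model. The Python sketch below only illustrates the idea and is not how fastspin is implemented: labels are stored as offsets from the start of an object, and an absolute hub address exists only once the object's base address is known. The name `relocate`, the label offsets, and both base addresses are made up for the example.

```python
# Toy model of relocating hubexec labels (illustrative only, not fastspin's code).

def relocate(label_offsets, base_addr):
    """Turn object-relative label offsets into absolute hub addresses."""
    return {name: base_addr + offset for name, offset in label_offsets.items()}

# Hypothetical offsets of two labels inside one object's DAT section.
object_labels = {"HubExecTest": 0x000, "OutputCharSub": 0x040}

# Standalone PASM: the base is fixed when the program is assembled.
standalone = relocate(object_labels, 0x400)

# Same code used as a Spin object: the base depends on where the object lands.
as_object = relocate(object_labels, 0x2000)

for name in object_labels:
    print(f"{name}: standalone ${standalone[name]:05X}, in object ${as_object[name]:05X}")
```

The same label therefore resolves to different absolute addresses in the two builds, so any branch that encodes an absolute address has to be patched after the object is placed.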

  • RaymanRayman Posts: 9,716
    edited 2019-09-16 - 13:15:42
    No, garryj's USB code does use hubexec.
    That’s how I figured out the workaround.

    But it’s much nicer if it can be just like with asm only... hope you can make that work...

    It's really nice when you can move code between cog and hub and LUT without making any changes to the code...
    Prop Info and Apps: http://www.rayslogic.com/
  • Rayman wrote: »
    No, garryj's USB code does use hubexec.
    That’s how I figured out the workaround.

    But it’s much nicer if it can be just like with asm only... hope you can make that work...
    Yes, definitely the hubexec should work the same in PASM and Spin.

    I think the last version of fastspin I posted should fix the loc and jump problems you saw. Have you had a chance to give it a try yet?