flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler

JRoark · 2019-09-14 02:57

PINTOGGLE() vs OUTPUT()
I was going back through some of the code examples I posted previously just to make sure nothing had re-broken before I moved on, and I discovered that the output timing using both OUTPUT() and PINTOGGLE() has changed for the worse.

const pin = 17
direction(pin) = output
do
    output(pin) = not(output(pin))   ' 454.551 kHz @ 80 MHz clock
loop

One version ago, the code above used to toggle the pin at 5.003 MHz. Now it's running at about 1/10th the original speed.

const pin = 17
direction(pin) = output
do
    pintoggle(pin)                   ' 113.638 kHz @ 80 MHz clock
loop

The code above used to toggle the pin at 192.310 kHz.

It may be that the compiler is having to generate more (slower) code now due to some other fix, and if so, we will need to make the best of it. But that 10x slow-down on the OUTPUT-based example seems sorta extreme and worthy of a few minutes of noodling.

Documentation
You might consider putting a document version ID and last edited date/time either on the first page of the doc, or as a footer on each page. And while we're on the subject of docs, @ersmith if you'd like to make the documentation a collaborative effort, I'd be happy to jump-in and flesh out some of this stuff for you. You would, of course, retain editorial control and final approval. I'd just act as a shaven ape and bang some more of this stuff in.

ersmith · 2019-09-14 14:01

JRoark wrote: »
PINTOGGLE() vs OUTPUT()
I was going back through some of the code examples I posted previously just to make sure nothing had re-broken before I moved on, and I discovered that the output timing using both OUTPUT() and PINTOGGLE() has changed for the worse.
const pin = 17
direction(pin) = output
do
    output(pin) = not(output(pin))   ' 454.551 kHz @ 80 MHz clock
loop
One version ago, the code above used to toggle the pin at 5.003 MHz. Now it's running at about 1/10th the original speed.

Are you sure you used the same optimization settings both times? With the default optimization (-O1) I get the following code in the listing file for the main program:

001a4 061             | _program
001a4 061 5C EC BF 68 | 	or	dira, imm_131072_
001a8 062 43 AC FC 5C | 	call	#LMM_FCACHE_LOAD
001ac 063 08 00 00 00 | 	long	(@ @ @LR__0002-@ @ @LR__0001)
001b0 064             | ' do
001b0 064             | LR__0001
001b0 064 5C E8 BF 6C | 	xor	outa, imm_131072_
001b4 065 66 00 7C 5C | 	jmp	#LMM_FCACHE_START + (LR__0001 - LR__0001)
001b8 066             | LR__0002
001b8 066             | _program_ret
001b8 066 3B 84 FC 5C | 	call	#LMM_RET

The toggle loop looks pretty much optimal: it's two instructions per iteration, and running from COG memory (FCACHE) so I would expect it to be changing state every 8 processor cycles, or 16 processor cycles for a complete pin cycle; at 80 MHz that would correspond to a 5 MHz toggle rate, which is what you saw before.

const pin = 17
direction(pin) = output
do
    pintoggle(pin)                   ' 113.638 kHz @ 80 MHz clock
loop
The code above used to toggle the pin at 192.310 kHz.

With fastspin 3.9.32 I would expect that to do much better now, I think around 24 processor cycles per pin cycle or about 3.333 MHz. (It's a little slower than the first example because pintoggle() also sets the direction, so the inner loop has 3 instructions instead of 2).
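
As a quick sanity check on the cycle counts above, here is a minimal sketch of the arithmetic in Python. It assumes every instruction in the inner loop takes 4 clocks when running from COG memory, which is what the "two instructions, 8 processor cycles" estimate implies. The function name and constant are made up for illustration and are not part of fastspin.

```python
# Assumption: each instruction in the FCACHE'd loop costs 4 clocks.
CLOCKS_PER_INSTRUCTION = 4

def pin_frequency_hz(clock_hz, instructions_per_toggle):
    """Estimate the pin frequency of a tight toggle loop.

    One pass through the loop changes the pin state once, and a full
    pin cycle (high then low) needs two passes.
    """
    clocks_per_toggle = instructions_per_toggle * CLOCKS_PER_INSTRUCTION
    clocks_per_pin_cycle = 2 * clocks_per_toggle
    return clock_hz / clocks_per_pin_cycle

# output(pin) = not(output(pin)): xor + jmp, 2 instructions per pass
print(pin_frequency_hz(80_000_000, 2) / 1e6, "MHz")

# pintoggle(pin): 3 instructions per pass, since it also sets the direction
print(round(pin_frequency_hz(80_000_000, 3) / 1e6, 3), "MHz")
```

With an 80 MHz clock this gives 16 clocks per pin cycle for the two-instruction loop and 24 for the three-instruction loop, matching the 5 MHz and roughly 3.333 MHz figures quoted above.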

Documentation
You might consider putting a document version ID and last edited date/time either on the first page of the doc, or as a footer on each page. And while we're on the subject of docs, @ersmith if you'd like to make the documentation a collaborative effort, I'd be happy to jump-in and flesh out some of this stuff for you. You would, of course, retain editorial control and final approval. I'd just act as a shaven ape and bang some more of this stuff in.

Putting a version ID is an excellent idea. I'll have to automate that (as I have just done in GitHub for the spin2gui version) because otherwise I'll mess it up, as I have been doing with the spin2gui version.

I'd be very happy indeed to accept any improvements at all to fastspin/spin2gui, including in the documentation. Thanks!

JRoark · 2019-09-14 16:38

You nailed it, Eric. I did indeed use different optimization options during the compiles, and the differences are pretty significant. The following results were taken from a stock FLIP module running at 80 MHz:

Using PINTOGGLE():

const pin = 17
direction(pin) = output

do
    pintoggle(pin)              ' No optimization:      113.638 kHz
                                ' Default optimization: 3.33337 MHz
                                ' Full optimization:    3.33337 MHz
loop

Using OUTPUT():

const pin = 17
direction(pin) = output

do
    output(pin) = not (output(pin))     ' No optimization:      454.551 kHz
                                        ' Default optimization: 5.0005 MHz
                                        ' Full optimization:    5.0005 MHz
loop
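
For anyone who wants to turn these measurements back into clock counts, here is a rough sketch in Python. It only divides the 80 MHz clock by each measured pin frequency. The dictionary keys and the function name are invented for this example, and the frequencies are the ones listed above.

```python
CLOCK_HZ = 80_000_000  # FLIP module clock used for the measurements above

# Measured pin frequencies in Hz, copied from the results above.
measured_hz = {
    "pintoggle, no optimization": 113_638,
    "pintoggle, default optimization": 3_333_370,
    "output, no optimization": 454_551,
    "output, default optimization": 5_000_500,
}

def clocks_per_pin_cycle(clock_hz, pin_hz):
    """Number of system clocks spent on one full high/low pin cycle."""
    return clock_hz / pin_hz

for name, hz in measured_hz.items():
    clocks = clocks_per_pin_cycle(CLOCK_HZ, hz)
    print(f"{name}: about {clocks:.0f} clocks per pin cycle")
```

The optimized builds land at roughly 24 and 16 clocks per pin cycle, in line with Eric's estimate, while the unoptimized builds need roughly 704 and 176 clocks. That is about a 29x difference for PINTOGGLE() and about 11x for OUTPUT().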

Documentation
On the subject of documentation, how do you prefer submissions? Email? Or is there a way to edit the document itself remotely? Is this in an MSWord compatible format? If so, I'm thinking my first project should be getting a Table of Contents, an Index, a copyright, etc happening.

It might be a good thing to explain the concept of LMM ("Large Memory Model") just a bit, along with a brief discussion of where it came from, why it was needed, and what it means on the Propeller family. This was way before my time, but apparently a nod to @"Bill Henning" is in order pursuant to this thread: https://forums.parallax.com/discussion/89640/announcing-large-memory-model-for-propeller-assembly-language-programs. The term "LMM" gets used quite a lot in the forums, and a bit in the BASIC ref doc, but it isn't defined, so someone coming to this BASIC dialect from a non-C background may get lost in re LMM.

Rayman · 2019-09-14 18:32

I'm trying to move this code from ASM to Spin2 with ASM and have a problem...
It seems that hubexec code no longer works right...

When I move this hubexec code back into the cog, it works...

Here's the example where I am trying to send serial output instructions to another cog via a mailbox.
I've made the main cog's code very simple for troubleshooting:

DAT     ''MainEntry    Main Cog
'orgh
                org 0
                
MainEntry
                mov     tx_out,#"A"
                call    #OutputCharSub  
                waitx   ##100000000
testing                
                jmp     #MainEntry

OutputCharSub works when inside the cog, but not in hubexec… It's also very simple:

DAT 'OutputCharSub
OutputCharSub   'Output character in tx_out
              wrbyte    tx_out,##Mailbox2'tx_hexTarget 'set byte to send 
              wrbyte    #1,tx_target 'send command #1
OutputCharWait
              rdbyte    tx_out,tx_target
              cmp       tx_out,#0 wcz
        if_nz jmp       #OutputCharWait              
              ret

It appears that tx_out winds up with a different value than what I give it before the call.
If I change tx_out to ptra, then it works...
There is also a problem with tx_hexTarget that I've fixed here by hard coding the destination...

Seems that registers don't work right when in hubexec when I change from all ASM to Spin+ASM.

Any idea what's going on? This is with the latest version of fastspin.
I use a "DAT" label to start all of my subroutines. That's not a problem is it?

Rayman · 2019-09-14 18:54

Ok, this is really weird... If I use a different register, it works...
Could it be that the name "tx_" does something in hubexec?

Ok, it's really strange... If I change from "tx_out" to "n3" just in these two places, it works.
But, if I globally replace "tx_out" with "n3", it doesn't work...

Cluso99 · 2019-09-14 21:07

Rayman,
Look at the listing - it’s your friend

I don’t see the ORGH $400 in your example.
The problem is likely the jump/call addresses are wrong.

Forcing cog or hub addresses is a mess. We will need to sort this out soon before there is a lot of code done.

Rayman · 2019-09-14 21:30

You can’t do that in spin mode, right?

Rayman · 2019-09-14 21:32

There is an orgh before the hub exec

ersmith · 2019-09-14 22:05

JRoark wrote: »

Documentation
On the subject of documentation, how do you prefer submissions? Email? Or is there a way to edit the document itself remotely? Is this in an MSWord compatible format? If so, I'm thinking my first project should be getting a Table of Contents, an Index, a copyright, etc happening.

At the moment the document source code is checked in to GitHub, in the doc/ directory. It's in GitHub Markdown format, which is a human-readable ASCII text file. I use the pandoc program to convert that to .pdf for the release. So, for example, doc/basic.md has the documentation for the BASIC language that fastspin supports.

I'm pretty sure pandoc can generate a table of contents automatically. Actually it must be doing something like that already, since my PDF reader is showing an outline, but perhaps it could also explicitly put the table in the printed text.

As for submission formats, GitHub pull requests work well, but e-mail or pretty much anything else would work fine too.

Thanks,
Eric

ersmith · 2019-09-14 22:13

Rayman wrote: »
I'm trying to move this code from ASM to Spin2 with ASM and have a problem...
It seems that hubexec code no longer works right...

When I move this hubexec code back into the cog, it works...

Here's the example where I am trying to send serial output instructions to another cog via a mailbox.
I've made the main cog's code very simple for troubleshooting:
DAT     ''MainEntry    Main Cog
'orgh
                org 0
                
MainEntry
                mov     tx_out,#"A"
                call    #OutputCharSub  
                waitx   ##100000000
testing                
                jmp     #MainEntry
OutputCharSub works when inside the cog, but not in hubexec… It's also very simple:
DAT 'OutputCharSub
OutputCharSub   'Output character in tx_out
              wrbyte    tx_out,##Mailbox2'tx_hexTarget 'set byte to send 
              wrbyte    #1,tx_target 'send command #1
OutputCharWait
              rdbyte    tx_out,tx_target
              cmp       tx_out,#0 wcz
        if_nz jmp       #OutputCharWait              
              ret
It appears that tx_out winds up with a different value than what I give it before the call.
If I change tx_out to ptra, then it works...
There is also a problem with tx_hexTarget that I've fixed here by hard coding the destination...

Seems that registers don't work right when in hubexec when I change from all ASM to Spin+ASM.

Any idea what's going on? This is with the latest version of fastspin.
I use a "DAT" label to start all of my subroutines. That's not a problem is it?

Just to be clear: is this code running in another COG? You didn't show where the tx_out variable is declared. The main Spin COG uses COG memory internally, and probably will get unhappy if you try to put code and/or data into its COG. I should probably set aside a reserved area for user code, but I haven't done that yet.

Also, as @msrobots found above, there is a bug in the detection of hub labels in 3.9.31 (and earlier) in hubexec code being used as a Spin object. It's fixed in github now, I hope, but if you're using the built release it's probably best to put an explicit "@" in front of any labels that you want to use in HUB memory, e.g.:

    wrbyte tx_out, ##@Mailbox2

The bug doesn't affect pure PASM code, only code that's been mixed with Spin in some way.

Rayman · 2019-09-14 22:54

The mailbox is not the issue, it's the tx_out…

If I add a register, like say "n3", and replace tx_out with n3 in these two places, the code works.
But, if I do a global replacement of tx_out with n3, it stops working again...

Something is very wrong with registers in hubexec space...

There are 3 or 4 cogs running in this code, this one's code starts last.
The subroutine is at around $2000 in HUB memory...

Also, if I move the subroutine back into cog memory, it works...

Rayman · 2019-09-14 23:01

Here's the code in question. It's big and messy.
But, if you comment out the two Spin lines, the code works and is rigged to output an "A" character.
If you leave the Spin in, it outputs some other character...

ersmith · 2019-09-14 23:33

Rayman wrote: »

The mailbox is not the issue, it's the tx_out…

If I add a register, like say "n3", and replace tx_out with n3 in these two places, the code works.
But, if I do a global replacement of tx_out with n3, it stops working again...

Something is very wrong with registers in hubexec space...

No, it isn't the registers, it is the hub labels. The registers are being corrupted because some accesses to HUB memory (including some subroutine calls) are being compiled incorrectly -- the compiler doesn't realize the labels are in HUB and treats them as COG. This generally causes all kinds of corruption and hard to track down problems.

Here's a fixed fastspin binary that doesn't have that problem and which will compile your example at least to the point where it prints A's on the serial port.

Rayman · 2019-09-15 00:18

Thanks! I'll try it soon...

Rayman · 2019-09-15 11:09

Ok, this test does now work. But, the overall code still doesn't work...

Rayman · 2019-09-15 12:06

Here it is with the test loop removed. It should update the VGA screen and send ".0065E8FF" at 115200 baud at the screen update rate of ~11 Hz or so.
But, it doesn't draw the screen. It does seem to be outputting something over serial... I see rx light flash, but there are no characters in terminal window...

I guess I'll just proceed to break this up into the individual pieces in several spin files and see if that helps...

Rayman · 2019-09-15 16:57

I've split this code up into a .spin2 file for each cog's code.

The problem is with calls to hubexec code. Something is very wrong...
JMP to hubexec doesn't work either... Maybe the address is being calculated wrongly?

Rayman · 2019-09-15 17:16

Here is maybe a minimal program that shows the problem.
With the "orgh" removed it flashes the P56 led.

Doesn't work when the "orgh" is included.

Rayman · 2019-09-15 17:38

Maybe I figured it out... Was looking at the USB code to see how garryj did it...

The hubexec call works when written like this:

call    #\@HubExecTest

Rayman · 2019-09-15 18:02

Actually, it only toggles the LED ~5 times and then goes off the rails...
Looks like I need the #\@ in the hubexec loop too, like this:

orgh

HubExecTest
testing 
 
                waitx   ##100000000
                drvNOT    #56
               
                jmp     #\@testing

Rayman · 2019-09-15 18:24

Something even stranger... Hubexec calls put in after this first one mess up the code, unless they are fixed with "#\@".
How can code after the test call mess things up?

Rayman · 2019-09-15 18:38

I'm starting to get it working...
Looks like all calls or jumps to hubexec need the "#\@"
Inside hubexec, appears you need "#\@" to jump forward but not backwards...

Rayman · 2019-09-15 19:45

I think I'm seeing that "LOC" doesn't work the same in Spin2 as it does in ASM...
Had to change:

loc    ptrb,#OffscreenBufferAddress

to

mov     ptrb,##OffscreenBufferAddress

ersmith · 2019-09-15 23:45

Rayman wrote: »
I think I'm seeing that "LOC" doesn't work the same in Spin2 as it does in ASM...
Had to change:
loc    ptrb,#OffscreenBufferAddress
to
mov     ptrb,##OffscreenBufferAddress

Sorry, this is another version of the Spin hub recognition problem (it's also why you were having to put #\@ in branches). Here's a newer version of fastspin that should fix it. Thanks for testing and reporting this; most of my Spin object tests have been with COG and LUT, and the hubexec examples were too simple to show up the issues.

Rayman · 2019-09-16 00:46

Thanks. I'll try it. I'm amazed garryj was able to push through...

ersmith · 2019-09-16 10:15

Rayman wrote: »

Thanks. I'll try it. I'm amazed garryj was able to push through...

I think his objects didn't use hubexec. Indeed, you're somewhat of a pioneer here (with all the hazards that come with that): I think up until now most Spin programmers have followed the traditional P1 model of putting PASM code in COG (or perhaps LUT) and writing all the HUB code in Spin. But definitely being able to use HUB for some of the PASM is a useful feature, so thanks for your patience and for helping to debug this!

Cluso99 · 2019-09-16 10:52

Peter and I have written hubexec code - it's in the ROM

ersmith · 2019-09-16 11:08

Cluso99 wrote: »

Peter and I have written hubexec code - it's in the ROM

Of course, and I have written lots of hubexec code too. Sorry I was unclear, I was referring to writing hubexec PASM code for Spin objects, not standalone PASM code. fastspin has to treat these cases differently, because the hubexec code in an object needs to be relocated to wherever the object ends up in memory, whereas in plain PASM we always know all the addresses from the beginning.
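
To illustrate why relocation matters, here is a toy model in Python. It is only a sketch of the general idea and not how fastspin actually works internally: the offsets, base addresses, and function names are all hypothetical, and it ignores instruction encoding details.

```python
def absolute_target(object_base, label_offset):
    """Absolute HUB address of a label: depends on where the object is placed."""
    return object_base + label_offset

def relative_displacement(branch_offset, label_offset):
    """Distance from a branch to a label in the same object: the base cancels out."""
    return label_offset - branch_offset

# Hypothetical byte offsets inside one object's DAT section.
BRANCH_AT = 0x40
LABEL_AT = 0x10

# The same object placed at two different (hypothetical) HUB addresses.
for base in (0x0400, 0x2000):
    target = absolute_target(base, LABEL_AT)
    disp = relative_displacement(BRANCH_AT, LABEL_AT)
    print(f"base ${base:04X}: absolute target ${target:04X}, relative displacement {disp}")
```

The absolute target changes with the object's base address, so it can only be filled in once the object's final location is known. The relative displacement stays the same wherever the object lands. In plain PASM the base is known from the start, so both can be computed right away.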

Rayman · 2019-09-16 12:12

No, garryj's USB code does use hubexec.
That’s how I figured out the workaround.

But it’s much nicer if it can be just like with asm only... hope you can make that work...

It's really nice when you can move code between cog and hub and LUT without making any changes to the code...

ersmith · 2019-09-16 13:54

Rayman wrote: »

No garryj usb code uses hubexec
That’s how I figured out the workaround

But it’s much nicer if can be just like with asm only... hope you can make that work...

Yes, definitely the hubexec should work the same in PASM and Spin.

I think the last version of fastspin I posted should fix the loc and jump problems you saw. Have you had a chance to give it a try yet?
