flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler

David Betz · 2018-05-14 16:41

Dave Hein wrote: »

The Propeller Tool hasn't been updated for 7 years, so I doubt that it will support P2 Spin. SimpleIDE, and then later, PropellerIDE were developed to support the P1. The PropellerIDE may be updated to support P2 Spin, or maybe a whole new IDE will be developed.

It is possible to run P1 Spin on the P2. I got that working a few years ago for P2-Hot. So in that way it would be possible to use the 7-year old version of the Propeller Tool to write and run Spin programs on the P2.

How would you load the programs? Does your P1 emulator for P2 contain a P1-compatible loader?

Heater. · 2018-05-14 17:23

Digging up the Propeller Tool for the P2 seems like a waste. It's closed source, single platform, does not run anywhere useful.

Please, not "a whole new IDE". Doesn't the world have enough IDEs already?!

Getting P2 support into PropellerIDE sounds like a better idea. Better yet might be a plugin for P2 support in VS Code.

Sadly I don't have the skills or time to tackle any of this.

David Betz · 2018-05-14 17:27

Heater. wrote: »

Digging up the Propeller Tool for the P2 seems like a waste. It's closed source, single platform, does not run anywhere useful.

Please, not "a whole new IDE". Doesn't the world have enough IDEs already?!

Getting P2 support into PropellerIDE sounds like a better idea. Better yet might be a plugin for P2 support in VS Code.

Sadly I don't have the skills or time to tackle any of this.

Before adding P2 support to PropellerIDE I would think it would be necessary to find someone new to take over PropellerIDE. It seems to be unsupported at present.

Heater. · 2018-05-14 17:43

OK, scratch the PropellerIDE idea. It's dead.

I can't help feeling that a whole new IDE would suffer the same fate.

Surely a plugin for an existing IDE would be a lot easier, quicker, cheaper and require less support in the years ahead?

Not IntelliJ though. That is complex and clunky to use, takes ages to start up and uses ridiculous amounts of RAM. What with being written in Java. I'm having to use it now for my SpinalHDL RISC V project.

Not Eclipse. That is complex and clunky to use, takes ages to start up and uses ridiculous amounts of RAM. What with being written in Java.

VS Code seems to be the best of the bunch. Starts quickly making it natural to use for even simple things. Open Source, cross-platform. Has excellent C/C++ support. Lots of plugins, that are easy to install, for almost anything you want to do.

Hey, look at me, I'm praising an MS product!

Dave Hein · 2018-05-14 17:47

David Betz wrote: »

Dave Hein wrote: »

The Propeller Tool hasn't been updated for 7 years, so I doubt that it will support P2 Spin. SimpleIDE, and then later, PropellerIDE were developed to support the P1. The PropellerIDE may be updated to support P2 Spin, or maybe a whole new IDE will be developed.

It is possible to run P1 Spin on the P2. I got that working a few years ago for P2-Hot. So in that way it would be possible to use the 7-year old version of the Propeller Tool to write and run Spin programs on the P2.

How would you load the programs? Does your P1 emulator for P2 contain a P1-compatible loader?

David,
The answers to your questions are:
- Include a P1-compatible loader with the P1 emulator.
and
- Not yet.

Of course, it would be better to just use fastspin and run the code without an emulator.

David Betz · 2018-05-14 17:55

Dave Hein wrote: »

David Betz wrote: »

Dave Hein wrote: »

The Propeller Tool hasn't been updated for 7 years, so I doubt that it will support P2 Spin. SimpleIDE, and then later, PropellerIDE were developed to support the P1. The PropellerIDE may be updated to support P2 Spin, or maybe a whole new IDE will be developed.

It is possible to run P1 Spin on the P2. I got that working a few years ago for P2-Hot. So in that way it would be possible to use the 7-year old version of the Propeller Tool to write and run Spin programs on the P2.

How would you load the programs? Does your P1 emulator for P2 contain a P1-compatible loader?

David,
The answers to your questions are:
- Include a P1-compatible loader with the P1 emulator.
and
- Not yet.

Of course, it would be better to just use fastspin and run the code without an emulator.

I figured you'd need to emulate the P1 loader. However, I agree that using fastspin, or Chip's Spin2 when it becomes available, would make more sense. Still, your P1 emulator is pretty cool!

jmg · 2018-05-14 19:35

Dave Hein wrote: »

It is possible to run P1 Spin on the P2. I got that working a few years ago for P2-Hot. So in that way it would be possible to use the 7-year old version of the Propeller Tool to write and run Spin programs on the P2.

But P2 has very different timers, and no video, so any P1 code using that hardware will not run?

Roy Eltham · 2018-05-14 21:09

jmg,
P2 does NTSC and VGA video just fine. It's just done differently, but still assisted by instructions so not pure bit bang.

Also, I assume he's talking about Spin and not PASM, and probably isn't able to do the Spin stuff that deals with the timers (or only handles some cases?).

Dave Hein · 2018-05-15 00:23

Yes, the Spin1 emulator only handles the P1 Spin bytecodes. It doesn't execute P1 PASM code. The INA, INB, OUTA, OUTB, DIRA and DIRB registers are handled by re-mapping from the P1 I/O register addresses to the P2 addresses. The address for the P1 CNT register is detected, and reads are done using GETCT. All other special P1 registers are not supported.

There are probably some P1 features that could be supported using SmartPins, but I haven't done anything with SmartPins yet. The work on the P1 Spin emulator was done a few years ago with P2-Hot. It executed the more frequently used bytecodes from cog memory, and the less frequently used bytecodes were implemented in hubexec. With the addition of the XBYTE feature, the P1 bytecode interpreter should be very efficient, and should fit entirely in cog memory.
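The register handling described above can be sketched on a host machine. This is only an illustration, not the emulator's code: the function and table names are hypothetical, and the addresses are taken from the published P1 and production P2 register maps, which may differ from what P2-Hot used.

```python
# Hypothetical sketch of P1 -> P2 special-register remapping.
# Addresses are assumptions based on the published P1 and production P2
# register maps; the emulator discussed here targeted P2-Hot.

P1_REGS = {
    "CNT": 0x1F1,
    "INA": 0x1F2, "INB": 0x1F3,
    "OUTA": 0x1F4, "OUTB": 0x1F5,
    "DIRA": 0x1F6, "DIRB": 0x1F7,
}
P2_REGS = {
    "DIRA": 0x1FA, "DIRB": 0x1FB,
    "OUTA": 0x1FC, "OUTB": 0x1FD,
    "INA": 0x1FE, "INB": 0x1FF,
}

# P1 address -> P2 address, for the six I/O registers only.
IO_REMAP = {P1_REGS[name]: P2_REGS[name] for name in P2_REGS}


class UnsupportedRegister(Exception):
    """Raised for P1 special registers that have no mapping."""


def translate_read(p1_addr, read_reg, get_counter):
    """Return the value a P1 read of cog address p1_addr should see."""
    if p1_addr == P1_REGS["CNT"]:
        return get_counter()              # stands in for GETCT
    if p1_addr in IO_REMAP:
        return read_reg(IO_REMAP[p1_addr])
    if 0x1F0 <= p1_addr <= 0x1FF:
        raise UnsupportedRegister(hex(p1_addr))
    return read_reg(p1_addr)              # ordinary cog register
```

For example, a read of P1 OUTA ($1F4) is served from P2 address $1FC, while a read of CTRA ($1F8) is rejected.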

potatohead · 2018-05-16 03:38

Re: video

Well, most of the code out there either expects a static tile / bitmap, with optional VBLANK flag, or a video driver that is triggering things by scanline.

The P1 video engine will not work, and will need a rewrite. But other parts may not need much if they are SPIN, or a lighter adjustment if they are PASM. A good example might be seen in Chip's graphics demo. The video COG needs to be done P2 style. We could just do the same tiles and color mapping.

The graphics cog needs P1 to P2 PASM translation.

The Spin program should work almost completely unchanged.

So long as the sweep frequencies are the same, a whole lot can be made to work. Replace the video COG, and then evaluate what the others are doing and how they are triggered, communicated with.

Some of the more software-driven video drivers may not be optimal when done on a P2. However, many of those were about cloning old display means and methods, or about doing more than a P1 was designed for. The P2 basics are so much more capable as to render most of that unnecessary.

Probably easier to closely emulate the screen map, and call it good, just using palettes to match colors and such up, if needed.

Again, with graphics demo, maybe adding to it makes sense. Do a linear 16 color display, maybe 8 bit color, and the graphics COG gets much simpler. The SPIN program would work with some minor league changes.

ersmith · 2018-05-16 06:15

@potatohead : I agree completely -- a lot of the Spin framework around drivers should be portable pretty much as-is, once the PASM has been translated to P2 code. And as you say, the P2 is so much more capable than P1 that it may even be feasible to write the low level drivers in Spin, particularly since with the extra RAM the size overhead of compiling to PASM with fastspin is not so onerous.

Eric

potatohead · 2018-05-16 06:26

may even be feasible to write the low level drivers in Spin,

Yes. I need to try that. Agreed. Nice work here, BTW.

Really appreciate how you did inline PASM. Thank you.

ersmith · 2018-05-16 06:28

I've updated the top post with a new version. That new version is also available in GUI form at the spin2gui GitHub repo. The official release is Windows only, but spin2gui is written in Tcl/Tk so should be trivial to get running on other platforms (I've run it on Linux myself).

The GUI is very basic, but it does make it easy to do an edit/compile/run of a Spin program for P2. If you get a chance please try it out and let me know what you think.

Eric

potatohead · 2018-05-16 06:36

Will do.

ersmith · 2018-05-16 06:38

potatohead wrote: »

may even be feasible to write the low level drivers in Spin,

Yes. I need to try that. Agreed. Nice work here, BTW.

Really appreciate how you did inline PASM. Thank you.

Thanks! The inline PASM still needs some work (it's got limitations that are a bit of a pain) but it does make some things easy. For example, I was curious about the speed of the Cordic multiply versus using the mul instruction to synthesize one. It turns out Cordic is easier, but mul is faster. Cordic takes 66 cycles, whereas mul takes 44 for the full 32x32 -> 64 bit multiply (and if you don't need the upper 32 bits it could be even faster).

Here's my test code. It also illustrates the new multiple assignment feature supported by fastspin (the multiply functions return 3 values, the lo 32 bits, hi 32 bits, and time taken).

''
'' program to test multiply
''
OBJ
  ser: "SimpleSerial"

PUB main | a,b
  ser.start(115_200)
  ser.str(string("Multiply test...", 13, 10))
  try(1, 1)
  try($7fffffff, $ffffffff)
  try($ffff, $ffff)

PUB try(a, b) | t, lo, hi
  (lo,hi,t) := cordicmul(a, b)
  report(string("cordic"), lo, hi, t)
  (lo,hi,t) := hwmul(a, b)
  report(string("mul16"), lo, hi, t)

PUB report(msg, x, y, t)
  ser.str(msg)
  ser.str(string(" gives "))
  ser.hex(x, 8)
  ser.tx(",")
  ser.hex(y, 8)
  ser.str(string( " in "))
  ser.dec(t)
  ser.str(string(" cycles.", 13, 10))

PUB cordicmul(a, b) : lo, hi, t | m1, m2
  t := CNT
  asm
    qmul a, b
    getqx lo
    getqy hi
  endasm
  t := CNT - t

PUB hwmul(a, b) : lo, hi, t | ahi, bhi
  t := CNT
  ahi := a>>16
  bhi := b>>16
  asm
    mov lo, a
    mov hi, ahi
    mul lo, b
    mul hi, bhi
    mul ahi, b
    mul bhi, a
    mov a, ahi
    shl a, #16
    shr ahi, #16
    mov b, bhi
    shl b, #16
    shr bhi, #16
    add lo, a wc
    addx hi, ahi
    add lo, b wc
    addx hi, bhi
  endasm
  t := CNT - t
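
The arithmetic in hwmul can be checked on a host machine. The sketch below is a Python model, not P2 code: mul16 is a hypothetical stand-in for the P2 mul instruction (assumed here to be an unsigned 16x16 -> 32 bit multiply of the low words), and the carry handling mirrors the add ... wc / addx pairs.

```python
MASK32 = 0xFFFFFFFF


def mul16(d, s):
    """Model of P2 'mul': unsigned product of the low 16 bits of d and s."""
    return ((d & 0xFFFF) * (s & 0xFFFF)) & MASK32


def hwmul(a, b):
    """32x32 -> 64 bit unsigned multiply built from four 16x16 products."""
    a &= MASK32
    b &= MASK32
    ahi = a >> 16
    bhi = b >> 16
    lo = mul16(a, b)        # alo * blo
    hi = mul16(ahi, bhi)    # ahi * bhi
    cross = (mul16(ahi, b), mul16(bhi, a))  # ahi * blo, bhi * alo
    for c in cross:
        total = lo + ((c << 16) & MASK32)   # add lo, (c << 16) wc
        carry = total >> 32
        lo = total & MASK32
        hi = (hi + (c >> 16) + carry) & MASK32  # addx hi, (c >> 16)
    return lo, hi


if __name__ == "__main__":
    for a, b in ((1, 1), (0x7FFFFFFF, 0xFFFFFFFF), (0xFFFF, 0xFFFF)):
        lo, hi = hwmul(a, b)
        print(f"{a:08x} * {b:08x} = {hi:08x}:{lo:08x}")
```

Each cross product contributes its low half (shifted up by 16) to lo and its high half plus the carry to hi, which is why two add/addx pairs are needed.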

potatohead · 2018-05-16 06:49

Great start! Honestly, what is there right now is very useful, and definitely appropriate for helping SPIN out with hardware specifics.

Thanks again. I'll be giving this a go.

ersmith · 2018-05-16 23:05

Funny thing about the multiply timings -- I went to some effort to port the multiply routines to COG memory (basically I got fastspin -w to work properly with multiple return values; I'll be posting the updated fastspin soon) only to find out that the COG execution times and hubexec execution times are the same. I shouldn't have been surprised, I guess -- in both cases we're just running straightline code with no branches and no HUB accesses, so hubexec gets to run at full speed. It's something to keep in mind, though -- for simple functions hubexec is just as fast as COG or LUT exec!

msrobots · 2018-05-17 06:22

I really like the syntax for multiple return values.

cool,

Mike

ersmith · 2018-05-17 19:55

I've updated the top post again. The multiple assignment syntax has been loosened up so that:

  a,b := b,a

works (no parentheses required). I also fixed a really stupid bug where if there was no Spin code the DAT section would be optimized away, which meant that compiling Chip's .spin2 files didn't work right with fastspin. (I had been testing PNut compatibility, but I was testing it with the -c flag which explicitly asked for just the DAT section, rather than doing a regular compile). At this point I think fastspin should work the same as PNut when there's just a DAT.

I'll also mention https://github.com/totalspectrum/spin2gui/releases, a simple GUI for doing Spin programming on the P2. It's not fancy (not up to Propeller Tool level) but it does make compiling and running simple Spin programs very straightforward. It uses fastspin to compile Spin and Dave Hein's very nice loadp2 to load the binary files into the P2. If you add David Betz's propeller-load you could also use spin2gui for P1 programming, but I haven't really focused on that.

jmg · 2018-05-17 20:27

ersmith wrote: »

...
I'll also mention https://github.com/totalspectrum/spin2gui/releases, a simple GUI for doing Spin programming on the P2. It's not fancy (not up to Propeller Tool level) but it does make compiling and running simple Spin programs very straightforward. It uses fastspin to compile Spin and Dave Hein's very nice loadp2 to load the binary files into the P2. If you add David Betz's propeller-load you could also use spin2gui for P1 programming, but I haven't really focused on that.

Sounds good

If it is easy to add David Betz's propeller-load, sounds like that would make for a nice dual-chip design flow, which is going to help ramp P2 usage.
Someone can start now using P1, and know the environment will not change. They can check compile on P2 anytime they like.

Probably been asked before, but does your Spin support conditional compile, so those areas where P1 <> P2 can be 'user managed' ?

ersmith · 2018-05-17 21:09

jmg wrote: »

If it is easy to add David Betz's propeller-load, sounds like that would make for a nice dual-chip design flow, which is going to help ramp P2 usage.
Someone can start now using P1, and know the environment will not change. They can check compile on P2 anytime they like.

Well, the P2 hardware is pretty different from the P1. For simple Spin only programs it doesn't matter, but once you start using PASM it will matter a lot.

Probably been asked before, but does your Spin support conditional compile, so those areas where P1 <> P2 can be 'user managed' ?

Yes, absolutely. For example the SimpleSerial.spin file from the spin2gui library works on both P1 and P2, and it looks like:

''
'' serial port definitions
''
'' this is for a very simple serial port
'' (transmit only, and only on the default pin for now)
''
#ifdef __P2__
#define OUT OUTB
#define DIR DIRB
#else
#define OUT OUTA
#define DIR DIRA
#endif

CON
  txpin = 30
  
VAR
  long bitcycles
   
PUB start(baudrate)
  bitcycles := clkfreq / baudrate
  return 1

PUB tx(c) | val, nextcnt
  OUT[txpin] := 1
  DIR[txpin] := 1

  val := (c | 256) << 1
  nextcnt := CNT
  repeat 10
     waitcnt(nextcnt += bitcycles)
     OUT[txpin] := val
     val >>= 1

PUB str(s) | c
  REPEAT WHILE ((c := byte[s++]) <> 0)
    tx(c)

Note that this isn't a very good way to write a serial port for P2 (you'd really want to use the smartpins) but it is quick and dirty and easy to make P1 compatible.
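
The framing done by tx can be modelled on a host machine. The Python sketch below is illustrative only: frame_bits and bit_cycles are hypothetical names, and the 80 MHz clock in the usage note is an assumed value, not one taken from the library.

```python
def frame_bits(c):
    """Return the 10 line levels tx(c) drives, in transmit order."""
    # Bit 0 is the start bit (0), bits 1..8 are the data bits LSB first,
    # and bit 9 is the stop bit (1), matching val := (c | 256) << 1.
    val = (c | 256) << 1
    bits = []
    for _ in range(10):
        bits.append(val & 1)   # OUT[txpin] := val uses only the low bit
        val >>= 1
    return bits


def bit_cycles(clkfreq, baudrate):
    """Clock ticks per bit, as computed in start() by integer division."""
    return clkfreq // baudrate
```

For the character "A" ($41) this gives 0, then 1 0 0 0 0 0 1 0, then 1. With an assumed 80 MHz clock at 115200 baud, each bit lasts 694 ticks.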

ersmith · 2018-05-19 02:12

I've fixed a number of bugs that showed up in compiling ROM_137PBJ.spin2, so there's a new version 3.8.1. Please get it from https://github.com/totalspectrum/spin2cpp/releases. This version also adds a -l flag for producing a listing file, as suggested by @jmg. To create a listing from foo.spin2, do:

  fastspin -2 -l foo.spin2

This produces a file foo.lst which looks like:


00000                 | 
00000                 | dat
00000                 | 		orgh	0
00000                 | '
00000                 | ' Launch cogs 15..0 with blink program.
00000                 | ' Cogs that don't exist won't blink.
00000                 | '
00000 000             | 		org
00000 000             | 
00000 000 
00000 000 0C04E4FC    | .loop		coginit	cognum,#@blink	'last iteration relaunches cog 0
00004 001 FE057CFB    | 		djnf	cognum,#.loop
00008 002             | 
00008 002 
00008 002 0F000000    | cognum		long	15
0000c 003             | '
0000c 003             | ' blink
0000c 003             | '
0000c 000             | 		org
0000c 000             | 
0000c 000 
0000c 000 010C60FD    | blink		cogid	x		'which cog am I, 0..15?
00010 001 200C04F1    | 		add	x,#32		'add 32 to get pin 32..47
00014 002 5F0C60FD    | 		drvnot	x		'output and flip that pin
00018 003 100C64F0    | 		shl	x,#16		'shift up to make it big
0001c 004 1F0C60FD    | 		waitx	x		'wait that many clocks
00020 005 E8FF9FFD    | 		jmp	#blink		'do it again
00024 006             | 
00024 006 
00024 006             | x		res	1

Only the DAT section appears in the listing, for now, so it's mainly useful for tracking down issues with assembly code. You can get a listing for the compiled Spin code, but you'll have to run fastspin twice:

  fastspin -2 myspinprog.spin2
  fastspin -2 -l myspinprog.p2asm

(fastspin always produces a .p2asm file with the compiled Spin code, and you can run fastspin over that to get a listing)
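
If you want to post-process a listing, note that the hex column appears to be in memory (little-endian) byte order: the line for "cognum long 15" shows 0F000000. The Python sketch below is a rough, hypothetical parser based only on the sample lines above; it assumes a 3-digit field is a cog address and an 8-digit field is one long of data, and it is not a description of the official listing format.

```python
def parse_listing_line(line):
    """Split one listing line into (hub_addr, cog_addr, value, source).

    Missing fields come back as None. Field widths are assumptions
    inferred from the sample listing, not a documented format.
    """
    left, _, source = line.partition("|")
    fields = left.split()
    hub_addr = int(fields[0], 16) if fields else None
    cog_addr = None
    value = None
    for field in fields[1:]:
        if len(field) == 3:
            cog_addr = int(field, 16)
        elif len(field) == 8:
            # Bytes are listed in memory order, so decode as little-endian.
            value = int.from_bytes(bytes.fromhex(field), "little")
    return hub_addr, cog_addr, value, source.strip()
```

As a sanity check, the line for "add x,#32" decodes to a long whose low 9 bits are 32, which matches the immediate operand.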

Dave Hein · 2018-05-19 02:28

ROM_137PBJ.spin2 is a good test case for an assembler. I also found several bugs in p2asm when trying to assemble that file. The unaligned data and instructions uncovered some bugs, and also using _ret_ with an instruction that used ## to include a 32-bit value. I'm impressed that PNut correctly handled the interesting conditions contained in ROM_137PBJ.spin2.

Cluso99 · 2018-05-19 03:55

Dave Hein wrote: »

ROM_137PBJ.spin2 is a good test case for an assembler. I also found several bugs in p2asm when trying to assemble that file. The unaligned data and instructions uncovered some bugs, and also using _ret_ with an instruction that used ## to include a 32-bit value. I'm impressed that PNut correctly handled the interesting conditions contained in ROM_137PBJ.spin2.

ooh! never thought about those _ret_ issues. Pleased they worked.

ersmith · 2018-05-19 11:58

Dave Hein wrote: »

ROM_137PBJ.spin2 is a good test case for an assembler. I also found several bugs in p2asm when trying to assemble that file. The unaligned data and instructions uncovered some bugs, and also using _ret_ with an instruction that used ## to include a 32-bit value. I'm impressed that PNut correctly handled the interesting conditions contained in ROM_137PBJ.spin2.

I was also bitten by the _ret_ with ##... that's a tricky one

It is useful that we have 3 mostly independent assemblers now -- hopefully we won't all make the same mistakes!

cgracey · 2018-05-19 21:40

ersmith wrote: »

Dave Hein wrote: »

ROM_137PBJ.spin2 is a good test case for an assembler. I also found several bugs in p2asm when trying to assemble that file. The unaligned data and instructions uncovered some bugs, and also using _ret_ with an instruction that used ## to include a 32-bit value. I'm impressed that PNut correctly handled the interesting conditions contained in ROM_137PBJ.spin2.

I was also bitten by the _ret_ with ##... that's a tricky one

It is useful that we have 3 mostly independent assemblers now -- hopefully we won't all make the same mistakes!

That IS good. I see that OnSemi always validates its work from at least two different competitive tools to serve as a reality-check before it fabricates anything.

ersmith · 2018-05-21 13:55

There are new releases of fastspin and spin2gui. There are a bunch of bugfixes, and some improvements:

(a) You can now pass the multiple return values from a function as parameters to another function:

' function to double a 64 bit number
PUB dbl64(ahi, alo): bhi, blo
  bhi := ahi
  blo := alo
  asm
    add blo, blo wc
    addx bhi, bhi
  endasm

' function to quadruple a 64 bit number
PUB quad64(ahi, alo)
  return dbl64( dbl64(ahi, alo) )

(b) Parameters may be given default values:

PUB incr(n = 1)
  a += n

Calling just plain "incr" is the same as "incr(1)". The default values must be constant (this prevents name conflicts when expanding them... I'm looking into ways to relax this).

(c) Some more Spin2 operators are supported, such as "a\b" for "use a, then set a to b".
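
The dbl64/quad64 example in (a) can be modelled on a host machine. The Python sketch below is only an analogy: Python's tuple unpacking stands in for passing multiple return values straight through as parameters, and the explicit carry mirrors the add ... wc / addx pair.

```python
MASK32 = 0xFFFFFFFF


def dbl64(ahi, alo):
    """Double a 64-bit value held as two 32-bit halves; returns (bhi, blo)."""
    total = alo + alo
    carry = total >> 32                  # add blo, blo wc
    blo = total & MASK32
    bhi = (ahi + ahi + carry) & MASK32   # addx bhi, bhi
    return bhi, blo


def quad64(ahi, alo):
    """Quadruple by feeding dbl64's two results back in as its two parameters."""
    return dbl64(*dbl64(ahi, alo))
```

For example, quad64(0, $40000000) yields (1, 0): the first doubling sets bit 31 of the low half, and the second carries it into the high half.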

ersmith · 2018-05-28 21:42

I've published another release of spin2gui and fastspin. This has a number of bug fixes for P2 (including getting coginit of Spin methods to work properly on P2) and a changed listing file format which works with the generated Spin code.

spin2gui now supports opening a listing file to view alongside the original source, if you're curious about how the compiler is converting Spin to PASM2. You can also control the optimization levels from the GUI. For example, consider the simple Spin program:

CON
  ASIZE = 10
  
VAR
  long a[ASIZE]
  
PUB demo | i
  repeat i from 0 to ASIZE-1
    a[i] := 100*i

With no optimizations this produces:

00408                 | _Demo
00408                 | '   repeat i from 0 to ASIZE-1
00408     004804F6    | 	mov	_var_01, #0
0040c     
0040c                 | L__0001
0040c                 | '     a[i] := 100*i
0040c     244200F6    | 	mov	_tmp001_, _var_01
00410     024264F0    | 	shl	_tmp001_, #2
00414     1E4400F6    | 	mov	_tmp002_, objptr
00418     224200F1    | 	add	_tmp001_, _tmp002_
0041c     245400F6    | 	mov	muldiva_, _var_01
00420     645604F6    | 	mov	muldivb_, #100
00424     0800C0FD    | 	calla	#multiply_
00428     2A4600F6    | 	mov	_tmp003_, muldiva_
0042c     214660FC    | 	wrlong	_tmp003_, _tmp001_
00430     244200F6    | 	mov	_tmp001_, _var_01
00434     244400F6    | 	mov	_tmp002_, _var_01
00438     014404F1    | 	add	_tmp002_, #1
0043c     224800F6    | 	mov	_var_01, _tmp002_
00440     0A485CF2    | 	cmps	_var_01, #10 wcz
00444     C4FF9FCD    |  if_b	jmp	#@L__0001
00448     
00448                 | _Demo_ret
00448     2E0064FD    | 	reta

With full optimizations this comes down to:

00408                 | _Demo
00408     001804F6    | 	mov	_var_02, #0
0040c     081A00F6    | 	mov	_var_03, objptr
00410     0A1604F6    | 	mov	_var_01, #10
00414     
00414                 | L__0003
00414                 | '     a[i] := 100*i
00414     0D1860FC    | 	wrlong	_var_02, _var_03
00418     641804F1    | 	add	_var_02, #100
0041c     041A04F1    | 	add	_var_03, #4
00420     FC176CFB    | 	djnz	_var_01, #@L__0003
00424     
00424                 | _Demo_ret
00424     2E0064FD    | 	reta
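
The difference between the two listings is essentially strength reduction: the multiply and the address calculation are replaced by running additions, and the loop counts down. The Python sketch below models both forms on a host machine; BASE is a hypothetical stand-in for objptr, and a dict keyed by byte address stands in for hub memory.

```python
ASIZE = 10
BASE = 0   # hypothetical stand-in for objptr


def demo_naive():
    """Unoptimized shape: recompute address and product every iteration."""
    mem = {}
    for i in range(ASIZE):
        addr = BASE + (i << 2)   # shl #2, then add objptr
        mem[addr] = 100 * i      # calla #multiply_
    return mem


def demo_reduced():
    """Optimized shape: running value, running pointer, down counter."""
    mem = {}
    value = 0      # _var_02
    ptr = BASE     # _var_03
    count = ASIZE  # _var_01
    while True:
        mem[ptr] = value   # wrlong _var_02, _var_03
        value += 100       # add _var_02, #100
        ptr += 4           # add _var_03, #4
        count -= 1         # djnz: decrement, loop again unless zero
        if count == 0:
            break
    return mem
```

Both functions fill the same ten longs with 0, 100, ..., 900, which is the equivalence the optimizer relies on.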

ersmith · 2018-05-30 15:37

spin2gui now has a somewhat improved interface (multiple tabs so that you can keep multiple Spin files open at the same time) and an updated fastspin with some bug fixes.

The compiler has been upgraded to use the "rep" instruction for some loops, so the example above now compiles to:

00408                 | _Demo
00408     002004F6    | 	mov	_var_02, #0
0040c     0C2200F6    | 	mov	_var_03, objptr
00410     0A1E04F6    | 	mov	_var_01, #10
00414     0A06DCFC    | 	rep	@L__0006, #10
00418     
00418                 | L__0003
00418                 | '     a[i] := 100*i
00418     112060FC    | 	wrlong	_var_02, _var_03
0041c     642004F1    | 	add	_var_02, #100
00420     042204F1    | 	add	_var_03, #4
00424     
00424                 | L__0006
00424     
00424                 | _Demo_ret
00424     2E0064FD    | 	reta

potatohead · 2018-05-30 16:02

That is nice work!

I'm eager to give it a proper go.
