FlexProp: a complete programming system for P2 (and P1)

iseries · 2020-12-07 20:18

I'm so happy: I got my Rev A P2 board to work with flexprop. I was thinking it was banished to the paperweight pile.

There is no real documentation that says what will not work on a Rev A chip versus a Rev B/C.

I knew it had something to do with SETQ not working correctly. So after a lot of hunting and pecking I got it to at least run some code.

It's not 100% but the change was small and seems to work for the most part. I will have to try some other programs and see how it works.

Still have issues loading code onto the boards, as it fails quite often with checksum errors and timeouts. I think it's because my computer runs too fast or something. Tried to rebuild loadp2 to see if I could add some pauses or something. It kind of errors out with not being able to find fastspin.

I guess I can continue to use my CSharp loader as it works every time.

Mike

msrobots · 2020-12-07 21:41

There is a change in the streamer modes, so Rev A encodes these in the instruction set differently from Rev B/C. There is also a change to PTRsomething to allow different ranges for relative addressing.

There was a cmdline switch in fastspin, but @eric said he will not support rev A any longer since there are just about 120 out in the wild. Understandable and not really a problem since the real P2 is out now.

rev B to C is just a change for the DAC ADC circuit, I think no changes to the instruction set.

Your loading issue is weird. I have 3 Rev A and some Rev B boards and have never had any loading problems. But I do supply power to both USB ports on the eval boards via a powered USB hub. Maybe give that a try?

Mike

Cluso99 · 2020-12-08 00:09

There is a thread that discusses the Rev A to B changes. Sorry, I don't have a link.

The Rev B to C change is only a die track cut that only affects the ADC noise, so for almost everyone you can safely ignore it.

ersmith · 2020-12-08 03:04

iseries wrote: »

I'm so happy: I got my Rev A P2 board to work with flexprop. I was thinking it was banished to the paperweight pile.

If you're able to get it working reliably I'd be happy to accept a pull request to restore Rev A support; it's just not something I can support myself.

Still have issues loading code onto the boards, as it fails quite often with checksum errors and timeouts. I think it's because my computer runs too fast or something. Tried to rebuild loadp2 to see if I could add some pauses or something. It kind of errors out with not being able to find fastspin.

I've updated the loadp2 Makefile to use "flexspin" instead of "fastspin".

Does "loadp2 -single" work for you? It uses the slow single-stage loader, which should be fine as long as you don't need any special loader features (like loading at a different starting address).

iseries · 2020-12-08 12:27

Pushregs and Popregs, which are used all over the place, are the main problem for Rev A chips. Rev A does not increment the pointer register correctly when a SETQ multi-long read or write is in effect: it advances the pointer by only 4 instead of by 4 times the number of longs moved.

So in "outasm.c" the following lines don't work on Rev A chips.

    "    setq #2 ' push 3 registers starting at COUNT_\n"
    "    wrlong COUNT_, ptra++\n"
    "    mov    fp, ptra\n"

Changing it to this fixes the issue.

    "    setq #2 ' push 3 registers starting at COUNT_\n"
    "    wrlong COUNT_, ptra\n"
    "    add ptra, #12\n"
    "    mov    fp, ptra\n"

Since it's a small change, it could be applied for all versions of the chip.
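
To make the difference concrete, here is a small host-side C model of the two sequences. It is only a sketch: the names `chip_rev`, `block_write_postinc` and `block_write_explicit_add` are made up for illustration and are not part of FlexProp or outasm.c. It assumes, as described above, that Rev B/C advance the pointer by 4 bytes per long transferred, that Rev A advances it by only 4 in total, and that the data transfer itself is unaffected.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side model; not part of FlexProp or outasm.c. */
enum chip_rev { REV_A, REV_B };

/* Model of "setq #(count-1)" followed by "wrlong regs, ptra++":
 * copies `count` longs to hub memory at byte offset *ptra, then applies
 * the post-increment the way each silicon revision is described to. */
void block_write_postinc(enum chip_rev rev, uint8_t *hub, uint32_t *ptra,
                         const uint32_t *regs, uint32_t count)
{
    assert(count >= 1);
    memcpy(hub + *ptra, regs, count * 4u);  /* transfer assumed unaffected */
    *ptra += (rev == REV_A) ? 4u : count * 4u;
}

/* The workaround: plain "wrlong regs, ptra", then an explicit
 * "add ptra, #4*count". The result no longer depends on the revision. */
void block_write_explicit_add(uint8_t *hub, uint32_t *ptra,
                              const uint32_t *regs, uint32_t count)
{
    assert(count >= 1);
    memcpy(hub + *ptra, regs, count * 4u);
    *ptra += count * 4u;
}
```

With three registers, the post-increment form leaves the modelled pointer at +12 on Rev B but only +4 on Rev A, while the explicit add gives +12 either way, which matches the `add ptra, #12` in the fix.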

I'm sure there are other issues but at least most of the code now works.

Mike

ersmith · 2020-12-14 22:48

I've uploaded a new release (5.0.2) to https://github.com/totalspectrum/flexprop/releases. This version has a number of compiler bug fixes:

Version 5.0.2
- Added "noinline" attribute
- Added a warning for some non-terminating loops
- Fixed a compiler warning in ff.cc
- Fixed a bug with returning structures in C
- Fixed an optimizer bug with trying to replace ptra++
- Fixed an overly aggressive loop reduction optimization
- Made BASIC open command use "throw" to report errors

Unofficially, it should also work on Rev. A boards again, but that's not an officially supported configuration and I'd urge everyone to upgrade to a board with a Rev. B or Rev. C chip on it.

ersmith · 2020-12-16 11:34

Does anyone care about separate spin2cpp and flexspin binaries? flexspin binaries may be obtained from the flexprop.zip package, but spin2cpp isn't in there. Should it be?

R Baggett · 2020-12-17 00:04

About that...
I've been using FlexProp with great success.. But haven't had the time to work out how to get it running in Linux. Where does one obtain binary versions of flexspin and loadp2 for Linux?

Greg LaPolla · 2020-12-17 04:25

The source is available on github.

JRoark · 2020-12-17 11:39

@ersmith I think it's fine the way it is.

ersmith · 2020-12-17 18:39

R Baggett wrote: »

About that...
I've been using FlexProp with great success.. But haven't had the time to work out how to get it running in Linux. Where does one obtain binary versions of flexspin and loadp2 for Linux?

There are no binary versions of these for Linux. If you check out the source and do "make install" it'll build and install in your $HOME/flexprop directory. You'll need Tcl/Tk, bison, and the usual build tools (gcc, make, etc.)

Wuerfel_21 · 2020-12-17 19:35

ersmith wrote: »

There are no binary versions of these for Linux. If you check out the source and do "make install" it'll build and install in your $HOME/flexprop directory. You'll need Tcl/Tk, bison, and the usual build tools (gcc, make, etc.)

Yep, it's really easy. Unlike most C programs I've seen, attempting to compile flexspin (IDK about the GUI) does not suck all the life force out of your soul (this isn't even a joke, that is legit the feeling CMake evokes in me); it just works.

R Baggett · 2020-12-17 20:01

Wuerfel_21 wrote: »

Yep, it's really easy. Unlike most C programs I've seen, attempting to compile flexspin (IDK about the GUI) does not suck all the life force out of your soul (this isn't even a joke, that is legit the feeling CMake evokes in me); it just works.

It's as easy as finding 'flexspin' source code... it's right... where?

Seriously, you underestimate the depth of my ignorance in this matter. Life force is about 70% at this point...

Wuerfel_21 · 2020-12-17 20:27

Again, IDK about the GUI, but for the flexspin compiler you just

git clone https://github.com/totalspectrum/spin2cpp.git
cd spin2cpp
make

and it puts the binaries into the build directory.

Probably similar for loadp2.

dgately · 2020-12-17 21:12

Wuerfel_21 wrote: »
Again, IDK about the GUI, but for the flexspin compiler you just
git clone https://github.com/totalspectrum/spin2cpp.git

All the source is available in the github flexprop repository...
https://github.com/totalspectrum/flexprop
As ersmith stated, you can easily build the whole project (i.e. GUI, all the binaries for compiling & loading) on WIN10, Linux, macOS...

I suggest (on Linux or macOS) something like:

$ mkdir source
$ cd source
$ git clone --recursive https://github.com/totalspectrum/flexprop.git
$ cd flexprop
$ make clean     # only if you had previously run make
$ make install

Note: '--recursive' tells git to include all of the project's submodules (PropLoader, loadp2, & spin2cpp).
If successful, you'll find a new flexprop directory in your home directory, where you'll find the flexprop.tcl GUI, tools & sample code:

$ cd ~/flexprop
$ ls
License.txt  bin          doc          include      src
README.md    board        flexprop.tcl samples
$ ls bin
fastspin        flexcc          flexspin        loadp2          mac_terminal.sh proploader

Use flexprop.tcl if you like or the tools in the bin directory from the Linux/macOS command line. It's "all there".

Note: flexprop may have some requirements for successful building... If make install does not work, you can probably get help here in the forum...

dgately

Rayman · 2020-12-18 20:20

Plan9 question...
Trying to load a file from the PC using Plan9.

This "fread" function doesn't work in USE_HOST mode. Worked in USE_SD mode.
I replaced it with fgetc to make it work.

Also, why is Plan9 so slow? Would increasing baud somewhere make it faster?

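        //bytesRead = fread(imbuff,1,blocklen,pfile);   /* copy the data into pbuff and then transfer it to command buffer */
        /* workaround for USE_HOST mode: read one byte at a time instead of using fread */
        bytesRead = 0;
        for (i = 0; i < blocklen; i++)
        {
            int c = fgetc(pfile);
            if (c == EOF)
                break;              /* stop at end of file or on error */
            imbuff[i] = c;
            bytesRead++;
        }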

tritonium · 2020-12-18 21:02

@ersmith

Beginning to do useful things with flexbasic.
I noticed that your assembler is clever enough to use fcache to speed things up.
If I understand it correctly your program looks for loops it can put into fcache- but not all loops- there must be rules determining what is allowed. Perhaps loops without calls?
What about for/next loops?

An example

sub spiout(dval as ulong,bits as ubyte) 'do/loop twice as fast
	dim sp1 as ulong
	sp1=1<<(bits-1)	'bit mask msbit first
	do
		if (dval and sp1) = 0 then
			output(dt) = 0
		else
			output(dt)=1
		endif
		sp1=sp1>>1		'put it here to allow a bit of data setup time
		output(lck)=1		'strobe data in
		output(lck)=0
	loop until sp1=0	
end sub

01470                 | _spiout
01470     07 1E 45 F7 | 	zerox	arg02, #7
01474     01 1E 85 F1 | 	sub	arg02, #1
01478     8F 12 C1 F9 | 	decod	_var01, arg02
0147c     9C EB 9F FE | 	loc	pa,	#(@LR__0022-@LR__0021)
01480     33 00 A0 FD | 	call	#FCACHE_LOAD_
01484                 | LR__0021
01484     89 1C C9 F7 | 	test	arg01, _var01 wz
01488     0A F8 07 A4 |  if_e	bitl	outa, #10
0148c     0A F8 27 54 |  if_ne	bith	outa, #10
01490     01 12 4D F0 | 	shr	_var01, #1 wz
01494     10 F8 47 F5 | 	or	outa, #16
01498     10 F8 27 F5 | 	andn	outa, #16
0149c     E4 FF 9F 5D |  if_ne	jmp	#LR__0021
014a0                 | LR__0022
014a0                 | _spiout_ret
014a0     2D 00 64 FD | 	ret

It looks like it's putting the 'do loop' code between LR__0021 and LR__0022 into the cache for fast execution.
Makes a heck of a difference!

Can I write my flexbasic code in a way that I can 'encourage' this behaviour?

Dave

ersmith · 2020-12-18 21:11

@Rayman: There's a mistake in the 9p calculation for breaking large reads/writes up. So for now you'll have to stick to reading or writing <= 1024 bytes at a time. (The bug is fixed in github if you want to use bigger I/O)

The 9p file I/O goes over the serial line, so yes, increasing the baud rate will improve the performance. 2 megabaud seems to work fine.
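
For anyone who wants to keep using fread with the 1024-byte limit mentioned above, a wrapper along these lines might do. This is a minimal sketch, not something from FlexProp's library: `read_chunked` and `MAX_IO_CHUNK` are names invented here, and it only uses standard C `fread`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Largest single request to issue; 1024 comes from the limit quoted above. */
#define MAX_IO_CHUNK 1024

/* Hypothetical helper: read up to `len` bytes from `f` into `buf`, never
 * asking fread for more than MAX_IO_CHUNK bytes in one call.
 * Returns the number of bytes actually read. */
size_t read_chunked(void *buf, size_t len, FILE *f)
{
    unsigned char *p = buf;
    size_t total = 0;

    assert(buf != NULL && f != NULL);
    while (total < len) {
        size_t want = len - total;
        if (want > MAX_IO_CHUNK)
            want = MAX_IO_CHUNK;
        size_t got = fread(p + total, 1, want, f);
        total += got;
        if (got < want)
            break;              /* short read: end of file or error */
    }
    return total;
}
```

A call such as `bytesRead = read_chunked(imbuff, blocklen, pfile);` would then stand in for the single large fread. Whether this avoids the problem in USE_HOST mode has not been checked here; it only guarantees that no individual request exceeds 1024 bytes.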

ersmith · 2020-12-18 21:18

@tritonium : Yes, for loops will be placed into the cache. The loop cannot contain any branches to outside the loop (including calls). So no subroutine calls or GOTO statements inside the loop. Also, the loop has to fit: on P2 that's not too much of a restriction since 1K of LUT is available, but on P1 it's an issue.

Also, you can further optimize your SPI code by testing for bit 31 or bit 0 inside your loop -- that way the compiler can use the carry bit from the shift. So for MSB first output either reverse the bits before the loop and shift right, or else shift the data up before the loop so that the next bit you need to shift out is at bit 31.
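
The "shift the data up so the next bit is at bit 31" variant can be sketched in C as follows. This is an illustration under assumptions, not compiler-verified output: the pins are simulated with made-up names (`set_data`, `set_clock`, `sim_sampled`) so the code runs on a PC, and whether the compiler really turns the bit-31 test into a shift-with-carry for this exact source is not something the sketch demonstrates.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated pins so the sketch runs on a PC (hypothetical names). On real
 * hardware these two functions would drive the data and clock pins. */
uint8_t  sim_sampled[32];   /* data level seen at each rising clock edge */
unsigned sim_count;
int      sim_data_level;

void set_data(int level) { sim_data_level = level; }

void set_clock(int level)
{
    if (level && sim_count < 32)
        sim_sampled[sim_count++] = (uint8_t)sim_data_level;
}

/* Shift out the low `bits` bits of `val`, most significant bit first.
 * The data is moved up once, before the loop, so the next bit to send
 * always sits in bit 31; the loop then only tests bit 31 and shifts left. */
void spi_out_msb_first(uint32_t val, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    val <<= (32u - bits);                /* first bit to send is now bit 31 */
    for (unsigned i = 0; i < bits; i++) {
        set_data((val & 0x80000000u) != 0);
        val <<= 1;                       /* next bit moves into bit 31 */
        set_clock(1);                    /* strobe the bit in */
        set_clock(0);
    }
}
```

Compared with the original routine, there is no separate mask variable to shift and test; the loop counter decides when to stop instead of the mask reaching zero.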

tritonium · 2020-12-18 21:27

@ersmith

That's brilliant!
How on earth does your compiler manage that!
I have many loops calling spiout; I might try pasting the spiout code into those loops and see the difference.

Thanks

Dave

ersmith · 2020-12-18 22:49

tritonium wrote: »

That's brilliant!
How on earth does your compiler manage that!

I wrote it to take advantage of the Propeller's features, and efficient bit-banging was a high priority.

I have many loops calling spiout; I might try pasting the spiout code into those loops and see the difference.

You could try that, although honestly I suspect most of the time is spent in the spiout loop itself, so as long as that loop gets fcached then it's not as big a deal if the outer ones don't.

Another thing you can do with small, frequently accessed functions (like spiout) is to explicitly place them in LUT or COG memory. Doing this too much will overflow, but for a few key functions it may be worthwhile.

Rayman · 2020-12-19 01:37

How do I change the baud? To change baud:
I changed the loadp2 command line to have the new baud.
Added this:

// baud rate for serial
#ifndef _BAUD
#define _BAUD 2000000
#endif

and then added this in the startup:

_setbaud(_BAUD);

At first it didn't appear to work... Actually, it does work.

JRoark · 2020-12-19 03:04

@ersmith

ersmith wrote: »

Also, the loop has to fit: on P2 that's not too much of a restriction since 1K of LUT is available, but on P1 that's an issue...

This begs a follow-up question. Assume for the moment I have just under 1k of loop code that gets placed into the LUT... and a half dozen other similar loops that are each just under 1k in size. Does the compiler swap these in and out at runtime as needed, or is this fixed? (i.e., the allocation of that 1k is static, and whatever winds up there first stays there with no swapping?)

dgately · 2020-12-19 03:46

flexprop .spin2 bug or poor programming on my part (using flexprop 5.0.3 Beta on macOS)

Have I missed something very simple in this code? The two arrays should print out with different results (I think). The X array of longs gets overwritten with the Y array, in this example:

CON 
    _clkfreq=200_000_000 
	rx_pin = 63
	tx_pin = 62
	baud = 230_400

VAR
  long X[10]
  long Y[10]

OBJ
  ser: "spin/SmartSerial"

PUB Main()| i 
	ser.start(rx_pin, tx_pin, 0, baud)
	ser.printf("Broken array problem\n\n")
	repeat i from 0 to 9
		long[X][i] := i				
	repeat i from 0 to 9
		long[Y][i] := i + 103

	repeat i from 0 to 9
		ser.printf("%d     x: %d    ", i, long[X][i])	
		ser.printf("%d     y: %d     \n", i, long[Y][i])				
	repeat

dgately

JonnyMac · 2020-12-19 05:31

Try this syntax:

  long[@X][i] := i

The long keyword is expecting an address.

JonnyMac · 2020-12-19 05:50

Well, there may be a problem in FlexProp. I couldn't get it to work there, so I moved it over to Propeller Tool. It's okay there with the syntax I suggested.

JonnyMac · 2020-12-19 05:55

I must have done something wrong. This works. Is there a reason you don't simplify to named arrays, e.g.

PUB main() | i

  ser.tstart(baud)

  ser.fstr0(string("Broken array problem\n\n"))

  repeat i from 0 to 9
    X[i] := i
    Y[i] := i + 103

  repeat i from 0 to 9
    ser.fstr3(string("%d  x: %d  y: %d\n"), i, X[i], Y[i])

  repeat
    waitct(0)

ersmith · 2020-12-19 14:43

JRoark wrote: »

@ersmith

ersmith wrote: »

Also, the loop has to fit: on P2 that's not too much of a restriction since 1K of LUT is available, but on P1 that's an issue...

This begs a follow-up question. Assume for the moment I have just under 1k of loop code that gets placed into the LUT... and a half dozen other similar loops that are each just under 1k in size. Does the compiler swap these in and out at runtime as needed, or is this fixed? (i.e., the allocation of that 1k is static, and whatever winds up there first stays there with no swapping?)

The fcache code is swapped out as needed. Only the most recently used loop is kept in there (so there's only one loop at a time in the cache).
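
As a rough mental model of that behaviour (not FlexProp's actual runtime code), a one-entry cache can be sketched like this. The struct and function names are hypothetical, and the sketch assumes that re-entering the loop that is already resident does not trigger a reload, which is one reading of "only the most recently used loop is kept".

```c
#include <assert.h>

/* Hypothetical model of a one-entry code cache. loop_id stands in for
 * "which loop's code is currently resident". */
struct one_slot_cache {
    int resident;   /* id of the loop currently cached, or -1 if empty */
    int loads;      /* how many times code had to be copied in */
};

void cache_init(struct one_slot_cache *c)
{
    assert(c != 0);
    c->resident = -1;
    c->loads = 0;
}

/* Called on entry to a cacheable loop: copy the code in only if a
 * different loop (or nothing) is currently resident. */
void cache_enter_loop(struct one_slot_cache *c, int loop_id)
{
    assert(c != 0 && loop_id >= 0);
    if (c->resident != loop_id) {
        c->resident = loop_id;
        c->loads++;
    }
}
```

Under this model, a program that alternates between two cached loops pays a reload on every switch, while one that stays in the same loop for a long time pays only once. That fits the earlier point that the inner spiout loop is the one that matters most.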

dgately · 2020-12-19 14:56

I've since gotten this to work.

JonnyMac wrote: »

Is there a reason you don't simplify to named arrays, e.g.

That code was just an example that I quickly put together to show the problem. The real code has a single named array (named 'buffer' in both C and Spin2) that I send to C code (thanks to flexprop's support for multi-language projects), which fills the array with bezier curve coordinates. I need the C code because I couldn't write a bezier function with integers in Spin2.

I think the problem was within the C function... I was using the input array pointer as an indexed array. When I changed that code to use pointer arithmetic, the returned array was correct.

C code:
*buffer++ = x;
*buffer++ = y;

I now retrieve the coordinates in Spin and use them like this:
SetPixel(long[buffer][i], long[buffer][i+1],color)

ersmith · 2020-12-19 15:01

dgately wrote: »

	repeat i from 0 to 9
		long[X][i] := i				
	repeat i from 0 to 9
		long[Y][i] := i + 103

	repeat i from 0 to 9
		ser.printf("%d     x: %d    ", i, long[X][i])	
		ser.printf("%d     y: %d     \n", i, long[Y][i])				
	repeat

dgately

In Spin if "X" is an array then just plain "X" in an expression is equivalent to "X[0]". (That's *very* different from C, of course!). So

    long[X][i] := i

is the same as

    long[X[0]][i] := i

which is probably not what you intended. As Jon mentioned, you more likely wanted

    long[@X][i] := i
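
For readers who think in C, here is a small model of why the original code made X and Y appear identical. It is a sketch under assumptions: hub memory is modelled as a plain array, the byte addresses `X_ADDR` and `Y_ADDR` are arbitrary, all helper names are invented, and the VAR arrays are assumed to start out zeroed, so plain `X` and plain `Y` both evaluate to 0 and get used as address 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: hub memory as an array of longs, addressed in bytes. */
#define HUB_LONGS 64
int32_t hub[HUB_LONGS];

/* Byte addresses chosen arbitrarily for the two VAR arrays. */
enum { X_ADDR = 64, Y_ADDR = 104 };

/* Spin's long[addr][i]: the i-th long starting at byte address addr. */
int32_t *spin_long(int32_t addr, int i)
{
    int32_t idx = addr / 4 + i;
    assert(addr % 4 == 0 && idx >= 0 && idx < HUB_LONGS);
    return &hub[idx];
}

/* Plain "X" in a Spin expression is X[0]: the value stored there. */
int32_t spin_value_of(int32_t array_addr)
{
    return *spin_long(array_addr, 0);
}

/* Runs the two fill loops; use_at selects long[@X][i] over long[X][i]. */
void fill_arrays(int use_at)
{
    for (int i = 0; i < HUB_LONGS; i++)
        hub[i] = 0;
    for (int i = 0; i < 10; i++)
        *spin_long(use_at ? X_ADDR : spin_value_of(X_ADDR), i) = i;
    for (int i = 0; i < 10; i++)
        *spin_long(use_at ? Y_ADDR : spin_value_of(Y_ADDR), i) = i + 103;
}
```

With `fill_arrays(0)`, both loops write through address 0, so the second loop overwrites the first and reading back through `long[X][i]` returns the Y values, while the real X storage is never touched. With `fill_arrays(1)`, each array gets its own data. On real hardware the stray writes land on whatever lives at hub address 0, which the model does not try to represent.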
