Big Spin - is it still a pipedream?

jazzed · 2011-02-14 08:03

RossH wrote: »

I'm not really sure what Jazzed's benchmark figures mean. I presume "C3" means executing from SPI RAM on the C3? If so, those times for Zog and BigSpin look pretty good. But SDRAM means ... what?

Yes, C3 is executing from SPI RAM. It's easier to have everything in SPI RAM. Some day I'd like to have code in Flash, EEPROM, or SDCARD, but that's harder because of things like strings.

SDRAM means all code, data, and stack live in the SDRAM/Cache. This works with any device that supports the JCACHE interface demonstrated in BigSpin and ZOG - DracBlade is not yet verified.

RossH wrote: »

I presume having a caching SPI driver makes the big difference here - it seems to double the execution speed! I'll need to sharpen my pencils and get back to work on my own caching Catalina SPI driver.

If you use the JCACHE interface, you automatically get SDRAM for Catalina. If you don't, that's fine, but whatever you do will hopefully allow access to the full memory space. MicroPropPC will have up to 64MB SDRAM (excess memory could be used as a RAM-DISK).

Heater. wrote: »

Seems to me that both BigSpin and Zog need an option to get stack into HUB at least.

I'll look into keeping the stack in HUB. It shouldn't be too hard, and I know that will help performance in a big way. The problem I have with using too much HUB is in crowding out user things like video buffer space.

I do have some experimental drivers where SDRAM is used for storing/fetching video as well as code, but I don't have the chops to finish that right now.

Heater. wrote: »

Yep. This fibo thing really needs to be shot in the head. It is totally unrepresentative of what software in a typical mcu-application does. It is really only a good test of subroutine calling efficiency.

Yes, but it definitely offers a quick perspective. I did qualify use of FIBO with a disclaimer twice.

I do look forward to trying a more interesting application. FFT with SDRAM would be a wonderful test application. Is there a SPIN version of your FFT? Got a link?

Dave Hein wrote: »

I guess I need to add a cycle-accurate mode in the simulator. I currently don't simulate instruction pipe-lining or the hub access time slots.

Dave, if you want to add cycle accuracy to simulation, it could be useful. Don't let it become an unnecessary burden though. I really enjoy how nice the simulator works now.

Dave Hein · 2011-02-14 12:38

I implemented hub stalls in SpinSim, and the hello program works without changing the timing in the cache routine. It was relatively easy to add. I'll post a new version of SpinSim after I've added the extra cycles for DJNZ, TJZ and TJNZ when the jump isn't taken. I think hub access and conditional jumps are the only instructions that are more than 1 cycle. Of course, the wait instructions can take more than one cycle, but I'm not concerned about them right now.
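For readers following along, here is a rough sketch of the kind of cycle accounting described above. It is written in Python and is not SpinSim's code. The clock counts are assumptions based on commonly quoted Propeller 1 figures (4 clocks for an ordinary instruction, 8 clocks for a DJNZ/TJZ/TJNZ that falls through, a hub window every 16 clocks with 8 clocks for the access itself), and the window offset per cog and all names are hypothetical.

```python
# Simplified Propeller 1 timing model (assumed figures, not SpinSim's code).
HUB_PERIOD = 16   # assumed: each cog gets a hub window every 16 clocks
HUB_ACCESS = 8    # assumed: clocks used by a hub instruction once in its window
BASE_CLOCKS = 4   # assumed: clocks for an ordinary instruction

HUB_OPS = {"rdbyte", "rdword", "rdlong", "wrbyte", "wrword", "wrlong"}
COND_JUMPS = {"djnz", "tjz", "tjnz"}


def instruction_clocks(op, now, cog=0, jump_taken=True):
    """Return the clocks consumed by `op` when it starts at clock `now`."""
    if op in HUB_OPS:
        # Stall until this cog's next hub window, then perform the access.
        phase = (cog * 2) % HUB_PERIOD       # hypothetical window offset per cog
        stall = (phase - now) % HUB_PERIOD
        return stall + HUB_ACCESS
    if op in COND_JUMPS and not jump_taken:
        return 2 * BASE_CLOCKS               # extra cycle when the jump falls through
    return BASE_CLOCKS


def run(program, cog=0):
    """Sum the clocks for a list of (op, jump_taken) pairs."""
    now = 0
    for op, taken in program:
        now += instruction_clocks(op, now, cog, taken)
    return now


total = run([("rdlong", True), ("add", True), ("rdlong", True)])
print("total clocks:", total)
```

Under these assumptions a hub instruction costs between 8 and 23 clocks depending on where it lands relative to the window, which is why code tuned to hit the window (like a cache routine) is sensitive to whether the simulator models the stall.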

Steve, have you tried running a larger program with LittleBigSpin? I tried SpinSim's demo.spin, and it didn't print anything. I had to increase the size of the read buffer to 2500 bytes to read it in. However, that wasn't the problem since the hello program ran OK with that size buffer. I didn't spend any time debugging it, but I'll look into it further.

Dave

Heater. · 2011-02-14 13:16

Jazzed,

Is there a SPIN version of your FFT?

Yes indeed, I have posted a few versions. I think all of them bar the last one have a working Spin version. You just have to set a #define to select the Spin or PASM version when you build it.

In the last version I moved over to using the Props trig tables which broke the Spin build. Anyway they are all here:

http://forums.parallax.com/showthread.php?128292-Heater-s-Fast-Fourier-Transform.&highlight=heater_fft

I should clean up my C version so as to enable a direct C to Spin comparison.

jazzed · 2011-02-14 14:24

Dave Hein wrote: »

Steve, have you tried running a larger program with LittleBigSpin? I tried SpinSim's demo.spin, and it didn't print anything.

It's great you were able to add hub stalls easily.

I haven't run too many programs just yet. I did try your demo with your SimCache change, and I get a menu but I can't enter commands. Something to debug, and I'm looking into it.

The simulator demo.spin program only works if compiled with: bstc -Ograux -b demo

Having a sane shared memory strategy will be necessary for bigger programs, and at this point I'm not sure what to do about that other than enumerating spaces or using a hub allocator. Just leaving data structures in the Spin code won't work, and rendezvous is not really helpful either.

@Heater, thanks for the link and instructions. Will try some things later today after grandpa time.

jazzed · 2011-02-14 19:27

Heater's Spin FFT 2.0 ran on SDRAM after I replaced the FullDuplexSerialPlus with BigSerial and set SPIN_BUTTERFLIES. For some reason the C3 version does not work; I'll look at it more later.

Normal Spin runtime: 1787ms
SDRAM BigSpin runtime: 7531ms

That's about a 4x difference between friends and somewhat consistent with FIBO results.
I'll produce a HUB stack version tomorrow.
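The "about 4x" figure can be checked with a couple of lines of arithmetic. This is only a sketch: the FFT numbers are the two runtimes above, and the FIBO(20) numbers are the normal Spin and SDRAM BigSpin entries from the benchmark table quoted later in this thread.

```python
# Timings quoted in this thread, in milliseconds.
fft_normal_ms = 1787     # Heater's Spin FFT, normal Spin runtime
fft_bigspin_ms = 7531    # same FFT under BigSpin from SDRAM
fibo_normal_ms = 547     # FIBO(20), normal Spin
fibo_bigspin_ms = 2858   # FIBO(20), BigSpin from SDRAM

fft_slowdown = fft_bigspin_ms / fft_normal_ms
fibo_slowdown = fibo_bigspin_ms / fibo_normal_ms

print(f"FFT slowdown:  {fft_slowdown:.2f}x")
print(f"FIBO slowdown: {fibo_slowdown:.2f}x")
```

That works out to roughly 4.2x for the FFT and 5.2x for FIBO(20), so the two benchmarks are in the same ballpark, with FIBO penalised a little more.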

Valentines time now. Happy day to all you lovers.

Heater. · 2011-02-15 02:09

Great. Somewhat faster than the FFT I had running on my Atari ST520 in BASIC back in the day:)

Odd that it does not run on the C3, must have uncovered some odd corner case in the memory access somewhere.

I'll look into getting the C version of the FFT out for comparison with Zog, Catalina etc.

Dr_Acula · 2011-02-17 02:00

Catalina's stack is always in Hub RAM. The data is in Hub RAM when using either the LMM memory model or the XMM SMALL memory model. The data is in XMM RAM when using the XMM LARGE model.

This brings up an issue that has arisen also on the Catalina thread and in BCX Basic as well.

Where do you put variable data when you have a program in external memory?

If you put a variable in hub ram, things run faster.

If you put a variable in external memory, you can have many more variables without running out of space.

You might even want to put a huge array in hub ram, eg a screen buffer.

But - there is only so much hub ram, and if you allocate a stack and a screen buffer, you might start running out. Or worse, get unpredictable behaviour as the stack overruns the screen buffer.

Catalina has two models, one with variables in hub and one with variables in external memory. There are advantages for both models.

Does this lead to a need to be able to let the programmer decide?

Would you have to explicitly state this for every single variable in a program? (aargh!)

Is there enough flexibility in saying something like "global variables are in XMM and local variables are in hub"?

Could you have a system where global variables are in XMM, local variables are in hub, but for arrays, the programmer can decide (eg with a special instruction, which maybe we need to define)?

Thoughts would be most appreciated.

jazzed · 2011-02-17 07:21

Dr_Acula wrote: »

Where do you put variable data when you have a program in external memory?

Three modes make sense:
1) all code/data/stack in external memory and shared memory in hub
2) code/data in external memory with stack and shared memory in hub
3) code in external memory, data/stack/shared memory in hub

Item 1 allows full usage of large memory, but will be slower.
Item 2 will be faster and excess memory can be used for SDCARD cache.
Item 3 would allow using EEPROM, FLASH, or SDCARD to hold code.

Can you please test the DracBlade version I posted?

lonesock · 2011-02-17 09:35

Regarding the automatic placement of variables in Hub RAM, I think the heuristic I would use would be something like:

* local in scope
* not an array
* size up to 4 bytes
OR
* the user specified the "register" keyword

Jonathan
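One way to read that heuristic is as a single predicate: the three bullet points must all hold, or the user has asked for it explicitly. The sketch below is Python, not part of any compiler discussed here, and the `Variable` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str
    is_local: bool               # declared in local scope
    is_array: bool
    size_bytes: int
    register_hint: bool = False  # user wrote the "register" keyword


def place_in_hub(var: Variable) -> bool:
    """Return True if the variable should be placed in Hub RAM."""
    small_local_scalar = (
        var.is_local and not var.is_array and var.size_bytes <= 4
    )
    return small_local_scalar or var.register_hint


for v in (
    Variable("i", is_local=True, is_array=False, size_bytes=4),
    Variable("buffer", is_local=True, is_array=True, size_bytes=256),
    Variable("screen", is_local=False, is_array=True, size_bytes=4096,
             register_hint=True),
):
    print(v.name, "-> hub" if place_in_hub(v) else "-> external")
```

Everything that fails the test would default to external memory, so a large array only lands in hub when the programmer asks for it.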

Dave Hein · 2011-02-17 12:02

The locations of the variable spaces would be determined by the value of the PBASE, VBASE and DBASE pointers. Local variables live on the stack and use DBASE to point to them. VBASE points to all the VAR variables and PBASE points to the DAT variables. PBASE is also used as the base for code, so DAT variables would have to be in external memory with code. The way Spin is set up, it would be hard to separate scalar variables from array variables. Of course, it's always possible to add an extension to the Spin language and add an additional variable space to it.

jazzed · 2011-02-17 15:36

I'm still looking at all this, but am distracted just a little right now.

Dave, is it possible to keep shared memory data in the stub files?
I'm thinking a BigSpin program will end up using stubs and linkit
all the time. The only other alternatives are the current hack which
doesn't scale, and array/pointer notation and static addresses.

--Steve

Dave Hein · 2011-02-17 19:04

Steve, I don't understand what you're asking. The stub objects are replaced by the real ones at link time. We could create a shared memory area in Hub RAM that is like a rendezvous area. An object containing constants could define the location of variables in the rendezvous area. The variables would be defined as offsets from the start of the area. Is that what you're asking about?

jazzed · 2011-02-17 19:25

Dave Hein wrote: »

We could create a shared memory area in Hub RAM that is like a rendezvous area. An object containing constants could define the location of variables in the rendezvous area. The variables would be defined as offsets from the start of the area. Is that what you're asking about?

Dave: Yes, that's what I'm talking about. I wouldn't call it rendezvous though because the usage is different from the meaning that word takes in Propeller-ese.

I would much rather have the compiler/linker define the variables and addresses if at all possible. Having to use pointers and array indices for communicating with a device is Ok and it is by definition SPIN, but it's rather unnatural. I suppose it's a lot to ask of the linker. Maybe we can look into the compiler to help with shared memory later.

Is it possible to at least automatically assign an object's shared memory base address?
Having to do all those constant definitions by hand will be tedious.

Thanks

Dave Hein · 2011-02-18 06:02

The linker could map pbase, vbase and dbase any place we want to put them. Of course, that may be more of a load function. As far as creating a new variable space, there are a couple of approaches we could use. The first one is the one I mentioned in my previous post. We'll call it the shared area. A shared.spin object would be defined as follows:

con
  SBASE = $7000
  var1 = SBASE
  var2 = var1 + 4
  var3 = var2 + 4
pub dummy

Programs would access shared variables as follows:

obj
  shr : "shared"

pub start
  long[shr#var1] := 1
  long[shr#var2] := 4
  long[shr#var3] := long[shr#var1] * long[shr#var2]
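
Since writing those constants by hand was called tedious earlier in the thread, here is a sketch of a small generator for the CON section. It is Python and purely illustrative: the function name and the (name, size) input format are hypothetical, and it emits each constant as an offset from SBASE rather than chaining from the previous variable, which gives the same addresses.

```python
def shared_spin(variables, base=0x7000):
    """Build the text of a shared.spin object.

    variables: list of (name, size_in_bytes) pairs, laid out in order.
    base: Hub RAM address of the shared area ($7000 in the example above).
    """
    lines = ["con", f"  SBASE = ${base:04X}"]
    offset = 0
    for name, size in variables:
        lines.append(f"  {name} = SBASE + {offset}")
        offset += size
    lines.append("pub dummy")
    return "\n".join(lines)


text = shared_spin([("var1", 4), ("var2", 4), ("var3", 4)])
print(text)
```

Anything accessed with long[...] should have a size that keeps later entries long-aligned; the sketch does not check that.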

The other way to do this would be to add a SHR keyword, which would be like VAR, except there is only one instance that can be accessed by all objects. We could add an SBASE to the program header, and support for SBASE would need to be added to the interpreter as well.

This would require modifying the compiler. I've been looking at what it would take to write a Spin compiler, and it's not as hard as I thought. The approach I'm looking at is to generate a symbol table with a function pointer for each symbol. Each line is parsed into tokens, and then it's just a matter of finding the token in the symbol table and calling its function. The magic happens in the symbol functions, where they recursively call other symbol functions. The Spin VM makes it easier because it is stack-based instead of being register-based.
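
To illustrate the table-driven idea, here is a toy sketch in Python. It is not Spin and not the compiler being described: it handles only `name := expression` with `+`, `-`, `*` and parentheses, and the opcode names are invented. It does show tokens being looked up in a symbol table, handler functions calling each other recursively, and how naturally the result maps onto a stack machine.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|:=|[-+*()])")
BINARY = {"+": (1, "add"), "-": (1, "sub"), "*": (2, "mul")}  # precedence, opcode


def tokenize(line):
    """Split one source line into tokens."""
    tokens, pos, line = [], 0, line.strip()
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at column {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Compiler:
    def __init__(self, variables):
        self.variables = set(variables)
        self.code = []  # emitted stack-machine instructions
        # Symbol table: each known token maps to the function that compiles it.
        self.symbols = {"(": self.paren}
        self.symbols.update({name: self.variable for name in variables})

    def compile_line(self, line):
        """Compile 'name := expression' and append the code for it."""
        self.tokens, self.pos = tokenize(line), 0
        target = self.take()
        if target not in self.variables or self.take() != ":=":
            raise SyntaxError("expected 'variable := expression'")
        self.expression(1)
        if self.pos != len(self.tokens):
            raise SyntaxError("unexpected trailing tokens")
        self.code.append(("store", target))

    def take(self):
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of line")
        self.pos += 1
        return self.tokens[self.pos - 1]

    def expression(self, min_prec):
        self.operand()
        while self.pos < len(self.tokens):
            prec, opcode = BINARY.get(self.tokens[self.pos], (0, None))
            if prec < min_prec:
                break
            self.pos += 1
            self.expression(prec + 1)  # recurse for the right-hand operand
            self.code.append((opcode,))

    def operand(self):
        token = self.take()
        handler = self.symbols.get(token)
        if handler is None and token.isdigit():
            handler = self.number
        if handler is None:
            raise SyntaxError(f"unknown symbol {token!r}")
        handler(token)

    def number(self, token):
        self.code.append(("push", int(token)))

    def variable(self, token):
        self.code.append(("load", token))

    def paren(self, token):
        self.expression(1)
        if self.take() != ")":
            raise SyntaxError("expected ')'")


def run(code, memory):
    """Execute the emitted code on a tiny stack machine."""
    stack = []
    for op, *args in code:
        if op == "push":
            stack.append(args[0])
        elif op == "load":
            stack.append(memory[args[0]])
        elif op == "store":
            memory[args[0]] = stack.pop()
        else:  # binary operators pop two values and push one
            right, left = stack.pop(), stack.pop()
            stack.append({"add": left + right, "sub": left - right, "mul": left * right}[op])
    return memory


compiler = Compiler(["a", "b"])
compiler.compile_line("a := 1 + 2 * 3")
compiler.compile_line("b := (a - 10) * 2")
memory = run(compiler.code, {"a": 0, "b": 0})
print(compiler.code, memory)
```

Because the target is a stack machine, each handler only ever appends instructions in the order its operands are evaluated; there is no register allocation step at all.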

If we write our own compiler we can add lots of different language extensions, such as jumps and method pointers.

Dave

jazzed · 2011-02-18 09:03

I guess one file that provides the "shared" constants definition is not so bad.
I guess one could enable resources based on pre-processor commands.
Having shared variables be automatic would be much better though.

So, for now let's just say that shared.spin should be a staple in the BigSpin
diet and we should come up with a good definition.

Some guidelines for the file should be:

1) add devices in alphabetical order
2) any new device should be wrapped with #ifdef DEVICE_<type> #endif \n ' DEVICE_<type>
3) add copyright info in any new section to keep with the MIT license
4) never use the propeller font so we can use diff and patch on the file

I'll make a file later and post it unless someone beats me to it.

Dr_Acula · 2011-02-18 18:28

Re jazzed post #144, I wanted to get zog working today but have come across a small problem - the Zylin toolchain does not seem to list Windows as an operating system. http://opensource.zylin.com/zpudownload.html

jazzed · 2011-02-18 19:48

Dr_Acula wrote: »

Re jazzed post #144, I wanted to get zog working today but have come across a small problem - the Zylin toolchain does not seem to list Windows as an operating system. http://opensource.zylin.com/zpudownload.html

Use the Cygwin toolchain. You don't need to install Cygwin, just download zpugcccygwin.tar.bz2 and unzip it to a directory next to the zog sources. I recommend using a short path name with no spaces to hold zog files. Send me a PM if you run into trouble.

I hope you get to try LittleBigSpin-DracBlade.zip from post #144. It does not require zog.

Dr_Acula · 2011-02-18 20:13

LittleBigSpin-Dracblade is installed. Compiled with BST. Running a terminal program at 115200. It prints "Starting..."

Checking the code now to see what it does next. BRB
next line

#ifdef DRACBLADE
    if f.startSD(@ioControl) == 0
        s.str(string(u#EOL,"SDCARD Start Error ... "))
        repeat
#endif

gives an error mounting (changed the ifdef from C3 to Dracblade). Which sd driver is this using? I see the dracblade definition has C3 in it

#ifdef DRACBLADE
    u : "userdefs"
    f : "fsrwFemto_c3"
    c : "sram_cache_dracblade"
#endif

The Dracblade does not have the C3 chip select - it is simply pins 12-15. Does this need a different driver?

jazzed · 2011-02-18 20:43

Dr_Acula wrote: »

The Dracblade does not have the C3 chip select - it is simply pins 12-15. Does this need a different driver?

I believe the fsrwFemto_c3 is a version of fsrwFemto_rr*.spin, so it should work.

Two things to check:

1) Make sure you have a fibo.bin file on the SDCARD - that's the program to run.
2) Change the userdefs SDCARD pin definitions to match your board. They appear to be wrong.

Dr_Acula · 2011-02-18 21:27

Ok, fibo.bin is on the sd card.
And the user defs will be

con '' microSD pin definitions

  spiDO     = 12
  spiClk    = 13
  spiDI     = 14
  spiCS     = 15
  spiINC    = 16 '' ?? value  maybe use 16 which is the vga display, as it will just be ignored

Dr_Acula · 2011-02-18 21:30

Here we go. How do these figures look?

Starting ...
Writing Cache ... 1024
Writing Cache ... 140
Verified 1164 of 65536 bytes.


Startup addresses: 0010 048C 0498 0020 04A4
?Starting
fibo 0 = 0 in 0 ms
fibo 1 = 1 in 0 ms
fibo 2 = 1 in 0 ms
fibo 3 = 2 in 0 ms
fibo 4 = 3 in 1 ms
fibo 5 = 5 in 2 ms
fibo 6 = 8 in 3 ms
fibo 7 = 13 in 6 ms
fibo 8 = 21 in 10 ms
fibo 9 = 34 in 17 ms
fibo 10 = 55 in 27 ms
fibo 11 = 89 in 45 ms
fibo 12 = 144 in 73 ms
fibo 13 = 233 in 118 ms
fibo 14 = 377 in 191 ms
fibo 15 = 610 in 309 ms
fibo 16 = 987 in 501 ms
fibo 17 = 1597 in 811 ms
fibo 18 = 2584 in 1312 ms
fibo 19 = 4181 in 2124 ms
fibo 20 = 6765 in 3436 ms
fibo 21 = 10946 in 5561 ms
fibo 22 = 17711 in 8998 ms

Hardware  |  Language  |  FIBO(20) time  |  FIBO 0 to 26
----------+------------+-----------------+--------------
C3        |  SPIN      |  547ms          |  30s
SDRAM     |  SPIN      |  547ms          |  30s
C3        |  BigSPIN   |  3601ms         |  2m53s
SDRAM     |  BigSPIN   |  2858ms         |  2m19s
C3        |  ZOG C     |  3644ms         |  3m18s
SDRAM     |  ZOG C     |  2773ms         |  2m18s

seems on par with the C3, maybe just fractionally faster?

Re

I'd like to see a windows GUI application to automate the build/download process.

Yes, that sounds like a good idea. What are the commands to do a compile?

Language implementations that keep data and stack in HUB RAM will be faster.

We are brainstorming this on the Catalina thread at the moment, after having realised that it is possible to determine where variables/arrays end up depending on where they are defined. Local variables in functions are in hub. Static hub arrays/variables can be created by defining them in the "main" function, as the main never exits. And global ones end up in external memory. There is some merit in allowing the user to determine where things end up. I wonder if this sort of system could be replicated in big spin?

jazzed · 2011-02-18 22:19

Dr_Acula wrote: »

Here we go. How do these figures look?

Excellent

Can you post back the corrected package here?

We need 2 numbers for comparisons to put in our table:
1) fibo(20) calculation time and
2) the time it takes to run fibo 0 through 26

If you feel adventurous, Heater's SPIN FFT will run on BigSpin too. It runs in about 7 seconds on the SDRAM board.

Dr_Acula wrote: »

What are the commands to do a compile?

1) The BigSpin application (fibo.bin in this case) needs to be compiled and loaded onto SDCARD.
The compile is just a standard BSTC command with options to create a binary.
2) Need to automate copy from .binary to .bin. The program could be saved on SDCARD as image.bin or some other standard filename for now.
3) The load method could be anything as long as it's FAT16.

I'm using a version of David Betz Zog loader right now since it uploads to SDCARD very quickly. His loader also writes the filename to a boot pointer file which zog can then read and boot. Having the file pointer keeps us from constantly writing the same file name or renaming the file all the time in the interpreter's loader.

Just assume for now that the BigSpin interpreter is in EEPROM or can be loaded by BST and friends.

Dr_Acula wrote: »

There is some merit in allowing the user to determine where things end up. I wonder if this sort of system could be replicated in big spin?

Unfortunately we are restricted in Spin to a few methods. C is a beautiful language in part because you have power over the environment (all the curly braces make it pretty too LOL).

The BigSpin that I'm considering will eventually have things as I've described before. BigSpin will allow anything to be put into "shared memory" HUB RAM if its address is > $1000_0000 (ironic isn't it?).

Good job getting BigSpin running on DracBlade!
Thanks.

--Steve

Dr_Acula · 2011-02-19 02:23

This really is a most impressive piece of work.

Attached is a zip. Just changed two things:
1)

#ifdef DRACBLADE
    u : "userdefs_drac"
    f : "fsrwFemto_c3"
    c : "sram_cache_dracblade"
#endif

and 2)

The new userdefs_drac

Down the track one might look at adding a vga terminal like Kyedos uses, but for the moment the serial port is fine.

I'm not entirely sure what these time stats mean - it appears to be giving a negative value near the end

fibo 0 = 0 in 0 ms
fibo 1 = 1 in 0 ms
fibo 2 = 1 in 0 ms
fibo 3 = 2 in 0 ms
fibo 4 = 3 in 1 ms
fibo 5 = 5 in 2 ms
fibo 6 = 8 in 3 ms
fibo 7 = 13 in 6 ms
fibo 8 = 21 in 10 ms
fibo 9 = 34 in 17 ms
fibo 10 = 55 in 27 ms
fibo 11 = 89 in 45 ms
fibo 12 = 144 in 73 ms
fibo 13 = 233 in 118 ms
fibo 14 = 377 in 191 ms
fibo 15 = 610 in 309 ms
fibo 16 = 987 in 501 ms
fibo 17 = 1597 in 811 ms
fibo 18 = 2584 in 1312 ms
fibo 19 = 4181 in 2124 ms
fibo 20 = 6765 in 3436 ms
fibo 21 = 10946 in 5561 ms
fibo 22 = 17711 in 8998 ms
fibo 23 = 28657 in 14559 ms
fibo 24 = 46368 in 23557 ms
fibo 25 = 75025 in -15569 ms
fibo 26 = 121393 in 7988 ms

fibo26 took 2min, 35 seconds

Can you explain a bit more what is going on. You have an interpreter and you download this and run it off eeprom. Then you have the actual program you are running, in this case fibo.spin. Fibo.spin needs to be turned into a binary to be run.

Kyedos might be able to do some of this. Instead of saving as a .bin, save it as a .bsp or something. Then if you type a command in kyedos that happens to be a .bsp program, it then saves the name in COMMAND.TXT, runs the big spin interpreter which could search for the file and then run it.

What happens if fibo.spin compiles to a 100k program? What compiler can compile this?

Heater. · 2011-02-19 03:00

Dr_A,

I'm not entirely sure what these time stats mean - it appears to be giving a negative value near the end

Looks like that has been derived from my old fibo.c for testing Zog. It was timing execution by crudely comparing values of CNT before and after executing fibo(). Of course CNT wraps around after 40 seconds or so and the results go weird.
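
A sketch of the wrap-around arithmetic, in Python. It assumes an 80 MHz system clock (an assumption, though it is the usual Propeller setup) and that the time difference is treated as a signed 32-bit long; the helper names are hypothetical. Under those assumptions CNT wraps about every 53.7 seconds and a signed difference turns negative after about 26.8 seconds, which fits the log: fibo 24 at 23.5 s is still fine, fibo 25 is not.

```python
CLKFREQ = 80_000_000                  # assumed system clock in Hz
TICKS_PER_MS = CLKFREQ // 1000
WRAP_MS = 2**32 / TICKS_PER_MS        # time for CNT to wrap once, in ms


def as_signed32(value):
    """Interpret a 32-bit pattern as a signed long."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value & 0x80000000 else value


def reported_ms(true_ticks):
    """What (cnt - time) / ticks-per-ms yields for a given true duration."""
    return int(as_signed32(true_ticks) / TICKS_PER_MS)


def recover_ms(reported, wraps):
    """Add back whole CNT periods to a reported duration."""
    return reported + wraps * WRAP_MS


print(f"CNT wraps every {WRAP_MS / 1000:.1f} s")
print(f"fibo 25: about {recover_ms(-15569, 1):.0f} ms")
print(f"fibo 26: about {recover_ms(7988, 1):.0f} ms")
```

Adding one wrap period to each of the last two readings gives about 38.1 s and 61.7 s, which continues the roughly 1.6x growth per step seen in the earlier lines of the log.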

Dr_Acula · 2011-02-19 03:24

Ah, that makes sense.

Just for fun I thought I would try translating the Spin fibo program into BCX Basic.

Original Spin program:

pub main | n,time
  s.start(31,30,0,115200)
  waitcnt(clkfreq+cnt)
  s.str(string("Starting "))

  repeat n from 0 to 26

    time := cnt
    result := fibo(n)
    time := cnt - time

    s.str(string($d,"fibo "))
    s.dec(n)
    s.str(string(" = "))
    s.dec(result)
    s.str(string(" in "))
    s.dec(time/(_clkfreq/1000))
    s.str(string(" ms"))


pub fibo(n)

  if (n < 2)
    return n
  else
    return fibo(n - 1) + fibo(n - 2)

Program in BCX basic

print "Fibonacci series in BCX Basic"
dim time as uint 				' uint is the same as a Spin long
dim n as uint
dim result as uint

for n=0 to 26
  time = _cnt()
  result = fibo(n)
  time = _cnt() - time
  print "fibo ";n;" = ";result;" in ";time/80000;" ms"
next n

do 						' endless loop so doesn't clear the screen
loop until 0=1


Function fibo(n as uint)
  if (n < 2) then
    function=n
  else
    function=fibo(n - 1) + fibo(n - 2)
  end if
End Function

Results are
fibo 20 = 6765 in 319ms (running in hub memory) (5735ms for fibo26)
fibo 20 = 6765 in 2841ms (running in external dracblade memory) (2min, 10 secs for fibo26)

Big Spin seems to be right up there in terms of speed. And it does have the advantage of being able to leverage the obex code.
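
Since FIBO is mostly a test of call overhead, as noted earlier in the thread, the FIBO(20) timings can be turned into a rough cost per call. The sketch below is Python; it counts how many times the naive recursive fibo() is entered and divides the reported times by that. It ignores the additions and comparisons, so treat the result as an upper bound on call cost, not a measurement.

```python
def fibo_calls(n):
    """Number of fibo() invocations made by the naive recursion for fibo(n)."""
    calls = [1, 1]  # fibo(0) and fibo(1) each take a single call
    for _ in range(2, n + 1):
        calls.append(1 + calls[-1] + calls[-2])
    return calls[n]


# FIBO(20) timings reported in this thread, in milliseconds.
timings_ms = {
    "BCX Basic, hub memory": 319,
    "BCX Basic, external DracBlade memory": 2841,
    "BigSpin, DracBlade": 3436,
}

calls = fibo_calls(20)
print(f"fibo(20) makes {calls} calls")
for label, ms in timings_ms.items():
    print(f"{label}: {1000 * ms / calls:.1f} us per call")
```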

RossH · 2011-02-19 16:42

Hi Jazzed,

I just got my first Catalina programs running using your caching SPI driver. Here are the results on the C3 using a similar format to the one you use above:

Hardware        |  Language   |  FIBO(20) time  |  FIBO 0 to 26
----------------+-------------+-----------------+--------------
C3 LMM          |  Catalina C |  306ms          |  11s
C3 XMM cached   |  Catalina C |  1468ms         |  1m10s
C3 XMM uncached |  Catalina C |  7386ms         |  5m50s
----------------+-------------+-----------------+--------------

So your caching SPI driver improves Catalina's speed by 5 times! Not too shabby!

Got a couple of bugs to sort out before I post a version - and also one question: Your SPI driver stores longs in SPI RAM in big-endian format. This will confuse most Propeller languages, as they all expect little-endian format. I had to modify your caching driver to use little-endian format to get it to work with Catalina. Is there a reason for the big-endian orientation?

Ross.

Mike Huselton · 2011-02-19 19:27

Great job, Dr_Acula! Also, I installed the Basic4android trial - looks good for the first phase.

jazzed · 2011-02-19 20:03

Great work Ross! Congratulations.

I hope this means we should be able to use Catalina with SDRAM soon also.
I'll look at that as soon as you post a C3 caching update.

RossH wrote: »

I just got my first Catalina programs running using your caching SPI driver.

The C3 driver was written by David Betz.

RossH wrote: »

Is there a reason for the big-endian orientation?

Blame it on ZOG!

RossH · 2011-02-19 23:59

jazzed wrote: »

I hope this means we should be able to use Catalina with SDRAM soon also.
I'll look at that as soon as you post a C3 caching update.

Yes, my goal is to make Catalina work with your SDRAM board as well.

jazzed wrote: »

The C3 driver was written by David Betz.

Well, thanks David!

jazzed wrote: »

Is there a reason for the big-endian orientation?

Blame it on ZOG!

This is going to make it difficult for BigSpin to be compatible with Spin, and will also make it difficult for it to interoperate with other languages. Everything is ok as long as you only read and write longs - but reading and writing words or bytes doesn't give you the same answers as you would get in Spin (or any other Propeller language).

There may be a reason to keep it for Zog programs (although I can't think of one) but for everything else I think it would be best to stick to "little endian" memory organization. The changes required are quite minor - it just takes a couple of extra instructions.

Ross.
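
A small illustration of the byte-order point, in Python and independent of any of the drivers discussed. It stores one long both ways, then shows that reading the whole long back is fine in either layout while a byte or word read at the same address is not. The value 0x11223344 is arbitrary.

```python
import struct

value = 0x11223344

big = struct.pack(">I", value)     # big-endian layout, as described for the cache driver
little = struct.pack("<I", value)  # little-endian layout, as used in Propeller hub RAM

# Reading the whole long back works with either layout...
assert struct.unpack(">I", big)[0] == struct.unpack("<I", little)[0] == value

# ...but a byte or word read at the same address sees different data.
byte0_big, byte0_little = big[0], little[0]
word0_big = struct.unpack("<H", big[:2])[0]
word0_little = struct.unpack("<H", little[:2])[0]

print(f"byte[0]: big-endian store {byte0_big:#04x}, little-endian store {byte0_little:#04x}")
print(f"word[0]: big-endian store {word0_big:#06x}, little-endian store {word0_little:#06x}")
```

A program that writes a long and later reads its low byte expects 0x44 at the base address, which only the little-endian layout provides.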

jazzed · 2011-02-20 02:50

Ross,

I've been scratching my head over what part of the cache interface would be big-endian specific.
Could you please snip an example here and explain what you mean?

Thanks.
