Speed of PropForth vs Spin

msrobots · 2014-07-07 12:57

@Heater,

gosh. before you start hating Chrome try IE instead...

At least you can not claim to be windows free since 86 anymore!

The culprit in this case is the Parallax font. There is some Parallax2 font around in the forum fixing some of the problems.

Enjoy!

Mike

Don Pomplun · 2014-07-07 14:17

Peter,
I'd like to try your Tachyon Forth. When I follow the Latest Tachyon Binaries I get a binary file, which when I drag to the Propeller Tool window lets me load it into Prop eeprom. That gives me a rudimentary Forth (it will add & subtract, etc). You refer to Extend.FTH, but I don't find it anywhere.
TIA
Don

mindrobots · 2014-07-07 14:50

Don,

I usually end up going here for the latest Tachyon news.

I grab the latest .spin file from Peter's Dropbox and also grab the EXTEND.FTH to go with it. I'll build the .SPIN with one of the Spin compilers and load it to a Propeller, then start a serial connection to the prop and add in the EXTEND.FTH - at this point, you have a pretty complete basic Tachyon to play with.

Peter Jakacki · 2014-07-07 15:56

Dave Hein wrote: »

I ran the same Forth code using the Fast interpreter, and I got 26,928 cycles. This interpreter uses an inner loop that is similar to Tachyon's, so I would expect similar results from Tachyon. I'm looking forward to seeing the numbers from Peter and Doug.

These are the results for the exact same code in Tachyon Forth.

DECIMAL
: SUM ( n -- result ) 1+ 0 SWAP OVER DO I + LOOP ;
: MAIN CNT@ 100 SUM CNT@ ROT - SWAP ." sum(100) = " . ." , " . ." cycles" ;

MAIN sum(100) = 5050, 22144cycles ok

But more correctly:
100 LAP SUM LAP .LAP 271.40us ok

I note that the C version runs fast because it is a snippet of code and has the whole cog to itself to run pure PASM, is that right? So a lot of these benchmarks and snippets look good on their own but fail to deliver in the real world where it has to do a whole lot more. But perhaps compiling PASM to a dedicated cog also has its uses, I may look at something like this if I need to, although saying that I haven't needed to yet.

Dave Hein · 2014-07-07 16:02

I also timed the compiled version of the Spin code using spin2cpp, and got 3,552 cycles.

brucee · 2014-07-07 16:19

In a compiled BASIC (not for a Prop), it is around 3200 cycles, assuming a cycle is a CPU clock. Using a WHILE loop, which optimizes better, it is about 1700 cycles.

Peter Jakacki · 2014-07-07 16:36

Okay, so your snippet runs fast when compiled to PASM (btw, do you have a listing?) although it doesn't do anything useful at all, does it? How about the challenge then to do something useful and try updating the string of 264 RGB LEDs??? Seeing you have quickly responded to counting in a loop, let's bash out some RGB bits now.

prof_braino · 2014-07-07 16:47

Peter Jakacki wrote: »

Yes, since you did say "TOO fast" then what is the external device(s) that can't keep up?

I'm just saying what the man says. All my stuff is small potatoes, so I never have any of those problems myself. Let's see if I can recall correctly:

I2C: A lot of the older I2C devices have a slow speed. Nearly all have a medium speed, and some have crazy fast speed. We use medium speed, as the ones that only support slow speed are mostly obsolete and not very common. The crazy fast speed is too fast for some of the medium speed devices, and the crazy fast devices are not very consistent. Sorry, that's the best I can recall of the I2C discussion.

FTDI chips: Many of the FTDI chips have driver issues. If we are talking both directions full speed, the FTDI communication can lock until power cycle. The solution is to talk in only one direction at a time. Most folks don't go full blast in both directions at once and are not bothered. This is too flaky, so Sal is writing the driver and protocol on both ends so we can skip the FTDI parts where we need fast communications. I don't know which part in particular, but apparently it's been known for a while.

SD: The SD cards are very fast and can hold a lot of data, but we can only blast so much at a time. So we have to pay attention to record size and record interval so we don't overrun the SD card.

There's other stuff, like COMM ports on a Windows machine, that we just cannot rely on for large fast transfers, because Windows might not be there waiting when we need it. This was noticed even on very fast machines. It happens often enough to end up using something else, like Go on a Linux box. As far as I know we have never had any trouble blasting data to a Linux box.

Peter Jakacki · 2014-07-07 17:10

prof_braino wrote: »

I'm just saying what the man says. All my stuff is small potatoes, so I never have any of those problems myself. Let's see if I can recall correctly:

I2C: A lot of the older I2C devices have a slow speed. Nearly all have a medium speed, and some have crazy fast speed. We use medium speed, as the ones that only support slow speed are mostly obsolete and not very common. The crazy fast speed is too fast for some of the medium speed devices, and the crazy fast devices are not very consistent. Sorry, that's the best I can recall of the I2C discussion.

FTDI chips: Many of the FTDI chips have driver issues. If we are talking both directions full speed, the FTDI communication can lock until power cycle. The solution is to talk in only one direction at a time. Most folks don't go full blast in both directions at once and are not bothered. This is too flaky, so Sal is writing the driver and protocol on both ends so we can skip the FTDI parts where we need fast communications. I don't know which part in particular, but apparently it's been known for a while.

SD: The SD cards are very fast and can hold a lot of data, but we can only blast so much at a time. So we have to pay attention to record size and record interval so we don't overrun the SD card.

There's other stuff, like COMM ports on a Windows machine, that we just cannot rely on for large fast transfers, because Windows might not be there waiting when we need it. This was noticed even on very fast machines. It happens often enough to end up using something else, like Go on a Linux box. As far as I know we have never had any trouble blasting data to a Linux box.

Of course we add delays for very slow protocols like I2C but we don't cripple the run-time speed to do that. As for FTDI chips I know I can blast them at 3Mbits but that's as far as they go.
SD cards can operate at a minimum 25MHz SPI speed and the only slowdown is when we are waiting for responses, so the Prop could never ever hope to overwhelm an SD card.

It's true about Windows serial, I think the only reason it manages anything at all is because the chips buffer the data for it.

I didn't really expect you to try and respond as I just gave you the to be expected jab in the ribs for making a "it's TOO fast" boast especially considering that PropForth is running over 3 times slower than Spin in the LED demo.

Dave Hein · 2014-07-07 17:39

Doug, can you run the 2-line Forth code through PropForth to see how many cycles it uses?

prof_braino · 2014-07-07 19:15

Dave Hein wrote: »

Doug, can you run the 2-line Forth code through PropForth to see how many cycles it uses?

No time. Otherwise I would have responded to the post that says " I feel the same about someone who would say the C is the only way to go." with something like "I know a guy like that. He goes on and on about C in a thread about propforth and spin".

Go ahead, you can do it and post the results. You guys are much better qualified to scrutinize the finer points of programming languages. I'm sure the results will change everyone's opinions, or not.

Dave Hein · 2014-07-07 19:24

Doug, in the time it took you to make your last post you could have copied and pasted 2 lines of Forth into PropForth. Maybe you are concerned that PropForth will have the slowest performance of all the languages.

Yes, I am biased toward C. I spent several months programming in Forth, and ran into its many limitations. C is superior in almost every way. However, the main reason I mention C in this thread is that it's used as an intermediate language when compiling Spin to PASM.

prof_braino · 2014-07-07 19:29

Peter Jakacki wrote: »

.... As for FTDI chips I know I can blast them at 3Mbits but that's as far as they go.

I'm not talking about one way transmission, propforth can go to the max speed of the prop clock, but it starts getting unreliable over long wires, so Sal cut it back to 230400. The FTDI chip itself chokes in 2 way communication if both directions are running continuously. Or something. Anyway, Sal decided it would be more reliable to do something else. That's all I got for that.

SD cards can operate at a minimum 25MHz SPI speed and the only slowdown is when we are waiting for responses, so the Prop could never ever hope to overwhelm an SD card.

Maybe I'm talking out the wrong orifice again. I thought when we sent too big a record that card did not finish writing before the next record. I thought it was the prop going at 80MHz and the SD going at 25, but I changed the record size and the record period and it worked so I left it at that. Somebody smart would have to investigate further.

I didn't really expect you to try and respond as I just gave you the to be expected jab in the ribs for making a "it's TOO fast" boast especially considering that PropForth is running over 3 times slower than Spin in the LED demo.

??? No clue which LED demo you mean. Are you talking about the Quickstart resistive touch buttons demo? Seems pretty darn fast to me. Anyway, that's what Sal said, it's the best I got for right now.

prof_braino · 2014-07-07 19:44

Dave Hein wrote: »

Doug, in the time it took you to make your last post you could have copied and pasted 2 lines of Forth into PropForth. Maybe you are concerned that PropForth will have the slowest performance of all the languages.

No, I'm just messing with you 'cause it's easy. And I don't really care about apples and oranges benchmarks. Unless you're making sangria, which I already have, this being my vacation.

You really need to lighten up a bit. Life is too short. Have some fun!

Peter Jakacki · 2014-07-07 20:23

prof_braino wrote: »

I'm not talking about one way transmission, propforth can go to the max speed of the prop clock, but it starts getting unreliable over long wires, so Sal cut it back to 230400. The FTDI chip itself chokes in 2 way communication if both directions are running continuously. Or something. Anyway, Sal decided it would be more reliable to do something else. That's all I got for that.

Maybe I'm talking out the wrong orifice again. I thought when we sent too big a record that card did not finish writing before the next record. I thought it was the prop going at 80MHz and the SD going at 25, but I changed the record size and the record period and it worked so I left it at that. Somebody smart would have to investigate further.

??? No clue which LED demo you mean. Are you talking about the Quickstart resistive touch buttons demo? Seems pretty darn fast to me. Anyway, that's what Sal said, it's the best I got for right now.

I'll let up on ya Doug but the LED demo is referring to this thread and the OP's code in the top post.

mindrobots · 2014-07-07 21:03

Here's what I got from PropForth v5.5 running on a stock QuickStart

Prop0 Cog6 RESET - last status: 0 ok
Prop0 Cog6 ok
: sum 1+ 0 swap over do i + loop ;
Prop0 Cog6 ok
: main cnt COG@ 100 sum cnt COG@ rot - swap ." sum(100) = " . ." , " . ." cycles" ;
Prop0 Cog6 ok
main
sum(100) = 5050 , 94976 cyclesProp0 Cog6 ok

94976 cycles

I'll let someone else do the LED demo.

prof_braino · 2014-07-07 21:09

Peter Jakacki wrote: »

I'll let up on ya Doug but the LED demo is referring to this thread and the OP's code in the top post.

sorry, packing, not thinking. I don't have the parts, so I can't comment very well. But using slow commands in the innermost loop will usually have an impact.
Refactoring that code is the first step, but aside from "don't do it the slow way" I don't have much more to add at this point. Sal said to use the fast command, where we set the mask, then send the values, but I don't have the specifics.

Dave Hein · 2014-07-08 06:14

Here are the results of the comparisons so far. If I get a chance I'll try running the LED code with pfth, Fast and spin2cpp.

              Sum  LED
Spin       141008  0.4s
pfth       101952
PropForth   94976  1.4s
Fast        26928
Tachyon     22144  0.003s
spin2cpp     3552
C            2032

prof_braino · 2014-07-08 07:49

So, the goal here seems to be to compare spin execution time with several other options, including propforth and tachyon.

We generally take spin as the benchmark. Spin is pretty much you get what you get, and everybody should get about the same thing (using the same demo etc).

We know that tachyon is designed to be fast. Further optimization would be to re-implement directly in assembler, or further optimize the kernel itself. (Is this right?)

We already know that un-optimized propforth is, by design, un-optimized. Propforth's focus is on ease of use and flexibility, to be optimized later as needed.

So the question is what do we mean by "as needed" in this instance. Is anybody going to do the final step, which is required in an execution speed comparison? That is, refactor the high level propforth code for better execution, determine the bottlenecks and consider the options for assembly optimization, and optimize the tightest bottleneck until the speed is acceptable (or we get bored and move on)? Or will this be left as an apples to oranges comparison?

Numbers without meaningful interpretation are not data, they are just a bunch of numbers.

FYI - I'm going to be out of town, and may or may not have wi-fi, and definitely won't be able to run arbitrary tests on command as I'm not bringing any development kits. I usually don't do optimization anyway, beyond refactoring the high level forth. You would need someone like Caskaz or Rick. Caskaz generally does not argue or banter.

My guess would be that optimized propforth would approach the physical limits of the prop (as more becomes straight assembler). The degree of optimization should not be so extensive, as most of the function need not be executed in the body of the loop. Whether this would meet or exceed the number next to Tachyon is undetermined.

mindrobots · 2014-07-08 09:17

Don,

Sorry to take this back to your original question. Since you have the LED string, can you try this and see if it is much faster?

Can you change the usage of px to be either pinhi or pinlo as required? The px has considerably more overhead and since it is at the core of your loops, replacing it with the appropriate pinhi/pinlo should have the greatest impact. You saw some of that impact the inner loop had with your switch from the lshift to the multiply and that was a relatively low overhead word change.

: sendMSBfirst \ put byte on stack first
  8 0 do
    dup h80 and
    if data pinhi then
    clock pinhi clock pinlo \ toggle clock
    data pinlo
    \ 1 lshift
    2*
  loop
  drop \ the depleted old shifted word
;

This is the quickest thing I can see to try with PropForth.

Dave Hein · 2014-07-08 09:33

I ran the LED test for Spin, C, spin2cpp, pfth and Fast, and the results are shown below. The PropForth number is the value given by the OP. I've also included the Forth code that I used with pfth and Fast.

              Sum     LED
Spin       141008   294ms
pfth       101952   265ms
PropForth   94976  1400ms
Fast        26928   139ms
Tachyon     22144     3ms
spin2cpp     3552     6ms
C            2032     7ms

0 constant LEDdata
1 constant LEDclock
1 LEDdata lshift constant LEDdatabit
1 LEDclock lshift constant LEDclockbit
LEDdatabit invert constant LEDdatamask
LEDclockbit invert constant LEDclockmask

: streamByte ( d -- )
  24 lshift
  8 0 do
    dup 1 31 lshift and if
      outa@ LEDdatabit or outa! then
    outa@ LEDclockbit or outa!   \ high
    outa@ LEDclockmask and outa! \ low
    outa@ LEDdatamask and outa!
    2*
  loop
  drop
;

: main
  cnt@
  264 0 do
    128 streamByte
    128 streamByte
    128 streamByte
  loop
  cnt@ swap -
  40000 + 80000 /
  ." time = " . ."  msec"
;

Dave Hein · 2014-07-08 09:37

Here are the C and Spin programs that I used.

#include <stdio.h>
#include <propeller.h>

#define LEDdata  0
#define LEDclock 1

void streamByte(int d)
{
    int i;
    d <<= 24;
    for (i = 0; i < 8; i++)
    {
        if (d & 0x80000000)
            OUTA |= 1 << LEDdata;

        OUTA |= 1 << LEDclock;
        OUTA &= ~(1 << LEDclock);
        OUTA &= ~(1 << LEDdata);
        d <<= 1;
    }
}

int main(void)
{
    int i, cycles;
    cycles = CNT;
    for (i = 0; i < 264; i++)
    {
        streamByte(0x80);
        streamByte(0x80);
        streamByte(0x80);
    }
    cycles = CNT - cycles;
    printf("time = %d msec\n", (cycles + 40000)/80000);
    return 0;
}

CON
  _clkmode = xtal1  + pll16x
  _xinfreq = 5_000_000

  LEDdata  = 0
  LEDclock = 1

OBJ
  c : "clib"

PUB startup 
  c.start
  result := CNT
  repeat 264
    streamByte( $80 )
    streamByte( $80 )
    streamByte( $80 )
  result := CNT - result
  c.printf1(string("time = %d msec\n"), (result + 40000)/80000)

PRI streamByte(d)
  d <<= 24
  repeat 8
    if d & $8000_0000
      outa[LEDdata]~~
    outa[LEDclock]~~   ' high
    outa[LEDclock]~    ' low
    outa[LEDdata]~
    d <<= 1

Don Pomplun · 2014-07-08 10:55

Here's where I am . . . I've loaded the binary for Peter's Tachyon Forth. That goes quickly; just a quick dump into Prop's EEPROM.
Then I loaded his EXTEND.fth -- a slow laborious process via TeraTerm. Something you don't want to do often.
When I do a ^C it shows loaded modules (boot & extend).
So I debug my app code in immediate mode, then copy what works to a Notepad file. I assume that I then drag this into the TeraTerm window and it adds my new Words into the dictionary. Now, in order to have those words not disappear, I do a BACKUP. If I then do a ^C, will it show my new additions as another "module"?
Probably (actually, fer sure) as I play with my app I'll have a list of fixes/improvements. Is there a way to remove all of my new personal additions and get back to just Boot & Extend, without starting over from scratch?
TIA
Don

Peter Jakacki · 2014-07-08 15:57

Don Pomplun wrote: »

Here's where I am . . . I've loaded the binary for Peter's Tachyon Forth. That goes quickly; just a quick dump into Prop's EEPROM.
Then I loaded his EXTEND.fth -- a slow laborious process via TeraTerm. Something you don't want to do often.
When I do a ^C it shows loaded modules (boot & extend).
So I debug my app code in immediate mode, then copy what works to a Notepad file. I assume that I then drag this into the TeraTerm window and it adds my new Words into the dictionary. Now, in order to have those words not disappear, I do a BACKUP. If I then do a ^C, will it show my new additions as another "module"?
Probably (actually, fer sure) as I play with my app I'll have a list of fixes/improvements. Is there a way to remove all of my new personal additions and get back to just Boot & Extend, without starting over from scratch?
TIA
Don

"slow laborious"? The delay is a line delay, not the character delay, I have mine set to 12ms per line, so the whole thing should load relatively quickly.

The binary already has EXTEND.fth loaded into it, so there is no need to try and load it again.

Modules are only identified because I have a definition at the start that ends in .fth which is what the MODULES word scans for. So if you make sure the first definition in your code starts with a module name such as don.fth and before that have a line that says FORGET don.fth then what will happen is that every time you paste your code in it will start clean.

Peter Jakacki · 2014-07-08 16:26

Dave Hein wrote: »

I ran the LED test for Spin, C, spin2cpp, pfth and Fast, and the results are shown below. The PropForth number is the value given by the OP. I've also included the Forth code that I used with pfth and Fast.
              Sum     LED
Spin       141008   294ms
pfth       101952   265ms
PropForth   94976  1400ms
Fast        26928   139ms
Tachyon     22144     3ms
spin2cpp     3552     6ms
C            2032     7ms

Those figures for C seem quite good really, I wonder how it would go with compiling an application? Do you have the C source for this demo? Does the compiler generate a machine listing?

BTW, on larger systems that are self-supporting such as a PC or tablet etc I would be far more inclined to use the HLL that works best for that platform (even C) as I would still have the O/S to interact with and hopefully the libraries are up to it and the tools don't end up costing 3 or 4 digits. With the Propeller it's not small enough to program purely in assembler and it's not big enough to support an O/S of the traditional kind, so I "prefer" programming in Forth for this chip. Perhaps if the Prop 3 could run Linux I might do it differently, but that's me, I hate having silicon chips with binary blobs to work with, I like to have fun and interact at the same time.

Don Pomplun · 2014-07-08 16:47

Peter Jakacki wrote: »

"slow laborious"? The delay is a line delay, not the character delay, I have mine set to 12ms per line, so the whole thing should load relatively quickly.

The binary already has EXTEND.fth loaded into it, so there is no need to try and load it again.

Modules are only identified because I have a definition at the start that ends in .fth which is what the MODULES word scans for. So if you make sure the first definition in your code starts with a module name such as don.fth and before that have a line that says FORGET don.fth then what will happen is that every time you paste your code in it will start clean.

HAH! That would help. I put it in as the Character Delay. That should speed things up by a factor of 80 or so ;=)

Also didn't realize that EXTEND.fth was included. I guess what was throwing me was not recognizing the Case Sensitivity. Explains why emit wouldn't work. [PropForth was the same, but in lower case]

Don Pomplun · 2014-07-08 16:50

mindrobots wrote: »

Don,

Sorry to take this back to your original question. Since you have the LED string, can you try this and see if it is much faster?

Can you change the usage of px to be either pinhi or pinlo as required? The px has considerably more overhead and since it is at the core of your loops, replacing it with the appropriate pinhi/pinlo should have the greatest impact. You saw some of that impact the inner loop had with your switch from the lshift to the multiply and that was a relatively low overhead word change.

OK, switched the px's to pinhi & pinlo.
For as fast as I can react with the stopwatch button, it reduced the strip's load time from ~1.23 sec to ~0.8 sec.
Don

Don Pomplun · 2014-07-08 17:17

Peter Jakacki wrote: »

I don't actually have LED hardware and I don't need to, I just took the original code that Don did in PropForth and wrote the equivalent in TF and it's all there in the thread. I just went and looked up the datasheet for the LPD8806 that Don is using and I'm not sure where the 50ms figure comes from, I just blast out the bits correctly. Here is the link for the Arduino C code, but that uses hardware SPI by the look of it.

This started out as a simple digital binary clock, then I ran across the RGB LED strips using the LPD8806 chip. The scuttlebutt seems to be that the data sheet is in Chinese and may read like an old VCR manual. There are a couple reverse engineered data sheet articles around.
Once I had a strip to play with then it was all learn-by-doing. I think the 50ms delay is vestigial. I just took it out of the PropForth code and it still works fine. Of course an extra 50ms doesn't affect my stopwatch-thumb reaction time.

The 8806's are daisy-chained (data & clock lines) for as long as you want (you do have to re-supply the 5v periodically; lit up white, I think it draws about 3 amps for a 5m strip). Chip #1 doesn't know how many follow it. You "prime the pump" by sending out a zero byte that is passed along to all the chips. It lets them know that a string of new data is coming. Data Bytes all have the MSB set. Each chip drives 2 RGB LEDs. The first 3 bytes are grabbed by chip#1 for the RGB of its first LED bundle. Then the same chip grabs the second three bytes for its other LED cluster. All bytes after that are merely passed down the line. So chip#2 grabs the next 6 bytes for its LEDs, and passes everything after that; ad infinitum. A zero-byte starts the whole process over.
IIRC, the chip can take clocking at 20MHz, which should make it instantaneous (to my eye). Probably still too slow if you used it for a video display at the football stadium.

Interestingly, makers of the strips (I've tried 2) can't seem to get the RGB sequence to be "RGB". The one I'm now using goes BRG.

Dave Hein · 2014-07-08 18:05

Peter Jakacki wrote: »

Those figures for C seem quite good really, I wonder how it would go with compiling an application? Do you have the C source for this demo? Does the compiler generate a machine listing?

BTW, on larger systems that are self-supporting such as a PC or tablet etc I would be far more inclined to use the HLL that works best for that platform (even C) as I would still have the O/S to interact with and hopefully the libraries are up to it and the tools don't end up costing 3 or 4 digits. With the Propeller it's not small enough to program purely in assembler and it's not big enough to support an O/S of the traditional kind, so I "prefer" programming in Forth for this chip. Perhaps if the Prop 3 could run Linux I might do it differently, but that's me, I hate having silicon chips with binary blobs to work with, I like to have fun and interact at the same time.

The assembly output from the C compiler is shown below. The C source is in post #53 of this thread. You'll see that the SPI loop is loaded into the fcache and executed directly from cog memory.

You may want to try spinix. It provides the look and feel of the Linux OS for the prop. It has a simple Spin compiler that handles small objects. Small objects can be linked together to build up a large program. Someday I'll port a limited version of cspin to it that can convert C to Spin. This will provide a C compiler running on the Prop.

I also have pfth integrated into spinix, which allows me to edit a Forth program using vi, and then run it by typing "pfth file.fth". You can exit back to the spinix shell by typing "bye" under pfth.

	.text
	.balign	4
	.global	_streamByte
_streamByte
	shl	r0, #24
	mov	r6, #8
	jmp	#__LMM_FCACHE_LOAD
	long	.L6-.L5
.L5
.L3
	cmps	r0, #0 wz,wc
	IF_B  mov	r7, OUTA
	IF_B  or	r7, #1
	IF_B  mov	OUTA, r7
	or OUTA,#2
	mov	r7, OUTA
	andn	r7, #0x2
	mov	OUTA, r7
	mov	r7, OUTA
	andn	r7, #0x1
	mov	OUTA, r7
	sub	r6, #1 wz
	shl	r0, #1
	IF_NE	jmp	#__LMM_FCACHE_START+(.L3-.L5)
	jmp	__LMM_RET
	.compress default
.L6
	mov	pc,lr
	.data
	.balign	4
.LC0
	.ascii "time = %d msec\12\0"
	.text
	.balign	4
	.global	_main
_main
	mov	__TMP0,#(3<<4)+13
	call	#__LMM_PUSHM
	sub	sp, #8
	mov	r13, CNT
	mov	r14, #264
.L8
	mov	r0, #128
	lcall	#_streamByte
	mov	r0, #128
	lcall	#_streamByte
	mov	r0, #128
	lcall	#_streamByte
	sub	r14, #1 wz
	IF_NE	brs	#.L8
	mov	r0, CNT
	mvi	r7,#.LC0
	sub	r0, r13
	mvi	r1,#80000
	wrlong	r7, sp
	mvi	r7,#40000
	add	r0, r7
	mov	r7, sp
	add	r7, #4
	call	#__DIVSI
	wrlong	r0, r7
	lcall	#_printf
	mov	r0, #0
	add	sp, #8
	mov	__TMP0,#(3<<4)+15
	call	#__LMM_POPRET
	'' never returns

Martin_H · 2014-07-08 19:42

@Dave, thanks for publishing those benchmarks. The Tachyon score is amazing, and C was no slouch either.
