Fast - Yet another Forth interpreter for the Prop

Dave Hein · 2013-12-11 14:14

I'm considering replacing the Forth interpreter in pfth with a faster one, so I'm writing a new one to test it out. I decided to call it fast because it will be faster than pfth's current interpreter, and I'm hoping it will be quick to implement. I plan on posting my work in stages so that it might be beneficial to others who are interested in learning about Forth.

This initial version of fast contains only the inner interpreter and enough words to implement a small program that prints "Hello World" over and over. It executes a small pre-compiled program that is represented by the following Forth code.

create hellostr char H c, char e c, char l c, char l c, char o c, 32 c,
                char W c, char o c, char r c, char l c, char d c,
: type ( addr num -- ) begin dup while 1 - swap dup c@ emit 1 + swap repeat
                       drop drop ;
: hello ( -- ) begin hellostr 11 type 13 emit again ;
hello

That's all it does. However, the PASM code does show how the inner loop, stacks and a small number of basic words are implemented. In future updates I will add the dictionary structure and the outer interpreter, which will allow for interactivity and the ability to compile new words.

iammegatron · 2013-12-11 16:33

Thanks Dave. Couldn't hope anything better. Please keep on teaching us how to fish.

Peter Jakacki · 2013-12-11 20:17

Dave Hein wrote: »
I'm considering replacing the Forth interpreter in pfth with a faster one, so I'm writing a new one to test it out. I decided to call it fast because it will be faster than pfth's current interpreter, and I'm hoping it will be quick to implement. I plan on posting my work in stages so that it might be beneficial to others who are interested in learning about Forth.

This initial version of fast contains only the inner interpreter and enough words to implement a small program that prints "Hello World" over and over. It executes a small pre-compiled program that is represented by the following Forth code.
create hellostr char H c, char e c, char l c, char l c, char o c, 32 c,
                char W c, char o c, char r c, char l c, char d c,
: type ( addr num -- ) begin dup while 1 - swap dup c@ emit 1 + swap repeat
                       drop drop ;
: hello ( -- ) begin hellostr 11 type 13 emit again ;
hello
That's all it does. However, the PASM code does show how the inner loop, stacks and a small number of basic words are implemented. In future updates I will add the dictionary structure and the outer interpreter, which will allow for interactivity and the ability to compile new words.

Hmmm, looks somewhat familiar...

Your inner loop:

[FONT=courier new]innerloop         rdword  opcode, pc              ' Fetch a 16-bit instruction 
                        add     pc, #2                  ' Increment the program counter 
                        jmp     opcode                  ' Execute the instruction 
[/FONT]

vs mine:

[FONT=courier new]doNEXT      rdbyte X,IP        'read byte code instruction
            add    IP,#1       'advance IP to next byte token
            jmp    X           'execute the code by directly indexing the first 256 long in cog
[/FONT]

So the 16-bit address is mostly wasted at this level as only 9 bits can be addressed anyway. This also means that a branch instruction such as IF takes 4 bytes vs 2 in TF and I notice that you can't really have byte literals as your PC (IP) needs to be word aligned anyway. But it will be interesting to see how you develop this and what design decisions you come to as well. Could you write your kernel on a C compiler rather then Spin like I am looking at doing? It seems to me that the C compilers could be customized too to make it a little easier to do this sort of thing or perhaps GCC's macros will be sufficient.

Dave Hein · 2013-12-12 07:00

The discussion about the J1 Forth CPU gave me a better understanding of how Forth works. The fastest Forth interpreters just run assembly code. When executing code in the hub RAM on the Prop, "assembly" code basically consists of code bytes that are interpreted by a VM running in a cog. The fastest way to interpret the code bytes is to use them as cog addresses to the routines that execute the code. So the Tachyon inner loop is probably the fastest way to execute code.

I originally thought about using byte codes, but then I changed it to word-size codes to improve the efficiency of loading literals larger than 255, and also to make jumps and calls more efficient. This doubles the size of compiled words that only use kernel words, but for words that call a lot of compiled words it should only be 33% larger.

I also use a hybrid approach for the data stack, where the top two elements consist of registers and the rest of the stack is in hub RAM. This should be just as fast, or faster than either keeping the entire stack in registers or the entire stack in RAM.

The Prop C languages don't really integrate well with PASM like Spin does. So I think it would be difficult to do it with PropGCC or Catalina. Like I suggested in your thread, a Forth compiler could be written in C that would then provide binary data to be integrated with the Spin/PASM kernel.

Martin_H · 2013-12-12 07:52

This is interesting as it looks like pfth could evolve to have some of the benefits of Tachyon. It might also fit my selfish desire of a fast ANS compatible Forth with a robust serial buffer. I could build something on a PC, port it to the Propeller, and then to an AVR.

For example it recently dawned on my that I could do my fixed point trig in 16 bits and make it portable to AmForth. So I could probably run it in AmForth too.

KC_Rob · 2013-12-12 08:48

Martin_H wrote: »

This is interesting as it looks like pfth could evolve to have some of the benefits of Tachyon. It might also fit my selfish desire of a fast ANS compatible Forth with a robust serial buffer.

Ditto. This is neat stuff, Dave. I'll keep an eye out for further developments.

iammegatron · 2013-12-12 17:47

I apologize for asking a newbie question. On line #164 of fast.spin:

     dstackptr               long    @dstack+Q              ' Data stack pointer

I understand it's mapping cog RAM address to HUB RAM address, since dstack lives in HUB RAM. So I tried to use "@@" operator instead of "+Q"

     dstackptr               long    @@dstack              ' Data stack pointer

But it failed to compile. Why?

iammegatron · 2013-12-12 18:10

Just to add the J1 forth CPU discussion. Here is a J1 virtual machine written in Google Go. https://bitbucket.org/zserge/j1vm/src/ad1192dc0ab8762c9a0b757e147f58e233c4afad/j1vm/j1.go?at=default

Dave Hein · 2013-12-12 19:30

The @@ operator is valid only within a Spin method. It can't be used in a DAT section. BST does have the @@@ operator that can be used in a DAT section, but I want to use the Spin Tool. The operator doesn't map a cog RAM address to a hub RAM address, but it adds the object's starting hub address to an object offset.

DAT symbols have two different addresses. The @ operator gives the object offset. Without the @ operator you will get the cog address.

The hello word was defined as follows:

hello                   word    _lit, @hellostr+Q, _lit, 11, _exec, @type+Q, _lit, 13, emit, _jmp, @hello+Q

It contains a combination of cog and hub addresses. The symbols _lit, _exec and emit are cog address, and the symbols @hellostr+Q, @type+Q and @hello+Q are hub addresses. There are also two numeric values in the word, which are 11 and 13. Any value in the word that is not a cog address must be preceded by a cog address that represents a word that will consume it.

Dave Hein · 2013-12-13 15:01

Here's version 2. This version reads a string from the input terminal, parses it and prints out each word bracketed by the <> characters. This version adds the kernel words c!, <, >, =, <>, over, rot and key. The new compiled words are count, accept, refill, cmove, skipchar, findchar, word and interp.

There's still no dictionary, but that will appear in the next update.

LoopyByteloose · 2013-12-14 05:05

Well, I can appreciate FAST from the point of view of better performance. But I deeply appreciate the fact that pfth is so easy for me to read the code and learn from.

It seems to be that once anyone masters Forth, they begin to feel the need to optimize and thier Forth becomes more and more of a personalized form. Enjoy doing this as you have certainly earned the right to do so.

I just hope that pfth is kept available for those that are just starting out.

Martin_H · 2013-12-14 05:50

Loopy Byteloose wrote: »

Well, I just hope that pfth is kept available for those that are just starting out.

Only Dave can speak for his long term plans, but from his posts I think he values the ANS standard. As long as he keeps the syntax and word list of ANS Forth, these optimizations of the interpreter internals shouldn't bother you.

I've been able to navigate around some of the Forth dialect issues, but it took me quite a bit of effort to figure it out. It is kinda interesting that there are probably nearly as many Forth implementations as Forth projects on the forum. If I start writing a Forth we'll know it's some kind of mind virus.

Dave Hein · 2013-12-14 06:35

My plan for pfth is to replace its kernel with the FAST kernel. The outer interpreter and its ANS Forth dictionary will remain the same. This will make pfth faster while still being ANS Forth compliant. I'm posting FAST in stages to show the process for implementing a Forth interpreter from the ground up. Once I get the boot interpreter working in FAST I'll port the pfth Forth code to it.

mindrobots · 2013-12-14 07:29

Thanks for sharing this, Dave!

The various threads we have documenting people's creative processes are fascinating (in a geeky way). Between Chip, you, Peter, Jazzed with SimpleIDE and the others sharing their journeys, discoveries, victories and failures, there's probably a book on creation and innovation buried some place in here.

I'll have to dust off a Prop and fire up pfht again just to play along.

LoopyByteloose · 2013-12-14 09:48

For me personally, it is not just about ANS Forth for the new Forth learner. It is also that pfth provided the source code initially in C and then in PASM in a simpler less enhanced form of Forth that has allowed me to read the code and actually learn more about C and PASM.

FAST may come to be a superb Forth kernel that far exceeds pfth in performance. But I am interested the the educational resources that pfth has provided. Pfth can be studied along with eForth documents to give a serious students the means to learn quite a bit by comparison.

I keep restating, that Forth is not just about using Forth. It is about using Forth to discover how other software tools work.

iammegatron · 2013-12-14 10:39

mindrobots wrote: »

Thanks for sharing this, Dave!

The various threads we have documenting people's creative processes are fascinating (in a geeky way). Between Chip, you, Peter, Jazzed with SimpleIDE and the others sharing their journeys, discoveries, victories and failures, there's probably a book on creation and innovation buried some place in here.

Completely agree.

Contrary to Carl Gauss's belief that "no self-respecting architect leaves the scaffolding in place after completing the building", the building process i extremely valuable for other want-to-be architects to learn. And Forth language is a perfect vehicle for that learning purpose.

I am in no way against ANS-ize forth. But on a dedicated embedded system with special purpose or limited resource, to be able to cook your own Forth, is definitely an advantage.

So Dave please keep on enlightening us.

Dave Hein · 2013-12-16 17:51

Here is version 3. This version implements the dictionary, and it will interpret and execute words that are typed into the terminal. Any words that are not found in the dictionary will be interpreted as a hex number. Enter "words" to see a list of the words in the dictionary.

EDIT: BTW, this version does not support creating new words or variables. That will come in the next update.

iammegatron · 2013-12-19 10:10

"Fast" forth learning notes
====================

(source file: fast.spin)

1. _jmp vs jmp

jmp is a Propeller assembly code instruction. It instructs the "program counter" inside Propeller chip to execute code at a new address.

_jmp (line 141) is used by #innerloop (line 63) to keeps its own "program counter"---pc to point to the next executable code.

2. why _exec and exit, why not _jmp

The Forth word hello (line 180) jumps to another Forth word type (line 177) with _exec.

First of all, these are not cog-ram code, not executed directly by the Propeller chip. So we can't do jmp #type, and in type to go back with jmp #innerloop

These Forth words are "interpreted" by #innerloop. In a sense, _exec and exit do _jmp @type+Q and _jmp @hello+Q+14 to come back with the help of a rstack.

Basically, to jump around and execute code, there are different levels.

The lowest level: Propeller chip. One can use jmp and the addresses are fixed at compilation time.

The second level: #innerloop. One can use _jmp

The third level is sub-function calls (forth word using other forth word). One uses _exec and exit

"Fast.spin" code demonstrates these three levels very clearly.

Dave Hein · 2013-12-19 11:45

iammegatron, that's a good description of how _jmp, _exec and exit work. _jmp is normally used within a word. _exec is used to call another word, and it works just like _jmp except that it also saves the current value of the pc to the return stack. exit is a standard Forth word that pops the address off the return stack, and sets the pc to it. There is also the _jmp0 word that performs a jump if the TOS is zero.

I've gotten the :, ; and create words working, but I haven't posted it yet. I'm looking into making the code words more compact by using the most significant word to differentiate between kernel words and compiled words. This will eliminate the need to use _exec before each compiled word. I'm also looking at using the 6 unused bits in a kernel word to implement compact relative jumps and 6-bit literal values. I think I'll post a version with just the addition of :, ; and create first, and then I'll post the compact code changes later.

iammegatron · 2013-12-20 07:55

(fast2.spin, fast3.spin)

To be able to compile forth code at runtime, which is a preferred (though not always necessary) feature of forth, a dictionary is created.

A linked list is used to make that dictionary, so that the executable address of forth primitives can be found at runtime.

My question (my evil thoughts of optimization) is that why not a plain list in cog-ram, such as

dictionary byte 6, "interp"
               long                
               word @interp+Q
               byte 5, "words"
               long
               word @words+Q
                , ...., 
              byte 4, "swap"
             long
             word @swap+Q
            byte 4, "over"
             long
            word @over+Q
            byte 0  'end of list

This way, no need for an extra memory cell (a word) pointing to the previous forth word, as in a linked list. And a cache list can be created for faster search.

I understand that "dictionary" grows "backwards" in the memory space, because new forth words are prepended, not appended. But that can be implemented with the help of stack.

What do you think?

[note: after I posted this, I went looking into the source code of Tachyon V2.3. Its dictionary is a flat list. So Peter is probably already doing all this.

Still without Dave's fast forth, I wouldn't have the knowledge to look into other forth implementations.

So this post shouldn't be deemed as a persuasion for fast forth to "optimize", rather it's an indication of how wonderful fast forth promotes better understanding of forth itself.

Dave Hein · 2014-01-30 13:29

Here's version 4 of fast. I actually wrote this before Christmas, and I thought I would have lots of time to work on it back then. However, I became busy with other things during Christmas and January.

This version implements the create, : and ; words. This allows for creating new words and variables. A flag byte was added to the dictionary entries that indicates the word type, and whether it is immediate or not. The outer interpreter was improved to allow it to work in either the compile mode or interpreter mode.

I have another version of fast that I'll post later on. It incorporates some work I did on making the code words more compact.

frida · 2014-02-02 12:39

It is a wonderful forth you have made. But when I want to make output on the other pins, they got disturbed by the emit routine. So I made a little change to emit, and it works.

before:

                        ' Emit the value on the top of the stack to the output serial port
emit                    or      tos, #$100
                        shl     tos, #1
                        rol     tos, #30
                        mov     temp1, #10
                        mov     temp2, bitcycles
                        add     temp2, cnt
:loop                   mov     outa, tos
                        ror     tos, #1
                        waitcnt temp2, bitcycles
                        djnz    temp1, #:loop
                        jmp     #drop

after:

                        ' Emit the value on the top of the stack to the output serial port
emit                    or      tos, #$100
                        shl     tos, #1
'                        rol     tos, #30
                        mov     temp1, #10
                        mov     temp2, bitcycles
                        add     temp2, cnt
:loop                   ror     tos,   #1    wc
                        muxc    outa,  outbit 
'                        mov     outa, tos
'                        ror     tos, #1
                        waitcnt temp2, bitcycles
                        djnz    temp1, #:loop
                        jmp     #drop

Dave Hein · 2014-02-02 19:03

Thanks for making that change. I'll incorporate it in the next update.

Fast - Yet another Forth interpreter for the Prop

Comments