Fast - Yet another Forth interpreter for the Prop
Dave Hein
Posts: 6,347
I'm considering replacing the Forth interpreter in pfth with a faster one, so I'm writing a new one to test it out. I decided to call it fast because it will be faster than pfth's current interpreter, and I'm hoping it will be quick to implement. I plan on posting my work in stages so that it might be beneficial to others who are interested in learning about Forth.
This initial version of fast contains only the inner interpreter and enough words to implement a small program that prints "Hello World" over and over. It executes a small pre-compiled program that is represented by the following Forth code.
This initial version of fast contains only the inner interpreter and enough words to implement a small program that prints "Hello World" over and over. It executes a small pre-compiled program that is represented by the following Forth code.
create hellostr char H c, char e c, char l c, char l c, char o c, 32 c, char W c, char o c, char r c, char l c, char d c, : type ( addr num -- ) begin dup while 1 - swap dup c@ emit 1 + swap repeat drop drop ; : hello ( -- ) begin hellostr 11 type 13 emit again ; helloThat's all it does. However, the PASM code does show how the inner loop, stacks and a small number of basic words are implemented. In future updates I will add the dictionary structure and the outer interpreter, which will allow for interactivity and the ability to compile new words.
Comments
Hmmm, looks somewhat familiar...
Your inner loop:
vs mine:
So the 16-bit address is mostly wasted at this level as only 9 bits can be addressed anyway. This also means that a branch instruction such as IF takes 4 bytes vs 2 in TF and I notice that you can't really have byte literals as your PC (IP) needs to be word aligned anyway. But it will be interesting to see how you develop this and what design decisions you come to as well. Could you write your kernel on a C compiler rather then Spin like I am looking at doing? It seems to me that the C compilers could be customized too to make it a little easier to do this sort of thing or perhaps GCC's macros will be sufficient.
I originally thought about using byte codes, but then I changed it to word-size codes to improve the efficiency of loading literals larger than 255, and also to make jumps and calls more efficient. This doubles the size of compiled words that only use kernel words, but for words that call a lot of compiled words it should only be 33% larger.
I also use a hybrid approach for the data stack, where the top two elements consist of registers and the rest of the stack is in hub RAM. This should be just as fast, or faster than either keeping the entire stack in registers or the entire stack in RAM.
The Prop C languages don't really integrate well with PASM like Spin does. So I think it would be difficult to do it with PropGCC or Catalina. Like I suggested in your thread, a Forth compiler could be written in C that would then provide binary data to be integrated with the Spin/PASM kernel.
For example it recently dawned on my that I could do my fixed point trig in 16 bits and make it portable to AmForth. So I could probably run it in AmForth too.
I understand it's mapping cog RAM address to HUB RAM address, since dstack lives in HUB RAM. So I tried to use "@@" operator instead of "+Q"
But it failed to compile. Why?
DAT symbols have two different addresses. The @ operator gives the object offset. Without the @ operator you will get the cog address.
The hello word was defined as follows: It contains a combination of cog and hub addresses. The symbols _lit, _exec and emit are cog address, and the symbols @hellostr+Q, @type+Q and @hello+Q are hub addresses. There are also two numeric values in the word, which are 11 and 13. Any value in the word that is not a cog address must be preceded by a cog address that represents a word that will consume it.
There's still no dictionary, but that will appear in the next update.
It seems to be that once anyone masters Forth, they begin to feel the need to optimize and thier Forth becomes more and more of a personalized form. Enjoy doing this as you have certainly earned the right to do so.
I just hope that pfth is kept available for those that are just starting out.
Only Dave can speak for his long term plans, but from his posts I think he values the ANS standard. As long as he keeps the syntax and word list of ANS Forth, these optimizations of the interpreter internals shouldn't bother you.
I've been able to navigate around some of the Forth dialect issues, but it took me quite a bit of effort to figure it out. It is kinda interesting that there are probably nearly as many Forth implementations as Forth projects on the forum. If I start writing a Forth we'll know it's some kind of mind virus.
The various threads we have documenting people's creative processes are fascinating (in a geeky way). Between Chip, you, Peter, Jazzed with SimpleIDE and the others sharing their journeys, discoveries, victories and failures, there's probably a book on creation and innovation buried some place in here.
I'll have to dust off a Prop and fire up pfht again just to play along.
FAST may come to be a superb Forth kernel that far exceeds pfth in performance. But I am interested the the educational resources that pfth has provided. Pfth can be studied along with eForth documents to give a serious students the means to learn quite a bit by comparison.
I keep restating, that Forth is not just about using Forth. It is about using Forth to discover how other software tools work.
Completely agree.
Contrary to Carl Gauss's belief that "no self-respecting architect leaves the scaffolding in place after completing the building", the building process i extremely valuable for other want-to-be architects to learn. And Forth language is a perfect vehicle for that learning purpose.
I am in no way against ANS-ize forth. But on a dedicated embedded system with special purpose or limited resource, to be able to cook your own Forth, is definitely an advantage.
So Dave please keep on enlightening us.
EDIT: BTW, this version does not support creating new words or variables. That will come in the next update.
====================
(source file: fast.spin)
1. _jmp vs jmp
jmp is a Propeller assembly code instruction. It instructs the "program counter" inside Propeller chip to execute code at a new address.
_jmp (line 141) is used by #innerloop (line 63) to keeps its own "program counter"---pc to point to the next executable code.
2. why _exec and exit, why not _jmp
The Forth word hello (line 180) jumps to another Forth word type (line 177) with _exec.
First of all, these are not cog-ram code, not executed directly by the Propeller chip. So we can't do jmp #type, and in type to go back with jmp #innerloop
These Forth words are "interpreted" by #innerloop. In a sense, _exec and exit do _jmp @type+Q and _jmp @hello+Q+14 to come back with the help of a rstack.
Basically, to jump around and execute code, there are different levels.
The lowest level: Propeller chip. One can use jmp and the addresses are fixed at compilation time.
The second level: #innerloop. One can use _jmp
The third level is sub-function calls (forth word using other forth word). One uses _exec and exit
"Fast.spin" code demonstrates these three levels very clearly.
I've gotten the :, ; and create words working, but I haven't posted it yet. I'm looking into making the code words more compact by using the most significant word to differentiate between kernel words and compiled words. This will eliminate the need to use _exec before each compiled word. I'm also looking at using the 6 unused bits in a kernel word to implement compact relative jumps and 6-bit literal values. I think I'll post a version with just the addition of :, ; and create first, and then I'll post the compact code changes later.
To be able to compile forth code at runtime, which is a preferred (though not always necessary) feature of forth, a dictionary is created.
A linked list is used to make that dictionary, so that the executable address of forth primitives can be found at runtime.
My question (my evil thoughts of optimization) is that why not a plain list in cog-ram, such as
This way, no need for an extra memory cell (a word) pointing to the previous forth word, as in a linked list. And a cache list can be created for faster search.
I understand that "dictionary" grows "backwards" in the memory space, because new forth words are prepended, not appended. But that can be implemented with the help of stack.
What do you think?
[note: after I posted this, I went looking into the source code of Tachyon V2.3. Its dictionary is a flat list. So Peter is probably already doing all this.
Still without Dave's fast forth, I wouldn't have the knowledge to look into other forth implementations.
So this post shouldn't be deemed as a persuasion for fast forth to "optimize", rather it's an indication of how wonderful fast forth promotes better understanding of forth itself.
This version implements the create, : and ; words. This allows for creating new words and variables. A flag byte was added to the dictionary entries that indicates the word type, and whether it is immediate or not. The outer interpreter was improved to allow it to work in either the compile mode or interpreter mode.
I have another version of fast that I'll post later on. It incorporates some work I did on making the code words more compact.
before:
after: