P2XCForth - A Fast Hybrid: XBYTE and C
Hi,
this is once again an experiment in creating a Forth for the P2. P2CCForth worked and brought the Forth world together with the innovations of the Obex, but it is slow in comparison to Taqoz.
So the question was whether the special XBYTE mechanism of the P2 could be used to go in the other direction and make a Forth for the P2 that is even faster than Taqoz. XBYTE is a hardware mechanism for an assembler program sitting in COG or LUT RAM. It uses the fast STREAMER cache to read an instruction code byte from HUB RAM, looks up where to find the right assembler routine in COG or LUT and executes it. Then this cycle is repeated. XBYTE is fast. I think that during the development of the P2, XBYTE came relatively late, when Taqoz and its word-code mechanism had already been fixed. P2CCForth uses a 32-bit code, and its inner interpreter is executed from C, so the XBYTE code mechanism is a different world.
When I took a closer look at XBYTE, I discovered that, using Flex, there could be a way to combine a small XBYTE machine executing a core wordset with a mechanism (TRAP) to execute words written in C. So the hybrid P2XCForth was born.
As the stacks have to be accessible from both assembler and C, they reside in HUB RAM. The Forth registers are COG registers, for speed and also to keep them separate when multiple COGs are executing Forth.
As long as only XBYTE words are used, P2XCForth is at least as fast as Taqoz, and up to twice as fast!
P2XCForth comes with Local Variables, Value-type Variables, Pause-style multitasking, an online help system and the ability to start words in other cogs. New words can be created:
- As compound words from existing words directly on the P2
- As TRAP-words in C. Can contain inline assembler. Can call routines from SPIN files. You will need to re-compile.
My editor FED is included. In my opinion it is good enough that I am actually using it instead of sending files from the PC with Notepad++. It comes with syntax highlighting, an online glossary, and features to navigate in the file.
The ZIP contains a PDF with further descriptions, the source files and also a _BOOT_P2.BIX for the Kiss board (25MHz crystal) with SD card. This should be copied together with blocks.blk onto an SD card. The setup for Teraterm is described in the PDF.
2 load loads the first blocks, 2 to 5
7 load loads the editor
I am very thankful for
- this nice forum and its helpful members!
- P2!
- FlexProp, which opens up so many possibilities!
- The great Kiss board!
- a lot of source code and texts, where I have learned a lot and used it. For example there is a "neoOut" routine to output data to neopixels, which is derived from JonnyMac's driver.
- Also I was very glad to find examples for XBYTE machines!
Have Fun! Christof
Comments
Unfortunately not a forth programmer but well done
That's a lot of work, Christof, fantastic! I look forward to trying it all on my P2-EVAL. Impressive speed results with the XBYTE interpreter. Good to see you've built in multitasking, I find writing applications is easier as a group of interacting 'applets'. It's the same technique as when using more than one cog in an application.
Cheers, Bob
beautiful example of all the XBYTE bits, which I must admit I'm still a bit stymied by on my own implementation for Scheme (Lisp). my bytecodes execute then run right off the end, or not,... or something. as I was trimming it all down to the most essential bits to ask on this forum, I learned a bunch more. but, then I haven't asked here yet as I've also learned there is a batch more I need to learn about Scheme... which I've been doing on the desktop for the moment. Getting closer, though!
very glad to see your work and to have FED built in,... very cool!
I've installed P2XCFORTH on a KISS board and it seems to be running. I can compile
: test 0 1000 0 do dup . 1+ loop drop ;
and it runs OK
I can't compile
: test 1000 0 do i . loop ;
I can compile (though it makes no sense)
: test 1000 0 do j . loop ;
If I see j, it tells me see is TRAP
If I see i, it tells me see -not found
Do you see the same, Christof?
Bob
Hi, @bob_g4bby, nice that you give it a try.
Fortunately your finding is not a bug.
i is a macro, defined in block 2:
: i s" r@" eval$ ; immediate \ macro
This means that i is an alias of r@ and will be replaced by r@ when the word is compiled.
```
test 1000 0 do i r@ . loop ;
RDepth: 1 Depth: 0
$0 $0 $0 $0
0 0 0 0 > see test
see
test: 1c921
1c941 xbyte: 3 lit 1000
1c946 xbyte: 3 lit 0
1c94b xbyte: 8 swap
1c94c xbyte: 21 >r
1c94d xbyte: 21 >r
1c94e xbyte: 22 r@
1c94f trap: 2 .
1c951 xbyte: 2a doloop -8
1c956 xbyte: 25 exit
RDepth: 1 Depth: 0
$0 $0 $0 $0
0 0 0 0 >
```
Ah! I had not loaded the blocks onto the SD card. Now i compiles r@ just fine. Thanks, Christof
In the source code, I see the constant char builtinWords is processed by P2XCFORTH's input stream at power-on. Is that the means by which a user program is compiled from SD card and auto-started, or do you have another mechanism for that?
In the Blocks directory of the source code, I think you might delete the file "block3A - Kopie.fth", because if it is loaded onto the SD card, P2XCFORTH does not compile everything correctly. Using "block3A.fth", everything works fine.
I see the program compiles to 408480 bytes. This sounds like a lot, but may be misleading: how much space is available for the user's program?
Regards, Bob