P2XCForth - A Fast Hybrid: XBYTE and C

Christof Eb. · 2025-07-11 06:51

Hi,
once again an experiment to create a Forth for the P2. While P2CCForth worked and brought the Forth world together with the innovations of the OBEX, it is slow in comparison to Taqoz.

So the question was whether the special XBYTE mechanism of the P2 could be used to go in the other direction and make a Forth for the P2 that is even faster than Taqoz. XBYTE is a hardware mechanism for an assembler program sitting in COG or LUT RAM. It uses the fast hub FIFO (the read-ahead buffer set up with RDFAST) to read an instruction code byte from HUB RAM, looks up where to find the right assembler routine in COG or LUT RAM, and executes it. Then this cycle is repeated. XBYTE is fast. I think that during the development of the P2, XBYTE came relatively late, when Taqoz and its word-code mechanism had already been fixed. P2CCForth uses a 32-bit code, and its inner interpreter is executed from C, so the XBYTE code mechanism is a different world.
When I took a closer look at XBYTE, I discovered that, using FlexProp, there could be a way to combine a small XBYTE machine executing a core word set with a mechanism (TRAP) to execute words written in C. So the hybrid P2XCForth was born.
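
As a rough illustration of the fetch, look up, execute cycle and of the TRAP escape into C, here is a minimal sketch in portable C. It is not taken from the P2XCForth sources: the opcode numbers, the names `run`, `trap_table`, `trap_square` and `trap_negate`, and the one-byte TRAP index are all assumptions made for this example. On the P2 the fetch and table lookup are done by the XBYTE hardware rather than by a software loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode numbers, chosen for this sketch only. */
enum { OP_EXIT, OP_LIT, OP_ADD, OP_DUP, OP_TRAP };

#define STACK_SIZE 32
static int32_t stack[STACK_SIZE]; /* data stack; P2XCForth keeps it in HUB RAM */
static int sp;                    /* number of items currently on the stack */

static void push(int32_t v) { assert(sp < STACK_SIZE); stack[sp++] = v; }
static int32_t pop(void)    { assert(sp > 0); return stack[--sp]; }

/* "TRAP" words: plain C functions, reached through their own table. */
static void trap_square(void) { int32_t x = pop(); push(x * x); }
static void trap_negate(void) { push(-pop()); }
static void (*const trap_table[])(void) = { trap_square, trap_negate };

/* Inner interpreter: fetch one code byte, run its routine, repeat.
   Returns the top of stack (or 0 if empty) when OP_EXIT is reached. */
int32_t run(const uint8_t *ip)
{
    sp = 0;
    for (;;) {
        switch (*ip++) {
        case OP_LIT: {              /* a 32-bit little-endian literal follows */
            uint32_t v = (uint32_t)ip[0] | (uint32_t)ip[1] << 8 |
                         (uint32_t)ip[2] << 16 | (uint32_t)ip[3] << 24;
            ip += 4;
            push((int32_t)v);
            break;
        }
        case OP_ADD: { int32_t b = pop(); push(pop() + b); break; }
        case OP_DUP: { int32_t a = pop(); push(a); push(a); break; }
        case OP_TRAP:               /* next byte selects the C routine */
            trap_table[*ip++]();
            break;
        case OP_EXIT:
        default:                    /* unknown codes end the run as well */
            return sp > 0 ? pop() : 0;
        }
    }
}
```

Running the byte sequence `OP_LIT 7, OP_TRAP 0, OP_LIT 1, OP_ADD, OP_EXIT` through `run` squares 7 in the C routine and adds 1 in the byte-code routine, which is the kind of mixing the hybrid is meant to allow.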

As the stacks have to be accessible from both assembler and C, they reside in HUB RAM. The Forth registers are COG registers, for speed and also to keep them separate when multiple COGs are executing Forth.

As long as only XBYTE words are used, P2XCForth is at least as fast as Taqoz, and up to twice as fast!

P2XCForth comes with Local Variables, Value-type Variables, Pause-style multitasking, an online help system and the ability to start words in other cogs. New words can be created:

As compound words from existing words directly on the P2
As TRAP words in C. These can contain inline assembler and can call routines from Spin files. You will need to re-compile.

My editor FED is included. In my opinion it is good enough that I am actually using it instead of sending files from the PC with Notepad++. It comes with syntax highlighting, an online glossary, and features for navigating in the file.

The ZIP contains a PDF with further descriptions, the source files and also a _BOOT_P2.BIX for the Kiss board (25MHz crystal) with SD card. This should be copied together with the blocks.blk onto an SD card. The setup for Tera Term is described in the PDF.
"2 load" loads the first blocks, 2 to 5.
"7 load" loads the editor.

I am very thankful for

this nice forum and its helpful members!
P2!
FlexProp, which opens up so many possibilities!
The great Kiss board!
a lot of source code and texts, from which I have learned a lot and which I have used. For example there is a "neoOut" routine to output data to neopixels, which is derived from JonnyMac's driver.
Also I was very glad to find examples for XBYTE machines!

Have Fun! Christof

rosco_pc · 2025-07-11 15:58

Unfortunately I'm not a Forth programmer, but well done!

bob_g4bby · 2025-07-12 09:12

That's a lot of work, Christof, fantastic! I look forward to trying it all on my P2-EVAL. Impressive speed results with the XBYTE interpreter. Good to see you've built in multitasking, I find writing applications is easier as a group of interacting 'applets'. It's the same technique as when using more than one cog in an application.
Cheers, Bob

refaQtor · 2025-07-14 21:41

@"Christof Eb." said:
The ZIP contains a PDF with further descriptions, the source files and also

beautiful example of all the XBYTE bits, which I must admit I'm still a bit stymied by on my own implementation for Scheme (Lisp). my bytecodes execute then run right off the end, or not,... or something. as I was trimming it all down to the most essential bits to ask on this forum, I learned a bunch more. but, then I haven't asked here yet as I've also learned there is a batch more I need to learn about Scheme... which I've been doing on the desktop for the moment. Getting closer, though!

very glad to see your work and to have FED built in,... very cool!

bob_g4bby · 2025-07-16 09:23

I've installed P2XCFORTH on a KISS board and it seems to be running. I can compile
: test 0 1000 0 do dup . 1+ loop drop ;
and it runs OK
I can't compile
: test 1000 0 do i . loop ;
I can compile (though it makes no sense)
: test 1000 0 do j . loop ;
If I type "see j", it tells me: see is TRAP
If I type "see i", it tells me: see -not found
Do you see the same, Christof?
Bob

Christof Eb. · 2025-07-16 09:58

Hi @bob_g4bby, nice that you are giving it a try.
Fortunately your finding is not a bug.

i is a macro, defined in block 2:
: i s" r@" eval$ ; immediate \ macro
This means that i is an alias for r@: because the macro is immediate, i is replaced by r@ when the word is compiled.

```

0 0 0 0 > : test 1000 0 do i . loop ;

test 1000 0 do i r@ . loop ;
RDepth: 1 Depth: 0
$0 $0 $0 $0
0 0 0 0 > see test

see
test: 1c921

1c941 xbyte: 3 lit 1000

1c946 xbyte: 3 lit 0

1c94b xbyte: 8 swap
1c94c xbyte: 21 >r
1c94d xbyte: 21 >r
1c94e xbyte: 22 r@
1c94f trap: 2 .
1c951 xbyte: 2a doloop -8

1c956 xbyte: 25 exit
RDepth: 1 Depth: 0
$0 $0 $0 $0
0 0 0 0 >
```
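
To make the listing above easier to follow, here is a small sketch in C (not P2XCForth code) that mimics the compiled sequence with two explicit stacks. The behaviour of "doloop" is an assumption for this example: it increments the index on top of the return stack, branches back while the index is below the limit, and otherwise drops both values. The names `sum_of_i`, `to_r`, `r_fetch` and `doloop` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEPTH 16
static int32_t ds[DEPTH], rs[DEPTH]; /* data stack and return stack */
static int dsp, rsp;                 /* item counts */

static void    dpush(int32_t v) { assert(dsp < DEPTH); ds[dsp++] = v; }
static int32_t dpop(void)       { assert(dsp > 0); return ds[--dsp]; }
static void    to_r(void)       { assert(rsp < DEPTH); rs[rsp++] = dpop(); } /* >r */
static void    r_fetch(void)    { assert(rsp > 0); dpush(rs[rsp - 1]); }     /* r@ */

/* Assumed behaviour of "doloop": bump the index on top of the return
   stack; return 1 to branch back, or drop index and limit and return 0. */
static int doloop(void)
{
    assert(rsp >= 2);
    if (++rs[rsp - 1] < rs[rsp - 2]) return 1;
    rsp -= 2;
    return 0;
}

/* Mirrors ": test <limit> 0 do i . loop ;" but sums the values of i
   instead of printing them, so the result is easy to check. */
int32_t sum_of_i(int32_t limit)
{
    int32_t sum = 0;
    dsp = rsp = 0;
    dpush(limit);          /* lit <limit> */
    dpush(0);              /* lit 0       */
    { int32_t a = dpop(), b = dpop(); dpush(a); dpush(b); } /* swap */
    to_r();                /* >r : limit goes to the return stack first */
    to_r();                /* >r : index ends up on top of it */
    do {
        r_fetch();         /* r@ : what the macro i compiles to */
        sum += dpop();     /* stands in for "." */
    } while (doloop());
    return sum;
}
```

Because the index is the topmost item on the return stack inside the loop body, r@ delivers exactly the value that i is expected to give.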

bob_g4bby · 2025-07-16 11:03

Ah! I had not loaded the blocks onto the SD card. Now i compiles r@ just fine. Thanks, Christof

bob_g4bby · 2025-07-18 15:21

How are trace! and untrace used, please?

Regards, Bob

Christof Eb. · 2025-07-20 08:21

Hi Bob,
glad that you got it compiling!

At the moment trace and untrace are not supported with the fast XBYTE machine.
As wcall and exit are done in the XBYTE machine, it would be necessary to insert low-level tracing there, which would slow everything down considerably.
Early in my efforts I had some tracing for the XBYTE mechanism, as Chip has documented. There are still some traces of this in the XBYTE assembler code. When activated, they actually bypass the hardware XBYTE mechanism, and they need the DEBUG mechanism on the serial line.

This part is not actually used.

It would be possible to redefine ":" so that it compiles some "dotrace" into the beginning of every compound word. I have that in my homegrown Forth for ESP32 but actually don't use it much. It is difficult to use trace and not be swamped by data.....

I have added a sentence into my writeup about trace.
Regards Christof

bob_g4bby · 2025-07-20 19:01

I understand - you need to keep the interpreters as small as possible to keep the speed of execution up.

I've been exploring multitasking, the background task, and multicogging which are all working fine for me. FED isn't working for me just now. I'm using a noname SD card. I need to try a SanDisk card, to see if that fixes the problem. I'm quite happy copy/pasting text into tera term meanwhile, like when using Taqoz.

All good stuff, I'll be following where you go with this, cheers, Bob

Christof Eb. · 2025-07-21 07:11

@bob_g4bby said:
I understand - you need to keep the interpreters as small as possible to keep the speed of execution up.

I've been exploring multitasking, the background task, and multicogging which are all working fine for me. FED isn't working for me just now. I'm using a noname SD card. I need to try a SanDisk card, to see if that fixes the problem. I'm quite happy copy/pasting text into tera term meanwhile, like when using Taqoz.

All good stuff, I'll be following where you go with this, cheers, Bob

Hi
it would be interesting to know where you run into trouble with FED.
One thing is that it needs a block that is already structured as 100 chars per line, with an LF as the last char.
With "eFed" a normal existing text block is expanded to this fixed line length.
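
To illustrate what such a fixed-length layout looks like, here is a minimal C sketch (not the eFed source) that expands LF-separated text into 100-byte records. Padding with spaces and truncating over-long lines are assumptions for this example, and `expand_lines` is a hypothetical name. The block size is the 32768 bytes that also appears in wipeFed later in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_LEN   100    /* from the post: 100 chars per line, LF last */
#define BLOCK_SIZE 32768  /* block size used by wipeFed in this thread */

/* Expand LF-separated text into fixed-length records: each record is
   LINE_LEN bytes, padded with spaces (an assumption), ending in '\n'.
   Over-long lines are truncated. Returns the number of bytes written. */
size_t expand_lines(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL);
    while (*src != '\0' && out + LINE_LEN <= dst_size) {
        size_t len  = strcspn(src, "\n");          /* length of this line */
        size_t keep = len < LINE_LEN - 1 ? len : LINE_LEN - 1;
        memcpy(dst + out, src, keep);
        memset(dst + out + keep, ' ', LINE_LEN - 1 - keep);
        dst[out + LINE_LEN - 1] = '\n';
        out += LINE_LEN;
        src += len;
        if (*src == '\n') src++;                   /* skip the line feed */
    }
    return out;
}
```

With every line the same length, an editor can find line n at byte offset n * 100 without scanning the text, which is presumably the attraction of the format.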

So if I want to edit a new block 20, I first use "20 gu" (for get and update).
This captures all terminal input into block 20 until you type ESC.
Be sure to have some extra empty lines at the end before ESC.

You can test the file system by using gu to enter some source code and then loading the new block.

Then I use "20 eFed" which will expand block 20 to 100 chars per line and start FED.
End eFed with "^Q" and "Y" to store.

Well, I should add a word "wipeFed" to start FED with an empty block file.....

You got me thinking about trace of course. Perhaps it would be possible to load a rather different substitute for the XBYTE machine in case you want to trace. In Forth there is always a way....
Cheers Christof

Christof Eb. · 2025-07-21 08:14

OK, here is the new word wipeFed, which creates an empty block and starts the editor with it.

```
: nTitle$ s" \ Block empty " ;

: wipeFed ( blocknr -- ) \ wipe block and start FED with it
   fedFlush \ in case it is open
   dup 0 maxBlocks# 1- within if
      dup to fedBlock#
      dup fedBlock drop \ load what ever there is
      fedBuf$ 32768 0 fill \ wipe: fill with zero
      nTitle$ fedBuf$ cpy$
      $0A0A0A0A nTitle$ length$ fedBuf$ + ! \ append 4 line feeds
      fedUpdate fedFlush \ write to block
      eFed \ start the editor
   else drop then
;
```

bob_g4bby · 2025-07-22 09:11

The editor is now working for me, sorry for the false report - I was trying to start it with "fed" which freezes the system if eFed hasn't been run first. It also helps to type "2 eFed", not "2 efed" - doh! Thanks for the wipeFed word.
Thanks for the editor use hints - cheers Bob

bob_g4bby · 2025-07-22 16:40

How would one set up an autostarting user application, or maybe that isn't available yet? P2XCForth is still quite experimental.

I noticed the string that is interpreted on power up, I guess that could include ...
<block number of user app> load <top app word>
I must try that when I get home

Christof Eb. · 2025-07-23 15:48

Yes,
"2 load" at the end of the string evaluates block 2.
This works, but there is a restriction on nesting load commands: you cannot use "7 load" within a block file.
"-->" will load the next block.
Yes, P2XCForth is quite experimental.
In my opinion it must have an assembler to be as useful as TAQOZ. So at the moment I am looking into whether I can port Taqoz's assembler. But it will be a simpler one with RPN syntax, if I succeed. P2XCForthD can now execute machine code routines.... And some simple vocabulary mechanism is also needed to hide all the assembler words.

bob_g4bby · 2025-07-24 08:54

That's good stuff, @"Christof Eb."
You're not tempted to emulate Taqoz and the "->" word that causes the first word in a line to be interpreted last? The source can be written "rdlong xx,ptra" instead of "xx,ptra rdlong"
I've often thought that it would be good to be able to forget the assembler after the machine code is assembled.

Just a hint for anyone else attempting to use "eFed", the editor: vertical scrolling didn't work properly until I reduced Tera Term's font size to 8. That allows the 74 lines to appear fully on screen, and scrolling will then extend to the beginning of the block OK.

Christof Eb. · 2025-07-24 16:25

@bob_g4bby said:
That's good stuff, @"Christof Eb."
You're not tempted to emulate Taqoz and the "->" word that causes the first word in a line to be interpreted last? The source can be written "rdlong xx,ptra" instead of "xx,ptra rdlong"
I've often thought that it would be good to be able to forget the assembler after the machine code is assembled.

Taqoz has this special feature that the outer interpreter compiles each line and then executes that compiled code. I never really appreciated this as a feature, as it makes some things more complicated and needs that GRAB thingy, which I find hard to grasp. But I think that it enables "->".
I once tried to understand how Peter's assembler works, but failed badly. That source code is not intended to be read and understood by others.
I am still very much impressed by Peter's work in TAQOZ. There is so much perfection!

The syntax will be worse: there will be spaces and also some words to clarify the type of operand, like #d and #s or d s. More like
"xx d ptra s rdlong". The idea is to let the outer interpreter do the assembly using its existing features. Something like https://forums.parallax.com/discussion/173684/simple-assembler-for-p1-tachyon-forth-5-7-or-can-you-beat-tachyon-at-fibonacci#latest
That post also describes a way to first reserve space for a code word, load and use the assembler, and then forget it.
At the moment, I think that permanently loading the assembler is not a problem for the P2 with P2XCForth, because it has no 16-bit page restrictions.
But I also have a vague idea for an overlay (a 32kB block) of compiled code from SD card. The editor and the assembler could probably share the same overlay area. Lots of experiments for fun....
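
The RPN idea, where operand words fill in fields and the mnemonic word comes last and emits the instruction, can be sketched in C as follows. This is only an illustration, not the planned assembler: the function names are hypothetical, the field layout is the P2's condition, opcode, C/Z/I, D and S arrangement as I understand it, and the opcode value passed to `assemble` in the usage note is a placeholder that would have to be checked against the P2 instruction set sheet.

```c
#include <assert.h>
#include <stdint.h>

/* Small operand stack, standing in for the Forth data stack. */
static uint32_t ds[16];
static int dsp;
void apush(uint32_t v)      { assert(dsp < 16); ds[dsp++] = v; }
static uint32_t apop(void)  { assert(dsp > 0); return ds[--dsp]; }

/* Pending instruction fields, filled in by the operand words. */
static uint32_t d_field, s_field, imm_flag;

void op_d(void)     { d_field = apop() & 0x1FF; }               /* "d"  */
void op_s(void)     { s_field = apop() & 0x1FF; imm_flag = 0; } /* "s"  */
void op_imm_s(void) { s_field = apop() & 0x1FF; imm_flag = 1; } /* "#s" */

/* Mnemonic word: combine the pending fields into one 32-bit instruction.
   Assumed layout: cond[31:28] opcode[27:21] C[20] Z[19] I[18] D[17:9] S[8:0].
   C and Z are left at 0 in this sketch. */
uint32_t assemble(uint32_t opcode7)
{
    uint32_t cond = 0xF;                      /* execute unconditionally */
    uint32_t insn = (cond << 28) | ((opcode7 & 0x7F) << 21) |
                    (imm_flag << 18) | (d_field << 9) | s_field;
    d_field = s_field = imm_flag = 0;         /* ready for the next line */
    return insn;
}
```

A line such as "xx d ptra s rdlong" then maps to `apush(xx); op_d(); apush(ptra); op_s(); assemble(opcode)`, where `xx` and `ptra` are register addresses and `opcode` is the 7-bit code for the instruction. The outer interpreter only has to execute ordinary words in order, which is the attraction of the RPN form.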

You can reduce " 65 value fedHeight#" in block 7 to have fewer lines visible. @Maciek, in case you read this, the colors for color printing are defined in block 2, "create colTab 7 c, 1 c, 6 c, 4 c, 7 c, 7 c, 0 c, 0 c,", with some hard-coded yellow and green in the following words.

Have fun, Christof

Maciek · 2025-07-24 18:58

@"Christof Eb."

Oh, yes. I do read every single word of your posts regarding your "Forth for the P2", be it P2CC or P2XC, and not only these. I do not understand many of the advanced concepts you talk about; however, I am trying to follow what I can grasp and learn what I can't at the moment. I do not have much hobby time these days, but just enough to do some reading and a tiny, tiny bit of playing with the hardware. One thing that caught my eye is your custom Forth for the ESP32, which you mentioned someplace here in the forum. Is that for the Xtensa or the RISC-V based core?

As a side note, I can't admire enough your patience and determination in your pursuit to help one special forum member with his special project. I wish I had that much of both myself and thank you for this effort of yours. I hope it will really bear fruit eventually. I really do.

Regards,
Maciek
