TACHYON O/S V3.0 JUNO - Furiously Fast Forth, FAT32+LAN+VGA+RS485+OBEX ROMS+FP+LMM+++
Peter Jakacki
Updated links 150826
Propeller Hardware Explorer with VGA
Tachyon Dropbox files and binaries (latest)
Introduction to TACHYON Forth
Tachyon Forth Resource Links
Tachyon Web Server
FTP: ftp://tachyonforth.com
Telnet: tachyonforth.com 10001
Watch Easynet in operation
Note: these early posts are mostly historical only, please read the latest posts or click the links in my sig.
Enhanced bitmap graphics demo + serial
*ORIGINAL POST*
I've been hooked onto Forth again after a long break away. Thanks to Sal's PropForth and recently the Bluetooth modules I have rediscovered the advantages and fun of programming and testing in a Forth environment. Now I mentioned I have been away from Forth for a while and that's got to do with the Propeller chip since I like using it but Forth does not lend itself to this architecture very easily. Several years ago (time flies) I had a look at writing a Forth called CogForth for the Prop but I just felt it was too much hard work, which it was, not just because of the architecture but also because of the limitations of the tools (Spin tool etc). Even so the Forth would have been slow for what I need and there were memory limitations.
However, spurred on by the efforts of Sal Sanci and Prop Braino I have taken another look at my old CogForth since I needed amongst other things more runtime speed without having to resort to assembler. So over the last couple of days CogForth has been completely revamped and I think I'm on a winner with this implementation. It is both fast and very small thanks to the byte codes for each Forth VM operation. Like a tachyon, it is fast and very small (as a hypothetical particle anyway) with emphasis on fast I/O operations and maximizing the Propeller's memory. What would some byte code look like? Have a look at this function which prints a hex character:
    0534(001C)    | PRTHEX ' ( n -- ) print n (0..$0F) as a hex character
    0534(001C) 2D |        byte CLIT/2,$30,PLUS/2
    0535(001C) 30 |
    0536(001C) 0C |
    0537(001C) 05 |        byte DUP/2,CLIT/2,$39,GT/2,_IF/2,3
    0538(001D) 2D |
    0539(001D) 39 |
    053A(001D) 20 |
    053B(001D) 3E |
    053C(001E) 03 |
    053D(001E) 2D |        byte CLIT/2,12,PLUS/2 'Adjust for A..F
    053E(001E) 0C |
    053F(001E) 0C |
    0540(001F) 49 | PRTCH  byte EMIT/2,EXIT/2
    0541(001F) 00 |
EDIT: Byte codes must be shifted one bit right to compress 9-bits, the lsb is always zero as all byte code functions are on double-long boundaries.
So you see this function takes 14 bytes. Compare this to Spin, which also uses byte codes:
    88 char+=$30
    Addr : 05B0: 38 30 : Constant 1 Bytes - 30 - $00000030 48
    Addr : 05B2: 66 4C : Variable Operation Local Offset - 1 Assign WordMathop +
    89 if char > $39
    Addr : 05B4: 64 : Variable Operation Local Offset - 1 Read
    Addr : 05B5: 38 39 : Constant 1 Bytes - 39 - $00000039 57
    Addr : 05B7: FA : Math Op >
    Addr : 05B8: JZ Label0002
    Addr : 05B8: 0A 04 : jz Address = 05BE 4
    90 char+=12
    Addr : 05BA: 38 0C : Constant 1 Bytes - 0C - $0000000C 12
    Addr : 05BC: 66 4C : Variable Operation Local Offset - 1 Assign WordMathop +
    Addr : 05BE: Label0002
    Addr : 05BE: Label0003
    91 coms.tx(char)
    Addr : 05BE: 01 : Drop Anchor
    Addr : 05BF: 64 : Variable Operation Local Offset - 1 Read
    Addr : 05C0: 06 03 0B : Call Obj.Sub 3 11
    Addr : 05C3: 32 : Return
<removed proposed dictionary description>
The runtime speed is mainly because many of the primitives are written in assembly and stacks are implemented that are more suited for the Prop's architecture permitting direct addressing, just like a register. So many of the primitives get the job done with very very few instructions and even the runtime interpreter is lean and mean. A byte code is read from hub RAM and shifted up to 9-bits which the Prop jumps to in COG memory, so it's very direct. The runtime interpreter looks like this:
    doNEXT          rdbyte  token,IP        'read byte code instruction
                    add     IP,#1           'advance IP to next byte token
                    shl     token,#1        'expand to 9-bits - all byte codes point to code on double-long boundary
                    cmp     token,#$180 wc  'tokens $C0..$FF are calls to kernel byte code via kbctbl
            if_c    jmp     token           'directly execute PASM byte codes without further ado
EDIT: Fixed a bug when testing for PASM for extended byte code functions.
There's a test for byte codes from $C0..$FF which doesn't really impact the speedy operation of the assembly primitives which are indexed by codes $00..$BF. The reason I reserve some codes is that there is no way you could use all the 256 codes for assembly primitives so I used some to form a very compact way of accessing up to 64 more words (functions) which instead of being assembly code are interpreted byte codes. All byte code functions other than these special 64 are referenced with 2 or 3 bytes, one of which is the byte code and the other 1 or 2 bytes are a relative address pointing back to the word function. The one byte CALL gets straight into 1 of 64 higher-level functions which are themselves comprised of byte codes which eventually execute assembly code via the first 192 byte codes $00..$BF.
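As a rough illustration of that split between the 192 primitive codes and the 64 kernel calls, here is a small Python model of the decode step (not Tachyon code). The function name, the table contents and the idea of indexing the table with `token - $C0` are assumptions made for the sketch; the real kernel works on the shifted token and a hub table pointer.

```python
PRIMITIVE_LIMIT = 0x180  # $C0 << 1: first shifted value that is no longer a PASM primitive


def decode(token, kernel_table):
    """Classify one byte code roughly the way the doNEXT loop does.

    token        -- byte code read from hub RAM (0..255)
    kernel_table -- hypothetical list of 64 hub addresses for codes $C0..$FF
    """
    addr = token << 1                 # expand to a 9-bit cog address
    if addr < PRIMITIVE_LIMIT:        # $00..$BF: jump straight into cog PASM
        return ("primitive", addr)
    # $C0..$FF: look up the hub address of a byte code definition
    return ("kernel_call", kernel_table[token - 0xC0])


if __name__ == "__main__":
    table = [0x0600 + 2 * i for i in range(64)]   # made-up hub addresses
    for tok in (0x2D, 0xBF, 0xC0, 0xFF):
        kind, where = decode(tok, table)
        print(f"${tok:02X} -> {kind} at ${where:04X}")
```

The point of the sketch is only that the common case (a primitive) costs one shift and one compare before the jump, while the rarer kernel call pays for a table read.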
Anyway, I'm developing and testing and the beta will be ready very soon but I thought I would present some details of the workings of this Forth implementation as I am also looking for feedback. Perhaps also someone could suggest an easier way around the Spin/BST compiler limitations especially with DAT sections and references which the compiler insists must be on long boundaries. Anyway, I want the references to be absolute in hub RAM rather than as if it were PASM running in a COG. Also, I am making it far easier to interface to various chips by having low-level code for serial operations and making all the byte code operations fast, especially serial operations. I'm even thinking of making it as easy to use as the Basic Stamp. For instance, there's something in being able to send and receive serial data on any pin at any time (without starting up a cog). So too all those pin high and pin low and clocking operations etc. I want to be able to hook-up an I2C or SPI'ish chip and bit-bash to it at least in the 100kHz range if not more (without resorting to PASM in a COG).
This is my header file and some code snippets for the moment.
TACHYON
A very fast and very small Forth byte code interpreter for the Propeller chip.
2012 Peter Jakacki
Features:
- Low level words are written in PASM and accessed by the Forth run-time interpreter as single byte codes.
  Byte codes are read from hub RAM and executed in PASM
  Byte codes $00..$BF are PASM primitives expanded to 9-bits to directly address COG code
  Byte codes $C0..$FF are calls to kernel byte code defs via table in hub RAM
- Support for LMM operations
- Interpreted byte code definitions are referenced either as:
  - 1 byte - codes $C0..$FF index their definitions via a table - used as part of compiled kernel
  - 2 bytes - RCALL opcode + relative byte (always referenced backwards) (extra 4 bits in opcode = -4096 range)
    There are 16 entries in the COG for the RCALL byte code + extra address bits
  - 3 bytes - WCALL byte code + 16-bit relative address
- All literals and strings are byte aligned
- Fast I/O bit-bashing support
- Flexible SPI PASM code support words in kernel
  Construct fast serial drivers with minimal code
- Holds Forth headers in EEPROM or SD storage
  Searches the dictionary using rapid index key searching by first character
  No hub RAM is used by headers
  Even 32K EEPROMs can be used if the area used corresponds to RAM that is normally rewritten (e.g. video memory)
  Option to hold additional information per definition such as stack usage and description
- Kernel compiled in standard manner via Spin tools so other Spin objects can be combined
- Three stacks in COG RAM: Data, Return, and Loop
  Access loop indices outside of definitions
  Avoids manipulation and corruption of return stack
  Static stack arrays for direct addressing of stack items
  Intrinsically safe stack overflow and underflow
Some early unoptimized observations:
- Empty loops can execute in 500ns to 825ns (absolute worst case)
- Two to one stack operations ( + * AND etc) inc opcode fetch take 900ns to 1.087us (absolute worst case)
    '
    ' Fetch the next byte code instruction pointed to by the instruction pointer IP in hub RAM
    '
    doNEXT          rdbyte  token,IP        'read byte code instruction
                    add     IP,#1           'advance IP to next byte token
                    shl     token,#1        'expand to 9-bits - all byte codes point to code on double-long boundary
                    cmp     token,#$180 wc  'tokens $C0..$FF are calls to kernel byte code via kbctbl
            if_c    jmp     token           'directly execute PASM byte codes without further ado
    ' byte codes $C0..$FF point to further byte code definitions
    ' which are larger fragments of byte code in hub RAM
                    call    #SAVEIP         'save current IP in prep for a call
                    add     X,kbcptr        'kbcptr points to the kernel byte code table (less $180)
                    rdword  IP,X            'read 16-bit address from hub kbc table into IP
                    jmp     #doNEXT         'Execute the code

    ' Example of PASM code entries for Byte Code indexing on double-long boundaries
    '
    DROP2           call    #POPX
                    jmp     #DROP
    DUP             mov     X,tos           ' Read directly from the top of the data stack
                    jmp     #PUSHX          ' Push X onto the data stack and doNEXT
    '
    OVER            mov     X,tos+1         'read second data item and push
                    jmp     #PUSHX
    NIP             mov     tos+1,tos       'replace second item with top and drop
                    jmp     #DROP
    LIT0            mov     X,#0
                    jmp     #PUSHX
    LIT1            mov     X,#1
                    jmp     #PUSHX
    '****************** BOOLEAN ******************
    _AND            movi    _POPEX,#%011000_001 ' AND ( n1 n2 -- n3 )
                    jmp     #POPEX              'discard top of stack and execute modified PASM
    _OR             movi    _POPEX,#%011010_001
                    jmp     #POPEX
    _XOR            movi    _POPEX,#%011011_001
                    jmp     #POPEX
    '***************** MEMORY *******************
    CFETCH          rdbyte  tos,tos         ' read byte pointed to by tos into tos
                    jmp     #doNEXT
    CPLUSST         rdbyte  X,tos           ' read in byte from address
                    add     tos+1,X         ' add second item to contents of address
    CSTORE          wrbyte  tos+1,tos       ' write the second item using address on the tos
                    jmp     #DROP2

    ' Example of interpreted byte codes in hub RAM
    ' References to other byte code definitions are relative which is also necessary because of the Spin compiler's limitations with DAT sections
    '
    0530(001B) 06 | _BOUNDS byte OVER/2,PLUS/2,SWAP/2,EXIT/2
    0531(001B) 0C |
    0532(001B) 08 |
    0533(001B) 00 |
    0534(001C)    | PRTHEX ' ( n -- ) print n (0..$0F) as a hex character
    0534(001C) 2D |         byte CLIT/2,$30,PLUS/2
    0535(001C) 30 |
    0536(001C) 0C |
    0537(001C) 05 |         byte DUP/2,CLIT/2,$39,GT/2,_IF/2,3
    0538(001D) 2D |
    0539(001D) 39 |
    053A(001D) 20 |
    053B(001D) 3E |
    053C(001E) 03 |
    053D(001E) 2D |         byte CLIT/2,12,PLUS/2 'Adjust for A..F
    053E(001E) 0C |
    053F(001E) 0C |
    0540(001F) 49 | PRTCH   byte EMIT/2,EXIT/2
    0541(001F) 00 |
    0542(001F)    | PRTBYTE
    0542(001F) 05 |         byte DUP/2,CLIT/2,4,_SHR/2
    0543(001F) 2D |
    0544(0020) 04 |
    0545(0020) 1A |
    0546(0020) 3B |         byte RCALL/2,20 '-->PRTHEX 'Due to limitations of Spin tool & BST this needs to be calculated by hand
    0547(0020) 14 |
    0548(0021) 3B |         byte RCALL/2,22
    0549(0021) 16 |
    054A(0021) 00 |         byte EXIT/2
EDIT: Fixed byte code references which are encoded as 8-bits using cogaddress/2
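The hand-calculated RCALL offsets in the PRTBYTE listing can be checked with a few lines of Python. This is only a sketch: it assumes the offset is subtracted from the hub address that follows the operand byte (which is what the listing's numbers are consistent with), and `rcall_target` is a name made up for the illustration.

```python
def rcall_target(operand_addr, offset):
    """Hub address reached by a backward relative call.

    operand_addr -- hub address of the offset byte that follows the RCALL byte code
    offset       -- unsigned backward distance stored in that byte
    Assumes the IP has already advanced past the operand when the offset is applied.
    """
    ip_after_operand = operand_addr + 1
    return ip_after_operand - offset


if __name__ == "__main__":
    # The two calls in PRTBYTE: operands at $0547 (20) and $0549 (22)
    for addr, off in ((0x0547, 20), (0x0549, 22)):
        print(f"RCALL at ${addr - 1:04X}, offset {off} -> ${rcall_target(addr, off):04X}")
```

Both calls come out at $0534, the start of PRTHEX, which matches the `-->PRTHEX` comment in the listing.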
Comments
Sounds cool. Did you read localroger's Windmill blogs? It's a Forth-like bytecode interpreter designed to run out of SPI EEPROM. I'd recommend checking out http://forums.parallax.com/entry.php?39-Windmill-Byte-Code-Interpreter where he talks about mapping bytecodes to PASM instructions.
Tachyon is a perfect name for it since Forth is often considered a hypothetical language that may or may not exist.
I look forward to taking this out for a cruise!
I am curious as to what the advantage is of developing another Forth when PropForth looks very complete?
That is besides the need to learn by doing, which I understand completely.
cheers,
rich
A or greater equal depending on when you do the jump
We all need more coffee and we don't need an excuse. The code was all mish-mashed by some major upheavals when I posted this about 3:30AM in the morning (or at night from my point of view) and I spotted it this morning and changed it to if_z instead but still wrong. Oh look! Time for a coffee! The original just tested the msb, hence the "test" instead of "cmp". Anyway I thought I would leave the post without corrections as it serves its purpose and I will find out if anyone is analyzing it (which you did 30 minutes later).
I had changed the point at which byte tokens are either used as a direct 9-bit address into the cog's PASM code or as an index into a jump table to byte code. Somehow it looked right when I did it. Sometimes I aim for "close enough is good enough" as I know I will come back and if it's still there in that I haven't changed it all again then I will make it right (or just crumple it up and toss it into the trash).
I look forward to (attempting) assimilating all your best ideas! Especially those that yield more run time speed without assembler. You got some neat stuff going on. Can you borrow from localroger's work?
When your design starts to get stable, consider using the test automation so the workstation runs a regression test suite after every development change is "done". Sal says there is no reason the automation would not work with any language, I would like to find evidence one way or the other. I'd like to help set it up.
@richaj45: The biggest advantage is getting the perspective of a different approach. Sal's way is deemed the best by Sal to do what Sal wants to do, and does not necessarily lend itself to what Peter wants to do or the way Peter wants to do it. "Right tool for the right job". It's almost guaranteed that Peter will find a better way to do what Peter wants to do, since he does not have the same constraints. Often, we will find a new perspective in the way one does something that improves the other. This effect might not be limited to Forth kernel development. I heard of some guy Darwin that made some notes about exploiting niches, but he has not posted anything in the OBEX.
As Prof says, another version is great, and ideas can be shared, making each version better than before. Good ideas will find their way into other code too.
Prof & Sal: I am sure the test automation suite will be great once things settle. Hopefully it should even find its way into normal code too.
One of my aims is to also have enough resources left over that I can hook-up a monitor, keyboard and SD card and run the whole system stand-alone if I have to. But mainly I find that I interface to a great variety of chips but I need more speed and I would like to stay within the normal Forth environment when doing that. Forth was after all designed to get at the bare metal in a transparent and interactive manner having first been employed on radio telescopes in the 70's. At the very most if necessary having to patch in a Spin file, recompile the kernel and have it up and running just as quick. Well, at least, that's my aim. Having efficient byte code means I can pack a large application program in and still have memory left over for video etc.
I hadn't looked at localroger's Windmill before but that's the basic idea I had before for running larger programs in that I would use those small 4M byte serial Flash chips I have on some boards or else SD but it's still a lot more cumbersome than running from hub RAM. Running interpreted code from serial memory is an old idea, I remember the TSS400 for one. It's interesting to see that he is thinking of a scheme to encode PASM. But what has happened to Windmill since? Has he charged at it a bit too quixotically?
I'm looking forward to your experimenting along these lines, your perspective and results will be interesting. We do have a way to "lock" the forth prompt so only the user level application words are available, and to eliminate the development extensions from the final application to save space, but we haven't worked on trimming down the kernel further, there is a lot of unexplored custom kernel development.
In the meantime, maybe look at the JupiterACE code from v3.6; this runs on the Prop Demo Board and is a stand-alone Forth with VGA and keyboard, 80 column text in hires, 40 column text in low res. This might be towards what you are looking for, it will be brought up to the 5.0 kernel when the test automation is complete. You might be able to build on this, but Sal's version is still a couple of weeks out. The JupiterACE was actually my goal for joining the project. V3.6 was the teaser, v5.3 may be the final result. Running VGA takes most of the resources of a prop chip, which led us to the idea of just adding more props, which led us to MCS and Go-channels. So the PropForth development has been getting "bigger", if you find ways to get it "smaller" again, that will be really helpful.
Sal plans to simplify the process for optimizing in assembler, he will add new words that start and end the assembly process, and the assembler code can be compiled right into the dictionary. This could help in creating Tachyon, but it may not be ready until 5.4.
We looked at linking in arbitrary SPIN files, but that seemed to require the SPIN be written to support a "standard" interface, and we couldn't find anything to use as the standard; (every spin program seems to be too different) so we stopped that investigation. Maybe you can provide some insight or example of a specific spin file you want linked in, we can use that as a starting point for a "standard".
Sal's model supports adding more hub and cog memory in the form of more units of prop + SD, rather than adding more dedicated memory parts etc. For Sal's purposes, it's easier and cheaper to just grab a couple more props out of the bin. In the case of the JupiterACE, it allows the full resource of a prop to run the VGA, and permits a full prop or more to be available to an application. But we have kept in the 32K memory mindset, it will be interesting to see what we gain with large external memory configurations.
I have a pile of HIVE boards (hive-project.de) that accept 1 meg x 8 bit SRAM. I was thinking of circuit bending these toward a propforth rig, but that is way down the road. In fact, you might want to check out the "m" language Ingo is working on over there. It appears to be a version of forth for the Hive hardware, and works with the hive OS running on the other chips (which might get you the "link in spin programs" function you seek). I don't know the details, but a bunch of it seems similar to your goals. Google translate does a fair job with the German, and Ingo and the Borg drones can do English.
I have a little tidying up to do (if I am not seconded to the garden project in the meantime) and I will build in the dictionary and high-level words that form the text interpreter (vs the byte code interpreter) so that I can work with this interactively in a terminal. At this stage I may release the source for the alpha which should be in the next week or so.
Another change I made was to the serial I/O and include the serial transmit code into the Forth cog and leave the serial receive to a dedicated cog. The transmit code is very small and doesn't have to worry about multi-tasking with any receive code etc. This way the receive timing can be very precise and run at 1,382,400 baud for the maximum speed of my Bluetooth modules. Also since the transmit speed is very high there is no need to buffer or waste time writing to hub RAM as each character completes transmission in 7.23us.
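The 7.23us figure follows directly from the baud rate. A quick check in Python, assuming the usual 10 bits per character on the wire (1 start, 8 data, 1 stop), which the post does not state explicitly:

```python
def char_time_us(baud, bits_per_char=10):
    """Time to shift out one character, in microseconds.

    bits_per_char=10 assumes 8N1 framing (start + 8 data + stop).
    """
    return bits_per_char / baud * 1_000_000


if __name__ == "__main__":
    print(f"{char_time_us(1_382_400):.2f} us per character at 1,382,400 baud")
```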
The SPI primitives now include a flexible transmit routine which clocks data out at 2.85MHz (without doing anything fancy that is) so this handles a lot of SPI and I2C style protocols very efficiently. To slow it down just requires accessing the CLKBIT primitive at your leisure.
I have also found that it is far more efficient to store my inline literals and constants in big endian form to facilitate shifting and accumulating. So there is only one routine that reads in bytes to form these numbers and depending upon the entry point is what decides how many bytes are read. Constants are also coded as a standard definition with an exit (return from call) as there is not much advantage in having a special operation just for this. So the structure of say a 24-bit constant is [PUSH3] [$A5] [$00] [$C1] [EXIT] where PUSH3 is the byte-code for reading in 3 bytes and pushing the result onto the data stack after which the definition EXITs. Too easy.
This is the simple PASM code that effects pushing 1 to 4 inline bytes onto the datastack.
And this is how a separate constant is coded (similar to inline literals without the EXIT): Because all values are non-aligned there is no wasted space aligning them to word or long boundaries. Also relative addressing gets around any offsets and allows for relocating code easily.
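The shift-and-accumulate idea behind the big-endian literals can be modelled in a few lines of Python (an illustration only, not the PASM routine). `read_literal` is a made-up name; the byte values are the ones from the 24-bit constant example [PUSH3] [$A5] [$00] [$C1].

```python
def read_literal(code, ip, count):
    """Read `count` inline bytes, most significant first, into one value.

    code  -- byte code memory (any bytes-like object)
    ip    -- index of the first literal byte
    count -- number of bytes to read (1..4 in the scheme described)
    Returns (value, new_ip).
    """
    acc = 0
    for _ in range(count):
        acc = (acc << 8) | code[ip]   # shift what we have, then add the next byte
        ip += 1
    return acc, ip


if __name__ == "__main__":
    literal_bytes = bytes([0xA5, 0x00, 0xC1])     # the bytes following PUSH3
    value, ip = read_literal(literal_bytes, 0, 3)
    print(f"value = ${value:06X}, ip advanced to {ip}")
```

Because each step is the same shift-then-or, one routine with several entry points can serve 1, 2, 3 or 4 byte literals, which is the saving the big-endian layout buys.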
So my test routine which prints a start-up message, sends out 32-bits via SPI, and does a hex dump of hub RAM looks like this in byte-code listing form (courtesy BST):
Please note too that Forth is only coded this way in the Spin compiler to form the kernel after which the Forth itself would handle normal text input for compiling which would look like this:
So I'll try not to bore you any further with any details suffice to say that there are a lot of very neat things going on and planned. With the dictionary (names of functions and pointers etc) in external memory such as EEPROM and SD there will be a lot of program code that will be able to be squeezed into just a few k of RAM, count the bytes that are used in the demo! Some code will be available very soon now. (Hey Cluso, I hear it's going to rain all week )
I'm doing a rain dance on this side of the world because we need the rain.....I'll add some "remote location" rain dancing for purely selfish reasons!
Do you think your interpreter could be applied to Spin to improve the speed? One approach would be to compile Spin to your bytecodes. Another approach may be to use your instruction-decoding technique to decode the existing Spin bytecodes.
Also, maybe it would be possible to compile C to your bytecodes. Can you describe your VM in more detail? It appears to be stack-based, but also includes an accumulator. Are there any other registers in the VM?
Dave
I just had a look at the Spin bytecodes and it ain't fun, there's no way you could code all that and still cram it into just the cog. Tachyon Forth bytecode operations are fairly simple, just like PASM, but they are very flexible and implement a simple virtual stack based processor. The reference to an accumulator is really nominal as this is just a location to shift and accumulate literal values. The so named "accumulator" is cleared for next use after every push onto the datastack. There are other temporary registers also named for convenience such as R,X,Y,Z,R0..R3 as well as the IP which is equivalent to the PC in a real processor. The "X" register is used a lot for passing a value without upsetting the tos (top of stack) value as you can see in the GETBYTE and ACCBYTE routines. Creating virtual registers is not a problem though.
Stack manipulation can be very easy and transparent on some processors but the Prop isn't one of them, there are no auto increment/decrement indexed instructions. I push and pop my stacks by physically moving values which sounds kind of brute force'ish but I worked out that this is still far more efficient as the PASM routines can access all stack items (not just tos) directly without any extra overhead so rotating and swapping etc is very fast and compact. The push and pop operations only take a tiny bit longer than a conventional stack implemented with a pointer anyway. Upon detecting non-zero values "falling" off the bottom of the stack I jump to an error processing routine but who cares if zero values "fall off" as I also pump zero values back into the bottom of the stack when it's popped.
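A small Python model of that kind of stack, where every push and pop physically moves the items so that each one always sits at a fixed position. The class name, the depth of 8 and the use of an exception for the error routine are assumptions for the sketch.

```python
class ShiftStack:
    """Fixed-depth stack where cells[0] is always tos and cells[1] is always second."""

    def __init__(self, depth=8):          # depth is a made-up value
        self.cells = [0] * depth

    def push(self, value):
        # Everything moves down one place; whatever was at the bottom falls off.
        if self.cells[-1] != 0:
            raise OverflowError("non-zero value fell off the bottom of the stack")
        self.cells = [value] + self.cells[:-1]

    def pop(self):
        # Everything moves up one place and a zero is pumped in at the bottom.
        top = self.cells[0]
        self.cells = self.cells[1:] + [0]
        return top

    def swap(self):
        # Direct addressing: no pointer arithmetic needed to reach the second item.
        self.cells[0], self.cells[1] = self.cells[1], self.cells[0]


if __name__ == "__main__":
    s = ShiftStack(depth=4)
    s.push(1)
    s.push(2)
    s.swap()
    print(s.pop(), s.pop(), s.pop())      # the third pop underflows harmlessly
```

Popping an empty stack just returns the zeros that were pumped in, while pushing past the depth only raises an error when a real (non-zero) value would be lost.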
As for compiling from C to these bytecodes there shouldn't be any problem at all as it would only be the PASM code that fits in the VM cog that would be required to run them. It's a bit like compiling from C to Java bytecodes and using the JVM but a lot simpler of course and without all the overhead that would normally be involved. At present the Tachyon bytecodes are not fixed in value as the VM is in a state of flux but even so I don't think that there would be any requirement for portable bytecode normally. The symbol address of the bytecode function is the same as the bytecode which is why I can just reference them directly with the "byte" directive in a DAT section.
Hope that sort of answers your questions and when I release some source soon you will be able to have a good look yourself to see if Tachyon bytecode is suitable for your task.
I like your stack philosophy. There's no reason to have a huge stack, except for deep recursion. One annoyance of Forth, of course, is stack maintenance. It would be handy to add two more kernel instructions, pushmark and poptomark. These allow one to punctuate the stack in such a way that post-operative cleanup requires only a poptomark without having to know how much garbage remains. The "mark" doesn't really have to exist in the stack itself, but can be tracked either via a rotating bitmask that's synchronous with the stack, or a separate mark stack whose top element keeps track of the relative position of the next mark (i.e. increments on a push, decrements on a pop, and gets popped off when it goes negative).
-Phil
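The separate-mark-stack variant of pushmark and poptomark could look roughly like this in Python. It is a sketch of the idea only: the class and method names are invented, and a mark is discarded as soon as a pop reaches below it, which is a slight variation on the "goes negative" wording.

```python
class MarkedStack:
    """Data stack plus a mark stack counting items pushed since each mark."""

    def __init__(self):
        self.data = []
        self.marks = []                   # marks[-1] = items above the newest mark

    def push(self, value):
        self.data.append(value)
        if self.marks:
            self.marks[-1] += 1

    def pop(self):
        value = self.data.pop()
        # A mark with nothing above it has just been popped past: discard it.
        while self.marks and self.marks[-1] == 0:
            self.marks.pop()
        if self.marks:
            self.marks[-1] -= 1
        return value

    def pushmark(self):
        self.marks.append(0)

    def poptomark(self):
        # Drop everything pushed since the newest mark, however much that is.
        count = self.marks.pop()
        if count:
            del self.data[-count:]


if __name__ == "__main__":
    s = MarkedStack()
    s.push(10)
    s.pushmark()
    for junk in (1, 2, 3):
        s.push(junk)
    s.poptomark()                         # abort path: wipe out the garbage
    print(s.data)
```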
Some changes:
1) Added a jump table for kernel bytecodes as I originally planned. This simplifies calling them from the Spin compiler and allows these functions to be anywhere.
The table allows for up to 256 vectors so that a call looks like:..... BCALL,xPARSE ..... where PARSE is referred to by the byte label xPARSE. Now I don't have to do a relative call with all that awkward setup for the Spin tool as in:...... RCALL,@L1-@PARSE .....where I also have to create a label such as L1 that follows every time. The DATA section has its advantages as I just set the ORG to 0 before the table so that each entry has consecutive values from 0 up to the maximum reference of 255.
2) Added a local register bank which is great for storing temporary values and settings etc. This makes the kernel bytecode a lot easier too as it doesn't have to create special variables for number bases and interpreter flags etc. The registers used by the kernel are referred to in the CON section where they are created with a simple:
While the reference in the Spin tool to the register is in the form: and compiled:
3) Enhanced number processing so that sensible prefixes and suffixes can be used:
4) 5) 6) lots more stuff
I will probably allow direct addressing of stack items so that .... 3 STACK ... will return the address of the third stack item which can be manipulated just like any other variable.
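As a rough picture of what that could mean, here is a small Python model where the stack lives in ordinary memory and an item can be reached by address. Everything here is hypothetical: the memory layout, the method names, and the assumption that item 1 is the top of the stack.

```python
class Machine:
    """Toy memory model with the data stack stored in the same memory."""

    STACK_BASE = 8   # hypothetical start address of the stack area

    def __init__(self, size=32):
        self.memory = [0] * size
        self.sp = self.STACK_BASE        # address of the next free cell

    def push(self, value):
        self.memory[self.sp] = value
        self.sp += 1

    def pop(self):
        if self.sp == self.STACK_BASE:
            raise IndexError("stack underflow")
        self.sp -= 1
        return self.memory[self.sp]

    def stack_addr(self, n):
        """Address of the nth item, counting 1 as the top (assumption)."""
        depth = self.sp - self.STACK_BASE
        if not 1 <= n <= depth:
            raise IndexError("no such stack item")
        return self.sp - n

    def fetch(self, addr):
        return self.memory[addr]

    def store(self, addr, value):
        self.memory[addr] = value


m = Machine()
for v in (11, 22, 33, 44):
    m.push(v)
third = m.stack_addr(3)      # address of the cell holding 22
m.store(third, 99)           # change it in place like any other variable
```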
@Phil:
I have tinkered before with shadow stacks that hold information about the stack items, a bit like a type identifier. But although I understand what you are saying and ways to implement it I'm not sure how useful this would be, at least to a Forth programmer. You see, although the stack can be a nuisance especially when it is misused and layers deep, part of coding any Forth is the art of factoring things into small, clear, and manageable chunks of code. My Tachyon Forth is being driven by need, by actual embedded control use so I tweak it to optimize what I really need. But could you please give me an example of where your suggestions would really excel? Thanks.
My suggestion arises from comparative experience programming in both Forth and Postscript. The latter has mark and cleartomark words that make stack maintenance a breeze compared to Forth. One simple example of their utility can be seen when a subroutine needs to abort, returning the stack to a known state upon exit. Depending upon how much stuff got pushed on before the fault condition was encountered, this could be a real chore without a way simply to wipe out the garbage down to a known point in the stack.
-Phil
Although Postscript may be based on Forth it is also very non-Forth'ish because of its huge complexity. I understand the Postscript requirements but how useful would that really be in the simpler embedded Forth model with lots of "small" words? But I will see if this can be implemented without too much fuss, especially since I am looking at being able to create local variables straight from the stack description which is normally only a comment such as: ( pin channel -- flg ) ..... and referring directly to pin and channel etc. However EXIT or its cousin will then have to clean up the stack and place whatever results there are onto the stack.
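A rough Python sketch of how that could hang together: parse the stack comment, bind the inputs by name, and on exit remove the inputs and push the results. The function names are hypothetical, and this only models the behaviour, not how the kernel would do it.

```python
def parse_stack_comment(comment):
    """Split '( pin channel -- flg )' into input names and output names."""
    text = comment.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("not a stack comment")
    before, sep, after = text[1:-1].partition("--")
    if not sep:
        raise ValueError("stack comment needs '--'")
    return before.split(), after.split()


def call_with_locals(stack, comment, body):
    """Run body with named inputs, then clean up as EXIT would have to."""
    inputs, outputs = parse_stack_comment(comment)
    if len(stack) < len(inputs):
        raise IndexError("stack underflow")
    split = len(stack) - len(inputs)
    values = stack[split:]          # rightmost name is the top of the stack
    del stack[split:]               # the inputs are consumed
    frame = dict(zip(inputs, values))
    results = body(frame)
    if len(results) != len(outputs):
        raise ValueError("result count does not match the stack comment")
    stack.extend(results)           # place the results back on the stack


stack = [7, 5, 2]
call_with_locals(
    stack,
    "( pin channel -- flg )",
    lambda f: [f["pin"] > f["channel"]],   # pin is 5, channel is 2
)
```

After the call the stack holds the untouched 7 followed by the single flag, so the caller never sees the temporaries.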
The Google Docs format makes it easy to view and even download in various formats. Of course if you want to try out the code then just select all and copy&paste into your Spin IDE. The settings are so that anyone with the link can view and comment and if anyone wants to be able to edit this document as part of a collaborative effort then please just email me. There's also the chat box you can use when you have the document opened.
Now I need to start thinking backwards. This is nice. Even I might understand FORTH now - will study your PASM ...
Thank you
Mike
Another great feature is that this "source" is live, you may even see things change before your eyes as I am formatting it or making code changes !!! Maybe this way should be the way we format source for the gold standard?
Fancy formatted source code in Google Docs
The other advantage is that this is a live document, what you see is what I see and what I am changing.
I have also tested the serial transmit and receive to at least 2M baud at present and I will do some further tests at higher speeds later on. I'm also overlaying the serial receive cog's image with the receive buffer so there is no wasted memory.
So far...so good.
So how is your coding doing 2M baud?
Why is it so much faster than the standard Full Duplex 4 port object?
Is the serial driver full duplex?
When do you think the whole of the Tachyon code will be posted?
Thanks in advance for your patience with all these questions.
cheers,
rich