0534(001C) | PRTHEX ' ( n -- ) print n (0..$0F) as a hex character
0534(001C) 2D | byte CLIT/2,$30,PLUS/2
0535(001C) 30 |
0536(001C) 0C |
0537(001C) 05 | byte DUP/2,CLIT/2,$39,GT/2,_IF/2,3
0538(001D) 2D |
0539(001D) 39 |
053A(001D) 20 |
053B(001D) 3E |
053C(001E) 03 |
053D(001E) 2D | byte CLIT/2,12,PLUS/2 'Adjust for A..F
053E(001E) 0C |
053F(001E) 0C |
0540(001F) 49 | PRTCH byte EMIT/2,EXIT/2
0541(001F) 00 |
EDIT: Byte codes must be shifted one bit right to compress 9 bits into 8; the lsb is always zero as all byte code functions are on double-long boundaries.
88 char+=$30
Addr : 05B0: 38 30 : Constant 1 Bytes - 30 - $00000030 48
Addr : 05B2: 66 4C : Variable Operation Local Offset - 1 Assign WordMathop +
89 if char > $39
Addr : 05B4: 64 : Variable Operation Local Offset - 1 Read
Addr : 05B5: 38 39 : Constant 1 Bytes - 39 - $00000039 57
Addr : 05B7: FA : Math Op >
Addr : 05B8: JZ Label0002
Addr : 05B8: 0A 04 : jz Address = 05BE 4
90 char+=12
Addr : 05BA: 38 0C : Constant 1 Bytes - 0C - $0000000C 12
Addr : 05BC: 66 4C : Variable Operation Local Offset - 1 Assign WordMathop +
Addr : 05BE: Label0002
Addr : 05BE: Label0003
91 coms.tx(char)
Addr : 05BE: 01 : Drop Anchor
Addr : 05BF: 64 : Variable Operation Local Offset - 1 Read
Addr : 05C0: 06 03 0B : Call Obj.Sub 3 11
Addr : 05C3: 32 : Return
' Fetch the next byte code instruction pointed to by the instruction pointer IP in hub RAM
'
doNEXT rdbyte token,IP 'read byte code instruction
add IP,#1 'advance IP to next byte token
shl token,#1 'expand to 9-bits - all byte codes point to code on double-long boundary
cmp token,#$180 wc 'tokens $C0..$FF are calls to kernel byte code via kbctbl
if_c jmp token 'directly execute PASM byte codes without further ado
' byte codes $C0..$FF point to further byte code definitions
' which are larger fragments of byte code in hub RAM
call #SAVEIP 'save current IP in prep for a call
add X,kbcptr 'kbcptr points to the kernel byte code table (less $180)
rdword IP,X 'read 16-bit address from hub kbc table into IP
jmp #doNEXT 'Execute the code
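As a rough model of the dispatch above (a Python sketch, not the author's code), the snippet below classifies a byte token the way the listing appears to: shift left to restore the 9-bit address, jump directly if it is below $180, otherwise index the kernel byte code table. The function name, the returned tuples, and the table base of $4000 are assumptions made for illustration only.

```python
# Hypothetical model of the doNEXT dispatch; names and the table base are assumptions.
DIRECT_LIMIT = 0x180  # shifted tokens below this are PASM entry points in the cog


def dispatch(token, table_base):
    """Classify one byte token.

    Returns ("pasm", cog_address) for a direct jump, or
    ("bytecode", hub_address_of_vector) for a kernel byte code call.
    """
    address = token << 1  # restore the 9-bit address (lsb is always 0)
    if address < DIRECT_LIMIT:
        return ("pasm", address)
    # kbcptr is described as the table address "less $180", so adding the
    # shifted token lands on a 16-bit vector inside the table.
    kbcptr = table_base - DIRECT_LIMIT
    return ("bytecode", kbcptr + address)


# DUP/2 is $05 in the listing, so DUP would sit at cog address $0A.
print(dispatch(0x05, table_base=0x4000))
# $C0 is the first token that goes through the table.
print(dispatch(0xC0, table_base=0x4000))
```

With this reading, token $C0 selects the first 16-bit vector in the table and $FF the last of 64.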
' Example of PASM code entries for Byte Code indexing on double-long boundaries
'
DROP2 call #POPX
jmp #DROP
DUP mov X,tos ' Read directly from the top of the data stack
jmp #PUSHX ' Push X onto the data stack and doNEXT '
OVER mov X,tos+1 'read second data item and push
jmp #PUSHX
NIP mov tos+1,tos 'replace second item with top and drop
jmp #DROP
LIT0 mov X,#0
jmp #PUSHX
LIT1 mov X,#1
jmp #PUSHX
'****************** BOOLEAN ******************
_AND movi _POPEX,#1000_001 ' AND ( n1 n2 -- n3 )
jmp #POPEX 'discard top of stack and execute modified PASM
_OR movi _POPEX,#1010_001
jmp #POPEX
_XOR movi _POPEX,#1011_001
jmp #POPEX
'***************** MEMORY *******************
CFETCH rdbyte tos,tos ' read byte pointed to by tos into tos
jmp #doNEXT
CPLUSST rdbyte X,tos ' read in byte from address
add tos+1,X ' add second item to contents of address
CSTORE wrbyte tos+1,tos ' write the second item using address on the tos
jmp #DROP2
' Example of interpreted byte codes in hub RAM
' References to other byte code definitions are relative, which is also necessary because of the Spin compiler's limitations with DAT sections
'
0530(001B) 06 | _BOUNDS byte OVER/2,PLUS/2,SWAP/2,EXIT/2
0531(001B) 0C |
0532(001B) 08 |
0533(001B) 00 |
0534(001C) | PRTHEX ' ( n -- ) print n (0..$0F) as a hex character
0534(001C) 2D | byte CLIT/2,$30,PLUS/2
0535(001C) 30 |
0536(001C) 0C |
0537(001C) 05 | byte DUP/2,CLIT/2,$39,GT/2,_IF/2,3
0538(001D) 2D |
0539(001D) 39 |
053A(001D) 20 |
053B(001D) 3E |
053C(001E) 03 |
053D(001E) 2D | byte CLIT/2,12,PLUS/2 'Adjust for A..F
053E(001E) 0C |
053F(001E) 0C |
0540(001F) 49 | PRTCH byte EMIT/2,EXIT/2
0541(001F) 00 |
0542(001F) | PRTBYTE
0542(001F) 05 | byte DUP/2,CLIT/2,4,_SHR/2
0543(001F) 2D |
0544(0020) 04 |
0545(0020) 1A |
0546(0020) 3B | byte RCALL/2,20 '-->PRTHEX 'Due to limitations of Spin tool & BST this needs to be calculated by hand
0547(0020) 14 |
0548(0021) 3B | byte RCALL/2,22
0549(0021) 16 |
054A(0021) 00 | byte EXIT/2
EDIT: Fixed byte code references which are encoded as 8-bits using cogaddress/2
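The hand-calculated offsets in the listing can be checked with a little arithmetic. This Python sketch assumes (inferred from the addresses, not stated by the author) that RCALL subtracts its operand from the IP after the operand byte has been read, and that _IF skips forward by its operand from the same point. The helper names are made up for the example.

```python
def after_operand(opcode_addr):
    """Hub address held by IP once the opcode and its one-byte operand are read."""
    return opcode_addr + 2


def rcall_target(opcode_addr, offset):
    """Assumed convention: RCALL calls backwards by `offset` bytes."""
    return after_operand(opcode_addr) - offset


def if_skip_target(opcode_addr, offset):
    """Assumed convention: _IF skips forwards by `offset` bytes when false."""
    return after_operand(opcode_addr) + offset


PRTHEX = 0x0534
PRTCH = 0x0540

# Both RCALLs in PRTBYTE (at $0546 and $0548) should land on PRTHEX.
print(hex(rcall_target(0x0546, 20)), hex(rcall_target(0x0548, 22)))
# The _IF at $053B with operand 3 should skip the A..F adjustment and land on PRTCH.
print(hex(if_skip_target(0x053B, 3)))
```

Under those assumptions all three targets match the labels in the listing.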
Comments
Sounds cool. Did you read localroger's Windmill blogs? It's a Forth-like bytecode interpreter designed to run out of SPI EEPROM. I'd recommend checking out http://forums.parallax.com/entry.php?39-Windmill-Byte-Code-Interpreter where he talks about mapping bytecodes to PASM instructions.
Tachyon is a perfect name for it since Forth is often considered a hypothetical language that may or may not exist.
I look forward to taking this out for a cruise!
I am curious as to what the advantage is of developing another Forth when PropForth looks very complete?
That is besides the need to learn by doing, which I understand completely.
cheers,
rich
A or greater equal depending on when you do the jump
We all need more coffee and we don't need an excuse. The code was all mish-mashed by some major upheavals when I posted this at about 3:30AM (in the morning, or at night from my point of view). I spotted it this morning and changed it to if_z instead, but that is still wrong.
I had changed the point at which byte tokens are either used as a direct 9-bit address into the cog's PASM code or as an index into a jump table to byte code. Somehow it looked right when I did it. Sometimes I aim for "close enough is good enough" as I know I will come back and if it's still there in that I haven't changed it all again then I will make it right (or just crumple it up and toss it into the trash).
I look forward to (attempting) assimilating all your best ideas! Especially those that yield more run time speed without assembler. You got some neat stuff going on. Can you borrow from localroger's work?
When your design starts to get stable, consider using the test automation so the workstation runs a regression test suite after every development change is "done". Sal says there is no reason the automation would not work with any language, I would like to find evidence one way or the other. I'd like to help set it up.
@richaj45: The biggest advantage is getting the perspective of a different approach. Sal's way is deemed the best by Sal to do what Sal wants to do, and does not necessarily lend itself to what Peter wants to do or the way Peter wants to do it. "Right tool for the right job". It's almost guaranteed that Peter will find a better way to do what Peter wants to do, since he does not have the same constraints. Often, we will find a new perspective in the way one does something that improves the other. This effect might not be limited to forth kernel development.
As Prof says, another version is great, and ideas can be shared, making each version better than before. Good ideas will find their way into other code too.
Prof & Sal: I am sure the test automation suite will be great once things settle. Hopefully it should even find its way into normal code too.
One of my aims is to also have enough resources left over that I can hook-up a monitor, keyboard and SD card and run the whole system stand-alone if I have to. But mainly I find that I interface to a great variety of chips but I need more speed and I would like to stay within the normal Forth environment when doing that. Forth was after all designed to get at the bare metal in a transparent and interactive manner having first been employed on radio telescopes in the 70's. At the very most if necessary having to patch in a Spin file, recompile the kernel and have it up and running just as quick. Well, at least, that's my aim. Having efficient byte code means I can pack a large application program in and still have memory left over for video etc.
I hadn't looked at localroger's Windmill before but that's the basic idea I had before for running larger programs in that I would use those small 4M byte serial Flash chips I have on some boards or else SD but it's still a lot more cumbersome than running from hub RAM. Running interpreted code from serial memory is an old idea, I remember the TSS400 for one. It's interesting to see that he is thinking of a scheme to encode PASM. But what has happened to Windmill since? Has he charged at it a bit too quixotically?
I'm looking forward to your experimenting along these lines, your perspective and results will be interesting. We do have a way to "lock" the forth prompt so only the user level application words are available, and to eliminate the development extensions from the final application to save space, but we haven't worked on trimming down the kernel further, there is a lot of unexplored custom kernel development.
In the meantime, maybe look at the JupiterACE code from v3.6; this runs on the Prop Demo Board and is a stand alone forth with VGA and keyboard, 80 column text in hires, 40 column text in low res. This might be towards what you are looking for, it will be brought up to 5.0 kernel when the test automation is complete. You might be able to build on this, but Sal's version is still a couple weeks out. The JupiterACE was actually my goal for joining the project. V3.6 was the teaser, v5.3 may be the final result. Running VGA takes most of the resources of a prop chip, which led us to the idea of just adding more props, which led us to MCS and Go-channels. So the propforth development has been getting "bigger", if you find ways to get it "smaller" again, that will be really helpful.
Sal plans to simplify the process for optimizing in assembler, he will add new words that start and end the assembly process, and the assembler code can be compiled right into the dictionary. This could help in creating tachyon, but it may not be ready until 5.4.
We looked at linking in arbitrary SPIN files, but that seemed to require the SPIN be written to support a "standard" interface, and we couldn't find anything to use as the standard; (every spin program seems to be too different) so we stopped that investigation. Maybe you can provide some insight or example of a specific spin file you want linked in, we can use that as a starting point for a "standard".
Sal's model supports adding more hub and cog memory in the form of more units of prop + SD, rather than adding more dedicated memory parts etc. For Sal's purposes, it's easier and cheaper to just grab a couple more props out of the bin. In the case of the JupiterACE, it allows the full resource of a prop to run the VGA, and permits a full prop or more to be available to an application. But we have kept in the 32K memory mindset; it will be interesting to see what we gain with large external memory configurations.
I have a pile of HIVE boards (hive-project.de) that accept 1 meg x 8 bit SRAM. I was thinking of circuit bending these toward a propforth rig, but that is way down the road. In fact, you might want to check out the "m" language Ingo is working on over there. It appears to be a version of forth for the Hive hardware, and works with the hive OS running on the other chips (which might get you the "link in spin programs" function you seek). I don't know the details, but a bunch of it seems similar to your goals. Google translate does a fair job with the German, and Ingo and the Borg drones can do English.
I have a little tidying up to do (if I am not seconded to the garden project in the meantime) and I will build in the dictionary and high-level words that form the text interpreter (vs the byte code interpreter) so that I can work with this interactively in a terminal. At this stage I may release the source for the alpha which should be in the next week or so.
Another change I made was to the serial I/O: the serial transmit code is now included in the Forth cog and the serial receive is left to a dedicated cog. The transmit code is very small and doesn't have to worry about multi-tasking with any receive code etc. This way the receive timing can be very precise and run at 1,382,400 baud for the maximum speed of my Bluetooth modules. Also since the transmit speed is very high there is no need to buffer or waste time writing to hub RAM as each character completes transmission in 7.23us.
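The 7.23us figure follows from the baud rate if each character is a 10-bit frame. That framing (one start bit, eight data bits, one stop bit) is an assumption here, as the post does not spell it out.

```python
BAUD = 1_382_400
BITS_PER_FRAME = 10  # assumed 8N1 framing: start + 8 data + stop

# Time on the wire for one character, in microseconds.
frame_time_us = BITS_PER_FRAME / BAUD * 1e6
print(f"{frame_time_us:.2f} us per character")
```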
The SPI primitives now include a flexible transmit routine which clocks data out at 2.85MHz (without doing anything fancy that is) so this handles a lot of SPI and I2C style protocols very efficiently. To slow it down just requires accessing the CLKBIT primitive at your leisure.
I have also found that it is far more efficient to store my inline literals and constants in big endian form to facilitate shifting and accumulating. So there is only one routine that reads in bytes to form these numbers and depending upon the entry point is what decides how many bytes are read. Constants are also coded as a standard definition with an exit (return from call) as there is not much advantage in having a special operation just for this. So the structure of say a 24-bit constant is [PUSH3] [$A5] [$00] [$C1] [EXIT] where PUSH3 is the byte-code for reading in 3 bytes and pushing the result onto the data stack after which the definition EXITs. Too easy.
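To illustrate why big endian storage suits shift-and-accumulate, here is a small Python model (not the PASM routine itself). One loop reads however many bytes the entry point asks for. The function name and the opcode value chosen for PUSH3 are hypothetical; EXIT is 0 only because EXIT/2 shows as 00 in the listing.

```python
def read_literal(code, ip, count):
    """Read `count` big endian bytes starting at code[ip].

    Returns (value, new_ip). Each byte is shifted into the accumulator,
    so the same loop serves 1, 2, 3 or 4 byte literals.
    """
    acc = 0
    for _ in range(count):
        acc = (acc << 8) | code[ip]
        ip += 1
    return acc, ip


PUSH3 = 0xF3  # hypothetical opcode value, for illustration only
EXIT = 0x00

# The 24-bit constant from the post: [PUSH3] [$A5] [$00] [$C1] [EXIT]
constant = bytes([PUSH3, 0xA5, 0x00, 0xC1, EXIT])
value, ip = read_literal(constant, 1, 3)
print(hex(value), constant[ip] == EXIT)
```

After the three bytes are read the IP is sitting on the EXIT, which is what lets a constant be an ordinary definition.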
This is the simple PASM code that effects pushing 1 to 4 inline bytes onto the datastack.
And this is how a separate constant is coded (similar to an inline literal, which does not have the EXIT):
Because all values are non-aligned there is no wasted space aligning them to word or long boundaries. Also relative addressing gets around any offsets and allows for relocating code easily.
So my test routine which prints a start-up message, sends out 32-bits via SPI, and does a hex dump of hub RAM looks like this in byte-code listing form (courtesy BST):
Please note too that Forth is only coded this way in the Spin compiler to form the kernel after which the Forth itself would handle normal text input for compiling which would look like this:
So I'll try not to bore you any further with details; suffice it to say that there are a lot of very neat things going on and planned. With the dictionary (names of functions and pointers etc) in external memory such as EEPROM and SD, a lot of program code will be able to be squeezed into just a few k of RAM. Count the bytes that are used in the demo! Some code will be available very soon now. (Hey Cluso, I hear it's going to rain all week.)
I'm doing a rain dance on this side of the world because we need the rain.....I'll add some "remote location" rain dancing for purely selfish reasons!
Do you think your interpreter could be applied to Spin to improve the speed? One approach would be to compile Spin to your bytecodes. Another approach may be to use your instruction-decoding technique to decode the existing Spin bytecodes.
Also, maybe it would be possible to compile C to your bytecodes. Can you describe your VM in more detail? It appears to be stack-based, but also includes an accumulator. Are there any other registers in the VM?
Dave
I just had a look at the Spin bytecodes and it ain't fun, there's no way you could code all that and still cram it into just the cog. Tachyon Forth bytecode operations are fairly simple, just like PASM, but they are very flexible and implement a simple virtual stack based processor. The reference to an accumulator is really nominal as this is just a location to shift and accumulate literal values. The so named "accumulator" is cleared for next use after every push onto the datastack. There are other temporary registers also named for convenience such as R,X,Y,Z,R0..R3 as well as the IP which is equivalent to the PC in a real processor. The "X" register is used a lot for passing a value without upsetting the tos (top of stack) value as you can see in the GETBYTE and ACCBYTE routines. Creating virtual registers is not a problem though.
Stack manipulation can be very easy and transparent on some processors but the Prop isn't one of them, there are no auto increment/decrement indexed instructions. I push and pop my stacks by physically moving values which sounds kind of brute force'ish but I worked out that this is still far more efficient as the PASM routines can access all stack items (not just tos) directly without any extra overhead so rotating and swapping etc is very fast and compact. The push and pop operations only take a tiny bit longer than a conventional stack implemented with a pointer anyway. Upon detecting non-zero values "falling" off the bottom of the stack I jump to an error processing routine but who cares if zero values "fall off" as I also pump zero values back into the bottom of the stack when it's popped.
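A small Python model of the "physically moving" stack described above may help. The depth, class name and method names are invented for the sketch, and the real thing is a handful of PASM registers rather than a list.

```python
class ShiftStack:
    """Fixed-depth stack where every push and pop moves all the cells."""

    def __init__(self, depth=8):
        self.cells = [0] * depth  # cells[0] is tos, cells[1] is tos+1, ...

    def push(self, value):
        # A non-zero value falling off the bottom is treated as an error;
        # zeros falling off are harmless.
        if self.cells[-1] != 0:
            raise OverflowError("non-zero value fell off the bottom of the stack")
        self.cells = [value] + self.cells[:-1]

    def pop(self):
        top = self.cells[0]
        self.cells = self.cells[1:] + [0]  # pump a zero into the bottom
        return top

    def nip(self):
        # Same shape as the NIP entry in the kernel: copy tos over tos+1, then drop.
        self.cells[1] = self.cells[0]
        self.pop()


stack = ShiftStack(depth=4)
for n in (1, 2, 3):
    stack.push(n)
stack.nip()
print(stack.cells)
```

Because every item lives at a fixed position, words like NIP and OVER can address tos+1 directly with no pointer arithmetic.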
As for compiling from C to these bytecodes there shouldn't be any problem at all as it would only be the PASM code that fits in the VM cog that would be required to run them. It's a bit like compiling from C to Java bytecodes and using the JVM but a lot simpler of course and without all the overhead that would normally be involved. At present the Tachyon bytecodes are not fixed in value as the VM is in a state of flux but even so I don't think that there would be any requirement for portable bytecode normally. The symbol address of the bytecode function is the same as the bytecode which is why I can just reference them directly with the "byte" directive in a DAT section.
Hope that sort of answers your questions and when I release some source soon you will be able to have a good look yourself to see if Tachyon bytecode is suitable for your task.
I like your stack philosophy. There's no reason to have a huge stack, except for deep recursion. One annoyance of Forth, of course, is stack maintenance. It would be handy to add two more kernel instructions, pushmark and poptomark. These allow one to punctuate the stack in such a way that post-operative cleanup requires only a poptomark without having to know how much garbage remains. The "mark" doesn't really have to exist in the stack itself, but can be tracked either via a rotating bitmask that's synchronous with the stack, or a separate mark stack whose top element keeps track of the relative position of the next mark (i.e. increments on a push, decrements on a pop, and gets popped off when it goes negative).
-Phil
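One possible reading of the separate mark stack idea, sketched in Python. The class and its details are assumptions, not anything from Tachyon or PropForth. Each mark holds a count of the items pushed since it was set, and a mark is discarded once a pop would take its count below zero.

```python
class MarkedStack:
    """Data stack plus a side stack of marks, as a minimal illustration."""

    def __init__(self):
        self.data = []
        self.marks = []  # each entry counts the items pushed since that mark

    def push(self, value):
        self.data.append(value)
        if self.marks:
            self.marks[-1] += 1

    def pop(self):
        value = self.data.pop()
        # Marks with a zero count sit above the item being removed, so they
        # no longer apply (the "goes negative" case) and are discarded.
        while self.marks and self.marks[-1] == 0:
            self.marks.pop()
        if self.marks:
            self.marks[-1] -= 1
        return value

    def pushmark(self):
        self.marks.append(0)

    def poptomark(self):
        if not self.marks:
            raise IndexError("no mark set")
        count = self.marks.pop()
        # Throw away everything pushed since the mark, however much that is.
        del self.data[len(self.data) - count:]


s = MarkedStack()
s.push(1)
s.pushmark()
s.push(2)
s.push(3)
s.poptomark()
print(s.data)
```

A word that has to abort would call poptomark once and leave the stack as it was when pushmark ran.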
Some changes:
1) Added a jump table for kernel bytecodes as I originally planned. This simplifies calling them from the Spin compiler and allows these functions to be anywhere.
The table allows for up to 256 vectors so that a call looks like: ..... BCALL,xPARSE ..... where PARSE is referred to by the byte label xPARSE. Now I don't have to do a relative call with all that awkward setup for the Spin tool as in: ...... RCALL,@L1-@PARSE ..... where I also have to create a label such as L1 that follows every time. The DATA section has its advantages as I just set the ORG to 0 before the table so that each entry has consecutive values from 0 up to the maximum reference of 255.
2) Added a local register bank which is great for storing temporary values and settings etc. This makes the kernel bytecode a lot easier too as it doesn't have to create special variables for number bases and interpreter flags etc. The registers used by the kernel are referred to in the CON section where they are created with a simple:
While the reference in the Spin tool to the register is in the form: and compiled:
3) Enhanced number processing so that sensible prefixes and suffixes can be used:
4) 5) 6) lots more stuff
I will probably allow direct addressing of stack items so that .... 3 STACK ... will return the address of the third stack item which can be manipulated just like any other variable.
@Phil:
I have tinkered before with shadow stacks that hold information about the stack items, a bit like a type identifier. But although I understand what you are saying and ways to implement it I'm not sure how useful this would be, at least to a Forth programmer. You see, although the stack can be a nuisance especially when it is misused and layers deep, part of coding any Forth is the art of factoring things into small, clear, and manageable chunks of code. My Tachyon Forth is being driven by need, by actual embedded control use so I tweak it to optimize what I really need. But could you please give me an example of where your suggestions would really excel? Thanks.
My suggestion arises from comparative experience programming in both Forth and Postscript. The latter has mark and cleartomark words that make stack maintenance a breeze compared to Forth. One simple example of their utility can be seen when a subroutine needs to abort, returning the stack to a known state upon exit. Depending upon how much stuff got pushed on before the fault condition was encountered, this could be a real chore without a way simply to wipe out the garbage down to a known point in the stack.
-Phil
Although Postscript may be based on Forth it is also very non-Forth'ish because of its huge complexity. I understand the Postscript requirements but how useful would that really be in the simpler embedded Forth model with lots of "small" words? But I will see if this can be implemented without too much fuss, especially since I am looking at being able to create local variables straight from the stack description which is normally only a comment such as: ( pin channel -- flg ) ..... and referring directly to pin and channel etc. However EXIT or its cousin will then have to clean up the stack and place whatever results there are onto the stack.
The Google Docs format makes it easy to view and even download in various formats. Of course if you want to try out the code then just select all and copy&paste into your Spin IDE. The settings are so that anyone with the link can view and comment and if anyone wants to be able to edit this document as part of a collaborative effort then please just email me. There's also the chat box you can use when you have the document opened.
Now I need to start thinking backwards. This is nice. Even I might understand FORTH now - Will study your pasm ...
Thank you
Mike
Another great feature is that this "source" is live, you may even see things change before your eyes as I am formatting it or making code changes !!! Maybe this way should be the way we format source for the gold standard?
Fancy formatted source code in Google Docs
The other advantage is that this is a live document, what you see is what I see and what I am changing.
I have also tested the serial transmit and receive to at least 2M baud at present and I will do some further tests at higher speeds later on. I'm also overlaying the serial receive cog's image as well with the receive buffer so there is no wasted memory.
So far...so good.
So how is your coding doing 2M baud?
Why is it so much faster than the standard Full Duplex 4 port object?
Is the serial driver full duplex?
When do you think the whole of the Tachyon code will be posted?
Thanks in advance for your patience with all these questions.
cheers,
rich