Tachyon NEON V5 (FAT32 and Ethernet Servers in 32kB EEPROM!)
Peter Jakacki
Tachyon NEON
V4 introduced wordcode in place of bytecode, which has proved to be faster and more compact overall, but the thing I always wanted to improve upon was the memory-hogging dictionary. While I have a COMPACT utility that transfers the existing dictionary over to EEPROM and uses a hash table to index candidates, it is still a hybrid dictionary, with newer entries saved in hub RAM. Hub RAM needs to be utilized to the max for code execution and buffers, so it would be nice to have a standardized way of accessing the dictionary that lets it live in any memory space. This is the conclusion I've come to, and it is what I am proposing, amongst other enhancements, for Tachyon NEON V5.
Dictionary entries are fixed-length, encoded as either a standard 8-byte record that holds up to 5 characters or an extended 16-byte record that holds up to 15 characters. Compare this with V4's variable-length records, where the minimum record needs 6 bytes for 1 character and 10 bytes for 5 characters. So V5 would mostly compare a long with a long rather than comparing byte by byte. The method also allows for optional hash-table lookup, in RAM or elsewhere, for faster compile times. Also, a name never gets a second header: if a new definition is created with an existing name, that header is simply reused, which is also good for temporary or local names.
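A rough Python model of the fixed-length records (Python is used only for illustration; nothing here is Tachyon code). It assumes, hypothetically, that names are case-folded and packed five characters per 32-bit long as 6-bit codes (ASCII minus $20, with 0 as padding), so a standard record carries one name long and an extended record three (3 x 5 = 15 characters). `pack5`, `name_longs` and `Dictionary` are made-up names; the sketch only shows whole-long comparison and header reuse on redefinition.

```python
"""Sketch of fixed-length V5-style dictionary records (hypothetical encoding)."""


def pack5(chunk: str) -> int:
    """Pack up to 5 characters into one long as 6-bit codes (ASCII - 0x20).

    Code 0 is padding, so no separate count byte is needed.
    Folding lowercase to uppercase is an assumption of this sketch.
    """
    if len(chunk) > 5:
        raise ValueError("at most 5 characters per long")
    value = 0
    for i in range(5):
        code = 0
        if i < len(chunk):
            ch = chunk[i]
            if "a" <= ch <= "z":
                ch = chr(ord(ch) - 32)  # fold lowercase to uppercase
            code = ord(ch) - 0x20
            if not 1 <= code <= 0x3F:
                raise ValueError(f"unsupported character {chunk[i]!r}")
        value = (value << 6) | code
    return value


def name_longs(name: str) -> tuple:
    """1-5 chars -> one long (8-byte record); 6-15 chars -> three longs (16-byte record)."""
    if not 1 <= len(name) <= 15:
        raise ValueError("names must be 1 to 15 characters")
    count = 1 if len(name) <= 5 else 3
    return tuple(pack5(name[i * 5:i * 5 + 5]) for i in range(count))


class Dictionary:
    def __init__(self):
        self.records = []  # each record: [name_longs, attributes, wordcode]

    def find(self, name: str):
        key = name_longs(name)
        for index, record in enumerate(self.records):
            # Whole longs are compared, never individual characters.
            if record[0] == key:
                return index
        return None

    def define(self, name: str, attributes: int, wordcode: int) -> int:
        index = self.find(name)
        if index is None:
            self.records.append([name_longs(name), attributes, wordcode])
            return len(self.records) - 1
        # An existing name keeps its header; only its fields are updated.
        self.records[index][1] = attributes
        self.records[index][2] = wordcode
        return index

    def record_size(self, index: int) -> int:
        return 8 if len(self.records[index][0]) == 1 else 16
```

Redefining `DUP` updates its existing record in place rather than appending a second header, and a 7-character name such as `SPIPINS` lands in a 16-byte extended record.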
Part of the dictionary overhaul also involves adding a vocabulary stack, so that we can have multiple vocabularies spread over multiple memories and specify which ones to search and in which order.
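A vocabulary stack can be pictured as an ordered list of word lists, each possibly living in a different memory, searched from the top down, much like the search-order words (ALSO, ONLY, PREVIOUS) of standard Forth. The Python below is a hypothetical sketch: the class names and memory labels are assumptions, and V5's actual words for manipulating the stack are not described here.

```python
class Vocabulary:
    """A named word list living in one memory (labels are illustrative)."""

    def __init__(self, name: str, memory: str):
        self.name = name
        self.memory = memory  # e.g. "hub", "eeprom", "sd"
        self.words = {}       # word name -> wordcode or address


class VocabularyStack:
    """Search order: the vocabulary on top of the stack is searched first."""

    def __init__(self):
        self.stack = []

    def push(self, vocab: Vocabulary) -> None:
        self.stack.append(vocab)

    def pop(self) -> Vocabulary:
        return self.stack.pop()

    def find(self, word: str):
        for vocab in reversed(self.stack):
            if word in vocab.words:
                return vocab.name, vocab.memory, vocab.words[word]
        return None
```

Pushing an application vocabulary in hub RAM on top of a kernel vocabulary in EEPROM lets the application's words shadow the kernel's until it is popped again.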
I did an analysis of header sizes and found that most names are 1 to 5 characters long (582 of the 779 kernel names, about 75%), so it seems practical to encode up to 5 characters into a single long and forgo the count byte. In the 8-byte standard record, this leaves one byte for the attributes and 3 bytes for the wordcode.
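Under the same kind of hypothetical 6-bit packing (an assumption; the post doesn't specify the character encoding), the standard 8-byte record could be laid out as a name long, one attribute byte and a 24-bit wordcode. Little-endian byte order is assumed because the Propeller stores longs low byte first; the function names are illustrative only.

```python
def pack_name(name: str) -> int:
    """Up to 5 chars as 6-bit codes (ASCII - 0x20), first char in the top bits."""
    if not 1 <= len(name) <= 5:
        raise ValueError("a standard record holds 1 to 5 characters")
    value = 0
    for i in range(5):
        code = 0
        if i < len(name):
            ch = name[i]
            if "a" <= ch <= "z":
                ch = chr(ord(ch) - 32)  # fold to uppercase (assumption)
            code = ord(ch) - 0x20
            if not 1 <= code <= 0x3F:
                raise ValueError(f"unsupported character {name[i]!r}")
        value = (value << 6) | code
    return value  # 30 bits used; the top 2 bits of the long are spare


def standard_record(name: str, attributes: int, wordcode: int) -> bytes:
    """Name long + 1 attribute byte + 24-bit wordcode = 8 bytes, little-endian."""
    if not 0 <= attributes <= 0xFF:
        raise ValueError("attributes must fit in one byte")
    if not 0 <= wordcode <= 0xFFFFFF:
        raise ValueError("wordcode must fit in 24 bits")
    return (pack_name(name).to_bytes(4, "little")
            + attributes.to_bytes(1, "little")
            + wordcode.to_bytes(3, "little"))


def unpack_record(record: bytes):
    """Recover (name, attributes, wordcode) from an 8-byte record."""
    packed = int.from_bytes(record[0:4], "little")
    chars = []
    for shift in (24, 18, 12, 6, 0):
        code = (packed >> shift) & 0x3F
        if code:  # 0 is padding, which is how the length is implied
            chars.append(chr(code + 0x20))
    return "".join(chars), record[4], int.from_bytes(record[5:8], "little")
```

Because the padding code is 0, the name length falls out of the packed long itself, which is what makes dropping the count byte possible in this sketch.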
Number of characters in kernel names (length = count; each * is about 2 names):
1 = 33 ****************
2 = 120 ************************************************************
3 = 138 *********************************************************************
4 = 139 *********************************************************************
5 = 152 ****************************************************************************
6 = 85 ******************************************
7 = 77 **************************************
8 = 26 *************
9 = 7 ***
10 = 2 *
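Tallying the histogram against the two record sizes from the post gives a quick feel for how the kernel would split between standard and extended records. This is straight arithmetic on the counts above, nothing measured.

```python
# Kernel name-length histogram from the post: length -> number of names
KERNEL_NAME_LENGTHS = {1: 33, 2: 120, 3: 138, 4: 139, 5: 152,
                       6: 85, 7: 77, 8: 26, 9: 7, 10: 2}

total = sum(KERNEL_NAME_LENGTHS.values())
standard = sum(n for length, n in KERNEL_NAME_LENGTHS.items() if length <= 5)
extended = sum(n for length, n in KERNEL_NAME_LENGTHS.items() if 6 <= length <= 15)
too_long = sum(n for length, n in KERNEL_NAME_LENGTHS.items() if length > 15)
header_bytes = standard * 8 + extended * 16  # 8-byte vs 16-byte records

print(f"{standard} of {total} names ({standard / total:.1%}) fit a standard 8-byte record")
print(f"{extended} need a 16-byte extended record, {too_long} do not fit at all")
print(f"V5 kernel headers: {header_bytes} bytes")
```

That works out to 582 of 779 names (74.7%) in standard records, 197 in extended records, none longer than 15 characters, and 7808 bytes of V5 headers for the kernel, which is exactly the kind of block worth keeping out of hub RAM.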
Wait on, a wordcode is only 16 bits, so what's the other byte for? That's for extended addressing, where code can reside in EEPROM up to 1MB, with the rest of the 16MB address space in SD or serial flash. This code will be a little slower to execute, but once cached in a sector buffer it could run almost as fast as hub code; we will have to see how it goes. The EXIT routine will detect 24-bit addresses and use a procedure to return to extended code, and other methods are also used. Code and execution speed in hub space will be practically unaffected by this.
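The post doesn't give the address map, so this is only a guess at how a 24-bit code address might be classified and how a single sector buffer keeps extended code fast once it is cached. The region boundaries (the 16-bit range treated as hub, up to $FFFFF as EEPROM, the rest as SD/serial flash) and the 512-byte sector size are assumptions, as are the `region` and `SectorCache` names.

```python
SECTOR_SIZE = 512  # assumed buffer size (one SD sector)


def region(address: int) -> str:
    """Hypothetical split of a 24-bit code address into memory regions."""
    if not 0 <= address <= 0xFFFFFF:
        raise ValueError("not a 24-bit address")
    if address <= 0xFFFF:
        return "hub"       # fits a plain 16-bit wordcode
    if address <= 0xFFFFF:
        return "eeprom"    # up to 1MB
    return "flash"         # SD or serial flash, up to 16MB


class SectorCache:
    """Single sector buffer: external code is fetched a sector at a time."""

    def __init__(self, read_sector):
        self.read_sector = read_sector  # callable: sector number -> bytes
        self.sector_number = None
        self.buffer = b""
        self.misses = 0

    def fetch(self, address: int) -> int:
        number = address // SECTOR_SIZE
        if number != self.sector_number:
            # Slow path: pay the device setup cost once per sector.
            self.buffer = self.read_sector(number)
            self.sector_number = number
            self.misses += 1
        # Fast path: later fetches in the same sector come from the buffer.
        return self.buffer[address % SECTOR_SIZE]
```

Running straight through one sector of extended code costs a single device read; only jumping to another sector pays the setup again.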
However, the main thrust of V5 is to have an absolutely minimal dictionary footprint in hub memory if desired. This does not affect a basic Tachyon system; it simply allows the system to be expanded easily. Even if the dictionary were moved to EEPROM after EXTEND, you could still have very fast searching and compiling, because block mode could first build a hash table that selects only a handful of candidates from slow EEPROM for comparison, and each candidate would initially need only 4 bytes read in, at around 200us each. The biggest problem with EEPROM access is the setup: selecting the device, addressing the location, and then reselecting it in read mode. This needs to be avoided as much as possible.
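Here is one way the block-mode hash table could work, as a Python sketch: a RAM table maps each bucket to the EEPROM addresses of its records, so a lookup reads only the first long of a handful of candidates instead of scanning the whole dictionary. The bucket count, the modulo hash, the 6-bit name packing and the record layout are all assumptions; `READ_COST_US` is the ~200us figure from the post.

```python
READ_COST_US = 200  # the post's estimate for one 4-byte EEPROM read
BUCKETS = 61        # hypothetical hash-table size (prime)


def pack5(name: str) -> int:
    """Up to 5 chars as 6-bit codes (ASCII - 0x20); hypothetical encoding."""
    if not 1 <= len(name) <= 5:
        raise ValueError("this sketch only handles 1 to 5 characters")
    value = 0
    for i in range(5):
        code = 0
        if i < len(name):
            ch = name[i]
            if "a" <= ch <= "z":
                ch = chr(ord(ch) - 32)
            code = ord(ch) - 0x20
        value = (value << 6) | code
    return value


def bucket_of(first_long: int) -> int:
    return first_long % BUCKETS


class SlowEeprom:
    """Stand-in for I2C EEPROM that counts 4-byte reads."""

    def __init__(self, image: bytes):
        self.image = image
        self.reads = 0

    def read_long(self, address: int) -> int:
        self.reads += 1
        return int.from_bytes(self.image[address:address + 4], "little")


def build_image(names):
    """8-byte records: name long, then 4 bytes of attributes/wordcode (zeroed here)."""
    image = bytearray()
    for name in names:
        image += pack5(name).to_bytes(4, "little") + bytes(4)
    return bytes(image)


def build_index(eeprom: SlowEeprom, record_count: int):
    """One pass over EEPROM (e.g. on entering block mode) fills a RAM hash table."""
    table = [[] for _ in range(BUCKETS)]
    for i in range(record_count):
        address = i * 8
        table[bucket_of(eeprom.read_long(address))].append(address)
    return table


def find(eeprom: SlowEeprom, table, name: str):
    key = pack5(name)
    for address in table[bucket_of(key)]:
        # Only the first long of each candidate is read from EEPROM.
        if eeprom.read_long(address) == key:
            return address
    return None
```

With 16 records, a linear scan could cost up to 16 x 200us = 3.2ms, whereas a lookup here costs 200us per candidate in a single bucket.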
Comments
Does it cache just as much as it needs to?
I2C memory seems slow to reach the next spec step of 3.4 Mbit/s (High-speed mode). Has anyone tried overclocking common I2C parts above their 1 MHz spec?
I can find FRAM & nvSRAM parts spec'd to 3.4MHz, so they could be used for testing, but they are not what could be called cheap...
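A back-of-envelope count of I2C clocks shows why the setup dominates short EEPROM reads. It assumes a standard random read from a device with two address bytes (as on a 24LC512): control byte, two address bytes, a repeated START with the control byte again, then the data, each byte taking 9 clocks including the ACK. START/STOP conditions and any bit-banging overhead are ignored, and the function names are illustrative.

```python
CLOCKS_PER_BYTE = 9  # 8 data bits + 1 ACK/NACK bit


def random_read_clocks(data_bytes: int, address_bytes: int = 2) -> int:
    """Clocks for a random read: control byte (write), memory address,
    control byte again (read) after a repeated START, then the data.
    START/STOP conditions and software overhead are ignored."""
    setup_bytes = 1 + address_bytes + 1
    return CLOCKS_PER_BYTE * (setup_bytes + data_bytes)


def read_time_us(clocks: int, bus_hz: int) -> float:
    return clocks / bus_hz * 1e6


for bus_hz in (100_000, 400_000, 1_000_000, 3_400_000):
    t = read_time_us(random_read_clocks(4), bus_hz)
    print(f"{bus_hz / 1e6:.1f} MHz: 4-byte random read ~ {t:.0f} us")
```

If the bus runs at 400 kHz, a 4-byte random read is about 180us, in the same ballpark as the ~200us per candidate mentioned in the post, and half of its 72 clocks are setup. Continuing a sequential read avoids that setup entirely, which is why minimizing device selection matters more than raw clock speed.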
While Tachyon will use I2C EEPROM as its primary "secondary memory", that is not because it is fast or cheap, but simply because every Prop system has it and the chip can easily be substituted with a higher-capacity one. But I do have QSPI Flash on many boards, and although they only run in 1-bit mode, they would already be a lot faster than I2C. I will at some point try out QSPI mode after I get this up and running.
I see Winbond have some DTR flash parts, but the part codes are not easy to find.
As best I can decode, the final letter is M on DTR+Quad parts and Q on non-DTR (Quad-only) parts.
Chipstop shows:
W25Q16JVSSIM (SOIC-8 208-mil) 33c/100+
W25Q16JVZPIM (ZP: WSON-8 6x5-mm, 16M-bit) $0.677/10+
And then there are the new OctaFlash parts:
http://www.macronix.com/en-us/products/NOR-Flash/Pages/OctaFlash.aspx#3V
which lists 512Mb in production, with 256Mb and 128Mb planned.
These parts DO have a 300-mil 16-SOP package option, which is easier to handle and prototype with than the 6x8mm 24-TFBGA (5x5).
They support 1-bit SPI mode for backward-compatible use, and they also have 8-bit dual-edge (DTR) modes for the highest throughput.
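The raw data rate scales with the number of data lines and with dual-edge clocking, which is the point of these modes. The sketch below computes peak throughput only, ignoring command, address and dummy cycles, at an illustrative 20 MHz clock (an assumption, not a Propeller or part rating).

```python
def raw_throughput_bytes_per_s(clock_hz: int, data_lines: int, dual_edge: bool = False) -> float:
    """Peak data rate, ignoring command, address and dummy-cycle overhead."""
    bits_per_clock = data_lines * (2 if dual_edge else 1)
    return clock_hz * bits_per_clock / 8


CLOCK_HZ = 20_000_000  # illustrative clock only
for label, lines, ddr in (("1-bit SPI", 1, False), ("Quad SPI", 4, False), ("Octal DTR", 8, True)):
    rate = raw_throughput_bytes_per_s(CLOCK_HZ, lines, ddr)
    print(f"{label:10s} {rate / 1e6:5.1f} MB/s")
```

At the same clock, octal DTR moves 16 bits per clock against 1 bit for plain SPI, a 16x difference before protocol overhead.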
HELP is designed to be rather general and at the moment it lists all the words of the dictionary in a more useful format too. Here's a small sample:
Internally, the search methods allow for external memory, and code execution is being upgraded to 32-bit addresses as well. My next step is to selectively assign some words to the EEPROM dictionary and others to RAM, and perhaps even SD, to test the dictionary handlers. BTW, compilation is very fast: with GtkTerm at 115200 baud I can set the line delay to 5ms (the same will apply in TeraTerm), and it takes 9 seconds to download and compile EASYFILE.
I shared the Google Doc with you...
Ah yes, but I thought I would open up a lot more memory for the Internet of "EveryThing"
Even with a flexible dictionary structure I can do a lot more as well, after which I can explore my VMM (Virtual Memory Model).
And since this integrates into EASYNET, using some internal SD acceleration rather than sitting on top of it,
it really makes sense to get it running and then keep it as part of EASYNET, so any future system change can be handled by you, @Peter.
It is not a lot of code actually. Once the hook is in the HTML file reader, everything else is handled by the script parts inside the HTML files.
So partially dynamic HTML or JS files can easily be scripted using Tachyon as the scripting language. JSON can be used as well; a JSON writer for Tachyon exists.
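The hook amounts to a filter in the file reader: copy the HTML through unchanged and hand each embedded script to the interpreter, splicing its output in place. The `<?forth ... ?>` delimiter and the `tiny_forth` evaluator below are hypothetical stand-ins (the real Tachyon 2.7 script syntax isn't shown in this thread), and a real server would stream from SD rather than work on a whole string.

```python
import re

# Hypothetical delimiter for embedded scripts.
SCRIPT = re.compile(r"<\?forth(.*?)\?>", re.DOTALL)


def tiny_forth(source: str) -> str:
    """Toy stand-in for the Tachyon interpreter: integers, + * and . only."""
    stack, out = [], []
    for token in source.split():
        if token in ("+", "*"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if token == "+" else a * b)
        elif token == ".":
            out.append(str(stack.pop()))
        else:
            stack.append(int(token))
    return " ".join(out)


def render(html: str, run_script=tiny_forth) -> str:
    """Copy HTML through, replacing each embedded script with its output."""
    return SCRIPT.sub(lambda m: run_script(m.group(1)), html)


print(render("<p>Answer: <?forth 6 7 * . ?></p>"))  # <p>Answer: 42</p>
```

Everything outside the delimiters passes through untouched, so static pages cost nothing extra.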
Yes, this would be great to include in EASYNET, even conditionally. It seems V5 will have plenty of space for all sorts of add-ons. I'm interested in this doc as well.
This is a really tiny extension to the code.
I will take a look at it today on V4.5.
Dictionary searching is much faster, and it stays just as fast when pasting code outside of a TACHYON END block load, whereas V4 loaded a fast search module only during block mode. The headers in the dictionary now combine the name count, the attributes, and a vocab ID. Standard Forth colon definitions are now made with pub, pri, pre, or module, which set the attributes for public, private, or preemptive words, or identify a module header (which used to be any name ending in ".fth"). The module attribute will also control the scope of word searches, so that the module is searched first and, if the word is not found there, the whole dictionary.
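A sketch of module-scoped lookup: search the current module's words first and fall back to the whole dictionary. The attribute bit values, `ScopedDictionary` and `use_module` are hypothetical; the example gives two modules a private `INIT` purely to make the scoping visible.

```python
from dataclasses import dataclass
from enum import IntFlag
from typing import Optional


class Attr(IntFlag):
    """Hypothetical bit assignments for the header attribute field."""
    PUB = 0x01
    PRI = 0x02
    PRE = 0x04     # preemptive
    MODULE = 0x08


@dataclass
class Header:
    name: str
    attrs: Attr
    module: Optional[int]  # index of the owning module header


class ScopedDictionary:
    def __init__(self):
        self.headers = []
        self.current_module = None

    def define(self, name: str, attrs: Attr) -> int:
        index = len(self.headers)
        if attrs & Attr.MODULE:
            self.current_module = index  # a module header starts a new scope
        self.headers.append(Header(name, attrs, self.current_module))
        return index

    def use_module(self, name: str) -> None:
        for index in range(len(self.headers) - 1, -1, -1):
            h = self.headers[index]
            if h.name == name and h.attrs & Attr.MODULE:
                self.current_module = index
                return
        raise KeyError(name)

    def _search(self, name: str, in_module: bool):
        for index in range(len(self.headers) - 1, -1, -1):  # newest first
            h = self.headers[index]
            if h.name == name and (not in_module or h.module == self.current_module):
                return index
        return None

    def find(self, name: str):
        # 1) the current module, 2) the whole dictionary
        if self.current_module is not None:
            index = self._search(name, in_module=True)
            if index is not None:
                return index
        return self._search(name, in_module=False)
```

With `FILE` as the current module, `INIT` resolves to FILE's own definition even though NET defined a newer one, while words FILE doesn't define are still found globally.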
There's still lots of optimization I intend to do, but it is fully functional and tested so far. Why would you use this version? Various words have been sped up, it loads faster, and it includes extras while leaving more memory free, even before RECLAIMing. I noticed that SPIPINS, which sets up the masks for the SPI instructions etc., used to take over 80us but now executes in less than 20us. Okay, many are still the same speed, but I've been tidying up some of the tardy ones.
Neon colors in the print? Nice and easy (figure it out):
Nothing to do with space probes
More like an oversight, I would say: I formatted the report to always use 2+2 digits for the frequency!
I will just change the .AS" format so that it reports correctly as in this test:
There seems to be a global-replace error, starting with your name and then in many other places... I did a compare with 5.1 from Feb 27.
great
I started out with MAKE: and DOES:, but by renaming the kernel CREATE to CREATE: I was able to use the more familiar CREATE DOES> words. The only difference is that, although the DOES> portion is called by any word created with CREATE, it does not get a "parameter field address" on the stack because of the way Tachyon works. It is easy enough, though, to do a simple R> to get that parameter field address, which is usually all that is required, as in the case of TABLE etc.
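A toy model of why R> works: if executing a CREATEd word effectively calls its DOES> code from just in front of its data, the return address pushed by that call is the parameter field address, so the DOES> code can pop it with R> (which also means it later returns straight to whoever executed the word, a detail this model doesn't simulate). `ToyVM` and `table_does` are illustrative names; this models the idea, not Tachyon's actual wordcode.

```python
class ToyVM:
    """Toy model: data stack, return stack and a flat list of cells."""

    def __init__(self):
        self.memory = []   # parameter fields of created words
        self.data = []     # data stack
        self.rstack = []   # return stack
        self.words = {}

    def create(self, name, values, does):
        """CREATE name, lay down its cells, and attach DOES> code."""
        pfa = len(self.memory)
        self.memory.extend(values)

        def run():
            # Executing the word "calls" the DOES> code from in front of its data,
            # so the return address left on the return stack is the PFA.
            self.rstack.append(pfa)
            does(self)

        self.words[name] = run

    def execute(self, name):
        self.words[name]()


def table_does(vm):
    """DOES> part of a TABLE-style word: ( index -- value )."""
    pfa = vm.rstack.pop()   # R>  (the PFA is not on the data stack)
    index = vm.data.pop()
    vm.data.append(vm.memory[pfa + index])


vm = ToyVM()
vm.create("PRIMES", [2, 3, 5, 7, 11], table_does)
vm.data.append(3)
vm.execute("PRIMES")
print(vm.data)  # [7]
```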
There's also a simple decompiler built into EXTEND that uses less than 100 bytes but is still quite effective and includes ANSI color coding. Trouble is, it shows you just how simple some of these functions are:
WWORDS also gives a color coded formatted listing of the dictionary.
will investigate CREATE DOES> in detail next week
Most of the development/debug tools are in the TOOLS module at the end of EXTEND, so they can be forgotten without impacting other code, saving a couple of kB of memory in addition to the 1.4kB saved by RECLAIMing private words. The VGA, ANYKEY, and LCD modules are also placed at the end of EXTEND, just before TOOLS, so these can be safely forgotten as well. If you do all this you get 11.3kB free vs 6.2kB free with all the modules etc., but of course there is no fancy report on boot, just the version and memory stats.
Remember too that the top 3kB from $7400 onwards is dedicated to variables and various buffers etc.
Full version of EXTEND
Minimum version of EXTEND
HELP word (still adding stack hints)
What's more, the server protocols are now independent modules that can be loaded on top of EASYNET, with no need to compile them at the same time. This means that a protocol can be enhanced or added to suit user requirements.
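Loading protocols on top of EASYNET amounts to a registration step: the core keeps a table of handlers, and a protocol module adds or replaces its entry when it is loaded. This Python sketch is hypothetical (EASYNET's real hook mechanism is not described in the thread); ports 80 and 23 are just the usual HTTP and Telnet ports.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]


class ServerCore:
    """Core network layer; protocol modules register handlers later."""

    def __init__(self) -> None:
        self.handlers: Dict[int, Handler] = {}

    def register(self, port: int, handler: Handler) -> None:
        # Loading a newer module for the same port simply replaces the handler.
        self.handlers[port] = handler

    def dispatch(self, port: int, request: str) -> Optional[str]:
        handler = self.handlers.get(port)
        return handler(request) if handler else None
```

The core never needs recompiling: an enhanced HTTP module just registers over the old one.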
By loading and running the COMPACT extension a further 7kB of dictionary can be moved into upper EEPROM leaving 8kB free.
After I play with this a bit more I will make and upload an image into the V5 directory.
By V9 you will not need any EEPROM at all !
Yes, and imagine where we would be if we had had a P1B with extra memory and I/O all those years ago, when P2 "was almost here"!
I know a lot more people who have since fallen by the wayside would still be with us, plus a lot of newer ones, and that would have created a perfect market for that future day when we have P2!
Who said go for P2??? <ducks for cover>
So it might soon be time, once Tachyon is stable enough again (for a while ;-) ...),
to get back to porting the scriptable webserver extensions from Tachyon 2.7...
If the HTTP/Web module is incremental it might be simpler to handle
I will check out your code
BTW, I came across Pascal's Triangle in Forth so I converted it straight across into Tachyon and it worked first go!
Pascal's Triangle code
Running it