Best way to handle text messaging
Erlend
Posts: 612
This time I will ask the good forum before I start. As part of my project there will be hundreds of text messages that will be output to both an LCD and a TTS (text-to-speech) chip over simple serial comms. The mass of messages will be growing and evolving as the project moves forward. The structure is such that there will be a number of base messages, with variants to each message. E.g. the message 'too little water in tank' will have a verbose, instructional variant for the first-time user, a brief variant for the experienced user, and also some irritated variants for the neglectful user, etc. I imagine this will grow to about a hundred base messages, each with five or six variants.
User events, process events, and environment events will trigger messages. Probably this will be done by the parent-level code. I am thinking of using some sort of message(number, variant) command.
Should I use a huge DAT section with string definitions, should I read from SD each time, should I load from SD into a large array at startup and then use strings held in string[n], etc? I am unsure what is the best approach, and advice is appreciated.
Erlend
Comments
Design advice is always the same. Characterize your problem and the solution will present itself. Pfaffing around in the field of possibilities without defining your problem is a fool's errand.
I used your "too little water in tank" example to extrapolate the aggregate size of your literal pool. That message is 24 bytes, 25 with a null terminator. I presumed that each variant would be the same and that the verbose flavor would be 10x that size. So for each message, we get 14 x 25 bytes or 350 bytes per message. 100 of these is 35000 bytes - more than all available memory. And if you're using SD, you're already conceding about 25% of memory to just the code to handle the SD cards.
That makes it easy to conclude that you won't be storing them in a big string table in memory. Even if you had only 25% of that aggregate size, you're likely out of memory before you start. You can rule out the large array on first principles and focus on retrieving them from SD each time.
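The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the split of the 14 units into one verbose variant (10x) plus four ordinary ones is a reading of the post, and the 32 KB figure is the Propeller 1 hub RAM size.

```python
# Rough size estimate for the message pool, using the assumptions in the post.
BASE_LEN = 25            # "too little water in tank" (24 chars) plus a null terminator
UNITS_PER_MESSAGE = 14   # assumed: one verbose variant at 10x plus four ordinary variants
BASE_MESSAGES = 100
HUB_RAM = 32 * 1024      # Propeller 1 hub RAM in bytes

per_message = UNITS_PER_MESSAGE * BASE_LEN
total = BASE_MESSAGES * per_message
print(per_message, total, total > HUB_RAM)
```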
What does this mean in this case?
1) If you only have a few messages and you are sure they will never change, by all means build them into your program's binary.
2) If you have many messages, too many to fit in the available Prop memory say, perhaps it's better to keep them external to the program as data that it uses as and when required. Perhaps on SD card or whatever.
3) If you think your messages may change with time again perhaps it's better to keep them external to the program. Then you can just write some new text files to the data store rather than having to rebuild the program.
4) If you have many messages that will change with time and perhaps in different versions, like different language translations, then it is definitely time to use an external store rather than maintain many different versions of a program binary.
As you say "The mass of messages will be growing and evolving as the project moves forward." that indicates that an external store is what you want. If the project can tolerate the extra hardware cost I guess.
But 25% of memory to read simple texts from SD?? What about moving it all over from SD to the upper part of the EEPROM, and then using that as the working memory? Or maybe a bare-bones SD object?
Erlend
Another good point. Maybe even the tiny, fiddly SD cards do not fit the purpose. Maybe I should go for a USB stick, which is much more convenient for moving back and forth to the PC for text entry/editing.
Erlend
No matter what approach you take I would suggest you use a spreadsheet to create the messages so you can cut and paste the .CSV file into the propeller program. That way if the message table gets large enough to slow things down you can split it up and use an ISAM style method to access it.
If you have no need for a FAT file system don't use one. Just treat the SD as a raw FLASH device to which you can read/write blocks of data via its SPI interface. Dead easy, very cheap, and only a very small SPI driver required.
Even the cheapest SD card available today has gigabytes of space. It's all basically 4K blocks of data (or whatever size). You could use one 4K block for every possible text message your system will ever need in every possible language and never come close to using the whole device.
How would you do this?
Just write a simple program on your PC to create a file containing all your messages. Place the messages at multiples of 4K byte offsets within the file (or whatever the block size is).
Write that file to an SD card using the "dd" command of Linux/Unix.
Now your Propeller can read blocks from that SD via a SPI driver. If you know the message you want you know which block to read it from.
To make it flexible, put a "look up table" in the first block or two that translates required message numbers into block offsets.
Basically this is what I did with my early Z80 emulator. Forget about the FAT file system and just let the CP/M OS on the Prop read and write blocks as it pleases.
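A minimal PC-side sketch of that layout in Python. The table format is an assumption made for illustration (a little-endian 32-bit message count followed by one 32-bit block number per message), as are the function names; the 4K block size is the figure used above.

```python
import struct

BLOCK_SIZE = 4096  # assumed block size; use whatever the card/driver works with

def build_image(messages, block_size=BLOCK_SIZE):
    """Return raw image bytes: block 0 is a lookup table, then one block per message."""
    table = struct.pack("<I", len(messages))
    blocks = []
    for i, text in enumerate(messages):
        data = text.encode("ascii") + b"\x00"
        if len(data) > block_size:
            raise ValueError(f"message {i} does not fit in one block")
        table += struct.pack("<I", i + 1)  # message i lives in block i + 1
        blocks.append(data.ljust(block_size, b"\x00"))
    if len(table) > block_size:
        raise ValueError("lookup table does not fit in block 0")
    return table.ljust(block_size, b"\x00") + b"".join(blocks)

def read_message(image, number, block_size=BLOCK_SIZE):
    """Look up a message the way the reader side would: table first, then block."""
    (block,) = struct.unpack_from("<I", image, 4 + 4 * number)
    start = block * block_size
    end = image.index(b"\x00", start)  # message ends at its zero terminator
    return image[start:end].decode("ascii")

image = build_image(["too little water in tank", "tank is full"])
print(read_message(image, 1))
```

The bytes returned by build_image can be saved to a file and copied to the card with dd as described above.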
Edit: I would not even think of a USB stick with the Propeller.
It may also be the case that compressing your text makes sense. It's probably all 7-bit ASCII and if it's English, then you ought to get about a 2x compression with Huffman encoding. But you're still way behind the power curve.
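To get a feel for what Huffman coding would buy on a given message set, the encoded size can be estimated on the PC before committing to anything. The sketch below only computes code lengths and the resulting bit count; it is not a full encoder, and the function names are made up for illustration.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    counts = Counter(text)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def compressed_bits(text):
    """Total bits needed to encode text with its own Huffman code."""
    lengths = huffman_code_lengths(text)
    return sum(lengths[ch] for ch in text)

sample = "too little water in tank"
print(len(sample) * 8, compressed_bits(sample))
```

Note that the code table itself also has to be stored, so the saving on a small message set is less than the raw bit count suggests.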
EEPROM would be far, far easier by most measures than either SD or USB. Simply eliminating the need for removable media and all the failure scenarios that result is huge. Simply using a 1mb EEPROM to store both your program and its data as the boot rom is trivial and pin compatible with most designs.
Even better if a PC is not needed for the actual transfer, i.e. if it is done by plugging something in instead. Speed is not needed, obviously. If, on the other hand, I do it by beaming it from a smartphone file via Bluetooth, that would be an exciting learning exercise and would justify some extra work/time. What would that take? (I am thinking in this case of writing it into the second 32K space already on the board.)
Erlend
I could give you a very simple and easy solution but not too many like moving out of their comfort zone
The simplest hardware solution is to use a larger eeprom and simply index the messages by number or name through a "directory" which could be just a simple table lookup using a message number.
As you know I use Tachyon for most of my Prop development and I have stored strings in upper eeprom to save memory but now I save the whole dictionary in upper eeprom. Anyhow I also have applications that use text messaging and if they are stored in eeprom rather than SD then I simply download them serially either in whole as a text file or interactively enter/edit individual messages. Not sure if you are looking in this direction but it is very easy to implement if you choose to do so.
EDIT: Yes, you could "export" a csv file serially or it could just be plain text.
By all means maintain the messages in a spreadsheet for conversion to a text file. As for the “something” to export it to, an SD card or USB drive would be the simplest, since they are already available on most PCs.
An alternative would be an eeprom or eMMC card, but that would require extra hardware (a propeller as the go between?). Using a propeller and some storage device to transfer the data from the PC to the unit in the field would certainly simplify the field unit hardware and software.
If there are many messages and lots of text, having a small program on the propeller that creates an index as the messages are read from the “something” and stored on the other “something” would help speed up searching. If the message numbers are not sequential, store the message number and the start address of the text for each message in the index; if they are sequential, store the address only.
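The index-while-storing idea can be sketched in Python as follows. This is an illustration only: on the Propeller the same loop would run as the text arrives, and the record format (zero-terminated ASCII, index of number/address pairs) is an assumption.

```python
def build_index(records):
    """records: iterable of (message_number, text) in arrival order.

    Returns (index, blob): index is a list of (message_number, start_address)
    and blob is the text exactly as it would be written to storage.
    """
    index = []
    blob = bytearray()
    for number, text in records:
        index.append((number, len(blob)))        # address where this text starts
        blob += text.encode("ascii") + b"\x00"   # zero-terminated string
    return index, bytes(blob)

index, blob = build_index([(10, "tank low"), (20, "tank empty"), (35, "refill now")])
print(index)
```

If the message numbers were sequential, the number column could be dropped and the list position used instead.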
Why cut and paste? Create the CSV file of all the messages you need. When finished, try uploading it to EEPROM with the FILE directive. If it fits, then great, you can parse for a particular message by a csv field indicator. If it doesn't fit, then punt.
Keep all your messages in plain text files.
That way you can easily keep them with all your source code in a source code management system, like git, preferably up on GitHub.
You will need some format to identify messages, perhaps by number. Simple CSV might do.
Consider that this may be the "source" format of your messages. If you need to convert that into a different format for use by the Prop, that can be as simple as a few lines of Perl or Python or the language of your choice.
1,complex message,average message,minimal message
2,hello friend good to see you on this fine day,hello my friend,hi
With a number at the beginning of each line, you can easily and quickly parse them.
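As a sketch of how little code that takes on the PC side, here is a Python version. It assumes messages contain no commas (a plain split is used), and the message(table, number, variant) helper, including its fallback to the last variant, is hypothetical and only mirrors the message(number, variant) idea from the first post.

```python
def load_messages(lines):
    """Parse 'number,variant0,variant1,...' lines into {number: [variants]}."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        number, *variants = line.split(",")
        table[int(number)] = variants
    return table

def message(table, number, variant):
    """Hypothetical lookup; falls back to the last variant if the index is too high."""
    variants = table[number]
    return variants[min(variant, len(variants) - 1)]

table = load_messages([
    "1,complex message,average message,minimal message",
    "2,hello friend good to see you on this fine day,hello my friend,hi",
])
print(message(table, 2, 1))
```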
That's more like it.
I once worked on a project that needed thousands of messages. It needed translations in many languages. Those messages had to be available quickly for a very complex GUI. Memory space was really tight.
Solution: Keep the messages on disk and pull them in on demand.
A linear parse/search through the text file(s) was very slow given the disks and CPUs of the day. That required building an index of message numbers vs. location on disk so as to be able to access them at any reasonable speed. A quick binary search through the index and a seek to the file offset worked a treat.
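The lookup step might look like this in Python, using the standard bisect module. The index contents are made-up numbers for illustration; the only requirement is that the message numbers are kept sorted.

```python
import bisect

def find_offset(numbers, offsets, wanted):
    """Binary-search the sorted message numbers; return the file offset or None."""
    i = bisect.bisect_left(numbers, wanted)
    if i < len(numbers) and numbers[i] == wanted:
        return offsets[i]
    return None

# Hypothetical index: sorted message numbers and the file offset of each message.
numbers = [3, 8, 15, 42, 97]
offsets = [0, 120, 310, 512, 900]
print(find_offset(numbers, offsets, 42))
```

The returned offset is what would then be passed to a seek on the message file.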
Some of the objects could be stored in arrays whose index indicates severity.
You might have ...
... an array with "I'd like some", "I need", "I'm going to die without" and so on.
... an array with "food", "water", "fertilizer", "sunlight" and so on.
... an array with "please", "really soon", and maybe some expletives.
Then build sentences with what's in your dictionary.
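A small Python sketch of that composition, using the phrases from the post. The table names, the index meanings and the sentence template are assumptions made for the example.

```python
# Phrase tables; the position in WANT doubles as the severity level (0 = mild).
WANT = ["I'd like some", "I need", "I'm going to die without"]
THING = ["food", "water", "fertilizer", "sunlight"]
TAIL = ["please", "really soon"]

def build_sentence(severity, thing, tail):
    """Assemble one sentence from three table indices."""
    return f"{WANT[severity]} {THING[thing]}, {TAIL[tail]}"

print(build_sentence(1, 1, 1))
```

Stored this way, nine short fragments cover 3 x 4 x 2 = 24 different sentences.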
Better to use some common standard format like JSON. Stay away from XML.
[code]
[
  {
    "index": 1,
    "English": "This is a message, how do you like it?",
    "German": "Das ist eine Botschaft , wie gef
[/code]
Yea, when it comes to data that has a common denominator, such as the "message", I like CSV files, because they are simple, and if you want to change languages, no sweat, just translate the original file, and you are good to go.
And as previously mentioned, CSV files can be uploaded to EEPROM with the FILE directive, providing they fit into EEPROM, if not, they can be stored on SD and be parsed just as easily.
CSV is the way to go
Now wait a minute..... You could very easily have a comma delimiter to be recognized during parsing.... but not much point in that, because it would just slow down processing. If commas won't do it for you, because you need commas in your messages and you don't want to slow down processing, just pick another character that you will not use.
E.G.
CSV File (with altered separator)
1~complex message~average message~minimal message
2~hello friend good to see you on this fine day~hello my friend~hi
Well, CSV by definition uses quoted text for strings and unquoted text for numbers. Thus commas inside the quoted text are no problem. Quotes inside the quoted text need to be escaped by doubling them. The specification also allows strings to be unquoted as long as they do not contain spaces. This one is really stupid.
Funny thing is that line breaks are allowed inside of quoted text, thus one record in a CSV file can span multiple lines. As usual end of record can be CR or LF or CRLF.
The field delimiter usually is a comma (hence the name CSV) but can be a semicolon or a TAB also.
CSV is a widely used data exchange format, often used to copy databases from one system to the other. I doubt that JSON is more common than CSV.
Both of them are a pain to parse on a prop.
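On the PC side, at least, these quoting rules are already handled by standard libraries. The sketch below uses Python's csv module on made-up sample data to show quoted commas, doubled quotes, a line break inside a quoted field, and an alternative delimiter; it says nothing about how hard the same parsing is on the prop.

```python
import csv
import io

# Two records: one with a comma inside quotes, one with doubled quotes
# and a line break inside a quoted field.
raw = (
    '1,"Tank is low, please refill","Low"\r\n'
    '2,"She said ""hi""","line one\nline two"\r\n'
)
rows = list(csv.reader(io.StringIO(raw, newline="")))
print(rows)

# The same reader with a different field delimiter.
tilde_rows = list(csv.reader(io.StringIO("1~a, b~c\n"), delimiter="~"))
print(tilde_rows)
```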
@Erlend,
I would go for a fixed length record database approach. So you can calculate the position of the text on the media by multiplication of the index with the record length. This will work with a big EEPROM as well as with a SD card.
When using an EEPROM you will need to write a small program for the prop, and another for the PC to copy data into the EEPROM. There is some Hydra Asset Manager (HAM) around doing basically that for Andre's HYDRA propeller system.
SD cards are nice to move from PC to Prop. As long as you just have your texts on the SD card, you just need a block driver. That is one cog and some init in SPIN. Start with FSRW and throw out most of it.
On the PC side you need to format the SD card and then copy one single file on it. The file will always begin with the same sector number (depending on size of SD card) and will be saved in continuous sectors on the card.
On the propeller you can use the block driver to get a 512-byte sector into a RAM buffer and use the saved strings in it. (Index * RecordLength / 512) + StartSectorNumber will give you the right sector. So you can use a FAT32-formatted SD card on the PC, and on the prop you do not need to support FAT32, just one continuous file starting at StartSectorNumber.
Easy on the propeller side, more work on the PC side to create the file and format and copy SD.
Enjoy!
Mike
And - sure - the record length should fit nicely in a 512 byte sector. Say 128 or 256 bytes per message or so.
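The fixed-length record scheme can be sketched in Python as below: one function for the PC side that pads each message to the record length, and one that does the sector calculation from the formula above. The function names and the example start sector are hypothetical; 128 bytes is one of the record lengths suggested above.

```python
SECTOR_SIZE = 512

def pack_records(messages, record_length=128):
    """PC side: pad every message to record_length bytes (zero-filled)."""
    out = bytearray()
    for text in messages:
        data = text.encode("ascii")
        if len(data) >= record_length:
            raise ValueError("message too long for record (need room for a terminator)")
        out += data.ljust(record_length, b"\x00")
    return bytes(out)

def locate(index, record_length, start_sector):
    """Return (sector number, byte offset inside that sector) for a record."""
    byte_pos = index * record_length
    return start_sector + byte_pos // SECTOR_SIZE, byte_pos % SECTOR_SIZE

data = pack_records(["tank low", "tank empty"])
print(len(data), locate(5, 128, 8192))
```

Because the record length divides 512 evenly, a record never straddles two sectors, so one sector read is always enough.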
It's very easy to access it from Spin as long as the messages fit in the RAM. If they don't fit anymore, you already have the texts in the right format to make a method that writes them to the upper EEPROM or to an SD card and automatically generates an index with the addresses. The msg_out() method shows how you get the addresses of the strings by message number. With some modifications you can do the same if the messages are in the EEPROM. And you can write the whole message data from msg11 to msgend to the EEPROM or SD card as a single block.
Andy
Spreadsheets are great for creating this type of text file. Start by entering the data in the spreadsheet, save it, then save it as a CSV and use a text editor on that file if needed.
This way you can use spreadsheet functions to do what they are good for, and a word processor for the rest.
Me too.
I use strsize() to iterate through a list of strings -- like this:
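A sketch of that idea in Python (not the poster's Spin code): strings are packed back to back with zero terminators, and the n-th one is found by repeatedly skipping over a string's length plus its terminator. The strsize helper here is a stand-in written for this example, modelled on the Spin built-in of the same name.

```python
def strsize(buf, pos):
    """Length of the zero-terminated string starting at pos."""
    return buf.index(b"\x00", pos) - pos

def nth_string(buf, n):
    """Skip n strings by their sizes, then return the next one."""
    pos = 0
    for _ in range(n):
        pos += strsize(buf, pos) + 1   # +1 steps over the terminator
    return buf[pos:pos + strsize(buf, pos)].decode("ascii")

messages = b"tank low\x00tank empty\x00refill now\x00"
print(nth_string(messages, 2))
```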
A little more complicated but well worth doing if there is a lot of text to search through. If the text were to be downloaded serially or via bluetooth to whatever form of storage ends up being used an index could be created as the text is being received and stored.
An even better approach IMHO would be to leave the .csv file as is, have the PC send it to the propeller, and have a program on the propeller format and write the data to the upper 32K of the eeprom, creating an index while it writes the text, and finally writing that index to the eeprom.
The software needed for this would most likely take up less space than the text it moves to the upper 32K of the EEPROM, and JonnyMac's parsing routines would make formatting fairly simple. I have also added a routine to FullDuplexSerial that will receive and store a string of data to a byte array, which I will post some time tomorrow.