Best way to handle text messaging

Erlend · 2015-03-03 08:01

This time I will ask the good forum before I start. As part of my project there will be hundreds of text messages that will be output to both an LCD and a TTS (text-to-speech) chip over simple serial comms. The mass of messages will be growing and evolving as the project moves forward. The structure is such that there will be a number of base messages, with variants of each message. E.g. the message 'too little water in tank' will have a verbose, instructional variant for the first-time user, a brief variant for the experienced user, and also some irritated variants for the neglectful user, etc. I imagine this will grow to about a hundred base messages, each with five or six variants.
User events, process events, and environment events will trigger messages. This will probably be done by the parent-level code. I am thinking of using some sort of message(number, variant) command.

Should I use a huge DAT section with string definitions, should I read from SD each time, should I load from SD into a large array at startup and then use strings held in string[n], etc? I am unsure what is the best approach, and advice is appreciated.

Erlend

ksltd · 2015-03-03 08:13

Erlend,

Design advice is always the same. Characterize your problem and the solution will present itself. Pfaffing around in the field of possibilities without defining your problem is a fool's errand.

I used your "too little water in tank" example to extrapolate the aggregate size of your literal pool. That message is 24 bytes, 25 with a null terminator. I presumed that each variant would be the same and that the verbose flavor would be 10x that size. So for each message, we get 14 x 25 bytes or 350 bytes per message. 100 of these is 35000 bytes - more than all available memory. And if you're using SD, you're already conceding about 25% of memory to just the code to handle the SD cards.

That makes it easy to conclude that you won't be storing them in a big string table in memory. Even if you had only 25% of that aggregate size, you're likely out of memory before you start. You can rule out the large array on first principles and focus on retrieving them from SD each time.
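The estimate above can be reproduced with a few lines of arithmetic. This is only a rough sketch in Python: the split into four brief variants plus one verbose variant at ten times the size is one reading of the "14 x 25" figure, and the 32 KB limit is the hub RAM of the Propeller 1.

```python
# Rough size estimate for the message pool (assumptions noted inline).
SAMPLE = "too little water in tank"
HUB_RAM_BYTES = 32 * 1024          # Propeller 1 hub RAM

def pool_size(base_messages=100, brief_variants=4, verbose_factor=10):
    """Return (bytes per base message, total bytes) for the literal pool."""
    unit = len(SAMPLE) + 1                      # 24 characters + null terminator
    per_message = (brief_variants + verbose_factor) * unit
    return per_message, per_message * base_messages

per_message, total = pool_size()
print(per_message, total, total > HUB_RAM_BYTES)   # 350 35000 True
```

Changing the defaults shows how quickly the pool grows or shrinks with the number of variants.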

Heater. · 2015-03-03 09:06

Erlend,

Should I use a huge DAT section with string definitions, should I read from SD each time...

There is a general design principle that basically says "use data not code". I'm not sure how they would say that formally nowadays.

What does this mean in this case?

1) If you only have a few messages and you are sure they will never change, by all means build them into your program's binary.

2) If you have many messages, too many to fit in the available Prop memory say, perhaps it's better to keep them external to the program as data that it uses as and when required. Perhaps on SD card or whatever.

3) If you think your messages may change with time again perhaps it's better to keep them external to the program. Then you can just write some new text files to the data store rather than having to rebuild the program.

4) If you have many messages that will change with time and perhaps in different versions, like different language translations, then it is definitely time to use an external store rather than maintain many different versions of a program binary.

As you say "The mass of messages will be growing and evolving as the project moves forward." that indicates that an external store is what you want. If the project can tolerate the extra hardware cost I guess.

Erlend · 2015-03-03 09:12

ksltd wrote: »

Erlend,

Design advice is always the same. Characterize your problem and the solution will present itself.

Very much to the point I have to admit. This time it is me on the receiving end for this advice - which I give all the time to others.

But 25% of memory to read simple texts from SD?? What about moving it all over from SD to the upper part of the EEPROM, and then using that as the working memory? Or maybe a bare-bones SD object?

Erlend

Erlend · 2015-03-03 09:16

Heater. wrote: »

Erlend,

4) If you have many messages that will change with time and perhaps in different versions, like different language translations, then it is definitely time to use an external store rather than maintain many different versions of a program binary.

As you say "The mass of messages will be growing and evolving as the project moves forward." that indicates that an external store is what you want. If the project can tolerate the extra hardware cost I guess.

Another good point. Maybe even the tiny, fiddly SD cards do not fit the purpose. Maybe I should go for a USB stick, which is much more convenient for moving back and forth to the PC for text entry/edit.

Erlend

Chris Savage · 2015-03-03 09:28

Tricky subject line. I came into this thinking it was about SMS.

kwinn · 2015-03-03 09:37

The answer really depends on how much memory the messages will take up and how frequently they may be updated. If they will fit in the hub with the program then put them there as a DAT block. If they need a bit more than that, perhaps you can store them in the last 32K of a 64K EEPROM plus whatever space is left beyond what the program uses in the first 32K. Beyond that it's either a second, larger EEPROM or an SD card.

No matter what approach you take I would suggest you use a spreadsheet to create the messages so you can cut and paste the .CSV file into the propeller program. That way if the message table gets large enough to slow things down you can split it up and use an ISAM style method to access it.

Heater. · 2015-03-03 10:38

Erlend,

But 25% memory to read simple texts from SD??

Try thinking out of the box. What you describe only applies if you need a FAT file system on the SD cards to read those texts.

If you have no need for a FAT file system don't use one. Just treat the SD as a raw FLASH device to which you can read/write blocks of data via its SPI interface. Dead easy, very cheap, very small SPI driver required.

Even the cheapest SD card available today has gigabytes of space. It's all basically 4K blocks of data (or whatever size). You could use one 4K block for every possible text message your system will ever need in every possible language and never come close to using the whole device.

How would you do this?

Just write a simple program on your PC to create files containing all your messages. Place the messages at multiples of 4K byte offsets within the file (Or whatever the blocks size is).

Write that file to an SD card using the "dd" command of Linux/Unix.

Now your Propeller can read blocks from that SD via a SPI driver. If you know the message you want you know which block to read it from.

To make it flexible, put a "look up table" in the first block or two that translates required message numbers into block offsets.

Basically this is what I did with my early Z80 emulator. Forget about the FAT file system and just let the CP/M OS on the Prop read and write blocks as it pleases.
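The PC-side program described above might look something like the following Python sketch. Everything specific here is an assumption for illustration: the 4096-byte block size, the table layout (a 32-bit little-endian count followed by one 32-bit block number per message), and the function names.

```python
import struct

BLOCK_SIZE = 4096   # assumed block size; use whatever the card and driver work with

def build_image(messages, block_size=BLOCK_SIZE):
    """Return bytes: block 0 is a lookup table, then one block per message.

    Table layout (hypothetical): a 32-bit little-endian count, followed by
    one 32-bit block number per message, indexed by message number.
    """
    table = struct.pack("<I", len(messages))
    body = b""
    for number, text in enumerate(messages):
        data = text.encode("ascii") + b"\0"
        if len(data) > block_size:
            raise ValueError(f"message {number} does not fit in one block")
        table += struct.pack("<I", number + 1)          # block 0 is the table
        body += data.ljust(block_size, b"\0")
    if len(table) > block_size:
        raise ValueError("lookup table does not fit in block 0")
    return table.ljust(block_size, b"\0") + body

def read_message(image, number, block_size=BLOCK_SIZE):
    """Mimic the Propeller side: table lookup, then a single block read."""
    (block,) = struct.unpack_from("<I", image, 4 + 4 * number)
    start = block * block_size
    return image[start:start + block_size].split(b"\0", 1)[0].decode("ascii")

image = build_image(["too little water in tank", "please refill the tank"])
print(read_message(image, 1))   # please refill the tank
```

The resulting bytes would be saved to a file and copied to the card as a raw image; on the Propeller, only the equivalent of read_message is needed, on top of a block-read driver.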

Edit: I would not even think of a USB stick with the Propeller.

ksltd · 2015-03-03 10:46

USB via the VNC1L or Vinculum-II is a much better solution for many applications than is SD.

ksltd · 2015-03-03 10:55

Erlend,

It may also be the case that compressing your text makes sense. It's probably all 7-bit ASCII and if it's English, then you ought to get about a 2x compression with Huffman encoding. But you're still way behind the power curve.
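To check the compression figure against real message text, a small Huffman estimator is enough. The sketch below (Python, with hypothetical function names) only computes code lengths and the resulting ratio; it does not produce a bitstream and it ignores the space needed to store the code table itself.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    freq = Counter(text)
    if len(freq) == 1:                       # degenerate case: a single symbol
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)       # two lightest subtrees
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def compression_ratio(text):
    """Uncompressed bits (8 per character) divided by Huffman-coded bits."""
    lengths = huffman_code_lengths(text)
    freq = Counter(text)
    coded_bits = sum(freq[s] * lengths[s] for s in freq)
    return (8 * len(text)) / coded_bits

print(round(compression_ratio("too little water in tank"), 2))
```

Running it over the full message file, rather than one short sample, gives a more honest number, since a single code table would be shared by all messages.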

EEPROM would be far, far easier by most measures than either SD or USB. Simply eliminating the need for removable media and all the failure scenarios that result is huge. Simply using a 1mb EEPROM to store both your program and its data as the boot rom is trivial and pin compatible with most designs.

Erlend · 2015-03-03 13:00

I do not want to spend too much time on hw or software to facilitate a storage solution. If there is a way I can go straight to a solution I would prefer that. Ideally, maintain the text in a spreadsheet on the PC, export it on to 'something', then let the P1 read the individual text files as needed, using some sort of numbering or indexing (converted to filename?).
Even better if a PC is not needed for the actual transfer - i.e. instead by plugging in. Speed is not needed, obviously. If, on the other hand I do it by beaming it from a smartphone file, via Bluetooth, that would be an exciting learning exercise and would justify some extra work/time. What would that take? (I am thinking in this case writing it into the 2nd 32k space already on the board).

Erlend

Peter Jakacki · 2015-03-03 14:46

Erlend wrote: »

I do not want to spend too much time on hw or software to facilitate a storage solution. If there is a way I can go straight to a solution I would prefer that. Ideally, maintain the text in a spreadsheet on the PC, export it on to 'something', then let the P1 read the individual text files as needed, using some sort of numbering or indexing (converted to filename?).
Even better if a PC is not needed for the actual transfer - i.e. instead by plugging in. Speed is not needed, obviously. If, on the other hand I do it by beaming it from a smartphone file, via Bluetooth, that would be an exciting learning exercise and would justify some extra work/time. What would that take? (I am thinking in this case writing it into the 2nd 32k space already on the board).

Erlend

I could give you a very simple and easy solution but not too many like moving out of their comfort zone

The simplest hardware solution is to use a larger eeprom and simply index the messages by number or name through a "directory" which could be just a simple table lookup using a message number.

As you know I use Tachyon for most of my Prop development and I have stored strings in upper eeprom to save memory but now I save the whole dictionary in upper eeprom. Anyhow I also have applications that use text messaging and if they are stored in eeprom rather than SD then I simply download them serially either in whole as a text file or interactively enter/edit individual messages. Not sure if you are looking in this direction but it is very easy to implement if you choose to do so.

EDIT: Yes, you could "export" a csv file serially or it could just be plain text.

kwinn · 2015-03-03 14:56

Minimum time on hardware and software would be nice, but it's more likely to be a trade off between them.

By all means maintain the text as a spreadsheet for conversion to a text file. As for the “something” to export it to, an SD card or USB drive would be the simplest, since they are already available on most PCs.

An alternative would be an eeprom or eMMC card, but that would require extra hardware (a propeller as the go between?). Using a propeller and some storage device to transfer the data from the PC to the unit in the field would certainly simplify the field unit hardware and software.

If there are many messages and lots of text, having a small program on the propeller that creates an index as the messages are read from the one “something” and stored on the other “something” would help speed up searching. Store the message number and start address of the text for each message in the index if the numbers are not sequential; store the address only if they are sequential.

idbruce · 2015-03-03 15:08

No matter what approach you take I would suggest you use a spreadsheet to create the messages so you can cut and paste the .CSV file into the propeller program.

Why cut and paste? Create the CSV file of all the messages you need. When finished, try uploading it to EEPROM with the FILE directive. If it fits, then great, you can parse for a particular message by a csv field indicator. If it doesn't fit, then punt.

Heater. · 2015-03-03 15:19

Do not use a spread sheet. Spread sheets are for numbers and calculating.

Keep all your messages in plain text files.

That way you can easily keep them with all your source code in a source code management system. Like git, preferably up on GitHub.

You will need some format to identify messages, perhaps by number. Simple CSV might do.

Consider that this may be the "source" format of your messages. If you need to convert that into a different format for use by the Prop, that can be as simple as a few lines of Perl or Python or the language of your choice.
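As an illustration of such a conversion step, here is a minimal Python sketch that turns rows of the form number,variant1,variant2,... into Spin DAT lines. The msgNN_V label pattern and the function name are made up for this example, and messages containing double quotes are rejected rather than handled.

```python
import csv
import io

def csv_to_spin_dat(csv_text):
    """Convert rows of 'number,variant1,variant2,...' into Spin DAT lines.

    Labels follow a hypothetical msgNN_V pattern (message number, variant).
    """
    lines = ["DAT"]
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue                                  # skip blank lines
        number, variants = int(row[0]), row[1:]
        for v, text in enumerate(variants, start=1):
            if '"' in text:
                raise ValueError("double quotes need separate handling in Spin strings")
            lines.append(f'msg{number:02d}_{v}  byte  "{text}",0')
    return "\n".join(lines)

source = "1,complex message,average message,minimal message\n"
print(csv_to_spin_dat(source))
```

The same source file could feed a second converter later (for EEPROM or SD) without touching the messages themselves.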

idbruce · 2015-03-03 15:37

CSV File
1,complex message,average message,minimal message
2,hello friend good to see you on this fine day,hello my friend,hi

With a number at the beginning of each line, you can easily and quickly parse them.

Heater. · 2015-03-03 16:15

idbruce,

That's more like it.

I once worked on a project that needed thousands of messages. It needed translations in many languages. Those messages had to be available quickly for a very complex GUI. Memory space was really tight.

Solution: Keep the messages on disk and pull them in on demand.

A linear parse/search through the text file(s) was very slow given the disks and CPUs of the day. That required building an index of message numbers vs location on disk so as to be able to access them at any reasonable speed. A quick binary search through the index and seek to the file offset worked a treat.
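The index-plus-seek idea can be sketched in a few lines of Python. The file format here (one message per line, number and text separated by a tab) is an assumption for the example, not the format of the project described above; an in-memory byte stream stands in for the disk file.

```python
import bisect
import io

def build_index(stream):
    """Scan 'number<TAB>text' lines once; return sorted (number, byte offset) pairs."""
    index = []
    while True:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            break
        number = int(line.split(b"\t", 1)[0])
        index.append((number, offset))
    index.sort()
    return index

def fetch(stream, index, number):
    """Binary-search the index, seek to the stored offset, read one message."""
    i = bisect.bisect_left(index, (number, 0))
    if i == len(index) or index[i][0] != number:
        raise KeyError(number)
    stream.seek(index[i][1])
    return stream.readline().split(b"\t", 1)[1].rstrip(b"\r\n").decode("ascii")

store = io.BytesIO(b"10\tplease switch the machine on\n20\tWhy not Tea?\n")
index = build_index(store)
print(fetch(store, index, 20))   # Why not Tea?
```

Only the index has to stay in memory; each lookup costs one binary search and one seek, regardless of how large the message file grows.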

abecedarian · 2015-03-03 16:21

Another idea might be instead of storing complete phrases, build phrases from a collection of dictionary-like objects.
Some of the objects could be stored in arrays whose index indicates severity.
You might have ...
... an array with "I'd like some", "I need", "I'm going to die without" and so on.
... an array with "food", "water", "fertilizer", "sunlight" and so on.
... an array with "please", "really soon", and maybe some expletives.
Then build sentences with what's in your dictionary.
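A minimal sketch of that composition idea, in Python for brevity. The fragment lists are taken from the post above; the way they are joined (a comma before the suffix, severity clamped to the last entry) is an arbitrary choice for this example.

```python
# Fragments from the post above; list index = severity (0 = mild).
REQUESTS = ["I'd like some", "I need", "I'm going to die without"]
THINGS = ["food", "water", "fertilizer", "sunlight"]
SUFFIXES = ["please", "really soon"]

def build_phrase(thing, severity):
    """Assemble one sentence; severity is clamped to the available entries."""
    request = REQUESTS[min(severity, len(REQUESTS) - 1)]
    suffix = SUFFIXES[min(severity, len(SUFFIXES) - 1)]
    return f"{request} {THINGS[thing]}, {suffix}"

print(build_phrase(1, 0))   # I'd like some water, please
print(build_phrase(1, 2))   # I'm going to die without water, really soon
```

Three short lists here yield 3 x 4 x 2 combinations, which is the storage saving the approach is after; the trade-off is less control over the exact wording of each message.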

Heater. · 2015-03-03 17:01

Of course a CSV as Bruce showed will not allow for having commas in the messages.

Better to use some common standard format like JSON. Stay away from XML.
[code]
[
{
"index": 1,
"English": "This is a message, how do you like it?",
"German": "Das ist eine Botschaft , wie gef

idbruce · 2015-03-03 17:09

Heater

Yea, when it comes to data that has a common denominator, such as the "message", I like CSV files, because they are simple, and if you want to change languages, no sweat, just translate the original file, and you are good to go.

And as previously mentioned, CSV files can be uploaded to EEPROM with the FILE directive, providing they fit into EEPROM, if not, they can be stored on SD and be parsed just as easily.

CSV is the way to go

idbruce · 2015-03-03 17:15

Of course a CSV as Bruce showed will not allow for having commas in the messages.

Now wait a minute..... You could very easily have a comma delimiter to be recognized during parsing.... but not much point in that, because it would just slow down processing. If commas won't do it for you, because you need commas in your messages and you don't want to slow down processing, just pick another character that you will not use.

E.G.

CSV File (with altered separator)

1~complex message~average message~minimal message
2~hello friend good to see you on this fine day~hello my friend~hi
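Parsing a table like the one above takes very little code. The following Python sketch (the function name and argument order are assumptions) finds the record by its leading number and returns the requested field, with field 1 being the first variant.

```python
def get_message(table_text, line, field, sep="~"):
    """Return the given field (1 = first variant) from the record numbered `line`."""
    for record in table_text.splitlines():
        parts = record.split(sep)
        if parts[0] == str(line):
            return parts[field]          # parts[0] is the record number
    raise KeyError(line)

TABLE = ("1~complex message~average message~minimal message\n"
         "2~hello friend good to see you on this fine day~hello my friend~hi\n")
print(get_message(TABLE, 2, 3))          # hi
```

The separator is a parameter, so the same routine works for the comma version as long as the messages themselves contain no commas.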

idbruce · 2015-03-03 17:54

On the other hand, if you knew ahead of time, which message you wanted to send, then you could eliminate the number system, and just loop until you reach the proper file line and then parse that line individually to get the correct message. So in other words.....

// User levels; each value is also the CSV field (column) to read.
// line, field and User are assumed to be declared elsewhere.
enum { BEGINNER = 1, AVERAGE_USER = 2, EXPERT = 3 };

if(x_situation_occurs)
{
	line = 1;

	switch(User)
	{
		case BEGINNER:
			field = 1;
			break;

		case AVERAGE_USER:
			field = 2;
			break;

		// Expert
		default:
			field = 3;
			break;
	}
}

if(y_situation_occurs)
{
	line = 2;

	switch(User)
	{
		case BEGINNER:
			field = 1;
			break;

		case AVERAGE_USER:
			field = 2;
			break;

		// Expert
		default:
			field = 3;
			break;
	}
}

idbruce · 2015-03-03 18:16

One more point for now, because I do not believe anyone has brought this subject up, but if you run out of space on the EEPROM, you could add on-board flash memory with a chip or by adding the Propeller Memory Card to your project. Here is a snip from the Propeller Memory Card datasheet:

The Winbond W25Q32FV provides 4 megabytes, for 32 megabits, of nonvolatile flash memory. The flash
memory is arranged in 256-byte pages, 4-kilobyte sectors, and 64-kilobyte blocks. The flash controller
can erase individual sectors and blocks, or the entire device. It supports a Serial Peripheral Interface
(SPI) in 1-bit, 2-bit, or 4-bit data modes. It also supports a Quad Peripheral Interface which uses a 4-bit
command and data bus. It supports a clock rate of up to 104 MHz, although the Propeller
microcontroller’s interface runs at a lower clock rate.

msrobots · 2015-03-03 18:57

Heater. wrote: »

Of course a CSV as Bruce showed will not allow for having commas in the messages.

Better to use some common standard format like JSON. Stay away from XML.
...With that in place it's easy to maintain, easy to write a few lines of Python or whatever to create whatever format the target needs.

Well, CSV by definition uses quoted text for strings and unquoted text for numbers. Thus commas inside the quoted text are no problem. Quotes inside the quoted text need to be escaped by doubling them. The specification also allows strings to be unquoted as long as they do not contain spaces. This one is really stupid.

Funny thing is that line breaks are allowed inside of quoted text, thus one record in a CSV file can span multiple lines. As usual end of record can be CR or LF or CRLF.

The field delimiter usually is a comma (hence the name CSV) but can be a semicolon or a TAB also.
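On the PC side these quoting rules rarely need to be implemented by hand. As a small demonstration, Python's standard csv module reads quoted commas, doubled quotes, and line breaks inside quoted fields; the sample records below are invented for the example.

```python
import csv
import io

raw = ('1,"Water low, please refill","He said ""stop"""\r\n'
       '2,"first line\nsecond line",plain\r\n')

# newline="" leaves line endings untouched so the csv module can interpret them
rows = list(csv.reader(io.StringIO(raw, newline="")))
for row in rows:
    print(row)
```

The second record spans two physical lines but comes back as a single row, which is exactly the case that makes hand-written CSV parsing on a small target painful.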

CSV is a widely used data exchange format, often used to copy databases from one system to the other. I doubt that JSON is more common than CSV.

Both of them are a pain to parse on a prop.

@Erlend,

I would go for a fixed length record database approach. So you can calculate the position of the text on the media by multiplication of the index with the record length. This will work with a big EEPROM as well as with a SD card.

When using an EEPROM you will need to write a small program for the prop, and another for the PC to copy data into the EEPROM. There is some Hydra Asset Manager (HAM) around doing basically that for Andre's HYDRA propeller system.

SD cards are nice to move from PC to Prop. As long as you just have your texts on the SD card, you just need a block driver. That is one cog and some init in SPIN. Start with FSRW and throw out most of it.

On the PC side you need to format the SD card and then copy one single file on it. The file will always begin with the same sector number (depending on size of SD card) and will be saved in continuous sectors on the card.

On the propeller you can use the block driver to get a 512 bytes sector into a RAM buffer and use the saved strings in it. (Index * RecordLength / 512) + StartSectorNumber will give you the right sector. So you can use a FAT32 formatted SD card on the PC and on the prop you do not need to support FAT32, just one continuous file starting at StartSectorNumber.
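The sector calculation, plus the PC-side packing that makes it work, can be sketched as follows in Python. The function names are hypothetical, and the start sector used in the example call is a made-up value, since the real one depends on the card and how it was formatted.

```python
SECTOR_SIZE = 512

def locate(index, record_length, start_sector):
    """Return (sector number, byte offset within that sector) for a record."""
    byte_pos = index * record_length
    return start_sector + byte_pos // SECTOR_SIZE, byte_pos % SECTOR_SIZE

def pack_records(messages, record_length):
    """Pad every message with zero bytes to exactly record_length bytes."""
    out = bytearray()
    for text in messages:
        data = text.encode("ascii")
        if len(data) >= record_length:           # keep room for a terminator
            raise ValueError(f"message too long for {record_length}-byte record")
        out += data.ljust(record_length, b"\0")
    return bytes(out)

# start_sector=8192 is a made-up value; the real one depends on the card.
print(locate(5, 128, 8192))      # (8193, 128)
```

With a record length that divides 512 evenly, a record never straddles two sectors, so one sector read is always enough.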

Easy on the propeller side, more work on the PC side to create the file and format and copy SD.

Enjoy!

Mike

And - sure - the record length should fit nicely in a 512 byte sector. Say 128 or 256 bytes per message or so.

Ariba · 2015-03-03 19:34

I would just write the messages in the Spin Tool as zero terminated string data.
It's very easy to access it from Spin as long as the messages fit in the RAM. If they don't fit anymore, you already have the texts in the right format to make a method that writes them to the Upper EEPROM or to an SD card and automatically generates an index with the addresses. The msg_out() method shows how you get the addresses of the strings by message-numbers. With some modifications you can do the same if the messages are in the EEPROM. And you can write the whole message data from msg11 to msgend to the EEPROM or SD card as a single block.

PUB main
  term.str(@msg13)
  msg_out(3)
  ...

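PUB msg_out(number) : i | p
  repeat p from @msg11 to @msgend   ' the loop itself advances p one byte per pass
    if i==number                    ' p is at the start of the wanted string
       term.str(p)
       quit
    if byte[p]==0                   ' end of a string: count it
       i++
       if byte[p+1]==0              ' an empty string marks the end of the list
         quit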

DAT
msg11   byte  "please switch the machine on",0
msg12   byte  "the water is too hot to drink",0
msg13   byte  "do you want sugar and milk?",0
msg21   byte  "Why not Tea?",0
msgend  byte  0
' ...

Andy

kwinn · 2015-03-03 19:59

Cut and paste into the propeller source code so you can start debugging the basic code using hub ram only. Then expand to the eeprom/sd/whatever version.

kwinn · 2015-03-03 20:05

Heater. wrote: »

Do not use a spread sheet. Spread sheets are for numbers and calculating.

Keep all your messages in plain text files.

That way you can easily keep them with all your source code in a source code management system. Like git, preferably up on GitHub.

You will need some format to identify messages, perhaps by number. Simple CSV might do.

Consider that this may be the "source" format of your messages. If you need to convert that into a different format for use by the Prop, that can be as simple as a few lines of Perl or Python or the language of your choice.

Spreadsheets are great for creating this type of text file. Start by entering the data in the spreadsheet, save it, then save it as a CSV and use a text editor on that file if needed.

This way you can use spreadsheet functions to do what they are good for, and a word processor for the rest.

kwinn · 2015-03-03 20:21

Heater. wrote: »

Of course a CSV as Bruce showed will not allow for having commas in the messages.

Better to use some common standard format like JSON. Stay away from XML.
[code]
[
{
"index": 1,
"English": "This is a message, how do you like it?",
"German": "Das ist eine Botschaft , wie gef

JonnyMac · 2015-03-03 20:22

I would just write the messages in the Spin Tool as zero terminated string data.

Me too.

I use strsize() to iterate through a list of strings -- like this:

pub main | idx

  term.start(RX1, TX1, %0000, 115_200)

  repeat idx from 3 to 0
    term.str(get_msg(@msg00, idx))
    term.tx(term#CR)

  repeat
    waitcnt(0)


pub get_msg(p_base, ofs)

  repeat while (ofs > 0)                                         ' if not at target
    p_base += strsize(p_base) + 1                                ' skip this string + trailing 0
    --ofs                                                        ' decrement offset

  return p_base                                                  ' return pointer to target string

  
dat

  msg00       byte      "please switch the machine on", 0
  msg01       byte      "the water is too hot to drink", 0
  msg02       byte      "do you want sugar and milk?", 0
  msg03       byte      "Why not Tea?", 0

kwinn · 2015-03-03 20:31

Heater. wrote: »

idbruce,

That's more like it.

I once worked on a project that needed thousands of messages. It needed translations in many languages. Those messages had to be available quickly for a very complex GUI. Memory space was really tight.

Solution: Keep the messages on disk and pull them in on demand.

A linear parse/search through the text file(s) was very slow given the disks and CPUs of the day. That required building an index of message numbers vs location on disk so as to be able to access them at any reasonable speed. A quick binary search through the index and seek to the file offset worked a treat.

A little more complicated but well worth doing if there is a lot of text to search through. If the text were to be downloaded serially or via bluetooth to whatever form of storage ends up being used an index could be created as the text is being received and stored.

kwinn · 2015-03-03 21:14

Writing the messages in the Spin Tool as zero terminated string is the simplest way to go if everything fits in memory, and what I first suggested. Even that would be made easier by using a spreadsheet. Only the message text would need to be entered, the rest (numbering, "long", and zero terminator) can be done automatically on the spreadsheet, saved as a csv, then edited further with a text editor if need be.

An even better approach IMHO would be to leave the .csv file as is, have the PC send it to the propeller, and have a program on the propeller format and write the data to the upper 32K of the eeprom, creating an index while it writes the text, and finally writing that index to the eeprom.

The software needed for this would most likely take up less space than the text it moves to the upper 32K of the EEPROM, and JonnyMac's parsing routines would make formatting fairly simple. I have also added a routine to FullDuplexSerial that will receive and store a string of data to a byte array, which I will post some time tomorrow.
