Help parsing SMS

pgbpsu · 2011-02-25 22:27

I'm trying to add an SMS parser to some work that Jay Kickliter did some time ago.

http://forums.parallax.com/showthread.php?107112-Cell-phone-text-message-email-interface

He figured out how to hook a Prop up to a cell phone and use the prop to send SMS. I've got his code modified to work with my system, but now I'd like to add the ability for the Prop to parse SMS messages received by the phone. I've been working at this for a couple of days, but haven't gotten anything working yet. I'm not sure I've got the right approach.

Some fresh eyes and some input would be greatly appreciated.

I don't know how many messages are on the phone ahead of time. I can gather that info from the phone in a message that looks like this:

+CPMS: 3,250,3,250,3,250

The first 3 in this message means there are 3 messages stored in memory.
Just to show you how bad I am with strings: I wasn't able to pull that "3" out of this response. Anything I could think of would get a single-digit "3", but what if it were "23" or "133"? So I used a different command which lists ALL the messages, and I simply count the number of times I see the AT command leader (+CMGL, which always comes at the beginning of each message being displayed).
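One way to read the count regardless of how many digits it has is to skip past the colon and then accumulate decimal digits until the first non-digit. The sketch below is written in Python purely to desk-check the logic on a PC; it is not Spin, and `cpms_used_count` is a made-up name. The same loop should port to a byte-by-byte walk over the response buffer.

```python
def cpms_used_count(response):
    """Return the first number after '+CPMS:' or -1 if none is found."""
    start = response.find("+CPMS:")
    if start < 0:
        return -1
    i = start + len("+CPMS:")
    # Skip the blank(s) between the colon and the first digit.
    while i < len(response) and response[i] == " ":
        i += 1
    count = 0
    seen_digit = False
    # Accumulate one digit at a time, the way a byte loop would.
    while i < len(response) and "0" <= response[i] <= "9":
        count = count * 10 + (ord(response[i]) - ord("0"))
        seen_digit = True
        i += 1
    return count if seen_digit else -1
```

The loop stops at the first comma, so "3", "23" and "133" are all handled the same way.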

+CMGL: 1,"REC READ","+14186552150","Joseph cell","11/02/24,19:53:35-20",145,9
Message 1
+CMGL: 2,"REC READ","+14187834806","Google voice","11/02/25,23:44:56-20",145,46
Here's what they said "Get ready here we come"
+CMGL: 4,"REC READ","+14183443982","Mary cell","11/02/25,21:34:00-20",145,24
This is from marys phone

OK

Messages found: 3

I'm willing to use what I've got. What I'd like to do is parse each message and compare the sender's number against a short list. If the sender is on the list, I'd like to parse the body of the message. If certain strings are contained in the body of the message, I'll have the prop do different things. I've only got 4 commands to watch for:

1) on
2) off
3) force
4) reboot
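A small caution with plain substring matching: "on" also occurs inside ordinary words such as "phone". The sketch below (Python used only to check the idea on a PC, not Spin; `find_command` is a hypothetical name) compares whole words instead, ignoring case. Matching whole words is my assumption about what is wanted here, not something stated above.

```python
COMMANDS = ("on", "off", "force", "reboot")

def find_command(body):
    """Return the first whole word in body that is a command, else None."""
    word = ""
    # The trailing blank makes sure the last word is checked too.
    for ch in body.lower() + " ":
        if "a" <= ch <= "z":
            word += ch
        else:
            if word in COMMANDS:
                return word
            word = ""
    return None
```

With this, "This is from marys phone" matches nothing, while "Turn OFF now" yields "off".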

My approach has been to list all the messages and count how many there are. As you can see from the list above, they may not be numbered sequentially and the listing may not be in chronological order. I expect to review these messages with the prop every minute, so in practice there will only ever be one message waiting and most times there won't be any. I can set aside some buffer space, but I don't want to try to accommodate all the chars in all the messages because I don't know how many that might be and can't set aside 1/2 the prop's memory just for this. So I thought I'd parse each message individually. The messages themselves are limited to 160 characters; add another 140 for the header info and 300 bytes should fit everything. The messages I'm after are so short that there should be lots of empty array elements.
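The single fixed buffer idea can be sketched as a line assembler that takes one received byte at a time, drops CRs, ends a line at LF, and refuses to write past its capacity. This is Python for desk-checking only (not Spin); `LineAssembler` and `lines_from` are illustrative names, and the 300-byte capacity is just the figure estimated above.

```python
class LineAssembler:
    def __init__(self, capacity=300):
        self.capacity = capacity
        self.buf = bytearray()

    def feed(self, byte):
        """Add one received byte; return a finished line (str) or None."""
        if byte == 0x0D:                      # ignore CR
            return None
        if byte == 0x0A:                      # LF ends the current line
            line = self.buf.decode("ascii", errors="replace")
            self.buf = bytearray()
            return line
        if len(self.buf) < self.capacity:     # drop overflow instead of overrunning
            self.buf.append(byte)
        return None

def lines_from(data, capacity=300):
    """Feed every byte of data and collect the non-empty lines."""
    asm = LineAssembler(capacity)
    out = []
    for b in data:
        line = asm.feed(b)
        if line:                              # skips None and empty lines
            out.append(line)
    return out
```

Because each line is handed over as soon as its LF arrives, the buffer only ever needs to hold one header or one message body at a time.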

I've proven to be pretty dull when it comes to working with strings, so any pointers would be really helpful. Below is the method I'm currently working on, but it's bad, really bad. Below that is some of my debugging which might be helpful to see what's actually in the responses from the phone. Hey, if it were good I wouldn't need help, so here it is in all its "glory".

I should add that messages from the phone are littered with CRs and LFs. I've been trying to get something that would handle quotation marks and commas in the message body, but I don't anticipate needing to parse anything that contains them. Just recognizing the sender's number and the time the message arrived, and checking whether the body matches any of my commands above, would be all I need.

PUB READ_MESSAGES | phoneRx, idx, timeOutCounter, endOfMessages, messageCount, messageNumber

  uart.putc(DEBUG,CR)
  uart.decx(DEBUG,hours,2)
  uart.putc(DEBUG,COLON)
  uart.decx(DEBUG,minutes,2)
  uart.putc(DEBUG,COLON)
  uart.decx(DEBUG,seconds,2)
  uart.str(DEBUG, string(13,"Counting messages.", 13))

  _attention
  DELAY_MS(100)
  uart.rxflush(PHONE)
  uart.str(PHONE,@phone_preffered_storage_set_ME)
  timeOutCounter := cnt
  repeat until((cnt - timeOutCounter > clkfreq * 1) OR phoneRx == "K") ' allow 1 second to do this; then quit
    phoneRx := uart.rxcheck(PHONE)              ' +CPMS: 0,250,0,250,250
    if phoneRx <> -1                            ' 0123456789012345678901234567890
      uart.tx(DEBUG,phoneRx)

  bytefill(@phoneResponse,0,RESPONSE_LENGTH)       ' zero out array
  idx := 0
  messageCount  := 0

  endOfMessages := FALSE

  uart.str(PHONE,@phone_list_all_messages_command) ' AT+CMGL="ALL"
  timeOutCounter := cnt
  repeat until (cnt - timeOutCounter > clkfreq * 5) OR endOfMessages
  ' allow 5 seconds to do this, or stop at the end of the message list
    phoneRx := uart.rxcheck(PHONE)              '
    if phoneRx <> -1                            ' 0123456789012345678901234567890
      phoneResponse[idx] := phoneRx
      uart.tx(DEBUG,phoneRx)

      if idx => 6
        if (phoneResponse[idx-5]==LF  AND phoneResponse[idx-4]=="+" AND {
        }   phoneResponse[idx-3]=="C" AND phoneResponse[idx-2]=="M" AND {
        }   phoneResponse[idx-1]=="G" AND phoneResponse[idx-0]=="L" )
'          uart.str(DEBUG,string(13,"Found message",13))
          messageCount++
          idx := -1       ' set to minus one because of increment below
          bytefill(@phoneResponse,0,RESPONSE_LENGTH)       ' zero out array

      if idx => 4
        if (phoneResponse[idx-3]=="O" AND phoneResponse[idx-2]=="K" AND {
        }   phoneResponse[idx-1]==CR AND phoneResponse[idx]==LF )
          endOfMessages := TRUE
'          uart.str(DEBUG,string(13,"End of messages",13))

      idx++


  uart.str(DEBUG,string(13,"Messages found: "))
  uart.dec(DEBUG,messageCount)
  uart.putc(DEBUG,CR)

  uart.str(DEBUG, string(13,"Reading individual messages.", 13))
  messageNumber := 1
  repeat 5 ' only process 5 messages at a time
    bytefill(@phoneResponse,0,RESPONSE_LENGTH)       ' zero out array
    idx := 0
    _attention
    DELAY_MS(100)
    uart.rxflush(PHONE)
    uart.str(PHONE,@phone_read_message_command)
    uart.dec(PHONE,messageNumber++)
    uart.putc(PHONE,CR)

    timeOutCounter := cnt
    repeat until(cnt - timeOutCounter > clkfreq * 2) ' allow 2 seconds to do this; then quit
      phoneRx := uart.rxcheck(PHONE)                 ' AT+CMGR=1
      if phoneRx <> -1                               ' 0123456789012345678901234567890
        uart.putc(DEBUG,SPACE)
        uart.tx(DEBUG,phoneRx)
        uart.putc(DEBUG,COLON)
        uart.hex(DEBUG,phoneRx,2)
        if phoneRx == "," 'OR phoneRx == CR OR phoneRx == LF   ' parse incoming message on commas, CR, and LF
          phoneResponse[idx++] := 0
        else
          phoneResponse[idx++] := phoneRx

    if (idx > 6 AND strcomp(@phoneResponse[idx-6],@okString))
      messageCount--
      uart.str(DEBUG, string(13,"OK: "))
      'uart.dec(DEBUG,CREATE_POINTERS(idx))
      'PROCESS_MESSAGE
    elseif (idx > 9 AND strcomp(@phoneResponse[idx-9],@errorString))
      uart.str(DEBUG, string(13,"ERROR",13))
    else
      uart.str(DEBUG, string(13,"UNKNOWN",13))


return

Here is the output when requesting message number 4 from above and a fifth message that doesn't exist. Each received byte is shown as a space, followed by the ASCII character, followed by a colon, followed by the hex value of the character.

Messages found: 3

Reading individual messages.

:0D
:0A +:2B C:43 M:4D G:47 R:52 ::3A  :20 ":22 R:52 E:45 C:43  :20 R:52 E:45 A:41 D:44 ":22 ,:2C ":22 +:2B 1:31 8:38 1:31 4:34 6:36 5:35 7:37 2:32 1:31 5:35 0:30 ":22 ,:2C ":22 J:4A o:6F s:73 e:65 p:70 h:68  :20 c:63 e:65 l:6C l:6C ":22 ,:2C ":22 1:31 1:31 /:2F 0:30 2:32 /:2F 2:32 4:34 ,:2C 1:31 9:39 ::3A 5:35 3:33 ::3A 3:33 5:35 -:2D 2:32 0:30 ":22 ,:2C 1:31 4:34 5:35 ,:2C 4:34 ,:2C 0:30 ,:2C 0:30 ,:2C ":22 +:2B 1:31 3:33 1:31 2:32 3:33 1:31 4:34 9:39 6:36 2:32 1:31 ":22 ,:2C 1:31 4:34 5:35 ,:2C 9:39
:0D
:0A M:4D e:65 s:73 s:73 a:61 g:67 e:65  :20 1:31
:0D
:0A
:0D
:0A O:4F K:4B
:0D
:0A
OK:
:0D
:0A +:2B C:43 M:4D G:47 R:52 ::3A  :20 ":22 R:52 E:45 C:43  :20 R:52 E:45 A:41 D:44 ":22 ,:2C ":22 +:2B 1:31 8:38 1:31 4:34 8:38 0:30 8:38 5:35 8:38 0:30 6:36 ":22 ,:2C ":22 G:47 o:6F o:6F g:67 l:6C e:65  :20 v:76 o:6F i:69 c:63 e:65 ":22 ,:2C ":22 1:31 1:31 /:2F 0:30 2:32 /:2F 2:32 5:35 ,:2C 2:32 3:33 ::3A 4:34 4:34 ::3A 5:35 6:36 -:2D 2:32 0:30 ":22 ,:2C 1:31 4:34 5:35 ,:2C 4:34 ,:2C 0:30 ,:2C 0:30 ,:2C ":22 +:2B 1:31 3:33 1:31 2:32 3:33 1:31 4:34 9:39 6:36 2:32 1:31 ":22 ,:2C 1:31 4:34 5:35 ,:2C 4:34 6:36
:0D
:0A H:48 e:65 r:72 e:65 ':27 s:73  :20 w:77 h:68 a:61 t:74  :20 t:74 h:68 e:65 y:79  :20 s:73 a:61 i:69 d:64  :20 ":22 G:47 e:65 t:74  :20 r:72 e:65 a:61 d:64 y:79  :20 h:68 e:65 r:72 e:65  :20 w:77 e:65  :20 c:63 o:6F m:6D e:65 ":22
:0D
:0A
:0D
:0A O:4F K:4B
:0D
:0A
OK:
:0D
:0A +:2B C:43 M:4D G:47 R:52 ::3A  :20 ":22 R:52 E:45 C:43  :20 R:52 E:45 A:41 D:44 ":22 ,:2C ":22 +:2B 1:31 8:38 1:31 4:34 2:32 3:33 2:32 0:30 5:35 2:32 1:31 ":22 ,:2C ":22 M:4D a:61 r:72 y:79  :20 c:63 e:65 l:6C l:6C ":22 ,:2C ":22 1:31 1:31 /:2F 0:30 2:32 /:2F 2:32 5:35 ,:2C 2:32 1:31 ::3A 3:33 4:34 ::3A 0:30 0:30 -:2D 2:32 0:30 ":22 ,:2C 1:31 4:34 5:35 ,:2C 4:34 ,:2C 0:30 ,:2C 0:30 ,:2C ":22 +:2B 1:31 3:33 1:31 2:32 3:33 1:31 4:34 9:39 6:36 2:32 1:31 ":22 ,:2C 1:31 4:34 5:35 ,:2C 2:32 4:34
:0D
:0A T:54 h:68 i:69 s:73  :20 i:69 s:73  :20 f:66 r:72 o:6F m:6D  :20 m:6D a:61 r:72 y:79 s:73  :20 p:70 h:68 o:6F n:6E e:65
:0D
:0A
:0D
:0A O:4F K:4B
:0D
:0A
OK:
:0D
:0A E:45 R:52 R:52 O:4F R:52
:0D
:0A
UNKNOWN

:0D
:0A E:45 R:52 R:52 O:4F R:52
:0D
:0A
UNKNOWN

Thanks for reading and for any suggestions.
Regards,
Peter

StefanL38 · 2011-02-26 04:08

Hi Peter,

I'd like to ask for a couple of things

1. complete code (that I can setup a prop to receive a string with the same characters as from your phone from a serial terminal)

and having this I will see automatically which serial object you are using and you don't have to mention it.
Maybe it is a timing problem, maybe it is a buffer-overwrite problem.

2. a simplified version of your code just showing how you try to pull out the "3" and nothing else

3. debug output not as hexadecimal but as ASCII characters, with control bytes shown as acronyms, like hex 0A as <LF> and hex 0D as <CR>
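As an illustration of that kind of debug output, here is a minimal sketch that prints printable bytes as themselves and control bytes as bracketed names. It is Python meant for trying out the format on a PC, not Spin, and `readable` and `CONTROL_NAMES` are made-up names.

```python
CONTROL_NAMES = {0x0D: "<CR>", 0x0A: "<LF>", 0x00: "<NUL>"}

def readable(data):
    """Render received bytes with control characters spelled out."""
    parts = []
    for b in data:
        if b in CONTROL_NAMES:
            parts.append(CONTROL_NAMES[b])
        elif 0x20 <= b <= 0x7E:               # printable ASCII
            parts.append(chr(b))
        else:                                 # anything else as two hex digits
            parts.append("<%02X>" % b)
    return "".join(parts)
```

A reply of CR, LF, "OK", CR, LF then reads as `<CR><LF>OK<CR><LF>`, which makes the line structure visible at a glance.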

For me it will be MUCH EASIER if I can send an emulated string to the propeller running your code than to think everything through in my head just reading your receive method PUB READ_MESSAGES

This may sound unprofessional but in fact it IS professional to check and analyse everything step by step.

Did you do some step by step testing?
I mean sending a SINGLE request to your phone which makes the phone respond with a short message
and just echo the received bytes to a PC-serial terminal?
Checking if each byte is received the same way as if the PC-serial terminal is DIRECTLY connected to the phone?

best regards

Stefan

pgbpsu · 2011-02-26 04:38

Hi StefanL38

Thanks for your response.

I should have mentioned that I'm using the 4 ports-in-one cog serial object, which is why you see lines which have DEBUG and PHONE. DEBUG sends info back to the PC and PHONE sends chars to the phone.

I have been testing things step-by-step. It was quite late here when I posted my question and I may not have made my problem clear. I have no problem interacting with the phone. I can send commands and get responses with no problem. My problem is more "How would you approach this problem?" and less "What's wrong with my code?"

My question really is "If you had the following text coming across a serial port into the prop, how would you break out the different fields?" Keep in mind the number of characters in each field is not constant from one message to another.

The messages come from the phone in one of two formats.
Format 1:

+CMGL: 1,"REC READ","+18142150657","Joseph cell","11/02/24,19:53:35-20",145,9
Message 1
+CMGL: 2,"REC READ","+18145806808","Google voice","11/02/25,23:44:56-20",145,46
Here's what they said "Get ready here we come"
+CMGL: 3,"REC READ","+18140521232","Mary cell","11/02/25,21:34:00-20",145,24
This is from marys phone

OK

Format 2:

+CMGR: "REC READ","+18142150657","Joseph cell","11/02/24,19:53:35-20",145,4,0,0,"+13123149621",145,9
Message 1

OK

+CMGR: "REC READ","+18145806808","Google voice","11/02/25,23:44:56-20",145,4,0,0,"+13123149621",145,46
Here's what they said "Get ready here we come"

OK

+CMGR: "REC READ","+18140521232","Mary cell","11/02/25,21:34:00-20",145,4,0,0,"+13123149621",145,24
This is from marys phone

OK

ERROR


ERROR

I can get the phone to spit back either of the snippets above. What I can't do is get the different fields into an array of strings. I was following Perry's GPS parser as an example, but I don't have an easily identifiable start char like the NMEA messages do. Any character I can think of is also a character that might show up in an actual message.
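For what it's worth, the header line can be split on commas while ignoring any comma that sits between quotation marks, which keeps the date/time field in one piece. The sketch below is Python for checking the logic on a PC (not Spin); `split_fields` and `cmgr_fields` are hypothetical names. It only looks at the header line, so quotes and commas in the message body never enter into it.

```python
def split_fields(header):
    """Split on commas that are outside double quotes; quotes are dropped."""
    fields = []
    current = ""
    in_quotes = False
    for ch in header:
        if ch == '"':
            in_quotes = not in_quotes         # toggle and discard the quote
        elif ch == "," and not in_quotes:
            fields.append(current)            # a separator ends the field
            current = ""
        else:
            current += ch
    fields.append(current)                    # last field has no trailing comma
    return fields

def cmgr_fields(line):
    """Return the fields of one '+CMGR: ' header line, or None."""
    prefix = "+CMGR: "
    if not line.startswith(prefix):
        return None
    return split_fields(line[len(prefix):])
```

On a Prop the same idea could write a zero over each separating comma and remember where each field starts, so no copying is needed.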

Thanks again for looking over this.
Regards,
Peter

StefanL38 · 2011-02-26 10:08

If there really were no regular pattern in the byte sequence, there would be no chance.
But I think there ARE regular patterns.

From the messages you posted I assume they all begin with "+CMGR: "

But it is still not clear to me what belongs to the message and what not

is "This is from marys phone" your comment or a part of the message???
You have to make REALLY clear what is the bytesequence of the message only.

Use a "~" to indicate where the message begins and ends

and you have to provide the FULL variety of what messages can look like

my first suggestion is checking the incoming bytesequence for the character SEQUENCE "+CMGR: "

you can do this using the strcomp command (compare strings)
or by using sequenced flags: if character "+" arrives, set flag plus_received

if plus_received
  check for character "C"; if yes set flag C_received
  else plus_received := false

if plus_received and C_received
  check for "M" received; if yes set flag M_received
  else
    plus_received := false
    C_received := false

etc. etc.
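The chain of flags can also be collapsed into a single counter that records how many characters of the pattern have matched so far. The sketch below is Python for desk-checking (not Spin), and `PrefixMatcher` is a made-up name. Its simple restart rule is only correct for patterns whose first character does not occur again later in the pattern, which holds for "+CMGR: ".

```python
class PrefixMatcher:
    def __init__(self, pattern="+CMGR: "):
        self.pattern = pattern
        self.matched = 0                      # characters matched so far

    def feed(self, ch):
        """Feed one character; return True when the whole pattern was seen."""
        if ch == self.pattern[self.matched]:
            self.matched += 1
            if self.matched == len(self.pattern):
                self.matched = 0              # ready for the next message
                return True
        elif ch == self.pattern[0]:
            self.matched = 1                  # mismatch, but a new match may start here
        else:
            self.matched = 0                  # mismatch, start over
        return False
```

Feeding the characters of `x+C+CMGR: "REC` one at a time reports a match only on the blank after the colon.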

after receiving a ":" and a blank check if next character is a digit until a comma "," is detected

if you quoted the messages just the way you receive them, parts of the message are separated by ","
so you can check for this

you can do this by setting a flag
leading " detected
trailing " detected
then next character has to be a comma
then next character has to be a leading "

if numbers always start with a + you can use this in a similar way

another pattern might be counting the character sequences " any text ",
I mean counting how many leading " trailing " followed by a comma have already been detected
after the first the number is following
after the second "Joseph Cell" is following
after the third date is following and this string contains the pattern two digits "/" two digits "/" two digits comma two digits ":" two digits ":" two digits minus two digits " comma
then there is a pattern of digits and commas until the character-sequence ,"+digits"
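The date pattern described above can be checked against a mask, one character at a time. This is a Python sketch for trying the idea on a PC (not Spin); `TIMESTAMP_MASK` and `looks_like_timestamp` are illustrative names, and allowing "+" as well as "-" before the last two digits is my assumption.

```python
TIMESTAMP_MASK = "dd/dd/dd,dd:dd:ddsdd"       # d = digit, s = sign, others literal

def looks_like_timestamp(text):
    """True if text has the shape 11/02/24,19:53:35-20."""
    if len(text) != len(TIMESTAMP_MASK):
        return False
    for ch, kind in zip(text, TIMESTAMP_MASK):
        if kind == "d":
            if not ("0" <= ch <= "9"):
                return False
        elif kind == "s":
            if ch not in "+-":
                return False
        elif ch != kind:                      # literal "/", "," or ":" must match
            return False
    return True
```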

If all these things are varying you first have to find out what stays still constant in the way described above. Might be pretty much coding of if-conditions but if you want to have it automated it has to be done

best regards

Stefan

kwinn · 2011-02-26 11:45

Parsing this looks to be pretty straightforward.

1- Each record starts with either +CPMS:, +CMGL:, or +CMGR: so you can start by looking for one of those three 6 character groups.

2- The rest of the fields are separated by commas, so use them as field separators/end of field characters.

3- If it is +CPMS: the fields are numeric. Each field can be scanned and stored/processed if required.

4- The +CMGL and +CMGR formats appear to be identical for the message body. Each record has the +Cxxx: header and the fields outlined below. The last numeric field before the actual text message appears to be the length of the text message.

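Field 1 - "REC READ" - Message status. Appears to be the same for all so can be ignored.
Field 2 - "+18140521232" - The sender's phone number
Field 3 - "Mary cell" - The sender's name
Field 4 - "11/02/25,21:34:00-20" - Date and time of the message
Field 5 - 145 - Most likely the address type of the sender's number (145 means international format, with the leading +), not a length
Fields 6, 7, 8 - 4,0,0 - Most likely the first octet, protocol ID, and data coding scheme of the message; can be ignored
Field 9 - "+13123149621" - Most likely the service center (SMSC) number rather than the recipient's; it is the same in every message
Field 10 - 145 - Address type of the number in field 9
Field 11 - 24 - Length of text message
Field 12 - This is from marys phone - Actual text message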
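If the last header field really is the length of the text, as it appears to be in the samples, the body can be taken by count instead of by searching for a terminator. The sketch below is Python for checking the idea on a PC (not Spin); `body_by_length` is a hypothetical name, and it assumes the header line ends with CR LF as in the posted debug output.

```python
def body_by_length(response):
    """Return the message text of one +CMGR reply, using the length field."""
    start = response.find("+CMGR:")
    if start < 0:
        return None
    header_end = response.find("\r\n", start)
    if header_end < 0:
        return None
    header = response[start:header_end]
    # The length is whatever follows the last comma of the header line.
    last_field = header.rsplit(",", 1)[-1].strip()
    if not last_field.isdigit():
        return None
    length = int(last_field)
    body_start = header_end + 2               # skip the CR LF after the header
    return response[body_start:body_start + length]
```

Taking the body by count means a text that itself contains "OK" or a line break is still read in full.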

Hope this helps

pgbpsu · 2011-02-26 11:54

Hi Stefan-

Thanks for your comments and suggestions. The path I was going down was more or less what you describe. Because of the variability of the messages there's no easy way to simplify this any further than lots of individual checking.

I think I've decided to use the second format because I can request messages individually. So yes, each one will begin with +CMGR:. If I can scan for that, I can tell when a new message has begun. Unfortunately the phone number (the second field if one uses a comma as a delimiter) sometimes contains a + and sometimes doesn't.
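One way around the optional + is to reduce every number to its digits before comparing it with the allowed list. The sketch below is Python for desk-checking (not Spin); `ALLOWED`, `digits_only` and `is_allowed` are made-up names, and the list entries are just sample numbers from this thread.

```python
ALLOWED = ("18142150657", "18140521232")      # example entries only

def digits_only(number):
    """Strip '+', quotes, blanks and anything else that is not 0-9."""
    return "".join(ch for ch in number if "0" <= ch <= "9")

def is_allowed(number, allowed=ALLOWED):
    """True if the number, with or without a leading '+', is on the list."""
    return digits_only(number) in allowed
```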

It looks like I'll be doing this by brute force since there are too many possibilities. Thanks for the suggestions. If anyone else has a clever way of doing this, please let me know.

Regards,
Peter

kwinn · 2011-02-27 10:41

Peter, I think you may be overestimating the difficulty of this. If I understand what you want to do, all you need to do is the following:

1 - Check the first 6 characters to see if they are +CMGR:
    if they are not, go to the next message
2 - Step through each character until the next comma is encountered
    what follows is the sender's phone number
    step past the quotation mark
    step past the + sign if it is present
    store the next 11 characters in the "sender" variable
3 - Step through each character until the 10th comma is encountered (not counting the comma inside the quoted date/time field)
    what follows is the message length
    step past it until you reach the first alpha character, which is the start of the text message
    store the alpha characters (maximum of 6) in the "command" variable
4 - Go do whatever comparisons and processing you require.
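The steps above can be sketched roughly as follows. This is Python for checking the logic on a PC, not Spin, and `sender_and_command` is a hypothetical name. It counts only commas outside quotation marks, so that the comma inside the date/time field does not shift the count, and it lower-cases the command, which is my addition.

```python
def sender_and_command(message):
    """Return (sender, command) from one '+CMGR:' reply, or None."""
    if not message.startswith("+CMGR:"):
        return None                           # step 1: not a message record
    i = message.find(",")
    if i < 0:
        return None
    i += 1                                    # step 2: just past the first comma
    if message[i:i + 1] == '"':
        i += 1
    if message[i:i + 1] == "+":
        i += 1
    sender = message[i:i + 11]
    commas = 0
    in_quotes = False
    i = 0
    while i < len(message) and commas < 10:   # step 3: find the 10th comma
        ch = message[i]
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            commas += 1
        i += 1
    while i < len(message) and not message[i].isalpha():
        i += 1                                # skip the length, CR and LF
    command = ""
    while i < len(message) and message[i].isalpha() and len(command) < 6:
        command += message[i]
        i += 1
    return sender, command.lower()
```

Step 4 is then a matter of checking `sender` against the allowed numbers and `command` against the four commands.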

pgbpsu · 2011-02-27 17:25

Kwinn-

I think you are right that I was making this more difficult than it needed to be. I was worrying about all possible text messages, but I think you are right. I need to figure out how to handle the command messages and just ignore the rest.

This system was to be installed at my brother's sugar bush. It's springtime (almost) here in the NE USA, which means sugar maples are starting to produce maple sap which can be boiled down into maple syrup. My brother has about 3000 taps spread across 4 sugar bushes. His largest one is about 40 miles from the location where he boils it down. I was putting together this little system to keep track of the temperature and the volume of sap collected. I had to deploy this setup today before finishing the command parsing. As it stands it sends messages on the hour. The commands were really for control of the monitoring and SMSs only. I set it up to send hourly SMS between 6AM and midnight. He isn't really concerned about the night time so it will probably work fine the way it is. But when I get it back, I'll work out the command parsing. The ability to reboot things would be helpful. If the system proves reliable, maybe next year we'll hook up the vacuum pump to the system so it can be controlled either by some SPIN program or via the cell phone. For this year I got my feet wet with a temp sensor and a hall sensor that gives an indication of volume. Switching 220VAC is a bit beyond me, but I might be back for more help with that after the season is over.

Thanks again for the suggestions
Peter

kwinn · 2011-02-28 22:51

pgbpsu, Always happy to help. Good luck with the maple syrup season this year. You may want to take a look at the string routines in the obex for the command parsing. As for dealing with 220V, it is not that difficult. You need to be aware of the safety precautions, but there are numerous solid state relays and other devices to help in that regard.
