One question that comes up frequently in the forums goes something like, "How do I extract latitude and longitude from GPS sentences?" Extracting data from incoming character streams is a common requirement, which usually entails searching for patterns in the character stream, so you know where to look for the data. PBASIC includes rudimentary pattern matching with its input WAIT modifier, but no such facility is native to Spin.
A commonly used and very powerful pattern matching tool can be found in regular expressions. It's beyond the scope of a forum post to describe regular expressions in any detail, but there are several good online references that do so:
····www.regular-expressions.info/tutorial.html
····en.wikipedia.org/wiki/Regular_expression
····etext.lib.virginia.edu/services/helpsheets/unix/regex.html
To facilitate some upcoming GPS work, I decided to write a regular expression parser and pattern matcher in Spin. It uses pretty much the standard regex vocabulary and includes many of the standard features, but with some notable differences:
1. Only two anchors are supported for now: ^ (string beginning) and $ (string end).
2. The {m,n} repeat count is not yet supported.
3. Rather than extracting all the parenthesized groupings in a pattern, only those which begin with ($1 through ($9 are extracted.
4. My version does not do any backtracking. Once a portion of the string is matched, the matching engine will only move forward. Backtracking is difficult to implement efficiently and often causes a lot of churning to attain a match. Since my engine is written in Spin, backtracking could really slow things to a crawl.
5. Some special escape sequences have not yet been implemented.
6. Most regular expression engines compile the regex first before applying it to a string. In mine, the regex is applied entirely interpretively: the parsing and pattern matching occur simultaneously.
7. This is a matching and extraction engine only: there's no substitution or translation facility built in.
The best way to show what it does is to use a common GPS string as an example. Here you see some NMEA sentences as they might have come from a GPS unit, held in a string buffer somewhere:
    $GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75
    $GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A
····$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48
For this example, what we're interested in is the RMC sentence and the latitude and longitude info it contains:
····$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A
The fields 4807.038,N are the latitude, and the fields 01131.000,E are the longitude. The data can be extracted using regex.spin and the following pattern (regular expression):
····\$GPRMC\s*,[^,]*,[^,]*,($1[\d\.]+)\s*,($2N|S)\s*,($3[\d\.]+)\s*,($4E|W)
The four groups beginning with ($1 through ($4 are the portions of the pattern used for the actual data extraction. This probably looks like a real mess to the uninitiated, and it's a fact that regular expressions are much easier to write than they are to read. But the individual elements are very simple, so I will try to explain them one at a time.
The first thing you see is \$GPRMC. This is there to make sure we're extracting data from the right sentence. The dollar sign is preceded by a backslash because $ by itself has special meaning, and the \ quotes it as a character to match. So the pattern matcher will scan the input string until it sees $GPRMC.
Next is the rather cryptic-looking \s*. \s matches any whitespace character, such as space, CR, LF, and TAB. The * says to match the whitespace characters 0 or more times. Normally, the $GPRMC will be followed immediately by a comma, but this is put in the pattern in case some GPS receiver somewhere decides to throw in some extra blanks.
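If you want to try this piece of the pattern on a PC before loading anything onto a Propeller, here is a minimal sketch. It uses Python's re module rather than regex.spin, on the assumption that \$, \s, and * mean the same thing in both; the padded input string is made up for illustration.

```python
import re

# \$ matches a literal dollar sign; \s* allows zero or more whitespace
# characters between the sentence name and the first comma.
header = re.compile(r"\$GPRMC\s*,")

# Hypothetical inputs: one normal, one with stray blanks before the comma.
normal = "$GPRMC,123519,A"
padded = "$GPRMC  ,123519,A"
other = "$GPVTG,054.7,T"

print(header.search(normal) is not None)   # True
print(header.search(padded) is not None)   # True
print(header.search(other) is not None)    # False
```

The padded case is the reason for including \s* at all: without it, the extra blanks would make the match fail.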
Next comes a comma, which needs to be matched, followed by another odd-looking construction: [^,]*. A list of characters inside square brackets defines a category. Any single character in the input string will match anything included in the category. Prepending the caret ^ to the list of characters means to match everything but the characters in the list. So, taken together with the *, [^,]* means to match zero or more occurrences of anything besides a comma. This, along with the comma itself, is used to skip data fields that we're not interested in.
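The field-skipping idiom can be checked the same way on a PC. This is a sketch in Python's re module, not regex.spin, and it assumes negated bracket sets behave identically in both. It shows how much of the RMC sentence the first part of the pattern consumes.

```python
import re

sentence = "$GPRMC,123519,A,4807.038,N,01131.000,E"

# Each ",[^,]*" consumes one comma plus the field that follows it,
# so two of them skip the time (123519) and status (A) fields.
skip_two = re.compile(r"\$GPRMC\s*,[^,]*,[^,]*,")

m = skip_two.search(sentence)
skipped = m.group(0)          # text consumed so far
rest = sentence[m.end():]     # what is left starts at the latitude

print(skipped)   # $GPRMC,123519,A,
print(rest)      # 4807.038,N,01131.000,E
```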
Next comes ($1[\d\.]+). Anything in parentheses is a group that's treated as a single element. A group that starts with ($ followed by a digit is a special group whose data we want to extract. The digit (1-9) specifies which slot in the return array the extracted data should be stored. The actual data for this group has to match the pattern [\d\.]+. Again the bracketed set is a class consisting of two items: \d, which matches any decimal digit and \., which matches the decimal point. The latter is prepended with the backslash escape because, by itself, it has special significance: a lone period matches any single character. The plus following the class means to match the class one or more times. So, taken together, [\d\.]+ means to match a group of digits and decimal point(s) until something else comes along. This is the numerical part of the latitude and will be stored in position 1 of the results array. (Position 0 is reserved for the portion of the string that matched the entire pattern.)
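For comparison, here is roughly how the same extraction looks in a conventional engine. This sketch uses Python's re module, which has no ($1 syntax: every parenthesized group is captured and numbered by its position, so a plain parenthesis stands in for ($1 here. That substitution is an assumption made for illustration, not something regex.spin accepts.

```python
import re

sentence = "$GPRMC,123519,A,4807.038,N,01131.000,E"

# [\d\.]+ is one or more digits or decimal points. The surrounding
# parentheses make it group 1, playing the role of the post's ($1 group.
lat_pattern = re.compile(r"\$GPRMC\s*,[^,]*,[^,]*,([\d\.]+)")

m = lat_pattern.search(sentence)
whole = m.group(0)      # like position 0: everything the pattern matched
latitude = m.group(1)   # like position 1: the numeric part of the latitude

print(whole)      # $GPRMC,123519,A,4807.038
print(latitude)   # 4807.038
```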
Position 2 of the results will be either N or S. Its pattern (following the comma) is ($2N|S). The vertical bar means just what you think it does: OR. N|S will match either N or S. (It could also have been expressed [NS] to the same effect. But the vertical bar can also be used to separate subpatterns of more than one character, viz. NORTH|SOUTH.)
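The equivalence of N|S and [NS] is easy to confirm on a PC. The sketch below again uses Python's re module as a stand-in, assuming alternation works the same way there as in regex.spin; the "HEADING SOUTH" string is invented for the example.

```python
import re

# For single characters, alternation and a bracketed set agree.
either = re.compile(r"(N|S)")
bracketed = re.compile(r"([NS])")

results = []
for hemi in ("N", "S", "E"):
    a = either.fullmatch(hemi) is not None
    b = bracketed.fullmatch(hemi) is not None
    results.append((hemi, a, b))
    print(hemi, a, b)

# Only alternation can choose between multi-character subpatterns.
words = re.compile(r"(NORTH|SOUTH)")
found = words.search("HEADING SOUTH").group(1)
print(found)   # SOUTH
```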
Positions 3 and 4 of the extracted data are for longitude and work just like positions 1 and 2.
Here's a sample program that takes a string containing the sentences above, locates the $GPRMC sentence, and extracts the lat/lon data from it:
Code:
CON

  _clkmode      = xtal1 + pll16x        ' 5 MHz crystal x 16 = 80 MHz system clock
  _xinfreq      = 5_000_000

OBJ

  re    : "regex"
  io    : "FullDuplexSerial"

PUB  Start | teststr, pattern, resaddr, rslt, i

  io.start(31, 30, 0, 9600)
  io.tx(0)
  teststr := string("$GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75", 13, {
                  } "$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A", 13, {
                  } "$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48", 13)
  pattern := string("\$GPRMC\s*,[^,]*,[^,]*,($1[\d\.]+)\s*,($2N|S)\s*,($3[\d\.]+)\s*,($4E|W)")
  io.str(string("String:", 13, 13))
  io.str(teststr)
  io.str(string(13, 13, "Pattern:", 13, 13))
  io.str(pattern)
  rslt := -cnt                          ' start timing the match
  resaddr := re.match(teststr, pattern, re#NOALT)
  rslt += cnt                           ' elapsed clock ticks
  io.str(string(13, 13, "Time: "))
  io.dec(rslt / 80_000)                 ' 80_000 ticks per millisecond at 80 MHz
  io.str(string(" ms."))
  io.str(string(13, 13, "Results:", 13))
  if (resaddr < 0)
    io.tx(13)
    io.str(string("Error #"))
    io.dec(-resaddr)
  elseif (resaddr == 0)
    io.tx(13)
    io.str(string("No match."))
  else
    repeat i from 0 to 9
      if (rslt := long[resaddr][i])     ' print only the slots that hold a result
        io.tx(13)
        io.dec(i)
        io.str(string(": "))
        io.str(re.field(i))
    io.str(string(13, 13, "Remainder of string:", 13, 13))
    io.str(re.remainder)
Here's what the output looks like:
Code:
String:

$GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75
$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A
$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48


Pattern:

\$GPRMC\s*,[^,]*,[^,]*,($1[\d\.]+)\s*,($2N|S)\s*,($3[\d\.]+)\s*,($4E|W)

Time: 143 ms.

Results:

0: $GPRMC,123519,A,4807.038,N,01131.000,E
1: 4807.038
2: N
3: 01131.000
4: E

Remainder of string:

,022.4,084.4,230394,003.1,W*6A
$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48
The results are displayed using the regex object's field method, given the index number for each field.
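If you'd like to sanity-check a pattern on a PC before running it on the Propeller, the whole extraction can be approximated in Python. This is only a sketch under stated assumptions: it uses Python's re module (a backtracking engine, unlike regex.spin), plain parentheses replace the ($1 through ($4 groups, and the variable names are made up for the example.

```python
import re

# The three sentences from the post, each terminated by a carriage return.
nmea = (
    "$GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75\r"
    "$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A\r"
    "$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48\r"
)

# Same pattern as the post, with positional groups instead of ($n ...).
pattern = re.compile(
    r"\$GPRMC\s*,[^,]*,[^,]*,([\d\.]+)\s*,(N|S)\s*,([\d\.]+)\s*,(E|W)"
)

m = pattern.search(nmea)
if m is None:
    print("No match.")
else:
    # Group 0 is the whole match; groups 1-4 are lat, N/S, lon, E/W.
    fields = [m.group(i) for i in range(5)]
    for i, f in enumerate(fields):
        print(f"{i}: {f}")
    # Everything after the match, like the post's "Remainder of string".
    remainder = nmea[m.end():]
    print(repr(remainder))
```

For this input, the five fields and the remainder line up with the Results and Remainder sections of the output shown above.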
That's about all I can write about it here. Hopefully, I'll have a more thorough document available at a later date. In the meantime, give the program a try if it's something that interests you. It's still really raw and very alpha, so don't rely on it too heavily until it receives more testing (and, possibly, some changes).
-Phil
Edit: Fixed several errors where \w was used when \s was intended. Added updated archive. Demo now uses field method to print, instead of substr method.
Post Edited (Phil Pilgrim (PhiPi)) : 6/30/2010 6:19:58 AM GMT



