Pattern matching and data extraction in Spin using regular expressions.

Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 19,524
One question that comes up frequently in the forums goes something like, "How do I extract latitude and longitude from GPS sentences?" Extracting data from incoming character streams is a common requirement, which usually entails searching for patterns in the character stream, so you know where to look for the data. PBASIC includes rudimentary pattern matching with its input WAIT modifier, but no such facility is native to Spin.

A commonly used and very powerful pattern matching tool can be found in regular expressions. It's beyond the scope of a forum post to describe regular expresisons in any detail, but there are several good online references that do so:

····www.regular-expressions.info/tutorial.html
····en.wikipedia.org/wiki/Regular_expression
····etext.lib.virginia.edu/services/helpsheets/unix/regex.html

In order to to facilitate some upcoming GPS work, I decided to write a regular expression parser and pattern matcher in Spin. It uses pretty much the standard regex vocabulary and includes many of the standard features, but with some notable differences:

1. Only two anchors are supported for now: ^ (string beginning) and $ (string end).

2. The {m,n} repeat count is not yet supported.

3. Rather than extracting all the parenthesized groupings in a pattern, only those which begin with ($1 through ($9 are extracted.

4. My version does not do any backtracking. Once a portion of the string is matched, the matching engine will only move forward. Backtracking is difficult to implement efficiently and often causes a lot of churning to attain a match. Since my engine is written in Spin, backtracking could really slow things to a crawl.

5. Some special escape sequences have not yet been implemented.

6. Most regular expression engines compile the regex first before applying it to a string. In mine, the regex is applied entirely interpretively: the parsing and pattern matching occur simultaneously.

7. This is a matching and extraction engine only: there's no substitution or translation facility built in.

The best way to show what it does is to use a common GPS string as an example. Here you see some NMEA sentences as they might have come from a GPS unit and which exist in a string buffer somewhere:

····$GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75
····$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A
····$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48


For this example, what we're interested in is the RMC sentence and the latitude and longitude info it contains:

····$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A

The red fields are latitude, and the blue fields are longitude. The data can be extracted using regex.spin and the following pattern (regular expression):

····\$GPRMC\s*,[noparse][[/noparse]^,]*,[noparse][[/noparse]^,]*,($1[noparse][[/noparse]\d\.]+)\s*,($2N|S)\s*,($3[noparse][[/noparse]\d\.]+)\s*,($4E|W)

The colors indicate those portions of the pattern used for the actual data extraction. This probably looks like a real mess to the uninitiated, and it's a fact that regular expressions are much easier to write than they are to read. But the individual elements are very simple, so I will try to explain them one at a time.

The first thing you see is \$GPRMC. This is there to make sure we're extracting data from the right sentence. The dollar sign is prepended with a backslash because $ by itself has special meaning, and the \ quotes it as a character to match. So the pattern matcher will scan the input stirng until it sees $GPRMC

Next is the rather cryptic-looking \s*. \s matches any whitespace character, such as space, CR, LF, and TAB. The * says to match the whitespace characters 0 or more times. Normally, the $GPRMC will be followed immediately by a comma, but this is put in the pattern in case some GPS receiver somewhere decides to throw in some extra blanks.

Next comes a comma, which needs to be matched, followed by another odd-looking construction: [noparse][[/noparse]^,]*. A list of characters inside square brackets defines a catagory. Any single character in the input string will match anything included in the category. Prepending the carat ^ to the list of characters means to match everything but the characters in the list. So, taken together with the *, [noparse][[/noparse]^,]* meaans to match zero or more occurances of anything besides a comma. This, along with the comma itself is used to skip data fields that we're not interested in.

Next comes ($1[noparse][[/noparse]\d\.]+). Anything in parentheses is a group that's treated as a single element. A group that starts with ($ followed by a digit is a special group whose data we want to extract. The digit (1-9) specifies which slot in the return array the extracted data should be stored. The actual data for this group has to match the pattern [noparse][[/noparse]\d\.]+. Again the bracketed set is a class consisting of two items: \d, which matches any decimal digit and \., which matches the decimal point. The latter is prepended with the backslash escape because, by itself, it has special significance: a lone period matches any single character. The plus following the class means to match the class one or more times. So, taken together, [noparse][[/noparse]\d\.]+ means to match a group of digits and decimal point(s) until something else comes along. This is the numerical part of the latitude and will be stored in position 1 of the results array. (Position 0 is reserved for the portion of the string that matched the entire pattern.)

Position 2 of the results will be either N or S. Its pattern (following the comma) is ($2N|S). The vertical bar means just what you think it does: OR. N|S will match either N or S. (It could also have been expressed [noparse][[/noparse]NS] to the same effect. But the vertical bar can also be used to separate subpatterns of more than one character, viz. NORTH|SOUTH.)

Positions 3 and 4 of the extracted data are for longitude and work just like positions 1 and 2.

Here's a sample program that takes a string containing the sentences above, locates the $GPRMC sentence, and extracts the lat/lon data from it:

[b]CON[/b]

  [b]_clkmode[/b]      = [b]xtal1[/b] + [b]pll16x[/b]
  [b]_xinfreq[/b]      = 5_000_000

[b]OBJ[/b]

  re    : "regex"
  io    : "FullDuplexSerial"

[b]PUB[/b]  Start | teststr, pattern, resaddr, rslt, i

  io.start(31, 30, 0, 9600)
  io.tx(0)
  
  teststr := [b]string[/b]("$GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75", 13, {
                 }  "$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A", 13, {
                 }  "$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48", 13)
                 
  pattern := [b]string[/b]("\$GPRMC\s*,[noparse][[/noparse]*^,]*,[noparse][[/noparse]*^,]*,($1[noparse][[/noparse]*\d\.]+)\s*,($2N|S)\s*,($3[noparse][[/noparse]*\d\.]+)\s*,($4E|W)")

  io.[b]str[/b]([b]string[/b]("[b]String[/b]:", 13, 13))
  io.[b]str[/b](teststr)
  io.[b]str[/b]([b]string[/b](13, 13, "Pattern:", 13, 13))
  io.[b]str[/b](pattern)

  rslt := -[b]cnt[/b]
  resaddr := re.match(teststr, pattern, re#NOALT)
  rslt += [b]cnt[/b]

  io.[b]str[/b]([b]string[/b](13, 13, "Time: "))
  io.dec(rslt / 80_000)
  io.[b]str[/b]([b]string[/b](" ms."))
  io.[b]str[/b]([b]string[/b](13, 13, "Results:", 13))

  [b]if[/b] (resaddr < 0)
    io.tx(13)
    io.[b]str[/b]([b]string[/b]("Error #"))
    io.dec(-resaddr)
  [b]elseif[/b] (resaddr == 0)
    io.tx(13)
    io.[b]str[/b]([b]string[/b]("No match."))
  [b]else[/b]
    [b]repeat[/b] i [b]from[/b] 0 to 9
      [b]if[/b] (rslt := [b]long[/b][noparse][[/noparse]*resaddr][noparse][[/noparse]*i])
        io.tx(13)
        io.dec(i)
        io.[b]str[/b]([b]string[/b](": "))
        io.[b]str[/b](re.field(i))

    io.[b]str[/b]([b]string[/b](13, 13, "Remainder of [b]string[/b]:", 13, 13))
    io.[b]str[/b](re.remainder)
     




Here's what the output looks like:

String:

$GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75
$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A
$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48


Pattern:

\$GPRMC\s*,[noparse][[/noparse]^,]*,[noparse][[/noparse]^,]*,($1[noparse][[/noparse]\d\.]+)\s*,($2N|S)\s*,($3[noparse][[/noparse]\d\.]+)\s*,($4E|W)

Time: 143 ms.

Results:

0: $GPRMC,123519,A,4807.038,N,01131.000,E
1: 4807.038
2: N
3: 01131.000
4: E

Remainder of string:

,022.4,084.4,230394,003.1,W*6A
$GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48




The results are displayed using the regex object's field method, given the index number for each field.

That's about all I can write about it here. Hopefully, I'll have a more thorough document available at a later date. In the meantime, give the program try if it's something that interests you. It's still really raw and very alpha, so don't rely on it too heavily until it receives more testing (and, possibly, some changes).

-Phil

Edit: Fixed several errors where \w was used when \s was intended. Added updated archive. Demo now uses field method to print, instead of substr method.

Post Edited (Phil Pilgrim (PhiPi)) : 6/30/2010 6:19:58 AM GMT
“Impossible is just a big word thrown around by small men who find it easier to live in the world they’ve been given than to explore the power they have to change it. Impossible is not a fact. It’s an opinion. Impossible is not a declaration. It’s a dare. Impossible is potential. Impossible is temporary. Impossible is nothing.” –Muhammad Ali
Tagged:

Comments

  • 23 Comments sorted by Date Added Votes
  • max72max72 Posts: 1,096
    edited November 2009 Vote Up0Vote Down
    That's great!
    I really love regexp, and having them on the propeller is a dream...

    Thanks,
    Massimo
  • SamMishalSamMishal Posts: 468
    edited November 2009 Vote Up0Vote Down
    WOW....Phil....you are a MASTER.....RegExps on the Propeller....you are REALLY A MASTER....

    I had a heck of time writing one to incorporate into RobotBASIC just recently......it works
    just like the PERL standard with extended syntax and replacing etc.


    I would have never been courageous enough to do an engine in Spin.......you are Good.

    For anyone wanting to learn Regular Expressions and wants a way to play with them
    and/or follow along a tutorial....here is a program that lets you do so easily. It is fun to
    use, and REALLY useful for following along with a book's examples or even a
    web tutorial and·also gives you some standard patterns such as matching an email, a date
    etc. It is a compiled EXE of a program written in RobotBASIC and uses the RB RegExp engine
    that is now part of the new version that is about to be released...I am posting only the EXE...
    If you want the source code it will be in the down load zip when I release the new version (V4.0.2)
    very soon. But the post below is an EXE and you do not need anything else other than just run it.


    ·

    Samuel

    ·

    Post Edited (SamMishal) : 11/11/2009 11:30:07 AM GMT
    827 x 707 - 270K
  • FredBlaisFredBlais Posts: 292
    edited November 2009 Vote Up0Vote Down
    I'm keeping that in my favorites [noparse]:)[/noparse]
  • BradCBradC Posts: 2,601
    edited November 2009 Vote Up0Vote Down
    Phil Pilgrim (PhiPi) said...

    In order to to facilitate some upcoming GPS work, I decided to write a regular expression parser and pattern matcher in Spin. It uses pretty much the standard regex vocabulary and includes many of the standard features, but with some notable differences:

    Superb work (as usual) Phil. So you have already made a good first step towards porting perl...

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    If you always do what you always did, you always get what you always got.
    wtf?
  • heaterheater Posts: 3,370
    edited November 2009 Vote Up0Vote Down
    Ahhhh...My worst nightmare has come to the Propeller world. Regular expressions give me severe headache.
    Having to read modem line noise "\$GPRMC\w*,[noparse][[/noparse]^,]*,[noparse][[/noparse]^,]*,($1[noparse][[/noparse]\d\.]+)\w*,($2N|S)\w*,($3[noparse][[/noparse]\d\.]+)\w*,($4E|W)" is not fun.

    Well done[noparse]:)[/noparse]

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    For me, the past is not over yet.
  • DroneDrone Posts: 433
    edited November 2009 Vote Up0Vote Down
    Wow Phil, excellent work. I live in Perl on other machines and use PCRE in C much. Nice to see this. However, the NMEA parsing example is good, but in spin, there is no way to define on-time per NMEA for, let's say a cog displaying time in an LED via perhaps parallel connection. Wish for RegEx NMEA driver in PASM. I've been looking at NMEA parsers written in C/C++ for AVR micro-controllers that do only one thing, on-time dump of fairly accurate time to serially connected time displays. This would be a replacement for the likes of IRIG-B, which is notoriously difficult to demodulate-decode (this is another target for the propeller for many reasons, if only there was a hardware multiplier/divider).

    I'm in run-on mode. Back to Topic. Great to see a RegEx object for the Prop. And thanks to you for having the depth of knowledge that recognized this as needed. Now if only someone can take the double precision stuff from the unofficial Propeller Wiki to an actual object with demo code encapsulation.
  • jazzedjazzed Posts: 11,767
    edited June 2010 Vote Up0Vote Down
    Phil, Thanks for writing this [noparse]:)[/noparse]
    Missed seeing you at the expo this year.

    Cheers,
    --Steve

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Pages: Propeller JVM
  • WBA ConsultingWBA Consulting Posts: 2,385
    edited June 2010 Vote Up0Vote Down
    Phil, have you been reading my notes or something? This is exactly what I have been trying to grasp recently so....
    THANK YOU!!!!
    THANK YOU!!!!
    THANK YOU!!!!


    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Andrew Williams
    WBA Consulting
    PowerTwig Dual Output Power Supply Module
    My Prop projects: Reverse Geo-Cache Box, Custom Metronome, Micro Plunge Logger
    Andrew Williams<br>WBA Consulting
  • BergamotBergamot Posts: 185
    edited June 2010 Vote Up0Vote Down
    Now we have two problems...
  • Brian RileyBrian Riley Posts: 626
    edited June 2010 Vote Up0Vote Down
    Great work, Phil. I will have a use for this right quick ... I was going to ask for a string to parse $GPGGA .... but I went back and reread your excellant mini-tutorial and I figured I could do it myself. So, I started by modifying $GPRMC to also get time and date strings .. it worked on the first try ... I think I understand most of it, more than enough to do $GPGGA ... however could you please explain a little more about the "+" ????

    String:
    
    $GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75
    $GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A
    $GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48
    
    
    Pattern:
    
    \$GPRMC\w*,($1[noparse][[/noparse]\d]+)\w*,[noparse][[/noparse]^,]*,($2[noparse][[/noparse]\d\.]+)\w*,($3N|S)\w*,($4[noparse][[/noparse]\d\.]+)\w*,($5E|W)\w*,[noparse][[/noparse]^,]*,[noparse][[/noparse]^,]*,($6[noparse][[/noparse]\d]+)
    
    Results:
    
    0: $GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394
    1: 123519
    2: 4807.038
    3: N
    4: 01131.000
    5: E
    6: 230394
    
    



    cheers ... BBR

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    cheers ... brian riley, n1bq, underhill center, vermont
    The Shoppe at Wulfden
    www.wulfden.org/TheShoppe/
    www.wulfden.org/TheShoppe/prop/ - Propeller Products
    www.wulfden.org/TheShoppe/k107/ - Serial LCD Display Gear

    cheers ... BBR
    The Propeller Shoppe at Wulfden
    http://www.wulfden.org/TheShoppe/prop/
    - on DropBox ==> Tachyon Forth Apps
    - at Wulfden Forth Resources Directory
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 19,524
    edited June 2010 Vote Up0Vote Down
    The "+" is a modifier to the unit it follows, such that it will match one or more of that unit in sequence. For example, "[noparse][[/noparse]ab]+" will match "aabbbababc" up to, but not including, the letter "c". It's often used for capturing numeric digits (i. e. "\d+"), when you know there's at least one but not how many beyond that.

    By contrast, the "*" is a modifier that will cause the unit it follows to match zero or more of that unit in sequence. It's typically used to skip over spaces when you don't know whether there are any or, if there are, how many.

    -Phil
    “Impossible is just a big word thrown around by small men who find it easier to live in the world they’ve been given than to explore the power they have to change it. Impossible is not a fact. It’s an opinion. Impossible is not a declaration. It’s a dare. Impossible is potential. Impossible is temporary. Impossible is nothing.” –Muhammad Ali
  • RossHRossH Posts: 4,057
    edited June 2010 Vote Up0Vote Down
    Aw, c'mon Phil - why tease us and then stop half way?

    We want a full implementation of AWK on the Propeller!

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Catalina - a FREE C compiler for the Propeller - see Catalina
    Catalina - a FREE ANSI C compiler for the Propeller.
    Download it from http://catalina-c.sourceforge.net/
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 19,524
    edited June 2010 Vote Up0Vote Down
    RossH,

    At this point, it's a mere awklet. A Propellerous awklet (Aethia cogspinicus), to be exact.

    -Phil
    “Impossible is just a big word thrown around by small men who find it easier to live in the world they’ve been given than to explore the power they have to change it. Impossible is not a fact. It’s an opinion. Impossible is not a declaration. It’s a dare. Impossible is potential. Impossible is temporary. Impossible is nothing.” –Muhammad Ali
  • Brian RileyBrian Riley Posts: 626
    edited June 2010 Vote Up0Vote Down
    OK, Thanks for the additional explanation Phil. I cleaned up my pattern some, then I sat down and did $GPGGA. Here is the pattern for that ... in case anyone wants it

    $1 is time string,
    $6 is GPS fix quality
    $7 is number of satellites
    $8 is altitude
    $9 units of altitude - M is meters

    ** EDIT **regex pattern corrected for mistake in Phil's original \w's replaced by \s's.

    _
    String:
    
    $GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75
    $GPGGA,170834,4124.8963,N,08151.6838,W,1,05,1.5,280.2,M,-34.0,M,,,*75
    $GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48
    
    
    Pattern:
    
    \$GPGGA\s*,($1\d+)\s*,($2[noparse][[/noparse]\d\.]+)\s*,($3N|S)\s*,($4[noparse][[/noparse]\d\.]+)\s*,($5E|W)\s*,($6\d)\s*,($7\d+)\s*,[noparse][[/noparse]^,]*,($8[noparse][[/noparse]\d\.]+)\s*,($9M)
    
    Time: 1576 ms.
    
    Results:
    
    0: $GPGGA,170834,4124.8963,N,08151.6838,W,1,05,1.5,280.2,M
    1: 170834
    2: 4124.8963
    3: N
    4: 08151.6838
    5: W
    6: 1
    7: 05
    8: 280.2
    9: M
    
    
    
    


    cheers ... BBR

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    cheers ... brian riley, n1bq, underhill center, vermont
    The Shoppe at Wulfden
    www.wulfden.org/TheShoppe/
    www.wulfden.org/TheShoppe/prop/ - Propeller Products
    www.wulfden.org/TheShoppe/k107/ - Serial LCD Display Gear

    Post Edited (Brian Riley) : 6/30/2010 6:10:00 AM GMT

    cheers ... BBR
    The Propeller Shoppe at Wulfden
    http://www.wulfden.org/TheShoppe/prop/
    - on DropBox ==> Tachyon Forth Apps
    - at Wulfden Forth Resources Directory
  • Brian RileyBrian Riley Posts: 626
    edited June 2010 Vote Up0Vote Down
    Phil Pilgrim (PhiPi) said...
    Next is the rather cryptic-looking \w*. \w matches any whitespace character, such as space, CR, LF, and TAB. The * says to match the whitespace characters 0 or more times. Normally, the $GPRMC will be followed immediately by a comma, but this is put in the pattern in case some GPS receiver somewhere decides to throw in some extra blanks.

    Phil ... \w or \s for whitespace??? The comments in the beginning of regex.spin made me think it was \s

    cheers ... BBR

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    cheers ... brian riley, n1bq, underhill center, vermont
    The Shoppe at Wulfden
    www.wulfden.org/TheShoppe/
    www.wulfden.org/TheShoppe/prop/ - Propeller Products
    www.wulfden.org/TheShoppe/k107/ - Serial LCD Display Gear

    cheers ... BBR
    The Propeller Shoppe at Wulfden
    http://www.wulfden.org/TheShoppe/prop/
    - on DropBox ==> Tachyon Forth Apps
    - at Wulfden Forth Resources Directory
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 19,524
    edited June 2010 Vote Up0Vote Down
    I just realized that I screwed something up, and it reflects an error I seem to repeat in my Perl programming more often than I care to admit. \w, in standard RE parlance, is supposed to represent a word character [noparse][[/noparse]A-Za-z0-9_], not whitespace. Whitespace is represented by \s. This is a change I will have to make. My apologies for the confusion.

    -Phil

    Addendum: Yes, Brian, you beat me to it!
    “Impossible is just a big word thrown around by small men who find it easier to live in the world they’ve been given than to explore the power they have to change it. Impossible is not a fact. It’s an opinion. Impossible is not a declaration. It’s a dare. Impossible is potential. Impossible is temporary. Impossible is nothing.” –Muhammad Ali
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 19,524
    edited June 2010 Vote Up0Vote Down
    'Turns out the object is correct, but the demo is wrong. It works only because there aren't any embedded whitespace characters in the NMEA sentences. Attached is a corrected demo program. I'll also make the necessary changes to the original post.

    -Phil
    “Impossible is just a big word thrown around by small men who find it easier to live in the world they’ve been given than to explore the power they have to change it. Impossible is not a fact. It’s an opinion. Impossible is not a declaration. It’s a dare. Impossible is potential. Impossible is temporary. Impossible is nothing.” –Muhammad Ali
  • Brian RileyBrian Riley Posts: 626
    edited June 2010 Vote Up0Vote Down
    Phil Pilgrim (PhiPi) said...
    'Turns out the object is correct, but the demo is wrong. It works only because there aren't any embedded whitespace characters in the NMEA sentences. Attached is a corrected demo program. I'll also make the necessary changes to the original post.

    -Phil


    This won't compile. "re#NOALT" is not in the regex.spin you posted last night. Looking at original code, I assume its value is "0" ??? also "re.field()" is unheard of .....

    cheers ... BBR

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    cheers ... brian riley, n1bq, underhill center, vermont
    The Shoppe at Wulfden
    www.wulfden.org/TheShoppe/
    www.wulfden.org/TheShoppe/prop/ - Propeller Products
    www.wulfden.org/TheShoppe/k107/ - Serial LCD Display Gear

    Post Edited (Brian Riley) : 6/30/2010 6:04:26 AM GMT

    cheers ... BBR
    The Propeller Shoppe at Wulfden
    http://www.wulfden.org/TheShoppe/prop/
    - on DropBox ==> Tachyon Forth Apps
    - at Wulfden Forth Resources Directory
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 19,524
    edited June 2010 Vote Up0Vote Down
    Sorry, Brian. Try the corrected archive I uploaded to the original post instead. I must've added some stuff to the regex object in the interim.

    -Phil
    “Impossible is just a big word thrown around by small men who find it easier to live in the world they’ve been given than to explore the power they have to change it. Impossible is not a fact. It’s an opinion. Impossible is not a declaration. It’s a dare. Impossible is potential. Impossible is temporary. Impossible is nothing.” –Muhammad Ali
  • heichheich Posts: 26
    edited July 2010 Vote Up0Vote Down
    Phil

    How could this post remain hidden in the shadows for so many months?

    I'm doing a complex state machine based on strings commands sent via RS232. If i had access to your excelent contribution before, this machine ( including GPS ) would be finished days ago.

    At this momment, your post has 2 days on my browser, since i have a lot of work here.
    Once i get some free time, i will do an exhaustive test on your code, i´ll change my entire state machine


    Thank you!

    Best of best regards

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Are you looking for a professional PCB/Schematichs software?

    Kicad: GPL. You can do Schematics, PCBS, Gerber interfaces, 3d-views and more
    www.lis.inpg.fr/realise_au_lis/kicad/

    Search in Parallax Forums Using Google
    www.google.com/advanced_search?q=+site:forums.parallax.com&num=20&hl=en&lr=
    Are you looking for a professional PCB/Schematichs software?

    Kicad: GPL. You can do Schematics, PCBS, Gerber interfaces, 3d-views and more

    http://www.lis.inpg.fr/realise_au_lis/kicad/
  • Brian RileyBrian Riley Posts: 626
    edited July 2010 Vote Up0Vote Down
    @Phil ... funny thing, the second regex object you posted is an order of magnitude faster than the first (160-170 ms vs 1500+ ms!!!!). Its great code and will be very useful ... tnx.

    I mucked with the demo program and made some of the display code into methods. It has both patterns (GPRMC and GPGGA) and parses both. The patterns reflect the correction for the whitespace. Its zipped up and attached here.

    ------------------
    String:
    $GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75
    $GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A
    $GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48
    
    
    Pattern:
    \$GPRMC\s*,($1\d+)\s*,[noparse][[/noparse]^,]*,($2[noparse][[/noparse]\d\.]+)\s*,($3N|S)\s*,($4[noparse][[/noparse]\d\.]+)\s*,($5E|W)\s*,[noparse][[/noparse]^,]*,[noparse][[/noparse]^,]*,($6\d+)
    
    Time: 161 ms.
    
    Results:
    0: $GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394
    1: 123519
    2: 4807.038
    3: N
    4: 01131.000
    5: E
    6: 230394
    
    Remainder of string:
    ,003.1,W*6A
    $GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48
    
    ------------------
    String:
    $GPGSV,2,1,08,01,40,083,46,02,17,308,41,12,07,344,39,14,22,228,45*75
    $GPGGA,170834,4124.8963,N,08151.6838,W,1,05,1.5,280.2,M,-34.0,M,,,*75
    $GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48
    
    
    Pattern:
    \$GPGGA\s*,($1\d+)\s*,($2[noparse][[/noparse]\d\.]+)\s*,($3N|S)\s*,($4[noparse][[/noparse]\d\.]+)\s*,($5E|W)\s*,($6\d)\s*,($7\d+)\s*,[noparse][[/noparse]^,]*,($8[noparse][[/noparse]\d\.]+)\s*,($9M)
    
    Time: 170 ms.
    
    Results:
    0: $GPGGA,170834,4124.8963,N,08151.6838,W,1,05,1.5,280.2,M
    1: 170834
    2: 4124.8963
    3: N
    4: 08151.6838
    5: W
    6: 1
    7: 05
    8: 280.2
    9: M
    
    Remainder of string:
    ,-34.0,M,,,*75
    $GPVTG,054.7,T,034.4,M,005.5,N,010.2,K*48
    
    
    




    cheers ... BBR

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    cheers ... brian riley, n1bq, underhill center, vermont
    The Shoppe at Wulfden
    www.wulfden.org/TheShoppe/
    www.wulfden.org/TheShoppe/prop/ - Propeller Products
    www.wulfden.org/TheShoppe/k107/ - Serial LCD Display Gear

    cheers ... BBR
    The Propeller Shoppe at Wulfden
    http://www.wulfden.org/TheShoppe/prop/
    - on DropBox ==> Tachyon Forth Apps
    - at Wulfden Forth Resources Directory
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 19,524
    edited July 2010 Vote Up0Vote Down
    Brian,

    The bulk of the time savings has to do with the NOALT option. When this option is asserted, you're telling match that there are no alternate patterns to look for at the top level, so it will not scan throught the entire pattern looking for a "|" symbol every time it fails to match a character.

    -Phil
    “Impossible is just a big word thrown around by small men who find it easier to live in the world they’ve been given than to explore the power they have to change it. Impossible is not a fact. It’s an opinion. Impossible is not a declaration. It’s a dare. Impossible is potential. Impossible is temporary. Impossible is nothing.” –Muhammad Ali
Sign In or Register to comment.