Parsing more than eight characters
dbpage
What is a good way to parse a text string that is longer than eight characters? In the code examples that follow, adding more "if" statements produces the error message "Limit of 8 nested blocks exceeded." I would like to parse a few more characters. After the timecode comes a space, end of file, or an invalid format. After a command comes a space, a carriage return and line feed, end of file, or an invalid format. I resort to additional methods to continue parsing, but I don't like splitting the parsing statements across methods.
Example 1:
' Parse timecode MM:SS.T, Where: M=Minutes, S=Seconds, T=Tenths of seconds
i := 0
if C[i]<>-1 and digit := lookdown(C[i]:"0".."5")
  time := --digit * _6000tenths
  if digit := lookdown(C[++i]:"0".."9")
    time := (--digit * _600tenths) + time
    if C[++i] == ":"
      if digit := lookdown(C[++i]:"0".."5")
        time := (--digit * _100tenths) + time
        if digit := lookdown(C[++i]:"0".."9")
          time := (--digit * _10tenths) + time
          if C[++i] == "."
            if digit := lookdown(C[++i]:"0".."9")
              time := (--digit * tenths) + time
              return true
return false ' End of line, file or invalid timecode format
Example 2:
' Parse command string 0NN-NNN
i := 0
if (C[i]=="0")
  if digit := lookdown(C[++i]:"0".."9")
    Addr := --digit
    if digit := lookdown(C[++i]:"0".."9")
      Addr := Addr * 10 + --digit
      if C[++i] == "-"
        if digit := lookdown(C[++i]:"0".."2")
          Fcn := --digit
          if digit := lookdown(C[++i]:"0".."9")
            Fcn := Fcn * 10 + --digit
            if digit := lookdown(C[++i]:"0".."9")
              Fcn := Fcn * 10 + --digit
              interpret
              return true
return false ' End of line, file or invalid command format

Comments
But the next most stupid way to do it might look like the following pseudo code:
c = char[0]
if c != what I want, return syntax error
do something with c
c = char[1]
if c != what I want, return syntax error
do something with c
c = char[2]
if c != what I want, return syntax error
do something with c
c = char[3]
if c != what I want, return syntax error
do something with c
For somewhat more sophisticated ways of parsing, have a read of "Let's Build a Compiler": http://compilers.iecc.com/crenshaw/
Don't be scared off by the "Build a Compiler" part. The text is very easy to read and starts with ideas such as how to parse numbers and white space. It's all done in Pascal, which looks a lot like Spin.
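As a sketch of the guard-clause idea in the pseudo code above, here is Example 2's "0NN-NNN" check rewritten so that each character test exits as soon as it fails. It is written in C for illustration, and the names parse_command and in_range are hypothetical. In Spin the same shape becomes a series of "if ... return false" lines at a single indentation level, which stays far below the eight-block nesting limit.

```c
#include <stdbool.h>

/* True if c lies in the character range lo..hi (inclusive). */
static bool in_range(char c, char lo, char hi)
{
    return c >= lo && c <= hi;
}

/* Flat version of Example 2: parse "0NN-NNN" into an address and a function.
 * Each test returns as soon as a character is wrong, so nothing nests.
 * A terminating '\0' fails every test, so a short string is rejected
 * without reading past its end. Outputs are written only on success. */
bool parse_command(const char *c, int *addr, int *fcn)
{
    int i = 0, a, f;

    if (c[i] != '0') return false;                  /* leading zero      */
    if (!in_range(c[++i], '0', '9')) return false;  /* address tens      */
    a = c[i] - '0';
    if (!in_range(c[++i], '0', '9')) return false;  /* address units     */
    a = a * 10 + (c[i] - '0');
    if (c[++i] != '-') return false;                /* separator         */
    if (!in_range(c[++i], '0', '2')) return false;  /* function hundreds */
    f = c[i] - '0';
    if (!in_range(c[++i], '0', '9')) return false;  /* function tens     */
    f = f * 10 + (c[i] - '0');
    if (!in_range(c[++i], '0', '9')) return false;  /* function units    */
    f = f * 10 + (c[i] - '0');

    *addr = a;
    *fcn = f;
    return true;
}
```

Checking another character means adding two more lines, not another level of nesting.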
I too am having a hard time parsing data, though.
It would be useful to see more examples of the strings you're wanting to parse, and how you get them. Are there separator or terminating characters?
This code should work. It compiles, but I do not have a Propeller with me to test it.
VAR
  long time

PUB start

PRI timetobin | i
  ' Parse timecode MM:SS.T, Where: M=Minutes, S=Seconds, T=Tenths of seconds
  i := 0                                         ' point to tens of minutes
  if C[i] < "0" or C[i] > "5"
    return false                                 ' exit if 10's of minutes not 0-5
  time := (C[i++] & $F) * 6000                   ' set time = 10's of minutes in tenths of a second
  if C[i] < "0" or C[i] > "9"
    return false                                 ' exit if minutes not 0-9
  time := time + ((C[i++] & $F) * 600)           ' add minutes to time
  if C[++i] < "0" or C[i] > "5"                  ' ++i skips the colon
    return false                                 ' exit if 10's of seconds not 0-5
  time := time + ((C[i++] & $F) * 100)           ' add 10's of seconds to time
  if C[i] < "0" or C[i] > "9"
    return false                                 ' exit if seconds not 0-9
  time := time + ((C[i++] & $F) * 10)            ' add seconds to time
  if C[++i] < "0" or C[i] > "9"                  ' ++i skips the period
    return false                                 ' exit if tenths of seconds not 0-9
  time := time + (C[i] & $F)                     ' add tenths of seconds
  return true

DAT
C       byte "MM:SS.T"

For Jon McPhalen, I offer the following full disclosure:
MicroSD card file format (94 characters per line maximum):
Header: ct0CL or ct0-381CL or ct0-382CL or ct0-xxxCL
One or more lines: MM:SS.TAAA-FFF{SAAA-FFF}{SAAA-FFF}{SAAA-FFF}{SAAA-FFF}{SAAA-FFF}{SAAA-FFF}{SAAA-FFF}{SAAA-FFF}{SAAA-FFF}{SAAA-FFF}CL
EOF
Timecodes are 7 characters, where:
  MM = Minutes
  SS = Seconds
  T = Tenths of seconds
  Colon separates Minutes and Seconds
  Period separates Seconds and Tenths
Commands are 6 digits, where:
  AAA = 3-digit Address
  FFF = 3-digit Function
  - = Dash separates Address and Function
  S = Space separates multiple commands, if any
  C = Carriage return
  L = Line feed (end-of-line delimiter)
  Minimum 1 command; maximum 11 commands per line
EOF = End of File
I have working versions (Examples 1 and 2). My intent is to perfect them and learn.
I think this could be done with only one "return false", using a loop and an index to select the limits and multipliers, but inline seemed to be the simplest and clearest way to do it. It is a bit reminiscent of BASIC spaghetti code, but at least the code is short enough that it can all be displayed at once.
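A minimal sketch of the loop-and-table variant mentioned above, in C for illustration: each of the seven characters of MM:SS.T gets a table row holding its allowed range and its weight in tenths of a second, so one loop with one "return false" replaces the chain of checks. The names timecode_fields and parse_timecode are hypothetical.

```c
#include <stdbool.h>

/* One row per character of "MM:SS.T": allowed range and the value, in tenths
 * of a second, of one unit of that digit. Weight 0 marks a literal separator. */
struct field { char lo, hi; long weight; };

static const struct field timecode_fields[7] = {
    {'0', '5', 6000},   /* tens of minutes */
    {'0', '9',  600},   /* minutes         */
    {':', ':',    0},   /* colon           */
    {'0', '5',  100},   /* tens of seconds */
    {'0', '9',   10},   /* seconds         */
    {'.', '.',    0},   /* period          */
    {'0', '9',    1},   /* tenths          */
};

/* Parse "MM:SS.T" into tenths of a second. A '\0' fails every range check,
 * so a short string stops the loop before anything past it is read. */
bool parse_timecode(const char *c, long *tenths)
{
    long t = 0;
    for (int i = 0; i < 7; i++) {
        const struct field *f = &timecode_fields[i];
        if (c[i] < f->lo || c[i] > f->hi)
            return false;                       /* the only failure exit */
        if (f->weight != 0)
            t += (c[i] - '0') * f->weight;
    }
    *tenths = t;
    return true;
}
```

With this layout, changing the format means editing the table rather than the control flow.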
I agree that a state machine is very useful for parsing, as some have noted above.
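One way to sketch such a state machine for the data lines in the posted file format, assuming characters are fed in one at a time as they come off the card. It is written in C with a switch, and the states, pattern strings, and the names parser_step and validate_line are all hypothetical. Following the posted format, the first command follows the timecode with no separator, further commands are separated by single spaces, and CR LF ends the line. In each pattern, a digit means "any digit from 0 up to this one" and any other character must match literally.

```c
#include <stdbool.h>

/* States of a streaming validator for one data line. */
enum line_state {
    ST_TIMECODE,       /* inside the 7-character timecode      */
    ST_COMMAND,        /* inside a 7-character AAA-FFF command */
    ST_AFTER_COMMAND,  /* expecting a space or CR              */
    ST_EXPECT_LF,      /* CR seen, expecting LF                */
    ST_DONE,           /* complete, valid line                 */
    ST_FAILED          /* any error; stays here                */
};

struct line_parser {
    enum line_state st;
    int pos;        /* position inside the current fixed-width field */
    int commands;   /* commands completed so far on this line        */
};

/* A digit in the pattern allows '0' up to that digit; any other pattern
 * character must match exactly, so "59:59.9" rejects 60:00.0. */
static bool matches(const char *pattern, int pos, char c)
{
    char p = pattern[pos];
    if (p >= '0' && p <= '9')
        return c >= '0' && c <= p;
    return c == p;
}

void parser_init(struct line_parser *p)
{
    p->st = ST_TIMECODE;
    p->pos = 0;
    p->commands = 0;
}

/* Advance the state machine by one character. */
void parser_step(struct line_parser *p, char c)
{
    switch (p->st) {
    case ST_TIMECODE:
        if (!matches("59:59.9", p->pos, c)) { p->st = ST_FAILED; break; }
        if (++p->pos == 7) { p->st = ST_COMMAND; p->pos = 0; }
        break;
    case ST_COMMAND:
        if (!matches("999-999", p->pos, c)) { p->st = ST_FAILED; break; }
        if (++p->pos == 7) { p->st = ST_AFTER_COMMAND; p->commands++; }
        break;
    case ST_AFTER_COMMAND:
        if (c == ' ' && p->commands < 11) { p->st = ST_COMMAND; p->pos = 0; }
        else if (c == '\r')               p->st = ST_EXPECT_LF;
        else                              p->st = ST_FAILED;
        break;
    case ST_EXPECT_LF:
        p->st = (c == '\n') ? ST_DONE : ST_FAILED;
        break;
    case ST_DONE:     /* anything after CR LF is an error */
    case ST_FAILED:
        p->st = ST_FAILED;
        break;
    }
}

/* Feed a whole string and report whether it was exactly one valid line. */
bool validate_line(const char *s)
{
    struct line_parser p;
    parser_init(&p);
    while (*s)
        parser_step(&p, *s++);
    return p.st == ST_DONE;
}
```

Because the parser keeps only a state, a position, and a count, it never needs the whole line in memory, and the 11-command limit falls out of the count check.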
How do you make your state machines without case statements?
In the past I have done it with goto. Shameful, I know, but when you need the speed, goto is the way to go, as it were.
I would be inclined to divide and conquer:
Make a method that parses numbers, hopefully with a variable number of digits for flexibility.
Make another method that parses alpha or alphanumeric symbols, again of variable length and perhaps including "$" or "_" or whatever you might want to allow in such symbols.
Perhaps make a method that skips over white space.
Then you can tie all these together to parse all manner of data that contains numbers and symbols. That higher level parser will probably have to take care of detecting terminators like space, end of line, comma, semi-colon, whatever.
As a bonus you end up with some methods that can be reused for parsing in other different problems. And your finished code will be a lot more readable when you get back to it.
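A rough sketch of this divide-and-conquer approach in C: small scanners for white space, numbers of up to a given number of digits, and symbols, each advancing a shared cursor. A higher-level parse_header ties them together for the header line of the posted file format (ct0, optionally followed by -NNN, then CR LF). All function names are hypothetical, and reading the xxx in ct0-xxx as up to three digits is an assumption.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Each scanner takes a cursor (a pointer to a pointer into the text),
 * consumes what it recognises, and reports whether it succeeded. */

static void skip_white(const char **p)
{
    while (**p == ' ' || **p == '\t')
        (*p)++;
}

/* Parse 1..max_digits decimal digits into *value. */
static bool parse_number(const char **p, int max_digits, long *value)
{
    long v = 0;
    int n = 0;
    while (n < max_digits && isdigit((unsigned char)**p)) {
        v = v * 10 + (**p - '0');
        (*p)++;
        n++;
    }
    if (n == 0)
        return false;
    *value = v;
    return true;
}

/* Parse a symbol: a letter or '_', then letters, digits, '_' or '$'.
 * Copies at most size-1 characters into buf (size must be at least 1). */
static bool parse_symbol(const char **p, char *buf, int size)
{
    int n = 0;
    if (!isalpha((unsigned char)**p) && **p != '_')
        return false;
    while (isalnum((unsigned char)**p) || **p == '_' || **p == '$') {
        if (n < size - 1)
            buf[n++] = **p;
        (*p)++;
    }
    buf[n] = '\0';
    return true;
}

/* Consume one expected literal character. */
static bool expect(const char **p, char c)
{
    if (**p != c)
        return false;
    (*p)++;
    return true;
}

/* Higher level: "ct0" [ "-" number ] CR LF. *variant is -1 if no number. */
bool parse_header(const char *line, long *variant)
{
    const char *p = line;
    char sym[8];

    skip_white(&p);
    if (!parse_symbol(&p, sym, (int)sizeof sym) || strcmp(sym, "ct0") != 0)
        return false;
    *variant = -1;
    if (expect(&p, '-') && !parse_number(&p, 3, variant))
        return false;
    return expect(&p, '\r') && expect(&p, '\n') && *p == '\0';
}
```

The same parse_number and expect helpers could be reused for the MM:SS.T and AAA-FFF fields of the data lines.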
If you want to get really serious, use those low-level methods to parse out tokens (the numbers, symbols, etc.) and then have a higher level that works through the tokenized data.
Before you know it you will be writing a compiler or interpreter as described by Crenshaw :)
A good technique for sure.
Sadly we don't have the ability to make or call function pointers in Spin.
Here is a snippet of how it works for parsing a GPS sentence. In this case the bytes come from the serial port, but in your case they would come from the sd card. The bytes are shifted msb first into an eight byte (two longs) buffer. After each shift, a condition is tested against the contents of the longs. In the case of the GPS, it is looking for a match to a set of key strings in a table. In your case, rather than a table match, you might have two conditions:
1) time string: <CR> in byte7, ":" in byte3, and "." in byte1
2) command string: "-" in byte3 and either T or S in byte7
When that condition is met, you have all the other bytes in known position for quick analysis.
repeat                                            ' rotate bytes from GPS_PORT through the 8-byte buffer until the command key is detected
  if char := uarts.rx(GPS_PORT)
    char0 := (char0 >> 8) + (char1.byte[0] << 24) ' rotate bytes of char0 and char1, msb to lsb
    char1 := (char1 >> 8) + (char << 24)
    if char0 == long[@gps]                        ' this is the key string for the device; a sentence start $GP has been detected
      case char1
        long[@gga] : ParseGGA
        long[@rmc] : ParseRMC
        long[@scv] : ParseSCV                     ' capture values using "," or "*" or <CR> as delimiter

DAT
table   long 0                                    ' force long alignment, 4 bytes per entry to retain alignment
gga     byte "GGA,"                               ' chars GGA,
rmc     byte "RMC,"
scv     byte "GSV,"
gps     byte $A,"$GP"                             ' line feed and chars $GP

You have to be aware of endianness. The bytes are shifted in msb first so that a match can be made with a text string that is stored in normal left-to-right order. For example, in "GGA,", the first G ends up in the least significant byte of char1, the upper long of the char1:char0 pair.
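To make the byte ordering concrete, here is a small C sketch of the same two-long shift register. It assumes, as the Spin code does, that long[@gps] reads the four bytes of a DAT string with the first character in the least significant byte (the Propeller's hub memory is little-endian); pack4 builds such a long from a string. The names pack4, window, shift_in, and find_gga are hypothetical, and only the GGA key is checked instead of the full case table.

```c
#include <stdint.h>

/* Build a 32-bit word from four characters, first character in the least
 * significant byte, like long[@label] on a little-endian byte string. */
static uint32_t pack4(const char *s)
{
    return (uint32_t)(uint8_t)s[0]
         | (uint32_t)(uint8_t)s[1] << 8
         | (uint32_t)(uint8_t)s[2] << 16
         | (uint32_t)(uint8_t)s[3] << 24;
}

/* Eight-byte shift register held in two longs (char1:char0 in the Spin code). */
struct window { uint32_t lo, hi; };

/* A new byte enters at the top of hi; the byte leaving the bottom of hi
 * moves to the top of lo, and the bottom byte of lo is discarded. */
static void shift_in(struct window *w, char c)
{
    w->lo = (w->lo >> 8) | (w->hi & 0xFFu) << 24;
    w->hi = (w->hi >> 8) | (uint32_t)(uint8_t)c << 24;
}

/* Return the index just past the first "\n$GPGGA," in s, or -1 if absent. */
int find_gga(const char *s)
{
    const uint32_t key = pack4("\n$GP");
    const uint32_t gga = pack4("GGA,");
    struct window w = {0, 0};

    for (int i = 0; s[i] != '\0'; i++) {
        shift_in(&w, s[i]);
        if (w.lo == key && w.hi == gga)
            return i + 1;
    }
    return -1;
}
```

After "\n$GPGGA," has been shifted in, the oldest four bytes sit in lo and the newest four in hi, each with its first character in the low byte, which is why both comparisons succeed with ordinary left-to-right strings.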
I was tempted to divide and conquer for the minute and second fields, but it hardly seemed worth the effort, and the multiple "return false" statements could be eliminated by setting an error flag instead. In some cases it might be worth setting multiple error flags for each line that is parsed, so that a comprehensive list of errors could be provided for debugging after a single pass through all the commands.
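A minimal sketch of the error-flag idea in C: each field of MM:SS.T gets its own bit, all seven characters are checked, and the result is zero for a valid timecode or the OR of every field that failed. The flag names and timecode_errors are hypothetical, and the function assumes at least seven characters are present (for example, because the line length was checked first).

```c
#include <stdbool.h>

/* One bit per field of "MM:SS.T". */
enum {
    ERR_TENS_MIN = 1 << 0,
    ERR_MIN      = 1 << 1,
    ERR_COLON    = 1 << 2,
    ERR_TENS_SEC = 1 << 3,
    ERR_SEC      = 1 << 4,
    ERR_PERIOD   = 1 << 5,
    ERR_TENTHS   = 1 << 6
};

static bool in_range(char c, char lo, char hi)
{
    return c >= lo && c <= hi;
}

/* Check every field without stopping; 0 means valid, otherwise the
 * returned bits name each field that failed. Needs 7 readable chars. */
unsigned timecode_errors(const char *c)
{
    unsigned err = 0;
    if (!in_range(c[0], '0', '5')) err |= ERR_TENS_MIN;
    if (!in_range(c[1], '0', '9')) err |= ERR_MIN;
    if (c[2] != ':')               err |= ERR_COLON;
    if (!in_range(c[3], '0', '5')) err |= ERR_TENS_SEC;
    if (!in_range(c[4], '0', '9')) err |= ERR_SEC;
    if (c[5] != '.')               err |= ERR_PERIOD;
    if (!in_range(c[6], '0', '9')) err |= ERR_TENTHS;
    return err;
}
```

For example, "1x-64.5" would report the minutes, colon, and tens-of-seconds bits together, rather than only the first problem found.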