Parsing more than eight characters
dbpage
What is a good way to parse a text string that is longer than eight characters? In the code examples that follow, if I add more "if" statements, I get the error message "Limit of 8 nested blocks exceeded." I would like to parse a few more characters. After the timecode there is a space, the end of the file, or an invalid format. After the command there is a space, a carriage return and line feed, the end of the file, or an invalid format. I resort to additional methods to continue parsing, but I don't like splitting the parsing statements across methods.
Example 1:
```spin
' Parse timecode MM:SS.T, Where: M=Minutes, S=Seconds, T=Tenths of seconds
i := 0
if C[i]<>-1 and digit := lookdown(C[i]:"0".."5")
  time := --digit * _6000tenths
  if digit := lookdown(C[++i]:"0".."9")
    time := (--digit * _600tenths) + time
    if C[++i] == ":"
      if digit := lookdown(C[++i]:"0".."5")
        time := (--digit * _100tenths) + time
        if digit := lookdown(C[++i]:"0".."9")
          time := (--digit * _10tenths) + time
          if C[++i] == "."
            if digit := lookdown(C[++i]:"0".."9")
              time := (--digit * tenths) + time
              return true
return false ' End of line, file or invalid command format
```
Example 2:
```spin
' Parse command string 0NN-NNN
i := 0
if (C[i]=="0")
  if digit := lookdown(C[++i]:"0".."9")
    Addr := --digit
    if digit := lookdown(C[++i]:"0".."9")
      Addr := Addr * 10 + --digit
      if C[++i] == "-"
        if digit := lookdown(C[++i]:"0".."2")
          Fcn := --digit
          if digit := lookdown(C[++i]:"0".."9")
            Fcn := Fcn * 10 + --digit
            if digit := lookdown(C[++i]:"0".."9")
              Fcn := Fcn * 10 + --digit
              interpret
              return true
return false ' End of line, file or invalid command format
```
Comments
But the next most stupid way to do it might look like the following pseudo code:
For slightly more sophisticated ways of parsing, have a read of Jack Crenshaw's "Let's Build a Compiler": http://compilers.iecc.com/crenshaw/
Don't be scared off by the "Build a Compiler" part. The text is very easy to read and starts with ideas about how to parse numbers, white space, etc. It is all done in Pascal, which looks a lot like Spin.
I too am having a hard time parsing data, though.
It would be useful to see more examples of the strings you're wanting to parse, and how you get them. Are there separator or terminating characters?
This code should work. It compiles, but I do not have a Propeller with me to test it.
For Jon McPhalen, I offer the following full disclosure:
I have working versions (Examples 1 and 2). My intent is to perfect and learn.
I think this could be done with only one "return false" by using a loop and an index to select the limits and multipliers, but inline code seemed the simplest and clearest way to do it. It is a bit reminiscent of BASIC spaghetti code, but at least the code is short enough to be displayed all at once.
I agree that a state machine is very useful for parsing, as some have noted above.
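Here is a minimal sketch of the loop-and-table idea. It is written in C rather than Spin so that it can be tested off the board. The `Field` table, `kTimecode`, `is_terminator` and `parse_timecode` are hypothetical names. Each character position of `MM:SS.T` gets one table row. A row holds either a literal separator, or a digit limit plus a multiplier in tenths of a second. Because of this, the nesting depth no longer grows with the string length, and there is a single pass/fail decision at the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One row per character of "MM:SS.T". A nonzero `sep` means that position
   must hold that literal character; otherwise it must be a digit in
   '0'..max_digit, worth `weight` tenths of a second. */
typedef struct {
    char sep;
    char max_digit;
    long weight;
} Field;

static const Field kTimecode[] = {
    {0,   '5', 6000},  /* tens of minutes */
    {0,   '9', 600},   /* minutes         */
    {':', 0,   0},
    {0,   '5', 100},   /* tens of seconds */
    {0,   '9', 10},    /* seconds         */
    {'.', 0,   0},
    {0,   '9', 1},     /* tenths          */
};

/* Assumed terminators: space, CR, LF or end of string. */
static bool is_terminator(char c) {
    return c == '\0' || c == ' ' || c == '\r' || c == '\n';
}

/* Parses "MM:SS.T" into tenths of a second. The loop stops at the first
   bad character, leaving one success/failure decision at the end. */
static bool parse_timecode(const char *s, long *tenths) {
    assert(s != NULL && tenths != NULL);
    const size_t n = sizeof kTimecode / sizeof kTimecode[0];
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        const Field *f = &kTimecode[i];
        char c = s[i];
        if (f->sep != 0) {
            if (c != f->sep) break;
        } else {
            if (c < '0' || c > f->max_digit) break;
            total += (c - '0') * f->weight;
        }
    }
    /* All fields matched and the next character ends the timecode. */
    bool ok = (i == n) && is_terminator(s[n]);
    if (ok) *tenths = total;
    return ok;
}
```

Supporting another field means adding a table row rather than another nested `if`. In Spin the same table would presumably live in a `DAT` block and be walked by a `repeat` loop.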
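As an illustration of the state-machine approach, here is a hedged C sketch for the `0NN-NNN` command string from Example 2. The state names, the `Command` struct and `parse_command` are hypothetical, and Spin's `case` statement would play the role of `switch`. Each character advances the machine by one step. As a result, the nesting depth stays fixed however many characters the command has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result for "0NN-NNN": NN is the address, NNN the function
   number, whose first digit is limited to 0..2. */
typedef struct { int addr; int fcn; } Command;

typedef enum { S_ZERO, S_ADDR, S_DASH, S_FCN, S_END, S_DONE, S_ERROR } State;

/* Assumed terminators: space, CR, LF or end of string. */
static bool is_end(char c) {
    return c == '\0' || c == ' ' || c == '\r' || c == '\n';
}

static bool parse_command(const char *s, Command *out) {
    assert(s != NULL && out != NULL);
    Command cmd = {0, 0};
    State st = S_ZERO;
    int count = 0;          /* digits consumed in the current field */
    size_t i = 0;
    while (st != S_DONE && st != S_ERROR) {
        char c = s[i++];
        switch (st) {
        case S_ZERO:                          /* leading '0' */
            st = (c == '0') ? S_ADDR : S_ERROR;
            break;
        case S_ADDR:                          /* two address digits */
            if (c < '0' || c > '9') { st = S_ERROR; break; }
            cmd.addr = cmd.addr * 10 + (c - '0');
            if (++count == 2) { count = 0; st = S_DASH; }
            break;
        case S_DASH:
            st = (c == '-') ? S_FCN : S_ERROR;
            break;
        case S_FCN: {                         /* three function digits */
            char max = (count == 0) ? '2' : '9';
            if (c < '0' || c > max) { st = S_ERROR; break; }
            cmd.fcn = cmd.fcn * 10 + (c - '0');
            if (++count == 3) st = S_END;
            break;
        }
        case S_END:                           /* terminator must follow */
            st = is_end(c) ? S_DONE : S_ERROR;
            break;
        default:
            st = S_ERROR;
            break;
        }
    }
    if (st == S_DONE) *out = cmd;
    return st == S_DONE;
}
```

Rather than calling `interpret` inside the parser as Example 2 does, this version returns the parsed fields and leaves the dispatch to the caller.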
How do you make your state machines without case statements?
In the past I have done it with goto. Shameful, I know, but when you need the speed, goto is the way to go, as it were.
I would be inclined to divide and conquer:
Make a method that parses numbers, hopefully with a variable number of digits for flexibility.
Make another method that parses alpha or alphanumeric symbols, again of variable length and perhaps including "$", "_" or whatever else you want to allow in such symbols.
Perhaps make a method that skips over white space.
Then you can tie all these together to parse all manner of data that contains numbers and symbols. That higher-level parser will probably have to take care of detecting terminators like space, end of line, comma, semicolon, whatever.
As a bonus you end up with some methods that can be reused for parsing in other different problems. And your finished code will be a lot more readable when you get back to it.
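Here is a minimal C sketch of that divide-and-conquer layout. It assumes that spaces and tabs are the only white space to skip. `Scanner`, `skip_ws`, `parse_number`, `expect`, `parse_symbol` and `parse_time` are all hypothetical names. The low-level helpers know nothing about timecodes; `parse_time` strings them together to read `MM:SS.T`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Read position within one line of input (hypothetical helper type). */
typedef struct { const char *p; } Scanner;

/* Skips spaces and tabs. */
static void skip_ws(Scanner *sc) {
    while (*sc->p == ' ' || *sc->p == '\t') sc->p++;
}

/* Reads up to max_digits decimal digits. Returns how many were read
   (0 = no number here) and stores the value in *value when > 0. */
static int parse_number(Scanner *sc, int max_digits, long *value) {
    assert(max_digits > 0 && value != NULL);
    long v = 0;
    int n = 0;
    while (n < max_digits && isdigit((unsigned char)*sc->p)) {
        v = v * 10 + (*sc->p - '0');
        sc->p++;
        n++;
    }
    if (n > 0) *value = v;
    return n;
}

/* Consumes ch if it is the next character. */
static bool expect(Scanner *sc, char ch) {
    if (*sc->p != ch) return false;
    sc->p++;
    return true;
}

/* Reads a symbol of letters, digits, '$' and '_' into buf (truncated to
   cap - 1 characters) and returns the stored length. */
static size_t parse_symbol(Scanner *sc, char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    size_t n = 0;
    while (isalnum((unsigned char)*sc->p) || *sc->p == '$' || *sc->p == '_') {
        if (n + 1 < cap) buf[n++] = *sc->p;
        sc->p++;
    }
    buf[n] = '\0';
    return n;
}

/* Higher level: "MM:SS.T" -> tenths of a second, minutes and seconds 00..59. */
static bool parse_time(Scanner *sc, long *tenths) {
    long mm = 0, ss = 0, t = 0;
    if (parse_number(sc, 2, &mm) != 2 || mm > 59) return false;
    if (!expect(sc, ':')) return false;
    if (parse_number(sc, 2, &ss) != 2 || ss > 59) return false;
    if (!expect(sc, '.')) return false;
    if (parse_number(sc, 1, &t) != 1) return false;
    *tenths = (mm * 60 + ss) * 10 + t;
    return true;
}
```

A parser for the `0NN-NNN` command could be built from the same pieces: `parse_number` for the address, `expect` for the dash, and `parse_number` again for the function.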
If you want to get really serious, use those low-level methods to parse out tokens (the numbers, symbols, etc.) and then have a higher level that works through the tokenized data.
Before you know it you will be writing a compiler or interpreter as described by Crenshaw :)
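Here is a hedged C sketch of that two-level structure. `next_token` is a small lexer that turns a line into NUMBER, SYMBOL, PUNCT and END tokens. `parse_command_line` then works through those tokens to recognise the `0NN-NNN` command. All names are hypothetical, and end of line is assumed to be CR, LF or the end of the string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOK_NUMBER, TOK_SYMBOL, TOK_PUNCT, TOK_END } TokKind;

typedef struct {
    TokKind kind;
    const char *start;  /* first character of the token     */
    size_t len;         /* number of characters in the token */
    long value;         /* TOK_NUMBER only: numeric value    */
} Token;

static bool is_sym_char(char c) {
    return isalnum((unsigned char)c) || c == '$' || c == '_';
}

/* Lexer: skips blanks, returns the next token and advances *pp past it.
   NUL, CR and LF yield TOK_END and are not consumed. */
static Token next_token(const char **pp) {
    assert(pp != NULL && *pp != NULL);
    const char *p = *pp;
    while (*p == ' ' || *p == '\t') p++;
    Token t = {TOK_END, p, 0, 0};
    if (isdigit((unsigned char)*p)) {
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) t.value = t.value * 10 + (*p++ - '0');
    } else if (is_sym_char(*p)) {
        t.kind = TOK_SYMBOL;
        while (is_sym_char(*p)) p++;
    } else if (*p != '\0' && *p != '\r' && *p != '\n') {
        t.kind = TOK_PUNCT;
        p++;
    }
    t.len = (size_t)(p - t.start);
    *pp = p;
    return t;
}

/* Parser: "0NN-NNN" is NUMBER(3 digits, < 100) '-' NUMBER(3 digits, <= 299) END.
   A three-digit value below 100 implies the leading '0'. */
static bool parse_command_line(const char *line, int *addr, int *fcn) {
    const char *p = line;
    Token a = next_token(&p);
    Token dash = next_token(&p);
    Token f = next_token(&p);
    Token end = next_token(&p);
    bool ok = a.kind == TOK_NUMBER && a.len == 3 && a.value < 100
           && dash.kind == TOK_PUNCT && *dash.start == '-'
           && f.kind == TOK_NUMBER && f.len == 3 && f.value <= 299
           && end.kind == TOK_END;
    if (ok) { *addr = (int)a.value; *fcn = (int)f.value; }
    return ok;
}
```

Because the lexer skips blanks, this version would also accept `012 - 234`. If spacing matters, the lexer can return blanks as tokens too.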
A good technique for sure.
Sadly we don't have the ability to make or call function pointers in Spin.
Here is a snippet of how it works for parsing a GPS sentence. In this case the bytes come from the serial port, but in your case they would come from the SD card. The bytes are shifted msb first into an eight-byte (two longs) buffer. After each shift, a condition is tested against the contents of the longs. In the case of the GPS, it is looking for a match to a set of key strings in a table. In your case, rather than a table match, you might have two conditions:
1) time string: <CR> in byte7, ":" in byte3, and "." in byte1
2) command string: "-" in byte3 and either T or S in byte7
When that condition is met, you have all the other bytes in known position for quick analysis.
You have to be aware of endianness. The bytes are shifted in msb first so that a match can be made with a text string that is stored in normal left-to-right order. For example, in "GGA,", the first G ends up in the least significant byte of the double long char1:char0.
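Here is a rough C model of the shift-register technique. It assumes that the newest character enters at byte 7 and the oldest ends up in byte 0, so reading bytes 0 to 7 gives the characters in arrival order. A single 64-bit value stands in for the two longs. `Window`, `shift_in`, `byte_at`, `pack_le`, `ends_with` and `match_table` are hypothetical names. `pack_le` builds the comparison value the way the little-endian Propeller stores a string, with the first character in the lowest byte. This is why a key stored in normal left-to-right order can be compared directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Eight-byte window (char1:char0 as one 64-bit value). Each new character
   enters at byte 7 (most significant) and the older bytes move down. */
typedef struct { uint64_t bytes; } Window;

static void shift_in(Window *w, unsigned char c) {
    w->bytes = (w->bytes >> 8) | ((uint64_t)c << 56);
}

static unsigned char byte_at(const Window *w, int pos) {
    assert(pos >= 0 && pos < 8);
    return (unsigned char)(w->bytes >> (8 * pos));
}

/* Packs n characters little-endian: s[0] in the lowest byte. */
static uint64_t pack_le(const char *s, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}

/* True if the most recent strlen(key) characters (1..8) spell key. */
static bool ends_with(const Window *w, const char *key) {
    size_t n = strlen(key);
    if (n == 0 || n > 8) return false;
    return (w->bytes >> (8 * (8 - n))) == pack_le(key, n);
}

/* Index of the first matching key in the table, or -1 if none match. */
static int match_table(const Window *w, const char *const keys[], int count) {
    for (int k = 0; k < count; k++)
        if (ends_with(w, keys[k])) return k;
    return -1;
}
```

With this layout, an eight-character line `MM:SS.T<CR>` puts `':'` in byte 2 and `'.'` in byte 5. It is worth working out the positions for your actual line format (and which end you call byte 0) before writing the conditions.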
I was tempted to divide and conquer for the minute and second fields, but it hardly seemed worth the effort. The multiple "return false" statements could be eliminated by setting an error flag instead. In some cases it might be worth setting multiple error flags for each line that is parsed, so that a comprehensive list of errors can be provided for debugging after a single pass through all the commands.
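Here is a minimal C sketch of the error-flag idea, applied to the `MM:SS.T` timecode. Every field is checked, and a separate bit is set for each problem instead of returning at the first one. The flag names, `check_timecode` and `check_all` are hypothetical, and the accepted terminators (space, CR, LF or end of string) are an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error flags; each kind of problem sets its own bit. */
enum {
    ERR_MINUTES    = 1 << 0,
    ERR_SECONDS    = 1 << 1,
    ERR_TENTHS     = 1 << 2,
    ERR_SEPARATOR  = 1 << 3,
    ERR_LENGTH     = 1 << 4,
    ERR_TERMINATOR = 1 << 5,
};

static int digit_ok(char c, char max) { return c >= '0' && c <= max; }

/* Checks all of "MM:SS.T" and ORs in a flag for every problem found.
   0 means the line is valid. */
static unsigned check_timecode(const char *s) {
    assert(s != NULL);
    unsigned err = 0;
    size_t len = 0;
    while (len < 8 && s[len] != '\0') len++;   /* look at most 8 chars */
    if (len < 7) {
        err |= ERR_LENGTH;                      /* too short to check fields */
    } else {
        if (!digit_ok(s[0], '5') || !digit_ok(s[1], '9')) err |= ERR_MINUTES;
        if (!digit_ok(s[3], '5') || !digit_ok(s[4], '9')) err |= ERR_SECONDS;
        if (!digit_ok(s[6], '9')) err |= ERR_TENTHS;
        if (s[2] != ':' || s[5] != '.') err |= ERR_SEPARATOR;
        if (len == 8 && s[7] != ' ' && s[7] != '\r' && s[7] != '\n')
            err |= ERR_TERMINATOR;
    }
    return err;
}

/* One pass over all lines: records each line's flags, returns how many failed. */
static size_t check_all(const char *const lines[], size_t n, unsigned flags[]) {
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        flags[i] = check_timecode(lines[i]);
        if (flags[i] != 0) bad++;
    }
    return bad;
}
```

After the pass, the `flags` array holds a per-line record of every problem. It can be printed or sent to a debug terminal in one go.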