[PoC] SPIN scanner parser written in lex/yacc
pedward
Hello everyone. I can't tell you how much it bugs me (in an OCD way, not *really* bugs me) that the SPIN compiler is a monolithic compiler that doesn't use lex or yacc for the heavy lifting.
I wanted to write a tool to extract comments and context from SPIN files, to make a SPINdoc tool.
Well, I got a little carried away and wrote most of the frontend for a SPIN compiler.
I've spent probably 20 hours on it this weekend getting reacquainted with lex and yacc (it's been 14 years since I wrote a compiler), but I now have a grammar that parses a couple of valid SPIN files without generating a syntax error.
I attached my work in progress, for other tinkerers to look at.
I learned some things about what you can achieve in lex and what you can achieve in yacc.
In general, it's best to put all of your lexical tokens into the lex scanner, because there you can control how greedy the matches are and generate unique tokens for patterns that would otherwise be too general. It also turns out that many of SPIN's operators must be defined in longest-to-shortest match order so that lexical precedence is enforced.
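The longest-first idea can be shown with a minimal hand-written matcher in C. This is a sketch, not code from the attached scanner: the table is an illustrative subset of SPIN operators (not the complete set), ordered longest first so the first prefix that matches is also the longest one, and "match_operator" is a hypothetical name. For reference, flex itself always prefers the longest match and only uses rule order to break ties between matches of equal length.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative subset of SPIN operators, ordered longest first so that the
   first entry that matches is also the longest possible match. */
static const char *const spin_ops[] = {
    "<<=", ">>=", "~>=", "//=", "**=",
    ":=", "==", "<>", "=<", "=>", "<<", ">>", "~>", "~~",
    "//", "**", "+=", "-=", "*=", "/=", "++", "--",
    "<", ">", "~", "+", "-", "*", "/", "=", "!", "#", "&", "|", "^",
};

/* Returns the length of the longest operator that starts at s, or 0 if the
   text at s does not begin with any operator in the table. */
size_t match_operator(const char *s)
{
    for (size_t i = 0; i < sizeof spin_ops / sizeof spin_ops[0]; i++) {
        size_t n = strlen(spin_ops[i]);
        if (strncmp(s, spin_ops[i], n) == 0)
            return n;
    }
    return 0;
}
```

A scanner loop would call match_operator at the current position and emit that many characters as one token, so "~~" becomes a single post-set token instead of two "~" tokens.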
The parser uses all the tricks in yacc to enforce operator precedence and try to reasonably modularize the grammar. Unfortunately it's necessary to have a very long expression rule because there are so many operators and you need hooks to know how to interpret them.
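yacc enforces precedence through %left/%right declarations. As a hand-written point of comparison, here is a minimal precedence-climbing evaluator in C; precedence climbing is a different technique from yacc's LALR tables and is swapped in only to illustrate the idea. It assumes a small subset of SPIN's binary operators, with shifts binding tighter than *, / and //, which in turn bind tighter than + and -, and it treats every level as left-associative. The function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Binary operators with SPIN-style relative precedence (higher binds tighter).
   Illustrative subset only; "//" is listed before "/" so it matches first. */
static const struct { const char *op; int prec; } binops[] = {
    {"<<", 3}, {"~>", 3},
    {"//", 2}, {"*", 2}, {"/", 2},
    {"+", 1}, {"-", 1},
};

static void skip_ws(const char **s)
{
    while (isspace((unsigned char)**s))
        (*s)++;
}

/* Looks up the operator at s; returns its precedence (0 if none) and sets *len. */
static int peek_binop(const char *s, size_t *len)
{
    for (size_t i = 0; i < sizeof binops / sizeof binops[0]; i++) {
        size_t n = strlen(binops[i].op);
        if (strncmp(s, binops[i].op, n) == 0) {
            *len = n;
            return binops[i].prec;
        }
    }
    return 0;
}

static long apply(const char *op, size_t len, long a, long b)
{
    if (len == 2 && op[0] == '<') return a << b;
    if (len == 2 && op[0] == '~') return a >> b; /* assumes >> is arithmetic for signed long (true for gcc) */
    if (len == 2 && op[0] == '/') return a % b;  /* SPIN "//" is modulus */
    switch (op[0]) {
    case '*': return a * b;
    case '/': return a / b;
    case '+': return a + b;
    default:  return a - b;
    }
}

static long parse_expr(const char **s, int min_prec);

/* primary := number | '(' expr ')' */
static long parse_primary(const char **s)
{
    skip_ws(s);
    if (**s == '(') {
        (*s)++;
        long v = parse_expr(s, 1);
        skip_ws(s);
        if (**s == ')')
            (*s)++;
        return v;
    }
    long v = 0;
    while (isdigit((unsigned char)**s))
        v = v * 10 + (*(*s)++ - '0');
    return v;
}

/* Precedence climbing: keep folding operators whose precedence is at least
   min_prec; recursing with prec + 1 makes each level left-associative. */
static long parse_expr(const char **s, int min_prec)
{
    long lhs = parse_primary(s);
    for (;;) {
        skip_ws(s);
        size_t len = 0;
        int prec = peek_binop(*s, &len);
        if (prec == 0 || prec < min_prec)
            return lhs;
        const char *op = *s;
        *s += len;
        long rhs = parse_expr(s, prec + 1);
        lhs = apply(op, len, lhs, rhs);
    }
}

/* Hypothetical entry point: evaluates a constant expression such as "250*9/5+320". */
long eval_expr(const char *text)
{
    return parse_expr(&text, 1);
}
```

Grouping is easy to check with an expression from the demo: with left-to-right grouping, 250 * 9 / 5 + 320 evaluates to 770, whereas grouping 9 / 5 first truncates that factor to 1 in integer math and gives 570.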
lex and yacc are ideal for writing compilers. Because SPIN is a stack-based language, translating grammar matches into code generation should be straightforward: a bottom-up parser reduces the operands of an expression before the operator that combines them, which is exactly the order in which you'd write code for a stack-based machine.
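To make the stack-machine point concrete: with * and / declared left-associative, a bottom-up parser reduces "temperature * 9" before it sees "/ 5", so emitting each operand when it is shifted and each operator when its rule is reduced produces postfix order ("temperature 9 * 5 / 320 +"), which a stack machine can run directly. The toy evaluator below is a minimal sketch assuming decimal integer operands and the four basic operators; it is not SPIN's actual bytecode interpreter, and "run_postfix" is a hypothetical name.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define STACK_MAX 64

/* Evaluates postfix tokens with an operand stack: numbers are pushed, and each
   operator pops its two operands and pushes the result. Returns 0 on success
   and stores the final value in *result; returns -1 on a malformed program. */
int run_postfix(const char *const *tokens, size_t count, long *result)
{
    long stack[STACK_MAX];
    size_t sp = 0;

    for (size_t i = 0; i < count; i++) {
        const char *t = tokens[i];
        int is_op = t[0] != '\0' && t[1] == '\0' && strchr("+-*/", t[0]) != NULL;
        if (is_op) {
            if (sp < 2)
                return -1;               /* not enough operands */
            long b = stack[--sp];        /* the right operand was pushed last */
            long a = stack[--sp];
            switch (t[0]) {
            case '+': stack[sp++] = a + b; break;
            case '-': stack[sp++] = a - b; break;
            case '*': stack[sp++] = a * b; break;
            default:
                if (b == 0)
                    return -1;           /* division by zero */
                stack[sp++] = a / b;
                break;
            }
        } else {
            if (sp == STACK_MAX)
                return -1;               /* operand stack overflow */
            stack[sp++] = strtol(t, NULL, 10);
        }
    }
    if (sp != 1)
        return -1;                       /* leftover or missing values */
    *result = stack[0];
    return 0;
}
```

Feeding it the reduction order for the demo's Fahrenheit conversion with temperature = 250, i.e. {"250", "9", "*", "5", "/", "320", "+"}, leaves 770 on the stack.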
This code, which I nicknamed splint, is just an exercise, but it provides plenty of capability to do what I originally wanted: extract comments and the context of the comments.
I haven't tackled PASM at all, only the SPIN grammar. I would probably add another state to the scanner for PASM-only tokens, then add a rule tree in the parser that handles PASM.
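In flex, that extra state would most likely be an exclusive start condition (declared with %x) that the scanner enters with BEGIN when it sees DAT. Here is the same idea as a minimal plain-C sketch, assuming that DAT switches the scanner into a PASM mode and any other block keyword switches it back. The mnemonic list is a small illustrative subset, the enum and function names are hypothetical, and a real scanner would also require block keywords to start in the first column. SPIN keywords are case-insensitive, so comparisons ignore case.

```c
#include <ctype.h>
#include <stddef.h>

enum scan_state { STATE_SPIN, STATE_PASM };
enum word_kind { WORD_BLOCK, WORD_IDENT, WORD_PASM_OP };

/* Case-insensitive string equality. */
static int eq_nocase(const char *a, const char *b)
{
    while (*a != '\0' && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) == tolower((unsigned char)*b);
}

static int in_list(const char *w, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (eq_nocase(w, list[i]))
            return 1;
    return 0;
}

static const char *const block_words[] = { "CON", "VAR", "OBJ", "PUB", "PRI", "DAT" };
static const char *const pasm_words[] = { "mov", "add", "sub", "jmp", "djnz", "rdlong", "wrlong", "waitcnt" };

/* Classifies a word and updates the scanner state: DAT enters PASM mode and
   every other block keyword returns to SPIN mode. Mnemonics are only
   recognized while in PASM mode; elsewhere they are ordinary identifiers. */
enum word_kind classify_word(enum scan_state *state, const char *w)
{
    if (in_list(w, block_words, sizeof block_words / sizeof block_words[0])) {
        *state = eq_nocase(w, "DAT") ? STATE_PASM : STATE_SPIN;
        return WORD_BLOCK;
    }
    if (*state == STATE_PASM && in_list(w, pasm_words, sizeof pasm_words / sizeof pasm_words[0]))
        return WORD_PASM_OP;
    return WORD_IDENT;
}
```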
I haven't enforced tab indentation, but I added hooks to the scanner to keep track of the "tab stack", so you could use the tab stack to keep track of blocks. I would add another variable that stores the difference between the indentation of the last line and that of the current line, which would give you a positive number at the beginning of a block and a negative number at the end of one.
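A minimal C sketch of the tab-stack bookkeeping, assuming each line's indentation has already been converted to a column number (tab expansion is left out). Rather than a raw difference, it returns +1 when a line opens a block and -n when it closes n blocks at once, which a single difference cannot express when block widths vary. The names and the INDENT_ERROR value are hypothetical; this is the same approach Python's tokenizer uses to produce INDENT/DEDENT tokens.

```c
#define MAX_DEPTH 32
#define INDENT_ERROR (-1000)

/* Stack of indentation columns for the currently open blocks. */
struct indent_stack {
    int cols[MAX_DEPTH];
    int depth;
};

void indent_init(struct indent_stack *st)
{
    st->cols[0] = 0;   /* the outermost level (block keywords) sits at column 0 */
    st->depth = 1;
}

/* Feed the indentation column (col >= 0) of each non-blank line. Returns +1
   when a block opens, -n when n blocks close, 0 at the same level, or
   INDENT_ERROR when the line dedents to a column that matches no open block
   (or nesting exceeds MAX_DEPTH). */
int indent_update(struct indent_stack *st, int col)
{
    if (col > st->cols[st->depth - 1]) {
        if (st->depth == MAX_DEPTH)
            return INDENT_ERROR;
        st->cols[st->depth++] = col;
        return 1;
    }
    int closed = 0;
    while (col < st->cols[st->depth - 1]) {   /* cols[0] == 0 stops the loop */
        st->depth--;
        closed++;
    }
    if (col != st->cols[st->depth - 1])
        return INDENT_ERROR;
    return -closed;
}
```

For example, columns 0, 2, 4, 4, 0 yield 0, +1, +1, 0, -2: the last line closes both nested blocks at once.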
SPIN is a real bear to parse because it's not a "proper" context-oriented language. Some annoying misfeatures are function declarations and calls that omit parentheses entirely when there are no parameters (rather than using empty parentheses). The whitespace-based block handling is even harder to parse: you can't use the parser rules alone to enforce block structure, so you need to keep track of tabs outside of the parser rules and do your own checking and error detection.
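One common workaround for parenthesis-free calls (an assumption here, not something the attached parser does) is to collect method names in a pre-pass and have the scanner return a distinct token for them, so the grammar can tell a bare call such as Newline from a variable reference. A pre-pass is needed because SPIN lets a method be used before its PUB/PRI declaration, as PauseMS is in the demo file. A minimal C sketch with hypothetical names and a fixed-size table:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_METHODS 64
#define MAX_NAME 32

struct method_table {
    char names[MAX_METHODS][MAX_NAME];
    size_t count;
};

/* Case-insensitive string equality (SPIN names are not case-sensitive). */
static int eq_nocase(const char *a, const char *b)
{
    while (*a != '\0' && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) == tolower((unsigned char)*b);
}

/* Returns 1 if line starts with kw (any case) followed by a space or tab. */
static int starts_with_keyword(const char *line, const char *kw)
{
    size_t n = strlen(kw);
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)line[i]) != tolower((unsigned char)kw[i]))
            return 0;
    return line[n] == ' ' || line[n] == '\t';
}

/* Pre-pass: records the name that follows PUB or PRI at the start of a line.
   Names longer than MAX_NAME - 1 characters are truncated. */
void collect_methods(const char *src, struct method_table *t)
{
    t->count = 0;
    for (const char *line = src; *line != '\0'; ) {
        if (starts_with_keyword(line, "PUB") || starts_with_keyword(line, "PRI")) {
            const char *p = line + 3;
            while (*p == ' ' || *p == '\t')
                p++;
            size_t n = 0;
            while ((isalnum((unsigned char)p[n]) || p[n] == '_') && n < MAX_NAME - 1)
                n++;
            if (n > 0 && t->count < MAX_METHODS) {
                memcpy(t->names[t->count], p, n);
                t->names[t->count][n] = '\0';
                t->count++;
            }
        }
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
}

/* The scanner would call this to decide between a method token and a plain identifier. */
int is_method(const struct method_table *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++)
        if (eq_nocase(t->names[i], name))
            return 1;
    return 0;
}
```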
My parser is VERY verbose, mainly to have an action for every rule, so I can ensure the parser is properly grabbing and assembling the tokens. A real compiler would have actual code to implement code generation and other bits, in those actions.
Anyway, it successfully parsed the SPI_spin and demo files I submitted to Parallax for Gold Standard inclusion, which include A LOT of comments and a variety of tricky operators.
Compiler abuse
Another interesting thing to note: if you know how the compiler parses the file, you can exploit that to write code that TOTALLY does not conform to the standards but still parses and compiles correctly. This is because parsers generally ignore whitespace (more accurately, the scanner throws it away). Because the whitespace is no longer needed, you could write code that just runs together likeagooglesearchwhenyouaretypinginthesearchbarofyourphone (like a Google search when you are typing in the search bar of your phone), at least wherever an operator or other punctuation separates the words; two adjacent names or keywords still need something between them.
Ever searched Google, forgotten to put spaces between your words, and still gotten a suggested search term that magically separates the words with whitespace? That's because their parser matches based on tokens. It's not trying to figure out what you wrote; it matches a database of known tokens against your search string, the opposite of what you would expect.
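Here is a minimal C sketch of why the run-together trick works, assuming a toy tokenizer (the hypothetical "tokenize", with only a handful of operators) that discards whitespace and applies longest-match rules. "temperature:=temperature*9/5+320" and the same statement written with spaces produce identical token streams. The sketch also shows the limit of the trick: deleting the space in "long temperature" yields the single identifier "longtemperature".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOK 32

/* Splits src into tokens, discarding whitespace. Words are [A-Za-z_][A-Za-z0-9_]*,
   numbers are runs of digits, and operators use longest match from a small table.
   Returns the number of tokens written to out (at most max). Tokens longer than
   MAX_TOK - 1 characters are split. */
size_t tokenize(const char *src, char out[][MAX_TOK], size_t max)
{
    static const char *const ops[] = { ":=", "*", "/", "+", "-" };  /* longest first */
    size_t count = 0;
    while (*src != '\0' && count < max) {
        if (isspace((unsigned char)*src)) {     /* the scanner simply throws whitespace away */
            src++;
            continue;
        }
        size_t n = 0;
        if (isalpha((unsigned char)*src) || *src == '_') {
            while (isalnum((unsigned char)src[n]) || src[n] == '_')
                n++;
        } else if (isdigit((unsigned char)*src)) {
            while (isdigit((unsigned char)src[n]))
                n++;
        } else {
            for (size_t i = 0; i < sizeof ops / sizeof ops[0] && n == 0; i++)
                if (strncmp(src, ops[i], strlen(ops[i])) == 0)
                    n = strlen(ops[i]);
            if (n == 0)
                n = 1;                          /* unknown character: emit it alone */
        }
        if (n >= MAX_TOK)
            n = MAX_TOK - 1;
        memcpy(out[count], src, n);
        out[count][n] = '\0';
        count++;
        src += n;
    }
    return count;
}
```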
Build
Oh, build instructions:
flex tokens.l
vi lex.yy.c (and remove "static" from the declaration of "yy_start" on line 279 so the variable can be used in the parser to debug the current scanner state.)
bison parse.y
gcc -o parser -D STANDALONE_PARSER parse.tab.c lex.yy.c
gcc -o scanner -D STANDALONE lex.yy.c
I included the generated source files, so you don't need to run flex or bison or edit anything; just compile.
If you run scanner < file and pipe it to less, you get a debug listing of every token matched: the source line number, a running token count, the current scanner state, the matched text, and the token code.
The parser also reads the file from stdin, and it prints a complete, very verbose breakdown of every rule it matches.
Something else you should know: this parser/scanner isn't built for wide-character (UTF-16) input, so I ran iconv -f UTF-16 -t UTF-8 < file.spin > out.spin on the included files before running them through the scanner/parser. If you run it directly on an unconverted UTF-16 Spin file, it'll break. The proper solution would be to bolt a UTF-16 handler onto the scanner input.
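The input-side fix could look like this minimal C sketch, which converts a UTF-16LE buffer (the byte order signalled by an FF FE byte-order mark) to UTF-8 before the bytes reach the scanner; one option would be to convert the whole file in memory and hand the result to flex with yy_scan_bytes. That the Propeller Tool's UTF-16 files are little-endian with a BOM is an assumption here, and "utf16le_to_utf8" is a hypothetical name.

```c
#include <stddef.h>

/* Converts UTF-16LE bytes to UTF-8, skipping a leading FF FE byte-order mark.
   Returns the number of bytes written to out, or (size_t)-1 on an unpaired
   surrogate or when out is too small. A trailing odd byte is ignored. */
size_t utf16le_to_utf8(const unsigned char *in, size_t in_len,
                       unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;
    if (in_len >= 2 && in[0] == 0xFF && in[1] == 0xFE)
        i = 2;                                        /* skip the BOM */
    while (i + 1 < in_len) {
        unsigned long cp = (unsigned long)in[i] | ((unsigned long)in[i + 1] << 8);
        i += 2;
        if (cp >= 0xD800 && cp <= 0xDBFF) {           /* high surrogate: pair it */
            if (i + 1 >= in_len)
                return (size_t)-1;
            unsigned long lo = (unsigned long)in[i] | ((unsigned long)in[i + 1] << 8);
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                        /* low surrogate without a high one */
        }
        unsigned char buf[4];
        size_t n;
        if (cp < 0x80) {
            buf[0] = (unsigned char)cp;
            n = 1;
        } else if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 2;
        } else if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 4;
        }
        if (o + n > out_cap)
            return (size_t)-1;
        for (size_t k = 0; k < n; k++)
            out[o++] = buf[k];
    }
    return o;
}
```

Characters such as the degree sign and the box-drawing border in the license block become two- and three-byte UTF-8 sequences, which a byte-oriented flex scanner can pass through inside comments and strings.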
Expected warnings
These are the expected warnings when generating the parser:
parse.y:261.7: warning: empty rule for typed nonterminal, and no action
parse.y:310.11: warning: empty rule for typed nonterminal, and no action
parse.y:314.13: warning: empty rule for typed nonterminal, and no action
parse.y:363.5: warning: empty rule for typed nonterminal, and no action
parse.y: conflicts: 3128 shift/reduce, 234 reduce/reduce
Output
Here's sample tokenizer output from the scanner:
1: 1: 1: {{ -> 265
1: 2: 1:
$Id$
Author: Perry Harrington
Copyright: (c) 2012 Perry Harrington
=======================================================================
This is a demo program to show the usage of the SPI SPIN object.
This demo reads the current temperature from a Maxim DS1620 digital
thermometer and thermostat and displays the temperature on the Parallax
Serial Terminal. You can obtain the datasheet from Maxim
Semiconductor.
This device is an 8 pin DIP with an active high CLK input and single DQ
data line instead of MISO and MOSI I/O. The chip uses LSB first
endianness, data is read PRE clock. The maximum clock rate is 1.75Mhz.
To send a command to the chip you must set RST high, send the command, then
signal the end of command by setting RST low. When receiving data set RST
low to signal the end of reception.
The output of the chip is a 9 bit signed temperature in 1/2 degree Celsius
units. That means you must divide the result by 2 to get whole degrees.
-> 355
1: 3: 0: }} -> 267
2: 4: 0: CON -> 258
4: 5: 0: _clkmode -> 356
4: 6: 0: = -> 299
4: 7: 0: xtal1 -> 356
4: 8: 0: + -> 310
4: 9: 0: pll16x -> 356
5: 10: 0: _xinfreq -> 356
5: 11: 0: = -> 299
5: 12: 0: 5_000_000 -> 358
7: 13: 0: DPIN -> 356
7: 14: 0: = -> 299
7: 15: 0: 0 -> 358
7: 16: 2: ' -> 272
7: 17: 2: Data pin -> 355
8: 18: 0:
-> 264
8: 19: 0: CPIN -> 356
8: 20: 0: = -> 299
8: 21: 0: 1 -> 358
8: 22: 2: ' -> 272
8: 23: 2: Clock pin -> 355
9: 24: 0:
-> 264
9: 25: 0: RST -> 356
9: 26: 0: = -> 299
9: 27: 0: 2 -> 358
9: 28: 2: ' -> 272
9: 29: 2: Reset pin -> 355
10: 30: 0:
-> 264
10: 31: 0: CLKr -> 356
10: 32: 0: = -> 299
10: 33: 0: 10 -> 358
10: 34: 2: ' -> 272
10: 35: 2: 100us clock frequency -> 355
11: 36: 0:
-> 264
12: 37: 0: START_CONVERT -> 356
12: 38: 0: = -> 299
12: 39: 3: $ -> 36
12: 40: 0: EE -> 360
12: 41: 2: ' -> 272
12: 42: 2: Start temp conversion -> 355
13: 43: 0:
-> 264
13: 44: 0: READ_TEMP -> 356
13: 45: 0: = -> 299
13: 46: 3: $ -> 36
13: 47: 0: AA -> 360
13: 48: 2: ' -> 272
13: 49: 2: Read temp from DS1620 -> 355
14: 50: 0:
-> 264
14: 51: 0: WRITE_CONFIG -> 356
14: 52: 0: = -> 299
14: 53: 3: $ -> 36
14: 54: 0: 0C -> 360
14: 55: 2: ' -> 272
14: 56: 2: Write config register -> 355
15: 57: 0:
-> 264
15: 58: 0: CONFIG_REG -> 356
15: 59: 0: = -> 299
15: 60: 0: % -> 37
15: 61: 0: 0000_1010 -> 358
15: 62: 2: ' -> 272
15: 63: 2: enable continuous conversion and CPU 3wire -> 355
16: 64: 0:
-> 264
16: 65: 0: CONVERT_TIME -> 356
16: 66: 0: = -> 299
16: 67: 0: 750 -> 358
16: 68: 2: ' -> 272
16: 69: 2: Takes 750ms to initialize temp conversion -> 355
17: 70: 0:
-> 264
17: 71: 0: EEPROM_WTIME -> 356
17: 72: 0: = -> 299
17: 73: 0: 10 -> 358
17: 74: 2: ' -> 272
17: 75: 2: Takes 10ms to write to EEPROM config -> 355
18: 76: 0:
-> 264
19: 77: 0: OBJ -> 259
21: 78: 0: pst -> 356
21: 79: 0: : -> 58
21: 80: 0: "Parallax Serial Terminal" -> 357
22: 81: 0: spi -> 356
22: 82: 0: : -> 58
22: 83: 0: "SPI_spinv1.1" -> 357
24: 84: 0: VAR -> 260
25: 85: 0: long -> 354
25: 86: 0: temperature -> 356
27: 87: 0: PUB -> 262
27: 88: 0: main -> 356
28: 89: 1: {{ -> 265
28: 90: 1:
This function does the setup for the DS1620 and reads the current temperature once per
second, converts it to Fahrenheit, and displays it on the serial terminal
-> 355
28: 91: 0: }} -> 267
30: 92: 0: pst -> 356
30: 93: 0: . -> 46
30: 94: 0: Start -> 356
30: 95: 0: ( -> 40
30: 96: 0: 115200 -> 358
30: 97: 0: ) -> 41
30: 98: 2: ' -> 272
30: 99: 2: debug output -> 355
31: 100: 0:
-> 264
31: 101: 0: pst -> 356
31: 102: 0: . -> 46
31: 103: 0: Str -> 356
31: 104: 0: ( -> 40
31: 105: 0: String -> 356
31: 106: 0: ( -> 40
31: 107: 0: "Welcome to the SPI demo! This program prints the current" -> 357
31: 108: 0: ) -> 41
31: 109: 0: ) -> 41
32: 110: 0: pst -> 356
32: 111: 0: . -> 46
32: 112: 0: Newline -> 356
33: 113: 0: pst -> 356
33: 114: 0: . -> 46
33: 115: 0: Str -> 356
33: 116: 0: ( -> 40
33: 117: 0: String -> 356
33: 118: 0: ( -> 40
33: 119: 0: "temperature from a DS1620 digital thermometer chip." -> 357
33: 120: 0: ) -> 41
33: 121: 0: ) -> 41
34: 122: 0: pst -> 356
34: 123: 0: . -> 46
34: 124: 0: Newline -> 356
35: 125: 0: pst -> 356
35: 126: 0: . -> 46
35: 127: 0: Str -> 356
35: 128: 0: ( -> 40
35: 129: 0: String -> 356
35: 130: 0: ( -> 40
35: 131: 0: "Binary Hex Fahrenheit" -> 357
35: 132: 0: ) -> 41
35: 133: 0: ) -> 41
36: 134: 0: pst -> 356
36: 135: 0: . -> 46
36: 136: 0: Newline -> 356
37: 137: 0: spi -> 356
37: 138: 0: . -> 46
37: 139: 0: Start -> 356
37: 140: 0: ( -> 40
37: 141: 0: CLKr -> 356
37: 142: 0: , -> 44
37: 143: 0: spi -> 356
37: 144: 0: # -> 35
37: 145: 0: HIGH -> 356
37: 146: 0: ) -> 41
37: 147: 2: ' -> 272
37: 148: 2: initial clock high, going low on first bit -> 355
38: 149: 0:
-> 264
39: 150: 0: DIRA -> 356
39: 151: 0: [ -> 91
39: 152: 0: RST -> 356
39: 153: 0: ] -> 93
39: 154: 0: ~~ -> 348
39: 155: 2: ' -> 272
39: 156: 2: enable RST as output -> 355
40: 157: 0:
-> 264
40: 158: 0: OUTA -> 356
40: 159: 0: [ -> 91
40: 160: 0: RST -> 356
40: 161: 0: ] -> 93
40: 162: 0: ~ -> 347
40: 163: 2: ' -> 272
40: 164: 2: set RST low -> 355
41: 165: 0:
-> 264
42: 166: 0: Toggle -> 356
42: 167: 0: ( -> 40
42: 168: 0: RST -> 356
42: 169: 0: ) -> 41
42: 170: 2: ' -> 272
42: 171: 2: clear the RST line so we can talk to chip -> 355
43: 172: 0:
-> 264
44: 173: 0: spi -> 356
44: 174: 0: . -> 46
44: 175: 0: ShiftOut -> 356
44: 176: 0: ( -> 40
44: 177: 0: DPIN -> 356
44: 178: 0: , -> 44
44: 179: 0: CPIN -> 356
44: 180: 0: , -> 44
44: 181: 0: spi -> 356
44: 182: 0: # -> 35
44: 183: 0: LSB -> 356
44: 184: 0: , -> 44
44: 185: 0: 8 -> 358
44: 186: 0: , -> 44
44: 187: 0: WRITE_CONFIG -> 356
44: 188: 0: ) -> 41
44: 189: 2: ' -> 272
44: 190: 2: send the config write command -> 355
45: 191: 0:
-> 264
45: 192: 0: spi -> 356
45: 193: 0: . -> 46
45: 194: 0: ShiftOut -> 356
45: 195: 0: ( -> 40
45: 196: 0: DPIN -> 356
45: 197: 0: , -> 44
45: 198: 0: CPIN -> 356
45: 199: 0: , -> 44
45: 200: 0: spi -> 356
45: 201: 0: # -> 35
45: 202: 0: LSB -> 356
45: 203: 0: , -> 44
45: 204: 0: 8 -> 358
45: 205: 0: , -> 44
45: 206: 0: CONFIG_REG -> 356
45: 207: 0: ) -> 41
45: 208: 2: ' -> 272
45: 209: 2: send config register contents -> 355
46: 210: 0:
-> 264
47: 211: 0: Toggle -> 356
47: 212: 0: ( -> 40
47: 213: 0: RST -> 356
47: 214: 0: ) -> 41
49: 215: 0: PauseMS -> 356
49: 216: 0: ( -> 40
49: 217: 0: EEPROM_WTIME -> 356
49: 218: 0: ) -> 41
49: 219: 2: ' -> 272
49: 220: 2: wait 10ms for config to write to EEPROM -> 355
50: 221: 0:
-> 264
51: 222: 0: Toggle -> 356
51: 223: 0: ( -> 40
51: 224: 0: RST -> 356
51: 225: 0: ) -> 41
53: 226: 0: spi -> 356
53: 227: 0: . -> 46
53: 228: 0: SHIFTOUT -> 356
53: 229: 0: ( -> 40
53: 230: 0: DPIN -> 356
53: 231: 0: , -> 44
53: 232: 0: CPIN -> 356
53: 233: 0: , -> 44
53: 234: 0: spi -> 356
53: 235: 0: # -> 35
53: 236: 0: LSBFIRST -> 356
53: 237: 0: , -> 44
53: 238: 0: 8 -> 358
53: 239: 0: , -> 44
53: 240: 0: START_CONVERT -> 356
53: 241: 0: ) -> 41
53: 242: 2: ' -> 272
53: 243: 2: send beginning of conversion command -> 355
54: 244: 0:
-> 264
55: 245: 0: Toggle -> 356
55: 246: 0: ( -> 40
55: 247: 0: RST -> 356
55: 248: 0: ) -> 41
57: 249: 0: PauseMS -> 356
57: 250: 0: ( -> 40
57: 251: 0: CONVERT_TIME -> 356
57: 252: 0: ) -> 41
57: 253: 2: ' -> 272
57: 254: 2: wait for first conversion -> 355
58: 255: 0:
-> 264
59: 256: 0: repeat -> 273
60: 257: 0: Toggle -> 356
60: 258: 0: ( -> 40
60: 259: 0: RST -> 356
60: 260: 0: ) -> 41
62: 261: 0: spi -> 356
62: 262: 0: . -> 46
62: 263: 0: ShiftOut -> 356
62: 264: 0: ( -> 40
62: 265: 0: DPIN -> 356
62: 266: 0: , -> 44
62: 267: 0: CPIN -> 356
62: 268: 0: , -> 44
62: 269: 0: spi -> 356
62: 270: 0: # -> 35
62: 271: 0: LSB -> 356
62: 272: 0: , -> 44
62: 273: 0: 8 -> 358
62: 274: 0: , -> 44
62: 275: 0: READ_TEMP -> 356
62: 276: 0: ) -> 41
62: 277: 2: ' -> 272
62: 278: 2: send temp reading command -> 355
63: 279: 0:
-> 264
63: 280: 0: temperature -> 356
63: 281: 0: := -> 294
63: 282: 0: spi -> 356
63: 283: 0: . -> 46
63: 284: 0: ShiftIn -> 356
63: 285: 0: ( -> 40
63: 286: 0: DPIN -> 356
63: 287: 0: , -> 44
63: 288: 0: CPIN -> 356
63: 289: 0: , -> 44
63: 290: 0: spi -> 356
63: 291: 0: # -> 35
63: 292: 0: LSB -> 356
63: 293: 0: + -> 310
63: 294: 0: spi -> 356
63: 295: 0: # -> 35
63: 296: 0: PRE -> 356
63: 297: 0: , -> 44
63: 298: 0: 9 -> 358
63: 299: 0: ) -> 41
63: 300: 2: ' -> 272
63: 301: 2: fetch temperature value -> 355
64: 302: 0:
-> 264
65: 303: 0: Toggle -> 356
65: 304: 0: ( -> 40
65: 305: 0: RST -> 356
65: 306: 0: ) -> 41
67: 307: 0: pst -> 356
67: 308: 0: . -> 46
67: 309: 0: Bin -> 356
67: 310: 0: ( -> 40
67: 311: 0: temperature -> 356
67: 312: 0: , -> 44
67: 313: 0: 9 -> 358
67: 314: 0: ) -> 41
67: 315: 2: ' -> 272
67: 316: 2: print out raw value binary -> 355
68: 317: 0:
-> 264
68: 318: 0: pst -> 356
68: 319: 0: . -> 46
68: 320: 0: Char -> 356
68: 321: 0: ( -> 40
68: 322: 0: 32 -> 358
68: 323: 0: ) -> 41
69: 324: 0: pst -> 356
69: 325: 0: . -> 46
69: 326: 0: Hex -> 356
69: 327: 0: ( -> 40
69: 328: 0: temperature -> 356
69: 329: 0: , -> 44
69: 330: 0: 4 -> 358
69: 331: 0: ) -> 41
69: 332: 2: ' -> 272
69: 333: 2: print out raw hex value -> 355
70: 334: 0:
-> 264
70: 335: 0: pst -> 356
70: 336: 0: . -> 46
70: 337: 0: Char -> 356
70: 338: 0: ( -> 40
70: 339: 0: 32 -> 358
70: 340: 0: ) -> 41
72: 341: 0: temperature -> 356
72: 342: 0: := -> 294
72: 343: 0: temperature -> 356
72: 344: 0: << -> 334
72: 345: 0: 23 -> 358
72: 346: 0: ~> -> 336
72: 347: 0: 23 -> 358
72: 348: 2: ' -> 272
72: 349: 2: zero, sign extend, and convert to whole degrees -> 355
73: 350: 0:
-> 264
73: 351: 0: temperature -> 356
73: 352: 0: *= -> 315
73: 353: 0: 5 -> 358
75: 354: 0: temperature -> 356
75: 355: 0: := -> 294
75: 356: 0: temperature -> 356
75: 357: 0: * -> 314
75: 358: 0: 9 -> 358
75: 359: 0: / -> 318
75: 360: 0: 5 -> 358
75: 361: 0: + -> 310
75: 362: 0: 320 -> 358
75: 363: 2: ' -> 272
75: 364: 2: convert reading to fahrenheit -> 355
76: 365: 0:
-> 264
77: 366: 0: pst -> 356
77: 367: 0: . -> 46
77: 368: 0: Dec -> 356
77: 369: 0: ( -> 40
77: 370: 0: temperature -> 356
77: 371: 0: / -> 318
77: 372: 0: 10 -> 358
77: 373: 0: ) -> 41
77: 374: 2: ' -> 272
77: 375: 2: print out temp in whole degrees Fahrenheit -> 355
78: 376: 0:
-> 264
78: 377: 0: pst -> 356
78: 378: 0: . -> 46
78: 379: 0: Char -> 356
78: 380: 0: ( -> 40
78: 381: 0: "." -> 357
78: 382: 0: ) -> 41
79: 383: 0: pst -> 356
79: 384: 0: . -> 46
79: 385: 0: Dec -> 356
79: 386: 0: ( -> 40
79: 387: 0: temperature -> 356
79: 388: 0: // -> 320
79: 389: 0: 10 -> 358
79: 390: 0: ) -> 41
79: 391: 2: ' -> 272
79: 392: 2: print out tenths using modulus operator -> 355
80: 393: 0:
-> 264
80: 394: 0: pst -> 356
80: 395: 0: . -> 46
80: 396: 0: Char -> 356
80: 397: 0: ( -> 40
80: 398: 0: "°" -> 357
80: 399: 0: ) -> 41
81: 400: 0: pst -> 356
81: 401: 0: . -> 46
81: 402: 0: Newline -> 356
83: 403: 0: PauseMS -> 356
83: 404: 0: ( -> 40
83: 405: 0: 1000 -> 358
83: 406: 0: ) -> 41
83: 407: 2: ' -> 272
83: 408: 2: update once per second -> 355
84: 409: 0:
-> 264
85: 410: 0: PRI -> 263
85: 411: 0: PauseMS -> 356
85: 412: 0: ( -> 40
85: 413: 0: _time -> 356
85: 414: 0: ) -> 41
86: 415: 0: waitcnt -> 356
86: 416: 0: ( -> 40
86: 417: 0: CLKFREQ -> 356
86: 418: 0: / -> 318
86: 419: 0: 1000 -> 358
86: 420: 0: * -> 314
86: 421: 0: _time -> 356
86: 422: 0: + -> 310
86: 423: 0: cnt -> 356
86: 424: 0: ) -> 41
88: 425: 0: PRI -> 263
88: 426: 0: Toggle -> 356
88: 427: 0: ( -> 40
88: 428: 0: _pin -> 356
88: 429: 0: ) -> 41
89: 430: 0: ! -> 344
89: 431: 0: OUTA -> 356
89: 432: 0: [ -> 91
89: 433: 0: _pin -> 356
89: 434: 0: ] -> 93
91: 435: 0: DAT -> 356
92: 436: 1: {{ -> 265
92: 437: 1:
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ TERMS OF USE: MIT License │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation │
│files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, │
│modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software│
│is furnished to do so, subject to the following conditions: │
│ │
│The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.│
│ │
│THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE │
│WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR │
│COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, │
│ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
-> 355
92: 438: 0: }} -> 267
Here is the output from the parser:
Block Comment: '
$Id$
Author: Perry Harrington
Copyright: (c) 2012 Perry Harrington
=======================================================================
This is a demo program to show the usage of the SPI SPIN object.
This demo reads the current temperature from a Maxim DS1620 digital
thermometer and thermostat and displays the temperature on the Parallax
Serial Terminal. You can obtain the datasheet from Maxim
Semiconductor.
This device is an 8 pin DIP with an active high CLK input and single DQ
data line instead of MISO and MOSI I/O. The chip uses LSB first
endianness, data is read PRE clock. The maximum clock rate is 1.75Mhz.
To send a command to the chip you must set RST high, send the command, then
signal the end of command by setting RST low. When receiving data set RST
low to signal the end of reception.
The output of the chip is a 9 bit signed temperature in 1/2 degree Celsius
units. That means you must divide the result by 2 to get whole degrees.
'
Block type changed: 258: CON
Symbol: _clkmode
Symbol: xtal1
Symbol: pll16x
Add: xtal1 + pll16x
Constant assignment: _clkmode = xtal1
Symbol: _xinfreq
Value: 5000000
Constant assignment: _xinfreq = 5000000
Symbol: DPIN
Value: 0
Constant assignment: DPIN = 0
Code Comment: 'Data pin'
Symbol: CPIN
Value: 1
Constant assignment: CPIN = 1
Code Comment: 'Clock pin'
Symbol: RST
Value: 2
Constant assignment: RST = 2
Code Comment: 'Reset pin'
Symbol: CLKr
Value: 10
Constant assignment: CLKr = 10
Code Comment: '100us clock frequency'
Symbol: START_CONVERT
Value: EE
Constant assignment: START_CONVERT = EE
Code Comment: 'Start temp conversion'
Symbol: READ_TEMP
Value: AA
Constant assignment: READ_TEMP = AA
Code Comment: 'Read temp from DS1620'
Symbol: WRITE_CONFIG
Value: 0C
Constant assignment: WRITE_CONFIG = 0C
Code Comment: 'Write config register'
Symbol: CONFIG_REG
Value: 00001010
Constant assignment: CONFIG_REG = 00001010
Code Comment: 'enable continuous conversion and CPU 3wire'
Symbol: CONVERT_TIME
Value: 750
Constant assignment: CONVERT_TIME = 750
Code Comment: 'Takes 750ms to initialize temp conversion'
Symbol: EEPROM_WTIME
Value: 10
Constant assignment: EEPROM_WTIME = 10
Code Comment: 'Takes 10ms to write to EEPROM config'
Block type changed: 259: OBJ
Load object: pst -> Parallax Serial Terminal
Load object: spi -> SPI_spinv1.1
Block type changed: 260: VAR
Variable declaration: long -> temperature
Block type changed: 262: PUB
Function: main
Block Comment: '
This function does the setup for the DS1620 and reads the current temperature once per
second, converts it to Fahrenheit, and displays it on the serial terminal
'
Symbol: pst
Symbol: Start
Value: 115200
Parameter expression: Start -> 115200
Object call: pst -> Start
Code Comment: 'debug output'
Symbol: pst
Symbol: Str
Symbol: String
Literal String: Welcome to the SPI demo! This program prints the current
Parameter expression: String -> Welcome to the SPI demo! This program prints the current
Parameter expression: Str -> String
Object call: pst -> Str
Symbol: pst
Symbol: Newline
Object call: pst -> Newline
Symbol: pst
Symbol: Str
Symbol: String
Literal String: temperature from a DS1620 digital thermometer chip.
Parameter expression: String -> temperature from a DS1620 digital thermometer chip.
Parameter expression: Str -> String
Object call: pst -> Str
Symbol: pst
Symbol: Newline
Object call: pst -> Newline
Symbol: pst
Symbol: Str
Symbol: String
Literal String: Binary Hex Fahrenheit
Parameter expression: String -> Binary Hex Fahrenheit
Parameter expression: Str -> String
Object call: pst -> Str
Symbol: pst
Symbol: Newline
Object call: pst -> Newline
Symbol: spi
Symbol: Start
Symbol: CLKr
Symbol: spi
Symbol: HIGH
Object constant reference: spi -> HIGH
Expression list: CLKr , spi#HIGH
Parameter expression: Start -> CLKr
Object call: spi -> Start
Code Comment: 'initial clock high, going low on first bit'
Symbol: DIRA
Symbol: RST
Array expression: DIRA -> RST
Post Set Assign: DIRA ~~
Code Comment: 'enable RST as output'
Symbol: OUTA
Symbol: RST
Array expression: OUTA -> RST
Post Clear: OUTA ~
Code Comment: 'set RST low'
Symbol: RST
Code Comment: 'clear the RST line so we can talk to chip'
Symbol: spi
Symbol: ShiftOut
Symbol: DPIN
Symbol: CPIN
Symbol: spi
Symbol: LSB
Value: 8
Symbol: WRITE_CONFIG
Expression list: 8 , WRITE_CONFIG
Expression list: LSB , 8
Object constant reference: spi -> LSB
Expression list: CPIN , spi#LSB
Expression list: DPIN , CPIN
Parameter expression: ShiftOut -> DPIN
Object call: spi -> ShiftOut
Code Comment: 'send the config write command'
Symbol: spi
Symbol: ShiftOut
Symbol: DPIN
Symbol: CPIN
Symbol: spi
Symbol: LSB
Value: 8
Symbol: CONFIG_REG
Expression list: 8 , CONFIG_REG
Expression list: LSB , 8
Object constant reference: spi -> LSB
Expression list: CPIN , spi#LSB
Expression list: DPIN , CPIN
Parameter expression: ShiftOut -> DPIN
Object call: spi -> ShiftOut
Code Comment: 'send config register contents'
Symbol: RST
Symbol: EEPROM_WTIME
Code Comment: 'wait 10ms for config to write to EEPROM'
Symbol: RST
Symbol: spi
Symbol: SHIFTOUT
Symbol: DPIN
Symbol: CPIN
Symbol: spi
Symbol: LSBFIRST
Value: 8
Symbol: START_CONVERT
Expression list: 8 , START_CONVERT
Expression list: LSBFIRST , 8
Object constant reference: spi -> LSBFIRST
Expression list: CPIN , spi#LSBFIRST
Expression list: DPIN , CPIN
Parameter expression: SHIFTOUT -> DPIN
Object call: spi -> SHIFTOUT
Code Comment: 'send beginning of conversion command'
Symbol: RST
Symbol: CONVERT_TIME
Code Comment: 'wait for first conversion'
Symbol: Toggle
Symbol: RST
Parameter expression: Toggle -> RST
Repeat expression: Toggle
Symbol: spi
Symbol: ShiftOut
Symbol: DPIN
Symbol: CPIN
Symbol: spi
Symbol: LSB
Value: 8
Symbol: READ_TEMP
Expression list: 8 , READ_TEMP
Expression list: LSB , 8
Object constant reference: spi -> LSB
Expression list: CPIN , spi#LSB
Expression list: DPIN , CPIN
Parameter expression: ShiftOut -> DPIN
Object call: spi -> ShiftOut
Code Comment: 'send temp reading command'
Symbol: temperature
Symbol: spi
Symbol: ShiftIn
Symbol: DPIN
Symbol: CPIN
Symbol: spi
Symbol: LSB
Symbol: spi
Symbol: PRE
Value: 9
Expression list: PRE , 9
Object constant reference: spi -> PRE
Add: LSB + spi#PRE
Object constant reference: spi -> LSB
Expression list: CPIN , spi#LSB
Expression list: DPIN , CPIN
Parameter expression: ShiftIn -> DPIN
Object call: spi -> ShiftIn
Variable assignment: temperature = spi
Code Comment: 'fetch temperature value'
Symbol: RST
Symbol: pst
Symbol: Bin
Symbol: temperature
Value: 9
Expression list: temperature , 9
Parameter expression: Bin -> temperature
Object call: pst -> Bin
Code Comment: 'print out raw value binary'
Symbol: pst
Symbol: Char
Value: 32
Parameter expression: Char -> 32
Object call: pst -> Char
Symbol: pst
Symbol: Hex
Symbol: temperature
Value: 4
Expression list: temperature , 4
Parameter expression: Hex -> temperature
Object call: pst -> Hex
Code Comment: 'print out raw hex value'
Symbol: pst
Symbol: Char
Value: 32
Parameter expression: Char -> 32
Object call: pst -> Char
Symbol: temperature
Symbol: temperature
Value: 23
Shift Left: temperature << 23
Value: 23
Shift Arithmetic Right: temperature ~> 23
Variable assignment: temperature = temperature
Code Comment: 'zero, sign extend, and convert to whole degrees'
Symbol: temperature
Value: 5
Multiply low=: temperature *= 5
Symbol: temperature
Symbol: temperature
Value: 9
Value: 5
Divide: 9 / 5
Multiply low: temperature * 9
Value: 320
Add: temperature + 320
Variable assignment: temperature = temperature
Code Comment: 'convert reading to fahrenheit'
Symbol: pst
Symbol: Dec
Symbol: temperature
Value: 10
Divide: temperature / 10
Parameter expression: Dec -> temperature
Object call: pst -> Dec
Code Comment: 'print out temp in whole degrees Fahrenheit'
Symbol: pst
Symbol: Char
Literal String: .
Parameter expression: Char -> .
Object call: pst -> Char
Symbol: pst
Symbol: Dec
Symbol: temperature
Value: 10
Modulus: temperature // 10
Parameter expression: Dec -> temperature
Object call: pst -> Dec
Code Comment: 'print out tenths using modulus operator'
Symbol: pst
Symbol: Char
Literal String: °
Parameter expression: Char -> °
Object call: pst -> Char
Symbol: pst
Symbol: Newline
Object call: pst -> Newline
Value: 1000
Code Comment: 'update once per second'
Block type changed: 263: PRI
Symbolic parameter: _time
Function: PauseMS
Symbol: CLKFREQ
Value: 1000
Divide: CLKFREQ / 1000
Symbol: _time
Multiply low: CLKFREQ * _time
Symbol: cnt
Add: CLKFREQ + cnt
Block type changed: 263: PRI
Symbolic parameter: _pin
Function: Toggle
Symbol: OUTA
Symbol: _pin
Array expression: OUTA -> _pin
Bitwize NOT: ! OUTA
Block Comment: '
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ TERMS OF USE: MIT License │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation │
│files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, │
│modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software│
│is furnished to do so, subject to the following conditions: │
│ │
│The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.│
│ │
│THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE │
│WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR │
│COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, │
│ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
'

Comments
Languages are traditionally specified in Backus–Naur Form (BNF), a notation that closely resembles yacc rules.
I looked over the grammar and noticed a couple of errors/incomplete bits. I used the Propeller Manual to define many of the keywords and grammar, so it should make sense to most.
Given what I've provided above, one could take that code and make a syntax highlighter, but syntax highlighting is typically much simpler.
E.g.:
keyword_list: keyword_list keywords { color = blue; }
            | keywords              { color = blue; }
            ;
keywords: IF | REPEAT | WHILE | UNTIL | CASE ;
literal_list: literals { color = red; } ;
literals: NUMBER | STRING ;
operator_list: operators { color = green; } ;
operators: '+' | '-' | '*' | '/' ;
Ad nauseam...
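For comparison, a highlighter can often skip the grammar entirely and color each token by lookup alone. Here is a minimal C sketch of that idea, assuming tokens have already been split out; the color names, the keyword subset, and "color_for" are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum color { COLOR_DEFAULT, COLOR_BLUE, COLOR_RED, COLOR_GREEN };

/* Case-insensitive string equality (SPIN keywords are not case-sensitive). */
static int eq_nocase(const char *a, const char *b)
{
    while (*a != '\0' && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) == tolower((unsigned char)*b);
}

/* Colors one token by lookup alone, with no grammar: keywords blue,
   decimal numbers and quoted strings red, operators green. */
enum color color_for(const char *tok)
{
    static const char *const keywords[] = { "if", "repeat", "while", "until", "case" };
    static const char *const operators[] = { "+", "-", "*", "/" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (eq_nocase(tok, keywords[i]))
            return COLOR_BLUE;
    if (isdigit((unsigned char)tok[0]) || tok[0] == '"')
        return COLOR_RED;
    for (size_t i = 0; i < sizeof operators / sizeof operators[0]; i++)
        if (strcmp(tok, operators[i]) == 0)
            return COLOR_GREEN;
    return COLOR_DEFAULT;
}
```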
Eric