[PoC] SPIN scanner parser written in lex/yacc
pedward
Posts: 1,642
Hello everyone. I can't tell you how much it bugs me (in an OCD way, not *really* bugs me) that the SPIN compiler is a monolithic compiler that doesn't use lex or yacc for the heavy lifting.
I wanted to write a tool to extract comments and context from SPIN files, to make a SPINdoc tool.
Well, I got a little carried away and wrote most of the frontend for a SPIN compiler.
I've spent probably 20 hours on it this weekend, getting reacquainted with lex and yacc (it's been 14 years since I wrote a compiler), but I now have a grammar that parses a couple of valid SPIN files, without generating a syntax error.
I attached my work in progress, for other tinkerers to look at.
I learned some things about what you can achieve in lex and what you can achieve in yacc.
In general, it's best to put all of your lexical tokens into the lex scanner, because you can control how greedy the matches are and generate unique tokens for matching rules that might otherwise be too general. It also turns out that many of the operators in SPIN share a prefix (~ and ~~, / and //), so I defined them in longest->shortest order to keep the lexical precedence obvious. Strictly speaking, flex picks the longest match by itself and only uses rule order to break ties between matches of equal length.
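To make the longest-match idea concrete, here is a minimal C sketch. It is not code from tokens.l; the operator table and the name match_operator are made up for illustration. It keeps the longest operator that prefixes the input, which is the same maximal-munch choice flex makes, so ~~ is never split into two ~ tokens.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical operator table, NOT the one in tokens.l. The order here is
   deliberately mixed: match_operator() always keeps the longest candidate. */
static const char *const OPERATORS[] = {
    "~", "/", ":=", "~~", "//", "<<", "~>", "*", "*="
};

/* Return the length of the longest operator that prefixes `s`, or 0 if none. */
size_t match_operator(const char *s)
{
    size_t best = 0;
    for (size_t i = 0; i < sizeof OPERATORS / sizeof OPERATORS[0]; i++) {
        size_t n = strlen(OPERATORS[i]);
        if (n > best && strncmp(s, OPERATORS[i], n) == 0)
            best = n;
    }
    return best;
}
```

With this table, match_operator("~~") returns 2 and match_operator("~ x") returns 1, regardless of where the two entries sit in the array.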
The parser uses all the tricks in yacc to enforce operator precedence and try to reasonably modularize the grammar. Unfortunately it's necessary to have a very long expression rule because there are so many operators and you need hooks to know how to interpret them.
lex and yacc are well suited to writing compilers, and because SPIN compiles to code for a stack based machine, it would be easier to translate the grammar matches to code generation: yacc matches the operands of an expression before it matches the operator, which is the order in which you'd emit code for a stack based machine.
This code, which I nicknamed splint, is just an exercise, but it provides plenty of capability to do what I originally wanted: extract comments and the context of the comments.
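As a rough illustration of that ordering, the sketch below emits operands and operators in the order a stack machine would consume them. It is a hand-written precedence-climbing parser in C, not yacc and not anything from parse.y, and it only knows + - * / on plain names and numbers. All function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Append one item to the space-separated postfix string in `out`. */
static void emit(char *out, size_t cap, const char *item, size_t len)
{
    size_t used = strlen(out);
    if (used + len + 2 > cap)
        return;                         /* sketch only: drop on overflow */
    if (used)
        out[used++] = ' ';
    memcpy(out + used, item, len);
    out[used + len] = '\0';
}

static int prec(char op)
{
    if (op == '+' || op == '-') return 1;
    if (op == '*' || op == '/') return 2;
    return 0;                           /* not a binary operator */
}

/* Parse "operand { op operand }" and emit each piece once it is complete.
   Returns a pointer to the first character that was not consumed. */
static const char *parse_expr(const char *p, int min_prec, char *out, size_t cap)
{
    while (*p == ' ') p++;
    const char *start = p;
    while (isalnum((unsigned char)*p) || *p == '_') p++;
    emit(out, cap, start, (size_t)(p - start));       /* operand: a push */

    for (;;) {
        while (*p == ' ') p++;
        char op = *p;
        int pr = prec(op);
        if (pr == 0 || pr < min_prec)
            return p;
        /* Right operand must bind tighter, which makes operators left-associative. */
        p = parse_expr(p + 1, pr + 1, out, cap);
        emit(out, cap, &op, 1);                       /* operator comes last */
    }
}

/* Convert an infix expression to a postfix string. */
void to_postfix(const char *src, char *out, size_t cap)
{
    out[0] = '\0';
    parse_expr(src, 1, out, cap);
}
```

Feeding it "temperature * 9 / 5 + 320" fills the buffer with "temperature 9 * 5 / 320 +", which reads straight off as push, push, multiply, push, divide, push, add.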
I haven't tackled PASM at all, only the SPIN grammar. I would probably add another state to the scanner, for PASM only tokens, then add a rule tree in the parser that handles PASM.
I haven't enforced tab indentation, but I added hooks to the scanner to keep track of the "tab stack" so you could use the tab stack to keep track of blocks. I would add another variable that stores the difference between the last line and the current line, which would give you a positive number for a block beginning and a negative number for a block ending.
SPIN is a real bear to parse because it's not a "proper" context oriented language. One annoying misfeature is that function declarations and calls without arguments take no parentheses at all (not even empty ones). The whitespace block handling is harder still: you can't use the parser rules to enforce it, so you need to keep track of tabs outside of the parser and do your own checking and error detection.
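A minimal sketch of that tab stack idea in C follows. The struct and function names are invented for illustration and do not come from the attached scanner. One liberty taken: instead of the raw column difference, the update function returns the change in nesting depth, so a dedent that closes two blocks reports -2.

```c
#include <stddef.h>

#define MAX_DEPTH 64

/* Hypothetical tab stack: the indentation column of every open block. */
typedef struct {
    int cols[MAX_DEPTH];
    size_t depth;
} TabStack;

/* Start with a single level at column 0 (where CON, PUB, PRI, ... live). */
void tabstack_init(TabStack *ts)
{
    ts->cols[0] = 0;
    ts->depth = 1;
}

/* Feed the indentation column of the next non-blank line.
   Returns the change in nesting depth: positive when a block begins,
   negative when one or more blocks end, 0 when nothing changes. */
int tabstack_update(TabStack *ts, int col)
{
    size_t before = ts->depth;

    /* Close every block that is indented deeper than this line. */
    while (ts->depth > 1 && ts->cols[ts->depth - 1] > col)
        ts->depth--;

    /* Open a new block if this line is deeper than the enclosing one. */
    if (col > ts->cols[ts->depth - 1] && ts->depth < MAX_DEPTH)
        ts->cols[ts->depth++] = col;

    return (int)ts->depth - (int)before;
}
```

Feeding the columns 0, 2, 4, 4, 0 returns 0, 1, 1, 0, -2. A real checker would also want to flag a dedent that lands on a column that was never pushed, which this sketch quietly treats as a new level.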
My parser is VERY verbose, mainly to have an action for every rule, so I can ensure the parser is properly grabbing and assembling the tokens. A real compiler would have actual code to implement code generation and other bits, in those actions.
Anyway, it successfully parsed the SPI_spin and demo files I submitted to Parallax for Gold Standard inclusion, which include A LOT of comments and a variety of tricky operators.
Compiler abuse
Another interesting thing to note: if you know how the compiler parses the file, you can exploit that to write code that TOTALLY does not conform to the standards, but still parses and compiles correctly. This is because parsers generally ignore whitespace (more accurately, the scanner throws it away). Because the whitespace is no longer needed, you could actually write code that just runs together likeagooglesearchwhenyouaretypinginthesearchbarofyourphone (like a Google search when you are typing in the search bar of your phone).
Ever search Google and forget to put spaces between your words, but it returns a suggested search term that magically separates the words with whitespace? That's because their parser matches based on tokens. It's not trying to figure out what you wrote; it matches a database of known tokens against your search string, the opposite of what you would expect.
Build
Oh, build instructions:
flex tokens.l
vi lex.yy.c (and remove "static" from the declaration of "yy_start" on line 279 so the variable can be used in the parser to debug the current scanner state.)
bison parse.y
gcc -o parser -D STANDALONE_PARSER parse.tab.c lex.yy.c
gcc -o scanner -D STANDALONE lex.yy.c
I included the generated source files, so you don't need to edit, just compile.
If you run scanner < file and pipe it to less, it will give you a debug output of what tokens were matched, their value, the string, and the parser state.
The parser takes the file on stdin too, and it will give you a complete lexical breakdown of the file in a most verbose way.
Something else you should know: this parser/scanner isn't built for wide character/UTF-16 support, so I just ran iconv -f UTF-16 -t UTF-8 < file.spin > out.spin on the included files before I ran them through the scanner/parser. If you try to run it on a regular spin file that is still UTF-16, it'll break. The proper solution would be to bolt a UTF-16 handler onto the scanner input.
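For anyone who wants to try the UTF-16 handler route, here is a rough C sketch of the conversion step. It assumes the input is little-endian UTF-16 with an optional byte order mark, which is an assumption about the files and not something the scanner checks. The function names are made up, and wiring the result into flex's input is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Write one code point as UTF-8. Returns bytes written, or 0 if no room. */
static size_t put_utf8(uint32_t cp, unsigned char *out, size_t room)
{
    if (cp < 0x80) {
        if (room < 1) return 0;
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        if (room < 2) return 0;
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (room < 3) return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (room < 4) return 0;
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Convert little-endian UTF-16 bytes to UTF-8, skipping a leading BOM.
   Returns the number of bytes written to `out` (stops early if it fills up). */
size_t utf16le_to_utf8(const unsigned char *in, size_t in_len,
                       unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    if (in_len >= 2 && in[0] == 0xFF && in[1] == 0xFE)
        i = 2;                                    /* skip the BOM */

    while (i + 1 < in_len) {
        uint32_t cp = in[i] | ((uint32_t)in[i + 1] << 8);
        i += 2;

        /* Combine a high/low surrogate pair into one code point. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in_len) {
            uint32_t lo = in[i] | ((uint32_t)in[i + 1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;                          /* unpaired surrogate */

        size_t n = put_utf8(cp, out + o, out_cap - o);
        if (n == 0)
            break;                                /* output buffer full */
        o += n;
    }
    return o;
}
```

The degree sign used in the demo file is a handy test case: the UTF-16LE bytes B0 00 should come out as the two UTF-8 bytes C2 B0.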
Expected warnings
These are the expected warnings when generating the parser:
parse.y:261.7: warning: empty rule for typed nonterminal, and no action
parse.y:310.11: warning: empty rule for typed nonterminal, and no action
parse.y:314.13: warning: empty rule for typed nonterminal, and no action
parse.y:363.5: warning: empty rule for typed nonterminal, and no action
parse.y: conflicts: 3128 shift/reduce, 234 reduce/reduce
Output
Here's sample tokenizer output from the scanner:
1: 1: 1: {{ -> 265 1: 2: 1: $Id$ Author: Perry Harrington Copyright: (c) 2012 Perry Harrington ======================================================================= This is a demo program to show the usage of the SPI SPIN object. This demo reads the current temperature from a Maxim DS1620 digital thermometer and thermostat and displays the temperature on the Parallax Serial Terminal. You can obtain the datasheet from Maxim Semiconductor. This device is an 8 pin DIP with an active high CLK input and single DQ data line instead of MISO and MOSI I/O. The chip uses LSB first endianness, data is read PRE clock. The maximum clock rate is 1.75Mhz. To send a command to the chip you must set RST high, send the command, then signal the end of command by setting RST low. When receiving data set RST low to signal the end of reception. The output of the chip is a 9 bit signed temperature in 1/2 degree Celsius units. That means you must divide the result by 2 to get whole degrees. -> 355 1: 3: 0: }} -> 267 2: 4: 0: CON -> 258 4: 5: 0: _clkmode -> 356 4: 6: 0: = -> 299 4: 7: 0: xtal1 -> 356 4: 8: 0: + -> 310 4: 9: 0: pll16x -> 356 5: 10: 0: _xinfreq -> 356 5: 11: 0: = -> 299 5: 12: 0: 5_000_000 -> 358 7: 13: 0: DPIN -> 356 7: 14: 0: = -> 299 7: 15: 0: 0 -> 358 7: 16: 2: ' -> 272 7: 17: 2: Data pin -> 355 8: 18: 0: -> 264 8: 19: 0: CPIN -> 356 8: 20: 0: = -> 299 8: 21: 0: 1 -> 358 8: 22: 2: ' -> 272 8: 23: 2: Clock pin -> 355 9: 24: 0: -> 264 9: 25: 0: RST -> 356 9: 26: 0: = -> 299 9: 27: 0: 2 -> 358 9: 28: 2: ' -> 272 9: 29: 2: Reset pin -> 355 10: 30: 0: -> 264 10: 31: 0: CLKr -> 356 10: 32: 0: = -> 299 10: 33: 0: 10 -> 358 10: 34: 2: ' -> 272 10: 35: 2: 100us clock frequency -> 355 11: 36: 0: -> 264 12: 37: 0: START_CONVERT -> 356 12: 38: 0: = -> 299 12: 39: 3: $ -> 36 12: 40: 0: EE -> 360 12: 41: 2: ' -> 272 12: 42: 2: Start temp conversion -> 355 13: 43: 0: -> 264 13: 44: 0: READ_TEMP -> 356 13: 45: 0: = -> 299 13: 46: 3: $ -> 36 13: 47: 0: AA -> 360 13: 48: 2: ' -> 272 13: 
49: 2: Read temp from DS1620 -> 355 14: 50: 0: -> 264 14: 51: 0: WRITE_CONFIG -> 356 14: 52: 0: = -> 299 14: 53: 3: $ -> 36 14: 54: 0: 0C -> 360 14: 55: 2: ' -> 272 14: 56: 2: Write config register -> 355 15: 57: 0: -> 264 15: 58: 0: CONFIG_REG -> 356 15: 59: 0: = -> 299 15: 60: 0: % -> 37 15: 61: 0: 0000_1010 -> 358 15: 62: 2: ' -> 272 15: 63: 2: enable continuous conversion and CPU 3wire -> 355 16: 64: 0: -> 264 16: 65: 0: CONVERT_TIME -> 356 16: 66: 0: = -> 299 16: 67: 0: 750 -> 358 16: 68: 2: ' -> 272 16: 69: 2: Takes 750ms to initialize temp conversion -> 355 17: 70: 0: -> 264 17: 71: 0: EEPROM_WTIME -> 356 17: 72: 0: = -> 299 17: 73: 0: 10 -> 358 17: 74: 2: ' -> 272 17: 75: 2: Takes 10ms to write to EEPROM config -> 355 18: 76: 0: -> 264 19: 77: 0: OBJ -> 259 21: 78: 0: pst -> 356 21: 79: 0: : -> 58 21: 80: 0: "Parallax Serial Terminal" -> 357 22: 81: 0: spi -> 356 22: 82: 0: : -> 58 22: 83: 0: "SPI_spinv1.1" -> 357 24: 84: 0: VAR -> 260 25: 85: 0: long -> 354 25: 86: 0: temperature -> 356 27: 87: 0: PUB -> 262 27: 88: 0: main -> 356 28: 89: 1: {{ -> 265 28: 90: 1: This function does the setup for the DS1620 and reads the current temperature once per second, converts it to Fahrenheit, and displays it on the serial terminal -> 355 28: 91: 0: }} -> 267 30: 92: 0: pst -> 356 30: 93: 0: . -> 46 30: 94: 0: Start -> 356 30: 95: 0: ( -> 40 30: 96: 0: 115200 -> 358 30: 97: 0: ) -> 41 30: 98: 2: ' -> 272 30: 99: 2: debug output -> 355 31: 100: 0: -> 264 31: 101: 0: pst -> 356 31: 102: 0: . -> 46 31: 103: 0: Str -> 356 31: 104: 0: ( -> 40 31: 105: 0: String -> 356 31: 106: 0: ( -> 40 31: 107: 0: "Welcome to the SPI demo! This program prints the current" -> 357 31: 108: 0: ) -> 41 31: 109: 0: ) -> 41 32: 110: 0: pst -> 356 32: 111: 0: . -> 46 32: 112: 0: Newline -> 356 33: 113: 0: pst -> 356 33: 114: 0: . -> 46 33: 115: 0: Str -> 356 33: 116: 0: ( -> 40 33: 117: 0: String -> 356 33: 118: 0: ( -> 40 33: 119: 0: "temperature from a DS1620 digital thermometer chip." 
-> 357 33: 120: 0: ) -> 41 33: 121: 0: ) -> 41 34: 122: 0: pst -> 356 34: 123: 0: . -> 46 34: 124: 0: Newline -> 356 35: 125: 0: pst -> 356 35: 126: 0: . -> 46 35: 127: 0: Str -> 356 35: 128: 0: ( -> 40 35: 129: 0: String -> 356 35: 130: 0: ( -> 40 35: 131: 0: "Binary Hex Fahrenheit" -> 357 35: 132: 0: ) -> 41 35: 133: 0: ) -> 41 36: 134: 0: pst -> 356 36: 135: 0: . -> 46 36: 136: 0: Newline -> 356 37: 137: 0: spi -> 356 37: 138: 0: . -> 46 37: 139: 0: Start -> 356 37: 140: 0: ( -> 40 37: 141: 0: CLKr -> 356 37: 142: 0: , -> 44 37: 143: 0: spi -> 356 37: 144: 0: # -> 35 37: 145: 0: HIGH -> 356 37: 146: 0: ) -> 41 37: 147: 2: ' -> 272 37: 148: 2: initial clock high, going low on first bit -> 355 38: 149: 0: -> 264 39: 150: 0: DIRA -> 356 39: 151: 0: [ -> 91 39: 152: 0: RST -> 356 39: 153: 0: ] -> 93 39: 154: 0: ~~ -> 348 39: 155: 2: ' -> 272 39: 156: 2: enable RST as output -> 355 40: 157: 0: -> 264 40: 158: 0: OUTA -> 356 40: 159: 0: [ -> 91 40: 160: 0: RST -> 356 40: 161: 0: ] -> 93 40: 162: 0: ~ -> 347 40: 163: 2: ' -> 272 40: 164: 2: set RST low -> 355 41: 165: 0: -> 264 42: 166: 0: Toggle -> 356 42: 167: 0: ( -> 40 42: 168: 0: RST -> 356 42: 169: 0: ) -> 41 42: 170: 2: ' -> 272 42: 171: 2: clear the RST line so we can talk to chip -> 355 43: 172: 0: -> 264 44: 173: 0: spi -> 356 44: 174: 0: . -> 46 44: 175: 0: ShiftOut -> 356 44: 176: 0: ( -> 40 44: 177: 0: DPIN -> 356 44: 178: 0: , -> 44 44: 179: 0: CPIN -> 356 44: 180: 0: , -> 44 44: 181: 0: spi -> 356 44: 182: 0: # -> 35 44: 183: 0: LSB -> 356 44: 184: 0: , -> 44 44: 185: 0: 8 -> 358 44: 186: 0: , -> 44 44: 187: 0: WRITE_CONFIG -> 356 44: 188: 0: ) -> 41 44: 189: 2: ' -> 272 44: 190: 2: send the config write command -> 355 45: 191: 0: -> 264 45: 192: 0: spi -> 356 45: 193: 0: . 
-> 46 45: 194: 0: ShiftOut -> 356 45: 195: 0: ( -> 40 45: 196: 0: DPIN -> 356 45: 197: 0: , -> 44 45: 198: 0: CPIN -> 356 45: 199: 0: , -> 44 45: 200: 0: spi -> 356 45: 201: 0: # -> 35 45: 202: 0: LSB -> 356 45: 203: 0: , -> 44 45: 204: 0: 8 -> 358 45: 205: 0: , -> 44 45: 206: 0: CONFIG_REG -> 356 45: 207: 0: ) -> 41 45: 208: 2: ' -> 272 45: 209: 2: send config register contents -> 355 46: 210: 0: -> 264 47: 211: 0: Toggle -> 356 47: 212: 0: ( -> 40 47: 213: 0: RST -> 356 47: 214: 0: ) -> 41 49: 215: 0: PauseMS -> 356 49: 216: 0: ( -> 40 49: 217: 0: EEPROM_WTIME -> 356 49: 218: 0: ) -> 41 49: 219: 2: ' -> 272 49: 220: 2: wait 10ms for config to write to EEPROM -> 355 50: 221: 0: -> 264 51: 222: 0: Toggle -> 356 51: 223: 0: ( -> 40 51: 224: 0: RST -> 356 51: 225: 0: ) -> 41 53: 226: 0: spi -> 356 53: 227: 0: . -> 46 53: 228: 0: SHIFTOUT -> 356 53: 229: 0: ( -> 40 53: 230: 0: DPIN -> 356 53: 231: 0: , -> 44 53: 232: 0: CPIN -> 356 53: 233: 0: , -> 44 53: 234: 0: spi -> 356 53: 235: 0: # -> 35 53: 236: 0: LSBFIRST -> 356 53: 237: 0: , -> 44 53: 238: 0: 8 -> 358 53: 239: 0: , -> 44 53: 240: 0: START_CONVERT -> 356 53: 241: 0: ) -> 41 53: 242: 2: ' -> 272 53: 243: 2: send beginning of conversion command -> 355 54: 244: 0: -> 264 55: 245: 0: Toggle -> 356 55: 246: 0: ( -> 40 55: 247: 0: RST -> 356 55: 248: 0: ) -> 41 57: 249: 0: PauseMS -> 356 57: 250: 0: ( -> 40 57: 251: 0: CONVERT_TIME -> 356 57: 252: 0: ) -> 41 57: 253: 2: ' -> 272 57: 254: 2: wait for first conversion -> 355 58: 255: 0: -> 264 59: 256: 0: repeat -> 273 60: 257: 0: Toggle -> 356 60: 258: 0: ( -> 40 60: 259: 0: RST -> 356 60: 260: 0: ) -> 41 62: 261: 0: spi -> 356 62: 262: 0: . 
-> 46 62: 263: 0: ShiftOut -> 356 62: 264: 0: ( -> 40 62: 265: 0: DPIN -> 356 62: 266: 0: , -> 44 62: 267: 0: CPIN -> 356 62: 268: 0: , -> 44 62: 269: 0: spi -> 356 62: 270: 0: # -> 35 62: 271: 0: LSB -> 356 62: 272: 0: , -> 44 62: 273: 0: 8 -> 358 62: 274: 0: , -> 44 62: 275: 0: READ_TEMP -> 356 62: 276: 0: ) -> 41 62: 277: 2: ' -> 272 62: 278: 2: send temp reading command -> 355 63: 279: 0: -> 264 63: 280: 0: temperature -> 356 63: 281: 0: := -> 294 63: 282: 0: spi -> 356 63: 283: 0: . -> 46 63: 284: 0: ShiftIn -> 356 63: 285: 0: ( -> 40 63: 286: 0: DPIN -> 356 63: 287: 0: , -> 44 63: 288: 0: CPIN -> 356 63: 289: 0: , -> 44 63: 290: 0: spi -> 356 63: 291: 0: # -> 35 63: 292: 0: LSB -> 356 63: 293: 0: + -> 310 63: 294: 0: spi -> 356 63: 295: 0: # -> 35 63: 296: 0: PRE -> 356 63: 297: 0: , -> 44 63: 298: 0: 9 -> 358 63: 299: 0: ) -> 41 63: 300: 2: ' -> 272 63: 301: 2: fetch temperature value -> 355 64: 302: 0: -> 264 65: 303: 0: Toggle -> 356 65: 304: 0: ( -> 40 65: 305: 0: RST -> 356 65: 306: 0: ) -> 41 67: 307: 0: pst -> 356 67: 308: 0: . -> 46 67: 309: 0: Bin -> 356 67: 310: 0: ( -> 40 67: 311: 0: temperature -> 356 67: 312: 0: , -> 44 67: 313: 0: 9 -> 358 67: 314: 0: ) -> 41 67: 315: 2: ' -> 272 67: 316: 2: print out raw value binary -> 355 68: 317: 0: -> 264 68: 318: 0: pst -> 356 68: 319: 0: . -> 46 68: 320: 0: Char -> 356 68: 321: 0: ( -> 40 68: 322: 0: 32 -> 358 68: 323: 0: ) -> 41 69: 324: 0: pst -> 356 69: 325: 0: . -> 46 69: 326: 0: Hex -> 356 69: 327: 0: ( -> 40 69: 328: 0: temperature -> 356 69: 329: 0: , -> 44 69: 330: 0: 4 -> 358 69: 331: 0: ) -> 41 69: 332: 2: ' -> 272 69: 333: 2: print out raw hex value -> 355 70: 334: 0: -> 264 70: 335: 0: pst -> 356 70: 336: 0: . 
-> 46 70: 337: 0: Char -> 356 70: 338: 0: ( -> 40 70: 339: 0: 32 -> 358 70: 340: 0: ) -> 41 72: 341: 0: temperature -> 356 72: 342: 0: := -> 294 72: 343: 0: temperature -> 356 72: 344: 0: << -> 334 72: 345: 0: 23 -> 358 72: 346: 0: ~> -> 336 72: 347: 0: 23 -> 358 72: 348: 2: ' -> 272 72: 349: 2: zero, sign extend, and convert to whole degrees -> 355 73: 350: 0: -> 264 73: 351: 0: temperature -> 356 73: 352: 0: *= -> 315 73: 353: 0: 5 -> 358 75: 354: 0: temperature -> 356 75: 355: 0: := -> 294 75: 356: 0: temperature -> 356 75: 357: 0: * -> 314 75: 358: 0: 9 -> 358 75: 359: 0: / -> 318 75: 360: 0: 5 -> 358 75: 361: 0: + -> 310 75: 362: 0: 320 -> 358 75: 363: 2: ' -> 272 75: 364: 2: convert reading to fahrenheit -> 355 76: 365: 0: -> 264 77: 366: 0: pst -> 356 77: 367: 0: . -> 46 77: 368: 0: Dec -> 356 77: 369: 0: ( -> 40 77: 370: 0: temperature -> 356 77: 371: 0: / -> 318 77: 372: 0: 10 -> 358 77: 373: 0: ) -> 41 77: 374: 2: ' -> 272 77: 375: 2: print out temp in whole degrees Fahrenheit -> 355 78: 376: 0: -> 264 78: 377: 0: pst -> 356 78: 378: 0: . -> 46 78: 379: 0: Char -> 356 78: 380: 0: ( -> 40 78: 381: 0: "." -> 357 78: 382: 0: ) -> 41 79: 383: 0: pst -> 356 79: 384: 0: . -> 46 79: 385: 0: Dec -> 356 79: 386: 0: ( -> 40 79: 387: 0: temperature -> 356 79: 388: 0: // -> 320 79: 389: 0: 10 -> 358 79: 390: 0: ) -> 41 79: 391: 2: ' -> 272 79: 392: 2: print out tenths using modulus operator -> 355 80: 393: 0: -> 264 80: 394: 0: pst -> 356 80: 395: 0: . -> 46 80: 396: 0: Char -> 356 80: 397: 0: ( -> 40 80: 398: 0: "°" -> 357 80: 399: 0: ) -> 41 81: 400: 0: pst -> 356 81: 401: 0: . 
-> 46 81: 402: 0: Newline -> 356 83: 403: 0: PauseMS -> 356 83: 404: 0: ( -> 40 83: 405: 0: 1000 -> 358 83: 406: 0: ) -> 41 83: 407: 2: ' -> 272 83: 408: 2: update once per second -> 355 84: 409: 0: -> 264 85: 410: 0: PRI -> 263 85: 411: 0: PauseMS -> 356 85: 412: 0: ( -> 40 85: 413: 0: _time -> 356 85: 414: 0: ) -> 41 86: 415: 0: waitcnt -> 356 86: 416: 0: ( -> 40 86: 417: 0: CLKFREQ -> 356 86: 418: 0: / -> 318 86: 419: 0: 1000 -> 358 86: 420: 0: * -> 314 86: 421: 0: _time -> 356 86: 422: 0: + -> 310 86: 423: 0: cnt -> 356 86: 424: 0: ) -> 41 88: 425: 0: PRI -> 263 88: 426: 0: Toggle -> 356 88: 427: 0: ( -> 40 88: 428: 0: _pin -> 356 88: 429: 0: ) -> 41 89: 430: 0: ! -> 344 89: 431: 0: OUTA -> 356 89: 432: 0: [ -> 91 89: 433: 0: _pin -> 356 89: 434: 0: ] -> 93 91: 435: 0: DAT -> 356 92: 436: 1: {{ -> 265 92: 437: 1: ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TERMS OF USE: MIT License │ ├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation │ │files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, │ │modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software│ │is furnished to do so, subject to the following conditions: │ │ │ │The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.│ │ │ │THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE │ │WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR │ │COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, │ │ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -> 355 92: 438: 0: }} -> 267
Here is the output from the parser:
Block Comment: ' $Id$ Author: Perry Harrington Copyright: (c) 2012 Perry Harrington ======================================================================= This is a demo program to show the usage of the SPI SPIN object. This demo reads the current temperature from a Maxim DS1620 digital thermometer and thermostat and displays the temperature on the Parallax Serial Terminal. You can obtain the datasheet from Maxim Semiconductor. This device is an 8 pin DIP with an active high CLK input and single DQ data line instead of MISO and MOSI I/O. The chip uses LSB first endianness, data is read PRE clock. The maximum clock rate is 1.75Mhz. To send a command to the chip you must set RST high, send the command, then signal the end of command by setting RST low. When receiving data set RST low to signal the end of reception. The output of the chip is a 9 bit signed temperature in 1/2 degree Celsius units. That means you must divide the result by 2 to get whole degrees. ' Block type changed: 258: CON Symbol: _clkmode Symbol: xtal1 Symbol: pll16x Add: xtal1 + pll16x Constant assignment: _clkmode = xtal1 Symbol: _xinfreq Value: 5000000 Constant assignment: _xinfreq = 5000000 Symbol: DPIN Value: 0 Constant assignment: DPIN = 0 Code Comment: 'Data pin' Symbol: CPIN Value: 1 Constant assignment: CPIN = 1 Code Comment: 'Clock pin' Symbol: RST Value: 2 Constant assignment: RST = 2 Code Comment: 'Reset pin' Symbol: CLKr Value: 10 Constant assignment: CLKr = 10 Code Comment: '100us clock frequency' Symbol: START_CONVERT Value: EE Constant assignment: START_CONVERT = EE Code Comment: 'Start temp conversion' Symbol: READ_TEMP Value: AA Constant assignment: READ_TEMP = AA Code Comment: 'Read temp from DS1620' Symbol: WRITE_CONFIG Value: 0C Constant assignment: WRITE_CONFIG = 0C Code Comment: 'Write config register' Symbol: CONFIG_REG Value: 00001010 Constant assignment: CONFIG_REG = 00001010 Code Comment: 'enable continuous conversion and CPU 3wire' Symbol: CONVERT_TIME Value: 750 
Constant assignment: CONVERT_TIME = 750 Code Comment: 'Takes 750ms to initialize temp conversion' Symbol: EEPROM_WTIME Value: 10 Constant assignment: EEPROM_WTIME = 10 Code Comment: 'Takes 10ms to write to EEPROM config' Block type changed: 259: OBJ Load object: pst -> Parallax Serial Terminal Load object: spi -> SPI_spinv1.1 Block type changed: 260: VAR Variable declaration: long -> temperature Block type changed: 262: PUB Function: main Block Comment: ' This function does the setup for the DS1620 and reads the current temperature once per second, converts it to Fahrenheit, and displays it on the serial terminal ' Symbol: pst Symbol: Start Value: 115200 Parameter expression: Start -> 115200 Object call: pst -> Start Code Comment: 'debug output' Symbol: pst Symbol: Str Symbol: String Literal String: Welcome to the SPI demo! This program prints the current Parameter expression: String -> Welcome to the SPI demo! This program prints the current Parameter expression: Str -> String Object call: pst -> Str Symbol: pst Symbol: Newline Object call: pst -> Newline Symbol: pst Symbol: Str Symbol: String Literal String: temperature from a DS1620 digital thermometer chip. Parameter expression: String -> temperature from a DS1620 digital thermometer chip. 
Parameter expression: Str -> String Object call: pst -> Str Symbol: pst Symbol: Newline Object call: pst -> Newline Symbol: pst Symbol: Str Symbol: String Literal String: Binary Hex Fahrenheit Parameter expression: String -> Binary Hex Fahrenheit Parameter expression: Str -> String Object call: pst -> Str Symbol: pst Symbol: Newline Object call: pst -> Newline Symbol: spi Symbol: Start Symbol: CLKr Symbol: spi Symbol: HIGH Object constant reference: spi -> HIGH Expression list: CLKr , spi#HIGH Parameter expression: Start -> CLKr Object call: spi -> Start Code Comment: 'initial clock high, going low on first bit' Symbol: DIRA Symbol: RST Array expression: DIRA -> RST Post Set Assign: DIRA ~~ Code Comment: 'enable RST as output' Symbol: OUTA Symbol: RST Array expression: OUTA -> RST Post Clear: OUTA ~ Code Comment: 'set RST low' Symbol: RST Code Comment: 'clear the RST line so we can talk to chip' Symbol: spi Symbol: ShiftOut Symbol: DPIN Symbol: CPIN Symbol: spi Symbol: LSB Value: 8 Symbol: WRITE_CONFIG Expression list: 8 , WRITE_CONFIG Expression list: LSB , 8 Object constant reference: spi -> LSB Expression list: CPIN , spi#LSB Expression list: DPIN , CPIN Parameter expression: ShiftOut -> DPIN Object call: spi -> ShiftOut Code Comment: 'send the config write command' Symbol: spi Symbol: ShiftOut Symbol: DPIN Symbol: CPIN Symbol: spi Symbol: LSB Value: 8 Symbol: CONFIG_REG Expression list: 8 , CONFIG_REG Expression list: LSB , 8 Object constant reference: spi -> LSB Expression list: CPIN , spi#LSB Expression list: DPIN , CPIN Parameter expression: ShiftOut -> DPIN Object call: spi -> ShiftOut Code Comment: 'send config register contents' Symbol: RST Symbol: EEPROM_WTIME Code Comment: 'wait 10ms for config to write to EEPROM' Symbol: RST Symbol: spi Symbol: SHIFTOUT Symbol: DPIN Symbol: CPIN Symbol: spi Symbol: LSBFIRST Value: 8 Symbol: START_CONVERT Expression list: 8 , START_CONVERT Expression list: LSBFIRST , 8 Object constant reference: spi -> LSBFIRST 
Expression list: CPIN , spi#LSBFIRST Expression list: DPIN , CPIN Parameter expression: SHIFTOUT -> DPIN Object call: spi -> SHIFTOUT Code Comment: 'send beginning of conversion command' Symbol: RST Symbol: CONVERT_TIME Code Comment: 'wait for first conversion' Symbol: Toggle Symbol: RST Parameter expression: Toggle -> RST Repeat expression: Toggle Symbol: spi Symbol: ShiftOut Symbol: DPIN Symbol: CPIN Symbol: spi Symbol: LSB Value: 8 Symbol: READ_TEMP Expression list: 8 , READ_TEMP Expression list: LSB , 8 Object constant reference: spi -> LSB Expression list: CPIN , spi#LSB Expression list: DPIN , CPIN Parameter expression: ShiftOut -> DPIN Object call: spi -> ShiftOut Code Comment: 'send temp reading command' Symbol: temperature Symbol: spi Symbol: ShiftIn Symbol: DPIN Symbol: CPIN Symbol: spi Symbol: LSB Symbol: spi Symbol: PRE Value: 9 Expression list: PRE , 9 Object constant reference: spi -> PRE Add: LSB + spi#PRE Object constant reference: spi -> LSB Expression list: CPIN , spi#LSB Expression list: DPIN , CPIN Parameter expression: ShiftIn -> DPIN Object call: spi -> ShiftIn Variable assignment: temperature = spi Code Comment: 'fetch temperature value' Symbol: RST Symbol: pst Symbol: Bin Symbol: temperature Value: 9 Expression list: temperature , 9 Parameter expression: Bin -> temperature Object call: pst -> Bin Code Comment: 'print out raw value binary' Symbol: pst Symbol: Char Value: 32 Parameter expression: Char -> 32 Object call: pst -> Char Symbol: pst Symbol: Hex Symbol: temperature Value: 4 Expression list: temperature , 4 Parameter expression: Hex -> temperature Object call: pst -> Hex Code Comment: 'print out raw hex value' Symbol: pst Symbol: Char Value: 32 Parameter expression: Char -> 32 Object call: pst -> Char Symbol: temperature Symbol: temperature Value: 23 Shift Left: temperature << 23 Value: 23 Shift Arithmetic Right: temperature ~> 23 Variable assignment: temperature = temperature Code Comment: 'zero, sign extend, and convert to whole 
degrees' Symbol: temperature Value: 5 Multiply low=: temperature *= 5 Symbol: temperature Symbol: temperature Value: 9 Value: 5 Divide: 9 / 5 Multiply low: temperature * 9 Value: 320 Add: temperature + 320 Variable assignment: temperature = temperature Code Comment: 'convert reading to fahrenheit' Symbol: pst Symbol: Dec Symbol: temperature Value: 10 Divide: temperature / 10 Parameter expression: Dec -> temperature Object call: pst -> Dec Code Comment: 'print out temp in whole degrees Fahrenheit' Symbol: pst Symbol: Char Literal String: . Parameter expression: Char -> . Object call: pst -> Char Symbol: pst Symbol: Dec Symbol: temperature Value: 10 Modulus: temperature // 10 Parameter expression: Dec -> temperature Object call: pst -> Dec Code Comment: 'print out tenths using modulus operator' Symbol: pst Symbol: Char Literal String: ° Parameter expression: Char -> ° Object call: pst -> Char Symbol: pst Symbol: Newline Object call: pst -> Newline Value: 1000 Code Comment: 'update once per second' Block type changed: 263: PRI Symbolic parameter: _time Function: PauseMS Symbol: CLKFREQ Value: 1000 Divide: CLKFREQ / 1000 Symbol: _time Multiply low: CLKFREQ * _time Symbol: cnt Add: CLKFREQ + cnt Block type changed: 263: PRI Symbolic parameter: _pin Function: Toggle Symbol: OUTA Symbol: _pin Array expression: OUTA -> _pin Bitwize NOT: ! 
OUTA Block Comment: ' ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TERMS OF USE: MIT License │ ├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation │ │files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, │ │modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software│ │is furnished to do so, subject to the following conditions: │ │ │ │The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.│ │ │ │THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE │ │WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR │ │COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, │ │ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ '
Comments
Languages are traditionally specified in Backus–Naur Form (BNF), a notation that resembles yacc rules very closely.
I looked over the grammar and noticed a couple of errors/incomplete bits. I used the Propeller Manual to define many of the keywords and grammar, so it should make sense to most.
Given what I've provided above, one could take that code and make a syntax highlighter, but syntax highlighting is typically much simpler.
Eg:
keyword_list: keywords { color = blue }
keywords: IF | REPEAT | WHILE | UNTIL | CASE
literal_list: literals { color = red }
literals: NUMBER | STRING
operator_list: operators { color = green }
operators: '+' | '-' | '*' | '/'
Ad nauseam...
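The same idea can be sketched without a grammar at all, as a plain token classifier in C. The color names follow the rules above; everything else (the enum, the function names, the case-insensitive keyword match) is an assumption for illustration only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { COLOR_NONE, COLOR_BLUE, COLOR_RED, COLOR_GREEN } Color;

/* Case-insensitive string equality (assumes keywords ignore case). */
static int same_word(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Pick a highlight color for one already-separated token. */
Color classify(const char *tok)
{
    static const char *const keywords[] = {
        "if", "repeat", "while", "until", "case"
    };

    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (same_word(tok, keywords[i]))
            return COLOR_BLUE;                    /* keyword */

    if (isdigit((unsigned char)tok[0]) || tok[0] == '"')
        return COLOR_RED;                         /* number or string literal */

    if (tok[0] != '\0' && tok[1] == '\0' && strchr("+-*/", tok[0]))
        return COLOR_GREEN;                       /* single-character operator */

    return COLOR_NONE;                            /* identifiers and the rest */
}
```

So classify("REPEAT") gives COLOR_BLUE, classify("750") gives COLOR_RED, and classify("temperature") gives COLOR_NONE.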
Eric