PDA

View Full Version : Pattern matching



Dr_Acula
12-26-2009, 09:01 PM
This is a question for the Spin experts, and I would be very grateful for some advice.

I have some data coming in - either a keyboard or a serial port or from somewhere else, and I want to look for certain keywords. In pseudo basic I might do something like this:



do
a$=inputcharacter ' get a single character in a$
bufferstring$=a$+bufferstring$ ' add the new character to the left of a 16 character string
bufferstring$=left$(bufferstring$,16) ' get rid of the rightmost character
' now look for matches with arbitrary strings
if left$(bufferstring$,5)="HELLO" then
' do something
if left$(bufferstring$,7)="DRACULA" then
' do something else
loop



First thing is a rolling buffer. The above adds something to the beginning and truncates off the end, which in effect shuffles everything along. I've seen nifty code in Spin with heads and tails and rolling counters, and even clever things where you add one and then AND it with $0F which rolls it over when it goes from 15 to 0. Is that the best structure?

The next bit is matching a number of variable length strings. These are constants, and their length is known. Is it better to match with a spin string() with n bytes, or is there some other structure for matching with arbitrary strings?

I guess the first question is "Is this hard"?

If it is, then maybe I'll convert plain english commands into single byte commands, as these are easy to match.

However, there are some advantages to using text strings as it is easier to debug and see what is going on.

Advice would be very much appreciated.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.smarthome.viviti.com/propeller (http://www.smarthome.viviti.com/propeller)

Leon
12-26-2009, 09:25 PM
String-matching has been studied a lot, and there are some efficient algorithms. The Introduction to Algorithms book by Cormen et al analyses several of them - the Rabin-Karp algorithm performs very well:

en.wikipedia.org/wiki/Rabin%E2%80%93Karp_string_search_algorithm (http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_string_search_algorithm)

Leon

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM

SamMishal
12-27-2009, 02:00 AM
Dr_Acula said...



do
a$=inputcharacter ' get a single character in a$
bufferstring$=a$+bufferstring$ ' add the new character to the left of a 16 character string
bufferstring$=left$(bufferstring$,16) ' get rid of the rightmost character
' now look for matches with arbitrary strings
if left$(bufferstring$,5)="HELLO" then
' do something
if left$(bufferstring$,7)="DRACULA" then
' do something else
loop




Are you sure in the code above where I bolded it you want to append the character to the LEFT?
shouldn't that be to the RIGHT???· Since a person typing a word will type the first character first.
And if I understand your code you are trying to input the string a character at a time and appending it to
a buffer until the text in the buffer matches some stock commands. No?
·
Anyway, In Spin what you will need is a 17·byte buffer area declared in a DAT section with ALL Zeros as
intitial values. 17 because you need an extra byte for the 0 ending if you want a 16 character string.
·
Then you have a variable that acts as an index that gets incremented every time a character is inputted.
Then you PLACE the character in the byte buffer according to the INDEX.
·
You can create a CIRCULAR buffer if the index is moded (// operator) over 17. That means when the index
is incremented to 17 it goes back to 0.
·
Also spin has a string·a function·called StrComp() which will compare two strings for EXACT match.
·
You will need two ZStrings...that is zero ended strings.
·
You can have all your STOCK words also stored in a DAT section as ZStrings (byte buffers).
·
·
·
HOWEVER, I do not understand why you want to do the circular input buffer.....if you are using the
FullDuplexSerialPlus object it implements that for the GetStr() method....maybe you want to use a
keyboard???
·
Anyway.....the above will work if all you are trying is to get a string input one time only. The circular buffer
will start placing new input characters over old ones but the old ones not yet replaced will be still there.
·
What is best is to have a function that ACQUIRES a string input from the user INDEPENDENT of the COMPARING
operation. Once the user enters the string with a termination indicator like Enter or Space ro something
then you go on to comparing against your stock commands. Also you can limit the user to 16 characters
or WHATEVER you set the size of the BUFFER in the DAT section.

The Imput function will ZERO the buffer before every Input operation and that way you will not have
previous inputs mingled with new ones.
·
I hope this helps..... If you can be more specific as to what you are trying to accomplish I can write the
Spin program for you if you need it.
·
Regards
·
Samuel


Post Edited (SamMishal) : 12/26/2009 6:07:18 PM GMT

max72
12-27-2009, 05:31 AM
Check this regexp object:

http://forums.parallax.com/showthread.php?p=855253

Massimo

Dr_Acula
12-27-2009, 08:53 AM
Thanks guys, this has pointed me in the right direction. I've got a pile of Z80 string code that is very similar, except that it terminates with $ instead of zero, so I can translate that.

Re sam you are quite right, it will end up backwards. I need to get my head around this a bit more - something like a buffer that you feed in from the right hand side and check the last n characters where n is the length of the string you are checking?

Also, a good point re circular buffers, you don't want a string matching at the join of the circle, so maybe a linear buffer where the characters fall off the end?

The problem to solve is very similar to the GPS problem, so that code and thread are just perfect.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.smarthome.viviti.com/propeller (http://www.smarthome.viviti.com/propeller)

Post Edited (Dr_Acula) : 12/27/2009 1:19:46 AM GMT