Strings
Title: Strings
Author: thehappyhippy
Published: Mon, 10 Mar 2008 20:12:00 GMT
Introduction
A string is just a sequence of characters, one after another. But first, what is a character?
To humans, characters are simply shapes which we recognise and give meaning to, both singly and when they make up a string or word. A computer or processor like the Propeller has no understanding of those shapes. To use characters, each must be represented in a form that can be handled digitally. This is largely done through the American Standard Code for Information Interchange (ASCII), which assigns a numeric value to each of the characters we commonly use. ASCII itself is a 7-bit code, and each value is normally stored in one 8-bit byte. Provided everyone agrees on what value a character has, we can move characters (and strings) from the digital world to our real world and vice versa. We deal with characters as shapes; the Propeller deals with the byte values representing those characters.
Digitally, then, a string is just a sequence of ASCII codes, one per character. For example, the string "ABC" is represented by three consecutive bytes with the hexadecimal values $41, $42 and $43.
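To make this concrete, here is a tiny C sketch (C stores characters the same way Spin does). A character literal such as 'A' is nothing more than its ASCII code, so the three bytes below hold $41, $42 and $43. The array name `abc` is illustrative, and no terminator has been added yet.

```c
/* The string "ABC" as raw character codes: one byte per character.
   'A', 'B' and 'C' are just the ASCII values $41, $42 and $43. */
static const unsigned char abc[3] = { 'A', 'B', 'C' };
```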
String Length
It is convenient to know how long a string is, to know where it ends and where another string (or something else entirely) starts. There are two common ways to deal with the length of strings: prefixing the string with a byte, word or long value which specifies how many bytes are in the string, or ending the string with a unique value, much like ending a sentence with a period.
Both have their advantages and disadvantages. A length prefix must be large enough to hold the length of the longest string, otherwise string length is limited (to 255 characters for a byte-sized prefix), but a larger prefix wastes space on short strings. A scheme that varies the size of the prefix with the length of the string is possible, but it makes processing strings complicated.
The alternative, a unique terminating value, allows a string of any length, but the characters must be counted up to the terminator to find the length, and the terminating value can never appear within the string itself.
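As a minimal C sketch of the length-prefix approach, the hypothetical helper `make_prefixed` below (not part of Spin or the C library) stores text behind a one-byte length. It shows both sides of the trade-off: no terminator is needed because the prefix says where the string ends, but anything longer than 255 characters is rejected.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store text as a byte-length-prefixed string in dst.
   dst must have room for strlen(text) + 1 bytes.
   Returns 1 on success, 0 if the text is too long for a one-byte prefix. */
static int make_prefixed(unsigned char *dst, const char *text)
{
    size_t len = strlen(text);
    if (len > 255)
        return 0;                /* a byte cannot hold a length above 255 */
    dst[0] = (unsigned char)len; /* the prefix: how many characters follow */
    memcpy(dst + 1, text, len);  /* no terminator: the prefix marks the end */
    return 1;
}
```

Reading the length back is then a single byte read, `dst[0]`, with no scanning at all.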
Strings and Spin
The Propeller Tool chooses to deal with strings in the second way, with a unique terminating value, and this value is chosen to be zero. This is also the way in which the C programming language deals with strings. The common term for such a string representation is "zero terminated string".
The Spin programming language provides three built-in features for dealing with strings: String, StrSize and StrComp. Any other string processing has to be implemented by the Spin programmer.
String
The String directive allocates a sequence of byte values and a zero valued terminator in hub memory and returns a pointer to the first character of the string.
Our previous example, the string "ABC", when created using String("ABC"), produces the following byte sequence in hub memory: $41, $42, $43 then $00.
String(...) is not a function executed at run time; the compiler builds the string, which is why it is called a directive. It accepts comma-separated parameters, which are concatenated at compile time; for example, String("Hello", 13, 10) follows the text with a carriage return and a line feed. Because this happens at compile time, only constants and literals can be used.
  ptr := String("ABC")
has the same effect as
  ptr := @_string99
DAT
_string99     byte    "ABC", 0
StrSize
The StrSize function takes a pointer to a string in hub memory, counts the characters before the zero terminator and returns that count, which is the size or length of the string.
With StrSize(String("ABC")) the value returned would be 3.
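StrSize is built into Spin, so the following is only a model of its behaviour, not its implementation. It is a minimal C sketch (the name `str_size` is illustrative) of the counting described above, which is exactly what the C library's `strlen` does.

```c
#include <stddef.h>

/* Model of StrSize: count bytes up to, but not including, the zero terminator. */
static size_t str_size(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Note the cost: every call walks the whole string, which is the price of using a terminator instead of a length prefix.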
StrComp
The StrComp function takes pointers to two strings, compares them byte by byte and returns true (-1) if they hold the same byte sequence, or false (0) otherwise.
With StrComp(String("ABC"),String("ABC")) the value returned would be true; with StrComp(String("ABC"),String("abc")) the value returned would be false. Every character has its own unique value, so uppercase and lowercase characters are not the same: the comparison is case-sensitive.
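A minimal C sketch of the same comparison, assuming StrComp's Spin convention of -1 for true and 0 for false (C's own `strcmp` instead returns 0 for equal strings). The name `str_comp` is illustrative.

```c
/* Model of StrComp: -1 (true) if both zero-terminated strings hold the
   same bytes, 0 (false) otherwise. Case-sensitive, like StrComp. */
static int str_comp(const char *a, const char *b)
{
    /* Advance while the bytes match and the first string has not ended. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return (*a == *b) ? -1 : 0;
}
```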
String Handling
Strings created by the String directive are fixed at compile time and should be treated as read-only. Hub memory does not stop a program from writing to them, but altering one changes what that String expression yields from then on, and writing past its terminator overwrites whatever follows it, which can corrupt the entire program. Read-only strings are useful for displaying and sending messages which do not need to change, but changeable strings are useful in a number of cases. Changeable strings can be created and manipulated under programmer control.
A string as discussed is simply a sequence of byte values terminated by a zero value. There is no reason that such sequences cannot be created within byte arrays. Once this is done, those byte arrays can be manipulated and used to perform complex string operations.
Setting a String
VAR
  byte dstString[256]
PUB Main
  SetString( @dstString, String("ABC") )
PRI SetString( dstStrPtr, srcStrPtr )
  ' Copy each byte, including the zero terminator, then stop
  repeat until ( byte[ dstStrPtr++ ] := byte[ srcStrPtr++ ] ) == 0
- or -
PRI SetString( dstStrPtr, srcStrPtr )
  ' The destination must have room for StrSize(srcStrPtr)+1 bytes
  ByteMove(dstStrPtr, srcStrPtr, StrSize(srcStrPtr)+1) '+1 for zero termination
Concatenating Two Strings
VAR
  byte dstString[256]
  byte srcString1[256]
  byte srcString2[256]
PUB Main
  SetString( @srcString1, String("ABC") )
  SetString( @srcString2, String("DEF") )
  AddString( @dstString, @srcString1, @srcString2 )
PRI AddString( dstStrPtr, srcStrPtr1, srcStrPtr2 )
  ' dstString must hold both strings plus one zero terminator
  repeat until ( byte[ dstStrPtr++ ] := byte[ srcStrPtr1++ ] ) == 0
  dstStrPtr--                   ' step back onto the first string's terminator
  repeat until ( byte[ dstStrPtr++ ] := byte[ srcStrPtr2++ ] ) == 0
- or -
PRI AddString( dstStrPtr, srcStrPtr1, srcStrPtr2 ) | len
  len := StrSize(srcStrPtr1)
  ByteMove(dstStrPtr, srcStrPtr1, len)
  ByteMove(dstStrPtr += len, srcStrPtr2, StrSize(srcStrPtr2)+1) '+1 for zero termination
Appending a Character to a String
VAR
  byte dstString[256]
PUB Main
  SetString( @dstString, String("ABC") )
  AppendChar( @dstString, "D" )
PRI AppendChar( dstStrPtr, char )
  ' Scan past the zero terminator; dstStrPtr ends one byte beyond it
  repeat until byte[ dstStrPtr++ ] == 0
  byte[ dstStrPtr-1 ] := char   ' overwrite the old terminator
  byte[ dstStrPtr ] := 0        ' and terminate again after the new character
- or -
PRI AppendChar( dstStrPtr, char )
  dstStrPtr += StrSize(dstStrPtr)
  byte[ dstStrPtr++ ] := char
  byte[ dstStrPtr ] := 0
Making a String Uppercase
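A minimal sketch in C, which uses the same zero-terminated layout as Spin, assuming plain 7-bit ASCII text. Lowercase letters $61 to $7A sit exactly 32 ($20) above their uppercase forms $41 to $5A, so each lowercase byte is reduced by 32 in place and every other byte is left alone. The name `str_upper` is illustrative. Because it writes to the string, it belongs on a byte array, not on a String literal.

```c
/* Convert a zero-terminated ASCII string to uppercase in place.
   Assumes ASCII: 'a'..'z' are 32 above 'A'..'Z'. Non-letters are unchanged. */
static void str_upper(char *s)
{
    for (; *s != '\0'; s++)
        if (*s >= 'a' && *s <= 'z')
            *s -= 32;
}
```

In Spin the same loop would step a pointer through byte[...] until it reads the zero terminator.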
Making a String Lowercase
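The reverse operation, again as a hedged C sketch assuming plain ASCII: each byte from 'A' to 'Z' is raised by 32 to its lowercase form, in place. The name `str_lower` is illustrative, and as before it should only be applied to a writable byte array.

```c
/* Convert a zero-terminated ASCII string to lowercase in place.
   Assumes ASCII: 'A'..'Z' are 32 below 'a'..'z'. Non-letters are unchanged. */
static void str_lower(char *s)
{
    for (; *s != '\0'; s++)
        if (*s >= 'A' && *s <= 'Z')
            *s += 32;
}
```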
Getting the Leftmost Characters of a String
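One way to take the leftmost characters, sketched in C under stated assumptions: the hypothetical helper `str_left` copies at most n characters into a separate destination array and always zero-terminates it, and if the source is shorter than n the whole source is copied. The destination must have room for n + 1 bytes.

```c
#include <stddef.h>

/* Hypothetical helper: copy at most n leftmost characters of src into dst,
   then zero-terminate dst. dst must have room for n + 1 bytes. */
static void str_left(char *dst, const char *src, size_t n)
{
    size_t i = 0;
    while (i < n && src[i] != '\0') { /* stop early if src is shorter */
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
}
```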
Getting the Rightmost Characters of a String
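The rightmost characters need the string's length first, because the copy starts n characters before the terminator. A minimal C sketch, where `str_right` is a hypothetical helper: it finds the length the way StrSize does, clamps n to that length, then copies the tail including its terminator.

```c
#include <stddef.h>

/* Hypothetical helper: copy the n rightmost characters of src into dst,
   zero-terminated. If src is shorter than n, all of src is copied.
   dst must have room for n + 1 bytes. */
static void str_right(char *dst, const char *src, size_t n)
{
    size_t len = 0;
    while (src[len] != '\0')      /* find the length, as StrSize would */
        len++;
    if (n > len)
        n = len;
    src += len - n;               /* start n characters before the terminator */
    while ((*dst++ = *src++) != '\0')
        ;                         /* copy up to and including the terminator */
}
```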
Getting a Sub-String of a String
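A sub-string generalises the two previous cases: start at a given position and take up to a given count. In this hedged C sketch the hypothetical helper `str_mid` counts positions from 0, treats a start beyond the end as giving the empty string, and stops early at the source's terminator.

```c
#include <stddef.h>

/* Hypothetical helper: copy up to count characters of src, starting at
   index start (0 = first character), into dst and zero-terminate it.
   dst must have room for count + 1 bytes. */
static void str_mid(char *dst, const char *src, size_t start, size_t count)
{
    size_t len = 0, i = 0;
    while (src[len] != '\0')        /* length of src, as StrSize would give */
        len++;
    if (start > len)
        start = len;                /* starting past the end gives "" */
    while (i < count && src[start + i] != '\0') {
        dst[i] = src[start + i];
        i++;
    }
    dst[i] = '\0';
}
```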
Finding an Occurrence of a Sub-String Within a String
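A simple way to search, sketched in C: try every starting position in the string and compare the sub-string byte by byte from there. The hypothetical helper `str_find` returns the 0-based position of the first match, or -1 if there is none. This is the straightforward approach, not an optimised search algorithm; it never reads past either terminator.

```c
/* Hypothetical helper: index of the first occurrence of needle in haystack,
   or -1 if it does not occur. An empty needle matches at position 0. */
static int str_find(const char *haystack, const char *needle)
{
    int i;
    for (i = 0; ; i++) {
        int j = 0;
        /* Compare needle against haystack starting at position i. */
        while (needle[j] != '\0' && haystack[i + j] == needle[j])
            j++;
        if (needle[j] == '\0')
            return i;       /* reached the end of needle: a match */
        if (haystack[i] == '\0')
            return -1;      /* no starting positions left */
    }
}
```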
Other Character and String Representations
A character doesn't have to be a single 8-bit byte. It can be larger (16-bit units are often used for Unicode) or smaller, either padded to a multiple of some common number of bits or packed bit-contiguously within its storage area. Characters can even differ in size from one to the next, as in Morse code, where the number of dots and dashes varies from letter to letter.
A string does not necessarily need a zero terminator. The terminator can be any value which is otherwise unused or is reserved for the purpose, or a string may begin and end with the same delimiter character, the delimiters not being considered part of the string itself. The terminator can also be left out entirely when the length of a string is known in advance, or replaced by a length prefixed before the string itself.
Strings do not have to be contiguous, although they usually are. Non-contiguous strings need some additional mechanism to record where each part of the string is, which makes processing them more difficult.
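As a small C sketch of a non-zero terminator, the hypothetical helper `length_until` counts characters up to an arbitrary terminator byte, such as '$' or a carriage return. Like StrSize, it assumes the terminator is actually present; without it, the loop would run past the end of the data.

```c
#include <stddef.h>

/* Hypothetical helper: length of a string ended by an arbitrary terminator
   byte rather than zero. The terminator must be present. */
static size_t length_until(const char *s, char terminator)
{
    size_t n = 0;
    while (s[n] != terminator)
        n++;
    return n;
}
```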