Is There A Better Way?
JonnyMac
Posts: 9,507
I'm working on a program where I need to look for a sub-string inside a larger string. I knocked this together. Is there a better way?
pub find(p_needle, p_haystack) : pos | nlen, hlen, hidx, nidx, check
'' Returns index of sub-string p_needle in string p_haystack
'' -- if not found, returns -1
nlen := strsize(p_needle)
hlen := strsize(p_haystack)
if (nlen > hlen)
return -1
repeat hidx from 0 to (hlen-nlen+1)
check := 0
repeat nidx from 0 to nlen-1
if (byte[p_needle+nidx] == byte[p_haystack+hidx+nidx])
++check
else
quit
if (check == nlen)
return hidx
return -1
This discussion has been closed.

Comments
pub find(p_needle, p_haystack) : pos | pn, ph '' Returns index of sub-string p_needle in string p_haystack '' -- if not found, returns -1 repeat while byte[p_haystack+pos] ph := p_haystack+pos pn := p_needle repeat while byte[pn] if byte[ph++] <> byte[pn++] quit ifnot byte[pn] return pos 'found pos++ return -1 'not foundAndypub find(p_needle, p_haystack) : pos | pn, ph '' Returns index of sub-string p_needle in string p_haystack '' -- if not found, returns -1 repeat (strsize(p_haystack)-strsize(p_needle)+1) ph := p_haystack+pos pn := p_needle repeat while byte[pn] if (byte[ph++] <> byte[pn++]) quit ifnot (byte[pn]) return pos pos++ return -1There is an error in my code with single character strings, they will be always found at begin.
The inner loop needs to be changed like that:
repeat while byte[pn] if byte[ph++] <> byte[pn] quit pn++But I also tried an inline PASM version of my code, and that is about 10 times faster:
pub find(p_needle, p_haystack) : pos | pn, ph, c1,c2 '' Returns index of sub-string p_needle in string p_haystack '' -- if not found, returns -1 pos := p_haystack org .hloop rdbyte c1,pos wz if_z jmp #.notfnd mov ph,pos mov pn,p_needle .nloop rdbyte c1,pn wz if_z jmp #.hnxt rdbyte c2,ph add ph,#1 cmp c2,c1 wz if_z add pn,#1 if_z jmp #.nloop .hnxt cmp c1,#0 wz if_z sub pos,p_haystack if_z jmp #.fnd add pos,#1 jmp #.hloop .notfnd neg pos,#1 .fnd end return posI also compared it with fastspin, and here are the results: Edit: This was a 3 char string that was found at 35th position in a big string
Andy
I know the P1 would calculate it each loop.
If Spin2 does calculate the value each loop, it would likely speed up the method to calculate the value once and save it to a size variable.
pub find(p_needle, p_haystack) : pos | len, pn, ph, n, h '' Returns index of sub-string p_needle in string p_haystack '' -- if not found, returns -1 len := strsize(p_haystack)-strsize(p_needle)+1 org .for1 mov ph, p_haystack add ph, pos mov pn, p_needle .for2 rdbyte n, pn wz if_z jmp #.next2 rdbyte h, ph cmp n, h wz if_ne jmp #.next2 add ph, #1 add pn, #1 jmp #.for2 .next2 or n, #0 wz if_z jmp #.done add pos, #1 .next1 djnz len, #.for1 neg pos, #1 .done endI'm always very verbose with my coding -- looking to get better at writing readable code that is a little trimmer around the edges.I don't think it does; I moved that expression out of the repeat line (to a variable), and the loop ran a little slower. I think that particular expression is evaluated once with the value moved to an internal control variable.
Yes, with "REPEAT N" loops, N is only evaluated when entering the loop, even in Spin1.
I would have thought this is the way Spin behaves, but I recall a thread by Phil about his surprise when he discovered this value was calculated each loop.
I'll need to find the thread to prove to myself I'm not crazy.
"REPEAT i FROM X TO Y (STEP Z)" type loops do evaluate X, Y and Z once per iteration. Are you thinking of those?
pub find(p_needle, p_haystack) : pos | len, pn, ph, n, h '' Returns index of sub-string p_needle in string p_haystack '' -- if not found, returns -1 len := strsize(p_haystack)-strsize(p_needle)+1 org .hloop mov ph, p_haystack ' point into haystack add ph, pos mov pn, p_needle ' point at needle .nloop rdbyte n, pn wz ' get char from needle if_z jmp #.done ' if terminating 0, all chars matched rdbyte h, ph ' get char from haystack cmp n, h wz ' compare if_ne jmp #.nextp ' if !=, move haystack starting pos add ph, #1 ' update for next chars add pn, #1 jmp #.nloop ' check next char in needle .nextp add pos, #1 ' move haystack starting pos djnz len, #.hloop ' keep going until done neg pos, #1 ' needle not found, set result to -1 .done endI appreciate the input.Edit: Wuerfel_21 was correct. The "step" inclusion threw me off.
No, I'm just crazy.
I couldn't find the thread I mentioned previously so I wrote a little test program. The repeat command behaves just like you and others claim.
I'm trying to figure out why I thought this was something which could be changed.
Thanks to everyone setting me straight.
In the above code "x+ y" is evaluated once. However the "x +y" in the code below is evaluated each loop.
Edit: This was just what @Wuerfel_21 was trying to tell me.
I can only imagine -- without looking at the byte codes -- that the counter in the first case actually counts from x+y to zero, i.e. in a djnz type of looping.
My preference for the second case would have been to evaluate x + y only once, before the beginning of the loop.
-Phil
Yes, that's how it works - there's a hidden variable that counts to zero. I don't really know why the other kind of loop doesn't cache it's bounds/step and why the REPEAT N loop doesn't give you access to the counter variable.
Once Chip adds in the uncalled code removal to Spin2 it will open up the possibility of building up larger suites of useful library functions. Areas such as string processing, maths and graphics are good candidates and can be optimised by assembly loops where it makes sense for performance.
However right now without the dead code removal in official Spin2, we are somewhat limited and would need to either cut-paste snippets from library code where that is possible and try to construct any libraries using myriads of smaller fully independent files which isn't always convenient to do in cases with shared data, otherwise we risk bloating the final code with an entire library when it may only be a handful of functions needed.
Before you also ban me from your threads I will volunteer to not to post to your threads or comment on any of your posts until you publicly apologize to Rayman.
Dear C programmers: If you the only point of your post is to tell Spin programmers how much better C is, well just don't.
Ken Gracey