Is There A Better Way?

JonnyMac · 2020-10-30 02:45

I'm working on a program where I need to look for a sub-string inside a larger string. I knocked this together. Is there a better way?

pub find(p_needle, p_haystack) : pos | nlen, hlen, hidx, nidx, check

'' Returns index of sub-string p_needle in string p_haystack
'' -- if not found, returns -1

  nlen := strsize(p_needle)
  hlen := strsize(p_haystack)

  if (nlen > hlen)
    return -1

  repeat hidx from 0 to (hlen-nlen+1)
    check := 0
    repeat nidx from 0 to nlen-1
      if (byte[p_needle+nidx] == byte[p_haystack+hidx+nidx])
        ++check
      else
        quit
    if (check == nlen)
      return hidx

  return -1

Ariba · 2020-10-30 03:45

This (untested) code may be a bit faster:

pub find(p_needle, p_haystack) : pos | pn, ph

'' Returns index of sub-string p_needle in string p_haystack
'' -- if not found, returns -1

  repeat while byte[p_haystack+pos]
    ph := p_haystack+pos
    pn := p_needle
    repeat while byte[pn]
      if byte[ph++] <> byte[pn++]
        quit
    ifnot byte[pn]
      return pos                'found
    pos++

  return -1                     'not found

Andy

JonnyMac · 2020-10-30 03:54

Thanks, Andy, I'll give it a try.

JonnyMac · 2020-10-30 17:03

Interesting. Testing my original version with what you suggested -- was about a toss-up in execution speed (though yours does use fewer variables). I modified your outer loop to work like mine, and now your [modified] version is consistently faster by 15 to 20%.

pub find(p_needle, p_haystack) : pos | pn, ph

'' Returns index of sub-string p_needle in string p_haystack
'' -- if not found, returns -1

  repeat (strsize(p_haystack)-strsize(p_needle)+1)
    ph := p_haystack+pos
    pn := p_needle
    repeat while byte[pn]
      if (byte[ph++] <> byte[pn++])
        quit
    ifnot (byte[pn])
      return pos 
    pos++

  return -1

Ariba · 2020-10-30 17:18

I also did some tests today, and your code was even faster, until I modified mine a bit.

There is an error in my code with single character strings, they will be always found at begin.
The inner loop needs to be changed like that:

    repeat while byte[pn]
      if byte[ph++] <> byte[pn]
        quit
      pn++

But I also tried an inline PASM version of my code, and that is about 10 times faster:

pub find(p_needle, p_haystack) : pos | pn, ph, c1,c2

'' Returns index of sub-string p_needle in string p_haystack
'' -- if not found, returns -1

  pos := p_haystack
  org
  .hloop  rdbyte  c1,pos  wz
   if_z   jmp     #.notfnd
          mov     ph,pos
          mov     pn,p_needle
  .nloop  rdbyte  c1,pn  wz
   if_z   jmp     #.hnxt
          rdbyte  c2,ph
          add     ph,#1
          cmp     c2,c1  wz
   if_z   add     pn,#1
   if_z   jmp     #.nloop
  .hnxt
          cmp     c1,#0  wz
   if_z   sub     pos,p_haystack
   if_z   jmp     #.fnd
          add     pos,#1
          jmp     #.hloop
  .notfnd
          neg     pos,#1
  .fnd
  end
  return pos

I also compared it with fastspin, and here are the results:

cycles     PNUT     FASTSPIN
 find:     31848    7145
 findjm:   35216    7665
 findas:   3280     2617

Edit: This was a 3 char string that was found at 35th position in a big string

Andy

Duane Degn · 2020-10-30 17:22

Does Spin2 calculate the code below each loop?

strsize(p_haystack)-strsize(p_needle)+1

I know the P1 would calculate it each loop.
If Spin2 does calculate the value each loop, it would likely speed up the method to calculate the value once and save it to a size variable.

JonnyMac · 2020-10-30 17:25

I just tried inline PASM as well -- results match the other versions and is dramatically faster. I will compare mine to yours, Andy.

pub find(p_needle, p_haystack) : pos | len, pn, ph, n, h

'' Returns index of sub-string p_needle in string p_haystack
'' -- if not found, returns -1

  len := strsize(p_haystack)-strsize(p_needle)+1

  org
.for1           mov       ph, p_haystack
                add       ph, pos
                mov       pn, p_needle
                
.for2           rdbyte    n, pn                         wz
    if_z        jmp       #.next2
                rdbyte    h, ph
                cmp       n, h                          wz
    if_ne       jmp       #.next2
                add       ph, #1
                add       pn, #1
                jmp       #.for2

.next2          or        n, #0                         wz
    if_z        jmp       #.done
                add       pos, #1
                
.next1          djnz      len, #.for1         
                neg       pos, #1

.done
  end

I'm always very verbose with my coding -- looking to get better at writing readable code that is a little trimmer around the edges.

JonnyMac · 2020-10-30 17:33

Duane Degn wrote: »
Does Spin2 calculate the code below each loop?
strsize(p_haystack)-strsize(p_needle)+1
I know the P1 would calculate it each loop.
If Spin2 does calculate the value each loop, it would likely speed up the method to calculate the value once and save it to a size variable.

I don't think it does; I moved that expression out of the repeat line (to a variable), and the loop ran a little slower. I think that particular expression is evaluated once with the value moved to an internal control variable.

Wuerfel_21 · 2020-10-30 17:33

Using the FIFO to read through the haystack would likely be even faster. (But IDK if you need to backup/restore it's state in Spin2 inline ASM?)

Wuerfel_21 · 2020-10-30 17:35

JonnyMac wrote: »
Duane Degn wrote: »
Does Spin2 calculate the code below each loop?
strsize(p_haystack)-strsize(p_needle)+1
I know the P1 would calculate it each loop.
If Spin2 does calculate the value each loop, it would likely speed up the method to calculate the value once and save it to a size variable.
I don't think it does; I moved that expression out of the repeat line (to a variable), and the loop ran a little slower. I think that particular expression is evaluated once with the value moved to an internal control variable.

Yes, with "REPEAT N" loops, N is only evaluated when entering the loop, even in Spin1.

JonnyMac · 2020-10-30 17:35

There may be ways to make it even faster; that said, I always go for clarity and the opportunity for myself and other programmers to learn. I love that Spin2 lets us play with fragments of PASM like this.

Duane Degn · 2020-10-30 17:39

Wuerfel_21 wrote: »

Yes, with "REPEAT N" loops, N is only evaluated when entering the loop, even in Spin1.

I would have thought this is the way Spin behaves, but I recall a thread by Phil about his surprise when he discovered this value was calculated each loop.

I'll need to find the thread to prove to myself I'm not crazy.

Wuerfel_21 · 2020-10-30 17:52

Duane Degn wrote: »

Wuerfel_21 wrote: »

Yes, with "REPEAT N" loops, N is only evaluated when entering the loop, even in Spin1.

I would have thought this is the way Spin behaves, but I recall a thread by Phil about his surprise when he discovered this value was calculated each loop.

I'll need to find the thread to prove to myself I'm not crazy.

"REPEAT i FROM X TO Y (STEP Z)" type loops do evaluate X, Y and Z once per iteration. Are you thinking of those?

JonnyMac · 2020-10-30 18:23

This is where I landed with the find() method -- again, using one of my favorite features of Spin2 (inline PASM) to learn and make the method faster.

pub find(p_needle, p_haystack) : pos | len, pn, ph, n, h

'' Returns index of sub-string p_needle in string p_haystack
'' -- if not found, returns -1

  len := strsize(p_haystack)-strsize(p_needle)+1

  org
.hloop          mov       ph, p_haystack                        ' point into haystack
                add       ph, pos
                mov       pn, p_needle                          ' point at needle
.nloop          rdbyte    n, pn                         wz      ' get char from needle
    if_z        jmp       #.done                                ' if terminating 0, all chars matched
                rdbyte    h, ph                                 ' get char from haystack
                cmp       n, h                          wz      ' compare
    if_ne       jmp       #.nextp                               ' if !=, move haystack starting pos
                add       ph, #1                                ' update for next chars
                add       pn, #1
                jmp       #.nloop                               ' check next char in needle
.nextp          add       pos, #1                               ' move haystack starting pos
                djnz      len, #.hloop                          ' keep going until done       
                neg       pos, #1                               ' needle not found, set result to -1 
.done
  end

I appreciate the input.

Duane Degn · 2020-10-30 18:25

Wuerfel_21 wrote: »

Are you thinking of those?

Edit: Wuerfel_21 was correct. The "step" inclusion threw me off.

No, I'm just crazy.

I couldn't find the thread I mentioned previously so I wrote a little test program. The repeat command behaves just like you and others claim.

I'm trying to figure out why I thought this was something which could be changed.

Thanks to everyone setting me straight.

Duane Degn · 2020-10-30 18:40

I think I figured out my confusion.

  repeat x + y

In the above code "x+ y" is evaluated once. However the "x +y" in the code below is evaluated each loop.

  repeat i from 1 to x + y

Edit: This was just what @Wuerfel_21 was trying to tell me.

Phil Pilgrim (PhiPi) · 2020-10-30 19:36

That difference between the two is a bit disconcerting. In the former case, there has to be a virtual variable that does the counting from 1 to x + y. So it's really not that different from the second case. But the behavior is entirely different -- for reasons I can't even begin to rationalize.

I can only imagine -- without looking at the byte codes -- that the counter in the first case actually counts from x+y to zero, i.e. in a djnz type of looping.

My preference for the second case would have been to evaluate x + y only once, before the beginning of the loop.

-Phil

Wuerfel_21 · 2020-10-30 19:58

Phil Pilgrim (PhiPi) wrote: »

That difference between the two is a bit disconcerting. In the former case, there has to be a virtual variable that does the counting from 1 to x + y. So it's really not that different from the second case. But the behavior is entirely different -- for reasons I can't even begin to rationalize.

I can only imagine -- without looking at the byte codes -- that the counter in the first case actually counts from x+y to zero, i.e. in a djnz type of looping.

Yes, that's how it works - there's a hidden variable that counts to zero. I don't really know why the other kind of loop doesn't cache it's bounds/step and why the REPEAT N loop doesn't give you access to the counter variable.

Rayman · 2020-10-31 00:54

Don't hate me, but one great thing about C is that you don't have to reinvent wheels like this...

JonnyMac · 2020-10-31 00:58

Don't hate me, but one great thing about C is that you don't have to reinvent wheels like this...

You're not helpful. Please don't post in my threads -- I have absolutely NO interest in anything you have to say.

rogloh · 2020-10-31 01:03

Having access to a large range of capabilities/algorithms like this will be vey helpful for people who need to apply them to a P2 but either don't know or have the time to design and code it all for themselves and reinvent the wheel etc.

Once Chip adds in the uncalled code removal to Spin2 it will open up the possibility of building up larger suites of useful library functions. Areas such as string processing, maths and graphics are good candidates and can be optimised by assembly loops where it makes sense for performance.

However right now without the dead code removal in official Spin2, we are somewhat limited and would need to either cut-paste snippets from library code where that is possible and try to construct any libraries using myriads of smaller fully independent files which isn't always convenient to do in cases with shared data, otherwise we risk bloating the final code with an entire library when it may only be a handful of functions needed.

Dave Hein · 2020-10-31 01:26

JonnyMac wrote: »

Don't hate me, but one great thing about C is that you don't have to reinvent wheels like this...

You're not helpful. Please don't post in my threads -- I have absolutely NO interest in anything you have to say.

@JonnyMac, that was uncalled for. Rayman was just pointing out that C has a large library of functions that handle lots of things. In this case it's the strstr function that finds substrings within strings.

Before you also ban me from your threads I will volunteer to not to post to your threads or comment on any of your posts until you publicly apologize to Rayman.

JonnyMac · 2020-10-31 01:38

Before you also ban me from your threads I will volunteer to not to post to your threads or comment on any of your posts until you publicly apologize to Rayman.

Don't hold your breath.

Roy Eltham · 2020-10-31 03:25

Wow, Rayman is one of the nicest guys on here, and has contributed so much for decades, did someone Smile in Jon's coffee today? Geez.

JonnyMac · 2020-10-31 03:43

There is no need to defend the indefensible, nor be surprised that I call Smile out when it happens. Andy was helpful without judgement. Ray knew he was being smug by prefacing his silly comment with "Don't hate me..." He could have -- if he actually cared about helping people -- said, "Hey, Jon, in C there is a strstr() function that you may want to check out." -- being he knows that a lot of C can be translated to Spin (my choice of language). How difficult is that? It's not really, Evan did it.

Dear C programmers: If you the only point of your post is to tell Spin programmers how much better C is, well just don't.

Ken Gracey · 2020-11-01 16:04

I'm locking this thread from further discussion in hopes that we can pause further discord.

Ken Gracey

Is There A Better Way?

Comments