How to search an open file (SD card) for a string?

Patrick1ab · 2010-04-03 14:41

I'm currently working on a function to get the tags of a music file.

ID3v1 (thanks to Harrison Pham) and ID3v2 (my implementation) are already working, although the ID3v2 part is somewhat limited and needs more testing.

Now I'm trying to get iTunes tags working, but there is a lot of data before the really interesting "atoms" appear.
I think this is due to the cover image which is stored in the music file...

Do you know how to search a file for "©nam", "©ART", "©alb" and return the starting position of these 4 byte long entries?

Mike Green · 2010-04-03 15:05

You just have to do the search "the hard way", read in successive 512 byte blocks and scan them for the desired data. I don't think you can search for longs since they might start at any byte address. I may be wrong, but they're probably not aligned on a 4-byte boundary.

Patrick1ab · 2010-04-03 16:00

Okay, but there is one problem. The chance that this happens is very low,
but I think the data could start at the 511th byte of one block and continue at the beginning of the next
512-byte block. So I need to save, for example, the last two bytes of the previous block, merge them with the first two bytes of the next block and compare again.
Sounds a bit complicated, but I should be able to do that.
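
The carry-over idea above can be sketched as follows. This is a Python illustration rather than Spin, and `find_in_stream` is a made-up name, not part of FSRW or any SD driver. It assumes the file is read in 512-byte blocks and keeps the last `len(pattern) - 1` bytes of each block, which is the most a split match can leave behind (three bytes for a four-byte atom name).

```python
import io

def find_in_stream(stream, pattern, block_size=512):
    """Return the absolute offset of the first occurrence of pattern, or -1."""
    keep = len(pattern) - 1      # longest prefix that can sit at the end of a block
    tail = b""                   # bytes carried over from the previous block
    base = 0                     # file offset of tail[0]
    while True:
        block = stream.read(block_size)
        if not block:
            return -1            # end of file, nothing found
        window = tail + block    # old tail glued in front of the new block
        hit = window.find(pattern)
        if hit != -1:
            return base + hit
        new_tail = window[-keep:] if keep else b""
        base += len(window) - len(new_tail)
        tail = new_tail

# "©nam" starts at byte 510, so it straddles the first block boundary.
data = bytes(510) + b"\xa9nam" + bytes(100)
print(find_in_stream(io.BytesIO(data), b"\xa9nam"))   # 510
```

Only a handful of bytes are copied per block, so the extra cost compared with scanning each block on its own is small.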

MagIO2 · 2010-04-03 16:45

You should have 2 buffers of 512 bytes each, then you don't have to move anything. Just do a wraparound when you reach 1023.
This will also increase the speed, as the time that you search in one buffer is already used by FSRW to load the next 512 bytes - that's done in the background.

For example you have a buffer somewhere

byte[@buffer][offset]

can be used to access each byte.

offset:=(offset+1) & $3ff 'this will do the wraparound

And loading the next buffer:

if (offset & $1ff) == 0
  ' a 512-byte boundary was reached: load the next sector
  if offset & $3ff
    ' offset == 512: load upper 512 bytes
  else
    ' offset == 0: load lower 512 bytes
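
Here is a rough Python sketch of the same two-half ring buffer (Python instead of Spin, purely for illustration). `scan_with_ring` is a hypothetical helper, and `stream.read` stands in for whatever call fetches the next sector. The point it shows is that a match can be checked by indexing backwards through the ring with `& MASK`, so nothing is ever copied between the two halves.

```python
import io

SECTOR = 512
MASK = 2 * SECTOR - 1            # $3ff for a 1024-byte ring

def scan_with_ring(stream, pattern):
    """Return the file offset of the first match of pattern, or -1."""
    ring = bytearray(2 * SECTOR)
    n = len(pattern)             # must not exceed SECTOR
    offset = 0                   # read position inside the ring
    filepos = 0                  # number of bytes consumed so far
    remaining = 0                # unread bytes in the half loaded last
    while True:
        if (offset & (SECTOR - 1)) == 0:
            # offset is 0 or 512: refill the half we are about to enter
            sector = stream.read(SECTOR)
            ring[offset:offset + len(sector)] = sector
            remaining = len(sector)
        if remaining == 0:
            return -1            # end of file
        offset = (offset + 1) & MASK
        filepos += 1
        remaining -= 1
        # compare the last n bytes, wrapping backwards through the ring
        if filepos >= n and all(
            ring[(offset - n + k) & MASK] == pattern[k] for k in range(n)
        ):
            return filepos - n

# "©ART" starts at byte 1022, so it wraps from the end of the ring to its start.
data = bytes(1022) + b"\xa9ART" + bytes(10)
print(scan_with_ring(io.BytesIO(data), b"\xa9ART"))   # 1022
```

Re-comparing the whole pattern at every byte is the simplest thing that works; it is not the fastest way to do the comparison.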

Patrick1ab · 2010-04-03 19:49

Good idea! I think that I will split it into 4 smaller buffers:

While Buffer 1 is being filled, Buffer 3 is read and Buffer 2 and Buffer 4 contain the data
before and after Buffer 3. Then rotating the buffers with your solution.

MagIO2 · 2010-04-03 20:51

That's not necessary and to be honest I think it's a bad idea, as SD-card sectors have a size of 512 bytes. FSRW reads the next sector into internal COG RAM in advance. When you request the next 512 bytes these are already there and FSRW is only copying the stuff into HUB RAM, which is pretty fast in PASM.

And rotating does not mean to copy stuff from buffer to buffer, you only have to use the right offset.

offset == 0 -> means you have to request a block into buffer[0..511]. You can detect that by the inner if statement in my example code (if offset & $3ff / but the else case).
offset == 512 -> means you have to request a block into buffer[512..1023].

In case the offset reaches 1024 the & $3ff will set it to zero again.

You only need more buffers if the string you want to search is longer than 512 characters.

Post Edited (MagIO2) : 4/3/2010 8:56:33 PM GMT

Patrick1ab · 2010-04-03 21:16

Hmm sorry, I think I don't really understand.
The problem is, that I will also need to get 4 bytes which are before the search string.
That's where the size of the oncoming atom is written (damn Apple).
If I only split my buffer into 2 pieces and the first or second part is overwritten, I might lose the size information and
have to start again.
I'll show you some source code soon.

MagIO2 · 2010-04-03 21:53

Where is the problem? You have the current buffer and you have the previous 512 bytes in the other buffer.

Let's assume the buffer/sector size is 8 instead of 512

File on SD card:
00000000 00000111 11200000 00001111 13000000
 
Search strings:
111112
111113
 
(Offset & $f) == 0 -> load first sector into buffer
 
Buffer:
00000000
 
You don't find anything - offset is equal to 8 ( offset & $f == $8 ) -> load second sector in upper part of buffer:
 
00000000 00000111
 
You have matches, but your offset reaches 16 before you found the full string ( offset & $f == $0 ) -> load third sector
 
11200000 00000111
 
So you can still go back to the beginning of the string.

Patrick1ab · 2010-04-03 22:55

Yeah okay, I see now:

In my example the string + the size information (8 bytes) is much smaller than the buffer I defined (256 bytes), so nothing will be overwritten by accident.

Hmm, but the search routine won't be fast, because I always have to search the full buffer.
So each of the two parts is run through twice, because the string search function does not know that 111 is the beginning of the string.
Alternatively I would have to check every character, but that's even slower.

In addition the wraparound has to take place in the search function.
Otherwise it will read it exactly like this:
11200000 00000111

I think I've stared at the source code for too long today and since it's 2 am now, I think I'll get some rest.

This is my code so far:

      marker := %0000                       ' bit 0 = album, bit 1 = artist, bit 2 = title found
      \sdfat.seek(4)
      if sdfat.pgetc == "f" and sdfat.pgetc == "t" and sdfat.pgetc == "y" and sdfat.pgetc == "p"
        sdfat.pread(@Metadata, 256)
        ' keep scanning until "mdat" shows up, i.e. until the music data starts
        repeat while str.indexOf(@Metadata, string("mdat")) == -1
          if marker & %0001 == 0
            i := str.indexOf(@Metadata, string(%10101001, "alb"))
            if i <> -1
              bytemove(@size, @Metadata[i+4], 4)
              if size <= 30
                bytemove(@strAlbum, @Metadata[i+20], size)
                strAlbum[size] := 0
                marker += %0001
          if marker & %0010 == 0
            ' the artist atom is "©ART" (upper case), as in the first post
            i := str.indexOf(@Metadata, string(%10101001, "ART"))
            if i <> -1
              bytemove(@size, @Metadata[i+4], 4)
              if size <= 30
                bytemove(@strArtist, @Metadata[i+20], size)
                strArtist[size] := 0
                marker += %0010
          if marker & %0100 == 0
            i := str.indexOf(@Metadata, string(%10101001, "nam"))
            if i <> -1
              bytemove(@size, @Metadata[i+4], 4)
              if size <= 30
                bytemove(@strTitle, @Metadata[i+20], size)
                strTitle[size] := 0
                marker += %0100

I know that at the end of the repeat loop I have to load the next part of the buffer.
I'm not sure if this will work (especially the line after "else"):

        if marker & %1000 == 0
          sdfat.pread(@Metadata, 128)
        else
          sdfat.pread(@Metadata[128], 128)
        marker ^= %1000
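
The offsets used in the code above (size at i+4, text at i+20) match the usual layout of an iTunes text item, as far as I know it: the item atom (size, then a name such as "©alb") wraps a nested "data" atom made of a 4-byte size, the letters "data", 4 bytes of type flags, 4 reserved bytes, and then the text. The Python sketch below illustrates that layout; it is not a port of the Spin code, and `read_text_item` is a made-up name. Two details are worth noting. The size is stored big-endian, so copying it byte for byte into a little-endian Propeller long leaves the bytes in reverse order. It also counts the 16-byte "data" header, so the text itself is size - 16 bytes long.

```python
import struct

def read_text_item(buf, i):
    """buf[i:i+4] is an item name such as b'\\xa9alb'; return its text or None."""
    # Size of the nested 'data' atom, stored big-endian right after the name.
    size = struct.unpack_from(">I", buf, i + 4)[0]
    if buf[i + 8:i + 12] != b"data" or size < 16:
        return None
    # The data atom starts at i+4 and has a 16-byte header, so the text
    # runs from i+20 up to (but not including) i+4+size.
    return bytes(buf[i + 20:i + 4 + size]).decode("utf-8", "replace")

# Build one "©alb" item by hand and read it back.
text = b"Album"
data_atom = struct.pack(">I4s", 16 + len(text), b"data") + bytes([0, 0, 0, 1]) + bytes(4) + text
item = struct.pack(">I4s", 8 + len(data_atom), b"\xa9alb") + data_atom
buf = bytes(3) + item
i = buf.find(b"\xa9alb")
print(read_text_item(buf, i))   # Album
```

The four bytes in front of the name hold the size of the whole item, which is what you would add to the position to jump to the next item.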

Post Edited (Patrick1ab) : 4/4/2010 8:10:33 PM GMT

MagIO2 · 2010-04-04 13:39

There is no need to run to it twice ... you only need some clever programming and some more variables.

You'd run through the whole file simply by incrementing offset as already shown.
For each search-string you have a counter which starts at 0 and counts the characters matched so far. You only increment it when the character in the file-buffer and in the search string are the same. When it reaches stringlength you know that you found that search-string.

Just some pseudo-code

repeat filepos from 0 to filelength
  repeat number of searchstrings
    if buffer[offset] == str[num][actual_char[num]]
      if actual_char[num] == 0
        store_start_address[num] := filepos
      actual_char[num]++
    else
      actual_char[num] := 0
    if actual_char[num] == stringlength[num]
      ' found
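
A runnable Python version of this pseudo-code might look like the sketch below (Python for illustration only; `scan` and its variable names are made up). It adds one step the pseudo-code leaves out: after a mismatch, the current byte is tested once more against the first character, so that "©©nam" still matches "©nam". This is not a full Knuth-Morris-Pratt search. It is only reliable for patterns whose first byte does not occur again later in the pattern, which holds for the four-byte atom names but not for a test string like 111112.

```python
def scan(data, patterns):
    """One pass over data; return {pattern index: start offset} for each hit."""
    progress = [0] * len(patterns)      # characters matched so far, per pattern
    start = [0] * len(patterns)         # where the current partial match began
    found = {}
    for filepos, byte in enumerate(data):
        for num, pat in enumerate(patterns):
            if num in found:
                continue                # this pattern is already located
            if byte != pat[progress[num]]:
                progress[num] = 0       # mismatch: start over
            if byte == pat[progress[num]]:
                if progress[num] == 0:
                    start[num] = filepos
                progress[num] += 1
                if progress[num] == len(pat):
                    found[num] = start[num]
    return found

data = bytes(7) + b"\xa9\xa9nam" + bytes(3) + b"\xa9alb"
print(scan(data, [b"\xa9nam", b"\xa9ART", b"\xa9alb"]))   # {0: 8, 2: 15}
```

Each byte of the file is looked at once per search string, which is what removes the need to run through a buffer twice.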

Patrick1ab · 2010-04-04 20:05

I think that it is not possible to read iTunes tags with the limited RAM space of the Propeller.

With the code I posted, it can happen that you find the search string and try to read the data, but the buffer doesn't contain the data, so
you try to read past the buffer.

In my example I have a buffer of 256 bytes.
If the search string is at buffer[252], the program would try to load buffer[260] to get the size.
Since this isn't part of the buffer anymore, junk data will be written into the size variable.
The same will happen to the data fields like strAlbum, strArtist or strTitle.

MagIO2 · 2010-04-04 20:44

You found the string ... fine ... you add 4 to the offset (and do an & $1ff) and if necessary load the next sector in the one of the two buffers. I don't see your problem. As long as the data you want to access is not longer than 256 bytes.

Can you provide a sample file and all the tags you search for?

MagIO2 · 2010-04-04 22:17

I prepared a demo for you. Only requirements are:

SD card - please change the constant SDpinBase according to your setup

The card needs a file named SWAP3.SYS which has at least a size of 6kB. Of course you could also use an iTunes file and change the search-strings.

Parallax Serial Terminal - this will start the program (press any key) and show the result.

PS: You can copy the title, author and stuff to other string variables in the same way I read the size. For example, when you find a string you set a copyTo variable according to which string has been found. And when (state == $ff) & (size > 0) you simply copy the current buffer byte into byte[ copyTo++ ] and decrease the size.

Post Edited (MagIO2) : 4/4/2010 10:30:38 PM GMT

Patrick1ab · 2010-04-04 22:34

Great! Thank you very much

Concerning the requirements:

1. and 2. are no problem,

but I guess that Parallax Serial Terminal won't work if you program a Prop with Prop Loader using a Propeller Demoboard.
I'll probably modify it to output on my LCD.

MagIO2 · 2010-04-04 22:40

My guess is that the prop-loader will free the serial connection after loading the prop. The Parallax Serial Terminal, on the other hand, frees the com-port each time it loses focus, so I'd expect no problem.

Patrick1ab · 2010-04-04 23:29

I just tried and I guess it's like I expected:

The demoboard is just programming the prop but it has no program afterwards to route the traffic to the computer. So my prop might send serial data to the demoboard, but when I select COM3 on my computer and click enable nothing arrives.

I'll have to change the output to my LCD, although it's only 4x20 characters big.

Okay, the output to the LCD works. The file content is being updated a bit too fast for my eyes, but I can read the other data.

Your program seems to find all three strings. This is what my display shows me:

filesize: 9114411 bytes

found 0 size: 73140000
found 1 size: 00000000
found 2 size: 00000000

As a criterion to exit the repeat loop, I will add the search for the "mdat" string, because this is where the tag stops and the music data begins.

Post Edited (Patrick1ab) : 4/5/2010 12:08:48 AM GMT

Patrick1ab · 2010-04-05 11:13

Damn, I'm stupid!

"this prepares a file to contain some search-strings" surely means write to that file.
I should have read (and understood) the comments, before typing in the filename of my music file.

Okay, when it wrote the search strings into the file and ran through the search routine, it was able to find every string.
Now that I removed the "writing part" and copied a backup of my music file to the SD card, it isn't able to find the search strings.
I think I have to reduce the string length to 4 to make it work again, because the strings in the music file aren't zero terminated.

Although, it should still be 4.... since only the first 4 bytes are being compared... hm, strange

Post Edited (Patrick1ab) : 4/5/2010 11:22:37 AM GMT

Patrick1ab · 2010-04-15 22:46

Great! Finally it works...

but there is a new problem coming up:
I measured the time it takes to read the iTunes tag until the "mdat" atom appears (in this file at byte 605078).

Guess what my stopwatch says: 2 minutes and 53 seconds

(current data output was switched off)

Does anyone know how much speed you gain by changing Spin code to assembler? I'll bet it isn't 157 times faster, is it?

MagIO2 · 2010-04-16 07:26

I think you should know the data-structure of the filetype a bit better. Isn't there an offset at the beginning of the file which tells you where the TAG-area starts? Then you can skip all the music data very fast.

When I think about it .... there MUST be a pointer to the TAG area. Music data can be anything, so it can also 'accidentally' look like a tag. I'd google for the iTunes file format to find out more.

Patrick1ab · 2010-04-16 09:39

MagIO2 said...
I think you should know the data-structure of the filetype a bit better. Isn't there an offset at the beginning of the file which tells you where the TAG-area starts? Then you can skip all the music data very fast.

Normally the tag area starts directly at the beginning of these files, so I don't need to skip music data.
The tag ends with mdat, which signals that music data is going to follow.

The problem is that this tag sometimes includes the album cover, which takes several kilobytes of space.
Maybe I'll try to look for this first and then jump the number of bytes to skip it.
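
One way to do that jump is to use the atom sizes themselves instead of scanning byte by byte. As far as I understand the format, every atom starts with a 4-byte big-endian size (which includes the 8-byte header) followed by a 4-byte type, and the iTunes tags usually sit under moov > udta > meta > ilst, with the cover in a "covr" atom. The Python sketch below (Python for illustration, not Spin) walks that tree on a synthetic file and seeks past everything it does not need. `walk`, `atom` and `text_item` are made-up helpers, and the special sizes 0 and 1 are not handled.

```python
import io
import struct

CONTAINERS = {b"moov", b"udta", b"meta", b"ilst"}   # atoms that hold child atoms

def walk(f, end, wanted, out):
    """Visit the atoms between the current position and end, filling out."""
    while f.tell() + 8 <= end:
        start = f.tell()
        size, kind = struct.unpack(">I4s", f.read(8))
        if size < 8:
            return                      # 0 / 1 are special sizes, skipped here
        if kind in CONTAINERS:
            if kind == b"meta":
                f.seek(4, io.SEEK_CUR)  # meta carries 4 version/flag bytes first
            walk(f, start + size, wanted, out)
        elif kind in wanted:
            body = f.read(size - 8)     # nested 'data' atom: 16-byte header + text
            out[kind] = body[16:].decode("utf-8", "replace")
        f.seek(start + size)            # jump over the rest, cover art included

def atom(kind, payload):
    return struct.pack(">I4s", 8 + len(payload), kind) + payload

def text_item(kind, text):
    return atom(kind, atom(b"data", bytes([0, 0, 0, 1]) + bytes(4) + text.encode("utf-8")))

# Synthetic file: a 5000-byte "cover" sits in front of the text tags.
cover = atom(b"covr", atom(b"data", bytes(8) + bytes(5000)))
ilst = atom(b"ilst", cover + text_item(b"\xa9nam", "Title") + text_item(b"\xa9ART", "Artist"))
moov = atom(b"moov", atom(b"udta", atom(b"meta", bytes(4) + ilst)))
demo = atom(b"ftyp", b"M4A \x00\x00\x00\x00") + moov + atom(b"mdat", bytes(100))

tags = {}
walk(io.BytesIO(demo), len(demo), {b"\xa9nam", b"\xa9ART"}, tags)
print(tags)   # {b'\xa9nam': 'Title', b'\xa9ART': 'Artist'}
```

With this approach only the 8-byte headers and the wanted items are read, so the time spent no longer depends on how large the cover image is.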
