How to search an open file (SD card) for a string?
Patrick1ab
Posts: 136
I'm currently working on a function to get the tags of a music file.
ID3v1 (thanks to Harrison Pham) and ID3v2 (my implementation) are already working, although the ID3v2 part is somewhat limited and needs more testing.
Now I'm trying to get iTunes tags working, but there is a lot of data before the really interesting "atoms" appear.
I think this is due to the cover image which is stored in the music file...
Do you know how to search a file for "©nam", "©ART" and "©alb" and return the starting position of these 4-byte entries?
Comments
but I think the data could start at the 511th byte of one 512-byte block and continue at the beginning of the next.
So I need to save, for example, the last two bytes of the previous block, merge them with the first two bytes of the next block, and compare again.
Sounds a bit complicated, but I should be able to do that.
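The carry-over idea can be sketched like this. It is a Python illustration rather than Spin, and `find_in_blocks` is a made-up name; it assumes the file arrives as a sequence of blocks (512 bytes each with FSRW) and keeps the last `len(pattern) - 1` bytes of every block so that a match straddling a boundary is still seen:

```python
def find_in_blocks(blocks, pattern):
    """Return the absolute offset of the first match of pattern, or -1."""
    tail = b""   # last len(pattern)-1 bytes carried over from earlier blocks
    base = 0     # absolute file offset of the start of the current block
    keep = len(pattern) - 1
    for block in blocks:
        window = tail + block          # carried bytes + fresh block
        hit = window.find(pattern)
        if hit >= 0:
            # window starts len(tail) bytes before the current block
            return base - len(tail) + hit
        tail = window[-keep:] if keep > 0 else b""
        base += len(block)
    return -1


# "\xa9nam" placed so that it straddles the 512-byte boundary
data = b"x" * 510 + b"\xa9nam" + b"y" * 510
blocks = [data[i:i + 512] for i in range(0, len(data), 512)]
print(find_in_blocks(blocks, b"\xa9nam"))
```

For a 4-byte name, three carried bytes are enough to cover every possible split.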
This also increases the speed: while you search one buffer, FSRW is already loading the next 512 bytes in the background.
For example, you have a buffer somewhere.
byte[@buffer][offset]
can be used to access each byte.
offset := (offset + 1) & $3ff   ' this does the wraparound
And loading the next buffer:
if (offset & $1ff) == 0     ' offset is 0 or 512, i.e. at the start of a half
  ' load next buffer
  if offset & $3ff          ' offset == 512
    ' load upper 512 bytes
  else                      ' offset == 0
    ' load lower 512 bytes
While Buffer 1 is being filled, Buffer 3 is read, and Buffer 2 and Buffer 4 contain the data
before and after Buffer 3. The buffers are then rotated using your solution.
And rotating does not mean copying data from buffer to buffer; you only have to use the right offset.
offset == 0 -> means you have to request a block into buffer[0..511]. You can detect that with the inner if statement in my example code (if offset & $3ff, but the else case).
offset == 512 -> means you have to request a block into buffer[512..1023].
When the offset reaches 1024, the & $3ff sets it back to zero.
You only need more buffers if the string you want to search for is longer than 512 characters.
Post Edited (MagIO2) : 4/3/2010 8:56:33 PM GMT
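The two-half buffer with the wraparound offset can be simulated as follows. This is only a Python sketch of the indexing, not the Spin code: `stream_bytes` is a hypothetical name, and the refill happens synchronously when the offset enters a half, whereas FSRW would do it in the background.

```python
import io

SECTOR = 512
RING = 2 * SECTOR            # 1024 bytes, so "& 0x3FF" wraps the offset


def stream_bytes(f):
    """Yield every byte of file-like f through a 1024-byte two-half buffer."""
    ring = bytearray(RING)
    valid = [0, 0]           # number of valid bytes in lower / upper half
    offset = 0
    while True:
        half = offset >> 9   # 0 = lower half, 1 = upper half
        if (offset & (SECTOR - 1)) == 0:
            # offset is 0 or 512: fill the half we are about to read
            chunk = f.read(SECTOR)
            ring[offset:offset + len(chunk)] = chunk
            valid[half] = len(chunk)
        if (offset & (SECTOR - 1)) >= valid[half]:
            return           # no more file data in this half
        yield ring[offset]
        offset = (offset + 1) & (RING - 1)   # same role as "& $3ff"


data = bytes(range(256)) * 5             # 1280 bytes: two full sectors + a rest
copy = bytes(stream_bytes(io.BytesIO(data)))
print(len(copy), copy == data)
```

No bytes are ever moved between the halves; only the offset decides which half is read and which is refilled.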
The problem is that I also need the 4 bytes that come right before the search string.
That's where the size of the atom is stored (damn Apple).
If I only split my buffer into 2 pieces and the first or second part is overwritten, I might lose the size information and
have to start again.
I'll show you some source code soon.
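For reference, reading that size field could look like this in Python. It is a sketch with a made-up helper name, `atom_size_before`; it relies on the MP4 atom layout, where a 32-bit big-endian size is immediately followed by the 4-byte atom name, and it assumes the whole header is already in the buffer:

```python
import struct


def atom_size_before(buf, name_pos):
    """Return the big-endian 32-bit size stored in the 4 bytes before name_pos."""
    if name_pos < 4:
        raise ValueError("size field would start before the buffer")
    (size,) = struct.unpack_from(">I", buf, name_pos - 4)
    return size


# hypothetical atom: size 0x21 (33 bytes), name "\xa9nam", then payload bytes
buf = b"\x00\x00\x00\x21\xa9nam" + b"\x00" * 25
pos = buf.find(b"\xa9nam")
print(pos, atom_size_before(buf, pos))
```

The `name_pos < 4` check is the case discussed above: if the name is found at the very start of a freshly loaded buffer half, the size bytes live in the previous half.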
Let's assume the buffer/sector size is 8 instead of 512
In my example the string + the size information (8 bytes) is much smaller than the buffer I defined (256 bytes), so nothing will be overwritten by accident.
Hmm, but the search routine won't be fast, because I always have to search the full buffer.
So each of the two parts is run through twice, because the string search function does not know that 111 is the beginning of the string.
Alternatively I would have to check every character, but that's even slower.
In addition the wraparound has to take place in the search function.
Otherwise it will read it exactly like this:
11200000 00000111
I think I've stared at the source code for too long today, and since it's 2 am now, I'll get some rest.
This is my code so far:
I know that at the end of the repeat loop I have to load the next part of the buffer.
I'm not sure if this will work (especially the line after "else"):
Post Edited (Patrick1ab) : 4/4/2010 8:10:33 PM GMT
You'd run through the whole file simply by incrementing offset as already shown.
For each search string you have a counter which runs from 0 to stringlength-1. But you only increment it when the character in the file buffer and the character in the search string are the same. Once the character at stringlength-1 has matched, you know that you found that search string.
Just some pseudo-code
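The counter idea can be sketched as follows. This is a Python illustration, not the Spin pseudo-code from the post, and the names `StreamMatcher` and `find_all` are invented. One simplification is labelled in the code: on a mismatch the counter restarts instead of backtracking, which is only safe for patterns whose first byte does not occur again inside the pattern (true for "©nam", "©ART", "©alb" and "mdat").

```python
class StreamMatcher:
    """Feed bytes one at a time; one match counter per search string."""

    def __init__(self, patterns):
        self.patterns = list(patterns)
        self.count = [0] * len(self.patterns)   # matched prefix length

    def feed(self, byte):
        """Return the indices of all patterns that end at this byte."""
        done = []
        for i, pat in enumerate(self.patterns):
            if byte == pat[self.count[i]]:
                self.count[i] += 1
            else:
                # simplification: restart, letting this byte open a new attempt
                self.count[i] = 1 if byte == pat[0] else 0
            if self.count[i] == len(pat):
                done.append(i)
                self.count[i] = 0
        return done


def find_all(data, patterns):
    """Return (pattern index, start offset) for every match in data."""
    matcher = StreamMatcher(patterns)
    hits = []
    for pos, byte in enumerate(data):
        for i in matcher.feed(byte):
            hits.append((i, pos - len(patterns[i]) + 1))
    return hits


patterns = [b"\xa9nam", b"\xa9ART", b"\xa9alb"]
data = b"junk\xa9\xa9namzz\xa9ART"
print(find_all(data, patterns))
```

Because the matcher only ever looks at the current byte, it does not care where the 512-byte boundaries fall.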
With the code I posted, it can happen that you find the search string and try to read the data, but the buffer doesn't contain the data, so
you try to read past the end of the buffer.
In my example I have a buffer of 256 bytes.
If the search string is at buffer[252], the program would try to load buffer[260] to get the size.
Since this isn't part of the buffer anymore, junk data will be written into the size variable.
Same will happen to the data fields like strAlbum, strArtist or strTitle
Can you provide a sample file and all the tags you search for?
1. SD card - please change the constant SDpinBase according to your setup
2. The card needs a file named SWAP3.SYS which has at least a size of 6kB. Of course you could also use an iTunes file and change the search strings.
3. Parallax Serial Terminal - this will start the program (press any key) and show the result.
PS: You can copy the title, author and so on to other string variables in the same way I read the size. For example, when you have found a string, you set a copyTo variable according to which string was found. And when (state == $ff) & (size > 0), you simply copy the current buffer byte into byte[ copyTo++ ] and decrease the size.
Post Edited (MagIO2) : 4/4/2010 10:30:38 PM GMT
Concerning the requirements:
1. and 2. are no problem,
but I guess that Parallax Serial Terminal won't work if you program a Prop with Prop Loader using a Propeller Demoboard.
I'll probably modify it to output on my LCD.
The demoboard only programs the prop; afterwards it runs no program that routes the serial traffic to the computer. So my prop might send serial data to the demoboard, but when I select COM3 on my computer and click enable, nothing arrives.
I'll have to change the output to my LCD, although it's only 4x20 characters big.
Okay, the output to the LCD works. The file content is being updated a bit too fast for my eyes, but the other data I can read.
Your program seems to find all three strings. This is what my display shows me:
filesize: 9114411 bytes
found 0 size: 73140000
found 1 size: 00000000
found 2 size: 00000000
As a criterion for exiting the repeat loop, I will add a search for the "mdat" string, because this is where the tag stops and the music data begins.
Post Edited (Patrick1ab) : 4/5/2010 12:08:48 AM GMT
"this prepares a file to contain some search-strings" surely means write to that file.
I should have read (and understood) the comments, before typing in the filename of my music file.
Okay, when it wrote the search strings into the file and ran through the search routine, it was able to find every string.
Now that I removed the "writing part" and copied a backup of my music file to the SD card, it isn't able to find the search strings.
I think I have to reduce the string length to 4 to make it work again, because the strings in the music file aren't zero terminated.
Although, it should still be 4.... since only the first 4 bytes are being compared... hm, strange
Post Edited (Patrick1ab) : 4/5/2010 11:22:37 AM GMT
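One thing that might be worth ruling out here (an assumption, nothing in the thread confirms it is the cause): in MP4/iTunes files the © in these atom names is the single byte $A9, so "©nam" really is 4 bytes. If the search string in the source file is stored in a multi-byte encoding such as UTF-8, the © becomes two bytes and a 4-byte comparison cannot match. A small Python check of the two byte sequences:

```python
tag = "\u00a9nam"                    # the text "©nam"

as_latin1 = tag.encode("latin-1")    # one byte per character
as_utf8 = tag.encode("utf-8")        # "©" becomes two bytes

print(as_latin1.hex(" "), len(as_latin1))
print(as_utf8.hex(" "), len(as_utf8))
```

Defining the search strings as explicit byte values ($A9, "n", "a", "m") avoids the question of how the editor stores the © character.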
but there is a new problem coming up:
I measured the time it takes to read the iTunes tag until the "mdat" atom appears (in this file at the 605078th byte).
Guess what my stopwatch says: 2 minutes and 53 seconds (current data output was switched off)
Does anyone know how much speed you gain by changing Spin code to assembler? I'll bet it isn't 157 times faster, is it?
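To put those numbers in perspective, here is the arithmetic as a few lines of Python. Only the byte count, the stopwatch time and the factor 157 come from the post; the rest is plain division:

```python
bytes_read = 605_078            # position of "mdat" in this file
elapsed_s = 2 * 60 + 53         # 2 minutes 53 seconds

rate = bytes_read / elapsed_s   # bytes scanned per second
target_s = elapsed_s / 157      # run time if the scan were 157 times faster

print(f"{rate:.0f} bytes/s, {target_s:.2f} s at 157x")
```

So the Spin search currently scans roughly 3.5 kB per second, and a 157-fold speedup would bring the scan down to about 1.1 seconds.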
When I think about it... there MUST be a pointer to the tag area. Music data can be anything, so it can also 'accidentally' look like a tag. I'd google the iTunes file format to find out more.
Normally the tag area starts directly at the beginning of these files, so I don't need to skip music data.
The tag ends with mdat, which signals that music data is going to follow.
The problem is that this tag sometimes includes the album cover, which takes several kilobytes of space.
Maybe I'll try to look for this first and then jump the number of bytes to skip it.
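Jumping over atoms by their size, instead of scanning every byte, could look roughly like this. It is a Python sketch under stated assumptions: the names `walk_atoms` and `find_atom` are invented, the usual iTunes nesting moov > udta > meta > ilst is assumed, `meta` is assumed to carry 4 extra version/flags bytes, and the special sizes 0 and 1 (atom runs to end of file, 64-bit size) are not handled. The whole file is held in memory here only to keep the example short.

```python
import struct


def walk_atoms(buf, start=0, end=None):
    """Yield (type, offset, size) for each atom found in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack_from(">I4s", buf, pos)
        if size < 8:          # special sizes 0 and 1 are not handled here
            break
        yield kind, pos, size
        pos += size           # jump over the whole atom, cover art included


def find_atom(buf, path):
    """Follow a list of atom names; return the payload range of the last one."""
    start, end = 0, len(buf)
    for name in path:
        for kind, pos, size in walk_atoms(buf, start, end):
            if kind == name:
                start, end = pos + 8, pos + size
                if name == b"meta":
                    start += 4    # assumed version/flags bytes of "meta"
                break
        else:
            return None           # this level has no atom with that name
    return start, end
```

With this approach a cover image of several kilobytes costs one header read and one seek. In real files the text is usually wrapped in a nested "data" atom inside "©nam", so the returned range would need one more unpacking step.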