Parsing a File
pbhuter
Posts: 36
How fast could a Prop chip parse and extract data from an ASCII text file that is 5000 rows and 10 columns? I could break the file down into small pieces and parse it with multiple cogs to speed things up. Thanks.
Comments
Does anyone have an idea of how long it will take to parse an ASCII text file with 50,000 or so entries? Thanks.
and "...ASCII text file with 50,000 or so entries" and "...five times a second"
This makes no sense to me. Why not just parse the ASCII text, whatever numbers you have in columns and rows, just once?
Perhaps you have to initially parse it into binary data written out to a second file. Then read that binary data back into memory and use it as is.
However, 50 thousand numbers won't fit into 32K RAM.
So you may end up repeatedly reading that binary into RAM in smaller chunks and using it for whatever purpose. As binary data, 5 times per second straight off the file, it may well be doable.
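A minimal sketch of the "parse once to binary, then read it back in chunks" idea, written in Python for illustration rather than Spin. The record layout (10 little-endian 32-bit integers per row), the whitespace separator, and the function names are assumptions, not anything taken from the original file.

```python
import io
import struct

# Assumed layout: one row = 10 little-endian signed 32-bit integers (40 bytes).
RECORD = struct.Struct("<10i")

def text_to_binary(text_lines, out):
    """Parse whitespace-separated integer rows once and write packed records."""
    count = 0
    for line in text_lines:
        fields = line.split()
        if len(fields) != 10:
            continue  # skip blank or malformed rows
        out.write(RECORD.pack(*(int(f) for f in fields)))
        count += 1
    return count

def read_chunks(binary, rows_per_chunk):
    """Yield lists of row tuples, holding at most rows_per_chunk rows at a time."""
    while True:
        data = binary.read(RECORD.size * rows_per_chunk)
        if not data:
            break
        # Ignore any trailing partial record instead of failing on it.
        last_start = len(data) - RECORD.size
        yield [RECORD.unpack_from(data, off)
               for off in range(0, last_start + 1, RECORD.size)]

# Usage: an in-memory buffer stands in for the file on the SD card.
demo = io.BytesIO()
text_to_binary(["1 2 3 4 5 6 7 8 9 10", "11 12 13 14 15 16 17 18 19 20"], demo)
demo.seek(0)
for chunk in read_chunks(demo, rows_per_chunk=1):
    print(chunk[0][-1])  # last column of each row
```

The point of the fixed-size record is that no text parsing happens on the repeated reads; each pass is just block reads and offset arithmetic.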
If not, get an external RAM onto your Prop and pull it in from there.
So where is it coming from, how is it being delivered, and how fast can that be?
I don't see any mention of your data source here.
Thing is, if it's a stream from which you have to extract some stuff, then there is no "file" anywhere. What I mean is there is no need to have it all stored anywhere for any time. Just let it pass by, pull out what you want, and discard the original stream as you go.
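To illustrate the streaming approach, here is a rough Python sketch that looks at one line at a time and keeps only the rows it cares about. The function name, the whitespace separator, and the idea that the key sits in a particular column are all assumptions made for the example.

```python
import io

def matching_rows(stream, key, key_column=0, sep=None):
    """Scan lines one at a time and yield the fields of rows whose key matches.

    Only the current line is held in memory; everything else is discarded
    as it passes by. sep=None means split on any run of whitespace.
    """
    for line in stream:
        fields = line.split(sep)
        if len(fields) > key_column and fields[key_column] == key:
            yield fields

# Usage: a StringIO stands in for a serial port or other incoming stream.
incoming = io.StringIO("a 1 2\nb 3 4\na 5 6\n")
for row in matching_rows(incoming, "a"):
    print(row)
```

Because it is a generator, memory use does not grow with the length of the stream, which is the property that matters on a chip with 32K of RAM.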
Attached is a txt file with some data. Could you parse it and find the very last entry (last row, last column)? Thanks.
Good and bad news.
Good news: Yes, it can be done on a Propeller. See screen grab. This is your 12k text file. I downloaded it via xmodem to the Propeller, then ran a small program, and it prints out your last row, last column entry. I'd prefer C or Sbasic but Mbasic does have a rather nice command for finding the end of files.
Bad news: It took 5 seconds to read the file off the SD card.
But - it is possible to do slowly. I think it would be much much faster using Kyedos and some custom Spin code, plus that would not involve external memory.
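For the specific "last row, last column" question, one way to avoid reading the whole file is to seek near the end and parse only the tail. This is a Python sketch of that idea, not the Mbasic program described above; the whitespace separator and the 256-byte tail size are assumptions.

```python
import io
import os

def last_field(f, tail_bytes=256):
    """Return the last whitespace-separated field of the last non-empty line.

    f must be a seekable binary file object. Assumes the final field fits
    within the last tail_bytes bytes of the file.
    """
    f.seek(0, os.SEEK_END)
    size = f.tell()
    f.seek(max(0, size - tail_bytes))          # jump close to the end
    tail = f.read().decode("ascii", errors="replace")
    lines = [ln for ln in tail.splitlines() if ln.strip()]
    if not lines:
        return None                            # empty file or only blank lines
    return lines[-1].split()[-1]

# Usage: a BytesIO stands in for the text file on the SD card.
sample = io.BytesIO(b"1 2 3\n4 5 6\n7 8 9.25\n")
print(last_field(sample))
```

Whether this helps on real hardware depends on how cheaply the file system driver can seek, which varies between SD card libraries.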
"I want to go through the ASCII file until I find a line or block of data that matches the data from my data source. Once that line or block of data is found, I need to read additional information from the ASCII file associated with that line or block of data."
As far as parsing goes, which is probably a more relevant performance measure, I created a test to find the last column of data on EACH row and measure how long that takes. Parsing every item on a row up to the last column gives the worst case.
I get 1.38s for Spin to find the last column on each of 250 rows at 80MHz.
So, parsing 50,000 such records in Spin would take at least 4.6 minutes, and that's without reading from the SD card. Written in PASM, you might knock that down to maybe 15 s, since Spin is about 20 times slower than PASM, but again you still have the SD card overhead.
--Steve
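For anyone checking the arithmetic in the estimate above, it scales the measured 1.38 s for 250 rows linearly to 50,000 rows and then divides by the rough 20x Spin-to-PASM factor. A short Python sketch, assuming parse time really is linear in the number of rows (the function and parameter names are made up for illustration):

```python
def estimate_seconds(target_rows, measured_seconds=1.38, measured_rows=250):
    """Scale a measured parse time linearly to a different row count."""
    return measured_seconds / measured_rows * target_rows

SPIN_TO_PASM = 20  # rough speed ratio quoted in the post above

spin_total = estimate_seconds(50_000)   # about 276 s
pasm_total = spin_total / SPIN_TO_PASM  # about 13.8 s

print(f"Spin: {spin_total:.1f} s ({spin_total / 60:.1f} min)")
print(f"PASM: {pasm_total:.1f} s")
```

That gives 276 s, or 4.6 minutes, for Spin and roughly 14 s for PASM, in line with the "maybe 15s" figure. Neither number includes SD card read time.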
Then we have incoming data X, Y.
We want to match X, Y with something in the table, say A and B.
Then we want to return the corresponding C, D, E... for that A, B.
Does that sound correct?
If so, the table should be pre-parsed only once and held as binary numbers somewhere. The incoming data chunks are small and can be parsed very quickly as they arrive. Probably want to store the numbers as scaled integers rather than floating point as it looks like they only have a couple of decimal places.
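As a sketch of the scaled-integer idea: a value like 12.34 can be stored as the integer 1234 (scaled by 100), so no floating point is needed on the chip. The Python below is illustrative only; the function name, the default of two decimal places, and the choice to truncate extra digits rather than round are assumptions.

```python
def parse_scaled(text, decimals=2):
    """Convert a decimal string such as '-12.5' to an integer scaled by
    10**decimals, using integer arithmetic only. Extra digits are truncated."""
    text = text.strip()
    if not text:
        raise ValueError("empty field")
    negative = text[0] == "-"
    if text[0] in "+-":
        text = text[1:]                        # drop the sign character
    whole, _, frac = text.partition(".")
    frac = (frac + "0" * decimals)[:decimals]  # pad or truncate the fraction
    value = int(whole or "0") * 10 ** decimals + int(frac or "0")
    return -value if negative else value

# Usage: both the table entries and the incoming X, Y would be converted
# with the same scale so they can be compared as plain integers.
print(parse_scaled("12.34"), parse_scaled("-0.5"), parse_scaled("7"))
```

With two decimal places, a signed 32-bit long still covers values up to about plus or minus 21 million, which is probably ample for data like this.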
So basically the problem is to search for the X, Y values in the table on each new input. Searching linearly through the table on each new input will be slow, so something smarter should be done. Basically the incoming X, Y value is a key for which we have to find the corresponding value in the table.
A simple thing would be to have the table sorted at start up in increasing X, Y. Then for each incoming key value a binary search on the table will quickly find you the value you want. Given that the table is now basically arrays of values, a binary search is easy to do.
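A minimal Python sketch of the sort-once, binary-search-per-input scheme. It assumes each row is a tuple of integers (A, B, C, D, E) with (A, B) as the key, and it uses the standard `bisect` module in place of a hand-written search; the function names are made up for the example.

```python
import bisect

def build_table(rows):
    """Sort rows once by their (A, B) key and return (keys, sorted_rows)."""
    table = sorted(rows, key=lambda r: (r[0], r[1]))
    keys = [(r[0], r[1]) for r in table]
    return keys, table

def lookup(keys, table, x, y):
    """Binary-search for key (x, y); return the associated values or None."""
    i = bisect.bisect_left(keys, (x, y))
    if i < len(keys) and keys[i] == (x, y):
        return table[i][2:]
    return None

# Usage with a tiny made-up table of (A, B, C, D, E) rows.
keys, table = build_table([(3, 1, 30, 31, 32),
                           (1, 2, 10, 11, 12),
                           (2, 5, 20, 21, 22)])
print(lookup(keys, table, 2, 5))
print(lookup(keys, table, 9, 9))
```

For 50,000 entries a binary search needs at most about 16 key comparisons per lookup, compared with up to 50,000 for a linear scan.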
Problem: the table still does not fit in HUB RAM. But it could be held in external RAM and the binary search done on it there. I think there is plenty of speed for that.
However in the absence of other "real-time" tasks in this application perhaps a different processor/micro-controller with enough RAM or FLASH to hold the table would be a simpler approach.
Paul