Parsing a File — Parallax Forums

pbhuter Posts: 36
edited 2010-08-11 00:39 in Propeller 1
How fast could a Prop chip parse and extract data from an ASCII text file that is 5000 rows and 10 columns? I could break the file down into small pieces and parse it with multiple cogs to speed things up. Thanks.

Comments

  • mctrivia Posts: 3,772
    edited 2010-08-10 13:44
    Pretty quick, depending on the table format. Take a look at the WAV player object in the OBEX: it reads 44,000 samples per second with time to spare.
  • pbhuter Posts: 36
    edited 2010-08-10 14:55
    I need to be able to read my file multiple times a second. So, in theory, with say five cogs running it, I could read 50,000 entries (5000 rows by 10 columns) five times a second? That would be awesome.
  • Leon Posts: 7,620
    edited 2010-08-10 14:56
    Parsing a large file like that is completely different from playing a sound file. The latter can be done "on the fly".
  • pbhuter Posts: 36
    edited 2010-08-10 15:05
    So no go on reading a file five times a second.

    Does anyone have an idea of how long it will take to parse an ASCII text file with 50,000 or so entries? Thanks.
  • Leon Posts: 7,620
    edited 2010-08-10 15:17
    What do you mean by "parsing"?
  • Heater. Posts: 21,230
    edited 2010-08-10 15:19
    pbhuter: "I need to be able to read my file multiple times a second."
    and "...ASCII text file with 50,000 or so entries" and "...five times a second"

    This makes no sense to me. Why not just parse the ASCII text, whatever numbers you have in columns and rows, just once?

    Perhaps you have to initially parse it into binary data written out to a second file. Then read that binary data back into memory and use it as is.

    However, 50 thousand numbers won't fit into 32K RAM.

    So you may end up repeatedly reading that binary data into RAM in smaller chunks and using it for whatever purpose. Reading it as binary straight off the file five times per second may well be doable.

    If not, add external RAM to your Prop and pull the data in from there.
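    A minimal sketch of the one-time pre-parse step described above, written in C rather than Spin. It assumes each row holds up to 10 comma- or space-separated decimal values with at most two decimal places, and stores each value as a scaled integer in hundredths, so "12.5" becomes 1250. The names `parse_row` and `parse_fixed2` are hypothetical. At 4 bytes per value, 50,000 values need about 200,000 bytes, which is why even the binary form will not fit in the Propeller's 32 KB hub RAM.

```c
#include <ctype.h>
#include <stdint.h>

#define MAX_COLS 10  /* assumption: up to 10 columns per row */

/* Parse one decimal field such as "-12.3" into hundredths (-1230).
   Digits beyond the second decimal place are truncated.
   Returns a pointer just past the characters consumed. */
static const char *parse_fixed2(const char *p, int32_t *out)
{
    int32_t sign = 1, whole = 0, frac = 0;
    int frac_digits = 0;

    if (*p == '-') { sign = -1; p++; }
    else if (*p == '+') { p++; }
    while (isdigit((unsigned char)*p))
        whole = whole * 10 + (*p++ - '0');
    if (*p == '.') {
        p++;
        while (isdigit((unsigned char)*p)) {
            if (frac_digits < 2) { frac = frac * 10 + (*p - '0'); frac_digits++; }
            p++;
        }
    }
    while (frac_digits < 2) { frac *= 10; frac_digits++; }  /* "12.5" -> 50 */
    *out = sign * (whole * 100 + frac);
    return p;
}

/* Parse one text row into scaled integers; returns the number of fields. */
int parse_row(const char *line, int32_t cols[MAX_COLS])
{
    int n = 0;
    const char *p = line;

    while (n < MAX_COLS) {
        while (*p == ' ' || *p == '\t' || *p == ',') p++;   /* skip separators */
        if (*p == '\0' || *p == '\r' || *p == '\n') break;  /* end of row */
        const char *next = parse_fixed2(p, &cols[n]);
        if (next == p) break;                               /* not a number */
        p = next;
        n++;
    }
    return n;
}
```

    Each parsed row would then be written out as 10 fixed-size integers, so any record can later be located by its index alone.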
  • pbhuter Posts: 36
    edited 2010-08-10 15:20
    I will have some values from a source that will (hopefully) also show up in the file. I need to find the place where those values show up and extract additional information from the file.
  • pbhuter Posts: 36
    edited 2010-08-10 15:22
    I know I'll need external memory for the file. The data source (mentioned in my post a minute ago) will refresh at 5Hz or so with new information to be found in the file.
  • jazzed Posts: 11,803
    edited 2010-08-10 15:26
    pbhuter wrote:
    So no go on reading a file five times a second.

    Does anyone have an idea of how long it will take to parse an ASCII text file with 50,000 or so entries? Thanks.
    Provide a smaller example with say less than 16K bytes in a .txt file, and define what the expected result should be. Then we can find out.
  • Heater. Posts: 21,230
    edited 2010-08-10 15:36
    Hmm...So we are not really parsing a file here but rather a continuous stream of data coming from somewhere.

    So where is it coming from, how is it being delivered, and how fast can that be?

    I don't see any mention of your data source here.

    Thing is, if it's a stream from which you have to extract some values, then there is no "file" anywhere. What I mean is that there is no need to store it all anywhere for any length of time. Just let it pass by, pull out what you want, and discard the rest of the stream as you go.
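    The "let it pass by" approach can be sketched as a small state machine that is fed one character at a time as the data arrives, so only the field currently being collected is ever held in memory. This is a hypothetical C sketch (`StreamParser`, `stream_init` and `stream_feed` are illustrative names), assuming comma-separated fields, newline-terminated rows, and fields shorter than 16 characters.

```c
#include <stddef.h>

#define FIELD_MAX 16  /* assumption: fields are shorter than 16 characters */

typedef struct {
    char   field[FIELD_MAX]; /* text of the field just completed */
    size_t len;              /* characters collected for the current field */
    int    row, col;         /* position of the field just completed */
    int    cur_row, cur_col; /* position of the field being collected */
} StreamParser;

void stream_init(StreamParser *sp)
{
    sp->field[0] = '\0';
    sp->len = 0;
    sp->row = sp->col = 0;
    sp->cur_row = sp->cur_col = 0;
}

/* Feed one incoming character. Returns 1 when a field has just been
   completed: its text is in sp->field and its position in sp->row/sp->col.
   The caller must use sp->field before feeding the next character,
   because the buffer is reused for the next field. */
int stream_feed(StreamParser *sp, char c)
{
    if (c == '\r')
        return 0;                          /* ignore CR in CRLF line endings */
    if (c == ',' || c == '\n') {
        sp->field[sp->len] = '\0';
        sp->row = sp->cur_row;
        sp->col = sp->cur_col;
        sp->len = 0;
        if (c == '\n') { sp->cur_row++; sp->cur_col = 0; }
        else           { sp->cur_col++; }
        return 1;
    }
    if (sp->len < FIELD_MAX - 1)
        sp->field[sp->len++] = c;          /* silently truncate long fields */
    return 0;
}
```

    A caller checks `row` and `col` after each completed field and copies out only the values it needs. A final field that is not followed by a newline is not reported until one arrives.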
  • pbhuter Posts: 36
    edited 2010-08-10 15:43
    jazzed:

    Attached is a txt file with some data. Could you parse it and find the very last entry (last row, last column)? Thanks.
  • pbhuter Posts: 36
    edited 2010-08-10 15:45
    I have a stream of data that I need to match up with the data contained in the ASCII file.
  • Leon Posts: 7,620
    edited 2010-08-10 15:58
    What you are trying to do is still unclear. Do you want to read the data until you get the sequence in that text file? Or just a subset of it?
  • pbhuter Posts: 36
    edited 2010-08-10 16:00
    I want to go through the ASCII file until I find a line or block of data that matches the data from my data source. Once that line or block of data is found, I need to read additional information from the ASCII file associated with that line or block of data.
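    Read literally, this requirement is a linear search: compare the incoming values against each row until one matches, then read the other columns of that row. A minimal C sketch, assuming the file has already been converted to fixed-point integers and that the first two columns hold the values to match (both assumptions; `Record` and `find_linear` are hypothetical names). Each lookup may touch every row, which is what makes repeating it five times a second expensive.

```c
#include <stddef.h>
#include <stdint.h>

#define NCOLS 10  /* assumption: 10 columns, the first two form the match key */

typedef struct { int32_t v[NCOLS]; } Record;  /* one pre-parsed row */

/* Linear scan: return the index of the first record whose columns 0 and 1
   equal (x, y), or -1 if none matches. Cost grows with the table size. */
long find_linear(const Record *table, size_t n, int32_t x, int32_t y)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].v[0] == x && table[i].v[1] == y)
            return (long)i;
    return -1;
}
```

    Once an index is found, the "additional information" is simply the remaining columns `v[2]` to `v[9]` of that record.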
  • Leon Posts: 7,620
    edited 2010-08-10 16:03
    I'd use a small ARM chip rather than a Propeller for that. It'll be much easier and cheaper. Try implementing it on a PC first.
  • pbhuter Posts: 36
    edited 2010-08-10 16:09
    Would that be the chips at www.arm.com?
  • Leon Posts: 7,620
    edited 2010-08-10 16:40
    ARM doesn't make chips, they only license the cores. Companies like NXP actually make the chips. Have a look at the LPC1768, it has 64k SRAM on-chip which should be able to hold your array. You can buy a little development board for it for under $30.
  • Dr_Acula Posts: 5,484
    edited 2010-08-10 18:07
    Re: "Attached is a txt file with some data. Could you parse it and find the very last entry (last row, last column)? Thanks."

    Good and bad news.

    Good news: yes, it can be done on a Propeller. See the screen grab. This is your 12 KB text file, downloaded via XMODEM to the Propeller; a small program then prints out your last-row, last-column entry. I'd prefer C or SBASIC, but MBASIC does have a rather nice command for finding the end of a file.

    Bad news: it took 5 seconds to read the file off the SD card.

    Still, it is possible, if slow. I think it would be much, much faster using Kyedos and some custom Spin code, and that would not involve external memory.
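    Finding the last-row, last-column entry only needs the tail of the file, so if the SD card driver supports seeking, one option is to read just the last block or so into a buffer and scan backwards instead of reading the whole file. The C sketch below covers only that backward scan and assumes fields are separated by commas, spaces or tabs; `last_field` and `is_sep` are hypothetical names, and how the tail is read depends on the file system object used.

```c
#include <stddef.h>
#include <string.h>

/* Assumed field separators; line endings count as separators too. */
static int is_sep(char c)
{
    return c == ' ' || c == '\t' || c == ',' || c == '\r' || c == '\n';
}

/* Copy the last field of the last non-empty line of buf[0..len) into out
   (NUL-terminated, truncated to fit). Returns the number of characters copied. */
size_t last_field(const char *buf, size_t len, char *out, size_t outsz)
{
    size_t end = len, start, n;

    if (outsz == 0)
        return 0;
    while (end > 0 && is_sep(buf[end - 1]))      /* skip trailing blanks/newlines */
        end--;
    start = end;
    while (start > 0 && !is_sep(buf[start - 1])) /* walk back to the separator */
        start--;
    n = end - start;
    if (n >= outsz)
        n = outsz - 1;
    memcpy(out, buf + start, n);
    out[n] = '\0';
    return n;
}
```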
  • Leon Posts: 7,620
    edited 2010-08-10 18:36
    I don't think that is what he really wants to do, though:

    "I want to go through the ASCII file until I find a line or block of data that matches the data from my data source. Once that line or block of data is found, I need to read additional information from the ASCII file associated with that line or block of data."
  • jazzed Posts: 11,803
    edited 2010-08-10 20:02
    I misunderstood the requirement I guess :) Just reading the last column of the last row should be no slower than reading the file which is apparently quite slow.

    As for parsing, which is probably a more relevant performance measure, I created a test that finds the last column of data on EACH row and measures how long that takes. Parsing every item through to the last row gives the worst case.

    I get 1.38s for Spin to find the last column on each of 250 rows at 80MHz.

    So, parsing 50,000 such records in Spin would take at least 4.6 minutes, and that's without reading from the SD card. Written in PASM, you might knock that down to maybe 15 s, since Spin is about 20 times slower than PASM, but you still have the SD card overhead.

    --Steve
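    The arithmetic behind this estimate, scaling the 250-row Spin measurement up to 50,000 records and applying the rough 20× Spin-to-PASM ratio quoted above (SD card time excluded):

$$
t_{\mathrm{Spin}} \approx \frac{1.38\ \mathrm{s}}{250} \times 50\,000 = 276\ \mathrm{s} \approx 4.6\ \mathrm{min}
$$

$$
t_{\mathrm{PASM}} \approx \frac{276\ \mathrm{s}}{20} \approx 13.8\ \mathrm{s}
$$

    For comparison, a 5 Hz update rate leaves 1/5 = 0.2 s for each lookup.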
  • Leon Posts: 7,620
    edited 2010-08-10 20:12
    That is still not what the OP wants. As I understand his requirement he wants to search the file for text in input data, and when a match is found, output the matched text and some of the subsequent records.
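    This reading (find the input text in the file, then output the matching line and some of the records after it) can be sketched in C with the standard `strstr` and `strchr` functions, assuming the file, or the chunk being searched, is in memory as a NUL-terminated string. `find_with_context` is a hypothetical name. Plain text matching is fragile: "12.5" will not match "12.50", and a short key can match inside a longer number, which is one reason to compare parsed numbers instead.

```c
#include <stddef.h>
#include <string.h>

/* Find the first occurrence of key in text. Returns a pointer to the start of
   the line containing it and sets *span_len to cover that line plus up to
   extra_lines following lines. Returns NULL if key is not found. */
const char *find_with_context(const char *text, const char *key,
                              int extra_lines, size_t *span_len)
{
    const char *hit = strstr(text, key);
    if (hit == NULL)
        return NULL;

    const char *start = hit;
    while (start > text && start[-1] != '\n')   /* back up to the line start */
        start--;

    const char *end = hit;
    for (int lines = 0; lines <= extra_lines; lines++) {
        const char *nl = strchr(end, '\n');
        if (nl == NULL) { end += strlen(end); break; }  /* ran out of text */
        end = nl + 1;
    }
    *span_len = (size_t)(end - start);
    return start;
}
```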
  • pbhuter Posts: 36
    edited 2010-08-10 22:19
    I'm beginning to think what I want to do isn't possible on a Propeller chip. I'm starting to look at other processor packages. Thank you.
  • Heater. Posts: 21,230
    edited 2010-08-10 22:25
    So what we have here is a big table containing value sets A[n], B[n], C[n], D[n], E[n], ..., where n runs from 0 to some number in the tens of thousands.

    Then we have incoming data X, Y.

    We want to match X, Y with something in the table, say A and B.

    Then we want to return the corresponding C, D, E..for that A, B.

    Does that sound correct?

    If so, the table should be pre-parsed only once and held as binary numbers somewhere. The incoming data chunks are small and can be parsed very quickly as they arrive. You probably want to store the numbers as scaled integers rather than floating point, as it looks like they only have a couple of decimal places.

    So the basic problem is to search the table for each incoming X, Y pair. Searching linearly through the table on each new input will be slow, so something smarter is needed. Basically, the incoming X, Y value is a key for which we have to find the corresponding entry in the table.

    A simple approach is to sort the table at start-up by increasing X, Y. Then, for each incoming key value, a binary search on the table will quickly find the value you want. Since the table is now basically arrays of values, a binary search is easy to do.

    The problem is that the table still does not fit in HUB RAM. But it could be held in external RAM and the binary search done on it there. I think there is plenty of speed for that.

    However in the absence of other "real-time" tasks in this application perhaps a different processor/micro-controller with enough RAM or FLASH to hold the table would be a simpler approach.
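    A minimal C sketch of the sort-once, binary-search-per-input approach, assuming the table has already been converted to scaled integers with the key (X, Y) in the first two columns. It uses the standard `qsort` and `bsearch`; `Record`, `cmp_key`, `table_sort` and `table_lookup` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

#define NCOLS 10  /* assumption: columns 0 and 1 hold the key (X, Y) */

typedef struct { int32_t v[NCOLS]; } Record;  /* values as scaled integers */

/* Order records by X, then by Y. */
static int cmp_key(const void *a, const void *b)
{
    const Record *ra = a, *rb = b;
    if (ra->v[0] != rb->v[0])
        return (ra->v[0] > rb->v[0]) - (ra->v[0] < rb->v[0]);
    return (ra->v[1] > rb->v[1]) - (ra->v[1] < rb->v[1]);
}

/* Done once at start-up. */
void table_sort(Record *table, size_t n)
{
    qsort(table, n, sizeof *table, cmp_key);
}

/* Per incoming (x, y): O(log n) comparisons instead of a full scan.
   Returns the matching record, or NULL if there is none. */
const Record *table_lookup(const Record *table, size_t n, int32_t x, int32_t y)
{
    Record key = {{x, y}};  /* remaining columns are zero and not compared */
    return bsearch(&key, table, n, sizeof *table, cmp_key);
}
```

    A binary search over 5,000 rows needs at most 13 comparisons (2^13 = 8,192 > 5,000), so even with the table in external RAM only about 13 records have to be fetched per input. `bsearch` assumes an ordinary in-memory array, so for external RAM the same halving loop would be written by hand, reading each probed record through the RAM driver. If several rows share the same (X, Y), `bsearch` may return any one of them.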
  • pbhuter Posts: 36
    edited 2010-08-11 00:39
    Heater, I think you hit on exactly what I am trying to do. I downloaded a complete file with a couple hundred thousand entries (which I'll probably cut down some), and the file size was about 256 kB, so I know I'll need to keep it stored in external memory somewhere. I'm going to look into other chips, though. The ARM processors Leon mentioned may do the trick. Thanks.

    Paul