Forum is broken and considering a fresh install - your input requested!

mindrobots · 2015-03-13 13:49

I never claimed it was a general migration utility, just a nice program to grab a text version of threads having a personal interest. Nothing more, nothing less.

(It is pretty cool, though!! :0) )

Heater. · 2015-03-13 13:50

Gordon,

What surprises me, I suppose, is the number of Parallax customers that complain about the company's slow adoption of new ideas, like open source or a mainstream language like C, yet cry when the change actually comes. They want all the benefits of keeping up with trends and standards, but with no inconvenience to themselves.

Are you sure these are actually the same customers in each case?

I won't miss the ability to post to years-old threads.

Now this is interesting. I recently got an email notification that someone had replied to a post I made on an electronics forum so many years ago that I had forgotten all about the post and the forum!

An interesting conversation ensued.

Who is to say when a topic is "past its sell-by date"?

Parallax is dedicated to the idea that its chips and other products have a long life span. Ergo, so should any discussion forum about those products.

Heater. · 2015-03-13 13:55

ragtop,

I recently looked into this and was amazed there are so many forum engines out there:
http://en.wikipedia.org/wiki/Comparison_of_Internet_forum_software
What actually are you looking for?

Heater. · 2015-03-13 13:59

mindrobots,

I'm really happy that parallax-scrape works for you.

I just want to make sure everyone knows it only works for this forum version; it's basically a special-purpose tool, or a toy.

But hey, perhaps it can be the basis of another hack to get whatever someone wants from whatever other forum.

twm47099 · 2015-03-14 12:00

mindrobots wrote: »

Actually, there may be!

Heater. came up with parallax-scrape a while back and put it out on his GitHub repository. It's a little Node program that worked quite well at the time to generate a text file with everything from a thread. The only reason I say "worked" is that the forum's upgrade/downgrade may have changed some of the HTML markers it uses to find things. It's worth a try on some of your favorite big threads.

I'm sure he'll be glad to see it resurrected!!

Question from a windows user --

I downloaded and installed node and downloaded parallax-scrape. How do I actually run it?

The github page shows:

$ node parallax-scrape http://forums.parallax.com/showthread.php/149173-Forum-scraping/page2

I ran node from the command line, but when I copy/paste the above into the window at the > prompt, I get a lot of errors, starting with:

SyntaxError: Unexpected identifier

followed by lots of other messages -- I can't cut and paste them unfortunately.

I also tried different versions of the above (deleting the $, deleting "$ node"); nothing worked.

I'm using windows 7.

I'd appreciate any help.

thanks
Tom

Phil Pilgrim (PhiPi) · 2015-03-14 12:23

I'm not sure how server-intensive the scrape program is, but I suspect that if a lot of people were using it, it might cause a slow-down.

-Phil

twm47099 · 2015-03-14 12:42

Phil Pilgrim (PhiPi) wrote: »

I'm not sure how server-intensive the scrape program is, but I suspect that if a lot of people were using it, it might cause a slow-down.

-Phil

A clearer version of my question might be: I have no idea how to use parallax-scrape from my Windows computer (and I know not much about using GitHub and even less about node.js). What do I enter, and where do I enter it? Step-by-step instructions would be appreciated.

For example in the install section of the readme, it shows 4 "node.js" modules that are required. I have no idea what that means. I did try entering them (copy/paste) at the > prompt, but just got error messages like I noted above.

I don't like asking open ended questions when I'm not even sure what to ask. But I would like to be able to use parallax-scrape.
Thanks
Tom

mindrobots · 2015-03-14 12:48

Tom,

Let me try to find time on my Windows VM to test all the steps and write them down. The steps should be identical regardless of OS, but let me try it on Windows.

mindrobots · 2015-03-14 12:54

Phil Pilgrim (PhiPi) wrote: »

I'm not sure how server-intensive the scrape program is, but I suspect that if a lot of people were using it, it might cause a slow-down.

-Phil

You are of course correct and raise a legitimate concern. It is the equivalent of opening the first page of a thread and clicking the next page number at the top/bottom of the page as soon as the page is displayed.

I have no idea how much load that actually places on the server.

Heater. · 2015-03-14 15:03

twm47099,

If you have node installed you are so very close to having parallax-scrape running.

In the parallax-scrape instructions where it says, for example:

$ cd parallax-scrape

What it means is type "cd parallax-scrape" into your DOS box, or whatever they call the command line on Windows nowadays. Then hit the Return key.

Basically the "$" there is just indicative of the command line prompt and is not part of what you type in.

You need to type in the commands "npm install XXXXXXX" to get the modules needed by parallax-scrape installed.

Then the final command is just to run node and have it run parallax-scrape. Just type "node parallax-scrape url".

If you can suggest a clearer way to phrase those instructions I'll think about amending the page.

Heater. · 2015-03-14 15:07

phil,

Oh yeah, parallax-scrape will hit the server as fast as it can.

Perhaps that's not polite but I was never expecting to run it very often.

Anyway it's no worse than doing a recursive wget on the entire forum which anyone can do at any time they like already.
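On Phil's point about server load: a scraper does not have to fetch pages back to back. Below is a minimal sketch, not parallax-scrape's actual code, that fetches a list of page URLs strictly one at a time with a fixed pause between requests. The `fetchPolitely` name, the 2-second default delay, and the injected `fetchPage` function are all assumptions made for illustration.

```javascript
// Hypothetical sketch, not taken from parallax-scrape: fetch pages one at a
// time and wait `delayMs` between requests, so the server sees a slow,
// steady trickle instead of a burst.

// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// `fetchPage(url)` is injected and assumed to return a Promise of the page
// text, which keeps the pacing logic independent of any HTTP library.
async function fetchPolitely(urls, fetchPage, delayMs = 2000) {
  const pages = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) {
      await sleep(delayMs); // pause before every request except the first
    }
    pages.push(await fetchPage(urls[i])); // only one request in flight at a time
  }
  return pages;
}
```

On a recent Node (version 18 and later ship a built-in `fetch`; the v0.12 used in this thread does not), `fetchPage` could be `(url) => fetch(url).then((res) => res.text())`.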

mindrobots · 2015-03-14 15:40

Tom,

I just went through all the steps on my Win7 VM.

1) Go to nodejs.org and install node
2) Go to the GitHub repository and download the zip (the easiest way if you are not familiar with Git)
3) unzip it to some directory
4) Open up a command prompt
5) CD to that directory
6) Follow the steps in the readme.md file displayed on the github repository

You should end up with something like below:

C:\Users\rapost>node -v
v0.12.0

C:\Users\rapost>npm -v
2.5.1

C:\Users\rapost>dir
Volume in drive C has no label.
Volume Serial Number is 4466-6E0A

Directory of C:\Users\rapost

03/14/2015 06:20 PM <DIR> .
03/14/2015 06:20 PM <DIR> ..
12/18/2012 05:35 PM <DIR> Contacts
07/13/2009 10:34 PM <DIR> Desktop
03/07/2015 09:06 AM <DIR> Documents
07/13/2009 10:34 PM <DIR> Downloads
03/14/2015 06:10 PM <DIR> Dropbox
12/18/2012 05:35 PM <DIR> Favorites
11/30/2014 05:59 PM <DIR> Links
07/13/2009 10:34 PM <DIR> Music
03/14/2015 06:20 PM <DIR> parallax-scrape-master
07/13/2009 10:34 PM <DIR> Pictures
03/26/2014 12:56 PM 78 quartus2.ini
03/26/2014 12:59 PM 2,337 quartus2.qreg
12/18/2012 05:35 PM <DIR> Saved Games
12/18/2012 05:35 PM <DIR> Searches
07/13/2009 10:34 PM <DIR> Videos
03/07/2015 08:15 AM <DIR> Xilinx
2 File(s) 2,415 bytes
16 Dir(s) 16,943,493,120 bytes free

C:\Users\rapost>cd parallax-scrape-master

C:\Users\rapost\parallax-scrape-master>dir
Volume in drive C has no label.
Volume Serial Number is 4466-6E0A

Directory of C:\Users\rapost\parallax-scrape-master

03/14/2015 06:20 PM <DIR> .
03/14/2015 06:20 PM <DIR> ..
07/26/2013 05:51 PM 84 .gitignore
07/26/2013 05:51 PM 1,072 LICENSE
07/26/2013 05:51 PM 1,406 parallax-scrape-dom.js
07/26/2013 05:51 PM 13,490 parallax-scrape.js
07/26/2013 05:51 PM 1,445 README.md
5 File(s) 17,497 bytes
2 Dir(s) 16,943,493,120 bytes free

C:\Users\rapost\parallax-scrape-master>npm install request
request@2.53.0 node_modules\request
├── caseless@0.9.0
├── json-stringify-safe@5.0.0
├── forever-agent@0.5.2
├── aws-sign2@0.5.0
├── stringstream@0.0.4
├── node-uuid@1.4.3
├── qs@2.3.3
├── oauth-sign@0.6.0
├── tunnel-agent@0.4.0
├── isstream@0.1.2
├── combined-stream@0.0.7 (delayed-stream@0.0.5)
├── mime-types@2.0.10 (mime-db@1.8.0)
├── form-data@0.2.0 (async@0.9.0)
├── http-signature@0.10.1 (assert-plus@0.1.5, asn1@0.1.11, ctype@0.5.3)
├── tough-cookie@0.12.1 (punycode@1.3.2)
├── hawk@2.3.1 (cryptiles@2.0.4, boom@2.6.1, hoek@2.11.1, sntp@1.0.9)
└── bl@0.9.4 (readable-stream@1.0.33)

C:\Users\rapost\parallax-scrape-master>npm install htmlparser2
htmlparser2@3.8.2 node_modules\htmlparser2
├── domelementtype@1.3.0
├── entities@1.0.0
├── domhandler@2.3.0
├── domutils@1.5.1 (dom-serializer@0.1.0)
└── readable-stream@1.1.13 (isarray@0.0.1, inherits@2.0.1, string_decoder@0.10.31, core-util-is@1.0.1)

C:\Users\rapost\parallax-scrape-master>npm install ent
ent@2.2.0 node_modules\ent

C:\Users\rapost\parallax-scrape-master>dir
Volume in drive C has no label.
Volume Serial Number is 4466-6E0A

Directory of C:\Users\rapost\parallax-scrape-master

03/14/2015 06:21 PM <DIR> .
03/14/2015 06:21 PM <DIR> ..
07/26/2013 05:51 PM 84 .gitignore
07/26/2013 05:51 PM 1,072 LICENSE
03/14/2015 06:22 PM <DIR> node_modules
07/26/2013 05:51 PM 1,406 parallax-scrape-dom.js
07/26/2013 05:51 PM 13,490 parallax-scrape.js
07/26/2013 05:51 PM 1,445 README.md
5 File(s) 17,497 bytes
3 Dir(s) 16,932,581,376 bytes free

C:\Users\rapost\parallax-scrape-master>node parallax-scrape http://forums.parallax.com/showthread.php/157005-FYI-PropWare-Complete-build-system-and-library-for-PropGCC 1 3 > thread.txt

C:\Users\rapost\parallax-scrape-master>more thread.txt

blah
blah
blah........

I hate to say it's that simple but it really is that simple.

I think I've used it to grab three or four big threads. If it's used responsibly, I don't think it is a big burden on the server. If you are collecting a big thread, you can grab it once and then every so often, just go out and grab the new pages. It does allow you to specify a start and end page so you do not need to grab the entire thread every time a few pages are added.
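The start and end page arguments boil down to a list of page URLs. A minimal sketch, assuming the URL pattern seen in this thread (page 1 is the bare thread URL, later pages append `/pageN`, as in the `.../page2` examples above); `pageUrls` is a hypothetical helper, not a parallax-scrape function.

```javascript
// Hypothetical helper: list the URLs for pages start..end of a thread,
// ASSUMING page 1 is the bare thread URL and page N (N > 1) is
// "<thread URL>/pageN", matching the URLs quoted in this thread.
function pageUrls(threadUrl, start, end) {
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 1 || end < start) {
    throw new RangeError("need integers with 1 <= start <= end");
  }
  const base = threadUrl.replace(/\/page\d+$/, ""); // strip any "/pageN" suffix
  const urls = [];
  for (let n = start; n <= end; n++) {
    urls.push(n === 1 ? base : `${base}/page${n}`);
  }
  return urls;
}
```

Grabbing only the newest pages of a big thread later is then just a narrower range, for example `pageUrls(url, 40, 45)`, rather than re-fetching everything from page 1.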

twm47099 · 2015-03-14 16:43

Thanks, I would never have guessed how to do that.

I'm not at my computer now (using a tablet), but I'll try when I get home. Probably not a good idea to try to download the whole Tachyon thread.☺

Thanks again.
Tom

Peter Jakacki · 2015-03-14 19:27

FWIW, I added the re-pagination add-on for Firefox and opened up the Tachyon thread, which takes several minutes to read in all the posts as one great big web page, which I then save. When I last did this back in November it resulted in a 17MB HTML page + 1.6MB folder. Now, this add-on is real simple and it's real simple to use.

Heater. · 2015-03-14 20:04

That is a cool add on.

parallax-scrape is very specific: I wanted to fetch the thread as plain text, apply whatever formatting I desired, and get all the attachments. It's in no way a general-purpose tool.

Just for fun I tried it on the Tachyon thread.

I got nearly three megabytes of text file plus about 6 megs of attachments.

W9GFO · 2015-03-14 20:36

What would it take to copy the entire forum?

Years ago I was able to copy a website, choosing how many pages deep to go. It would put the website in a local folder and I could browse it offline. Can we do that with this forum?
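What those site-copying tools do is essentially a breadth-first crawl that stops at a chosen link depth. A minimal sketch, assuming a hypothetical injected `getLinks(url)` that returns the absolute links found on a page (in a real tool that means fetching and parsing the HTML); the `crawl` name is also made up, and it stays on the starting host and visits each URL once.

```javascript
// Hypothetical sketch of a depth-limited, same-host breadth-first crawl.
// `getLinks(url)` is injected and ASSUMED to return a Promise of absolute URLs.
// Returns every URL within `maxDepth` link-clicks of the start page.
async function crawl(startUrl, getLinks, maxDepth) {
  const host = new URL(startUrl).host;
  const seen = new Set([startUrl]);
  let frontier = [startUrl]; // URLs at the current depth
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const url of frontier) {
      for (const link of await getLinks(url)) {
        // Follow only links not seen before and on the same host as the start page.
        if (!seen.has(link) && new URL(link).host === host) {
          seen.add(link);
          next.push(link);
        }
      }
    }
    frontier = next; // pages at the last depth are listed but not expanded
  }
  return [...seen]; // in discovery order
}
```

Pointed at a whole forum this is still one request per page, so the load concern Phil raises applies here too; a recursive wget, as Heater mentions, is the off-the-shelf way to do the same thing.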

twm47099 · 2015-03-14 20:36

Heater & Rick,

Thanks again. I was able to run parallax-scrape. I had a few hiccups until I realized that the DOS window is limited to 6-character file and directory names (at least on my computer it is).

As far as instructions go, I think Rick's description is good for us novices (with the additional reminder about DOS file name length). Heater's description is good for those with a bit more experience, and, as currently written, it obviously works for those used to such things.

Tom

mindrobots · 2015-03-14 21:45

twm47099 wrote: »

I realized that the Dos window is limited to 6 character file and directory names (at least on my computer it is).

OK, now I'm curious. What are you running with such short file/dir names? I didn't think even DOS 1.0 was any less than 8.3 for files and 8 for directory names.

twm47099 · 2015-03-14 22:26

mindrobots wrote: »

OK, now I'm curious. What are you running with such short file/dir names? I didn't think even DOS 1.0 was any less than 8.3 for files and 8 for directory names.

Actually, it was 8 for the directory, and may have been 8 for the file name also, but I had some problems with a 7-character name. It worked once, but the next time I used a 7-character file name as the destination file and tried to use the start and end page numbers, the node command did nothing but drop me back at the DOS prompt. If I did not set the page numbers, or set only one page number, it worked with the 7-character name, but gave only page 1 regardless of the page number in the command.

Maybe there was something else going on, but when I switched to a 6 char file name, I was able to use the start-end page numbers in the node command, and it worked. I don't think there were any typos in my commands, since I wrote them in a notebook file and used cut/paste to put them in the DOS command line.

I'm using a Windows 7 netbook. Unfortunately it is running the Starter edition of Windows 7, and when I tried the built-in upgrade, Microsoft had stopped activating that online. I don't know if that is the reason for the issue.

Tom

Heater. · 2015-03-14 22:33

twm47099,

This is not making any sense to me. Can you post those commands, one that works and one that does not, here? Then we can perhaps see exactly what is going on.

twm47099 · 2015-03-14 23:14

Heater,
I'll do that. It will have to wait until Sunday PM when I've got that computer.

Tom

twm47099 · 2015-03-15 11:33

Heater. wrote: »

twm47099,

This is not making any sense to me. Can you post those commands, one that works and one that does not, here? Then we can perhaps see exactly what is going on.

And for good reason - today I tried 8 different command lines with 8.3 file names:

node parallax-scrape http://forums.parallax.com/showthread.php/160376-Forum-is-broken-and-considering-a-fresh-install-your-input-requested! > eight001.txt

node parallax-scrape http://forums.parallax.com/showthread.php/160376-Forum-is-broken-and-considering-a-fresh-install-your-input-requested!/page2 > eight002.txt

node parallax-scrape http://forums.parallax.com/showthread.php/160376-Forum-is-broken-and-considering-a-fresh-install-your-input-requested! 4 6 > eight003.txt

node parallax-scrape http://forums.parallax.com/showthread.php/160376-Forum-is-broken-and-considering-a-fresh-install-your-input-requested! 2 2 > eight004.txt

And the same 4 again, but with 6.3 file names.

To make sure I didn't have any typos when entering the commands, I put each command into its own bat file. Then at the DOS command prompt I typed the bat file name and pressed Enter.

All 8 worked as expected. So yesterday, I must have screwed up something in the command. So going forward, I will be sure to put the command into a bat file and check before running it.

Properly embarrassed,
Tom
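Part of what made this hard to diagnose is that a mistyped command simply dropped back to the prompt. A minimal sketch of up-front argument checking for a command of the form `node parallax-scrape <url> [start] [end]`, as used in this thread; `parseScrapeArgs` is a hypothetical name, this is not parallax-scrape's code, and returning `null` for omitted page numbers (leaving their meaning to the scraper) is an assumption.

```javascript
// Hypothetical argument check for "node parallax-scrape <url> [start] [end]".
// Returns {url, start, end} (omitted page numbers are null) or throws an
// Error carrying a usage message, so a typo produces a visible complaint.
function parseScrapeArgs(args) {
  const usage = "usage: node parallax-scrape <thread-url> [start-page] [end-page]";
  if (args.length < 1 || args.length > 3) throw new Error(usage);
  const url = args[0];
  if (!/^https?:\/\//.test(url)) throw new Error(`not a URL: ${url}\n${usage}`);
  // Page numbers must be positive whole numbers: "4" passes, "4x" does not.
  const toPage = (s) => {
    if (!/^[1-9]\d*$/.test(s)) throw new Error(`bad page number: ${s}\n${usage}`);
    return Number(s);
  };
  const start = args.length >= 2 ? toPage(args[1]) : null;
  const end = args.length === 3 ? toPage(args[2]) : null;
  if (start !== null && end !== null && end < start) {
    throw new Error(`end page ${end} is before start page ${start}\n${usage}`);
  }
  return { url, start, end };
}
```

In a script it would be called as `parseScrapeArgs(process.argv.slice(2))`, printing the error message and exiting with a non-zero status if it throws.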

Tumbler · 2015-03-28 23:40

What would it take to copy the entire forum?
Years ago I was able to copy a website, choosing how many pages deep to go. It would put the website in a local folder and I could browse it offline. Can we do that with this forum?

Yeah, a copy of the MySQL database will be fine.
