I never claimed it was a general migration utility, just a nice program to grab a text version of threads of personal interest. Nothing more, nothing less.
What surprises me, I suppose, is the number of Parallax customers who complain about the company's slow adoption of new ideas, like open source or a mainstream language like C, yet cry when the change actually comes. They want all the benefits of keeping up with trends and standards, but with no inconvenience to themselves.
Are you sure these are actually the same customers in each case?
I won't miss the ability to post to years-old threads.
Now this is interesting. I recently got an email notification that someone had replied to a post I made on an electronics forum so many years ago that I had forgotten all about the post and the forum!
An interesting conversation ensued.
Who is to say when a topic is "past its sell-by date"?
Parallax is dedicated to the idea that its chips and other products have a long life span. Ergo, so should any discussion forum about those products.
Heater. came up with parallax-scrape a while back and put it out on his GitHub repository. It's a little Node program that worked quite well at the time to generate a text file with everything from a thread. The only reason I say "worked" is that the forums upgrade/downgrade may have changed some of the HTML markers it uses to find things. It's worth a try on some of your favorite big threads.
I'm sure he'll be glad to see it resurrected!!
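For anyone curious what "HTML markers" means here, the sketch below shows that general style of extraction: find each post body by a marker in the page source and strip it down to plain text. The class name and sample markup are invented for illustration; they are not the forum's real markup, which is exactly why a forum upgrade can silently break a tool like this.

```javascript
// Minimal sketch of marker-based extraction (illustrative only).
function extractPosts(html) {
  const posts = [];
  // Assumed marker: each post body sits in <div class="Message">...</div>
  const re = /<div class="Message">([\s\S]*?)<\/div>/g;
  let m;
  while ((m = re.exec(html)) !== null) {
    // Strip remaining tags and collapse whitespace to get plain text
    const text = m[1].replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
    posts.push(text);
  }
  return posts;
}

// Tiny fabricated page to show the idea:
const sample =
  '<div class="Message">Hello <b>world</b></div>' +
  '<div class="Message">Second post</div>';
console.log(extractPosts(sample)); // [ 'Hello world', 'Second post' ]
```

If the forum software renames that `Message` class, the regex matches nothing and the scrape comes back empty, which matches the "may have changed some of the HTML markers" caveat above.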
I'm not sure how server-intensive the scrape program is, but I suspect that if a lot of people were using it, it might cause a slow-down.
-Phil
You are of course correct and raise a legitimate concern. It is the equivalent of opening the first page of a thread and clicking the next-page numbers at the top/bottom of the page as soon as each page is displayed.
I have no idea how much load that actually places on the server.
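That access pattern can be sketched as a strictly sequential loop: each page is requested only after the previous one arrives, just like a reader clicking "next" as soon as a page is displayed. `fetchPage` here is a stand-in for the real HTTP request, not parallax-scrape's actual code.

```javascript
// Sequential page fetching: one request in flight at a time.
async function grabThread(fetchPage, pageCount) {
  const pages = [];
  for (let p = 1; p <= pageCount; p++) {
    // Strictly sequential: no parallel requests hammering the server
    pages.push(await fetchPage(p));
  }
  return pages;
}

// Demo with a fake fetcher that records the order of requests:
const order = [];
const fakeFetch = async (p) => { order.push(p); return `page ${p}`; };
grabThread(fakeFetch, 3).then((pages) => {
  console.log(order); // [ 1, 2, 3 ]
  console.log(pages); // [ 'page 1', 'page 2', 'page 3' ]
});
```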
1) Go to nodejs.org and install node
2) Go to the GitHub repository and download the zip (easiest way if you are not familiar with Git)
3) Unzip it to some directory
4) Open up a command prompt
5) cd to that directory
6) Follow the steps in the readme.md file displayed on the GitHub repository
You should end up with something like below:
C:\Users\rapost>node -v
v0.12.0
C:\Users\rapost>npm -v
2.5.1
C:\Users\rapost>dir
Volume in drive C has no label.
Volume Serial Number is 4466-6E0A
I hate to say it's that simple but it really is that simple.
I think I've used it to grab three or four big threads. If it's used responsibly, I don't think it is a big burden on the server. If you are collecting a big thread, you can grab it once and then every so often, just go out and grab the new pages. It does allow you to specify a start and end page so you do not need to grab the entire thread every time a few pages are added.
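Internally, the start/end page option described above might look something like the sketch below: expand the range into one URL per page. The "/pN" URL shape is an assumption for illustration only; the real forum's paging scheme may differ.

```javascript
// Build the list of page URLs for a start..end range (illustrative only).
function pageUrls(threadUrl, startPage, endPage) {
  const urls = [];
  for (let p = startPage; p <= endPage; p++) {
    urls.push(`${threadUrl}/p${p}`); // one URL per requested page
  }
  return urls;
}

// Grab only pages 51..53 of an already-archived thread:
console.log(pageUrls('http://example.com/discussion/1234', 51, 53));
```

This is why re-grabbing a big thread is cheap: only the pages added since the last run need to be in the range.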
FWIW, I added the re-pagination add-on for Firefox and opened the Tachyon thread, which takes several minutes to read in all the posts as one great big web page, which I then save. When I last did this back in November it resulted in a 17 MB HTML page plus a 1.6 MB folder. This add-on is really simple, and it's really simple to use.
parallax-scrape is very specific, I wanted to fetch the thread as plain text, apply whatever formatting I desired and get all the attachments. It's in no way a general purpose tool.
Just for fun I tried it on the Tachyon thread.
I got nearly three megabytes of text file plus about 6 megs of attachments.
Years ago I was able to copy a website, choosing how many pages deep to go. It would put the website in a local folder and I could browse it offline. Can we do that with this forum?
Thanks again. I was able to run parallax-scrape. I had a few hiccups until I realized that the DOS window is limited to 6-character file and directory names (at least on my computer it is).
As far as instructions go, I think Rick's description is good for us novices (with the additional reminder about DOS file name length). Heater's description is good for those with a bit more experience, and for those used to such things, it works as currently written.
I realized that the DOS window is limited to 6-character file and directory names (at least on my computer it is).
OK, now I'm curious. What are you running with such short file/dir names? I didn't think even DOS 1.0 was any less than 8.3 for files and 8 for directory names.
Actually, it was 8 for the directory, and may have been 8 for the file name also, but I had some problems with a 7-character name. It worked once, but the next time I used a 7-char file name as the destination and tried to use the start/end page numbers, the node command did nothing but drop me back at the DOS prompt. If I did not set the page numbers, or set only one page number, it worked with the 7-character name, but only gave page 1 regardless of the page number in the command.
Maybe there was something else going on, but when I switched to a 6-char file name I was able to use the start/end page numbers in the node command, and it worked. I don't think there were any typos in my commands, since I wrote them in a Notepad file and used cut/paste to put them on the DOS command line.
I'm using a Windows 7 netbook. Unfortunately it is running the Starter edition of Windows 7, and when I tried the built-in upgrade, Microsoft had stopped activating it online. I don't know if that is the reason for the issue.
This is not making any sense to me. Can you post those commands, one that works and one that does not, here? Then we can perhaps see exactly what is going on.
And for good reason: today I tried 8 different command lines with 8.3 file names, and 4 the same but with 6.3 file names.
To make sure I didn't have any typos when entering the command, I put each command into its own bat file. Then at the DOS command prompt I typed the bat file name <enter>.
All 8 worked as expected. So yesterday, I must have screwed up something in the command. So going forward, I will be sure to put the command into a bat file and check before running it.
What would it take to copy the entire forum?
(It is pretty cool, though!! :0) )
I recently looked into this and was amazed there are so many forum engines out there:
http://en.wikipedia.org/wiki/Comparison_of_Internet_forum_software
What actually are you looking for?
I'm really happy that parallax-scrape works for you.
I just want to make sure everyone knows it only works for this forum version; it's basically a special-purpose tool, or a toy.
But hey, perhaps it can be the basis of another hack to get whatever someone wants from whatever other forum.
Question from a windows user --
I downloaded a d installed node and downloaded parallax-scrape. How do I actually run it?
The GitHub page shows:
I ran node from the command line, but when I copy/paste the above into the window at the > prompt, I get a lot of errors starting with
followed by lots of other messages -- I can't cut and paste them, unfortunately.
I also tried different versions of the above (deleting the "$", deleting "$ node"); nothing worked.
I'm using windows 7.
I'd appreciate any help.
thanks
Tom
A clearer version of my question might be: I have no idea how to use parallax-scrape from my Windows computer (and not much about using GitHub, and even less about Node.js). What do I enter, and where do I enter it? Step by step would be appreciated.
For example in the install section of the readme, it shows 4 "node.js" modules that are required. I have no idea what that means. I did try entering them (copy/paste) at the > prompt, but just got error messages like I noted above.
I don't like asking open ended questions when I'm not even sure what to ask. But I would like to be able to use parallax-scrape.
Thanks
Tom
Let me try and find time on my Windows VM to test all the steps and write them down. It should be identical steps regardless of OS, but let me try it on Windows.
If you have node installed you are so very close to having parallax-scrape running.
In the parallax-scrape instructions where it says, for example, "$ cd parallax-scrape", what it means is: type "cd parallax-scrape" into your DOS box, or whatever they call the command line on Windows nowadays, then hit the return key.
Basically, the "$" there is just indicative of the command-line prompt and is not part of what you type in.
You need to type in the commands "npm install XXXXXXX" so as to get some modules needed by parallax-scrape installed.
Then the final command is just to run node and have it run parallax-scrape. Just type "node parallax-scrape url".
If you can suggest a clearer way to phrase those instructions I'll think about amending the page.
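As an aside on that last command: when you type "node parallax-scrape url", Node hands everything after the script name to the program via `process.argv`. The sketch below shows that generic mechanism with hypothetical optional page-number arguments; it is not parallax-scrape's actual argument handling, which may differ.

```javascript
// Generic sketch of reading a URL plus optional start/end pages from argv.
function parseArgs(argv) {
  // argv[0] is the node binary, argv[1] the script path; the rest is ours
  const [url, startPage, endPage] = argv.slice(2);
  return {
    url,
    startPage: startPage ? Number(startPage) : 1,
    endPage: endPage ? Number(endPage) : Infinity,
  };
}

console.log(parseArgs(['node', 'parallax-scrape', 'http://example.com/t/1', '2', '5']));
// { url: 'http://example.com/t/1', startPage: 2, endPage: 5 }
```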
Oh yeah, parallax-scrape will hit the server as fast as it can.
Perhaps that's not polite but I was never expecting to run it very often.
Anyway it's no worse than doing a recursive wget on the entire forum which anyone can do at any time they like already.
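If politeness ever became a concern, a small change would space the requests out. This is a sketch under the assumption of a fixed delay between pages; as noted above, the real tool does no throttling, and `fetchPage` stands in for the actual HTTP request.

```javascript
// Polite variant: wait a fixed interval between page requests.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function politeGrab(fetchPage, pageCount, delayMs) {
  const pages = [];
  for (let p = 1; p <= pageCount; p++) {
    pages.push(await fetchPage(p));
    if (p < pageCount) await sleep(delayMs); // breathe between requests
  }
  return pages;
}

// Demo with a fake fetcher and a short delay:
politeGrab(async (p) => `page ${p}`, 3, 100).then((pages) => {
  console.log(pages); // [ 'page 1', 'page 2', 'page 3' ]
});
```

The delay value is arbitrary; even a fraction of a second per page keeps a big grab from looking like a burst of traffic.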
I just went through all the steps on my Win7 VM.
I'm not at my computer now (using a tablet), but I'll try when I get home. Probably not a good idea to try to download the whole Tachyon thread. ☺
Thanks again.
Tom
I'll do that. It will have to wait until Sunday PM when I've got that computer.
Tom
Properly embarrassed,
Tom
Yeah, a copy of the MySQL database will be fine.