Web scraping software
- 7 Responses
- trooperbill0
WinHTTrack
- thoughtandtheory0
If you know rails:
http://railscasts.com/episodes/1…
- armsbottomer0
If you know a smidgen of Python, you could check out Beautiful Soup http://www.crummy.com/software/B…. Otherwise:
1) curl a page in PHP and parse through it with regular expressions in PHP, or
2) make an AJAX request to a PHP script that curls a page, then traverse the DOM in JS to get what you want.
You need to find persistent markup across the pages that you want to grab, but if it's table data, the process is quite easy.
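To make the table case concrete, here is a minimal sketch in Python that pulls cell text out of table rows. It uses only the standard library's `html.parser` rather than Beautiful Soup, so it runs without installing anything. The `TableExtractor` class, the `extract_rows` helper, and the `SAMPLE` markup are all made up for illustration; real pages will have messier markup and would need more care.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return a list of rows, each row a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup standing in for a fetched catalog page.
SAMPLE = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>19.50</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print(row)
```

The same idea carries over to Beautiful Soup or a PHP DOM parser: find the markup that repeats on every page (here the `tr`/`td` structure) and read the values out of it.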
- Noggin0
There was a guy at my old company who used some software to grab data from just the tables on a webpage. I think it then changed the URL (page number parameter) and went through the catalog.
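A rough sketch of that "bump the page number and keep going" approach, in Python with only the standard library. Everything named here is an assumption for illustration: the query parameter is guessed to be `page`, the example.com URL is a placeholder, and `parse` is whatever function you write to turn one page of HTML into a list of items. It stops at the first page that yields nothing.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def page_url(base_url, page, param="page"):
    """Return base_url with its page-number query parameter set to `page`."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query[param] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_page(url):
    """Download a URL and return its body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_catalog(base_url, parse, fetch=fetch_page, max_pages=50):
    """Walk page=1, 2, 3, ... and collect whatever `parse` returns.

    Stops at the first page where `parse` finds nothing, or after
    max_pages pages, whichever comes first.
    """
    items = []
    for page in range(1, max_pages + 1):
        found = parse(fetch(page_url(base_url, page)))
        if not found:
            break
        items.extend(found)
    return items


if __name__ == "__main__":
    # Placeholder URL; no request is made here, this only shows the URL shape.
    print(page_url("http://example.com/catalog?cat=5", 2))
```

The `fetch` argument is swappable so the loop can be tried against canned pages before pointing it at a real site, and `max_pages` is a safety cap in case a site never returns an empty page.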
- vaxorcist0
I'd have to know a lot more about what you intend to scrape to know if it's likely to work in pre-existing web scraping systems... some e-commerce systems are easy, some very hard, depending on a lot of things, like search term results, spidering, and code variability.
Note that most web scraping tools may generate data much more complex than you want. I've written web scraping code for people who have tried packaged solutions but found them generating lots of random stuff, not usefully organized data.
But sometimes it's simpler than you expect, sometimes harder. Be aware that depending on other sites' code may cause hiccups every so often when they change things with no notice.
- Noggin
Any recommendations? It's just for the data really (mainly product info/catalog) rather than the entire website.
Open source/free would be best, but if there's a decent commercial one that's worth it then that's cool.