How to efficiently scrape web data, or collect tons of tweets?
-Python example
-Fetching the webpage: the httplib2 module
-Parsing the content and extracting the needed information: BeautifulSoup from the bs4 package
-Twitter API: a Python wrapper for making Twitter API requests; it handles OAuth authentication and API queries through a single Python interface
-MongoDB as the database
-PyMongo: the Python wrapper for interacting with the MongoDB database
-Cron jobs: a time-based scheduler that runs scripts at fixed intervals; spacing requests out this way keeps the collector under the API's rate limit and avoids the "rate limit exceeded" error
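Below is a sample crontab entry for the scheduling step. It assumes the collection code lives in a script at the hypothetical path `/home/user/scraper/collect_tweets.py`, and that the script fetches one batch per run and then exits. The five leading fields are minute, hour, day of month, month and day of week, so `*/15 * * * *` runs the script every 15 minutes. That interval suits endpoints whose rate limit resets in 15-minute windows; adjust it to the limits of the endpoint you actually call.

```
# m    h  dom mon dow  command
*/15   *  *   *   *    /usr/bin/python3 /home/user/scraper/collect_tweets.py >> /home/user/scraper/cron.log 2>&1
```

Add the line with `crontab -e`. Cron runs jobs with a minimal environment, so use absolute paths for both the interpreter and the script. Redirecting output to a log file keeps the error messages from a failed run available for inspection.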