How to efficiently scrape web data, or collect tons of tweets?

-Python example (sketches of the main steps follow this list)
-Requesting and fetching the webpage into the code: the httplib2 module
-Parsing the content and extracting the necessary information: BeautifulSoup from the bs4 package
-Twitter API: a Python wrapper for performing the API requests; it handles all of the OAuth and API queries in a single Python interface
-MongoDB as the database
-PyMongo: the Python wrapper for interacting with the MongoDB database
-Cron jobs: a time-based scheduler used to run scripts at specific intervals; spacing requests out this way keeps the collector under the API quota and avoids the "rate limit exceeded" error
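A minimal sketch of the first two steps, fetching a page with httplib2 and parsing it with BeautifulSoup; the URL and the extracted fields are placeholders:

```python
import httplib2
from bs4 import BeautifulSoup

# Fetch the page: Http.request returns a (response headers, body bytes) pair
http = httplib2.Http()
response, content = http.request('http://example.com', 'GET')

if response.status == 200:
    # Parse the HTML and pull out the pieces we need, e.g. the title and all link targets
    soup = BeautifulSoup(content, 'html.parser')
    title = soup.title.string if soup.title else None
    links = [a['href'] for a in soup.find_all('a', href=True)]
    print(title, len(links))
```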

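For the tweet-collection side, here is a sketch of wiring the Twitter API to MongoDB through PyMongo. The post does not name a specific wrapper, so this example assumes Tweepy (older Tweepy versions call the search method `api.search` rather than `api.search_tweets`); the credentials, query, database, and collection names are placeholders.

```python
import tweepy
from pymongo import MongoClient

# OAuth credentials: placeholders, obtained from the Twitter developer portal
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')
api = tweepy.API(auth, wait_on_rate_limit=True)  # back off instead of failing when the rate limit is hit

# Local MongoDB instance; database and collection names are arbitrary
client = MongoClient('localhost', 27017)
tweets = client['twitter_db']['tweets']

# Search recent tweets for a query and store each one as a JSON document
for status in tweepy.Cursor(api.search_tweets, q='data science', count=100).items(500):
    tweets.insert_one(status._json)
```

A cron entry then runs this collector on a schedule, for example every 15 minutes with `*/15 * * * * python collect_tweets.py` (the script name is hypothetical), so that requests are spread out and the rate limit is never exceeded.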