How to Scrape Tweets from Twitter using Python
Twitter is a great source of publicly available, real-time data on trending topics around the world, as well as user data. Once you have Twitter API keys, scraping tweets is straightforward.
In one of our previous tutorials we learnt how to create Twitter API keys. In this tutorial we will use those keys to scrape tweets from Twitter.
This tutorial shows how to scrape tweets related to COVID-19 using the Twitter API. Twitter provides user-friendly APIs for accessing publicly available data.
Now that you have your API key, API secret key, access token and access token secret, save them in a text file so that they stay out of your code. To read the keys, use configparser:
import configparser
config = configparser.RawConfigParser()
config.read(filenames='/path/twitter.txt')
This creates a config object and reads the keys from the file. Keeping the keys in a separate file is important because we do not want to expose them to others.
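The layout of the key file is not shown above. As a minimal sketch, assuming a single [twitter] section whose option names match the ones passed to config.get later in this tutorial, the file could look like the sample below. The placeholder values and the temporary file are illustrative only.

```python
import configparser
import os
import tempfile

# Assumed layout of the key file: one [twitter] section with four options.
# The values are placeholders, not real credentials.
SAMPLE_KEYS = """\
[twitter]
apikey = YOUR_API_KEY
apisecretkey = YOUR_API_SECRET_KEY
accesstoken = YOUR_ACCESS_TOKEN
accesstokensecret = YOUR_ACCESS_TOKEN_SECRET
"""

# Write the sample to a temporary file so the example runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write(SAMPLE_KEYS)
    key_path = handle.name

config = configparser.RawConfigParser()
# read() returns the list of files it managed to parse; an empty list
# means the path was wrong or the file was unreadable.
loaded_files = config.read(key_path)

print(config.sections())
print(config.get("twitter", "apikey"))

# Clean up the temporary file.
os.remove(key_path)
```

Checking the return value of config.read is worthwhile because a wrong path does not raise an error; the parser simply stays empty.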
We will use the Tweepy library to extract data from Twitter. You can read more about Tweepy in its documentation.
Install and import tweepy:
!pip install tweepy
import tweepy as tw
Now we need to read our API keys from the config object. We can do that with the .get method:
accesstoken = config.get('twitter', 'accesstoken')
accesstokensecret = config.get('twitter', 'accesstokensecret')
apikey = config.get('twitter', 'apikey')
apisecretkey = config.get('twitter', 'apisecretkey')
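If an option is missing from the file, config.get raises an error that does not say which other keys are also absent. The sketch below shows one way to check all four keys at once. The helper name load_credentials and the REQUIRED_KEYS tuple are hypothetical, introduced only for illustration.

```python
import configparser

# Option names used in this tutorial's [twitter] section.
REQUIRED_KEYS = ("apikey", "apisecretkey", "accesstoken", "accesstokensecret")


def load_credentials(config, section="twitter"):
    """Return the four credentials as a dict, or raise ValueError if any is missing or blank."""
    if not config.has_section(section):
        raise ValueError(f"missing section [{section}] in key file")
    missing = [
        key
        for key in REQUIRED_KEYS
        if not config.has_option(section, key) or not config.get(section, key).strip()
    ]
    if missing:
        raise ValueError("missing or empty keys: " + ", ".join(missing))
    return {key: config.get(section, key).strip() for key in REQUIRED_KEYS}


# Demonstration with an in-memory config that lacks one key.
demo = configparser.RawConfigParser()
demo.read_string("[twitter]\napikey = a\napisecretkey = b\naccesstoken = c\n")
try:
    load_credentials(demo)
except ValueError as error:
    print(error)
```

With a complete file, the returned dict can be unpacked into the same four variables used in the rest of the tutorial.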
The next step is authentication using OAuthHandler:
auth = tw.OAuthHandler(apikey, apisecretkey)
auth.set_access_token(accesstoken, accesstokensecret)
api = tw.API(auth, wait_on_rate_limit=True)
We have now authenticated and connected to Twitter through the API.
The next step is to define the search word (a Twitter hashtag) and the date from which we want to scrape tweets:
search_word = '#coronavirus'
date_since = '2020-05-28'
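Before sending a request, it can help to check the date string and to decide whether retweets should be included. The sketch below is one possible approach, not part of the original workflow. The helper names build_query and validate_date are hypothetical; "-filter:retweets" is a standard Twitter search operator.

```python
from datetime import date


def build_query(hashtag, exclude_retweets=True):
    """Return a search query for a hashtag, optionally filtering out retweets."""
    tag = hashtag if hashtag.startswith("#") else "#" + hashtag
    if exclude_retweets:
        # "-filter:retweets" asks the search endpoint to leave out retweets.
        return tag + " -filter:retweets"
    return tag


def validate_date(text):
    """Parse a YYYY-MM-DD string into a date; raises ValueError if it is malformed."""
    return date.fromisoformat(text)


search_word = build_query("#coronavirus")
date_since = validate_date("2020-05-28").isoformat()
print(search_word)
print(date_since)
```

Excluding retweets is a design choice: it reduces duplicate text in the results, at the cost of losing information about how widely a tweet was shared.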
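To scrape tweets, create a Tweepy Cursor and pass it the API search method along with its parameters: search word, start date, language, etc. Calling .items(1000) returns an ItemIterator that yields up to 1,000 tweets. Note that in Tweepy 4.0 and later, api.search is named api.search_tweets.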
tweets = tw.Cursor(api.search, q=search_word, lang='en', since=date_since).items(1000)
We now have the tweets related to coronavirus in the tweets object. To get the details of these tweets, we use a list comprehension that grabs the geo, tweet text, user name and user location of each tweet.
tweet_details = [[tweet.geo, tweet.text, tweet.user.screen_name, tweet.user.location] for tweet in tweets]
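Once the details are in a list of lists, a natural next step is to save them. The sketch below writes the same four fields to CSV using only the standard library. To keep it runnable without API access, it uses SimpleNamespace stand-ins that mimic only the attributes read above; the helper names, column names and sample tweet are hypothetical.

```python
import csv
import io
from types import SimpleNamespace

# Column names chosen for this example.
COLUMNS = ["geo", "text", "screen_name", "location"]


def tweet_to_row(tweet):
    """Pick the same four fields used in the list comprehension above."""
    return [tweet.geo, tweet.text, tweet.user.screen_name, tweet.user.location]


def write_rows(rows, stream):
    """Write a header line followed by one CSV line per tweet."""
    writer = csv.writer(stream)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


# Stand-in objects that imitate the attributes of a Tweepy status.
fake_tweets = [
    SimpleNamespace(
        geo=None,
        text="Stay safe, everyone",
        user=SimpleNamespace(screen_name="example_user", location="London"),
    ),
]

rows = [tweet_to_row(tweet) for tweet in fake_tweets]
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open(path, "w", newline="", encoding="utf-8") in place of the StringIO buffer. The csv module quotes text that contains commas, so tweet text stays in a single column.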
That is all on how to scrape tweets from Twitter.