Using the Twitter API

What is it?

In this lesson, you will gain experience using simple commands to access data via the Twitter API.

Why should I learn it?

Python is a server-side language used for data analysis and as the basis for frameworks like Django. It is helpful to know more than one programming language, so you can choose the correct one for your needs.

What can I do with it?

Many of the techniques we have covered previously, such as strings, variables, data types, if statements and loops, apply in Python as well, but the syntax for them is different.

How long will it take?

Programming is something that you learn over time, through practice. But in a few hours, you will be introduced to the basic elements of Python and to using it with APIs.

Getting Started with APIs

What is an API? An API, or Application Programming Interface, is something a website or online service designs to make some of its data available to the public, for use in applications or other purposes. Most web services have an API – Facebook, Yelp, Twitter… they all have ways to access their public data. You usually have to sign up for access to an API, and then you are granted access credentials to use in your code.

For the commands below that are run in the Terminal (on Mac) or another command line program, you do not type the $. It just indicates a command line prompt.

The tutorial has been updated for Python3. It provides some very basic ways to get Twitter data via the command line. More information about using this technique can be found in this article: How to Scrape Tweets With snscrape

Using the Twitter API

We are going to use the snscrape library to scrape tweets from Twitter. You need Python 3.8 or higher to use these techniques.

Use this command in the Terminal or Command Line to check your Python3 version:

$ python3 --version

If you don't have Python3 on your computer, you will need to install it.

You also need to have Git installed to do this tutorial. Check your Git version:

$ git --version

Now that you have Python3 and Git installed, run the following command to install the developer version of snscrape.

$ pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git

Scraping a User's Tweets

Use this command on the command line to scrape a user's tweets. The jsonl option outputs the results in JSON Lines format. The progress option reports progress on the command line as the scraping occurs. You can use the max-results option to specify the number of tweets you want.

$ snscrape --jsonl --progress --max-results 100 twitter-search "from:cindyroyal" > user-tweets.json

The result will be in the folder in which you executed this command, in a file named user-tweets.json. Of course, you can modify the command to use a different username and change the name of the JSON output file. See below for converting JSON to CSV for use in a spreadsheet and further analysis.

Scraping a Term or Phrase in Tweets

Use this command on the command line to scrape a search term or hashtag.

$ snscrape --jsonl --progress --max-results 200 twitter-search "sxsw" > text-query-tweets.json

The result will be in the folder in which you executed this command, in a file named text-query-tweets.json. Of course, you can change the search term and the name of the JSON output file. See below for converting JSON to CSV for use in a spreadsheet and further analysis.
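
If you prefer to run these scrapes from a Python script rather than the command line, snscrape also exposes a Python module. The following is a minimal sketch of that approach (not from the original tutorial); the query string and file name are just examples, and tweet attribute names such as content can vary between snscrape versions.

# Minimal sketch: scrape tweets with snscrape's Python module instead of the CLI.
# Works with either a "from:username" query or a plain search term.
# Note: attribute names (e.g., content vs. rawContent) vary by snscrape version.
import json
import snscrape.modules.twitter as sntwitter

query = "from:cindyroyal"   # or a term such as "sxsw"
max_results = 100

with open("user-tweets.json", "w", encoding="utf-8") as outfile:
    for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
        if i >= max_results:
            break
        # Write one JSON object per line, mirroring the --jsonl output.
        outfile.write(json.dumps({
            "date": str(tweet.date),
            "username": tweet.user.username,
            "content": tweet.content,
            "url": tweet.url,
        }) + "\n")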

Scraping with Tweepy

Tweepy is a Python library for accessing the Twitter API. You can find the tutorial that these instructions are based on here.

In the Terminal, use pip3 to install Tweepy. (You should have pip3 installed from the previous tutorial.)

$ pip3 install tweepy

Tweepy requires that you have Twitter Developer authentication credentials to include in your code. Set up a project and Twitter will provide you with a Consumer Key, Consumer Secret, Access Token and Access Token Secret. You will include your own credentials as indicated in the code to the right.

This code also requires that you install pandas, a library that provides data analysis tools for the Python programming language and supplies the dataframe used to format your results.

$ pip3 install pandas

The code samples to the right demonstrate how to use Tweepy to scrape a user's tweets and to scrape a search term. Put the code for each example into a text file with a .py extension (don't name it tweepy.py) and run the file in the Terminal, like so:

$ python3 mytweep.py

The result will be in the folder in which you executed this command, in a file with the name indicated in the tweets_df.to_csv line. The file will be in CSV format, which can be opened with a spreadsheet program.

Code Sample - Scrape a User's Tweets with Tweepy
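
The sample itself appears alongside this text on the original page. As a reference, here is a minimal sketch of what such a script might look like, assuming Tweepy's standard v1.1 interface; the placeholder credentials, column names and output file name are examples you should adapt.

# Minimal sketch: scrape a user's recent tweets with Tweepy and save them to a CSV.
# Replace the placeholder strings with your own Twitter Developer credentials.
import tweepy
import pandas as pd

consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

# Collect up to 100 tweets from one user's timeline.
tweets = tweepy.Cursor(api.user_timeline,
                       screen_name="cindyroyal",
                       tweet_mode="extended").items(100)

data = [[t.created_at, t.id, t.full_text] for t in tweets]
tweets_df = pd.DataFrame(data, columns=["created_at", "id", "text"])
tweets_df.to_csv("user-tweets.csv", index=False)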

Code Sample - Scrape a Search Term with Tweepy
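
Again, the original sample appears alongside this text. A minimal sketch of a search-term version might look like the following; note that the search method name depends on your Tweepy version, and the placeholder credentials and file name are examples.

# Minimal sketch: scrape tweets matching a search term with Tweepy and save them to a CSV.
# Replace the placeholder strings with your own Twitter Developer credentials.
import tweepy
import pandas as pd

consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

# api.search_tweets is the Tweepy 4.x name; in Tweepy 3.x this method was api.search.
tweets = tweepy.Cursor(api.search_tweets,
                       q="sxsw",
                       tweet_mode="extended").items(200)

data = [[t.created_at, t.user.screen_name, t.full_text] for t in tweets]
tweets_df = pd.DataFrame(data, columns=["created_at", "username", "text"])
tweets_df.to_csv("text-query-tweets.csv", index=False)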

Other Tools

JSON to CSV Converter - use this script to convert the JSON created above into a CSV. Download the CSV and open it in a spreadsheet program. You can then isolate the text field and use the tools below for further analysis.
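
If you would rather do this conversion yourself in Python, a minimal sketch using pandas could look like the following; the file names are just examples, and nested fields will be written out as plain text columns.

# Minimal sketch: convert the JSON Lines output from snscrape into a CSV with pandas.
import pandas as pd

# The --jsonl option writes one JSON object per line, so read with lines=True.
df = pd.read_json("user-tweets.json", lines=True)
df.to_csv("user-tweets.csv", index=False)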

Tag Crowd - The Tag Crowd site makes quick word visualizations. It’s easy to use and flexible. Go to tagcrowd.com and insert your text. Then run the visualization. You may have to exclude common words before the visualization is meaningful. I like to show frequencies and display 100 words maximum. Play with the settings to get the right visualization for your topic.

Word Frequency - One of the things you might want to do is run a word frequency script to determine which words are used the most. This is similar to what sites like TagCrowd.com do when they want to visualize terms in a word cloud. But you can use a Python script to get word counts. You might want to use this in some manner in your analysis, like using the data in a chart on your site.

Find the script to the right, put it in a text file and name it wordfreq.py. Be careful to retain the indentation.

Copy the text you want to analyze and put it in a .txt file in the same folder as your script. Run the script in the Terminal with Python. It will ask for an input file (the .txt file that contains your text) and an output file (the name of the file that will hold the word counts; give it a .txt extension).

$ python3 wordfreq.py

You can then open the file in a spreadsheet program and sort by frequency. Once you get past common words like a, an and the, you will start to see the meaningful words used in the text.

Code Sample - Word Frequency Script
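
The script itself appears alongside this text on the original page. As a reference, here is a minimal sketch that matches the behavior described above (prompting for an input file and an output file); it is not necessarily the exact script from the tutorial.

# Minimal sketch of a word-frequency script (wordfreq.py).
# It prompts for an input text file and writes word counts to an output file.
from collections import Counter
import re

infile = input("Enter the name of the input file: ")
outfile = input("Enter the name of the output file: ")

with open(infile, encoding="utf-8") as f:
    text = f.read().lower()

# Keep runs of letters and apostrophes as words, then count them.
words = re.findall(r"[a-z']+", text)
counts = Counter(words)

# Write one word and its count per line, most frequent first.
with open(outfile, "w", encoding="utf-8") as f:
    for word, count in counts.most_common():
        f.write(f"{word}\t{count}\n")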

Moving On

Now you have a basic understanding of API concepts. Practice using these techniques and do some research on how to use other APIs.
