Acquiring Social Media Data

What is an API?

– API stands for Application Programming Interface.

– An API lets one piece of software interact with a website or service programmatically.

– An API call consists of a request and a response; the response carries structured data (e.g., JSON).
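A minimal sketch of that request/response cycle in Python, using only the standard library. The request is a URL with encoded query parameters, and the response body is JSON decoded into Python objects. The endpoint URL, parameter names, and sample response are hypothetical placeholders, not a real service, and no network call is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.com/v1/posts"


def build_request_url(base_url: str, params: dict) -> str:
    """Encode query parameters onto an endpoint URL, as a GET request would."""
    return f"{base_url}?{urlencode(params)}"


def parse_response(body: str):
    """Decode a JSON response body into Python lists and dicts."""
    return json.loads(body)


url = build_request_url(BASE_URL, {"screen_name": "example_user", "count": 200})
print(url)

# A made-up response body shaped like a list of tweet objects.
sample_body = '[{"id_str": "2", "text": "second"}, {"id_str": "1", "text": "first"}]'
tweets = parse_response(sample_body)
for tweet in tweets:
    print(tweet["id_str"], tweet["text"])
```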

 

Example: Twitter

– Collect tweets:

1) User timeline: GET statuses/user_timeline; gets the most recent tweets posted by a user. It is limited to the last 3,200 tweets and returns up to 200 at a time, so you must page through the results. Rate limit: 900 requests per 15 minutes. Examples: collecting tweets from individual news organizations or individual members of Congress.

2) Search: GET search/tweets; searches recent tweets (a sample of tweets from the last 7 days). It returns up to 100 at a time, so you must page through the results. Results are not the same as those from search on the Twitter website. Rate limit: 180 requests per 15 minutes. Example: getting tweets about an event.

3) Filter stream: POST statuses/filter; real-time filtering of all public tweets. Additional tweets keep arriving over a single call to the API, so there is no paging. Limits: at high volume, you will not receive all matching tweets, and only one stream can be open at a time per set of credentials. Example: the Women’s March.
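The paging required by the user timeline endpoint can be sketched as follows, assuming the v1.1 convention of passing `max_id` (the lowest tweet ID already seen, minus one) to request older tweets. `fetch` stands in for one authenticated API call; `make_fake_fetch` is a hypothetical in-memory simulation so the sketch runs offline, and the 3,200 cap is enforced client-side here rather than by the server.

```python
from typing import Callable, Optional

# Hypothetical stand-in for one GET statuses/user_timeline call:
# fetch(count, max_id) returns up to `count` tweets, newest first,
# with id <= max_id (or the newest tweets when max_id is None).
FetchPage = Callable[[int, Optional[int]], list]


def page_timeline(fetch: FetchPage, count: int = 200, cap: int = 3200) -> list:
    """Collect a timeline page by page with max_id, keeping at most `cap` tweets."""
    collected = []
    max_id = None
    while len(collected) < cap:
        page = fetch(count, max_id)
        if not page:
            break  # no older tweets are available
        collected.extend(page)
        # Request only tweets strictly older than the oldest one seen so far.
        max_id = min(tweet["id"] for tweet in page) - 1
    return collected[:cap]


def make_fake_fetch(total: int) -> FetchPage:
    """Simulate an account whose tweets have ids 1..total (offline demo only)."""
    ids = list(range(total, 0, -1))  # newest first

    def fetch(count: int, max_id: Optional[int]) -> list:
        eligible = [i for i in ids if max_id is None or i <= max_id]
        return [{"id": i} for i in eligible[:count]]

    return fetch


tweets = page_timeline(make_fake_fetch(5000))
print(len(tweets), tweets[0]["id"], tweets[-1]["id"])  # 3200 5000 1801
```

Sixteen full pages reach the 3,200-tweet cap, so a single account uses only a small fraction of the 900-request window.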
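Because rate limits count requests, the page size determines how many tweets fit in one 15-minute window. A small worked sketch, assuming the search figures listed above (100 tweets per request, 180 requests per window) and that every page comes back full, which real searches often do not:

```python
import math

WINDOW_SECONDS = 15 * 60  # one rate-limit window


def max_tweets_per_window(requests_per_window: int, tweets_per_request: int) -> int:
    """Upper bound on tweets retrievable in one window if every page is full."""
    return requests_per_window * tweets_per_request


def min_delay_seconds(requests_per_window: int) -> float:
    """Even spacing between requests that keeps a steady client under the limit."""
    return WINDOW_SECONDS / requests_per_window


def windows_needed(target_tweets: int, requests_per_window: int, tweets_per_request: int) -> int:
    """Best-case number of 15-minute windows needed to collect `target_tweets`."""
    per_window = max_tweets_per_window(requests_per_window, tweets_per_request)
    return math.ceil(target_tweets / per_window)


# Search: up to 100 tweets per request, 180 requests per window.
print(max_tweets_per_window(180, 100))   # 18000
print(min_delay_seconds(180))            # 5.0
print(windows_needed(50_000, 180, 100))  # 3
```

For the user timeline (200 per request, 900 requests), the same functions give 180,000 tweets per window and one request per second, although the 3,200-tweet cap per account is reached long before that.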
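A filter stream delivers messages over one long-lived connection rather than in pages. The sketch below assumes the v1.1 streaming format as we understand it: one JSON message per line, blank keep-alive lines, and occasional limit notices of the form `{"limit": {"track": N}}` reporting how many matching tweets were not delivered. The connection is simulated with `io.StringIO`; this is not a full client (no authentication, reconnection, or backoff).

```python
import io
import json


def read_stream(lines):
    """Split line-delimited stream messages into tweets and the latest limit count."""
    tweets = []
    undelivered = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank keep-alive line carries no data
        message = json.loads(line)
        if "limit" in message:
            # Assumed limit-notice shape: matching tweets that were not delivered.
            undelivered = message["limit"]["track"]
        elif "text" in message:
            tweets.append(message)
    return tweets, undelivered


# Simulated connection contents; a real stream stays open indefinitely.
fake_stream = io.StringIO(
    '{"id_str": "1", "text": "#WomensMarch"}\r\n'
    "\r\n"
    '{"limit": {"track": 42}}\r\n'
    '{"id_str": "2", "text": "marching"}\r\n'
)
tweets, missed = read_stream(fake_stream)
print(len(tweets), missed)  # 2 42
```

Keeping the undelivered count alongside the collected tweets is one way to document how incomplete a high-volume collection is.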

– You can never assume that you have all the data.

– Resources for existing Twitter datasets: DocNow (Tweet Catalog); TweetSets.

– According to Twitter’s terms, you cannot share complete tweets; you may share only the tweet IDs.

– Once a tweet has been deleted, it can no longer be shared.
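Sharing only tweet IDs is often called dehydrating a dataset. A minimal sketch, assuming tweets are stored as decoded JSON objects with an `id_str` field and that you already know which IDs have been deleted (the `deleted_ids` set is a hypothetical input; detecting deletions is not shown):

```python
def dehydrate(tweets, deleted_ids=frozenset()):
    """Reduce collected tweets to shareable ID strings, skipping deleted tweets."""
    ids = []
    for tweet in tweets:
        # Use the string form: numeric IDs can lose precision in some JSON parsers.
        tweet_id = tweet["id_str"]
        if tweet_id not in deleted_ids:
            ids.append(tweet_id)
    return ids


def to_id_file(ids):
    """Format IDs one per line, a common layout for shared tweet-ID datasets."""
    return "".join(tweet_id + "\n" for tweet_id in ids)


collected = [
    {"id_str": "1001", "text": "first"},
    {"id_str": "1002", "text": "later deleted"},
    {"id_str": "1003", "text": "third"},
]
ids = dehydrate(collected, deleted_ids={"1002"})
print(ids)  # ['1001', '1003']
id_file = to_id_file(ids)
```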

– How do you collect Twitter data?

Twarc: github.com/docnow/twarc

Twurl: github.com/twitter/twurl

Social Feed Manager: go.gwu.edu/sfmgw

TAGS (Twitter Archiving Google Sheet): tags.hawksey.info

 

Example: Facebook

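– Graph API Explorer: a tool for trying out Graph API requests.

– Responses are returned as JSON.

– Data is collected by node (e.g., a page or a post).

– Only public pages can be collected.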
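Graph API edges (for example, a page's posts) come back as JSON with a `data` list and, when more results exist, a `paging.next` URL. A minimal sketch of following those links, assuming a hypothetical `fetch` function that performs an authenticated GET and returns the decoded body. The URLs, the version placeholder `vX.Y`, `PAGE_ID`, and the cursor value below are made up, and a real request would also need an access token.

```python
from typing import Callable, Iterator, Optional

# Hypothetical fetcher: performs a GET on `url` and returns the decoded JSON body.
Fetch = Callable[[str], dict]


def iterate_edge(fetch: Fetch, first_url: str) -> Iterator[dict]:
    """Yield every item on an edge by following paging.next links until none remain."""
    url: Optional[str] = first_url
    while url:
        body = fetch(url)
        yield from body.get("data", [])
        url = body.get("paging", {}).get("next")


# Made-up responses keyed by URL, so the sketch runs offline.
FIRST_URL = "https://graph.facebook.com/vX.Y/PAGE_ID/posts"
fake_pages = {
    FIRST_URL: {
        "data": [{"id": "1", "message": "first"}, {"id": "2", "message": "second"}],
        "paging": {"next": FIRST_URL + "?after=CURSOR"},
    },
    FIRST_URL + "?after=CURSOR": {"data": [{"id": "3", "message": "third"}]},
}

posts = list(iterate_edge(lambda url: fake_pages[url], FIRST_URL))
print([post["id"] for post in posts])  # ['1', '2', '3']
```

Following the `next` link, rather than building page URLs by hand, keeps the client independent of how the cursor is encoded.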