How to Scrape and Download Tweets for Free
Twitter has 187 million monetizable daily active users, with the USA, Japan, and India being its largest user bases. You can extract tweet data from Twitter profiles, hashtags, and timelines for several use cases:
- Brand Monitoring
- Predictive Analytics
- Competitor Tracking
- Sentiment Analysis
- Training ML Models
- Industry Trends Analysis
- Market Research
- Marketing Optimization
- New Product Innovation
There are several ways to extract data from Twitter:
- Using a web scraping tool
- Using open-source scraping packages
- Using Twitter APIs
“Click and scrape” web scraping tools don’t require you to write any code, which makes them the easiest way to scrape tweets. Open-source scraping packages require you to know the programming language they are written in. They are also community-managed, so there is no guarantee of timely updates or bug fixes. The advantage of the Twitter API is that it is scalable and comes from Twitter itself; the downside is the cost: scraping 5 million tweets requires about $2.5k in API fees plus developer salary and network resources, which adds up to around $10k or more.
With a click-and-scrape tool you can save up to 97% compared to the other approaches, as the professional plan (which scrapes tweets at speed and scale) costs just $200. Technically, you can save the full 100% by using the free plan, but the free plan is not recommended for big-data use cases. Choose whichever option suits your scraping budget and requirements.
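The savings comparison above is simple arithmetic, and a short Python sketch makes it easy to redo with your own numbers. The figures below are the ones quoted in this article; the $10,000 baseline is an assumption (the lower bound of the "10k+" estimate), and the function and constant names are made up for illustration. With that baseline the professional plan works out to about 98% saved, in the same range as the figure quoted above.

```python
def savings_fraction(baseline_cost: float, alternative_cost: float) -> float:
    """Fraction of the baseline cost saved by switching to the alternative."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (baseline_cost - alternative_cost) / baseline_cost


# Figures quoted in the article (USD). 10_000 is an assumed lower bound
# for the "10k+" total cost of the API route.
API_TOTAL_COST = 10_000
PRO_PLAN_COST = 200
FREE_PLAN_COST = 0

pro_savings = savings_fraction(API_TOTAL_COST, PRO_PLAN_COST)
free_savings = savings_fraction(API_TOTAL_COST, FREE_PLAN_COST)

print(f"Professional plan: {pro_savings:.0%} saved")
print(f"Free plan: {free_savings:.0%} saved")
```

A higher API-side total only pushes the saved fraction closer to 100%.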
In this article, we demonstrate how to scrape tweets from Twitter for free using the Octoparse tool:
Method 1: Scrape Twitter using a pre-built tweet scraping template
Step 1: Download Octoparse. Login/register. Select +Task under Task Templates.
Step 2: Select the Twitter template under the Social Media tab.
Step 3: Set up a Twitter data scraping template. You have five options:
- Tweets (URLs): Scrape the latest tweets based on URLs.
- Top tweets (post only): Scrape tweet content, tweet ID, etc.
- Author page (post only): Scrape tweets and likes from the Twitter account page.
- Author page (days before): Custom author page template.
- Advanced Search: Scrape tweets and their content, likes, retweets, Twitter author ID, comments on the post, and comment details.
You can click on any template of your choice, view sample data, and click on use template to start scraping.
On the next screen, enter a hashtag of your choice and the number of scrolls. The scroll count is needed because Twitter uses infinite scroll: new posts load only after the page is scrolled. Next, click Save & Run.
Finally, click on Cloud Extraction, and Octoparse starts scraping tweets for the hashtag #fashion or whatever hashtag you entered.
Note: Using pre-built templates requires you to have a standard or professional plan, which is quite affordable.
Method 2: Building your own custom Twitter scraping template without any coding
Now, we shall build a custom tweet scraper for Elon Musk’s tweets.
Hover over +New and select advanced mode. Then enter the target Twitter account URL in the website field. In our case, it is Elon Musk’s account page. Click Save.
On the next screen, if the auto-detect feature is enabled by default, disable it by clicking on Turn OFF Auto-detect so that we can customize the fields.
Create pagination for the scrolling feature by clicking in a blank area and selecting Loop click single element.
This creates a pagination loop.
Hover over “Click to Paginate” and click on the gear icon to customize the scraper for infinite scroll. First, activate the Ajax load option and set a timeout based on your network speed.
Then, under “After the page is loaded”, set the scroll to “for one screen”, enter 5 in the “repeats” field, and set the wait time to 3s.
Why did we perform the above steps?
Twitter and many other social media websites have implemented infinite scroll to boost engagement and improve the user experience. Setting the scroll repeats and wait time ensures that new tweets are loaded as the screen scrolls, so they can be scraped dynamically.
Why did we not scroll first and then crawl all the tweets at once?
Because Twitter’s page structure removes earlier tweets from the code and dynamically updates the HTML body so that it holds at most 10 to 12 tweets at any time. This is also why, despite only 5 screen scrolls, we still get a few duplicate tweets: consecutive screens overlap. But guess what? It’s easy to get rid of duplicates in Octoparse. We’ll show how in a few seconds.
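To make the reasoning above concrete, here is a minimal Python simulation of scraping a page that keeps only a limited window of tweets in the HTML. It is a sketch under stated assumptions, not a description of how Twitter or Octoparse work internally: the window size of 12 and the 5 scroll repeats come from this tutorial, while the 8 new tweets per scroll, the function names, and the fake timeline are invented for illustration.

```python
def visible_window(timeline, scroll_step, window_size, tweets_per_scroll):
    """Return the tweets present in the page after `scroll_step` scrolls."""
    start = scroll_step * tweets_per_scroll
    return timeline[start:start + window_size]


def scrape_with_scrolling(timeline, repeats=5, window_size=12, tweets_per_scroll=8):
    """Scrape the visible window once per screen: the initial view plus `repeats` scrolls."""
    collected = []
    for step in range(repeats + 1):
        collected.extend(
            visible_window(timeline, step, window_size, tweets_per_scroll)
        )
    return collected


# A fake timeline of 100 tweets, newest first (hypothetical data).
timeline = [f"tweet-{i}" for i in range(100)]

rows = scrape_with_scrolling(timeline)
duplicates = len(rows) - len(set(rows))

print(f"scraped rows: {len(rows)}, duplicates: {duplicates}")
```

Because each scroll advances by fewer tweets than the window holds, neighbouring screens share some tweets, which is where the duplicates come from. Scrolling all the way down first and scraping once would only ever capture the final window.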
Now, select the top-level DIVs of several tweets. Octoparse automatically selects 6 elements. Then, in the Tips window, click on Select all sub-elements. Octoparse intelligently selects the relevant data points.
Delete the unwanted fields and rename the ones you keep. Octoparse selects 15 fields for extraction; we keep only 8 and delete the rest. Next, click on Extract data in the Tips dialog.
Yay! Our tweet scraping template is ready! This is how the workflow looks:
One last thing: to make sure the right tweet blocks are selected, let’s update the XPath for Loop Item: //article[contains(@class,"1dbjc4n")]
Also, for this demonstration, I’ve limited the number of paginations to 10. To change this limit, hover over the Pagination block and click on the gear icon.
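If you are wondering what that XPath expression matches, the Python sketch below emulates it with the standard library: it keeps every article element whose class attribute contains the given fragment. This is only an illustration of the matching rule, not how Octoparse evaluates XPath. The sample HTML and its class names are made up, and the ArticleClassMatcher helper is hypothetical.

```python
from html.parser import HTMLParser


class ArticleClassMatcher(HTMLParser):
    """Collects <article> tags whose class attribute contains a fragment,
    mirroring //article[contains(@class, "...")]."""

    def __init__(self, class_fragment):
        super().__init__()
        self.class_fragment = class_fragment
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "article":
            return
        # A missing or empty class attribute never matches, as in XPath.
        class_value = dict(attrs).get("class") or ""
        if self.class_fragment in class_value:
            self.matches.append(class_value)


# Made-up markup for illustration only.
SAMPLE_HTML = """
<div class="css-1dbjc4n">
  <article class="css-1dbjc4n r-1loqt21">first tweet</article>
  <article class="css-1dbjc4n r-18u37iz">second tweet</article>
  <article class="promo">not a tweet block</article>
  <div class="css-1dbjc4n">wrapper, not an article</div>
</div>
"""

matcher = ArticleClassMatcher("1dbjc4n")
matcher.feed(SAMPLE_HTML)
print(len(matcher.matches))
```

Note that the match is a plain substring test on the whole class attribute, so it depends on generated class names that a site can change at any time.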
That’s it. We’re all set to run the Twitter scraper. Click on the Save button at the top. Ideally, you should click on Save every time you modify your workflow. Now, click on the Run button. You can run the tweet crawler on your device or in the cloud, with an option to schedule runs. We chose to run it on the device:
This Twitter scraper will crawl the latest tweets from Elon Musk. Once the crawling is complete, you’ll see the following screen:
Observe that there are 12 duplicates among the 114 rows of tweet data. When you click on Export data, the next screen presents you with an option to remove duplicates. Of course, we’re going to select that. Why keep duplicate data? Nobody wants it.
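If you ever export the raw rows and want to remove duplicates yourself, the idea is straightforward. The Python sketch below keeps the first occurrence of each row, keyed on a tweet ID. The field names and sample rows are assumptions made up for this example (chosen to mirror the 114 rows and 12 duplicates above); your export will use whatever field names you gave in the workflow.

```python
def drop_duplicate_rows(rows, key_fields):
    """Keep the first occurrence of each row, comparing only `key_fields`."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical export: 102 distinct tweets plus 12 repeated rows.
rows = [{"tweet_id": str(i), "text": f"tweet {i}"} for i in range(102)]
rows += rows[:12]

deduped = drop_duplicate_rows(rows, key_fields=("tweet_id",))
print(f"{len(rows)} rows in, {len(deduped)} rows out")
```

Keying on an ID rather than on the whole row means a tweet still counts as a duplicate if its like or retweet counts changed between two scrolls.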
Now, having removed the duplicates after scraping 100+ tweets in almost no time, let’s export the data in XLS format. You can also export to JSON, CSV, and HTML. If you wish to export to a cloud database or a database on your device, you can do so by clicking on “Export to database”. Press OK.
If you see the following screen (which you certainly will if you’ve followed this tutorial properly), then congrats! You’ve learned to scrape tweets for free.
Here’s how it looks in Google Sheets:
In this tutorial on how to scrape and download tweets for free, we looked at the ways businesses can harness social data, and tweet data in particular. From brand monitoring to keeping an eye on competitors, and from training ML models to developing new products, social data can help transform your business if used well. We touched on different approaches to scraping tweets: open-source libraries, the Twitter APIs, and no-code scraping tools. We then demonstrated the process with my favorite no-code web scraping tool, and one of the most popular, Octoparse. We covered two ways to scrape tweets:
- Using pre-built templates, which takes barely 20 seconds to start scraping tweets at scale and speed.
- Using a custom-built tweet scraping workflow, which takes less than 3 minutes to set up and start scraping.
Wish you a great time scraping Twitter!