How to Crawl Google Search Results
Google, Bing, Baidu, and YouTube are the most popular search engines on the web today. For marketers and web scrapers, they are mouthwatering sources of big data. Google stands out in particular: 12% of all internet traffic goes through it. It has several products that are worth scraping. For example, you can scrape
- YouTube for machine learning (NLP),
- Google Maps for location-specific directory data,
- Google Images for building image classification models,
- Google News for tracking trends,
- Google Trends itself for the most popular topics, and, of course,
- Google Search for SEO, ad optimization, competitor tracking, and more.
Why scrape Google Search results (SERPs)?
Google has done a stellar job of developing algorithms that weigh both search keywords and search intent. Google Search efficiently matches users' queries to the right web pages on the internet. This creates an opportunity for marketers to scrape Google results and stay on top of
- what the world is talking about,
- what their competitors are doing, and
- how their own search-engine-focused marketing efforts are performing.
Use cases of scraping Google Search results
With search engine results pages (SERPs), everything starts with a keyword. It can be the name of an industry, a kind of service, or a product. Octoparse users crawl SERPs for:
- Competitor tracking and analysis
You can scrape SERP results, including both paid ads and organic results. This helps you build a database of competitor information, such as new product launches, PR efforts, organic news coverage, and the market segments your competitors target. It also helps you analyze how your own pages perform for particular keywords.
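To make "analyze how your own pages perform" concrete, here is a minimal Python sketch that finds where your site ranks in a list of scraped result URLs. It assumes the URLs are already in the order they appeared on the SERP; the function name `find_rank` and the example domains are hypothetical.

```python
from urllib.parse import urlparse


def find_rank(result_urls, domain):
    """Return the 1-based position of the first result on `domain`, or None.

    Assumes `result_urls` lists the organic results in SERP order
    (for example, exported from a scraping run).
    """
    domain = domain.lower()
    for position, url in enumerate(result_urls, start=1):
        host = (urlparse(url).hostname or "").lower()
        # Match the bare domain as well as any subdomain (www., shop., ...).
        if host == domain or host.endswith("." + domain):
            return position
    return None


results = [
    "https://www.competitor-a.example/solar-panels",
    "https://shop.example.com/buy-solar",
    "https://www.competitor-b.example/",
]
print(find_rank(results, "example.com"))  # 2
```

Running the same check for each keyword you care about over time gives a simple rank history.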
- Sentiment analysis
Scraping SERP results, especially blog posts from influencers and news coverage by media houses, can help you analyze sentiment. For example, a marketing and analytics company can scrape posts related to a political leader and build neural-network-based models to analyze public sentiment.
- Business research & lead generation
If you're starting a new business, or exploring new domains to expand an existing one, you can collect a set of companies that are already pursuing your idea, research their products, and so on. For example, if you want to start a solar panel retail business in your hometown "XYZ", you can scrape the search results for "buy solar panels in XYZ". You'll get a list of businesses (if any exist) in URL format. The same method also works for lead generation. Say you want to sell website-building services to solar panel sellers in your region: those search results become your list of prospects.
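Going from a list of result URLs to a list of businesses can be sketched in a few lines of Python. This is a minimal illustration, assuming you already have the result URLs from a scrape; `unique_businesses` and the example hosts are hypothetical, and treating one host as one business is a simplification.

```python
from urllib.parse import urlparse


def unique_businesses(result_urls):
    """Collapse scraped result URLs into an ordered, de-duplicated list of hosts.

    Several results often point at the same business site, so only the first
    occurrence of each host is kept (a leading "www." is ignored).
    """
    seen = set()
    leads = []
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[len("www."):]
        if host and host not in seen:
            seen.add(host)
            leads.append(host)
    return leads


urls = [
    "https://www.sunshop-xyz.example/panels",
    "https://sunshop-xyz.example/contact",
    "https://green-energy-xyz.example/buy",
]
print(unique_businesses(urls))  # ['sunshop-xyz.example', 'green-energy-xyz.example']
```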
Demo: Scraping SERPs for SEO benefits
We'll now demonstrate scraping Google SERPs with an example. Suppose you run a health blog. You have previously written about diseases of the eyes, lungs, and kidneys, and now you want to cover heart disease too. So you can scrape Healthline for all of its "Heart-Disease" articles.
The search query we use is "site:https://www.healthline.com/health/heart-disease". Note the "site:" operator: it is advanced Google Search syntax that restricts results to a selected domain or URL path.
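For reference, the same query can be expressed as a Google Search URL, with the query text carried in the `q` parameter. Here is a minimal sketch using Python's standard library; `google_search_url` is a hypothetical helper, and Google may append other parameters of its own when you search from a browser.

```python
from urllib.parse import urlencode


def google_search_url(query):
    """Build a Google Search results URL for `query`.

    urlencode percent-encodes characters such as ":" and "/" in the
    query and turns spaces into "+".
    """
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_search_url("site:https://www.healthline.com/health/heart-disease"))
# https://www.google.com/search?q=site%3Ahttps%3A%2F%2Fwww.healthline.com%2Fhealth%2Fheart-disease
```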
We can scrape the following fields from the SERP:
The title can offer insights such as the tone of your competitors' headlines. Titles set the reader's expectations, so analyzing competitor titles can help you write more effective titles of your own.
URLs affect SEO too, because they can contain keywords. So you can also analyze your competitors' URL strategy.
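One simple way to start analyzing a competitor's URL strategy is to break each URL path into its slug words and count them. Here is a minimal sketch, assuming hyphen- or underscore-separated slugs (common, but not universal); `url_keywords` and `keyword_counts` are hypothetical helper names.

```python
from collections import Counter
from urllib.parse import urlparse


def url_keywords(url):
    """Split the path of `url` into lowercase keyword tokens.

    Splits on "/", "-" and "_"; the domain and query string are ignored.
    """
    tokens = []
    for segment in urlparse(url).path.split("/"):
        for word in segment.replace("_", "-").split("-"):
            if word:
                tokens.append(word.lower())
    return tokens


def keyword_counts(urls):
    """Count how often each slug word appears across a set of URLs."""
    return Counter(word for url in urls for word in url_keywords(url))


print(url_keywords("https://www.healthline.com/health/heart-disease"))
# ['health', 'heart', 'disease']
```

Feeding `keyword_counts` all the URLs scraped for one competitor shows which terms they put into their slugs most often.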
The meta-description is a short summary of the page, defined in its HTML, that often appears under the blue links on SERPs.
Meta-keywords declare what an article is about, but because many publishers misused them, Google stopped giving them any weight for SEO. So we'll skip them for now.
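The title and snippet shown on a SERP often come from the page's own `<title>` element and `<meta name="description">` tag, although Google sometimes rewrites them. If you want to compare the SERP text with what a page actually declares, here is a minimal sketch using Python's built-in `html.parser`; the class and function names and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class TitleMetaParser(HTMLParser):
    """Collect a page's <title> text and <meta name="description"> content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


def title_and_description(html):
    parser = TitleMetaParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description


sample = ('<html><head><title>Heart Disease: Types, Causes</title>'
          '<meta name="description" content="Learn about heart disease."></head></html>')
print(title_and_description(sample))
# ('Heart Disease: Types, Causes', 'Learn about heart disease.')
```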
How to build a Google crawler with Octoparse
There are several tutorials on the web explaining how to scrape Google using Python and related technologies, but these usually require programming knowledge. There are easier ways to scrape the web, such as point-and-click tools. We want to give our audience an easy way out: a web scraping tool that requires no coding knowledge. All you need to do is enter the keyword, and Octoparse will open the search results page and scrape all the listings for you.
Download Octoparse and install it on your computer. Register if you don't have an account yet, then log in.
Octoparse lets you scrape Google search results in three ways:
Method 1: Using pre-built templates
#Step 1: Open the Google task template by clicking the "+New" button and then "Task Template".
#Step 2: Enter the keywords you want to search for on Google.
#Step 3: Sit back and let Octoparse do the job for you.
To export data scraped with pre-built templates, you need a subscription plan. Here is a sneak peek at the scraped data.
Method 2: Using Advanced Template and Auto-detection Feature
Step 1: Create a new template using Advanced Mode
Step 2: Enter the starting website URL and click Save.
Step 3: Select Auto-detect web page data on the Tips menu.
Wait until auto-detect finishes page-processing.
When Octoparse finishes processing the page, it detects several candidate data patterns and presents them to you. If the default result isn't what you're looking for, click Switch auto-detect results on the Action Tips menu to choose another pattern. In our case, the default result is what we need, so we create the workflow by clicking Create workflow. Note that the detected data points are highlighted in a light red shade with a dark border.
Here's how the final workflow looks with pagination.
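Octoparse handles pagination inside the workflow. Conceptually, that step keeps requesting the next results page until no more results come back. The sketch below mimics such a loop in Python, assuming Google's `start` offset parameter and a page size of 10 results; `fetch_results` stands in for a hypothetical function that downloads and parses one page, and the demo uses canned data instead of real requests.

```python
from urllib.parse import urlencode

RESULTS_PER_PAGE = 10  # assumption: default number of results per page


def page_url(query, page_index):
    """URL of results page `page_index` (0-based), using the `start` offset."""
    params = {"q": query, "start": page_index * RESULTS_PER_PAGE}
    return "https://www.google.com/search?" + urlencode(params)


def scrape_all_pages(query, fetch_results, max_pages=10):
    """Follow pagination until a page comes back empty or `max_pages` is hit."""
    records = []
    for page_index in range(max_pages):
        page = fetch_results(page_url(query, page_index))
        if not page:  # an empty page means we ran past the last one
            break
        records.extend(page)
    return records


# Offline demo: canned pages keyed by the `start` offset, no HTTP involved.
canned = {0: ["result 1", "result 2"], 10: ["result 3"]}


def fake_fetch(url):
    start = int(url.rsplit("start=", 1)[1])
    return canned.get(start, [])


all_results = scrape_all_pages("site:https://www.healthline.com/health/heart-disease", fake_fetch)
print(all_results)  # ['result 1', 'result 2', 'result 3']
```

The `max_pages` cap is a safety net so a misbehaving page source cannot keep the loop running indefinitely.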
We deleted some unwanted data fields and kept only three: Title, URL, and Meta-description. Finally, we click Save at the top and Run the task to scrape the Google search results.
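Keeping only the fields you need also matters when you export. As a minimal sketch, here is how records with the three fields Title, URL, and Meta-description could be written to CSV text with Python's `csv` module, ready to save and import into Google Sheets. The record shape (dicts keyed by field name) is an assumption, and `extrasaction="ignore"` quietly drops any extra fields.

```python
import csv
import io

FIELDS = ["Title", "URL", "Meta-description"]


def to_csv(records):
    """Serialize scraped records (dicts keyed by FIELDS) as CSV text.

    Keys not listed in FIELDS are ignored; missing keys become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


rows = [{"Title": "A", "URL": "https://x.example/a",
         "Meta-description": "d, with comma", "Extra": "dropped"}]
print(to_csv(rows))
```

Values containing commas, such as many meta-descriptions, are quoted automatically by the `csv` module.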
Now run the task on your device and wait until it finishes. If you want to scrape at scale, subscribe to an Octoparse plan. The benefits of a premium scraping plan include
- Faster scraping speed,
- Cloud extraction,
- Scheduled scraping, and
- Overcoming anti-scraping technologies through IP proxy and user-agent rotation.
Observe how we scraped all 76 SERP results in a mere 38 seconds.
Let’s take a look at the data in Google Sheets:
Method 3: Custom template building using Advanced Mode
The two methods above are easy: with either one, you can start scraping in less than a minute. But if for some reason they don't suit your needs, you can always build a customized Google SERP scraper with the third and final method. Refer to this tutorial to build complex templates for scraping Google SERP results.
Extracting data from SERPs can be very useful to businesses. In this tutorial, we explained several use cases for scraping Google and explored three ways to scrape Google SERPs with Octoparse, a market-leading no-code scraping solution. If you're scraping at scale, you can use Octoparse Enterprise Services to outsource your scraping requirements on an affordable budget.