How to Scrape Amazon Products (+ ASINs)
If information is power, price information may make you a retail powerhouse.
Research from Syracuse University found that customers pay attention to external prices. Keeping an eye on your competitors’ pricing strategy can therefore help you price competitively and hold on to customer loyalty.
Web scraping is an ideal way to collect this pricing information. Here’s how the article is structured:
- Use-Cases Of Scraping Ecommerce Product Data
- Ways To Scrape Amazon Products
- How To Scrape Amazon Products Using Octoparse
- What Is An Amazon ASIN?
- Scraping ASINs Of Amazon Products Alongside Other Useful Data
- Conclusion – Actionable Next Steps
Use-Cases Of Scraping eCommerce Product Data
- Competition mapping and SWOT analysis of your own as well as your competitor’s business by analyzing the number of products available, delivery, pricing, reviews, etc.
- Competitor’s price monitoring for intelligently pricing your products
- Sentiment analysis to understand which products might work and which won’t
- Predictive analytics to support developing new products, pricing them right, and marketing them in the right geographies.
Ways To Scrape Amazon eCommerce Product Data
You can scrape amazon.com in three ways –
- Manual Scraping
A fine strategy if you only need to scrape a handful of products, say 10 or 20. Not recommended for anything beyond 40-50 products, where it becomes very time-intensive and costly.
- Crawling using generic scripts
A good scraping strategy when you’re crawling thousands of websites and have low data consistency requirements. You need expertise in Python, JavaScript, or another language suited to crawling.
- SaaS Scraping Tools
The recommended strategy for scraping eCommerce websites like Amazon, Walmart, Macy’s, Alipay, eBay, or any other website. Why? These tools are:
- Easy to set up.
- Easy to start.
How To Scrape Amazon Product Data
We shall now scrape Amazon product data using Octoparse, a SaaS-based tool with a graphical interface that can scrape any website, including websites built with dynamic JavaScript and AJAX. But before getting into that, let’s understand what an Amazon ASIN is and why it is important.
What is an Amazon ASIN?
ASIN is Amazon’s version of UPC, EAN, or ISBN. ASIN is an acronym for Amazon Standard Identification Number. It’s a 10-character alphanumeric code that uniquely identifies each product on Amazon. For books that have a 10-digit ISBN, the ASIN is the same as the ISBN.
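Because an ASIN is a 10-character alphanumeric code, a scraped value can be sanity-checked before it goes into a data set. The snippet below is a minimal sketch, not an official validator: it assumes ASINs use only the characters A-Z and 0-9, and the function names and the sample values (`B0EXAMPLE1` and so on) are made up for illustration.

```python
import re

# Assumption: an ASIN is exactly 10 characters drawn from A-Z and 0-9.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")


def looks_like_asin(value: str) -> bool:
    """Return True if value has the shape of an ASIN after trimming and upper-casing."""
    return ASIN_PATTERN.fullmatch(value.strip().upper()) is not None


def clean_asins(values):
    """Normalize values, drop malformed ones, and de-duplicate while keeping order."""
    seen = set()
    cleaned = []
    for raw in values:
        candidate = raw.strip().upper()
        if looks_like_asin(candidate) and candidate not in seen:
            seen.add(candidate)
            cleaned.append(candidate)
    return cleaned


# Hypothetical scraped values: stray whitespace, a duplicate, and a malformed entry.
scraped = [" b0example1 ", "B0EXAMPLE1", "bad-value", "0123456789"]
print(clean_asins(scraped))
```

A shape check like this only catches obviously broken values; it cannot tell whether a well-formed code belongs to a real product.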
Why must you scrape ASINs?
It can be a gamechanger if you’re building
- your Amazon FBA empire, or
- your price comparison website, or
- ML models to perform predictive analytics on Amazon’s data
How can it be a game-changer?
ASINs can help you gather data on best-performing products, daily sales and revenue estimates for individual products, competing or related products found through keywords, and detailed product information. Some SaaS-based ASIN research providers even claim to surface the most wished-for products. All of this could be crucial for executing your retail strategy successfully.
Amazon ASIN Grabber
How do I find the Amazon ASIN?
There are two ways you can find the ASIN of a certain product.
- Product Description
- Product Page URL
Free Amazon ASIN grabber
Octoparse is a web scraping tool that lets non-coders easily grab web data at scale. If you are looking to get ASIN data at scale, there are two ways to do it with Octoparse.
1. Pre-built ASIN scraper (Template Mode)
Octoparse ships with many pre-built scrapers, called “Templates”, that let users scrape data without coding. When you open the software, go to the task template section and choose the Amazon ASIN grabber.
Note that the template is NOT named “Amazon ASIN grabber” but “URLs Amazon”, because this Amazon scraper returns not only the ASIN but also product data like title, price, and star ratings.
By entering keywords into the scraper, you can grab the Amazon ASINs of the products that match those queries. I just scraped 140 Amazon product ASINs within a minute using templates. Try it yourself!
How to use templates: step by step guide
Having problems? Contact our support for help.
2. Build your own ASIN scraper (Advanced Mode)
If you like challenges and DIY, the advanced mode is the place to play around. Unlike the pre-built scrapers in template mode, advanced mode lets you build your own crawlers to fit your specific needs.
The key to successfully building your ASIN scraper is to locate exactly where the data is and tell the robot to grab it. Don’t worry, Octoparse has translated the programming process into a point-and-click user interface that everyone can work with.
First, we shall scrape Amazon product data, and then I shall show you how to grab the Amazon ASIN by trimming the product page URL. Here we go:
Step 1: Launch Your Octoparse Instance, Log in & click on Advanced Mode “Task”.
Step 2: Enter the URL.
In our case –
Step 3: Scrape The Product Details.
Note 1: If you get a captcha, switch to browser mode as shown below, solve it, and then switch back by turning browser mode off. Browser mode is also useful for obtaining login cookies and the like. If no captcha appears, skip this step and continue.
Note 2: Turn on the “workflow mode” to keep an eye on your actions.
- Let’s create the template for custom scraping. You’ll see the above image on the top half of your computer screen, and the below image in the remaining half.
- Scroll to the bottom of the page and find the “next” pagination button.
- Click on it. You will see the following “Action Tips” screen –
- Click on “Loop click the selected link”. The workflow now looks like –
- Now, perform the following steps :
- Click on the “Go To Web Page” element of the workflow. This will take you back to the search results on the first page, exactly where we started.
- Click the space within the Pagination Box.
- Next, click the product listing name on the Amazon search results page. Octoparse should automatically select all the listed products on the page. Notice that the first product you clicked is highlighted in green while the remaining product names are in red. This is because Octoparse detected the other listings as product names but is waiting for you to validate them. How? Click on “Select all” in the “Action Tips” pop-up.
- As soon as you click “Select all”, the “Action Tips” screen updates as follows:
- Now, if we only needed the names of products, this would have been sufficient. But we need product details as well as Amazon product ASINs. So, we’ll continue by clicking on “Loop click each element” in the “Action Tips”. Why? This instructs the Amazon scraper to visit each link in sequence and open the product page so that we can scrape the Amazon product data.
- At this stage, our workflow looks like this –
- So, we’re on the product page now! We shall scrape product data along with ASINs. For brevity, I’ll only choose major data points like product name, URL, title, price, and description. If you want reviews too, refer to this article on ways to scrape Amazon product reviews.
- Let’s get started with scraping Amazon’s product data.
- Observe the data points we selected. We also selected the Amazon ASIN data –
- Now we rename the data fields to suit our requirements and then click on “Extract data”:
- We are almost done with our Amazon Scraper. A few adjustments are required but this is how the workflow looks after performing the above steps –
- The ASIN we extracted might not be in the same place on every product page, so the data could be inconsistent. To sort this out, we modify the XPath for the ASIN.
Note: You can customize the XPath for any data point to boost the consistency of the scraped data when scraping Amazon products at scale.
- Click on “ASIN”
- Then click on the Edit icon, i.e., the pencil icon
- You get the following screen –
- Now insert the following XPath into the “Matching XPath” field to locate the ASIN. Press OK.
- This is one way to get the ASIN from the product information data. I’ll share another way, this time by extracting the ASIN from the product page URL.
- Extracting Amazon ASIN from the product page URL
1. Add predefined data, i.e., the page URL, to our existing extracted data set. Then click on edit data like we did earlier and select “Refine Extracted Data” –
2. Click on “Add Step”:
3. Click on “match with regular expression” –
4. Now extract the ASIN from the URL using a regular expression.
5. Observe that we used (?<=/dp/)(.+?)(?=/) as our expression to get the ASIN. It captures the text between “/dp/” and the next “/” in the URL.
6. Press OK on the next screen.
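The same URL trimming can be tried outside Octoparse. The sketch below applies the expression from the steps above with Python's `re` module, and also shows a more tolerant variant for URLs that have no "/" after the ASIN. The tolerant pattern, the function name `asin_from_url`, and the sample URLs are assumptions made for illustration; they are not part of the Octoparse workflow.

```python
import re
from typing import Optional

# The expression from the steps above: the text after "/dp/" up to the next "/".
STRICT = re.compile(r"(?<=/dp/)(.+?)(?=/)")

# Assumed, more tolerant variant: exactly 10 alphanumerics after "/dp/",
# followed by "/", "?", or the end of the URL.
TOLERANT = re.compile(r"/dp/([A-Z0-9]{10})(?=[/?]|$)")


def asin_from_url(url: str) -> Optional[str]:
    """Return the ASIN found after "/dp/" in a product URL, or None if there is none."""
    match = TOLERANT.search(url)
    return match.group(1) if match else None


# Hypothetical URLs; the ASIN-like values are placeholders.
urls = [
    "https://www.amazon.com/Example-Product/dp/B0EXAMPLE1/ref=sr_1_1",
    "https://www.amazon.com/dp/B0EXAMPLE2",
    "https://www.amazon.com/s?k=headphones",
]

for url in urls:
    strict = STRICT.search(url)
    print(url, "->", strict.group(1) if strict else None, "|", asin_from_url(url))
```

The strict expression depends on a "/" following the ASIN, so it finds nothing in the second URL, while the tolerant variant still returns a value there. Neither pattern finds anything in the search-results URL, which has no "/dp/" segment.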
Step 4: Save the workflow and start the extraction to get the Amazon product data with ASINs.
- You can extract locally, on the cloud, or create an API. For demonstration, we’re doing local extraction. Click on “Local Extraction” and it’ll start extracting the Amazon product data.
- The result looks very clean and consistent –
Note: The products with blank price cells are the ones that are currently not available. This is how I scraped 120 items using the free version of Octoparse. Try it out. Getting started, setting up, and extracting took barely 4-5 minutes.
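Once the data is exported, the blank price cells can be used to separate unavailable products from the rest. The following is a minimal sketch that assumes the export is a CSV file; the column names `ASIN`, `Title`, and `Price`, the dollar-formatted prices, and the sample rows are hypothetical and should be adjusted to match the fields you actually named in the workflow.

```python
import csv
import io

# Hypothetical export: column names and rows are assumptions for illustration.
SAMPLE_EXPORT = """ASIN,Title,Price
B0EXAMPLE1,Sample headphones,$59.99
B0EXAMPLE2,Sample speaker,
B0EXAMPLE3,Sample charger,"$1,299.00"
"""


def parse_price(text):
    """Convert a price string such as '$1,299.00' to a float; a blank cell becomes None."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return float(cleaned) if cleaned else None


def split_by_availability(csv_text):
    """Split rows into (ASIN, price) pairs and a list of ASINs whose price is blank."""
    available, unavailable = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        price = parse_price(row["Price"])
        if price is None:
            unavailable.append(row["ASIN"])
        else:
            available.append((row["ASIN"], price))
    return available, unavailable


available, unavailable = split_by_availability(SAMPLE_EXPORT)
print("Available:", available)
print("Currently unavailable:", unavailable)
```

To run this against a real export, the `io.StringIO` wrapper can be replaced with an opened file handle.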
Resources to product research using ASINs
Finally, we’re done scraping Amazon. Phew! You can build highly scalable eCommerce product scrapers using Octoparse. It’s a flexible, on-demand, SaaS-based tool and doesn’t require you to know any programming language. Simply click and extract. There is a free version to get your hands dirty. For enterprise needs, contact us to nail your pricing and retail execution.
Happy Scraping with Octoparse 😉