How to Scrape Walmart Reviews using Python
In one of our previous tutorials we learned how to scrape product data from Walmart, such as product price, name and ratings. In this tutorial we will learn how to scrape Walmart product reviews, i.e. the reviews given to a product by its buyers.
These reviews offer many insights because they give first-hand data from a diverse set of customers about their likes and dislikes, the features they expect in a product, how sensitive they are to price, and so on. Knowing this helps producers adjust their offerings to increase sales. Running a complete data analysis on these reviews is much easier than conducting full-fledged market research, especially for small producers who can’t spend a lot of money on research.
Scraping these reviews can be a tedious task if not planned well, because the reviews for a popular product generally run into thousands of pages.
In this tutorial we will create a script that can scrape the Walmart reviews of any product with just a few changes. First we will create a search query; by changing this query we can go to different products and scrape their reviews. We will go to https://www.walmart.com/search/?query=hand-soap and scrape the reviews given to various hand soaps.
Let’s get to the code:
Import Libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup
Define the search query; you can change it to scrape data for any other product:
search_query="hand-soap"
base_url="https://www.walmart.com/search/?query="
url=base_url+search_query
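Plain string concatenation works for simple queries like `hand-soap`, but a query containing spaces or symbols such as `&` would produce a malformed URL. A minimal sketch, assuming the same `query` parameter as above, that percent-encodes the query with the standard library's `urllib.parse.urlencode` (the helper name `build_search_url` is hypothetical):

```python
from urllib.parse import urlencode

# Search endpoint taken from the tutorial; the helper below is illustrative only.
BASE_SEARCH_URL = "https://www.walmart.com/search/"

def build_search_url(search_query: str) -> str:
    """Return a Walmart search URL with the query safely percent-encoded."""
    # urlencode turns spaces into '+' and escapes reserved characters like '&'
    return BASE_SEARCH_URL + "?" + urlencode({"query": search_query})

print(build_search_url("hand-soap"))  # https://www.walmart.com/search/?query=hand-soap
```

A multi-word query such as `"hand soap & sanitizer"` becomes `query=hand+soap+%26+sanitizer` instead of breaking the query string at the `&`.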
Define headers:
header={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36','referer':'https://www.walmart.com/'}
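Because the script sends many requests with the same headers, one option is a `requests.Session`, which attaches the headers to every request and reuses the underlying connection. A minimal sketch, assuming the same User-Agent string as above; `make_session` and `example_headers` are hypothetical names:

```python
import requests

def make_session(headers: dict) -> requests.Session:
    """Create a Session that sends the given headers with every request."""
    session = requests.Session()
    session.headers.update(headers)  # merged into each request made through the session
    return session

example_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36",
    "referer": "https://www.walmart.com/",
}
session = make_session(example_headers)
# session.get(url) would now send these headers without passing headers= each time
```

Session header lookups are case-insensitive, so `session.headers["Referer"]` returns the `referer` value set here.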
Send a GET request:
search_response=requests.get(url,headers=header)
Create a function to get the search results page for a query: it sends a GET request to the search URL and returns the response containing the HTML content.
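cookie={}  # insert request cookies within {} if needed
def getWalmartSearch(search_query):
    url="https://www.walmart.com/search/?query="+search_query
    print(url)
    page=requests.get(url,headers=header,cookies=cookie)
    # raise an HTTPError on a 4xx/5xx status instead of returning a string
    # that would fail later when .content is accessed
    page.raise_for_status()
    return page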
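The function above makes a single attempt, so a temporary rate-limit or server error stops the whole run. A minimal sketch of a session that retries automatically with exponential backoff, using `requests`' `HTTPAdapter` together with `urllib3`'s `Retry`; the retry count, backoff factor and list of status codes treated as transient are assumptions, not values from the original script:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_retrying_session(total_retries: int = 3, backoff: float = 1.0) -> requests.Session:
    """Return a Session that retries requests on common transient HTTP errors."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,  # wait time grows exponentially between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # assumed transient statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session

session = make_retrying_session()
# Network call, so shown as a comment; requests has no default timeout:
# page = session.get("https://www.walmart.com/search/?query=hand-soap", headers=header, timeout=10)
```

Passing an explicit `timeout` matters here as well, since otherwise a stalled connection can hang the scraper indefinitely.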
Extract the data id of every product: on the Walmart website every product is assigned a unique identification number called the data id. By finding these ids and adding them to the base URL, we can go to each individual product page, where we can see all the reviews and grab them. So let’s see where this data id is:
In the image above we can clearly see the data id; in the same way we can find the data id for any product on Walmart.
data_id=[]
response=getWalmartSearch(search_query)
soup=BeautifulSoup(response.content,"html.parser")  # name the parser explicitly
for i in soup.findAll("div",{'class':"search-result-gridview-item-wrapper"}):
    # the last segment of the product link is the data id
    data_id.append(i.a['href'].split('/')[-1])
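`split('/')[-1]` returns everything after the last slash, so if a product link ever carried a query string (for example tracking parameters, an assumption about Walmart's markup rather than something shown above), the id would include it. A minimal sketch that parses the link with `urllib.parse.urlparse` and keeps only the last path segment; `product_id_from_href` and the sample links are hypothetical:

```python
from urllib.parse import urlparse

def product_id_from_href(href: str) -> str:
    """Return the last path segment of a product link, ignoring any query string or fragment."""
    path = urlparse(href).path          # drops '?...' and '#...'
    return path.rstrip("/").split("/")[-1]  # tolerate a trailing slash

print(product_id_from_href("/ip/Softsoap-Liquid-Hand-Soap/123456789?athcpid=abc"))  # 123456789
```

In the loop above, this would replace the `split` call: `data_id.append(product_id_from_href(i.a['href']))`.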
Now that we have the data ids, we can go to the review pages of each product and grab the reviews:
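reviews=[]
for j in data_id:
    for k in range(1,20):  # review pages 1 to 19 of each product
        response=requests.get('https://www.walmart.com/reviews/product/'+str(j)+'?page='+str(k),headers=header)
        soup=BeautifulSoup(response.content,"html.parser")
        review_divs=soup.findAll("div",{'class':"review-body"})
        if not review_divs:  # no reviews on this page, so move to the next product
            break
        for i in review_divs:
            reviews.append(i.text)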
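Since popular products have many review pages, the paging logic can be separated from the HTTP and parsing code so that it is easy to test and to slow down between requests. A minimal sketch, assuming the review URL pattern used above; `collect_reviews`, `fetch_page` and the one-second default delay are hypothetical choices:

```python
import time
from typing import Callable, List

REVIEW_URL = "https://www.walmart.com/reviews/product/{product_id}?page={page}"

def collect_reviews(product_id: str,
                    fetch_page: Callable[[str], List[str]],
                    max_pages: int = 20,
                    delay: float = 1.0) -> List[str]:
    """Fetch review pages 1..max_pages for one product, stopping at the first empty page."""
    reviews: List[str] = []
    for page in range(1, max_pages + 1):
        url = REVIEW_URL.format(product_id=product_id, page=page)
        page_reviews = fetch_page(url)  # caller-supplied: returns the review texts on that page
        if not page_reviews:  # an empty page is treated as the end of the reviews
            break
        reviews.extend(page_reviews)
        time.sleep(delay)  # pause between requests to avoid hammering the server
    return reviews
```

A real `fetch_page` could wrap the `requests.get` and `BeautifulSoup` lines from the loop above and return the text of each `review-body` div; passing it in as a function also lets you test the paging with fake pages, without any network access.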
Create a pandas data frame:
rev={'reviews':reviews}
review_data=pd.DataFrame.from_dict(rev)
pd.set_option('max_colwidth',800)
review_data.head(10)
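Scraped review text often carries surrounding whitespace, and the same review can be collected twice, so a light cleanup before analysis may help. A minimal sketch using standard pandas string and deduplication methods; the function name `clean_reviews` and the output filename in the note below are hypothetical:

```python
import pandas as pd

def clean_reviews(review_data: pd.DataFrame) -> pd.DataFrame:
    """Strip whitespace, drop empty reviews and remove duplicate review texts."""
    cleaned = review_data.copy()
    cleaned["reviews"] = cleaned["reviews"].str.strip()
    cleaned = cleaned[cleaned["reviews"] != ""]  # discard reviews that were only whitespace
    cleaned = cleaned.drop_duplicates(subset="reviews").reset_index(drop=True)
    return cleaned
```

The cleaned frame could then be saved for later analysis, e.g. `clean_reviews(review_data).to_csv("walmart_reviews.csv", index=False)`.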
To learn more about the Walmart review scraper, take a look at the sample data we have previously scraped to get a proper idea of the output.