Hi guys, it's been a long time since my last post, about half a year or so, and this is my first post of 2022. I hope this year is better than the last, and that we can recover from COVID.
This idea popped into my mind a few weeks ago while I was browsing GitHub search. Since 2013 I've been blogging manually: finding a niche and topics to write about, writing news, and sometimes personal posts.
Since 2017 I have loved writing programs in PHP, and it made me lazy about writing articles, news, or personal posts by hand. I love mining data from a website or an app so I can use it as content for my Auto Generated Content (AGC) blog.
A few weeks ago I thought I should start building a listing website, because in Indonesia many people still use Google to find information on anything: a place, a service, etc. So this is a good opportunity to build website traffic. The idea is simple: make a useful AGC website that contains listings of people's businesses, plus infographics.
After a few hours of browsing GitHub search, I decided to write a script to scrape Google Maps business data using a query like this: "xxxx near yyy", where xxxx stands for the business type and yyy for the location name (district, village, city, etc.).
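As a rough sketch of how such queries could be generated in bulk, the snippet below combines business types and locations into "xxxx near yyy" strings and turns each one into a Google Maps search URL. The function names are hypothetical and not part of my script, and the URL form is the one from Google's Maps URLs documentation; the script itself may feed the query to Maps differently.

```python
from itertools import product
from urllib.parse import urlencode


def build_query(business_type, location):
    """Build one search phrase in the "xxxx near yyy" form."""
    return f"{business_type} near {location}"


def build_search_url(query):
    """Turn a search phrase into a Google Maps search URL (assumed entry point)."""
    return "https://www.google.com/maps/search/?" + urlencode({"api": 1, "query": query})


def build_queries(business_types, locations):
    """Combine every business type with every location."""
    return [build_query(b, loc) for b, loc in product(business_types, locations)]


if __name__ == "__main__":
    for q in build_queries(["bengkel mobil", "apotek"], ["Pekanbaru", "Dumai"]):
        print(q, "->", build_search_url(q))
```

Keeping the query builder separate from the scraper makes it easy to loop over many districts or villages later.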
I had two options: write it in JavaScript (Node) or in Python. I chose Python because I don't really like Node.
I used two main libraries to scrape the data automatically:
1. Selenium WebDriver (for automating the browsing process)
2. BeautifulSoup (for parsing HTML content)
- Libraries used
from datetime import datetime
from fileinput import filename
import logging
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from webdriver_manager.chrome import ChromeDriverManager
import clipboard  # assumed: the third-party "clipboard" package, needed for clipboard.paste() later
import json
import time
import os
import sys
os.system('clear')  # clear the terminal before the run starts
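- Make sure the libraries are installed; if not, install them first. These checks need to run before any other selenium import, otherwise that import fails before the check is reached.
- Selenium WebDriver
try:
    from selenium import webdriver
except ImportError:
    # Install selenium with pip, then retry the import
    seleniumcommand = "python3 -m pip install selenium"
    os.system(seleniumcommand)
    from selenium import webdriver
- BeautifulSoup
try:
    from bs4 import BeautifulSoup
except ImportError:
    # Install BeautifulSoup with pip, then retry the import
    bs4command = "python3 -m pip install beautifulsoup4"
    os.system(bs4command)
    from bs4 import BeautifulSoup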
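The two install checks follow the same pattern, so they could be folded into one helper. This is only a sketch: `ensure_package` is a hypothetical name, not something from my script, and it installs with the same interpreter that runs the script (`sys.executable`) instead of a hard-coded `python3`.

```python
import importlib
import subprocess
import sys


def ensure_package(import_name, pip_name=None):
    """Import a module, installing it with pip first if it is missing."""
    try:
        return importlib.import_module(import_name)
    except ImportError:
        # Install into the interpreter that is running this script
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or import_name])
        importlib.invalidate_caches()
        return importlib.import_module(import_name)


# Example (the pip name differs from the import name for BeautifulSoup):
# bs4 = ensure_package("bs4", "beautifulsoup4")
# selenium = ensure_package("selenium")
```

Installing packages at runtime is convenient for a quick script, but a requirements.txt file is usually the cleaner option.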
I wrote a helper function for the script.
- Scroll the results
This function scrolls the Google Maps result list so the rest of the results get loaded: instead of being limited to the first 5-6 items, we get all 20 items.
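def scrolling(driver):
    try:
        # Selenium 4.3+ removed find_element_by_xpath; there, use
        # driver.find_element(By.XPATH, ...) with selenium.webdriver.common.by.By
        scrollable_div = driver.find_element_by_xpath(
            '//*[@id="pane"]/div/div[1]/div/div/div[2]/div[1]')
        # Jump to the bottom of the results pane so more items get loaded
        driver.execute_script(
            'arguments[0].scrollTop = arguments[0].scrollHeight', scrollable_div)
        time.sleep(2)
    except NoSuchElementException:
        print("Error: can't find scrollbar")
        print("")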
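A single scroll may not always be enough to load every result. A possible variation, sketched below, keeps scrolling until the pane height stops growing. `scroll_until_stable` and its parameters are hypothetical; it only relies on `driver.execute_script`, the same call used in the function above, and expects the scrollable element to be passed in.

```python
import time


def scroll_until_stable(driver, scrollable_div, pause=2.0, max_rounds=10):
    """Scroll the pane to the bottom until its height stops changing.

    Returns the number of scrolls that revealed new content.
    """
    last_height = -1
    for rounds in range(max_rounds):
        # Scroll to the bottom and read the height back in one script call
        height = driver.execute_script(
            "arguments[0].scrollTop = arguments[0].scrollHeight;"
            "return arguments[0].scrollHeight;",
            scrollable_div)
        if height == last_height:
            return rounds  # nothing new was loaded by this scroll
        last_height = height
        time.sleep(pause)  # give Maps time to load the next batch
    return max_rounds
```

The `max_rounds` cap keeps the loop from running forever if the page keeps growing.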
AUTOMATED SCRAPING PROCESS
Basic info that we need for each business
{
link : business maps url, so it can be used later
title : business name
thumbnail: business image
category : business category
address : business address
phone : business phone number
plusCode : business address with plus code
openHours: business open hours
rating : business rating
website : business website
}
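The same record can be written down as a small data class, which helps catch typos in field names. This is just a sketch: the `Business` class is hypothetical, and my script uses a plain dict instead. The field names match the list above.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Business:
    link: str                  # business maps url, so it can be used later
    title: str                 # business name
    thumbnail: str = ""        # business image
    category: str = ""         # business category
    address: str = ""          # business address
    phone: str = ""            # business phone number
    plusCode: str = ""         # business address with plus code
    openHours: dict = field(default_factory=dict)  # day name -> open hours
    rating: str = ""           # business rating
    website: str = ""          # business website


if __name__ == "__main__":
    item = Business(link="https://www.google.com/maps/place/...", title="Example")
    print(asdict(item))  # asdict() gives a plain dict that json.dumps can handle
```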
Query selector
- Link
The business links are obtained from the href of every element matching the link selector a.a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd
links = [x.get_attribute('href') for x in driver.find_elements_by_css_selector("a.a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd")]
Then loop over each item in links and open it in the browser to scrape the data
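The loop itself is not shown in this post, so here is a minimal sketch of what it could look like. `scrape_links` and the `extract` callback are hypothetical names; `driver.get` and `driver.page_source` are the standard Selenium calls. In the real script, `extract` would presumably build `parser = BeautifulSoup(html, 'html.parser')` and run the selectors from the sections below.

```python
import time


def scrape_links(driver, links, extract, pause=2.0):
    """Open each business link and pass the page HTML to extract(html)."""
    results = []
    for link in links:
        driver.get(link)
        time.sleep(pause)  # crude wait for the page to finish rendering
        results.append(extract(driver.page_source))
    return results
```

Passing the extraction step in as a function keeps the browser handling separate from the HTML parsing.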
- Business Name
title = parser.select('h1')[0].text.strip()
- Business Image
if parser.find('button', {'jsaction': 'pane.heroHeaderImage.click'}):
    img = parser.find(
        'button', {'jsaction': 'pane.heroHeaderImage.click'}
    ).img['src']
else:
    img = ""
- Business Category
if parser.find('button', jsaction="pane.rating.category"):
    category = parser.find('button', jsaction="pane.rating.category").text.strip()
else:
    category = ""
- Business address
# 'Salin alamat' is the Indonesian tooltip for "Copy address"
if parser.find('button', {'data-tooltip': 'Salin alamat'}):
    address = parser.find(
        'button', {'data-tooltip': 'Salin alamat'}
    ).text.strip()
else:
    address = ""
- Business Phone Number
# 'Salin nomor telepon' is the Indonesian tooltip for "Copy phone number"
if parser.find('button', {'data-tooltip': 'Salin nomor telepon'}):
    phone = parser.find(
        'button', {'data-tooltip': 'Salin nomor telepon'}
    ).text.strip()
else:
    phone = ""
- Business Address with Plus Code
if parser.find('button', {'data-tooltip': 'Salin Plus Codes'}):
    plusCode = parser.find(
        'button', {'data-tooltip': 'Salin Plus Codes'}
    ).text.strip()
else:
    plusCode = ""
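The address, phone and plus code blocks all repeat the same find-then-strip pattern, so a small helper could shorten them. A sketch, with `text_or_empty` as a hypothetical name; `parser` is assumed to be the BeautifulSoup object used above, but the helper only needs an object with a `find(name, attrs)` method.

```python
def text_or_empty(parser, name, attrs):
    """Return the stripped text of the first matching element, or "" if none."""
    element = parser.find(name, attrs)
    if element is None:
        return ""
    return element.text.strip()


# Usage with the selectors from this post:
# address = text_or_empty(parser, 'button', {'data-tooltip': 'Salin alamat'})
# phone = text_or_empty(parser, 'button', {'data-tooltip': 'Salin nomor telepon'})
# plusCode = text_or_empty(parser, 'button', {'data-tooltip': 'Salin Plus Codes'})
```

This also calls `find` only once per field instead of twice.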
- Business Open Hours
if parser.find('div', {'class': 'LJKBpe-open-R86cEd-haAclf'}):
    openHoursResults = {}
    openHours = parser.find(
        'div', {'class': 'LJKBpe-open-R86cEd-haAclf'})['aria-label']
    # The aria-label lists the days separated by '; ', each as "day,hours".
    # 'hingga' means "until"; the long Indonesian phrase is the
    # "hide open hours for the week" suffix, which we drop.
    for days in openHours.split('; '):
        dayTime = days.replace(
            'hingga', '-').replace('. Sembunyikan jam buka untuk seminggu', '').split(',')
        dayInput = {
            'dayName': dayTime[0],
            'openHour': dayTime[1]
        }
        # print(type(dayInput))
        openHoursResults[dayTime[0]] = dayTime[1]
else:
    openHoursResults = {}
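The string handling in this step can be pulled out into a pure function, which makes it easy to try without opening a browser. This is a sketch: `parse_open_hours` is a hypothetical name and the sample label is my assumption of what the aria-label looks like, based on the replacements done above. Unlike the code above, it also strips whitespace and skips entries without a comma.

```python
def parse_open_hours(aria_label):
    """Turn the open hours aria-label into a {day: hours} dict."""
    cleaned = aria_label.replace('. Sembunyikan jam buka untuk seminggu', '')
    hours = {}
    for entry in cleaned.split('; '):
        day, sep, time_range = entry.replace('hingga', '-').partition(',')
        if not sep:
            continue  # no comma, so this entry has no "day,hours" shape
        hours[day.strip()] = time_range.strip()
    return hours


if __name__ == "__main__":
    # Assumed label format, not copied from a real page
    label = ("Senin, 08.30 hingga 17.00; Minggu, Tutup"
             ". Sembunyikan jam buka untuk seminggu")
    print(parse_open_hours(label))
```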
- Business Rating
if parser.find('span', {'class': 'aMPvhf-fI6EEc-KVuj8d'}):
    rating = parser.find(
        'span', {'class': 'aMPvhf-fI6EEc-KVuj8d'}).text.strip()
else:
    rating = ""
- Business Website
I still can't find the right selector, but fortunately there's a copy button for the website, so we can click it, read the clipboard, and store the value in a variable
# 'Salin situs' is the Indonesian label for "Copy website"
driver.find_element_by_xpath(
    '//img[@alt="Salin situs"]').click()
website = clipboard.paste()
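Reading the clipboard is fragile: if the click does not copy anything, `clipboard.paste()` returns whatever was copied before. The sample result further down has "log_level=0" as the website, which looks like that kind of leftover. A possible guard, sketched here with a hypothetical `looks_like_website` helper, is to keep the value only when it resembles a host name or URL.

```python
from urllib.parse import urlparse


def looks_like_website(value):
    """Rough check that a clipboard value is a URL or a bare domain."""
    value = value.strip()
    if not value or any(ch.isspace() for ch in value):
        return False
    # Bare domains such as "example.com" need a "//" prefix to be parsed as a host
    candidate = value if "://" in value else "//" + value
    host = urlparse(candidate).hostname or ""
    return "." in host and not host.startswith(".") and not host.endswith(".")


# Usage idea:
# website = clipboard.paste()
# if not looks_like_website(website):
#     website = ""
```

This is only a heuristic, so it will not catch a stale clipboard that happens to contain some other URL.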
Once we have the data, we put it in a dict and append it to the main list for further use
result = {
    "link": driver.current_url,
    "title": title,
    "thumbnail": img,
    "category": category,
    "address": address,
    "phone": phone,
    "plusCode": plusCode,
    "openHours": openHoursResults,
    "rating": rating,
    "website": website
}
logging.info("Scraping done, append results...")
results.append(result)
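Once the list is filled, it can be written to disk so the AGC site can pick it up later. A minimal sketch follows; `save_results` and the file name pattern are hypothetical. `ensure_ascii=False` keeps non-ASCII characters in names and addresses readable in the file.

```python
import json
from datetime import datetime


def save_results(results, path=None):
    """Write the scraped results to a JSON file and return the file path."""
    if path is None:
        # Assumed naming scheme: one timestamped file per run
        path = datetime.now().strftime("results-%Y%m%d-%H%M%S.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=2)
    return path
```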
Run the script
To run the script, edit the file, find the result variable, change it to your query, then run:
python forhive.py
Sample single result
{
  "link": "https://www.google.com/maps/place/Bengkel+mobil+%22DNF+Auto+Service+Pekanbaru%22/data=!4m5!3m4!1s0x31d5abe309bd1cdb:0xfe08771ea01b758a!8m2!3d0.4948262!4d101.4188712?authuser=0&hl=id&rclk=1",
  "title": "Bengkel mobil \"DNF Auto Service Pekanbaru\"",
  "thumbnail": "https://lh5.googleusercontent.com/p/AF1QipNveI4kZ01yc_ypg9pWc-ShMP-dQZxUrPDGrE67=w408-h510-k-no",
  "category": "Bengkel Mobil",
  "address": "Didepan Plaza Mebel, Jl. Soekarno - Hatta No.8, 9, Delima, Kec. Tampan, Kota Pekanbaru, Riau 28292",
  "phone": "0812-6178-1555",
  "plusCode": "FCV9+WG Delima, Kota Pekanbaru, Riau",
  "openHours": {
    "Senin": "08.30 - 17.00",
    "Selasa": "08.30 - 17.00",
    "Rabu": "08.30 - 17.00",
    "Kamis": "08.30 - 17.00",
    "Jumat": "08.30 - 17.00",
    "Sabtu": "08.30 - 15.00",
    "Minggu": "Tutup"
  },
  "rating": "4,5",
  "website": "log_level=0"
}
IMPORTANT NOTES
This scraper only works on Google Maps in Indonesian. I will move the selectors to a config file later so you can change them to match your Maps language.
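Until that config file exists, here is a rough sketch of what it could look like as a per-language table of button labels. The Indonesian values are the ones used in this post. The English values are unverified guesses that you would need to check against the English Maps page, and `SELECTOR_LABELS` and `get_labels` are hypothetical names.

```python
SELECTOR_LABELS = {
    "id": {
        "copy_address": "Salin alamat",
        "copy_phone": "Salin nomor telepon",
        "copy_plus_code": "Salin Plus Codes",
        "copy_website": "Salin situs",
    },
    # ASSUMPTION: placeholder English labels, not verified against Google Maps
    "en": {
        "copy_address": "Copy address",
        "copy_phone": "Copy phone number",
        "copy_plus_code": "Copy plus code",
        "copy_website": "Copy website",
    },
}


def get_labels(language):
    """Return the label table for one language, or fail with a clear message."""
    try:
        return SELECTOR_LABELS[language]
    except KeyError:
        raise ValueError(f"No selector labels configured for language {language!r}")


# Usage idea:
# labels = get_labels("id")
# parser.find('button', {'data-tooltip': labels["copy_address"]})
```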
Screenshot
Full code
For full code you can download it here