Beautiful Soup vs Scrapy vs Selenium - Which one to use? Python Web scraping
Beautiful Soup, Scrapy, and Selenium are among the most widely used tools for web scraping with Python. But when you are starting out, it often gets confusing: which tool should you learn first, which one suits large-scale data extraction, and why are there three different tools for a single purpose?
Let's answer this today in our article:
Beautiful Soup
Beautiful Soup is a Python parsing library that helps you locate and extract specific data points from a large HTML document.
For example, given the raw HTML of a page, it helps you turn the parts you care about into structured data.
It is usually used with a library called requests: requests downloads the HTML content, and Beautiful Soup then pulls specific content out of it by locating elements inside the HTML.
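As a minimal sketch of that workflow, the snippet below parses a small hard-coded HTML string, so it runs without network access. The markup, class names, and values are made up for illustration; in a real scraper the HTML would typically come from something like requests.get(url).text.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML; in practice this would be the body of an HTTP response
html = """
<html><body>
  <h1>Restaurants in Paris</h1>
  <ul>
    <li class="restaurant"><span class="name">Chez Marie</span> <span class="rating">4.5</span></li>
    <li class="restaurant"><span class="name">Le Petit Bistro</span> <span class="rating">4.2</span></li>
  </ul>
</body></html>
"""

# Parse the document with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")

# Locate each list item and pull out the two data points we care about
restaurants = []
for item in soup.find_all("li", class_="restaurant"):
    restaurants.append({
        "name": item.find("span", class_="name").get_text(strip=True),
        "rating": float(item.find("span", class_="rating").get_text(strip=True)),
    })

print(restaurants)
```

The result is a plain list of dictionaries, which is easy to write to CSV or JSON afterwards.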
The pros of using Beautiful Soup are:
It's easy for beginners to understand and follows a simple syntax.
It allows you to separate the data points you need from large HTML content.
Cons of Beautiful Soup:
It depends on another library, such as requests, to get the HTML content. It cannot fetch HTML on its own.
It does not support user interactions such as clicking, filling in forms, or scrolling.
Selenium
Selenium is particularly useful when the website you are extracting data from renders its content with JavaScript, or when you need to automate user interactions such as filling in forms, logging in, clicking, and scrolling.
It's a great library that is used for Automated testing as well.
It works by opening up a browser of your choice (Chrome, Firefox, Safari, Edge, etc.) and performing any actions or getting the HTML content from the page.
Selenium supports various languages like Python, Java, C#, and more.
To understand how it works, let's look at a simple example.
Here's Python code that uses Selenium to open a new Chrome window, go to google.com, and run a Google search for us:
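import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager
# Download a matching ChromeDriver and start a new Chrome window
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
driver.get("https://www.google.com")
# Locate the search box by its name attribute
search_box = driver.find_element(By.NAME, "q")
search_query = "Restaurants in Paris"
search_box.send_keys(search_query)
# Press Enter to submit the search
search_box.send_keys(Keys.RETURN)
# Crude fixed wait so the results page stays visible for a moment
time.sleep(5)
# Close the browser and end the WebDriver session
driver.quit()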
Running the above code opens a Chrome window, types the query into Google's search box, submits it, and closes the browser after five seconds.
The Pros of using Selenium are:
Great for JavaScript-heavy websites.
Allows us to perform user interactions such as typing, logging in, scrolling, and more.
Built-in features to get the HTML content, take screenshots, and more.
Cons of Selenium:
Slower to get the data, because it drives a full browser.
Can be easily detected by websites' anti-bot systems.
Scrapy
Scrapy is a powerful open-source web scraping framework built for extracting data at scale.
The main features that make Scrapy so powerful are:
It allows you to send concurrent requests to a website and process the responses asynchronously, so many requests are in flight at the same time. This makes it a fast web scraping tool.
It also connects with other technologies like Selenium, Beautiful Soup, Splash, and more. That's the power of Python: simply import the libraries and start using them.
It exports the output easily to formats such as CSV (which opens in Excel), JSON, or XML, and, through item pipelines, to databases like MongoDB, MySQL, or others.
It has built-in settings and middleware (download delays, AutoThrottle, custom headers, proxy support) that help you avoid being blocked by a website's anti-bot mechanisms.
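To make the point about concurrent requests concrete, here is a small standard-library sketch of why sending requests concurrently finishes faster. It does not use Scrapy's API and is not how Scrapy is implemented internally; fake_fetch and the example.com URLs are hypothetical stand-ins that simulate network latency with asyncio.sleep.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for a network request: sleeps instead of downloading."""
    await asyncio.sleep(delay)
    return f"<html>content of {url}</html>"


async def crawl(urls: list[str]) -> list[str]:
    # Start every request at once and wait for all of them together
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


urls = [f"https://example.com/page/{i}" for i in range(1, 6)]

start = time.perf_counter()
pages = asyncio.run(crawl(urls))
elapsed = time.perf_counter() - start

# The five simulated 0.2 s requests overlap, so the total stays close to
# 0.2 s rather than the 1 s they would take one after another
print(f"Fetched {len(pages)} pages in {elapsed:.2f}s")
```

Scrapy gives you this kind of overlap without writing the event-loop code yourself; you only describe what to request and how to parse each response.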
However, there are some cons as well:
It's tough for beginners to understand.
It takes a bit of extra time to set up and start scraping.
Complete Comparison
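Now that we have seen each of them, here's how the three technologies compare: Beautiful Soup is the easiest to learn but only parses HTML that another library has fetched; Selenium handles JavaScript and user interactions but is slower; Scrapy is the fastest option for large-scale extraction but takes longer to learn and set up.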
Conclusion
So if you are just starting out with web scraping, it's recommended to start with Beautiful Soup. Do some projects, then move on to other technologies like Selenium and Scrapy.
This will help you understand how each of these technologies is useful in its own way.
Hope this article helps in your Data Extraction Journey.
Also, you can check out our Detailed articles below on each of the Technologies mentioned today:
Web Scraping in Python using Beautiful Soup
Web Scraping using Selenium and Python - 2024 Roadmap
Web Scraping with Scrapy and Python - Guide with Real world Example
Hope you like them too! :)