# Web_Scraper

## Introduction

This Python program is a web scraper that extracts data about graphics cards from a specific website. It uses the BeautifulSoup library to parse the site's HTML and the Requests library to fetch the page.
## Requirements

- Python 3.x
- BeautifulSoup library (`beautifulsoup4`)
- Requests library (`requests`)
- Openpyxl library (`openpyxl`)

You can install the required libraries using pip:

```
pip install beautifulsoup4 requests openpyxl
```
## How to Use

1. Clone this repository or download the files.
2. Open a terminal or command prompt and navigate to the project directory.
3. Run the Python script `app.py`:

   ```
   python app.py
   ```

4. The program will start scraping data from the website and display the brand, name, and price of each graphics card on the console.
5. Once the scraping is complete, the program will save the data to an Excel file named `Graphics Card.xlsx`.
## Configuration

You can modify the URL in the `scrape_graphics_cards_data()` function inside the `app.py` file to scrape data from a different website or adjust the query parameters as needed.
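Rather than editing the URL string by hand, the query parameters (`sort`, `order`, `fq`, and `limit`, as they appear in the URL used by the script) can be adjusted programmatically. The `build_url` helper below is a sketch, not part of the original script:

```python
from urllib.parse import urlencode, urlsplit, parse_qs, urlunsplit

def build_url(base, **overrides):
    """Return the base URL with its query parameters merged and overridden."""
    parts = urlsplit(base)
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    params.update(overrides)
    return urlunsplit(parts._replace(query=urlencode(params)))

base = ('https://www.techlandbd.com/pc-components/graphics-card'
        '?sort=p.price&order=ASC&fq=1&limit=100')
# Fetch the 50 most expensive cards instead of the 100 cheapest
print(build_url(base, order='DESC', limit=50))
```

The remaining parameters keep their original values, so only the ones you override change.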
## Output

The program generates an Excel file named `Graphics Card.xlsx` containing the scraped data. Each row in the file represents one graphics card, with the columns `Brand`, `Name`, and `Price`.
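The generated file can also be inspected programmatically with openpyxl. The sketch below builds a sample workbook in the same layout the scraper produces (the row values are made-up examples, not real scraped data) and reads it back, skipping the header row:

```python
import openpyxl

# Build a sample workbook in the same layout the scraper produces
wb = openpyxl.Workbook()
sheet = wb.active
sheet.title = "price"
sheet.append(['Brand', 'Name', 'Price'])
sheet.append(['MSI', 'GeForce GTX 1650', '14,500'])  # made-up example row
wb.save('Graphics Card.xlsx')

# Read the file back, skipping the header row
wb = openpyxl.load_workbook('Graphics Card.xlsx')
for brand, name, price in wb.active.iter_rows(min_row=2, values_only=True):
    print(f'{brand} | {name} | {price}')
```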
## Disclaimer

This web scraper is provided for educational and informational purposes only. Please be respectful of the website's terms of service and scraping policies. Always obtain proper authorization before scraping any website, and use the scraper responsibly and ethically.
## Source Code: `app.py`

```python
from bs4 import BeautifulSoup
import requests
import openpyxl


def extract_brand_name_and_title(name):
    # Split the name: the first word is the brand, the rest is the title
    brand, title = name.split(' ', 1)
    return brand, title


def scrape_graphics_cards_data():
    try:
        # Create a new Excel workbook and set up the worksheet
        excel = openpyxl.Workbook()
        sheet = excel.active
        sheet.title = "price"
        sheet.append(['Brand', 'Name', 'Price'])

        url = 'https://www.techlandbd.com/pc-components/graphics-card?sort=p.price&order=ASC&fq=1&limit=100'
        response = requests.get(url)
        response.raise_for_status()

        # Parse the HTML content
        soup = BeautifulSoup(response.text, 'html.parser')

        # Find all product cards on the webpage
        cards = soup.find('div', class_='main-products product-grid').find_all(
            'div', class_='product-layout has-extra-button')

        for card in cards:
            # Extract the product name
            name = card.find('div', class_='name').a.text
            # Split the name to get the brand and title
            brand, title = extract_brand_name_and_title(name)
            # Extract the product price
            price = card.find('div', class_='price').span.text
            # Print the product details and add them to the Excel sheet
            print(brand, title, price)
            sheet.append([brand, title, price])

        # Save the Excel file
        excel.save('Graphics Card.xlsx')
    except Exception as e:
        print("An error occurred:", e)


if __name__ == "__main__":
    # Call the main scraping function
    scrape_graphics_cards_data()
```
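Note that `extract_brand_name_and_title` assumes every product name contains at least one space; a single-word name would make the tuple unpacking raise a `ValueError`. A more defensive variant (a sketch, not part of the original script) could fall back to an empty title:

```python
def extract_brand_name_and_title(name):
    """Split a product name into (brand, title); tolerate one-word names."""
    parts = name.strip().split(' ', 1)
    brand = parts[0]
    title = parts[1] if len(parts) > 1 else ''
    return brand, title

print(extract_brand_name_and_title('MSI GeForce GTX 1650 Ventus XS'))
# → ('MSI', 'GeForce GTX 1650 Ventus XS')
```

With this change, a listing whose name is a single word is recorded with its brand and an empty title instead of aborting the whole scrape inside the `try` block.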