WebScraping-Data-Analytics

WebScraping-for-job-Website

This program fetches information about available job listings from a job website named totaljobs, filters them by skill, and saves the output to a local CSV file.

This program is able to fetch the:

  • Job title/role
  • Company name
  • Location
  • Salary

User Story

As a data analyst, I want to be able to collect large amounts of information from the web into a CSV file.

Acceptance Criteria

  • It is done when I can make a request to a specified URL.
  • It is done when I get a response from that URL.
  • It is done when I get the target content from the URL.
  • It is done when that content is saved in a CSV file.
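The four criteria map onto a short fetch → parse → save pipeline. Below is a minimal sketch of that pipeline; a static HTML snippet stands in for the live response (with a real site this would come from requests.get(url).content), and the tag names and file name are illustrative:

```python
import csv
from bs4 import BeautifulSoup

# Stand-in for the body of a successful response from the specified URL
html = """
<div class="results">
  <h2>Data Analyst</h2><li>London</li><dl>£40,000</dl>
  <h2>Python Developer</h2><li>Leeds</li><dl>£55,000</dl>
</div>
"""

# Get the target content out of the response
soup = BeautifulSoup(html, 'html.parser')
container = soup.find('div', class_='results')
titles = container.find_all('h2')
locations = container.find_all('li')
salaries = container.find_all('dl')
rows = [[t.text.strip(), l.text.strip(), s.text.strip()]
        for t, l, s in zip(titles, locations, salaries)]

# Save the content in a CSV file
with open('sample_jobs.csv', mode='w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Job Title', 'Location', 'Salary'])
    writer.writerows(rows)
```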

Sample Output

Packages used

  • BeautifulSoup
  • requests
  • csv (Python standard library)

Challenges encountered:

  • The main difficulty was locating the precise ids and element selectors (such as finding elements by id, XPath, or class, and using find_all) that would reliably return the target information.
  • Overall, our team successfully applied Python web scraping to complete the assignment.
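The locator strategies mentioned above can be compared on a small static snippet. The ids and class names below are made up for illustration (note that BeautifulSoup itself does not support XPath, so only the id, class, and CSS-selector lookups are shown):

```python
from bs4 import BeautifulSoup

html = """
<div id="results">
  <div class="job"><h2>Data Analyst</h2></div>
  <div class="job"><h2>Python Developer</h2></div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# Locate a single element by its id attribute
container = soup.find('div', id='results')

# Locate every element with a given class
cards = container.find_all('div', class_='job')

# CSS-selector equivalent of the class lookup
same_cards = soup.select('div#results div.job')

titles = [card.h2.text for card in cards]
```

Each approach returns the same two job cards here; on a real page the fragile part is guessing which generated class name (like the sc-… classes in the source below) actually wraps the data you want.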

Steps to Execute

  • Fork this repository and navigate to the WebScraping-Data-Analytics folder
  • Execute the program by running the pydataanalytics.py file using $ python pydataanalytics.py
  • The program will then fetch the job information and write it to a CSV file.

Team Members

Source Code: pydataanalytics.py

import csv
import requests 
from bs4 import BeautifulSoup

# URL of the job site (using totaljobs as an example)
url = 'https://www.totaljobs.com/jobs/in-london'

r = requests.get(url)

# Parse the HTML with BeautifulSoup
html_soup = BeautifulSoup(r.content, 'html.parser')

# Target the container that holds the job results
job_details = html_soup.find('div', class_='ResultsContainer-sc-1rtv0xy-2')

# Pull out the needed tags: each listing contributes an h2 (title),
# an li (location) and a dl (salary), in that order
job_titles = job_details.find_all(['h2', 'li', 'dl'])
company_name = job_details.find_all('div', class_='sc-fzoiQi')

# Write the data to a CSV file
with open('job_data_2.csv', mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Job Title', 'Location', 'Salary', 'Company Name'])  # header row
    # Each listing spans three consecutive tags in job_titles (title,
    # location, salary) and one entry in company_name
    num_jobs = min(len(job_titles) // 3, len(company_name))
    for i in range(num_jobs):
        job_title = job_titles[3 * i].text.strip()
        location = job_titles[3 * i + 1].text.strip()
        salary = job_titles[3 * i + 2].text.strip()
        company = company_name[i].text.strip()
        writer.writerow([job_title, location, salary, company])
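Once the CSV exists, the skill-based filtering described at the top can be done with csv.DictReader. This is a sketch of that step; the sample rows, file name, and skill keyword are illustrative:

```python
import csv

# Write a small sample file in the same shape the scraper produces
with open('job_data_sample.csv', mode='w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Job Title', 'Location', 'Salary', 'Company Name'])
    writer.writerow(['Python Developer', 'London', '£55,000', 'Acme Ltd'])
    writer.writerow(['Account Manager', 'Leeds', '£35,000', 'Foo plc'])

# Keep only the rows whose title mentions a target skill
skill = 'python'
with open('job_data_sample.csv', newline='') as f:
    matches = [row for row in csv.DictReader(f)
               if skill in row['Job Title'].lower()]
```

DictReader keys each row by the header written above, so the filter reads by column name rather than by position.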