Web Scraping With Node.js and Cheerio 🔥
In this post, I will show you how to create a web scraper using Node.js, Axios, and Cheerio. I use Axios to make an HTTP request to a website, then I use Cheerio to parse the HTML and extract the data I need.
Let’s get to it. 🚀
Create a Web Scraper with Node.js, Axios, and Cheerio
In this example, I will scrape the Hacker News website (https://news.ycombinator.com/) and extract the news list from it.
First, create a project folder, then run `npm init` or `yarn init`. You'll get a new `package.json` file.
Then follow these steps:
- Install Axios and Cheerio
```
yarn add axios cheerio
```
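(If you use npm instead of Yarn, `npm install axios cheerio` does the same thing.)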
- Create a file called `webScrapper.js`
- Import Axios and Cheerio
```js
const axios = require('axios')
const cheerio = require('cheerio')
```
- Create a function called `scrape()`
- Make an HTTP call to the Hacker News website. You'll get a string containing the HTML code. To view what the data looks like, I add a `console.log` inside the Axios `.then` block. Don't forget to call the function after that by adding `scrape()`.
```js
const scrape = () => {
  axios.get('https://news.ycombinator.com/')
    .then(({ data: page }) => {
      console.log(page)
    })
    .catch(error => {
      console.error(error)
    })
}

scrape()
```
- If you want to see the result, run the code with Node.js:
```
node webScrapper.js
```
- You'll see the page's raw HTML printed in the terminal. That is the same content as the website, just in its source form.
- All right, let's extract the content of the website. I want to get the news title and the URL, so I inspect the elements to find them. The news title and URL are available inside a `span` that has the class `titleline`. So I will get the anchor tag (`a`), then take the text inside it for the news title and the `href` attribute for the news URL.
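For reference, each news item's markup looks roughly like this (a simplified sketch based on the description above; the title and URL are placeholders and surrounding markup is omitted):

```html
<!-- Simplified sketch of one news item -->
<span class="titleline">
  <a href="https://example.com/story">Example story title</a>
</span>
```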
- I create a function to extract data from the page:
```js
const extractData = (page) => {
  const $ = cheerio.load(page)
  const $newsList = $('.titleline > a')
  const result = []

  for (const $news of $newsList) {
    const title = $news.children[0].data
    const url = $news.attribs.href
    result.push({ title, url })
  }

  return result
}
```
- If you don't understand what the code above does, read this:
  - `cheerio.load` = parses the HTML string into a Cheerio object so you can traverse it
  - `$('.titleline > a')` = selects the `anchor` elements inside elements that have the class `titleline`
  - Loop over the `$newsList` collection, get each news title and URL, then create an object and push it to the `result` array
  - Return the `result` array
- OK, the extract function is ready. Let's call it inside the `scrape` function:
```js
const scrape = () => {
  axios.get('https://news.ycombinator.com/')
    .then(({ data: page }) => {
      const result = extractData(page)
      console.log(result)
    })
    .catch(error => {
      console.error(error)
    })
}
```
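Side note (my own variation, not part of the original steps): if you prefer Cheerio's wrapper methods over raw DOM node properties, an equivalent `extractData` could be sketched like this:

```js
// Alternative extractData using Cheerio's wrapper API instead of
// raw DOM node properties (children[0].data / attribs.href).
const extractData = (page) => {
  const $ = cheerio.load(page)

  return $('.titleline > a')
    .map((_, el) => ({
      title: $(el).text(),      // the anchor's text content (the news title)
      url: $(el).attr('href'),  // the anchor's href attribute (the news URL)
    }))
    .get() // convert the Cheerio collection into a plain JavaScript array
}
```

Both versions should return the same array of `{ title, url }` objects.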
- Run the scraper again to see the result:
```
node webScrapper.js
```
- Now I have the news list data I wanted, as an array of objects 🔥🔥🔥
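The logged output is shaped like this (the titles and URLs below are placeholders; the real values depend on what is on the front page when you run it):

```js
[
  { title: 'Example story title', url: 'https://example.com/story' },
  { title: 'Another story title', url: 'https://example.com/another-story' },
  // ...one entry per news item on the page
]
```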
🌟 Here is the full final code 🌟
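(This is just the two functions from the steps above combined into one `webScrapper.js` file.)

```js
const axios = require('axios')
const cheerio = require('cheerio')

// Parse the HTML string and extract the title and URL of every news item
const extractData = (page) => {
  const $ = cheerio.load(page)
  const $newsList = $('.titleline > a')
  const result = []

  for (const $news of $newsList) {
    const title = $news.children[0].data
    const url = $news.attribs.href
    result.push({ title, url })
  }

  return result
}

// Fetch the Hacker News front page and log the extracted news list
const scrape = () => {
  axios.get('https://news.ycombinator.com/')
    .then(({ data: page }) => {
      const result = extractData(page)
      console.log(result)
    })
    .catch(error => {
      console.error(error)
    })
}

scrape()
```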
*(Screenshot: the result of web scraping using Node.js and Cheerio)*
That's an example of how to create a web scraper with Node.js, Axios, and Cheerio 😎. Now that you have the extracted data, you can save it to a CSV file, a Google Spreadsheet, or a database, or post it anywhere else you want. If you want to see how to do that, write a comment below and I will write an article about it.
Please do web scraping wisely. Thank you. 😉
This article is also published at https://blog.baguscahyono.dev/web-scraping-with-nodejs-and-cheerio