Web Scraping With Node.js and Cheerio 🔥

Bagus Budi Cahyono
4 min readMay 27, 2023

--

In this post, I will show you how to create a web scraper using Node.js, Axios, and Cheerio. I use Axios to make an HTTP request to a website then I use Cheerio to parse the HTML, and extract the data I needed.

Let’s get to it. 🚀

Photo by Markus Spiske on Unsplash

Create a Web Scraper with Node.js, Axios, and Cheerio

In this example, I will scrap Hacker News Website (https://news.ycombinator.com/) and extract the news list on it.

First, create a project folder, then npm init or yarn init . You'll get a new package.json file.

Then follow these steps:

  • Install Axios and Cheerio
yarn add axios cheerio
  • Create a file called webScrapper.js
  • Import Axios and Cheerio
const axios = require('axios')
const cheerio = require('cheerio')
  • Create a function called scrape()
  • Make an HTTP call to Hacker News Website. you’ll get the string containing the HTML code. To view what the data looks like, I add console.log inside Axios then block. Don't forget to call the function after that by adding the code scrape()
const scrape = () => {
axios.get('https://news.ycombinator.com/')
.then(({ data: page }) => {
console.log(page)
})
.catch(error => {
console.error(error)
})
}

scrape()
  • If you want to see the result, run the code with Node.js
node webScrapper.js
  • You’ll see the result something like this.
  • That is the same content as from the website.
  • All right, let’s extract the content of the website. I want to get the news title and the URL. So, I inspect the element and find them. As you can see, the news title and URL are available inside span which has class titleline . So I will get the anchor tag (a), then get the text inside it for the news title and also the href attributes for the new URL.
  • I create a function to extract data from the page
const extractData = (page) => {
const $ = cheerio.load(page)
const $newsList = $('.titleline > a')

const result = []
for (const $news of $newsList) {
const title = $news.children[0].data
const url = $news.attribs.href

result.push({ title, url })
}

return result
}
  • If you don’t understand what the code above does, read this:
    - cheerio.load = parse string HTML to be Cheerio Object, so you can traverse it
    - $(.titleline > a) = get the anchor element inside an element that has class titleline
    - Loop the $newsList array and get the news title and URL, then create an Object and push it to result array
    - Return the result array
  • Ok, the extract function is ready. Let’s call it inside scrap function
const scrape = () => {
axios.get('https://news.ycombinator.com/')
.then(({ data: page }) => {
const result = extractData(page)
console.log(result)
})
.catch(error => {
console.error(error)
})
}
  • Run the scraper again to see the result
node webScrapper.js
  • Now, I have the news list data I wanted as an Array of Object 🔥🔥🔥

🌟 Here is the full final code 🌟

The result of Web Scraping using Node.js and Cheerio

That’s an example of how to create a web scrapper with Node.js, Axios, and Cheerio 😎. Now, when you have the extracted data, you can save it to a CSV file, Google Spreadsheet, or a database, or post it to another place you want. If you want to see how to do it, write a comment below and I will write an article about it.

Please do web scraping wisely. Thank you. 😉

This article also published on https://blog.baguscahyono.dev/web-scraping-with-nodejs-and-cheerio

--

--