Here’s what we’ll be building
DISCLAIMER: Instagram does not allow the scraping of their website. This tutorial is for demonstration purposes only.
This tutorial will only focus on building the backend API that scrapes Instagram and returns the video link we want. The frontend of this app is built in React. You could equally build the frontend in simple HTML, CSS and Vanilla JS.
I noticed that if you open any video on Instagram and inspect the HTML of the resulting page, you’ll find a link to the MP4 video on Instagram’s CDN servers. If you click this link, it’ll take you directly to the Instagram video.
First things first.
1. Initialize a package.json file with default values.
npm init -y
2. Install the NPM packages we need.
npm i express cors axios cheerio
If you’ve never used Node.js: the Express package gives us a server, and the cors package lets that server accept requests from other origins (i.e. if you had a frontend app on abc.com and your server was on def.com, cors is what allows the server to accept the frontend’s requests). Axios is the package for making HTTP requests to a website, and Cheerio is the package for parsing and scraping elements on a web page.
3. Your start script should be changed to
4. Let’s write the code. Create a scraper.js file in the root of your project. First we bring in our packages — express, cors, axios and cheerio.
We also need to call the app.use() method, which registers our middleware. Think of middleware as any function you want to run on every single request to our server. First, we call express.json(), which parses incoming JSON request bodies so we can read them from req.body. Second, we call cors() so that if our frontend and backend are on different origins, they can still talk to each other.
5. Next, we create a getVideo() function. This function takes a URL, goes to that URL (using axios) and passes the resulting page to Cheerio for scraping.
6. Now that we have defined the function that goes to Instagram and returns the videoString, we need a way to accept the URL users send in. This is where our Express server comes in. The route handler will also be an asynchronous function, i.e. we need to wait for the result of the getVideo() function before we can return a response to the user.
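One way to sketch this route is below. The path `/api/download`, the `url` field in the request body, the `downloadLink` field in the response and the function name `registerDownloadRoute` are all assumptions made for illustration. The route is wrapped in a function that receives the Express app and the getVideo() function so the snippet reads on its own.

```javascript
// Attach the download route to an Express app.
// `app` is the Express app and `getVideo` is the async scraper function from step 5.
const registerDownloadRoute = (app, getVideo) => {
  // Hypothetical route: POST /api/download with a JSON body like { "url": "..." }
  app.post('/api/download', async (req, res) => {
    const { url } = req.body || {};
    if (!url) {
      return res.status(400).json({ error: 'Request body must include a url field.' });
    }
    try {
      // Wait for the scrape to finish before responding.
      const videoLink = await getVideo(url);
      if (!videoLink) {
        return res.status(404).json({ error: 'No video link found on that page.' });
      }
      return res.json({ downloadLink: videoLink });
    } catch (err) {
      // Covers network failures and non-2xx responses thrown by the HTTP client.
      return res.status(500).json({ error: 'Could not fetch the page.' });
    }
  });
};
```

In scraper.js you would then call registerDownloadRoute(app, getVideo) and start the server with app.listen() on a port of your choice, for example 5000.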
7. And that’s it. We’re done. We can test our endpoint with Postman.
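If you prefer a script to Postman, the same request can be sketched with Node’s built-in fetch (Node 18 or newer). The base URL `http://localhost:5000`, the `/api/download` path and the `url` body field are assumptions; adjust them to match your own server. The function name `requestVideoLink` is also hypothetical.

```javascript
// Send an Instagram URL to the backend and return the status plus the parsed JSON reply.
// Assumes Node 18+ (global fetch) and a POST /api/download endpoint that accepts { url }.
const requestVideoLink = async (instagramUrl, baseUrl = 'http://localhost:5000') => {
  const response = await fetch(`${baseUrl}/api/download`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' }, // required for express.json() to parse the body
    body: JSON.stringify({ url: instagramUrl }),
  });
  return { status: response.status, body: await response.json() };
};
```

With the server running, calling requestVideoLink() with a post URL and logging the result shows the same status and JSON you would see in Postman.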
The entire code for the frontend and backend is available on my GitHub here. Make sure to clone and star the repo.
I have another article on testing this app or any other Node.js app here.