Downloading HTML from a Website

Published by Nyau Wai Hoe - Updated on

The internet is like a huge library packed with information. Everything you see when you visit a website is built on HTML (HyperText Markup Language), the basic building block of every webpage. If you collect data from websites, test pages, or just want to see how a page is put together, you’ll sometimes want to save a site’s HTML. This guide walks through several ways to do that while staying on the right side of website owners and the law.

Also see: Download All Files From a Website Directory Using Wget in Windows 11 or 10

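First up, play nice: check the website’s terms of use and any laws that apply to you before you download anything. Most sites also publish a file called robots.txt at the root of the domain (for example, https://example.com/robots.txt) that lists the paths automated tools are asked to stay away from. Make sure what you’re doing is okay.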
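The robots.txt check can also be done from a script. Here’s a minimal sketch using Python’s built-in urllib.robotparser module. The robots.txt content and the is_allowed helper are made up for illustration so the example runs offline; for a real site you would point the parser at the live file with set_url() and read() instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/index.html"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/private/data.html"))
```

With this sample file, the first URL is allowed and the second is not. Keep in mind that robots.txt only covers crawling etiquette; the site’s terms of use still apply.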

What is HTML?

HTML is the secret sauce that makes websites possible. It’s a markup language that web browsers read to display webpages. HTML works with other tech like CSS (which controls how pages look) and JavaScript (which makes pages interactive) to make webpages look good and work well.

An HTML document is just a text file with HTML code, marked up with tags that tell the browser how to show different parts of the page, such as <h1> for a heading, <p> for a paragraph, <table> for a table, and so on.
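To make the idea of tags concrete, here’s a small sketch that feeds a made-up HTML document to Python’s built-in html.parser and prints the tags as an indented outline. The sample page and the TagOutline class are assumptions for illustration only. The depth counter is deliberately simple and would drift on tags that never close (like <br> or <img>), which the sample avoids.

```python
from html.parser import HTMLParser

# A tiny, made-up HTML document used only for illustration.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
  <head><title>My Page</title></head>
  <body>
    <h1>Welcome</h1>
    <p>This is a paragraph.</p>
  </body>
</html>"""


class TagOutline(HTMLParser):
    """Record each opening tag together with how deeply it is nested."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


parser = TagOutline()
parser.feed(SAMPLE_HTML)
for depth, tag in parser.outline:
    print("  " * depth + tag)
```

The output shows html at the top, head and body one level in, and title, h1, and p nested inside those.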

Related resource: How to Convert HTML to PDF in Windows 11/10

How to download an HTML file from a website

There are a few ways to grab HTML from a website, depending on what you need. We’ll go over some common methods: using a web browser, the command line, and even Python.

Using a Web Browser

We’ll focus on Google Chrome here, but other browsers work similarly.

Downloading HTML source code from a website

  1. Open Google Chrome and go to the webpage you want.
  2. Right-click on the page and select “View Page Source” to see the HTML in a new tab.
  3. Right-click in the new tab and choose “Save As” to save the HTML file on your computer.

Useful tip: How to Run HTML Code in Notepad Windows 11

Downloading HTML with pictures, CSS, and JS from a website

If you’re looking to grab a full webpage with all its bells and whistles like pictures, CSS, and JS files, here’s how you can do it:

  1. First up, open Google Chrome and head to the webpage you want to download. Once you’re there, just right-click anywhere on the page and select “Save As”.
  2. Next, a box will pop up asking where you’d like to save the file on your computer. Make sure the save type is set to “Webpage, Complete”, then hit “Save”. Just like that, you’ve got the webpage’s HTML and its resources saved on your device.

When you save a webpage like this, whether it’s with Chrome or another browser, you’ll get two things:

  1. An HTML file: This is the .html file with the webpage’s HTML, which includes the structure and the content. It points to any external stuff the webpage uses, like images, CSS, and JavaScript files.
  2. A resources folder: Along with the HTML file, you’ll get a folder named just like the HTML file but with ‘_files’ at the end. This folder’s packed with all the webpage’s resources that are now saved locally. This includes images, CSS files, JavaScript files, and more, which are needed for the webpage to look right offline.

The browser will tweak the HTML file’s links to these resources so they point to the local copies in the resources folder. This lets the webpage load up just fine even when you’re offline.

But keep in mind, not everything might work offline. For example, server-side scripts (like PHP) won’t run, and if the webpage grabs resources on the fly with JavaScript, those might not be downloaded either.
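If you’re curious what a complete save actually pulled down, a short script can count the files in the resources folder. This is only a sketch: summarize_saved_page is a hypothetical helper, the file name “Example Domain.html” is a placeholder for wherever you saved your page, and it assumes the folder follows the ‘_files’ naming described above.

```python
from collections import Counter
from pathlib import Path


def summarize_saved_page(html_path):
    """Count files by extension in the '<name>_files' folder next to a saved page."""
    html_file = Path(html_path)
    resources = html_file.with_name(html_file.stem + "_files")
    if not resources.is_dir():
        # No resources folder found (for example, an HTML-only save).
        return Counter()
    return Counter(
        p.suffix.lower() or "(no extension)"
        for p in resources.iterdir()
        if p.is_file()
    )


# Placeholder path: replace with the page you saved.
for extension, count in summarize_saved_page("Example Domain.html").items():
    print(extension, count)
```

If the script prints nothing, the folder wasn’t found next to the HTML file, or the page had no external resources to save.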

Recommended guide: How to Check When a Web Page Was Last Updated

Why doesn’t Chrome save the HTML as a single file?

The main reason is to keep the webpage working and looking the way it should. A webpage is more than HTML: it also has CSS for the looks and JavaScript for the action, and these bits usually live in separate files. To cram everything into one HTML file, Chrome would have to stuff all of these resources directly into the HTML, which makes the file harder to read and edit and can hurt how well it runs.

So, with the “Webpage, Complete” save type, Chrome saves the HTML and its resources as separate files rather than mixing it all into one. If you do want a single file, the same Save dialog also offers “Webpage, Single File”, which packs the page into one .mhtml archive instead of plain HTML.

See also: Create Website or Application Shortcut on Desktop using Chrome

Using the Command Line

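If you’re on Linux or macOS, you can use command-line tools like wget or curl to grab HTML. (curl comes with macOS; wget may need to be installed first.)

  1. Using wget:
    wget https://example.com

    This saves the webpage’s HTML in your current folder. When the URL doesn’t name a file, as here, wget calls the saved file index.html.

  2. Using curl:
    curl https://example.com > example.html

    curl prints the HTML to the terminal by default, so the > redirect sends it into a file named “example.html” instead.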

Using Python

Python is great for grabbing webpages, too. Here’s a quick script:

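import requests

url = "https://example.com"

# Give up if the server does not respond within 10 seconds
response = requests.get(url, timeout=10)
# Raise an exception on a 4xx or 5xx response instead of saving an error page
response.raise_for_status()

# Write with an explicit encoding so non-ASCII characters save on any platform
with open("example.html", "w", encoding="utf-8") as file:
    file.write(response.text)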

This code asks a website for its HTML and then saves it to a file called ‘example.html’ on your computer. The requests module isn’t part of Python itself, so if you’re new to Python and run into an error like “ModuleNotFoundError: No module named 'requests'”, you need to install it using Python’s package manager, pip. Just open your command line and type:

pip install requests

This will install the needed module. If you need another module, you can install it in the same way.
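If you’d rather not install anything, Python’s standard library can do the download on its own. The sketch below uses urllib.request and adds a small helper that turns a URL into a file name, which is handy when saving more than one page. Both filename_for and save_page are hypothetical names made up for this example, and the naming scheme is just one simple choice.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_for(url):
    """Build a simple file name such as 'example.com_docs_intro.html' from a URL."""
    parts = urlparse(url)
    host = parts.netloc.replace(":", "_")  # a port's colon is not valid on Windows
    path = parts.path.strip("/").replace("/", "_")
    name = host + ("_" + path if path else "")
    return name if name.endswith((".html", ".htm")) else name + ".html"


def save_page(url, folder="."):
    """Download url and write the raw bytes into folder; return the saved path."""
    with urlopen(url, timeout=10) as response:
        data = response.read()
    target = Path(folder) / filename_for(url)
    # Writing bytes keeps the page exactly as the server sent it.
    target.write_bytes(data)
    return target


print(filename_for("https://example.com"))
print(filename_for("https://example.com/docs/intro"))
# To actually download a page: save_page("https://example.com")
```

The two print lines show the names the helper would pick (example.com.html and example.com_docs_intro.html); the download itself is left as a commented call so nothing hits the network until you want it to.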

Advanced Usage: Web Scraping

Once you’ve got the HTML, you might want to pull out specific bits of information. This is where web scraping comes in handy. There’s a popular third-party Python library called Beautiful Soup that’s perfect for this. You install it with pip install beautifulsoup4 and import it as bs4, and it lets you sift through HTML easily.

Here’s a quick example:

from bs4 import BeautifulSoup
import requests

url = "https://example.com"
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(response.text, "html.parser")

# Let's say we want to extract all the headings (h1 through h6) in the page
headings = soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])

for heading in headings:
    print(heading.get_text(strip=True))

This script takes the HTML, digs through it with BeautifulSoup, and prints out the text of every heading.
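Links are another thing people often pull out of a page. Here’s a rough sketch that collects them and turns relative links into full URLs with urljoin. To keep it runnable without extra installs it uses the built-in html.parser rather than BeautifulSoup (with BeautifulSoup, soup.find_all("a", href=True) gets you the same tags). The LinkCollector class and the HTML snippet are made up for illustration; in practice you would feed it response.text.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links like "/about" against the page URL
                self.links.append(urljoin(self.base_url, href))


# Made-up HTML standing in for a downloaded page
html = (
    '<p><a href="/about">About</a> '
    '<a href="https://www.iana.org/">IANA</a> '
    "<a>no link here</a></p>"
)

collector = LinkCollector("https://example.com/")
collector.feed(html)
print(collector.links)
```

The anchor without an href is skipped, and the relative /about link comes out as https://example.com/about.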

Dealing with Dynamic Content

Some websites load their content on the fly, meaning you might not get everything by just saving the HTML. For these cases, you can use tools like Selenium, Puppeteer, or Playwright. These tools actually open up a web browser, let the page load completely, and then let you access the HTML.

Here’s how you might use Selenium with Python:

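from selenium import webdriver

# Selenium 4.6+ finds a matching driver automatically;
# on older versions, make sure chromedriver is in your PATH
driver = webdriver.Chrome()

try:
    driver.get("https://example.com")

    # The page's HTML as the browser sees it once the page has loaded
    html = driver.page_source
finally:
    # Close the browser even if loading the page fails
    driver.quit()

with open("example.html", "w", encoding="utf-8") as file:
    file.write(html)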

This method can be a bit slow and uses more resources, but it’s great for getting everything from pages that load content dynamically.
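Since a real browser is the slow option, it can help to guess first whether a page needs one. The sketch below is a rough heuristic, not a reliable test: it counts script tags and visible text in HTML you’ve already downloaded and flags pages that have scripts but almost no text. The class, the looks_js_rendered function, and the 200-character threshold are all assumptions picked for illustration.

```python
from html.parser import HTMLParser


class TextAndScriptCounter(HTMLParser):
    """Count <script> tags and visible text characters outside script/style."""

    def __init__(self):
        super().__init__()
        self.scripts = 0
        self.text_chars = 0
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_chars += len(data.strip())


def looks_js_rendered(html, min_text_chars=200):
    """Guess whether a page relies on JavaScript to fill in its content."""
    counter = TextAndScriptCounter()
    counter.feed(html)
    return counter.scripts > 0 and counter.text_chars < min_text_chars


# Made-up example: an almost empty page that loads everything from a script
app_shell = (
    "<html><head><title>App</title></head>"
    '<body><div id="root"></div><script src="app.js"></script></body></html>'
)
print(looks_js_rendered(app_shell))
```

If the check comes back True for a page you fetched with requests, that’s a hint the plain download missed the real content and a browser-based tool is worth the extra time.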

One last thing

Whether you’re into coding, testing, or just curious about how websites work, knowing how to download HTML is a very useful skill. There are lots of ways to do it, depending on what you need and what tools you’ve got. Just make sure to always respect the website’s rules and only download stuff when it’s okay to do so.


Nyau Wai Hoe
Nyau Wai Hoe is the Founder and Chief Editor of WindowsDigitals.com. With a degree in software engineering and over 12 years of experience in the tech support industry, Nyau has established himself as an expert in the field, with a primary focus on the Microsoft Windows operating system. As a tech enthusiast, he loves exploring new technologies and leveraging them to solve real-life problems.
