Using wget to download websites

I know. There are some GUI tools for this. But what if you are stuck in a terminal-only environment? Besides, nothing beats the plain old terminal.

The simplest form is

wget -r [url] or 
wget --recursive [url]

--recursive: Turn on recursive retrieving. The default maximum depth is 5.
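
For example, if you wanted a local copy of https://example.com (just a placeholder here, swap in whatever site you are actually after), you would run

wget -r https://example.com

wget creates a directory named after the host (example.com in this case) and saves the pages it finds there, following links up to 5 levels deep.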

Now, before you go off downloading the internet, exercise a little patience; let's cover a few more basics first.

The above command will download the URL, but the result will not really be suitable for offline viewing: the links in the downloaded document(s) will still point to the live site. To convert them into relative (offline) links, do this

wget -rk [url] or
wget --recursive --convert-links [url]

--convert-links: After the download is complete, convert the links in the document(s) to make them suitable for local viewing.
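
Sticking with the placeholder site, that would be

wget -rk https://example.com

Once the download finishes, links in the saved HTML pages that point to other downloaded files are rewritten as relative local paths, so you can browse the copy straight from disk.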

The above command alters the downloaded document(s) for offline viewing. If you want wget to keep the original files as well, do this

wget -rkK [url] or
wget --recursive --convert-links --backup-converted [url]

--backup-converted: When converting a file, back up the original version with a .orig suffix.
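
Again with the placeholder site, adding -K keeps the untouched originals alongside the converted pages

wget -rkK https://example.com

For every page whose links were rewritten you should end up with two files, e.g. index.html (converted) and index.html.orig (as originally downloaded).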

The above commands will only download the HTML files. To tell wget to download all the files necessary to display each page properly (images, sounds, linked CSS, etc.), use

wget -rkKp [url] or
wget --recursive --convert-links --backup-converted --page-requisites [url]

--page-requisites: This option causes wget to download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.
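
For the placeholder site, that would look like

wget -rkKp https://example.com

With -p added, wget also fetches the images, stylesheets and similar files each page references, even when recursion alone would not have picked them up.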

Again, don't go yet. The default level of links to follow is 5. This might be too much (or too little, in case your plan really is to download the whole internets). You can specify the link level like this

wget -rkKpl 3 [url] or
wget --recursive --convert-links --backup-converted --page-requisites --level=3 [url]

--level=depth: Specify the maximum recursion depth.
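
For example, to stop wget two levels of links away from the starting page of the placeholder site

wget -rkKp -l 2 https://example.com

A depth of 1 would fetch only the start page, its requisites, and the pages it links to directly.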

Finally, you might want wget to do all the hard work of downloading the internet and then delete the files immediately afterwards.

wget -r --delete-after [url]
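
--delete-after: Delete each file right after it has been downloaded. This sounds pointless, but it is handy when you only care about the fetching itself, for example to warm up a caching proxy, and not about keeping a local copy.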

This is not all wget can do, though. To learn more about the various options, check the man page for wget

man wget

That's it, folks. Happy interwebs downloading.
