Automate the shit
Yup. Automation is the key. Why bother doing boring tasks when you can automate them?
Automation = Less hard work + Time saved
In this very first blog, I will share how I automated the process of downloading wallpapers using Bash scripting and some regular expressions.
One fine day, I was looking for minimalist wallpapers for my Elementary OS. I came across this site, https://wallpaperplay.com/, which has a huge collection of wallpapers. But the problem was the downloading process. It went like this:
- Click on the Download button.
- Wait for 5 seconds (timer).
- Click on the generated link.
- And finally, right click → Save image as.
I was like, who is gonna go through these 4 steps just to download one wallpaper? At least not me.
That is where the idea of automating this time-consuming process came from.
My approach was like this:
- Check the source code of the current page.
- Extract the wallpaper links from it.
- Save all the links to a file.
- Pass the file as an argument to the wget command.
Now it was just a matter of converting this approach into actual working code. First, download the page itself:
wget https://wallpaperplay.com/board/minimalist-desktop-wallpapers
This will download the page and save the HTML file under the last part of the path, i.e. "minimalist-desktop-wallpapers".
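A quick way to double-check the name it was saved under (assuming an otherwise empty working directory):
ls
# minimalist-desktop-wallpapers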
- Extracting the filename from the URL (passed to the script as "$1") using a regular expression.
echo "$1" | grep -Eoi 'board/.*' | cut -d'/' -f2
- Now we have to extract the links from the downloaded HTML page using regular expressions.
cat "minimalist-desktop-wallpapers" | grep -Eoi '<a[^>]+>'
This will extract all the HTML anchor tags, which contain the links. Here the cat command reads data from the file and writes its contents to standard output. That output is then piped ("|") to the grep command.
grep -Eoi '<a[^>]+>'
This is a regular expression that extracts the anchor tags ("<a>"). It means: match everything that starts with "<a", followed by one or more characters other than ">", up to the closing ">".
- E = Use extended regular expressions
- o = Show only the matched part of the line
- i = Case-insensitive search
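On this page, a matched tag looks something like this (illustrative; the real tags carry more attributes):
<a href="/walls/full/2/5/9/20629.jpg">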
- Extract links from the href attribute.
grep -Eoi '/.*jpg'
This regular expression extracts the value of the href attribute. It means: match everything that starts with "/", contains anything afterwards, and ends with ".jpg".
Up to this point we get output like this: /walls/full/2/5/9/20629.jpg
which is a relative URL. We can't really use this as an argument for the wget command. We need to prepend "https://www.wallpaperplay.com" to the URL to make it an absolute URL.
- Making the relative URL into an absolute URL.
sed 's|^|https://www.wallpaperplay.com|'
This sed substitution adds "https://www.wallpaperplay.com" at the beginning of each URL. Note the "|" delimiters: the URL itself contains "/", so using the usual "/" delimiter would break the command.
sed is a stream editor and is basically used for editing text.
"^" means the beginning of each line.
After doing all these operations, we finally have valid URLs. Redirect all the links to a text file using the ">" operator, then pass that text file to the wget command with the -i flag, using -P to save the wallpapers in a "wallpapers" folder.
cat "minimalist-desktop-wallpapers" | grep -Eoi '<a[^>]+>' | grep -Eoi '/.*jpg' | sed 's/^/https://www.wallpaperplay.com/g' > links.txt
wget -i links.txt -P wallpapers/
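Before kicking off the download, you can sanity-check the generated file (output illustrative, based on the sample link above):
head -n 1 links.txt
# https://www.wallpaperplay.com/walls/full/2/5/9/20629.jpg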
Putting everything together, the final script looks like this:
#!/bin/bash
# $1 = board URL, $2 and $3 = optional extra wget flags
wget "$1"                                              # download the board page first, so "$filename" exists below
filename=$(echo "$1" | grep -Eoi 'board/.*' | cut -d'/' -f2)
cat "$filename" | grep -Eoi '<a[^>]+>' | grep -Eoi '/.*jpg' | sed 's|^|https://www.wallpaperplay.com|' > links.txt
wget $2 $3 -i links.txt -P wallpapers/                 # download every wallpaper
rm "$filename" links.txt                               # clean up
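Assuming you save the script as, say, wallpapers.sh (the name is my choice, not from the post) and make it executable, a run looks like this:
chmod +x wallpapers.sh
./wallpapers.sh https://wallpaperplay.com/board/minimalist-desktop-wallpapers
Everything lands in the wallpapers/ folder, and anything you pass as the second and third arguments (for example -q for quiet mode) is forwarded straight to wget.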