Extract & mirror cache URLs from Google search pages

Saved search pages go in, cache links come out.
It’s handy for mirroring a dead site: use site:domain.com as the search query.

Notes: Without rate limiting, I was blocked after request #169. With the limits below, there were no issues, though the wait time can probably be much lower. The empty user agent (--user-agent="", which makes wget omit the User-Agent header entirely) is required for wget to work.
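To get a feel for what the 30-second wait costs, here is a rough estimate. It relies on wget's documented behaviour that --random-wait varies each pause between 0.5x and 1.5x the --wait value, so the average pause is still one --wait interval. estimate_minutes is a hypothetical helper, and the figure covers only the waiting, not the download time itself.

```shell
#!/bin/sh
# Hypothetical helper: rough minutes spent waiting between requests.
# With --random-wait, wget pauses 0.5x to 1.5x the --wait value,
# so on average each request still costs about one --wait interval.
estimate_minutes() {
  urls=$1     # number of URLs, e.g. $(wc -l < cachelist.txt)
  delay=$2    # seconds passed to --wait
  echo $(( urls * delay / 60 ))   # integer minutes, rounded down
}

estimate_minutes 169 30   # prints 84 (5070 seconds of waiting)
```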

To match the junk part of the saved filenames (search?q=cache:4Ip_t8yQ-rL2:), use this regex, with one dot for each of the 12 characters of the cache ID: search\?q=cache:............:
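A minimal sketch of putting that pattern to work, assuming wget saved the pages in the current directory under names starting with that prefix. strip_cache_prefix is a hypothetical helper. sed's basic regex syntax treats ? as a literal character, so it needs no backslash there, while the twelve dots still stand for the twelve-character cache ID.

```shell
#!/bin/sh
# Hypothetical helper: drop "search?q=cache:", the 12-character cache ID
# and its trailing colon from the start of a file name.
strip_cache_prefix() {
  printf '%s\n' "$1" | sed 's/^search?q=cache:............://'
}

# Rename every downloaded cache page in the current directory.
for f in search\?q=cache:*; do
  [ -e "$f" ] || continue              # the glob matched nothing
  new=$(strip_cache_prefix "$f")
  # Skip names the pattern did not shorten, would empty, or would clobber.
  if [ -n "$new" ] && [ "$new" != "$f" ] && [ ! -e "$new" ]; then
    mv -- "$f" "$new"
  fi
done
```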

pcregrep -ho 'http://\d(.+?)(?=[+](.+)">Cached)' search*.html > cachelist.txt
wget --wait 30s --random-wait --user-agent="" -i cachelist.txt
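If pcregrep isn't installed, roughly the same extraction can be done with grep -oE and sed, assuming a grep that supports -o and -h (GNU, BSD and BusyBox grep all do). extract_cache_urls is a hypothetical name. Instead of a lookahead, it matches through the "+" to the Cached link text, then trims everything from the first "+" onward. Unlike the PCRE version, the match cannot cross a double quote, so it stays inside a single href.

```shell
#!/bin/sh
# Hypothetical helper: print cache URLs found in saved Google result
# pages (or stdin), one per line, without needing pcregrep.
extract_cache_urls() {
  # -o prints each match on its own line, -h drops file-name prefixes.
  # Match http://<digit>... up to the first "+", then the rest of the
  # href up to the closing quote and the "Cached" link text.
  grep -ohE 'http://[0-9][^+"]*[+][^"]*">Cached' "$@" |
    sed 's/[+].*//'    # keep only the part before the "+"
}
```

Usage: extract_cache_urls search*.html > cachelist.txt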
