Extract & mirror cache URLs from Google search pages

Saved search pages go in, cache links come out.
It’s handy for mirroring a dead site: search for site:domain.com, save the result pages as HTML, and feed them to the commands here.

Notes: Without rate limiting, I was blocked after request #169. With the limits below I had no issues, though the wait time can probably go much lower. The empty user-agent is required for wget to work.

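# -h: no filenames, -o: print only the match, -M: allow matches to span lines.
pcregrep -hoM 'http://webcache\.googleusercontent\.com/search\?q=cache:(.+?)(?=[+](.+)"(.*)>Cached)' searc*.html > cachelist.txt
# Fetch every URL in the list, pausing a randomized ~15s between requests.
wget --wait 15s --random-wait --user-agent="" -i cachelist.txt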
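
Since the block came after a fixed number of requests, it may be worth trimming the list before downloading. The following is only a sketch, not part of the original commands: dedupe_list and estimate_seconds are hypothetical helper names, and the estimate assumes the average pause equals the --wait value (wget's --random-wait varies each pause between 0.5 and 1.5 times that value).

```shell
# Hypothetical helpers; the names are illustrative, not from the original post.

# Print the URL list with duplicate lines removed.
dedupe_list() {
  sort -u "$1"
}

# Rough total wait in seconds: number of lines times the wait per request.
# $1 = list file, $2 = wait in seconds (assumed default: 15).
estimate_seconds() {
  echo $(( $(wc -l < "$1") * ${2:-15} ))
}
```

For example, dedupe_list cachelist.txt > cachelist.uniq.txt followed by estimate_seconds cachelist.uniq.txt 15 gives a ballpark for how long the wget run will pause in total.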

To match the junk part of the downloaded filenames
(e.g. search?q=cache:4Ip_t8yQ-rL2:), use this regex; the twelve dots match the 12-character cache ID:
search\?q=cache:............:
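
One way to apply that regex is a small rename pass over the downloaded files. This is a minimal sketch under assumptions: strip_cache_prefix and rename_cached_files are hypothetical names, the files are assumed to sit in one directory, and sed -E is used so that \? is a literal question mark.

```shell
# Hypothetical helpers; the names are illustrative, not from the original post.

# Strip the "search?q=cache:<12-char id>:" prefix from one filename.
strip_cache_prefix() {
  printf '%s\n' "$1" | sed -E 's/^search\?q=cache:............://'
}

# Rename every matching file in a directory (default: current directory).
rename_cached_files() {
  dir=${1:-.}
  for path in "$dir"/search\?q=cache:*; do
    [ -e "$path" ] || continue              # glob matched nothing
    old=${path##*/}
    new=$(strip_cache_prefix "$old")
    [ -n "$new" ] || continue               # nothing left after the prefix
    [ "$new" != "$old" ] || continue        # prefix did not match
    mv -n -- "$path" "$dir/$new"            # -n: never overwrite an existing file
  done
}
```

Running rename_cached_files in the download directory would turn search?q=cache:4Ip_t8yQ-rL2:page.html into page.html, leaving any file whose target name already exists untouched.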

edit: updated for new search output & lowered wait to 15s