I’ve seen several articles in the past with titles like “Top 10 things you didn’t know about bash programming.” These articles are disappointing on two levels: first of all, the tricks are almost always things I already knew. And secondly, if you want to write portable programs, you can’t depend on bash features (not every platform has bash!). POSIX-like shells, however, are much more widespread.
Since writing redo, I’ve had a chance to start writing a few more shell scripts that aim for maximum portability, and from there, I’ve learned some really cool tricks that I haven’t seen documented elsewhere. Here are a few.
Extract & mirror cache URLs from Google search pages
Saved search pages go in, cache links come out.
It’s handy for mirroring a dead site by using site:domain.com as the search parameter.
Notes: Without rate limiting I was blocked after request #169. However, there were no issues when using the limits below. The wait time can probably go much lower though. The empty user-agent is required for wget to work.
pcregrep -hoM 'http://webcache\.googleusercontent\.com/search\?q=cache:(.+?)(?=[+](.+)"(.*)>Cached)' searc*.html > cachelist.txt
wget --wait 15s --random-wait --user-agent="" -i cachelist.txt
To match the junk part of the filename (search?q=cache:4Ip_t8yQ-rL2:) use this:
search\?q=cache:............:
edit: updated for new search output & lowered wait to 15s
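For readers without pcregrep, the extraction step can be sketched in Python. This is a simplified variant, not the command above: it stops each URL at the first `+` or `"` instead of using a lookahead for the Cached link text. The function name `extract_cache_urls` and the sample markup (including the `example.com` path) are made up for illustration.

```python
import re

# Simplified pattern: a cache URL runs until the first '+' or '"'.
CACHE_URL = re.compile(
    r'http://webcache\.googleusercontent\.com/search\?q=cache:[^"+]+'
)

def extract_cache_urls(html):
    """Return every cache URL found in a saved search page (hypothetical helper)."""
    return CACHE_URL.findall(html)

# Hypothetical fragment of a saved search page.
sample = (
    '<a href="http://webcache.googleusercontent.com/search?q=cache:'
    '4Ip_t8yQ-rL2:example.com/page.html+site:example.com&amp;hl=en">Cached</a>'
)
urls = extract_cache_urls(sample)
print("\n".join(urls))
```

Writing the printed lines to cachelist.txt would give wget the same kind of input list.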
Set mp3 date tag as YYYY-MM-DD based on mtime
Particularly useful for large, poorly tagged radio archives mirrored with wget.
Processes files in the working directory matching the pattern specified.
Requires: mutagen (for tagging)
for FILE in *.mp3; do mid3v2 --date="$(perl -e '@d=localtime ((stat(shift))[9]); printf "%4d-%02d-%02d", $d[5]+1900,$d[4]+1,$d[3]' "$FILE")" "$FILE"; done
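The date computation done by the embedded Perl can also be sketched in Python. This covers only the YYYY-MM-DD string derived from the mtime; the tagging itself is still left to mid3v2. The helper name `mtime_date` and the temporary demo file are assumptions made for the example.

```python
import os
import tempfile
import time

def mtime_date(path):
    """Return the file's modification time as YYYY-MM-DD in local time."""
    mtime = os.stat(path).st_mtime
    return time.strftime("%Y-%m-%d", time.localtime(mtime))

# Demo: create a throwaway file and set its mtime to local noon, 2009-06-15.
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as handle:
    demo_path = handle.name
noon = time.mktime((2009, 6, 15, 12, 0, 0, 0, 0, -1))
os.utime(demo_path, (noon, noon))

demo_date = mtime_date(demo_path)
print(demo_date)  # 2009-06-15
os.remove(demo_path)
```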
“A simple and dirty HTML/XML template library for Python 3.”
>>> from dirty.html import *
>>> page = xhtml(
...     head(
...         title("Dirty"),
...         meta(name="Author", content="Hong, MinHee <minhee@dahlia.kr>")
...     ),
...     body(
...         h1("Dirty"),
...         p("Dirty is a simple DSEL template library that...")
...     )
... )
>>> print(page)
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" />
<head>
<title>Dirty</title>
<meta content="Hong, MinHee <minhee@dahlia.kr>" name="Author" />
</head>
<body>
<h1>Dirty</h1>
<p>Dirty is a simple DSEL template library that...</p>
</body>
</html>
Time formatting in Haskell
showTime :: Int -> Int -> String
showTime hours minutes
    | hours == 0  = "12" ++ ":" ++ showMin ++ " am"
    | hours <= 11 = (show hours) ++ ":" ++ showMin ++ " am"
    | hours == 12 = (show hours) ++ ":" ++ showMin ++ " pm"
    | otherwise   = (show (hours - 12)) ++ ":" ++ showMin ++ " pm"
    where
    showMin
        | minutes < 10 = "0" ++ show minutes
        | otherwise    = show minutes
Main> showTime 13 37
"1:37 pm"
From Haskell for C Programmers.
“Command name is 25% fewer characters to type! Save days of free-time! Heck, it’s 50% shorter compared to grep -r.”
>>> a = 500
>>> b = 500
>>> a is b
False
>>> c = 200
>>> d = 200
>>> c is d
True
“Can you surmise why this inconsistency happens?”
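One likely explanation, as far as CPython goes: the interpreter keeps a cache of small integer objects (documented as -5 through 256), so equal small values share one object while larger values usually do not. The sketch below builds the integers with `int()` from strings so that compile-time constant sharing does not hide the effect. The `is` results are an implementation detail of CPython, not a language guarantee.

```python
import platform

# Build the values at run time so the compiler cannot merge equal constants.
a = int("500")
b = int("500")
c = int("200")
d = int("200")

# Equality compares values and is always reliable.
print(a == b, c == d)  # True True

# Identity compares objects; on CPython this prints "False True"
# because 200 falls inside the small-integer cache and 500 does not.
print(a is b, c is d)
print(platform.python_implementation())
```

The practical rule is to compare numbers with `==` and reserve `is` for singletons such as `None`.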
Disable the new look in Safari 4
The new tabs in Safari 4 are nice, but I prefer the old classic look. If you’re like me, like I know I am, use these commands in the terminal to disable the new look:
defaults write com.apple.Safari DebugSafari4TabBarIsOnTop -bool NO
defaults write com.apple.Safari DebugSafari4IncludeToolbarRedesign -bool NO
defaults write com.apple.Safari DebugSafari4LoadProgressStyle -bool NO
“A generator and a list used like a cache.”
class GeneratorList(object):
    def __init__(self, generator):
        self.__generator = generator
        self.__list = []

    def __getitem__(self, index):
        # Pull values from the generator until the requested index is cached.
        for _ in range(index - len(self.__list) + 1):
            self.__list.append(next(self.__generator))
        return self.__list[index]
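A short usage sketch may make the caching behaviour easier to see. The class is restated here in Python 3 syntax so the example is self-contained; the `squares` generator and the `calls` list are hypothetical additions that record how often the generator is actually advanced.

```python
import itertools

class GeneratorList:
    """Index into a generator, caching every value produced so far."""

    def __init__(self, generator):
        self._generator = generator
        self._list = []

    def __getitem__(self, index):
        # Advance the generator only as far as needed to reach `index`.
        for _ in range(index - len(self._list) + 1):
            self._list.append(next(self._generator))
        return self._list[index]

calls = []  # records each value the generator was asked to produce

def squares():
    for n in itertools.count():
        calls.append(n)
        yield n * n

cached = GeneratorList(squares())
print(cached[3])   # 9, after producing items 0 through 3
print(cached[1])   # 1, served from the cached list
print(len(calls))  # 4, the second lookup did not advance the generator
```

Indexing past the end of a finite generator would surface as StopIteration in this sketch; converting it to IndexError is one possible refinement.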
import sys

def trim(docstring):
    if not docstring:
        return ''
    # Convert tabs to spaces and split into lines:
    lines = docstring.expandtabs().splitlines()
    # Determine minimum indentation (first line doesn't count):
    indent = sys.maxsize
    for line in lines[1:]:
        stripped = line.lstrip()
        if stripped:
            indent = min(indent, len(line) - len(stripped))
    # Remove indentation (first line is special):
    trimmed = [lines[0].strip()]
    if indent < sys.maxsize:
        for line in lines[1:]:
            trimmed.append(line[indent:].rstrip())
    # Strip off trailing and leading blank lines:
    while trimmed and not trimmed[-1]:
        trimmed.pop()
    while trimmed and not trimmed[0]:
        trimmed.pop(0)
    return '\n'.join(trimmed)
With this algorithm, an indented documentation string such as this…
def foo():
    """
    A multi-line
    docstring.
    """
…is converted to: "A multi-line\ndocstring."
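Python's standard library ships a function that performs this kind of cleanup, `inspect.cleandoc`. The sketch below runs it on the same example docstring; it is shown as a convenient way to try the behaviour, not as a claim that it is byte-for-byte identical to the routine above in every edge case.

```python
import inspect

def foo():
    """
    A multi-line
    docstring.
    """

# cleandoc removes the common indentation and the blank first and last lines.
cleaned = inspect.cleandoc(foo.__doc__)
print(repr(cleaned))  # 'A multi-line\ndocstring.'
```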