moreutils is a growing collection of the unix tools that nobody thought to write thirty years ago.
It began when I blogged:
I'm a fan of the unix tools philosophy, but I sometimes wonder if there's much room for new tools to be added to that toolbox. I've always wanted to come up with my own general-purpose new unix tool.
Well, after lots of feedback documented in the many followups (1 2 3) in my blog, I've concluded:
Maybe the problem isn't that no-one is writing them, or that the unix toolspace is covered except for specialised tools, but that the most basic tools fall through the cracks and are never noticed by people who could benefit from them.
And so the moreutils collection was born, to stop these programs from falling through the cracks.
Probably the most general purpose tool in moreutils so far is sponge(1),
which lets you do things like this:
% sed "s/root/toor/" /etc/passwd | grep -v joey | sponge /etc/passwd
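The point of that pipeline is that a plain "> /etc/passwd" would truncate the file before sed had read it. As a rough illustration only (the real sponge is a compiled program, and soak_to is a made-up name), the soak-then-write behaviour can be sketched in shell like this:

```shell
# Hypothetical stand-in for sponge: buffer all of stdin, then write the target.
soak_to() {
    target=$1
    buf=$(mktemp) || return 1
    status=0
    # Read stdin to EOF first; the target is not opened until this finishes.
    cat > "$buf" || status=$?
    # Only now overwrite the target (skipped if buffering failed).
    if [ "$status" -eq 0 ]; then
        cat "$buf" > "$target" || status=$?
    fi
    rm -f "$buf"
    return "$status"
}
```

Used as "sed s/a/b/ file | soak_to file", it leaves the file intact until the whole pipeline has finished reading it.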
There are lots more listed in the README, and I'm always interested to add more to the collection, as long as they're suitably general-purpose, and don't duplicate other well-known tools.
Should moreutils have a mailing list? Mail me if you'd like to be on such a list. For now, you can subscribe to the news RSS feed below.
Download
A Debian package as well as the source tarball for moreutils can be downloaded from packages.debian.org, or using apt.
The git repository can be cloned from git://git.kitenet.net/moreutils
News
moreutils 0.31 released with these changes
- pee.1: Document difference with tee in stdout.
- ts: Support displaying fractional seconds via a "%.S" conversion specification. Closes: #482789
moreutils 0.30 released with these changes
- debhelper v7; rules file minimisation
- Use DESTDIR instead of PREFIX.
- Add a DOCBOOK2XMAN setting. (Greg KH)
- ifne: Add -n which makes it run the command if stdin is empty.
- ifne: If no command is specified, print usage information.
moreutils 0.29 released with these changes
- Add ifne, contributed by Javier Merino.
- sponge, ifne: Ensure that suspending/resuming doesn't result in partial writes of the data, by using fwrite() rather than write().
- sponge: Handle large data sizes by using a temp file rather than by consuming arbitrary amounts of memory. Patch by Brock Noland.
- ts: Allow both -r and a format to be specified, to parse dates and output in a specified format.
- ts: Fix bug in timezone regexp.
TODO
pee should support non-blocking i/o to write to the pipes to allow concurrent processing of the data by the programs. Alternatively, switch to fountain http://hea-www.cfa.harvard.edu/~dj/tmp/fountain-1.0.2.tar.gz.
Alternatively, make sponge buffer to stdout if no file is given, and use it to buffer the data from pee. This will be less efficient, though, and will not work as well for very large streams unless sponge avoids buffering the whole contents in memory in this case.
Tools under consideration
Here are some that are under consideration but have not yet been included. Feel free to suggest others. I also welcome feedback on which of these to include.
dirempty/exists
It's too hard to tell if a directory is empty in shell. Also, while test -e works ok for a single file, it fails if you want to see if a wildcard matches anything.
Suggested in bug #385069, see bug for my comments.
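To show what such tools would replace, here is a sketch of the usual shell workarounds. The function names dir_is_empty and any_exists are made up for illustration and are not part of moreutils or the bug report.

```shell
# Hypothetical helper: succeed if directory $1 has no entries besides . and ..
# Returns 2 if $1 is not a directory.
dir_is_empty() {
    [ -d "$1" ] || return 2
    [ -z "$(ls -A -- "$1")" ]
}

# Hypothetical helper: succeed if at least one argument names an existing file.
# Call it with an unquoted pattern so the shell expands it: any_exists *.txt
# If nothing matches, the shell passes the pattern through literally and the
# existence test on that literal name fails.
any_exists() {
    for f in "$@"; do
        if [ -e "$f" ] || [ -L "$f" ]; then
            return 0
        fi
    done
    return 1
}
```

Both are short, but neither is obvious, which is rather the point of the suggestion.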
cattail
Allows catting a file that's still changing (i.e., being downloaded) to a program. The new bits of the file will continue to be fed to the program until the download is done.
Submitted by Justin Azoff, with code. However, it has to use heuristics to guess when the download (or whatever) is done. The current heuristic, 10 seconds w/o growth, wouldn't work very well for me on dialup.
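The heuristic can be sketched in shell as below. This is not Justin Azoff's code; cattail_sketch and its idle-limit argument are assumptions made for illustration, and head -c is a common extension rather than strict POSIX.

```shell
# Hypothetical sketch: cattail_sketch FILE [IDLE_SECONDS]
# Copies FILE to stdout, then keeps copying newly appended bytes until the
# file has not grown for IDLE_SECONDS (default 10) consecutive seconds.
cattail_sketch() {
    file=$1
    idle_limit=${2:-10}
    offset=0
    idle=0
    while [ "$idle" -lt "$idle_limit" ]; do
        size=$(( $(wc -c < "$file") ))
        if [ "$size" -gt "$offset" ]; then
            # Emit only the bytes between the old offset and the new size.
            tail -c +"$((offset + 1))" "$file" | head -c "$((size - offset))"
            offset=$size
            idle=0
        else
            sleep 1
            idle=$((idle + 1))
        fi
    done
}
```

Raising the second argument is the only knob here, which shows the weakness: on a slow link any fixed idle limit is a guess.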
phoenix
Respawns a process unless a user really wants to quit. Suggested in bug #382428
Doesn't seem general enough.
haschanged
Run it once to store a file's hash, and the second time it'll check whether the file has changed. http://blog.steve.org.uk/index.php/archives/2006/08/26/the-traffic-is-waiting-outside/
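A minimal sketch of the idea, assuming the caller names the file where the hash is kept. The name haschanged_sketch, the explicit state file argument, and treating the first run as "changed" are all assumptions, not taken from the linked post.

```shell
# Hypothetical sketch: haschanged_sketch FILE STATE
# Exit 0 if FILE's checksum differs from the one saved in STATE (or none is
# saved yet), 1 if it is unchanged, 2 if FILE cannot be read.
haschanged_sketch() {
    new=$(cksum < "$1") || return 2
    old=
    if [ -f "$2" ]; then
        old=$(cat "$2")
    fi
    # Remember the current checksum for the next run.
    printf '%s\n' "$new" > "$2"
    [ "$new" != "$old" ]
}
```

cksum is used only because it is available everywhere; a stronger hash would slot in the same way.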
tmp
puts stdin into a temp file and passes it to the specified program
ex: zcat file.bmp.gz | tmp zxgv
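As a sketch of what this would do (tmp_sketch is a made-up name, and passing the temp file as the last argument is an assumption):

```shell
# Hypothetical sketch: tmp_sketch PROGRAM [ARGS...]
# Saves stdin to a temp file, runs PROGRAM ARGS... TEMPFILE, then cleans up
# and returns the program's exit status.
tmp_sketch() {
    t=$(mktemp) || return 1
    cat > "$t"                 # soak up stdin
    status=0
    "$@" "$t" || status=$?     # temp file is passed as the last argument
    rm -f "$t"
    return "$status"
}
```

With this, the example above becomes "zcat file.bmp.gz | tmp_sketch zxgv".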
connect
connect 'cmd1' op 'cmd2' ... -- connects fd's of commands together, etc
- In the same spirit as 'pee', but much more powerful.
- If done very simply, this is handy for writing coprocesses as pipelines that need to communicate back and forth.
- You can do SOME of this with a great shell, like bash or zsh; you can do almost all the rest with a bunch of mkfifo commands plus simple redirection, but with added complexity and a lot of manual steps.
- This command could be even more powerful if you gave it essentially a "netlist" of fd's to connect. I'm sure the command line syntax could be improved, but you get the idea. Very very complex example just to illustrate:
connect 'cmd1' '<> #0:3>4' 'cmd2' '3>' \
'cmd3' '3<>3 #0:0>' 'cmd4' '3>#1:5'
- specs specify connections between adjacent cmds
- qualified specs (w/ '#') allow more complex connections
- Some sane defaults, but can be overridden
- stdin goes to first process that doesn't redirect it
- stdout comes from everyone that doesn't redirect it
- stderr comes from everyone that doesn't redirect it
- cmd1's stdout -> cmd2's stdin
- cmd2's stdout -> cmd1's stdin
- fd3 -> cmd2's fd4
- cmd2's fd3 -> cmd3's stdin
- cmd3's fd3 -> cmd4's fd3
- cmd4's fd3 -> cmd3's fd3
- stdin -> cmd4
- cmd4's fd3 -> cmd1's fd5
- stdout <- all w/o redirected stdout (in this case, cmd3)
- stderr <- all w/o redirected stderr (in this case, all)
If you think this is a good idea, let me know. I have a basic connect command, but it only does two commands. However, I'll be happy to code this up if there is interest. (In fact, I think I may anyway, so I don't keep doing stuff like this ad-hoc all the time). -- from Wesley J. Landaker
Should be possible to roll mispipe up into this by adding a way to flag which command(s) exit status to return.
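For comparison, the simplest two-command case (each command's stdout feeding the other's stdin) can be done by hand with mkfifo, as mentioned above. The sketch below is only an illustration of those manual steps, not Wesley's connect command; connect2_sketch is a made-up name.

```shell
# Hypothetical sketch: connect2_sketch 'cmd1' 'cmd2'
# Wires cmd1's stdout to cmd2's stdin and cmd2's stdout to cmd1's stdin.
# stderr is left alone. Returns cmd2's exit status.
connect2_sketch() {
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/a2b" "$dir/b2a" || { rm -rf "$dir"; return 1; }
    # The two sides open the FIFOs in opposite order: cmd1 opens b2a then a2b,
    # cmd2 opens b2a then a2b from the other end, so neither side waits
    # forever for its peer to open a FIFO.
    sh -c "$1" < "$dir/b2a" > "$dir/a2b" &
    status=0
    sh -c "$2" > "$dir/b2a" < "$dir/a2b" || status=$?
    wait
    rm -rf "$dir"
    return "$status"
}
```

For example, connect2_sketch 'echo ping; read reply; echo "got $reply" >&2' 'read msg; echo "pong:$msg"' lets the two commands talk back and forth. Getting the open order right is exactly the kind of manual step a real connect tool would hide.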
Rejected tools
(Some of these rejections may be reconsidered later.)
add
adds up numbers from stdin
Already available in numutils. RFP bug filed.
todist
inputs a list of numbers and outputs their distribution: each value and how many times it occurs in the input. http://baruch.ev-en.org/files/todist
More suitable for numutils, which can probably do it already. RFP bug filed.
tostats
inputs a list of numbers and outputs some statistics about the numbers: average, stddev, min, max, mid point http://baruch.ev-en.org/files/tostats
More suitable for numutils, which can probably do it already. RFP bug filed.
unsort
Randomise the lines of a file. Perfect candidate, but bogosort and rl (from the randomize-lines package) already do it.
http://savannah.nongnu.org/projects/shuffle/ is a similar thing, which its author describes as "almost coreutil ready, but its memory bound, a big nono". (Apparently coreutils 6 has a shuf and sort --random-sort.)
mime
determines the mime type of a file using the gnome mime database
The File::MimeInfo perl module has a mimetype that works like this, and uses the freedesktop.org mime database, same as GNOME.
srename
Applies a sed pattern to a list of files to rename them. Rejected because perl has a rename program that works nearly identically.