~ajroach42.com

I'm Andrew. I write about the past and future of tech, music, media, culture, art, and activism. This is my blog.

Downloading 78s from the Internet Archive

Posted: June 19, 2021

I’m leaving this as a note for myself, so that I can refer to it in the future, but it might help other people too. I’m downloading a large quantity of rips of 78 RPM records from the Internet Archive’s Great 78 Project (for a project that I’ll discuss later), and I want to ensure that I only get what are described as “The preferred versions suggested by an audio engineer at George Blood, L.P. [which] have been copied to have […] more friendly filenames.”

I’m using the Internet Archive’s Python CLI, ia. It doesn’t have an obvious method for doing this. It does, however, support selecting which files to download with a glob. This is how I’ve downloaded things like this in the past, but I’ve been doing too much bash and not enough Python recently, so I kept screwing up the syntax.

The syntax to download the friendly-filename mp3s from the George Blood LP collection at the Internet Archive using the ia Python tool is:

./ia download --search="collection:georgeblood" --glob="[!_]*.mp3"

I kept trying [^_], which does not work: the ia client matches filenames with Python’s fnmatch function (Unix shell-style patterns), as can be seen in the internetarchive client’s code, and fnmatch negates a character set with [!_], not [^_].
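To see the difference for yourself, here is a minimal sketch using only Python's standard library. It assumes the ia client applies the glob to each bare filename the way fnmatch does, and the filenames are made up for illustration (they are not taken from a real item). I use fnmatchcase so the result is the same on every platform.

```python
from fnmatch import fnmatchcase

# Hypothetical filenames for illustration; not copied from a real item.
filenames = [
    "Some Song - Some Artist.mp3",
    "_78_some-song_some-artist_01.mp3",
    "Some Song - Some Artist.flac",
]


def select(pattern, names):
    """Return the names that match an fnmatch-style pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# [!_] means "any single character except an underscore".
friendly = select("[!_]*.mp3", filenames)

# In fnmatch, a leading ^ inside brackets is a literal character,
# so [^_] means "a caret or an underscore", not a negation.
caret = select("[^_]*.mp3", filenames)

print(friendly)
print(caret)
```

With these sample names, the [!_] pattern keeps only the friendly mp3, while the [^_] pattern selects the underscore-prefixed one instead, which is the opposite of what I wanted.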

So there we go: a one-liner for the Internet Archive ia Python CLI, with a glob to ignore filenames that start with an underscore.

(And that should be enough keywords that I actually find this in four years when I need to remember it again. :-) )



