~ajroach42.com

I'm Andrew. I write about the past and future of tech, music, media, culture, art, and activism. This is my blog.

Downloading 78s from the Internet Archive

Posted: June 19, 2021

I’m leaving this as a note for myself, so that I can refer to it in the future, but it might help other people too. I’m downloading a large quantitiy of rips of 78 RPM records from the internet archive’s Great 78 Project (for a project that I’ll discuss later) and I want to ensure that I only get the “The preferred versions suggested by an audio engineer at George Blood, L.P. [which] have been copied to have […] more friendly filenames.”

I’m using The Internet Archive’s python CLI. It doesn’t have an obvious method for doing this. It does, however, support searching files to download with a glob. This is how I’ve downloaded things like this in the past, but I’ve been doing too much bash and not enough Python recently, so I kept screwing up the syntax.

The syntax to download friendly filename mp3s from the George Blood LP collection at the internet archive using the ia python tool is:

./ia download --search="collection:georgeblood" --glob="[!_]*.mp3"

I kept trying [^_] which does not work because the ia client uses Python’s fnmatch function for a Unix filepattern match, as can be seen in internetarchive client’s code.

So there we go, a one liner for the Internet Archive ia python CLI glob to ignore filenames with underscores.

(And that should be enough keywords that I actually find this in four years when I need to remember it again. :-) )


If you enjoyed this post, please consider signing up for my newsletter. or following me on Mastodon.