Making safe file names
Roy Smith
roy at panix.com
Tue May 7 20:22:17 EDT 2013
In article <mailman.1428.1367972114.3114.python-list at python.org>,
Dave Angel <davea at davea.name> wrote:
> On 05/07/2013 03:58 PM, Andrew Berg wrote:
> > Currently, I keep Last.fm artist data caches to avoid unnecessary API calls
> > and have been naming the files using the artist name. However,
> > artist names can have characters that are not allowed in file names for
> > most file systems (e.g., C/A/T has forward slashes). Are there any
> > recommended strategies for naming such files while avoiding conflicts (I
> > wouldn't want to run into problems for an artist named C-A-T or
> > CAT, for example)? I'd like to make the files easily identifiable, and
> > there really are no limits on what characters can be in an artist name.
> >
>
> So what you need first is a list of allowable characters for all your
> target OS versions. And don't forget that the allowable characters may
> vary depending on the particular file system(s) mounted on a given OS.
>
> You also need to decide how to handle Unicode characters, since they're
> handled differently on different OSes. In Windows on NTFS, filenames are in
> Unicode, while on Unix, filenames are bytes. So on one of those, you
> will be encoding/decoding if your code is to be mostly portable.
>
> Don't forget that ls and rm may not use the same encoding you're using.
> So you may not consider it adequate just to make the names legal; you
> may also want them to be easily typeable in the shell.
One possible tool that may help you here is Unidecode
(https://pypi.python.org/pypi/Unidecode). It doesn't solve your whole
problem, but it does help get Unicode text into a form which is both
7-bit clean and human readable.
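
To make that a bit more concrete, here is a minimal sketch of how the
pieces might fit together. It is stdlib-only, so it uses
unicodedata.normalize() as a rough stand-in for Unidecode (it only strips
accents and drops anything else non-ASCII, which is cruder than what
Unidecode does). The function name safe_filename, the character whitelist,
the "artist" fallback and the 8-character SHA-1 suffix are all my own
assumptions, not anything from Last.fm or Unidecode. It also does nothing
about file system specifics such as case-insensitive matching.

```python
import hashlib
import re
import unicodedata


def safe_filename(name, max_len=64):
    """Return an ASCII-only, readable file name for `name` (hypothetical helper)."""
    # Decompose accented characters, then drop whatever is not plain ASCII.
    # This is a rough stdlib approximation, not Unidecode itself.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of characters outside a conservative whitelist
    # into a single underscore, and trim leading/trailing dots and underscores.
    readable = re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_name).strip("._")
    # A short digest of the original name keeps C/A/T, C-A-T and CAT apart
    # even if their readable parts end up identical.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]
    return "{}-{}".format(readable[:max_len] or "artist", digest)


if __name__ == "__main__":
    for artist in ("C/A/T", "C-A-T", "CAT"):
        print(artist, "->", safe_filename(artist))
```

The readable prefix is there for humans poking around with ls; the digest
suffix is what actually guards against two artists mapping to the same
file. The mapping is one-way, so if you need the original artist name
back you'd have to store it inside the cache file.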