
Hello,

Currently, os.path.splitext splits a path at the last dot and returns that final piece as the extension:

    >>> os.path.splitext('file.tar.gz')
    ('file.tar', '.gz')
In some cases, what we really want is the last two parts when splitting on dots, as in my example. What about adding an extra argument so that the extension can span more than one dot?

    >>> os.path.splitext('file.tar.gz', numext=2)
    ('file', '.tar.gz')
If numext is greater than the number of dots, it would simply split at the first dot:

    >>> os.path.splitext('file.tar', numext=2)
    ('file', '.tar')
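The proposed behaviour can be approximated today without changing the standard library. The following is only a sketch: splitext_n is a hypothetical helper name, and os.path.splitext itself accepts no numext argument. It repeatedly applies os.path.splitext and stops early when no extension is left.

```python
import os.path


def splitext_n(path, numext=1):
    """Hypothetical helper: peel off up to `numext` trailing extensions."""
    root, ext = path, ''
    for _ in range(numext):
        root, part = os.path.splitext(root)
        if not part:
            # No more dots to split on; keep what has been collected so far.
            break
        ext = part + ext
    return root, ext


print(splitext_n('file.tar.gz', numext=2))
print(splitext_n('file.tar', numext=2))
```

Because each step delegates to os.path.splitext, leading dots (as in '.bashrc') and dots in directory names are treated the same way the standard function treats them.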
What do you think?

Regards,
Tarek

--
Tarek Ziadé | http://ziade.org

Tarek Ziadé wrote: [..]

I'm not sure whether that would really solve anything.

The general problem with extensions is that they can span multiple "dotted" parts in a filename, but whether they do or not depends on the extensions. E.g. you can have 'file.tar', 'file.tar.gz', 'file.tgz', 'file.tar.gz.uu', 'file.tar.gz.asc', 'file.tar.gz.gpg', etc.

OTOH, it's possible to have files that use extra dotted parts to signal certain properties to the user, which don't really mean anything in terms of encoding, file format or compression, e.g. 'file.i686.linux.64bit.bin'.

Most systems I know that have to deal with file extensions come with a list of possible extensions and register a handler or property with each. They typically use the 'longest match wins' strategy and then use the matched extension as the file extension.

--
Marc-Andre Lemburg
eGenix.com
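The 'longest match wins' strategy described above can be sketched roughly as follows. The names KNOWN_EXTENSIONS and splitext_known are hypothetical, and the extension list is just the set of examples from this message, not a registry taken from any real system.

```python
import os.path

# Assumed registry of known extensions (illustrative only).
KNOWN_EXTENSIONS = (
    '.tar', '.gz', '.tgz', '.tar.gz',
    '.tar.gz.uu', '.tar.gz.asc', '.tar.gz.gpg',
)


def splitext_known(path, known=KNOWN_EXTENSIONS):
    """Hypothetical helper: split off the longest known extension."""
    name = os.path.basename(path)
    lowered = name.lower()
    # Try longer candidates first so '.tar.gz' wins over '.gz'.
    for ext in sorted(known, key=len, reverse=True):
        # Require a non-empty root: a file named '.tar' is not all extension.
        if lowered.endswith(ext) and len(name) > len(ext):
            cut = len(path) - len(ext)
            return path[:cut], path[cut:]
    # Unknown extension: fall back to the standard behaviour.
    return os.path.splitext(path)


print(splitext_known('file.tar.gz.gpg'))
print(splitext_known('file.i686.linux.64bit.bin'))
```

With this approach, 'file.i686.linux.64bit.bin' keeps its descriptive dotted parts in the root, because none of them is in the list and the fallback only strips '.bin'.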

On Tue, Apr 20, 2010 at 5:44 PM, M.-A. Lemburg <mal@egenix.com> wrote:
> [..]
> They typically use the 'longest match wins' strategy and then use the matched extension as the file extension.
Splitting at the first dot is also fine with me:

    >>> os.path.splitext('file.tar.gz', longest_match=True)
    ('file', '.tar.gz')
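For comparison, a split at the first dot of the file name might look like the sketch below. splitext_first_dot is a hypothetical name, longest_match is not an actual os.path.splitext parameter, and skipping leading dots for hidden files is an assumption made here.

```python
import os.path


def splitext_first_dot(path):
    """Hypothetical helper: everything after the first dot is the extension."""
    name = os.path.basename(path)
    # Skip leading dots so hidden files such as '.bashrc' keep their name.
    stripped = name.lstrip('.')
    dot = stripped.find('.')
    if dot == -1:
        return path, ''
    cut = len(path) - len(stripped) + dot
    return path[:cut], path[cut:]


print(splitext_first_dot('file.tar.gz'))
print(splitext_first_dot('file.i686.linux.64bit.bin'))
```

Without a list of known extensions, this variant also swallows descriptive parts: 'file.i686.linux.64bit.bin' splits into 'file' and '.i686.linux.64bit.bin'.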
Tarek
