[Tutor] Re: regexes, thanks

Derrick 'dman' Hudson dman@dman.ddts.net
Wed Nov 20 23:42:02 2002



On Tue, Nov 19, 2002 at 05:41:18PM +0100, lumbricus@gmx.net wrote:
| Hello!
|
| > I was actually wanting to take a list of filenames, for example:
| >
| > Pink Floyd - 02 - Dogs.mp3
| > Japanese_Noh_Music_--_Gaku.mp3
| > asu - searching_2001_edit.MP3
| >
| > ...and pass it through a sequence of regexes (of which this is just one)
| > to produce consistent filenames along the lines of:
| >
| > Pink_Floyd-02-Dogs.mp3
| > Japanese_Noh_Music--Gaku.mp3
| > Asu--Searching_2001_Edit.mp3
|
| Dashes in file names are pure evil - avoid them.

Why do you say that?  I see how a dash (or two) at the beginning of a
filename can cause trouble, but not how dashes in the middle have any
effect.

(for those who need a workaround for a file starting with dashes,
programmatically the os.rename() function can be used to rename the
file; alternatively many commands use the argument "--" to indicate
that the rest of the arguments are not options but are literal data
(e.g. a filename for 'rm' or 'mv'))
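A rough sketch of the os.rename() route, written for Python 3. The file name '--oops.mp3' and the helpers rename_in() and demo() are hypothetical, made up for this illustration. Because os.rename() receives the names as plain path strings rather than as command-line arguments, a leading dash means nothing special to it:

```python
import os
import tempfile


def rename_in(directory, old_name, new_name):
    # os.rename gets plain path strings; no option parsing happens here,
    # so a name beginning with '-' is just a name.
    os.rename(os.path.join(directory, old_name),
              os.path.join(directory, new_name))


def demo():
    # Hypothetical file whose name would look like an option to rm/mv.
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, '--oops.mp3'), 'w').close()
        rename_in(tmp, '--oops.mp3', 'oops.mp3')
        # List the directory before the temporary directory is cleaned up.
        return sorted(os.listdir(tmp))


print(demo())  # ['oops.mp3']
```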

| > I first read in the filenames, then produce a list of tuples [filename,
| > new_filename], which are used for renaming at the end.  So, I need to
| > perform operations on certain parts of the filename, not just the whole
| > thing.  It seems that there would be a simple way to perform this stage.
| >   Like,
| >
| > new_filename = words.sub('([a-zA-Z]+)', '\u\1', filename)

Try this to see what the problem is:
    print('\u\1')
then try
    print(r'\u\1')
and
    print('\\u\\1')
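To spell out what those lines reveal: in an ordinary string literal, '\1' is an octal escape that becomes the single control character chr(1), so re.sub() never sees a backslash followed by 1. That is also a likely reason '\g<1>' seemed to work where '\1' did not, since '\g' is not a string escape and its backslash survives. (In Python 3 a plain '\u\1' is rejected outright as a truncated \uXXXX escape.) A small Python 3 sketch using one of the sample filenames:

```python
import re

plain = '\1'     # octal escape: a single control character, chr(1)
raw = r'\1'      # raw string: a backslash followed by '1'
escaped = '\\1'  # the same two characters as the raw string

print(repr(plain), len(plain))  # '\x01' 1
print(repr(raw), len(raw))      # '\\1' 2
print(raw == escaped)           # True

# With the backslash preserved, \1 works as a backreference in re.sub,
# just like the more explicit \g<1> form.
name = "asu - searching_2001_edit.MP3"
print(re.sub(r'([a-z]+)', r'<\1>', name))
print(re.sub(r'([a-z]+)', r'<\g<1>>', name))
```

Both substitutions produce the same result, so writing the replacement as a raw string is usually all that is needed for \1, \2 and so on.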

| > But I couldn't get \1, \2 substitution working (had to use '\g<1>') and
|
| group()
|
| > it doesn't look like python supports \u and \U ('man perlre' for info),
|
| capitalize()?

You can put these together like this (untested):

    import re

    the_re = re.compile("(...)")   # the ellipsis means put the real pattern here
    possible_match = the_re.search(some_text_string)
    if possible_match is not None:
        first_part_of_new_name = possible_match.group(1).capitalize()
    else:
        print("no match")   # whatever you want to do if there's no match

The idea here is that when a regex matches a string, the result is a
"match" object which encapsulates the various aspects of the match.
One of those aspects is the groups that the pattern captured.  You can
access each group individually and then do whatever string processing
you want on it.  Depending on what you want to do with the strings, it
may be easier to write out the operations like this than to try to do
it all in a single substitution.  Choose whichever method works and is
best suited to the situation.
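A sketch of that "write out the operations" approach, applied to one of the sample names from the original question and written for Python 3. It assumes, hypothetically, that names look like "artist - title.ext"; the pattern NAME_RE and the function normalize() are made up for this illustration and will not, for example, turn "Pink Floyd - 02 - Dogs.mp3" into the form the poster wanted:

```python
import re

# Hypothetical pattern: artist, a dash with optional spaces around it,
# then the title, a dot, and the extension.
NAME_RE = re.compile(r'^(.+?)\s*-\s*(.+)\.(\w+)$')


def normalize(filename):
    """Return a cleaned-up name, or None if the pattern does not match."""
    match = NAME_RE.search(filename)
    if match is None:
        return None
    artist, title, ext = match.group(1, 2, 3)
    # Capitalize each underscore-separated piece of the title on its own.
    title = '_'.join(piece.capitalize() for piece in title.split('_'))
    return '%s--%s.%s' % (artist.capitalize(), title, ext.lower())


names = ["asu - searching_2001_edit.MP3", "no_dash_here.mp3"]
pairs = [(name, normalize(name)) for name in names]
print(pairs)
```

The (old, new) pairs could then be handed to os.rename() at the end, as in the original plan; entries where normalize() returned None would need separate handling.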


I don't know if Python's regex engine supports \u.  You can check in
the library reference.
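For reference, Python's re module does not implement Perl's \u and \U case-changing escapes in replacement strings (recent Python 3 versions raise an error for an unknown letter escape such as \u in a replacement template). What re.sub() does accept is a function in place of the replacement string: it is called with each match object and its return value is substituted for the matched text. A minimal Python 3 sketch, where LETTERS, upcase_first() and capitalize_words() are hypothetical names:

```python
import re

LETTERS = re.compile(r'[A-Za-z]+')


def upcase_first(match):
    # Mimics Perl's \u: uppercase only the first character, leave the rest alone.
    word = match.group(0)
    return word[0].upper() + word[1:]


def capitalize_words(text):
    # The function is called once per match; its result replaces that match.
    return LETTERS.sub(upcase_first, text)


print(capitalize_words("asu - searching_2001_edit.MP3"))  # Asu - Searching_2001_Edit.MP3

# A \U-style replacement is just .upper() on the whole match.
print(LETTERS.sub(lambda m: m.group(0).upper(), "asu - searching"))  # ASU - SEARCHING
```

Note that this also touches letters in the extension ("gaku.mp3" becomes "Gaku.Mp3"), so in practice you might apply it only to the part of the name before the dot.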

HTH,
-D

--
Whoever loves discipline loves knowledge,
but he who hates correction is stupid.
        Proverbs 12:1
http://dman.ddts.net/~dman/
