Regex Matching on Readline()

John Machin sjmachin at lexicon.net
Thu Dec 20 15:13:28 EST 2007


On Dec 21, 6:50 am, jwwest <jww... at gmail.com> wrote:
> Anyone have any trouble pattern matching on lines returned by
> readline? Here's an example:
>
> string = "Accounting - General"
> pat = ".+\s-"
>
> Should match on "Accounting -". However, if I read that string in from
> a file it will not match. In fact, I can't get anything to match
> except ".*".
>
> I'm almost certain that it has something to do with the characters
> that python returns from readline(). If I have this in a file:
>
> Accounting - General
>
> And do a:
>
> line = f.readline()
> print line
>
> I get:
>
> A c c o u n t i n g  -  G e n e r a l
>
> Not sure why, I'm a nub at Python so any help is appreciated. They
> look like spaces to me, but aren't (I've tried matching on spaces too).
>
> - james

To find out what the pseudo-spaces are, do this:

    print repr(open("the_file", "rb").read()[:100])

and show us (copy/paste) what you get.

Also, tell us what platform you are running Python on, and how the
file was created (by what software, on what platform).
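
In the meantime, here is a rough sketch of the same check taken one step further. It is only a guess until we see the real bytes: it assumes the file turns out to start with a Unicode byte-order mark (for example, because some editor saved it as UTF-16), which is one common way to get "spaces" that aren't spaces. The helper name sniff_bom and the sample file name are made up for illustration, and the sample file is written by the script itself rather than taken from your data. It is written for Python 3, unlike the Python 2 one-liner above.

```python
import codecs
import os
import re
import tempfile


def sniff_bom(raw):
    """Return a codec name if raw starts with a known byte-order mark, else None."""
    # Hypothetical helper. The UTF-32 marks are tested before the UTF-16 ones
    # because the UTF-32 little-endian mark begins with the UTF-16 one.
    candidates = (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    )
    for bom, name in candidates:
        if raw.startswith(bom):
            return name
    return None


with tempfile.TemporaryDirectory() as tmpdir:
    # Assumption: build a UTF-16 sample file so the sketch is self-contained.
    path = os.path.join(tmpdir, "sample.txt")
    with open(path, "wb") as f:
        f.write("Accounting - General\r\n".encode("utf-16"))

    # Same idea as the one-liner above: look at the raw bytes, not the text.
    with open(path, "rb") as f:
        raw = f.read()[:100]
    print(repr(raw))

    encoding = sniff_bom(raw)
    print("guessed encoding:", encoding)

    # Decode only when a mark was found; otherwise fall back to Latin-1,
    # which maps every byte to a character and so never raises.
    text = raw.decode(encoding if encoding else "latin-1")

    match = re.match(r".+\s-", text)
    matched = match.group(0) if match else None
    print(repr(matched))
```

If the repr of your real file shows no byte-order mark, this guess is wrong and the raw bytes will point somewhere else, so please still post them.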



