[Tutor] RE module is working ?
Karim
karim.liateni at free.fr
Fri Feb 4 20:07:24 CET 2011
On 02/04/2011 02:36 AM, Steven D'Aprano wrote:
> Karim wrote:
>
>>>> *Indeed what's the matter with RE module!?*
>>> You should really fix the problem with your email program first;
>> It's a Thunderbird issue with bold type (it appears as stars), but I
>> don't know how to fix it yet.
>
> A man went to a doctor and said, "Doctor, every time I do this, it
> hurts. What should I do?"
>
> The doctor replied, "Then stop doing that!"
>
> :)
Yes, these words made me laugh. I will keep them in my funny box.
>
>
> Don't add bold or any other formatting to things which should be
> program code. Even if it looks okay in *your* program, you don't know
> how it will look in other people's programs. If you need to draw
> attention to something in a line of code, add a comment, or talk about
> it in the surrounding text.
>
>
> [...]
>> That is not the thing I want. I want to escape any " that is not
>> already escaped.
>> The sed regex '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have
>> been writing regexes on Unix for 15 years).
Mainly sed, awk and Perl, sometimes grep and egrep. I know it's a
jungle.
> Which regex? Perl regexes? sed or awk regexes? Extended regexes? GNU
> POSIX-compliant regexes? grep or egrep regexes? They're all different.
>
> In any case, I am sorry, I don't think your regex does what you say.
> When I try it, it doesn't work for me.
>
> [steve at sylar ~]$ echo 'Some \"text"' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
> Some \\"text\"
I give you my word on this. Here is the exact output; I reran it:
#MY OS VERSION
karim at Requiem4Dream:~$ uname -a
Linux Requiem4Dream 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 23:42:43 UTC 2011 x86_64 GNU/Linux
#MY SED VERSION
karim at Requiem4Dream:~$ sed --version
GNU sed version 4.2.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.
GNU sed home page: <http://www.gnu.org/software/sed/>.
General help using GNU software: <http://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-gnu-utils at gnu.org>.
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.
#MY SED OUTPUT COMMAND:
karim at Requiem4Dream:~$ echo 'Some ""' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \"\"
# THIS IS WHAT I WANT: 2 CONSECUTIVE QUOTES. IF THE FIRST ONE IS ALREADY
# ESCAPED, I DON'T WANT TO ESCAPE IT TWICE.
karim at Requiem4Dream:~$ echo 'Some \""' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \"\"
# BY THE WAY THIS ONE WORKS:
karim at Requiem4Dream:~$ echo 'Some "text"' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \"text\"
# BUT OF COURSE NOT THIS ONE, WHICH MY REGEX DOES NOT COVER (I KNOW, AND
# I ORIGINALLY WANTED IT TO COVER THIS CASE):
karim at Requiem4Dream:~$ echo 'Some \"text"' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \\"text\"
By the way, in every sed version I work with, '?' (zero or one match)
must be escaped; that's why I wrote '\?'. The same goes for '\(' and
'\)', which save the matched value for later. In Perl or egrep you don't
need to escape them.
# SAMPLE FROM http://www.gnu.org/software/sed/manual/sed.html
\+
    As *, but matches one or more. It is a GNU extension.
\?
    As *, but only matches zero or one. It is a GNU extension.
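For comparison, Python's re module follows the Perl/extended style: '(', ')' and '?' are metacharacters without any backslash, and putting a backslash in front of them makes them literal. A small sketch showing how the sed basic-regex pattern is spelled in Python (the sample strings are just illustrations):

```python
import re

# sed (basic regex):   \([^\\]\)\?"
# Python (re module):  ([^\\])?"    (no backslashes before the parentheses or ?)
pattern = re.compile(r'([^\\])?"')

m = pattern.search('ab"')
print(m.group(0), m.group(1))  # whole match is 'b"', group 1 is 'b'

# In Python, a backslash before ? turns it into a literal question mark.
print(bool(re.fullmatch(r'a\?', 'a?')))  # True: matches the two characters 'a?'
print(bool(re.fullmatch(r'a\?', 'a')))   # False
```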
> I wouldn't expect it to work. See below.
>
> By the way, you don't need to escape the brackets or the question mark:
>
> [steve at sylar ~]$ echo 'Some \"text"' | sed -re 's/([^\\])?"/\1\\"/g'
> Some \\"text\"
>
>
>> For me the equivalent Python regex is buggy: r'([^\\])?"', r'\1\\"'
>
> No it is not.
>
Yes, I know. As my latest post explains in detail, I had already found
the solution. Here it is again:
# Found the solution: '?' needs to be inside the parentheses (the saved
# group), because outside them we don't know whether the saved match,
# namely '\1', will exist or not.
>>> re.subn(r'([^\\]?)"', r'\1\\"', expression)
(' \\"\\" ', 2)
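A quick check of this fix on two sample strings (the helper name escape_quotes_re is hypothetical). Moving '?' inside the group means group 1 always participates, possibly as an empty string, so the "unmatched group" error cannot occur. It does not stop an already-escaped quote from being escaped again, though, because the group can still match zero characters right after the backslash:

```python
import re

def escape_quotes_re(text):
    # Hypothetical helper. '?' is inside the group, so group 1 always
    # matches, possibly as the empty string.
    return re.sub(r'([^\\]?)"', r'\1\\"', text)

print(escape_quotes_re('Some "text"'))    # Some \"text\"
print(escape_quotes_re('Some \\"text"'))  # Some \\"text\"  (escaped quote gets a second backslash)
```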
> The pattern you are matching does not do what you think it does. "Zero
> or one of not-backslash, followed by a quote" will match a single
> quote *regardless* of what is before it. This is true even in sed: as
> you can see above, your sed regex matches both quotes.
>
> \" will match, because the regular expression can match zero
> characters followed by a quote. So the regex is behaving correctly.
>
> >>> match = r'[^\\]?"' # zero or one not-backslash followed by quote
> >>> re.search(match, r'aaa\"aaa').group()
> '"'
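To see exactly what the optional group captures, here is a short sketch that lists every match with its position and group 1. Python reports None for a group that did not take part in the match:

```python
import re

pattern = re.compile(r'([^\\])?"')
text = r'aaa\"aaa'  # the characters: aaa, backslash, quote, aaa

for m in pattern.finditer(text):
    # span() is (start, end) of the whole match; group(1) is None if unused
    print(m.span(), repr(m.group(0)), m.group(1))
# prints: (4, 5) '"' None
```

The only match is the quote itself at index 4, with group 1 unused: the backslash at index 3 cannot be consumed by [^\\], so the engine falls back to matching zero characters before the quote.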
>
> Now watch what happens when you call re.sub:
>
>
> >>> match = r'([^\\])?"' # group 1 equals a single non-backslash
> >>> replace = r'\1\\"' # group 1 followed by \ followed by "
> >>> re.sub(match, replace, 'aaaa') # no matches
> 'aaaa'
> >>> re.sub(match, replace, 'aa"aa') # one match
> 'aa\\"aa'
> >>> re.sub(match, replace, '"aaaa') # one match, but there's no group 1
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python3.1/re.py", line 166, in sub
>     return _compile(pattern, flags).sub(repl, string, count)
>   File "/usr/local/lib/python3.1/re.py", line 303, in filter
>     return sre_parse.expand_template(template, match)
>   File "/usr/local/lib/python3.1/sre_parse.py", line 807, in expand_template
>     raise error("unmatched group")
> sre_constants.error: unmatched group
>
> Because group 1 was never matched, Python's re.sub raised an error. It
> is not a very informative error, but it is valid behaviour.
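This detail has since changed. According to the re.sub documentation, starting with Python 3.5 unmatched groups in the replacement template are replaced with an empty string. On a current interpreter the same call therefore returns a result instead of raising, much like the GNU sed behaviour shown below. A sketch, assuming Python 3.5 or later:

```python
import re

# Assumes Python 3.5+; Python 3.1 (as in the traceback above) raises
# "unmatched group" on this call instead.
match = r'([^\\])?"'
replace = r'\1\\"'
result = re.sub(match, replace, '"aaaa')  # group 1 does not participate
print(result)  # \"aaaa  (the missing group 1 became '')
```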
>
> If I try the same thing in sed, I get something different:
>
> [steve at sylar ~]$ echo '"Some text' | sed -re 's/([^\\])?"/\1\\"/g'
> \"Some text
>
> It looks like this version of sed defines backreferences on the
> right-hand side to be the empty string, in the case that they don't
> match at all. But this is not standard behaviour. The sed FAQs say
> that this behaviour will depend on the version of sed you are using:
>
> "Seds differ in how they treat invalid backreferences where no
> corresponding group occurs."
>
> http://sed.sourceforge.net/sedfaq3.html
>
> So you can't rely on this feature. If it works for you, great, but it
> may not work for other people.
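If you want to keep an optional group without relying on how a particular engine treats unmatched backreferences, one option is to pass re.sub a function instead of a template string and substitute '' for a missing group yourself. A minimal sketch; both helper names are hypothetical. It only removes the exception: the pattern still escapes quotes that are already escaped.

```python
import re

def _add_backslash(m):
    # Hypothetical helper: m.group(1) is None when the optional group did not match
    prefix = m.group(1) or ''
    return prefix + '\\"'

def escape_with_optional_group(text):
    # Hypothetical helper: re.sub calls _add_backslash once per match
    return re.sub(r'([^\\])?"', _add_backslash, text)

print(escape_with_optional_group('"aaaa'))  # \"aaaa, with no unmatched-group error
```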
>
>
> When you delete the ? from the Python regex, group 1 is always valid,
> and you don't get an exception. Or if you ensure the input always
> matches group 1, no exception:
>
> >>> match = r'([^\\])?"'
> >>> replace = r'\1\\"'
> >>> re.sub(match, replace, 'a"a"a"a') # group 1 always matches
> 'a\\"a\\"a\\"a'
>
> (It still won't do what you want, but that's a *different* problem.)
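For the record, Python's re can express "a quote not preceded by a backslash" directly with a negative lookbehind, (?<!\\), which checks the previous character without consuming it, so there is no group that can go missing. This sketch assumes that "already escaped" simply means "immediately preceded by a backslash", so an escaped backslash followed by a quote (\\") also counts as escaped. The names UNESCAPED_QUOTE and escape_lookbehind are hypothetical. It gives the desired result on all four inputs from the sed session above:

```python
import re

# (?<!\\)" : a double quote whose previous character is not a backslash
UNESCAPED_QUOTE = re.compile(r'(?<!\\)"')

def escape_lookbehind(text):
    # The template r'\\"' inserts one literal backslash before the quote
    return UNESCAPED_QUOTE.sub(r'\\"', text)

print(escape_lookbehind('Some ""'))        # Some \"\"
print(escape_lookbehind('Some \\""'))      # Some \"\"
print(escape_lookbehind('Some "text"'))    # Some \"text\"
print(escape_lookbehind('Some \\"text"'))  # Some \"text\"
```

Because the lookbehind tests the original string rather than the partially rewritten output, two adjacent plain quotes are both escaped.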
>
>
>
> Jamie Zawinski wrote:
>
> Some people, when confronted with a problem, think "I know,
> I'll use regular expressions." Now they have two problems.
>
> How many hours have you spent trying to solve this problem using
> regexes? This is a *tiny* problem that requires an easy solution, not
> wrestling with a programming language that looks like line-noise.
>
> This should do what you ask for:
>
> def escape(text):
>     """Escape any double-quote characters if and only if they
>     aren't already escaped."""
>     output = []
>     escaped = False  # True when the previous character was a backslash
>     for c in text:
>         if c == '"' and not escaped:
>             output.append('\\')  # unescaped quote: insert a backslash first
>         elif c == '\\':
>             output.append('\\')  # copy the backslash itself
>             escaped = True       # the next character counts as escaped
>             continue
>         output.append(c)
>         escaped = False
>     return ''.join(output)
>
Thank you for this one! This gives me some inspiration for other more
complicated parsing. :-)
>
> Armed with this helper function, which took me two minutes to write, I
> can do this:
>
> >>> text = 'Some text with backslash-quotes \\" and plain quotes " together.'
> >>> print(escape(text))
> Some text with backslash-quotes \" and plain quotes \" together.
>
>
> Most problems that people turn to regexes for are better solved without
> regexes. Even Larry Wall, inventor of Perl, is dissatisfied with regex
> culture and syntax:
>
> http://dev.perl.org/perl6/doc/design/apo/A05.html
OK, but if I had to give up my one-line sed regexes, my most-used
utilities, it would be like refusing to drive to work and walking 20 km
instead.
I can understand the point about overuse, though I have already written
a 30-line pure sed script using all its features, which would have taken
many more lines in awk or Perl.
Anyway, I am leaning toward Python now, so since the re module handles
my small regexes, getting familiar with it is no big deal.
Thanks for all the effort you've put in.
Regards
Karim