Regex driving me crazy...

Patrick Maupin pmaupin at gmail.com
Wed Apr 7 21:03:47 EDT 2010


On Apr 7, 7:49 pm, Patrick Maupin <pmau... at gmail.com> wrote:
> On Apr 7, 4:40 pm, J <dreadpiratej... at gmail.com> wrote:
>
>
>
> > Can someone make me un-crazy?
>
> > I have a bit of code that right now, looks like this:
>
> > status = getoutput('smartctl -l selftest /dev/sda').splitlines()[6]
> > status = re.sub(' (?= )(?=([^"]*"[^"]*")*[^"]*$)', ":",status)
> > print status
>
> > Basically, it pulls the first actual line of data from the return you
> > get when you use smartctl to look at a hard disk's selftest log.
>
> > The raw data looks like this:
>
> > # 1  Short offline       Completed without error       00%       679         -
>
> > Unfortunately, all that whitespace is arbitrary single space
> > characters.  And I am interested in the string that appears in the
> > third column, which changes as the test runs and then completes.  So
> > in the example, "Completed without error"
>
> > The regex I have up there doesn't quite work, as it seems to be
> > subbing EVERY space (or at least in instances of more than one space)
> > to a ':' like this:
>
> > # 1: Short offline:::::: Completed without error:::::: 00%:::::: 679:::::::: -
>
> > Ultimately, what I'm trying to do is either replace any run of more than
> > one space with a delimiter, then split the result into a list and
> > get the third item.
>
> > OR, if there's a smarter, shorter, or better way of doing it, I'd love to know.
>
> > The end result should pull the whole string in the middle of that
> > output line, and then I can use that to compare to a list of possible
> > output strings to determine if the test is still running, has
> > completed successfully, or failed.
>
> > Unfortunately, my google-fu fails right now, and my Regex powers were
> > always rather weak anyway...
>
> > So any ideas on what the best way to proceed with this would be?
>
> You mean like this?
>
> >>> import re
> >>> re.split(' {2,}', '# 1  Short offline       Completed without error       00%')
> ['# 1', 'Short offline', 'Completed without error', '00%']
>
>
>
> Regards,
> Pat

BTW, although I find it annoying when people say "don't do that" when
"that" is a perfectly good thing to do, and although I also find it
annoying when people tell you what not to do without telling you what
*to* do, and although I find the regex solution to this problem to be
quite clean, the equivalent non-regex solution is not terrible, so I
will present it as well, for your viewing pleasure:

>>> [x for x in '# 1  Short offline       Completed without error       00%'.split('  ') if x.strip()]
['# 1', 'Short offline', ' Completed without error', ' 00%']
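
Note that the non-regex version leaves a stray leading space on some fields, because the runs of spaces here have odd lengths. A minimal sketch of both approaches with that cleaned up follows. It is written in Python 3 syntax (an assumption, since the code quoted above is Python 2), the helper names split_columns_regex and split_columns_plain are made up for illustration, and the sample line is simply the one quoted in the original question; real smartctl output may be laid out differently.

```python
import re

# Sample line copied from the original question; real output may differ.
line = '# 1  Short offline       Completed without error       00%       679         -'


def split_columns_regex(text):
    """Split on any run of two or more spaces."""
    return re.split(r' {2,}', text.strip())


def split_columns_plain(text):
    """Split on double spaces, then strip each piece and drop the empties."""
    return [field.strip() for field in text.split('  ') if field.strip()]


columns = split_columns_regex(line)
status = columns[2]  # third column: the self-test status text
print(status)
```

Both helpers should give the same list for this line, so which one to use is mostly a matter of taste. Either way, single spaces inside a field such as "Short offline" are left alone.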

Regards,
Pat


