Better Regex and exception handling for this small code
Ganesh Pal
ganesh1pal at gmail.com
Tue Jul 18 12:52:28 EDT 2017
Thanks Cameron Simpson for you suggestion and reply quite helpful :)
On Wed, Jul 12, 2017 at 5:06 AM, Cameron Simpson <cs at zip.com.au> wrote:
> On 11Jul2017 22:01, Ganesh Pal <ganesh1pal at gmail.com> wrote:
>
>> I am trying to open a file and check if there is a pattern has changed
>> after the task got completed?
>>
>> file data:
>> ........................................................
>>
>> #tail -f /file.txt
>> ..........................................
>> Note: CRC:algo = 2, split_crc = 1, unused = 0, initiator_crc = b6b20a65,
>> journal_crc = d2097b00
>> Note: Task completed successfully.
>> Note: CRC:algo = 2, split_crc = 1, unused = 0, initiator_crc = d976d35e,
>> journal_crc = a176af10
>>
>>
>> I have the below piece of code but would like to make this better more
>> pythonic , I found regex pattern and exception handling poor here , any
>> quick suggestion in your spare time is welcome.
>>
>>
>> #open the existing file if the flag is set and check if there is a match
>>
>> log_file='/file.txt'
>> flag_is_on=1
>>
>
> Use "True" instead of "1". A flag is a Boolean thing, and should use a
> Boolean value. This lets you literally speak "true" and 'false" rather than
> imoplicitly saying that "0 means false and nonzero means true".
>
> data = None
>>
>
> There is no need to initialise data here because you immediately overwrite
> it below.
>
> with open(log_file, 'r') as f:
>> data = f.readlines()
>>
>> if flag_is_on:
>>
>
> Oh yes. Just name this variable "flag". "_is_on" is kind of implicit.
>
> logdata = '\n'.join(data)
>>
>
> Do other parts of your programme deal with the file data as lines? If not,
> there is little point to reading the file and breaking it up into lines
> above, then joining them together against here. Just go:
>
> with open(log_file) as f:
> log_data = f.read()
>
> reg = "initiator_crc =(?P<ini_crc>[\s\S]*?), journal_crc"
>>
>
> Normally we write regular expressions as "raw" python strings, thus:
>
> reg = r'initiator_crc =(?P<ini_crc>[\s\S]*?), journal_crc'
>
> because backslashes etc are punctuation inside normal strings. Within a
> "raw" string started with r' nothing is special until the closing '
> character. This makes writing regular expressions more reliable.
>
> Also, why the character range "[\s\S]"? That says whitespace or
> nonwhitespace i.e. any character. If you want any character, just say ".".
>
> crc = re.findall(re.compile(reg), logdata)
>>
>
> It is better to compile a regexp just the once, getting a Regexp object,
> and then you just use the compiled object.
>
> if not crc:
>> raise Exception("Pattern not found in logfile")
>>
>
> ValueError would be a more appropriate exception here; plain old
> "Exception" is pretty vague.
>
> checksumbefore = crc[0].strip()
>> checksumafter = crc[1].strip()
>>
>
> Your regexp cannot start or end with whitespace. Those .strip calls are
> not doing anything for you.
>
> This reads like you expect there to be exactly 2 matches in the file. What
> if there are more or fewer?
>
> logging.info("checksumbefore :%s and checksumafter:%s"
>> % (checksumbefore, checksumafter))
>>
>> if checksumbefore == checksumafter:
>> raise Exception("checksum not macthing")
>>
>
> Don't you mean != here?
>
> I wouldn't be raising exceptions in this code. Personally I would make
> this a function that returns True or False. Exceptions are a poor way of
> returning "status" or other values. They're really for "things that should
> not have happened", hence their name.
>
> It looks like you're scanning a log file for multiple lines and wanting to
> know if successive ones change. Why not write a function like this
> (untested):
>
> RE_CRC_LINE = re.compile(r'initiator_crc =(?P<ini_crc>[\s\S]*?),
> journal_crc')
>
> def check_for_crc_changes(logfile):
> old_crc_text = ''
> with open(logfile) as f:
> for line in f:
> m = RE_CRC_LINE.match(line)
> if not m:
> # uninteresting line
> continue
> crc_text = m.group(0)
> if crc_text != old_crc_text:
> # found a change
> return True
> if old_crc_text == '':
> # if this is really an error, you might raise this exception
> # but maybe no such lines is just normal but boring
> raise ValueError("no CRC lines seen in logfile %r" % (logfile,))
> # found no changes
> return False
>
> See that there is very little sanity checking. In an exception supporting
> language like Python you can often write code as if it will always succeed
> by using things which will raise exceptions if things go wrong. Then
> _outside_ the function you can catch any exceptions that occur (such as
> being unable to open the log file).
>
> Cheers,
> Cameron Simpson <cs at zip.com.au>
>
More information about the Python-list
mailing list