[Tutor] Logical error?
Bob Williams
linux at barrowhillfarm.org.uk
Sat May 3 23:14:16 CEST 2014
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Steven,
On 03/05/14 02:53, Steven D'Aprano wrote:
> Hi Bob, and welcome!
>
> My responses interleaved with yours, below.
>
> On Fri, May 02, 2014 at 11:19:26PM +0100, Bob Williams wrote:
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>>
>> Hi,
>>
>> I'm fairly new to coding and python. My system is linux
>> (openSUSE 13.1).
>
> Nice to know. And I see you have even more infomation about your
> system in your email signature, including your email client and
> uptime. But what you don't tell us is what version of Python you're
> using. I'm going to guess that it is something in the 3.x range,
> since you call print as a function rather than a statement, but
> can't be sure.
>
I have both 2 and 3 installed here, but I call this script with
python2. I guess I should concentrate on Python 3 as that's the way
things are going.
> Fortunately in this case I don't think the exact version matters.
>
>
> [...]
>> fullPath = [] # declare (initially empty) lists truncPath = []
>>
>> with codecs.open('/var/log/rsyncd.log', 'r') as rsyncd_log: for
>> line in rsyncd_log.readlines(): fullPath += [line.decode('utf-8',
>> 'ignore').strip()]
>
> A small note about performance here. If your log files are very
> large (say, hundreds of thousands or millions of lines) you will
> find that this part is *horribly horrible slow*. There's two
> problems, a minor and a major one.
>
The log file is typically a few thousand lines, so my code runs fast
enough, but I understand your point.
> First, rsyncd_log.readlines will read the entire file in one go.
> Since you end up essentially copying the whole file, you end up
> with two large lists of lines. There are ways to solve that, and
> process the lines lazily, one line at a time without needing to
> store the whole file. But that's not the big problem.
>
> The big problem is this:
>
> fullPath += [line.decode('utf-8', 'ignore').strip()]
>
> which is an O(N**2) algorithm. Do you know that terminology? Very
> briefly: O(1) means approximately constant time: tripling the size
> of the input makes no difference to the processing time. O(N) means
> linear time: tripling the input triples the processing time.
> O(N**2) means quadratic time: tripling the input increases the
> processing time not by a factor of three, but a factor of three
> squared, or nine.
>
> With small files, and fast computers, you won't notice. But with
> huge files and a slow computer, that could be painful.
>
> Instead, a better approach is:
>
> fullPath.append(line.decode('utf-8', 'ignore').strip())
>
> which avoids the O(N**2) performance trap.
>
Understood.
>
[snip]
>
> Now at last we get to your immediate problem: the above is intended
> to iterate over the lines of fullPath. But it starts at the
> beginning of the file, which may not be today. The first time you
> hit a line which is not today, the program exits, before it gets a
> chance to advance to the more recent days. That probably means that
> it looks at the first line in the log, determines that it is not
> today, and exits.
>
I'd missed the (now obvious) point that my if condition contained two
terms which both have to be true.
> I'm going to suggest a more streamlined algorithm. Most of it is
> actual Python code, assuming you're using Python 3. Only the
> "process this line" part needs to be re-written.
>
> new_activity = False # Nothing has happened today. with
> open('/var/log/rsyncd.log', 'r', encoding='utf-8', errors='ignore')
> as rsyncd_log: for line in rsyncd_log: line = line.strip() if
> line[0:10] == today and 'recv' in line: new_activity = True process
> this line # <== fix this
>
> if not new_activity: print("no new albums have been added today")
>
Thanks. This works nicely.
>
>
> This has the benefit that every line is touched only once, not
> three times as in your version. Performance is linear, not
> quadratic. You should be able to adapt this to your needs.
>
> Good luck, and feel free to ask questions!
>
- --
Bob Williams
System: Linux 3.11.10-7-desktop
Distro: openSUSE 13.1 (x86_64) with KDE Development Platform: 4.13.0
Uptime: 06:00am up 11:26, 4 users, load average: 0.00, 0.02, 0.05
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iEYEARECAAYFAlNlXCYACgkQ0Sr7eZJrmU71OACfSy1XWSDA08DNAndcA89AZg6Z
+2IAniQJIrSd7wVJWl2MEtEdHlcdkwfj
=YtSq
-----END PGP SIGNATURE-----
More information about the Tutor
mailing list