[Tutor] lc_ctype and re.LOCALE

Mon Feb 1 07:27:35 EST 2016

On 31 January 2016 at 21:41, Albert-Jan Roskam <sjeik_appie at hotmail.com> wrote:
>> >
>> >
>> > You're looping over all pairs of locales:
>> >
>> > Suppose there are N locales and M is sys.maxunicode. The number of
>> > pairs of locales is N*(N-1)/2 which grows like N**2. For each pair you
>> > loop over M characters so the innermost loop body is repeated
>> > something like M*N**2 times.
>> >
>> > Assume that f(locale, c) is the function that gets e.g. m1 or m2 in
>> > your code above. We can swap the loops around so that the outer loop
>> > is over unicode characters. Then the inner loop can be over the
>> > locales but we only loop over all N locales once rather than over all
>> > N**2 pairs of locales. This looks like this:
>> >
>> > for c in unicode_chacters:
>> > matched = f(locales[0], c) # Check the first locale
>> > for locale in locales:
>> > assert all(f(locale, c) == matched for locale in locales)
>> >
>> > This way you call f(locale, c) M*N times which if N is not small
>> > should be a lot faster than M*N**2 times.
>>
>
> I blindly followed a code tuning tip I once read about in Code Complete
> (McConnel 2004; page 623 [1]):
>  " Putting the Busiest Loop on the Inside
>
>  When you have nested loops, think about which loop you want on the outside
> and
>  which you want on the inside. Following is an example of a nested loop that
> can be
>  improved:
>  ...
>  The key to improving the loop is that the outer loop executes much more
> often than the
>  inner loop. Each time the loop executes, it has to initialize the loop
> index, increment it
>  on each pass through the loop, and check it after each pass"
>
>  [1]
> https://khmerbamboo.files.wordpress.com/2014/09/code-complete-2nd-edition-v413hav.pdf
>
>  Your advice makes perfect sense, though. So McConnel's code tuning tip may
> be a rule-of-thumb, with exceptions, right?

I don't really understand the text you've quoted from the book but I
guess it's referring to a method to choose which loop should be the
outer in a nested loop. The rationale depends on many things though so
it would have to be a rule of thumb. However I doubt that McConnel's
advice applies to the situation we have in this thread because I
wasn't suggesting to simply swap the loops around.

Let's consider what McConnel might mean. Suppose I have a nested loop
so it looks like:

    # case 1
    for n in range(N):
        func_n(n)
        for m in range(M):
            func_m(m)
            func_nm(n, m)

The functions func_n etc. might not really be functions. They are
standins for blocks of code that you might have in those positions.
The point is that func_n depends only on n and needs to be re-executed
each time n changes. Likewise func_m depends only on m and func_nm
depends on both.

Sow func_nm is always going to be called N*M times. Swapping the order
of the two loops won't change that. However func_n will be called N
times and func_m will be called N*M times.

Swapping the loops around we would have

    # case 2
    for m in range(M):
        func_m(m)
        for n in range(N):
            func_n(n)
            func_nm(n, m)

In this case we have M calls to func_m and N*M called to func_n. If
C() is the cost of calling a function then the total cost is

    case 1:  M*N*C(func_nm) + N*C(func_n) + N*M*C(func_m)
    case 2:  M*N*C(func_nm) + M*C(func_m) + N*M*C(func_n)

Since the first term is the same in both cases we can ignore it. Now
suppose that func_n and func_m have approximately equal cost. Let's
just say they each have a cost of 1 for simplicity. Then comparing the
second two terms we get

    case1: N + N*M
    case 2: M + N*M

Now which is bigger depends on which is bigger of N and M. So we can
say that if N < M then case 1 is better and if M < N case 2 is better.
In other words: the outer loop should be the one with fewer
iterations. This suggests that

    for n in range(3):
        for m in range(1000):

is better than

    for m in range(1000):
        for n in range(3):

Just looking at calls to the range function the first example calls
range 4 times where as the second one calls range 3001 times so maybe
that is correct. The effect is likely to be small though (especially
if there is any significant work in the inside the inner loop i.e.
func_nm is costly).

Now consider the case where func_n and func_m have wildly different
costs. Let's assume that func_n takes orders of magnitude more time
than func_m. For example func_n might load something from a database
and func_m might add two integers. In this case which is bigger out of
N and M is likely to be unimportant. func_n is costly and we must
minimise the number of times we do it: the outer loop should be the
loop that is most costly to iterate. Example:

    words = 'foo', 'bar', 'baz'

    with open('stuff.txt') as inputfile:
        for line in inputfile:
            for word in words:
                if word in line:
                    print(word, line)

Switching the order in that loop gives us:

    for word in words:
       with open('stuff.txt') as inputfile:
            for line in inputfile:
                if word in line:
                    print(word, line)

So although words is a shorter loop (let's assume file has many more
than 3 lines) we should still use the file as the outer loop. The
second version here reads the whole file 3 times which is pretty much
guaranteed to be more expensive than any gains from not looping over
words too many times.

So by this last logic your two loops are one over the locales and one
over the unicode characters. Looping over the characters is cheap. For
each locale you call setlocale() and I have no idea how expensive that
is: maybe it just stores a string somewhere or maybe it loads some
files from the filesystem and parses them or something. I'm sure that
changing locale is more expensive than iterating through unicode
characters but I couldn't say whether it's a massive difference or
not. In any case if changing locale is more costly AND there are many
more unicode characters than locales then both arguments above suggest
the following:

    for locale in locales:
       for character in unicode_characters:

However the question now is what do you put in the inner-most loop? We
want to *compare* all of the (locale, character) combinations across
locales with each character fixed. We'd have to store the results of
the calculations in some data structure e.g. for each locale we have a
list of results for each unicode character.

    results = {}
    for locale in locales:
       results[locale] = []
       for character in unicode_characters:
            results[locale][character] = f(locale, character)

It's possible to use a smaller data structure but that's not
important. The main thing is that since we want compare across locales
for a given character the problem naturally lends itself to looping
over locales with the character fixed. Swapping the loops around we no
longer need a data structure:

   for character in unicode_characters:
        val = f(locales[0], character)
        assert all(f(locale, character) == val for locale in locales)

However using a data-structure may be faster: it's common for there to
be a trade off between memory usage and computation time.

In any case NONE of the above applies to the code you actually showed
which looks like this:

    for locale1, locale2 in combinations(locales, 2):
        for i in xrange(sys.maxunicode + 1):
            assert(f(locale1, i) == f(locale2, i))

The difference here is that you are doing many more loop iterations
than necessary. In the argument about swapping loops above I assumed
that func_nm is always called N*M times. But that's not true in the
loop above: by looping over combinations you've made the outer loop a
loop with len(locales)**2 iterations instead of len(locales)
iterations. This means that your innermost function body is executed
N**2 * M times which dwarfs any optimisation that can come from
switching the order of the loops.

The reason is that each locale appears in N-1 locale pairs. So your
loop will see each locale, character combination N-1 times instead of
just once. McConnel's arguments are about higher-order (meaning less
important) effects. The number 1 rule of optimising loops is to
minimise the number of loop iterations; in the ideal case you would
somehow reduce the number of iterations to 0.

For this particular problem the minimum possible is all locale,
character combinations so you need to have at least N*M inner loop
iterations. I would always begin by thinking of a method that meets
that minimum.

--
Oscar