[Python-3000] string C API

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Sat Sep 16 23:20:44 CEST 2006


"Martin v. Löwis" <martin at v.loewis.de> writes:

> Just try implementing comparison some time. You can end up implementing
> the same algorithm six times at least, once for each pair (1,1), (1,2),
> (1,4), (2,2), (2,4), (4,4). If the algorithm isn't symmetric (i.e.
> you can't reduce (2,1) to (1,2)), you need 9 different versions of the
> algorithm. That sounds more complicated than always decoding.

That's why I'm proposing only two variants, ISO-8859-1 and UCS-4.

String equality: two variants. The two mixed-width cases are trivial
if the representation is always canonical: a narrow string and a wide
string can then never be equal.
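
A minimal sketch of the equality case, assuming a hypothetical Str
struct (not an existing CPython type) and assuming "canonical" means
that a wide string always contains at least one code point above 0xFF:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical string object, for illustration only. */
typedef struct {
    int kind;          /* 1 = ISO-8859-1 bytes, 4 = UCS-4 code points */
    size_t len;        /* length in code points */
    const void *data;  /* uint8_t[len] or uint32_t[len] */
} Str;

/* Equality under the canonical-representation assumption. */
static int str_eq(const Str *a, const Str *b)
{
    assert(a->kind == 1 || a->kind == 4);
    assert(b->kind == 1 || b->kind == 4);
    if (a->kind != b->kind)   /* mixed widths: the two trivial cases */
        return 0;
    if (a->len != b->len)
        return 0;
    /* Same width: one memcmp covers both the narrow and the wide case. */
    return memcmp(a->data, b->data, a->len * (size_t)a->kind) == 0;
}
```

If the representation were not canonical, the mixed-width branch would
need an element-by-element comparison instead of returning 0.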

String < and <=: 8 variants in total (two operators times four pairs
of representations), all generated from a single 20-line piece of C
code, parametrized by preprocessor macros.
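
One way such a macro could look, as a sketch only: the macro name
DEFINE_STR_CMP and the generated function names are made up for this
example. Comparison is by code point, which works across widths
because ISO-8859-1 byte values coincide with the first 256 Unicode
code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expands to one comparison function for a pair of element types.
   OP is < or <= and only matters when one string is a prefix of the
   other (or the two are equal). */
#define DEFINE_STR_CMP(NAME, T1, T2, OP)                          \
    static int NAME(const T1 *s1, size_t len1,                    \
                    const T2 *s2, size_t len2)                    \
    {                                                             \
        size_t n = len1 < len2 ? len1 : len2;                     \
        for (size_t i = 0; i < n; i++) {                          \
            uint32_t c1 = s1[i], c2 = s2[i];                      \
            if (c1 != c2)                                         \
                return c1 < c2;                                   \
        }                                                         \
        return len1 OP len2;                                      \
    }

/* n = narrow (ISO-8859-1), w = wide (UCS-4) */
DEFINE_STR_CMP(lt_nn, uint8_t,  uint8_t,  <)
DEFINE_STR_CMP(lt_nw, uint8_t,  uint32_t, <)
DEFINE_STR_CMP(lt_wn, uint32_t, uint8_t,  <)
DEFINE_STR_CMP(lt_ww, uint32_t, uint32_t, <)
DEFINE_STR_CMP(le_nn, uint8_t,  uint8_t,  <=)
DEFINE_STR_CMP(le_nw, uint8_t,  uint32_t, <=)
DEFINE_STR_CMP(le_wn, uint32_t, uint8_t,  <=)
DEFINE_STR_CMP(le_ww, uint32_t, uint32_t, <=)
```

A dispatcher would pick one of the four functions per operator based
on the widths of the two operands.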

String !=, >, >=: defined in terms of the above.

String concatenation:
   if both strings are narrow:
      allocate a narrow result
      copy narrow from str1 to result
      copy narrow from str2 to result
   else:
      allocate a wide result
      if str1 is narrow:
         copy narrow->wide from str1 to result
      else:
         copy wide from str1 to result
      if str2 is narrow:
         copy narrow->wide from str2 to result
      else:
         copy wide from str2 to result
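
The pseudocode above could be written in C roughly as follows. This is
a sketch under stated assumptions: the Str struct, copy_to_wide and
str_concat are hypothetical names, length overflow is not checked, and
the result is not re-narrowed (two inputs in canonical form cannot
produce a non-canonical result here anyway).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string object, for illustration only. */
typedef struct {
    int wide;     /* 0 = ISO-8859-1 bytes, 1 = UCS-4 code points */
    size_t len;   /* length in code points */
    void *data;   /* uint8_t[len] or uint32_t[len] */
} Str;

/* Copy src into a UCS-4 buffer, widening each byte if src is narrow. */
static void copy_to_wide(uint32_t *dst, const Str *src)
{
    if (src->wide) {
        memcpy(dst, src->data, src->len * sizeof(uint32_t));
    } else {
        const uint8_t *p = src->data;
        for (size_t i = 0; i < src->len; i++)
            dst[i] = p[i];
    }
}

/* Stores a + b in *out. Returns 0 on success, -1 if malloc fails. */
static int str_concat(Str *out, const Str *a, const Str *b)
{
    assert(a->data != NULL && b->data != NULL);
    size_t len = a->len + b->len;
    int wide = a->wide || b->wide;
    size_t elem = wide ? sizeof(uint32_t) : sizeof(uint8_t);
    void *buf = malloc(len ? len * elem : 1);
    if (buf == NULL)
        return -1;
    if (!wide) {
        /* Both narrow: two plain byte copies. */
        memcpy(buf, a->data, a->len);
        memcpy((uint8_t *)buf + a->len, b->data, b->len);
    } else {
        /* At least one wide: widen or copy each operand in turn. */
        copy_to_wide(buf, a);
        copy_to_wide((uint32_t *)buf + a->len, b);
    }
    out->wide = wide;
    out->len = len;
    out->data = buf;
    return 0;
}
```

The caller owns out->data and releases it with free().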

__contains__, startswith, index: three variants, one other is trivial.
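
A sketch of startswith with its three real variants and one trivial
one. It assumes, as an interpretation of the line above, that the
trivial case is a narrow string tested against a wide prefix: with a
canonical representation the wide prefix contains a code point above
0xFF, which a narrow string cannot hold. The Str struct and the
function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical string object, for illustration only. */
typedef struct {
    int kind;          /* 1 = ISO-8859-1 bytes, 4 = UCS-4 code points */
    size_t len;        /* length in code points */
    const void *data;  /* uint8_t[len] or uint32_t[len] */
} Str;

static int str_startswith(const Str *s, const Str *prefix)
{
    assert(s->kind == 1 || s->kind == 4);
    assert(prefix->kind == 1 || prefix->kind == 4);
    if (prefix->len > s->len)
        return 0;
    /* Variants 1 and 2: same width, compare raw memory. */
    if (s->kind == prefix->kind)
        return memcmp(s->data, prefix->data,
                      prefix->len * (size_t)s->kind) == 0;
    /* Trivial variant: narrow string, wide prefix. */
    if (s->kind == 1)
        return 0;
    /* Variant 3: wide string, narrow prefix, compared by code point. */
    const uint32_t *p = s->data;
    const uint8_t *q = prefix->data;
    for (size_t i = 0; i < prefix->len; i++) {
        if (p[i] != q[i])
            return 0;
    }
    return 1;
}
```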

Seems simple enough to me.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

