[Python-bugs-list] [ python-Bugs-603930 ] string.punctuation

noreply@sourceforge.net noreply@sourceforge.net
Wed, 04 Sep 2002 01:01:23 -0700


Bugs item #603930, was opened at 2002-09-03 07:03
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=603930&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Ignacio Dosil Lago (do_sil)
Assigned to: Nobody/Anonymous (nobody)
Summary: string.punctuation

Initial Comment:
string.punctuation doesn't include the characters ¡ and
¿ used in Spanish and Galician.
When I start a Python interactive session, import the
string module and "print string.punctuation", these two
characters never appear (independently of the
interpreter version).
Zope uses this module to implement structured text.
When somebody tries to write structured text in
Spanish, Galician or ... that includes either of these
two characters, it doesn't work. For instance, **¡this
should be bold text if it were structured text in
Galician!**
Is this a gap in the Python library, or does it depend
on a third party?
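
The observation above can be reproduced with a short snippet. This is only
a sketch and assumes a current Python 3 interpreter (the report itself used
the Python 2 print statement); the name "missing" is illustrative.

```python
import string

# string.punctuation is a fixed, ASCII-only constant.
print(string.punctuation)

# Collect the inverted exclamation and question marks if they are absent.
candidates = ["\u00a1", "\u00bf"]  # the characters "¡" and "¿"
missing = [ch for ch in candidates if ch not in string.punctuation]
print(missing)
```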

----------------------------------------------------------------------

Comment By: Andreas Jung (ajung)
Date: 2002-09-04 04:01

Message:
I agree with Martin that it would be better
to use the information from Unicode. On the other hand,
reStructuredText is becoming the standard for
Python, and there is also a reStructuredText product
available for Zope that resolves most of the
related problems by having a better and
clearer markup. It is a pain to handle these
kinds of punctuation problems for every country and
language. I suggest hacking STletters.py
for your needs. The current StructuredText
implementation in Zope has these problems
by design and they are hard to fix.

-aj

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-09-03 17:19

Message:
[...] objects, which would be locale-aware.

However, it might be better for Zope to use a truly
locale-independent determination of punctuation, namely the
Unicode database. Invoking unicodedata.category(u"\xa1")
gives "Po". The Unicode database recognizes the
following punctuation categories:

Pc  Punctuation, Connector
Pd  Punctuation, Dash
Ps  Punctuation, Open
Pe  Punctuation, Close
Pi  Punctuation, Initial quote (may behave like Ps or Pe depending on usage)
Pf  Punctuation, Final quote (may behave like Ps or Pe depending on usage)
Po  Punctuation, Other

So I would recommend that Zope use the Unicode database
and check for categories starting with "P".
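
A check along those lines might look like the following. It is a minimal
sketch for Python 3, not Zope's actual code; the helper name
is_punctuation is hypothetical.

```python
import unicodedata


def is_punctuation(ch):
    """Return True if the single character ch is in a Unicode P* category."""
    return unicodedata.category(ch).startswith("P")


# "¡" and "¿" are both "Po"; "a" is a letter and "$" is a currency symbol.
samples = ["\u00a1", "\u00bf", "!", "a", "$"]
results = {ch: is_punctuation(ch) for ch in samples}
print(results)
```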

It might be worth noting that string.punctuation contains
characters that are not classified as punctuators in Unicode:

$ Sc
+ Sm
< Sm
= Sm
> Sm
^ Sk
` Sk
| Sm
~ Sm

(Sm: Symbol, Math; Sc: Symbol, Currency; Sk: Symbol, Modifier)

So it might be that Zope is also interested in symbols
(categories starting with "S").
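
The list above can be regenerated, and the check widened to symbols, with
a few lines of Python 3. Again this is only a sketch; the names non_punct
and is_punct_or_symbol are hypothetical.

```python
import string
import unicodedata

# Characters in string.punctuation whose Unicode category is not P*.
non_punct = [
    (ch, unicodedata.category(ch))
    for ch in string.punctuation
    if not unicodedata.category(ch).startswith("P")
]
for ch, cat in non_punct:
    print(ch, cat)


def is_punct_or_symbol(ch):
    """Return True for characters in any P* or S* Unicode category."""
    return unicodedata.category(ch)[0] in ("P", "S")
```

With the wider test, every character of string.punctuation is accepted,
and so are "¡" and "¿".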


----------------------------------------------------------------------
