Question on List processing
Steven D'Aprano
steve at pearwood.info
Tue Apr 26 12:29:33 EDT 2016
On Wed, 27 Apr 2016 01:38 am, subhabangalore at gmail.com wrote:
> I am trying to send you a revised example.
> list1=[u"('koteeswaram/BHPERSN engaged/NA ','class1')",
> u"('koteeswaram/BHPERSN is/NA ','class1')"]
Please don't use generic, meaningless names like "list1". We can see it
is a list, but what is it for? Use a name that describes what the purpose
of the list is. Even "input" and "output" are better names.
> [('koteeswaram/BHPERSN engaged/NA ','class1'),
> ('koteeswaram/BHPERSN is/NA ','class1')]
What is this? The output? Don't make us guess what things are.
My *guess* is that you have a list of Unicode strings that look like this:
u"('aaa/TAG bbb/TAG ','class1')"
and you want to do six things:
- normalise the string;
- convert the Unicode string to ASCII, ignoring anything that isn't ASCII;
- delete the parentheses in the string;
- delete the leading and trailing single quotes;
- split the string on the comma;
- combine them into a tuple.
So let's make some functions:
# Untested
import unicodedata

def remove_parentheses(string):
    # Strip one pair of enclosing parentheses, if present.
    if string.startswith("(") and string.endswith(")"):
        string = string[1:-1]
    return string

def remove_single_quotes(string):
    # Strip one pair of enclosing single quotes, if present.
    if string.startswith("'") and string.endswith("'"):
        string = string[1:-1]
    return string

def convert(string):
    if not isinstance(string, unicode):
        raise TypeError("expected unicode, but got %s"
                        % type(string).__name__)
    string = unicodedata.normalize('NFKD', string)
    string = string.encode('ascii', 'ignore')
    string = remove_parentheses(string)
    # Assumes the string contains exactly one comma.
    first_part, second_part = string.split(",")
    first_part = remove_single_quotes(first_part)
    second_part = remove_single_quotes(second_part)
    return (first_part, second_part)
input = [ ... ]  # your input strings
output = []
for string in input:
    output.append(convert(string))
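
The functions above rely on the Python 2 "unicode" type, which does not exist in Python 3. For anyone on Python 3, here is a rough sketch of the same six steps. The helper name strip_pair and the variable samples are made up for this illustration, and the code assumes each string contains exactly one comma, as in the sample data:

```python
import unicodedata


def strip_pair(text, opening, closing):
    """Drop one enclosing pair of single-character delimiters, if present."""
    if text.startswith(opening) and text.endswith(closing):
        return text[1:-1]
    return text


def convert(text):
    if not isinstance(text, str):
        raise TypeError("expected str, but got %s" % type(text).__name__)
    # Decompose accented characters, then drop whatever is not ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = strip_pair(text, "(", ")")
    # Assumption: exactly one comma separates the two parts.
    first_part, second_part = text.split(",")
    return (strip_pair(first_part.strip(), "'", "'"),
            strip_pair(second_part.strip(), "'", "'"))


# Sample strings taken from the original question.
samples = [
    "('koteeswaram/BHPERSN engaged/NA ','class1')",
    "('koteeswaram/BHPERSN is/NA ','class1')",
]
output = [convert(s) for s in samples]
print(output)
```

With the two sample strings, this gives the list of tuples quoted earlier. A string containing more than one comma will raise ValueError at the split, so real data may need a more careful parser.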
--
Steven