[Tutor] read in text file containing non-English characters
Martin A. Brown
martin at linux-ip.net
Fri Jan 13 04:18:34 CET 2012
Greetings Francis,
You have entered the Unicode or multiple character set zone. This
is the deep end of the pool, and even experienced practitioners have
difficulty here. Fortunately, Python eases the burden on you, but
this still requires some care.
: Given a simple text file of departments, capitals, longitude and
: latitude separated by commas
Side notes--if you are dealing with geographic hierarchies, you may
wish to consider using a publicly available 'standard' hierarchy.
Since you are mailing from a .state.ny.us address, I might guess
that you are working in a context where you may not have control
over the source of the geographic hierarchical data. However, I'll
point out the following:
* GeoNames: http://www.geonames.org/
* GNIS: http://en.wikipedia.org/wiki/Geographic_Names_Information_System
Apologies to somebody with actual brains who said this, but: the
great thing about standards is that you have so many to choose
from.
OK, so that's unrelated to your direct question. You have a few
capitals and latlongs that you want to read from a comma-separated
file.
: Ahuachapán,Ahuachapán,-89.8450,13.9190
: Cabañas,Sensuntepeque,-88.6300,13.8800
: Cuscatlán,Cojutepeque,-88.9333,13.7167
:
: I would like to know to how to read in the file and then access
: arbitary rows in the file, so that I can print a line such as:
:
: The capital of Cabañas is Sensuntepeque
:
: while preserving the non-English characters
:
: now, for example, I get
:
: Cabañas
You don't show even a snippet of code. If you are asking
for help here, it is good form to show us your code. Since
you don't say how you are reading the data or how you are
printing it, we can't help much. Here are some tips:
* Consider learning how to use the csv module, particularly in
your case, csv.reader (as Ramit Prasad has already suggested).
* Consider checking the bytestream to see if the bytes produced
on output are the same as on input (also, read the text that
Mark Tompkins indicated and learn to distinguish Unicode from
UTF-8).
* Report back to the list the version of Python you are using.
[Different versions of Python handle non-ASCII character set
data in subtly different ways, but that is probably not the
cause of the more obvious problem you are showing above.]
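To make the bytestream check and the Unicode-versus-UTF-8 distinction concrete, here is a minimal sketch in Python 3 (not the 2.6.5 used further down). It assumes your file is saved as UTF-8, which we can't confirm without seeing it. Notice that decoding UTF-8 bytes as Latin-1 reproduces exactly the 'Cabañas' you quote. The helper is_valid_utf8 is hypothetical, written only for illustration.

```python
# Python 3 sketch; assumes the data is UTF-8 (not confirmed).
text = "Cabañas"                 # a str: a sequence of Unicode code points
data = text.encode("utf-8")      # UTF-8 is one way to store those as bytes
print(data)                      # b'Caba\xc3\xb1as': 'ñ' takes two bytes

# Decoding UTF-8 bytes with the wrong codec reproduces the symptom.
garbled = data.decode("latin-1")
print(garbled)                   # CabaÃ±as

# Decoding with the right codec gets the original string back.
assert data.decode("utf-8") == text

# If you already have garbled text, reversing the mistake repairs it.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)                  # Cabañas


def is_valid_utf8(raw):
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(data))                    # True
print(is_valid_utf8(text.encode("latin-1")))  # False: 0xf1 lacks continuation bytes
```

To check your real file, read it with open(path, 'rb').read() and pass the bytes to is_valid_utf8. If it returns False, the file is in some other encoding (perhaps Latin-1 or cp1252), and that is the codec name to give to decode().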
We have no idea what your ultimate goal is with the data, but
we can help you much more if you show us the code.
Here's a sample of what I would/could do (Python 2.6.5):
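import csv
# Python 2: the csv docs ask for binary mode; the UTF-8 bytes
# are passed through unchanged, so print emits them as-is.
reader = csv.reader(open('input-data.txt', 'rb'), delimiter=',')
for row in reader:
    # row[0] is the department, row[1] its capital
    print 'The capital of %s is %s' % (row[0], row[1],)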
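Your question also asks how to reach arbitrary rows. One way, sketched here in Python 3 under the assumption that the file is UTF-8 with exactly four columns and no header row, is to load the rows into a dict keyed by department. The function name load_capitals and the scratch file are hypothetical, there only so the example runs on its own.

```python
import csv
import os
import tempfile


def load_capitals(path):
    """Hypothetical helper: map department -> (capital, longitude, latitude).

    Assumes a UTF-8 file with four comma-separated columns, no header,
    and no blank lines.
    """
    capitals = {}
    # newline='' is what the Python 3 csv docs ask for; encoding='utf-8'
    # decodes the bytes into str objects as they are read.
    with open(path, newline="", encoding="utf-8") as f:
        for department, capital, lon, lat in csv.reader(f):
            capitals[department] = (capital, float(lon), float(lat))
    return capitals


# Write the three sample rows from the question to a scratch file.
sample = (
    "Ahuachapán,Ahuachapán,-89.8450,13.9190\n"
    "Cabañas,Sensuntepeque,-88.6300,13.8800\n"
    "Cuscatlán,Cojutepeque,-88.9333,13.7167\n"
)
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write(sample)

capitals = load_capitals(path)
os.remove(path)

# Any row can now be looked up directly by department name.
print("The capital of %s is %s" % ("Cabañas", capitals["Cabañas"][0]))
# The capital of Cabañas is Sensuntepeque
```

On Python 2.6 the csv module only handles byte strings, so there you would read in binary mode and call .decode('utf-8') on each field yourself if you need unicode objects.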
The above is trivial, but if you would like some more substantive
assistance, you should describe your problem in a bit more detail.
-Martin
--
Martin A. Brown
http://linux-ip.net/