Opening multiple Files in Different Encoding
oscar.j.benjamin at gmail.com
Wed Jul 11 22:55:31 CEST 2012
On 11 July 2012 19:15, <subhabangalore at gmail.com> wrote:
> On Tuesday, July 10, 2012 11:16:08 PM UTC+5:30, Subhabrata wrote:
> > Dear Group,
> > I kept a good number of files in a folder. Now I want to read all of
> > them. They are in different formats and different encoding. Using
> > listdir/glob.glob I am able to find the list but how to open/read or
> > process them for different encodings?
> > If anyone can help me out. I am using Python 3.2 on Windows.
> > Regards,
> > Subhabrata Banerjee.
> Dear Group,
> No, generally I know glob.glob and the encodings, as I work a lot on
> non-ASCII stuff, but I recently found an interesting issue: suppose there
> are .doc, .docx, .txt, .xls, .pdf files with different encodings.
Some of the formats you have listed are not text-based. What do you mean by
the encoding of e.g. a .doc or .xls file?
My understanding is that these are binary files. You won't be able to read
them without the help of a special module (I don't know of one that can).
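One partial exception, offered as a hedged sketch rather than a general answer: a .docx file (unlike .doc or .xls) is a ZIP archive of XML parts, so the standard library alone can pull out its raw text. The function name docx_text is made up for illustration, and the code assumes the main body lives at the conventional part name word/document.xml. It ignores tables, headers, footnotes and all formatting.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

def docx_text(path):
    """Return the paragraph text of a .docx file, one paragraph per line.

    Hypothetical helper: assumes the body is stored in word/document.xml.
    """
    with zipfile.ZipFile(path) as zf:
        xml_bytes = zf.read('word/document.xml')
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + 'p'):
        # A paragraph's text is split across one or more <w:t> elements.
        pieces = [t.text or '' for t in para.iter(W_NS + 't')]
        paragraphs.append(''.join(pieces))
    return '\n'.join(paragraphs)
```

The XML inside declares its own encoding, which the parser honours, so no encoding="..." argument is needed for this format.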
> 1) First I have to determine on the fly the file type.
> 2) I cannot assign encoding="..."; whatever the encoding is, I have to read
Perhaps you just want to open the file as binary? The following will read
the contents of any file, binary or text, regardless of encoding or anything
else:
    with open('spreadsheet.xls', 'rb') as f:
        data = f.read()  # returns bytes rather than text
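Once the bytes are in hand, the two numbered points above (work out the file type on the fly, and cope without knowing the encoding up front) can be roughly sketched as follows. This is only a heuristic under stated assumptions: the names sniff and decode_guess are invented here, the signature table covers just the formats mentioned in this thread, and the candidate encodings are a guess that should be replaced by whatever the files are actually likely to use.

```python
import codecs

# Leading "magic" bytes of the container formats mentioned in this thread.
MAGIC = [
    (b'%PDF', 'pdf'),
    (b'PK\x03\x04', 'zip container (.docx, .xlsx)'),
    (b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1', 'OLE2 container (.doc, .xls)'),
]

def sniff(data):
    """Return a rough type label based on the first few bytes."""
    for magic, label in MAGIC:
        if data.startswith(magic):
            return label
    return 'unknown (possibly text)'

def decode_guess(data, candidates=('utf-8', 'cp1252')):
    """Decode bytes to text, returning (text, encoding_used).

    Heuristic only: a byte order mark is trusted if present, then each
    candidate is tried in order, and latin-1 is the last resort because
    it can decode any byte sequence (possibly into nonsense).
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode('utf-16'), 'utf-16'
    if data.startswith(codecs.BOM_UTF8):
        return data.decode('utf-8-sig'), 'utf-8-sig'
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return data.decode('latin-1'), 'latin-1'
```

A typical use would be to call sniff(data) first and only call decode_guess(data) when the result is the 'unknown (possibly text)' label, since decoding a PDF or a spreadsheet as text is not meaningful.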
> Any idea. Thinking.
> Thanks in Advance,
> Subhabrata Banerjee.