<div class="gmail_quote">On 11 July 2012 19:15, <span dir="ltr"><<a href="mailto:subhabangalore@gmail.com" target="_blank">subhabangalore@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On Tuesday, July 10, 2012 11:16:08 PM UTC+5:30, Subhabrata wrote:<br>
> Dear Group,<br>
><br>
> I kept a good number of files in a folder. Now I want to read all of<br>
> them. They are in different formats and different encoding. Using<br>
> listdir/glob.glob I am able to find the list but how to open/read or<br>
> process them for different encodings?<br>
><br>
> If anyone can help me out. I am using Python 3.2 on Windows.<br>
><br>
> Regards,<br>
> Subhabrata Banerjee.<br>
</div>Dear Group,<br>
<br>
No, I generally know glob.glob and encodings, as I work a lot with non-ASCII data, but I recently ran into an interesting issue: suppose there are .doc, .docx, .txt, .xls, and .pdf files with different encodings.</blockquote><div>
<br></div><div>Some of the formats you have listed are not text-based. What do you mean by the encoding of, e.g., a .doc or .xls file?</div><div><br></div><div>My understanding is that these are binary files. You won't be able to extract their text without a module that understands each format (I don't know of one that can).</div>
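<div><br></div><div>As a rough sketch of how such binary formats can at least be told apart without trusting the file extension, the first few bytes of a file can be compared against well-known signatures: PDF files start with %PDF, ZIP-based formats such as .docx start with PK\x03\x04, and legacy OLE2 containers such as .doc and .xls start with a fixed 8-byte header. The helper name guess_file_type and its labels are hypothetical, and this only identifies the container; it does not extract any text:</div>
<pre>
```python
# Leading byte signatures of a few common container formats.
# The labels on the right are my own naming, not anything standard.
SIGNATURES = [
    (b'%PDF', 'pdf'),
    (b'PK\x03\x04', 'zip'),                        # e.g. .docx, .xlsx
    (b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1', 'ole2'),  # e.g. legacy .doc, .xls
]


def guess_file_type(path):
    """Hypothetical helper: guess a coarse file type from the first bytes."""
    with open(path, 'rb') as f:
        head = f.read(8)  # the longest signature above is 8 bytes
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return 'unknown'
```
</pre>
<div>This could be called on each path returned by glob.glob. A result of 'unknown' may mean the file is plain text, but it is not proof of it.</div>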
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
1) First I have to determine the file type on the fly.<br>
2) I cannot assign encoding="..."; whatever the encoding is, I have to read it.<br></blockquote><div><br></div><div>Perhaps you just want to open the file as binary? The following will read the contents of any file, binary or text, as raw bytes, regardless of encoding:</div>
<div><br></div><div>with open('spreadsheet.xls', 'rb') as f:</div><div>&nbsp;&nbsp;&nbsp;&nbsp;data = f.read()  # returns bytes rather than str</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Any idea. Thinking.<br>
<br>
Thanks in Advance,<br>
<div class="HOEnZb"><div class="h5">Regards,<br>
Subhabrata Banerjee.<br>
<br>
--<br>
<a href="http://mail.python.org/mailman/listinfo/python-list" target="_blank">http://mail.python.org/mailman/listinfo/python-list</a><br>
</div></div></blockquote></div><br>
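<div>For the files that really are text, one possible approach once the contents have been read as bytes is to try a short list of candidate encodings in order and keep the first that decodes without error. This is only a sketch: the helper name decode_with_fallback and the default candidate list are my own assumptions, and a successful decode does not prove the guess was right:</div>
<pre>
```python
def decode_with_fallback(data, encodings=('utf-8', 'cp1252')):
    """Hypothetical helper: return (text, encoding_used) for bytes data.

    Tries each candidate encoding in order and returns the first
    result that decodes without raising UnicodeDecodeError.
    """
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this never fails,
    # but the resulting text may not be what the author wrote.
    return data.decode('latin-1'), 'latin-1'
```
</pre>
<div>The order of the candidates matters: UTF-8 is strict enough that it rarely accepts data written in another encoding, so it is tried first, while the single-byte encodings accept almost anything and therefore come last.</div>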