Question about working with HTML entities in Python 2 to use them as filenames
Steven Truppe
steven.truppe at chello.at
Tue Nov 22 15:54:27 EST 2016
I've made a pastebin with a few examples: http://pastebin.com/QQQFhkRg
On 2016-11-22 21:33, Steven Truppe wrote:
> Hi all,
>
>
> I'm using Linux and Python 2 and want to process a file line by line,
> executing a command for each line (with os.system).
>
> My problem now is that I open the file and parse the title, but
> I'm not able to turn it into a normal filename:
>
>
> import os, sys
> import urllib, re, cgi
> import HTMLParser, unicodedata
> import htmlentitydefs
> import chardet
>
> for URL in open('list.txt', "r").readlines():
>     these_regex = "<title>(.+?)</title>"
>     pattern = re.compile(these_regex)
>     htmlfile = urllib.urlopen(URL)
>     htmltext = htmlfile.read()
>     title = re.findall(pattern, htmltext)[0]
>     title = HTMLParser.HTMLParser().unescape(title)
>     print "title = ", title
>     # here I would like to create a directory named after the content
>     # of the title
>
>
> I always get this error:
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2
>
>
>
> I've played around with .encode('latin-1') or ('utf8'), but I have not
> yet been able to solve this simple issue.
>
>
> Thanks in advance,
>
> Truppe Steven
>
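
For reference, here is a minimal sketch of one way to get from the raw
page bytes to a usable directory name. It is not taken from the pastebin
above. The likely cause of the error is that the title is still a byte
string when it reaches unescape(), so Python 2 falls back to an implicit
ASCII decode. The sketch therefore decodes first and unescapes afterwards.
The helper name title_to_dirname and the UTF-8 default are assumptions for
illustration (the 0xc3 byte suggests UTF-8, but chardet or the HTTP headers
could be used to pick the encoding instead).

```python
import re
import unicodedata

try:
    from html import unescape            # Python 3.4+
except ImportError:
    from HTMLParser import HTMLParser    # Python 2
    unescape = HTMLParser().unescape

# Match on bytes, because that is what urlopen(...).read() returns.
TITLE_RE = re.compile(br"<title>(.+?)</title>", re.IGNORECASE | re.DOTALL)


def title_to_dirname(html_bytes, encoding="utf-8"):
    """Hypothetical helper: return the page title as a unicode string
    that is safe to use as a single path component, or None if the page
    has no <title> element. The encoding default is an assumption."""
    match = TITLE_RE.search(html_bytes)
    if match is None:
        return None
    # Decode bytes to unicode BEFORE unescaping the entities.
    text = match.group(1).decode(encoding, "replace")
    text = unescape(text)
    text = unicodedata.normalize("NFC", text)
    # "/" and NUL are the only characters a Linux filename cannot contain.
    text = text.replace(u"/", u"_").replace(u"\x00", u"")
    # Collapse runs of whitespace (including newlines) to single spaces.
    return u" ".join(text.split())
```

On Python 2 under Linux, the result could then be passed to the file
system as UTF-8 bytes, for example os.mkdir(name.encode("utf-8")), and
subprocess.call() with an argument list avoids the shell quoting problems
that os.system() has with arbitrary titles.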
More information about the Python-list mailing list