[Tutor] HTML --> TXT?

Curtis Larsen curtis.larsen@Covance.Com
Wed, 29 Mar 2000 13:13:13 -0600


Justin - Thanks. I tried looking at those, but they were about as
clear as mud to me.
If they really are meant for conversions such as this, would someone
please post a more straightforward example?

Deirdre - Thanks for the code -- it's very helpful and gives me a
starting point.
It was what I originally thought would need to be done, but I thought
there might be an easier way.
Would the "re" module be a good fit for something like this, or would it
be overkill?
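
For reference, a regex-only approach could look something like the sketch
below. It is not code from this thread: the names TAG_RE and strip_tags are
made up for illustration, and it assumes a modern Python 3 where
html.unescape is available. It simply deletes anything that looks like a tag
and decodes entities, so it will trip over comments, scripts, and a ">"
inside an attribute value.

```python
import re
from html import unescape

# Hypothetical helper names; a crude sketch, not a robust HTML converter.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(html_text):
    """Drop anything that looks like a tag, then decode entities."""
    text = TAG_RE.sub("", html_text)
    text = unescape(text)
    # Collapse runs of spaces and tabs left behind by removed markup.
    return re.sub(r"[ \t]+", " ", text).strip()


if __name__ == "__main__":
    print(strip_tags("<DIV>Thanks!</DIV>"))
```

For simple, well-behaved pages this may be enough; for anything messier a
real parser is the safer choice.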

Thanks!
Curtis

>>> Justin Sheehy <dworkin@ccs.neu.edu> 03/29/00 10:47AM >>>
"Curtis Larsen" <curtis.larsen@Covance.Com> writes:

> Is there a fairly simple Python-ish way to convert an HTML file to
> text?

Check out the htmllib and formatter modules.  The HTMLParser and
DumbWriter classes in those respective modules should do what you need.

-Justin
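
The htmllib and formatter modules named above belong to the Python of this
thread's era; both have since been removed from the standard library
(htmllib in Python 3.0, formatter in 3.10). A rough present-day equivalent,
assuming Python 3 and its html.parser module, might look like the sketch
below. The class name TextExtractor, the function html_to_text, and the
choice of which tags count as block-level are all illustrative assumptions,
not anything taken from the original posts.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, putting line breaks around block-level tags."""

    # Assumed tag sets for this sketch; adjust to taste.
    BLOCK_TAGS = {"br", "p", "div", "li", "tr", "h1", "h2", "h3", "pre"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling the handlers.
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Ignore text that sits inside <script> or <style>.
        if not self._skip_depth:
            self.parts.append(data)

    def get_text(self):
        # Normalise whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html_text):
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.get_text()


if __name__ == "__main__":
    print(html_to_text("<DIV>Thanks!</DIV><DIV>Curtis</DIV>"))
```

Unlike a regex, the parser copes with attributes, entities, and mixed-case
tags without extra work, at the cost of a few more lines.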



_______________________________________________
Tutor maillist  -  Tutor@python.org
http://www.python.org/mailman/listinfo/tutor
