[I18n-sig] Chinese GB HTML to XML in Chinese

Matt Gushee <matt.gushee@fourthought.com>
Thu, 2 May 2002 09:56:05 -0600


On Tue, Apr 30, 2002 at 11:22:51PM -0700, RedPineseed wrote:

> I have a bunch of HTML in Chinese GB2312 encoding and
> I want to extract the useful Chinese info and put it into
> XML format with Chinese tags. I could filter
> the HTML with a raw string regexp and output XML in
> Chinese GB.

I'm sure people will be happy to help, but you need to give some
more details so we can understand what kind of problem you're having.
What version of Windows are you running? What version of Python?
What Python library or other tool are you using to convert to GB
encoding (Python does *not* support GB by default)? Finally, some
examples of your actual code would be very helpful.

> When I try to open that XML in XMLSpy, all
> is corrupted. Please help. Thanks.

Do you know for certain whether your version of XMLSpy supports 
GB encoding?
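
One thing worth ruling out is a mismatch between the bytes in the file and
the encoding declaration: an XML file with no declaration is read as UTF-8
(or UTF-16), so GB bytes will look corrupted. Below is a minimal sketch of
the kind of conversion you describe, not a definitive implementation. It
assumes a Python 3 interpreter, where a "gb2312" codec ships with the
standard library (on older versions a separately installed codec would be
needed). The function name html_to_xml, the element names, and the choice
to pull text out of <p> elements are all hypothetical, since I haven't seen
your HTML or your code.

```python
import re
from xml.sax.saxutils import escape


def html_to_xml(html_bytes, out_encoding="gb2312"):
    """Turn GB2312-encoded HTML bytes into a small XML document (as bytes).

    Hypothetical helper: it only collects the text of <p> elements.
    """
    # Decode the raw bytes first, so the regexp works on characters
    # rather than on individual bytes of multi-byte sequences.
    text = html_bytes.decode("gb2312")

    # Crude regexp extraction, as in the original question; \b keeps
    # <pre> and similar tags from matching.
    paragraphs = re.findall(r"<p\b[^>]*>(.*?)</p>", text, re.S | re.I)

    # The declared encoding must match the encoding of the bytes written.
    lines = ['<?xml version="1.0" encoding="%s"?>' % out_encoding, "<文档>"]
    for p in paragraphs:
        # escape() replaces &, < and > so the output stays well-formed.
        lines.append("  <段落>%s</段落>" % escape(p.strip()))
    lines.append("</文档>")

    return "\n".join(lines).encode(out_encoding)
```

If the tool turns out not to handle GB, calling html_to_xml(data, "utf-8")
writes the same document as UTF-8 with a matching declaration, which any
conforming XML parser is required to accept.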

-- 
Matt Gushee                               Consultant
matt.gushee@fourthought.com               +1 303 583 9900 x108
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Boulder, CO 80301-2537, USA
XML strategy, XML tools (http://4Suite.org), knowledge management