Re: getting rid of —

Simon Forman sajmikins at gmail.com
Fri Jul 3 00:40:42 EDT 2009


On Jul 2, 4:31 am, Tep <petshm... at googlemail.com> wrote:
> On 2 Jul., 10:25, Tep <petshm... at googlemail.com> wrote:
>
>
>
> > On 2 Jul., 01:56, MRAB <pyt... at mrabarnett.plus.com> wrote:
>
> > > someone wrote:
> > > > Hello,
>
> > > > how can I replace '—' sign from string? Or do split at that character?
> > > > Getting unicode error if I try to do it:
>
> > > > UnicodeDecodeError: 'ascii' codec can't decode byte 0x97 in position
> > > > 1: ordinal not in range(128)
>
> > > > Thanks, Pet
>
> > > > script is # -*- coding: UTF-8 -*-
>
> > > It sounds like you're mixing bytestrings with Unicode strings. I can't
> > > be any more helpful because you haven't shown the code.
>
> > Oh, I'm sorry. Here it is
>
> > def cleanInput(input):
> >     return input.replace('—', '')
>
> I also need:
>
> #input is html source code, I have problem with only this character
> #input = 'foo — bar'
> #return should be foo
> def splitInput(input):
>     parts = input.split(' — ')
>     return parts[0]
>
> Thanks!

Okay, people want to help you, but you must make it easy for us.

Post again with a small piece of code that is runnable as-is and that
causes the traceback you're talking about, AND post the complete
traceback too, as-is.

I just tried a bit of your code above in my interpreter here and it
worked fine:

|>>> data = 'foo — bar'
|>>> data.split('—')
|['foo ', ' bar']
|>>> data = u'foo — bar'
|>>> data.split(u'—')
|[u'foo ', u' bar']

Figure out the smallest piece of "html source code" that causes the
problem and include that with your next post.
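
In the meantime, here is a guess at what may be going on, as a minimal
sketch rather than a diagnosis. Byte 0x97 is the em dash in the
Windows-1252 (cp1252) encoding, so I am assuming your HTML arrives as
cp1252 bytes; that is only an assumption until you show real input.
The sample data and the names split_input and clean_input are made up
for illustration. The idea is to decode the bytes once, then use
Unicode strings everywhere:

```python
# Assumption: the HTML source is raw bytes encoded as Windows-1252 (cp1252),
# where the single byte 0x97 is the em dash (U+2014).
raw = b'foo \x97 bar'  # hypothetical sample input

# Decoding with the ASCII codec fails on byte 0x97, much like the
# UnicodeDecodeError quoted above.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decode once with the (assumed) correct codec, then work with text only.
text = raw.decode('cp1252')

EM_DASH = u'\u2014'


def split_input(text):
    """Return the part before ' <em dash> ' (hypothetical helper)."""
    return text.split(u' ' + EM_DASH + u' ')[0]


def clean_input(text):
    """Remove every em dash from the text (hypothetical helper)."""
    return text.replace(EM_DASH, u'')
```

With the sample above, split_input(text) gives u'foo'. If your data
turns out to be in some other encoding, pass that codec name to
decode() instead of 'cp1252'.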

HTH,
~Simon

You might also read this: http://catb.org/esr/faqs/smart-questions.html


