sqlite3 decode error
David Pratt
fairwinds at eastlink.ca
Tue Nov 8 18:30:26 EST 2005
Hi Jean-Paul, thanks for some really good advice. I'll take a look at the
project to see how this is handled. I was not aware of your wrapper
project for SQLite, so this is something new to look at too. I have
worked with SQLObject and also Django's db wrappers. In fact, this
question came out of an SQLObject implementation in RDFlib, since that
is where I discovered the issue in the way this backend behaves
with SQLite3. I have it working now, but I am only starting to warm
to the idea of unicode throughout. For example, the backend code
that I am trying to work with has the helpers shown below.
_tokey is a helper to bring things into the relational database;
_fromkey is a helper for extracting data from the database.
Commenting out the .decode("UTF-8") and value = value.decode("UTF-8")
allowed me to get this working, but I need to make this work with
unicode. My unicode experience is limited, and I am confused about
how to write unicode-compatible replacements for things like:
    return '<%s>' % ''.join(splituri(term.encode("UTF-8")))

def splituri(uri):
    if uri.startswith('<') and uri.endswith('>'):
        uri = uri[1:-1]
    if uri.startswith('_'):
        uid = ''.join(uri.split('_'))
        return '_', uid
    if '#' in uri:
        ns, local = rsplit(uri, '#', 1)
        return ns + '#', local
    if '/' in uri:
        ns, local = rsplit(uri, '/', 1)
        return ns + '/', local
    return NO_URI, uri

def _fromkey(key):
    if key.startswith("<") and key.endswith(">"):
        key = key[1:-1].decode("UTF-8")  ## Fails here when data extracted from database
        if key.startswith("_"):
            key = ''.join(splituri(key))
            return BNode(key)
        return URIRef(key)
    elif key.startswith("_"):
        return BNode(key)
    else:
        m = _literal.match(key)
        if m:
            d = m.groupdict()
            value = d["value"]
            value = unquote(value)
            value = value.decode("UTF-8")  ## Fails here when data extracted from database
            lang = d["lang"] or ''
            datatype = d["datatype"]
            return Literal(value, lang, datatype)
        else:
            msg = "Unknown Key Syntax: '%s'" % key
            raise Exception(msg)

def _tokey(term):
    if isinstance(term, URIRef):
        term = term.encode("UTF-8")
        if not '#' in term and not '/' in term:
            term = '%s%s' % (NO_URI, term)
        return '<%s>' % term
    elif isinstance(term, BNode):
        return '<%s>' % ''.join(splituri(term.encode("UTF-8")))
    elif isinstance(term, Literal):
        language = term.language
        datatype = term.datatype
        value = quote(term.encode("UTF-8"))
        if language:
            language = language.encode("UTF-8")
            if datatype:
                datatype = datatype.encode("UTF-8")
                n3 = '"%s"@%s&<%s>' % (value, language, datatype)
            else:
                n3 = '"%s"@%s' % (value, language)
        else:
            if datatype:
                datatype = datatype.encode("UTF-8")
                n3 = '"%s"&<%s>' % (value, datatype)
            else:
                n3 = '"%s"' % value
        return n3
    else:
        msg = "Unknown term Type for: %s" % term
        raise Exception(msg)
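One possible way to make the two failing lines in _fromkey tolerate both
cases is a small helper that decodes only when the value is still a byte
string. This is a minimal sketch, not RDFlib code: the name to_text is
hypothetical, and it assumes an interpreter where the name bytes exists
(Python 2.6 or later, where it is an alias of str, or Python 3).

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only if it is still a byte string.

    to_text is a hypothetical helper name, not part of RDFlib or sqlite3.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is the same type as str
        return value.decode(encoding)
    # Already text (unicode), which is what SQLite3 hands back for text data.
    return value


decoded = to_text(b"caf\xc3\xa9")   # UTF-8 bytes are decoded
unchanged = to_text(u"caf\xe9")     # text passes through untouched
```

With such a helper, the failing lines could read key = to_text(key[1:-1])
and value = to_text(value), so the same code path would serve a backend
that returns byte strings and one that returns unicode.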
In an unrelated question, it appears SQLite is also extremely flexible
about what types of data it can contain. When writing SQL in Postgres
I use the timestamp type, and I can use this in SQLite as well. In my work
with Django, the same information is mapped to the datetime type. Would you
be inclined to recommend one type over the other? If so, can
you explain the rationale for this choice? Many thanks.
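The flexibility mentioned above can be seen directly with the standard
sqlite3 module. The sketch below is only an illustration: the table name
events and the column names a and b are made up. It declares one column
as timestamp and one as datetime, stores the same text value in both, and
asks SQLite how each value was stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "events", "a" and "b" are hypothetical names used only for this sketch.
conn.execute("CREATE TABLE events (a timestamp, b datetime)")

stamp = u"2005-11-08 18:30:26"
conn.execute("INSERT INTO events VALUES (?, ?)", (stamp, stamp))

# typeof() reports the storage class SQLite actually used for each value.
storage = conn.execute("SELECT typeof(a), typeof(b) FROM events").fetchone()
a, b = conn.execute("SELECT a, b FROM events").fetchone()
conn.close()
```

Here both columns store the value as text and return it unchanged, so as
far as SQLite itself is concerned the declared name makes no difference
for this value. Wrappers layered on top (the Python bindings with type
detection enabled, SQLObject, Django) may attach their own meaning to the
declared name, which is not shown here.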
Regards,
David
On Tuesday, November 8, 2005, at 04:49 PM, Jean-Paul Calderone wrote:
> On Tue, 08 Nov 2005 16:27:25 -0400, David Pratt
> <fairwinds at eastlink.ca> wrote:
>> Recently I have run into an issue with sqlite where I encode strings
>> going into sqlite3 as utf-8. I guess by default sqlite3 is converting
>> this to unicode since when I try to decode I get an attribute error
>> like this:
>>
>> AttributeError: 'unicode' object has no attribute 'decode'
>>
>> The code and data I am preparing is to work on postgres as well as
>> sqlite so there are a couple of things I could do. I could always
>> store any data as unicode to any db, or test the data to determine
>> whether it is a string or unicode type when it comes out of the
>> database so I can deal with this possibility without errors. I will
>> likely take the first option but I am looking for a simple test to
>> determine my object type.
>>
>> if I do:
>>
>> >>> type('maybe string or maybe unicode')
>>
>> I get this:
>>
>> <type 'unicode'>
>>
>> I am looking for something that I can use in a comparison.
>>
>> How do I get the type as a string for comparison so I can do something
>> like
>>
>> if type(some_data) == 'unicode':
>>     do some stuff
>> else:
>>     do something else
>>
>
> You don't actually want the type as a string. What you seem to be
> leaning towards is the builtin function "isinstance":
>
> if isinstance(some_data, unicode):
>     # some stuff
> elif isinstance(some_data, str):
>     # other stuff
> ...
>
> But I think what you actually want is to be slightly more careful
> about what you place into SQLite3. If you are storing text data,
> insert it as a Python unicode string (with no NUL bytes, unfortunately;
> this is a bug in SQLite3, or maybe the Python bindings, I forget
> which). If you are storing binary data, insert it as a Python buffer
> object (e.g., buffer('1234')). When you take text data out of the
> database, you will get unicode objects. When you take bytes out, you
> will get buffer objects (which you can convert to str objects with
> str()).
>
> You may want to look at Axiom
> (<http://divmod.org/trac/wiki/DivmodAxiom>) to see how it handles each
> of these cases. In particular, the "text" and "bytes" types defined
> in the attributes module
> (<http://divmod.org/trac/browser/trunk/Axiom/axiom/attributes.py>).
>
> By only encoding and decoding at the border between your application
> and the outside world, and the border between your application and the
> data, you will eliminate the possibility for a class of bugs where
> encodings are forgotten, or encoded strings are accidentally combined
> with unicode strings.
>
> Hope this helps,
>
> Jean-Paul
> --
> http://mail.python.org/mailman/listinfo/python-list
>
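The text-versus-binary advice in the quoted message can be tried out with
the standard sqlite3 module. This is a sketch under stated assumptions:
the table and column names are made up, and sqlite3.Binary is used in
place of a direct buffer() call so the same lines run on Python 2 (where
it is buffer) and Python 3 (where it is memoryview).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "items", "label" and "payload" are hypothetical names for this sketch.
conn.execute("CREATE TABLE items (label TEXT, payload BLOB)")

# Text goes in as a unicode string, binary data as a Binary wrapper.
conn.execute(
    "INSERT INTO items VALUES (?, ?)",
    (u"caf\xe9", sqlite3.Binary(b"1234")),
)

label, payload = conn.execute("SELECT label, payload FROM items").fetchone()
conn.close()

label_is_text = isinstance(label, type(u""))  # text comes back as unicode
payload_bytes = bytes(payload)                # buffer/bytes -> plain byte string
```

Nothing is encoded or decoded by hand at any point: the text value stays
unicode on both sides of the database, and only the binary column is
handled as bytes.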