data structure suggestion (native python datatypes or sqlite; compound select)
Vlastimil Brom
vlastimil.brom at gmail.com
Fri Sep 17 04:44:43 EDT 2010
2010/9/17 MRAB <python at mrabarnett.plus.com>:
> On 17/09/2010 00:56, Vlastimil Brom wrote:
>>
>> 2010/9/17 MRAB<python at mrabarnett.plus.com>:
>>>
>>> On 16/09/2010 23:11, Vlastimil Brom wrote:
>>>>
>>>> ...
>>>> I put together some code, which works as expected, but I suspect
>>>> somehow, that there must be better ways of doing it.
>>>>
>>>> Two things I am not quite clear about are using the placeholders for
>>>> the data identifiers and "chaining" the SELECT parameters.
>>>>
>>>> I Couldn't find a way to use "?" placeholder for table or column
>>>> names, hence I ended up using string interpolation for them and
>>>> placeholders for the data values, like.
>>>> curs.execute('SELECT * FROM "%s" WHERE "%s"==?' % (text_name,
>>>> index_col), (text_index,))
>>>> is there a better way or is it not supposed to supply these
>>>> identifiers programatically?
>>>>
>>> You would normally expect the structure of the database to be fixed and
>>> only the contents to vary.
>>>
>>>> For getting the matching text indices given the tags, tag_values
>>>> combination I ended up with a clumsy query:
>>>>
>>>> combined_query_list = ['SELECT "%s" FROM "%s" WHERE "%s"==?' %
>>>> (index_col, text_name, tag) for tag in tags]
>>>> sql_query = " INTERSECT ".join(combined_query_list)
>>>> curs.execute(sql_query, tag_values)
>>>>
>>>> which produces e.g.:
>>>> SELECT "ind" FROM "n" WHERE "KC"==? INTERSECT SELECT "ind" FROM "n"
>>>> WHERE "VN"==?
>>>>
>>>> or alternatively:
>>>>
>>>> select_begin = 'SELECT "%s" FROM "%s" WHERE ' % (index_col, text_name)
>>>> where_subquery = " AND ".join('"%s"==?' % (tag,) for tag in tags)
>>>> sql_query = select_begin + where_subquery
>>>>
>>>> with the resulting query string like:
>>>> SELECT "ind" FROM "n" WHERE "KC"==? AND "VN"==? ('12', '1')
>>>>
>>>> (BTW, are these queries equivalent, as the outputs suggest, or are
>>>> there some distinctions to be aware of?)
>>>>
>>>> Anyway, I can't really believe, this would be the expected way ...
>>>>
>>> If you're selecting rows of a table then using 'AND' would seem the
>>> obvious way.
>>>
>>>
>> Thanks for the answer,
>> Well, that may be a part of the problem, the database structure is
>> going to be fixed once I'll have the text sources complete, but I was
>> trying to keep it more general, also allowing the identifiers to be
>> passed programmatically (based on the tagged text in question).
>>
>> yes, I am just selecting rows - based on the combination of the column
>> values (which, I guess, might be an usual database approach(?).
>> However, I was unsure, whether it is usual to construct the query
>> string this way - partly using string interpolation or sequence
>> joining.
>> Or should there normally be no need for construct like the above and I
>> am doing something wrong in a more general sense?
>>
> In general you'd have a fixed database structure and fixed queries.
> You'd design it so that you wouldn't have different columns for
> different tagged texts, forcing you to change the database structure
> when texts were added or removed, but no-one will come around to your
> house to stop you creating queries dynamically. :-)
> --
> http://mail.python.org/mailman/listinfo/python-list
>
Ok, thanks for confirming my suspicion :-),
Now I have to decide whether I shall use my custom data structure,
where I am on my own, or whether using an sql database in such a
non-standard way has some advantages...
The main problem indeed seems to be the fact, that I consider the
tagged texts to be the primary storage format, whereas the database is
only means for accessing it more conveniently.
Thanks again,
vbr
More information about the Python-list
mailing list