<div><blockquote class="gmail_quote" style="margin-top: 0; margin-right: 0; margin-bottom: 0; margin-left: 0; margin-left: 0.80ex; border-left-color: #cccccc; border-left-width: 1px; border-left-style: solid; padding-left: 1ex">
<br></blockquote></div>Thanks for further comments David,<br><br>You are right, the choice of an appropriate data structure isn't easy; I described the requirements in an earlier post:<br><a href="http://mail.python.org/pipermail/python-list/2007-December/469506.html">http://mail.python.org/pipermail/python-list/2007-December/469506.html</a><br>
<br>Here is a slightly modified sample of the text source I currently use in my present (ad hoc and quite dirty) code:<br><br><KC 12><KNJ 13><VCC 0><VN rn_1_vers_1><FN 21rb><b>Wi den L...n<br>
<VN rn_2_vers_1>m...k gein den P<u>ra</u>gern.</b> <br><br><VNJ 1><VCC 1><VN 1>Wen dy h<u>er</u> czu T... vf dy vient<br><VNJ 2><VCC 2><VN 2>w<u>er</u>n gesampt czu h<u>er</u>tin strit,<br>
<VNJ 3><VN 3>dez m...s vs gein obint<br><VNJ 4><VCC 3><VN 4>W... stunt s...t.<br><VNJ 5><VCC 3><VN 5>Doch gar Sthir czu em kam,<br><VNJ 6><VCC 5><VN 6>h...g W... sich ken em nom.<br>
<VNJ 7><VCC 6><VN 7>Ez irgink ubil den L...:<br><VNJ 8><VCC 7><VN 8>S... slug W...u<u>m</u> daz houbt ab vil gern.<br><VNJ 9><VN 9>Den P<u>ra</u>g<u>er</u>n mochtin dy L...n<u>er</u> nit intken,<br>
<VNJ 10><VCC 8><VN 10>si slugin si, das si plutis hin runnen.<br><br><VNJ 00><VCC 9><VN _rn_1_vers_11><b>Wi de<u>m</u> Stracka geschach,<br><VN rn_2_vers_11>du er do heim sin husf<u>ro</u>win sach.</b> <br>
<br><VNJ 11><VCC 9><VN 11>Aus allin L...t<u>er</u>n ein<u>er</u> hin quom,<br><VNJ 12><VCC 10><VN 12><FN 21va>der waz S...<u>ra</u>ka gna<u>n</u>t mit de<u>m</u> nom.<br>
<VNJ 13><VCC 11><VN 13>Em ein vetil den rat gab, ...<br><br>========================================<br>The corresponding data stored as a dict look like the following dict (here sorted, at least on the highest level):<br>
(<u>, <i>, <b> ... are ignored here, these are used for the text formatting for the gui and aren't retrieved any more)<br><br>{0: {u'KC': u'12', u'VCC': u'0', u'KNJ': u'13', u'FN': u'21rb', u'VN': u'rn_1_vers_1'},<br>
13: {u'KC': u'12', u'VCC': u'0', u'KNJ': u'13', u'VN': u'rn_2_vers_1', u'FN': u'21rb'},<br>39: {u'KC': u'12', u'VCC': u'1', u'KNJ': u'13', u'VN': u'1', u'VNJ': u'1', u'FN': u'21rb'},<br>
71: {u'KC': u'12', u'VCC': u'2', u'KNJ': u'13', u'FN': u'21rb', u'VNJ': u'2', u'VN': u'2'},<br>102: {u'KC': u'12', u'VCC': u'2', u'KNJ': u'13', u'FN': u'21rb', u'VNJ': u'3', u'VN': u'3'},<br>
126: {u'KC': u'12', u'VCC': u'3', u'KNJ': u'13', u'VN': u'4', u'VNJ': u'4', u'FN': u'21rb'},<br>144: {u'KC': u'12', u'VCC': u'3', u'KNJ': u'13', u'FN': u'21rb', u'VNJ': u'5', u'VN': u'5'},<br>
171: {u'KC': u'12', u'VCC': u'5', u'KNJ': u'13', u'VN': u'6', u'VNJ': u'6', u'FN': u'21rb'},<br>199: {u'KC': u'12', u'VCC': u'6', u'KNJ': u'13', u'FN': u'21rb', u'VNJ': u'7', u'VN': u'7'},<br>
224: {u'KC': u'12', u'VCC': u'7', u'KNJ': u'13', u'VN': u'8', u'VNJ': u'8', u'FN': u'21rb'},<br>264: {u'KC': u'12', u'VCC': u'7', u'KNJ': u'13', u'VN': u'9', u'VNJ': u'9', u'FN': u'21rb'},<br>
307: {u'KC': u'12', u'VCC': u'8', u'KNJ': u'13', u'FN': u'21rb', u'VNJ': u'10', u'VN': u'10'},<br>348: {u'KC': u'12', u'VCC': u'9', u'KNJ': u'13', u'VN': u'_rn_1_vers_11', u'VNJ': u'00', u'FN': u'21rb'},<br>
373: {u'KC': u'12', u'VCC': u'9', u'KNJ': u'13', u'FN': u'21rb', u'VNJ': u'00', u'VN': u'rn_2_vers_11'},<br>409: {u'KC': u'12', u'VCC': u'9', u'KNJ': u'13', u'VN': u'11', u'VNJ': u'11', u'FN': u'21rb'},<br>
444: {u'KC': u'12', u'VCC': u'10', u'KNJ': u'13', u'VN': u'12', u'VNJ': u'12', u'FN': u'21va'},<br>480: {u'KC': u'12', u'VCC': u'11', u'KNJ': u'13', u'FN': u'21va', u'VNJ': u'13', u'VN': u'13'}}<br>
<br>The keys are indices of the places in text, where the value of some tag/property ... changes; also the not changing values are copied in the inner dicts, hence after determining the index of the first such "boundary" preceeding the given text position (using bisect), the complete parameters can be retrieved with a single dict lookup.<br>
The other way seems much more complicated with this dict structure. I.e. it should be possible to find the index/or all indices/or the lowest index given some combinations of tag values - e.g. <br>with the input: u'FN': u'21va' >> 444 should be found (as the lowest index with this value)<br>
for u'KC': u'12' AND u'VN': u'3' >> 102 <br>(there are of course several other combinations to search for)<br> I couldn't find an effective way to lookup the specific values of the inner dict (other than nested loops), probably using a set of tuples instead the inner dict would allow the checks for intersection with the input data, but this also seems quite expensive, as it requires the loop with checks over the entire data ... the same would probably be true for multiple reversed dicts keyed on the tags and containing the indices and the corresponding values (but I actually haven't tested the latter two approaches). <br>
<br>The approach with sqlite I'm testing now simply contains columns for all relevant "tags" plus a column for text index; this way I find it quite easy to query the needed information in both directions (index >> tags; tag(s) >> index/indices) with the db selects - even as a beginner with SQL.<br>
<br>The exact set of tags will be complete with the availability of all texts, then the gui will be adapted to make the user selections available (comboboxes). The searches are restricted to the predefined "properties", they are not manually queried. <br>
With the dynamic building of the db I just tried, not to hardcode all the tags, but it would be later possible to create the database tables and columns also in a controlled "static" way.<br> <br>Comparing the two versions I have, the dictionary approach really seems quite a bit more complicated (the data as well as the functions for the retrieval), than the sqlite3 version. <br>
<br>For this app I really actually don't need a persistent db, nor should it be shared etc. The only used thing are the efficient lookup possibilities, which seemed to do what I needed (simply using intersect selects). Are there any ways how to implements this efficiently using the python builtin types?<br>
<br>Greetings,<br> Vlasta<br><br>