[Tutor] Fwd: make a sqlite3 database from an ordinary text file

Roel Schroeven roel at roelschroeven.net
Mon Jun 20 04:37:45 EDT 2022


Op 20/06/2022 om 1:43 schreef avi.e.gross at gmail.com:
> I may not be looking at this right since much happens in functions not seen
> but does cur.execute(many) expect two items of text or two items of type
> integer or will it handle a list or tuple with arbitrary numbers of contents
> and so on, is what makes the design decision.
cur.executemany() takes an iterable which on each iteration yields a 
number of items corresponding to the number of placeholders ("?") in the 
query. You can use any datatype that sqlite3 understands, but care 
should be taken to use the type you need. Other than other databases, 
sqlite3 stores data as the type you give it, rather than the type 
specified in the database definition (in a sense sqlite3 is dynamically 
typed like Python, instead of statically typed like other SQL 
implementations).

So AFAICS the code is wrong on that point: it should convert the data to 
integers before handing them over to the database.

> Deeper within that function, it presumable maps your passed function to code
> that may do,
>
> First, Second = f()
>
> And so on. Or it may just take a single result or other choices. It is a bit
> confusing to pass a filename as someone pointed out.
This cur.executemany() call is the equivalent of:

     for time, nr_pieces in data_generator('abcd.txt'):
         cur.execute("insert into workdata values (?, ?)", (time, 
nr_pieces))

But a bit shorter and with presumably less overhead.
> In what follows, either I am confused or the one asking the question is.
>
> The goal of the iterator seems to be to avoid reading all the data in at
> once, right. Otherwise, why bother when you can do something like reading in
> the entire file and splitting and putting the results in something like a
> list structure, or perhaps in something from numpy.
>
> So why does the code use readline() (no 's') to read in a single line and
> only once?
>
> As I read the somewhat confusing code, the goal is to open the file the
> first time and read a line and throw the results away as it is not clear
> where it is saved. Then the goal is to loop on nothing (or maybe one line)
> and yield the results of splitting it and pause till called again. But since
> only no or maybe one line have been received, the call to iterate should
> just exit as the loop is done.
>
> So I think reading a single line should be moved down within the loop!
That readline() call is to skip the first line, because it contains 
column names instead of actual data. After that the code iterates over 
the file object which yields each remaining line on each iteration, 
which is then split and yielded for use by cur.executemany().
> And you need some kind of test to know when there are no remaining lines in
> the file or the current line is empty and instead of yielding a non-existent
> result, you should return what the iteration protocol needs and close the
> file, albeit the with does that for you silently as part of that protocol.
No need, that's all taken care of automatically. "for line in obj" stops 
if there are no more lines, and as you say the with statement closes the 
file.
> But as others have noted, it is not clear other than as an exercise, if this
> iterator is needed. Your other modules may supply something for you.
Why not use an iterator? Once you get the hang of it, using iterators is 
often just as easy as using lists; with the advantage that a program 
that uses iterators where it matters works for small and large datasets 
without any problem because it doesn't need large amounts of memory to 
store its data.

-- 
"Peace cannot be kept by force. It can only be achieved through understanding."
         -- Albert Einstein



More information about the Tutor mailing list