[Tutor] make a sqlite3 database from an ordinary text file

avi.e.gross at gmail.com avi.e.gross at gmail.com
Mon Jun 20 18:02:42 EDT 2022


Dennis,

While your comments are quite valid and reasonable and make me wonder about
all sorts of techniques hackers can use that often must be defended against,
wouldn't it be nice if people doing homework would actually show us the
problem rather than trying to give a few parts?

If this is meant to be serious code to be heavily used in real world
situations, such as for a job at Google, it does often require some
bulletproofing as you describe. But is it that or just solving a problem I a
sandbox to learn how to use a few features, or is it more akin to homework?

I am not saying the case in point is homework but it has some signs I
interpret that way. It may be the kind of homework where you do independent
study from a book and are trying the questions at the end of the chapter and
really want to learn, not where you want to get someone to just do it for
you in a class and not learn to do it yourself.  It seems to both be open to
some packages and advanced ideas while we get steered away from others. Some
of the replies probably are way off base if we only knew the context.

I had to go out for the afternoon at this point and note that with amusement
that a book I was reading about web development happened to cover various
ways to store and query data from a browser on the local machine and had a
section where a similar javascript interface tp something like SQL LITE was
used that clearly showed how the API expected multiple names to be given as
a single argument with something like a list. So I am not sure why our
persistent questioner said the code he tried did not work.

I look for patterns and anomalies that might help me GUESS things and I saw
some here that may, and may not, mean anything.

For example, why is it required to use a text file with what looks like a
space or tab or whitespace separator? Alan has suggested ways to 
Change the file to something like a CSV that can be done directly in the
non-python arena of the database. Well, yes, that is a perfectly reasonable
approach if you are not told the goal is to do it all in a circumscribed
subset of python!

In real life, other than perhaps efficiency considerations, many people
would start with existing tools such as numpy and pandas and read the darn
data in natively,  specifying a delimiter as in:

pd.read_csv('file.csv', delimiter=' ')

is a fairly easy way to read his "text file" unless a requirement is made to
read it in a line at a time.  Functions like the above typically read quite
a bit of the file to determine what types the columns of data should be, so
they probably are not designed to act much like generators. But I suspect
they could easily be if the goal was not to read in all of the file. You
could read in a thousand lines or so and serve those until you ran low and
got another batch and so on. 

So is it really a requirement to read a line at a time and not to make sure
all columns are consistent? Apparently not.

Nor were we told of any requirements that this be SQL safe. Yes, tools can
ensure that but if this was a beginners course, I might add requirements
like take the two text items you got from the split and checks them several
ways before using them. They should all be digits, and no longer than some
length of say 8 characters. If you allow commas or underscores, strip them.
If they can be decimals, or scientific notation, it gets more complex but
sometimes something as simple as calling a function that converts the
contents to a number and then converting them back, can reject or repair
many such strings and possibly make them safe to use by interpolating the
contents directly.

I can not be sure but I think the main problem in this example was poor
programming based on what may not have been a well-planned design. 

Loops need some things done before and some within and some after and you
need to segregate parts of your plan so each is done as often as needed
(such as once) and in the right place. 

In my view, this assignment of self-motivate project is not that complicated
if approached systematically with a knowledge of how things can be done and
some minor skills at debugging such as strategically placed print statement.
The fact that several of us are going back and forth may partially be due to
some English language issues and might work better if we communicated in
their native language. I suspect none of us participating does! LOL! I do
not speak Punjabi or even Hindi. So patience and some awareness the other is
working hard to try to communicate is necessary. 

Where are we? Several possible methods have been offered using
advice/pseudocode or actual python code and if none of them works, we need
more than just being told the user could not get it to work. WHERE did it
fail? What were the context of variables? What error message? Did they
cut/paste and adjust indentation correctly, or perhaps type something
directly but wrong?

There is only so much time and effort being offered here and it is, in my
case, more about teaching someone so they can then do it again on their own,
than producing unpaid work or something that shows off what we can do but
not what they need.

-----Original Message-----
From: Tutor <tutor-bounces+avi.e.gross=gmail.com at python.org> On Behalf Of
Dennis Lee Bieber
Sent: Monday, June 20, 2022 12:18 PM
To: tutor at python.org
Subject: Re: [Tutor] make a sqlite3 database from an ordinary text file

On Mon, 20 Jun 2022 06:29:58 +0530, Manprit Singh
<manpritsinghece at gmail.com> declaimed the following:

>See i know this can be done very easily using numpy and Pandas. But I 
>am trying to do it with Core Python only.
>
>cur.execute("create table workdata(? INT, ? INT)", ("Time", "Pieces"))
>
>I just want to insert column names (where these column names are  read 
>from text file abcd.txt 1st line)  using a question mark style 
>placeholder.  Pls let me know if anything is possible .
>

	You DON'T!

	As stated in my previous post, that is manipulating the database
schema
-- which should never rely upon values found in data files due to the
possibility of an SQL injection attack.

https://www.explainxkcd.com/wiki/index.php/327:_Exploits_of_a_Mom

	The purpose of using the parameterized placeholders is that the
interface can "sanitize" (escape/quote) characters in the DATA that have
meaning in SQL.

	The SQLite3 adapter appears to inherently block using parameterized
queries to modify the database schema. If you REALLY must generate the
schema using data from a file, you have to do all of the that in Python,
BEFORE passing it to .execute().

>>> import sqlite3 as db
>>> con = db.connect("junk.db")
>>> cur = con.cursor()
>>> cur.execute("create table workdata (? INT, ? INT)", ("Time", 
>>> "Pieces"))
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
sqlite3.OperationalError: near "?": syntax error
>>> cur.execute("create table workdata (%s INT, %s INT)" % ("Time", 
>>> "Pieces"))
<sqlite3.Cursor object at 0x000002B642A8D030>
>>> cur.execute("insert into workdata (Time, Pieces) values (?, ?)", 
>>> (123, "who"))
<sqlite3.Cursor object at 0x000002B642A8D030>
>>>

	Note that the second .execute() is using Python string interpolation
to create the schema definition SQL, not parameterized SQLite3 operations.
The third .execute() shows that parameterized statements work with DATA.
And, per the warning about SQL injection attacks, you should NEVER use
string interpolation (or other formatting operations) in Python to work with
data for the database.


-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
	wlfraed at ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/

_______________________________________________
Tutor maillist  -  Tutor at python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor



More information about the Tutor mailing list