Extracting dataframe column with multiple conditions on row values
Edmondo Giovannozzi
edmondo.giovannozzi at gmail.com
Sat Jan 8 08:00:43 EST 2022
Il giorno sabato 8 gennaio 2022 alle 02:21:40 UTC+1 dn ha scritto:
> Salaam Mahmood,
> On 08/01/2022 12.07, Mahmood Naderan via Python-list wrote:
> > I have a csv file like this
> > V0,V1,V2,V3
> > 4,1,1,1
> > 6,4,5,2
> > 2,3,6,7
> >
> > And I want to search two rows for a match and find the column. For
> > example, I want to search row[0] for 1 and row[1] for 5. The corresponding
> > column is V2 (which is the third column). Then I want to return the value
> > at row[2] and the found column. The result should be 6 then.
> Not quite: isn't the "found column" also required?
> > I can manually extract the specified rows (with index 0 and 1 which are
> > fixed) and manually iterate over them like arrays to find a match. Then I
> Perhaps this idea has been influenced by a similar solution in another
> programming language. May I suggest that the better-answer you seek lies
> in using Python idioms (as well as Python's tools)...
> > key1 = 1
> > key2 = 5
> Fine, so far - excepting that this 'problem' is likely to be a small
> part of some larger system. Accordingly, consider writing it as a
> function. In which case, these two "keys" will become
> function-parameters (and the two 'results' become return-values).
> > row1 = df.iloc[0] # row=[4,1,1,1]
> > row2 = df.iloc[1] # row=[6,4,5,2]
> This is likely not native-Python. Let's create lists for 'everything',
> just-because:
>
> >>> headings = [ "V0","V1","V2","V3" ]
> >>> row1 = [4,1,1,1]
> >>> row2 = [6,4,5,2]
> >>> results = [ 2,3,6,7 ]
>
>
> Note how I'm using the Python REPL (in a "terminal", type "python" (as
> appropriate to your OpSys) at the command-line). IMHO the REPL is a
> grossly under-rated tool, and is a very good means towards
> trial-and-error, and learning by example. Highly recommended!
>
>
> > for i in range(len(row1)):
>
> This construction is very much a "code smell" for thinking that it is
> not "pythonic". (and perhaps the motivation for this post)
>
> In Python (compared with many other languages) the "for" loop should
> actually be pronounced "for-each". In other words when we pair the
> code-construct with a list (for example):
>
> for each item in the list the computer should perform some suite of
> commands.
>
> (the "suite" is everything 'inside' the for-each-loop - NB my
> 'Python-betters' will quickly point-out that this feature is not limited
> to Python-lists, but will work with any :iterable" - ref:
> https://docs.python.org/3/tutorial/controlflow.html#for-statements)
>
>
> Thus:
>
> > for item in headings: print( item )
> ...
> V0
> V1
> V2
> V3
>
>
> The problem is that when working with matrices/matrixes, a math
> background equips one with the idea of indices/indexes, eg the
> ubiquitous subscript-i. Accordingly, when reading 'math' where a formula
> uses the upper-case Greek "sigma" character, remember that it means "for
> all" or "for each"!
>
> So, if Python doesn't use indexing or "pointers", how do we deal with
> the problem?
>
> Unfortunately, at first glance, the pythonic approach may seem
> more-complicated or even somewhat convoluted, but once the concepts
> (and/or the Python idioms) are learned, it is quite manageable (and
> applicable to many more applications than matrices/matrixes!)...
> > if row1[i] == key1:
> > for j in range(len(row2)):
> > if row2[j] == key2:
> > res = df.iloc[:,j]
> > print(res) # 6
> >
> > Is there any way to use built-in function for a more efficient code?
> This is where your idea bears fruit!
>
> There is a Python "built-in function": zip(), which will 'join' lists.
> NB do not become confused between zip() and zip archive/compressed files!
>
> Most of the time reference book and web-page examples show zip() being
> used to zip-together two lists into a single data-construct (which is an
> iterable(!)). However, zip() will actually zip-together multiple (more
> than two) "iterables". As the manual says:
>
> «zip() returns an iterator of tuples, where the i-th tuple contains the
> i-th element from each of the argument iterables.»
>
> Ah, so that's where the math-idea of subscript-i went! It has become
> 'hidden' in Python's workings - or putting that another way: Python
> looks after the subscripting for us (and given that 'out by one' errors
> in pointers is a major source of coding-error in other languages,
> thank-you very much Python!)
>
> First re-state the source-data as Python lists, (per above) - except
> that I recommend the names be better-chosen to be more meaningful (to
> your application)!
>
>
> Now, (in the REPL) try using zip():
>
> >>> zip( headings, row1, row2, results )
> <zip object at 0x7f655cca6bc0>
>
> Does that seem a very good illustration? Not really, but re-read the
> quotation from the manual (above) where it says that zip returns an
> iterator. If we want to see the values an iterator will produce, then
> turn it into an iterable data-structure, eg:
>
> >>> list( zip( headings, row1, row2, results ) )
> [('V0', 4, 6, 2), ('V1', 1, 4, 3), ('V2', 1, 5, 6), ('V3', 1, 2, 7)]
>
> or, to see things more clearly, let me re-type it as:
>
> [
> ('V0', 4, 6, 2),
> ('V1', 1, 4, 3),
> ('V2', 1, 5, 6),
> ('V3', 1, 2, 7)
> ]
>
>
> What we now see is actually a "transpose" of the original 'matrix'
> presented in the post/question!
>
> (NB Python will perform this layout for us - read about the pprint library)
>
>
> Another method which can also be employed (and which will illustrate the
> loop required to code the eventual-solution(!)) is that Python's next()
> will extract the first row of the transpose:
>
> >>> row = next( zip( headings, row1, row2, results ) )
> >>> row
> ('V0', 4, 6, 2)
>
>
> This is all-well-and-good, but that result is a tuple of four items
> (corresponding to one column in the way the source-data was explained).
>
> If we need to consider the four individual data-items, that can be
> improved using a Python feature called "tuple unpacking". Instead of the
> above delivering a tuple which is then assigned to "row", the tuple can
> be assigned to four "identifiers", eg
>
> >>> heading, row1_item, row2_item, result= next( zip( headings, row1,
> row2, results ) )
>
> (apologies about email word-wrapping - this is a single line of Python-code)
>
>
> Which, to prove the case, could be printed:
>
> >>> heading, row1_item, row2_item, result
> ('V0', 4, 6, 2)
>
>
> (ref:
> https://docs.python.org/3/tutorial/datastructures.html?highlight=tuple%20unpacking#tuples-and-sequences)
>
>
> Thus, if we repeatedly ask for the next() row from the zip-ped
> transpose, eventually it will respond with the row starting 'V2' - which
> is the desired-result, ie the row containing the 1, the 5, and the 6 -
> and if you follow-through using the REPL, will be clearly visible.
>
>
> Finally, 'all' that is required, is a for-each-loop which will iterate
> across/down the zip object, one tuple (row of the transpose) at a time,
> AND perform the "tuple-unpacking" all in one command, with an
> if-statement to detect the correct row/column:
>
> >>> for *tuple-unpacking* in *zip() etc*:
> ... if row1_item == *what?* and row2_item == *what?*
> ... print( *which* and *which identifier* )
> ...
> V2 6
>
> Yes, three lines. It's as easy as that!
> (when you know how)
>
> Worse: when you become more expert, you'll be able to compress all of
> that down into a single-line solution - but it won't be as "readable" as
> is this!
>
>
> NB this question has a 'question-smell' of 'homework', so I'll not
> complete the code for you - this is something *you* asked to learn and
> the best way to learn is by 'doing' (not by 'reading').
>
> However, please respond with your solution, or any further question
> (with the next version of the code so-far, per this first-post - which
> we appreciate!)
>
> Regardless, you asked 'the right question' (curiosity is the key to
> learning) and in the right way/manner. Well done!
>
>
> NBB the above code-outline does not consider the situation where the
> search fails/the keys are not found!
>
>
> For further information, please review:
> https://docs.python.org/3/library/functions.html?highlight=zip#zip
>
> Also, further to the above discussion of combining lists and loops:
> https://docs.python.org/3/tutorial/datastructures.html?highlight=zip#looping-techniques
>
> and with a similar application (to this post):
> https://docs.python.org/3/faq/programming.html?highlight=zip#how-can-i-sort-one-list-by-values-from-another-list
>
> --
> Regards,
You may also transpose your dataset. Then the index will become your column name and the column name become your index:
To read your dataset:
import pandas as pd
import io
DN = """
V0,V1,V2,V3
4,1,1,1
6,4,5,2
2,3,6,7
"""
df = pd.read_csv(io.StringIO(DN))
Transpose it:
dft = df.T
Find all the index with your condition:
idt = (dft[0] == 1) & (dft[1] == 5)
Print the columns that satisfy your condition:
print(dft[idt])
As you see, without explicit loop.
More information about the Python-list
mailing list