Taking two and more data frames and extracting data on unique keys in python

Danish Hussain danish21h at gmail.com
Sun Jun 5 06:56:00 EDT 2016

down vote
Its a bit long question have patience. firstly I have 2 data frames one in which i have name of a guy and pages liked by him in columns. So no. of columns will be different for different person here is the example. 1st column is the name of user.Then pages liked by him is stored across the row.So no. of columns for 'random guy' will be different from 'mank rion'. 'BlackBick' , '500 Startups' e.t.c are name of the page. let say name of this data frame is User_page

random guy      BlackBuck            GiveMeSport    Analytics Ninja 
mank nion       DJ CHETAS            Celina Jaitly  Gurkeerat Singh
pop rajuel      WOW Editions         500 Startups   Biswapati Sarkar
Roshan ghai     MensXP               No Abuse       the smartian 
Now I have another Data frame in which is kind of same as upper one but in the place of page's name there is a category of page.you might now there are different category of pages on fb. so let say 'BlacBuck''s category is 'Transport/Freight'. There are pages with same name and different category.That is why i cant use name directly as key this is how my data frame looks like.Let say name of this data frame User_category.

random guy      Transport/Freight    Sport      Insurance Company 
mank nion       Arts/Entertainment   Actress    Actor/Director
pop rajuel      Concert Tour         App Page   Actor/Director
Roshan ghai     News/Media Website   Community  Public Figure  
Now I have two more Data frames. one in which I have name of fb pages as 1st column and 162 more columns with some tag for each page there is value 1 for i*j element if ith page comes in to jth tag otherwise left empty so it will look like.let say name of this dataframe is Page_tag

    name of page              tag 1        tag2        tag3
    BlackBuck                     1          1             
    GiveMeSport                   1                      1
    Analytics Ninja               1                      1
    DJ CHETAS                                1           1
the another one have name of categories as 1st column and same 162 as further. like this. let say name of this dataframe is Category_tag.

   category_name              tag 1        tag2        tag3
    Sport                                     1           1
    App Page                      1                       1
    Actor/Director                1                                        
    Public Figure                         1               1
Now what I have to get the tag counts for each user from pages he has liked. for that first I have to first check that the page which he has liked where exist in data frame of Page_tag which is 3rd dataframe in my question if it exist there take the counts of tags that how many times a specific tags appeared for that user.this is first step if not found the name of page as no. of pages in Page_tag dataframe(3rd one) is limited. I will go to category of page (from 2nd dataframe in this question) for the pages which are left out and for that category i will count the tags count for the specific user from dataframe named Category_tags(4th dataframe in this question) and sum the tag count and my output something like this. Output

username             tag1                   tag2           tag3 
random guy              1                      2             2 
mank nion               2                      1             3
pop rajuel              4                      0             2 
Roshan ghai             0                      2             1
a i*j element on this dataframe shows no. times that the jth tag appears for ith user. I have written code for this and more in R i am stuck in this particular step. The code of R wasnt optimal as i used loops many time. I wanted to rhis optimally, hopefully can be done in pandas. Please me know if clarification is needed. Any help will be appreciated. Thank you.

More information about the Python-list mailing list