Fwd: Unable to convert pandas object to string
Paul Barry
paul.james.barry at gmail.com
Sat Jun 24 07:31:48 EDT 2017
Forgot to include this reply to the list (as others may want to comment).
---------- Forwarded message ----------
From: Paul Barry <paul.james.barry at gmail.com>
Date: 24 June 2017 at 12:21
Subject: Re: Unable to convert pandas object to string
To: Bhaskar Dhariyal <dhariyalbhaskar at gmail.com>
Note that .info(), according to its docs, gives you a "Concise summary of a
DataFrame". Everything is an object in Python, including strings, so the
output from .info() is technically correct (but maybe not very helpful in
your case).
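
As a minimal sketch of that point (the two-row frame below is a made-up
stand-in for train.csv, not the real data), a column can report dtype
"object" while every value in it is an ordinary Python str:

```python
import pandas as pd

# Hypothetical stand-in for two of the columns in train.csv.
df = pd.DataFrame(
    {
        "desc": pd.Series(
            ["A short film about bees", "Handmade oak tables"], dtype=object
        ),
        "goal": [500.0, 1200.0],
    }
)

# pandas reports the column's dtype as "object" ...
desc_dtype = df["desc"].dtype

# ... but each individual value is a plain Python str.
value_types = df["desc"].map(type).unique().tolist()

print(desc_dtype)
print(value_types)
```

Checking the type of the values, rather than the dtype of the column, is
what tells you whether astype(str) is needed at all.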
As I've shown, we can work out that the data you want to work with is in
fact a string, so I've added some code to my notebook to show you how to
tokenize the first row of data. This should get you started on doing this
to the rest of your data.
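
For readers without the attachment, here is a rough sketch of tokenizing
one row. It is not the code from the notebook: the sample text is invented,
and the regular-expression tokenizer is just one simple choice (a library
tokenizer may suit your preprocessing better).

```python
import re

import pandas as pd

# Hypothetical sample; the real "desc" values come from train.csv.
df = pd.DataFrame(
    {"desc": pd.Series(["A short film about bees!", "Handmade oak tables"], dtype=object)}
)

# Select the first row by position, not by index label.
first_desc = df["desc"].iloc[0]

# Lower-case the text and keep runs of word characters as tokens.
tokens = re.findall(r"\w+", first_desc.lower())

print(tokens)
```

Using .iloc here is deliberate: the .info() output shows 171594 entries
with index labels running only from 0 to 63464, so labels are probably
repeated and .loc[0] could return more than one row.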
Note, too, that some of the data in these specific columns contains
something other than a string, so you'll need to clean that up first (see
the end of the updated notebook, attached, for how I worked out that this
was indeed the case).
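
One possible way to spot such values, sketched on invented data (the
numbers 42 and 3.5 below are placeholders, and the notebook may well do
this differently), is to test each value with isinstance:

```python
import pandas as pd

# Hypothetical column mixing strings with other types.
df = pd.DataFrame(
    {"desc": pd.Series(["A short film", 42, 3.5, "Oak tables"], dtype=object)}
)

# True where the value is a real Python str, False otherwise.
is_str = df["desc"].map(lambda value: isinstance(value, str))

# The offending rows, and a count of which types they hold.
non_strings = df.loc[~is_str, "desc"]
print(non_strings.map(type).value_counts())

# One cleanup option: keep only the rows whose value is a string.
cleaned = df[is_str].copy()
print(cleaned)
```

Whether to drop those rows or convert them is a judgement call that depends
on what the non-string values turn out to be.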
I hope this all helps.
Paul.
On 24 June 2017 at 11:31, Bhaskar Dhariyal <dhariyalbhaskar at gmail.com>
wrote:
> The data type shown there is object (see In[4] on the first page). I wanted
> to tokenize the name & desc columns and clean them.
>
>
> On Sat, Jun 24, 2017 at 3:54 PM, Paul Barry <paul.james.barry at gmail.com>
> wrote:
>
>> Hi Bhaskar.
>>
>> Please see attached PDF of a small Jupyter notebook. As you'll see, the
>> data in the fields you mentioned are *already* strings. What is it you are
>> trying to do here?
>>
>> Paul.
>>
>> On 24 June 2017 at 10:51, Bhaskar Dhariyal <dhariyalbhaskar at gmail.com>
>> wrote:
>>
>>>
>>> train.csv
>>> <https://drive.google.com/file/d/0B1D4AyluMGU0enoxbElGTV94Q0E/view?usp=drive_web>
>>> Here it is. Thanks for the quick reply.
>>>
>>> On Sat, Jun 24, 2017 at 3:14 PM, Paul Barry <paul.james.barry at gmail.com>
>>> wrote:
>>>
>>>> Any chance you could post one line of data so we can see what we have
>>>> to work with?
>>>>
>>>> Also - have you taken a look at Jake VanderPlas's notebooks? There's a
>>>> lot of help with pandas to be found there:
>>>> https://github.com/jakevdp/PythonDataScienceHandbook
>>>>
>>>> Paul.
>>>>
>>>> On 24 June 2017 at 10:32, Bhaskar Dhariyal <dhariyalbhaskar at gmail.com>
>>>> wrote:
>>>>
>>>>> <class 'pandas.core.frame.DataFrame'>
>>>>> Int64Index: 171594 entries, 0 to 63464
>>>>> Data columns (total 7 columns):
>>>>> project_id          171594 non-null object
>>>>> desc                171594 non-null object
>>>>> goal                171594 non-null float64
>>>>> keywords            171594 non-null object
>>>>> diff_creat_laun     171594 non-null int64
>>>>> diff_laun_status    171594 non-null int64
>>>>> diff_status_dead    171594 non-null int64
>>>>> dtypes: float64(1), int64(3), object(3)
>>>>>
>>>>> I am not able to convert desc and keywords to strings for preprocessing.
>>>>> I tried astype(str). Please help.
>>>>> --
>>>>> https://mail.python.org/mailman/listinfo/python-list
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Paul Barry, t: @barrypj <https://twitter.com/barrypj> - w:
>>>> http://paulbarry.itcarlow.ie - e: paul.barry at itcarlow.ie
>>>> Lecturer, Computer Networking: Institute of Technology, Carlow, Ireland.
>>>>
>>>
>>>
>>
>>
>> --
>> Paul Barry, t: @barrypj <https://twitter.com/barrypj> - w:
>> http://paulbarry.itcarlow.ie - e: paul.barry at itcarlow.ie
>> Lecturer, Computer Networking: Institute of Technology, Carlow, Ireland.
>>
>
>
--
Paul Barry, t: @barrypj <https://twitter.com/barrypj> - w:
http://paulbarry.itcarlow.ie - e: paul.barry at itcarlow.ie
Lecturer, Computer Networking: Institute of Technology, Carlow, Ireland.