[BangPypers] How should I do it?
Anand Balachandran Pillai
abpillai at gmail.com
Fri Jan 15 10:10:13 CET 2010
On Fri, Jan 15, 2010 at 2:17 PM, Noufal Ibrahim <noufal at gmail.com> wrote:
> On Fri, Jan 15, 2010 at 1:04 PM, Dhananjay Nene <dhananjay.nene at gmail.com> wrote:
>
> > This seems to be an output of print_r of PHP. If you have a flexibility,
> > try to have the PHP code output the data into a language neutral format
> > (eg json, yaml, xml etc.) and then parse it in python using the
> > appropriate parser. If not you may have to write a custom parser. I did
> > google to find if one existed, but couldn't easily locate one.
> >
>
>
> There is
> http://www.php.net/manual/en/book.json.php for PHP and Python2.6 onwards
> has json part of the stdlib.
>
> If you don't have access to the webserver, you might be able to use the php
> interpreter on your own machine to parse this into something more language
> neutral
>
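As a rough sketch of the JSON route suggested above: assuming the PHP side
can emit its array with json_encode, the Python side needs only the stdlib
json module. The payload string below is made up for illustration; it is
not the original poster's data.

```python
import json

# Hypothetical payload: roughly what PHP's json_encode might emit
# for a single entry of the data discussed in this thread.
payload = '{"a": {"count": 1164, "trans": {"kaa": 0.03726579, "eka": 0.14900491}}}'

# json.loads turns the JSON text into nested Python dicts.
data = json.loads(payload)

print(data['a']['count'])
print(sorted(data['a']['trans']))
```

This only helps if you control the PHP side (or can re-run the PHP locally);
otherwise you are back to parsing the print_r-style text yourself.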
If you take a look at your data, it is surprisingly close to what
a nested Python dictionary looks like, except that instead of
':' to separate a key from its value, it uses '=>', which is what
Perl and PHP use.
So the following solution takes advantage of this fact and
converts your data to a Python dictionary.
Here is the complete solution.
def scrub(data):
    # First strip the [code] and [/code] markers
    data = data.replace('[code]', '').replace('[/code]', '')
    # Replace '=>' with ':'
    data = data.replace('=>', ':')
    # Now, count and trans are not strings in
    # data, so Python will complain, hence we
    # define these as strings with same name!
    count, trans = 'count', 'trans'
    # Now prefix data with { and post-fix with }
    data = '{' + data + '}'
    print(data)
    # Eval it to a dictionary
    mydict = eval(data)
    print(mydict)
    # Hand the dictionary back so the caller can use it
    return mydict

if __name__ == "__main__":
    scrub(open('data.txt').read())
And it neatly prints as,
{'a': {'count': 1164, 'trans': {'kaoi': 0.053943079999999997, 'kaa':
0.03726579, 'haai': 0.067746970000000004, 'kaisai': 0.088346750000000002,
'kae': 0.034464500000000002, 'kai': 0.049819820000000001, 'eka':
0.14900490999999999, '\\(none\\)': 0.044000850000000001}}, 'confident':
{'count': 4, 'trans': {'mailatae': 0.028564269999999999, 'ashahvasahta':
0.74918567999999996, 'anaa': 0.015785520000000001, 'jaitanae': 0.01227762,
'pahraaram\\.nbha': 0.069907289999999997, 'utanai': 0.01929341,
'atahmavaishahvaasa': 0.090954649999999998, 'uthaanae': 0.01403157}},
'consumers': {'count': 4, 'trans':
{'sauda\\\xef\xbf\xbd\\\xef\xbf\xbd\\\xef\xbf\xbddha': 0.11875471,
'upabhaokahtaa': 0.75144361999999998, 'upabhaokahtaaom\\.n':
0.12980166000000001}}}
Now, use the data as a Python dictionary.
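For example, here is a small sketch of what "using it" might look like,
assuming the evaluated dictionary is available in a variable (called mydict
here). The helper name best_translation is made up for illustration, and
the dictionary is a hard-coded slice of the output above so the snippet
runs on its own.

```python
# A small slice of the dictionary printed above, hard-coded so this
# snippet is self-contained.
mydict = {
    'confident': {'count': 4,
                  'trans': {'ashahvasahta': 0.74918568, 'anaa': 0.01578552}},
    'consumers': {'count': 4,
                  'trans': {'upabhaokahtaa': 0.75144362,
                            'upabhaokahtaaom\\.n': 0.12980166}},
}

def best_translation(entry):
    # Hypothetical helper: return the key in 'trans' with the largest value.
    trans = entry['trans']
    return max(trans, key=trans.get)

for word in sorted(mydict):
    print('%s -> %s' % (word, best_translation(mydict[word])))
```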
It is a clever hack that takes advantage of the nature of the data, but
it is far faster than the other approaches posted here.
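One caveat: eval will run any Python expression it is handed, so the hack
is only reasonable for data you trust. A possible variation, sketched
below, keeps the same replace-and-evaluate idea but uses ast.literal_eval,
which accepts only literals. Since literal_eval does not allow bare names,
count and trans are quoted with a regular expression first. The function
name scrub_safely and the sample text are assumptions; the sample is only
a guess at the input format, reconstructed from the output above.

```python
import ast
import re

def scrub_safely(data):
    # Same clean-up steps as before: strip the markers, swap '=>' for ':'.
    data = data.replace('[code]', '').replace('[/code]', '')
    data = data.replace('=>', ':')
    # Quote the bare words count and trans where they appear as keys
    # (followed by ':' and not already inside quotes or a longer word).
    data = re.sub(r"(?<!['\w])(count|trans)(?=\s*:)", r"'\1'", data)
    # literal_eval only accepts Python literals, not arbitrary expressions.
    return ast.literal_eval('{' + data + '}')

# Hypothetical sample, guessed from the printed dictionary above.
sample = """[code]
'a' => {count => 1164, trans => {'kaa' => 0.03726579, 'eka' => 0.14900491}},
'confident' => {count => 4, trans => {'anaa' => 0.01578552}}
[/code]"""

result = scrub_safely(sample)
print(result['a']['count'])
```

It is a little slower than the plain eval because of the extra regex pass,
so whether it is worth it depends on where the data comes from.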
--Anand