Improve Python + InfluxDB import performance

Prathamesh prathamesh.nimkar at gmail.com
Mon Apr 3 13:03:47 EDT 2017


On Monday, April 3, 2017 at 9:52:38 PM UTC+5:30, INADA Naoki wrote:
> You can reuse the connection instead of creating a new one for each
> request (HTTP keep-alive).
> 
> On Tue, Apr 4, 2017 at 1:11 AM, Prathamesh <prathamesh.nimkar at gmail.com> wrote:
> > Hello World
> >
> > The following script is an extract from
> >
> > https://github.com/RittmanMead/obi-metrics-agent/blob/master/obi-metrics-agent.py
> >
> > <<Code begins>>
> >
> > import calendar, time
> > import sys
> > import getopt
> >
> > print '---------------------------------------'
> >
> > # Check the arguments to this script are as expected.
> > # argv[0] is script name.
> > argLen = len(sys.argv)
> > if argLen -1 < 2:
> >     print "ERROR: got ", argLen -1, " args, must be at least two."
> >     print '$FMW_HOME/oracle_common/common/bin/wlst.sh obi-metrics-agent.py  <AdminUserName> <AdminPassword> [<AdminServer_t3_url>] [<Carbon|InfluxDB>] [<target host>] [<target port>] [<targetDB influx db>]'
> >     exit()
> >
> > outputFormat='CSV'
> > url='t3://localhost:7001'
> > targetHost='localhost'
> > targetDB='obi'
> > targetPort='8086'
> >
> > try:
> >     wls_user = sys.argv[1]
> >     wls_pw = sys.argv[2]
> >     url  = sys.argv[3]
> >     outputFormat=sys.argv[4]
> >     targetHost=sys.argv[5]
> >     targetPort=sys.argv[6]
> >     targetDB=sys.argv[7]
> > except:
> >     print ''
> >
> > print wls_user, wls_pw,url, outputFormat,targetHost,targetPort,targetDB
> >
> > now_epoch = calendar.timegm(time.gmtime())*1000
> >
> > if outputFormat=='InfluxDB':
> >     import httplib
> >     influx_msgs=''
> >
> > connect(wls_user,wls_pw,url)
> > results = displayMetricTables('Oracle_BI*','dms_cProcessInfo')
> > for table in results:
> >     tableName = table.get('Table')
> >     rows = table.get('Rows')
> >     rowCollection = rows.values()
> >     iter = rowCollection.iterator()
> >     while iter.hasNext():
> >         row = iter.next()
> >         rowType = row.getCompositeType()
> >         keys = rowType.keySet()
> >         keyIter = keys.iterator()
> >         inst_name= row.get('Name').replace(' ','-')
> >         try:
> >             server= row.get('Servername').replace(' ','-').replace('/','_')
> >         except:
> >             try:
> >                 server= row.get('ServerName').replace(' ','-').replace('/','_')
> >             except:
> >                 server='unknown'
> >         try:
> >             host= row.get('Host').replace(' ','-')
> >         except:
> >             host=''
> >         while keyIter.hasNext():
> >             columnName = keyIter.next()
> >             value = row.get(columnName )
> >             if columnName.find('.value')>0:
> >                 metric_name=columnName.replace('.value','')
> >                 if value is not None:
> >                     if value != 0:
> >                         if outputFormat=='InfluxDB':
> >                             influx_msg= ('%s,server=%s,host=%s,metric_group=%s,metric_instance=%s value=%s %s') % (metric_name,server,host,tableName,inst_name,  value,now_epoch*1000000)
> >                             influx_msgs+='\n%s' % influx_msg
> >                             conn = httplib.HTTPConnection('%s:%s' % (targetHost,targetPort))
> >                             ## TODO pretty sure should be urlencoding this ...
> >                             a=conn.request("POST", ("/write?db=%s" % targetDB), influx_msg)
> >                             r=conn.getresponse()
> >
> > <<Code ends>>
> >
> > It currently takes about 3 minutes to run to completion, and I am looking for a way to make it faster.
> >
> > Formatting the data (as Influx line protocol) and loading it are done together, and that takes up most of the time.
> >
> > InfluxDB is currently loading data at around 3 points/second.
> >
> > Is there any way to format the data separately, store it, and load it as a batch?
> >
> > I feel that would help improve performance.
> >
> > Please let me know if you have any pointers.
> > I can send the data sheet if required.
> >
> > Thanks P

How do I do that?
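
For reference, here is a rough sketch of what connection reuse could look like, combined with the batching asked about in the original post. It is not taken from obi-metrics-agent: the helper names chunked and write_points and the batch_size default are made up for illustration, and it is written against Python 3's http.client (under Python 2 / Jython the same classes live in httplib). The idea is to open one HTTPConnection, collect the line-protocol strings in a list, and POST them in newline-separated batches, reading each response fully so the socket can be reused.

```python
from urllib.parse import quote  # Python 3; in Python 2 this is urllib.quote


def chunked(lines, size):
    """Yield successive lists of at most `size` line-protocol strings."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def write_points(conn, db, lines, batch_size=5000):
    """POST line-protocol strings in batches over one already-open connection.

    `conn` is expected to behave like http.client.HTTPConnection.
    `batch_size` is an assumed default, not a value from the original script.
    Returns a list with one HTTP status code per batch sent.
    """
    path = "/write?db=%s" % quote(db, safe="")  # URL-encode the database name
    statuses = []
    for batch in chunked(lines, batch_size):
        # Line protocol allows many points per request, one per line.
        body = "\n".join(batch).encode("utf-8")
        conn.request("POST", path, body)
        response = conn.getresponse()
        response.read()  # drain the body so the same socket can be reused
        statuses.append(response.status)
    return statuses
```

In the script above, this would presumably mean creating http.client.HTTPConnection(targetHost, int(targetPort)) once before the loop, appending each influx_msg to a list instead of posting it, and calling write_points(conn, targetDB, msgs) once after the loop, followed by conn.close(). An InfluxDB 1.x server should answer a successful write with status 204.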

