[BangPypers] Question on making hive python UDF object persistent
getpramod.r at gmail.com
Wed Jan 23 12:07:40 EST 2019
I'm trying to write a Hive UDF in Python that loads a pickled object
(basically a set of linear model weights). The weights read from the
pickle are used to score a set of observations from a Hive table. Once
I have computed a score, I also want to update the weights based on the
truth value that I receive from the same Hive table, so that the next
observation is scored with the updated weights.
Something like this:
Python UDF code:
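import pickle
import sys
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

betas = pickle.load(open('B.pkl', 'rb'))
for line in sys.stdin:
    data = line.strip().split('\t')
    X = np.array(data[:-1], dtype=float)  # feature columns
    y = float(data[-1])                   # truth value (last column)
    ycap = sigmoid(np.dot(betas, X))      # score with the current weights
    # X.T X is singular for a single row, so inv() fails; pinv() is used instead
    new_beta = np.linalg.pinv(X.reshape(1, -1)).dot([y])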
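A minimal sketch of the score-then-update loop described above, kept separate from stdin so it can be tried outside Hive. It swaps in a single stochastic-gradient step on the logistic loss for the update, which is an assumption and not what the snippet above does; `score_and_update`, `run`, and the learning rate `lr` are hypothetical names and values chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def score_and_update(betas, x, y, lr=0.1):
    """Score one row with the current weights, then return updated weights.

    The update is one stochastic-gradient step on the logistic loss
    (an assumption; the original snippet uses a least-squares solve).
    """
    ycap = sigmoid(np.dot(betas, x))
    new_betas = betas + lr * (y - ycap) * x
    return ycap, new_betas


def run(lines, betas):
    """Process tab-separated rows in order, carrying the weights forward."""
    scores = []
    for line in lines:
        data = line.strip().split('\t')
        x = np.array(data[:-1], dtype=float)  # feature columns
        y = float(data[-1])                   # truth value (last column)
        ycap, betas = score_and_update(betas, x, y)
        scores.append(ycap)
    return scores, betas
```

Inside a streaming script, `run(sys.stdin, betas)` would give each row a score computed from the weights as updated by all earlier rows in that same process.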
I did read about making a Python object in a Hive UDF persistent across all
the cores (stateful UDTF). Can anyone help me with some sample code?
Thanks in advance!
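For reference, a minimal sketch of one way the updated weights could be carried from one run of the script to the next, by writing them back to the pickle file. It assumes the script can write to the directory holding the file, which may not hold on a cluster node, and it does nothing to share state between separate processes running at the same time. `load_weights` and `save_weights` are hypothetical helper names, not part of Hive or any library.

```python
import os
import pickle
import tempfile


def load_weights(path, default):
    """Return the pickled weights at `path`, or `default` if the file is absent."""
    if not os.path.exists(path):
        return default
    with open(path, 'rb') as f:
        return pickle.load(f)


def save_weights(path, weights):
    """Write the weights to a temp file, then rename it over `path`.

    The rename means a reader never sees a half-written pickle.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, 'wb') as f:
        pickle.dump(weights, f)
    os.replace(tmp, path)
```

A script could call `load_weights('B.pkl', initial)` before reading stdin and `save_weights('B.pkl', betas)` once the input is exhausted.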