RE: Using SKlearn I am getting memory errors is there anyway to use batching?

in #python, 8 years ago

warnings.warn("Numerical issues were encountered "
UserWarning: Numerical issues were encountered when centering the data and might not be solved. Dataset may contain too large values. You may need to prescale your features.

This warning is appearing now that I have changed to pd.to_numeric().

Did you normalize the numerical columns? Scale them down to manageable values.
I usually normalize to scale the values down, then pass them through a sigmoid function to squash them into the range between 0 and 1, or through tanh if you want values between -1 and 1 (tanh is a NumPy function, numpy.tanh).

Here is the normalize function:

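import numpy as np

def normalizeme(dfr, pinv=False):
    # Column-wise mean and standard deviation.
    nmean = np.mean(dfr, axis=0)
    nstd = np.std(dfr, axis=0)
    # Shift each column to mean 0 and scale it to standard deviation 1.
    dfr = (dfr - nmean) / nstd
    if not pinv:
        return dfr
    # Also return the mean and std so the scaling can be reused or undone.
    return [dfr, nmean, nstd]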
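
A quick usage sketch, in case it helps. It assumes the data is a plain NumPy array (the numbers are made up) and that pinv=True is meant for getting the mean and standard deviation back so the scaling can be reused or undone. The function is repeated so the snippet runs by itself:

```python
import numpy as np

def normalizeme(dfr, pinv=False):
    nmean = np.mean(dfr, axis=0)
    nstd = np.std(dfr, axis=0)
    dfr = (dfr - nmean) / nstd
    if not pinv:
        return dfr
    return [dfr, nmean, nstd]

# Made-up data: two columns on very different scales.
data = np.array([[1.0e9, 0.5],
                 [2.0e9, 1.5],
                 [3.0e9, 2.5]])

scaled, nmean, nstd = normalizeme(data, pinv=True)

# Each column now has mean ~0 and standard deviation ~1.
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)

# The saved statistics undo the scaling ...
restored = scaled * nstd + nmean

# ... and apply the same scaling to rows that were not in the original data.
new_rows = np.array([[4.0e9, 3.5]])
new_scaled = (new_rows - nmean) / nstd
```

One thing to watch for: a column where every value is the same has a standard deviation of 0, so the division produces NaN for that column. Drop or skip constant columns before normalizing.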

Here is the sigmoid function:
import numpy as np

def sigmoidme(dfr):
    # Logistic sigmoid: maps any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-dfr))
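
To show how the two steps fit together, here is a small sketch with made-up numbers, assuming a NumPy array as input: normalize first, then squash with the sigmoid or with np.tanh. The sigmoid here is written with np.exp, which computes e raised to the given power:

```python
import numpy as np

def sigmoidme(dfr):
    return 1.0 / (1.0 + np.exp(-dfr))

# Made-up column with large values.
values = np.array([1.0e6, 5.0e6, 9.0e6, 2.0e7])

# Normalize first (zero mean, unit standard deviation) ...
normalized = (values - values.mean()) / values.std()

# ... then squash: sigmoid gives values between 0 and 1,
# tanh gives values between -1 and 1.
squashed_sigmoid = sigmoidme(normalized)
squashed_tanh = np.tanh(normalized)
```

Both functions keep the ordering of the values, so the largest input is still the largest output. Normalizing first matters: feeding the raw values straight into the sigmoid would push nearly everything to 1.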

Please note: I'm not a Markdown expert, unfortunately, so the indentation in the code pasted here may be a little off.