Seems like you're not seeing any deprecation warnings. In newer versions of pandas, convert_objects(convert_numeric=True) is deprecated in favour of the type-specific converters such as pd.to_numeric().
If you're not seeing a deprecation warning, your pandas is probably out of date. Snapshot your current package versions, then upgrade:
pip freeze > freeze.txt
pip install --upgrade pandas
Re-run your Python file and you'll then see the deprecation warnings.
Instead of convert_objects(convert_numeric=True), try pd.to_numeric(). You only need timeSeconds converted to a numeric type, and converting one column instead of all of them is one clearly visible memory optimization.
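A minimal sketch of that one-column conversion (df and its values are made up for illustration; only the timeSeconds column name comes from your question):

import pandas as pd

# Made-up data standing in for your DataFrame.
df = pd.DataFrame({'timeSeconds': ['1.5', '2.0', '3.25'], 'label': ['a', 'b', 'c']})

# Convert just the one column; errors='coerce' turns anything unparseable into NaN.
df['timeSeconds'] = pd.to_numeric(df['timeSeconds'], errors='coerce')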
To find other memory issues, try a memory profiler; it will tell you which line numbers are hogging resources. memory_profiler and line_profiler are the two profilers I use when I need to track down a bottleneck (see the sketch after the install commands).
pip install memory_profiler
pip install line_profiler
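A minimal sketch of memory_profiler usage (the script name and function body here are made up for illustration):

# profile_demo.py
from memory_profiler import profile
import pandas as pd

@profile  # prints a line-by-line memory report when the function runs
def load_and_convert():
    df = pd.DataFrame({'timeSeconds': ['1.5'] * 1000000})
    df['timeSeconds'] = pd.to_numeric(df['timeSeconds'])
    return df

if __name__ == '__main__':
    load_and_convert()

Run it with a plain python profile_demo.py and you'll get per-line memory usage for the decorated function.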
Did you normalize the numerical columns? Scaling them down to manageable values helps. I usually normalize to downscale the values, then pass them through a sigmoid function to rescale them between 0 and 1 (or tanh if you want values between -1 and 1; tanh is a numpy function).
Here is the normalize function:
import numpy as n

def normalizeme(dfr, pinv=False):
    # Standard (z-score) normalization: subtract the column means,
    # divide by the column standard deviations.
    nmean = n.mean(dfr, axis=0)
    nstd = n.std(dfr, axis=0)
    dfr = (dfr - nmean) / nstd
    if not pinv:
        return dfr
    # Also return the mean and std so the same scaling can be
    # reapplied (or inverted) later.
    return [dfr, nmean, nstd]
Here is the sigmoid function:
import numpy as n

def sigmoidme(dfr):
    # Logistic sigmoid: squashes values into the (0, 1) range.
    return 1.0 / (1 + n.exp(-dfr))
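Continuing from the to_numeric() sketch above, usage might look like this (pass pinv=True if you want the mean and std back so you can apply the same scaling to new data later):

scaled = normalizeme(df[['timeSeconds']])
squashed = sigmoidme(scaled)  # values now lie strictly between 0 and 1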
Glad to help out! It actually blows my mind that what I wrote above made sense, and that you made progress on my advice, or at least a step towards progress.
Bookmarked this video a while back for future reference; when you mentioned the batch-processing issue, it reminded me of this presentation. There's a possible batch-processing solution @17:32 using the model.fit() function.
I assume that model.fit() is a wrapper around some heavy parallel routines that segment your training dataset rows into manageable chunks to be batch-processed on the GPU as kernel functions. It saves you writing the for loops yourself, since .fit() takes care of the vectorization and/or compiled loops inside GPU space. Python loops are tremendously slow; you can vectorize as much as possible with numpy arrays, but translating for loops into vectorized code is a little tricky.
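A minimal sketch of batching via the batch_size argument (the architecture, layer sizes, and random data here are placeholders, not your model):

import numpy as np
from tensorflow import keras

# Placeholder data: 10,000 rows, 8 features, binary labels.
X = np.random.rand(10000, 8).astype('float32')
y = np.random.randint(0, 2, size=(10000, 1))

model = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(8,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# batch_size controls how many rows are pushed through per step,
# so the full dataset never has to sit in GPU memory at once.
model.fit(X, y, epochs=5, batch_size=64)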
Keras does all the grunt work.
Keras is great for prototyping.
Keras == "very few lines of code"