H2O Server Crash
Solution 1:
You need a bigger boat.
The error message says "heapUsedGC=11482667352", which is higher than MEM_MAX. Instead of giving max_mem_size="12G", why not give it more of the 64GB you have? Or build a less ambitious model (fewer hidden nodes, less training data, something like that).
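For instance, if you start the server from the Python client, something like this would do it (a minimal sketch; "32G" is just an illustrative value that still leaves headroom for the OS on a 64GB machine):

```python
import h2o

# Give the H2O JVM more of the available 64GB instead of capping it at 12G.
# "32G" is an illustrative value; leave headroom for the OS and other processes.
h2o.init(max_mem_size="32G")
```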
(Obviously, ideally, h2o shouldn't crash at all; it should abort gracefully when it gets close to using all the available memory. If you are able to share your data/code with H2O, it might be worth opening a bug report on their JIRA.)
BTW, I've been running h2o 3.10.x.x as the back-end for a web server process for 9 months or so, restarting it automatically at weekends, and haven't had a single crash. Well, I did have one: after I left it running for 3 weeks, it filled up all the memory with more and more data and models. That is why I switched to weekly restarts, and to keeping only the models I need in memory. (This is on an AWS instance with 4GB of memory, by the way; the restarts are done by a cron job and bash commands.)
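A minimal sketch of that housekeeping idea in Python (the dataset URL, save path, and small GBM model are just illustrative; the point is to persist what you need, wipe the cluster, and reload only that):

```python
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()

# Train a small model on a public demo dataset (iris with a header row).
iris = h2o.import_file(
    "https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv"
)
model = H2OGradientBoostingEstimator(ntrees=10)
model.train(x=iris.columns[:4], y="class", training_frame=iris)

# Persist the model you still need (the path is hypothetical) ...
saved_path = h2o.save_model(model, path="/tmp/h2o_models", force=True)

# ... then clear everything out of the cluster's memory and reload only that model.
h2o.remove_all()
model = h2o.load_model(saved_path)
```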
Solution 2:
You can always download the latest stable release from https://www.h2o.ai/download (there's a link labeled "latest stable release"). The latest stable Python package can be downloaded via PyPI and Anaconda; the latest stable R package is available on CRAN.
I agree with Darren that you probably need more memory; if there is enough memory in your H2O cluster, H2O should not crash. We usually say that you should have a cluster that's at least 3-4x the size of your training set on disk in order to train a model. However, if you are building a grid of models, or many models, you will need to increase the memory further so that you have enough RAM to store all those models as well.
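As a rough back-of-the-envelope check of that rule of thumb in Python (the training-file path is hypothetical):

```python
import os
import h2o

# Rule of thumb from above: cluster memory should be roughly 3-4x the
# training set's size on disk. Use the upper end and round up.
train_path = "/data/train.csv"  # hypothetical path to your training data
size_gb = os.path.getsize(train_path) / (1024 ** 3)
recommended_gb = int(round(4 * size_gb)) + 1

print(f"Training set: {size_gb:.1f} GB on disk -> suggest max_mem_size of at least {recommended_gb}G")
h2o.init(max_mem_size=f"{recommended_gb}G")
```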