Run Different Classifiers/algorithms In Parallel Using Spark
I have a dataset and I want to test different classifiers in parallel using Spark with Python. For example, if I want to test a Decision Tree and a Random Forest, how could I run
Solution 1:
@larissa-leite
To overcome this, I'm using [multiprocessing](https://docs.python.org/3/library/multiprocessing.html),
as explained in that thread.
This is the code from the thread:
    from multiprocessing import Process

    def func1():
        print('func1: starting')
        for i in range(10000000):  # busy loop standing in for real work
            pass
        print('func1: finishing')

    def func2():
        print('func2: starting')
        for i in range(10000000):
            pass
        print('func2: finishing')

    if __name__ == '__main__':
        # Start both processes before joining so they run concurrently.
        p1 = Process(target=func1)
        p1.start()
        p2 = Process(target=func2)
        p2.start()
        # Wait for both processes to finish.
        p1.join()
        p2.join()
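The `Process` objects above run their targets but do not hand a return value back to the parent. If you want to compare classifiers, you probably need each score back, so here is a minimal sketch using `concurrent.futures.ProcessPoolExecutor` from the standard library. Everything named here is an assumption for illustration: `DATA` is a toy dataset, and `eval_majority` / `eval_threshold` are hypothetical stand-ins for fitting and scoring a real model such as a Decision Tree or a Random Forest.

```python
from concurrent.futures import ProcessPoolExecutor

# Toy data (hypothetical): one numeric feature and a binary label per row.
DATA = [(0.1, 0), (0.4, 0), (0.35, 0), (0.8, 1), (0.9, 1), (0.6, 1)]


def eval_majority(data):
    """Stand-in model 1: always predict the most common label."""
    labels = [y for _, y in data]
    majority = max(set(labels), key=labels.count)
    correct = sum(1 for y in labels if y == majority)
    return "majority", correct / len(data)


def eval_threshold(data, cut=0.5):
    """Stand-in model 2: predict 1 when the feature exceeds `cut`."""
    correct = sum(1 for x, y in data if int(x > cut) == y)
    return "threshold", correct / len(data)


def run_all(evaluators, data, max_workers=2):
    """Submit each evaluator to a worker process and gather name -> accuracy."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, data) for fn in evaluators]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    # The guard is needed so worker processes can import this module safely.
    print(run_all([eval_majority, eval_threshold], DATA))
```

The evaluators are defined at module level because worker processes receive them by pickling, which does not work for lambdas or nested functions.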
To explain why I'm using this: I trained several text classifier models (more than 200) using OneVsRestClassifier, and I need to fan out each text I receive to every model.
Getting all the predictions back takes less than 200 ms (the baseline human reaction time is somewhere between 100 ms and 420 ms), so this latency is not a big deal for me.
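A rough sketch of that fan-out, under stated assumptions: `MODELS` holds hypothetical keyword matchers standing in for the trained models, and `predict_one` / `predict_all` are illustrative names, not part of any library. The pool is started once and reused, so the timing covers only the dispatch and the predictions, not process start-up.

```python
import time
from functools import partial
from multiprocessing import Pool

# Hypothetical stand-ins for trained models: (label, keyword) pairs.
MODELS = [("sports", "match"), ("finance", "stock"), ("weather", "rain")]


def predict_one(model, text):
    """Return (label, hit) for one model; a real model would call predict()."""
    label, keyword = model
    return label, keyword in text.lower()


def predict_all(pool, models, text):
    """Send the same text to every model through an already-running pool."""
    return dict(pool.map(partial(predict_one, text=text), models))


if __name__ == "__main__":
    with Pool(processes=3) as pool:  # start the workers once, reuse per request
        start = time.perf_counter()
        result = predict_all(pool, MODELS, "Stock prices fell after the match")
        elapsed_ms = (time.perf_counter() - start) * 1000
    print(result, f"{elapsed_ms:.1f} ms")
```

With real models, each worker would need its models loaded ahead of time (for example through the `initializer` argument of `Pool`), otherwise every call pays the cost of pickling them.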