Run Different Classifiers/algorithms In Parallel Using Spark

I have a dataset and I want to test different classifiers in parallel using Spark with Python. For example, if I want to test a Decision Tree and a Random Forest, how could I run

Solution 1:

@larissa-leite

To overcome this, I'm using [multiprocessing](https://docs.python.org/3/library/multiprocessing.html), as explained in that thread.

This is the code from that thread:

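from multiprocessing import Process

def func1():
  # Placeholder workload; swap in the real work for the first task.
  print('func1: starting')
  for i in range(10000000): pass
  print('func1: finishing')

def func2():
  # Placeholder workload; swap in the real work for the second task.
  print('func2: starting')
  for i in range(10000000): pass
  print('func2: finishing')

if __name__ == '__main__':
  # Start both processes before joining, so they run at the same time.
  p1 = Process(target=func1)
  p1.start()
  p2 = Process(target=func2)
  p2.start()
  # Wait for both processes to finish.
  p1.join()
  p2.join()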
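
The `Process` pattern above starts the work but does not hand any result back to the caller. Here is a minimal sketch of one way to collect a result per classifier. Note that it swaps processes for threads via `concurrent.futures.ThreadPoolExecutor`: Spark's scheduler accepts jobs submitted from separate threads of the same driver, so threads are a common choice there, whereas CPU-bound pure-Python work would still need processes. The functions `train_decision_tree`, `train_random_forest`, and `run_in_parallel` are hypothetical stand-ins, not part of any library, and they do no real training.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real training routines. Each one returns a
# (name, summary) pair instead of a fitted model.
def train_decision_tree(rows):
    return ("decision_tree", f"trained on {len(rows)} rows")

def train_random_forest(rows):
    return ("random_forest", f"trained on {len(rows)} rows")

def run_in_parallel(trainers, rows):
    """Submit every trainer at once and collect the results by name."""
    with ThreadPoolExecutor(max_workers=len(trainers)) as pool:
        futures = [pool.submit(trainer, rows) for trainer in trainers]
        # result() blocks until that trainer has finished, much like join().
        return dict(future.result() for future in futures)

rows = [(0.1, 1), (0.7, 0), (0.4, 1)]
results = run_in_parallel([train_decision_tree, train_random_forest], rows)
print(results)
```

With real Spark code, each stand-in would presumably call the corresponding estimator's fitting routine on a shared DataFrame; that part is an assumption and is not shown here.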

To explain why I'm using this: I trained several text classifier models (more than 200) using OneVsRestClassifier, and I need to fan out each text I receive to every model.

The latency to get all the predictions back is less than 200 ms. The baseline human reaction time can be anywhere between 100 ms and 420 ms, so this latency is not a big deal for me.
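
The fan-out and the latency check described above could look roughly like the following. This is only a sketch under stated assumptions: `make_model` and `predict_all` are hypothetical names, the "models" are trivial keyword matchers rather than trained OneVsRestClassifier instances, and a thread pool stands in for whatever parallel mechanism is actually used.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one trained binary classifier: it "predicts"
# True when its keyword appears in the text.
def make_model(keyword):
    def predict(text):
        return keyword in text.lower()
    return predict

models = {label: make_model(label) for label in ("spark", "python", "java")}

def predict_all(text, models):
    """Send the same text to every model and return {label: prediction}."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {label: pool.submit(model, text)
                   for label, model in models.items()}
        return {label: future.result() for label, future in futures.items()}

# Time the whole fan-out, since total latency is what the caller sees.
start = time.perf_counter()
predictions = predict_all("Running classifiers with Spark and Python", models)
elapsed_ms = (time.perf_counter() - start) * 1000
print(predictions, f"{elapsed_ms:.1f} ms")
```

Measuring around the whole call, rather than per model, matches the figure that matters here: the time until all predictions are available.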
