Run Different Classifiers/algorithms In Parallel Using Spark

February 28, 2024

I have a dataset and I want to test different classifiers in parallel using Spark with Python. For example, if I want to test a Decision Tree and a Random Forest, how could I run

Solution 1:

@larissa-leite

To overcome this, I'm using [multiprocessing](https://docs.python.org/3/library/multiprocessing.html), as explained in that thread.

This is the code from the thread:

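from multiprocessing import Process

def func1():
    print('func1: starting')
    # Busy loop standing in for real work.
    for i in range(10000000):
        pass
    print('func1: finishing')

def func2():
    print('func2: starting')
    # Busy loop standing in for real work.
    for i in range(10000000):
        pass
    print('func2: finishing')

if __name__ == '__main__':
    # Start both processes before joining so they run concurrently.
    p1 = Process(target=func1)
    p1.start()
    p2 = Process(target=func2)
    p2.start()
    # Wait for both processes to finish.
    p1.join()
    p2.join()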
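A `Process` started with `target=` does not hand back a return value, so to compare classifiers you would need some way to collect results. The following is only a sketch of one option, using `concurrent.futures.ProcessPoolExecutor` from the standard library instead of raw `Process` objects. The function `evaluate` and the names `decision_tree` and `random_forest` are hypothetical stand-ins, not real training code.

```python
from concurrent.futures import ProcessPoolExecutor


def evaluate(name):
    """Hypothetical stand-in for fitting and scoring one classifier."""
    # Busy work in place of real training; the number returned is a placeholder.
    total = sum(i % 7 for i in range(100_000))
    return name, total


if __name__ == "__main__":
    names = ["decision_tree", "random_forest"]
    # Each call runs in a worker process; map yields results in input order.
    with ProcessPoolExecutor(max_workers=2) as pool:
        for name, value in pool.map(evaluate, names):
            print(name, value)
```

The worker function is defined at module level so that it can be pickled and sent to the worker processes, and the `if __name__ == "__main__":` guard keeps the pool from being recreated when a worker imports the module.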

To explain why I'm using this: I trained several text classifier models (more than 200) using OneVsRestClassifier, and I need to fan out each text I receive to every model.

The latency to get all predictions back is less than 200 ms (the baseline reaction time for a human being is somewhere between 100 ms and 420 ms), so this latency is not a big deal for me.
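The fan-out and the latency check described above could be sketched roughly as follows. This is not the code from my setup: it uses a thread pool rather than `Process` objects, the "models" are hypothetical keyword matchers standing in for trained classifiers, and `make_keyword_model` and `predict_all` are names invented for this example. Threads only overlap usefully when the prediction call releases the GIL, which is an assumption here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_keyword_model(keyword):
    """Build a hypothetical stand-in model: 1 if the keyword occurs, else 0."""
    def predict(text):
        return int(keyword in text.lower())
    return predict


def predict_all(models, text, max_workers=8):
    """Send the same text to every model and return {label: prediction}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per model, then gather the results by label.
        futures = {label: pool.submit(model, text) for label, model in models.items()}
        return {label: future.result() for label, future in futures.items()}


# One stand-in model per label; a real setup would load trained classifiers.
models = {label: make_keyword_model(label) for label in ("spark", "python", "java")}

start = time.perf_counter()
predictions = predict_all(models, "Running classifiers with Spark and Python")
elapsed_ms = (time.perf_counter() - start) * 1000

print(predictions)
print(f"{elapsed_ms:.1f} ms")
```

For CPU-bound models written in pure Python, `ProcessPoolExecutor` exposes the same `submit` interface, though the models would then need to be picklable, which the closures in this sketch are not.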
