Skip to content Skip to sidebar Skip to footer

Parallelize Tree Creation With Dask

I need help about a problem that I'm pretty sure dask can solve. But I don't know how to tackle it. I need to construct a tree recursively. For each node if a criterion is met a co

Solution 1:

As I understand, the way you tackle recursion (or any dynamic computation) is to create tasks within a task.

I was experimenting with something similar, so below is my 5 minute illustrative solution. You'd have to optimise it according to characteristics of the algorithm.

Keep in mind that tasks add overhead, so you'd want to chunk the computations for optimal results.

Relevant doc:

Api reference:

import numpy as np
import time
from dask.distributed import Client, worker_client

# Create a dask client
# For convenience, I'm creating a localcluster.
client = Client(threads_per_worker=1, n_workers=8)
client

class Node:
    def __init__(self, level):
        self.level = level
        self.val = None
        self.childs = None   # This was missing

def merge(node, childs):
    values = [child.val for child in childs]
    if all(values) and sum(values)<0.1:
        node.val = np.mean(values)
    else:
        node.childs = childs
    return node        

def compute_val():
    time.sleep(0.1)            # Is this required.
    return np.random.rand(1)

def build(node):
    print(node.level)
    if (np.random.rand(1) < 0.1 and node.level>1) or node.level>5:
        node.val = compute_val()
    else:
        with worker_client() as client:
            child_futures = [client.submit(build, Node(level=node.level+1)) for _ in range(2)]
            childs = client.gather(child_futures)
        node = merge(node, childs)
    return node

tree_future = client.submit(build, Node(level=0))
tree = tree_future.result()

Post a Comment for "Parallelize Tree Creation With Dask"