
Dask Delayed Function Call With Non-passed Parameters

I am seeking to better understand the following behavior when using dask.delayed to call a function that depends on parameters. The issue seems to arise when parameters are specifi

Solution 1:

I will try to keep this brief.

When a function is serialised in order to be sent to workers, Python also sends the local variables and functions that the function needs (its "closure"). However, any modules it references are stored by name; Python does not try to serialise your whole runtime. This means that zippy_parser is imported afresh in the worker, not deserialised from your session. Since the function read has never been called in the worker, the module's global variable is never initialised there.
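The "stored by name" behaviour can be seen with the standard library alone. The sketch below uses plain pickle and the json module purely as stand-ins (neither appears in the question). Dask itself relies on cloudpickle, which additionally serialises functions defined in your main script by value, but as far as I know it treats imported modules the same way, by reference.

```python
import json
import pickle

# Pickle a function that lives in an importable module.
payload = pickle.dumps(json.dumps)

# The payload holds only the module and function names, not the function's
# code or the module's current state.
contains_module_name = b"json" in payload
contains_function_name = b"dumps" in payload

# Unpickling imports the module by name and looks the function up again.
restored = pickle.loads(payload)
same_object = restored is json.dumps

print(contains_module_name, contains_function_name, same_object)
```

In a fresh worker process that lookup triggers a new import, so whatever state the module had accumulated in the original process does not travel with it.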

So, you could call read in the workers, either as part of your function or separately, but the pattern of setting module-global variables from within a function probably isn't a great one. Dask's delayed mechanism prefers functional purity: the result you get should not depend on the current state of the runtime.

(Note that if you had created the client after calling read in the main script, the workers might have received the already-initialised in-memory version, depending on how your system is configured to create subprocesses.)

Solution 2:

I encourage you to pass in all parameters to your dask delayed functions explicitly, rather than relying on the global namespace.
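A minimal sketch of that advice, assuming a hypothetical scale function and a settings dictionary standing in for whatever the question's read call loaded (neither name comes from the original code). The synchronous scheduler is used only to keep the example self-contained; the same task definitions work with a distributed client.

```python
import dask
from dask import delayed


def scale(x, factor):
    # Pure function: everything it needs arrives as an argument,
    # so it behaves the same in any process.
    return x * factor


# Hypothetical parameters, loaded once in the main script.
settings = {"factor": 3}

# Pass the parameter into each task explicitly instead of reading a global.
tasks = [delayed(scale)(x, settings["factor"]) for x in range(4)]

# dask.compute returns a tuple with one result per task.
results = dask.compute(*tasks, scheduler="synchronous")
print(results)
```

Because the value of factor is part of the task graph, it is shipped to the workers along with the call, and nothing depends on module state having been initialised there.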
