Dask Delayed Function Call With Non-passed Parameters
Solution 1:
I will try to keep this brief.
When a function is serialised in order to be sent to workers, Python also sends the local variables and functions that the function needs (its "closure"). However, modules it references are stored by name only; it does not try to serialise your whole runtime.
This means that zippy_parser is imported in the worker, not deserialised. Since the function read has never been called in the worker, the global variable is never initialised.
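The "by name" behaviour can be seen with the standard library alone. The sketch below uses plain pickle and the json module as stand-ins (neither is from the question); Dask relies on cloudpickle, which treats functions from importable modules the same way, although functions defined in the main script are sent by value.

```python
import json
import pickle

# Pickling a function that lives in an importable module records only
# where to find it (module name + attribute name), not its code or the
# module's current global state.
payload = pickle.dumps(json.dumps)

# The payload is tiny and contains the names as plain bytes.
contains_names = b"json" in payload and b"dumps" in payload

# Unpickling imports the module on the receiving side and looks the
# attribute up again, so we get back the very same function object here.
restored = pickle.loads(payload)
same_object = restored is json.dumps

print(len(payload), contains_names, same_object)
```

On a worker, that lookup happens against the worker's own freshly imported copy of the module, which is why any globals set at runtime in the main process are missing there.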
So, you could call read in the workers, as part of your function or otherwise, but the pattern of setting module-global variables from within a function probably isn't great. Dask's delayed mechanism prefers functional purity: the result you get should not depend on the current state of the runtime.
(Note that if you had created the client after calling read in the main script, the workers might have got the in-memory version, depending on how subprocesses are configured to be created on your system; forked processes inherit the parent's memory, for example, while spawned ones start fresh.)
Solution 2:
I encourage you to pass all parameters to your Dask delayed functions explicitly, rather than relying on the global namespace.
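A minimal sketch of that pattern follows. The names load_config and parse, and the shape of the config dictionary, are hypothetical and only stand in for whatever read sets up in the question; the synchronous scheduler is used just to keep the example self-contained.

```python
from dask import delayed


def load_config():
    # Hypothetical stand-in for the state that was previously kept in a
    # module-level global.
    return {"delimiter": ",", "skip_header": True}


@delayed
def parse(text, config):
    # Everything the task needs arrives as an argument, so the result does
    # not depend on what state the worker's copy of any module is in.
    rows = text.splitlines()
    if config["skip_header"]:
        rows = rows[1:]
    return [row.split(config["delimiter"]) for row in rows]


config = load_config()            # built once, in the main script
task = parse("a,b\n1,2\n3,4", config)  # config travels with the task
result = task.compute(scheduler="synchronous")
print(result)
```

Because the configuration is an ordinary argument, it is serialised along with the task and the same call works whether it runs locally or on a distributed worker.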