Recurrent Machine Learning ETL Using Luigi
Solution 1:
Your pattern looks largely correct. I would start with a cron job that calls a script to trigger the Load task pipeline. It looks like this Load task already checks the S3 bucket for new files, but you would have to make its output conditional as well, for example a status file (or something similar) that is written when there is nothing to do. You could also handle this in a higher-level WrapperTask (with no output) that requires the Load task only if there are new files. That same WrapperTask could then require two different Load tasks, which would in turn require your Transform1 and Transform2, respectively.
Adding in containers: what my cron job actually calls is a script that pulls my latest code from git, builds a new container if necessary, and then calls docker run. I have another container that is always up, running luigid. The daily docker run executes a shell script in the container, via CMD, that calls the luigi task with the parameters needed for that day.
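As a rough illustration of that last step, here is a sketch of what the in-container entry point might look like, written in Python rather than as the shell script described above. The module name pipeline, the task name DailyPipeline, the --date parameter, and the scheduler host name luigid are all assumptions made for the example; substitute whatever your own task and luigid container use.

```python
import datetime
import subprocess


def build_luigi_command(run_date, module="pipeline", task="DailyPipeline",
                        scheduler_host="luigid"):
    """Build the luigi CLI invocation for one day (default names are hypothetical)."""
    return [
        "luigi",
        "--module", module,
        task,
        "--date", run_date.isoformat(),
        # Point at the always-on luigid container instead of a local scheduler.
        "--scheduler-host", scheduler_host,
    ]


def main():
    """Run today's pipeline and return the luigi process exit code."""
    cmd = build_luigi_command(datetime.date.today())
    print("Running:", " ".join(cmd))
    return subprocess.run(cmd).returncode
```

The script run by CMD would call main() and exit with its return value, so a failed luigi run surfaces as a non-zero exit status from docker run, which the cron-side script can then log or alert on.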