A word on scale: there is a tendency to reach for packages or external programs, such as Spark, to solve the problem. Often they do solve it. But in my experience, there is ultimately no one-size-fits-all solution when doing things at scale. It is therefore good to learn the basics, so that when necessary you can return to them and extrapolate to your own situation.
Again on the topic of scale: both researchers and practitioners would do well to learn to plan projects. This is one thing that I am exceedingly bad at. Even with the help of multiple project managers, machine learning projects have a tendency to spiral out of control. Managing them takes quite a bit of discipline, on the part of both the implementor and the stakeholders.
Last...