PySpark Coding Practices: Lessons Learned
-
Alex Gillmor and Shafi Bashar, Machine Learning Engineers
- May 14, 2018
In our previous post, we discussed how we used PySpark to build a large-scale distributed machine learning model. In this post, we will describe our experience and some of the lessons learned while deploying PySpark code in a production environment. Yelp’s systems have robust testing in place. It’s a hallmark of our engineering. It allows us to push code confidently and forces engineers to design code that is testable and modular. Broadly speaking, we found the resources for working with PySpark in a large development environment and efficiently testing PySpark code to be a little sparse. By design, a lot...