Does a Data Scientist need to be a programmer?
You don’t need to be a great programmer, but you need to be a good programmer with the right habits and industry skills.
Before going into details, some thoughts!
I have often heard and seen the view that, as a Data Scientist, "my job is not to make code production-ready." That is fair in companies that have a proper team with clearly defined roles. But lots of companies (including big ones) are still in the process of adopting AI/ML at scale.
For companies in the AI/ML adoption phase: as a data scientist, your job is not merely to run production pipelines and models as ad hoc scripts in Jupyter or R.
Some of the easy steps a Data Scientist can do:
- Move away from ad hoc Jupyter / R scripts as soon as the business is consuming your results.
- Keep code clean: keep comments brief, avoid printing variables, and remove blocks of code that are no longer useful.
- Use functions to abstract complexity and to improve reusability, readability, and testability.
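As a rough illustration of the last point, here is a minimal sketch of a notebook-style calculation pulled into a function with a small test. The name `standardize` and the choice of z-scoring are assumptions made for the example, not something prescribed above.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores for a list of numbers (hypothetical helper)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread, so map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def test_standardize():
    # The z-scores of any non-constant sample should average to zero.
    scores = standardize([1.0, 2.0, 3.0])
    assert abs(mean(scores)) < 1e-9
    # The constant-input edge case should not divide by zero.
    assert standardize([5.0, 5.0]) == [0.0, 0.0]
```

Once the logic lives in a function like this, it can be imported from a module, reused across pipelines, and checked by a test runner instead of by printing variables in a notebook cell.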
Productionalization of ML products is essential. Thinking about scaling is necessary!
A couple of next steps that you should take:
- Learn about Model Training - Real-time vs. Batch
- Learn about Model Serving - online streaming using Kafka / Spark, building API modules, and edge-device modeling (based on business use-cases)
- Learn about Model Monitoring and Maintenance.
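To make the batch vs. online distinction and the idea of monitoring a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the linear model, the function names, and the drift threshold are hypothetical, and a real system would sit behind tools such as Kafka, Spark, or an API framework rather than plain function calls.

```python
from statistics import mean, pstdev


def predict_one(features, weights, bias=0.0):
    """Score a single record with a hypothetical linear model (online path)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict_batch(rows, weights, bias=0.0):
    """Score many records; reuses the single-record path so both modes agree."""
    return [predict_one(row, weights, bias) for row in rows]


def mean_shift_alert(training_values, live_values, threshold=3.0):
    """Flag possible drift when the live mean moves far from the training mean.

    The shift is measured in training standard deviations. The default
    threshold of 3.0 is an arbitrary value chosen for this example.
    """
    sigma = pstdev(training_values)
    if sigma == 0:
        # No spread in training data: any change in the mean counts as a shift.
        return mean(live_values) != mean(training_values)
    shift = abs(mean(live_values) - mean(training_values)) / sigma
    return shift > threshold
```

The design choice worth noting is that batch scoring calls the same function as single-record scoring, so a nightly job and a live endpoint cannot silently diverge. The drift check is deliberately crude; it only shows the kind of question monitoring needs to answer.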
Embrace the technology - Docker, Kubernetes, Continuous Integration and Deployment, Monitoring tools.
You don’t need to be an expert in all these technologies. Work with your Software Engineering team - teamwork makes the dream work!
Next Step:
Introductory Session for API Building