For a software engineer, the hardest thing about developing Machine Learning functionality should be finding clean and representative ground-truth data, but it often isn’t. Even if you already have a source of good-quality data (perhaps because your application already gathers it), several obstacles still lie ahead of you:
Learning ML
ML concepts are hard to understand. The industry is still mostly in the research phase, and the movement toward making systems and learning resources usable by the average developer has only just begun. Books and libraries like FastAI are part of this movement, but it still takes days or weeks to learn how to do even the most basic things.
To do something basic like image classification, you have to:
- Understand concepts like tensors, loss functions, transfer learning, logistic regression, network fine-tuning, hyperparameter search, overfitting, active learning, regularization, and quantization.
- Get familiar with one or more ML libraries like PyTorch, TensorFlow, FastAI, or scikit-learn. This is harder than learning a typical programming library because the concepts and paradigms differ greatly from what programmers are used to.
- Find state-of-the-art (SOTA) deep neural networks for images from research and industry. Continue to search for new SOTA network