Hidden Debt of Machine Learning Platforms

With advances in artificial intelligence and machine learning, from smart assistants like Alexa and Siri to tagging our friends on social networks, it is not surprising that so many aspects of our lives are being touched and reshaped by this technology.

It is also reshaping the enterprise space and driving a transformation in the manufacturing sector, paving the way for the Industry 4.0 revolution.

However, creating and maintaining an AI/ML framework is fundamentally different from building traditional software solutions. As D. Sculley and his co-authors called out in their seminal paper ‘Hidden Technical Debt in Machine Learning Systems’ [i], there is a technical price an enterprise has to pay to implement and maintain AI/ML systems before it can reap their rewards.

Source: Hidden Technical Debt in Machine Learning Systems, D. Sculley et al. [i]

In this white paper, we will go over the system-level thinking needed to implement a functional ML system within an enterprise environment.

In conventional software development, there is an increasing focus on module isolation and code segmentation. ML systems, however, depend intricately on their input signals and feature lists, which makes it harder to follow conventional software models. For example, if a system has N features, removing one of them could affect the entire system, including its weights and connectivity, and could require recalibrating the whole system.
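To make this entanglement concrete, here is a minimal sketch in Python, assuming a toy least-squares model with two correlated features and no intercept. The data and the function names (fit_one_feature, fit_two_features) are made up for illustration and do not come from the cited paper.

```python
# Toy illustration (hypothetical data): removing one feature changes the
# weight learned for the remaining one, because the features are correlated.

def fit_one_feature(x, y):
    """Least-squares weight for y ~ w * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def fit_two_features(x1, x2, y):
    """Least-squares weights for y ~ w1 * x1 + w2 * x2 (no intercept),
    solved through the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    w1 = (s1y * s22 - s2y * s12) / det
    w2 = (s2y * s11 - s1y * s12) / det
    return w1, w2

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]                      # correlated with x1
y = [a + 2.0 * b for a, b in zip(x1, x2)]      # true relation: y = x1 + 2 * x2

w1_full, w2_full = fit_two_features(x1, x2, y)  # both features present
w1_alone = fit_one_feature(x1, y)               # x2 removed, model refit

print(w1_full, w2_full)      # weights with both features
print(round(w1_alone, 2))    # weight on x1 after x2 is removed
```

With both features the fit recovers the weights 1 and 2; once x2 is removed, the weight on x1 rises to roughly 2.87 because it absorbs part of the missing signal. The same effect, at a much larger scale, is why a feature change in a production model can force a full recalibration.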

Similarly, adding a new feature could change the dynamics of the system. To create and maintain a robust AI/ML-based system, one possible option is therefore to divide the system into multiple stages with well-defined boundaries and interfaces.
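One way to picture stages with well-defined boundaries is to give each boundary its own data type and make each stage a function from one type to the next. The sketch below is only an illustration under that assumption; every name in it (RawReading, CleanRecord, Features, clean, extract, predict, run_pipeline) and the threshold rule standing in for a model are hypothetical and not part of any specific platform.

```python
from dataclasses import dataclass
from typing import List, Optional

# Each dataclass is the contract at one stage boundary (hypothetical names).

@dataclass(frozen=True)
class RawReading:
    sensor_id: str
    samples: List[Optional[float]]   # None marks a dropped sample

@dataclass(frozen=True)
class CleanRecord:
    sensor_id: str
    samples: List[float]

@dataclass(frozen=True)
class Features:
    sensor_id: str
    mean_level: float
    peak_level: float

def clean(reading: RawReading) -> CleanRecord:
    """Data preparation stage: drop missing samples."""
    kept = [s for s in reading.samples if s is not None]
    return CleanRecord(reading.sensor_id, kept)

def extract(record: CleanRecord) -> Features:
    """Feature extraction stage: summarize the signal amplitude."""
    if not record.samples:
        return Features(record.sensor_id, 0.0, 0.0)
    levels = [abs(s) for s in record.samples]
    return Features(record.sensor_id, sum(levels) / len(levels), max(levels))

def predict(features: Features, threshold: float = 1.0) -> str:
    """Prediction stage: a placeholder threshold rule instead of a trained model."""
    return "alert" if features.peak_level > threshold else "normal"

def run_pipeline(reading: RawReading) -> str:
    # Stages only communicate through the typed records above.
    return predict(extract(clean(reading)))

print(run_pipeline(RawReading("mic-1", [0.1, None, -0.2, 0.3])))
print(run_pipeline(RawReading("mic-1", [0.1, 2.5])))
```

Because each stage depends only on the record type it receives, a stage can be replaced or retrained without touching the others, as long as the boundary types stay stable.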

Lastly, the framework needs to integrate with the enterprise's existing frameworks and data lakes and needs to be DevOps friendly, so that it becomes a sustainable AI/ML solution and not just a point solution for one-time use.

The following flow diagram depicts one such possible implementation, in which the entire flow is divided into four stages.

Figure 1: Potential 4 stage implementation of an Industrial AI/ML System

  • Stage 1 - Sensor Network: The very first stage of an industrial AI/ML system could be a sensor ingest and integration stage, which collects data from different sensors or sensor networks. These sensors could include microphones, cameras, vibration sensors, etc.
  • Stage 2 - Data Lakes: As D. Sculley's paper highlights, only a small fraction of a real-world ML system is the ML code itself; much of the effort goes into cleaning and filtering input signals, performing ETL operations, and making the data usable for downstream systems. So the next stage of the system is to process the sensor data, which might arrive in different formats such as audio files or time series, and to create a well-defined data lake for the next stage of processing.
  • Stage 3 - Data labeling, annotation and feature extraction: This step is needed during the model definition or training phase and during model updates. The input data needs to be labeled and the right feature sets need to be defined so that the right ML algorithms can be developed.
    • Once the models are well defined and trained, another critical step is to decide the timing and the potential trigger points at which the model should be updated.
  • Stage 4 - Predictive Analytics: This is the outcome of all the hard work. At this stage, the data and the trained ML model can be used to perform predictive analytics and provide valuable insights.
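The question raised in the list above, of when a trained model should be updated, could be approached with a simple rule. The following is a minimal sketch, assuming two hypothetical triggers: the model's age, and a shift in the mean of one input feature relative to the training data. The function name should_retrain and the default thresholds are illustrative assumptions, not recommendations.

```python
from statistics import mean, stdev

def should_retrain(baseline, recent, days_since_training,
                   max_shift_sigmas=3.0, max_age_days=90):
    """Return True when the model is old or the input has drifted.

    baseline: feature values seen at training time (at least two values)
    recent:   the same feature measured on recent production data
    Both thresholds are hypothetical defaults chosen for this sketch.
    """
    # Trigger 1: scheduled refresh once the model is too old.
    if days_since_training >= max_age_days:
        return True

    # Trigger 2: the recent mean moved too far from the training mean,
    # measured in units of the training standard deviation.
    sigma = stdev(baseline)
    if sigma == 0:
        return mean(recent) != mean(baseline)
    shift = abs(mean(recent) - mean(baseline)) / sigma
    return shift > max_shift_sigmas

training_levels = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]

print(should_retrain(training_levels, [1.0, 1.02, 0.98], days_since_training=10))
print(should_retrain(training_levels, [2.0, 2.1, 1.9], days_since_training=10))
print(should_retrain(training_levels, [1.0, 1.02, 0.98], days_since_training=120))
```

In practice the trigger would likely combine several signals, such as a drop in prediction accuracy on newly labeled data, but even a simple rule like this makes the update policy explicit and testable.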

In upcoming blog posts, we will delve into these stages in detail and show how the mSense implementation leverages this structured approach to audio classification for predictive analytics. We will also highlight the DevOps friendliness of the mSense platform, which makes it easier to integrate into the enterprise framework.

In the meantime, please visit us at www.msense.ai to get more details.


[i] D. Sculley et al., Hidden Technical Debt in Machine Learning Systems, https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf