AI for Industry: future trends and perspectives

The research momentum around AI for industrial prognosis is beyond doubt. Indeed, almost every proposal for new architectural solutions related to Industry 4.0 conceives AI-based prognosis as a core component of utmost relevance for the smart operation of the industrial asset under focus. The use of data fusion techniques and machine learning algorithms to exploit all the available information allows intelligence to be incorporated into improved, cloud-based machines and production lines through software integration and deployment. Complex behaviors and prognostic models can be learned from historical data, large volumes of data can be analyzed in real time, and industrial assets and production processes can be intelligently monitored online. Cloud-powered data processing and Big Data management are also key technological ingredients in this regard. However, the community still faces a number of research niches and challenges that demand further investigation and development in the near future. Such challenges should stimulate the interest and steer the efforts of early-career researchers and newcomers to this exciting research field.

Descriptive Prognosis: Visual Analytics for an Enhanced Understandability

One of the most frequently encountered obstacles to the widespread adoption of data-based prognosis is the assimilation of information by the operator of the industrial plant. In descriptive prognosis, it is often the case that the information produced by the deployed models cannot be processed directly by non-specialized personnel unless some form of preprocessing is devised for an improved, more intuitive understanding of the captured patterns. This is particularly relevant in legacy industrial facilities taking their first steps towards a digital mode of operation, production and management. At this stage descriptive modeling should rely on simple approaches with a twofold objective: 1) to crosscheck that the captured data agrees with the knowledge and historical experience that the personnel have gathered over their working years in the plant; and 2) to provide different levels of abstraction and complexity in the representation of data, matched to the technical competences and needs of the managing staff of the industrial company. Once the industrial staff verifies that the prognostic information provided by basic descriptive models matches their intuition, the path is cleared towards embracing more advanced algorithms and methods for describing the normal operation of the industrial setup under analysis. To this end, data fusion techniques, when needed, must be designed with extreme care so as not to overlook the expertise of the personnel regarding the number and temporal resolution of the monitored signals. Visual analytics, understood as the study and development of new ways of representing data that foster the interpretability and understandability of the displayed information flows, has recently emerged as a promising discipline to visually adapt the discovered insights and present results in the form best suited to different human profiles. These aspects will be crucial in real use cases where models for descriptive prognosis are to be deployed with a minimum guarantee of usability and practical utility, along with other technological approaches aimed at this same purpose (e.g. human-machine interfaces).

Predictive Prognosis: Class Imbalance, Non-stationarity and Transfer Learning

When industrial prognosis is formulated as a classification or regression problem, the relatively low incidence of faults in the industrial machine or asset being monitored hinders the proper construction of a predictive model for the task. When a model is trained with little or no evidence of the events of interest (e.g. operational faults or changing operational conditions), it is likely to become biased towards the so-called majority class. In other words, the learning algorithm focuses on predicting the most frequent class (namely, normal operation) with high accuracy, while misclassifying or simply ignoring the least frequent class (correspondingly, faulty operation), which is precisely the one whose detection provides the most practical value for the industry. This is a very recurrent problem in predictive prognosis, particularly when cast as a binary classification problem. Workarounds abound in the form of preprocessing methods such as class under/oversampling techniques, specialized balanced ensembles or embedded modifications of the model learning algorithm devised to account for the class imbalance present in the training dataset. However, even though there have been notable advances in class imbalance for multilabel and multiclass classification from an application-agnostic perspective, most real use cases where predictive prognosis is put into practice oversimplify the underlying problem to its binary version, despite the immediate benefits that could derive from discriminating the type of fault predicted to occur (e.g. tailored predictive maintenance or a more resilient design of the processes and machinery involved in production). Extrapolating these multilabel and multiclass advances to industrial prognosis would by itself provide an increased predictive awareness of the fault patterns of the monitored assets. This would call for interesting synergies with visual analytics, so as to assist managers when an alarm comprises different types of fault.
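The majority-class bias described above can be illustrated with a minimal sketch. The dataset (95 normal readings and 5 faults), the function names and the choice of random oversampling are illustrative assumptions, not a method taken from the cited survey.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes have equal counts."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw (with replacement) as many extra samples as the class is missing.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

def recall(y_true, y_pred, positive):
    """Fraction of true `positive` cases that were predicted as `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical monitoring dataset: 95 normal readings and 5 faulty ones.
labels = ["normal"] * 95 + ["fault"] * 5
samples = list(range(100))

# A degenerate model that always predicts the majority class.
always_normal = ["normal"] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_normal)) / len(labels)
fault_recall = recall(labels, always_normal, "fault")

# Rebalance the training set before fitting any real classifier.
balanced_samples, balanced_labels = random_oversample(samples, labels)
```

The degenerate model scores high accuracy while detecting no fault at all, which is why recall on the fault class (or a similar class-aware metric) is the more informative figure here. Oversampling only changes the training distribution; evaluation should still be done on untouched data.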

Another research area in data fusion and analysis emerging in industrial prognosis is Online Learning over data streams. Online Learning implies deep changes not only in the learning algorithm (e.g. incremental model updates), but also in how to handle the obsolescence of the knowledge retained by the model under phenomena that are not necessarily symptomatic of the failure to be predicted (e.g. a change in the working regime of the machinery, lack of calibration, sensor drift and similar factors). In such cases, the adoption of concept drift detection and adaptation techniques in the industrial setting has lately come under scientific debate, as they can be an efficient means for prognosis over time-evolving data streams. Indeed, subtle changes in the distribution of the data streams under faulty and faultless operation can render the predictive knowledge captured in the model catastrophically obsolete at some point in time, eventually triggering maintenance alarms when there is no such need in practice. Detecting this drift and adapting the learning algorithm to it (either actively or passively) could minimize its impact and keep the detection performance of the prognostic solution within practically admissible levels. In this context, industrial applications requiring prognosis over data streams should pay particular attention to the latest advances in recurrent concept drift, since the phenomena that cause data streams to drift usually occur repeatedly in industrial setups (e.g. recalibration or a change of machine operator). Online predictive models capable of learning from data streams subject to uncertainty should also be at the core of future research in industrial prognosis, due to the high level of uncertainty and noise that characterizes certain data sources.
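As a rough illustration of active drift detection, the sketch below implements a simplified, one-sided Page-Hinkley test that flags an upward shift in the mean of a stream. The detector choice, the parameter values (`delta`, `threshold`) and the synthetic stream are assumptions made for the example; the text above does not prescribe a specific detector.

```python
class PageHinkley:
    """Simplified one-sided Page-Hinkley test for an upward shift in the stream mean."""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change (assumed value)
        self.threshold = threshold  # alarm threshold (assumed value)
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # cumulative deviation from the running mean
        self.cum_min = 0.0          # smallest cumulative deviation seen so far

    def update(self, x):
        """Consume one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold

# Hypothetical sensor stream: the mean jumps from 0.0 to 1.0 at index 50,
# e.g. after a recalibration or a change of working regime.
stream = [0.0] * 50 + [1.0] * 50

detector = PageHinkley()
first_alarm = None
for i, value in enumerate(stream):
    if detector.update(value):
        first_alarm = i
        break
```

In an adaptive pipeline the alarm would typically trigger a model reset or retraining on recent data. Handling recurrent drifts, as suggested above, would additionally require storing and re-activating earlier models, which this sketch does not attempt.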

Finally, Transfer Learning and Domain Adaptation are also trends in Data Science that deserve further attention in industrial prognosis, since this portfolio of techniques can be an effective workaround for the scarcity of labeled prognosis data in industrial setups. In manufacturing industries present in different countries, the machinery deployed across plants is highly similar, although designs may differ depending on the provider and contexts vary in aspects such as maintenance policies, personnel skills or the quality of the processed raw material. Transfer Learning could allow a predictive prognosis model developed for a certain industrial plant to be partly reused as a starting point for predicting failures in another plant, even if differences exist between the contexts in which such plants operate.
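One simple reading of "reused as a starting point" is warm-start fine-tuning: a model fitted on a data-rich source plant initializes training on a data-poor target plant. The sketch below uses a toy linear model and synthetic data; both plants, their coefficients and the training budgets are hypothetical, and real transfer learning methods are considerably richer than this.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, w=0.0, b=0.0, steps=20, lr=0.1):
    """Fit y = w * x + b by plain gradient descent on the MSE, starting from (w, b)."""
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical source plant: plenty of data, degradation follows y = 2x + 1.
src_x = [i / 10 for i in range(11)]
src_y = [2 * x + 1 for x in src_x]
w_src, b_src = gradient_descent(src_x, src_y, steps=2000)

# Hypothetical target plant: similar machinery, shifted offset, only 3 samples.
tgt_x = [0.0, 0.5, 1.0]
tgt_y = [2 * x + 1.5 for x in tgt_x]

# Same small training budget, two different starting points.
w_warm, b_warm = gradient_descent(tgt_x, tgt_y, w=w_src, b=b_src, steps=20)
w_cold, b_cold = gradient_descent(tgt_x, tgt_y, steps=20)

# Errors are measured on the three target samples themselves (training error).
mse_warm = mse(w_warm, b_warm, tgt_x, tgt_y)
mse_cold = mse(w_cold, b_cold, tgt_x, tgt_y)
```

With the same 20 update steps, the warm-started model ends with a lower error on the target data than the one trained from scratch, because the source plant already supplied most of the relationship. The benefit depends on how similar the plants really are, and a poorly matched source can hurt rather than help.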

Prescriptive Prognosis: Complex Constraints and Realistic Objectives

Turning the focus to prescriptive prognosis, the most challenging issue encountered in practice remains the match between the formulated optimization problem and the decision making process that such a problem aims to model. Industries, particularly those related to the manufacturing of goods, are complex environments where humans and machinery coexist and interact, often without holistic, centralized management. In this context it is often the case that actions triggered by a prescriptive prognosis model do not conform to the practical criteria and/or constraints under which such actions would be manually enforced. When this happens, the developed prescriptive models fail to apply when deployed in the industrial plant and are thus left aside from managerial processes. Therefore, new working methodologies are needed to ensure that the prescriptive research hypothesis is aligned with the real requirements of the industrial process at hand. Besides, such methodologies should also account for other practical aspects that could eventually affect the design of efficient solvers, including the variability of metrics and/or constraints over time, the cost implications of decisions made by the model, or the presence of conflicting objectives in the criteria guiding such decisions (such as productivity and reliability when prescribing maintenance operations in a job shop scheduling problem).
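When objectives such as productivity and reliability conflict, a prescriptive model can present the set of non-dominated (Pareto-optimal) alternatives rather than a single answer. The following sketch filters candidate maintenance schedules by Pareto dominance; the schedule names and their scores are invented for illustration.

```python
def pareto_front(candidates):
    """Return the names of candidates not dominated by any other candidate.

    `candidates` maps a name to a tuple of objective values, all to be maximized.
    A candidate is dominated if another one is at least as good on every
    objective and strictly better on at least one.
    """
    front = []
    for name, objectives in candidates.items():
        dominated = any(
            all(o >= s for s, o in zip(objectives, other))
            and any(o > s for s, o in zip(objectives, other))
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)

# Hypothetical maintenance schedules scored as (productivity, reliability).
schedules = {
    "A": (0.90, 0.70),  # few stops: high output, lower reliability
    "B": (0.80, 0.85),
    "C": (0.75, 0.80),  # worse than B on both objectives
    "D": (0.60, 0.95),  # frequent stops: low output, high reliability
}

front = pareto_front(schedules)
```

Schedule C drops out because B beats it on both objectives, while A, B and D remain as genuine trade-offs. Leaving the final choice among them to plant staff is one way to keep the prescription aligned with the practical criteria discussed above.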

Integration of Expert Knowledge and Physics in Hybrid Prognostic Models

In terms of data fusion, there has been little discussion of efficient procedures for representing and integrating expert knowledge so that it can be considered in subsequent modeling phases. Beyond techniques for fusing the information captured at different scales and temporal resolutions from the industrial machinery and warehouse platforms, it is commonly believed that the aggregated knowledge collected over the personnel's years of experience is a valuable informational asset and a key factor for success in prognosis modeling. When the health or status indicator computed from data is representative enough to address the problem under study, the modeling stage becomes rather straightforward. More precisely, if expert knowledge of the problem to be solved and of the assets involved is available in advance, it is advisable to put the emphasis on the fusion of data and the definition of the KPIs to be measured. If expert knowledge is limited or not easily representable as a continuous or discrete variable, the emphasis must instead be placed on the modeling phase, attempting to address the analytical task in a more exhaustive manner. This can be seen as a trade-off between model complexity and a priori knowledge. Based on this principle, the attention of the research community should be directed towards the development of hybrid models capable of seamlessly incorporating the expertise fed back by industry personnel into their learning algorithms. For this purpose, models suited to multidimensional time-domain data instances should lie at the core of this research niche, such as recently reported recurrent models for sequence prediction with uncertainty and distance-based classification for time series.

A related open challenge is the blending of Data Science with principles stemming from Mechanics, Thermodynamics and other physical disciplines linked to the failure of specific industrial processes, particularly those related to the manufacture of materials (e.g. metallurgy, polymers and plastics). The hybridization of theoretical concepts with evidence learned from historical data has proven highly profitable for energy efficiency or battery life prediction, giving rise to the so-called gray-box or semi-physical modeling concept. Both sources of knowledge complement each other and may help reduce the impact of label scarcity, lack of data or insufficiently generalizable theoretical approaches to the prognosis task under analysis. However, the integration of this theoretical knowledge into prognostic models is currently done in an ad-hoc fashion, fully determined by the use case at hand. More principled studies are needed to establish under which conditions this hybridization yields significant performance gains for the model, delving into new ways to quantify the degree of innovation provided by theoretical concepts over a given prognostic dataset.
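A common gray-box pattern is to let a physical model explain what it can and fit a data-driven correction to the residual. The sketch below assumes a toy thermal model (`ambient + thermal_resistance * power`) and a wear effect that grows linearly with operating hours; the model, its constants and the measurements are all hypothetical.

```python
def physics_model(power_kw, ambient_c=20.0, thermal_resistance=0.5):
    """Toy first-principles estimate of bearing temperature (assumed constants)."""
    return ambient_c + thermal_resistance * power_kw

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical historical records: load, operating hours, measured temperature.
power = [10.0, 20.0, 30.0, 40.0]
age_hours = [0.0, 100.0, 200.0, 300.0]
measured = [25.0, 32.0, 39.0, 46.0]

# Step 1: residual that the physical model cannot explain.
residuals = [m - physics_model(p) for m, p in zip(measured, power)]

# Step 2: learn the residual from data (here, as a function of machine age).
slope, intercept = fit_line(age_hours, residuals)

def hybrid_model(power_kw, hours):
    """Physics-based estimate plus the learned, data-driven correction."""
    return physics_model(power_kw) + slope * hours + intercept

prediction = hybrid_model(50.0, 400.0)
```

Because the physics term carries most of the structure, the learned part needs only a few samples, which is the label-scarcity benefit mentioned above. Comparing the hybrid error against a purely data-driven baseline on held-out data is one way to approach the open question of when the hybridization actually pays off.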

Prognosis towards Flexible, Cost-effective Production

The digital revolution faced by industries in recent times can be thought of as an enabler for a better adaptability of manufacturing production processes and industrial assets to the dynamic conditions and requirements demanded by their consumers. Companies can even produce different products by communicating specifications to the machine. Thus, product variations can be automatically and flexibly manufactured by using well-defined standards. To this end, every single part involved in the process produces data and processes data delivered by other parts, including information related to quality, inventory and, most relevant to the current study, health monitoring. Parts continuously report their own status and that of the phase of the production line where they are installed, which requires gathering all such information through an IoT platform and centralizing it in a cloud-based system able to store and process large volumes of data. The intelligent prognostic analysis of this collected information is crucial to ensure that manufacturing industries operate robustly and cost-efficiently within highly competitive markets. By ensuring optimized prognostic decisions in terms of maintenance, highly customized and reconfigurable products can be manufactured, closely matching their specifications to customers' needs. However, a closer look at the compatibility between maintenance decisions and flexible production needs and schedules is still lacking in the literature.

In this regard, the cost effectiveness of prognostic decisions is rarely addressed in the literature, even though this criterion can determine their practical feasibility. Instead, the optimality of decisions is usually formulated in terms of productivity, energy savings, inventory saturation and other technological KPIs. As regards flexibility, the ability of a manufacturing plant to dynamically produce small yet highly customized lots by virtue of informed, data-based decisions in operations and maintenance may clash with excessive economic investments if the prognosis modeling problem is formulated without cost effectiveness among its objectives. Including this metric in the design of prognostic models (particularly those prescribing predictive maintenance actions) is promising in light of a strand of contributions on the effectiveness of maintenance investments.
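A minimal way to bring cost into a maintenance prescription is to compare the expected cost of each action given the predicted failure probability. The sketch below makes strong simplifying assumptions: preventive maintenance is taken to remove the failure risk entirely, a single decision period is considered, and all cost figures are invented.

```python
def expected_costs(p_failure, cost_preventive, cost_failure):
    """Expected cost of each action over one decision period.

    Assumes preventive maintenance fully avoids the failure, and that
    waiting costs nothing unless the failure actually occurs.
    """
    return {
        "maintain": cost_preventive,
        "wait": p_failure * cost_failure,
    }

def cheapest_action(p_failure, cost_preventive, cost_failure):
    """Pick the action with the lowest expected cost."""
    costs = expected_costs(p_failure, cost_preventive, cost_failure)
    return min(costs, key=costs.get)

# Hypothetical figures: a planned stop costs 1,000; an unplanned breakdown 20,000.
high_risk = cheapest_action(0.10, 1_000, 20_000)  # expected breakdown cost 2,000
low_risk = cheapest_action(0.01, 1_000, 20_000)   # expected breakdown cost 200
```

Under these assumptions the break-even failure probability is `cost_preventive / cost_failure`, here 0.05. This makes explicit that the alarm threshold of a predictive model is an economic parameter and not only a statistical one.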

Hardware and Communications: Industrial IoT Networks Using Fog Computing

The IoT revolution within Industry extends computing and network capabilities and minimizes the need for human interaction within industrial processes and operations. However, implementing IoT solutions is a huge transformation process, involving not only technology and products, but also a change of mindset.

Interestingly, several challenges faced by companies willing to deploy IoT networks composed of distributed edge devices are connected to the risks derived from this sharp transformational process. One key challenge has to do with the investment costs required for the IoT deployment, which could be daunting for Small and Medium Enterprises (SMEs) due to the unpredictability of the future value chain. The skills and experience of the current Information Technology (IT) staff must be carefully considered as well, since they are often insufficient to deal with the vast number of hardware and software solutions required for IoT-empowered Industry 4.0. In terms of interoperability and available standards, current IoT ecosystems suffer from the fragmentation of conventional solutions and implementation standards. Moreover, industrial IoT sensors must coexist with legacy equipment already deployed in the plant, which must be integrated into distributed IoT architectures as seamlessly and efficiently as possible. Data security, privacy and governance are also important given the vast amount of data generated by a wide variety of sources. Data ownership is a major concern, so secure access must be guaranteed, particularly for industries whose products and assets are critical in these terms. In this regard, prognostic models must be complemented by schemes and mechanisms for authenticated sensor access and data encryption/verification/integrity assurance, so that the operation of the prognostic model is robust against attacks based on the unauthorized modification/injection/removal of industrial data along their life cycle.
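As one concrete example of integrity assurance, a sensor gateway can attach a keyed hash (HMAC) to every reading so that the prognostic pipeline can reject modified or injected data. The sketch below uses Python's standard `hmac` and `hashlib` modules; the reading format, the field names and the key are hypothetical, and key distribution and encryption are outside its scope.

```python
import hashlib
import hmac
import json

def sign_reading(reading, key):
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the reading."""
    payload = json.dumps(reading, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_reading(reading, tag, key):
    """Check the tag using a constant-time comparison."""
    return hmac.compare_digest(sign_reading(reading, key), tag)

# Hypothetical shared secret and sensor reading.
secret_key = b"example-key-do-not-use-in-production"
reading = {"sensor_id": "spindle-07", "vibration_mm_s": 2.4, "timestamp": 1700000000}

tag = sign_reading(reading, secret_key)

# An attacker lowers the vibration value to hide an incipient fault.
tampered = dict(reading, vibration_mm_s=0.4)

authentic_ok = verify_reading(reading, tag, secret_key)
tampered_ok = verify_reading(tampered, tag, secret_key)
```

The authentic reading verifies and the tampered one does not. An HMAC alone does not detect removed or replayed messages, so a real deployment would also need sequence numbers or timestamps covered by the tag.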

Data capture mechanisms are already in place, and the digitalization of industrial assets, products, processes and services is ready to improve productivity, satisfaction and income through data-driven solutions. However, a wide gap remains to be bridged between real industrial equipment and its digital twins, which are required, among other uses, to optimize maintenance and operation decisions based on the predicted prognosis. Furthermore, in many cases data are used only when the equipment undergoes servicing by a field engineer. Moreover, the full integration of Edge and Cloud Computing technologies is still uncertain in many industrial sectors. Therefore, the development and relatively higher maturity of Fog Computing frameworks can ignite the digitalization process of Industry 4.0 and efficiently support IoT applications.


Diez-Olivan, Alberto, et al. "Data fusion and machine learning for industrial prognosis: Trends and perspectives towards Industry 4.0." Information Fusion 50 (2019): 92-111.
