Harmonizing Data Collection to Utilize the Full Power of Data

Data is a powerful asset, yet its power is limited by the quality of the data and our ability to make use of that information. At Acubed, the ADAM team is collecting massive amounts of data and leveraging its power to train Machine Learning (ML) models. Within aerospace, ML models can be applied throughout the design and manufacturing processes to reduce production lead times and costs and to improve workflows, enabling OEMs like Airbus to meet the increasing demands of commercial travel (i.e., producing more aircraft). Behind those models is data, and lots of it.

Turning a high-level vision into reality

The key to maximizing the potential of data is to stage it so it can be readily analyzed while maintaining its fidelity. Our teams use a value stream mapping technique to understand the flow of components and information. This visual tool combines the physical steps with information flow to create transparency and help the team analyze the processes involved. With this view, we can identify data access points and data owners, and understand how to acquire the data, how it is created and what it means. That data stems from various places, including email, SAP (HANA), phone calls, database tables and more. It also comes in multiple formats, ranging from structured to semi-structured to unstructured and streaming data, each of which requires a different method of ingestion. After ingestion, the data goes through a “clean and combine” stage that transforms it into valuable and trustworthy data.
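To make the “clean and combine” idea concrete, here is a minimal Python sketch, not our production pipeline, that normalizes records from two differently structured sources into a common shape before merging them. The SAP field mapping and record layout are illustrative assumptions only.

```python
import json
from datetime import datetime, timezone

def from_sap_row(row: dict) -> dict:
    """Normalize a structured SAP table row (field names are illustrative)."""
    return {
        "part_id": row["MATNR"].strip(),
        "quantity": int(row["MENGE"]),
        "timestamp": row["ERDAT"],
        "source": "sap",
    }

def from_email_json(payload: str) -> dict:
    """Normalize a semi-structured email export (hypothetical format)."""
    msg = json.loads(payload)
    return {
        "part_id": msg.get("subject_part", "unknown"),
        "quantity": int(msg.get("qty", 0)),
        "timestamp": msg.get("sent_at", datetime.now(timezone.utc).isoformat()),
        "source": "email",
    }

def clean_and_combine(records: list[dict]) -> list[dict]:
    """Drop obviously invalid records and order the rest by time."""
    valid = [r for r in records if r["part_id"] != "unknown" and r["quantity"] > 0]
    return sorted(valid, key=lambda r: r["timestamp"])

combined = clean_and_combine([
    from_sap_row({"MATNR": "PART-0042 ", "MENGE": "4", "ERDAT": "2023-05-02T09:00:00Z"}),
    from_email_json('{"subject_part": "PART-0042", "qty": "2", "sent_at": "2023-05-02T11:30:00Z"}'),
])
```

Each source gets its own adapter, so adding a new ingestion method never disturbs the downstream stages.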

In our approach, we use logs as the backbone of our data platform to help us manage process complexity and capture data dynamics. We convert operational and external data logs into a distributed event store, which records data as events. Streams can be short-lived with many events or long-lived with fewer events. These distributed event stores, a.k.a. log interfaces, are reliable and fault-tolerant, support multiple subscribers and automatically balance consumer load. Working with logs makes it possible to go back in time and examine what happened, match information with other events, or perform other knowledge enrichments commonly seen in Semantic Knowledge Graphs.
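The sketch below illustrates the core property we rely on: an append-only log that consumers can replay from any offset. It is a toy, in-memory stand-in; a real deployment would use a distributed event store such as Apache Kafka.

```python
from dataclasses import dataclass, field

@dataclass
class EventLog:
    """Toy append-only log; a stand-in for a distributed event store."""
    events: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> int:
        """Record an event and return its offset in the log."""
        self.events.append(event)
        return len(self.events) - 1

    def replay(self, from_offset: int = 0) -> list[dict]:
        """Re-read history from any offset — the 'go back in time' property."""
        return self.events[from_offset:]

log = EventLog()
log.append({"type": "part_received", "part_id": "PART-0042"})
log.append({"type": "part_inspected", "part_id": "PART-0042", "result": "ok"})

# A subscriber joining later can still rebuild its state from the full history.
for event in log.replay(from_offset=0):
    print(event["type"], event["part_id"])
```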

Fig. 1: Converting operational and external data logs into a distributed event store

Connected data is more valuable data

Data is often said to be the lifeblood of a company, yet it is frequently difficult to discover, access and understand. Datasets are often poorly documented, missing semantic descriptions, explanations of attributes and so on. These challenges impair the performance of daily operations and limit analytics and artificial intelligence (AI) activities. In the data extraction process, metadata such as who created the data, when, why and under what rights should be incorporated. We introduce semantics to describe complex dependencies and additional property metadata by providing a domain ontology with standard schemas. Ontology-based data extraction works by associating event data with schema entities, such as concepts/classes and properties. The key takeaway? Make it easy to find, access and understand datasets across the organization. We live by the FAIR principles, ensuring data management follows four key attributes: Findable, Accessible, Interoperable and Reusable.
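As a simplified illustration of ontology-based extraction, the following sketch uses rdflib (one common Python choice) to associate a raw event with classes and properties from a hypothetical domain ontology, and attaches FAIR-style provenance metadata (creator, creation time, rights) via Dublin Core terms. The namespace and field names are assumptions, not our actual schema.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, DCTERMS

# Hypothetical domain ontology namespace; a real one would be Airbus-specific.
EX = Namespace("http://example.org/aero-ontology#")

def extract_event(event: dict) -> Graph:
    """Map a raw event onto ontology classes/properties, with provenance."""
    g = Graph()
    g.bind("ex", EX)
    g.bind("dcterms", DCTERMS)

    part = EX[event["part_id"]]
    g.add((part, RDF.type, EX.Component))              # class association
    g.add((part, EX.quantity, Literal(event["qty"]))) # property association

    # FAIR-style provenance: who created the data, when, under what rights.
    g.add((part, DCTERMS.creator, Literal(event["created_by"])))
    g.add((part, DCTERMS.created, Literal(event["created_at"])))
    g.add((part, DCTERMS.rights, Literal(event["rights"])))
    return g

g = extract_event({
    "part_id": "PART-0042", "qty": 4,
    "created_by": "inspection-station-3",
    "created_at": "2023-05-02T09:00:00Z",
    "rights": "internal-use-only",
})
print(g.serialize(format="turtle"))
```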

Fig. 2: Ontology-based data extraction works by associating event data with schema entities

Legacy versus data-driven business

We live in an age of innovation where companies and engineers thrive on experimenting and developing new technologies and business logic. Yet this constant evolution poses new challenges for the data industry as new solutions are launched to help organize information and host it on effective platforms. While this is certainly helpful, it also creates a cycle in which each new generation of data experts faces challenges that look new but are fundamentally the same, because they arise from vendor-specific systems. Rather than spending time leveraging the power of data through innovation, teams are stuck trying to unify that data. To escape this cycle, we propose using open standards and data-driven solutions, such as breaking application logic out of code and representing it as data, as sketched below.
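Here is a minimal, hypothetical sketch of what “representing application logic as data” can look like: business rules live in a plain data structure that can be stored, versioned and exchanged under open standards, while the code shrinks to a small, vendor-neutral interpreter. Rule and action names are illustrative.

```python
# Business rules expressed as data (these could equally live in JSON or YAML
# files under an open standard) rather than hard-coded in application source.
RULES = [
    {"when": {"status": "late"},  "then": "alert_assembly_line"},
    {"when": {"status": "early"}, "then": "reschedule_storage"},
]

def evaluate(event: dict, rules: list[dict]) -> list[str]:
    """Tiny interpreter: the logic stays in data, only the engine is code."""
    actions = []
    for rule in rules:
        if all(event.get(key) == value for key, value in rule["when"].items()):
            actions.append(rule["then"])
    return actions

print(evaluate({"part_id": "PART-0042", "status": "late"}, RULES))
# -> ['alert_assembly_line']
```

Swapping vendors then means re-hosting the rule data, not rewriting the application.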

With an effective process, data can be framed and explored further to deliver actionable insights. For example, with a high-fidelity data stream, the teams at the Airbus U.S. Manufacturing Facility have access to fast KPI reporting and recommendations for optimal material allocation, derived from onsite material inventory reviews and estimates of incoming material arrival times. With accurate and dependable data, the assembly line knows in advance if a component will arrive early or late, empowering the teams to take action and reduce the risk of interruptions.
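For illustration only, an early/late arrival check of the kind described might look as follows; the tolerance window and timestamps are hypothetical.

```python
from datetime import datetime

def arrival_status(scheduled: str, estimated: str, tolerance_min: int = 30) -> str:
    """Classify an incoming shipment as early, on-time or late (illustrative)."""
    delta = datetime.fromisoformat(estimated) - datetime.fromisoformat(scheduled)
    minutes = delta.total_seconds() / 60
    if minutes > tolerance_min:
        return "late"
    if minutes < -tolerance_min:
        return "early"
    return "on-time"

print(arrival_status("2023-05-02T09:00:00", "2023-05-02T11:30:00"))  # -> 'late'
```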

Whether data is being used to increase business efficiency, identify new strategies or train ML models, success depends on the ability to use that data effectively. Teams must know how to identify the right data, collect it, store it properly and understand how it can free the company from dependence on aging, expensive solutions.

If you’re interested in joining our team and building the future of flight, apply here.

- Joakim Soederberg