Mastering the Internet of Things with Master Data Management


Many organizations have struggled to turn the IoT’s large, fast-moving datasets into a cogent business use case that justifies investing in this expression of big data. If the principal challenge is making sense of this continually generated streaming data in real time, the solution is as straightforward as it is effective: Master Data Management (MDM).

“IoT data in itself is transactional in nature,” posited Naveego CTO Derek Smith. “So, IoT data itself is not master data. However, what people don’t realize is that what master data does is provide context for the IoT data.”

The context furnished by master data, and by solutions relying on MDM technologies, makes it faster to understand what the IoT’s semi-structured data means for concrete business use cases in commonplace domains like customer and product data.

Solidifying a set of master data for any domain relevant to an IoT business application involves various aspects of data discovery, data profiling, data quality, records management, and data modeling.

Perfecting these data management dimensions gives organizations a blueprint with which to contextualize IoT data since, according to Smith, “the IoT data is kind of useless without the context that master data brings to it throughout the whole ecosystem.”

Data Discovery, Data Profiling

Oftentimes, the context yielded by MDM is necessary to understand IoT data within the larger scope of analytics. In the shipping industry, for example, which has taken on newfound popularity in the past couple of months, MDM is critical to analyzing sensor data from deliveries to understand, and anticipate, customer needs. “The way that you get insights from the IoT data is you relate it to the master data that it relates to,” Smith explained. “So, if you want to see the number of packages by customer in the state of Michigan, for example, you have to relate all that IoT [sensor] data back to the master data in order to produce that.”
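The Michigan example Smith describes amounts to a join between sensor events and master records. What follows is a minimal sketch of that idea, not any vendor's implementation; the customer records, the scan events, and the function name packages_by_customer are all hypothetical, invented here for illustration.

```python
from collections import Counter

# Hypothetical master data: one golden record per customer, keyed by ID.
customers = {
    "C001": {"name": "Acme Supply", "state": "MI"},
    "C002": {"name": "Lakeside Foods", "state": "MI"},
    "C003": {"name": "Buckeye Parts", "state": "OH"},
}

# Hypothetical IoT events: each package scan carries only a customer key,
# so the state is unknown until the scan is related to master data.
scans = [
    {"package_id": "P1", "customer_id": "C001"},
    {"package_id": "P2", "customer_id": "C001"},
    {"package_id": "P3", "customer_id": "C002"},
    {"package_id": "P4", "customer_id": "C003"},
    {"package_id": "P5", "customer_id": "C999"},  # no matching master record
]


def packages_by_customer(scans, customers, state):
    """Count scanned packages per customer name for one state."""
    counts = Counter()
    for scan in scans:
        master = customers.get(scan["customer_id"])
        # Scans without a master record cannot be attributed to any state.
        if master is not None and master["state"] == state:
            counts[master["name"]] += 1
    return dict(counts)


print(packages_by_customer(scans, customers, "MI"))
# {'Acme Supply': 2, 'Lakeside Foods': 1}
```

The scan for C999 drops out of the result, which illustrates the point of the quote: without a master record to relate to, the sensor reading cannot be placed in any state or attributed to any customer.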

Modern MDM platforms help users understand IoT data with automated data discovery capabilities predicated on data profiling. Organizations can simply connect to data sources and, courtesy of a combination of machine learning and static algorithms, automatically learn what type of data they contain, what the data looks like, and its content. Such information is critical to mapping these sources to target systems like data lakes. When leveraging various weather data sensors for retail or trading opportunities, for example, organizations can use MDM’s “auto discovery and profiling to help see the way different sensors report their information, then map that to a [uniform] schema of whatever sensor data that you want,” Smith mentioned.
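To make the weather sensor scenario concrete, here is a rough sketch of profiling two feeds and mapping them to one schema. It assumes two hypothetical stations that report temperature under different field names and units; the profiling here is a simple type inventory, far simpler than the machine learning Smith alludes to, and the mappings are written by hand rather than discovered automatically.

```python
def profile(records):
    """Return, for each field, the set of Python type names observed."""
    summary = {}
    for record in records:
        for field, value in record.items():
            summary.setdefault(field, set()).add(type(value).__name__)
    return summary


# Hypothetical feeds: the same reading reported in two different shapes.
station_a = [{"temp_f": 68.0, "ts": "2020-05-01T12:00:00Z"}]
station_b = [{"temperature_c": 20.0, "timestamp": "2020-05-01T12:00:00Z"}]

# Hand-written mappings: target field -> (source field, converter).
MAPPINGS = {
    "station_a": {
        "temperature_c": ("temp_f", lambda f: round((f - 32) * 5 / 9, 2)),
        "observed_at": ("ts", str),
    },
    "station_b": {
        "temperature_c": ("temperature_c", float),
        "observed_at": ("timestamp", str),
    },
}


def to_uniform(record, mapping):
    """Reshape one source record into the uniform target schema."""
    return {
        target: convert(record[source])
        for target, (source, convert) in mapping.items()
    }


print(profile(station_a))
print(to_uniform(station_a[0], MAPPINGS["station_a"]))
print(to_uniform(station_b[0], MAPPINGS["station_b"]))
```

Once both feeds pass through to_uniform, downstream systems such as a data lake only ever see temperature_c and observed_at, regardless of how each sensor originally reported.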

Data Quality

The mapping referenced by Smith touches on a number of preeminent facets of data management, including data integration, data modeling and, perhaps most importantly when considering data governance, data quality. In competitive platforms, automated data quality checks are a natural output of the data discovery process, which becomes the basis of data quality suggestions. These intelligent hubs “can say we detected this data, 99 percent of it’s in this pattern; do you want to ensure that all the data that comes through in this field is in this pattern?” Smith remarked. Oftentimes, transformations relevant to data quality are part of the mapping required to standardize data for use in a single data model.
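The kind of suggestion Smith describes can be approximated by reducing each value to its character shape and checking how dominant the most common shape is. The sketch below is one simple way to do that, assuming a hypothetical phone number field and a 90 percent threshold chosen for illustration; the functions shape and suggest_rule are invented here and do not reflect how any particular platform detects patterns.

```python
from collections import Counter


def shape(value):
    """Reduce a string to its shape: digits -> 9, letters -> A, rest kept."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def suggest_rule(values, threshold=0.9):
    """Suggest the dominant pattern if enough values follow it, else None."""
    if not values:
        return None
    counts = Counter(shape(v) for v in values)
    pattern, hits = counts.most_common(1)[0]
    share = hits / len(values)
    if share < threshold:
        return None
    outliers = [v for v in values if shape(v) != pattern]
    return {"pattern": pattern, "share": share, "outliers": outliers}


# Hypothetical field contents: nine conforming values and one stray format.
phones = [f"555-01{i:02d}" for i in range(9)] + ["5550109"]

rule = suggest_rule(phones)
print(rule["pattern"], rule["outliers"])
# 999-9999 ['5550109']
```

A platform could present the returned pattern to a user as a proposed rule and, if accepted, reject or repair the outliers as new data arrives in that field.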

According to Lore IO CEO Digvijay Lamba, such mapping involves standardizing “the column names and the value sets.” Implicit in this capability is input from users “where you can add your own business rules and transform rules in order to merge and reshape the data as needed,” Lamba commented. Additional data quality measures are available via APIs. For example, organizations can “tie into a third-party address verification API and then use information from that to alter the data pipeline,” Smith disclosed.
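Standardizing column names and value sets, with room for user-supplied rules, might look something like the following sketch. The alias table, the value set for state, and the title_case_name rule are hypothetical examples made up for illustration; the third-party address verification step Smith mentions is deliberately left out, since it would depend on a specific external service.

```python
# Hypothetical alias table: source column name -> standard column name.
COLUMN_ALIASES = {
    "cust_nm": "customer_name",
    "CustomerName": "customer_name",
    "st": "state",
    "State": "state",
}

# Hypothetical value sets: per standard column, variant -> canonical value.
VALUE_SETS = {
    "state": {"michigan": "MI", "mich.": "MI", "mi": "MI"},
}


def standardize(record, extra_rules=()):
    """Rename columns, normalize known values, then apply user rules."""
    out = {}
    for column, value in record.items():
        name = COLUMN_ALIASES.get(column, column)
        if isinstance(value, str):
            value = value.strip()
            value = VALUE_SETS.get(name, {}).get(value.lower(), value)
        out[name] = value
    # User-defined business rules run last, each taking and returning a record.
    for rule in extra_rules:
        out = rule(out)
    return out


def title_case_name(record):
    """Example user rule: present customer names in title case."""
    if isinstance(record.get("customer_name"), str):
        return {**record, "customer_name": record["customer_name"].title()}
    return record


raw = {"cust_nm": " acme supply ", "st": "Michigan"}
print(standardize(raw, extra_rules=[title_case_name]))
# {'customer_name': 'Acme Supply', 'state': 'MI'}
```

Keeping user rules as plain functions applied after the built-in steps is one possible design; it lets business logic be added without touching the alias or value tables.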

Golden Records

If contextualizing IoT data against the golden record of MDM systems is the foundation for applying the IoT to business use cases, creating—and maintaining—that golden record within MDM becomes doubly important. The capacity to match and merge records is critical to producing a golden record that’s a “pristine version,” Smith acknowledged. “Once we know what that pristine version is we’ll deliver it back to your source systems but also to your analytical systems all around your environment.” Contemporary methods for mastering data involve:

  • Machine learning: Relying on a combination of unsupervised and supervised learning techniques, certain MDM systems are “building [a model] up over time as it sees more and more, and it uses the matches that it creates to then apply to the next match,” Smith noted. Machine learning also helps with inexact matches, such as “if you have two data sources where doctor names are actual names and they’re not exact matches, and you need to merge them,” Lamba said.
  • Static algorithms: Matching presents a particularly compelling use case for static algorithms. According to Profisee CTO Eric Melcher, these algorithms are part of a larger process that effectively “forms a group” of records based on detecting relationships between them. This approach leverages an in-memory engine to index records and rapidly match them.
  • Survivorship: Survivorship is an advanced form of matching and merging. Merging “is more sophisticated: I’m going to build one version of this data, how do I choose the value for it?” Smith indicated. Survivorship involves “all this logic that you can define to automatically populate your golden record with the best data available,” Melcher specified. Best of all, survivorship allows organizations to draw different fields from different records into a single super record. Organizations can automatically choose “the phone number from the CRM but the address from the ERP,” Melcher noted, or values from any other available sources.
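The matching, grouping, and survivorship steps in the list above can be sketched in a few lines. This version uses a static string-similarity measure from Python's standard difflib module rather than machine learning or an in-memory indexing engine, and its records, 0.8 threshold, and per-field source priorities are hypothetical values chosen only to echo Melcher's CRM and ERP example.

```python
from difflib import SequenceMatcher

# Hypothetical source records for the same domain from two systems.
records = [
    {"source": "CRM", "name": "Dr. Jane Smith", "phone": "555-0100", "address": ""},
    {"source": "ERP", "name": "Jane Smith", "phone": "", "address": "12 Main St, Detroit, MI"},
    {"source": "ERP", "name": "Robert Jones", "phone": "555-0199", "address": "9 Oak Ave"},
]

# Hypothetical survivorship logic: preferred source order for each field.
SOURCE_PRIORITY = {
    "name": ["CRM", "ERP"],
    "phone": ["CRM", "ERP"],
    "address": ["ERP", "CRM"],
}


def similar(a, b, threshold=0.8):
    """Static match rule: case-insensitive similarity ratio over a threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_records(records):
    """Greedily place each record in the first group whose lead name matches."""
    groups = []
    for record in records:
        for group in groups:
            if similar(record["name"], group[0]["name"]):
                group.append(record)
                break
        else:
            groups.append([record])
    return groups


def survive(group):
    """Build a golden record: per field, first non-empty value by priority."""
    golden = {}
    for field, priority in SOURCE_PRIORITY.items():
        rank = {source: i for i, source in enumerate(priority)}
        ranked = sorted(group, key=lambda r: rank.get(r["source"], len(rank)))
        golden[field] = next((r[field] for r in ranked if r[field]), "")
    return golden


for group in group_records(records):
    print(survive(group))
```

For the two Jane Smith records, the golden record takes the phone number from the CRM and the address from the ERP, while Robert Jones stays in a group of his own. A production system would need a more careful grouping strategy than this greedy pass, which compares only against the first record of each group.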

Integrating, Aggregating Data

Contextualizing the IoT with MDM ultimately requires the ability to integrate, and in certain conditions aggregate, data from these respective systems. The aforementioned functions for maintaining golden records take on renewed importance when issuing real-time updates of streaming or sensor data. Some of the better platforms in this space use real-time plug-ins for change data capture to incorporate IoT input into MDM with low-latency matching and merging. Smith articulated a use case in which there’s an IoT app frequently changing data in a database and the overarching MDM system “can watch for that change and as soon as say, a customer updates their address from a web portal and it makes a change in that database, [MDM] can see that that change occurred as it’s being saved, and then deliver it to all the systems in real time because it can determine that’s the record that changed.”
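The address-change scenario Smith outlines can be illustrated with a small in-process sketch. Real change data capture typically reads a database's transaction log or uses triggers; here the change event is simply passed in by hand, and the MasterHub class, its method names, and the warehouse subscriber are hypothetical constructs for illustration only.

```python
class MasterHub:
    """Toy hub: keeps master records and fans out only what changed."""

    def __init__(self):
        self.records = {}       # key -> current master record
        self.subscribers = []   # callables taking (key, changed_fields)

    def subscribe(self, callback):
        """Register a downstream system to receive changes."""
        self.subscribers.append(callback)

    def on_change(self, key, new_values):
        """Apply a captured change and deliver the changed fields downstream."""
        current = self.records.get(key, {})
        changed = {f: v for f, v in new_values.items() if current.get(f) != v}
        if not changed:
            return {}  # nothing actually changed, so nothing is delivered
        self.records[key] = {**current, **changed}
        for deliver in self.subscribers:
            deliver(key, changed)
        return changed


hub = MasterHub()
warehouse = {}  # stands in for any downstream analytical system
hub.subscribe(lambda key, changed: warehouse.setdefault(key, {}).update(changed))

hub.on_change("C001", {"name": "Acme Supply", "address": "1 Old Rd"})
hub.on_change("C001", {"name": "Acme Supply", "address": "2 New Rd"})
print(warehouse["C001"])
# {'name': 'Acme Supply', 'address': '2 New Rd'}
```

On the second event only the address differs from the stored record, so only the address is delivered, which mirrors the idea that the hub can determine exactly which record, and which part of it, changed.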

This use case underscores the expansion of traditional MDM into a newfound role as an integration hub or orchestration mechanism that ultimately masters data from source systems (such as the IoT) for almost any downstream one. Within this paradigm, the overarching system is an integration layer that leverages both ETL and ELT to “move data into a data lake and make sure that it’s staying up to date and accurate,” Smith said. Formally integrating data, of course, involves rectifying schema differences so that data conforms to a common data model. Convincing options in this space achieve this objective with “a no-code UI to automatically map a company’s data into a target model,” Lamba maintained.

Winning Together

The relationship between MDM and the IoT is symbiotic. Utilizing IoT data is a critical means of modernizing MDM and keeping it relevant in today’s increasingly distributed data environments. Conversely, MDM supplies the apposite context needed to quickly understand the significance of IoT data and incorporate it into larger data-driven processes.

More importantly, perhaps, the pairing of these applications also points to the growth of MDM beyond simply a static hub. The data integration capacity Smith referred to, in which platforms delivering MDM operate as integration layers for transforming, mastering, and ultimately replicating data into repositories, could very well represent the future of these hubs. If so, MDM will find itself embedded not only in foundational processing for the IoT, but in almost every dimension of data-driven applications imaginable.