When I started building analytical data systems and data warehouses 25 years ago, I had to sell many businesspeople on the value of their information; it wasn’t just an abstract by-product of their operational applications.  Data governance was considered a boondoggle that consultants had come up with to increase project costs.  Consequently, we had to approach the work of documenting and conforming data and mapping it to the business delicately and, more often than not, tactically.  Over time, an understanding developed about the power of data to improve and transform the organization, and data governance (or at least its value) became broadly accepted.  Today, the data generated by an organization is considered an asset that has value to both internal and external stakeholders.

The modern approach to Master Data Management was born alongside data governance to address the challenge of managing strategic data assets and, in many cases, the governance itself.  Unsurprisingly, MDM had the same perception issues as data governance (overly expensive and complicated) and had the additional burden of being fairly esoteric and technical.  While many organizations either have or are starting to establish data governance processes, MDM is still uncommon and often fragmented.

What is Master Data?

Master Data originated with simple analog systems such as library card catalogs, which supported the management of all the books or documents in the repository.  A variety of attributes for each book, document, or file folder was recorded so that the item could be found.  The identifiers used included anything from creation date or creator to object source and/or access location.

The term metadata was coined in the late 1960s, along with the early databases, to describe the same kinds of data in a digital format.  Early Master Data Management was usually tactical and focused only on key domains of data (customers, staff, products, assets, facilities, schools, etc.).  Frequently, only a small number of domains were included as departmental initiatives.  An increased appreciation for data’s value, as well as the rapid growth of organizational data, resulted in a goal of enterprise-wide MDM.

Many of the clients I have worked with over the last two decades have more than 200 data systems.  Some of these have thousands of tables and tens of thousands of data elements, with the same information existing in multiple systems.  In large school districts, for example, some data (such as students, staff, classes, rosters, tests, scores, etc.) will exist in 20 to 50 different systems, some just containing a copy of the data and others adding their own attributes.  Which source is the “system of record,” and what are the filtering and latency for that same data in other locations?  When a copy sits 2 to 5 steps away from the system of record, moved by different code and processes, implemented by different developers, and scheduled by different platforms, what transformations are taking place?  That’s a lot of metadata to create and manage.  That is where MDM applications become critical to managing all of the data effectively.
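To make those questions concrete, here is a minimal sketch in Python of the kind of metadata an MDM catalog would need to hold for one data element.  The class, the field names, and the system names ("sis", "ods", "gradebook") are all hypothetical and chosen only for illustration; the sketch also assumes each copy has exactly one upstream feed and that the feeds contain no loops.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ElementCopy:
    """One copy of a data element in one system (hypothetical structure)."""
    element: str
    system: str
    fed_by: Optional[str]  # upstream system; None marks the system of record
    latency_hours: int     # delay introduced by the feed into this system

def trace_to_record(copies, element, system):
    """Follow the feeds upstream and return (hops, system_of_record, total_latency)."""
    by_system = {c.system: c for c in copies if c.element == element}
    current = by_system[system]
    hops = 0
    latency = 0
    # Assumes a single upstream feed per copy and no circular feeds.
    while current.fed_by is not None:
        latency += current.latency_hours
        current = by_system[current.fed_by]
        hops += 1
    return hops, current.system, latency

copies = [
    ElementCopy("student_name", "sis", None, 0),
    ElementCopy("student_name", "ods", "sis", 24),
    ElementCopy("student_name", "gradebook", "ods", 12),
]
result = trace_to_record(copies, "student_name", "gradebook")
```

With these made-up feeds, the gradebook copy is two steps from the system of record and up to 36 hours behind it.  Multiply that by tens of thousands of elements and the case for tooling becomes clear.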

Styles of Master Data Management

There have been different schools of thought on how Master Data Management should be implemented; at this point, practitioners will advocate any of them depending upon an organization’s needs.  In fact, many consider them to be evolutionary phases.  These are the current industry-accepted approaches:

Registry

This is the traditional notion of MDM, where the solution collects data from the source systems, does the work of cataloging, merging/de-duplication, and cleansing, and then presents the data to all downstream systems, such as a data warehouse.  The source applications remain the system of record.  This is the model for MDM that I have been involved in the most; I am confident in saying it is the most common approach and, generally, the most cost- and effort-efficient.
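As a rough illustration of the collect, cleanse, and merge steps described above, here is a minimal Python sketch.  The record layout, the system names, and the match rule (exact name plus birth date after cleansing) are assumptions made for the example; real matching is usually fuzzier and commercial tools do far more.

```python
from collections import defaultdict

def normalize(record):
    """Cleanse one source record: collapse whitespace and lowercase the name."""
    return {
        "name": " ".join(record["name"].split()).lower(),
        "birth_date": record["birth_date"].strip(),
    }

def build_registry(source_records):
    """Group records from all source systems under one match key.

    Only pointers (system, local id) are stored, so each source
    application remains the system of record for its own data.
    """
    registry = defaultdict(list)
    for rec in source_records:
        clean = normalize(rec)
        match_key = (clean["name"], clean["birth_date"])  # assumed match rule
        registry[match_key].append((rec["system"], rec["local_id"]))
    return dict(registry)

# Hypothetical records from two source systems.
records = [
    {"system": "sis", "local_id": "S-100", "name": "Ana  Lopez", "birth_date": "2010-04-02"},
    {"system": "lunch", "local_id": "77", "name": "ana lopez ", "birth_date": "2010-04-02 "},
    {"system": "sis", "local_id": "S-101", "name": "Ben Ortiz", "birth_date": "2011-09-15"},
]
registry = build_registry(records)
```

Here the two differently formatted "Ana Lopez" records collapse into one registry entry that points back to both source systems, which is what a downstream data warehouse would consume.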

Consolidation

This approach builds upon the Registry idea.  Once all of the merging and cleansing is complete, business owners for each piece of data can review, approve, and otherwise augment the master data.  While much of the master data is managed in the source applications, the MDM platform becomes the system of record for all downstream applications.

Co-Existence

This approach builds upon the Registry and Consolidation approaches by supporting a feedback loop to the source applications so that any updates made by the data owners in the MDM platform can be integrated back into the source systems.  Downstream applications would still be fed from the MDM platform.

Centralized

To MDM purists, this is the goal and the likely source of much of the general pessimism about MDM.  With this approach, the MDM platform becomes the system of record.  Master data (the staff, customers, students, programs, assets, products, etc.) is managed within the solution and then synchronized with all other systems that need it, including the transactional systems (which were the originating systems).  If a school needs to enroll a new student, the MDM system must be involved in that transaction.  To make it seamless, the Student Management System would take the data from the user, check it in the background against the MDM platform to determine whether a record exists, and if not, create it in both systems.  In this model, the master data is known by practitioners as a “golden record.”
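The enrollment flow just described can be sketched in a few lines of Python.  Both classes below are hypothetical in-memory stand-ins, not any vendor’s API, and the match on lowercased name plus birth date is an assumption made to keep the example short.

```python
import itertools

class MdmPlatform:
    """Hypothetical in-memory stand-in for a centralized MDM platform."""

    def __init__(self):
        self._golden = {}              # match key -> master id
        self._ids = itertools.count(1)

    def find(self, name, birth_date):
        """Return the master id of an existing golden record, or None."""
        return self._golden.get((name.lower(), birth_date))

    def create(self, name, birth_date):
        """Create a golden record and return its new master id."""
        master_id = f"M-{next(self._ids)}"
        self._golden[(name.lower(), birth_date)] = master_id
        return master_id

class StudentManagementSystem:
    """Hypothetical transactional system that defers to the MDM platform."""

    def __init__(self, mdm):
        self.mdm = mdm
        self.enrollments = {}          # master id -> school

    def enroll(self, name, birth_date, school):
        # Background check against the MDM platform before creating anything.
        master_id = self.mdm.find(name, birth_date)
        if master_id is None:
            master_id = self.mdm.create(name, birth_date)
        # The local record is keyed by the golden record's id.
        self.enrollments[master_id] = school
        return master_id

mdm = MdmPlatform()
sms = StudentManagementSystem(mdm)
first = sms.enroll("Ana Lopez", "2010-04-02", "North")
second = sms.enroll("ana lopez", "2010-04-02", "South")
```

In this sketch the second enrollment reuses the golden record created by the first instead of creating a duplicate student.  The cost is that every enrollment now depends on the MDM platform being available, which is part of why this model is the hardest to implement.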

MDM Software

A comprehensive MDM solution needs all the capabilities of several different applications: Extract, Transform, & Load (ETL), Data Quality and Cleansing, Metadata Management, Enterprise Application Integration (EAI), and Data Repository.  That is a lot of functionality.  As a result, the traditional vendors for ETL and Data Cleansing are some of the leaders in MDM.  There are a variety of tools offered by more than 40 vendors which include all or some of that functionality.

This is why understanding which of the aforementioned approaches/models of MDM your organization should implement becomes so critical.  If the Registry approach is determined strategically to be all the organization requires, then less functionality is suitable.

MDM software options can be classified as follows:

BI Embedded MDM:  The commercial MDM tools I have worked with most were part of business intelligence (BI) platforms and especially ETL tools.  Think of software vendors such as Informatica, Talend, and SAP/Business Objects; this is also how IBM’s and Oracle’s solutions originated.  Such tools perform all the major functions such as capture, catalog, profiling/analysis, rules definition, cleansing, and augmenting the application data with semantic metadata (defining and describing the data in straightforward business terms).  The data collected and managed in the embedded MDM application layer then feeds into the reporting/analysis layers of the BI platform.  The same software can populate the Operational Data Store (ODS) and the Enterprise Data Warehouse (EDW), and can be a source for all downstream applications.  The issue with many of these tools is that they can be a closed system: they communicate effectively only with the other tools from the same software vendor.

Enterprise MDM:  These are platforms that may have started as ETL, BI, and related applications, which have since been integrated and now offer the full MDM technology stack (the whole enchilada).  As most technologists might expect, IBM, Oracle, Microsoft, and SAP all offer enterprise MDM solutions, and they are all considered leaders in the field (Gartner and other surveys I have seen agree).  Orchestra EBX is a pure-play MDM solution recently acquired by TIBCO (which already had MDM technologies), and it is considered a leader as well.  There are many niche players that may offer nearly all of the functionality but specialize in certain industries or pieces of the technology stack.  The reputation for high costs and complexity is deserved, but if the organization is committed to comprehensive enterprise MDM, this type of solution may be worth the investment.

Homegrown MDM:  This is without a doubt the most common type of solution for Master Data Management and the one I see the most.  These solutions frequently start as an enterprise integration initiative such as an ODS and/or EDW, and then data cleansing, semantic metadata, and data augmentation are incrementally added.  The initiative morphs into MDM over time.  In combination with some commercial ETL and cleansing components, this can be an effective and affordable option if the organization is interested in less than the full Centralized model of MDM.  The most common issue with this approach lies within the solution itself.  Built tactically over time without any vision of enterprise MDM along the way, it is often cobbled together with different technologies by different teams, and it often incurs higher costs as the organization’s data systems evolve.

So, Is It Time for MDM?

Today, it is rare for me to encounter an organization of any size that does not appreciate the value of its data, the costs of integration between applications, and the need for data governance.  Senior leadership commonly understands that data quality and governance are things their organization should invest in.  MDM is less prevalent because decision-makers lack awareness of the options or have negative perceptions of the pure Centralized model of MDM and the historical costs and complexity of the Enterprise tools.  Many organizations do not realize that they already have some (or all) of the pieces, which simply need to be utilized more fully.  Fortunately, these are easy obstacles for us to overcome.


Brian Pritzl is a Director at Clarity Partners with over 28 years of consulting, data analysis, research, and data architecture experience across a wide variety of industries, database platforms, analytical tool sets, and scope of data from project to departmental and enterprise.  Speaking to his expertise in data fundamentals, Mr. Pritzl has laid out the basics of Master Data Management (MDM) as part of an overall data organization strategy.