The Publisher/Consumer Bottleneck: How Quality Management Systems Can Address the Limitations of Data Warehouse Models

Your organization’s approach to data access and analytics may be resulting in massive inefficiencies and delays that can impact your operation at every level. And worse – you may not even realize it.

Many companies operate in “data warehouse” information environments, dumping data into a single location and doling it out on demand.[i] Data can only be analyzed as quickly as it can be accessed, and limited IT staffing often keeps needed information from being accessed and put to use. The problem is the use of flat-architecture data lakes: storing vast amounts of data in raw formats renders it nearly inaccessible. The data is disorganized, lacking source information, priority, or the other structural elements that make retrieving information a simple process.

The trouble trickles down from there. Bad data structuring leads to bad data analytics, and slow data access pushes companies to focus increasingly on finding better tools rather than solving the problem at its systemic root. In other words, the problem is self-reinforcing. Breaking out of this cycle is essential, and it requires recognizing exactly what the problem is.

The trouble with the data warehouse or data lake model is that it depends on a traditional publisher/consumer arrangement, and that arrangement creates bottlenecks. Source systems publish to a data warehouse, and end users consume the data, but retrieving it means writing specific queries in a structured query language, so only trained professionals can make those requests. This is especially true as queries grow more complicated; this “man-in-the-middle” approach is usually the only way data can be retrieved at all. Outside of simple requests covered by a small number of pre-defined queries, the data cannot be obtained without the mediation of an IT professional.[ii]
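To make the bottleneck concrete, here is a minimal Python sketch of the kind of hand-written, schema-specific query that a plain-language request like “complaints by product line for last quarter” turns into. The table names, column names, and SQLite database are invented for illustration; the point is that an end user who doesn’t know the warehouse schema or the query language cannot produce this themselves, so the request lands on an IT specialist’s desk.

```python
import sqlite3

# Hypothetical warehouse schema: answering "complaints by product line
# for Q3" requires knowing which tables hold the data, how they join,
# and how dates are stored -- knowledge the end user doesn't have.
QUERY = """
SELECT p.product_line,
       COUNT(c.complaint_id) AS complaint_count
FROM   complaints AS c
JOIN   products   AS p ON p.product_id = c.product_id
WHERE  c.opened_date BETWEEN '2023-07-01' AND '2023-09-30'
GROUP  BY p.product_line
ORDER  BY complaint_count DESC;
"""

def run_report(db_path: str):
    """Executed by the IT specialist who mediates the request."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for product_line, count in run_report("warehouse.db"):
        print(f"{product_line}: {count}")
```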

This can take weeks, and the delay compounds: if the desired information isn’t in the retrieved data, the request cycle repeats as often as necessary until the end user finally gets what they needed in the first place. Not only is each individual request unnecessarily time-consuming, but the repetition makes it more so – and that is only a single analytics case. The same inefficiency plays out case after case throughout the company, consuming valuable man-hours and slowing analytics work, often to a near-standstill.[iii]

This process is complicated further by the fact that many large companies have multiple business intelligence systems all feeding data into a single data lake. Different sectors of the same company, each needing information formatted or organized to suit its own local reporting and analytics, often work at cross-purposes with other departments while still feeding their data into the same disorganized pool. It is a kind of structural chaos, caused by information entering the data warehouse in a wide variety of formats, organizational schemes, and potentially even languages; while some overlap is to be expected, every data source is necessarily different from every other data source. As a result, the data an end user needs isn’t always in an accessible format, which leads to faulty reports.
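As a rough illustration – every field name, format, and value below is invented – the sketch shows two feeds describing the same kind of event with different field names, date formats, and status vocabularies. Pooled raw into the same lake, the records share no structure a report could rely on, and the two date strings may not even describe the same day.

```python
# Two hypothetical source systems feeding the same data lake.
erp_record = {
    "RecordID": "ERP-0042",
    "OpenedOn": "07/03/2023",        # MM/DD/YYYY
    "Status":   "OFFEN",             # German-language system: "open"
    "Site":     "Frankfurt",
}

crm_record = {
    "ticket_id":   "CRM-1187",
    "opened_date": "2023-03-07",     # ISO 8601
    "state":       "open",
    "location":    "Frankfurt, DE",
}

# A report built directly on the pooled records has no shared keys to
# group on and no way to know the two dates use different conventions.
pooled = [erp_record, crm_record]
common_fields = set(erp_record) & set(crm_record)
print(f"Fields shared by both sources: {common_fields or 'none'}")
```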

Simply put, the data warehouse model does not scale to large enterprises in terms of hardware, platform, or future planning.

The end result is a forced focus on tool selection: data in these disorganized masses can only be accessed by the tools designed to use it, which further complicates and slows the analytics process. Couple these issues of data discovery, organization, and retrieval with the need for analysis itself, and the problem becomes clear: inaccessible data leads to late reporting and suboptimal decisions.

The emphasis must be on putting efficient, accessible data structures in place from the outset, in which data sources are organized and classified and the data between them is structurally reconciled, creating an environment characterized by both information agility and business agility. A well-structured data environment is a prerequisite for analytical tools to have any utility; by making data accessible to flexible analytics, companies can engage in the kind of rigorous internal self-analysis that allows optimal, data-driven business decisions.[iv]
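Continuing the earlier invented example, here is a minimal sketch of what “structurally reconciled” can look like in practice: each source is registered with a mapping into one canonical record shape, so every record lands in the same form no matter where it came from. The mappings, field names, and status vocabulary are assumptions for illustration, not a prescribed design.

```python
from datetime import datetime

# Hypothetical per-source mappings into one canonical record shape.
SOURCE_MAPPINGS = {
    "ERP": {
        "id": "RecordID",
        "opened": ("OpenedOn", "%m/%d/%Y"),
        "status": ("Status", {"OFFEN": "open", "GESCHLOSSEN": "closed"}),
    },
    "CRM": {
        "id": "ticket_id",
        "opened": ("opened_date", "%Y-%m-%d"),
        "status": ("state", {}),  # already uses the canonical vocabulary
    },
}

def reconcile(source: str, record: dict) -> dict:
    """Translate a raw source record into the canonical schema."""
    m = SOURCE_MAPPINGS[source]
    date_field, date_fmt = m["opened"]
    status_field, status_map = m["status"]
    raw_status = record[status_field]
    return {
        "source": source,
        "id": record[m["id"]],
        "opened": datetime.strptime(record[date_field], date_fmt).date(),
        "status": status_map.get(raw_status, raw_status),
    }

erp = {"RecordID": "ERP-0042", "OpenedOn": "07/03/2023", "Status": "OFFEN"}
crm = {"ticket_id": "CRM-1187", "opened_date": "2023-03-07", "state": "open"}
print(reconcile("ERP", erp))
print(reconcile("CRM", crm))
```

Once every source is classified and mapped this way, the end user’s “complaints by product line” question no longer depends on knowing each system’s quirks, and the IT middleman stops being the only route to an answer.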

 
