Analytics Advantage – Part 1A: Democratization

What is the Analytics Advantage and What Keeps Companies From Realizing It?

Abstract

We explore the kinds of data being produced by transportation and logistics systems, how that data can be used to create a data-driven enterprise, and the substantial obstacles to achieving an analytic advantage.

Article

This article is an excerpt from the first report: The Democratization of Analytics for Transportation and Logistics.
A copy of the full report can be downloaded here.

This Part 1A is the first in a multi-part series on using analytics in transportation and logistics to achieve a competitive advantage:

  • Part One: The Democratization of Analytics for Transportation and Logistics — Here in Part One, we explore the kinds of data being produced by transportation and logistics systems, how that data can be used to create a data-driven enterprise, the substantial obstacles to achieving an analytic advantage, and how those obstacles can be overcome.
  • Part Two: Analytics for Private Fleet and Driver Performance — In Part Two, we look at how analytics combines real-time location data with orders, plans, proof-of-delivery, vehicle data, and more to drive significant improvements in fleet and driver performance.
  • Part Three: Analytics for Improving Carrier Performance and Leveraging Trade Data — In the third and final part, we discuss how analytics can improve carrier performance while substantially reducing costs, and how trade data can help importers and exporters gain competitive insights, manage supply chain risk, optimize total landed costs, and more.

The Analytics Advantage

As companies continue to digitize their supply chain and logistics operations, they are beginning to recognize that they are sitting on a wealth of information with enormous, yet largely untapped, potential to increase their performance and improve their customer service. There is a growing number of examples of companies gaining significant value from the data already available to them. Not only do their internal systems generate large volumes of relevant data, but external sources can also be leveraged to give a comprehensive view of their business, their supply chain, and even their entire industry. This level of insight is profoundly changing how companies use their supply chain and logistics operations and the value those operations deliver.

Supply chain and logistics systems generate a wealth of valuable data. Core planning and execution systems, such as transportation management systems (TMS), route planning, real-time visibility, mobile, and telematics systems, are prime data generators. Companies that import or export goods internationally often use global trade management systems to classify goods, submit filings, and ensure compliance, creating additional valuable supply chain and logistics data. The data from all of these systems needs to be brought together into a cohesive, interlinked set to provide deep insights into operational performance and customer service. Companies that can use this data intelligently become data-driven enterprises that realize a powerful ‘analytics advantage.’

Figure 1 – Using Logistics-related Data to Create an Analytics Advantage

Data-driven Organizations Realize an Analytics Advantage

Data-driven, analytics-capable organizations are able to ask the right questions and extract tremendous value from the mountains of data they already have. They can find very specific ways to improve their purchased transportation or fleet performance, evaluate how well they serve their customers and what it costs to do so, understand and optimize the total landed cost of their international sourcing, improve their compliance with regulatory mandates, and much more. Becoming data-driven fundamentally changes an organization’s culture, its decision-making processes, and the role that metrics play.

Table 1 – Characteristics of Traditional vs. Data-Driven Organizations

The Democratization of Analytics

What really sets a data-driven organization apart is how widely and deeply the use of data and analytics is diffused throughout the enterprise. Analytics should not be the exclusive domain of a few data scientists isolated in ivory towers. Data-driven insights, and the ability to ask questions of the data, should be available throughout the organization, wherever decisions are made. Analytics is democratized by infusing it into the systems and processes used by professionals throughout the organization.

Analytics should be accessible to all: the people on the front lines making day-to-day decisions and getting things done, as well as the managers and executives responsible for functional and business unit effectiveness. Broad access to good analytics ‘lifts the blinders off’ for an organization. Employees see much more specifically where and how performance can be improved, and they gain the tools to identify and correct execution issues early. The organization becomes much more agile and competitive, quickly sensing issues and adjusting course before those issues become bigger problems. Employees are empowered to solve problems directly and effectively. As a result, morale, individual performance, and job satisfaction improve. Democratization is a powerful force.

What Keeps Companies from Realizing Their Analytics Advantage?

What Makes Data Wrangling So Time-Consuming?

Activities[1] that make data wrangling challenging and time-consuming include:

  • Define and clarify the use cases — Working with end users to define clearly which problems you are trying to solve, which kinds of entities are involved (people, transactions, products, batches, loads, etc.), what data is likely to be relevant, and where that data might be found.
  • Obtain access to the data/ensure proper security — Obtaining access to confidential data and putting mechanisms in place to keep it secure. This may mean exposing only derived inferences while the confidential data itself remains hidden and inaccessible. Figuring all this out, setting it up properly, and doing the necessary security testing and audits takes time.
  • Resolve key entities/data elements — This includes deduplication, which can be a non-trivial exercise when a single entity has multiple identities or identifiers (such as different spellings of a name, or different part numbers for the same item).
  • Identify the relationships between the data — When pulling data from many different sources, the relationships among the data need to be understood and incorporated into the analytic model. This can involve identifying natural keys and how they relate across sources. The data owners will need to confirm that the relationships are correct.
  • Discover any changes to syntax or semantics — If the syntax or semantics of any of the data has changed over time, that needs to be taken into account, or any results incorporating data from those time periods will be incorrect.
  • Find and incorporate dispersed data about actions, outcomes — Data about actions taken and the outcomes that resulted may be dispersed across individual spreadsheets, emails, and other documents. Finding and incorporating this data is time-consuming.
  • Identify and adjust for biased data – Algorithms are only as good as the data they are given. Selection bias occurs when the available sample data is not an accurate representation of the real world.[5] There are many other types of bias[6] in data as well. These biases can be adjusted for, but doing so is labor-intensive.
  • Feature engineering — Feature engineering is the process of selecting and transforming data into attributes (features) for a machine learning model to use. It is one of the most important and time-consuming activities in successful machine learning.
  • Correlation/redundancy analysis — Redundant data should be removed to reduce the complexity of the model. For example, you might have separate fields for the same attribute in different languages, or a numeric code that fully represents a text attribute.
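To make the entity-resolution step above more concrete, here is a minimal sketch in Python, assuming carrier names arrive as free text from several systems. The function names, the list of legal suffixes, and the 0.9 similarity threshold are illustrative assumptions rather than any particular product's method; the sketch uses only the standard library (`re` and `difflib.SequenceMatcher`).

```python
import re
from difflib import SequenceMatcher

# Illustrative (assumed) legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "company"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def resolve(names: list[str], threshold: float = 0.9) -> dict[str, str]:
    """Map each raw name to a canonical name (the first one seen in its cluster).

    A name joins the first existing cluster whose normalized key has a
    difflib similarity ratio >= threshold; otherwise it starts a new cluster.
    Greedy pairwise matching is O(n^2), so this only suits small lists.
    """
    clusters: list[tuple[str, str]] = []  # (normalized key, canonical raw name)
    mapping: dict[str, str] = {}
    for raw in names:
        key = normalize(raw)
        for cluster_key, canonical in clusters:
            if SequenceMatcher(None, key, cluster_key).ratio() >= threshold:
                mapping[raw] = canonical
                break
        else:  # no sufficiently similar cluster was found
            clusters.append((key, raw))
            mapping[raw] = raw
    return mapping


carriers = ["Acme Freight, Inc.", "ACME Freight LLC", "Acme Frieght", "Blue Line Logistics"]
print(resolve(carriers))
```

Here ‘ACME Freight LLC’ and the misspelled ‘Acme Frieght’ both resolve to ‘Acme Freight, Inc.’, while ‘Blue Line Logistics’ stays separate. Fuzzy matching like this can also merge genuinely distinct entities with similar names, which is why the proposed matches still need review by the data owners.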
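Checking a candidate natural key before joining two sources, as described in the relationships bullet above, can be sketched in the same way. The example assumes hypothetical load records from a TMS and proof-of-delivery records from a mobile app, both carrying a `load_id` field (an illustrative name). It reports key values that are duplicated in the parent source, where the key should be unique, and orphan values in the child source that have no matching parent.

```python
from collections import Counter


def key_report(parent_rows: list[dict], child_rows: list[dict], key: str) -> dict[str, list]:
    """Check whether `key` can serve as a join key between two sources.

    duplicates: key values appearing more than once in the parent source,
                where the key is expected to be unique.
    orphans:    key values in the child source with no matching parent row.
    """
    parent_counts = Counter(row[key] for row in parent_rows)
    duplicates = sorted(k for k, n in parent_counts.items() if n > 1)
    orphans = sorted({row[key] for row in child_rows} - set(parent_counts))
    return {"duplicates": duplicates, "orphans": orphans}


# Hypothetical extracts: loads from a TMS, proof-of-delivery records from a mobile app.
tms_loads = [{"load_id": "L1"}, {"load_id": "L2"}, {"load_id": "L2"}]
pod_records = [{"load_id": "L1"}, {"load_id": "L3"}]
print(key_report(tms_loads, pod_records, "load_id"))
```

In this made-up data, `L2` appears twice in the TMS extract and `L3` has a delivery record but no load. Both are the kinds of discrepancies the data owners would need to explain before the two sources can be joined reliably.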
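One common way to adjust for selection bias is post-stratification weighting. Each record is weighted by its group's share of the population divided by that group's share of the sample, so the reweighted sample matches known population proportions. The sketch below is a minimal illustration that assumes the population shares are known; the customer segments and scores are made up, and this is one possible correction rather than the method the report prescribes.

```python
from collections import Counter


def poststratification_weights(sample_groups: list[str],
                               population_shares: dict[str, float]) -> dict[str, float]:
    """weight(g) = population share of g / sample share of g, for each group in the sample."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}


def weighted_mean(values: list[float], groups: list[str], weights: dict[str, float]) -> float:
    """Mean of `values`, weighting each record by the weight of its group."""
    total_weight = sum(weights[g] for g in groups)
    return sum(v * weights[g] for v, g in zip(values, groups)) / total_weight


# Made-up example: satisfaction scores from opt-in customers, who skew younger.
groups = ["under_35", "under_35", "under_35", "35_plus"]
scores = [9, 8, 10, 5]
population = {"under_35": 0.4, "35_plus": 0.6}  # assumed known from the full customer base

w = poststratification_weights(groups, population)
print(sum(scores) / len(scores))         # unweighted mean
print(weighted_mean(scores, groups, w))  # reweighted mean
```

With this data, the unweighted mean is 8.0 but the reweighted mean is 6.6, because the under-35 group makes up 75% of the sample but only 40% of the population. Reweighting corrects only for the variable you stratify on, and it cannot recover groups that are missing from the sample entirely.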
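A simple redundancy check for the kinds of fields described in the last bullet is to look for column pairs that map one-to-one, so that each column fully determines the other. The sketch below assumes rows are Python dictionaries with hypothetical column names. It does not cover correlation between numeric columns, which would need a statistical measure instead.

```python
from itertools import combinations


def determines(rows: list[dict], a: str, b: str) -> bool:
    """True if every value of column `a` maps to exactly one value of column `b`."""
    seen: dict = {}
    for row in rows:
        # setdefault records the first b-value seen for this a-value
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True


def redundant_pairs(rows: list[dict], columns: list[str]) -> list[tuple[str, str]]:
    """Column pairs that map one-to-one, i.e. each column fully determines the other."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if determines(rows, a, b) and determines(rows, b, a)
    ]


# Hypothetical shipment records with the same attribute stored three ways.
shipments = [
    {"mode_code": 1, "mode_en": "Truckload", "mode_es": "Carga completa", "weight_kg": 12000},
    {"mode_code": 2, "mode_en": "LTL", "mode_es": "Carga parcial", "weight_kg": 800},
    {"mode_code": 1, "mode_en": "Truckload", "mode_es": "Carga completa", "weight_kg": 15000},
    {"mode_code": 3, "mode_en": "Parcel", "mode_es": "Paquetería", "weight_kg": 20},
]
print(redundant_pairs(shipments, ["mode_code", "mode_en", "mode_es", "weight_kg"]))
```

Here the numeric mode code and the English and Spanish labels are flagged as mutually redundant, while `weight_kg` is not. On small samples, coincidental one-to-one mappings can appear (for example, between two columns that happen to be unique in every row), so flagged pairs are candidates for review rather than automatic deletions.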

As more and more competitors become data-driven, those who don’t do so will be at a major disadvantage. Many companies are constrained in their use of analytics due to a set of interrelated challenges:

  • Data Science Talent Shortage — While there are shortages of IT talent across the board, data scientists and big data analytics specialists top the list as the scarcest.[2]
  • The IT Bottleneck — For organizations lucky enough to have some in-house talent, the backlog[3] of projects can stretch for months or years.
  • Data Wrangling Challenges — Data wrangling typically consumes 50%-80% of data scientist/analytics project resources, with precious little time and resources left for building models and doing analysis.

Data Science Talent Shortage

The shortage of data scientists has made it hard for even deep-pocketed companies to hire and retain an adequately staffed in-house analytics team. LinkedIn’s 2018 Workforce Report said, “Demand for data scientists is off the charts.” The report singled out data scientists over all the other jobs.

The number of data scientist jobs grew more than 6.5-fold from 2012 to 2017[4] and continues to grow at over 50% per year. Most estimates indicate there are more than twice as many data science job openings as there are qualified candidates. The consensus is that demand for these skills will continue to outpace the supply of qualified candidates for many years to come.

The IT Bottleneck

Hiring good data scientists is only part of the battle. Most IT departments are already stretched thin and are being asked to do more with less. On average, over 70% of their time is consumed by maintaining and supporting existing systems rather than implementing new functionality or delivering new types of analytics and reports. IT project backlogs often extend beyond a year (in some cases, several years). Analytics projects simply have to get in line. For a business trying to keep up, that is the opposite of agile.

Data Wrangling Challenges

Data wrangling involves finding, accessing, organizing, cleaning, and enriching data from many different sources into a cohesive data set that is usable for analytics.[7] The effort required to do this is usually vastly underestimated. For more details on the activities involved and what consumes all that time, see the sidebar ‘What Makes Data Wrangling So Time-Consuming?’

Given all these challenges, individuals often manually gather whatever data they can and ‘grind out answers’ in Excel. They can use spreadsheets to try to find underperforming drivers, carriers, and so forth, to get a better handle on what is going on. Yes, they can get answers this way, but it is a very manual approach. A more scalable and sustainable approach is needed. Partnering with the right transportation and logistics solutions provider can be the answer for many companies.

In Part 1B of this series, we look at how solution provider capabilities can be leveraged to overcome some of these challenges.

_____________________________________________________

[1] For more on these activities, see What is Data Wrangling and Why Does it Take So Long.
[2] In the Harvey Nash/KPMG CIO survey, CIOs were asked “Which functions do you feel suffer from a skills shortage?” Presented with a list of 22 different IT-related functions, Big Data/Analytics topped the list in both last year’s survey (What’s Driving Data Science Hiring in 2019?) and this year’s survey (see the chart for question 16 in the results).
[3] IT backlogs are not just due to the workload that analytics projects place on data scientists, but to demands on all IT staff from a variety of sources, such as maintenance and support of existing systems (which consume on average over 70% of IT budget and resources), massive implementations or major upgrades to ERP or other systems, integration and data cleanup efforts, and so forth.
[4] See LinkedIn’s 2017 U.S. Emerging Jobs Report.
[5] For example, suppose you have rich data about a subset of customers, but that subset is not a representative sample of your broader customer base. Perhaps you have certain data only from opt-in customers, and they may be younger, or have a different mix of persona types, than your overall customer base. For more, see Picking Favorites: A Brief Introduction to Selection Bias.
[6] Other types of bias in the data include seasonal bias, linearity bias, confirmation bias, recall bias, survivor bias, observer bias, and reinforcement bias.
[7] Data wrangling may also be used to prepare data for AI and machine learning applications.


