Spend Analytics: Part 4 – Steps in the Spend Analysis Process


Attaining value from spend analytics projects requires explicit goals, a good foundation of data hygiene, the right level of classification, strong analyst skills, and, last but not least, decisive improvement actions.


Planning: Setting Direction and Goals, Assessing Readiness, Project Planning


Spend analytic projects are often driven to meet key goals of the CPO and procurement organization, chiefly cost savings. However, as mentioned in Parts Two and Three, there are also many different possible uses or goals for spend analytic tools and projects, which might be driven by other executives or offices as well. The goals of each spend analytics project should be encapsulated by very specific metrics and measurable achievements, to guide the efforts and measure success. The CPO will definitely want to be able to quantify the accomplishments.

The types of data required will almost always include supplier, commodity, and spend information. But which information and which level of detail are needed? Do you need P.O. information? Down to the line item? Do you need performance information from execution systems, such as on-time delivery metrics? Do you need external data feeds to enhance internal information you have about the supplier, or to provide commodity market price information? Perhaps you need benchmarking data on what your peers are spending on specific commodities or services as a percent of revenue, or some of their procurement metrics. These requirements should be understood up front; it all comes back to the clearly stated objectives of the project.

Exposing Data Issues

Doing a spend analysis project often uncovers problems with a company’s data [1] and can be the impetus for cleaning it up and making it more complete and accessible. Common challenges may include:

  • The required data does not exist within any corporate database. It may exist only in unstructured Word or PDF files, in scanned faxes, or on paper; it may never have been recorded; or it may never have been sent by the supplier or trading partner.
  • Data is spread out over many systems, in many formats.
  • Data is dirty, not normalized, or duplicated (e.g., 10 different part numbers used for the same part, or 15 different spellings of a supplier’s name).
  • The fields exist in the database, but are unpopulated or thinly populated, or have incorrect data in them.
  • High level data exists, but the necessary detailed data required for the desired analysis is missing.
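
One way to make the "unpopulated or thinly populated" problem above concrete is to measure, for each field the project needs, what share of records actually carries a value. The following is a minimal sketch, not a feature of any particular spend analysis tool; the function name `fill_rates`, the field names, and the sample records are all hypothetical.

```python
def fill_rates(records, fields):
    """Return, per field, the share of records holding a non-empty value."""
    records = list(records)
    rates = {}
    for field in fields:
        populated = sum(
            1
            for record in records
            if record.get(field) is not None and str(record.get(field)).strip() != ""
        )
        rates[field] = populated / len(records) if records else 0.0
    return rates


# Hypothetical invoice extract; commodity codes are made-up placeholders.
sample_invoices = [
    {"supplier": "Acme Corp", "commodity_code": "C-100", "po_number": "PO-1001"},
    {"supplier": "Acme Corp", "commodity_code": "", "po_number": None},
    {"supplier": "Globex", "commodity_code": None, "po_number": "PO-1002"},
    {"supplier": "", "commodity_code": "C-200", "po_number": "PO-1003"},
]

rates = fill_rates(sample_invoices, ["supplier", "commodity_code", "po_number"])
for field, rate in rates.items():
    print(f"{field}: {rate:.0%} populated")
```

A profile like this only shows whether a field is filled in, not whether the value is correct, so it complements rather than replaces a manual review of the data.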

Before making promises to meet specific project goals, it is, therefore, critical to assess the availability and state of the data needed. It is often the case that a significant auxiliary project is needed first to get the feeder systems, databases, and data into shape. This may involve steps such as:

  • Adding the fields and data collection mechanisms required to start populating the needed data. This will still not solve the historical data problem, but provides a path for analysis capabilities moving forward.
  • OCR-scanning of existing paper documents. This can be expensive and may be justified only for high-value historical data.
  • Supplier data integration programs — via EDI, MFT, or web-based self-service portals — to automate data collection.
  • Supplier incentives for data hygiene and completeness — for supplier-provided data, it may be necessary to establish consequences, such as penalties or payment delays, for suppliers who fail to keep their data up-to-date and accurate.
  • Data management programs — person(s) dedicated to maintaining the quality and completeness of the data, in the most automated and cost-effective ways. This could be a part time responsibility, as full blown MDM programs may be affordable only for large and mature organizations.

In making these investments, teams should take a step back from the specific analytics project and assess the value for other uses. Sometimes these data improvement projects cannot be justified for the single spend analytics project, but can be justified when the value of the investment can be realized over multiple projects or in other ways by the company (such as increased customer satisfaction and reduced costs due to fewer errors).

At the same time, you (or your bosses) may decide you are not ready to make these investments, or may realize they will take some time. In that case, the priority may be switched or goals scaled back to analysis projects that are achievable with the current state of your source systems or with a more modest investment in the data cleanup and augmentation.

Loading Source Data

Once a plan is in place, a mechanism for extracting and loading the data needs to be configured. This involves identifying the source data systems and fields, mapping them to the spend analysis system, and specifying any needed transformations. Most companies use a traditional ETL (Extract, Transform, Load) tool for this step. Especially for larger projects, where you are pulling data from dozens of systems and hundreds of fields, something is bound to change between data refreshes that will impact the mappings. Mechanisms should, therefore, be set up to monitor these systems and alert the team when these changes occur.
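
The monitoring idea can be illustrated with a simple check that compares the field mapping against the fields a source system currently exposes. This is a sketch under stated assumptions: the mapping format, the function name `detect_mapping_drift`, and all field names are hypothetical, and a real ETL tool would typically provide its own schema-change alerts.

```python
def detect_mapping_drift(mapping, current_source_fields):
    """Compare a source-to-target field mapping with the source's current fields.

    mapping: dict of source field name -> target field name.
    current_source_fields: iterable of field names the source exposes today.
    """
    mapped = set(mapping)
    current = set(current_source_fields)
    return {
        # Mapped fields that no longer exist (renamed or dropped at the source).
        "missing_in_source": sorted(mapped - current),
        # Fields that appeared at the source but have no mapping yet.
        "unmapped_new_fields": sorted(current - mapped),
    }


# Hypothetical mapping defined when the project was set up.
mapping = {
    "VENDOR_NM": "supplier_name",
    "INV_AMT": "spend_amount",
    "GL_CODE": "commodity_hint",
}
# Hypothetical field list read from the source system before a refresh.
current_fields = ["VENDOR_NAME", "INV_AMT", "GL_CODE", "CURRENCY"]

drift = detect_mapping_drift(mapping, current_fields)
if drift["missing_in_source"] or drift["unmapped_new_fields"]:
    print("Mapping drift detected:", drift)
```

Running a check like this before each refresh lets the team catch a renamed or dropped field before it silently produces empty columns in the analysis.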

Making the Data Usable: Cleanse/Normalize/Classify

Now the actual work of fixing data quality problems begins. As mentioned earlier, the systemic causes of poor data quality should be addressed whenever possible, or you will keep receiving a steady stream of dirty data. For example, if there is a source system that lets the user type in any spelling they want for a supplier’s name, that system should be modified, if possible, so that it instead presents a drop-down list allowing only the correct, single version of the supplier’s name to be used. Another example is implementing barcode scanning or system-to-system integration to replace manual rekeying of data that already exists on a label or in another system. Those types of fixes constitute a series of separate projects for cleaning up the source systems’ data; they bear fruit over time in higher data quality and reduced cleansing effort.

The actual work then involves things like identifying and fixing duplicates, filling in missing data, normalizing the data to use the same units, and classification. Most systems have technology that can semi-automate this process, such as tools that recognize and correct duplicates or common misspellings, or help with the classification. Some of these can be quite sophisticated in their algorithms.
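
As a rough illustration of semi-automated duplicate detection, the sketch below groups supplier names that differ only in case, punctuation, or legal suffix. It is a deliberately simple assumption-laden example: the suffix list, the function names, and the sample names are hypothetical, and commercial tools generally go further (for instance, fuzzy matching to catch misspellings, which this sketch does not attempt).

```python
import re
from collections import defaultdict

# Hypothetical, incomplete list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation",
    "co", "company", "ltd", "limited", "llc",
}


def normalize_supplier_name(name):
    """Lowercase, strip punctuation, and drop legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", name.lower())
    tokens = [token for token in cleaned.split() if token not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def group_duplicates(names):
    """Group raw supplier names that share the same normalized form."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_supplier_name(name)].append(name)
    return dict(groups)


raw_names = ["Acme Corp.", "ACME Corporation", "Acme, Inc", "Globex LLC", "Globex"]
groups = group_duplicates(raw_names)
for normalized, variants in groups.items():
    if len(variants) > 1:
        print(f"{normalized}: {variants}")
```

Candidate groups produced this way would still need review, since two distinct companies can share a normalized name.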

Classification Taxonomies


A key step here is classifying suppliers, commodities, and services. The dimensions and depth of classification that you will decide to do depend on the goals of the project and the amount of effort and cost required. For commodities, one of the most widely used classification taxonomies is UNSPSC (United Nations Standard Products and Services Code). However, this does not meet all companies’ or projects’ requirements. As the saying goes, “The nice thing about standards is there are so many to choose from,” and that is true of classification taxonomies, which include many industry-specific or domain-specific taxonomies such as NIGP, UPC, GTIN, APN, AHFS, eCl@ss, ETIM, RNTD [2] and many others. Custom, company-specific taxonomies may also be used. Commodities may be classified by other dimensions besides the type of product or service, such as regulatory status (e.g. controlled substances); handling requirements (e.g. hazardous materials); environmental impact; and all sorts of parametric data like size, weight, dimension, power consumption, capacity, etc. Again, the possibilities are endless and it all comes back to what you are trying to accomplish.

Suppliers may be classified by industry, parent corporation, capabilities, certifications, geographies, diversity ownership status, company size, credit scores, regulatory status, number of lawsuits, and just about any other dimension you can think of.

Classification Process

Classifying is most commonly done by first running the data through a rules-based classification engine, and then manually checking and correcting the results, often with a low-cost offshore team. The rules in the classification engine can take the form of correlating text strings. For example, “CA” = “Cellulose Acetate.” There may be further rules for exceptions: for example, if the vendor = “Permabond,” then “CA” = “Cyanoacrylate.” These rules can consider other factors such as which business unit is purchasing the product, in which geography, for which project code, or just about any other factor that might change the meaning and classification. Typically the solution provider has built up a base library of these classification rules over time and allows customer-specific rules to be layered on top.
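
The “CA” example can be sketched as a small rule set in which a rule with more conditions overrides a more general one. This is only an illustration of the idea, not how any vendor's engine is actually built; the `Rule` structure, the `classify` function, the "most specific rule wins" tie-break, and the sample items are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    text: str               # token to look for in the item description
    category: str           # classification to assign when the rule matches
    conditions: tuple = ()  # extra (attribute, value) pairs that must also match


def classify(item, rules):
    """Return the category of the most specific matching rule, or None."""
    tokens = item.get("description", "").upper().split()
    best = None
    for rule in rules:
        if rule.text.upper() not in tokens:
            continue
        conditions_met = all(
            str(item.get(attribute, "")).lower() == value.lower()
            for attribute, value in rule.conditions
        )
        # Assumption: a rule with more conditions overrides a more general one.
        if conditions_met and (best is None or len(rule.conditions) > len(best.conditions)):
            best = rule
    return best.category if best else None


RULES = [
    Rule("CA", "Cellulose Acetate"),                            # base rule
    Rule("CA", "Cyanoacrylate", (("vendor", "Permabond"),)),    # vendor exception
]

print(classify({"description": "CA sheet 2mm", "vendor": "Other Supplier"}, RULES))
print(classify({"description": "CA adhesive 20g", "vendor": "Permabond"}, RULES))
print(classify({"description": "steel bolts", "vendor": "Other Supplier"}, RULES))
```

Further conditions such as business unit, geography, or project code would fit the same pattern as additional attribute and value pairs, and customer-specific rules could be appended to a provider's base list.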

Classifying is a labor-intensive process. There is a point of diminishing returns for the level of completeness (% of commodities or suppliers classified) and accuracy (% of items correctly classified). No system ever gets to 100% coverage and accuracy. Some vendors may provide coverage guarantees or goals (often in the range of 80% to 95%) and, less commonly, accuracy guarantees (in the range of 70% to 90%).
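
The two measures can be computed directly from the definitions given above. The sketch below assumes coverage is counted over all items and accuracy is estimated from a manually verified audit sample; the function names and sample data are hypothetical, and a vendor's guarantee may define the measures differently (for example, weighted by spend).

```python
def coverage(items):
    """Share of items that have been assigned any category."""
    if not items:
        return 0.0
    return sum(1 for item in items if item.get("category")) / len(items)


def accuracy(audit_sample):
    """Share of audited items whose assigned category matches the verified one.

    audit_sample: list of (assigned_category, verified_category) pairs.
    """
    if not audit_sample:
        return 0.0
    correct = sum(1 for assigned, verified in audit_sample if assigned == verified)
    return correct / len(audit_sample)


# Hypothetical data: 10 items, 8 of them classified.
items = [{"id": n, "category": "Fasteners" if n <= 8 else None} for n in range(1, 11)]
# Hypothetical manual audit of 5 classified items, 4 of them correct.
audit_sample = [
    ("Fasteners", "Fasteners"),
    ("Adhesives", "Adhesives"),
    ("Fasteners", "Adhesives"),
    ("Packaging", "Packaging"),
    ("Packaging", "Packaging"),
]

print(f"coverage: {coverage(items):.0%}, accuracy: {accuracy(audit_sample):.0%}")
```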

End-User Classification and Data Cleanup

Almost all modern spend analysis systems provide a way for end users to report misclassifications or incorrect data, ideally at the point they encounter it while doing their job. There is usually a mechanism to submit the recommended correction right then and there, which kicks off a workflow that routes the recommended change for approval. Once approved by the data administrator, the new rules are implemented and take effect at the next data refresh. The system should also provide a closed-loop workflow for correcting problems at the source data systems. By monitoring these issues, systemic data quality problems can be identified and resolved.

There is a tradeoff in spending more time to get the data to a desired level of classification. Because end users and buyers can make corrections themselves, some advocate doing less upfront cleanup and letting end users identify a larger portion of the problems while they use the system. The end users are the most qualified to make context-aware corrections at the point of use, drawing on their accumulated domain-specific and circumstance-specific knowledge.

Frequency of Data Refresh

Traditionally, spend analytics data is refreshed quarterly, or in some cases monthly. Some systems use less manual labor, enabling weekly or even daily refreshes. More frequent refreshes can be especially useful for certain types of analytics, such as identifying supplier risk, where you are looking for early warning of problems.

Finding Answers: Analysis

Once the data has been loaded, cleansed, and categorized, the fruits of all that labor can start to be realized. The analysis should be guided by priorities explicitly set as part of the project goals and objectives. This might include things like looking first at the largest categories of spend or areas where the largest savings can be most easily or quickly realized.


With the flexibility and visual interfaces in modern systems, there is a virtually infinite number of ways to slice, dice, view, filter, drill down, visualize, and analyze the data. This is where the experience, skill, and creativity of the analyst will make a difference in how quickly and effectively answers are found. It is useful to create training, mentoring, and knowledge-sharing mechanisms so that individual capabilities can become institutional capabilities.

Making Improvements: Execute/Monitor

This is where the rubber meets the road. If you don’t do anything differently as a result of your analysis, then you have to ask, “What was the point of the project?” For example, once you have identified off-contract spend, then you need to put in place programs and mechanisms to get some of that spend onto contracts. The spend analytic tools can help in focusing those efforts by understanding things like why specific spend is off-contract (there may be legitimate reasons for portions of it), where to focus your efforts (e.g. organizations, commodities, or individuals with the highest share of off-contract spend), and so forth. Then the organization needs to put in place steps such as communication, education, changes to approval workflows, new rules in the procurement engine, and so forth, to move more spend onto the contracts.
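
To show how the "where to focus" question might be answered, the sketch below ranks groups (here, business units) by their share of off-contract spend. It is an illustrative example only: the transaction layout, the `on_contract` flag, the function name, and the figures are hypothetical, and a spend analysis tool would normally offer this as a built-in view.

```python
from collections import defaultdict


def off_contract_share(transactions, key):
    """Rank groups by the share of their spend that is off-contract.

    transactions: iterable of dicts with `amount`, `on_contract`, and `key` fields.
    Returns a list of (group, share) pairs, highest share first.
    """
    total = defaultdict(float)
    off_contract = defaultdict(float)
    for transaction in transactions:
        group = transaction[key]
        total[group] += transaction["amount"]
        if not transaction["on_contract"]:
            off_contract[group] += transaction["amount"]
    shares = {group: off_contract[group] / total[group] for group in total if total[group] > 0}
    return sorted(shares.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical transactions.
transactions = [
    {"business_unit": "IT", "amount": 4000.0, "on_contract": True},
    {"business_unit": "IT", "amount": 1000.0, "on_contract": False},
    {"business_unit": "Facilities", "amount": 1000.0, "on_contract": True},
    {"business_unit": "Facilities", "amount": 3000.0, "on_contract": False},
    {"business_unit": "Marketing", "amount": 500.0, "on_contract": True},
    {"business_unit": "Marketing", "amount": 500.0, "on_contract": False},
]

ranking = off_contract_share(transactions, "business_unit")
for unit, share in ranking:
    print(f"{unit}: {share:.0%} off-contract")
```

The same function could be keyed on commodity or requester instead of business unit. A high share is a prompt to investigate rather than a verdict, since some off-contract spend has legitimate reasons.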

The people executing the changes are often not the same people who did the analysis. An analyst may identify the areas of opportunity for spend consolidation; the sourcing professional then puts together the sourcing strategy, creates and executes the actual sourcing events, and negotiates the contracts. The analyst will, therefore, need to package the information in a way that can be understood and absorbed by the intended audience. The analyst may also be an active participant in the ongoing implementation projects to provide further inputs and insights.

Monitoring and Measuring Improvements

Last but not least, companies need to monitor the actual results. Being able to measure and demonstrate achievement of the stated goals is the most direct way of validating project success. Then this success should be built on. An organization that gets good at the spend analysis processes described here — setting goals, pulling together the data, finding the answers, making changes, and measuring the improvements — becomes a ‘continuous improvement machine,’ where success builds upon success. Isn’t that the type of organization you would like to be a part of?


[1] These types of projects often become the driver for cleaning up data and systems, as described in this supplier management case study.
[2] NIGP (National Institute of Governmental Purchasing), UPC (Universal Product Code), GTIN (Global Trade Item Number), APN (Australian Product Number), AHFS (American Hospital Formulary Service), eCl@ss (Classification and Product Description), ETIM (Electro-Technical Information Model), RNTD (RosettaNet Technical Dictionary).
