Data Strategy: Getting from Defense to Offense — A User’s Guide


Arrayo spoke with industry executives and experts about the keys to success in data governance. We talked to Chief Data Officers, Chief Digital Officers, Chief Financial Officers, and experts in finance, pharma, biotech and healthcare. Our mission was to find the answer to one simple question: what works?

The people we spoke with come from a variety of organizations: some are young, agile startups, while others are mature Fortune 500 corporations. Everyone we interviewed believes that data is a business asset. They all agree in theory, but how that belief manifests itself in the real world takes many forms. In our previous research project, we discussed data culture and “data nirvana”. All interviewees want to implement a robust data strategy; the challenge is how to get there and how to find the right balance between data defense and data offense.

Implementing a data strategy is an achievable goal, and it often stems from the delivery of key tactical initiatives. When the data office is new, it may have been created to deal with tactical problems. In this case, the scope of the first deliverables is probably known: keep the shop running, deal with regulatory demands, make sure the analysts are happy, or fix the reporting. There will be a lot of “projects” on your plate; these will mostly be tactical fixes to get things done fast. The main question remains: how can you pivot from tactical short-term fixes towards strategic long-term initiatives that bring a competitive advantage? How do you move from defense to offense?

In this article, we present a series of ideas to help you refocus the data office from a provider of tactical solutions into a strategic enabler. Each idea comes from discussions with C-suite executives who have deployed a diverse range of successful approaches. The suggestions progress from immediate tactical initiatives towards overall strategy, that is, from defense to offense. Defense is about protecting the enterprise: responding to specific regulatory requirements, fixing data, or unifying specific datasets to tackle a specific business issue. Offense is about creating new insights, leveraging what has already been built, creating value with qualified and trusted data, enabling machine learning and AI, and supporting the expansion of analytics across domains.

Keep your Tactical Solutions Close and your Strategy Closer

Strategy gurus teach that you need a well-defined strategy before you start to implement one. Embrace the tactical, but don’t lose sight of your strategic goals. If you are new to the data office, you will need to get your arms around the data landscape, the data culture, and the various pain points, and to learn who the key players are. As you do this, start thinking about your strategic goals. Keep your strategy simple, powerful, and explainable.

Set some easy talking points around your strategy. These can be used in management presentations, executive communication decks, workshopping ideas, and as an elevator pitch. Here are some samples:

  • Data democratization: make sure good data is available to everyone, on demand, with little or no delay.
  • Trust the data! How much confidence do your stakeholders have in their data?
  • Instant time-to-market for data: eliminate manual touches and automate.

Make certain each strategic goal is data-centric and achievable. You will have to deal with a bewildering array of applications, data lakes, data warehouses, reporting environments, and legacy systems, but you will be better off if your data strategy is focused on a set of system-agnostic goals. At this point, it’s best not to include “replace Old System X with New System Y” as a goal.

Focus on the Business

So, how do you get from defense to offense? First, make sure everyone understands that the Chief Data Officer (CDO) is a business-oriented role. As you take on new tactical issues, deliver business value by pivoting from an IT-driven role to a business-driven one. The tactical solutions you implement need to support your business “data customers”, whoever they may be. In finance, these could be risk, analytics, regulatory and reporting groups, or customer-facing roles. In pharma and biotech, they may be pre-clinical scientists, translational scientists, RWD/RWE (real-world data/real-world evidence) data scientists, marketing and sales groups, or management reporting. Your data clients are not IT.

Start Measuring the Bad Data Tax

You hired people to do analytics. You may have some great analysts who are unable to focus on the role they were hired for since they need to spend time and resources on data collection, data correction, data cleaning, data curation, and data manipulation. This preparation effort should be measured and communicated. One of our respondents conducted an internal study on time and resources and found that analysts spent 80% of their time on obtaining and preparing data, and 20% on actual analytics. This result was a real eye-opener for the C-suite, who had assumed that the time proportions were the reverse (20% on data prep, 80% on analytics).

Other CDOs have also calculated the cost of bad data. This is called the “bad data tax”, and each group may define it differently. One way is to estimate the time spent waiting for data, fixing data, manipulating data, and cleaning data until it is fit for purpose. If you take a conservative estimate of the time, number of people involved and salary levels, you can compute the dollar value of the “bad data tax”. Include other factors such as post-hoc data fixing and overrides, and you can approach a true picture of the cost of bad data. Let every manager know how much bad data is costing his or her group to strengthen your buy-in.
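
The estimate described above is simple arithmetic, and a rough sketch may help make it concrete. The example below is a minimal illustration, not a method used by any of the interviewees: the team names, headcounts, salaries, and time shares are invented, and the `TeamEstimate` class and `bad_data_tax` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class TeamEstimate:
    """Hypothetical inputs for one group; every figure below is illustrative."""
    name: str
    headcount: int
    avg_annual_salary: float      # a fully loaded cost could be used instead
    share_of_time_on_prep: float  # fraction of time spent waiting for, fixing, or cleaning data


def bad_data_tax(team: TeamEstimate) -> float:
    """Annual dollar cost of the time a team spends getting data fit for purpose."""
    return team.headcount * team.avg_annual_salary * team.share_of_time_on_prep


# Invented example teams (assumptions, not survey results).
teams = [
    TeamEstimate("Risk analytics", headcount=10, avg_annual_salary=120_000, share_of_time_on_prep=0.5),
    TeamEstimate("Regulatory reporting", headcount=6, avg_annual_salary=100_000, share_of_time_on_prep=0.25),
]

for team in teams:
    print(f"{team.name}: ${bad_data_tax(team):,.0f} per year")

total_tax = sum(bad_data_tax(team) for team in teams)
print(f"Total: ${total_tax:,.0f} per year")
```

Reporting the figure per team, rather than only as a company-wide total, matches the advice to let every manager see what bad data costs his or her own group. Post-hoc fixes and overrides could be added as further terms once they can be estimated.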

Nail Operations

In the early days, you need to show some wins. While the goal of the data office might be to implement a robust data quality and data governance strategy, you may first want to make sure that the middle office and support teams have a strong data foundation. If you want to show your value to the organization, make sure the people who run the day-to-day operations have rock-solid operational best practices for data management. This will help streamline and improve their daily business processes, which is something everyone will notice.

Get the nuts and bolts in place: if the data wheels are running smoothly and without manual interventions, you will get people’s attention. For instance, if you create runbooks (documentation and instructions) for production jobs, an inventory of applications and systems, or high-level information flow diagrams, people in IT and other groups will notice and see value. Other examples mentioned by the experts we spoke with were tools that help people understand and visualize data feeds, data sources, data flows and the data lifecycle in very concrete ways. You can also bring value to the teams who run the daily processes by creating tools and workflows that help them manage changes in their data. For instance, you can work towards good change controls, intake control, and data offboarding procedures.

One of the first touchpoints with other groups is a data quality issue log. That is a great entry point to see where the data pain lies. If you can get a standard data issue management process in place and follow up with plans for change management, you will begin to set a solid foundation. Once the foundational pieces are in place, you can begin to think more about what a data strategy will look like for your enterprise.

In the life sciences, next-generation sequencing (NGS) involves complex data processing and cleansing steps that can have significant downstream effects, both good and bad. For example, a scientist may receive results from an informatics group that simply don’t make sense, and which cause them to question the analytics, the assays or even the entire study. One CDO we interviewed was having NGS data processed by a third party, but their scientists kept questioning the results. Since the company had implemented advanced data hygiene and a robust data strategy, they were able to use informatics experts to analyze the file dumps from the third party and identify a key calibration error that was skewing their entire lab’s analytics. Without the robust data strategy and data hygiene policies, it would have been impossible to track down where things were going wrong in that data lifecycle.

Find the right people who will benefit from operational efficiency. They are the ones who have a million spreadsheets and who can’t wait for a three-day close to get the “official” version of the data. In analytics, it’s the people who spend 80 percent of their time getting the data ready and 20 percent of the time doing analysis, instead of the other way around. Get their data wrangled and automated and use that as a wedge to drive towards a data strategy. As one of our banking C-level contacts says, “people [CDOs] start with operational pain due to volume and risk — but a trader may pitch the wrong trade if there’s an error in the data.” In other words, the priority will typically be around easing pains encountered by people who are running operational processes due to the high volume of activities and risk associated with these. However, there are other risks related to errors in data that can have big, high-profile negative impacts.

Manage the Data Lifecycle Well

One place where tactical can segue into the strategic is managing the data life cycle. The terminology of a “data life cycle” may make eyes glaze over, but everyone understands data pain — bad data, late data, stale data, conflicting answers depending on who you ask, endless reconciliations, and requests that can’t be met for data details that contributed to a result. You can solve some “data pain” if you start to implement good change control in your data processes. When a system that houses the data is updated or replaced, your data customers want to understand what it means to them. Will changes upstream break their reports? Will the new system give them everything they rely on now? You can help them control and manage the change at the data level so that they will understand how this system change is going to rock their world.

For example, remember what happened when the euro replaced legacy European currencies? Every system in every firm had to have a plan for the transition. The old currencies had to be deprecated and replaced by the euro; conversion rates had to be managed; historic transactions had to be maintained, and new transactions had to correlate to them. Your internal data landscape may have changes that are not quite as global and drastic, but by providing guidance, assistance, change control processes and tools to help your data customers manage and track data change, you’re smoothing the road to more strategic services.

Look for other data life cycle services your office can perform. Besides change control, data provisioning is a key area. Find the groups in your world who are hurting from a “data drought” because there isn’t a smooth provisioning process. In plain English: they can’t get the data they need. The data office can find the right data for them and make it easy to locate, available when needed, and fit for purpose. In HealthTech, this means providing FAIR (Findable, Accessible, Interoperable, Reusable) data quickly to the right people. Introducing and supporting a good provisioning process to automate high-quality, trusted data flows for new data clients will get you noticed.

Another component of the data life cycle mentioned several times during our interviews is a good data validation process. In HealthTech, you may need to validate that an algorithm is doing what is expected. There may be a machine learning algorithm that is too complex to show the clinicians why it is doing what it is doing. In such a case, you can add a manual validation step in which experts look at the output to make sure it makes sense and that the data is accurate. The experts confirm proper capture and transformation of information based on source documents, and they test the selected data against the official documentation that explains what the algorithm is supposed to do. A specific example is a complex algorithm that is used to identify patients who could benefit from a new treatment: you want to make sure your experts can validate the input and output data. In financial services, an example is the transaction testing performed on critical data elements (CDEs) in reports that are considered high-risk. A team of experts will sample and test transactions to ensure the accuracy of the data used to prepare regulatory reports. This can involve tracing individual transactions back to source documents such as the original contracts.
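
The sample-and-trace idea behind transaction testing can be sketched in a few lines. This is a simplified illustration under stated assumptions: the record layout, the field names, and the `sample_and_test` function are hypothetical, the data is invented, and in practice the comparison against source documents is done by experts rather than by an equality check.

```python
import random


def sample_and_test(report_records, source_records, fields, sample_size, seed=0):
    """Sample transactions from a report and compare selected fields to the source of record.

    Returns a list of (transaction_id, field, reported_value, source_value) mismatches.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced for review
    transaction_ids = sorted(report_records)
    sampled_ids = rng.sample(transaction_ids, min(sample_size, len(transaction_ids)))

    mismatches = []
    for txn_id in sorted(sampled_ids):
        reported = report_records[txn_id]
        source = source_records.get(txn_id, {})  # a missing source record fails every field
        for field in fields:
            if reported.get(field) != source.get(field):
                mismatches.append((txn_id, field, reported.get(field), source.get(field)))
    return mismatches


# Invented example: figures as they appear in a report vs. in the original contracts.
report = {
    "T1": {"counterparty": "Acme Corp", "notional": 1_000_000},
    "T2": {"counterparty": "Globex", "notional": 500_000},
    "T3": {"counterparty": "Initech", "notional": 250_000},
}
contracts = {
    "T1": {"counterparty": "Acme Corp", "notional": 1_000_000},
    "T2": {"counterparty": "Globex", "notional": 5_000_000},
    "T3": {"counterparty": "Initech", "notional": 250_000},
}

mismatches = sample_and_test(report, contracts, ["counterparty", "notional"], sample_size=3)
for txn_id, field, reported_value, source_value in mismatches:
    print(f"{txn_id}: {field} is {reported_value} in the report but {source_value} in the contract")
```

Recording the seed alongside the results is one way to let a reviewer reproduce exactly which transactions were tested.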

Fix the Report

The first words you might hear from your data customers are “fix my report.” Be sure to communicate that “fix the report” actually means “fix the data”. You must connect the dots between “fixing reports” and the good data hygiene that results from a robust data strategy. When you can show people, in concrete ways, that you fixed their report by fixing specific aspects of the data, you can start talking about best practices and strategy. People want best practices, but sometimes they need to see them in action before the buy-in happens.

One of the CDOs we interviewed has implemented a report certification process as a service performed by his group. They started with one critical report and certified the data behind each figure on the report. There is a formalized process to validate the data underlying a report, and human expertise is also brought in to validate the data used to calculate the figures. This is a report-centric exercise that involves drilling into the underlying data and validating it in order to certify the report. Once all the figures in the report are understood and the details documented, the report is deemed certified.

They started with a couple of user groups and soon the snowball got rolling. New groups queued up to get their reports certified, too. This is a great example of how a data team can step in, show value, get traction and attract new customers. Furthermore, while going through the certification process, each of these groups learned a lot about their data landscape and gained knowledge about the underlying issues causing inefficiencies and breaks in their reporting processes. They saw how problems such as data domain overlaps, manual interventions, tactical band-aids, and black boxes affected the content and preparation of their reports.

Break the Silos

Once you’ve got a few wins in the “fix the data” arena, you can leverage the knowledge gained and start to move towards strategic goals such as data democratization. In other words, it’s time to break the data silos. In most organizations, the data is hard to use and is not being used to its full potential. Most people in your organization think data is the same thing as the individual system, but once you’ve put down a few of the foundational items already mentioned, you can start to show how data is not tied to one system. This is a culture shift, and it comes slowly.

One of the most powerful things a CDO can do is to integrate disparate data sources. Focus on solving a specific problem by linking datasets to create new answers. It may be about bringing translational data or RWD into discovery in biotech, or a reconciliation problem between the front and back offices in financial services. Whatever specific problem you have chosen to tackle, you have probably already found inefficiencies and different ways of managing the same piece of data. You may have already spent months or years gathering data across different systems. Furthermore, you don’t want to build a new system from scratch. You can, however, break the silos in other ways.

Start by taking a few data domains and begin linking known datasets together. You may wish to evaluate some of the newer, scalable tools that would support this data unification effort, such as Tamr. Life science companies may use tools like Tamr to combine data and perform schema mapping, especially with CDISC (Clinical Data Interchange Standards Consortium) data for translational use. Financial services companies may use data unification tools to master their customer data for KYC (know your customer) and AML (anti-money laundering) compliance, to build a customer 360 view, to reconcile trades, or to aggregate and classify siloed spend data. In all cases, the payoff is huge.

One piece of advice from a few of the interviewees is to start with unifying data to tackle a specific business challenge. The cleansed and organized dataset you have created can now lead directly to more strategic projects by creating new insights. This is the type of success that you will want to showcase to other data customers and your management to show them the value of unifying data across the board. One tip: your firm is probably already using machine learning to analyze data and make business predictions, but you can also leverage machine learning for less glamorous tasks such as cleansing, organizing, and linking datasets.

Catalog Good Content

A catalog is a powerful tool. You don’t even have to create a separate initiative to create a catalog — you can integrate creating and maintaining a data catalog as part of all your other initiatives. Think of the Library of Congress — without a good system and catalog, no one would be able to find a specific title from the millions of books inside. Datasets are the same. Once you have the framework of a good catalog, your stakeholders will find ways to put it to use. For example, analytics groups may have datasets in house, but the team goes external because they can’t easily see what is available. Even if datasets are decentralized, a centralized catalog that has a good interface and is easy to use becomes foundational for other initiatives. Get your inventory catalogued in one place so everyone can see it. It doesn’t have to be fancy — shared spreadsheets are fine in the beginning.

In the life sciences, pharma and biotech companies are consistently reinvesting in procured data sources and spending time curating and wrangling data sets that may already be available. Establishing a strong data catalog of data assets helps move towards more FAIR data practices. Reducing redundant data curation efforts and repeated procurement of external data assets is a big win.
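
A first catalog of the kind described above can be very small. The sketch below assumes a handful of invented entries and a hypothetical `find_datasets` helper; it stands in for a shared spreadsheet rather than for any particular catalog product.

```python
# Each entry records what a colleague needs to know before asking for the data.
# All dataset names, owners, and tags are invented for illustration.
catalog = [
    {
        "name": "customer_master",
        "owner": "Client Data team",
        "location": "data warehouse",
        "description": "Mastered customer records used for onboarding",
        "tags": {"customer", "kyc"},
    },
    {
        "name": "rwd_claims_2019",
        "owner": "RWE data science",
        "location": "data lake",
        "description": "Procured real-world claims data",
        "tags": {"rwd", "external", "claims"},
    },
]


def find_datasets(entries, keyword):
    """Return the names of catalog entries whose name, description, or tags mention the keyword."""
    keyword = keyword.lower()
    return [
        entry["name"]
        for entry in entries
        if keyword in entry["name"].lower()
        or keyword in entry["description"].lower()
        or keyword in {tag.lower() for tag in entry["tags"]}
    ]


print(find_datasets(catalog, "claims"))
print(find_datasets(catalog, "kyc"))
```

Even at this size, a search like this is enough for a team to check what is already in house before procuring or curating the same data again.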

Start to build other content. A data glossary of terminology is useful to get others involved. Think of it less as a data dictionary and more like a Google Translate for data elements and data points. Create something that works like a wiki, and that will get people involved in the exercise. Help your content creators get comfortable both using and contributing to the glossary, and they won’t be able to live without it.

If you use unstructured data or external data, spend time on categorizing that data and governing it before it gets into your environment. Upfront categorization and documentation go a long way to laying the foundation for strategic initiatives such as machine-readable data. Set standards such as naming conventions, metadata standards, and reference data standards. Work towards robust master data.
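
Standards are easier to sustain when they can be checked automatically. As a small illustration, the sketch below tests column names against one possible naming convention (lowercase words separated by underscores). The convention, the pattern, and the `nonconforming_names` function are assumptions chosen for the example; your own standard may differ.

```python
import re

# Assumed convention: lowercase letters and digits, words separated by single underscores.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def nonconforming_names(column_names):
    """Return the names that do not follow the assumed naming convention."""
    return [name for name in column_names if not NAME_PATTERN.fullmatch(name)]


# Invented column names from an incoming external dataset.
incoming_columns = ["trade_id", "TradeDate", "notional_usd", "cust name"]
print(nonconforming_names(incoming_columns))
```

Running a check like this at intake, before the data enters your environment, keeps the fix cheap.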

Once you get more mature with the building blocks of an ontology (this concept is explained in the next section), you can connect them to create powerful data tools. Remember: build the foundation first and add complexity later. Make sure you can sustain what you have.

Bring Data to the Business Case

As part of the “data offense”, a CDO should incorporate the data vantage point in the business case and in the project planning process. Data aspects should be considered starting from project inception. It takes time to get there, and you will need to show the value of your early involvement in critical initiatives before more stakeholders are convinced that you should be at the negotiation table from day one.

A way to get started would be to find the top priorities of your most important stakeholders, understand the goals each initiative is trying to achieve, and then spend some time understanding what data (and metadata) will be needed to support each initiative. Even more important, you should figure out what kind of data holes or roadblocks could jeopardize the success of each initiative. The goal is to create a data plan that supports the goals of the top three initiatives and make sure everyone is on board with the data strategy.

A data ontology is a great way to address your specific business case. An ontology encompasses a representation, formal naming and definition of your triples — subject, predicate, object — that substantiate your business case. As new ontologies are made, their use will improve problem solving within that domain. If you can show each step and how it contributes to the goal, stakeholders will understand why you need to get the foundational building blocks in place first. Remind everyone that you need to start with a glossary, a dictionary and a taxonomy: these tools will enable the development of an ontology that will serve your business case well.
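
One way to picture the triples described above is as plain (subject, predicate, object) records. The sketch below is illustrative only: the entity and relationship names are invented, the `Triple` type and `match` function are hypothetical, and a real ontology would normally be expressed in a dedicated standard such as RDF or OWL rather than in Python tuples.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # "object" in ontology terms; named obj to avoid shadowing the Python builtin


# Invented statements for a hypothetical patient-selection business case.
triples = [
    Triple("Patient", "receives", "Treatment"),
    Triple("Treatment", "targets", "Biomarker"),
    Triple("Patient", "has", "Biomarker"),
]


def match(statements, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every part that was specified."""
    return [
        t
        for t in statements
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# What do we know about patients? What relates to biomarkers?
print(match(triples, subject="Patient"))
print(match(triples, obj="Biomarker"))
```

The subjects and objects here are exactly the terms a glossary and taxonomy would define first, which is why those building blocks come before the ontology.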

Keep in mind that the data initiatives must be clear and that they won’t align one to one with software and platform deployment initiatives. Good tools to drive the development of your program are assessments and surveys. They can be useful to prioritize projects and initiatives. For instance, you can take a hard look at how data provisioning requests are handled for a specific initiative, and whether there is an approval process. You certainly do not want to put up roadblocks or frustrate stakeholders, but you can streamline the provisioning process by asking questions such as: what data sets do you need? For what purpose? Is it an approved source? How will it be used? Your stakeholders will be interested in an approval process for data needs that supports their business case with data that is findable, accessible, interoperable, ready to use (and re-use), well-described, understood, and of high quality.
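
The provisioning questions above can be captured in a lightweight intake check. The sketch below is one possible shape, not a prescribed process: the `ProvisioningRequest` fields, the `review_request` function, and the list of approved sources are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical list of sources the data office has already approved.
APPROVED_SOURCES = {"customer_master", "trade_ledger"}


@dataclass
class ProvisioningRequest:
    requester: str
    dataset: str       # what data set do you need?
    purpose: str       # for what purpose?
    intended_use: str  # how will it be used?


def review_request(request, approved_sources=APPROVED_SOURCES):
    """Return the open questions for a request; an empty list means it can proceed."""
    open_questions = []
    if not request.purpose.strip():
        open_questions.append("purpose is missing")
    if not request.intended_use.strip():
        open_questions.append("intended use is missing")
    if request.dataset not in approved_sources:
        open_questions.append(f"'{request.dataset}' is not an approved source")
    return open_questions


complete = ProvisioningRequest("analytics", "customer_master", "KYC refresh", "monthly screening run")
incomplete = ProvisioningRequest("marketing", "vendor_scrape", "", "campaign targeting")

print(review_request(complete))
print(review_request(incomplete))
```

Returning a list of open questions, rather than a simple yes or no, keeps the process helpful to stakeholders: they see exactly what is needed to move the request forward.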

Think Ahead

There is no doubt that data is a market enabler. The question is: what should you do to go from a defense mentality to a proactive data strategy that will support the expansion of analytics and help the business get insights and answers to their questions faster? Once the foundations are in place and most of the fires have been put out, it is time to expand analytics and be proactive instead of reactive. Once you can attest to good data health, you can engage data scientists. That’s when you should find holistic thinkers in your organization who are not stuck with a myopic viewpoint and can mine data across silos.

Remember the 80/20 rule: your analysts spend 80 percent of their time wrangling data and 20 percent of the time doing analysis. In order to flip that proportion, concentrate on curating data and integrating it. Budget for people to wrangle the data, and make sure there is a platform for the curated data. You can then expose the curated, unified data to other groups and systems that need it. That’s how you start to develop data as a service. You can encapsulate complexity by building data services that bundle authentication, functionality, and data content. New toolsets and uses will start to evolve naturally. Make sure each of them fits into your strategy.

You should also recognize the potential for change in your industry sector. What’s coming down the pike? How will it play out? What role will data play? As one of our CDO interviewees in the healthcare sector stated, “There will be fundamental shifts in how healthcare is delivered, and it will be data driven.” In financial services, the way algorithms are built has already gone through a fundamental change and has dramatically changed the way business is done and decisions are made. Talk to your colleagues and peers and see where the next trends lie. Ask yourself how you can build capacity and data robustness to embrace fundamental, disruptive industry changes in the future and keep your company at the cutting edge.


This paper has laid out concrete examples of tactical wins and defensive actions that your organization can leverage to gain momentum and deploy offensive strategy. By building in a strategy from the start, you can begin to create a data organization that fits your own vision. It is important to remember to work on getting your data foundations firmly in place, but also never to lose sight of the bigger data picture. It may seem early on that strategic goals will have to wait, but we hope we have shown some ways that you can build a solid pathway towards them.

We hope you have enjoyed this article and would love to hear your thoughts or reactions. Feel free to reach out or write a response below.

*This article was written for SteepConsult Inc. dba Arrayo by Renée Colwell.

