The Shadow System

Every data organization runs two systems in parallel, whether it acknowledges them or not.

The first is the ‘official’ one: the data warehouse, the executive dashboards, the validated and tested dbt models your team spent months building. The second has no name in any architecture diagram, but it is where operations actually happen:

  • Email chains where approvals get resolved
  • Google Forms someone built because the ticketing system was too slow
  • The Excel file a senior underwriter has maintained for four years because it is the only place pricing decisions are tracked end to end

The list goes on.

Most analytics leaders already know this problem exists. What is less understood is why the standard responses to it (better dashboards, stricter governance, faster pipelines) keep failing to close the gap.

Surfacing the Data Is Not Enough

When analytics teams encounter this shadow system, the instinct is to build visibility into it.

“Just sync the Excel files into the warehouse on a nightly schedule.”

“Document the workflow so it can be migrated into something more formal.”

“Create a pipeline that reflects what is already happening.”

This is structurally incomplete because surfacing the shadow data does not change where decisions happen. The approvals still live in email threads and Zoom calls, and the quotes still get built in Excel. What you have added is a latency-prone mirror of a workflow that was never redesigned: a maintenance burden with no closed loop between the decision surface and the data layer.

The shadow system persists because it sits where the work actually happens, where the underwriter builds the quote, where the analyst enters the input, where the manager approves the request. Those surfaces are fast, familiar, and flexible in ways that most enterprise data tools are not. The question is not how to replace them but how to connect them, so that every action taken at that surface becomes a durable, queryable, auditable record in the system where the rest of your data lives.

Two Examples Worth Paying Attention To

At Sigma’s workflow conference in San Francisco last week, two case studies illustrated this problem as clearly as anything I have seen.

At Upland Capital Group, underwriters were pricing insurance risk in Excel-based raters. The logic was sophisticated, built from years of domain knowledge encoded into formulas and validation rules that experienced underwriters had refined over time. But the quotes those workbooks produced never reached Snowflake.

They left the underwriter’s machine as files in folders, which meant pricing trends across underwriters were invisible, consistency was unmeasurable, and any attempt to build models on top of historical decisions ran immediately into the fact that the data did not exist in any queryable form.

At DraftKings, forecasting inputs were moving through a combination of Google Sheets, Fivetran connectors, and Python scripts maintained by different people with uneven documentation and no clear system of record. When something in the output looked wrong, tracing the cause meant reconstructing judgment calls that existed only in Slack threads and personal spreadsheets. The inputs existed somewhere. The outputs existed in the warehouse. The decision logic connecting them lived nowhere that could be queried, audited, or learned from.

These are not edge cases; they are the default state of some of the most sophisticated, data-intensive organizations doing serious analytical work.

Write It Back

Write-back is the upstream connection between the decision surface and the data system: the act of working in the tool becomes the act of writing to the warehouse at the moment the decision is made. Not a nightly export or a manual upload, but the primary record, tracked with integrity within the data model.

This is different from reverse ETL, which pushes data from a warehouse back into operational tools. Write-back runs the other direction: it makes the surface where decisions happen a native write layer to the system where data lives. In some cases that means building lightweight applications on top of the warehouse. In others it means instrumenting existing tools through add-ins or API integrations so that the Excel workbook or the Google Sheet stops being a silo and starts being a structured data entry layer.
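
To make this concrete, here is a minimal sketch of the write-back pattern, assuming a Snowflake warehouse and the snowflake-connector-python package. The pricing_decisions table, its columns, and the record_pricing_decision function are illustrative stand-ins, not a prescribed schema.

```python
# A minimal write-back sketch, assuming a Snowflake warehouse and the
# snowflake-connector-python package. The pricing_decisions table and its
# columns are hypothetical; adapt them to your own decision surface.
import json
import uuid

import snowflake.connector


def record_pricing_decision(conn, underwriter: str, account: str,
                            premium: float, inputs: dict) -> str:
    """Write the decision to the warehouse at the moment it is made,
    so the quote itself is the primary record, not a later export."""
    decision_id = str(uuid.uuid4())
    conn.cursor().execute(
        """
        INSERT INTO pricing_decisions
            (decision_id, decided_at, underwriter, account, premium, inputs)
        SELECT %s, CURRENT_TIMESTAMP(), %s, %s, %s, PARSE_JSON(%s)
        """,
        (decision_id, underwriter, account, premium, json.dumps(inputs)),
    )
    return decision_id


# Called from the decision surface itself -- a lightweight quoting app, an
# Excel add-in, or a Sheets integration -- rather than from a nightly batch.
conn = snowflake.connector.connect(
    account="...", user="...", password="...",  # credentials elided
    warehouse="ANALYTICS_WH", database="OPS", schema="UNDERWRITING",
)
record_pricing_decision(conn, underwriter="j.doe", account="ACME-2024",
                        premium=48_500.0, inputs={"loss_ratio": 0.62})
```

The detail that matters is the timing: the insert runs inside the decision flow, at the moment the quote is finalized, not downstream of it.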

When write-back exists, every approval is a data point. Every forecast input is an auditable, materialized record in the warehouse. Every pricing decision is analyzable across the organization and over time, and can feed experimentation and reinforcement learning work.

The Cost of System Segregation

Every day the two systems stay separate, you lose another day of decision data: approvals that cannot be measured, forecasts that cannot be audited, decisions that cannot be analyzed or improved upon.

The approval chain that lives in email is one you cannot measure, cannot improve, and cannot learn from. You can count the approvals if you go looking, but you cannot analyze the pattern of them, understand which underwriters are pricing consistently, or identify which forecast assumptions have historically been most wrong. Those are the questions whose answers compound into competitive advantage when you can access them and into invisible organizational drag when you cannot.
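
Once decisions land as rows, those questions stop being archaeology and become queries. A sketch against the same hypothetical pricing_decisions table from the earlier example:

```python
# Pricing consistency per underwriter over the trailing year -- a question
# that is unanswerable while quotes live in files on individual machines.
consistency_sql = """
    SELECT
        underwriter,
        COUNT(*) AS quotes,
        AVG(premium) AS avg_premium,
        STDDEV(premium) / NULLIF(AVG(premium), 0) AS premium_variability
    FROM pricing_decisions
    WHERE decided_at >= DATEADD(month, -12, CURRENT_TIMESTAMP())
    GROUP BY underwriter
    ORDER BY premium_variability DESC
"""
for underwriter, quotes, avg_premium, variability in (
        conn.cursor().execute(consistency_sql)):
    print(f"{underwriter}: {quotes} quotes, variability {variability:.2f}")
```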

Most analytics roadmaps are organized around output quality: better models, faster dashboards, cleaner pipelines. Those investments matter. But they address the official system, not the shadow one, where the actual decisions are made every day, and where the data that would make those decisions analyzable and improvable is generated and immediately lost because nothing captures it at the moment it happens.

If Your Two Systems Are Still Separate, That’s Today’s Problem

Write-back doesn’t have to be a complex architectural overhaul. It is a targeted connection between the surface where your team already works and the system where your data already lives, and the organizations that have made that connection are asking fundamentally different questions than the ones that haven’t.

If you are building out a data stack to support operational workflows, inheriting a data environment after a close, or simply tired of decisions that happen outside the system, we would like to talk through what that connection looks like for your organization.