December 12, 2025
Executive Summary
Accurately estimating utility project costs early in the project lifecycle is difficult: systematic biases in early assumptions can lead to significant deviations in portfolio planning and budgeting. Traditional methods often fall short in early project stages, relying on limited data and subjective judgment, which can result in wide variances and uncertainty. Advances in machine learning offer a transformative solution: by learning patterns in historical project data, ML models can deliver more precise and reliable early cost estimates. This data-driven approach not only streamlines construction and reduces costs but also improves infrastructure quality. As the models mature, they could even enable near real-time cost simulation during program planning, which could revolutionize how utilities structure multi-year capital programs.
How can machine learning algorithms improve the 'Expected Accuracy Range' from AACE Class 5 to Class 3 for utility projects?
Developing accurate project cost estimates in the infrastructure and utilities sectors is complex and challenging. Whether it's a transmission line upgrade, pipeline replacement, or power distribution project, utilities operate under constant scrutiny to deliver reliable, regulated, and publicly accountable results. Yet traditional cost estimation methods — even those supported by historical databases and experienced estimators — often struggle to deliver accuracy and precision early in the project lifecycle.
Within the AACE International classification system, cost estimates are ranked from Class 5 (the least mature, with limited project definition and data) to Class 1 (the most mature, supported by comprehensive project details). For many utilities, early-phase estimates are often constrained to Class 5 accuracy levels, meaning they can carry an Expected Accuracy Range from -50% up to +100% or more. This range creates a wide variance that poses challenges for budgeting, planning, and regulatory reporting.
Utility project managers often face an inherent tension at this early stage. They need to provide credible forecasts to guide investment decisions, yet limited design details, location-based averages, and historical benchmarks all create significant uncertainty. Even when estimators rely on statistical means drawn from comparable projects or geographic conditions, those rule-based averages do not always capture all the relevant attributes, nor have they been validated as scalable and generalizable. Relying on historical averages alone yields only a descriptive snapshot of the past, which is insufficient for predictive confidence.
These early estimates also carry human bias. In many cases, estimators and project managers are overly conservative in their estimates. There is a natural tendency in large capital projects to add safety margins to avoid budget overruns. While this conservatism is understandable, it can compound inaccuracies across multiple projects and skew long-term financial planning.
In recent years, however, utilities and their consulting partners have begun to explore how data science and machine learning can increase forecasting accuracy. By introducing data-driven intelligence early in the estimation process, utilities can move from descriptive to predictive techniques, uncovering patterns that aren't visible through conventional statistical analysis. When done successfully, this shift can elevate the estimated "Expected Accuracy Range" from AACE Class 5 toward Class 3 ranges (-20% to +30%), providing a significant improvement in forecasting confidence, well before detailed design work is completed.
How can machine learning improve cost estimation?
Machine learning offers a fundamental change in how utilities interpret and use their historical cost data. Instead of treating past projects as static reference points, ML algorithms learn from patterns, relationships, and dependencies across multiple variables. These variables can include project scope details, geographic conditions, labor rates, commodity prices, equipment specifications, and even non-numeric or categorical factors such as project type or environmental constraints.
Traditional descriptive statistical methods rely mainly on univariate or simple multivariate analysis, focusing on mean or median values for given conditions. In contrast, ML models like regression trees, gradient boosting, and neural networks can process vast arrays of structured and unstructured data to identify complex, nonlinear interactions.
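As a rough illustration of the kind of nonlinear interaction a descriptive average would miss, the sketch below fits a gradient boosting model to synthetic project data. All feature names, coefficients, and values are invented for the example; a real model would be trained on a utility's own historical records.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
n = 500

# Hypothetical early-stage project attributes
capacity_mva = rng.uniform(10, 100, n)      # substation capacity
terrain_difficulty = rng.integers(1, 4, n)  # 1 = easy, 3 = hard
labor_rate = rng.uniform(40, 90, n)         # $/hour
permit_months = rng.uniform(2, 18, n)

# Synthetic "true" cost with a capacity-terrain interaction term
# that a univariate mean or median would not capture
cost = (
    0.8 * capacity_mva
    + 0.05 * capacity_mva * terrain_difficulty  # nonlinear interaction
    + 0.3 * labor_rate
    + 0.5 * permit_months
    + rng.normal(0, 2, n)                       # estimation noise
)

X = np.column_stack([capacity_mva, terrain_difficulty, labor_rate, permit_months])
model = GradientBoostingRegressor(random_state=0).fit(X, cost)

# Early-stage prediction for a new, similar project
print(model.predict([[60.0, 3, 75.0, 12.0]])[0])
```

Because the booster builds many shallow trees on residuals, it recovers the capacity-terrain interaction automatically, without the estimator having to specify it in advance.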
For utility companies, this approach provides two key benefits. The first is improved predictive accuracy for early-stage estimates. For example, a utility preparing an estimate for a new substation upgrade could use ML to draw insights from past projects that share similar characteristics like location, capacity, environmental permitting complexity, and contractor type. Rather than relying on a historical average, the model can generate a prediction grounded in tested real-world patterns. This leads to more trustworthy early estimates, reducing the variance that traditionally defines the Class 5 Expected Accuracy Range.
The second benefit is analytical validation. Once detailed estimates are prepared at later project stages, the ML model serves as a "sanity check." During formal review cycles, utilities can compare the model's predicted costs to the estimates developed by human experts or traditional estimating software. If discrepancies arise, analysts can examine the underlying assumptions, identify outliers, and adjust either the model or the estimate accordingly. This feedback process encourages accountability and helps minimize human bias.
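A minimal sketch of such a sanity check might look like the following; the 25% threshold, project IDs, and dollar values are all hypothetical.

```python
def flag_discrepancies(projects, threshold=0.25):
    """Return (project_id, fractional deviation) pairs where the human
    estimate deviates from the ML prediction by more than `threshold`."""
    flagged = []
    for pid, human_est, ml_pred in projects:
        deviation = (human_est - ml_pred) / ml_pred
        if abs(deviation) > threshold:
            flagged.append((pid, round(deviation, 2)))
    return flagged

# Hypothetical review-cycle data: (project_id, human_estimate, ml_prediction)
review = [
    ("SUB-101", 4.8e6, 4.5e6),  # ~7% high: within tolerance
    ("TL-202",  9.0e6, 6.0e6),  # 50% high: flag for review
    ("PL-303",  2.0e6, 3.2e6),  # ~38% low: flag for review
]
print(flag_discrepancies(review))  # -> [('TL-202', 0.5), ('PL-303', -0.38)]
```

Flagged projects would then go to analysts, who decide whether the human estimate, the model, or the underlying data needs adjustment.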
In one pilot case, Exponent developed ML models and found that five key variables could be sufficient to create a practical and predictive cost model. While the specific variables vary by use case, the general principle remains: With the right data modeling practices in place, even limited inputs can offer meaningful insights. Our model was able to progress from early conceptual estimates toward the Class 3 Expected Accuracy Range, often achieving levels within -20% to +30% well before detailed designs were available.
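One way to score a set of estimates against the Class 3 band is to check, project by project, whether the final actual cost lands within the estimate's asymmetric -20% / +30% range. The convention and the cost pairs below are illustrative only, not results from the pilot described above.

```python
def within_class3(predicted, actual, low=-0.20, high=0.30):
    """True if the final actual cost falls within the estimate's
    -20% / +30% band (one common reading of the AACE range)."""
    deviation = (actual - predicted) / predicted
    return low <= deviation <= high

# Hypothetical (early estimate, final actual cost) pairs
pairs = [
    (1.00e6, 1.10e6),  # +10% -> inside the band
    (1.00e6, 0.70e6),  # -30% -> outside (below -20%)
    (1.00e6, 1.25e6),  # +25% -> inside
    (1.00e6, 1.40e6),  # +40% -> outside
]
hit_rate = sum(within_class3(p, a) for p, a in pairs) / len(pairs)
print(hit_rate)  # -> 0.5
```

Tracked over a portfolio of completed projects, a rising hit rate is direct evidence that early estimates are tightening from Class 5 toward Class 3 behavior.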
At this point in a project estimate process, ML expertise combined with deep construction management knowledge is invaluable. Data scientists understand how to build, train, and validate models and how to structure datasets, tune algorithms, and measure statistical performance. However, the domain context that defines what variables matter in a construction environment is critical. Construction management experts bring a grounded understanding of sequencing, labor productivity, procurement risk, permitting timelines, and contractor behavior. All these factors significantly affect project cost outcomes but may be invisible in raw data.
When these two disciplines collaborate, cost estimates can be much more accurate. Construction experts can serve as interpreters, translating the nuances of field activity, contracting methods, and regulatory constraints into meaningful data features. They know, for instance, when a cost spike is attributable to genuine complexity rather than to a coding anomaly or why certain variables may have counterintuitive correlations. The data scientists, in turn, convert that contextual insight into refined computational models that can detect and quantify such relationships. This partnership bridges the gap between numerical precision and practical relevance, ensuring predictions are not only mathematically robust but operationally credible.
Moreover, this cross-functional collaboration enhances stakeholder trust. When estimators, project managers, and data analysts jointly present model outputs, clients and regulators gain confidence that the insights are balanced between analytical rigor and real-world feasibility. The result is not a black-box calculation but a dialogue between engineering judgment and statistical inference that advances both technical excellence and organizational alignment.

What is the impact of ML on cost estimation?
The integration of ML into cost estimation represents a growing shift in professional practice. Over the past several years, industry groups such as AACE International have begun exploring data analytics and predictive modeling as a formal extension of cost engineering. While much of this research has emerged only in the last five years, momentum is building. Utilities that once relied entirely on manual spreadsheets or basic regression analyses are now experimenting with proprietary algorithms to drive improved estimate fidelity.
One key driver for this shift is the need for better decision-quality data. Utility executives and regulators require frequent cost and performance reports to inform leadership reviews, capital planning sessions, and public filings. A small discrepancy in cost projections can ripple through organizational processes, influencing rate cases, investment prioritization, and customer rate adjustments. Even a seemingly minor deviation of 0.01% in portfolio estimates can become material when aggregated across billions of dollars in annual capital spending.
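To make the scale concrete, a back-of-the-envelope calculation using a hypothetical $10 billion annual capital portfolio:

```python
# Hypothetical portfolio size; the 0.01% figure is from the text above
portfolio = 10e9           # $10B annual capital spending
deviation = 0.0001 * portfolio  # 0.01% systematic deviation
print(f"${deviation:,.0f}")     # roughly $1 million per year
```

A deviation that looks like rounding error at the project level compounds into seven figures at the portfolio level, which is why portfolio-wide estimate calibration matters.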
By embedding a well-evaluated ML model into the estimation workflow, utilities can minimize these discrepancies before they propagate. When the model identifies outlier entries or internal inconsistencies in project control data, analysts can intervene proactively, ensuring leadership reports remain credible. This, in turn, strengthens internal confidence and supports more robust discussions with external stakeholders, including regulators and the public.
Equally important is the cultural and professional transformation that accompanies these technical innovations. Historically, cost engineering has relied heavily on experience-based intuition. Seasoned estimators amassed credibility through their ability to contextualize numerical trends with field knowledge. Machine learning doesn't replace this expertise — it enhances it. Experienced professionals become stewards of the models, curating data inputs, interpreting outputs, and guiding model retraining to reflect changing project realities. Indeed, the collaboration between domain experts and data scientists is quickly becoming the new frontier in cost estimation.
This interplay ensures that lessons learned from design changes, weather impacts, or labor disruptions are not lost in spreadsheets but converted into meaningful data for the next iteration of the model. Over time, this cyclical feedback loop can build a robust intelligence system for the organization. Each completed project strengthens the predictive accuracy of future ones. In practice, it represents a shift from project-by-project learning to enterprise-level learning, transforming cost estimation into a continuously improving organizational capability.
What are the challenges of using ML in cost estimation?
However, implementing such systems is not without challenges. Data quality remains a primary obstacle. Many utilities possess vast archives of historical project data, but these datasets are usually fragmented across systems, inconsistent in formatting, or incomplete. Often only a snapshot of the most recently updated information is saved, rather than data from each stage of the project lifecycle. Building an effective ML-driven cost model requires a rigorous data management effort, including cleansing, normalizing, and classifying thousands of records to ensure consistent variable definitions, along with storing snapshots periodically.
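As a toy illustration of the cleansing and normalizing step, the sketch below standardizes inconsistent categorical labels and drops duplicate or incomplete records. Column names, labels, and values are invented; a real utility schema would be far larger.

```python
import pandas as pd

# Hypothetical fragment of a historical project archive
raw = pd.DataFrame({
    "project_type": ["Substation", "substation ", "Trans. Line", "Substation"],
    "cost_usd":     [4_500_000, None, 9_100_000, 4_500_000],
    "length_mi":    [None, None, 12.4, None],
})

clean = raw.copy()
# Normalize inconsistent categorical labels (case, whitespace, abbreviations)
clean["project_type"] = (
    clean["project_type"].str.strip().str.lower()
    .replace({"trans. line": "transmission_line"})
)
# Drop exact duplicate records and rows missing the target variable
clean = clean.drop_duplicates().dropna(subset=["cost_usd"])
print(clean["project_type"].tolist())  # -> ['substation', 'transmission_line']
```

In practice this step also includes unit harmonization, outlier review, and mapping legacy cost codes to a single taxonomy before any model training begins.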
Moreover, leveraging algorithmic recommendations requires cultivating organizational trust. Teams need to understand how the model reaches its conclusions, and leaders must see tangible proof that the model's predictions align with business objectives. This is where early pilots and transparent reporting become critical. Once project teams witness that the model consistently challenges and improves their own detailed estimates, confidence can grow organically.
The long-term payoff for this transformation can be substantial. By moving from AACE Class 5 to Class 3 estimation accuracy using ML, utilities can better allocate budgets, improve rate-case preparedness, and enhance their reputation for fiscal discipline. As the models mature, they could even enable near real-time cost simulation during program planning, which could revolutionize how utilities structure multi-year capital programs.
Beyond the utility sector, this approach holds promise for all industries that depend on infrastructure projects with high cost variability. Transportation, oil and gas, telecom, and municipal agencies all face similar constraints in the early phases of project planning. Machine learning offers a shared pathway toward more dependable forecasting, responsible resource allocation, and, ultimately, smarter public and private investment.
Soon, as professional organizations integrate data science and ML tactics into their cost engineering frameworks and as workforce expertise expands, advanced modeling approaches will no longer be viewed as experimental add-ons. They will become core components in producing, validating, and refining estimates. The boundary between estimation, analytics, and project controls will blur, ushering in a more integrated, intelligence-driven discipline.
For utilities committed to delivering capital programs efficiently and transparently, this evolution marks a crucial step forward. Through the fusion of data science and construction management expertise, we are proving that precision in early estimation is not only possible but measurable. In doing so, we can transform estimation from an exercise in uncertainty to a science of informed prediction that sets new standards for industry.
What Can We Help You Solve?
Exponent's construction consultants can assist with every aspect of utility project planning, including cost forecasting, materials selection, contractor selection, site review, and more. Our machine learning and data science experts can create customized solutions to help utilities evaluate, sort, and analyze their infrastructure-related data, delivering insights to help improve construction efficiency, efficacy, and quality.
