June 16, 2026
Executive Summary
Physical AI is advancing into real-world applications, and safety frameworks will likely need to evolve with it. Traditional functional safety approaches may not fully address risks introduced by AI systems, which may exhibit non-deterministic behavior. Lessons from automated vehicles (AVs) and the transportation industry at large can help frame a comprehensive approach for other safety-critical AI applications. Starting points could include clarifying automation levels, expanding safety beyond faults, and embedding physics-informed constraints across the product lifecycle.
How to assess and manage physical AI risk using automation levels, layered safety frameworks, and physics-informed constraints
Physical AI refers to systems that combine AI with physical embodiment to sense, decide, and act in the real world. These systems are expanding rapidly across industries, including automated vehicles, industrial robots, service robots, medical systems, industrial machinery, and mining systems.
While pilot testing and demonstrations show promising results, adoption remains uneven, creating a gap between expectations and production-grade implementation. The good news is that a promising example is already on the road: AVs are an evolving and maturing application of physical AI, making road transportation a useful proving ground or roadmap for other embodied AI systems that are moving into warehouses, hospitals, ports, mines, factories, and public and domestic spaces.
Extending automated vehicle safety lessons to physical AI
Over the past decade, AVs have provided examples of how automation influences the way risk, safety, and validation can evolve over time. As similar capabilities extend into new domains, many organizations face a fundamental challenge that road transportation stakeholders have already confronted: adapting existing approaches to safety-critical systems that may behave differently in similar situations, depending on the available data and variable operating conditions.
Critically, systems can function as intended yet still produce undesirable outcomes in some real-world conditions, particularly when AI models encounter unfamiliar scenarios or boundary conditions. As a result, organizations are expanding safety frameworks to address not only hardware or software faults but also performance limitations and uncertainty in AI-driven behavior. A structured approach can promote alignment between safety, validation, and real-world performance.
Clarify the level of automation
Originating in SAE J3016 for on-road driving automation and later generalized in ISO/IEC 22989:2022, the six levels of automation provide a practical framework for describing how decision-making responsibility is shared between humans and systems. This concept is increasingly being extended to other safety-critical systems.
Automation level is not only a technical classification; it is also a product decision. Higher automation may make a system more capable, but it also raises the threshold for validation, documentation, fallback design, and post-deployment monitoring. In some cases, a lower-automation system that performs reliably within a narrower role may be more practical than a higher-automation system that requires more evidence to support.
Lessons from on-road vehicles show that defining a system's level of automation clearly can be key to establishing its operational design domain, or ODD — the specific conditions and environments where it is meant to operate. Defining the ODD in partnership with a Concept of Operation (ConOps) guides system development, supports verification and validation, and sets user expectations for what the system can and cannot do safely by itself. At lower levels (L0-L2), human operators remain responsible for control of the system, while at higher levels (L3-L5), the system assumes greater decision-making authority, with the human operator moving to more of a supervisory role.
Although a goal of higher automation levels is to improve safety outcomes, higher automation also places greater responsibility on the system to manage risk. The L0-L5 framework, shown in Figure 1, provides a practical way to define and communicate these distinctions.
For example, a Level 2 system may assist a human operator, but the human remains responsible for controlling the system, making primary decisions, and monitoring the environment. A Level 3 system may perform the primary task within a defined ODD, but it still requires a human who stays ready to take over when the system requests it or when conditions exceed system capability. At Level 4, the system is expected to manage both the primary task and the fallback response within its defined ODD, while also being capable of constraining itself to operation within that ODD. That difference affects not only testing and validation but also user instructions, warnings, handoff design, and how responsibility is allocated between the developer, deployer, and operator.
This distinction often drives real product decisions, including how much automation is described, how much validation evidence is required, and how responsibility is shared between the system and the user.
The same logic applies outside road transportation. In surgical robotics, a system that assists a surgeon with visualization, targeting, or instrument positioning may still leave primary decision-making with the trained clinician. A higher-automation system that independently initiates or executes clinical actions would require a different validation strategy, different user expectations, and a different fallback plan.
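The allocation of responsibility described in this section can be sketched in a few lines of Python. This is a deliberately simplified illustration rather than a rendering of SAE J3016 or ISO/IEC 22989: the names `AutomationLevel`, `Responsibility`, and `allocate` are hypothetical, and the mapping only captures who performs the primary task and who provides fallback, as described in the text above.

```python
from dataclasses import dataclass
from enum import IntEnum


class AutomationLevel(IntEnum):
    """L0-L5 labels; the numeric ordering is used for comparisons below."""
    L0 = 0
    L1 = 1
    L2 = 2
    L3 = 3
    L4 = 4
    L5 = 5


@dataclass(frozen=True)
class Responsibility:
    primary_task: str  # "human" or "system"
    fallback: str      # "human" or "system"


def allocate(level: AutomationLevel) -> Responsibility:
    """Simplified, assumed mapping from automation level to responsibility."""
    if level <= AutomationLevel.L2:
        # The system may assist, but the human controls and provides fallback.
        return Responsibility(primary_task="human", fallback="human")
    if level == AutomationLevel.L3:
        # The system performs the task within its ODD; a human provides fallback.
        return Responsibility(primary_task="system", fallback="human")
    # L4 and above: the system handles both the task and the fallback response.
    return Responsibility(primary_task="system", fallback="system")


if __name__ == "__main__":
    for level in AutomationLevel:
        print(level.name, allocate(level))
```

Writing the mapping down, even this coarsely, can make it easier to check that user instructions, handoff design, and validation evidence all assume the same split of responsibility.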
Expand safety beyond faults
At higher levels of automation, physical AI systems take on greater responsibility for decision-making. For example, at Level 4 or 5, the system may take over fallback actions that were previously handled by human operators. This shift introduces new types of risk that traditional fault-based approaches may not be designed to address.
AVs illustrate how increasing automation influences safety responsibilities. Over the past decade, the automotive sector has expanded safety approaches beyond hardware and software faults to evaluate how systems perform in real-world environments, including how they detect hazards, respond to uncertainty, and handle situations outside expected operating assumptions.
In practice, this means addressing three complementary dimensions of safety:
- Fault-based risks, such as hardware or software faults, addressed through traditional functional safety
- Functional insufficiency, where systems behave as intended but fall short in certain scenarios — often addressed through frameworks such as Safety of the Intended Functionality (SOTIF) as defined in ISO 21448, which focuses on performance limitations rather than system faults
- AI-specific risks, including uncertainty, data bias, and variability in model behavior
This also leads to defining where a system is designed to function, identifying the most safety-relevant scenarios within those boundaries, and validating performance against them.
For example, a Level 2 system designed for highway use may be validated for lane-keeping and obstacle detection under defined speeds, lighting, and weather conditions but may not be validated for complex urban intersections. Similarly, a mobile robot designed to operate in a mapped warehouse aisle may be validated for expected lighting, floor conditions, worker proximity, and traffic patterns but not loading docks, construction sites, and obstacles that behave less predictably.
To operate in new or unfamiliar environments, systems may require additional sensing, operational restrictions, human supervision, or different validation targets before deployment. This process is iterative: Teams refine parameters through ODD updates, expanded scenario coverage, and updated validation targets as new behaviors and scenarios emerge. Setting clear boundaries can be critical, along with mechanisms to detect and respond when those boundaries are exceeded.
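A boundary check of this kind might look like the following minimal Python sketch for a warehouse robot. Every name and threshold here (`ODD`, `Conditions`, `next_action`, the lux and speed values) is a hypothetical placeholder chosen for illustration; a real system would derive its limits from its own validation evidence.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Conditions:
    """Current operating conditions as reported by (assumed) sensors."""
    lux: float
    floor_wet: bool
    outdoors: bool
    speed_mps: float


@dataclass(frozen=True)
class ODD:
    """Validated operating envelope; all limits are illustrative assumptions."""
    min_lux: float
    max_speed_mps: float
    allow_wet_floor: bool = False
    allow_outdoors: bool = False

    def violations(self, c: Conditions) -> List[str]:
        """List every way the current conditions fall outside the envelope."""
        found = []
        if c.lux < self.min_lux:
            found.append("lighting below validated minimum")
        if c.speed_mps > self.max_speed_mps:
            found.append("speed above validated maximum")
        if c.floor_wet and not self.allow_wet_floor:
            found.append("wet floor not validated")
        if c.outdoors and not self.allow_outdoors:
            found.append("outdoor operation not validated")
        return found


def next_action(odd: ODD, c: Conditions) -> str:
    """Continue inside the ODD; otherwise request the predefined fallback."""
    return "continue" if not odd.violations(c) else "fallback"


if __name__ == "__main__":
    odd = ODD(min_lux=100.0, max_speed_mps=1.5)
    now = Conditions(lux=60.0, floor_wet=True, outdoors=False, speed_mps=1.0)
    print(next_action(odd, now), odd.violations(now))
```

Returning the list of violated limits, rather than a bare yes or no, gives operators and logs a record of which boundary was exceeded.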
It can also be important to communicate these boundaries clearly. If a system is not designed for wet floors, outdoor operation, untrained users, mixed pedestrian traffic, or certain lighting conditions, those limitations can be addressed in user instructions, warnings, training materials, and deployment controls. They may also be conveyed through user interfaces, manuals, and operational restrictions that define when human intervention may be necessary or when the system should not be used.
Expanding hazard analysis beyond faults to include functional insufficiency can help identify issues such as perception gaps or incomplete scenario coverage. Scaling validation efforts accordingly becomes increasingly important as systems take on more responsibility and human intervention declines, with higher-automation systems requiring broader scenario coverage and stronger supporting evidence before deployment. These approaches are already shaping emerging standards across robotics, machinery, and other safety-critical systems.
Use physics-informed constraints to validate real-world AI behavior
Even as AI enables automated decision-making, vehicle behavior remains governed by physical laws. Road transportation illustrates this clearly: Regardless of how sophisticated the AI is, a vehicle cannot brake faster than speed, mass, road conditions, and traction allow. Developers often build in additional safety constraints on top of those physical limits, such as speed restrictions, braking thresholds, fallback responses, and operational boundaries that help keep system behavior within validated conditions. The same principle applies to robots, machinery, and medical systems, where actions must remain consistent with constraints such as kinematics, force, motion, heat, latency, and human proximity.
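The braking limit can be made concrete with a short Python sketch. It assumes an idealized, friction-limited stop on a level surface, where deceleration is capped at the friction coefficient times gravitational acceleration, plus a fixed actuation latency during which speed is unchanged. The function names and example values are hypothetical, and the model ignores effects such as load transfer, grade, and brake fade.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def min_stopping_distance(speed_mps: float, friction_coeff: float,
                          latency_s: float = 0.0) -> float:
    """Shortest stopping distance under the idealized model:
    d = v * t_latency + v^2 / (2 * mu * g)."""
    if speed_mps < 0 or friction_coeff <= 0 or latency_s < 0:
        raise ValueError("speed and latency must be >= 0, friction > 0")
    return speed_mps * latency_s + speed_mps ** 2 / (2 * friction_coeff * G)


def max_safe_speed(clear_distance_m: float, friction_coeff: float,
                   latency_s: float = 0.0) -> float:
    """Highest speed from which the vehicle can still stop within
    clear_distance_m; solves the quadratic above for v."""
    if clear_distance_m < 0 or friction_coeff <= 0 or latency_s < 0:
        raise ValueError("distance and latency must be >= 0, friction > 0")
    a = friction_coeff * G  # maximum deceleration available from traction
    return a * (-latency_s + math.sqrt(latency_s ** 2 + 2 * clear_distance_m / a))


if __name__ == "__main__":
    # Assumed friction values for illustration only.
    for label, mu in (("dry", 0.7), ("wet", 0.4)):
        d = min_stopping_distance(20.0, mu, latency_s=0.2)
        print(f"{label}: {d:.1f} m to stop from 20 m/s")
```

A planner could use a bound like `max_safe_speed` as a hard ceiling on any speed the AI proposes, so that the learned component never commands a motion the physics cannot undo in the space available.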
Small model errors can translate into real-world consequences, especially when human oversight is limited. Physically informed requirements can translate physical constraints into enforceable system boundaries. Continuously monitoring system behavior and operating conditions against validated safety limits can help trigger predictable fallback responses when thresholds are exceeded. Depending on the system, these safeguards may include limiting speed, force, acceleration, or motion; restricting operation within validated conditions; enforcing minimum stopping distances or stability margins; and transitioning to predefined safe states when operating limits are approached.
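The monitoring pattern described above can be sketched as a small supervisory function that sits between an AI controller and the actuators. This is an assumed design for illustration only: `Limits`, `Mode`, `supervise`, and the 90% warning fraction are hypothetical, and a production safety function would be developed and verified under the applicable functional safety standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Mode(Enum):
    NORMAL = "normal"        # command passed through unchanged
    LIMITED = "limited"      # command clamped to the validated speed limit
    SAFE_STOP = "safe_stop"  # predefined safe state: commanded speed is zero


@dataclass(frozen=True)
class Limits:
    """Validated limits; values are supplied by the caller (assumptions here)."""
    max_speed_mps: float
    max_force_n: float
    warn_fraction: float = 0.9  # fraction of max force that triggers a stop


def supervise(cmd_speed_mps: float, measured_force_n: float,
              limits: Limits) -> Tuple[Mode, float]:
    """Return the operating mode and the speed actually sent to the actuator."""
    # Stop when the force limit is approached, not only once it is exceeded.
    if measured_force_n >= limits.warn_fraction * limits.max_force_n:
        return Mode.SAFE_STOP, 0.0
    # Clamp the requested speed into the validated range, preserving direction.
    clamped = max(-limits.max_speed_mps, min(limits.max_speed_mps, cmd_speed_mps))
    mode = Mode.NORMAL if clamped == cmd_speed_mps else Mode.LIMITED
    return mode, clamped


if __name__ == "__main__":
    limits = Limits(max_speed_mps=1.5, max_force_n=50.0)
    print(supervise(2.0, 10.0, limits))  # over-speed request is clamped
    print(supervise(1.0, 46.0, limits))  # force near limit triggers safe stop
```

Keeping the supervisor simple and deterministic means its behavior can be verified independently of the model whose output it constrains.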
The same principles apply across domains and extend throughout the product lifecycle. In collaborative robots ("cobots") or industrial robots (arms, humanoids), torque, speed, and force limits can be used to protect nearby workers. In AVs, steering and braking can be constrained by traction and stability limits. In surgical robotics, instrument force thresholds can prevent unintended tissue damage. In energy and utility systems, operations may revert to manual override or fail-safe modes when AI-driven actions exceed validated operating thresholds or when system conditions fall outside expected ranges.
During development, high-fidelity scene-based simulation grounded in physical and engineering constraints can support exploration of boundary conditions and generation of training data; during verification and validation, it can help assess scenario coverage and system performance. Physical constraints may also be embedded directly into model design or supported through fallback controls that define safe system behavior.
In high-consequence environments such as ports, logistics hubs, or mining operations, these constraints can be even more critical. AI-enabled cranes, haul vehicles, or other heavy machinery may rely on simulation and scenario-based validation to evaluate load dynamics, stopping distances, equipment proximity, and fallback behavior before deployment, particularly where full-scale real-world testing may be impractical or unsafe because of its physical consequences.
Building a sound engineering approach for physical AI
For teams developing or deploying physical AI, a practical question is not simply whether the system uses AI but what role the AI plays, who or what provides fallback, where the system is intended to operate, which scenarios have been validated against defined targets and acceptance criteria, what warnings or use limitations are needed, and what evidence supports the claimed level of automation. These questions can be used to determine whether an organization is making an assistive tool, a conditionally automated system, or a system expected to manage safety-critical decisions within a defined domain.
As AI systems evolve across industries, cross-pollinating lessons from AVs, robotics, machinery, utility systems, and other safety-critical domains can help organizations select appropriate frameworks, structure risk assessments, perform simulation and real-world verification and validation, and align with emerging regulatory expectations. As in road transportation, clarifying the level of automation, expanding safety beyond faults, and embedding physics-informed constraints can become a practical foundation for managing physical AI risks for other applications.
What Can We Help You Solve?
Exponent's multidisciplinary teams help clients assess and manage physical AI risks across project lifecycles. We apply structured risk methods, evaluate AI system performance, and help develop validation strategies, requirements, and documentation to support safer deployment and alignment with evolving standards and regulatory expectations.