Recoverability Audits for Humanoid Push Recovery

Steven Cao & Jay Wu

A humanoid survives one push. How fragile is it to the next one? We frame post-disturbance fragility as a post-hoc audit of a frozen policy, to our knowledge a new angle on push recovery: cloning a disturbed state into parallel Monte Carlo rollouts to label how recoverable it is, distilling that into a 16k-parameter classifier, and using the score to switch a frozen walking policy to a recovery policy only when needed.

The recoverability audit in action: from one disturbed state, parallel branches each take a fresh push, and the fraction that survive becomes a recoverability score.

All quantitative experiments run in NVIDIA Isaac Gym; the robot demonstration clips on this page are rendered in MuJoCo, so the visuals differ slightly from the training simulator.

TL;DR

Problem: surviving one disturbance doesn't mean a robot is stable; it can sit in a hidden-fragile state that looks fine but falls on the next hit.
What's new: auditing post-disturbance fragility (not general safety) on a frozen policy, so it needs no retraining and works on any pre-trained policy.
Method: clone a disturbed state into 50 parallel branches, apply a random future push to each, and measure the fraction that survive as a recoverability score; distill it into a 16k-parameter MLP that runs every control step.
Result: modest stability gains from audit-gated policy switching and an audit-shaped walking policy; combined, they recover 52 of 100 matched episodes versus 46 for the baseline.

The Hidden-Fragility Problem

Legged robots can recover from pushes; we train a humanoid walking policy with random pushes injected during training. The catch: in deployment the hits keep coming, and surviving the first doesn't mean you're stable.

Baseline walking policy taking a random push (red arrow) and recovering.

Now apply two pushes. The first differs across the two robots; the second is identical. Only one recovers, so the other was already in a fragile state that looked perfectly fine.

Identical second push (yellow arrow), opposite outcomes: fragility is difficult to spot by eye.

The question: after a disturbance, can we approximate how fragile the state is to future disturbances?

What's New

We label fragility with parallel Monte Carlo rollouts (clone a disturbed state, push each branch, score the fraction that survive) and distill it into a classifier. That recipe isn't new on its own. What is new is how we use it:

We score fragility, not generic safety. Random-push training is robust on average but leaves hidden-fragile pockets; we measure how well a disturbed state survives the next hit.
We do it post-hoc, on a frozen policy. Safety monitors are usually trained jointly with the RL policy; we leave it untouched, so it drops onto any pre-trained controller.

Method: The Recoverability Audit

We freeze the robot 15 steps after a push. That snapshot is the state we audit.

A push (red arrow), then we freeze a few steps later: this frozen state is what we score.

We clone the snapshot into 50 parallel branches, hit each with a fresh random push, and roll forward under the frozen policy. The fraction that survive is the state's recoverability score, which becomes the ground-truth label for our dataset.

Branches cloned from one snapshot, each given a different future push; the share that survive is the recoverability score. A 16-branch rollout is shown here for illustration.
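
The clone-push-rollout loop can be sketched in a few lines. This is a minimal illustration, not the Isaac Gym implementation: `ToyState`, `toy_policy_step`, the gains, and the fall threshold are hypothetical stand-ins (a 1-D leaning model with a saturated corrective controller). Only the branch count, push magnitude, and rollout horizon are taken from the setup described on this page.

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class ToyState:
    """Hypothetical 1-D stand-in for a robot state: lean and lean rate (toy units)."""
    lean: float
    lean_rate: float


def toy_policy_step(state: ToyState, dt: float = 0.02) -> ToyState:
    """One step of a frozen toy controller plus toy dynamics (all values assumed)."""
    correction = -8.0 * state.lean - 3.0 * state.lean_rate
    correction = max(-4.0, min(4.0, correction))  # limited control authority
    accel = 6.0 * state.lean + correction         # destabilising term + correction
    rate = state.lean_rate + accel * dt
    return ToyState(state.lean + rate * dt, rate)


def audit(snapshot: ToyState, n_branches: int = 50, horizon: int = 100,
          push_speed: float = 1.5, fall_lean: float = 0.6, seed: int = 0) -> float:
    """Return the fraction of cloned branches that survive one fresh random push."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(n_branches):
        branch = copy.deepcopy(snapshot)           # clone the disturbed state
        direction = rng.choice((-1.0, 1.0))        # random push direction (1-D toy)
        branch.lean_rate += direction * push_speed # push as a velocity impulse
        fell = False
        for _ in range(horizon):                   # roll forward under the frozen policy
            branch = toy_policy_step(branch)
            if abs(branch.lean) > fall_lean:
                fell = True
                break
        if not fell:
            survived += 1
    return survived / n_branches


upright_score = audit(ToyState(lean=0.0, lean_rate=0.0))
leaning_score = audit(ToyState(lean=0.25, lean_rate=0.3))
```

In this toy model the upright snapshot survives pushes from either side, while the leaning snapshot survives only pushes that oppose its lean, so its score falls strictly between 0 and 1. That is the kind of state the audit is meant to expose: still standing, but with little margin for the next push.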

Setup & reproducibility

Simulator: NVIDIA Isaac Gym (4096 envs)
Robot: Unitree G1
Base policy: Standard walking policy, PPO-trained with pushes ≤1.5 m/s.
Audit: Snapshot state 15 steps post-push → 50 branches, one random 1.5 m/s push per branch → roll 100 steps.
Dataset: 50k snapshots (40k/10k), ~81% recoverable
Classifier: Multilayer perceptron with 16,513 params
Recovery policy: PPO fine-tune; upright-focused reward, no pushes

Fragility Classifier

We distill the audit labels into a 16k-parameter MLP over the robot's observations alone. Notably, it aims to reproduce the expensive 50-rollout audit in a single forward pass, cheap enough to query every control step.
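
For scale, the parameter count of a fully connected network is easy to compute by hand. The sketch below counts weights and biases for a list of layer widths. The layer widths shown are guesses: this page gives the total (16,513) but not the architecture or observation size, and several shapes happen to produce that exact total.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a dense MLP with the given layer widths."""
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical shapes that all total 16,513 parameters; none is confirmed.
candidates = {
    "63-128-64-1": mlp_param_count([63, 128, 64, 1]),
    "127-128-1": mlp_param_count([127, 128, 1]),
    "191-64-64-1": mlp_param_count([191, 64, 64, 1]),
}
```

Whatever the real shape, a network of this size costs on the order of 16k multiply-adds per query, negligible next to 50 rollouts of 100 simulated steps each.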

To test whether the audit correctly predicts fragility, we push the robot, wait, audit, then push again and check whether the outcome matches the audit's verdict.

Green → the robot should recover given the second push
Red → the robot should fall given the second push

Green: the robot recovers from the 2nd push ✓

Red: the robot falls from the 2nd push ✓
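
Beyond individual clips, the same push-wait-audit-push protocol can be tallied over many trials. The helper below is a hypothetical sketch: the function name, the 0.5 verdict threshold, and the input format of (score, recovered) pairs are assumptions for illustration, and no trial data from the project is reproduced here.

```python
from collections import Counter


def verdict_agreement(trials, threshold=0.5):
    """Tally audit verdicts against second-push outcomes.

    trials: iterable of (recoverability_score, recovered) pairs.
    threshold: score at or above which the verdict is green (assumed value).
    """
    counts = Counter()
    for score, recovered in trials:
        verdict = "green" if score >= threshold else "red"
        outcome = "recovered" if recovered else "fell"
        counts[(verdict, outcome)] += 1
    total = sum(counts.values())
    # A verdict is correct when green pairs with recovery or red pairs with a fall.
    correct = counts[("green", "recovered")] + counts[("red", "fell")]
    accuracy = correct / total if total else float("nan")
    return counts, accuracy
```

Keeping the four cells separate matters here because the two error types are not symmetric: a green verdict followed by a fall is a missed fragile state, while a red verdict followed by a recovery only triggers an unnecessary switch.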

How Can We Use This?

A cheap, per-step fragility score is only worthwhile if it changes what the robot does. We explored two main ways to put it to work:

Closed-loop policy switching.
Audit-shaped reward.

1. Closed-Loop Policy Switching

When the audit classifier detects a fragile state, we switch from the baseline walking policy to a more stable policy (e.g. recovery policy). Using our earlier protocol, on a:

Green (stable) verdict: robot stays on the baseline policy
Red (fragile) verdict: robot switches to the recovery policy

Building a Recovery Policy

To test switching, we first need a more stable policy to switch to. We fine-tuned the baseline walking policy on episodes that start from recorded fragile post-push states, adjusting the rewards to focus on staying upright instead of tracking a forward velocity.

Success rate on 100 fragile-start episodes, with each policy always on:

Recovery	34%
Walking	22%

The recovery policy is ~54% better in relative terms.

Outcome breakdown across the 100 episodes
Only recovery succeeds	18
Only baseline succeeds	6
Both succeed	16
Both fail	60
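
The breakdown is consistent with the headline success rates, which a few lines of arithmetic confirm. The dictionary keys are just labels chosen for this sketch; the counts are the ones in the table above.

```python
breakdown = {
    "only_recovery": 18,
    "only_baseline": 6,
    "both_succeed": 16,
    "both_fail": 60,
}

total = sum(breakdown.values())
recovery_successes = breakdown["only_recovery"] + breakdown["both_succeed"]
baseline_successes = breakdown["only_baseline"] + breakdown["both_succeed"]

# Relative improvement of the recovery policy over the walking policy.
relative_gain = recovery_successes / baseline_successes - 1.0

# Episodes where exactly one policy succeeds.
discordant = breakdown["only_recovery"] + breakdown["only_baseline"]
```

This gives 34 and 22 successes out of 100 and a relative gain of roughly 54.5%. Of the 24 episodes where the two policies disagree, the recovery policy is the one that succeeds in 18.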

Closing the Loop

The audit drives the handoff in closed loop: when it flags the state as fragile, control switches from the walking policy to the recovery policy, then switches back to walking once the state is no longer fragile.
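
A minimal sketch of this handoff is shown below. The class name and the callable interfaces are hypothetical. The comparison is also an assumption: it treats fragility as one minus the recoverability score and fires when that exceeds τ, which is one reading consistent with a lower τ flagging fragility more readily; the exact rule is not spelled out on this page.

```python
class SwitchingController:
    """Per-step gate between a walking policy and a recovery policy (sketch)."""

    def __init__(self, walk_policy, recovery_policy, classifier, tau):
        self.walk_policy = walk_policy          # callable: obs -> action
        self.recovery_policy = recovery_policy  # callable: obs -> action
        self.classifier = classifier            # callable: obs -> score in [0, 1]
        self.tau = tau                          # fragility threshold
        self.active = "walking"

    def act(self, obs):
        # Assumed rule: fragility = 1 - recoverability, fragile when above tau.
        fragility = 1.0 - self.classifier(obs)
        self.active = "recovery" if fragility > self.tau else "walking"
        policy = self.recovery_policy if self.active == "recovery" else self.walk_policy
        return policy(obs)
```

Because the decision is recomputed every step, control returns to the walking policy as soon as the score clears the threshold. A real deployment might add hysteresis or a minimum dwell time to avoid rapid toggling near τ; that is omitted here.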

Demo Legend

Green walking policy ON
Orange first push applied
Purple recovery policy ON

Does Switching Actually Save the Robot?

Same fragile state, same pushes, with and without the audit-triggered switch:

No switching (baseline walking only): falls.

Audit-triggered switch to recovery: stays up.

2. Audit-Shaped Reward for the Walking Policy

Let's try feeding the audit back into the original walking policy as a training signal to teach the policy to steer clear of fragile states in the first place.

We fine-tune our best baseline walking policy for about 3000 iterations into an audit-shaped baseline, adding a reward proportional to the fragility classifier's recoverability score.

$r_{\text{audit}} = \text{scale} \cdot \hat{y} \cdot \text{gate}$

r_audit: post-push bonus
scale: small, keeps it auxiliary
ŷ: recoverability score from the frozen classifier
gate: binary, on for 15 steps after a disturbance, else 0
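
As a sketch, the bonus is a product of the three terms just defined. The function name, the default scale of 0.1, and the convention that the step of the push itself counts as step 0 are assumptions; the page only says the scale is small and the gate lasts 15 steps.

```python
def audit_reward(recoverability, steps_since_push, scale=0.1, window=15):
    """Post-push bonus: scale * recoverability score * binary gate.

    recoverability: classifier score in [0, 1] for the current observation.
    steps_since_push: steps since the last disturbance, or None if none yet.
    scale: small auxiliary weight (0.1 is a placeholder, not the project's value).
    window: number of steps the gate stays on after a disturbance.
    """
    gate_on = steps_since_push is not None and 0 <= steps_since_push < window
    gate = 1.0 if gate_on else 0.0
    return scale * recoverability * gate
```

Gating the bonus to the window right after a disturbance keeps it from influencing ordinary walking, where the task rewards should dominate.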

A side-by-side example of the original baseline and the audit-shaped baseline on the same matched episode:

Original baseline walking policy.

Audit-shaped baseline, fine-tuned with the audit reward.

Results / Experiments

Slowing Movements when Fragile: Not Effective

We also tried slowing down the walking policy's actions whenever the audit flagged fragility. Layering the audit on top this way did not show any significant improvement, so we turn to policy switching instead.

Tuning Policy Switching

When to switch from the walking policy to the recovery policy is determined by two settings:

How cautious the audit is (the fragility threshold τ).
How soon it checks after a push (the audit delay).

Our first settings were conservative: a high threshold (τ = 0.46) and a late check (15 steps after the push). With them, switching policies barely helped (46% → 45%). Flagging fragility more readily (τ ≈ 0.30–0.34) and checking sooner (10 steps) gave a small but real gain.
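
A tuning pass over these two knobs amounts to a small grid search. The sketch below is generic: `evaluate` is a hypothetical callback that would run the matched episodes for one (τ, delay) setting and return the number recovered. It is not part of any library and no results are built in.

```python
from itertools import product


def sweep_switch_settings(evaluate, taus, delays):
    """Evaluate every (tau, delay) pair and return the best one.

    evaluate: callable (tau, delay_steps) -> episodes recovered (user-supplied).
    taus: candidate fragility thresholds.
    delays: candidate audit delays, in control steps after the push.
    """
    results = {(tau, delay): evaluate(tau, delay)
               for tau, delay in product(taus, delays)}
    best_setting = max(results, key=results.get)
    return best_setting, results
```

With only 100 matched episodes, differences of a few episodes between neighbouring settings are within what a different set of pushes could produce, so a held-out set of episodes would give a more reliable picture of the chosen setting.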

Stacking the Interventions

Our two uses of the audit can also be combined: switching to recovery, and an audit-shaped baseline (the walking policy fine-tuned with the audit signal, see Audit-Shaped Reward). We ran all four configurations on the same 100 matched episodes.

Episodes recovered out of 100 (same matched starts and pushes)
Policy	No switch	+ Switch to recovery
Baseline	46	49
Audit-shaped baseline	48	52

Each intervention helps a little on its own, and they stack to the best result (52/100). The gains stay modest because on many of the 100 episodes the robot is already past saving by the time the audit fires, often more than halfway to the ground, so no intervention can recover it.

Restricting to the 24 episodes where the robot was still upright and recoverable at audit time, the same ordering holds and the gains are larger.

Success rate on the 24 still-recoverable episodes
Policy	No switch	+ Switch to recovery
Baseline	50%	62.5%
Audit-shaped baseline	66.7%	70.8%
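
The percentages map back to whole episode counts out of 24, which makes the size of each gain easier to read. The dictionary keys are labels for this sketch; the rates are the ones in the table above.

```python
N_STILL_RECOVERABLE = 24

rates_percent = {
    ("baseline", "no_switch"): 50.0,
    ("baseline", "switch"): 62.5,
    ("audit_shaped", "no_switch"): 66.7,
    ("audit_shaped", "switch"): 70.8,
}

# Convert each rate to the nearest whole number of episodes.
counts = {key: round(pct / 100 * N_STILL_RECOVERABLE)
          for key, pct in rates_percent.items()}

# Sanity check: the counts reproduce the reported percentages to one decimal.
roundtrip = {key: round(100 * c / N_STILL_RECOVERABLE, 1)
             for key, c in counts.items()}

combined_gain = counts[("audit_shaped", "switch")] - counts[("baseline", "no_switch")]
```

The four cells correspond to 12, 15, 16, and 17 recovered episodes, so stacking both interventions recovers 5 more of the 24 than the baseline alone.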

Future Direction

By the time the audit fires, many states are already past saving. The most useful next step is auditing earlier, or anticipating fragility before the state collapses, so an intervention still has room to work.

The fragility labels also point to which states are worth retraining on. That invites a loop that explores for fragile states, retrains the policy on them, and re-audits. It would also be interesting to see whether this approach transfers to real hardware.

Acknowledgements

We thank Professor Unnat Jain for his guidance and support throughout this project.

@misc{cao_wu_2026_recoverability_audits,
  title        = {Recoverability Audits for Humanoid Push Recovery},
  author       = {Cao, Steven and Wu, Jay},
  year         = {2026},
  howpublished = {\url{https://github.com/jotalis/RA-HPR}}
}
