Task-Error Residual Learning for Real-Robot Five-Ball Juggling

Long-exposure photograph of a Barrett WAM arm juggling, showing the ball trajectories as red arcs.

Two Barrett WAM arms sustaining a five-ball cascade juggling pattern.
The residual learner reaches stable juggling on real hardware on the second attempt.

Abstract

For residual learning that refines existing behavior, sample efficiency depends on two things: how much information each rollout returns, and how efficiently the learner uses that information. Reinforcement learning's standard scalar reward carries far less information than the directional task error that defines the task. Random exploration further discards whatever information each rollout returns.

Through residual learning with directional task-error supervision and a task error model that drives sample selection, we achieve stable three-, four-, and five-ball juggling on anthropomorphic Barrett WAM arms. Despite planning and controlling through a deliberately simple stack with idealized assumptions, the system converges from the second attempt: the first attempt drops, after which task error decreases monotonically without further failures. In comparison, five-ball juggling typically takes humans years of practice.

We compare residual learners across two ternary axes, the directional information in the learning feedback and the commitment of the analytic prior, spanning Newton-style Jacobian updates, Composite Bayesian Optimization, and stochastic search methods. Both axes prove necessary: neither directional feedback nor an informative prior suffices alone, and the simplest method that combines them, a fixed-Jacobian Newton update, is the most reliable. The learned residual tolerates a misaligned analytic prior and degraded tracking, with only convergence speed affected. The bottleneck for residual learning on real robots is therefore the information content of the supervision signal and how the learner uses it, not the accuracy of the surrounding stack.

Method Comparison

Bar chart of juggling attempts to first success and to a ten-in-a-row streak, across the feedback-type by prior-specificity matrix, evaluated in simulation at the five-ball cascade.

Fast, reliable convergence needs both directional feedback and an informative prior. All methods are evaluated in simulation at the five-ball cascade across the full feedback × prior-specificity matrix. Bars give the number of attempts to the first success (solid) and to a streak of 10 successful attempts in a row (light), as mean ± s.d. over 10 seeds. The Fixed Jacobian Newton update is the simplest method of all and performs best.
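The two bar heights can be read as simple statistics over a per-attempt success log. The sketch below is a minimal illustration, not the authors' evaluation code: the function names and the outcome log are hypothetical, and it assumes that "attempts to a streak" counts up to and including the attempt that completes the run of consecutive successes.

```python
from typing import Optional, Sequence


def attempts_to_first_success(outcomes: Sequence[bool]) -> Optional[int]:
    """Return the 1-based index of the first successful attempt, or None."""
    for attempt, success in enumerate(outcomes, start=1):
        if success:
            return attempt
    return None


def attempts_to_streak(outcomes: Sequence[bool], streak: int = 10) -> Optional[int]:
    """Return the 1-based index of the attempt that completes the first run
    of `streak` consecutive successes, or None if no such run occurs."""
    run = 0
    for attempt, success in enumerate(outcomes, start=1):
        run = run + 1 if success else 0  # a failure resets the running count
        if run >= streak:
            return attempt
    return None


# Hypothetical outcome log: one failed attempt, then sustained success.
outcomes = [False] + [True] * 10
first = attempts_to_first_success(outcomes)
streak_done = attempts_to_streak(outcomes, streak=10)
```

Averaging these two numbers over seeds would give the kind of mean and standard deviation the chart reports.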

Real-Robot Experiments

Three, four, and five balls

Residual learning through Fixed Jacobian Newton updates on a directional task-error signal for three-, four-, and five-ball juggling. Three balls succeed on the first attempt; four and five balls succeed on the second. Each panel in the video shows ten consecutive juggling attempts of the same learning run. Each attempt is capped at 100 throws.
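A fixed-Jacobian Newton update can be sketched in a few lines. The example below is a toy illustration under stated assumptions, not the authors' implementation: the two-dimensional residual `u`, the nonlinear `toy_task_error` function, the target, and the identity prior `J_PRIOR` are all hypothetical stand-ins, and a unit step size is assumed. Each "attempt" returns a vector-valued task error, and the residual is corrected by solving against a Jacobian that is never re-estimated.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve_2x2(J: Matrix, e: Vector) -> Vector:
    """Solve J x = e for a 2x2 matrix J using Cramer's rule."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    if abs(det) < 1e-12:
        raise ValueError("fixed Jacobian is singular")
    return [
        (e[0] * J[1][1] - J[0][1] * e[1]) / det,
        (J[0][0] * e[1] - J[1][0] * e[0]) / det,
    ]


def fixed_jacobian_newton(
    task_error: Callable[[Vector], Vector],
    J_prior: Matrix,
    u0: Vector,
    attempts: int,
) -> Tuple[Vector, List[float]]:
    """Iterate u <- u - J_prior^{-1} e(u) with a Jacobian that stays fixed."""
    u = list(u0)
    error_norms = []
    for _ in range(attempts):
        e = task_error(u)  # one rollout returns a directional (vector) error
        error_norms.append((e[0] ** 2 + e[1] ** 2) ** 0.5)
        step = solve_2x2(J_prior, e)
        u = [u[0] - step[0], u[1] - step[1]]
    return u, error_norms


# Hypothetical toy system: the true map is mildly nonlinear and its Jacobian
# differs from the prior, mimicking an idealized analytic model.
TARGET = [1.0, 0.5]


def toy_task_error(u: Vector) -> Vector:
    outcome = [
        1.2 * u[0] + 0.3 * u[1] + 0.05 * u[0] ** 2,
        -0.2 * u[0] + 0.9 * u[1],
    ]
    return [outcome[0] - TARGET[0], outcome[1] - TARGET[1]]


J_PRIOR = [[1.0, 0.0], [0.0, 1.0]]  # deliberately inexact prior
u_final, error_norms = fixed_jacobian_newton(
    toy_task_error, J_PRIOR, [0.0, 0.0], attempts=30
)
```

In this toy setting the iteration converges because the prior is close enough to the true Jacobian for the update to remain a contraction; the sketch makes no claim about the conditions that hold on the real robot.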

Robustness to controller gains

The deliberately simple control stack consists of a PD tracking controller with a CAD-derived feedforward torque around a planned throw trajectory. The choice of control gains mostly affects convergence speed, not final performance. Five-ball juggling succeeds at surprisingly low gains. Nominal values are K_p = [400, 400, 200, 200] and K_d = [40, 20, 15, 5]. Each attempt is capped at 10 throws.

Catches and takeoff velocity error per attempt as the proportional gain, derivative gain, and both gains are scaled from 100% down to 25% of nominal.
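A PD tracking law with a feedforward term and scalable gains can be written as follows. This is a minimal sketch, not the controller running on the robot: the function name, argument layout, and scaling parameters are hypothetical, the feedforward torque is taken as a given input rather than computed from a CAD model, and the gain units are left unspecified because the text does not state them. Only the nominal gain values come from the text.

```python
from typing import List, Sequence

# Nominal gains quoted in the text, one entry per controlled joint.
KP_NOMINAL = [400.0, 400.0, 200.0, 200.0]
KD_NOMINAL = [40.0, 20.0, 15.0, 5.0]


def pd_feedforward_torque(
    q: Sequence[float],
    qd: Sequence[float],
    q_des: Sequence[float],
    qd_des: Sequence[float],
    tau_ff: Sequence[float],
    kp_scale: float = 1.0,
    kd_scale: float = 1.0,
) -> List[float]:
    """Compute tau = tau_ff + Kp (q_des - q) + Kd (qd_des - qd) per joint.

    kp_scale and kd_scale multiply the nominal gains, e.g. 0.25 for 25%.
    """
    return [
        ff + kp_scale * kp * (qr - qi) + kd_scale * kd * (qdr - qdi)
        for qi, qdi, qr, qdr, ff, kp, kd in zip(
            q, qd, q_des, qd_des, tau_ff, KP_NOMINAL, KD_NOMINAL
        )
    ]


# Hypothetical state: a small position error on the first joint only.
q = [0.0, 0.0, 0.0, 0.0]
qd = [0.0, 0.0, 0.0, 0.0]
q_des = [0.01, 0.0, 0.0, 0.0]
qd_des = [0.0, 0.0, 0.0, 0.0]
tau_ff = [1.0, 2.0, 3.0, 4.0]

tau_nominal = pd_feedforward_torque(q, qd, q_des, qd_des, tau_ff)
tau_quarter = pd_feedforward_torque(
    q, qd, q_des, qd_des, tau_ff, kp_scale=0.25, kd_scale=0.25
)
```

With lower gains the feedback contribution shrinks in proportion, so the same tracking error produces a smaller correction and the feedforward term carries more of the load.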

Sensitivity to the analytic prior

The analytic prior is rotated away from its nominal orientation by 0° to 90° around a random axis. A prior on the exploration direction only needs to point loosely in the right direction: even a 45° rotation only slightly slows convergence. Each attempt is capped at 10 throws.

Catches and takeoff velocity error per attempt as the analytic Jacobian is rotated from 0 to 90 degrees.
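One way to construct such a misaligned prior is to left-multiply the Jacobian by a rotation about a uniformly random axis, built with Rodrigues' formula. The sketch below is an illustration under assumptions: it treats the task space as three-dimensional and applies the rotation on the output side of a 3x3 Jacobian, neither of which is confirmed by the text, and all function names and the identity `J_NOMINAL` are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def random_unit_axis(rng: random.Random) -> List[float]:
    """Sample a direction uniformly on the sphere by normalizing a Gaussian."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:
            return [x / norm for x in v]


def rotation_matrix(axis: List[float], angle_deg: float) -> Matrix:
    """Rodrigues' formula for a rotation by angle_deg about a unit axis."""
    t = math.radians(angle_deg)
    x, y, z = axis
    c, s = math.cos(t), math.sin(t)
    k = 1.0 - c
    return [
        [c + x * x * k, x * y * k - z * s, x * z * k + y * s],
        [y * x * k + z * s, c + y * y * k, y * z * k - x * s],
        [z * x * k - y * s, z * y * k + x * s, c + z * z * k],
    ]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [
        [sum(A[i][m] * B[m][j] for m in range(3)) for j in range(3)]
        for i in range(3)
    ]


def rotate_prior(J: Matrix, angle_deg: float, rng: random.Random) -> Matrix:
    """Return R @ J, where R rotates by angle_deg about a random axis."""
    R = rotation_matrix(random_unit_axis(rng), angle_deg)
    return matmul(R, J)


rng = random.Random(0)  # fixed seed so the random axis is reproducible
J_NOMINAL = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
J_rot_45 = rotate_prior(J_NOMINAL, 45.0, rng)
```

Because the rotation angle is fixed while the axis is random, every sampled prior is misaligned by the same amount, which keeps runs at a given angle comparable.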

BibTeX

@unpublished{ploeger2026residual,
  title  = {Task-Error Residual Learning for Real-Robot Five-Ball Juggling},
  author = {Ploeger, Kai and Peters, Jan},
  year   = {2026},
  note   = {Under submission},
}
