Acrobot
This family covers a two-link underactuated pendulum mounted on a fixed pivot. The agent applies torque only at the second joint; the first joint swings passively.
All Acrobot variants share the same body, dynamics, and termination rule and differ only in the reward function.
AcrobotSwingup

| Property | Value |
|---|---|
| Canonical ID | mjx/acrobot_swingup-v0 |
| Action space | Box(-1.0, 1.0, (1,), float32) |
| Observation space | Box(-inf, inf, (6,), float32) |
| Episode length | 1000 |
| Config | {"ctrl_dt": 0.01, "sim_dt": 0.01, "naconmax": 0, "njmax": 0} |
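The Config row fixes the timing. Assuming the environment runs `ctrl_dt / sim_dt` physics steps per control step (the convention in MuJoCo Playground's base environment), the arithmetic below shows what these values imply. `n_substeps` and `episode_seconds` are illustrative names, not config keys.

```python
# Values copied from the Config row of the table above.
config = {"ctrl_dt": 0.01, "sim_dt": 0.01, "naconmax": 0, "njmax": 0}
episode_length = 1000  # control steps per episode

# Physics substeps per control step (assumed to be ctrl_dt / sim_dt, rounded).
n_substeps = int(round(config["ctrl_dt"] / config["sim_dt"]))
# Simulated time covered by one full episode, in seconds.
episode_seconds = episode_length * config["ctrl_dt"]
print(n_substeps, episode_seconds)  # 1 10.0
```

So each action is held for a single 10 ms physics step, and a full episode covers about 10 s of simulated time.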
Description
The pendulum starts at rest, hanging straight down. The agent must swing the tip up to a target position. Only the second joint is actuated, so the tip can't be lifted directly — momentum has to build up through the underactuated dynamics.
Rewards
The reward is dense: a smooth tolerance indicator applied to the tip-to-target distance.
`tolerance` is DM Control's smooth indicator. It returns:
- `1.0` while the input lies inside `bounds`.
- A smooth decay down to `value_at_margin` as the input moves `margin` beyond `bounds`.
- Values that keep decaying toward `0.0` farther out.
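A minimal pure-Python sketch of the Gaussian form of such an indicator, modelled on DM Control's `rewards.tolerance` (whose default sigmoid is Gaussian and default `value_at_margin` is 0.1). The `dense_reward` helper and the `margin=1.0` it passes are assumptions mirroring DM Control's acrobot task, not code copied from the upstream file.

```python
import math


def tolerance(x: float, bounds=(0.0, 0.0), margin: float = 0.0,
              value_at_margin: float = 0.1) -> float:
    """Gaussian tolerance indicator (sketch modelled on dm_control's).

    Returns 1.0 inside `bounds`. Outside, the value decays smoothly with the
    distance to the nearest bound and equals `value_at_margin` exactly
    `margin` away. With margin == 0 it is a plain 0/1 indicator.
    """
    lower, upper = bounds
    if lower <= x <= upper:
        return 1.0
    if margin == 0.0:
        return 0.0
    # Distance outside the bounds, in units of `margin`.
    d = (lower - x if x < lower else x - upper) / margin
    # Scale chosen so that exp(-0.5 * scale**2) == value_at_margin at d == 1.
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    return math.exp(-0.5 * (d * scale) ** 2)


def dense_reward(tip_to_target: float, target_radius: float) -> float:
    # Hypothetical helper: margin=1.0 is assumed, following dm_control's acrobot.
    return tolerance(tip_to_target, bounds=(0.0, target_radius), margin=1.0)


# Arbitrary illustrative radius; the real target size is not stated on this page.
for dist in (0.1, 0.7, 1.2, 3.0):
    print(dist, dense_reward(dist, target_radius=0.2))
```

Inside the target the reward is 1.0; exactly one margin outside it is 0.1, and it keeps shrinking toward zero farther away, which gives the agent a gradient to follow from anywhere in the workspace.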
Starting state
Both joints start at rest with the links hanging straight down. The initial state lists the orientations of the two links followed by their angular velocities (both zero).
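To make the six-number layout concrete, here is a hypothetical encoding of the observation. Everything about the encoding is an assumption for illustration: angles are measured from hanging straight down, the second joint angle is relative to the first link, and each link's orientation is written as the sine and cosine of its absolute angle. The upstream environment may order or encode the components differently.

```python
import math


def acrobot_observation(q1: float, q2: float, dq1: float, dq2: float) -> list:
    """Hypothetical 6-D observation: two link orientations, then two velocities.

    q1: first-link angle from hanging straight down (assumed convention).
    q2: second-joint angle relative to the first link (assumed convention).
    dq1, dq2: the corresponding joint angular velocities.
    """
    a1 = q1        # absolute angle of link 1
    a2 = q1 + q2   # absolute angle of link 2
    return [math.sin(a1), math.cos(a1), math.sin(a2), math.cos(a2), dq1, dq2]


# Starting state: both links hanging down, both joints at rest.
rest = acrobot_observation(0.0, 0.0, 0.0, 0.0)
print(rest)  # [0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```

Encoding each orientation as a sine/cosine pair (rather than a raw angle) avoids the wrap-around discontinuity at plus or minus pi, which is why two angles become four numbers.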
Termination
Episode ends when step >= max_steps (default 1000). No early termination.
Usage
Instantiate the environment by its canonical ID, mjx/acrobot_swingup-v0.
Reference
Upstream: mujoco_playground/_src/dm_control_suite/acrobot.py.
AcrobotSwingupSparse

| Property | Value |
|---|---|
| Canonical ID | mjx/acrobot_swingup_sparse-v0 |
| Action space | Box(-1.0, 1.0, (1,), float32) |
| Observation space | Box(-inf, inf, (6,), float32) |
| Episode length | 1000 |
| Config | {"ctrl_dt": 0.01, "sim_dt": 0.01, "naconmax": 0, "njmax": 0} |
Description
The pendulum starts at rest, hanging straight down, and the agent must swing the tip up to the target through the underactuated dynamics. The body, dynamics, and starting state are the same as in AcrobotSwingup, but the reward uses a tighter tolerance: it pays out only while the tip is within the target radius.
Rewards
The reward is sparse: the same tolerance indicator over the tip-to-target distance, but with no margin.
With margin=0.0 the smooth decay collapses to a step. The indicator becomes binary:
- `1.0` when the tip is within `small_target_radius` of the target.
- `0.0` otherwise.
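With zero margin the indicator reduces to a bounds check. A minimal sketch, assuming the bounds are inclusive (as in DM Control's `tolerance`) and using a hypothetical `target_radius` argument in place of `small_target_radius`, whose value this page does not give:

```python
def sparse_reward(tip_to_target: float, target_radius: float) -> float:
    """Binary reward: 1.0 inside the target (bounds inclusive), else 0.0."""
    # Equivalent to tolerance(x, bounds=(0, target_radius), margin=0.0).
    return 1.0 if 0.0 <= tip_to_target <= target_radius else 0.0


# Arbitrary illustrative radius; the real value is not stated on this page.
radius = 0.2
for dist in (0.05, 0.2, 0.25, 1.0):
    print(dist, sparse_reward(dist, radius))
```

Unlike the dense variant, every distance outside the target scores exactly zero, so the agent gets no signal until it first reaches the target, which makes exploration considerably harder.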
Starting state
Identical to AcrobotSwingup: the links hang straight down and both joints start at rest.
Termination
Episode ends when step >= max_steps (default 1000). No early termination.
Usage
Instantiate the environment by its canonical ID, mjx/acrobot_swingup_sparse-v0.
Reference
Upstream: mujoco_playground/_src/dm_control_suite/acrobot.py.