Reacher
A two-link planar arm reaching toward a randomised target position. Two variants share the body and dynamics; they differ only in the target's size, which controls how hard the sparse reward is to discover.
ReacherEasy

| Property | Value |
|---|---|
| Canonical ID | mjx/reacher_easy-v0 |
| Action space | Box(-1.0, 1.0, (2,), float32) |
| Observation space | Box(-inf, inf, (6,), float32) |
| Episode length | 1000 |
| Config | {"ctrl_dt": 0.02, "sim_dt": 0.005, "naconmax": 0, "njmax": 0} |
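The timing values in the Config row can be checked with a little arithmetic. The sketch below assumes each control step advances the simulator by ctrl_dt / sim_dt physics substeps, which is the usual convention for MuJoCo-style environments. The variable names are illustrative and not part of any API.

```python
# Values copied from the Config and Episode length rows of the table.
config = {"ctrl_dt": 0.02, "sim_dt": 0.005, "naconmax": 0, "njmax": 0}
episode_length = 1000

# Assumption: one control step runs ctrl_dt / sim_dt physics substeps.
n_substeps = round(config["ctrl_dt"] / config["sim_dt"])

# Simulated time covered by one full episode, in seconds.
episode_seconds = episode_length * config["ctrl_dt"]

print(n_substeps)       # physics substeps per action
print(episode_seconds)  # simulated seconds per episode
```

Under that assumption, each action is held for 4 physics substeps and a full episode spans about 20 simulated seconds.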
Description
The arm's fingertip must reach a randomised target somewhere inside the workspace. The "easy" variant uses a generous target radius (reacher.BIG_TARGET), so even a fairly random policy will land inside the target band often enough to bootstrap learning.
Rewards
The reward is sparse: it applies tolerance to the fingertip-to-target distance, with BIG_TARGET as the upper bound and no margin, so it collapses to a step function.
With the default zero margin, the indicator becomes binary:
- 1.0 when the fingertip is within BIG_TARGET of the target.
- 0.0 otherwise.
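The binary reward described above can be sketched in plain Python. This is not the upstream implementation: sparse_reward is a hypothetical helper that reproduces only the zero-margin case of tolerance, and the radius 0.05 is an assumed value standing in for reacher.BIG_TARGET.

```python
import math

# Assumption: illustrative radius; the real value is reacher.BIG_TARGET upstream.
BIG_TARGET = 0.05


def sparse_reward(finger_xy, target_xy, radius=BIG_TARGET):
    """Return 1.0 if the fingertip is within `radius` of the target, else 0.0."""
    dist = math.hypot(target_xy[0] - finger_xy[0], target_xy[1] - finger_xy[1])
    # Zero-margin tolerance: a hard step at the upper bound.
    return 1.0 if dist <= radius else 0.0


inside = sparse_reward((0.10, 0.00), (0.13, 0.00))   # distance 0.03
outside = sparse_reward((0.10, 0.00), (0.20, 0.00))  # distance 0.10
print(inside, outside)
```

Because there is no margin, the reward gives no gradient signal outside the target: a fingertip just past the boundary earns the same 0.0 as one on the far side of the workspace.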
Starting state
The initial observation holds the joint angles, followed by the fingertip-to-target offset and the joint velocities. Both joints are initialised at random angles, and the target is randomised inside the workspace.
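To make the observation layout concrete, here is a minimal sketch of how the six values could be assembled. build_observation is a hypothetical helper, not the upstream function. It assumes two joints, a planar (x, y) offset pointing from the fingertip to the target, and two joint velocities, which together match the (6,) observation shape.

```python
def build_observation(joint_angles, finger_xy, target_xy, joint_velocities):
    """Assemble the observation: joint angles, target offset, joint velocities."""
    # Assumption: the offset points from the fingertip to the target.
    to_target = (target_xy[0] - finger_xy[0], target_xy[1] - finger_xy[1])
    return [*joint_angles, *to_target, *joint_velocities]


obs = build_observation(
    joint_angles=(0.3, -1.2),
    finger_xy=(0.10, 0.05),
    target_xy=(0.15, -0.05),
    joint_velocities=(0.0, 0.0),
)
print(len(obs))  # 6, matching the (6,) observation space
```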
Termination
Episode ends when step >= max_steps (default 1000). No early termination.
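The time-limit rule can be sketched as a simple loop. The names below are illustrative, and the reward is a placeholder rather than a call into the environment.

```python
MAX_STEPS = 1000  # default episode length


def is_done(step, max_steps=MAX_STEPS):
    """Time-limit check: the only condition that ends an episode."""
    return step >= max_steps


step = 0
total_reward = 0.0
while not is_done(step):
    reward = 0.0  # placeholder for the reward returned by the environment
    total_reward += reward
    step += 1

print(step)  # 1000
```

Since reaching the target does not end the episode, a policy that gets there and stays keeps collecting 1.0 per step, so the return of a single episode is bounded by 1000.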
Usage
Create the environment from its canonical ID, mjx/reacher_easy-v0.
Reference
Upstream: mujoco_playground/_src/dm_control_suite/reacher.py.
ReacherHard

| Property | Value |
|---|---|
| Canonical ID | mjx/reacher_hard-v0 |
| Action space | Box(-1.0, 1.0, (2,), float32) |
| Observation space | Box(-inf, inf, (6,), float32) |
| Episode length | 1000 |
| Config | {"ctrl_dt": 0.02, "sim_dt": 0.005, "naconmax": 0, "njmax": 0} |
Description
Same arm and dynamics as ReacherEasy, but the target shrinks to reacher.SMALL_TARGET. With the smaller catch radius, random exploration almost never lands inside the target. Algorithms that cannot direct exploration make little progress on this variant, which makes it a useful stress test for learned exploration bonuses or curiosity-style methods.
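A rough way to quantify the difference is to compare the areas of the two target discs. The radii below are assumed values standing in for reacher.BIG_TARGET and reacher.SMALL_TARGET; check the upstream file for the actual constants. Treating the hit probability as proportional to area is itself a simplification, since it assumes the fingertip is spread roughly uniformly over the region around the target.

```python
import math

# Assumptions: illustrative radii, not read from the upstream module.
BIG_TARGET = 0.05
SMALL_TARGET = 0.015

big_area = math.pi * BIG_TARGET**2
small_area = math.pi * SMALL_TARGET**2

# How many times larger the easy target's disc is than the hard one's.
area_ratio = big_area / small_area
print(round(area_ratio, 1))
```

With these assumed radii the easy target covers roughly eleven times the area of the hard one, so an undirected policy would see a reward about an order of magnitude less often on ReacherHard.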
Rewards
The reward is sparse: it applies tolerance to the fingertip-to-target distance, with the smaller SMALL_TARGET as the upper bound.
With the default zero margin, the indicator becomes binary:
- 1.0 when the fingertip is within SMALL_TARGET of the target.
- 0.0 otherwise.
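The effect of the smaller bound shows up when the same fingertip distance is scored under both radii. This is a sketch only: step_reward is a hypothetical stand-in for zero-margin tolerance, and both radii are assumed values rather than constants read from reacher.py.

```python
# Assumptions: illustrative radii for the two variants.
BIG_TARGET = 0.05
SMALL_TARGET = 0.015


def step_reward(dist, radius):
    """Zero-margin tolerance: 1.0 inside the radius, 0.0 outside."""
    return 1.0 if dist <= radius else 0.0


dist = 0.03  # same fingertip-to-target distance in both variants
easy = step_reward(dist, BIG_TARGET)
hard = step_reward(dist, SMALL_TARGET)
print(easy, hard)
```

A fingertip that would already be rewarded on ReacherEasy still scores 0.0 here, so the agent has to close the remaining distance without any reward signal to guide it.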
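Starting state
Same as ReacherEasy: both joints start at random angles and the target is randomised inside the workspace.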
Termination
Episode ends when step >= max_steps (default 1000). No early termination.
Usage
Create the environment from its canonical ID, mjx/reacher_hard-v0.
Reference
Upstream: mujoco_playground/_src/dm_control_suite/reacher.py.