In testing a short Rook or similar piece, I don't know how you'd
distinguish the effect of a difference in King-interdiction from the broader
(and presumably much larger) effect on overall fighting power of losing
several moves. The Rook that jumps its first two squares might also derive
a measurable advantage from ease of development and stealthy attacks,
especially if Muller's computer tests are undervaluing the Rook due to
early-game bias.
Controlled testing of Chess pieces is very challenging, since they have so
many interactions and emergent properties; devising two pieces that differ
only in the property you want to test is difficult and error-prone.
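
To put a rough number on the "losing several moves" point above, here is a minimal sketch that counts empty-board destination squares for a full Rook and for range-limited Rooks. The function names and the `max_range` parameter are my own, purely for illustration. Empty-board mobility is only a crude proxy and says nothing about King-interdiction or any other emergent property.

```python
def rook_mobility(file, rank, max_range=7, size=8):
    """Count empty-board destinations for a Rook limited to max_range squares."""
    count = 0
    for df, dr in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        for dist in range(1, max_range + 1):
            f, r = file + df * dist, rank + dr * dist
            # Only count squares that are still on the board.
            if 0 <= f < size and 0 <= r < size:
                count += 1
    return count


def average_mobility(max_range=7, size=8):
    """Average the empty-board move count over every square of the board."""
    total = sum(
        rook_mobility(f, r, max_range, size)
        for f in range(size)
        for r in range(size)
    )
    return total / (size * size)


if __name__ == "__main__":
    # Range 7 is the ordinary Rook on an 8x8 board; smaller values are short Rooks.
    for max_range in (7, 4, 3, 2):
        print(max_range, average_mobility(max_range))
```

On an 8x8 board this gives an average of 14 moves for the full Rook and 11 for a Rook limited to four squares, so even on an empty board the short Rook has given up a fair share of its moves before any question of interdiction arises.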