NIPS 2017: Learning to Run

Reinforcement learning environments with musculoskeletal models


How is the total reward of a submission calculated?

Posted by Anton_Pechenko almost 2 years ago

After submitting, the client prints: Your total reward from this submission: 711.515570

However, when I summed the rewards myself with

    total_reward = 0.0
    ….
    [next_observation, reward, done, info] = client.env_step(action.tolist())
    total_reward += reward

I got 2134.54671077.

So, how exactly is the total reward calculated?

Posted by Anton_Pechenko  almost 2 years ago

I found the answer: the reported value is the average total reward over 3 episodes: 2134.54671077 / 3 == 711.515570
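For anyone who wants to check the same thing locally, here is a minimal sketch of the bookkeeping. It assumes, based only on the numbers above, that the reported score is the mean of the per-episode totals. The function name `summarize_rewards` and the toy `steps` data are made up for illustration and are not part of osim-rl; in a real run the `(reward, done)` pairs would come from the values returned by `client.env_step`.

```python
def summarize_rewards(steps):
    """Split a stream of (reward, done) pairs into episodes.

    Returns (per-episode totals, sum over all episodes, mean per episode).
    Steps after the last `done` (an unfinished episode) are not counted.
    """
    episode_totals = []
    running = 0.0
    for reward, done in steps:
        running += reward
        if done:  # episode finished: store its total and start a new one
            episode_totals.append(running)
            running = 0.0
    grand_total = sum(episode_totals)
    average = grand_total / len(episode_totals) if episode_totals else 0.0
    return episode_totals, grand_total, average


# Hypothetical toy data: three short episodes.
steps = [(1.0, False), (2.0, True),
         (3.0, False), (3.0, True),
         (0.5, False), (2.5, True)]
totals, grand, avg = summarize_rewards(steps)
print(totals, grand, avg)

# The numbers from this thread: the summed reward divided by 3 episodes.
print(round(2134.54671077 / 3, 6))
```

Summing across all steps without resetting at `done` gives the grand total (what I computed), while the client appears to print the per-episode average.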


Posted by spMohanty  almost 2 years ago

Hi @Anton,

This could be because of a mismatch between the osim-rl version used by the server and the one used by your client. The server has now been modified to throw an error if the versions don’t match.

Apart from that, we do expect minor differences in the cumulative reward calculation because of differences in the OS environment. The grader runs on a 64-bit Ubuntu 16.04 instance.