Learning How to Walk

Reinforcement learning environments with musculoskeletal models


Observations and rewards

Posted by Zxc_596 almost 3 years ago

Hi! I am trying to understand the observation vector and I found a few strange issues. Could you clarify if there is a misunderstanding on my side?

1) If my interpretation of the observation vector is correct, the reward seems to be calculated on the Y delta. I am assuming that positions are in the format [X, Y, Z].

The pelvis position is set here: https://github.com/stanfordnmbl/osim-rl/blob/master/osim/env/human.py#L76-L78 https://github.com/stanfordnmbl/osim-rl/blob/master/osim/env/human.py#L116-116

And the reward is calculated here (using Y coordinate): https://github.com/stanfordnmbl/osim-rl/blob/master/osim/env/human.py#L28-L29

Is this a bug or where is my mistake?

2) Pelvis X speed seems to be the opposite of what I expected. Positive X speed means that the skeleton is moving backwards…

3) Left and right (knee, hip, ankle) seem to be swapped, at least with respect to what is shown in the visualization. I am using the code in getFootL as a reference. Unless I made a mistake, when the right foot is high, I get a high value in the left one.


Thanks a lot for putting this together! It is a lot of fun!

Posted by Lukasz_ over 2 years ago

Hi, sorry for the late answer!

1) The position of the pelvis is in the format [rotation, X, Y]. The Z direction is blocked. Rotation is around the Z axis.

2) I think this relates again to the assumption you mentioned in 1); what you are describing here is the pelvis rotation.

3) That one is indeed a mistake. I didn’t pay enough attention since they are symmetric.

Thanks for pointing these out and for participating in the challenge.

Btw, you can find more about the model here: https://github.com/stanfordnmbl/osim-rl/blob/master/osim/models/gait9dof18musc.osim

Best, Łukasz
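
For anyone reading this thread later, here is a minimal sketch of how the pelvis values could be read under the [rotation, X, Y] layout described in the reply. It is not code from osim-rl: the names PelvisPose, unpack_pelvis and forward_progress are hypothetical, the numbers are made up for illustration, and it assumes that the index the question took to be Y is in fact X.

```python
from typing import NamedTuple, Sequence


class PelvisPose(NamedTuple):
    """Hypothetical container for the three pelvis position values."""
    rotation: float  # rotation around the Z axis
    x: float         # X position
    y: float         # Y position


def unpack_pelvis(values: Sequence[float]) -> PelvisPose:
    """Interpret a 3-element sequence as [rotation, X, Y], not [X, Y, Z]."""
    if len(values) != 3:
        raise ValueError("expected exactly three pelvis values")
    return PelvisPose(*values)


def forward_progress(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Difference in X between two steps.

    Assumption: under the [rotation, X, Y] layout, index 1 is X,
    so a delta taken on index 1 measures movement along X.
    """
    return unpack_pelvis(curr).x - unpack_pelvis(prev).x


# Made-up values for two consecutive steps, in [rotation, X, Y] order.
before = [0.05, 1.20, 0.91]
after = [0.02, 1.26, 0.90]

progress = forward_progress(before, after)
```

Read this way, index 0 is the pelvis rotation, which would explain why treating it as an X position or speed gives surprising signs.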