Learning How to Walk

Reinforcement learning environments with musculoskeletal models


Visualization "shutdown"

Posted by flopit over 3 years ago


The score is improving and I want to visualize what the model has learned so far, using the example with `--visualize`. Visualization starts well (moving forward), but after around 87 steps it finishes and the command line prints "simbody-visualizer received shutdown message. Goodbye." The skeleton did not fall over or backwards; it actually started to put one foot down, though the foot did not yet have contact with the ground. Are there any other constraints/states that make the visualization stop and print that message? If so, can you please write which components of the observation vector (or which combinations of values) are dangerous and make the visualization stop / trigger the end of the episode, so I can apply penalties during training to avoid them?


Best Regards Florian

Posted by spMohanty  over 3 years ago |  Quote

Hi Florian,

As mentioned in the "Evaluation" section, a simulation will also end when the pelvis goes below 0.7 metres. In any case, whenever the simulator runs into a termination condition, it returns the done variable as True when you take a step.

If you look at the basic usage section (https://github.com/stanfordnmbl/osim-rl/blob/master/README.md#basic-usage), you will notice the done variable I mentioned as one of the return values of the env.step function.
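That done-flag pattern (and the early-termination penalty Florian asked about) can be sketched with a toy stand-in environment. Note the `ToyPelvisEnv` class below is hypothetical, not part of osim-rl (the real `GaitEnv` needs OpenSim installed); only the loop structure and the pelvis-below-0.7 m rule carry over, and the decrement and penalty values are illustrative:

```python
# Minimal sketch of the done-variable pattern, assuming a gym-style
# step() that returns (observation, reward, done, info).

class ToyPelvisEnv:
    """Hypothetical stand-in for GaitEnv: the pelvis sinks each step."""

    def __init__(self):
        self.pelvis_y = 0.9

    def reset(self):
        self.pelvis_y = 0.9
        return [self.pelvis_y]

    def step(self, action):
        self.pelvis_y -= 0.05              # toy dynamics, not real physics
        done = self.pelvis_y < 0.7         # termination rule from the challenge docs
        reward = 0.0 if done else 1.0
        return [self.pelvis_y], reward, done, {}

env = ToyPelvisEnv()
observation = env.reset()
total_reward = 0.0
steps = 0
for _ in range(500):
    observation, reward, done, info = env.step(None)
    total_reward += reward
    steps += 1
    if done:
        total_reward -= 10.0               # example early-termination penalty
        break

print(steps, total_reward)
```

During training you would add such a penalty to the reward before feeding it to the agent, so that states leading to early termination are discouraged.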

One way to bypass this issue would be to try something like:

```
import opensim as osim
import numpy as np
import sys

from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Flatten, Input, merge
from keras.optimizers import Adam, RMSprop

from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.random import OrnsteinUhlenbeckProcess

from osim.env import *

import argparse
import math

# Command line parameters
parser = argparse.ArgumentParser(description='Train or test neural net motor controller')
parser.add_argument('--steps', dest='steps', action='store', default=10000)
parser.add_argument('--model', dest='model', action='store', default="example.h5f")
args = parser.parse_args()

# Load walking environment
env = GaitEnv(True)

nb_actions = env.action_space.shape[0]

# Total number of steps in training
nallsteps = args.steps

# Create networks for DDPG
# Next, we build a very simple model.
actor = Sequential()
actor.add(Flatten(input_shape=(1,) + env.observation_space.shape))
actor.add(Dense(32))
actor.add(Activation('relu'))
actor.add(Dense(32))
actor.add(Activation('relu'))
actor.add(Dense(32))
actor.add(Activation('relu'))
actor.add(Dense(nb_actions))
actor.add(Activation('sigmoid'))
print(actor.summary())

action_input = Input(shape=(nb_actions,), name='action_input')
observation_input = Input(shape=(1,) + env.observation_space.shape, name='observation_input')
flattened_observation = Flatten()(observation_input)
x = merge([action_input, flattened_observation], mode='concat')
x = Dense(64)(x)
x = Activation('relu')(x)
x = Dense(64)(x)
x = Activation('relu')(x)
x = Dense(64)(x)
x = Activation('relu')(x)
x = Dense(1)(x)
x = Activation('linear')(x)
critic = Model(input=[action_input, observation_input], output=x)
print(critic.summary())

memory = SequentialMemory(limit=100000, window_length=1)
random_process = OrnsteinUhlenbeckProcess(theta=.15, mu=0., sigma=.2, size=env.noutput)
agent = DDPGAgent(nb_actions=nb_actions, actor=actor, critic=critic,
                  critic_action_input=action_input, memory=memory,
                  nb_steps_warmup_critic=100, nb_steps_warmup_actor=100,
                  random_process=random_process, gamma=.99,
                  target_model_update=1e-3, delta_clip=1.)
agent.compile(Adam(lr=.001, clipnorm=1.), metrics=['mae'])

agent.load_weights(args.model)

# Finally, evaluate our algorithm for 1 episode.
_observation = env.reset()
for _step in range(500):
    _action = agent.forward(_observation)
    _observation, reward, done, info = env.step(_action)
    # NOTE: Remove this conditional if you do not want to stop the
    # visualization on the termination condition
    if done:
        break
```

PS: The last block in the code above is the most relevant part, and I haven't tested it yet, so there might be some typos here and there :D