Reinforcement learning environments with musculoskeletal models
By Stanford Neuromuscular Biomechanics Laboratory
almost 2 years ago
Thank you, everyone, for all your submissions, involvement, and effort in helping us teach this skeleton how to walk. You made all the effort that went into setting up the challenge completely worth it. Thank you so much again!
Now that the challenge has ended, we have a winner!!
Congrats to @syllogismos on a whopping 60 submissions and the leading score of 2865!!
The leaderboard was really like a thriller over the last few hours of the challenge, with the top two positions swapping quite a few times. This brings me to congratulate @shawn, who is currently 2nd on the leaderboard with just 9 submissions.
Also, a big shoutout to @ViktorF for a score of 1728 and 54 submissions throughout the challenge. While my own submissions only mastered kangaroo hopping ;), yours came closest to elegant walking :’))
We would really love to hear more from you guys about your learning and competing experience in this challenge. We would also encourage you to contribute a tutorial about your experience with this challenge to the CrowdAI KnowledgeBase (nudge nudge ;) ).
Finally, there were many people behind the scenes who worked really hard to make this challenge happen.
@lukasz was the person who first imagined a reinforcement learning challenge like this around OpenSim, and he is also the reason you could all get started with the challenge using just a few lines of code. @sean is our really, really patient lead developer of crowdAI, whom we pestered a lot for small and big changes so that crowdAI could accommodate the format of this challenge. And then there’s @marcel, who first connected all of us and then, throughout the challenge, made sure we were all on track and on time. Thanks @lukasz, @sean and @marcel.
Before I end this post, we are also proud to announce that an adapted (and slightly more complicated) version of this challenge was accepted as one of the official NIPS 2017 Challenges this year. The NIPS Challenge (https://www.crowdai.org/challenges/nips-2017-learning-to-run) starts in early June, and we look forward to having you all compete again to make them skeletons walk/run.
(on behalf of challenge organisers and crowdAI)
PS: We started a crowdAI gitter channel where some of us hang out, please feel free to drop by, say hello, ask questions, give suggestions, discuss potential new challenges.
PS1: Please update your profile pictures O:-) !! The leaderboard would be so much more of a lively place with all your awesome profile pictures up there ;)
almost 2 years ago
Thanks for hosting a fun challenge. I really enjoyed it and am looking forward to future competitions. I was also lucky to win this one.
TL;DR: I took modular_rl and made it faster, mainly in how rollouts are collected.
Here is a summary of what I did throughout the competition. Please forgive any spelling mistakes and grammatical errors; I wanted to write this now, because if I don’t do it now, I will keep procrastinating and never do it.
I was aware of two RL libraries, rllab and modular_rl, and wanted to make both of them work with this particular env. modular_rl is pretty straightforward to get started with, as it is in Python 2.7, while rllab was a little trickier because its latest version is Python 3. In the first few days, the memory leak turned out to be a bottleneck on my computer, and I was forced to implement checkpointing the networks every iteration and bootstrapping from there, which turned out to be a huge time saver as the competition progressed. On my local machine, one iteration used to take some 90 to 120 minutes, so 10 iterations took close to 15-18 hours. So every day I would train some 10 iterations and then restart the script for the next 10.
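The checkpoint-and-resume routine described above might look roughly like the following. This is a minimal Python 3 sketch, not the code used in the competition: the checkpoint path, the pickle file format, and the `update` callback (standing in for one modular_rl policy-update iteration) are all assumptions.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, iteration, params):
    """Write params plus the iteration number, then rename over the old file
    so a crash mid-write never leaves a half-written checkpoint behind."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump({"iteration": iteration, "params": params}, f)
    os.replace(tmp_path, path)


def load_checkpoint(path, init_params):
    """Return (next_iteration, params), starting fresh if no checkpoint exists."""
    if not os.path.exists(path):
        return 0, init_params
    with open(path, "rb") as f:
        state = pickle.load(f)
    return state["iteration"] + 1, state["params"]


def train(path, n_iters, init_params, update):
    """Run n_iters iterations, resuming from the last checkpoint at `path`."""
    start, params = load_checkpoint(path, init_params)
    for iteration in range(start, start + n_iters):
        params = update(params)  # hypothetical stand-in for one policy update
        save_checkpoint(path, iteration, params)
    return start + n_iters, params


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "policy.pkl")
    bump = lambda p: [p[0] + 1.0]  # toy "training" step
    print(train(path, 10, [0.0], bump))  # first run: iterations 0-9
    print(train(path, 10, [0.0], bump))  # a restarted script resumes at 10
```

Saving after every iteration means a segfault costs at most one iteration of work, which is what made the "10 iterations, then restart" routine workable.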
Meanwhile, I was determined to make rllab work with GaitEnv. I was overthinking it, researching protocol buffers and all that jazz to make Python 2 and Python 3 communicate, when basically all an RL agent needs is the next state and reward given the current action. Then I realized that this is the whole point of gym-http-api, and that is how the crowdAI grader evaluates my submissions over HTTP. So one part of the puzzle was solved.
Next, I observed that the main bottleneck was collecting multiple runs in parallel. I didn’t really understand how parallelization works in rllab and was not sure it would work in this case, so I rewrote the part of the code that, given an agent, collects multiple episodes for each iteration, and made it work. Now every iteration runs as many agents as there are cores on the machine. I also moved the rllab experiments to EC2, mostly on c4.8xlarge/c3.8xlarge spot instances (~32-36 vCPUs), so every iteration collects 32 episodes in parallel instead of 1, as in the modular_rl/local-computer case. When I first benchmarked it, this sped up the process by roughly 8-10x: each iteration took some 10-15 minutes instead of 90-120. So I knew I had to implement this in modular_rl as well, which was giving me good scores on the leaderboard, albeit very slowly. The third major issue was the memory leak, which kept me from training GaitEnv for more than 10 iterations, or about a million steps, before it hit a segmentation fault.
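A minimal sketch of collecting one episode per core with Python 3’s `multiprocessing` is below. It is not the modular_rl or rllab code: `ToyEnv` is a hypothetical stand-in for GaitEnv, the linear `policy` replaces the neural network, and the 50-step horizon is arbitrary. The idea it illustrates is that each worker process builds its own environment and runs a whole episode, so the expensive simulation uses all cores at once.

```python
import multiprocessing as mp
import random


class ToyEnv:
    """Hypothetical stand-in for GaitEnv; the real env is an OpenSim simulation."""

    def __init__(self, seed, horizon=50):
        self.rng = random.Random(seed)
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.rng.random()

    def step(self, action):
        self.t += 1
        obs = self.rng.random()
        reward = -abs(action - obs)  # arbitrary toy reward
        return obs, reward, self.t >= self.horizon


def policy(params, obs):
    # Hypothetical linear policy; modular_rl/rllab would use a neural network.
    return params[0] * obs + params[1]


def rollout(args):
    """Run one complete episode inside a worker process."""
    params, seed = args
    env = ToyEnv(seed)  # each worker owns its own env instance
    obs, done = env.reset(), False
    path = {"observations": [], "actions": [], "rewards": []}
    while not done:
        action = policy(params, obs)
        path["observations"].append(obs)
        path["actions"].append(action)
        obs, reward, done = env.step(action)
        path["rewards"].append(reward)
    return path


def collect_batch(params, n_episodes=None, base_seed=0):
    """Collect n_episodes episodes in parallel (default: one per core)."""
    n_cores = mp.cpu_count()
    n_episodes = n_episodes or n_cores
    jobs = [(params, base_seed + i) for i in range(n_episodes)]
    with mp.Pool(processes=min(n_episodes, n_cores)) as pool:
        return pool.map(rollout, jobs)


if __name__ == "__main__":
    paths = collect_batch([0.5, 0.1])
    print(len(paths), "episodes of length", len(paths[0]["rewards"]))
```

Only the finished trajectories travel back to the parent process, where the policy fitting would happen.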
I could simply have used memory-intensive instances on EC2, but I found a hacky way of starting the Python 2 HTTP servers programmatically from Python 3 code, killing them every n iterations, and restarting them. I made n an experiment variable called destroy_envs_every, which made my life a lot easier: I no longer had to worry about memory crashes. I had to tune this number based on the batch size of each experiment. So on rllab I faced three major problems: Python 2 to 3 interaction, collecting episodes in parallel, and the memory leak. By this time I knew I had to implement the same things in modular_rl, where I didn’t have to worry about Python 2-3 interaction. I started my rllab experiments and started hacking modular_rl.
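The restart trick can be sketched with the standard `subprocess` module. This is a hypothetical illustration: in the post, Python 3 code launched Python 2 gym-http-api env servers, whereas here each child is just a sleeping Python process standing in for a server, and the ports and the `do_iteration` callback are placeholders. Only `destroy_envs_every` mirrors a name from the post.

```python
import subprocess
import sys


def server_command(port):
    # Hypothetical placeholder: the real command would start a Python 2 env
    # server listening on `port`. Here the child just sleeps so the sketch runs.
    return [sys.executable, "-c", "import time; time.sleep(3600)", str(port)]


def start_servers(ports):
    return [subprocess.Popen(server_command(port)) for port in ports]


def stop_servers(procs):
    for proc in procs:
        proc.terminate()  # killing the process returns all its leaked memory
    for proc in procs:
        proc.wait()


def train_with_env_restarts(n_iters, ports, destroy_envs_every, do_iteration):
    """Run n_iters iterations, recycling the env servers every
    destroy_envs_every iterations; returns how many restarts happened."""
    procs = start_servers(ports)
    restarts = 0
    try:
        for iteration in range(n_iters):
            if iteration > 0 and iteration % destroy_envs_every == 0:
                stop_servers(procs)
                procs = start_servers(ports)
                restarts += 1
            do_iteration(iteration, ports)  # hypothetical: collect + fit
    finally:
        stop_servers(procs)  # never leave orphaned servers behind
    return restarts
```

Since the leak grows with the number of simulated steps, larger batches call for a smaller `destroy_envs_every`, which matches the post’s note that the value depended on batch size.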
I honestly don’t know how to train on GPUs, or whether GPUs would make my life easier for RL problems like this, unless the env itself takes advantage of them. From my limited understanding, in both modular_rl and rllab GPUs come into the picture when fitting on the advantages newly extracted from the episodes, but that takes only 1-2 minutes, whereas collecting the episodes takes close to 1-2 hours. So I didn’t really concentrate on getting my code to work on GPUs, nor did I have the budget to try. All I worked on was bringing the 1-2 hour episode collection down to 5-20 minutes, depending on the number of cores available to me. Also, Theano has an environment flag that specifies the number of cores to use while fitting, so that sped things up a little, I guess, but I didn’t really worry about it. I would like to hear from people who made use of GPUs, and how they got it to work, if any of you did.
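The reasoning for prioritizing episode collection over GPU fitting is essentially Amdahl’s law. A small worked sketch, using the rough per-iteration figures from the post (about 90 minutes of sampling and 2 minutes of fitting) and assuming, hypothetically, a 10x GPU speedup on fitting versus ideal linear scaling of sampling across 32 cores:

```python
def iteration_minutes(sample_min, fit_min, sample_speedup=1.0, fit_speedup=1.0):
    """Wall-clock minutes for one iteration when sampling and fitting run one
    after the other and each phase is sped up independently."""
    return sample_min / sample_speedup + fit_min / fit_speedup


baseline = iteration_minutes(90, 2)                      # 92 minutes
gpu_fit = iteration_minutes(90, 2, fit_speedup=10)       # about 90.2 minutes
parallel = iteration_minutes(90, 2, sample_speedup=32)   # 4.8125 minutes
print(baseline, gpu_fit, parallel)
```

Under these assumptions, a faster fitting step barely moves the iteration time, while parallel sampling cuts it by more than an order of magnitude. Real scaling is sub-linear, as the post’s own 8-10x measurement on 32-36 vCPUs shows.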
By this time I had made the necessary modifications to modular_rl and started training on EC2, and my runs were a lot faster, limited only by the number of cores at my disposal. I was making a lot of progress. But I depended on spot instances to keep my budget from blowing up, which was another problem in itself, as they used to go down unexpectedly. Even then, I was spending too much on compute.
With rllab, using my current hyperparameters and the limited experiments I ran, I was not able to compete with my modular_rl agent, so I had to give it up, solely because of budget constraints. I wanted to do more.
By this time @shawn came along, and lit some fire under my bum XD
I switched to Google Cloud, where I had 300 USD in free credits (which I had been saving for the YouTube-8M Kaggle competition), and started using them. I was new to Google Cloud as well, which made things a bit harder, but as a side note, I now slightly prefer Google Cloud, as some things are much saner and cleaner than on AWS, IMO. But since my account was on a free trial, I had restrictions such as a maximum of 24 cores per region, which made things a bit tricky. So I started 24-core machines in different regions and used gym-http-api to talk to the envs in those regions. This took me two days, because I could not figure out the security group/firewall settings that let two servers on the cloud talk to each other (I was unfamiliar with Google Cloud). Now I had 24*n agents collecting episodes in parallel, which made things even faster, even with some time lost to HTTP communication. And my code scales as you add more machines.
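The idea of driving an environment over HTTP, whether from a Python 3 process on the same machine or from a machine in another region, can be illustrated with a toy JSON server built only from the Python 3 standard library. This does not reproduce gym-http-api’s actual routes or payloads: the `/reset` and `/step` paths, the `ToyEnv` dynamics, and the reply fields are assumptions made for illustration.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyEnv:
    """Hypothetical stand-in for GaitEnv: 3-step episodes, reward equals action."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]

    def step(self, action):
        self.t += 1
        return [float(self.t)], float(action), self.t >= 3


class EnvHandler(BaseHTTPRequestHandler):
    env = ToyEnv()  # one env per server process

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/reset":
            reply = {"observation": self.env.reset()}
        elif self.path == "/step":
            obs, reward, done = self.env.step(body["action"])
            reply = {"observation": obs, "reward": reward, "done": done}
        else:
            self.send_error(404)
            return
        data = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # silence per-request logging


def start_server(host="127.0.0.1", port=0):
    """Serve the env from a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), EnvHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call(host, port, path, payload=None):
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("POST", path, body=json.dumps(payload or {}),
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


def run_remote_episode(host, port, action=1.0):
    """Client side: all the agent needs is reset, then step(action) until done."""
    call(host, port, "/reset")
    total, done = 0.0, False
    while not done:
        reply = call(host, port, "/step", {"action": action})
        total += reply["reward"]
        done = reply["done"]
    return total
```

Scaling to 24*n agents would then amount to running one such server per core on each machine and giving every rollout worker its own host and port.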
I was running code until the last minute, and amazingly I was still making progress on the last day: my 2865 run is from May 2nd. I found myself hacking even in the last minutes; the last few days were intense. I also ran through all my free credits and, sadly, started paying for Google Cloud as well :(
If you look at the video on the leaderboard, you can see that although it’s running, it missteps once or twice, and it looks like it could finish a bit better (in some of my weaker runs, the skeleton seems to reach for the ribbon in the last seconds, like people do in sprints). So there might be some more juice to extract, but I don’t know if I will train it further. It also ended up learning the symmetry between left and right; my earlier runs used to put just the left leg forward and drag the other leg.
So I had a lot of fun doing all this hacking on such a nice problem. Thanks, everyone, for putting up this problem. I’m looking forward to seeing how others solved it, and to future competitions.
almost 2 years ago
Thank you for organizing this! And congrats to syllogismos on your great results!
As an opportunity to learn more: It would be nice to hear a little more about how/if you modified the reward function and what approach/algorithm you used. (rllab’s DDPG or something else?)
Looking forward to the next challenge!
almost 2 years ago
Thank you very much for organizing this competition and providing CrowdAI as an alternative platform in general.
All of my models were based on the CMA-ES optimization algorithm, since it can explore the problem space pretty efficiently with a relatively small number of experiments.
These days I am quite busy optimizing my own future after a recent layoff. As a consequence, I need a bit more time to clean up the code and properly document all the learnings and results. I will post a GitHub link here as soon as possible.
* Single Linode with 4 cores to run four models in parallel 24/7
* Ubuntu 16.04 64 bit LTS
* Anaconda2, latest Python 2.7 (could not use 3.x due to the simulation package), Intel MKL
* CMA-ES package: cma