Mapping Challenge

Building Missing Maps with Machine Learning


Open solution journal [local CV 0.934]

Posted by neptune.ml about 3 years ago

Hi everyone,

We have experimented a bit more with our codebase and managed to improve the score significantly, reaching 0.934, which would have put us in first place in stage 1. As of now there is no option to make late submissions, so we could only validate on the local validation set. It seems, however, that the score on validation is pretty much exactly what you get on test.

Anyhow, the things that made a difference:

  • use resize at training and inference instead of padding. For me this is very surprising and I still don’t understand what is going on with that.

  • use ResNet101 as the encoder

  • multistage training:
      1. train with a larger learning rate with Adam on a subset of 50k images from train (to benefit from lr adjustments)
      2. train for a few epochs on the entire 230k dataset
      3. decrease the learning rate by a factor of 10 and train for another few epochs
      4. increase the weight on soft dice loss 10x and train for a few more epochs
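The four training stages above can be written down as a small schedule. This is only a framework-free sketch, not the code from the repo: the names `Stage`, `build_schedule` and `combined_loss` are made up for illustration, the `base_lr` and `dice_weight` defaults are placeholders (the post gives no concrete values), and pairing soft dice with binary cross-entropy is an assumption, since the post only says that the soft dice term has a weight. Keeping the stage 1 learning rate in stage 2, and the reduced rate in stage 4, is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    n_images: int       # training images used in this stage
    lr: float           # Adam learning rate (placeholder value)
    dice_weight: float  # weight on the soft dice term of the loss


def build_schedule(base_lr=1e-4, dice_weight=1.0):
    """Four stages from the post; base_lr and dice_weight are hypothetical."""
    return [
        Stage("subset", 50_000, base_lr, dice_weight),
        Stage("full", 230_000, base_lr, dice_weight),  # assumed: same lr
        Stage("lr_drop", 230_000, base_lr / 10, dice_weight),
        Stage("dice_boost", 230_000, base_lr / 10, dice_weight * 10),
    ]


def soft_dice_loss(probs, targets, eps=1e-7):
    """1 - soft dice coefficient over flat lists of probabilities and 0/1 labels."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clamped away from 0 and 1."""
    losses = []
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


def combined_loss(probs, targets, dice_weight):
    """Assumed loss form: BCE plus a weighted soft dice term."""
    return bce_loss(probs, targets) + dice_weight * soft_dice_loss(probs, targets)
```

In a real training loop each `Stage` would be used to configure the optimizer and the loss before running that stage's epochs; the number of epochs per stage is left out because the post only says "a few".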

We will have a clean write-up of our training procedure on the repo soon.

This gets you to 0.926 (0.928 with test-time augmentation, TTA).
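For readers unfamiliar with TTA, here is a minimal sketch of the idea: run the model on several transformed copies of the image, undo each transform on the prediction, and average. The post does not say which transforms were used for the 0.928 result, so the flips below are an assumption, and `predict` is a hypothetical callable that maps a 2D image (list of rows) to a 2D probability map of the same size.

```python
def identity(img):
    return img


def hflip(img):
    """Mirror left-right; applying it twice restores the original."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror top-bottom; applying it twice restores the original."""
    return img[::-1]


def predict_with_tta(predict, image):
    """Average predictions over flipped copies of the image."""
    # Each pair is (transform applied to the input, inverse applied to the output).
    transforms = [(identity, identity), (hflip, hflip), (vflip, vflip)]
    height, width = len(image), len(image[0])
    acc = [[0.0] * width for _ in range(height)]
    for forward, inverse in transforms:
        pred = inverse(predict(forward(image)))
        for y in range(height):
            for x in range(width):
                acc[y][x] += pred[y][x]
    n = len(transforms)
    return [[value / n for value in row] for row in acc]
```

The averaged map is then binarized as usual; the gain comes from smoothing out errors that depend on image orientation.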

  • score heuristics on the binarization threshold. We experimented with complicated heuristics that involved multiple probability thresholds, non-max suppression and weighting, which pushed the score to 0.934, but we feel that there is a lot of risk involved in using them in stage 2. We are currently working on LightGBM-based postprocessing and should be able to reproduce the heuristics in a cleaner manner.

Anyhow, the best model is still training, so the score could still improve. We are working on cleaning up the repo and dockerization and should have it ready soon.
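The simplest version of tuning the binarization threshold is a sweep over candidate values on validation data. The sketch below shows only that baseline, not the multi-threshold heuristics described above, whose details the post does not give. Pixel-level IoU is used here as a stand-in for the challenge metric, and the names `binarize`, `iou` and `best_threshold` are illustrative.

```python
def binarize(probs, threshold):
    """Turn a flat list of probabilities into a 0/1 mask."""
    return [1 if p >= threshold else 0 for p in probs]


def iou(pred, target):
    """Pixel-level intersection over union of two flat 0/1 masks."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return intersection / union if union else 1.0


def best_threshold(probs, target, candidates):
    """Return the candidate threshold with the highest IoU on this sample."""
    return max(candidates, key=lambda th: iou(binarize(probs, th), target))
```

In practice the score would be averaged over the whole validation set before picking a threshold, rather than computed on a single mask.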

The second big topic is how well this model can generalize. We ran some quick tests, and it seems that the competition dataset consists of images so similar that generalization could be a really big problem. Our strategy to combat this will probably involve retraining the model on strongly augmented images (scale, color, blur) combined with multiscale/multicolor TTA.

For those that don’t already know, all the code, issues, feature requests, tickets and more are freely available in our repo: https://github.com/minerva-ml/open-solution-mapping-challenge
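As a rough illustration of what "strongly augmented (scale, color, blur)" could look like, here is a toy pipeline on a grayscale image stored as a list of rows with values in [0, 1]. Everything in it is an assumption made for the sketch: the parameter ranges, the use of a brightness gain as a stand-in for color augmentation, nearest-neighbour rescaling, and a 3x3 box blur. A real pipeline would use an image library and work on RGB data.

```python
import random


def rescale_nearest(img, factor):
    """Resize by `factor` using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [[img[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
             for x in range(new_w)] for y in range(new_h)]


def adjust_brightness(img, gain):
    """Multiply every pixel by `gain` and clip to [0, 1]."""
    return [[min(1.0, max(0.0, v * gain)) for v in row] for row in img]


def box_blur(img):
    """3x3 mean filter; border pixels average only their in-bounds neighbours."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def augment(img, rng):
    """Apply random scale, brightness and (half of the time) blur."""
    img = rescale_nearest(img, rng.uniform(0.75, 1.25))  # hypothetical range
    img = adjust_brightness(img, rng.uniform(0.8, 1.2))  # hypothetical range
    if rng.random() < 0.5:
        img = box_blur(img)
    return img
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible; the target mask would need the same geometric transform (the rescale) but not the brightness or blur.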

Good luck all!