
Posted by bolis.mrc about 2 years ago | Quote
Hi David,
Impressive result, thanks for sharing!
Best, Marco
Posted by David_Baranger about 2 years ago | Quote
A quick thought on how to translate my approach into ‘machine-learning-speak’:
While not exactly the same, the PRS approach might be thought of as an extremely simplified version of ensemble learning. Given N SNPs, we fit N experts, where each expert is a regression trained on a single feature. We then compute a weighted sum of the experts, where the weights are the effect sizes of each feature from its own regression. While this approach virtually guarantees that we won't capture the maximum amount of variance possible, it also seems to be very good at reducing overfitting. I think this is an especially important point for genetics research, where each individual feature has a very small effect size. More complex learning algorithms thus run the risk of latching onto effects which appear to be especially predictive, but are actually just noise.
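To make the analogy concrete, here is a minimal Python sketch of that idea on simulated data. The 0/1/2 genotype coding, sample sizes, and variable names are illustrative assumptions for the sketch, not the actual pipeline from this challenge.

```python
# Minimal sketch of the PRS-as-simple-ensemble idea on simulated data.
# Assumes a numeric genotype matrix (samples x SNPs, coded 0/1/2) and a
# continuous phenotype; all names and sizes here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_snps = 500, 200, 1000

# Simulated genotypes, plus a phenotype built from many tiny effects
X_train = rng.integers(0, 3, size=(n_train, n_snps)).astype(float)
X_test = rng.integers(0, 3, size=(n_test, n_snps)).astype(float)
true_beta = rng.normal(0, 0.05, size=n_snps)
y_train = X_train @ true_beta + rng.normal(0, 1, size=n_train)

# Step 1: fit N single-SNP regressions ("experts"). Each slope is that
# SNP's effect size, estimated independently of every other SNP.
betas = np.empty(n_snps)
y_c = y_train - y_train.mean()
for j in range(n_snps):
    x_c = X_train[:, j] - X_train[:, j].mean()
    betas[j] = (x_c @ y_c) / (x_c @ x_c)

# Step 2: the polygenic score is a weighted sum of genotypes, with the
# per-SNP effect sizes as the weights.
prs_test = X_test @ betas
print(prs_test[:5])
```

Because every weight comes from a one-feature regression, no joint model is ever fit to all SNPs at once, which is a large part of why the approach is so hard to overfit.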
Posted by Muhammad_Alfiansyah about 2 years ago | Quote
Hi David!
Thank you!!!
Your last comment explains why my best model was actually just a random forest on all SNPs without any feature engineering :D!
Congrats!
Posted by David_Baranger over 1 year ago | Quote
Hey all! It looks like this challenge is officially finished. I've posted some additional analyses, where I improve on my original prediction by 11%, largely through data cleaning.
https://davidbaranger.com/2018/04/09/improving-genetic-prediction-data-cleaning-meta-analysis/
Best, David
Hi all! I’ve posted an explanation of my solution here: https://davidbaranger.com/2017/10/01/on-predicting-traits-with-genetics/
The post includes a link to a file containing the 5 variables I generated, which made up the entirety of my model.
Feel free to email me if you have questions. I hope that some of you will be inspired to improve on my work!
Best, David