Explore methodology to identify assets with extreme positive or negative returns.
By Principal Financial Group
about 1 year ago
The organizers have confirmed that we can fit a different model at each time period. I'm interested to know whether others have been doing this by using the labelled data to perform model selection at each period. So far, I've effectively been using a single model, optimized only on the average score across all periods.

It is of course also possible to perform model selection separately for each period, using the validation data we have to evaluate performance period by period. That means training on all the data up to period x, evaluating the model on the target data for period x, optimizing with the benefit of that feedback, and repeating for every period. This per-period option will significantly boost scores, but I think in a way that does not reflect general performance when we don't have validation feedback.

Have others been using the per-period option? Is it allowed? I don't think this has been made clear so far. It matters because it seems the final evaluation will be performed not just on the final period (for which we don't have labelled data) but on all periods.
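To make the concern concrete, here is a minimal sketch in Python of the two selection strategies on purely synthetic data. It does not use the challenge data or metric: the labels are coin flips, the candidate "models" are random guessers, and the names `simulate`, `n_periods`, `n_assets` and `n_models` are hypothetical. Since no candidate has any real skill, any score above 0.5 is an artifact of selection.

```python
import random


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def simulate(n_periods=20, n_assets=200, n_models=10, seed=0):
    """Compare global vs per-period model selection on pure noise.

    All sizes are illustrative assumptions, not values from the challenge.
    """
    rng = random.Random(seed)

    # scores[t][k] = accuracy of candidate k on the labels of period t.
    scores = []
    for _ in range(n_periods):
        labels = [rng.randint(0, 1) for _ in range(n_assets)]
        row = []
        for _ in range(n_models):
            # Each candidate is a random guesser, so its true skill is 0.5.
            preds = [rng.randint(0, 1) for _ in range(n_assets)]
            row.append(accuracy(preds, labels))
        scores.append(row)

    # Option 1: a single candidate chosen by its average score over all periods.
    avg_by_model = [
        sum(row[k] for row in scores) / n_periods for k in range(n_models)
    ]
    single = max(avg_by_model)

    # Option 2: the best candidate is re-chosen at every period using that
    # period's own labels, and the winning scores are then averaged.
    per_period = sum(max(row) for row in scores) / n_periods

    return single, per_period


if __name__ == "__main__":
    single, per_period = simulate()
    print(f"single model, selected on average score: {single:.3f}")
    print(f"per-period selection on the same labels: {per_period:.3f}")
```

The per-period figure can never be lower than the single-model figure, because the average of per-period maxima is at least the maximum of the averages. In this sketch the gap comes entirely from selecting on the same labels that are then used for scoring, which is the effect described above.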
about 1 year ago
+1 to this question. In my opinion, the model (or ensemble of models) should generalize well across all periods. Fitting one model per period while using the training data of that same period for validation is just a leak. @spMohanty, could you please clarify this topic?