IEEE Investment Ranking Challenge

Explore methodology to identify assets with extreme positive or negative returns.


Use of future data

Posted by lance almost 4 years ago

Hello all

I’m sorry to say that I mistakenly used data from the current period to make predictions on the same period, so my leaderboard score is somewhat inflated. From my initial attempts at correcting this I’ve not been able to get anywhere near my previous score, so the gain from doing it seems to be quite significant. That said, what is there to stop someone from ‘cheating’ in this way, obtaining a big advantage with regards to leaderboard scores? If I understand correctly, final evaluation will include the period from 2002 onwards, so the advantage will remain in the final evaluation of the model.



Posted by Kain almost 4 years ago

After reading the overview again, I think I made the same mistake!


Posted by mkoseoglu almost 4 years ago

This is really very confusing. In the Word doc, it says: “Norm_Ret_F6M: The forward 6-month return of a stock, normalized by the mean and standard deviation of the returns in each time_period. For an observation from time_period 2007_1, this would represent the normalized return of the stock from 07/2007 to the end of 12/2007.”

So it means that we should be able to use X variables which belong to 2007_1 in estimating Norm_Ret_F6M from 2007_1. A clarification is needed.

Also, your point about cheating is well taken: it can significantly affect the leaderboard results, and hence who will be invited to the second round.
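For reference, the per-period normalization described in the quoted Norm_Ret_F6M definition can be sketched roughly as below. This is only an illustration: the function name `normalize_by_period`, the tuple layout, and the sample returns are made up, and the use of the population standard deviation is an assumption, since the thread does not say which estimator the organisers used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_period(rows):
    """Z-score raw returns within each time_period.

    rows: list of (time_period, stock_id, raw_return) tuples (hypothetical layout).
    Returns a dict mapping (time_period, stock_id) -> normalized return.
    """
    # Group the raw returns by period so each period gets its own mean and std.
    by_period = defaultdict(list)
    for period, _, ret in rows:
        by_period[period].append(ret)

    # Assumption: population standard deviation; the organisers may have used
    # the sample standard deviation instead.
    stats = {p: (mean(v), pstdev(v)) for p, v in by_period.items()}

    normalized = {}
    for period, stock, ret in rows:
        mu, sigma = stats[period]
        # Guard against a period where every return is identical.
        normalized[(period, stock)] = (ret - mu) / sigma if sigma > 0 else 0.0
    return normalized


# Made-up returns for two periods and three stocks.
rows = [
    ("2007_1", "A", 0.10), ("2007_1", "B", -0.05), ("2007_1", "C", 0.25),
    ("2007_2", "A", -0.20), ("2007_2", "B", 0.00), ("2007_2", "C", 0.20),
]
normalized = normalize_by_period(rows)
```

Because the mean and standard deviation are computed from every return in the period, the target for 2007_1 already encodes information about that whole period, which is why labels from a period cannot be treated as known when predicting for it.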

Posted by lance almost 4 years ago

Yes, it would be nice to have some clarification from the admins. We could volunteer submissions to be removed from the leaderboard but how will you guard against people cheating inadvertently and thus gaining an unfair advantage when it comes to progression to round 2? In addition I’d appreciate some clarification with regards to the following:

How will NDCG be used to grade models? In the overview there’s reference made to the use of NDCG in grading but with regards to leaderboard position it seems that NDCG is completely ignored at the moment.

Are we allowed to fit different models to different data periods? This has the potential to significantly improve scores by basically overfitting - it’s unlikely to improve performance in real usage on future data, so I think it shouldn’t be allowed. Again, if not, how will you guard against this?

What proportion of the test data is used to calculate leaderboard scores? Does this proportion include any of the data in the final period? Presumably final evaluation will be performed on the whole of the test set.

Posted by habedi almost 4 years ago

I think my top score may be incorrect, as with the others here (I'm not 100% sure, because I discarded the old code and couldn’t reproduce the same score). I may have used labeled data from the same time window as the tests (the 6 months that the tests are in) to build a model. I re-read the docs, and they say that we ought to use only the data up to the last 6-month time window prior to the window where the tests are. I suspect this because my recent scores are much lower (close to the scores of most of the other participants).

Is it possible to reset my score to that of my newest submission?

Posted by benharlander almost 4 years ago

Hi everyone, thank you for sharing your concerns.

We are actively working with contestants to verify results on the leaderboard. I will follow up soon with more details on how we plan to make the challenge as fair as possible for all contestants.

Posted by benharlander almost 4 years ago

@mkoseoglu The ‘X’ variables from 2007_1 can absolutely be used to make predictions on the future ranks in 2007_1. However, you should not use the observations from 2007_1 to train a model to predict for 2007_1. Is this difference clear?
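The distinction above can be sketched as a simple split: train only on labelled observations from earlier periods, then feed the target period's X variables to the fitted model. This is a minimal illustration, not the organisers' code; the helper names `period_key` and `split_for_period`, the row layout, and the sample values are all hypothetical.

```python
def period_key(period):
    """Turn a label such as '2007_1' into a sortable (year, half) tuple."""
    year, half = period.split("_")
    return int(year), int(half)


def split_for_period(rows, target_period):
    """Split rows into training data and rows to predict for one period.

    rows: list of dicts with keys 'time_period', 'features' and 'label'
          (label is None for test rows). This layout is an assumption.
    """
    target = period_key(target_period)
    # Training data: labelled observations from periods strictly before the target.
    train = [
        r for r in rows
        if r["label"] is not None and period_key(r["time_period"]) < target
    ]
    # Prediction inputs: unlabelled rows from the target period itself.
    # Only their features are used; no label from this period is touched.
    predict = [
        r for r in rows
        if r["label"] is None and r["time_period"] == target_period
    ]
    return train, predict


# Made-up rows to show which observations end up where.
rows = [
    {"time_period": "2006_1", "features": [0.1, 0.2], "label": 0.5},
    {"time_period": "2006_2", "features": [0.3, 0.1], "label": -0.4},
    {"time_period": "2007_1", "features": [0.2, 0.2], "label": 1.1},   # same period: not used for training
    {"time_period": "2007_1", "features": [0.4, 0.0], "label": None},  # test row to predict
    {"time_period": "2007_2", "features": [0.0, 0.3], "label": 0.2},   # future period: not used
]
train, predict = split_for_period(rows, "2007_1")
```

Here `train` holds only the 2006_1 and 2006_2 rows, and `predict` holds the single unlabelled 2007_1 row.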

Posted by benharlander almost 4 years ago

@lance Hi Lance, answers to your questions below.

1) We will update the overview soon to describe how NDCG will be used for the final ranking.

2) Yes, you are able to select different variables and estimate different coefficients for each time period. You are still restricted to only using the data leading up to each time period. In theory, someone could optimize their results to the test set by running several iterations and making changes to each time period’s ranks. This would be very difficult to do with the limited number of daily submissions, not to mention that it would be a very poor solution in Round 2. We hope this deters contestants from trying.

3) You are provided with feedback on 100% of the test data from 2002-2016. You are not provided feedback on the results from 2017. We were already restricted to a small set of stocks to test at each period, so we decided to provide feedback on the entire test set rather than a subset. Good estimates of Spearman correlation and NDCG outweighed the risk of overly optimized solutions.

Thanks again for sharing your concerns. Look forward to more updates on these topics soon.
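For anyone wanting to check the two metrics mentioned above locally, here is a rough sketch of both. Spearman correlation is computed as the Pearson correlation of average ranks. The NDCG part is only an assumption: it uses the common linear-gain, log2-discount form with non-negative relevance values supplied by the caller, and the challenge's exact definition may differ. All function names and sample values are made up.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / math.sqrt(vx * vy)


def ndcg_at_k(relevance, scores, k):
    """NDCG@k with linear gain and log2 discount (assumed form, see note above)."""
    def dcg(rels):
        return sum(r / math.log2(pos + 2) for pos, r in enumerate(rels[:k]))

    # Order the true relevance values by descending predicted score.
    ranked = [r for _, r in sorted(zip(scores, relevance),
                                   key=lambda p: p[0], reverse=True)]
    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(ranked) / ideal if ideal > 0 else 0.0


# Made-up predictions and outcomes for four stocks in one period.
predicted = [0.9, 0.1, 0.5, 0.3]
actual = [1.2, -0.8, 0.4, -0.1]
relevance = [3, 0, 2, 1]  # hypothetical non-negative relevance grades

rho = spearman(predicted, actual)
score = ndcg_at_k(relevance, predicted, k=2)
```

In this toy example the predicted order matches the actual order exactly, so both metrics take their maximum value of 1.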

Posted by mkoseoglu almost 4 years ago

@benharlander It is clear now. Thank you!

Posted by lance almost 4 years ago

@benharlander Thanks for your answers. Regarding point two, it’s not necessary to probe the test data in this way: we can quite easily just overfit to the validation data, and this will likely result in test data improvements, because the distributions appear to be very similar. But improvements made in this way give an overly optimistic estimate of performance on future data, when we don’t have the benefit of validation feedback. I can see why you would allow different models, as I think it’s very difficult to fairly penalise overfitting in this way, but it’s something to bear in mind, as presumably your aim is ultimately to have a model that generalises well to future data.

Posted by lance almost 4 years ago

To be clear, by different models I mean models with grossly different hyperparameters