

IEEE Investment Ranking Challenge

Explore methodology to identify assets with extreme positive or negative returns.


Question regarding evaluation

Posted by sshkhr over 3 years ago

The Evaluation section states:

“Each team must test their model using an expanding window procedure. For a given time period, T, an expanding window test allows the model to incorporate all available information up to time T, to generate predictions for time T+1. For example, when predicting the stock rankings in the first half of 2016, the model can include all data from 1996 to no later than year-end 2015. Predictions for the second half of 2016 could then include all the data from the first half of 2016.”

If I understand this correctly, the model used to make predictions for the year 2000 is not allowed to use any data from after the year 2000. However, if the problem is modelled with a sequence model such as a bidirectional recurrent neural network, the model passes information from the future back to improve predictions in the present. Is such a model acceptable under the evaluation criteria quoted above?

Posted by sshkhr over 3 years ago

Is it okay to model the problem with a sequence model such as a bidirectional recurrent neural network, which passes information from future sequences back to improve present predictions? I wanted to check, as I am not sure whether this is allowed under the expanding window test described under Evaluation.


Posted by benharlander over 3 years ago

@sshkhr Hi, thanks for the question. Frame this problem as if you were making a prediction as of today about the next 6 months. What information would be available to you? If you are making a prediction at time period T, your training set should not include observations from T or any period later than T. Also, your model should not include features that were not available at time T. This is not to say that you have to treat all time periods in your training set as independent, but I don’t think a bidirectional RNN would be appropriate if it relies on the presence of future information to make a prediction for the current time period.
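
For anyone who wants to check their own setup against the rule above, here is a minimal sketch of an expanding window split in plain Python. It is not the organizers' evaluation code: the function name `expanding_window_splits`, the `min_train` parameter, and the half-year labels are all made up for illustration. The point it shows is only that the training window grows one period at a time and never contains the period being predicted or anything after it.

```python
from typing import Iterator, List, Sequence, Tuple


def expanding_window_splits(
    periods: Sequence[str], min_train: int = 1
) -> Iterator[Tuple[List[str], str]]:
    """Yield (train_periods, test_period) pairs in chronological order.

    `periods` must already be sorted from oldest to newest. The training
    window always starts at the first period and grows by one each step,
    and it never contains the test period or anything after it.
    """
    if min_train < 1:
        raise ValueError("min_train must be at least 1")
    for i in range(min_train, len(periods)):
        yield list(periods[:i]), periods[i]


# Hypothetical half-year labels, for illustration only.
periods = ["2015H1", "2015H2", "2016H1", "2016H2"]
splits = list(expanding_window_splits(periods, min_train=2))
for train, test in splits:
    print(f"train on {train[0]}..{train[-1]} -> predict {test}")
```

With these labels, the prediction for 2016H1 is trained on data through 2015H2, and the prediction for 2016H2 additionally sees 2016H1, which mirrors the example in the Evaluation text. A model fitted inside each split can still treat the training periods as a sequence, as long as nothing it reads at prediction time comes from the test period or later.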