Principal Global Investors

IEEE Investment Ranking Challenge
Explore methodology to identify assets with extreme positive or negative returns.
Challenge ended
541 Submissions
207 Participants
13.1k Views

First, a reminder of how the data is organized. For a time period, say 2007_1, you have the future 6-month returns/ranks as of 6/30/2007 and the historic inputs available as of 6/30/2007. Therefore, the data corresponding to 2017_1 is from Jan 2017 to June 2017, and you are predicting the rank for the future 6 months (July to Dec 2017). There is more detail in the Data Description file on the “Datasets” tab above. Hope this helps.
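To make the period labels concrete, here is a minimal sketch that turns a label such as 2007_1 into the date the inputs are available and the 6-month window being predicted. The helper name period_windows is hypothetical, and the handling of second-half labels (for example 2006_2) is an assumption that simply mirrors the first-half case described above.

```python
from datetime import date


def period_windows(label):
    """Return (inputs_as_of, (target_start, target_end)) for a label like '2007_1'."""
    year_text, half_text = label.split("_")
    year, half = int(year_text), int(half_text)
    if half == 1:
        # Inputs cover Jan-Jun and are known as of June 30;
        # the ranks to predict cover Jul-Dec of the same year.
        as_of = date(year, 6, 30)
        target = (date(year, 7, 1), date(year, 12, 31))
    elif half == 2:
        # Assumption: second-half periods mirror the first-half layout.
        as_of = date(year, 12, 31)
        target = (date(year + 1, 1, 1), date(year + 1, 6, 30))
    else:
        raise ValueError(f"unexpected half-year index in {label!r}")
    return as_of, target


as_of, (target_start, target_end) = period_windows("2017_1")
print(as_of, target_start, target_end)
```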
My answer will be similar to the last question. The return/rank data and the inputs at the same time period are non-overlapping. This means that X1:X70 would have been available to make predictions for this period, as you are predicting the rank for the second half of 2017 (i.e. the end of 2017_1).
IEEE Investment Ranking Challenge  Use of future data  almost 2 years ago
@lance Hi Lance, answers to your questions below.
1) We will update the overview soon to describe how NDCG will be used for the final ranking.
2) Yes, you are able to select different variables and estimate different coefficients for each time period. You are still restricted to only using the data leading up to each time period. In theory, someone could optimize their results to the test set by running several iterations and making changes to each time period’s ranks. This would be very difficult to do with the limited number of daily submissions, and it would also be a very poor solution in Round 2. We hope this deters contestants from trying.
3) You are provided with feedback on 100% of the test data from 2002–2016. You are not provided feedback on the results from 2017. We were already restricted to a small set of stocks to test at each period, so we decided to provide feedback on the entire test set rather than a subset. Good estimates of Spearman correlation and NDCG outweighed the risk of overly optimized solutions.
Thanks again for sharing your concerns. Look forward to more updates on these topics soon.
IEEE Investment Ranking Challenge  Use of future data  almost 2 years ago
@mkoseoglu The ‘X’ variables from 2007_1 can absolutely be used to make predictions on the future ranks in 2007_1. However, you should not use the observations from 2007_1 to train a model to predict for 2007_1. Is this difference clear?
IEEE Investment Ranking Challenge  Use of future data  almost 2 years ago
Hi everyone, thank you for sharing your concerns.
We are actively working with contestants to verify results on the leaderboard. I will follow up soon with more details on how we plan to make the challenge as fair as possible for all contestants.
IEEE Investment Ranking Challenge  Need a little clarification  almost 2 years ago
Hi @baseline, excellent questions.
With the current train/test split, you are unable to leak future data into the training set when predicting for 2017_1. However, you need to be more careful for previous periods. For example: when predicting the future 6-month ranks for 2007_1, you should not include observations from any periods after 2006_2. This means the rank+input observations from 2007_1 are also off limits when making predictions for 2007_1. In the random forest notebook, you will see that for predicting time_period == time, the training set included all data up to and not including time.
Thanks for your questions. This is the joy of time series data and the financial markets! :) Let me know if further clarification is needed.
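As an illustration of the rule above, here is a minimal sketch of a leak-free split using plain Python rows. It is not the code from the random forest notebook: the function names, the example rows, and the rank field are hypothetical, and only the time_period column name and the "up to and not including time" rule come from the post.

```python
def period_key(label):
    """Turn a label such as '2006_2' into a sortable (year, half) tuple."""
    year, half = label.split("_")
    return int(year), int(half)


def walk_forward_split(rows, time):
    """Split rows into (train, predict) for the period `time`.

    train   -> observations from periods strictly before `time`
    predict -> rows whose inputs are used to predict ranks for `time`
    """
    cutoff = period_key(time)
    train = [r for r in rows if period_key(r["time_period"]) < cutoff]
    predict = [r for r in rows if r["time_period"] == time]
    return train, predict


# Hypothetical rows: one stock per period, with one input and a known rank.
rows = [
    {"time_period": "2006_1", "X1": 0.2, "rank": 3},
    {"time_period": "2006_2", "X1": 0.5, "rank": 1},
    {"time_period": "2007_1", "X1": 0.1, "rank": 2},
    {"time_period": "2007_2", "X1": 0.9, "rank": 4},
]

train, predict = walk_forward_split(rows, "2007_1")
print([r["time_period"] for r in train])    # periods allowed in training
print([r["time_period"] for r in predict])  # period being predicted
```

The X inputs of the 2007_1 rows can be fed to the fitted model, but their rank values must not be used for fitting, which is why those rows are kept out of train.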
IEEE Investment Ranking Challenge  Notice: Error in NDCG calculation in evaluation script  almost 2 years ago
Hi everyone,
Yesterday, it was brought to our attention that there was a small but important error in the evaluation script and random forest starter kit. (Thanks @cyga for the tip!) The calculation for NDCG was using a discount factor of $\frac{n}{ni}$ rather than the correct $\frac{n}{n+i}$. See the correct formulation for NDCG on the “Overview” tab.
This error would have affected those that have already submitted solutions through the crowdAI API.
Apologies for this inconvenience. Best of luck on the challenge!
Ben Harlander
Data Science & Operations Research  Principal Global Investors 
IEEE Investment Ranking Challenge  different metric in overview and github example  about 2 years ago
@cyga Thanks for the heads up! I assume you were referring to the DCG calculation in the random forest starter kit. The DCG in the overview is the correct formulation. We will update the starter kit to reflect this.
IEEE Investment Ranking Challenge  Question regarding evaluation  about 2 years ago
@sshkhr Hi, thanks for the question. Frame this problem as if you were making a prediction as of today about the next 6 months. What information would be available to you? If you are making a prediction at time period T, your training set should not include observations from T or any period later than T. Also, your model should not include features that were not available at time T. This is not to say that you have to treat all time periods in your training set as independent, but I don’t think a bidirectional RNN would be appropriate if it relies on the presence of future information to make a prediction for the current time period.
IEEE Investment Ranking Challenge  Details for Round 2  almost 2 years ago
Round 2 is open to all challenge participants. Round 1 focused on prototyping models that maximized statistical measures and Round 2 will enhance this with a deeper dive into your methodology and a new set of holdout data from 2017. To compete, all participants must submit the following items:
1) Final predictions for all time periods
2) A brief written solution using the IEEE template for conference proceedings. MS Word and LaTeX are both acceptable. At a minimum, the document should include an introduction, description of your methodology, results, and any other information needed to understand your solution and its merit. Tables, charts, and other visuals are highly encouraged.
3) All code and files needed to reproduce your results uploaded to Gitlab. (More details on this soon)
The top 6 solutions will be selected based on their statistical performance, calculated in the following manner:
Final score = (A+B+C+D)/4
Where:
A = Rank of Spearman correlation on holdout data from 2002 – 2016
B = Rank of NDCG score on holdout data from 2002 – 2016
C = Rank of Spearman correlation on holdout data from 2017
D = Rank of NDCG score on holdout data from 2017
All ranks will be determined using 3 significant digits. Performance on the data from 2017 will be used as a tiebreaker if needed. Participants will still have access to the crowdAI API to test their submissions on the original dataset, but no feedback on the holdout from 2017 will be provided until the challenge closes.
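For a feel of how the final score behaves, here is a rough sketch of the calculation above. It is not the organizers' scoring code. The participant names and metric values are made up, and three choices are assumptions: rank 1 goes to the highest metric value, participants tied after rounding to 3 significant digits share the better rank, and the 2017 tiebreaker is left out.

```python
from math import floor, log10


def round_sig(value, digits=3):
    """Round a number to the given count of significant digits."""
    if value == 0:
        return 0.0
    return round(value, digits - 1 - floor(log10(abs(value))))


def rank_descending(scores):
    """Rank participants on one metric: rank 1 = highest rounded score.

    Assumption: participants tied after rounding share the better rank.
    """
    rounded = {name: round_sig(score) for name, score in scores.items()}
    return {
        name: 1 + sum(other > value for other in rounded.values())
        for name, value in rounded.items()
    }


def final_scores(metrics):
    """Average the four ranks A, B, C, D for every participant."""
    ranks = {key: rank_descending(metrics[key]) for key in "ABCD"}
    return {
        name: sum(ranks[key][name] for key in "ABCD") / 4
        for name in metrics["A"]
    }


# Made-up metric values for three hypothetical participants.
example = {
    "A": {"alice": 0.1234, "bob": 0.1231, "carol": 0.0900},  # Spearman, 2002-2016
    "B": {"alice": 0.300, "bob": 0.310, "carol": 0.305},     # NDCG, 2002-2016
    "C": {"alice": 0.05, "bob": 0.02, "carol": 0.04},        # Spearman, 2017
    "D": {"alice": 0.28, "bob": 0.27, "carol": 0.29},        # NDCG, 2017
}

print(final_scores(example))
# -> {'alice': 1.75, 'bob': 2.0, 'carol': 2.0}
```

Here alice and bob tie on A because 0.1234 and 0.1231 both round to 0.123. Under the rank-1-is-best assumption, a lower final score is better, and bob and carol would then be separated by their 2017 results.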