Explore methodology to identify assets with extreme positive or negative returns.
Using the provided datasets of financial predictors and semi-annual returns, participants are challenged to develop a model that helps identify the best-performing stocks in each time period.
Research Question: Which stocks will experience the highest and lowest returns during the next six months?
Out of the thousands of stocks in the market, small groups will experience exceptionally high or low returns. Considering the distribution of stock returns, a portfolio manager must buy the stocks in the right tail of the distribution and avoid the stocks in the left tail. The performance of an entire equity portfolio is often driven by these key investment decisions. The goal of this challenge is to explore methodology that will increase the probability that portfolio managers identify these stocks with extreme positive or negative returns.
Each team must create a model that ranks a set of stocks based on the expected return over a forward 6-month window. This model can be a risk factor-based strategy (multi-factor model), predictive model, or any other data-based heuristic. There are many ways to approach this task and creative, non-traditional solutions are strongly encouraged. The final model will be tested on each 6-month period from 2002 to 2017.
Analysts rely on a mix of quantitative and qualitative methodology to help investors consistently outperform the market. Investment expertise alone is not enough: having the right data at the right time plays a critical role in successfully anticipating economic and environmental changes that may impact investment performance. Personalized solutions can be designed to provide a tailored mix of risk and return. Current baseline solutions rely on simple regressions and/or random forests. These approaches have high explanatory value but low predictive value; improved solutions would increase predictive accuracy.
Teams are provided with predictors and semi-annual returns for a group of stocks from 1996 to 2017. This span of 21 years is represented as 42 non-overlapping 6-month periods. In each of the 42 time periods, roughly 900 stocks with the largest market capitalization (i.e., total market value in USD) were selected. Therefore, the selected set of stocks at each time period changes as companies increase or decrease in value. All stock identifiers have been removed and all numeric variables have been anonymized and normalized. Training and test datasets were created by selecting a random sample of stocks at each time period: 60% of stocks were sampled into the training set and the remaining 40% formed the test set. Finally, all data from the second half of 2017 was allocated to the test set. This 6-month period will provide a final out-of-sample test of a model’s performance.
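The sampling scheme described above can be illustrated with a minimal sketch. It is not the organizers' actual procedure: the row layout (dictionaries with a `period` key), the period labels, the function name `split_by_period`, and the seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_period(rows, train_frac=0.6, holdout_period=None, seed=0):
    """Split rows into train/test sets independently within each period.

    rows: list of dicts, each with a "period" key (hypothetical layout).
    holdout_period: a period whose rows all go to the test set.
    """
    rng = random.Random(seed)
    by_period = defaultdict(list)
    for row in rows:
        by_period[row["period"]].append(row)

    train, test = [], []
    for period in sorted(by_period):
        group = list(by_period[period])
        if period == holdout_period:
            # The final period is kept entirely out of sample.
            test.extend(group)
            continue
        rng.shuffle(group)
        cut = round(train_frac * len(group))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Toy data: two periods with 10 stocks each.
rows = [{"period": p, "stock": i} for p in ("2017_1", "2017_2") for i in range(10)]
train, test = split_by_period(rows, holdout_period="2017_2")
```

In this toy example, six of the ten stocks in `2017_1` land in the training set, and every row from `2017_2` is held out for testing.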
Note: Please refer to the starter kit to get started quickly with the dataset, train a simple random-forest-based model, and make an example submission to crowdAI.
Consistent performance over time and through varying market conditions is crucial for any financial model. Each team must test their model using an expanding window procedure. For a given time period $t$, an expanding window test allows the model to incorporate all available information up to time $t$ to generate predictions for time $t+1$. For example, when predicting the stock rankings in the first half of 2016, the model can include all data from 1996 to no later than year-end 2015. Predictions for the second half of 2016 could then also include the data from the first half of 2016. The quality of the predicted rankings at each time period will be evaluated in two ways, described below.
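As a rough sketch of the expanding window procedure, the generator below yields, for each test period, the list of all earlier periods that may be used for training. The period labels (`"2015_1"` for the first half of 2015, and so on) and the function name are assumptions made for illustration; the actual dataset may label periods differently.

```python
def expanding_window_splits(periods, first_test_index):
    """Yield (train_periods, test_period) pairs for an expanding window test.

    periods: iterable of sortable period labels, e.g. "2015_1" (assumed format).
    first_test_index: position of the first period to predict.
    """
    ordered = sorted(periods)
    for i in range(first_test_index, len(ordered)):
        # Train on everything strictly before the period being predicted.
        yield ordered[:i], ordered[i]


periods = ["2015_1", "2015_2", "2016_1", "2016_2"]
splits = list(expanding_window_splits(periods, first_test_index=2))
for train_periods, test_period in splits:
    print(f"predict {test_period} using {train_periods}")
```

Each iteration would refit the model on `train_periods` and score its ranking for `test_period`, so no information from the predicted period or later ever leaks into training.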
Spearman correlation: This metric will describe the overall relationship between the actual rankings and the predicted rankings from the model. Higher values indicate better performance.
Normalized Discounted Cumulative Gain of Top 20%: In reality, analysts and portfolio managers are not concerned with the entire distribution of stocks. They will instead focus on identifying and buying the best-ranking stocks. Normalized Discounted Cumulative Gain (NDCG) is a metric from the information retrieval domain that considers the relevance and confidence (rank position) to describe a model’s rank quality.
Spearman correlation describes how well a model is ranking the stocks at a given time period. It is calculated as

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the difference between the predicted and actual ranking of stock $i$, and $n$ is the number of stocks ranked in that time period.
Spearman correlation has a range from -1 to 1. Models that rank stocks more accurately will produce higher Spearman correlation values. Correlation values will be averaged across all time periods.
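The per-period calculation and the averaging across periods can be sketched as follows. This is a minimal illustration using the standard rank-difference form of Spearman correlation, which assumes there are no tied ranks; it is not the official evaluation script, and the function names are hypothetical.

```python
def spearman(pred_ranks, actual_ranks):
    """Spearman correlation from two rank lists without ties."""
    n = len(pred_ranks)
    d_squared = sum((p - a) ** 2 for p, a in zip(pred_ranks, actual_ranks))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def mean_spearman(per_period):
    """Average the correlation over (pred_ranks, actual_ranks) pairs."""
    scores = [spearman(pred, actual) for pred, actual in per_period]
    return sum(scores) / len(scores)


perfect = spearman([1, 2, 3, 4], [1, 2, 3, 4])    # identical rankings
reversed_ = spearman([1, 2, 3, 4], [4, 3, 2, 1])  # fully inverted rankings
partial = spearman([1, 2, 3], [1, 3, 2])          # one adjacent swap
average = mean_spearman([([1, 2, 3, 4], [1, 2, 3, 4]),
                         ([1, 2, 3, 4], [4, 3, 2, 1])])
```

Identical rankings give 1, fully inverted rankings give -1, and averaging those two periods gives 0.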
Normalized Discounted Cumulative Gain of Top 20%
Normalized Discounted Cumulative Gain (NDCG) is the ratio between the Discounted Cumulative Gain (DCG) and the Ideal Discounted Cumulative Gain (IDCG):

$$\mathrm{NDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}}$$

In the DCG, the relevance of each ranked stock is its normalized future 6-month return (Norm_Ret_F6M in the dataset). DCG discounts each stock's contribution by its predicted rank position, so stocks with better (lower) predicted ranks have more influence on the ranking quality than stocks with worse (higher) predicted ranks. IDCG is the maximum possible DCG, which gives the NDCG score an upper bound of 1. The NDCG will be calculated for each individual 6-month period and then averaged across all periods.
Note that the NDCG is calculated using only the top 20% of a model’s predicted rankings. Therefore, NDCG rewards correctly identifying stocks in the top 20% and ranking them in the correct order. This aligns with the viewpoint of a ‘long-only’ portfolio manager who will focus on buying the best stocks and ignore stocks outside the top 20%.
A more detailed description of NDCG can be found here. This challenge uses a modified formulation of DCG that is tailored to investment ranking.
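The challenge's modified DCG is not reproduced in this description, so the sketch below substitutes the standard logarithmic discount (relevance divided by $\log_2(\text{position} + 1)$) as a stand-in. It illustrates the top-20% restriction and the DCG/IDCG normalization only. The function names, the handling of the cutoff, and the assumption that normalized returns are non-negative are all illustrative choices, not the official evaluation.

```python
import math


def dcg(relevances):
    """Standard log2-discounted DCG (a stand-in for the challenge's variant)."""
    return sum(rel / math.log2(pos + 1)
               for pos, rel in enumerate(relevances, start=1))


def ndcg_top_fraction(pred_scores, norm_returns, fraction=0.2):
    """NDCG over the top `fraction` of stocks by predicted score.

    pred_scores: higher score means a better predicted rank (assumption).
    norm_returns: normalized forward returns, assumed non-negative.
    """
    n = len(pred_scores)
    k = max(1, int(n * fraction))
    order = sorted(range(n), key=lambda i: pred_scores[i], reverse=True)
    predicted_top = [norm_returns[i] for i in order[:k]]
    ideal_top = sorted(norm_returns, reverse=True)[:k]
    ideal = dcg(ideal_top)
    return dcg(predicted_top) / ideal if ideal > 0 else 0.0


returns = [1.0, 0.0, 0.5, 0.25, 0.75, 0.1, 0.2, 0.3, 0.4, 0.6]
good = ndcg_top_fraction(returns, returns)               # scores match returns
bad = ndcg_top_fraction([-r for r in returns], returns)  # ranking inverted
```

A model whose scores order the stocks exactly by realized return scores 1, while the inverted ranking places the weakest stocks in its top 20% and scores far lower.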
Update: At challenge launch, the evaluation script calculated NDCG incorrectly. This was fixed on 03/28. Solutions submitted before that date received incorrect NDCG results.
Testing your solution
Throughout the competition, teams will be given the opportunity to evaluate their models on the test dataset. Teams can sign in and upload their predictions up to 5 times per day. This will provide an estimate for out-of-sample performance during the competition. Teams should rely on internal model validation procedures and be careful not to optimize results to this one small section of the test dataset.
March 26 : Challenge launch and start of Round 1 - contestants create models and upload predictions to crowdAI.
April 30 : Deadline for Round 1. All solutions must be submitted by 11:59 GMT; top solutions from the leaderboard are invited to Round 2.
May 1 : Start of Round 2 - contestants explain their methods, results, and conclusions in a short paper. Contestants also package the code of their submitted solution using Docker for testing and evaluation.
May 16 : Deadline for Round 2. All solutions must be submitted by 11:59 GMT.
May 21 : Top 6 solutions selected; Winners provided travel stipend (maximum USD 1000) and invitation to present at the IEEE Data Science Workshop in Lausanne, Switzerland June 4 - June 6.
Top-6 participants on the leaderboard (except the organizers) will be invited (maximum USD 1000 travel stipend, provided by Principal Financial Group) to present at the IEEE Data Science Workshop in Lausanne, June 4-6, 2018.
A starter kit has been prepared which explains how to get access to the dataset, parse it, train a simple random forest based method, and make a submission. It can be accessed at : https://github.com/crowdAI/ieee_investment_ranking_challenge-starter-kit
- Gitter Channel : crowdAI/ieee-investment-ranking-challenge
- Technical issues : https://github.com/crowdAI/ieee_investment_ranking_challenge-starter-kit/issues
- Discussion Forum : https://www.crowdai.org/challenges/ieee-investment-ranking-challenge/topics
We strongly encourage you to use the public channels mentioned above for communications between the participants and the organisers. In extreme cases, if there are any queries or comments that you would like to make using a private communication channel, then you can send us an email at :