Very nice strategy and clear analysis.
With the Russell 3000 you may find that many stocks are not shortable. A larger-cap universe (the Russell 1000, for example) would make it more likely that the shorts are possible. IB provides stock margin and short borrow data on their FTP at these links:
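```python
import pandas as pd

# Stock margin details from IB's FTP (pipe-delimited, first row skipped)
stock_margin = pd.read_csv('ftp://shortstock:%20@ftp3.interactivebrokers.com/stockmargin_final_dtls.IBLLC-US.dat', delimiter='|', skiprows=1)
# Short borrow data for US stocks (same layout conventions)
short_borrow = pd.read_csv('ftp://shortstock:%20@ftp3.interactivebrokers.com/usa.txt', delimiter='|', skiprows=1)
```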
Maybe you can try balancing the longs and shorts within the same industry or sector clusters, e.g. 2 longs and 2 shorts in oil & gas, 2 of each in tech, etc. Clusters can be identified with PCA & DBSCAN or other methods. In my rough initial research this increases the Sharpe ratio.
I'm researching a stat arb strategy also and your post gave me some ideas. Thanks for that. All the best!
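A minimal sketch of the sector-balancing idea above, assuming each stock already has a predicted return and a sector or cluster label. The labels are taken as given here (the PCA & DBSCAN step is not shown), and the function name, tickers, and scores are made up for illustration.

```python
from collections import defaultdict


def pick_sector_balanced(predictions, sectors, per_sector=2):
    """Pick the top and bottom `per_sector` names inside each sector.

    predictions: dict ticker -> predicted return (hypothetical model output)
    sectors:     dict ticker -> sector or cluster label
    """
    by_sector = defaultdict(list)
    for ticker, score in predictions.items():
        by_sector[sectors[ticker]].append((score, ticker))

    longs, shorts = [], []
    for members in by_sector.values():
        # Skip groups too small to fill both legs without overlap.
        if len(members) < 2 * per_sector:
            continue
        ranked = sorted(members)  # ascending by predicted return
        shorts += [ticker for _, ticker in ranked[:per_sector]]
        longs += [ticker for _, ticker in ranked[-per_sector:]]
    return longs, shorts


# Made-up scores and labels, purely for illustration.
predictions = {"XOM": 0.02, "CVX": -0.01, "COP": 0.01, "OXY": -0.03,
               "AAPL": 0.03, "MSFT": 0.00, "NVDA": 0.05, "INTC": -0.02}
sectors = {"XOM": "energy", "CVX": "energy", "COP": "energy", "OXY": "energy",
           "AAPL": "tech", "MSFT": "tech", "NVDA": "tech", "INTC": "tech"}

longs, shorts = pick_sector_balanced(predictions, sectors, per_sector=2)
print("longs:", longs)
print("shorts:", shorts)
```

Each sector contributes the same number of longs and shorts, so the book stays roughly neutral within every group rather than only in aggregate.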
Thanks, my friend! Great points! Looking forward to reading your study! Cheers
I like the idea of finding the clusters at the sub-industry level; this would make it more like a pair-trading strategy.
Interesting article! I am interested in the return prediction model. My question is: did you build a single prediction model, or a separate model for every stock? In the "The edge" section I read that the prediction model is built on every stock. Just want to confirm it :)
Hi, thanks! It's a single model for all stocks
Great article. A couple of questions:
1. Why break the strategy into three buckets with different start days? In the fullness of time, one bucket that starts every three trading days would rotate across each day of the week, so if the win rate stays constant I don't see the value of three tranches with different starts.
2. Do you factor in margin costs with strategies that include a short leg?
Thanks again for all of your content!
Thanks!
1. Because if you don't, the results can differ depending on which day you start (e.g., starting on a Monday gives a 28% annual return; on a Tuesday, 24%; on a Wednesday, 30%). The results obtained by rotating the 3 buckets are more robust and more indicative of what we could expect in live trading.
2. No, I do not have access to historical margin costs... do you have access to this kind of data? If yes, can you point me to where I can get it?
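To make the point about start days concrete, here is a rough sketch with made-up numbers. It assumes a 3-day holding period and equal capital per bucket; `holding_returns` and the function name are hypothetical, not taken from the article's code.

```python
def tranche_total_returns(holding_returns, n_tranches=3):
    """Compound the returns each staggered tranche would have earned.

    holding_returns[i] is the return of a portfolio formed on trading day i
    and held for `n_tranches` days; tranche k rebalances on days k, k+n, ...
    """
    totals = []
    for k in range(n_tranches):
        growth = 1.0
        for r in holding_returns[k::n_tranches]:
            growth *= 1.0 + r
        totals.append(growth - 1.0)
    return totals


# Made-up holding-period returns, one per possible start day.
holding_returns = [0.010, -0.004, 0.006, 0.002, 0.008, -0.003, 0.005, 0.001, 0.004]

per_tranche = tranche_total_returns(holding_returns)
# With capital split equally, the blended result is the mean of the tranches.
blended = sum(per_tranche) / len(per_tranche)

for k, total in enumerate(per_tranche):
    print(f"start offset {k}: {total:.4%}")
print(f"blended: {blended:.4%}")
```

A single bucket reports only one of the three numbers, depending on the day it happened to start; the blended figure does not depend on that choice.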
Thanks again!
I don’t have a way to estimate historical margin costs. I just didn’t know if you baked those costs into your backtest with some kind of assumption. Thanks again!
Great post! Are commissions included?
Yes, 10 bps on every trade to account for trading costs
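As a rough illustration of what a 10 bps charge does to a single position, assuming the cost is applied to traded notional on both entry and exit (the function name and the example return are hypothetical):

```python
COST_PER_TRADE = 0.0010  # 10 bps, the figure quoted above


def net_return(gross_return, turnover, cost=COST_PER_TRADE):
    """Subtract proportional trading costs from a period's gross return.

    turnover is the traded notional as a fraction of the position,
    e.g. 2.0 for one entry plus one exit (assumed cost model).
    """
    return gross_return - cost * turnover


# A position that earns 0.50% gross and is opened and closed once.
print(f"{net_return(0.005, 2.0):.2%}")  # 0.30%
```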
Very interesting post!
If you want to increase the risk-adjusted return of the strategy, one idea is to use the Sharpe ratio as the target variable instead of the returns (for example, a 1-week Sharpe ratio computed using a winsorized volatility).
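One possible reading of this suggestion, sketched under explicit assumptions: the target is the next-5-day return divided by a trailing daily volatility scaled to 5 days, and the volatilities are winsorized across stocks so that a near-zero volatility cannot blow up the ratio. The function names, the count-based winsorization, and the data are all hypothetical.

```python
import statistics


def winsorize(values, k=1):
    """Clip the k smallest and k largest values to their nearest neighbours."""
    ordered = sorted(values)
    lo, hi = ordered[k], ordered[-k - 1]
    return [min(max(v, lo), hi) for v in values]


def sharpe_targets(forward_returns, trailing_daily_returns, k=1):
    """Forward 1-week return divided by winsorized trailing volatility.

    forward_returns:        dict ticker -> next-5-day return (the label)
    trailing_daily_returns: dict ticker -> list of past daily returns
    """
    tickers = list(forward_returns)
    vols = [statistics.stdev(trailing_daily_returns[t]) for t in tickers]
    # Winsorize across stocks, then scale daily volatility to a 5-day horizon.
    weekly_vols = [v * 5 ** 0.5 for v in winsorize(vols, k)]
    return {t: forward_returns[t] / v for t, v in zip(tickers, weekly_vols)}


# Made-up data: every stock has the same forward return but different volatility.
trailing = {
    "A": [0.0001, -0.0001, 0.0001, -0.0001],  # almost no volatility
    "B": [0.01, -0.01, 0.01, -0.01],
    "C": [0.02, -0.02, 0.02, -0.02],
    "D": [0.03, -0.03, 0.03, -0.03],
    "E": [0.20, -0.20, 0.20, -0.20],          # extreme volatility
}
forward = {t: 0.02 for t in trailing}

targets = sharpe_targets(forward, trailing)
for ticker, value in targets.items():
    print(ticker, round(value, 3))
```

Without the winsorization, stock A would get a target roughly 100 times larger than B's purely because of its tiny denominator.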
Really interesting article
I’ve a question about the retraining of the model. You’ve said that you trained the model on a 10-year window with a sliding-window approach. I assume that, in order to trade the strategy daily, you’re retraining the model each day to be as accurate as possible in predicting future returns.
My question is: after each retraining, given the new price information the model comes into possession of, how do you make sure that the stocks you invested in on previous days are not changed? For example, if on day x the model told you to go long AAPL and short MSFT, and AAPL then does -10% and MSFT +10%, how can you be sure that at the time of retraining on day x+1 the model cannot tell you that the advice for day x was long MSFT and short AAPL?
I hope the question is clear. If anyone wants to give me an answer, I can't wait to hear it, since I'm new to this quant world.
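Not the author's answer, but a common way to avoid this problem in a backtest is a walk-forward loop: each day's decision uses only data available before that day and is written to a log that is never revised. The sketch below assumes that setup; the stand-in "model" (an average of past returns) and the numbers are made up.

```python
def walk_forward(returns_by_day, train_window=3):
    """Fit on the trailing window only, then log a decision for the next day.

    returns_by_day: list of dicts, ticker -> realised return on that day.
    The decision for day d uses days d-train_window .. d-1 and nothing later.
    """
    decisions = []  # append-only: earlier entries are never rewritten
    for day in range(train_window, len(returns_by_day) + 1):
        window = returns_by_day[day - train_window:day]
        # Stand-in "model": average past return per ticker (hypothetical).
        scores = {t: sum(d[t] for d in window) / train_window for t in window[0]}
        ranked = sorted(scores, key=scores.get)
        decisions.append({"day": day, "long": ranked[-1], "short": ranked[0]})
    return decisions


history = [
    {"AAPL": 0.01, "MSFT": -0.01},
    {"AAPL": 0.02, "MSFT": 0.00},
    {"AAPL": 0.01, "MSFT": -0.02},
    {"AAPL": -0.10, "MSFT": 0.10},  # day 3: the adverse move from the question
]

early = walk_forward(history[:3])  # run before day 3's returns are known
late = walk_forward(history)       # rerun after the adverse move

print(early[0])  # decision for day 3, made on data up to day 2
print(late[0])   # same decision, unchanged by the new data
print(late[1])   # only the decision for day 4 reflects the new data
```

Retraining on day x+1 can change what the model would recommend from then on, but the recorded day-x decision stays as it was, because nothing after day x-1 ever entered it.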
Did you try a number of the different Regression Models you mentioned and, if yes, did you see a wide difference in the results? (I'm not asking for you to divulge the specific model you're using unless you're happy to share that). Thanks.
Yes, and yes. The differences obviously depend on hyperparameters
Great job. Good to see that stat arb is still working.
If this is a ranking-based strategy taking the top N_LONGS and bottom N_SHORTS, why would you ever have cash idle? I assume the answer is you're setting probability thresholds to put on positions. If that is the case, do you require the strategy to always have an equal number of Longs and Shorts?
Because the longs offset the shorts. If you start with $100K:
- When you go long N positions worth $100K, you generate a negative cashflow of $100K
- When you go short N positions worth $100K, you generate a positive cashflow of $100K
Then, excluding what your broker uses as margin collateral for the short positions (e.g., $50K), you end up with cash idle on your account (e.g., 100 - 100 + 100 - 50 = $50K).
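The arithmetic above can be written out as a small sketch. The 50% margin rate is only the example figure from this reply, not a broker quote, and the function name is hypothetical.

```python
def idle_cash(capital, long_notional, short_notional, short_margin_rate=0.5):
    """Cash left after funding longs, receiving short proceeds, and posting margin."""
    cash = capital - long_notional               # buying the longs spends cash
    cash += short_notional                       # short sales bring cash in
    cash -= short_margin_rate * short_notional   # collateral held by the broker
    return cash


print(idle_cash(100_000, 100_000, 100_000))  # 50000.0
```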
Ahhh, thank you for the explanation. So am I right to assume that with 3% position sizes and 40 positions (20 Long and 20 Short) there would be gross exposure of 120%?
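Under the usual definition of gross exposure as the sum of long and short notional over capital, the figures in the question work out as follows (a quick check, with a hypothetical function name):

```python
def exposures(n_longs, n_shorts, position_size):
    """Return (gross, net) exposure as fractions of capital."""
    long_exposure = n_longs * position_size
    short_exposure = n_shorts * position_size
    return long_exposure + short_exposure, long_exposure - short_exposure


gross, net = exposures(20, 20, 0.03)
print(f"gross {gross:.0%}, net {net:.0%}")  # gross 120%, net 0%
```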