Statistical Arbitrage

Quantitativo

Nov 10, 2024

Can we get over 20% of annual returns uncorrelated with market risk?

25 Comments

Carlos Mata

Nov 11

Very nice strategy and clear analysis.

With the Russell 3000, you may find lots of stocks that are not shortable. A larger-cap universe (the Russell 1000, for example) would make it more likely that the shorts are possible. Interactive Brokers (IB) publishes stock margin and short-borrow data on its FTP server at these links:

Python

import pandas as pd

# Stock margin requirements (pipe-delimited; the first row is skipped)
StockMargin = pd.read_csv('ftp://shortstock:%20@ftp3.interactivebrokers.com/stockmargin_final_dtls.IBLLC-US.dat', delimiter='|', skiprows=1)

# Short-borrow availability for US stocks
ShortBorrow = pd.read_csv('ftp://shortstock:%20@ftp3.interactivebrokers.com/usa.txt', delimiter='|', skiprows=1)

Maybe you can try balancing the longs and shorts within the same industry or sector clusters, e.g., 2 longs/shorts in oil & gas, 2 in tech, and so on. Clusters can be identified with PCA and DBSCAN, or other methods. In my rough initial research, this increases the Sharpe ratio.

I'm researching a stat arb strategy also and your post gave me some ideas. Thanks for that. All the best!
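
The sector-balancing idea above can be sketched roughly as follows. This is only an illustration, not the commenter's code: the tickers, cluster labels, and predicted returns are made up, the function name `pick_balanced` is hypothetical, and the clusters are assumed to be already computed (by PCA plus DBSCAN or any other method).

```python
from collections import defaultdict


def pick_balanced(predictions, clusters, per_cluster=2):
    """Pick `per_cluster` longs and shorts inside every cluster.

    predictions: dict ticker -> predicted return (hypothetical values)
    clusters:    dict ticker -> cluster label (assumed precomputed)
    """
    by_cluster = defaultdict(list)
    for ticker, pred in predictions.items():
        by_cluster[clusters[ticker]].append((pred, ticker))

    longs, shorts = [], []
    for members in by_cluster.values():
        # Skip clusters too small to fill both legs without overlap.
        if len(members) < 2 * per_cluster:
            continue
        members.sort()  # ascending by predicted return
        shorts += [t for _, t in members[:per_cluster]]
        longs += [t for _, t in members[-per_cluster:]]
    return longs, shorts


# Made-up example: two clusters of four stocks each.
predictions = {"A": 0.03, "B": -0.01, "C": 0.01, "D": -0.02,
               "E": 0.02, "F": 0.00, "G": -0.03, "H": 0.04}
clusters = {"A": "energy", "B": "energy", "C": "energy", "D": "energy",
            "E": "tech", "F": "tech", "G": "tech", "H": "tech"}
balanced_longs, balanced_shorts = pick_balanced(predictions, clusters)
```

Each cluster contributes the same number of longs and shorts, so the book stays roughly neutral within every cluster rather than only in aggregate.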

Quantitativo

Nov 11

Thanks, my friend! Great points! Looking forward to reading your study! Cheers

BruceTheTurtle

Nov 12

I like the idea of finding the clusters at the sub-industry level, this would make it more like a pair-trading strategy.

Quant Returns

Apr 24

Hey, nice strategy and great analysis. I have one question about the annual returns shown for each decile. Isn’t an annual log return of 30% for decile 1 astronomically high and unrealistic? When you convert that to an actual annual percentage return, it becomes a huge number.

I may be missing something, could you kindly clarify?

Quant Returns

Apr 24

Thanks for the quick feedback and clarity. Do the monotonic returns shown also include meme stocks and biotech? I know you said you filter these out, but I wondered whether you did that when determining the edge or later, when creating the strategy. I think it would make sense to see if there is an edge with this filter on.

Quantitativo

Apr 25

No, I removed meme stocks and biotech. The key thing about these stocks is that holding them overnight exposes us to extreme positive/negative returns.

You can try to see if there's an edge with them. But even if there is, that's a risk I wouldn't be willing to carry... cheers!

Quantitativo

Apr 24

Thanks! The returns for each decile are computed as the mean of all returns for that specific decile throughout all periods in the history. Because of that, if we look at them individually, the absolute numbers might not make sense compared to the actual equity curve (which is the return of the top decile minus the return of the bottom decile for each period, sequentially compounded).

The most important aspect of the chart, though, is to see the monotonic relationship: it proves the model has predictive power and can be used to inform a potentially profitable strategy.

Now, there is another element not covered in the article, which is another world in itself: execution. That's why I said "potentially". This model will only translate into a profitable strategy if we manage to execute it so that the costs don't kill the edge and we can actually execute the shorts (which are not always available).

Cheers!
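
The difference between a per-decile average and the compounded long-short curve described in this reply can be illustrated with a small sketch. All numbers here are invented for illustration and are not taken from the article; the last line shows the standard conversion from a log return to a simple return, exp(r) - 1.

```python
import math

# Hypothetical per-period returns for the top and bottom deciles.
top = [0.02, -0.01, 0.03, 0.01]
bottom = [-0.01, 0.00, -0.02, 0.01]

# Per-decile figure: a plain average across periods, with no compounding.
mean_top = sum(top) / len(top)
mean_bottom = sum(bottom) / len(bottom)

# Equity curve: the top-minus-bottom spread, compounded period by period.
equity = 1.0
for t, b in zip(top, bottom):
    equity *= 1.0 + (t - b)

# Converting an annual log return r to a simple return: exp(r) - 1.
simple_from_log = math.exp(0.30) - 1.0
```

For reference, a 30% annual log return corresponds to a simple return of roughly 35%.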

Jack Tang

Nov 19

Interesting article! I am interested in the return prediction model. My question is: did you build a single prediction model, or a separate model for every stock? In "The edge" section, I read that the prediction model is built on every stock. Just want to confirm :)

Quantitativo

Nov 19

Hi, thanks! It's a single model for all stocks

Chris Lehman

Nov 13

Great article. A couple of questions:

1. Why break the strategy into three buckets with different start days? In the fullness of time, one bucket that starts every three trading days would rotate across each day of the week, so if the win rate stays constant, I don't see the value of three tranches with different starts.

2. Do you factor in margin costs with strategies that include a short leg?

Thanks again for all of your content!

Quantitativo

Nov 13

Thanks!

1. Because if you don't, the results can differ depending on the day you start (e.g., starting on a Monday gives a 28% annual return; on a Tuesday, 24%; on a Wednesday, 30%). The results obtained by rotating the 3 buckets are more robust and more indicative of what we could expect in live trading.

2. No, I do not have access to historical margin costs... do you have access to this kind of data? If so, can you point me to where I can get it?

Thanks again!
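
The staggered buckets from point 1 can be sketched as below. This is an assumption-laden illustration: the 3-day holding period is inferred from the "every three trading days" wording in the question, the function name `rebalance_days` is hypothetical, and the per-start-day returns are the illustrative figures from the reply, not backtest output.

```python
HOLD_DAYS = 3  # assumed holding period, one bucket per possible start day


def rebalance_days(offset, n_days, hold=HOLD_DAYS):
    """Trading-day indices on which the bucket starting at `offset` rebalances."""
    return list(range(offset, n_days, hold))


# Over 9 trading days, each bucket rebalances on its own days.
schedule = {k: rebalance_days(k, n_days=9) for k in range(HOLD_DAYS)}

# Illustrative annual returns per start day, as quoted in the reply.
bucket_returns = [0.28, 0.24, 0.30]

# With equal starting capital per bucket, the blended annual return
# is the simple average of the three.
blended = sum(bucket_returns) / len(bucket_returns)
```

Every trading day belongs to exactly one bucket's schedule, so the blended result does not depend on which weekday the backtest happens to start.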

Chris Lehman

Nov 13

I don’t have a way to estimate historical margin costs. I just didn’t know if you baked those costs into your backtest with some kind of assumption. Thanks again!

Ricardo

Nov 12

Great post! Are commissions included?

Quantitativo

Nov 13

Yes, 10 bps in every trade to account for trading costs
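
One way a flat 10 bps charge could be applied in a backtest is sketched below. The reply does not say exactly how the cost is applied, so charging it on the total traded notional is an assumption, and `net_return` and the example figures are hypothetical.

```python
COST_BPS = 10  # per-trade cost quoted in the reply


def net_return(gross_return, traded_notional, capital, cost_bps=COST_BPS):
    """Subtract proportional trading costs from a period's gross return.

    traded_notional: total dollar value bought plus sold in the period.
    """
    cost = traded_notional * cost_bps / 10_000
    return gross_return - cost / capital


# Hypothetical period: 1% gross on $100K of capital, with $200K traded.
net_after_costs = net_return(0.01, traded_notional=200_000, capital=100_000)
```

In this made-up period, costs of $200 reduce the 1% gross return to 0.8%.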

Luca Micciche

Nov 11

Very interesting post!

If you want to increase the risk-adjusted return of the strategy, one idea is to use the Sharpe ratio as the target variable instead of the returns (for example, a 1-week Sharpe ratio computed using a winsorized volatility).

Raekon

Nov 27

I've been doing ML approaches for quite some time, and to be fair, your results look a bit too good to be true, so make sure you double- and triple-check the implementation details. For example, standardizing on the whole dataset rather than using only the data available up to the row currently being backtested (as you would in a real-life scenario) can very easily introduce look-ahead bias and invalidate your strategy. This can also happen with certain types of cross-validation and with the way the model is trained. Posting the code that led to these results would be a good way to have it peer-reviewed and to avoid potentially wasting a lot of time and money on a model that might have subtle flaws.
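
The standardization pitfall mentioned above can be made concrete with a small sketch. It says nothing about how the article's model was actually built; the feature values and function names are hypothetical, and the expanding window is just one point-in-time alternative among several (a rolling window would work similarly).

```python
from statistics import mean, pstdev


def zscore_full_sample(xs):
    """Leaky: every value is scaled with statistics that include the future."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def zscore_expanding(xs, min_periods=3):
    """Point-in-time: row i uses only xs[0..i]; earlier rows get None."""
    out = []
    for i, x in enumerate(xs):
        window = xs[: i + 1]
        if len(window) < min_periods:
            out.append(None)
            continue
        sigma = pstdev(window)
        out.append((x - mean(window)) / sigma if sigma > 0 else 0.0)
    return out


feature = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical feature values
leaky = zscore_full_sample(feature)
safe = zscore_expanding(feature)
```

In this toy series, the third row is positive under the point-in-time version but negative under the full-sample version, because the full-sample mean already "knows" about the later jump to 10.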

Chris Lehman

Nov 24

Do you train your Long and Short legs separately or as part of one big model? Is your experience that training them separately improves performance? Thanks.

Quantitativo

Nov 24

Everything is trained in a single model. I don't think it makes sense to train the legs separately... especially because, if you think about it, stat arb strategies value things in relation to other things. The value is relative. So, the more "things" you provide the model, the better comparisons it will make.

In other words... there are days when the model predicts that all stocks will go down. On those days, the system goes long the stocks it predicts will fall the least and short the stocks it predicts will fall the most. The same happens in the other direction.
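
The relative-value logic in this reply can be sketched as a plain cross-sectional ranking. The tickers and predictions are invented and `long_short` is a hypothetical helper; the article's actual selection rules may include more than this.

```python
def long_short(predictions, n=2):
    """Long the n highest predictions and short the n lowest, whatever their sign."""
    ranked = sorted(predictions, key=predictions.get)  # ascending by prediction
    return ranked[-n:], ranked[:n]


# Hypothetical day on which every predicted return is negative.
preds = {"AAA": -0.005, "BBB": -0.030, "CCC": -0.012, "DDD": -0.001, "EEE": -0.020}
day_longs, day_shorts = long_short(preds)
```

Even though every prediction is negative, the ranking still produces a long leg (the least negative names) and a short leg (the most negative names).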

Andrea Chiavazza

Nov 15

Really interesting article.

I have a question about retraining the model. You said that you trained the model on a 10-year window with a sliding-window approach. I assume that, in order to trade the strategy daily, you retrain the model each day so that the predictions of future returns are as accurate as possible.

My question is: how do you make sure that, after each retraining, the new price information the model receives does not change the positions chosen on previous days? For example, if on day x the model told you to go long AAPL and short MSFT, and AAPL then returns -10% and MSFT +10%, can you be sure that after retraining on day x+1 the model cannot tell you that the advice for day x was long MSFT and short AAPL?

I hope the question is clear. If anyone wants to give me an answer, I can't wait to hear it, since I'm new to this quant world.

Chris Lehman

Nov 15

Did you try a number of the different regression models you mentioned, and if so, did you see a wide difference in the results? (I'm not asking you to divulge the specific model you're using unless you're happy to share that.) Thanks.

Quantitativo

Nov 15

Yes, and yes. The differences obviously depend on hyperparameters

Nam Nguyen Ph.D.

Nov 13

Great job. Good to see that stat arb is still working.

Chris Lehman

Nov 13

If this is a ranking-based strategy taking the top N_LONGS and bottom N_SHORTS, why would you ever have idle cash? I assume the answer is that you're setting probability thresholds to put on positions. If that is the case, do you require the strategy to always have an equal number of longs and shorts?

Quantitativo

Nov 13

Because the longs offset the shorts. If you start with $100K:

- When you go long N positions worth $100K, you generate a negative cashflow of $100K

- When you go short N positions worth $100K, you generate a positive cashflow of $100K

Then, excluding what your broker uses as margin collateral for the short positions (e.g., $50K), you end up with cash idle on your account (e.g., 100 - 100 + 100 - 50 = $50K).
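
The cash arithmetic above can be written out as a small sketch. The `idle_cash` helper is hypothetical, and the 50% margin rate is simply the value that reproduces the $50K collateral in the example; real margin rules vary by broker and by stock.

```python
def idle_cash(starting_cash, long_notional, short_notional, margin_rate):
    """Cash left after opening both legs and posting collateral on the shorts."""
    cash = starting_cash
    cash -= long_notional                 # buying the longs consumes cash
    cash += short_notional                # short-sale proceeds are credited
    cash -= margin_rate * short_notional  # collateral held by the broker
    return cash


# Figures from the example: $100K start, $100K long, $100K short, $50K collateral.
remaining = idle_cash(100_000, 100_000, 100_000, margin_rate=0.5)
```

With these inputs the function returns $50K, matching the 100 - 100 + 100 - 50 calculation.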

Chris Lehman

Nov 13

Ahhh, thank you for the explanation. So am I right to assume that with 3% position sizes and 40 positions (20 Long and 20 Short) there would be gross exposure of 120%?
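
The exposure arithmetic in this question can be checked with a short sketch. It takes the question's own assumptions (3% per position, 20 longs, 20 shorts) at face value; the `exposures` helper is hypothetical.

```python
def exposures(position_size, n_longs, n_shorts):
    """Return (gross, net) exposure as fractions of capital."""
    long_exposure = position_size * n_longs
    short_exposure = position_size * n_shorts
    return long_exposure + short_exposure, long_exposure - short_exposure


gross_exposure, net_exposure = exposures(0.03, n_longs=20, n_shorts=20)
```

Under those assumptions, gross exposure comes to 120% of capital (60% long plus 60% short) and net exposure to 0%.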
