Fascinating study. It gave me some ideas I'd like to apply to commodities as well. Your study involved lots of calculations. Can you describe some of the tech stack elements you use to process the data effectively? Do you do any database-side processing, for example in DuckDB?
Sorry for the late reply, working like crazy these days... I tend to not overcomplicate my tech stack, keeping things as simple as possible.
I mostly code in Python, and have developed programs to:
- Backtest my ideas (an event-based system I've been developing myself since my master's degree many many years ago)
- Forward test (I use IBKR and their native Python API)
- Live trade (same as above)
I mostly use Norgate data, which has a great integration with Python.
You mentioned DB... I prefer using Postgres (never used DuckDB).
In Python, the most important libraries to me are NumPy, pandas, SciPy, TA-Lib, Celery, and Matplotlib, on top of the standard library (I used threads/concurrency a lot).
It's a great article introducing how we can reconcile financial domain knowledge with statistical thinking. Thank you for your great work!
Such a good read. Can you suggest any framework for running backtests like yours?
I'd suggest a combination of a commercial backtester (like RealTest) and either something you develop yourself or an open-source option (like Zipline).
Using two helps me verify that my code is working as intended.
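One way to run that kind of cross-check is to export the daily equity curve from each backtester and look for the first date where they disagree. The sketch below is only an illustration of the idea, not the author's setup: the function name `first_divergence`, the tolerance, and the equity numbers are all made up for the example.

```python
from math import isclose


def first_divergence(curve_a, curve_b, rel_tol=1e-4):
    """Return (date, value_a, value_b) for the first mismatch, or None.

    Both curves are dicts mapping an ISO date string to account equity.
    A date missing from one curve also counts as a mismatch.
    """
    for date in sorted(set(curve_a) | set(curve_b)):
        a = curve_a.get(date)
        b = curve_b.get(date)
        if a is None or b is None or not isclose(a, b, rel_tol=rel_tol):
            return date, a, b
    return None


# Made-up equity curves standing in for the output of two backtesters.
own_backtester = {
    "2024-01-02": 100000.0,
    "2024-01-03": 100450.0,
    "2024-01-04": 100120.0,
}
reference_backtester = {
    "2024-01-02": 100000.0,
    "2024-01-03": 100450.0,
    "2024-01-04": 100310.0,
}

mismatch = first_divergence(own_backtester, reference_backtester)
print(mismatch)
```

The first mismatching date is usually the place to start digging: differences often come from fills, commissions, or how each tool handles corporate actions.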
Thanks a lot, man.
Can you suggest any good resources for those interested in ML but with no experience?
Assuming you have a solid base in math (probability, statistics, linear algebra, and a bit of calculus), you could read any of the books I mentioned. Also, you could practice in Kaggle (if you have a bit more free time).
If you are serious about it, I'd strongly recommend a Master's in Computer Science (which is what I did many years ago :))
Cheers!
Very interesting article. In the last version you say that you use a time limit of 6 days, but the summary says that the max trade duration is 14 days. Am I missing something? Thanks for sharing your work.
Thanks! Several small details can affect execution and prevent us from getting out of the trade when the time limit triggers the sell order, so sometimes a few orders end up taking a bit longer to get executed.
Great article. I too use ML to trade, and your article gave me a few ideas that I can try out in my own strategies. Appreciated.
Thanks Martyn! That was exactly my intention! I'm glad it helped! Cheers!
I’m really enjoying your material. This was above my head for the most part but interesting to read nonetheless.
Thanks Cory!
Nice! This looks incredible. I have built something similar, but it’s based only on fundamental data, and I got a comparable result (around 2570% in the last 10 years). I am really curious how that QPI tool is built. I have thought about something like (some moving avg - std)/volatility, but it wasn’t looking that great.
Very innovative indeed. I have a technical question: how do you estimate the probability of bouncing back? The ML model only outputs 0 and 1.
Thanks! And no, that’s wrong: the ML model outputs a float between 0 and 1. Cheers!
Yeah, I see. I should use predict_proba() on the xgb_clf, which outputs the probability for the binary classification.
Awesome read! What were the evaluation scores of the XGBoost model? Also, was it one model or one model per ticker?
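For anyone else who hit the same confusion: in scikit-learn-style classifiers, including xgboost's XGBClassifier, predict() returns the hard 0/1 label, while predict_proba() returns one probability per class. The sketch below shows the relationship with a tiny hand-written logistic model so it runs without any dependencies. `TinyLogisticClassifier`, its weights, and the input rows are hypothetical stand-ins, not the article's model.

```python
import math


class TinyLogisticClassifier:
    """Hypothetical stand-in for a fitted binary classifier (not XGBoost)."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict_proba(self, rows):
        """Return [P(class 0), P(class 1)] for each input row."""
        probabilities = []
        for row in rows:
            score = self.bias + sum(w * x for w, x in zip(self.weights, row))
            p1 = 1.0 / (1.0 + math.exp(-score))  # logistic function
            probabilities.append([1.0 - p1, p1])
        return probabilities

    def predict(self, rows, threshold=0.5):
        """Return the hard 0/1 label by thresholding P(class 1)."""
        return [int(p[1] >= threshold) for p in self.predict_proba(rows)]


# Made-up parameters and feature rows, for illustration only.
clf = TinyLogisticClassifier(weights=[1.5, -0.8], bias=-0.2)
rows = [[2.0, 0.5], [0.1, 1.0], [0.4, 0.3]]

labels = clf.predict(rows)                             # hard labels
bounce_prob = [p[1] for p in clf.predict_proba(rows)]  # P(class 1) per row

print(labels)
print([round(p, 3) for p in bounce_prob])
```

With a fitted XGBClassifier the equivalent would be something like xgb_clf.predict_proba(X)[:, 1], where X is your own feature matrix; the second column is the probability of the positive class.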