Over the past several months I’ve spent a great deal of my time learning about a topic known as support vector machines (SVM). It is one of those topics that only a math or computer science person would ever care to study. That’s because SVM is a specific type of machine learning algorithm. One of its main uses is to classify data points into categories. Given a set of attributes, the algorithm makes its best guess as to which category a specific data point fits into. From a math standpoint this is an optimization problem: the algorithm looks for the hyperplane in a multidimensional space that best separates the data points of one category from those of another, meaning the one with the widest margin between itself and the nearest points on each side. Consider the simple example below from Wikipedia:
The figure above demonstrates a very simple 2D example of what SVM aims to accomplish. It separates black points from white points based on two characteristics of each point, x1 and x2. In more complicated examples the number of characteristics can be many more than two. This makes SVM useful for letting our technology learn from data and make decisions. Think about all that technology can do nowadays. Algorithms like SVM allow our machines to accomplish nontrivial tasks that once might have seemed impossible.
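To make the 2D picture concrete, here is a minimal Python sketch of how a point gets classified once a separating line is known. The weights `w`, the offset `b`, and the sample points are hand-picked values for illustration only; they are not the output of a trained SVM.

```python
def classify(point, w, b):
    """Return +1 or -1 depending on which side of the line w.x + b = 0 the point falls."""
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return 1 if score >= 0 else -1


# Hypothetical separating line x1 + x2 - 5 = 0 (values chosen by hand).
w = [1.0, 1.0]
b = -5.0

print(classify([4.0, 3.0], w, b))  # point on the positive side of the line
print(classify([1.0, 2.0], w, b))  # point on the negative side of the line
```

Training an SVM amounts to choosing `w` and `b` so that this line sits as far as possible from the nearest points of each category.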
My use of SVM is in the realm of finance, more specifically the stock market. My goal with this study has been to construct a model I could look to for guidance on whether or not the stock market is going to rise. A simple yes or no. I was not aiming to predict specific dollar amounts. The problem with predicting the stock market is that it moves in a stochastic (random) fashion in the short run. Processes that are truly random cannot be predicted. That’s why we use the word random. However, literature on this subject has conjectured that the stock market may actually be some sort of nonlinear process. So even though the day-to-day fluctuations may seem random to us, there may actually be a highly nonlinear process at work. This is where SVM may be of use. Support vector machines can utilize what is known as the kernel trick. This implicitly maps the inputs of the model into a higher-dimensional space, where a flat separating hyperplane can correspond to a curved boundary in the original inputs. Thus, we can attempt to model a nonlinear process.
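A small Python sketch may help show what the kernel trick buys us. It uses a degree-2 polynomial kernel, chosen only because its higher-dimensional map is easy to write out by hand; the function names `phi` and `poly2_kernel` are made up for this illustration. The point is that the kernel gives the same similarity score as mapping both inputs into 3D and taking a dot product there, without ever building the 3D vectors.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x):
    """Explicit map from 2D to 3D that corresponds to the kernel (x . y)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, y):
    """Same similarity computed directly in 2D, without building phi."""
    return dot(x, y) ** 2


x = (1.0, 2.0)
y = (3.0, -1.0)

explicit = dot(phi(x), phi(y))  # map to 3D first, then take the dot product
implicit = poly2_kernel(x, y)   # stay in 2D and let the kernel do the work
print(explicit, implicit)       # the two agree up to floating-point rounding
```

Other kernels work the same way in principle, except that the space they correspond to can be far too large to write out explicitly, which is why the shortcut matters.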
I’m not going to go into deep detail about my method because it may get boring and because this is the internet and I don’t want my ideas stolen. However, I will share the basics of how I set it up. First, the model used weekly data and made a prediction of whether the stock market would be higher or lower in 8 weeks’ time. The inputs to the model were 5 common stock market technical indicators: RSI, MFI, SO, MACD, and PMO. I calculated each of these for 5 major stock indexes: Nasdaq Composite Index, S&P 100, S&P 500, Dow Jones Industrial Average, and Russell 2000. Next, I combined the predictions from 5 separate SVM models, one for each of the stock indexes, and within each model I treated weekly data as if it were monthly data.* When I combined the predictions from the 5 models, I implemented a specific rule-based logic that decided whether or not the overall combined model would put forth a prediction. In other words, if the combined model wasn’t sure enough of its prediction one week, it would decline to predict. Better to not participate than be wrong, in my opinion.
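The actual combination rule is deliberately not disclosed here, so the following Python sketch is only a stand-in to show the general shape of "predict or decline". It assumes a simple agreement threshold (`min_agree`, a hypothetical parameter) across the 5 per-index models; the real rule may look nothing like this.

```python
def combine(predictions, min_agree=4):
    """Combine per-index predictions (+1 = higher, -1 = lower).

    Returns +1 or -1 when at least `min_agree` models agree, otherwise None,
    meaning the combined model declines to predict that week.
    NOTE: this agreement threshold is a hypothetical rule, not the author's.
    """
    ups = sum(1 for p in predictions if p == 1)
    downs = sum(1 for p in predictions if p == -1)
    if ups >= min_agree:
        return 1
    if downs >= min_agree:
        return -1
    return None


print(combine([1, 1, 1, 1, -1]))   # strong agreement, so a prediction is made
print(combine([1, 1, -1, -1, 1]))  # models are split, so no prediction
```

Raising `min_agree` trades coverage for confidence: fewer weeks get a prediction, but the ones that do have more models behind them.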
Once I was finished constructing the model it was necessary to optimize 10 different parameters. Each of the individual SVM component models used a specific kernel known as the radial basis function kernel. Normally each of these models has three parameters. However, I elected to hold one of those constant and optimized the remaining two, which across the 5 models makes 10 parameters in total. This portion of the research took the longest because optimizing 10 different parameters is not an easy task. Brute forcing it would take forever. However, I did some reading and found a way around the problem. I do admit, though, that the model I have today still may not be completely optimized, but it certainly seems to be working well enough.
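For readers who have not met it, the radial basis function kernel scores two input vectors by how close they are: identical inputs score 1, and the score decays toward 0 as they move apart. Below is a minimal Python sketch. `gamma` is the conventional name for the parameter that controls how fast the score decays; the values used are arbitrary and are not the tuned values from this study.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * squared distance between x and y)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


a = [0.0, 0.0]
b = [1.0, 1.0]

print(rbf_kernel(a, a, gamma=0.5))  # identical points give the maximum score
print(rbf_kernel(a, b, gamma=0.5))  # nearby points give a moderate score
print(rbf_kernel(a, b, gamma=5.0))  # a larger gamma makes the score fall off faster
```

This is part of why tuning is slow: every candidate value changes how "local" the model is, and each change means retraining and re-evaluating.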
Now I will move along to the fun part: the results. The investigative period was from April 1, 2001 to April 1, 2016, so 15 years of data. Of all the weeks that were examined, the model made a prediction about 57% of the time. Of those predictions, 82% proved to be correct (measured against the S&P 500). Also, about 91% of the predictions called for a rise in the stock market (remember, that is 91% of the 57% of weeks with a prediction). So it would appear that the model does a really good job of finding the moments in time when the market is poised to run higher. These results held up for the other stock indexes as well: the model predicted the direction of each of the other 4 indexes with greater than 80% accuracy. However, this isn’t very surprising because the stock indexes are all highly correlated.
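Because these percentages are nested, it can help to restate them as shares of all weeks examined. The short Python calculation below uses only the three figures quoted above; the variable names are just labels for this sketch.

```python
predicted_share = 0.57  # share of all weeks in which the model made a prediction
accuracy = 0.82         # share of those predictions that were correct (S&P 500)
rise_share = 0.91       # share of those predictions that called for a rise

rise_calls_all_weeks = predicted_share * rise_share  # weeks with a "rise" call
correct_all_weeks = predicted_share * accuracy       # weeks with a correct call
no_prediction = 1 - predicted_share                  # weeks the model sat out

print(f"rise predicted:  {rise_calls_all_weeks:.1%} of all weeks")
print(f"correct call:    {correct_all_weeks:.1%} of all weeks")
print(f"no prediction:   {no_prediction:.1%} of all weeks")
```

So a rise was predicted in roughly 52% of all weeks, a correct call was made in roughly 47% of all weeks, and the model sat out the remaining 43% or so.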
In my opinion this model has produced solid results, but I will take these conclusions with a grain of salt until I run some more tests. I’ve reread the Java code I wrote to perform the tests many times and haven’t found any mistakes. But that doesn’t mean there aren’t any. Also, models are subject to over-fitting problems. Support vector machines do a good job of avoiding this for the most part, though. Training of the model involved k-fold cross-validation. This is just a fancy way of saying the data is broken up into segments and each segment is predicted by a model trained on the others. In my case, I used 7-fold validation. So the data I used was broken into 7 separate components. Then, 6 of the 7 components would be used to train a model that predicted the 7th component. This process was repeated 7 times until all data had been predicted. This helps to ensure predictions are made out-of-sample. This is important because if someone designs a model and then uses that model to predict the data that was used to design it, the logic is circular and an over-fitting problem may be present. As I continue to work with this model, I hope to further ensure that an over-fitting problem does not exist. Also, I am going to start tracking the model’s predictions in real time to see if it works.
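The 7-fold procedure described above can be sketched in a few lines of Python. This is not the Java code used for the study; `k_fold_indices` is a hypothetical helper, and it assumes the folds are contiguous blocks of rows, since the post does not say whether the data was shuffled first.

```python
def k_fold_indices(n_samples, k=7):
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Toy example: 21 weekly rows split into 7 folds of 3 rows each.
folds = list(k_fold_indices(21, k=7))
for train_idx, test_idx in folds:
    # A real run would fit the SVM on train_idx and predict the rows in test_idx.
    print(len(train_idx), test_idx)
```

Each row lands in exactly one test fold, so every prediction comes from a model that never saw that row during training.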
*The prior statements are vague on purpose because I think they are the key to my success. Of the vast array of papers I read on this subject, I never saw anyone approach this problem using the prior two steps. However, I cannot claim to have read everything, so maybe someone else has tried a similar approach before.