Stock Prediction in Python (2024)

Published in

Towards Data Science

Stocker is a Python tool for stock exploration. Once we have the required libraries installed (check out the documentation) we can start a Jupyter Notebook in the same folder as the script and import the Stocker class:

from stocker import Stocker

The class is now accessible in our session. We construct an object of the Stocker class by passing it any valid stock ticker (the text following each command is its output):

amazon = Stocker('AMZN')

AMZN Stocker Initialized. Data covers 1997-05-16 to 2018-01-18.

Just like that we have 20 years of daily Amazon stock data to explore! Stocker is built on the Quandl financial library, which gives us over 3000 stocks to use. We can make a simple plot of the stock history using the plot_stock method:

amazon.plot_stock()

Maximum Adj. Close = 1305.20 on 2018-01-12.
Minimum Adj. Close = 1.40 on 1997-05-22.
Current Adj. Close = 1293.32.

The analysis capabilities of Stocker can be used to find the overall trends and patterns within the data, but we will focus on predicting the future price. Predictions in Stocker are made using an additive model which considers a time series as a combination of an overall trend along with seasonalities on different time scales such as daily, weekly, and monthly. Stocker uses the prophet package developed by Facebook for additive modeling. Creating a model and making a prediction can be done with Stocker in a single line:

# predict days into the future
model, model_data = amazon.create_prophet_model(days=90)

Predicted Price on 2018-04-18 = $1336.98
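As an aside on the additive model described above, here is a minimal sketch in plain Python on synthetic data: a series is treated as a trend plus a repeating weekly pattern, and each part is estimated separately. This is not how Prophet fits its model (Prophet uses a piecewise trend and Fourier-series seasonality), and the names `fit_linear_trend`, `fit_weekly_seasonality`, and `predict` are made up for illustration.

```python
# Toy additive model: y(t) = trend(t) + weekly_seasonality(t).
# Synthetic data only; this is NOT Prophet's fitting procedure.

def fit_linear_trend(values):
    """Ordinary least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return slope, intercept

def fit_weekly_seasonality(values, slope, intercept, period=7):
    """Average detrended residual for each position in the weekly cycle."""
    sums = [0.0] * period
    counts = [0] * period
    for t, y in enumerate(values):
        sums[t % period] += y - (intercept + slope * t)
        counts[t % period] += 1
    return [s / c for s, c in zip(sums, counts)]

# Hypothetical series: linear trend plus a weekly pattern that sums to zero
true_weekly = [2.0, 1.0, 0.0, -1.0, -2.0, 0.5, -0.5]
series = [100.0 + 0.5 * t + true_weekly[t % 7] for t in range(7 * 52)]

slope, intercept = fit_linear_trend(series)
weekly = fit_weekly_seasonality(series, slope, intercept)

def predict(t):
    """Forecast = trend component + seasonal component."""
    return intercept + slope * t + weekly[t % 7]

print(f"slope per day: {slope:.3f}")
print(f"forecast for day {len(series)}: {predict(len(series)):.2f}")
```

The forecast for a future day is simply the sum of the two fitted components, which is the sense in which the model is additive.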

Notice that the prediction, the green line, contains a confidence interval. This represents the model’s uncertainty in the forecast. In this case, the confidence interval width is set at 80%, meaning we expect that this range will contain the actual value 80% of the time. The confidence interval grows wider further out in time because the estimate has more uncertainty as it gets further away from the data. Any time we make a prediction we must include a confidence interval. Although most people tend to want a simple answer about the future, our forecast must reflect that we live in an uncertain world!
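One way to see why an 80% interval widens with the forecast horizon is to simulate many possible future paths and take the 10th and 90th percentiles on each day. The sketch below assumes a plain random walk with a made-up daily volatility (`daily_sigma`); it illustrates the idea only and is not the procedure Prophet or Stocker uses. The starting price is the current close reported above.

```python
import random

def simulate_paths(start_price, daily_sigma, horizon, n_paths, seed=0):
    """Random-walk price paths; daily_sigma is a hypothetical volatility."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        price, path = start_price, []
        for _ in range(horizon):
            price += rng.gauss(0.0, daily_sigma)
            path.append(price)
        paths.append(path)
    return paths

def interval_at(paths, day, level=0.80):
    """Empirical central interval across all simulated paths on one day."""
    values = sorted(path[day] for path in paths)
    tail = (1.0 - level) / 2.0
    lower = values[int(tail * (len(values) - 1))]
    upper = values[int((1.0 - tail) * (len(values) - 1))]
    return lower, upper

paths = simulate_paths(start_price=1293.32, daily_sigma=15.0,
                       horizon=90, n_paths=2000)

for day in (0, 29, 89):
    lower, upper = interval_at(paths, day)
    print(f"day {day + 1:>2}: 80% interval width = {upper - lower:.1f}")
```

Because each day adds another random step, the spread between the simulated paths, and therefore the interval, keeps growing with the horizon.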

Anyone can make stock predictions: simply pick a number and that’s your estimate (I might be wrong, but I’m pretty sure this is all people on Wall Street do). For us to trust our model we need to evaluate it for accuracy. There are a number of methods in Stocker for assessing model accuracy.

To calculate accuracy, we need a test set and a training set. We need to know the answers — the actual stock price — for the test set, so we will use the past one year of historical data (2017 in our case). When training, we do not let our model see the answers to the test set, so we use three years of data previous to the testing time frame (2014–2016). The basic idea of supervised learning is the model learns the patterns and relationships in the data from the training set and then is able to correctly reproduce them for the test data.
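The split described above can be sketched as a simple date filter. This is an illustration on synthetic rows, not Stocker's internal code; the helper `split_by_date` and its `training_years` parameter are hypothetical names.

```python
from datetime import date, timedelta

def split_by_date(rows, test_start, test_end, training_years=3):
    """Return (train, test) lists of (day, price) rows.

    The training window is the `training_years` immediately before
    test_start, so the model never sees the test answers.
    """
    train_start = test_start.replace(year=test_start.year - training_years)
    train = [row for row in rows if train_start <= row[0] < test_start]
    test = [row for row in rows if test_start <= row[0] <= test_end]
    return train, test

# Synthetic daily rows standing in for real price history (2013 through 2017)
first_day = date(2013, 1, 1)
rows = [(first_day + timedelta(days=i), 100.0 + 0.1 * i)
        for i in range(5 * 365 + 1)]

train, test = split_by_date(rows, date(2017, 1, 1), date(2017, 12, 31))
print(train[0][0], "to", train[-1][0], "->", len(train), "training rows")
print(test[0][0], "to", test[-1][0], "->", len(test), "test rows")
```

Splitting on time rather than at random matters for forecasting: a random split would let the model peek at prices that come after the ones it is asked to predict.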

We need to quantify our accuracy, so using the predictions for the test set and the actual values, we calculate metrics including the average dollar error on the testing and training sets, the percentage of the time we correctly predicted the direction of a price change, and the percentage of the time the actual price fell within the predicted 80% confidence interval. All of these calculations are automatically done by Stocker with a nice visual:

amazon.evaluate_prediction()

Prediction Range: 2017-01-18 to 2018-01-18.
Predicted price on 2018-01-17 = $814.77.
Actual price on 2018-01-17 = $1295.00.
Average Absolute Error on Training Data = $18.21.
Average Absolute Error on Testing Data = $183.86.
When the model predicted an increase, the price increased 57.66% of the time.
When the model predicted a decrease, the price decreased 44.64% of the time.
The actual value was within the 80% confidence interval 20.00% of the time.
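The three kinds of metric reported above can be computed in a few lines. The sketch below uses tiny made-up price lists and hypothetical function names; it shows what each number means rather than reproducing Stocker's implementation.

```python
def mean_absolute_error(actual, predicted):
    """Average dollar error between actual and predicted prices."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def direction_accuracy(actual, predicted):
    """Return (share of predicted increases that were real increases,
               share of predicted decreases that were real decreases)."""
    up_hits = up_total = down_hits = down_total = 0
    for i in range(1, len(actual)):
        predicted_change = predicted[i] - predicted[i - 1]
        real_change = actual[i] - actual[i - 1]
        if predicted_change > 0:
            up_total += 1
            up_hits += real_change > 0
        elif predicted_change < 0:
            down_total += 1
            down_hits += real_change < 0
    up_rate = up_hits / up_total if up_total else float("nan")
    down_rate = down_hits / down_total if down_total else float("nan")
    return up_rate, down_rate

def interval_coverage(actual, lower, upper):
    """Share of actual prices that fall inside the predicted interval."""
    inside = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    return inside / len(actual)

# Made-up numbers for illustration only
actual = [100, 102, 101, 105, 107]
predicted = [100, 101, 103, 104, 103]
lower = [p - 2 for p in predicted]
upper = [p + 2 for p in predicted]

mae = mean_absolute_error(actual, predicted)
up_rate, down_rate = direction_accuracy(actual, predicted)
coverage = interval_coverage(actual, lower, upper)
print(f"MAE = ${mae:.2f}, up = {up_rate:.0%}, down = {down_rate:.0%}, "
      f"coverage = {coverage:.0%}")
```

A well-calibrated 80% interval should give a coverage close to 80%; a value far below that, such as the 20% above, means the model is overconfident.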

Those are abysmal stats! We might as well have flipped a coin. If we were using this to invest, we would probably be better off buying something sensible like lottery tickets. However, don’t give up on the model just yet. We usually expect a first model to be rather bad because we are using the default settings (called hyperparameters). If our initial attempts are not successful, we can turn these knobs to make a better model. There are a number of different settings to adjust in a Prophet model, the most important being the changepoint prior scale, which controls the amount of weight the model places on shifts in the trend of the data.

Changepoints represent where a time series goes from increasing to decreasing or from increasing slowly to increasing rapidly (or vice versa). They occur at the places with the greatest change in the rate of the time series. The changepoint prior scale represents the amount of emphasis given to the changepoints in the model. This is used to control overfitting vs. underfitting (also known as the bias vs. variance tradeoff).

A higher prior creates a model with more weight on the changepoints and a more flexible fit. This may lead to overfitting because the model will closely stick to the training data and not be able to generalize to new test data. Lowering the prior decreases the model flexibility which can cause the opposite problem: underfitting. This occurs when our model does not follow the training data closely enough and fails to learn the underlying patterns. Figuring out the proper settings to achieve the right balance is more a matter of engineering than of theory, and here we must rely on empirical results. The Stocker class contains two different ways to choose an appropriate prior: visually and quantitatively. We can start off with the graphical method:

# changepoint_priors is the list of changepoint prior scales to evaluate
amazon.changepoint_prior_analysis(changepoint_priors=[0.001, 0.05, 0.1, 0.2])

Here, we are training on three years of data and then showing predictions for six months. We do not quantify the predictions here because we are just trying to understand the role of the changepoint prior. This graph does a great job of illustrating under- vs. overfitting! The lowest prior, the blue line, does not follow the training data (the black observations) very closely. It kind of does its own thing and picks a route through the general vicinity of the data. In contrast, the highest prior, the yellow line, sticks to the training observations as closely as possible. The default value for the changepoint prior is 0.05, which falls somewhere in between the two extremes.

Notice also the difference in uncertainty (shaded intervals) for the priors. The lowest prior has the largest uncertainty on the training data, but the smallest uncertainty on the test data. In contrast, the highest prior has the smallest uncertainty on the training data but the greatest uncertainty on the test data. The higher the prior, the more confident it is on the training data because it closely follows each observation. When it comes to the test data however, an overfit model is lost without any data points to anchor it. As stocks have quite a bit of variability, we probably want a more flexible model than the default so the model can capture as many patterns as possible.
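The effect of the prior on flexibility can be reproduced in a toy setting. The sketch below fits a piecewise-linear trend whose slope changes are shrunk by an L1 penalty of `1 / prior_scale`, in the spirit of the sparse prior Prophet places on slope changes. It is a simplified stand-in written for this article, not Prophet's actual fitting code, and `fit_piecewise_trend` and its parameters are hypothetical names.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 penalty step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_piecewise_trend(times, values, changepoints, prior_scale, sweeps=300):
    """Coordinate descent for a trend with penalized slope changes."""
    # Column 0: offset, column 1: base slope, then one hinge per changepoint
    columns = [[1.0] * len(times), list(times)]
    columns += [[max(0.0, t - s) for t in times] for s in changepoints]
    # Only the slope changes are penalized; small prior_scale = big penalty
    penalties = [0.0, 0.0] + [1.0 / prior_scale] * len(changepoints)
    coefs = [0.0] * len(columns)
    residual = list(values)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            norm = sum(x * x for x in col)
            rho = sum(x * (r + coefs[j] * x) for x, r in zip(col, residual))
            new_coef = soft_threshold(rho, penalties[j]) / norm
            step = new_coef - coefs[j]
            if step != 0.0:
                residual = [r - step * x for r, x in zip(residual, col)]
            coefs[j] = new_coef
    return coefs, residual

# Synthetic series that rises and then falls, with a kink at t = 0.5
times = [i / 99 for i in range(100)]
values = [t if t < 0.5 else 1.0 - t for t in times]
changepoints = [0.25, 0.5, 0.75]

stiff_coefs, stiff_res = fit_piecewise_trend(times, values, changepoints, 0.001)
flex_coefs, flex_res = fit_piecewise_trend(times, values, changepoints, 10.0)

stiff_mae = sum(abs(r) for r in stiff_res) / len(stiff_res)
flexible_mae = sum(abs(r) for r in flex_res) / len(flex_res)
print(f"prior 0.001: training MAE = {stiff_mae:.4f}")
print(f"prior 10.0:  training MAE = {flexible_mae:.4f}")
```

With the tiny prior, every slope change is forced to zero and the fit is a single straight line that misses the kink (underfitting). With the large prior, the slope is free to change, so the training error drops, at the risk of chasing noise on real data.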

Now that we have an idea of the effect of the prior, we can numerically evaluate different values using a training and validation set:

amazon.changepoint_prior_validation(start_date='2016-01-04', end_date='2017-01-03', changepoint_priors=[0.001, 0.05, 0.1, 0.2])

Validation Range 2016-01-04 to 2017-01-03.

     cps  train_err  train_range    test_err  test_range
   0.001  44.507495   152.673436  149.443609  153.341861
   0.050  11.207666    35.840138  151.735924  141.033870
   0.100  10.717128    34.537544  153.260198  166.390896
   0.200   9.653979    31.735506  129.227310  342.205583

Here, we have to be careful that our validation data is not the same as our testing data. If it were, we would be choosing the best model for the test data, which amounts to overfitting the test data, and our model would not translate to real-world data. In total, as is commonly done in data science, we are using three different sets of data: a training set (2013–2015), a validation set (2016), and a testing set (2017).

We evaluated four priors with four metrics: training error, training range (confidence interval), testing error, and testing range (confidence interval), with all values in dollars. As we saw in the graph, the higher the prior, the lower the training error and the lower the uncertainty on the training data. The testing error does not fall steadily, but the highest prior (0.2) gives the lowest testing error, backing up our intuition that closely fitting to the data is a good idea with stocks. In exchange for greater accuracy on the test set, we get a much greater range of uncertainty on the test data at that prior.
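Picking the prior from the validation results is a one-line minimum over the table. The sketch below copies the validation figures reported above into plain dictionaries; the variable names are mine, and Stocker itself returns these numbers in its own format.

```python
# Validation results copied from the output above (all values in dollars)
results = [
    {"cps": 0.001, "train_err": 44.507495, "test_err": 149.443609, "test_range": 153.341861},
    {"cps": 0.050, "train_err": 11.207666, "test_err": 151.735924, "test_range": 141.033870},
    {"cps": 0.100, "train_err": 10.717128, "test_err": 153.260198, "test_range": 166.390896},
    {"cps": 0.200, "train_err": 9.653979, "test_err": 129.227310, "test_range": 342.205583},
]

# Choose the prior with the lowest error on the validation period
best = min(results, key=lambda row: row["test_err"])
print(f"best prior: {best['cps']} "
      f"(validation error ${best['test_err']:.2f}, "
      f"interval width ${best['test_range']:.2f})")
```

Selecting on the error alone ignores the interval width, which more than doubles at the winning prior, so it is worth looking at both columns before settling on a value.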

The Stocker prior validation also displays two plots illustrating these points:

Since the highest prior produced the lowest testing error, we should try to increase the prior even higher to see if we get better performance. We can refine our search by passing in additional values to the validation method:

# test more changepoint priors on same validation range
amazon.changepoint_prior_validation(start_date='2016-01-04', end_date='2017-01-03', changepoint_priors=[0.15, 0.2, 0.25, 0.4, 0.5, 0.6])

The test set error is minimized at a prior of 0.5. We will set the changepoint prior attribute of the Stocker object appropriately.

amazon.changepoint_prior_scale = 0.5

There are other settings of the model we can adjust, such as the patterns we expect to see, or the number of training years of data the model uses. Finding the best combination simply requires repeating the above procedure with a number of different values. Feel free to try out any settings!

Evaluating Refined Model

Now that our model is optimized, we can again evaluate it:

amazon.evaluate_prediction()

Prediction Range: 2017-01-18 to 2018-01-18.
Predicted price on 2018-01-17 = $1164.10.
Actual price on 2018-01-17 = $1295.00.
Average Absolute Error on Training Data = $10.22.
Average Absolute Error on Testing Data = $101.19.
When the model predicted an increase, the price increased 57.99% of the time.
When the model predicted a decrease, the price decreased 46.25% of the time.
The actual value was within the 80% confidence interval 95.20% of the time.

That looks better! This shows the importance of model optimization. Using default values provides a reasonable first guess, but we need to be sure we are using the correct model “settings,” just like we try to optimize how a stereo sounds by adjusting balance and fade (sorry for the outdated reference).

Making predictions is an interesting exercise, but the real fun is looking at how well these forecasts would play out in the actual market. Using the evaluate_prediction method, we can “play” the stock market using our model over the evaluation period. We will use a strategy informed by our model which we can then compare to the simple strategy of buying and holding the stock over the entire period.

The rules of our strategy are straightforward:

On each day the model predicts the stock to increase, we purchase the stock at the beginning of the day and sell at the end of the day. When the model predicts a decrease in price, we do not buy any stock.
If we buy stock and the price increases over the day, we make the increase times the number of shares we bought.
If we buy stock and the price decreases, we lose the decrease times the number of shares.
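The rules above translate directly into code. This is a minimal sketch with made-up prices and predictions, assuming we have an opening and a closing price for each day; the function names are hypothetical and this is not Stocker's own implementation.

```python
def play_strategy(opens, closes, predicted_up, nshares):
    """Buy at the open and sell at the close on days the model predicts a rise."""
    profit = 0.0
    for open_price, close_price, up in zip(opens, closes, predicted_up):
        if up:  # on predicted-decrease days we stay out of the market
            profit += (close_price - open_price) * nshares
    return profit

def buy_and_hold(opens, closes, nshares):
    """Buy at the first open and sell at the last close."""
    return (closes[-1] - opens[0]) * nshares

# Made-up prices and model calls for four trading days
opens = [100.0, 102.0, 101.0, 104.0]
closes = [102.0, 101.0, 104.0, 103.0]
predicted_up = [True, False, True, True]

strategy_profit = play_strategy(opens, closes, predicted_up, nshares=10)
hold_profit = buy_and_hold(opens, closes, nshares=10)
print(f"model strategy: ${strategy_profit:.2f}")
print(f"buy and hold:   ${hold_profit:.2f}")
```

In this toy example the strategy happens to come out ahead because it sits out the one losing day it predicted correctly; with real predictions the comparison can easily go the other way.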

We play this each day for the entire evaluation period which in our case is 2017. To play, add the number of shares to the method call. Stocker will inform us how the strategy played out in numbers and graphs:

# Going big 
amazon.evaluate_prediction(nshares=1000)

You played the stock market in AMZN from 2017-01-18 to 2018-01-18 with 1000 shares.
When the model predicted an increase, the price increased 57.99% of the time.
When the model predicted a decrease, the price decreased 46.25% of the time.
The total profit using the Prophet model = $299580.00.
The Buy and Hold strategy profit = $487520.00.
Thanks for playing the stock market!

This shows us a valuable lesson: buy and hold! While we would have made a considerable sum playing our strategy, the better bet would simply have been to invest for the long term.

We can try other test periods to see if there are times when our model strategy beats the buy and hold method. Our strategy is rather conservative because we do not play when we predict a market decrease, so we might expect to do better than a holding strategy when the stock takes a downturn.

I knew our model could do it! However, our model only beat the market when we had the benefit of hindsight to choose the test period.

Future Predictions

Now that we are satisfied we have a decent model, we can make future predictions using the predict_future() method.

amazon.predict_future(days=10)
amazon.predict_future(days=100)

The model is overall bullish on Amazon as are most “professionals.” Additionally, the uncertainty increases the further out in time we make estimates as expected. In reality, if we were using this model to actively trade, we would train a new model every day and would make predictions for a maximum of one day in the future.
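The daily retraining idea mentioned above is often called walk-forward forecasting. The sketch below uses a straight-line fit on a trailing window as a stand-in for the real model, purely to show the loop structure; `fit_line`, `walk_forward`, and the `window` size are illustrative assumptions, not part of Stocker.

```python
def fit_line(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_y - slope * mean_t

def walk_forward(series, window):
    """One-day-ahead forecasts, refitting on the trailing window each day."""
    forecasts = []
    for today in range(window, len(series)):
        history = series[today - window:today]   # only data before today
        slope, intercept = fit_line(history)     # retrain from scratch
        forecasts.append(intercept + slope * window)  # predict one step ahead
    return forecasts

# Synthetic, perfectly linear prices so the forecasts are easy to check
series = [10.0 + 2.0 * i for i in range(30)]
forecasts = walk_forward(series, window=5)

errors = [abs(f - a) for f, a in zip(forecasts, series[5:])]
print(f"{len(forecasts)} one-day forecasts, max error = {max(errors):.6f}")
```

Each forecast uses only the data available before that day, which is what keeps a backtest of this kind honest.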

While we might not get rich from the Stocker tool, the benefit is in the development rather than the end results! We can’t actually know if we can solve a problem until we try but it’s better to have tried and failed than to have never tried at all! For anyone interested in checking out the code or using Stocker themselves, it is available on GitHub.

As always, I enjoy feedback and constructive criticism. I can be reached on Twitter @koehrsen_will.
