Welcome to GrabNGoInfo! In this tutorial, we will talk about hyperparameter tuning and regularization for a time series model using Prophet in Python. You will learn:

- What are the hyperparameters for a prophet time series model?
- Among all the hyperparameters, what hyperparameters should be tuned, and what hyperparameters are better not to be tuned?
- Among all the hyperparameters that should be tuned, what hyperparameters should be tuned automatically vs. manually?
- How are hyperparameter tunings related to regularization?
- How does data transformation impact the model performance?

If you are not familiar with Prophet, please check out my previous tutorials Time Series Forecasting Of Bitcoin Prices Using Prophet, Multivariate Time Series Forecasting with Seasonality and Holiday Effect Using Prophet, and 3 Ways for Multiple Time Series Forecasting Using Prophet.

**Resources for this post:**

- Python code is at the end of the post. Click here for the Colab notebook
- More video tutorials on time series and hyperparameter tuning
- More blog posts on time series and hyperparameter tuning
- If you prefer the video version of the tutorial, watch the video below on YouTube.

Let’s get started!

### Step 0: Overview of All the Hyperparameters for a Prophet Model

In step 0, we will provide an overview of all the hyperparameters for a prophet model. The hyperparameters are divided into three groups:

- The first group contains hyperparameters that are suitable for automatic tuning. We can specify a list of values and do a grid search for the best value combination. For more information about grid search, please refer to my previous tutorial Hyperparameter Tuning For XGBoost.
- The second group contains hyperparameters that are suitable for manual tuning. A human needs to make a judgment on what hyperparameter value to use based on knowledge about data and business.
- The third group contains the hyperparameters that are better left untuned with the default values.

**Group 1: Hyperparameters Suitable for Automatic Tuning**

There are four hyperparameters that are suitable to be tuned automatically using grid search.

`changepoint_prior_scale` is probably the most impactful parameter according to the Prophet documentation. It determines how much the trend is allowed to change at the trend changepoints, and acts as an L1 (LASSO) regularization penalty term. To learn more about regularization, please refer to my previous tutorial LASSO (L1) Vs Ridge (L2) Vs Elastic Net Regularization For Classification Model.

- When the scale is too small, the trend does not change much at the changepoints, and the model tends to underfit. This is because the variance caused by a real trend change is likely to be treated as noise when the `changepoint_prior_scale` value is small.
- When the scale is too large, the trend changes a lot at the changepoints, and the model tends to overfit. The variance caused by noise is treated as part of the trend.
- The default value for `changepoint_prior_scale` is 0.05, and the recommended tuning range is 0.001 to 0.5. This hyperparameter is usually tuned on a log scale.

`seasonality_prior_scale` controls the magnitude of the seasonality fluctuations. It is an L2 (Ridge) regularization penalty term.

- When the scale is too small, the magnitude of the seasonality shrinks to a very small value.
- When the scale is too large, very large seasonal fluctuations are allowed.
- The default value for `seasonality_prior_scale` is 10, meaning that essentially no regularization is applied. The recommended tuning range is 0.01 to 10, where a smaller value corresponds to a smaller magnitude of seasonality. This hyperparameter is usually tuned on a log scale.

`holidays_prior_scale` determines the scale of the holiday effects, and works very much like `seasonality_prior_scale`. The default value is 10, meaning that essentially no regularization is applied. The recommended tuning range is 0.01 to 10, where a smaller value corresponds to a smaller holiday effect.

`seasonality_mode` has two options, additive and multiplicative.

- The additive model adds the trend, seasonality, and other effects together when making predictions. It is appropriate for time series with relatively constant seasonal variation over time.
- The multiplicative model multiplies the trend, seasonality, and other effects together when making predictions. It is appropriate for time series with seasonal variation that increases or decreases over time.
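The difference between the two modes can be illustrated with a toy series (a minimal numpy sketch; the trend and seasonal components below are made up for demonstration and are not Prophet output):

```python
import numpy as np

t = np.arange(104)                    # two years of weekly observations
trend = 100 + 0.5 * t                 # linear upward trend
season = np.sin(2 * np.pi * t / 52)   # one yearly cycle

y_add = trend + 10 * season           # additive: a constant +/-10 seasonal swing
y_mul = trend * (1 + 0.1 * season)    # multiplicative: a +/-10% seasonal swing

# The additive swing stays the same size in both years, while the
# multiplicative swing grows as the trend level rises.
swing_add = np.abs(y_add - trend)
swing_mul = np.abs(y_mul - trend)
```

If the seasonal peaks in a plot of your data grow with the level of the series, the multiplicative mode is usually the better fit.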

**Group 2: Hyperparameters Suitable for Manual Tuning**

Some hyperparameters need to be evaluated by a human based on business knowledge or data observations, so they are tuned manually.

`changepoint_range` is a value between 0 and 1 indicating the fraction of the historical data in which a trend change is allowed.

- The default value is 0.8, meaning that trend changes are allowed in the first 80% of the data but not in the last 20%. This is because there is not enough data at the end of the time series to identify a trend change with confidence.
- The person building the model can manually increase or decrease the value by examining the time series shape and the model performance on the last 20% of the data.
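As a quick sketch of what the parameter means (a hypothetical 100-day series, not the tutorial's dataset; Prophet restricts candidate changepoints to the same fraction, though its internal placement is not reproduced exactly here):

```python
import pandas as pd

# Hypothetical daily history of 100 observations
dates = pd.date_range('2021-01-01', periods=100, freq='D')

changepoint_range = 0.8
n_allowed = int(changepoint_range * len(dates))   # first 80 observations

# Candidate changepoints are restricted to the first 80% of the history
last_allowed = dates[n_allowed - 1]
```

Raising `changepoint_range` to 0.9 would extend the candidate region by another 10 days in this sketch.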

`growth` has two options, linear and logistic. The default value is linear, and it can be manually changed to logistic if there is a known saturation point for the growth.

`changepoints` manually specifies the dates of the changepoints. The default value is None, which lets the model automatically identify and place the changepoints on the trend. However, if there are known events such as a rebranding of the business, the corresponding dates can be added manually.

`yearly_seasonality` has three options: auto, True, and False. The default is auto, which turns the yearly seasonality on when there are at least two cycles of data.

- We can manually set `yearly_seasonality` to True to force yearly seasonality when there are fewer than two years of data.
- `yearly_seasonality` is usually not set to False, because it is more effective to leave it on and turn down the seasonal effects by tuning `seasonality_prior_scale` [2].

`weekly_seasonality` and `daily_seasonality` can be handled in the same way as `yearly_seasonality`.

`holidays` takes in a dataframe with specified holidays and special events. We can manually include or exclude holidays or events, or change the number of days impacted by a holiday, to tune this hyperparameter. The magnitude of the holiday effects should be tuned with `holidays_prior_scale`.

**Group 3: Hyperparameters Suitable for No Tuning**

There are five hyperparameters that should not be tuned.

`n_changepoints` is the number of changepoints on the trend. The default value is 25. Rather than increasing or decreasing the number of changepoints, the Prophet documentation [2] suggests focusing on increasing or decreasing the flexibility at those trend changes, which is done with `changepoint_prior_scale`.

`interval_width` determines the uncertainty interval for the predictions. The default value is 0.8, meaning that the prediction upper bound `yhat_upper` and the prediction lower bound `yhat_lower` are based on an 80% uncertainty interval. `interval_width` does not impact the point predictions, so there is no need to tune this hyperparameter.

`uncertainty_samples` is the number of samples used for the uncertainty interval calculation. The default value is 1000, and increasing the value decreases the variance of the uncertainty interval. This hyperparameter does not impact the point predictions, so there is no need to tune it.

`mcmc_samples` is an integer that determines whether the model uses full Bayesian inference or maximum a posteriori (MAP) estimation for model training and prediction.

- When it is greater than 0, full Bayesian inference with the specified number of MCMC samples is used. Bayesian inference usually needs at least 1000 samples to get reasonable results.
- When it equals 0, maximum a posteriori (MAP) estimation is used. The default value for `mcmc_samples` is 0, so MAP estimation is used by default. The Prophet documentation suggests leaving this hyperparameter unchanged [2].

`stan_backend` needs to be specified only when both the `pystan` and `cmdstanpy` backends are set up. This hyperparameter does not impact model training and prediction, so there is no need to tune it.

### Step 1: Install and Import Libraries

In the first step, we will install and import libraries.

`yfinance` is the Python package for pulling stock data from Yahoo Finance. `prophet` is the package for the time series model. After installing `yfinance` and `prophet`, they are imported into the notebook.

We also import `pandas` and `numpy` for data processing, `seaborn` and `matplotlib` for visualization, and `itertools`, `cross_validation`, and `performance_metrics` for hyperparameter tuning.

```python
# Install libraries
!pip install yfinance prophet

# Data processing
import pandas as pd
import numpy as np

# Get time series data
import yfinance as yf

# Prophet model for time series forecast
from prophet import Prophet

# Visualization
import seaborn as sns
import matplotlib.pyplot as plt

# Hyperparameter tuning
import itertools
from prophet.diagnostics import cross_validation, performance_metrics
```

### Step 2: Pull Data

The second step pulls stock data from Yahoo Finance API. We will pull 2 years of daily data from the beginning of 2020 to the end of 2021.

- `start_date = '2020-01-02'` because January 1st is a holiday, and there is no stock data on holidays and weekends.
- `end_date = '2022-01-01'` because `yfinance` excludes the end date, so we need to add one day to the last day of the data.
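The exclusive end date can be double-checked with simple date arithmetic (a minimal sketch):

```python
import pandas as pd

last_day_wanted = pd.Timestamp('2021-12-31')  # last day of data we want

# yfinance treats `end` as exclusive, so we pass the day after the last day we want
end_date = str((last_day_wanted + pd.Timedelta(days=1)).date())
```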

```python
# Data start date
start_date = '2020-01-02'

# Data end date
end_date = '2022-01-01'  # yfinance excludes the end date, so we need to add one day to the last day of data
```

The goal of the time series model is to predict the closing price of Google’s stock, so Google’s ticker `GOOG` is used for pulling the data.

Prophet requires at least two columns as inputs: a `ds` column and a `y` column.

- The `ds` column has the time information. Currently we have the date as the index, so we reset the index and rename `date` to `ds`.
- The `y` column has the time series values. In this example, because we are predicting Google’s closing price, the column name for the price is changed to `y`.

```python
# Pull close data from Yahoo Finance for the list of tickers
ticker_list = ['GOOG']
data = yf.download(ticker_list, start=start_date, end=end_date)[['Close']]

# Change column names
data = data.reset_index()
data.columns = ['ds', 'y']

# Take a look at the data
data.head()
```

Using `.info()`, we can see that the dataset has 505 records and no missing values.

```python
# Data information
data.info()
```

Output

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 505 entries, 0 to 504
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   ds      505 non-null    datetime64[ns]
 1   y       505 non-null    float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 8.0 KB
```

Next, let’s visualize the closing price using `seaborn`, and add a legend to the plot using `matplotlib`. We can see that the price for Google increased a lot starting in late 2020, and almost doubled by late 2021.

```python
# Visualize data using seaborn
sns.set(rc={'figure.figsize': (12, 8)})
sns.lineplot(x=data['ds'], y=data['y'])
plt.legend(['Google'])
```

### Step 3: Baseline Model Using Default Hyperparameters

In step 3, we will build a baseline model using the default prophet hyperparameters.

```python
# Initiate the model
baseline_model = Prophet()

# Fit the model on the training dataset
baseline_model.fit(data)
```

Output

```
INFO:prophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
<prophet.forecaster.Prophet at 0x7fd0536db410>
```

Prophet automatically fits daily, weekly, and yearly seasonalities when the time series contains at least two cycles of the corresponding period.

The model information shows that the yearly seasonality and the daily seasonality are disabled.

- The daily seasonality is disabled because we do not have sub-daily data.
- The yearly seasonality is disabled although we pulled two calendar years of data, because the history spans just under two full years (January 2nd, 2020 to December 31st, 2021), which falls short of the two complete yearly cycles Prophet requires.
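As a rough check (using `pandas.bdate_range`, which excludes weekends but not exchange holidays, so the real trading-day count is slightly lower):

```python
import pandas as pd

# Weekday dates covering the history pulled in Step 2
weekdays = pd.bdate_range('2020-01-02', '2021-12-31')

n_obs = len(weekdays)                          # weekday count across both years
span_days = (weekdays[-1] - weekdays[0]).days  # calendar span of the history

# The weekday count is far below 2 * 365, and the calendar span itself
# is one day short of two full years (730 days).
```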

Next, let’s do cross-validation on the baseline model to measure its performance. Prophet has a `cross_validation` function to automate the comparison between the actual and the predicted values.

- `model=baseline_model` specifies the model to validate.
- `initial='200 days'` means the initial model will be trained on the first 200 days of data.
- `period='30 days'` means 30 days of data are added to the training dataset for each additional model.
- `horizon='30 days'` means that the model forecasts the next 30 days. When only `horizon` is given, Prophet defaults `initial` to triple the `horizon` and `period` to half the `horizon`.
- `parallel="processes"` enables parallel cross-validation. When the parallel cross-validation can be done on a single machine, "processes" provides the best performance. For larger problems, `dask` can be used to run the cross-validation on multiple machines.
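With these settings, we can estimate how many forecast folds the cross-validation produces (a back-of-the-envelope sketch of the cutoff arithmetic; Prophet's own `generate_cutoffs` works backwards from the end of the data and may drop folds without enough history, so treat this as an approximation):

```python
from datetime import date

first_obs = date(2020, 1, 2)             # first day of the pulled data
last_obs = date(2021, 12, 31)            # last day of the pulled data
initial, period, horizon = 200, 30, 30   # all in days

span = (last_obs - first_obs).days       # 729 days of history

# Each fold needs `initial` days of training data plus a `horizon`-day
# forecast window; successive cutoffs are `period` days apart.
n_folds = (span - initial - horizon) // period + 1
```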

```python
# Cross validation
baseline_model_cv = cross_validation(model=baseline_model, initial='200 days', period='30 days', horizon='30 days', parallel="processes")
baseline_model_cv.head()
```

Based on the output of the cross-validation, we can get the model performance using the `performance_metrics` function.

The model performance metrics are calculated using a rolling window, and the average metric values are reported for each horizon value. `rolling_window` determines the window size, i.e., the proportion of forecasted data points to include in each calculation.

- `rolling_window=0` computes the performance metrics separately for each horizon. Since we have a horizon of 30 days, there would be 30 average values, one for each day.
- `rolling_window=0.1` is the default value, which computes the performance metrics using about 10% of the predictions in each window.
- `rolling_window=1` computes the performance metrics using all the forecasted data points, producing a single number per metric. We are using `rolling_window=1` in this tutorial to get a single performance metric number.
- `rolling_window<0` computes the metrics for each data point without averaging (i.e., `MSE` is actually the squared error with no mean) [3].

```python
# Model performance metrics
baseline_model_p = performance_metrics(baseline_model_cv, rolling_window=1)
baseline_model_p.head()
```

Prophet provides six commonly used performance metrics:

- Mean Squared Error (MSE) sums up the squared differences between the actual and predicted values and divides by the number of predictions.
- Root Mean Squared Error (RMSE) takes the square root of MSE.
- Mean Absolute Error (MAE) sums up the absolute differences between the actual and predicted values and divides by the number of predictions.
- Mean Absolute Percentage Error (MAPE) sums up the absolute percentage differences between the actual and predicted values and divides by the number of predictions. MAPE is independent of the magnitude of the data, so it can be used to compare different forecasts. However, it is undefined when the actual value is zero.
- Median Absolute Percentage Error (MDAPE) is similar to MAPE. The difference is that it takes the median instead of the mean of the absolute percentage differences.
- Symmetric Mean Absolute Percentage Error (SMAPE) is similar to MAPE. The difference is in the denominator of the absolute percentage error: MAPE uses the actual value, while SMAPE uses the average of the actual and predicted values.
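These definitions can be written out directly (a toy example with made-up numbers; Prophet's `performance_metrics` computes the same quantities over the cross-validation output):

```python
import numpy as np

actual = np.array([100.0, 110.0, 120.0, 130.0])  # made-up actual values
pred = np.array([98.0, 113.0, 118.0, 135.0])     # made-up predictions

err = actual - pred
mse = np.mean(err ** 2)                          # mean squared error
rmse = np.sqrt(mse)                              # root mean squared error
mae = np.mean(np.abs(err))                       # mean absolute error
mape = np.mean(np.abs(err) / np.abs(actual))     # mean absolute percentage error
mdape = np.median(np.abs(err) / np.abs(actual))  # median absolute percentage error
smape = np.mean(np.abs(err) / ((np.abs(actual) + np.abs(pred)) / 2))  # symmetric MAPE
```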

We can see that the baseline model using default hyperparameters has 4.45% Mean Absolute Percentage Error (MAPE), meaning that the predictions are 4.45% off the actual values.

```python
# Get the performance metric value
baseline_model_p['mape'].values[0]
```

Output

```
0.04445730639004379
```

### Step 4: Models with Manual Hyperparameter Changes

In step 4, we will check the hyperparameters for manual tuning.

- `changepoint_range` will be increased from 0.8 to 0.9 to check for a trend change near the end of the time series. This is because the shape of the time series seems to flatten starting around October 2021 based on human observation.
- `growth` is kept at the default value `linear` because there is no saturation point for stock prices.
- `changepoints` is kept at the default value None, which lets the model automatically identify and place the changepoints on the trend. This is because there are no known big events that are likely to have impacted Google's stock price in the time range used for the model.
- `yearly_seasonality` is kept at the automated value of False because we do not have more than two cycles of yearly data for the model training.
- `weekly_seasonality` is kept at the automated value of True because we have more than two cycles of weekly data.
- `daily_seasonality` is kept at the automated value of False because we do not have sub-daily data.
- `holidays` takes in a dataframe with specified holidays and special events. We will manually include two special events: the start of COVID and the Super Bowl.

The manually changed hyperparameters will be evaluated one by one. If the change improves the model performance, we will keep the change, otherwise, we will switch back to the default values.

**Step 4.1: Change Point Range**

First, let’s increase `changepoint_range` from 0.8 to 0.9 to check whether there is a trend change near the end of the time series.

```python
# Initiate the model
manual_model = Prophet(changepoint_range=0.9)

# Fit the model on the training dataset
manual_model.fit(data)

# Cross validation
manual_model_cv = cross_validation(manual_model, initial='200 days', period='30 days', horizon='30 days', parallel="processes")

# Model performance metrics
manual_model_p = performance_metrics(manual_model_cv, rolling_window=1)
manual_model_p['mape'].values[0]
```

Output

```
0.05132206898098496
```

The Mean Absolute Percentage Error (MAPE) from cross-validation is 5.13%, which is higher than the baseline model's 4.45%, indicating that we should not increase the value of `changepoint_range`.

**Step 4.2: Holidays**

Next, let’s create two special events: COVID start and Super Bowl.

- For the COVID start event, we set the date to March 15th, 2020, then extend the event to 15 days before and 15 days after using `lower_window` and `upper_window` respectively.
- For the Super Bowl event, we set the dates for 2020 and 2021 separately, and extend the event to 7 days before and 1 day after.

Then the two events are concatenated into one dataframe called `events`.

```python
# COVID time window
COVID = pd.DataFrame({
    'holiday': 'COVID',
    'ds': pd.to_datetime(['2020-03-15']),
    'lower_window': -15,
    'upper_window': 15,
})

# Super Bowl time window
superbowl = pd.DataFrame({
    'holiday': 'superbowl',
    'ds': pd.to_datetime(['2020-02-02', '2021-02-07']),
    'lower_window': -7,
    'upper_window': 1,
})

# Combine all events
events = pd.concat((COVID, superbowl))

# Take a look at the events data
events
```

After adding the special events, the Mean Absolute Percentage Error (MAPE) from cross-validation is 4.64%, which is higher than the baseline model of 4.45%, indicating that the special events are not predictive of the stock prices. Therefore, we should not include special events in the model.

```python
# Add special events
manual_model = Prophet(holidays=events)

# Fit the model on the training dataset
manual_model.fit(data)

# Cross validation
manual_model_cv = cross_validation(manual_model, initial='200 days', period='30 days', horizon='30 days', parallel="processes")

# Model performance metrics
manual_model_p = performance_metrics(manual_model_cv, rolling_window=1)
manual_model_p['mape'].values[0]
```

Output

```
0.046355656191352575
```

### Step 5: Automatic Hyperparameter Tuning

There are four hyperparameters that are suitable for automatic tuning using grid search: `changepoint_prior_scale`, `seasonality_prior_scale`, `holidays_prior_scale`, and `seasonality_mode`.

`holidays_prior_scale` is not applicable here because no holidays or special events are included in the model. We will use grid search to find the best values for the other three hyperparameters.

- First, four values for `changepoint_prior_scale`, five values for `seasonality_prior_scale`, and two values for `seasonality_mode` are used to set up the parameter grid.
- Then, all combinations of the parameters are created and saved in a variable called `all_params`, and an empty list called `mapes` is created for storing the Mean Absolute Percentage Error (MAPE) for each parameter combination.
- After that, we loop through the parameter combinations to build the time series models, and save the cross-validation performance results in `mapes`.
- Finally, we get the best parameters based on the minimum Mean Absolute Percentage Error (MAPE), and build a new model using the best parameters.

```python
# Set up parameter grid
param_grid = {
    'changepoint_prior_scale': [0.001, 0.05, 0.08, 0.5],
    'seasonality_prior_scale': [0.01, 1, 5, 10, 12],
    'seasonality_mode': ['additive', 'multiplicative']
}

# Generate all combinations of parameters
all_params = [dict(zip(param_grid.keys(), v)) for v in itertools.product(*param_grid.values())]

# Create a list to store MAPE values for each combination
mapes = []

# Use cross validation to evaluate all parameters
for params in all_params:
    # Fit a model using one parameter combination
    m = Prophet(**params).fit(data)
    # Cross-validation
    df_cv = cross_validation(m, initial='200 days', period='30 days', horizon='30 days', parallel="processes")
    # Model performance
    df_p = performance_metrics(df_cv, rolling_window=1)
    # Save model performance metrics
    mapes.append(df_p['mape'].values[0])

# Tuning results
tuning_results = pd.DataFrame(all_params)
tuning_results['mape'] = mapes

# Find the best parameters
best_params = all_params[np.argmin(mapes)]
print(best_params)
```

Output

```
{'changepoint_prior_scale': 0.05, 'seasonality_prior_scale': 1.0, 'seasonality_mode': 'additive'}
```

We can see that the best parameters are `changepoint_prior_scale` equal to 0.05, `seasonality_prior_scale` equal to 1, and `seasonality_mode` equal to `additive`.

```python
# Fit the model using the best parameters
auto_model = Prophet(changepoint_prior_scale=best_params['changepoint_prior_scale'],
                     seasonality_prior_scale=best_params['seasonality_prior_scale'],
                     seasonality_mode=best_params['seasonality_mode'])

# Fit the model on the training dataset
auto_model.fit(data)

# Cross validation
auto_model_cv = cross_validation(auto_model, initial='200 days', period='30 days', horizon='30 days', parallel="processes")

# Model performance metrics
auto_model_p = performance_metrics(auto_model_cv, rolling_window=1)
auto_model_p['mape'].values[0]
```

Output

```
0.044162208439371395
```

The Mean Absolute Percentage Error (MAPE) from cross-validation is 4.42% for the model with the best parameters, which is better than the baseline model of 4.45%.

### Step 6: Automatic Hyperparameter Tuning using Log Data

The Prophet documentation [2] mentions that some hyperparameters are best tuned on a log scale. In step 6, we will transform the data to the log scale, and then redo the automatic hyperparameter tuning.

- First, a copy of the modeling data is created with the name `data_log`.
- Then, the log-scale data is created by taking the natural log of the stock prices.
- After that, the original stock price column is deleted and the log-scale column is renamed to `y`.
- Lastly, the new log-scale data is used for the grid search, and the best parameters are used for the final model.

```python
# Create a copy of the data
data_log = data.copy()

# Create the log scale data by taking the natural log of the stock prices
data_log['y_log'] = np.log(data['y'])

# Delete the stock price and rename the log scale stock price to y
data_log = data_log.drop('y', axis=1).rename(columns={'y_log': 'y'})

# Take a look at the data
data_log.head()
```

```python
# Parameter grid
param_grid = {
    'changepoint_prior_scale': [0.001, 0.05, 0.08, 0.5],
    'seasonality_prior_scale': [0.01, 1.0, 5, 10, 12],
    'seasonality_mode': ['additive', 'multiplicative']
}

# Generate all combinations of parameters
all_params = [dict(zip(param_grid.keys(), v)) for v in itertools.product(*param_grid.values())]
mapes = []  # Store the MAPEs for each params here

# Use cross validation to evaluate all parameters
for params in all_params:
    # Fit a model using one parameter combination
    m = Prophet(**params).fit(data_log)
    # Cross-validation
    df_cv = cross_validation(m, initial='200 days', period='30 days', horizon='30 days', parallel="processes")
    # Model performance
    df_p = performance_metrics(df_cv, rolling_window=1)
    # Save model performance metrics
    mapes.append(df_p['mape'].values[0])

# Find the best parameters
best_params = all_params[np.argmin(mapes)]
print(best_params)

# Train model using best parameters
auto_model_log = Prophet(changepoint_prior_scale=best_params['changepoint_prior_scale'],
                         seasonality_prior_scale=best_params['seasonality_prior_scale'],
                         seasonality_mode=best_params['seasonality_mode'])

# Fit the model on the training dataset
auto_model_log.fit(data_log)

# Cross validation
auto_model_log_cv = cross_validation(auto_model_log, initial='200 days', period='30 days', horizon='30 days', parallel="processes")

# Model performance metrics
auto_model_log_p = performance_metrics(auto_model_log_cv, rolling_window=1)
auto_model_log_p['mape'].values[0]
```

Output

```
{'changepoint_prior_scale': 0.08, 'seasonality_prior_scale': 0.01, 'seasonality_mode': 'additive'}
0.006668734217672668
```

The Mean Absolute Percentage Error (MAPE) from cross-validation is 0.67% for the log-scale model with the best parameters, which is much better than the baseline model of 4.45%.
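One caveat: because the model was fit on log prices, its forecasts (and the MAPE above) are on the log scale, and predictions need to be exponentiated back to prices before use. A minimal sketch of the back-transform (the `yhat` column names follow Prophet's forecast output; the numbers here are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical forecast rows on the log scale, shaped like Prophet's output
forecast_log = pd.DataFrame({
    'yhat': np.log([2800.0, 2850.0]),
    'yhat_lower': np.log([2700.0, 2750.0]),
    'yhat_upper': np.log([2900.0, 2950.0]),
})

# Back-transform the point forecast and the uncertainty bounds to prices
forecast_price = forecast_log[['yhat', 'yhat_lower', 'yhat_upper']].apply(np.exp)
```

Also note that a MAPE computed on log values is not directly comparable to one computed on raw prices, since the two measure errors on different scales.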

### Put All Code Together

```python
#--------------------------------------------#
# Step 1: Install and Import Libraries
#--------------------------------------------#

# Install libraries
!pip install yfinance prophet

# Data processing
import pandas as pd
import numpy as np

# Get time series data
import yfinance as yf

# Prophet model for time series forecast
from prophet import Prophet

# Visualization
import seaborn as sns
import matplotlib.pyplot as plt

# Hyperparameter tuning
import itertools
from prophet.diagnostics import cross_validation, performance_metrics

#--------------------------------------------#
# Step 2: Pull Data
#--------------------------------------------#

# Data start date
start_date = '2020-01-02'

# Data end date
end_date = '2022-01-01'  # yfinance excludes the end date, so we need to add one day to the last day of data

# Pull close data from Yahoo Finance for the list of tickers
ticker_list = ['GOOG']
data = yf.download(ticker_list, start=start_date, end=end_date)[['Close']]

# Change column names
data = data.reset_index()
data.columns = ['ds', 'y']

# Take a look at the data
data.head()

# Data information
data.info()

# Visualize data using seaborn
sns.set(rc={'figure.figsize': (12, 8)})
sns.lineplot(x=data['ds'], y=data['y'])
plt.legend(['Google'])

#--------------------------------------------#
# Step 3: Baseline Model Using Default Hyperparameters
#--------------------------------------------#

# Initiate the model
baseline_model = Prophet()

# Fit the model on the training dataset
baseline_model.fit(data)

# Cross validation
baseline_model_cv = cross_validation(model=baseline_model, initial='200 days', period='30 days', horizon='30 days', parallel="processes")
baseline_model_cv.head()

# Model performance metrics
baseline_model_p = performance_metrics(baseline_model_cv, rolling_window=1)
baseline_model_p.head()

# Get the performance value
baseline_model_p['mape'].values[0]

#--------------------------------------------#
# Step 4: Models with Manual Hyperparameter Changes
#--------------------------------------------#

# Initiate the model
manual_model = Prophet(changepoint_range=0.9)

# Fit the model on the training dataset
manual_model.fit(data)

# Cross validation
manual_model_cv = cross_validation(manual_model, initial='200 days', period='30 days', horizon='30 days', parallel="processes")

# Model performance metrics
manual_model_p = performance_metrics(manual_model_cv, rolling_window=1)
manual_model_p['mape'].values[0]

# COVID time window
COVID = pd.DataFrame({
    'holiday': 'COVID',
    'ds': pd.to_datetime(['2020-03-15']),
    'lower_window': -15,
    'upper_window': 15,
})

# Super Bowl time window
superbowl = pd.DataFrame({
    'holiday': 'superbowl',
    'ds': pd.to_datetime(['2020-02-02', '2021-02-07']),
    'lower_window': -7,
    'upper_window': 1,
})

# Combine all events
events = pd.concat((COVID, superbowl))

# Take a look at the events data
events

# Add special events
manual_model = Prophet(holidays=events)

# Fit the model on the training dataset
manual_model.fit(data)

# Cross validation
manual_model_cv = cross_validation(manual_model, initial='200 days', period='30 days', horizon='30 days', parallel="processes")

# Model performance metrics
manual_model_p = performance_metrics(manual_model_cv, rolling_window=1)
manual_model_p['mape'].values[0]

#--------------------------------------------#
# Step 5: Automatic Hyperparameter Tuning
#--------------------------------------------#

# Set up parameter grid
param_grid = {
    'changepoint_prior_scale': [0.001, 0.05, 0.08, 0.5],
    'seasonality_prior_scale': [0.01, 1, 5, 10, 12],
    'seasonality_mode': ['additive', 'multiplicative']
}

# Generate all combinations of parameters
all_params = [dict(zip(param_grid.keys(), v)) for v in itertools.product(*param_grid.values())]

# Create a list to store MAPE values for each combination
mapes = []

# Use cross validation to evaluate all parameters
for params in all_params:
    # Fit a model using one parameter combination
    m = Prophet(**params).fit(data)
    # Cross-validation
    df_cv = cross_validation(m, initial='200 days', period='30 days', horizon='30 days', parallel="processes")
    # Model performance
    df_p = performance_metrics(df_cv, rolling_window=1)
    # Save model performance metrics
    mapes.append(df_p['mape'].values[0])

# Tuning results
tuning_results = pd.DataFrame(all_params)
tuning_results['mape'] = mapes

# Find the best parameters
best_params = all_params[np.argmin(mapes)]
print(best_params)

# Fit the model using the best parameters
auto_model = Prophet(changepoint_prior_scale=best_params['changepoint_prior_scale'],
                     seasonality_prior_scale=best_params['seasonality_prior_scale'],
                     seasonality_mode=best_params['seasonality_mode'])

# Fit the model on the training dataset
auto_model.fit(data)

# Cross validation
auto_model_cv = cross_validation(auto_model, initial='200 days', period='30 days', horizon='30 days', parallel="processes")

# Model performance metrics
auto_model_p = performance_metrics(auto_model_cv, rolling_window=1)
auto_model_p['mape'].values[0]

#--------------------------------------------#
# Step 6: Automatic Hyperparameter Tuning using Log Data
#--------------------------------------------#

# Create a copy of the data
data_log = data.copy()

# Create the log scale data by taking the natural log of the stock prices
data_log['y_log'] = np.log(data['y'])

# Delete the stock price and rename the log scale stock price to y
data_log = data_log.drop('y', axis=1).rename(columns={'y_log': 'y'})

# Take a look at the data
data_log.head()

# Parameter grid
param_grid = {
    'changepoint_prior_scale': [0.001, 0.05, 0.08, 0.5],
    'seasonality_prior_scale': [0.01, 1.0, 5, 10, 12],
    'seasonality_mode': ['additive', 'multiplicative']
}

# Generate all combinations of parameters
all_params = [dict(zip(param_grid.keys(), v)) for v in itertools.product(*param_grid.values())]
mapes = []  # Store the MAPEs for each params here

# Use cross validation to evaluate all parameters
for params in all_params:
    # Fit a model using one parameter combination
    m = Prophet(**params).fit(data_log)
    # Cross-validation
    df_cv = cross_validation(m, initial='200 days', period='30 days', horizon='30 days', parallel="processes")
    # Model performance
    df_p = performance_metrics(df_cv, rolling_window=1)
    # Save model performance metrics
    mapes.append(df_p['mape'].values[0])

# Find the best parameters
best_params = all_params[np.argmin(mapes)]
print(best_params)

# Train model using best parameters
auto_model_log = Prophet(changepoint_prior_scale=best_params['changepoint_prior_scale'],
                         seasonality_prior_scale=best_params['seasonality_prior_scale'],
                         seasonality_mode=best_params['seasonality_mode'])

# Fit the model on the training dataset
auto_model_log.fit(data_log)

# Cross validation
auto_model_log_cv = cross_validation(auto_model_log, initial='200 days', period='30 days', horizon='30 days', parallel="processes")

# Model performance metrics
auto_model_log_p = performance_metrics(auto_model_log_cv, rolling_window=1)
auto_model_log_p['mape'].values[0]
```

### Summary

In this tutorial, we talked about hyperparameter tuning and regularization for time series models using Prophet in Python. You learned:

- What are the hyperparameters for a prophet time series model?
- Among all the hyperparameters, what hyperparameters should be tuned, and what hyperparameters are better not to be tuned?
- Among all the hyperparameters that should be tuned, what hyperparameters should be tuned automatically vs. manually?
- How are hyperparameter tunings related to regularization?
- How does data transformation impact the model performance?

For more information about data science and machine learning, please check out my YouTube channel and Medium page, or follow me on LinkedIn.

### Recommended Tutorials

- GrabNGoInfo Machine Learning Tutorials Inventory
- Time Series Anomaly Detection Using Prophet in Python
- 3 Ways for Multiple Time Series Forecasting Using Prophet in Python
- Time Series Forecasting Of Bitcoin Prices Using Prophet
- Multivariate Time Series Forecasting with Seasonality and Holiday Effect Using Prophet in Python
- Four Oversampling And Under-Sampling Methods For Imbalanced Classification Using Python
- Recommendation System: User-Based Collaborative Filtering
- Sentiment Analysis Without Modeling: TextBlob vs. VADER vs. Flair

