Support Vector Machine (SVM) Hyperparameter Tuning In Python

Support Vector Machine (SVM) is a supervised machine learning model for classification and regression. Since SVM is most commonly used for classification, we will use a classification model as the example in this tutorial. We will cover:

  • What’s the intuition for the Support Vector Machine (SVM) algorithm?
  • What are the most critical hyperparameters for Support Vector Machine (SVM)?
  • How to tune hyperparameters for Support Vector Machine (SVM) using grid search, random search, and Bayesian optimization?

Step 1: Support Vector Machine (SVM) algorithm

In step 1, we will discuss the intuition behind the Support Vector Machine (SVM) algorithm. At a high level, the algorithm follows three steps:

  1. Create hyperplanes that separate the classes.
  2. Compare the margin of the hyperplanes and pick the hyperplane with the largest margin.
    • Margin is the shortest distance between the hyperplane and the data points.
    • Maximal Margin Classifier picks a hyperplane that maximizes the margin. One drawback of the Maximal Margin Classifier is that it is sensitive to the outliers in the training dataset.
    • Support Vector Machine (SVM) solves this sensitivity problem by allowing some misclassifications. The margin is called a soft margin when misclassifications are permitted, so a Support Vector Classifier is also called a Soft Margin Classifier. The data points on the edge of and within the soft margin are called support vectors (see the sketch after this list).
    • The number of misclassifications allowed in the soft margin is chosen by comparing cross-validation results. The setting with the best cross-validation result is selected.
  3. Make predictions for new data points based on which side of the hyperplane they fall on.
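
To make the idea of support vectors concrete, here is a minimal sketch on a synthetic two-class dataset (not the data used later in this tutorial) that fits a linear support vector classifier and inspects the points that define the soft margin.

# A minimal sketch on synthetic data: fit a linear support vector classifier
# and inspect the support vectors that sit on or inside the soft margin.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping classes so that some misclassifications are allowed
X_toy, y_toy = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

# C controls how many misclassifications the soft margin tolerates (see Step 2)
svc_toy = SVC(kernel='linear', C=1.0).fit(X_toy, y_toy)

# Points on the edge of or within the soft margin are the support vectors
print(f'Number of support vectors per class: {svc_toy.n_support_}')
print(f'First few support vectors:\n{svc_toy.support_vectors_[:5]}')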

Step 2: Support Vector Machine (SVM) Hyperparameters

In step 2, we will discuss the hyperparameters for Support Vector Machine (SVM).

In Python’s sklearn implementation of the Support Vector Classification model, there is a list of different hyperparameters. You can check out the complete list in the sklearn documentation. The most critical hyperparameters for SVM are kernel, C, and gamma.

  • kernel is the function that transforms the training dataset into a higher-dimensional space to make it linearly separable. The default kernel for the Python implementation of the support vector classifier is the Radial Basis Function, usually referred to as rbf. The kernel can take other values such as linear, poly, rbf, sigmoid, precomputed, or a callable.
  • C is the l2 regularization parameter. The value of C is inversely proportional to the strength of the regularization. To learn more about regularization, please check out my previous tutorial on LASSO (L1) Vs Ridge (L2) Vs Elastic Net Regularization For Classification Model.
    • When C is small, the penalty for misclassification is small, and the strength of the regularization is large. So a decision boundary with a large margin will be selected.
    • When C is large, the penalty for misclassification is large, and the strength of the regularization is small. A decision boundary with a small margin will be selected to reduce misclassifications.
  • gamma is the kernel coefficient for rbf, poly, and sigmoid. It can be seen as the inverse of the support vectors’ radius of influence. The gamma parameter highly impacts model performance. gamma can take the value scale, auto, or a float. The default value in the Python sklearn implementation is scale since version 0.22.
    • When gamma is small, the support vectors’ radius of influence is large. If the gamma value is too small, the radius of influence covers the whole training dataset, and the pattern of the data will not be captured.
    • When gamma is large, the support vectors’ radius of influence is small. If the gamma value is too large, the radius of influence covers little more than the support vectors themselves, and no value of C will prevent overfitting. The sketch after this list illustrates these effects.
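
As a rough illustration, the sketch below (on a synthetic dataset, with accuracy as the metric, so the numbers are not from this tutorial's data) fits rbf-kernel models with a very small gamma, the default, and a very large gamma, and compares training and testing accuracy.

# A rough sketch on synthetic data showing how gamma moves an rbf-kernel SVC
# between underfitting and overfitting (accuracy is assumed as the metric).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_demo, y_demo = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

for gamma in [1e-4, 'scale', 10]:
    model = SVC(kernel='rbf', C=1.0, gamma=gamma).fit(X_tr, y_tr)
    print(f'gamma={gamma}: train={model.score(X_tr, y_tr):.3f}, test={model.score(X_te, y_te):.3f}')

# A very small gamma tends to underfit (both scores low), while a very large
# gamma tends to overfit (high training score, lower testing score).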

Step 3: Import Libraries

In the third step, let’s import the Python libraries needed for this tutorial.

We will use the breast cancer dataset for this tutorial, so datasets from sklearn needs to be imported. pandas and numpy are imported for data processing. StandardScaler is for data standardization.

For model training, we imported train_test_split for creating training and testing datasets and SVC for the support vector classification model.

For hyperparameter tuning, we imported StratifiedKFold, GridSearchCV, and RandomizedSearchCV from sklearn. We also imported hyperopt and cross_val_score for Bayesian optimization.

# Dataset
from sklearn import datasets

# Data processing
import pandas as pd
import numpy as np

# Standardize the data
from sklearn.preprocessing import StandardScaler

# Modeling 
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hyperparameter tuning
from sklearn.model_selection import StratifiedKFold, GridSearchCV, RandomizedSearchCV, cross_val_score
from hyperopt import tpe, STATUS_OK, Trials, hp, fmin, space_eval

Step 4: Read Data

In the fourth step, the breast cancer data from the sklearn library is loaded and transformed into a pandas dataframe.

# Load the breast cancer dataset
data = datasets.load_breast_cancer()

# Put the data in pandas dataframe format
df = pd.DataFrame(data=data.data, columns=data.feature_names)
df['target']=data.target

# Check the data information
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 31 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothness error         569 non-null    float64
 15  compactness error        569 non-null    float64
 16  concavity error          569 non-null    float64
 17  concave points error     569 non-null    float64
 18  symmetry error           569 non-null    float64
 19  fractal dimension error  569 non-null    float64
 20  worst radius             569 non-null    float64
 21  worst texture            569 non-null    float64
 22  worst perimeter          569 non-null    float64
 23  worst area               569 non-null    float64
 24  worst smoothness         569 non-null    float64
 25  worst compactness        569 non-null    float64
 26  worst concavity          569 non-null    float64
 27  worst concave points     569 non-null    float64
 28  worst symmetry           569 non-null    float64
 29  worst fractal dimension  569 non-null    float64
 30  target                   569 non-null    int64  
dtypes: float64(30), int64(1)
memory usage: 137.9 KB

The information summary shows that the dataset has 569 records and 31 columns.

# Check the target value distribution
df['target'].value_counts(normalize=True)

The target variable distribution shows about 63% ones and 37% zeros in the dataset. In the sklearn breast cancer dataset, a target value of 1 represents a benign tumor, and 0 represents a malignant tumor.

Step 5: Train Test Split

In step 5, we split the dataset into 80% training and 20% testing dataset. random_state makes the random split results reproducible.

# Train test split
X_train, X_test, y_train, y_test = train_test_split(df[df.columns.difference(['target'])], df['target'], test_size=0.2, random_state=42)

# Check the number of records in training and testing dataset.
print(f'The training dataset has {len(X_train)} records.')
print(f'The testing dataset has {len(X_test)} records.')

The training dataset has 455 records, and the testing dataset has 114 records.

Step 6: Standardization

In step 6, we will standardize the features.

Standardization rescales the features to the same scale. It is calculated by subtracting the mean and dividing by the standard deviation of each feature. After standardization, each feature has zero mean and unit standard deviation.

The scaler should be fit on the training dataset only, to prevent information from the test dataset leaking into the training process. The test dataset is then standardized using the fit from the training dataset.

There are different types of scalers. StandardScaler and MinMaxScaler are the most commonly used. For a dataset with outliers, we can use RobustScaler.
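
As a quick aside, here is a minimal sketch (not used in the rest of this tutorial) of the RobustScaler alternative. It follows the same fit-on-train, transform-on-test pattern but scales with the median and interquartile range, so it is less affected by outliers.

# A sketch only: RobustScaler follows the same fit/transform pattern but
# scales with the median and interquartile range instead of mean and std.
from sklearn.preprocessing import RobustScaler

rs = RobustScaler()
X_train_robust = pd.DataFrame(rs.fit_transform(X_train), index=X_train.index, columns=X_train.columns)
X_test_robust = pd.DataFrame(rs.transform(X_test), index=X_test.index, columns=X_test.columns)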

In this tutorial, we will use StandardScaler.

# Initiate scaler
sc = StandardScaler()

# Standardize the training dataset
X_train_transformed = pd.DataFrame(sc.fit_transform(X_train),index=X_train.index, columns=X_train.columns)

# Standardize the testing dataset
X_test_transformed = pd.DataFrame(sc.transform(X_test),index=X_test.index, columns=X_test.columns)

# Summary statistics after standardization
X_train_transformed.describe().T

We can see that after using StandardScaler, all the features have zero mean and unit standard deviation.

Let’s also look at the summary statistics for the training data before standardization. The means and standard deviations differ widely in scale: for example, the area error has a mean of about 40 and a standard deviation of about 47, while the compactness error has a mean of about 0.026 and a standard deviation of about 0.019.

# Summary statistics before standardization
X_train.describe().T

Step 7: Support Vector Machine (SVM) Default Hyperparameters

In step 7, we will create a Support Vector Machine (SVM) model with default hyperparameters as the baseline model.

# Check default values
svc = SVC()
params = svc.get_params()
params_df = pd.DataFrame(params, index=[0])
params_df.T

We can see that the default hyperparameters include a C value of 1, a gamma value of scale, and a kernel of rbf.

Next, let’s fit the model using the standardized training data and check the accuracy score. We get 98.25% accuracy for the default hyperparameters.

# Run model
svc.fit(X_train_transformed, y_train)

# Accuracy score
print(f'The accuracy score of the model is {svc.score(X_test_transformed, y_test):.4f}')
The accuracy score of the model is 0.9825

Step 8: Hyperparameter Tuning Using Grid Search

In step 8, we will use grid search to find the best hyperparameter combination for the Support Vector Machine (SVM) model. Grid search is an exhaustive hyperparameter search method: it trains a model for every combination of the specified hyperparameter values. Therefore, it can take a long time to run if we test many hyperparameters and values, especially on larger datasets.

For this reason, we would like to have the grid search space relatively small so the process can finish in a reasonable timeframe. The search space includes the hyperparameters and their values grid search builds models for. In this example, we will tune three hyperparameters, C, gamma, and kernel. The other hyperparameters can be tuned in the same way.

Using the logspace function from the numpy library, we created three values for C and three values for gamma. For gamma, the sklearn values of 'scale' and 'auto' are also included, so there are a total of 5 values for gamma.

# List of C values
C_range = np.logspace(-1, 1, 3)
print(f'The list of values for C are {C_range}')

# List of gamma values
gamma_range = np.logspace(-1, 1, 3)
print(f'The list of values for gamma are {gamma_range}')
The list of values for C are [ 0.1  1.  10. ]
The list of values for gamma are [ 0.1  1.  10. ]

Two kernels, 'rbf' and 'poly', will be tested.

Scoring is the metric used to evaluate the cross-validation results for each model. We set scoring = ['accuracy']. The scoring option can take more than one metric in the list, as in the sketch below.
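
Here is a hypothetical sketch of multi-metric scoring (it is not used for the results in this tutorial, and the variable names and the small grid are for illustration only): list several metrics and tell GridSearchCV which one to refit the final model on.

# A sketch of multi-metric scoring (illustrative only): pass several metrics
# and pick the one used for refitting the final model.
scoring_multi = ['accuracy', 'f1']

grid_multi = GridSearchCV(estimator=SVC(),
                          param_grid={'C': [0.1, 1, 10]},  # a small illustrative grid
                          scoring=scoring_multi,
                          refit='f1',  # refit the best model based on f1
                          cv=3)
grid_multi.fit(X_train_transformed, y_train)

# Cross-validation results are reported per metric in cv_results_
print(grid_multi.cv_results_['mean_test_accuracy'])
print(grid_multi.cv_results_['mean_test_f1'])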

StratifiedKFold is used for the cross-validation. It helps us keep the class ratio in the folds the same as the training dataset. n_splits=3 means we are doing 3-fold cross-validation. shuffle=True means the data are shuffled before splitting. random_state=0 makes the shuffle reproducible.

# Define the search space
param_grid = { 
    # Regularization parameter.
    "C": C_range,
    # Kernel type
    "kernel": ['rbf', 'poly'],
    # Gamma is the Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
    "gamma": gamma_range.tolist()+['scale', 'auto']
    }

# Set up score
scoring = ['accuracy']

# Set up the k-fold cross-validation
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

We specified a few options for GridSearchCV.

  • estimator=svc means we are using Support Vector Classifier as the model.
  • param_grid=param_grid takes our pre-defined search space for the grid search.
  • scoring=scoring sets the performance evaluation metric. Because we set the scoring to ‘accuracy’, the model will use accuracy as the evaluation metric.
  • refit='accuracy' enables refitting the model with the best parameters on the whole training dataset.
  • n_jobs=-1 means parallel processing using all the processors.
  • cv=kfold takes the StratifiedKFold we defined.
  • verbose controls the number of messages returned by the grid search. The higher the number, the more information is returned. verbose=0 means silent.

# Define grid search
grid_search = GridSearchCV(estimator=svc, 
                           param_grid=param_grid, 
                           scoring=scoring, 
                           refit='accuracy', 
                           n_jobs=-1, 
                           cv=kfold, 
                           verbose=0)

# Fit grid search
grid_result = grid_search.fit(X_train_transformed, y_train)

# Print grid search summary
grid_result
GridSearchCV(cv=StratifiedKFold(n_splits=3, random_state=0, shuffle=True),
             estimator=SVC(), n_jobs=-1,
             param_grid={'C': array([ 0.1,  1. , 10. ]),
                         'gamma': [0.1, 1.0, 10.0, 'scale', 'auto'],
                         'kernel': ['rbf', 'poly']},
             refit='accuracy', scoring=['accuracy'])

The grid search cross-validation results show that a C value of 1, a gamma value of scale, and an 'rbf' kernel gave the best results. These happen to be the same as the default hyperparameters. The best average cross-validation accuracy is 96.93%, and the accuracy score on the testing dataset is 98.25%, the same as the baseline model.

# Print the best accuracy score for the training dataset
print(f'The best accuracy score for the training dataset is {grid_result.best_score_:.4f}')

# Print the hyperparameters for the best score
print(f'The best hyperparameters are {grid_result.best_params_}')

# Print the best accuracy score for the testing dataset
print(f'The accuracy score for the testing dataset is {grid_search.score(X_test_transformed, y_test):.4f}')
The best accuracy score for the training dataset is 0.9693
The best hyperparameters are {'C': 1.0, 'gamma': 'scale', 'kernel': 'rbf'}
The accuracy score for the testing dataset is 0.9825

Step 9: Hyperparameter Tuning Using Random Search

In step 9, we use random search for Support Vector Machine (SVM) hyperparameter tuning. Since random search only evaluates a randomly sampled subset of the hyperparameter combinations, we can afford to try more values.

If at least one of the parameters is given as a distribution, sampling with replacement is used for the random search. If all parameters are given as lists, sampling without replacement is used, and each list is treated as a uniform distribution.
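
For example, a continuous distribution such as scipy's loguniform could be used in place of the lists below; this is only a sketch of that option (assuming scipy is available) and is not what we run in this tutorial.

# A sketch of the distribution option: continuous distributions are sampled
# with replacement, so C and gamma are not limited to a fixed set of values.
from scipy.stats import loguniform

param_dist = {
    "C": loguniform(1e-10, 1e10),      # sample C on a log scale
    "gamma": loguniform(1e-10, 1e10),  # sample gamma on a log scale
    "kernel": ['rbf', 'poly']
}
# param_dist could be passed to RandomizedSearchCV as param_distributions in
# place of the list-based search space defined below.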

We increased the number of C and gamma values from 3 to 21 for the random search. For gamma, the sklearn values of 'scale' and 'auto' are also included, so there are a total of 23 values for gamma.

# List of C values
C_range = np.logspace(-10, 10, 21)
print(f'The list of values for C are {C_range}')

# List of gamma values
gamma_range = np.logspace(-10, 10, 21)
print(f'The list of values for gamma are {gamma_range}')
The list of values for C are [1.e-10 1.e-09 1.e-08 1.e-07 1.e-06 1.e-05 1.e-04 1.e-03 1.e-02 1.e-01
 1.e+00 1.e+01 1.e+02 1.e+03 1.e+04 1.e+05 1.e+06 1.e+07 1.e+08 1.e+09
 1.e+10]
The list of values for gamma are [1.e-10 1.e-09 1.e-08 1.e-07 1.e-06 1.e-05 1.e-04 1.e-03 1.e-02 1.e-01
 1.e+00 1.e+01 1.e+02 1.e+03 1.e+04 1.e+05 1.e+06 1.e+07 1.e+08 1.e+09
 1.e+10]

The same scoring metric and cross-validation values used in grid search are used for the random search. But for a random search, we need to specify a value for n_iter, the number of parameter combinations sampled. We are randomly testing 100 combinations for this example.

# Define the search space
param_grid = { 
    # Regularization parameter.
    "C": C_range,
    # Kernel type
    "kernel": ['rbf', 'poly'],
    # Gamma is the Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
    "gamma": gamma_range.tolist()+['scale', 'auto']
    }

# Set up score
scoring = ['accuracy']

# Set up the k-fold cross-validation
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Define random search
random_search = RandomizedSearchCV(estimator=svc, 
                           param_distributions=param_grid, 
                           n_iter=100,
                           scoring=scoring, 
                           refit='accuracy', 
                           n_jobs=-1, 
                           cv=kfold, 
                           verbose=0)

# Fit random search
random_result = random_search.fit(X_train_transformed, y_train)

# Print random search summary
random_result
RandomizedSearchCV(cv=StratifiedKFold(n_splits=3, random_state=0, shuffle=True),
                   estimator=SVC(), n_iter=100, n_jobs=-1,
                   param_distributions={'C': array([1.e-10, 1.e-09, 1.e-08, 1.e-07, 1.e-06, 1.e-05, 1.e-04, 1.e-03,
       1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05,
       1.e+06, 1.e+07, 1.e+08, 1.e+09, 1.e+10]),
                                        'gamma': [1e-10, 1e-09, 1e-08, 1e-07,
                                                  1e-06, 1e-05, 0.0001, 0.001,
                                                  0.01, 0.1, 1.0, 10.0, 100.0,
                                                  1000.0, 10000.0, 100000.0,
                                                  1000000.0, 10000000.0,
                                                  100000000.0, 1000000000.0,
                                                  10000000000.0, 'scale',
                                                  'auto'],
                                        'kernel': ['rbf', 'poly']},
                   refit='accuracy', scoring=['accuracy'])

The random search cross-validation results show that a C value of 1000, a gamma value of 0.0001, and an ‘rbf’ kernel gave the best results. The best average cross-validation accuracy is 97.14%, and the accuracy score on the testing dataset is 97.37%, slightly lower than the grid search result.

Note that each time the random search is run, a different set of hyperparameter combinations is sampled, so the best hyperparameters can change from run to run (see the sketch below).
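
If reproducibility is needed, RandomizedSearchCV accepts a random_state parameter. The sketch below shows the same setup with the sampling fixed; we did not fix it for the run above, and the object is only defined here, not fitted.

# A sketch: fixing random_state makes the sampled hyperparameter combinations
# reproducible across runs (not fitted here).
random_search_repro = RandomizedSearchCV(estimator=svc, 
                           param_distributions=param_grid, 
                           n_iter=100,
                           scoring=scoring, 
                           refit='accuracy', 
                           n_jobs=-1, 
                           cv=kfold, 
                           random_state=0)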

# Print the best accuracy score for the training dataset
print(f'The best accuracy score for the training dataset is {random_result.best_score_:.4f}')

# Print the hyperparameters for the best score
print(f'The best hyperparameters are {random_result.best_params_}')

# Print the best accuracy score for the testing dataset
print(f'The accuracy score for the testing dataset is {random_search.score(X_test_transformed, y_test):.4f}')
The best accuracy score for the training dataset is 0.9714
The best hyperparameters are {'kernel': 'rbf', 'gamma': 0.0001, 'C': 1000.0}
The accuracy score for the testing dataset is 0.9737

Step 10: Hyperparameter Tuning Using Bayesian Optimization

In step 10, we apply Bayesian optimization on the same search space as the random search.

There are different types of Bayesian optimization. Hyperopt is used in this example.

We defined an objective function that takes in the hyperparameters and returns a loss. Since the goal is to maximize accuracy, we take max(scores) as best_score and set loss = -best_score, so that minimizing the loss maximizes the accuracy.

fmin is used to optimize the objective function. Hyperopt currently has three algorithms. They are random search, Tree of Parzen Estimators (TPE), and adaptive TPE. We are using TPE as the search algorithm.

# Space
space = {
    'C' : hp.choice('C', C_range),
    'gamma' : hp.choice('gamma', gamma_range.tolist()+['scale', 'auto']),
    'kernel' : hp.choice('kernel', ['rbf', 'poly'])
}

# Set up the k-fold cross-validation
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Objective function
def objective(params):
    
    svc = SVC(**params)
    scores = cross_val_score(svc, X_train_transformed, y_train, cv=kfold, scoring='accuracy', n_jobs=-1)

    # Extract the best score
    best_score = max(scores)

    # Loss must be minimized
    loss = - best_score

    # Dictionary with information for evaluation
    return {'loss': loss, 'params': params, 'status': STATUS_OK}

# Trials to track progress
bayes_trials = Trials()

# Optimize
best = fmin(fn = objective, space = space, algo = tpe.suggest, max_evals = 100, trials = bayes_trials)
100%|██████████| 100/100 [00:06<00:00, 16.61it/s, best loss: -0.9867549668874173]

After the Bayesian optimization search, we get the best loss of -0.9868, meaning that the accuracy value is 98.68%.

We can print out the index of the best parameters in the search space as well as their values. We got a C value of 1, a gamma value of 0.1, and a kernel of 'poly'.

# Print the index of the best parameters
print(best)

# Print the values of the best parameters
print(space_eval(space, best))
{'C': 10, 'gamma': 9, 'kernel': 1}
{'C': 1.0, 'gamma': 0.1, 'kernel': 'poly'}

Next, we apply the best hyperparameters to the SVC and make predictions. It gives an accuracy score of 94.74%.

Theoretically, the Bayesian optimization algorithm is more efficient than the random search algorithm. This is because random search picks hyperparameters for each model independently, while Bayesian optimization utilizes previous models’ information when choosing the hyperparameters for the next model.

The best model from Bayesian optimization does not perform as well, probably because max_evals = 100 is not large enough for the algorithm to find the optimal values.
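
Before refitting, we can also inspect the search history stored in the Trials object. This is a small sketch using hyperopt's Trials attributes; the exact output depends on the run above.

# A sketch of inspecting the Bayesian optimization history kept in Trials
print(bayes_trials.best_trial['result'])   # best loss and its parameters
print(sorted(bayes_trials.losses())[:5])   # the five lowest losses found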

# Train model using the best parameters
svc_bo = SVC(C=space_eval(space, best)['C'], gamma=space_eval(space, best)['gamma'], kernel=space_eval(space, best)['kernel']).fit(X_train_transformed,y_train)

# Print the best accuracy score for the testing dataset
print(f'The accuracy score for the testing dataset is {svc_bo.score(X_test_transformed, y_test):.4f}')
The accuracy score for the testing dataset is 0.9474

Step 11: Put All Code Together

###### Step 1: Support Vector Machine (SVM) algorithm

# No code in this step


###### Step 2: Support Vector Machine (SVM) Hyperparameters

# No code in this step


###### Step 3: Import Libraries

# Dataset
from sklearn import datasets

# Data processing
import pandas as pd
import numpy as np

# Standardize the data
from sklearn.preprocessing import StandardScaler

# Modeling 
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hyperparameter tuning
from sklearn.model_selection import StratifiedKFold, GridSearchCV, RandomizedSearchCV, cross_val_score
from hyperopt import tpe, STATUS_OK, Trials, hp, fmin, space_eval


###### Step 4: Read Data

# Load the breast cancer dataset
data = datasets.load_breast_cancer()

# Put the data in pandas dataframe format
df = pd.DataFrame(data=data.data, columns=data.feature_names)
df['target']=data.target

# Check the data information
df.info()

# Check the target value distribution
df['target'].value_counts(normalize=True)


###### Step 5: Train Test Split

# Train test split
X_train, X_test, y_train, y_test = train_test_split(df[df.columns.difference(['target'])], df['target'], test_size=0.2, random_state=42)

# Check the number of records in training and testing dataset.
print(f'The training dataset has {len(X_train)} records.')
print(f'The testing dataset has {len(X_test)} records.')


###### Step 6: Standardization

# Initiate scaler
sc = StandardScaler()

# Standardize the training dataset
X_train_transformed = pd.DataFrame(sc.fit_transform(X_train),index=X_train.index, columns=X_train.columns)

# Standardize the testing dataset
X_test_transformed = pd.DataFrame(sc.transform(X_test),index=X_test.index, columns=X_test.columns)

# Summary statistics after standardization
X_train_transformed.describe().T


###### Step 7: Support Vector Machine (SVM) Default Hyperparameters

# Check default values
svc = SVC()
params = svc.get_params()
params_df = pd.DataFrame(params, index=[0])
params_df.T

# Run model
svc.fit(X_train_transformed, y_train)

# Accuracy score
print(f'The accuracy score of the model is {svc.score(X_test_transformed, y_test):.4f}')


###### Step 8: Hyperparameter Tuning Using Grid Search

# List of C values
C_range = np.logspace(-1, 1, 3)
print(f'The list of values for C are {C_range}')

# List of gamma values
gamma_range = np.logspace(-1, 1, 3)
print(f'The list of values for gamma are {gamma_range}')

# Define the search space
param_grid = { 
    # Regularization parameter.
    "C": C_range,
    # Kernel type
    "kernel": ['rbf', 'poly'],
    # Gamma is the Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
    "gamma": gamma_range.tolist()+['scale', 'auto']
    }

# Set up score
scoring = ['accuracy']

# Set up the k-fold cross-validation
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Define grid search
grid_search = GridSearchCV(estimator=svc, 
                           param_grid=param_grid, 
                           scoring=scoring, 
                           refit='accuracy', 
                           n_jobs=-1, 
                           cv=kfold, 
                           verbose=0)

# Fit grid search
grid_result = grid_search.fit(X_train_transformed, y_train)

# Print grid search summary
grid_result

# Print the best accuracy score for the training dataset
print(f'The best accuracy score for the training dataset is {grid_result.best_score_:.4f}')

# Print the hyperparameters for the best score
print(f'The best hyperparameters are {grid_result.best_params_}')

# Print the best accuracy score for the testing dataset
print(f'The accuracy score for the testing dataset is {grid_search.score(X_test_transformed, y_test):.4f}')


###### Step 9: Hyperparameter Tuning Using Random Search

# List of C values
C_range = np.logspace(-10, 10, 21)
print(f'The list of values for C are {C_range}')

# List of gamma values
gamma_range = np.logspace(-10, 10, 21)
print(f'The list of values for gamma are {gamma_range}')

# Define the search space
param_grid = { 
    # Regularization parameter.
    "C": C_range,
    # Kernel type
    "kernel": ['rbf', 'poly'],
    # Gamma is the Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
    "gamma": gamma_range
    }

# Set up score
scoring = ['accuracy']

# Set up the k-fold cross-validation
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Define random search
random_search = RandomizedSearchCV(estimator=svc, 
                           param_distributions=param_grid, 
                           n_iter=100,
                           scoring=scoring, 
                           refit='accuracy', 
                           n_jobs=-1, 
                           cv=kfold, 
                           verbose=0)

# Fit random search
random_result = random_search.fit(X_train_transformed, y_train)

# Print random search summary
random_result

# Print the best accuracy score for the training dataset
print(f'The best accuracy score for the training dataset is {random_result.best_score_:.4f}')

# Print the hyperparameters for the best score
print(f'The best hyperparameters are {random_result.best_params_}')

# Print the best accuracy score for the testing dataset
print(f'The accuracy score for the testing dataset is {random_search.score(X_test_transformed, y_test):.4f}')


###### Step 10: Hyperparameter Tuning Using Bayesian Optimization

# Space
space = {
    'C' : hp.choice('C', C_range),
    'gamma' : hp.choice('gamma', gamma_range.tolist()+['scale', 'auto']),
    'kernel' : hp.choice('kernel', ['rbf', 'poly'])
}

# Set up the k-fold cross-validation
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Objective function
def objective(params):
    
    svc = SVC(**params)
    scores = cross_val_score(svc, X_train_transformed, y_train, cv=kfold, scoring='accuracy', n_jobs=-1)

    # Extract the best score
    best_score = max(scores)

    # Loss must be minimized
    loss = - best_score

    # Dictionary with information for evaluation
    return {'loss': loss, 'params': params, 'status': STATUS_OK}

# Trials to track progress
bayes_trials = Trials()

# Optimize
best = fmin(fn = objective, space = space, algo = tpe.suggest, max_evals = 100, trials = bayes_trials)

# Print the index of the best parameters
print(best)

# Print the values of the best parameters
print(space_eval(space, best))

# Train model using the best parameters
svc_bo = SVC(C=space_eval(space, best)['C'], gamma=space_eval(space, best)['gamma'], kernel=space_eval(space, best)['kernel']).fit(X_train_transformed,y_train)

# Print the best accuracy score for the testing dataset
print(f'The accuracy score for the testing dataset is {svc_bo.score(X_test_transformed, y_test):.4f}')

Summary

In this tutorial, we covered how to tune Support Vector Machine (SVM) hyperparameters using Python. You learned:

  • What’s the intuition for the Support Vector Machine (SVM) algorithm?
  • What are the most important hyperparameters for Support Vector Machine (SVM)?
  • How to do hyperparameter tuning for Support Vector Machine (SVM) in Python using grid search, random search, and Bayesian optimization?

We used a toy dataset for this example to illustrate the hyperparameter tuning process. In real projects, I usually see random search and Bayesian optimization produce the best model performances.

For more information about data science and machine learning, please check out my YouTube channel and Medium Page or follow me on LinkedIn.
