
gpr

Author: Christian Elias Anderssen Dalan (ceadyy@gmail.com), with the help of Audun Skau Hansen (a.s.hansen@kjemi.uio.no)

April 2022

Kernel

Kernel class Author: Christian Elias Anderssen Dalan ceadyy@gmail.com


covariance_function: The function that will be used to calculate the covariance between our datasets

Source code in btjenesten/gpr.py
class Kernel():
    """
    Kernel class 
    Author: Christian Elias Anderssen Dalan <ceadyy@gmail.com>

    Parameters:
    -----------
    covariance_function:
    The function that will be used to calculate the covariance between our datasets

    """

    def __init__(self, covariance_function):
        self.covariance_function = covariance_function

    def K(self, X1, X2, params):
        """
        Function that returns the covariance matrix given our datasets

        Parameters:
        -----------
        X1: Dataset 1 (Often the training set)

        X2: Dataset 2 (Often the target set)

        params: Hyperparameters forwarded to the covariance function

        Returns:
        ----------
        self.covariance_function(X1, X2, params) : covariance matrix given our datasets X1 and X2.
        """
        if np.isscalar(X1):
            X1 = np.array([X1])
        if np.isscalar(X2):
            X2 = np.array([X2])

        return self.covariance_function(X1, X2, params)

K(X1, X2, params)

Function that returns the covariance matrix given our datasets


X1: Dataset 1 (Often the training set)

X2: Dataset 2 (Often the target set)

params: Hyperparameters forwarded to the covariance function


self.covariance_function(X1, X2, params) : covariance matrix given our datasets X1 and X2.

Source code in btjenesten/gpr.py
def K(self, X1, X2, params):
    """
    Function that returns the covariance matrix given our datasets

    Parameters:
    -----------
    X1: Dataset 1 (Often the training set)

    X2: Dataset 2 (Often the target set)

    params: Hyperparameters forwarded to the covariance function

    Returns:
    ----------
    self.covariance_function(X1, X2, params) : covariance matrix given our datasets X1 and X2.
    """
    if np.isscalar(X1):
        X1 = np.array([X1])
    if np.isscalar(X2):
        X2 = np.array([X2])

    return self.covariance_function(X1, X2, params)
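A minimal usage sketch (not part of the module source; it assumes the package layout btjenesten/gpr.py and btjenesten/kernels.py shown in the source references, with toy data invented here):

import numpy as np
from btjenesten.gpr import Kernel
from btjenesten.kernels import RBF

X1 = np.array([[0.0], [1.0], [2.0]])  # three training points with one feature each
X2 = np.array([[0.5], [1.5]])         # two target points

kernel = Kernel(RBF)
K12 = kernel.K(X1, X2, 1.0)           # params is forwarded to RBF as the length scale l
print(K12.shape)                      # (3, 2)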

Regressor

Gaussian process regressor class

Author: Christian Elias Anderssen Dalan ceadyy@gmail.com


kernel: Specifies the type of covariance function we want for our regressor. If none is provided, the default is the radial basis function.

training_data_X: Training data inputs, also called features

training_data_Y: Training data outputs, also called labels

params: Hyperparameters forwarded to the covariance function (for the default RBF kernel, the length scale l)

Source code in btjenesten/gpr.py
class Regressor():
    """
    Gaussian process regressor class

    Author: Christian Elias Anderssen Dalan <ceadyy@gmail.com>

    Parameters:
    -----------
    kernel:
    Specifies the type of covariance function we want for our regressor.
    If none is provided, the default is the radial basis function.

    training_data_X: 
    Training data inputs, also called features

    training_data_Y:
    Training data outputs, also called labels

    params:
    Hyperparameters forwarded to the covariance function (for the default RBF kernel, the length scale l).

    """

    def __init__(self, training_data_X, training_data_Y, kernel = None, params = 1, normalize = False, normalize_log = False):
        if kernel is None:
            self.kernel = Kernel(RBF)
        else:
            self.kernel = Kernel(kernel)



        msg = "Expected 2D array. If you only have one feature reshape training data using array.reshape(-1, 1)"
        assert training_data_X.ndim != 1, msg

        if normalize:
            # Normalize X and Y data
            self.normalize, self.recover = normalize_training_data_x(training_data_X)
            self.training_data_X = self.normalize(training_data_X)

            self.normalization_factor_y = np.abs(training_data_Y).max()
            self.training_data_Y = training_data_Y/self.normalization_factor_y


        else:
            self.normalize, self.recover = no_normalization(training_data_X)
            self.normalization_factor_y = 1.0
            self.training_data_X = training_data_X
            self.training_data_Y = training_data_Y

        if normalize_log:
            # Log-transform X data and scale Y data
            self.normalize, self.recover = normalize_training_data_x_log(training_data_X)
            self.training_data_X = self.normalize(training_data_X)

            self.normalization_factor_y = np.abs(training_data_Y).max()
            self.training_data_Y = training_data_Y/self.normalization_factor_y

        self.params = params

    def predict(self, input_data_X, training_data_X = None, training_data_Y = None, return_variance = False):
        """
        Predicts output values for some input data given a set of training data 

        Author: Christian Elias Anderssen Dalan <ceadyy@gmail.com>

        Parameters:
        -----------
        input_data_X:
        Input features that the gpr will evaluate.

        training_data_X:
        training data inputs.

        training_data_Y:
        training data outputs.

        return_variance:
        Returns variance for each prediction if this is true

        Returns:
        -----------
        predicted_y:
        Predicted output data given corresponding input_data_X and a set of training data
        inputs and outputs (training_data_X, training_data_Y)

        predicted_variance:
        Predicted variance for each point of predicted output.
        """

        if training_data_X is None or training_data_Y is None:
            K_11 = self.kernel.K(self.training_data_X, self.training_data_X, self.params)
            K_12 = self.kernel.K(self.training_data_X, self.normalize(input_data_X), self.params)
            K_21 = K_12.T
            K_22 = self.kernel.K(self.normalize(input_data_X), self.normalize(input_data_X), self.params)
            assert (np.linalg.det(K_11) != 0), "Singular matrix. Training data might have duplicates."
            KT = np.linalg.solve(K_11, K_12).T

            predicted_y = KT.dot(self.training_data_Y)

        else:
            K_11 = self.kernel.K(self.normalize(training_data_X), self.normalize(training_data_X), self.params)
            K_12 = self.kernel.K(self.normalize(training_data_X), self.normalize(input_data_X), self.params)
            K_21 = self.kernel.K(self.normalize(input_data_X), self.normalize(training_data_X), self.params)
            K_22 = self.kernel.K(self.normalize(input_data_X), self.normalize(input_data_X), self.params)

            assert (np.linalg.det(K_11) != 0), "Singular matrix. Training data might have duplicates."
            KT = np.linalg.solve(K_11, K_12).T

            predicted_y = KT.dot(training_data_Y)

        predicted_y = predicted_y.ravel()*self.normalization_factor_y

        if return_variance:
            predicted_variance = np.diag(K_22 - KT @ K_12)

            y_var_negative = predicted_variance < 0
            if np.any(y_var_negative):
                predicted_variance.setflags(write=True)
                predicted_variance[y_var_negative] = 0

            return predicted_y, predicted_variance
        else:
            return predicted_y

    def score(self, input_data_X, input_data_Y):
        """
        Returns the average and maximum error of our predict method.

        Author: Christian Elias Anderssen Dalan <ceadyy@gmail.com>

        Parameters:
        -----------
        input_data_X:
        input data that the gpr will predict corresponding output data to.

        input_data_Y:
        Corresponding true output data for input_data_X.

        Returns:
        --------
        avg_error - the average error between the predicted values and the true values
        max_error - the maximum error between the predicted values and the true values
        """

        predicted_y = self.predict(input_data_X)
        avg_error = np.mean(np.abs(predicted_y - input_data_Y))
        max_error = np.max(np.abs(predicted_y - input_data_Y))
        return avg_error, max_error

    def aquisition(self, minimize_prediction=True, x0 = None, l=1.2, delta=0.1, method = "COBYLA"):
        """
        Returns the point at which our model function is predicted to attain its highest (or lowest) value.

        Author: Christian Elias Anderssen Dalan <ceadyy@gmail.com>

        Parameters:
        -----------
        minimize_prediction:
        If your task is to minimize some model function, this parameter is True. If your task is to maximize the model function
        this parameter is False.

        l:
        Exploration parameter. Scales how much the standard deviation should impact the function value. l = 1
        means that the function maximized/minimized equals predicted value +/- the standard deviation.

        x0:
        Initial guess. If not specified, the training point with the smallest/largest output value is used.

        delta:
        Hyperparameter that tunes UCB around measured datapoints.

        Returns:
        --------
        p - The predicted point at which an evaluation would yield the highest/lowest value
        """
        if minimize_prediction: #Minimization process
            if x0 is None:
                x0_index = np.where(self.training_data_Y == np.min(self.training_data_Y))

                x0 = self.training_data_X[x0_index]

            objective_function = lambda x, predict = self.predict : predict(x)
            std_x = lambda x, predict = self.predict : np.sqrt(np.abs(predict(x, return_variance = True)[1]))  # variance is already per-point
            objective_noise = lambda x, std = std_x : (1 - std(x))**2 * delta + std(x)

            ucb = lambda x, exploit = objective_function, explore = objective_noise: exploit(x) + l*explore(x)

            def UCB(x, f = ucb):
                x = x.reshape(1, -1)
                return f(x)

            minimization = minimize(UCB, x0, method = method)
            p = minimization.x
            return p

        else: #Maximization process
            if x0 is None:
                x0_index = np.where(self.training_data_Y == np.max(self.training_data_Y))

                x0 = self.training_data_X[x0_index]

            objective_function = lambda x, predict = self.predict : predict(x)
            std_x = lambda x, predict = self.predict : np.sqrt(np.abs(predict(x, return_variance = True)[1]))  # variance is already per-point
            objective_noise = lambda x, std = std_x : (1 - std(x))**2 * delta + std(x)

            ucb = lambda x, exploit = objective_function, explore = objective_noise : -1*(exploit(x) + l*explore(x))

            def UCB(x, f = ucb):
                x = x.reshape(1, -1)
                return f(x)

            minimization = minimize(UCB, x0, method = method)
            p = minimization.x
            return p

    def update(self, new_X, new_Y, tol=1e-5):
        """
        Updates the training data with newly measured data.

        Author: Christian Elias Anderssen Dalan <ceadyy@gmail.com>

        Parameters:
        -----------
        new_X:
        Set of new features that have been measured.

        new_Y:
        Corresponding set of labels to new_X.

        tol:
        Tolerance within which a new point counts as a duplicate of an existing training point. If this is too low you may encounter singular
        covariance matrices.
        """

        assert type(new_Y) is np.ndarray, "new_Y must be a numpy array."
        assert type(new_X) is np.ndarray, "new_X must be a numpy array."

        for measurement in new_X.reshape(-1, self.training_data_X.shape[1]):
            for i in range(len(self.training_data_X)):
                if np.allclose(measurement, self.training_data_X[i], atol = tol):
                    print(f"The model has most likely converged! {measurement} already exists in the training set.")
                    return True
        """
        old_X_shape = self.training_data_X.shape
        old_Y_shape = len(self.training_data_Y)

        new_X_shape = np.array(self.training_data_X.shape)
        new_Y_shape = len(new_Y)

        new_X_shape[0] += new_X.shape[0]
        new_Y_shape += len(new_Y)

        new_training_data_X = np.zeros(new_X_shape)
        new_training_data_Y = np.zeros(new_Y_shape)

        new_training_data_X[:-old_X_shape.shape[0]] = self.training_data_X
        new_training_data_X[-new_X.shape[0]:] = new_X 

        new_training_data_Y[:-old_Y_shape] = self.training_data_Y
        new_training_data_Y[-new_Y_shape:] = new_Y
        """
        #print("X1 shape ",self.training_data_X.shape)
        #print("X2 shape ",.shape)
        new_X = new_X.reshape(-1, self.training_data_X.shape[1])

        new_training_data_X = np.concatenate((self.training_data_X, self.normalize(new_X)))
        new_training_data_Y = np.concatenate((self.training_data_Y, new_Y/self.normalization_factor_y))

        self.training_data_X = new_training_data_X
        self.training_data_Y = new_training_data_Y

        return False
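A minimal construction sketch (not from the source; the toy data is invented) illustrating the 2D-input requirement and the normalize option:

import numpy as np
from btjenesten.gpr import Regressor

X = np.linspace(0, 4, 9).reshape(-1, 1)   # inputs must be 2D, hence the reshape
Y = np.sin(X).ravel()

gpr = Regressor(X, Y)                     # defaults to the RBF kernel, no normalization
gpr_n = Regressor(X, Y, normalize=True)   # inputs scaled to roughly [-1, 1], outputs by max |Y|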

aquisition(minimize_prediction=True, x0=None, l=1.2, delta=0.1, method='COBYLA')

Returns the point at which our model function is predicted to attain its highest (or lowest) value.

Author: Christian Elias Anderssen Dalan ceadyy@gmail.com


minimize_prediction: If your task is to minimize some model function, this parameter is True. If your task is to maximize the model function this parameter is False.

l: Exploration parameter. Scales how much the standard deviation should impact the function value. l = 1 means that the function maximized/minimized equals predicted value +/- the standard deviation.

x0: Initial guess. If not specified, the training point with the smallest/largest output value is used.

delta: Hyperparameter that tunes UCB around measured datapoints.


p - The predicted point at which an evaluation would yield the highest/lowest value

Source code in btjenesten/gpr.py
def aquisition(self, minimize_prediction=True, x0 = None, l=1.2, delta=0.1, method = "COBYLA"):
    """
    Returns the point at which our model function is predicted to attain its highest (or lowest) value.

    Author: Christian Elias Anderssen Dalan <ceadyy@gmail.com>

    Parameters:
    -----------
    minimize_prediction:
    If your task is to minimize some model function, this parameter is True. If your task is to maximize the model function
    this parameter is False.

    l:
    Exploration parameter. Scales how much the standard deviation should impact the function value. l = 1
    means that the function maximized/minimized equals predicted value +/- the standard deviation.

    x0:
    Initial guess. If not specified, the training point with the smallest/largest output value is used.

    delta:
    Hyperparameter that tunes UCB around measured datapoints.

    Returns:
    --------
    p - The predicted point at which an evaluation would yield the highest/lowest value
    """
    if minimize_prediction: #Minimization process
        if x0 is None:
            x0_index = np.where(self.training_data_Y == np.min(self.training_data_Y))

            x0 = self.training_data_X[x0_index]

        objective_function = lambda x, predict = self.predict : predict(x)
        std_x = lambda x, predict = self.predict : np.sqrt(np.abs(predict(x, return_variance = True)[1]))  # variance is already per-point
        objective_noise = lambda x, std = std_x : (1 - std(x))**2 * delta + std(x)

        ucb = lambda x, exploit = objective_function, explore = objective_noise: exploit(x) + l*explore(x)

        def UCB(x, f = ucb):
            x = x.reshape(1, -1)
            return f(x)

        minimization = minimize(UCB, x0, method = method)
        p = minimization.x
        return p

    else: #Maximization process
        if x0 is None:
            x0_index = np.where(self.training_data_Y == np.max(self.training_data_Y))

            x0 = self.training_data_X[x0_index]

        objective_function = lambda x, predict = self.predict : predict(x)
        std_x = lambda x, predict = self.predict : np.sqrt(np.abs(predict(x, return_variance = True)[1]))  # variance is already per-point
        objective_noise = lambda x, std = std_x : (1 - std(x))**2 * delta + std(x)

        ucb = lambda x, exploit = objective_function, explore = objective_noise : -1*(exploit(x) + l*explore(x))

        def UCB(x, f = ucb):
            x = x.reshape(1, -1)
            return f(x)

        minimization = minimize(UCB, x0, method = method)
        p = minimization.x
        return p
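A hedged sketch (not from the source) of how aquisition and update can drive a simple optimization loop; the black-box function measure is hypothetical:

import numpy as np
from btjenesten.gpr import Regressor

def measure(x):
    # hypothetical function we want to minimize
    return np.sum((np.atleast_2d(x) - 1.5)**2, axis=1)

X = np.array([[0.0], [1.0], [3.0]])
gpr = Regressor(X, measure(X))

for _ in range(10):
    p = gpr.aquisition(minimize_prediction=True)         # propose the next point to evaluate
    converged = gpr.update(p.reshape(1, -1), measure(p))
    if converged:                                        # update returns True once a proposal repeats
        break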

predict(input_data_X, training_data_X=None, training_data_Y=None, return_variance=False)

Predicts output values for some input data given a set of training data

Author: Christian Elias Anderssen Dalan ceadyy@gmail.com


input_data_X: Input features that the gpr will evaluate.

training_data_X: training data inputs.

training_data_Y: training data outputs.

return_variance: Returns variance for each prediction if this is true


predicted_y: Predicted output data given corresponding input_data_X and a set of training data inputs and outputs (training_data_X, training_data_Y)

predicted_variance: Predicted variance for each point of predicted output.

Source code in btjenesten/gpr.py
def predict(self, input_data_X, training_data_X = None, training_data_Y = None, return_variance = False):
    """
    Predicts output values for some input data given a set of training data 

    Author: Christian Elias Anderssen Dalan <ceadyy@gmail.com>

    Parameters:
    -----------
    input_data_X:
    Input features that the gpr will evaluate.

    training_data_X:
    training data inputs.

    training_data_Y:
    training data outputs.

    return_variance:
    Returns variance for each prediction if this is true

    Returns:
    -----------
    predicted_y:
    Predicted output data given corresponding input_data_X and a set of training data
    inputs and outputs (training_data_X, training_data_Y)

    predicted_variance:
    Predicted variance for each point of predicted output.
    """

    if training_data_X is None or training_data_Y is None:
        K_11 = self.kernel.K(self.training_data_X, self.training_data_X, self.params)
        K_12 = self.kernel.K(self.training_data_X, self.normalize(input_data_X), self.params)
        K_21 = K_12.T
        K_22 = self.kernel.K(self.normalize(input_data_X), self.normalize(input_data_X), self.params)
        assert (np.linalg.det(K_11) != 0), "Singular matrix. Training data might have duplicates."
        KT = np.linalg.solve(K_11, K_12).T

        predicted_y = KT.dot(self.training_data_Y)

    else:
        K_11 = self.kernel.K(self.normalize(training_data_X), self.normalize(training_data_X), self.params)
        K_12 = self.kernel.K(self.normalize(training_data_X), self.normalize(input_data_X), self.params)
        K_21 = self.kernel.K(self.normalize(input_data_X), self.normalize(training_data_X), self.params)
        K_22 = self.kernel.K(self.normalize(input_data_X), self.normalize(input_data_X), self.params)

        assert (np.linalg.det(K_11) != 0), "Singular matrix. Training data might have duplicates."
        KT = np.linalg.solve(K_11, K_12).T

        predicted_y = KT.dot(training_data_Y)

    predicted_y = predicted_y.ravel()*self.normalization_factor_y

    if return_variance:
        predicted_variance = np.diag(K_22 - KT @ K_12)

        y_var_negative = predicted_variance < 0
        if np.any(y_var_negative):
            predicted_variance.setflags(write=True)
            predicted_variance[y_var_negative] = 0

        return predicted_y, predicted_variance
    else:
        return predicted_y
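A usage sketch (not from the source; the toy data is invented):

import numpy as np
from btjenesten.gpr import Regressor

X = np.linspace(0, 4, 9).reshape(-1, 1)
Y = np.sin(X).ravel()
gpr = Regressor(X, Y)

X_new = np.array([[0.5], [2.5]])
y_pred, y_var = gpr.predict(X_new, return_variance=True)
# y_pred and y_var both have shape (2,); negative variances are clipped to 0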

score(input_data_X, input_data_Y)

Returns the average and maximum error of our predict method.

Author: Christian Elias Anderssen Dalan ceadyy@gmail.com


input_data_X: input data that the gpr will predict corresponding output data to.

input_data_Y: Corresponding true output data for input_data_X.


avg_error - the average error between the predicted values and the true values

max_error - the maximum error between the predicted values and the true values

Source code in btjenesten/gpr.py
def score(self, input_data_X, input_data_Y):
    """
    Returns the average and maximum error of our predict method.

    Author: Christian Elias Anderssen Dalan <ceadyy@gmail.com>

    Parameters:
    -----------
    input_data_X:
    input data that the gpr will predict corresponding output data to.

    input_data_Y:
    Corresponding true output data for input_data_X.

    Returns:
    --------
    avg_error - the average error between the predicted values and the true values
    max_error - the maximum error between the predicted values and the true values
    """

    predicted_y = self.predict(input_data_X)
    avg_error = np.mean(np.abs(predicted_y - input_data_Y))
    max_error = np.max(np.abs(predicted_y - input_data_Y))
    return avg_error, max_error
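A usage sketch (not from the source; the toy data is invented): score held-out points against their true values.

import numpy as np
from btjenesten.gpr import Regressor

X = np.linspace(0, 4, 9).reshape(-1, 1)
Y = np.sin(X).ravel()
gpr = Regressor(X, Y)

X_test = np.array([[0.25], [1.75], [3.25]])
avg_err, max_err = gpr.score(X_test, np.sin(X_test).ravel())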

update(new_X, new_Y, tol=1e-05)

Updates the training data with newly measured data.

Author: Christian Elias Anderssen Dalan ceadyy@gmail.com


new_X: Set of new features that have been measured.

new_Y: Corresponding set of labels to new_X.

tol: Tolerance within which a new point counts as a duplicate of an existing training point. If this is too low you may encounter singular covariance matrices.

Source code in btjenesten/gpr.py
def update(self, new_X, new_Y, tol=1e-5):
    """
    Updates the training data with newly measured data.

    Author: Christian Elias Anderssen Dalan <ceadyy@gmail.com>

    Parameters:
    -----------
    new_X:
    Set of new features that have been measured.

    new_Y:
    Corresponding set of labels to new_X.

    tol:
    Tolerance within which a new point counts as a duplicate of an existing training point. If this is too low you may encounter singular
    covariance matrices.
    """

    assert type(new_Y) is np.ndarray, "new_Y must be a numpy array."
    assert type(new_X) is np.ndarray, "new_X must be a numpy array."

    for measurement in new_X.reshape(-1, self.training_data_X.shape[1]):
        for i in range(len(self.training_data_X)):
            if np.allclose(measurement, self.training_data_X[i], atol = tol):
                print(f"The model has most likely converged! {measurement} already exists in the training set.")
                return True
    """
    old_X_shape = self.training_data_X.shape
    old_Y_shape = len(self.training_data_Y)

    new_X_shape = np.array(self.training_data_X.shape)
    new_Y_shape = len(new_Y)

    new_X_shape[0] += new_X.shape[0]
    new_Y_shape += len(new_Y)

    new_training_data_X = np.zeros(new_X_shape)
    new_training_data_Y = np.zeros(new_Y_shape)

    new_training_data_X[:-old_X_shape.shape[0]] = self.training_data_X
    new_training_data_X[-new_X.shape[0]:] = new_X 

    new_training_data_Y[:-old_Y_shape] = self.training_data_Y
    new_training_data_Y[-new_Y_shape:] = new_Y
    """
    #print("X1 shape ",self.training_data_X.shape)
    #print("X2 shape ",.shape)
    new_X = new_X.reshape(-1, self.training_data_X.shape[1])

    new_training_data_X = np.concatenate((self.training_data_X, self.normalize(new_X)))
    new_training_data_Y = np.concatenate((self.training_data_Y, new_Y/self.normalization_factor_y))

    self.training_data_X = new_training_data_X
    self.training_data_Y = new_training_data_Y

    return False

no_normalization(training_data)

generate functions which essentially do nothing

Author: No-one

Source code in btjenesten/gpr.py
def no_normalization(training_data):
    """
    generate functions which essentially do nothing

    Author: No-one
    """
    def normalize(training_data):
        return training_data

    def recover(training_data):
        return training_data

    return normalize, recover

normalize_training_data_x(training_data)

generate functions to normalize and recover unnormalized training data

Author: Audun

(more detailed explanation is required)

Source code in btjenesten/gpr.py
def normalize_training_data_x(training_data):
    """
    generate functions to normalize and recover unnormalized
    training data

    Author: Audun

    (more detailed explanation is required)
    """
    mean  = np.mean(training_data, axis =0 )
    bound = np.max(training_data, axis = 0)-np.min(training_data, axis = 0)

    def normalize(training_data, mean= mean, bound = bound):
        training_data_normalized = training_data - mean[None,:]
        training_data_normalized *= (.5*bound[None, :])**-1
        return training_data_normalized

    def recover(training_data_normalized, mean= mean, bound = bound):
        return training_data_normalized*(.5*bound[None,:]) + mean[None, :]

    return normalize, recover
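A quick round-trip check (not from the source; the data is invented): normalize maps each feature into roughly [-1, 1] and recover inverts it.

import numpy as np
from btjenesten.gpr import normalize_training_data_x

X = np.array([[0.0, 10.0], [2.0, 20.0], [4.0, 40.0]])
normalize, recover = normalize_training_data_x(X)
Xn = normalize(X)                     # each column centered on its mean, scaled by half its range
assert np.allclose(recover(Xn), X)    # recover inverts normalize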

normalize_training_data_x_log(training_data)

generate functions to normalize and recover unnormalized training data

Author: Audun

(more detailed explanation is required)

Source code in btjenesten/gpr.py
def normalize_training_data_x_log(training_data):
    """
    generate functions to normalize and recover unnormalized
    training data

    Author: Audun

    (more detailed explanation is required)
    """
    min_ = np.min(training_data, axis = 0) - 1e-3
    bound = np.max(training_data, axis = 0) - np.min(training_data, axis = 0)

    def normalize(training_data, min_= min_, bound = bound):
        training_data_normalized = training_data 

        return np.log(training_data_normalized)

    def recover(training_data_normalized, min_= min_, bound = bound):
        return np.exp(training_data_normalized)

    return normalize, recover

Constant(X1, X2, k=0.5)

Kernel that returns a constant covariance value between \(x_i\) and \(x_j\). Useful if all values depend on each other equally.


X1: Dataset 1

X2: Dataset 2

k: constant that determines the covariance.

A covariance matrix of shape (len(X1), len(X2)). All matrix elements have the value k.

Source code in btjenesten/kernels.py
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
def Constant(X1, X2, k = 0.5):
    """
    Kernel that returns a constant covariance value between $x_i$ and $x_j$.
    Useful if all values depend on each other equally.

    Parameters:
    -----------
    X1: Dataset 1

    X2: Dataset 2

    k: constant that determines the covariance.

    Returns:
    A covariance matrix of shape (len(X1), len(X2)). All elements have the value k.
    """

    return np.ones((X1.shape[0], X2.shape[0]))*k

Funny_trigonometric(X1, X2, k=1)

Kernel that I made only for fun. May work for extravagant datasets


X1: Dataset 1

X2: Dataset 2

k: constant that determines the frequency of the trigonometric functions.

A covariance matrix that might be a bit crazy.

Source code in btjenesten/kernels.py
def Funny_trigonometric(X1, X2, k = 1):
    """
    Kernel that I made only for fun. May work for extravagant datasets

    Parameters:
    -----------
    X1: Dataset 1

    X2: Dataset 2

    k: constant that determines the frequency of the trigonometric functions.

    Returns:
    A covariance matrix that might be a bit crazy.
    """

    d2 = (X1[:, None] - X2[None, :])**2
    return np.sin(-k*d2) - np.cos(-k*d2)

RBF(X1, X2, l=np.array([1.0]))

Radial basis function of the form: $$ e^{-l * d(X_i,X_j)} $$


X1: Dataset 1

X2: Dataset 2

l: Length scale parameter. Adjusts the covariance between \(x_i\) and \(x_j\): increasing l will decrease the covariance, and vice versa.


A covariance matrix of shape (len(X1), len(X2)), where the elements are \(e^{-l \cdot d(x_i, x_j)}\) and \(d(x_i, x_j)\) is the squared distance between element \(x_i\) in X1 and element \(x_j\) in X2.

Source code in btjenesten/kernels.py
def RBF(X1, X2, l = np.array([1.0])):
    """
    Radial basis function of the form:
    $$
    e^{-l * d(X_i,X_j)}
    $$
    Parameters:
    -----------
    X1: Dataset 1

    X2: Dataset 2

    l: Length scale parameter.
    Adjusts the covariance between $x_i$ and $x_j$:
    increasing l will decrease the covariance, and vice versa.

    Returns:
    -----------
    A covariance matrix of shape (len(X1), len(X2)), where the elements are
    $e^{-l \cdot d(x_i, x_j)}$ and $d(x_i, x_j)$ is the squared distance between element $x_i$ in X1
    and element $x_j$ in X2.
    """

    if type(l) is not np.ndarray:
        # patch for scalar length parameter
        l = np.array([l])

    ld = np.sum(l[None, :]*(X1.reshape(X1.shape[0],-1)[:, None] - X2.reshape(X2.shape[0],-1)[None,])**2, axis = 2)

    return np.exp(-ld)
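A quick numeric check (not from the source): for inputs 0 and 1 against input 0, the entries are exp(0) = 1 and exp(-1) ≈ 0.3679.

import numpy as np
from btjenesten.kernels import RBF

X1 = np.array([[0.0], [1.0]])
X2 = np.array([[0.0]])
print(RBF(X1, X2, l = 1.0))
# [[1.        ]
#  [0.36787944]]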