Knowledge base

Optimization

Kernels overview

5 minute read

In this article What are kernels Kernel density estimation Kernels in Gaussian process Squared exponential kernel Matern kernel ...

Loss functions

7 minute read

In machine learning the loss function (or cost, or cost function) is what lets the machine actually “learn”. The loss function is the function o...

Regularization

1 minute read

Regularization is a process of simplifying complex machine learning models in order to prevent overfitting. It may be applied explicitly by adding a penalty ...

Variational inference

3 minute read

Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...

Distance metrics

7 minute read

In machine learning the distance metrics are used as a measure of similarity between two datapoints - smaller distance means more similarity. They are utiliz...

Root finding algorithms

5 minute read

In numerical analysis, root finding means finding the zeros of a continuous function. In the case of complex and composite functions, finding values of a varia...

Quasi-Newton methods

2 minute read

The classical Newton’s method for optimization is known for being computationally expensive when it comes to calculating the Hessian matrix a...

Conjugate gradient

less than 1 minute read

Conjugate gradient methods are well suited for solving both linear and non-linear systems of equations.

Trust region

4 minute read

Trust region is a general strategy of optimization of an objective function by constructing an approximation of the function around a certain point and measu...

Line search

5 minute read

Generally speaking, line search is one of the two (the other is trust region) major strategies in non-linear optimization. Line search methods first determin...

Nelder-Mead algorithm

1 minute read

The Nelder-Mead optimization algorithm employs the concept of a simplex - a special type of polytope in $n$-dimensional space which has $n+1$ vertices, that i...

Simplex method

6 minute read

Linear programming aims at finding the best outcome of a mathematical problem subject to given constraints. The requirements for the linear programming problems are d...

Newton’s method

4 minute read

Similarly to gradient descent, Newton’s method is used for finding a minimum of a function through iterations. Unlike gradient descent however, it uses a sec...

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

Probability

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Hypothesis test non-parametric statistics

6 minute read

This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Degrees of freedom in statistics

1 minute read

In statistics degrees of freedom is the number of independent elements forming a final statistic which are free to vary without violation of an imposed const...

Probability bounds

5 minute read

When applying a model to a certain process it is mostly impossible to be fully certain about the values of the parameters, the output of the process, and the...

Moment generating functions

1 minute read

In general, moment generating functions (MGF) are used as an alternative to working with the probability density function. The name implies that with MGF it is poss...

Regression

Interpolation

9 minute read

Generally speaking, interpolation is a way of generating new data points which exactly fit into some given set of points. It is used for approximation of com...

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Heteroscedasticity

3 minute read

What is heteroscedasticity

Linear least squares

4 minute read

Ordinary least squares method (OLS) is a widely used method which estimates the parameters of a linear regression by minimizing the sum of the squares of the...

Correlation and variance in linear regression

7 minute read

What is covariance?

LOESS

3 minute read

It is quite common to have a relation between two values represented as a scatterplot. However, visual representation of a scatterplot alone can often be non...

Non-linear regression

7 minute read

Although applying linear regression is straightforward and it has many useful statistical properties, in real life many processes have no...

Calculus

Understanding integrals

5 minute read

Building intuition

Mean value theorem

1 minute read

The mean value theorem states that if a function is continuous and differentiable between two points then there exists a point between them where the tangent...

Taylor series

2 minute read

One usually neglected topic in calculus is understanding Taylor series. At first glance they seem to be rather impractical, however, they form building block...

Differential equations

3 minute read

A differential equation is an equation involving a function and at least one of its derivatives. Here is an example:

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

Understanding derivatives

3 minute read

Building intuition

Limits and continuity

1 minute read

Let’s build an intuition for the formal definition of a limit.

Linear Algebra

Imaginary number and rotation of vectors

5 minute read

Introduction

Vector projection

1 minute read

In order to understand more advanced concepts from vector operations it is good to have an intuition on such terms as vector projection. Below is a two-dimen...

Matrix properties

5 minute read

Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...

Eigenvectors and eigenvalues

4 minute read

In machine learning, eigenvectors and eigenvalues are among the most heavily used concepts; among other things, they are fundamental for dimension reduction and fin...

QR decomposition

4 minute read

QR decomposition may be a numerically stable algorithm for solving systems of linear equations. Similar to singular value decomposition it provides mechanism...

Change of basis

3 minute read

Let’s recall that a basis is a set of linearly independent vectors which span the whole vector space. With regard to matrices the basis can be viewe...

Linear transformations

7 minute read

Linear algebra is all about making transformations of linear combinations of numbers, called vectors, and arrays of vectors, called matrices. The elements of...

Time series

Exponential smoothing models

5 minute read

In this article we break down the exponential smoothing models used in time series forecasting.

ARIMA models

10 minute read

In this article we break down the basic ARIMA time-series model. We shall describe each component of the model: autoregressive process (AR) an...

Time series tests

6 minute read

This article gives an overview of the most commonly used statistical tests for validating the assumptions of time series.

Change point analysis

2 minute read

Change point analysis is used for determining whether and where an ordered series of values (usually a time series) changed its behaviour. The chan...

Classification

Logistic regression

3 minute read

Logistic regression is one of the simplest algorithms for binary classification, and it is based on linear regression. At its core, it uses a linear com...

Support vector machine

less than 1 minute read

Support vector machine (SVM) is mainly used for building classification models in supervised machine learning; however, the ideas employed by SVM can also be ...

Clustering

Outlier detection

5 minute read

This article explores machine learning models used to identify unusual or atypical observations within data sets.

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

Feature Extraction

Principal component analysis and singular value decomposition

8 minute read

Principal component analysis

NLP

Text similarity

2 minute read

When doing text analysis you may need to select the text, or group of texts, most similar to a chosen text. One way to do it is to...

Neural Networks

12 minute read

In this article

Supervised Learning

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

Approximation

Fourier series

6 minute read

A Fourier series is a way to approximate a periodic function as the sum of simple sine and cosine functions which makes it useful for analyzing and synthesiz...
