Posts by tag

loss-function

Loss functions

7 minute read

In machine learning, the loss function (also called the cost or cost function) is what lets the machine actually “learn”. The loss function is the function o...

Regularization

1 minute read

Regularization is a process of simplifying complex machine learning models in order to prevent overfitting. It may be applied explicitly by adding a penalty ...

Quasi-Newton methods

2 minute read

The classical Newton’s method for optimization is known for being computationally expensive when it comes to calculating the gradient of the Hessian matrix a...

Conjugate gradient

less than 1 minute read

Conjugate gradient methods are well suited for solving both linear and non-linear systems of equations.

Trust region

4 minute read

Trust region is a general strategy of optimization of an objective function by constructing an approximation of the function around a certain point and measu...

Line search

5 minute read

Generally speaking, line search is one of the two (the other is trust region) major strategies in non-linear optimization. Line search methods first determin...

Nelder-Mead algorithm

1 minute read

The Nelder-Mead optimization algorithm employs the concept of a simplex - a special type of polytope in $n$-dimensional space which has $n+1$ vertices, that i...

Newton’s method

4 minute read

Similarly to gradient descent, Newton’s method is used for finding a minimum of a function through iterations. Unlike gradient descent however, it uses a sec...

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

vector

Fourier series

6 minute read

A Fourier series is a way to approximate a periodic function as the sum of simple sine and cosine functions which makes it useful for analyzing and synthesiz...

Distance metrics

7 minute read

In machine learning, distance metrics are used as a measure of similarity between two data points: a smaller distance means more similarity. They are utiliz...

Text similarity

2 minute read

When doing text analysis, you may need to select the text, or group of texts, most similar to a chosen text. One way to do it is to...

Matrix properties

5 minute read

Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...

Eigenvectors and eigenvalues

4 minute read

In machine learning, eigenvectors and eigenvalues are heavily used concepts which, among other things, are fundamental for dimension reduction and fin...

Change of basis

3 minute read

Let’s recall that a basis is a set of linearly independent vectors that span the whole vector space. With regard to matrices the basis can be viewe...

Linear transformations

7 minute read

Linear algebra is all about making transformations of linear combinations of numbers, called vectors, and arrays of vectors, called matrices. The elements of...

derivative

Mean value theorem

1 minute read

The mean value theorem states that if a function is continuous and differentiable between two points then there exists a point between them where the tangent...

Newton’s method

4 minute read

Similarly to gradient descent, Newton’s method is used for finding a minimum of a function through iterations. Unlike gradient descent however, it uses a sec...

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

Taylor series

2 minute read

One usually neglected topic in calculus is understanding Taylor series. At first glance they seem to be rather impractical, however, they form building block...

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

normal-distribution

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Hypothesis test non-parametric statistics

6 minute read

This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Hypothesis testing

11 minute read

In this article: What is hypothesis testing; P-value hacking; Type I and type II errors; Statistical power; Power analysis; Multiple hypothesis testin...

gradient-descent

Linear least squares

4 minute read

The ordinary least squares (OLS) method is widely used to estimate the parameters of a linear regression by minimizing the sum of the squares of the...

Conjugate gradient

less than 1 minute read

Conjugate gradient methods are well suited for solving both linear and non-linear systems of equations.

Line search

5 minute read

Generally speaking, line search is one of the two (the other is trust region) major strategies in non-linear optimization. Line search methods first determin...

Non-linear regression

7 minute read

Although applying linear regression is straightforward and it has many useful statistical properties, in real life many processes have no...

Newton’s method

4 minute read

Similarly to gradient descent, Newton’s method is used for finding a minimum of a function through iterations. Unlike gradient descent however, it uses a sec...

variance

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Degrees of freedom in statistics

1 minute read

In statistics degrees of freedom is the number of independent elements forming a final statistic which are free to vary without violation of an imposed const...

Probability bounds

5 minute read

When applying a model to a certain process it is mostly impossible to be fully certain about the values of the parameters, the output of the process, and the...

differentiation

Mean value theorem

1 minute read

The mean value theorem states that if a function is continuous and differentiable between two points then there exists a point between them where the tangent...

Taylor series

2 minute read

One usually neglected topic in calculus is understanding Taylor series. At first glance they seem to be rather impractical, however, they form building block...

Differential equations

3 minute read

A differential equation is an equation involving a function and at least one of its derivatives. Here is an example:

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

matrix

Matrix properties

5 minute read

Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...

Eigenvectors and eigenvalues

4 minute read

In machine learning, eigenvectors and eigenvalues are heavily used concepts which, among other things, are fundamental for dimension reduction and fin...

Change of basis

3 minute read

Let’s recall that a basis is a set of linearly independent vectors that span the whole vector space. With regard to matrices the basis can be viewe...

Linear transformations

7 minute read

Linear algebra is all about making transformations of linear combinations of numbers, called vectors, and arrays of vectors, called matrices. The elements of...

nonlinear-regression

Conjugate gradient

less than 1 minute read

Conjugate gradient methods are well suited for solving both linear and non-linear systems of equations.

Trust region

4 minute read

Trust region is a general strategy of optimization of an objective function by constructing an approximation of the function around a certain point and measu...

Line search

5 minute read

Generally speaking, line search is one of the two (the other is trust region) major strategies in non-linear optimization. Line search methods first determin...

Nelder-Mead algorithm

1 minute read

The Nelder-Mead optimization algorithm employs the concept of a simplex - a special type of polytope in $n$-dimensional space which has $n+1$ vertices, that i...

Non-linear regression

7 minute read

Although applying linear regression is straightforward and it has many useful statistical properties, in real life many processes have no...

hypothesis-testing

Time series tests

6 minute read

This article gives an overview of the most commonly used statistical tests for validating the assumptions of a time series.

Hypothesis test non-parametric statistics

6 minute read

This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

linear-regression

Logistic regression

3 minute read

Logistic regression is one of the simplest algorithms for binary classification, and it is based on linear regression. At its core, it uses a linear com...

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Linear least squares

4 minute read

The ordinary least squares (OLS) method is widely used to estimate the parameters of a linear regression by minimizing the sum of the squares of the...

calculus

Differential equations

3 minute read

A differential equation is an equation involving a function and at least one of its derivatives. Here is an example:

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

second-derivative

Quasi-Newton methods

2 minute read

The classical Newton’s method for optimization is known for being computationally expensive when it comes to calculating the gradient of the Hessian matrix a...

Newton’s method

4 minute read

Similarly to gradient descent, Newton’s method is used for finding a minimum of a function through iterations. Unlike gradient descent however, it uses a sec...

Taylor series

2 minute read

One usually neglected topic in calculus is understanding Taylor series. At first glance they seem to be rather impractical, however, they form building block...

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

Taylor-series

Moment generating functions

1 minute read

In general, moment generating functions (MGF) are used as an alternative to using probability density functions. The name implies that with MGF it is poss...

Mean value theorem

1 minute read

The mean value theorem states that if a function is continuous and differentiable between two points then there exists a point between them where the tangent...

Non-linear regression

7 minute read

Although applying linear regression is straightforward and it has many useful statistical properties, in real life many processes have no...

probability-density-function

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

mean

Change point analysis

2 minute read

Change point analysis is used for determining whether and where an ordered series of values (usually a time series) changed its behaviour. The chan...

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Degrees of freedom in statistics

1 minute read

In statistics degrees of freedom is the number of independent elements forming a final statistic which are free to vary without violation of an imposed const...

Probability bounds

5 minute read

When applying a model to a certain process it is mostly impossible to be fully certain about the values of the parameters, the output of the process, and the...

regression

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

residuals

Time series tests

6 minute read

This article gives an overview of the most commonly used statistical tests for validating the assumptions of a time series.

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Degrees of freedom in statistics

1 minute read

In statistics degrees of freedom is the number of independent elements forming a final statistic which are free to vary without violation of an imposed const...

slope

Mean value theorem

1 minute read

The mean value theorem states that if a function is continuous and differentiable between two points then there exists a point between them where the tangent...

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

gradient

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

machine-learning

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

Loss functions

7 minute read

In machine learning, the loss function (also called the cost or cost function) is what lets the machine actually “learn”. The loss function is the function o...

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

penalization

Loss functions

7 minute read

In machine learning, the loss function (also called the cost or cost function) is what lets the machine actually “learn”. The loss function is the function o...

Change point analysis

2 minute read

Change point analysis is used for determining whether and where an ordered series of values (usually a time series) changed its behaviour. The chan...

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

learning-rate

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

Line search

5 minute read

Generally speaking, line search is one of the two (the other is trust region) major strategies in non-linear optimization. Line search methods first determin...

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

span

Matrix properties

5 minute read

Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...

Change of basis

3 minute read

Let’s recall that a basis is a set of linearly independent vectors that span the whole vector space. With regard to matrices the basis can be viewe...

Linear transformations

7 minute read

Linear algebra is all about making transformations of linear combinations of numbers, called vectors, and arrays of vectors, called matrices. The elements of...

svd

Linear least squares

4 minute read

The ordinary least squares (OLS) method is widely used to estimate the parameters of a linear regression by minimizing the sum of the squares of the...

QR decomposition

4 minute read

QR decomposition offers a numerically stable way of solving systems of linear equations. Similar to singular value decomposition, it provides mechanism...

dot-product

Vector projection

1 minute read

In order to understand more advanced concepts from vector operations it is good to have an intuition on such terms as vector projection. Below is a two-dimen...

Text similarity

2 minute read

When doing text analysis, you may need to select the text, or group of texts, most similar to a chosen text. One way to do it is to...

QR decomposition

4 minute read

QR decomposition offers a numerically stable way of solving systems of linear equations. Similar to singular value decomposition, it provides mechanism...

kernel

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

Matrix properties

5 minute read

Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...

symmetric-matrix

Newton’s method

4 minute read

Similarly to gradient descent, Newton’s method is used for finding a minimum of a function through iterations. Unlike gradient descent however, it uses a sec...

Matrix properties

5 minute read

Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...

positive-definite-matrix

Newton’s method

4 minute read

Similarly to gradient descent, Newton’s method is used for finding a minimum of a function through iterations. Unlike gradient descent however, it uses a sec...

Matrix properties

5 minute read

Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...

steepest-descent

Conjugate gradient

less than 1 minute read

Conjugate gradient methods are well suited for solving both linear and non-linear systems of equations.

Trust region

4 minute read

Trust region is a general strategy of optimization of an objective function by constructing an approximation of the function around a certain point and measu...

Line search

5 minute read

Generally speaking, line search is one of the two (the other is trust region) major strategies in non-linear optimization. Line search methods first determin...

confidence-interval

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Hypothesis testing

11 minute read

In this article: What is hypothesis testing; P-value hacking; Type I and type II errors; Statistical power; Power analysis; Multiple hypothesis testin...

Probability bounds

5 minute read

When applying a model to a certain process it is mostly impossible to be fully certain about the values of the parameters, the output of the process, and the...

error-term

Time series tests

6 minute read

This article gives an overview of the most commonly used statistical tests for validating the assumptions of a time series.

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

significance-test

Hypothesis test non-parametric statistics

6 minute read

This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Hypothesis testing

11 minute read

In this article: What is hypothesis testing; P-value hacking; Type I and type II errors; Statistical power; Power analysis; Multiple hypothesis testin...

degrees-of-freedom

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Hypothesis test non-parametric statistics

6 minute read

This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

z-score

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

binomial-distribution

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Hypothesis test non-parametric statistics

6 minute read

This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

probability-distribution

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

cosine

Fourier series

6 minute read

A Fourier series is a way to approximate a periodic function as the sum of simple sine and cosine functions which makes it useful for analyzing and synthesiz...

Vector projection

1 minute read

In order to understand more advanced concepts from vector operations it is good to have an intuition on such terms as vector projection. Below is a two-dimen...

Text similarity

2 minute read

When doing text analysis, you may need to select the text, or group of texts, most similar to a chosen text. One way to do it is to...

conditional-probability

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

Variational inference

3 minute read

Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

derivation

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

functions

Taylor series

2 minute read

One usually neglected topic in calculus is understanding Taylor series. At first glance they seem to be rather impractical, however, they form building block...

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

integration

Differential equations

3 minute read

A differential equation is an equation involving a function and at least one of its derivatives. Here is an example:

polynomial

Interpolation

9 minute read

Generally speaking, interpolation is a way of generating new data points which exactly fit into some given set of points. It is used for approximation of com...

Taylor series

2 minute read

One usually neglected topic in calculus is understanding Taylor series. At first glance they seem to be rather impractical, however, they form building block...

basis

Eigenvectors and eigenvalues

4 minute read

In machine learning, eigenvectors and eigenvalues are heavily used concepts which, among other things, are fundamental for dimension reduction and fin...

Linear transformations

7 minute read

Linear algebra is all about making transformations of linear combinations of numbers, called vectors, and arrays of vectors, called matrices. The elements of...

inverse

Linear transformations

7 minute read

Linear algebra is all about making transformations of linear combinations of numbers, called vectors, and arrays of vectors, called matrices. The elements of...

determinant

Eigenvectors and eigenvalues

4 minute read

In machine learning, eigenvectors and eigenvalues are heavily used concepts which, among other things, are fundamental for dimension reduction and fin...

Linear transformations

7 minute read

Linear algebra is all about making transformations of linear combinations of numbers, called vectors, and arrays of vectors, called matrices. The elements of...

matrix-diagonalization

Eigenvectors and eigenvalues

4 minute read

In machine learning, eigenvectors and eigenvalues are heavily used concepts which, among other things, are fundamental for dimension reduction and fin...

Change of basis

3 minute read

Let’s recall that a basis is a set of linearly independent vectors that span the whole vector space. With regard to matrices the basis can be viewe...

column-space

Matrix properties

5 minute read

Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...

Eigenvectors and eigenvalues

4 minute read

In machine learning, eigenvectors and eigenvalues are heavily used concepts which, among other things, are fundamental for dimension reduction and fin...

principal-component-analysis

Eigenvectors and eigenvalues

4 minute read

In machine learning, eigenvectors and eigenvalues are heavily used concepts which, among other things, are fundamental for dimension reduction and fin...

rank

Hypothesis test non-parametric statistics

6 minute read

This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...

Matrix properties

5 minute read

Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...

ill-conditioned-matrix

Linear least squares

4 minute read

The ordinary least squares (OLS) method is widely used to estimate the parameters of a linear regression by minimizing the sum of the squares of the...

Matrix properties

5 minute read

Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...

hessian

Quasi-Newton methods

2 minute read

The classical Newton’s method for optimization is known for being computationally expensive when it comes to calculating the gradient of the Hessian matrix a...

Newton’s method

4 minute read

Similarly to gradient descent, Newton’s method is used for finding a minimum of a function through iterations. Unlike gradient descent however, it uses a sec...

Back to top ↑

taylor-expansion

Trust region

4 minute read

Trust region is a general strategy of optimization of an objective function by constructing an approximation of the function around a certain point and measu...

Newton’s method

4 minute read

Similarly to gradient descent, Newton’s method is used for finding a minimum of a function through iterations. Unlike gradient descent however, it uses a sec...

Back to top ↑

Hessian

Non-linear regression

7 minute read

Although applying linear regression is straightforward and it has many useful statistical properties, in real life many processes have no...

Newton’s method

4 minute read

Similarly to gradient descent, Newton’s method is used for finding a minimum of a function through iterations. Unlike gradient descent however, it uses a sec...

Back to top ↑

simplex

Nelder-Mead algorithm

1 minute read

The Nelder-Mead optimization algorithm employs the concept of a simplex - a special type of polytope in $n$-dimensional space which has $n+1$ vertices, that i...

Simplex method

6 minute read

Linear programming aims at finding the best outcome of mathematical problems provided constraints. The requirements for the linear programming problems are d...

Back to top ↑

secant-line

Root finding algorithms

5 minute read

In numerical analysis, root finding is equivalent to finding the zeros of a continuous function. In the case of complex and composite functions, finding values of a varia...

Mean value theorem

1 minute read

The mean value theorem states that if a function is continuous and differentiable between two points then there exists a point between them where the tangent...

Back to top ↑

minimization

Trust region

4 minute read

Trust region is a general strategy of optimization of an objective function by constructing an approximation of the function around a certain point and measu...

Line search

5 minute read

Generally speaking, line search is one of the two (the other is trust region) major strategies in non-linear optimization. Line search methods first determin...

Back to top ↑

classification

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

Support vector machine

less than 1 minute read

Support vector machine (SVM) is mainly used for building classification models in supervised machine learning, however the ideas employed by SVM can be also ...

Back to top ↑

lowess

LOESS

3 minute read

It is quite common to have a relation between two values represented as a scatterplot. However, visual representation of a scatterplot alone can often be non...

Back to top ↑

outliers

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

LOESS

3 minute read

It is quite common to have a relation between two values represented as a scatterplot. However, visual representation of a scatterplot alone can often be non...

Back to top ↑

variation

Change point analysis

2 minute read

Change point analysis is basically used for determining whether and where an ordered series of values (usually time series) changed their behaviour. The chan...

Back to top ↑

covariance-matrix

Back to top ↑

pseudoinverse

Linear least squares

4 minute read

Ordinary least squares method (OLS) is a widely used method which estimates the parameters of a linear regression by minimizing the sum of the squares of the...

Back to top ↑

null-hypothesis

Hypothesis test non-parametric statistics

6 minute read

This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

p-value

Hypothesis test non-parametric statistics

6 minute read

This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

significance-level

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

false-discovery-rate

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

margin-of-error

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

t-test

Time series tests

6 minute read

This article gives an overview of the most commonly used statistical tests for validating the assumptions of a time series.

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

t-distribution

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

Student's-distribution

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

z-value

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Back to top ↑

likelihood-function

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

Back to top ↑

MLE

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

Back to top ↑

heteroscedasticity

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

homoscedasticity

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

Levenshtein-distance

Distance metrics

7 minute read

In machine learning the distance metrics are used as a measure of similarity between two datapoints - smaller distance means more similarity. They are utiliz...

Text similarity

2 minute read

Doing text analysis you may encounter a case when you need to select the most similar text or a group of text to another chosen text. One way to do it is to...

Back to top ↑

autocorrelation

Time series tests

6 minute read

This article gives an overview of the most commonly used statistical tests for validating the assumptions of a time series.

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

cost-function

Loss functions

7 minute read

In machine learning, the loss function (also called the cost or cost function) is what lets the machine actually “learn”. The loss function is the function o...

Change point analysis

2 minute read

Change point analysis is basically used for determining whether and where an ordered series of values (usually time series) changed their behaviour. The chan...

Back to top ↑

PDF

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

Bayes-theorem

Variational inference

3 minute read

Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Back to top ↑

Kullback–Leibler-divergence

Variational inference

3 minute read

Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...

Distance metrics

7 minute read

In machine learning the distance metrics are used as a measure of similarity between two datapoints - smaller distance means more similarity. They are utiliz...

Back to top ↑

KL-divergence

Variational inference

3 minute read

Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...

Distance metrics

7 minute read

In machine learning the distance metrics are used as a measure of similarity between two datapoints - smaller distance means more similarity. They are utiliz...

Back to top ↑

gaussian-process

Kernels overview

5 minute read

In this article What are kernels Kernel density estimation Kernels in Gaussian process Squared exponential kernel Matern kernel ...

Variational inference

3 minute read

Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...

Back to top ↑

prior-distribution

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

Variational inference

3 minute read

Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...

Back to top ↑

posterior-distribution

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

Variational inference

3 minute read

Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...

Back to top ↑

covariance

Kernels overview

5 minute read

In this article What are kernels Kernel density estimation Kernels in Gaussian process Squared exponential kernel Matern kernel ...

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

Back to top ↑

sigmoid-function

Logistic regression

3 minute read

Logistic regression is one of the simplest algorithms for binary classification, and it is based on linear regression. At its core, it uses a linear com...

Back to top ↑

stationarity

ARIMA models

10 minute read

In this article we shall make a breakdown of the basic ARIMA time-series model. We shall describe each component of the model: autoregressive process (AR) an...

Time series tests

6 minute read

This article gives an overview of the most commonly used statistical tests for validating the assumptions of a time series.

Back to top ↑

AR

ARIMA models

10 minute read

In this article we shall make a breakdown of the basic ARIMA time-series model. We shall describe each component of the model: autoregressive process (AR) an...

Time series tests

6 minute read

This article gives an overview of the most commonly used statistical tests for validating the assumptions of a time series.

Back to top ↑

unit-root

ARIMA models

10 minute read

In this article we shall make a breakdown of the basic ARIMA time-series model. We shall describe each component of the model: autoregressive process (AR) an...

Time series tests

6 minute read

This article gives an overview of the most commonly used statistical tests for validating the assumptions of a time series.

Back to top ↑

lag

Exponential smoothing models

5 minute read

In this article we shall make a breakdown of the exponential smoothing models which are used in time series forecasting.

ARIMA models

10 minute read

In this article we shall make a breakdown of the basic ARIMA time-series model. We shall describe each component of the model: autoregressive process (AR) an...

Back to top ↑

complex-number

Fourier series

6 minute read

A Fourier series is a way to approximate a periodic function as the sum of simple sine and cosine functions which makes it useful for analyzing and synthesiz...

Back to top ↑

exponential-function

Fourier series

6 minute read

A Fourier series is a way to approximate a periodic function as the sum of simple sine and cosine functions which makes it useful for analyzing and synthesiz...

Back to top ↑

mean-shift

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

Back to top ↑

limits

Back to top ↑

continuity

Back to top ↑

discontinuity

Back to top ↑

rate-of-change

Back to top ↑

extremities

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

Back to top ↑

extreme-points

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

Back to top ↑

analysis

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

Back to top ↑

saddle-point

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

Back to top ↑

minimum

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

Back to top ↑

maximum

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

Back to top ↑

concavity

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

Back to top ↑

extreme-value-theorem

Analyzing functions

5 minute read

Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...

Back to top ↑

equations

Differential equations

3 minute read

A differential equation is an equation that involves a function and at least one of its derivatives. Here is an example:

Back to top ↑

exponential-model

Differential equations

3 minute read

A differential equation is an equation that involves a function and at least one of its derivatives. Here is an example:

Back to top ↑

logistic-model

Differential equations

3 minute read

A differential equation is an equation that involves a function and at least one of its derivatives. Here is an example:

Back to top ↑

Maclaurin-series

Taylor series

2 minute read

One usually neglected topic in calculus is understanding Taylor series. At first glance they seem to be rather impractical, however, they form building block...

Back to top ↑

local-minimum

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

Back to top ↑

global-minimum

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

Back to top ↑

SGD

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

Back to top ↑

momentum

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

Back to top ↑

adagrad

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

Back to top ↑

adadelta

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

Back to top ↑

adam

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

Back to top ↑

RMS

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

Back to top ↑

decaying-average

Gradient descent

14 minute read

Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...

Back to top ↑

vector-space

Linear transformations

7 minute read

Linear algebra is all about making transformations of linear combinations of numbers, called vectors, and arrays of vectors, called matrices. The elements of...

Back to top ↑

dimensions

Linear transformations

7 minute read

Linear algebra is all about making transformations of linear combinations of numbers, called vectors, and arrays of vectors, called matrices. The elements of...

Back to top ↑

unit-vector

Linear transformations

7 minute read

Linear algebra is all about making transformations of linear combinations of numbers, called vectors, and arrays of vectors, called matrices. The elements of...

Back to top ↑

linear-equations

Linear transformations

7 minute read

Linear algebra is all about making transformations of linear combinations of numbers, called vectors, and arrays of vectors, called matrices. The elements of...

Back to top ↑

linear-transformation

Change of basis

3 minute read

Let’s recall that a basis is a set of linearly independent vectors that span the whole vector space. With regard to matrices the basis can be viewe...

Back to top ↑

vector-reflection

QR decomposition

4 minute read

QR decomposition may be a numerically stable algorithm for solving systems of linear equations. Similar to singular value decomposition it provides mechanism...

Back to top ↑

householder-reflection

QR decomposition

4 minute read

QR decomposition may be a numerically stable algorithm for solving systems of linear equations. Similar to singular value decomposition it provides mechanism...

Back to top ↑

matrix-decomposition

QR decomposition

4 minute read

QR decomposition may be a numerically stable algorithm for solving systems of linear equations. Similar to singular value decomposition it provides mechanism...

Back to top ↑

gram-schmidt

QR decomposition

4 minute read

QR decomposition may be a numerically stable algorithm for solving systems of linear equations. Similar to singular value decomposition it provides mechanism...

Back to top ↑

vector-projection

QR decomposition

4 minute read

QR decomposition may be a numerically stable algorithm for solving systems of linear equations. Similar to singular value decomposition it provides mechanism...

Back to top ↑

identity-matrix

Eigenvectors and eigenvalues

4 minute read

In machine learning, eigenvectors and eigenvalues are among the most heavily used concepts, which among other things are fundamental for dimension reduction and fin...

Back to top ↑

PCA

Eigenvectors and eigenvalues

4 minute read

In machine learning, eigenvectors and eigenvalues are among the most heavily used concepts, which among other things are fundamental for dimension reduction and fin...

Back to top ↑

SVD

Eigenvectors and eigenvalues

4 minute read

In machine learning, eigenvectors and eigenvalues are among the most heavily used concepts, which among other things are fundamental for dimension reduction and fin...

Back to top ↑

singular-value-decomposition

Eigenvectors and eigenvalues

4 minute read

In machine learning, eigenvectors and eigenvalues are among the most heavily used concepts, which among other things are fundamental for dimension reduction and fin...

Back to top ↑

full-rank-matrix

Matrix properties

5 minute read

Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...

Back to top ↑

condition-number

Matrix properties

5 minute read

Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...

Back to top ↑

null-space

Matrix properties

5 minute read

Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...

Back to top ↑

linear-combination

Matrix properties

5 minute read

Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...

Back to top ↑

taylor-series

Newton’s method

4 minute read

Similarly to gradient descent, Newton’s method is used for finding a minimum of a function through iterations. Unlike gradient descent however, it uses a sec...

Back to top ↑

Schwarz’s-theorem

Newton’s method

4 minute read

Similarly to gradient descent, Newton’s method is used for finding a minimum of a function through iterations. Unlike gradient descent however, it uses a sec...

Back to top ↑

nonlinear-least-squares

Non-linear regression

7 minute read

Although applying linear regression is straightforward and it has many useful statistical properties, in real life many processes have no...

Back to top ↑

Gauss-Newton-method

Non-linear regression

7 minute read

Although applying linear regression is straightforward and it has many useful statistical properties, in real life many processes have no...

Back to top ↑

Jacobian

Non-linear regression

7 minute read

Although applying linear regression is straightforward and it has many useful statistical properties, in real life many processes have no...

Back to top ↑

Levenberg–Marquardt-algorithm

Non-linear regression

7 minute read

Although applying linear regression is straightforward and it has many useful statistical properties, in real life many processes have no...

Back to top ↑

linear-programming

Simplex method

6 minute read

Linear programming aims at finding the best outcome of mathematical problems provided constraints. The requirements for the linear programming problems are d...

Back to top ↑

constraints

Simplex method

6 minute read

Linear programming aims at finding the best outcome of mathematical problems provided constraints. The requirements for the linear programming problems are d...

Back to top ↑

objective-function

Simplex method

6 minute read

Linear programming aims at finding the best outcome of mathematical problems provided constraints. The requirements for the linear programming problems are d...

Back to top ↑

maximization

Simplex method

6 minute read

Linear programming aims at finding the best outcome of mathematical problems provided constraints. The requirements for the linear programming problems are d...

Back to top ↑

tangent-line

Mean value theorem

1 minute read

The mean value theorem states that if a function is continuous and differentiable between two points then there exists a point between them where the tangent...

Back to top ↑

Wolfe-conditions

Line search

5 minute read

Generally speaking, line search is one of the two (the other is trust region) major strategies in non-linear optimization. Line search methods first determin...

Back to top ↑

Armijo-rule

Line search

5 minute read

Generally speaking, line search is one of the two (the other is trust region) major strategies in non-linear optimization. Line search methods first determin...

Back to top ↑

search-direction

Line search

5 minute read

Generally speaking, line search is one of the two (the other is trust region) major strategies in non-linear optimization. Line search methods first determin...

Back to top ↑

cauchy-point

Trust region

4 minute read

Trust region is a general strategy of optimization of an objective function by constructing an approximation of the function around a certain point and measu...

Back to top ↑

dogleg-method

Trust region

4 minute read

Trust region is a general strategy of optimization of an objective function by constructing an approximation of the function around a certain point and measu...

Back to top ↑

supervised-machine-learning

Support vector machine

less than 1 minute read

Support vector machine (SVM) is mainly used for building classification models in supervised machine learning, however the ideas employed by SVM can be also ...

Back to top ↑

svm

Support vector machine

less than 1 minute read

Support vector machine (SVM) is mainly used for building classification models in supervised machine learning, however the ideas employed by SVM can be also ...

Back to top ↑

function-minimization

Conjugate gradient

less than 1 minute read

Conjugate gradient methods are well suited for solving both linear and non-linear systems of equations.

Back to top ↑

rank-one-update

Quasi-Newton methods

2 minute read

The classical Newton’s method for optimization is known for being computationally expensive when it comes to calculating the gradient of the Hessian matrix a...

Back to top ↑

jacobian

Quasi-Newton methods

2 minute read

The classical Newton’s method for optimization is known for being computationally expensive when it comes to calculating the gradient of the Hessian matrix a...

Back to top ↑

Sherman-Morrison-formula

Quasi-Newton methods

2 minute read

The classical Newton’s method for optimization is known for being computationally expensive when it comes to calculating the gradient of the Hessian matrix a...

Back to top ↑

cross-product

Quasi-Newton methods

2 minute read

The classical Newton’s method for optimization is known for being computationally expensive when it comes to calculating the gradient of the Hessian matrix a...

Back to top ↑

secant-method

Root finding algorithms

5 minute read

In numerical analysis, root finding is equivalent to finding the zeros of a continuous function. In the case of complex and composite functions, finding values of a varia...

Back to top ↑

bisection-method

Root finding algorithms

5 minute read

In numerical analysis, root finding is equivalent to finding the zeros of a continuous function. In the case of complex and composite functions, finding values of a varia...

Back to top ↑

Brent's-method

Root finding algorithms

5 minute read

In numerical analysis, root finding is equivalent to finding the zeros of a continuous function. In the case of complex and composite functions, finding values of a varia...

Back to top ↑

Newton's-method

Root finding algorithms

5 minute read

In numerical analysis, root finding is equivalent to finding the zeros of a continuous function. In the case of complex and composite functions, finding values of a varia...

Back to top ↑

zero-finding

Root finding algorithms

5 minute read

In numerical analysis, root finding is equivalent to finding the zeros of a continuous function. In the case of complex and composite functions, finding values of a varia...

Back to top ↑

false-position-method

Root finding algorithms

5 minute read

In numerical analysis, root finding is equivalent to finding the zeros of a continuous function. In the case of complex and composite functions, finding values of a varia...

Back to top ↑

Broyden's-method

Root finding algorithms

5 minute read

In numerical analysis, root finding is equivalent to finding the zeros of a continuous function. In the case of complex and composite functions, finding values of a varia...

Back to top ↑

inverse-quadratic-interpolation

Root finding algorithms

5 minute read

In numerical analysis, root finding is equivalent to finding the zeros of a continuous function. In the case of complex and composite functions, finding values of a varia...

Back to top ↑

smoothing

LOESS

3 minute read

It is quite common to have a relation between two values represented as a scatterplot. However, visual representation of a scatterplot alone can often be non...

Back to top ↑

scatterplot

LOESS

3 minute read

It is quite common to have a relation between two values represented as a scatterplot. However, visual representation of a scatterplot alone can often be non...

Back to top ↑

least-squares

LOESS

3 minute read

It is quite common to have a relation between two values represented as a scatterplot. However, visual representation of a scatterplot alone can often be non...

Back to top ↑

weighting

LOESS

3 minute read

It is quite common to have a relation between two values represented as a scatterplot. However, visual representation of a scatterplot alone can often be non...

Back to top ↑

non-parametric-regression

LOESS

3 minute read

It is quite common to have a relation between two values represented as a scatterplot. However, visual representation of a scatterplot alone can often be non...

Back to top ↑

local-regression

LOESS

3 minute read

It is quite common to have a relation between two values represented as a scatterplot. However, visual representation of a scatterplot alone can often be non...

Back to top ↑

integral

Back to top ↑

area

Back to top ↑

net-change

Back to top ↑

Riemann-sum

Back to top ↑

fundamental-theorem-of-calculus

Back to top ↑

improper-integrals

Back to top ↑

probability

Back to top ↑

pdf

Back to top ↑

moment

Moment generating functions

1 minute read

In general, moment generating functions (MGFs) are used as an alternative to working directly with a probability density distribution. The name implies that with an MGF it is poss...

Back to top ↑

mgf

Moment generating functions

1 minute read

In general, moment generating functions (MGFs) are used as an alternative to working directly with a probability density distribution. The name implies that with an MGF it is poss...

Back to top ↑

probability-density-distribution

Moment generating functions

1 minute read

In general, moment generating functions (MGFs) are used as an alternative to working directly with a probability density distribution. The name implies that with an MGF it is poss...

Back to top ↑

skewness

Moment generating functions

1 minute read

In general, moment generating functions (MGFs) are used as an alternative to working directly with a probability density distribution. The name implies that with an MGF it is poss...

Back to top ↑

kurtosis

Moment generating functions

1 minute read

In general, moment generating functions (MGFs) are used as an alternative to working directly with a probability density distribution. The name implies that with an MGF it is poss...

Back to top ↑

expected-value

Moment generating functions

1 minute read

In general, moment generating functions (MGFs) are used as an alternative to working directly with a probability density distribution. The name implies that with an MGF it is poss...

Back to top ↑

Hoeffding's-inequality

Probability bounds

5 minute read

When applying a model to a certain process, it is usually impossible to be fully certain about the values of the parameters, the output of the process, and the...

Back to top ↑

Markov's-inequality

Probability bounds

5 minute read

When applying a model to a certain process, it is usually impossible to be fully certain about the values of the parameters, the output of the process, and the...

Back to top ↑

Chebyshev's-inequality

Probability bounds

5 minute read

When applying a model to a certain process, it is usually impossible to be fully certain about the values of the parameters, the output of the process, and the...

Back to top ↑

Chernoff-bound

Probability bounds

5 minute read

When applying a model to a certain process, it is usually impossible to be fully certain about the values of the parameters, the output of the process, and the...

Back to top ↑

moment-generating-function

Probability bounds

5 minute read

When applying a model to a certain process, it is usually impossible to be fully certain about the values of the parameters, the output of the process, and the...

Back to top ↑

Weak-law-of-large-numbers

Probability bounds

5 minute read

When applying a model to a certain process, it is usually impossible to be fully certain about the values of the parameters, the output of the process, and the...

Back to top ↑

Bernoulli-distribution

Probability bounds

5 minute read

When applying a model to a certain process, it is usually impossible to be fully certain about the values of the parameters, the output of the process, and the...

Back to top ↑

correlation

Back to top ↑

correlation-coefficients

Back to top ↑

covariation

Back to top ↑

Pearson's-correlation-coefficient

Back to top ↑

coefficient-of-determination

Back to top ↑

r-squared

Back to top ↑

adjusted-r-squared

Back to top ↑

regression-model

Back to top ↑

variance-of-coefficients

Back to top ↑

principal-components

Back to top ↑

pca

Back to top ↑

dimensionality-reduction

Back to top ↑

matrix-transformation

Back to top ↑

feature-extraction

Back to top ↑

eigendecomposition

Back to top ↑

singular-vectors

Back to top ↑

singular-values

Back to top ↑

scatter-matrix

Back to top ↑

truncated-svd

Back to top ↑

linear-model

Degrees of freedom in statistics

1 minute read

In statistics degrees of freedom is the number of independent elements forming a final statistic which are free to vary without violation of an imposed const...

Back to top ↑

type-1-error

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

type-2-error

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

p-value-hacking

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

alternative-hypothesis

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

power-analysis

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

sample-size

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

family-wise-error-rate

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

multiple-testing

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

Benjamini–Hochberg-procedure

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

FWER

Hypothesis testing

11 minute read

In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...

Back to top ↑

Pearson's-chi-square-test

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

z-test

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

F-distribution

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

continuity-correction

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

Welch's-t-test

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

exact-Fisher's-test

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

Barnard's-test

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

contingency-table

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

G-test

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

paired-t-test

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

McNemar's-test

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

ANOVA

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

analysis-of-variance

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

F-test

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

F-statistic

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

Games-Howell-test

Hypothesis test parametric statistics

23 minute read

This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...

Back to top ↑

ols

Linear least squares

4 minute read

The ordinary least squares (OLS) method is widely used to estimate the parameters of a linear regression by minimizing the sum of the squares of the...

Back to top ↑

normal-equations

Linear least squares

4 minute read

The ordinary least squares (OLS) method is widely used to estimate the parameters of a linear regression by minimizing the sum of the squares of the...

Back to top ↑

qr-decomposition

Linear least squares

4 minute read

The ordinary least squares (OLS) method is widely used to estimate the parameters of a linear regression by minimizing the sum of the squares of the...

Back to top ↑

sample

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Back to top ↑

Central-limit-theorem

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Back to top ↑

standard-error

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Back to top ↑

standard-deviation

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Back to top ↑

sample-variance

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Back to top ↑

sample-mean

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Back to top ↑

unbiased-estimate

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Back to top ↑

confidence-level

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Back to top ↑

t-value

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Back to top ↑

t-statistic

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Back to top ↑

z-statistic

Sampling distribution

5 minute read

In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...

Back to top ↑

statistical-inference

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

Back to top ↑

goodness-of-fit

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

Back to top ↑

joint-probability

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

Back to top ↑

log-likelihood

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

Back to top ↑

likelihood-ratio

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

Back to top ↑

likelihood-ratio-test

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

Back to top ↑

Wald-test

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

Back to top ↑

Wald-statistic

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

Back to top ↑

chi-squre-distribution

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

Back to top ↑

Lagrange-multiplier-test

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

Back to top ↑

score-test

Maximum likelihood estimation

6 minute read

In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...

Back to top ↑

Breusch–Pagan-test

Back to top ↑

White-test

Back to top ↑

nlp

Text similarity

2 minute read

When doing text analysis, you may need to select the text, or group of texts, most similar to a chosen text. One way to do it is to...

Back to top ↑

natural-language-processing

Text similarity

2 minute read

When doing text analysis, you may need to select the text, or group of texts, most similar to a chosen text. One way to do it is to...

Back to top ↑

documents-similarity

Text similarity

2 minute read

When doing text analysis, you may need to select the text, or group of texts, most similar to a chosen text. One way to do it is to...

Back to top ↑

vector-magnitude

Text similarity

2 minute read

When doing text analysis, you may need to select the text, or group of texts, most similar to a chosen text. One way to do it is to...

Back to top ↑

multiple-regression

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

multivariable-regression

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

gaussian-noise

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

multicollinearity

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

correlation-coefficient

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

Cochrane–Orcutt-estimation

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

Prais–Winsten-estimation

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

weighted-least-squares

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

WLS

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

generalized-least-squares

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

GLS

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

feasible-generalized-least-squares

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

FGLS

Linear regression

14 minute read

Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...

Back to top ↑

median

Hypothesis test non-parametric statistics

6 minute read

This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...

Back to top ↑

sign-test

Hypothesis test non-parametric statistics

6 minute read

This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...

Back to top ↑

Wilcoxon-signed-rank-test

Hypothesis test non-parametric statistics

6 minute read

This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...

Back to top ↑

Mann–Whitney-U-test

Hypothesis test non-parametric statistics

6 minute read

This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...

Back to top ↑

U-statistic

Hypothesis test non-parametric statistics

6 minute read

This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...

Back to top ↑

change-point

Change point analysis

2 minute read

Change point analysis is basically used for determining whether and where an ordered series of values (usually time series) changed their behaviour. The chan...

Back to top ↑

anomaly-detections

Change point analysis

2 minute read

Change point analysis is basically used for determining whether and where an ordered series of values (usually time series) changed their behaviour. The chan...

Back to top ↑

search-method

Change point analysis

2 minute read

Change point analysis is basically used for determining whether and where an ordered series of values (usually time series) changed their behaviour. The chan...

Back to top ↑

Hausdorf-metric

Change point analysis

2 minute read

Change point analysis is basically used for determining whether and where an ordered series of values (usually time series) changed their behaviour. The chan...

Back to top ↑

breakpoint

Change point analysis

2 minute read

Change point analysis is basically used for determining whether and where an ordered series of values (usually time series) changed their behaviour. The chan...

Back to top ↑

break-point

Change point analysis

2 minute read

Change point analysis is basically used for determining whether and where an ordered series of values (usually time series) changed their behaviour. The chan...

Back to top ↑

F1-score

Change point analysis

2 minute read

Change point analysis is basically used for determining whether and where an ordered series of values (usually time series) changed their behaviour. The chan...

Back to top ↑

statistics

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

z-table

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

six-sigma

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

central-limit-theorem

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

probability-mass-function

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

chi-square-distribution

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

Bernoulli-trial

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

geometric-distribution

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

exponential-distribution

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

poisson-distribution

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

uniform-distribution

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

negative-binomial-distribution

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

studentized-range-distribution

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

beta-distribution

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

gamma-function

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

Gaussian-distribution

Statistical distributions

16 minute read

A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...

Back to top ↑

prior-probability

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Back to top ↑

Bayes-rule

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Back to top ↑

maximum-likelihood-estimation

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Back to top ↑

posterior-probability

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Back to top ↑

MAP

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Back to top ↑

maximum-a-posteriori

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Back to top ↑

conjugate-prior-distribution

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Back to top ↑

expected-a-posteriori

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Back to top ↑

EAP

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Back to top ↑

marginal-probability

Bayesian inference

4 minute read

Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...

Back to top ↑

vector-length

Vector projection

1 minute read

In order to understand more advanced concepts from vector operations it is good to have an intuition on such terms as vector projection. Below is a two-dimen...

Back to top ↑

vector-multiplication

Vector projection

1 minute read

In order to understand more advanced concepts from vector operations it is good to have an intuition on such terms as vector projection. Below is a two-dimen...

Back to top ↑

Euclidean-distance

Distance metrics

7 minute read

In machine learning the distance metrics are used as a measure of similarity between two datapoints - smaller distance means more similarity. They are utiliz...

Back to top ↑

Manhattan-distance

Distance metrics

7 minute read

In machine learning the distance metrics are used as a measure of similarity between two datapoints - smaller distance means more similarity. They are utiliz...

Back to top ↑

Minkowski-distance

Distance metrics

7 minute read

In machine learning the distance metrics are used as a measure of similarity between two datapoints - smaller distance means more similarity. They are utiliz...

Back to top ↑

vector-norm

Distance metrics

7 minute read

In machine learning the distance metrics are used as a measure of similarity between two datapoints - smaller distance means more similarity. They are utiliz...

normed-vector-space

Distance metrics

7 minute read

In machine learning, distance metrics are used as a measure of similarity between two data points: the smaller the distance, the greater the similarity. They are utiliz...

Mahalanobis-distance

Distance metrics

7 minute read

In machine learning, distance metrics are used as a measure of similarity between two data points: the smaller the distance, the greater the similarity. They are utiliz...

cosine-distance

Distance metrics

7 minute read

In machine learning, distance metrics are used as a measure of similarity between two data points: the smaller the distance, the greater the similarity. They are utiliz...

cosine-similarity

Distance metrics

7 minute read

In machine learning, distance metrics are used as a measure of similarity between two data points: the smaller the distance, the greater the similarity. They are utiliz...

Hamming-distance

Distance metrics

7 minute read

In machine learning, distance metrics are used as a measure of similarity between two data points: the smaller the distance, the greater the similarity. They are utiliz...

relative-entropy

Distance metrics

7 minute read

In machine learning, distance metrics are used as a measure of similarity between two data points: the smaller the distance, the greater the similarity. They are utiliz...

bayesian-statistics

Variational inference

3 minute read

Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...

evidence-lower-bound

Variational inference

3 minute read

Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...

ELBO

Variational inference

3 minute read

Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...

mean-field-variational-family

Variational inference

3 minute read

Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...

coordinate-ascent-variational-inference

Variational inference

3 minute read

Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...

CAVI

Variational inference

3 minute read

Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...

gaussian-distribution

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

log-marginal-likelihood

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

sparse-variational-gaussian-process

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

variational-inference

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

likelihood

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

marginal-likelihood

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

whitened-parameterization

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

parameterization-trick

Gaussian process

16 minute read

The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...

binary-classification

Logistic regression

3 minute read

Logistic regression is one of the simplest algorithms for binary classification, and it is based on linear regression. At its core, it uses a linear com...

logistic-function

Logistic regression

3 minute read

Logistic regression is one of the simplest algorithms for binary classification, and it is based on linear regression. At its core, it uses a linear com...

odds

Logistic regression

3 minute read

Logistic regression is one of the simplest algorithms for binary classification, and it is based on linear regression. At its core, it uses a linear com...

log-odds

Logistic regression

3 minute read

Logistic regression is one of the simplest algorithms for binary classification, and it is based on linear regression. At its core, it uses a linear com...

regularization

Logistic regression

3 minute read

Logistic regression is one of the simplest algorithms for binary classification, and it is based on linear regression. At its core, it uses a linear com...

overfittings

Regularization

1 minute read

Regularization is a process of simplifying complex machine learning models in order to prevent overfitting. It may be applied explicitly by adding a penalty ...

L2-regularization

Regularization

1 minute read

Regularization is a process of simplifying complex machine learning models in order to prevent overfitting. It may be applied explicitly by adding a penalty ...

L1-regularization

Regularization

1 minute read

Regularization is a process of simplifying complex machine learning models in order to prevent overfitting. It may be applied explicitly by adding a penalty ...

Ridge-regularization

Regularization

1 minute read

Regularization is a process of simplifying complex machine learning models in order to prevent overfitting. It may be applied explicitly by adding a penalty ...

Lasso-regularization

Regularization

1 minute read

Regularization is a process of simplifying complex machine learning models in order to prevent overfitting. It may be applied explicitly by adding a penalty ...

Ridge-regression

Regularization

1 minute read

Regularization is a process of simplifying complex machine learning models in order to prevent overfitting. It may be applied explicitly by adding a penalty ...

Lasso-regression

Regularization

1 minute read

Regularization is a process of simplifying complex machine learning models in order to prevent overfitting. It may be applied explicitly by adding a penalty ...

Elastic-Net

Regularization

1 minute read

Regularization is a process of simplifying complex machine learning models in order to prevent overfitting. It may be applied explicitly by adding a penalty ...

logistic-regression

activation-function

neuron

perceptron

deep-neural-networks

hidden-layer

node

backpropagation

ReLU

bias

vanishing-gradient

saturated-neuron

dying-ReLU

ELU

softmax

feed-forward-networks

mse

Loss functions

7 minute read

In machine learning, the loss function (or cost, or cost function) is what lets the machine actually “learn”. The loss function is the function o...

l2

Loss functions

7 minute read

In machine learning, the loss function (or cost, or cost function) is what lets the machine actually “learn”. The loss function is the function o...

mean-squared-error

Loss functions

7 minute read

In machine learning, the loss function (or cost, or cost function) is what lets the machine actually “learn”. The loss function is the function o...

quadratic-loss

Loss functions

7 minute read

In machine learning, the loss function (or cost, or cost function) is what lets the machine actually “learn”. The loss function is the function o...

l1

Loss functions

7 minute read

In machine learning, the loss function (or cost, or cost function) is what lets the machine actually “learn”. The loss function is the function o...

mean-absolute-error

Loss functions

7 minute read

In machine learning, the loss function (or cost, or cost function) is what lets the machine actually “learn”. The loss function is the function o...

mae

Loss functions

7 minute read

In machine learning, the loss function (or cost, or cost function) is what lets the machine actually “learn”. The loss function is the function o...

m-estimator

Loss functions

7 minute read

In machine learning, the loss function (or cost, or cost function) is what lets the machine actually “learn”. The loss function is the function o...

huber-loss

Loss functions

7 minute read

In machine learning, the loss function (or cost, or cost function) is what lets the machine actually “learn”. The loss function is the function o...

bisquare-loss-function

Loss functions

7 minute read

In machine learning, the loss function (or cost, or cost function) is what lets the machine actually “learn”. The loss function is the function o...

entropy

Loss functions

7 minute read

In machine learning, the loss function (or cost, or cost function) is what lets the machine actually “learn”. The loss function is the function o...

cross-entropy

Loss functions

7 minute read

In machine learning, the loss function (or cost, or cost function) is what lets the machine actually “learn”. The loss function is the function o...

Durbin–Watson-test

Time series tests

6 minute read

This article gives an overview of the statistical tests most commonly used to validate the assumptions of a time series.

Durbin–Watson-statistic

Time series tests

6 minute read

This article gives an overview of the statistical tests most commonly used to validate the assumptions of a time series.

white-noise

Time series tests

6 minute read

This article gives an overview of the statistical tests most commonly used to validate the assumptions of a time series.

Breusch–Godfrey-test

Time series tests

6 minute read

This article gives an overview of the statistical tests most commonly used to validate the assumptions of a time series.

ADF

Time series tests

6 minute read

This article gives an overview of the statistical tests most commonly used to validate the assumptions of a time series.

KPSS

Time series tests

6 minute read

This article gives an overview of the statistical tests most commonly used to validate the assumptions of a time series.

Kwiatkowski-Phillips-Schmidt-Shin-test

Time series tests

6 minute read

This article gives an overview of the statistical tests most commonly used to validate the assumptions of a time series.

Dickey-Fuller-test

Time series tests

6 minute read

This article gives an overview of the statistical tests most commonly used to validate the assumptions of a time series.

MA

ARIMA models

10 minute read

In this article we break down the basic ARIMA time-series model. We describe each component of the model: the autoregressive process (AR) an...

ARMA

ARIMA models

10 minute read

In this article we break down the basic ARIMA time-series model. We describe each component of the model: the autoregressive process (AR) an...

autoregression

ARIMA models

10 minute read

In this article we break down the basic ARIMA time-series model. We describe each component of the model: the autoregressive process (AR) an...

moving-average

ARIMA models

10 minute read

In this article we break down the basic ARIMA time-series model. We describe each component of the model: the autoregressive process (AR) an...

random-walk

ARIMA models

10 minute read

In this article we break down the basic ARIMA time-series model. We describe each component of the model: the autoregressive process (AR) an...

differencing

ARIMA models

10 minute read

In this article we break down the basic ARIMA time-series model. We describe each component of the model: the autoregressive process (AR) an...

ACF

ARIMA models

10 minute read

In this article we break down the basic ARIMA time-series model. We describe each component of the model: the autoregressive process (AR) an...

autocorrelation-function

ARIMA models

10 minute read

In this article we break down the basic ARIMA time-series model. We describe each component of the model: the autoregressive process (AR) an...

PACF

ARIMA models

10 minute read

In this article we break down the basic ARIMA time-series model. We describe each component of the model: the autoregressive process (AR) an...

partial-autocorrelation-function

ARIMA models

10 minute read

In this article we break down the basic ARIMA time-series model. We describe each component of the model: the autoregressive process (AR) an...

SARIMA

ARIMA models

10 minute read

In this article we break down the basic ARIMA time-series model. We describe each component of the model: the autoregressive process (AR) an...

SARIMAX

ARIMA models

10 minute read

In this article we break down the basic ARIMA time-series model. We describe each component of the model: the autoregressive process (AR) an...

trend

Exponential smoothing models

5 minute read

In this article we break down the exponential smoothing models used in time series forecasting.

seasonal-component

Exponential smoothing models

5 minute read

In this article we break down the exponential smoothing models used in time series forecasting.

level-component

Exponential smoothing models

5 minute read

In this article we break down the exponential smoothing models used in time series forecasting.

Holt-linear-trend

Exponential smoothing models

5 minute read

In this article we break down the exponential smoothing models used in time series forecasting.

damped-trend

Exponential smoothing models

5 minute read

In this article we break down the exponential smoothing models used in time series forecasting.

Holt-Winters-seasonal-method

Exponential smoothing models

5 minute read

In this article we break down the exponential smoothing models used in time series forecasting.

Holt-linear-trend-method

Exponential smoothing models

5 minute read

In this article we break down the exponential smoothing models used in time series forecasting.

state-space-model

Exponential smoothing models

5 minute read

In this article we break down the exponential smoothing models used in time series forecasting.

additive-error

Exponential smoothing models

5 minute read

In this article we break down the exponential smoothing models used in time series forecasting.

multiplicative-error

Exponential smoothing models

5 minute read

In this article we break down the exponential smoothing models used in time series forecasting.

decision-tree

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

bagging

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

random-forest

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

bootstrap-aggregation

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

ensemble-techniques

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

extremely-randomized-trees

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

extra-trees

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

mean-decrease-in-impurity

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

MDI

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

AdaBoost

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

boosting

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

gradient-boosting

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

histogram-based-gradient-boosting

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

xgboost

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

extreme-gradient-boosting

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

gradient-boosting-machine

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

GBM

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

LightGBM

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

gradient-based-one-side-sampling

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

GOSS

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

exclusive-feature-bundling

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

EFB

Tree-like algorithms

10 minute read

The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...

Euler-number

real-number

trigonometric-function

Euler's-formula

vector-rotation

Euler-identity

missing-values

Interpolation

9 minute read

Generally speaking, interpolation is a way of generating new data points which exactly fit into some given set of points. It is used for approximation of com...

spline

Interpolation

9 minute read

Generally speaking, interpolation is a way of generating new data points which exactly fit into some given set of points. It is used for approximation of com...

nearest-neighbour

Interpolation

9 minute read

Generally speaking, interpolation is a way of generating new data points which exactly fit into some given set of points. It is used for approximation of com...

Vandermonde-matrix

Interpolation

9 minute read

Generally speaking, interpolation is a way of generating new data points which exactly fit into some given set of points. It is used for approximation of com...

Lagrange-multipliers

Interpolation

9 minute read

Generally speaking, interpolation is a way of generating new data points which exactly fit into some given set of points. It is used for approximation of com...

Newton's-divided-differences

Interpolation

9 minute read

Generally speaking, interpolation is a way of generating new data points which exactly fit into some given set of points. It is used for approximation of com...

cubic-spline

Interpolation

9 minute read

Generally speaking, interpolation is a way of generating new data points which exactly fit into some given set of points. It is used for approximation of com...

Bezier-curve

Interpolation

9 minute read

Generally speaking, interpolation is a way of generating new data points which exactly fit into some given set of points. It is used for approximation of com...

periodic-function

Fourier series

6 minute read

A Fourier series is a way to approximate a periodic function as the sum of simple sine and cosine functions which makes it useful for analyzing and synthesiz...

sine

Fourier series

6 minute read

A Fourier series is a way to approximate a periodic function as the sum of simple sine and cosine functions which makes it useful for analyzing and synthesiz...

fundamental-period

Fourier series

6 minute read

A Fourier series is a way to approximate a periodic function as the sum of simple sine and cosine functions which makes it useful for analyzing and synthesiz...

harmonic-frequency

Fourier series

6 minute read

A Fourier series is a way to approximate a periodic function as the sum of simple sine and cosine functions which makes it useful for analyzing and synthesiz...

RBF

Kernels overview

5 minute read

In this article: What are kernels; Kernel density estimation; Kernels in Gaussian process; Squared exponential kernel; Matern kernel; ...

radial-basis-function-kernel

Kernels overview

5 minute read

In this article: What are kernels; Kernel density estimation; Kernels in Gaussian process; Squared exponential kernel; Matern kernel; ...

squared-exponential-kernel

Kernels overview

5 minute read

In this article: What are kernels; Kernel density estimation; Kernels in Gaussian process; Squared exponential kernel; Matern kernel; ...

matern-kernel

Kernels overview

5 minute read

In this article: What are kernels; Kernel density estimation; Kernels in Gaussian process; Squared exponential kernel; Matern kernel; ...

rational-quadratic-kernel

Kernels overview

5 minute read

In this article: What are kernels; Kernel density estimation; Kernels in Gaussian process; Squared exponential kernel; Matern kernel; ...

periodic-kernel

Kernels overview

5 minute read

In this article: What are kernels; Kernel density estimation; Kernels in Gaussian process; Squared exponential kernel; Matern kernel; ...

linear-kernel

Kernels overview

5 minute read

In this article: What are kernels; Kernel density estimation; Kernels in Gaussian process; Squared exponential kernel; Matern kernel; ...

dot-product-kernel

Kernels overview

5 minute read

In this article: What are kernels; Kernel density estimation; Kernels in Gaussian process; Squared exponential kernel; Matern kernel; ...

kernel-density-estimation

Kernels overview

5 minute read

In this article: What are kernels; Kernel density estimation; Kernels in Gaussian process; Squared exponential kernel; Matern kernel; ...

KDE

Kernels overview

5 minute read

In this article: What are kernels; Kernel density estimation; Kernels in Gaussian process; Squared exponential kernel; Matern kernel; ...

kernel-bandwidth

Kernels overview

5 minute read

In this article: What are kernels; Kernel density estimation; Kernels in Gaussian process; Squared exponential kernel; Matern kernel; ...

segmentation

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

unsupervised-learning

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

hierarchical-clustering

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

agglomerative-clustering

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

dendrogram

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

k-means

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

ward-linkage

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

affinity-propagation

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

spectral-clustering

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

DBSCAN

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

HDBSCAN

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

OPTICS

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

BIRCH

Clustering overview

21 minute read

Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...

unsupervised-machine-learning

Outliers detection

5 minute read

This article explores machine learning models used to identify unusual or atypical observations within data sets.

semi-supervised-machine-learning

Outliers detection

5 minute read

This article explores machine learning models used to identify unusual or atypical observations within data sets.

anomaly-detection

Outliers detection

5 minute read

This article explores machine learning models used to identify unusual or atypical observations within data sets.

novelty-detection

Outliers detection

5 minute read

This article explores machine learning models used to identify unusual or atypical observations within data sets.

local-outlier-factor

Outliers detection

5 minute read

This article explores machine learning models used to identify unusual or atypical observations within data sets.

LOF

Outliers detection

5 minute read

This article explores machine learning models used to identify unusual or atypical observations within data sets.

GLOSH

Outliers detection

5 minute read

This article explores machine learning models used to identify unusual or atypical observations within data sets.

isolation-forest

Outliers detection

5 minute read

This article explores machine learning models used to identify unusual or atypical observations within data sets.
