Kernels overview
In this article What are kernels Kernel density estimation Kernels in Gaussian process Squared exponential kernel Matern kernel ...
In machine learning, the loss function (or cost, or cost function) is what lets the machine actually “learn”. The loss function is the function o...
Regularization is a process of simplifying complex machine learning models in order to prevent overfitting. It may be applied explicitly by adding a penalty ...
Variational inference is a technique which is usually employed for approximating complex models, where computation of the exact conditional distribution is i...
In machine learning, distance metrics are used as a measure of similarity between two data points - a smaller distance means more similarity. They are utiliz...
In numerical analysis, root finding is equivalent to finding the zeros of a continuous function. In case of complex and composite function finding values of a varia...
The classical Newton’s method for optimization is known for being computationally expensive when it comes to calculating the gradient of the Hessian matrix a...
Conjugate gradient methods are well suited for solving both linear and non-linear systems of equations.
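As a rough illustration of the linear case only, here is a minimal sketch of the conjugate gradient iteration for a system $Ax = b$, assuming $A$ is symmetric and positive definite. The function name `conjugate_gradient`, the tolerance, the iteration cap, and the small 2x2 system are all hypothetical choices made for this example.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b, assuming A is symmetric positive definite."""
    n = b.shape[0]
    # Illustrative cap on iterations; exact arithmetic needs at most n steps.
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b - A @ x          # residual
    p = r.copy()           # first search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)   # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        # New direction, conjugate to the previous ones with respect to A.
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Hypothetical example system.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
print(x)
```

The result can be checked against `np.linalg.solve(A, b)`; the non-linear variants replace the exact step length with a line search and are not covered by this sketch.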
Trust region is a general strategy of optimization of an objective function by constructing an approximation of the function around a certain point and measu...
Generally speaking, line search is one of the two (the other is trust region) major strategies in non-linear optimization. Line search methods first determin...
The Nelder-Mead optimization algorithm employs the concept of a simplex - a special type of polytope in $n$-dimensional space which has $n+1$ vertices, that i...
Linear programming aims at finding the best outcome of a mathematical problem subject to given constraints. The requirements for the linear programming problems are d...
Similarly to gradient descent, Newton’s method is used for finding a minimum of a function through iterations. Unlike gradient descent however, it uses a sec...
Gradient descent is the bread and butter of machine learning. It is a fundamental method of finding the minimum of the loss function in neural networks, hence it...
Bayesian inference is the method of statistical inference where an estimated probability is updated when new data arrives. In a way, it may be viewed as an u...
A set of discrete values or a range of continuous values of a random variable is characterized by a certain probability distribution. In statistics various d...
This is an overview of the most commonly used non-parametric statistics in hypothesis testing explaining when to use each. Unlike the parametric counterparts...
In statistics the likelihood function is used to estimate the goodness of fit of the parameters to the observed data. The maximum likelihood estimation (MLE)...
In real life situations it is often impossible to calculate statistics such as mean and variance for the whole population. Instead we may only be able to dra...
This is an overview of the most commonly used parametric statistics in hypothesis testing explaining when to use each. The parametric statistics here assume ...
In this article What is hypothesis testing P-value hacking Type I and type II errors Statistical power Power analysis Multiple hypothesis testin...
In statistics degrees of freedom is the number of independent elements forming a final statistic which are free to vary without violation of an imposed const...
When applying a model to a certain process it is usually impossible to be fully certain about the values of the parameters, the output of the process, and the...
In general, moment generating functions (MGF) are used as an alternative to working with the probability density function. The name implies that with MGF it is poss...
Generally speaking, interpolation is a way of generating new data points which exactly fit into some given set of points. It is used for approximation of com...
The Gaussian process may be viewed as a prediction technique which mainly solves regression problems by fitting a line to some given data (although it may be...
Regression analysis is used for estimating the relationship between variables, usually one dependent and one or several independent variables. Having a regre...
What is heteroscedasticity
Ordinary least squares method (OLS) is a widely used method which estimates the parameters of a linear regression by minimizing the sum of the squares of the...
What is covariance?
It is quite common to have a relation between two values represented as a scatterplot. However, visual representation of a scatterplot alone can often be non...
Although the application of linear regression is straightforward and characterized by many useful statistical properties, in real life many processes have no...
Building intuition
The mean value theorem states that if a function is continuous and differentiable between two points then there exists a point between them where the tangent...
One usually neglected topic in calculus is understanding Taylor series. At first glance they seem to be rather impractical, however, they form building block...
A differential equation is an equation with a function and at least one of its derivatives. Here is an example:
Given an equation, understanding of derivatives allows analyzing functions beyond simple plotting and eyeballing them. Below I will describe some of the tech...
Building intuition
Let’s build an intuition for the formal definition of a limit.
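For reference, the standard epsilon-delta definition is sketched below. The symbols $f$, $a$, $L$, $\varepsilon$ and $\delta$ are the conventional textbook choices and are assumed here, not taken from the article itself.

$$
\lim_{x \to a} f(x) = L
\quad \Longleftrightarrow \quad
\forall \varepsilon > 0 \;\; \exists \delta > 0 : \;
0 < |x - a| < \delta \implies |f(x) - L| < \varepsilon
$$

Read informally: however small a tolerance $\varepsilon$ around $L$ we pick, there is a neighbourhood of $a$ of radius $\delta$ in which $f(x)$ stays within that tolerance.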
Introduction
In order to understand more advanced concepts from vector operations it is good to have an intuition on such terms as vector projection. Below is a two-dimen...
Below I’ve provided intuition behind some of the properties of matrices, which are useful to understand in order to apply them in more complex algorithms. R...
In machine learning, eigenvectors and eigenvalues are among the most heavily used concepts, which among other things are fundamental for dimension reduction and fin...
QR decomposition is a numerically stable algorithm for solving systems of linear equations. Similar to singular value decomposition it provides mechanism...
Let’s recall that a basis is a set of linearly independent vectors which span the whole vector space. With regard to matrices the basis can be viewe...
Linear algebra is all about making transformations of linear combinations of numbers, called vectors, and arrays of vectors, called matrices. The elements of...
In this article we break down the exponential smoothing models which are used in time series forecasting.
In this article we break down the basic ARIMA time-series model. We shall describe each component of the model: autoregressive process (AR) an...
This article gives an overview of the statistical tests most commonly used to validate the assumptions of the time series.
Change point analysis is basically used for determining whether and where an ordered series of values (usually time series) changed their behaviour. The chan...
Logistic regression is one of the simplest algorithms for binary classification, and it is based on linear regression. At its core, it uses a linear com...
Support vector machine (SVM) is mainly used for building classification models in supervised machine learning, however the ideas employed by SVM can be also ...
This article explores machine learning models used to identify unusual or atypical observations within data sets.
Clustering may be viewed as grouping of data points according to their similarity so that the most similar points end up being in the same cluster. Clusterin...
Principal component analysis
When doing text analysis you may encounter a case where you need to select the text, or group of texts, most similar to another chosen text. One way to do it is to...
In this article
The tree-like algorithms are a wondrous array of machine learning methods that draw inspiration from the branching structure of the natural world. They emplo...
A Fourier series is a way to approximate a periodic function as the sum of simple sine and cosine functions which makes it useful for analyzing and synthesiz...