In the previous post, we introduced what word embeddings are and what they can do. This time, we’ll try to make sense of them. What problem do they solve? How can they help computers understand natural language?

If you already have a solid understanding of word embeddings and are well into your data science career, skip ahead to the next part!

Subspace embedding is a powerful tool for simplifying matrix computations and analyzing high-dimensional data, especially sparse matrices.

Dimensionality is a serious problem in modern data analysis: today's massive datasets tend to be both very sparse and very high-dimensional. Data scientists have long used tools such as principal component analysis (PCA) and independent component analysis (ICA) to project high-dimensional data onto a subspace, but these techniques rely on computing the eigenvectors of an $n \times n$ matrix, a very expensive operation (e.g., spectral decomposition) for high dimension $n$. Moreover, even though the eigenspace has many important properties, it does not yield good approximations for many useful measures such as vector norms. Here we discuss another method, random projection, to reduce dimensionality.
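As a rough illustration of the idea (not the exact construction discussed in the post), here is a minimal sketch of a Gaussian random projection in NumPy: multiplying by a scaled random matrix reduces the dimension while, by the Johnson–Lindenstrauss lemma, approximately preserving pairwise distances with high probability. The function name and signature are my own for illustration.

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Project the rows of X (shape n x d) down to k dimensions.

    Uses a Gaussian random matrix R with entries N(0, 1/k); the 1/sqrt(k)
    scaling makes E[||x R||^2] = ||x||^2, so norms and pairwise distances
    are preserved in expectation, with error shrinking as k grows.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.standard_normal((d, k)) / np.sqrt(k)  # d x k projection matrix
    return X @ R  # no eigendecomposition needed, just one matrix product
```

Note the contrast with PCA: there is no $n \times n$ eigenproblem to solve, only a single matrix multiply, which is why random projection scales well to sparse, high-dimensional data.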

A short summary and comparison of different platforms, based on this blog and Zhang et al. (2017).

This post gives a nutshell description of the bias-variance decomposition.

Are we really stuck in local minima, or is something else going on?

This post introduces some normalization-related tricks used in neural networks.

This post helps beginners understand the math behind Gradient Descent (GD).

If you’re writing an article for this blog, please follow these guidelines.