Machine learning is a type of artificial intelligence that enables computers to detect patterns and establish baseline behavior using algorithms that learn through training or observation. It can process and analyze vast amounts of data that are simply impractical for humans.

  • Supervised learning — the machine is presented with a set of inputs and expected outputs, later given a new input the output is predicted.
  • Unsupervised learning — the machine aims to find patterns, within a dataset without explicit input from a human as to what these patterns might look like.


Clustering is the assignment of objects to homogeneous groups (called clusters) while making sure that objects in different groups are not similar. Clustering is considered an unsupervised task as it aims to describe the hidden structure of the objects.


There are many clustering algorithms, each has its advantages and disadvantages. A popular algorithm for clustering is K-means, which aims to identify the best k cluster centers in an iterative manner. Cluster centers are served as “representatives” of the objects associated with the cluster. k-means’ key features are also its drawbacks:

  • The number of clusters (k) must be given explicitly. In some cases, the number of different groups is unknown.
  • k-means iterative nature might lead to an incorrect result due to convergence to a local minimum.
  • The clusters are assumed to be spherical.


A different clustering algorithm is OPTICS, which is a density-based clustering algorithm. Density-based clustering, unlike centroid-based clustering, works by identifying “dense” clusters of points, allowing it to learn clusters of arbitrary shape and densities. OPTICS can also identify outliers (noise) in the data by identifying scattered objects.

Dimensionality Reduction

In the field of machine learning, it is useful to apply a process called dimensionality reduction to high dimensional data. The purpose of this process is to reduce the number of features under consideration, where each feature is a dimension that partly represents the objects.

  • Selecting from the existing features (feature selection)
  • Extracting new features by combining the existing features (feature extraction)



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store