Machine learning
Clustering
🧩 What is it?
Clustering (or grouping) is an unsupervised machine learning technique used to discover natural groups within a dataset without requiring the data to be labeled beforehand. Unlike classification and regression, where the model learns from known examples, a clustering algorithm detects patterns and similarities on its own and groups together the data points that share common characteristics.
🔧 How does it work?
The algorithm analyzes the dataset and calculates the similarity between its data points based on a chosen criterion, such as the distance between points in a multidimensional space. Euclidean distance is the most common choice for numerical data: the smaller the distance between two points, the greater their similarity.
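As a minimal sketch of the distance idea, the snippet below computes the Euclidean distance between three made-up sensor readings. The readings, their units, and the helper name `euclidean` are assumptions chosen for illustration only.

```python
import math

# Hypothetical readings: (indoor temperature in °C, energy use in kWh)
a = (21.0, 120.0)
b = (22.0, 125.0)
c = (30.0, 400.0)

def euclidean(p, q):
    """Straight-line distance between two points with the same number of dimensions."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

d_ab = euclidean(a, b)  # small distance: a and b are similar
d_ac = euclidean(a, c)  # large distance: a and c are dissimilar

print(f"a-b: {d_ab:.2f}")
print(f"a-c: {d_ac:.2f}")
```

Note that the kWh values are numerically much larger than the temperatures, so consumption dominates the result. For that reason, features measured in different units are usually rescaled to a comparable range before distances are computed.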
From these similarities, it forms groups or clusters, so that the elements within the same group are more similar to each other than to those of other groups. These groups are not predefined; instead, the model builds them automatically based on the data.
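The property described above (members of a group are closer to each other than to members of other groups) can be checked numerically. The following sketch assumes a set of hypothetical 2-D points whose group labels are already known, and compares the average distance within groups to the average distance between groups.

```python
import math
from itertools import combinations

# Hypothetical points and an assumed grouping (labels 0 and 1).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels = [0, 0, 0, 1, 1, 1]

within, between = [], []
# Visit every unordered pair of points exactly once.
for (i, p), (j, q) in combinations(enumerate(points), 2):
    distance = math.dist(p, q)
    if labels[i] == labels[j]:
        within.append(distance)   # both points belong to the same group
    else:
        between.append(distance)  # points belong to different groups

mean_within = sum(within) / len(within)
mean_between = sum(between) / len(between)
print(f"mean distance within groups:  {mean_within:.2f}")
print(f"mean distance between groups: {mean_between:.2f}")
```

A good grouping shows a within-group average that is clearly smaller than the between-group average. Clustering algorithms search for a labeling with this property instead of receiving it as input.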
🧠 When is it used?
Clustering is very useful when there is no known target variable but you want to explore or segment the data to understand its structure. In industrial or logistics environments using TOKII, it can be applied to:
Segmentation of equipment or sensors according to similar operational behavior.
Grouping of energy consumption patterns to detect areas of efficiency or waste.
Detection of atypical behaviors by identifying points that do not fit into any group.
Grouping of assets based on their maintenance history or performance.
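The third use case, finding points that do not fit into any group, is what density-based algorithms such as DBSCAN provide through their "noise" label. The code below is a simplified, pure-Python sketch of DBSCAN, not a production implementation. The data, the `eps` radius, and the `min_samples` threshold are illustrative assumptions.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN sketch: returns one label per point, -1 meaning noise."""
    n = len(points)
    # Neighborhood of each point: indices within eps (the point itself included).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue  # already visited
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is also a core point: keep growing
    return labels

# Hypothetical, already rescaled sensor readings: two dense groups and one stray point.
readings = [
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (9.0, 1.0),
]
labels = dbscan(readings, eps=0.3, min_samples=3)
outliers = [p for p, label in zip(readings, labels) if label == NOISE]
print(labels)
print(outliers)
```

The stray reading has no neighbors within `eps`, so it is never absorbed into a cluster and keeps the noise label. In practice, both parameters need tuning to the scale and density of the real data.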
🎯 Practical example: Grouping of buildings
Imagine you manage several buildings and have sensors that record variables such as indoor temperature, energy consumption, number of people, and use of the air conditioning system. You do not know in advance what type of behavior each building has, but you are interested in knowing if there are usage patterns.
With clustering, you can apply an algorithm like K-Means or DBSCAN to that historical data. The model will group the buildings according to their similar patterns: for example, it may detect that some have a high-occupancy, high-consumption profile (possibly offices), others have low occupancy and stable consumption (warehouses), and others show intermittent use (event centers).
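The building scenario can be sketched in a few lines. Everything in the snippet is an assumption made for illustration: the six buildings, their three features, the choice of K-Means with three clusters, and the deterministic "farthest-first" seeding used to keep the run reproducible (libraries typically use randomized seeding instead). Features are standardized first so that no single unit dominates the distance.

```python
import math
import statistics

# Hypothetical per-building averages:
# (people present, energy use in kWh/day, day-to-day variability of energy use)
buildings = {
    "B1": (200.0, 1000.0, 0.10),
    "B2": (220.0, 1100.0, 0.12),
    "B3": (10.0, 300.0, 0.05),
    "B4": (15.0, 280.0, 0.06),
    "B5": (80.0, 500.0, 0.80),
    "B6": (90.0, 550.0, 0.75),
}

def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Minimal K-Means sketch: returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(statistics.mean(col) for col in zip(*members))
    return labels

names = list(buildings)
labels = kmeans(standardize(list(buildings.values())), k=3)

groups = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)
for label, members in sorted(groups.items()):
    print(label, members)
```

With these made-up numbers, the six buildings fall into three pairs that match the office, warehouse, and event-center profiles. The algorithm only returns numbered groups; interpreting a group as "offices" or "warehouses" remains a task for whoever reads the result.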
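The interesting thing is that you don’t need to tell the model how the buildings behave: the algorithm discovers the groups for you. With DBSCAN, even the number of groups emerges from the data, given a neighborhood radius and a minimum number of points per group. K-Means, by contrast, requires you to choose the number of clusters beforehand. Either way, the result allows you to make specific decisions by group, optimize maintenance, or set differentiated alerts according to the usage type actually detected.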