No-code ML-Lab
Set up training
Define the input variables, the target (if applicable), the size of the test set, and the model parameters.
⚙️ Model Configuration
Allows you to define the variables that will be used for training:
Features: selected fields as independent variables (model inputs).
Target variable: field that the model will attempt to predict (only in supervised learning).
Test size: proportion of the dataset that will be used to evaluate the model (for example, 20%).
Random state: optional number that controls the randomness of the partition (for reproducibility).
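The four settings above map naturally onto a standard train/test split. A minimal sketch, assuming a scikit-learn-style backend (the actual internals of the module are not documented here):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # features: independent variables (model inputs)
y = np.array([0, 1] * 5)           # target variable the model will try to predict

# Test size = 20% of the rows; a fixed random state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

With 10 rows and a 20% test size, 8 rows go to training and 2 to evaluation; rerunning with the same random state always yields the same partition.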

🧪 Hyperparameters
Shows the specific parameters of the selected algorithm. They vary depending on the estimator. In the case of a decision tree, for example:
Criterion: splitting criterion (e.g. gini, entropy).
max_depth: maximum depth of the tree.
min_samples_split: minimum samples to split a node.
min_samples_leaf: minimum samples in a leaf of the tree.
These parameters directly affect how the model is built and its performance.
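As an illustration, the decision-tree hyperparameters listed above correspond directly to constructor arguments in scikit-learn (the specific values below are arbitrary examples, not recommendations):

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    criterion="gini",      # splitting criterion ("gini" or "entropy")
    max_depth=3,           # maximum depth of the tree
    min_samples_split=2,   # minimum samples required to split a node
    min_samples_leaf=1,    # minimum samples allowed in a leaf
)

# Tiny perfectly separable dataset, just to show the model trains and predicts.
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
clf.fit(X, y)
```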
🧼 Preprocessing Details
In the preprocessing configuration interface of TOKII's No-code ML module, the treatment of each dataset variable can be customized before model training.

The options available by field allow the user to adjust the treatment of the data through three key operations:
✅ Scaling
Allows applying normalization or standardization techniques so that numerical variables are on a comparable scale. This is especially important for algorithms sensitive to the magnitude of the data (such as regression, neural networks, or K-means). Typical options include:
MinMax (normalizes between 0 and 1).
Z-score (mean 0, standard deviation 1).
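The two scaling options behave as sketched below, assuming scikit-learn-style transformers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

values = np.array([[10.0], [20.0], [40.0]])

minmax = MinMaxScaler().fit_transform(values)    # rescaled into [0, 1]
zscore = StandardScaler().fit_transform(values)  # mean 0, standard deviation 1
```

MinMax preserves the shape of the distribution but compresses it into a fixed range; Z-score centers the data, which distance-based algorithms such as K-means generally prefer.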
🧩 Encoding
Transforms categorical variables (like “On/Off” or region names) into numerical formats compatible with algorithms. Some examples:
One-hot encoding: creates a column for each category.
Label encoding: assigns a number to each distinct value.
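A quick sketch of both encodings applied to an "On/Off" field, assuming scikit-learn encoders:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

states = np.array([["On"], ["Off"], ["On"]])

# One-hot: one column per category ("Off", "On")
onehot = OneHotEncoder().fit_transform(states).toarray()

# Label encoding: one integer per distinct value (alphabetical: Off=0, On=1)
labels = LabelEncoder().fit_transform(states.ravel())
```

One-hot avoids implying an order between categories at the cost of extra columns; label encoding is compact but introduces an artificial ordering, so it suits tree-based models better than linear ones.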
🩹 Imputation Strategy
Defines how to handle missing values (null or empty) so that the model does not fail due to lack of data. Common strategies include:
Mean or median for numerical variables.
Most frequent value for categorical variables.
Record deletion (not always recommended when many values are missing).
Additionally, there is a global option to remove duplicates from the dataset before training, useful to avoid redundancy in the data and improve the quality of the model.
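The imputation strategies and the duplicate-removal option can be sketched as follows, assuming pandas plus scikit-learn's `SimpleImputer` (the column names are made up for the example):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "temp":  [20.0, np.nan, 30.0, 20.0],
    "state": ["On", "Off", np.nan, "On"],
})

# Numerical variable: replace missing values with the column mean.
df[["temp"]] = SimpleImputer(strategy="mean").fit_transform(df[["temp"]])

# Categorical variable: replace missing values with the most frequent value.
df[["state"]] = SimpleImputer(strategy="most_frequent").fit_transform(df[["state"]])

# Global option: remove duplicate records before training.
df = df.drop_duplicates()
```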
📊 Dataset Statistics
Includes:
Histogram by variable: shows the distribution of values for each field.
Categorical variables: visualized as the proportion of each category.
It is useful for understanding the behavior of the data before training the model.
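These statistics are the kind you can also compute directly with pandas; a minimal sketch with invented fields:

```python
import pandas as pd

df = pd.DataFrame({
    "temp":  [20, 21, 20, 35],
    "state": ["On", "Off", "On", "On"],
})

# Numerical field: summary of the distribution (a histogram would bin these values).
print(df["temp"].describe())

# Categorical field: shown as the proportion of each category.
proportions = df["state"].value_counts(normalize=True)
print(proportions)
```

Checking these summaries before training helps catch outliers (such as the 35 above) and heavily imbalanced categories early.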
➕ Sessions
Here you manage the training executions. You can:
CREATE new training sessions with the current configuration.
View results of each session later (status, metrics, descriptions, etc.).