# Tutorial on anomaly detection task
```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from ice.anomaly_detection.datasets import AnomalyDetectionSmallTEP
from ice.anomaly_detection.models import AutoEncoderMLP
```
Download the dataset.
```python
dataset = AnomalyDetectionSmallTEP()
dataset.df
```
| run_id | sample | xmeas_1 | xmeas_2 | xmeas_3 | xmeas_4 | xmeas_5 | xmeas_6 | xmeas_7 | xmeas_8 | xmeas_9 | xmeas_10 | ... | xmv_2 | xmv_3 | xmv_4 | xmv_5 | xmv_6 | xmv_7 | xmv_8 | xmv_9 | xmv_10 | xmv_11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 413402073 | 1 | 0.25038 | 3674.0 | 4529.0 | 9.2320 | 26.889 | 42.402 | 2704.3 | 74.863 | 120.41 | 0.33818 | ... | 53.744 | 24.657 | 62.544 | 22.137 | 39.935 | 42.323 | 47.757 | 47.510 | 41.258 | 18.447 |
| | 2 | 0.25109 | 3659.4 | 4556.6 | 9.4264 | 26.721 | 42.576 | 2705.0 | 75.000 | 120.41 | 0.33620 | ... | 53.414 | 24.588 | 59.259 | 22.084 | 40.176 | 38.554 | 43.692 | 47.427 | 41.359 | 17.194 |
| | 3 | 0.25038 | 3660.3 | 4477.8 | 9.4426 | 26.875 | 42.070 | 2706.2 | 74.771 | 120.42 | 0.33563 | ... | 54.357 | 24.666 | 61.275 | 22.380 | 40.244 | 38.990 | 46.699 | 47.468 | 41.199 | 20.530 |
| | 4 | 0.24977 | 3661.3 | 4512.1 | 9.4776 | 26.758 | 42.063 | 2707.2 | 75.224 | 120.39 | 0.33553 | ... | 53.946 | 24.725 | 59.856 | 22.277 | 40.257 | 38.072 | 47.541 | 47.658 | 41.643 | 18.089 |
| | 5 | 0.29405 | 3679.0 | 4497.0 | 9.3381 | 26.889 | 42.650 | 2705.1 | 75.388 | 120.39 | 0.32632 | ... | 53.658 | 28.797 | 60.717 | 21.947 | 39.144 | 41.955 | 47.645 | 47.346 | 41.507 | 18.461 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 312148819 | 956 | 0.24842 | 3694.2 | 4491.2 | 9.3946 | 26.780 | 42.655 | 2708.3 | 74.765 | 120.41 | 0.32959 | ... | 53.891 | 24.580 | 63.320 | 21.867 | 38.868 | 36.061 | 48.088 | 45.470 | 41.463 | 17.078 |
| | 957 | 0.22612 | 3736.4 | 4523.1 | 9.3655 | 26.778 | 42.730 | 2711.0 | 75.142 | 120.38 | 0.32645 | ... | 53.675 | 21.831 | 64.142 | 22.027 | 38.842 | 39.144 | 44.560 | 45.598 | 41.591 | 16.720 |
| | 958 | 0.22386 | 3692.8 | 4476.5 | 9.3984 | 26.673 | 42.528 | 2712.7 | 74.679 | 120.43 | 0.32484 | ... | 54.233 | 22.053 | 59.228 | 22.235 | 39.040 | 35.116 | 45.737 | 45.490 | 41.884 | 16.310 |
| | 959 | 0.22561 | 3664.2 | 4483.0 | 9.4293 | 26.435 | 42.469 | 2710.2 | 74.857 | 120.38 | 0.31932 | ... | 53.335 | 22.248 | 60.567 | 21.820 | 37.979 | 33.394 | 48.503 | 45.512 | 40.630 | 20.996 |
| | 960 | 0.22585 | 3717.6 | 4492.8 | 9.4061 | 26.869 | 42.176 | 2710.5 | 74.722 | 120.41 | 0.31926 | ... | 53.217 | 22.225 | 63.429 | 22.259 | 37.986 | 34.810 | 47.810 | 45.639 | 41.898 | 18.378 |

153300 rows × 52 columns
```python
dataset.target
```

```
run_id     sample
413402073  1         0
           2         0
           3         0
           4         0
           5         0
                     ..
312148819  956       1
           957       1
           958       1
           959       1
           960       1
Name: target, Length: 153300, dtype: int64
```
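The target is 0 for normal samples and 1 for anomalous ones. Since plain accuracy is among the reported metrics, it is worth checking the class balance first; a toy sketch of that check (the labels here are made up — the real fractions depend on the dataset):

```python
import pandas as pd

# Toy stand-in for dataset.target: 0 = normal sample, 1 = anomaly.
target = pd.Series([0, 0, 0, 0, 1, 1], name="target", dtype="int64")

# Fraction of anomalous samples; a strong imbalance makes plain accuracy
# hard to interpret on its own.
anomaly_rate = target.mean()
print(anomaly_rate)
```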
Split the data into train and test sets by `run_id`; the dataset provides boolean `train_mask` and `test_mask` attributes for this.
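Splitting by `run_id` rather than by row keeps every sample of a run on the same side of the split, so sliding windows never mix training and test runs. A minimal sketch of how such a group-wise mask can be built on a toy frame with the same `(run_id, sample)` MultiIndex (run IDs and values here are made up):

```python
import numpy as np
import pandas as pd

# Toy frame mimicking dataset.df's (run_id, sample) MultiIndex.
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product(
    [[101, 102, 103, 104], range(1, 6)], names=["run_id", "sample"]
)
df = pd.DataFrame(rng.normal(size=(len(index), 3)), index=index)

# Hold out whole runs: every sample of a run lands on one side of the split.
train_runs = {101, 102, 103}
train_mask = df.index.get_level_values("run_id").isin(train_runs)
test_mask = ~train_mask

print(train_mask.sum(), test_mask.sum())  # 15 5
```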
Scale the data.
```python
scaler = StandardScaler()
dataset.df[dataset.train_mask] = scaler.fit_transform(dataset.df[dataset.train_mask])
dataset.df[dataset.test_mask] = scaler.transform(dataset.df[dataset.test_mask])
```
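Note that the scaler is fit on the training rows only and merely applied to the test rows, so no test statistics leak into training. A small self-contained check of that pattern, with hypothetical arrays standing in for the masked slices:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical train/test arrays standing in for the masked slices above.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 25.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean/std on train only
test_scaled = scaler.transform(test)        # reuses the train statistics

# Train columns are now zero-mean and unit-variance; the test row is scaled
# with the train mean/std, not its own.
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```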
Create the AutoEncoderMLP model.
```python
model = AutoEncoderMLP(window_size=10, lr=0.001, num_epochs=3, verbose=True)
model.fit(dataset.df[dataset.train_mask])
```

```
Epoch 1, Loss: 0.7313
Epoch 2, Loss: 0.6942
Epoch 3, Loss: 0.6606
```
```python
model.model
```

```
Sequential(
  (0): MLP(
    (mlp): Sequential(
      (0): Flatten(start_dim=1, end_dim=-1)
      (1): Linear(in_features=520, out_features=256, bias=True)
      (2): ReLU()
      (3): Linear(in_features=256, out_features=128, bias=True)
      (4): ReLU()
      (5): Linear(in_features=128, out_features=64, bias=True)
      (6): ReLU()
    )
  )
  (1): MLP(
    (mlp): Sequential(
      (0): Flatten(start_dim=1, end_dim=-1)
      (1): Linear(in_features=64, out_features=128, bias=True)
      (2): ReLU()
      (3): Linear(in_features=128, out_features=256, bias=True)
      (4): ReLU()
      (5): Linear(in_features=256, out_features=520, bias=True)
    )
  )
)
```
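The encoder's `in_features=520` follows from flattening each sliding window before the first `Linear` layer: `window_size` consecutive time steps times the 52 data columns. A quick arithmetic check:

```python
# Each training example is a sliding window over consecutive rows of the
# frame, flattened into a single vector before the first Linear layer.
window_size = 10   # as passed to AutoEncoderMLP above
n_columns = 52     # number of columns in dataset.df (see table above)
in_features = window_size * n_columns
print(in_features)  # 520
```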
Evaluate the metrics.
```python
metrics = model.evaluate(dataset.df[dataset.test_mask], dataset.target[dataset.test_mask])
metrics
```

```
{'accuracy': 0.7545263157894737,
 'true_positive_rate': [0.705975],
 'false_positive_rate': [0.04881012658227848]}
```
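A sketch of how these three metrics relate to confusion-matrix counts, using illustrative numbers (not the actual counts from the run above):

```python
# Illustrative confusion-matrix counts, chosen only to show the formulas.
tp, fn = 706, 294   # anomalies caught / missed
tn, fp = 951, 49    # normal samples kept / falsely flagged

accuracy = (tp + tn) / (tp + tn + fp + fn)
tpr = tp / (tp + fn)   # true_positive_rate (recall on anomalies)
fpr = fp / (fp + tn)   # false_positive_rate (false alarms on normal data)
print(round(tpr, 3), round(fpr, 3))  # 0.706 0.049
```

A high true positive rate with a low false positive rate is the goal: the model should flag most real faults while rarely disturbing normal operation.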