Tutorial on fault diagnosis task

This tutorial trains and evaluates a fault diagnosis model on the small Tennessee Eastman Process (TEP) benchmark using the ice library.

import numpy as np
from sklearn.preprocessing import StandardScaler

from ice.fault_diagnosis.datasets import FaultDiagnosisSmallTEP
from ice.fault_diagnosis.models import MLP

Download the dataset. The FaultDiagnosisSmallTEP constructor fetches the archive, unpacks it into data/small_tep/, and reads three artifacts: the sensor data (df.csv), the labels (target.csv), and the train/test split (train_mask.csv).

dataset = FaultDiagnosisSmallTEP()
Downloading small_tep: 100%|██████████| 18.2M/18.2M [00:01<00:00, 10.1MB/s]
Extracting df.csv: 58.6MB [00:00, 153MB/s]                            
Extracting train_mask.csv: 9.77MB [00:00, 1.60GB/s]                   
Extracting target.csv: 9.77MB [00:00, 2.33GB/s]                   
Reading data/small_tep/df.csv: 100%|██████████| 153300/153300 [00:01<00:00, 79692.16it/s]
Reading data/small_tep/target.csv: 100%|██████████| 153300/153300 [00:00<00:00, 1771006.14it/s]
Reading data/small_tep/train_mask.csv: 100%|██████████| 153300/153300 [00:00<00:00, 1837204.89it/s]
dataset.df
                  xmeas_1  xmeas_2  xmeas_3  xmeas_4  xmeas_5  xmeas_6  xmeas_7  xmeas_8  xmeas_9  xmeas_10  ...   xmv_2   xmv_3   xmv_4   xmv_5   xmv_6   xmv_7   xmv_8   xmv_9  xmv_10  xmv_11
run_id    sample
413402073 1       0.25038   3674.0   4529.0   9.2320   26.889   42.402   2704.3   74.863   120.41   0.33818  ...  53.744  24.657  62.544  22.137  39.935  42.323  47.757  47.510  41.258  18.447
          2       0.25109   3659.4   4556.6   9.4264   26.721   42.576   2705.0   75.000   120.41   0.33620  ...  53.414  24.588  59.259  22.084  40.176  38.554  43.692  47.427  41.359  17.194
          3       0.25038   3660.3   4477.8   9.4426   26.875   42.070   2706.2   74.771   120.42   0.33563  ...  54.357  24.666  61.275  22.380  40.244  38.990  46.699  47.468  41.199  20.530
          4       0.24977   3661.3   4512.1   9.4776   26.758   42.063   2707.2   75.224   120.39   0.33553  ...  53.946  24.725  59.856  22.277  40.257  38.072  47.541  47.658  41.643  18.089
          5       0.29405   3679.0   4497.0   9.3381   26.889   42.650   2705.1   75.388   120.39   0.32632  ...  53.658  28.797  60.717  21.947  39.144  41.955  47.645  47.346  41.507  18.461
...                   ...      ...      ...      ...      ...      ...      ...      ...      ...       ...  ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
312148819 956     0.24842   3694.2   4491.2   9.3946   26.780   42.655   2708.3   74.765   120.41   0.32959  ...  53.891  24.580  63.320  21.867  38.868  36.061  48.088  45.470  41.463  17.078
          957     0.22612   3736.4   4523.1   9.3655   26.778   42.730   2711.0   75.142   120.38   0.32645  ...  53.675  21.831  64.142  22.027  38.842  39.144  44.560  45.598  41.591  16.720
          958     0.22386   3692.8   4476.5   9.3984   26.673   42.528   2712.7   74.679   120.43   0.32484  ...  54.233  22.053  59.228  22.235  39.040  35.116  45.737  45.490  41.884  16.310
          959     0.22561   3664.2   4483.0   9.4293   26.435   42.469   2710.2   74.857   120.38   0.31932  ...  53.335  22.248  60.567  21.820  37.979  33.394  48.503  45.512  40.630  20.996
          960     0.22585   3717.6   4492.8   9.4061   26.869   42.176   2710.5   74.722   120.41   0.31926  ...  53.217  22.225  63.429  22.259  37.986  34.810  47.810  45.639  41.898  18.378

153300 rows × 52 columns
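
The frame is indexed by (run_id, sample): each row is one time step of one simulated run, described by 41 measured variables (xmeas_*) and 11 manipulated variables (xmv_*), 52 sensor channels in total. Plain pandas calls confirm the layout:

print(list(dataset.df.index.names))  # ['run_id', 'sample']
print(dataset.df.shape)              # (153300, 52)
print(dataset.df.index.get_level_values("run_id").nunique())  # number of distinct runs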

dataset.target
run_id     sample
413402073  1          0
           2          0
           3          0
           4          0
           5          0
                     ..
312148819  956       20
           957       20
           958       20
           959       20
           960       20
Name: target, Length: 153300, dtype: int64
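
Labels follow the small TEP convention: 0 is normal operation and 1 through 20 are the fault types. The class balance can be checked with plain pandas:

print(dataset.target.value_counts().sort_index())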

The data is already split into train and test sets by run_id: the dataset ships with boolean masks, dataset.train_mask and dataset.test_mask, so no manual splitting is needed.
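
A quick sanity check that no run leaks across the split (assuming the masks are boolean and aligned with the frame's index):

train_runs = set(dataset.df[dataset.train_mask].index.get_level_values("run_id"))
test_runs = set(dataset.df[dataset.test_mask].index.get_level_values("run_id"))
assert not (train_runs & test_runs)  # no run_id appears in both sets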

Scale the data. The scaler is fit on the training rows only, so no test-set statistics leak into the model.

scaler = StandardScaler()
# Fit on the training rows only, then reuse the same statistics for the test rows.
dataset.df[dataset.train_mask] = scaler.fit_transform(dataset.df[dataset.train_mask])
dataset.df[dataset.test_mask] = scaler.transform(dataset.df[dataset.test_mask])
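
A quick check that the transform behaved as expected: after standardization the training columns should have mean close to 0 and standard deviation close to 1.

print(dataset.df[dataset.train_mask].mean().abs().max())  # close to 0
print(dataset.df[dataset.train_mask].std().mean())        # close to 1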

Create and train the MLP model.

model = MLP(window_size=10, lr=0.001, verbose=True)
model.fit(dataset.df[dataset.train_mask], dataset.target[dataset.train_mask])
Creating sequence of samples: 100%|██████████| 105/105 [00:00<00:00, 777.58it/s]
Epochs ...:  10%|█         | 1/10 [00:02<00:24,  2.70s/it]
Epoch 1, Loss: 0.4350
Epochs ...:  20%|██        | 2/10 [00:05<00:23,  2.88s/it]
Epoch 2, Loss: 0.5717
Epochs ...:  30%|███       | 3/10 [00:08<00:19,  2.82s/it]
Epoch 3, Loss: 0.3677
Epochs ...:  40%|████      | 4/10 [00:11<00:16,  2.79s/it]
Epoch 4, Loss: 0.6005
Epochs ...:  50%|█████     | 5/10 [00:13<00:13,  2.77s/it]
Epoch 5, Loss: 0.7325
Epochs ...:  60%|██████    | 6/10 [00:16<00:11,  2.86s/it]
Epoch 6, Loss: 0.3584
Epochs ...:  70%|███████   | 7/10 [00:19<00:08,  2.87s/it]
Epoch 7, Loss: 0.5394
Epochs ...:  80%|████████  | 8/10 [00:22<00:05,  2.81s/it]
Epoch 8, Loss: 0.3939
Epochs ...:  90%|█████████ | 9/10 [00:27<00:03,  3.57s/it]
Epoch 9, Loss: 0.3941
Epochs ...: 100%|██████████| 10/10 [00:35<00:00,  3.52s/it]
Epoch 10, Loss: 0.4601
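
The window_size=10 argument makes each training example a window of 10 consecutive time steps within one run, which is what the "Creating sequence of samples" step above builds. An illustrative numpy sketch of such windowing (not the library's internal code; the run length of 500 is a hypothetical value):

import numpy as np

run = np.random.randn(500, 52)  # one hypothetical run: 500 steps x 52 sensors
windows = np.lib.stride_tricks.sliding_window_view(run, 10, axis=0)
print(windows.shape)  # (491, 52, 10): one window per valid starting step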

Evaluate the model on the test set.

metrics = model.evaluate(dataset.df[dataset.test_mask], dataset.target[dataset.test_mask])
metrics
Creating sequence of samples: 100%|██████████| 105/105 [00:00<00:00, 345.39it/s]
{'accuracy': 0.7945664160401003}
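
The overall accuracy of roughly 0.79 averages over all 21 classes; for diagnosis, per-class recall is often more informative. A minimal sketch of that computation, assuming aligned arrays of true and predicted labels are available (for example from a predict method, if the installed ice version provides one; the arrays below are random placeholders):

import numpy as np
from sklearn.metrics import confusion_matrix

# Random placeholder labels standing in for real test targets and predictions.
y_true = np.random.randint(0, 21, size=5000)
y_pred = np.where(np.random.rand(5000) < 0.8, y_true, (y_true + 1) % 21)

cm = confusion_matrix(y_true, y_pred, labels=np.arange(21))
per_class_recall = cm.diagonal() / np.maximum(cm.sum(axis=1), 1)
print(per_class_recall.round(3))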