Tutorial on the fault diagnosis task
import numpy as np
from sklearn.preprocessing import StandardScaler
from ice.fault_diagnosis.datasets import FaultDiagnosisSmallTEP
from ice.fault_diagnosis.models import MLP
Download the dataset.
dataset = FaultDiagnosisSmallTEP()
Downloading small_tep: 100%|██████████| 18.2M/18.2M [00:01<00:00, 10.1MB/s]
Extracting df.csv: 58.6MB [00:00, 153MB/s]
Extracting train_mask.csv: 9.77MB [00:00, 1.60GB/s]
Extracting target.csv: 9.77MB [00:00, 2.33GB/s]
Reading data/small_tep/df.csv: 100%|██████████| 153300/153300 [00:01<00:00, 79692.16it/s]
Reading data/small_tep/target.csv: 100%|██████████| 153300/153300 [00:00<00:00, 1771006.14it/s]
Reading data/small_tep/train_mask.csv: 100%|██████████| 153300/153300 [00:00<00:00, 1837204.89it/s]
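The loaded dataset object exposes the feature table dataset.df, the per-sample labels dataset.target, and the boolean masks used later in this tutorial. A minimal sanity check on what was just read (the expected shapes follow from the outputs shown below):

print(dataset.df.shape)          # (153300, 52) feature table
print(dataset.target.shape)      # (153300,) one label per row
print(dataset.train_mask.sum())  # number of rows assigned to the training set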
dataset.df
| run_id | sample | xmeas_1 | xmeas_2 | xmeas_3 | xmeas_4 | xmeas_5 | xmeas_6 | xmeas_7 | xmeas_8 | xmeas_9 | xmeas_10 | ... | xmv_2 | xmv_3 | xmv_4 | xmv_5 | xmv_6 | xmv_7 | xmv_8 | xmv_9 | xmv_10 | xmv_11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 413402073 | 1 | 0.25038 | 3674.0 | 4529.0 | 9.2320 | 26.889 | 42.402 | 2704.3 | 74.863 | 120.41 | 0.33818 | ... | 53.744 | 24.657 | 62.544 | 22.137 | 39.935 | 42.323 | 47.757 | 47.510 | 41.258 | 18.447 |
| | 2 | 0.25109 | 3659.4 | 4556.6 | 9.4264 | 26.721 | 42.576 | 2705.0 | 75.000 | 120.41 | 0.33620 | ... | 53.414 | 24.588 | 59.259 | 22.084 | 40.176 | 38.554 | 43.692 | 47.427 | 41.359 | 17.194 |
| | 3 | 0.25038 | 3660.3 | 4477.8 | 9.4426 | 26.875 | 42.070 | 2706.2 | 74.771 | 120.42 | 0.33563 | ... | 54.357 | 24.666 | 61.275 | 22.380 | 40.244 | 38.990 | 46.699 | 47.468 | 41.199 | 20.530 |
| | 4 | 0.24977 | 3661.3 | 4512.1 | 9.4776 | 26.758 | 42.063 | 2707.2 | 75.224 | 120.39 | 0.33553 | ... | 53.946 | 24.725 | 59.856 | 22.277 | 40.257 | 38.072 | 47.541 | 47.658 | 41.643 | 18.089 |
| | 5 | 0.29405 | 3679.0 | 4497.0 | 9.3381 | 26.889 | 42.650 | 2705.1 | 75.388 | 120.39 | 0.32632 | ... | 53.658 | 28.797 | 60.717 | 21.947 | 39.144 | 41.955 | 47.645 | 47.346 | 41.507 | 18.461 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 312148819 | 956 | 0.24842 | 3694.2 | 4491.2 | 9.3946 | 26.780 | 42.655 | 2708.3 | 74.765 | 120.41 | 0.32959 | ... | 53.891 | 24.580 | 63.320 | 21.867 | 38.868 | 36.061 | 48.088 | 45.470 | 41.463 | 17.078 |
| | 957 | 0.22612 | 3736.4 | 4523.1 | 9.3655 | 26.778 | 42.730 | 2711.0 | 75.142 | 120.38 | 0.32645 | ... | 53.675 | 21.831 | 64.142 | 22.027 | 38.842 | 39.144 | 44.560 | 45.598 | 41.591 | 16.720 |
| | 958 | 0.22386 | 3692.8 | 4476.5 | 9.3984 | 26.673 | 42.528 | 2712.7 | 74.679 | 120.43 | 0.32484 | ... | 54.233 | 22.053 | 59.228 | 22.235 | 39.040 | 35.116 | 45.737 | 45.490 | 41.884 | 16.310 |
| | 959 | 0.22561 | 3664.2 | 4483.0 | 9.4293 | 26.435 | 42.469 | 2710.2 | 74.857 | 120.38 | 0.31932 | ... | 53.335 | 22.248 | 60.567 | 21.820 | 37.979 | 33.394 | 48.503 | 45.512 | 40.630 | 20.996 |
| | 960 | 0.22585 | 3717.6 | 4492.8 | 9.4061 | 26.869 | 42.176 | 2710.5 | 74.722 | 120.41 | 0.31926 | ... | 53.217 | 22.225 | 63.429 | 22.259 | 37.986 | 34.810 | 47.810 | 45.639 | 41.898 | 18.378 |

153300 rows × 52 columns
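The table is indexed by a (run_id, sample) MultiIndex: run_id identifies one simulated run of the Tennessee Eastman process, and sample indexes time within that run. A small illustrative check of how many distinct runs the data contains:

print(dataset.df.index.get_level_values('run_id').nunique())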
dataset.target
run_id sample
413402073 1 0
2 0
3 0
4 0
5 0
..
312148819 956 20
957 20
958 20
959 20
960 20
Name: target, Length: 153300, dtype: int64
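The labels follow the usual TEP convention: 0 for normal operation and 1-20 for the fault types, 21 classes in total. A quick look at the class balance:

print(dataset.target.value_counts().sort_index())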
Split the data into train and test sets by run_id. The dataset already provides boolean masks for this, dataset.train_mask and dataset.test_mask, so the split keeps whole runs together rather than mixing samples from one run across both sets.
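An optional sanity check, assuming both masks are boolean Series aligned with the DataFrame index:

train_runs = set(dataset.df[dataset.train_mask].index.get_level_values('run_id'))
test_runs = set(dataset.df[dataset.test_mask].index.get_level_values('run_id'))
assert train_runs.isdisjoint(test_runs)  # whole runs never cross the split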
Scale the data: fit the scaler on the training rows only, then apply the same transform to the test rows to avoid leaking test statistics into training.
scaler = StandardScaler()
dataset.df[dataset.train_mask] = scaler.fit_transform(dataset.df[dataset.train_mask])
dataset.df[dataset.test_mask] = scaler.transform(dataset.df[dataset.test_mask])
Create the MLP model. window_size=10 means each prediction is made from a window of 10 consecutive samples; lr sets the learning rate.
model = MLP(window_size=10, lr=0.001, verbose=True)
model.fit(dataset.df[dataset.train_mask], dataset.target[dataset.train_mask])
Creating sequence of samples: 100%|██████████| 105/105 [00:00<00:00, 777.58it/s]
Epochs ...: 10%|█ | 1/10 [00:02<00:24, 2.70s/it]
Epoch 1, Loss: 0.4350
Epochs ...: 20%|██ | 2/10 [00:05<00:23, 2.88s/it]
Epoch 2, Loss: 0.5717
Epochs ...: 30%|███ | 3/10 [00:08<00:19, 2.82s/it]
Epoch 3, Loss: 0.3677
Epochs ...: 40%|████ | 4/10 [00:11<00:16, 2.79s/it]
Epoch 4, Loss: 0.6005
Epochs ...: 50%|█████ | 5/10 [00:13<00:13, 2.77s/it]
Epoch 5, Loss: 0.7325
Epochs ...: 60%|██████ | 6/10 [00:16<00:11, 2.86s/it]
Epoch 6, Loss: 0.3584
Epochs ...: 70%|███████ | 7/10 [00:19<00:08, 2.87s/it]
Epoch 7, Loss: 0.5394
Epochs ...: 80%|████████ | 8/10 [00:22<00:05, 2.81s/it]
Epoch 8, Loss: 0.3939
Epochs ...: 90%|█████████ | 9/10 [00:27<00:03, 3.57s/it]
Epoch 9, Loss: 0.3941
Epochs ...: 100%|██████████| 10/10 [00:35<00:00, 3.52s/it]
Epoch 10, Loss: 0.4601
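The "Creating sequence of samples" step reflects window_size=10: instead of classifying single rows, the model consumes short windows of consecutive samples. The sketch below illustrates the windowing idea on synthetic data; it is not the library's internal implementation:

import numpy as np

X = np.random.randn(960, 52)  # one synthetic run: 960 samples, 52 sensors
window_size = 10
# Stack overlapping windows of 10 consecutive samples each.
windows = np.stack([X[i:i + window_size] for i in range(len(X) - window_size + 1)])
print(windows.shape)  # (951, 10, 52)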
Evaluate the model on the test set.
metrics = model.evaluate(dataset.df[dataset.test_mask], dataset.target[dataset.test_mask])
metrics
Creating sequence of samples: 100%|██████████| 105/105 [00:00<00:00, 345.39it/s]
{'accuracy': 0.7945664160401003}
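Accuracy around 0.79 summarizes performance across all 21 classes at once. For per-class insight, scikit-learn's classification_report is handy; the snippet below is an illustration on toy labels, since obtaining per-window predictions from the model is outside the scope of this tutorial:

from sklearn.metrics import classification_report
import numpy as np

# Illustrative only: toy true/predicted labels standing in for real model output.
y_true = np.array([0, 0, 1, 20, 20, 1])
y_pred = np.array([0, 1, 1, 20, 0, 1])
print(classification_report(y_true, y_pred, zero_division=0))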