Anomaly detection models

BaseAnomalyDetection

class ice.anomaly_detection.models.base.BaseAnomalyDetection(window_size: int, stride: int, batch_size: int, lr: float, num_epochs: int, device: str, verbose: bool, name: str, random_seed: int, val_ratio: float, save_checkpoints: bool, threshold_level: float = 0.95)

Bases: BaseModel, ABC

Base class for all anomaly detection models.

Parameters:

window_size (int) – The window size to train the model.
stride (int) – The time interval between the first points of consecutive sliding windows in training.
batch_size (int) – The batch size to train the model.
lr (float) – The learning rate to train the model.
num_epochs (int) – The number of epochs to train the model.
device (str) – The name of the device to train the model on. Possible values are cpu and cuda.
verbose (bool) – If true, show the progress bar in training.
name (str) – The name of the model for artifact storing.
random_seed (int) – Seed for random number generation to ensure reproducible results.
val_ratio (float) – Proportion of the dataset used for validation, between 0 and 1.
save_checkpoints (bool) – If true, store checkpoints.
threshold_level (float) – Takes a value from 0 to 1. It specifies the quantile in the distribution of errors on the training dataset at which the threshold value is set.
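
The interaction between window_size and stride can be illustrated with a small NumPy sketch. It is not the library's own windowing code: the helper window_starts and the toy array are hypothetical, and the sketch assumes that windows are cut from a single run and that only full windows are kept.

```python
import numpy as np


def window_starts(n_samples: int, window_size: int, stride: int) -> list[int]:
    """Return the first index of every full window that fits in n_samples points."""
    if window_size > n_samples:
        return []
    return list(range(0, n_samples - window_size + 1, stride))


# Hypothetical run of 10 time steps from 3 sensors.
run = np.arange(30, dtype=float).reshape(10, 3)
starts = window_starts(len(run), window_size=4, stride=2)
windows = np.stack([run[s:s + 4] for s in starts])

print(starts)         # [0, 2, 4, 6]
print(windows.shape)  # (4, 4, 3)
```

A larger stride yields fewer, less overlapping windows, which shortens training at the cost of fewer training samples.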

fit(df: DataFrame, target: Optional[Series] = None, epochs: Optional[int] = None, save_path: Optional[str] = None, trial: Optional[Trial] = None, force_model_ctreation: bool = False)

Fit (train) the model on a given dataset.

Parameters:

df (pandas.DataFrame) – A dataframe with sensor data. The index has two levels: run_id and sample. All other columns are sensor values.
target (pandas.Series) – A series with target values. The index has two levels: run_id and sample. It is omitted for the anomaly detection task.
epochs (int) – The number of epochs for the training step. If None, the self.num_epochs parameter is used.
save_path (str) – Path to save checkpoints. If None, the path is created automatically.
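
As an illustration of the expected input layout, the sketch below builds a small dataframe with a two-level index named run_id and sample. The run ids, the sensor column names, and the random values are made up for the example; only the index structure follows the description above.

```python
import numpy as np
import pandas as pd

# Two hypothetical runs with five samples each; the sensor names are made up.
index = pd.MultiIndex.from_product(
    [[1, 2], range(5)], names=["run_id", "sample"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(len(index), 3)),
    index=index,
    columns=["sensor_1", "sensor_2", "sensor_3"],
)

print(list(df.index.names))  # ['run_id', 'sample']
print(df.shape)              # (10, 3)
```

A frame shaped like this would presumably be passed as the df argument of fit, with target left as None for anomaly detection.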

load_checkpoint(checkpoint_path: str)

Load checkpoint.

Parameters:

checkpoint_path (str) – Path of the checkpoint to load.

AutoEncoderMLP

class ice.anomaly_detection.models.autoencoder.AutoEncoderMLP(window_size: int, stride: int = 1, batch_size: int = 128, lr: float = 0.001, num_epochs: int = 10, device: str = 'cpu', verbose: bool = False, name: str = 'ae_anomaly_detection', random_seed: int = 42, val_ratio: float = 0.15, save_checkpoints: bool = False, threshold_level: float = 0.95, hidden_dims: list = [256, 128, 64])

Bases: BaseAnomalyDetection

The MLP autoencoder consists of an MLP encoder and an MLP decoder. Each input window is flattened from (B, L, C) to (B, L * C) for the computation, and the output is reshaped back from (B, L * C) to (B, L, C), where B is the batch size, L is the sequence length, and C is the number of sensors.
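
The two reshapes can be sketched with NumPy as follows. The array sizes are arbitrary, and the encoder and decoder layers themselves are left out, so this shows only the shape handling, not the model.

```python
import numpy as np

B, L, C = 2, 4, 3  # batch size, sequence length, number of sensors
batch = np.arange(B * L * C, dtype=float).reshape(B, L, C)

flat = batch.reshape(B, L * C)    # network input: (B, L, C) -> (B, L * C)
restored = flat.reshape(B, L, C)  # network output: (B, L * C) -> (B, L, C)

print(flat.shape)                       # (2, 12)
print(np.array_equal(batch, restored))  # True
```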

Parameters:

window_size (int) – The window size to train the model.
stride (int) – The time interval between the first points of consecutive sliding windows in training.
batch_size (int) – The batch size to train the model.
lr (float) – The learning rate to train the model.
num_epochs (int) – The number of epochs to train the model.
device (str) – The name of the device to train the model on. Possible values are cpu and cuda.
verbose (bool) – If true, show the progress bar in training.
name (str) – The name of the model for artifact storing.
random_seed (int) – Seed for random number generation to ensure reproducible results.
val_ratio (float) – Proportion of the dataset used for validation, between 0 and 1.
save_checkpoints (bool) – If true, store checkpoints.
threshold_level (float) – Takes a value from 0 to 1. It specifies the quantile in the distribution of errors on the training dataset at which the threshold value is set.
hidden_dims (list) – Dimensions of hidden layers in encoder/decoder.

AnomalyTransformer

class ice.anomaly_detection.models.transformer.AnomalyTransformer(window_size: int = 100, stride: int = 1, batch_size: int = 128, lr: float = 0.0001, num_epochs: int = 10, device: str = 'cpu', verbose: bool = False, name: str = 'transformer_anomaly_detection', random_seed: int = 42, val_ratio: float = 0.15, save_checkpoints: bool = False, threshold_level: float = 0.95, d_model: int = 256, n_heads: int = 8, e_layers: int = 3, d_ff: int = 256, dropout: float = 0.0, activation: str = 'gelu')

Bases: BaseAnomalyDetection

Anomaly Transformer was presented at ICLR 2022: “Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy”. https://openreview.net/forum?id=LzQQ89U1qm_

Parameters:

window_size (int) – The window size to train the model.
stride (int) – The time interval between the first points of consecutive sliding windows in training.
batch_size (int) – The batch size to train the model.
lr (float) – The learning rate to train the model.
num_epochs (int) – The number of epochs to train the model.
device (str) – The name of the device to train the model on. Possible values are cpu and cuda.
verbose (bool) – If true, show the progress bar in training.
name (str) – The name of the model for artifact storing.
random_seed (int) – Seed for random number generation to ensure reproducible results.
val_ratio (float) – Proportion of the dataset used for validation, between 0 and 1.
save_checkpoints (bool) – If true, store checkpoints.
threshold_level (float) – Takes a value from 0 to 1. It specifies the quantile in the distribution of errors on the training dataset at which the threshold value is set.
threshold_value (float) – The threshold value, calculated after the model is trained. It sets the error limit above which a data sample is classified as an anomaly.
d_model (int) – Dimension of model.
n_heads (int) – Number of heads.
e_layers (int) – Number of encoder layers.
d_ff (int) – Dimension of MLP.
dropout (float) – The rate of dropout.
activation (str) – Activation function (‘relu’ or ‘gelu’).
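
The relationship between threshold_level and threshold_value described above can be sketched with NumPy. This is an illustration under stated assumptions, not the library's code: the training errors are synthetic, and np.quantile is assumed here as the way to take the quantile.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-in for per-sample errors on the training dataset.
train_errors = rng.exponential(scale=1.0, size=1000)

threshold_level = 0.95
threshold_value = np.quantile(train_errors, threshold_level)

# Samples whose error exceeds the threshold are flagged as anomalies.
new_errors = np.array([0.1, threshold_value + 1.0])
is_anomaly = new_errors > threshold_value

print(is_anomaly)  # [False  True]
```

With threshold_level = 0.95, roughly 5% of the training samples lie above the threshold, so a higher level makes the detector more conservative.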

STGAT-MAD

class ice.anomaly_detection.models.stgat.STGAT_MAD(window_size: int, stride: int = 1, batch_size: int = 128, lr: float = 0.001, num_epochs: int = 10, device: str = 'cpu', verbose: bool = False, name: str = 'stgat_anomaly_detection', random_seed: int = 42, val_ratio: float = 0.15, save_checkpoints: bool = False, threshold_level: float = 0.95, embed_dim: Optional[int] = None, layer_numb: int = 2, lstm_n_layers: int = 1, lstm_hid_dim: int = 150, recon_n_layers: int = 1, recon_hid_dim: int = 150, dropout: float = 0.2)

Bases: BaseAnomalyDetection

Stgat-Mad was presented at ICASSP 2022: “Stgat-Mad : Spatial-Temporal Graph Attention Network For Multivariate Time Series Anomaly Detection”. https://ieeexplore.ieee.org/abstract/document/9747274/

Parameters:

window_size (int) – The window size to train the model.
stride (int) – The time interval between the first points of consecutive sliding windows in training.
batch_size (int) – The batch size to train the model.
lr (float) – The learning rate to train the model.
num_epochs (int) – The number of epochs to train the model.
device (str) – The name of the device to train the model on. Possible values are cpu and cuda.
verbose (bool) – If true, show the progress bar in training.
name (str) – The name of the model for artifact storing.
random_seed (int) – Seed for random number generation to ensure reproducible results.
val_ratio (float) – Proportion of the dataset used for validation, between 0 and 1.
save_checkpoints (bool) – If true, store checkpoints.
threshold_level (float) – Takes a value from 0 to 1. It specifies the quantile in the distribution of errors on the training dataset at which the threshold value is set.
embed_dim (int) – Embedding dimension.
layer_numb (int) – Number of layers.
lstm_n_layers (int) – Number of LSTM layers.
lstm_hid_dim (int) – Hidden dimension of LSTM layers.
recon_n_layers (int) – Number of reconstruction layers.
recon_hid_dim (int) – Hidden dimension of reconstruction layers.
dropout (float) – The rate of dropout.

GSL-GNN

class ice.anomaly_detection.models.gnn.GSL_GNN(window_size: int, stride: int = 1, batch_size: int = 128, lr: float = 0.001, num_epochs: int = 10, device: str = 'cpu', verbose: bool = False, name: str = 'gnn_anomaly_detection', random_seed: int = 42, val_ratio: float = 0.15, save_checkpoints: bool = False, threshold_level: float = 0.95, alpha: float = 0.2, k: Optional[int] = None)

Bases: BaseAnomalyDetection

The GNN autoencoder consists of an encoder with graph convolutional layers and an MLP decoder. The graph describing the data is constructed during training using trainable parameters.

Parameters:

window_size (int) – The window size to train the model.
stride (int) – The time interval between the first points of consecutive sliding windows in training.
batch_size (int) – The batch size to train the model.
lr (float) – The learning rate to train the model.
num_epochs (int) – The number of epochs to train the model.
device (str) – The name of the device to train the model on. Possible values are cpu and cuda.
verbose (bool) – If true, show the progress bar in training.
name (str) – The name of the model for artifact storing.
random_seed (int) – Seed for random number generation to ensure reproducible results.
val_ratio (float) – Proportion of the dataset used for validation, between 0 and 1.
save_checkpoints (bool) – If true, store checkpoints.
threshold_level (float) – Takes a value from 0 to 1. It specifies the quantile in the distribution of errors on the training dataset at which the threshold value is set.
alpha (float) – Saturation rate for adjacency matrix.
k (int) – Limit on the number of edges in the adjacency matrix.
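
To give a feel for how a saturation rate and an edge limit can shape a learned adjacency matrix, the sketch below uses one construction that is common in graph structure learning: squash raw scores with tanh(alpha * x), clip negative values, and keep the k largest entries in each row. Whether GSL_GNN builds its graph this way is an assumption; the function sparse_adjacency and the random scores are hypothetical.

```python
import numpy as np


def sparse_adjacency(scores: np.ndarray, alpha: float, k: int) -> np.ndarray:
    """Squash scores with tanh(alpha * x), clip negatives, keep the k largest per row."""
    adj = np.maximum(np.tanh(alpha * scores), 0.0)
    # Column indices of the k largest entries in each row.
    top_k = np.argsort(adj, axis=1)[:, -k:]
    mask = np.zeros_like(adj, dtype=bool)
    np.put_along_axis(mask, top_k, True, axis=1)
    return np.where(mask, adj, 0.0)


rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 5))  # hypothetical trainable sensor-to-sensor scores
adj = sparse_adjacency(scores, alpha=0.2, k=2)

print(adj.shape)                         # (5, 5)
print(int((adj > 0).sum(axis=1).max()))  # never more than k = 2
```

In this construction a larger alpha pushes edge weights toward 1 more quickly, while a smaller k produces a sparser graph.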