Anomaly detection models

BaseAnomalyDetection

class ice.anomaly_detection.models.base.BaseAnomalyDetection(window_size: int, stride: int, batch_size: int, lr: float, num_epochs: int, device: str, verbose: bool, name: str, random_seed: int, val_ratio: float, save_checkpoints: bool, threshold_level: float = 0.95)

Bases: BaseModel, ABC

Base class for all anomaly detection models.

Parameters:
  • window_size (int) – The window size to train the model.

  • stride (int) – The time interval between first points of consecutive sliding windows in training.

  • batch_size (int) – The batch size to train the model.

  • lr (float) – The learning rate to train the model.

  • num_epochs (int) – The number of epochs to train the model.

  • device (str) – The device on which to train the model. cpu and cuda are supported.

  • verbose (bool) – If true, show the progress bar in training.

  • name (str) – The name of the model for artifact storing.

  • random_seed (int) – Seed for random number generation to ensure reproducible results.

  • val_ratio (float) – Proportion of the dataset used for validation, between 0 and 1.

  • save_checkpoints (bool) – If true, store checkpoints.

  • threshold_level (float) – Takes a value from 0 to 1. It specifies the quantile in the distribution of errors on the training dataset at which the threshold value is set.
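
The window_size and stride parameters above describe how a time series is cut into overlapping training windows. A minimal sketch of that idea follows, assuming a single run stored as a NumPy array of shape (time, sensors); the helper name sliding_windows is hypothetical and is not part of the library.

```python
import numpy as np


def sliding_windows(series: np.ndarray, window_size: int, stride: int) -> np.ndarray:
    """Cut a (time, sensors) array into windows of shape (num_windows, window_size, sensors).

    Hypothetical helper for illustration only.
    """
    if window_size > len(series):
        raise ValueError("window_size exceeds the series length")
    # stride is the distance between the first points of consecutive windows
    starts = range(0, len(series) - window_size + 1, stride)
    return np.stack([series[s:s + window_size] for s in starts])


# Toy run: 10 time steps, 3 sensors
run = np.arange(30, dtype=float).reshape(10, 3)
windows = sliding_windows(run, window_size=4, stride=2)
print(windows.shape)
```

With stride=1 consecutive windows overlap in all but one time step; larger strides yield fewer, less redundant windows.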

fit(df: DataFrame, target: Optional[Series] = None, epochs: Optional[int] = None, save_path: Optional[str] = None, trial: Optional[Trial] = None, force_model_ctreation: bool = False)

Fit (train) the model on a given dataset.

Parameters:
  • df (pandas.DataFrame) – A dataframe with sensor data. The index has two levels: run_id and sample. All other columns hold sensor values.

  • target (pandas.Series) – A series with target values. The index has two levels: run_id and sample. It is omitted for the anomaly detection task.

  • epochs (int) – The number of epochs for the training step. If None, self.num_epochs is used.

  • save_path (str) – Path to save checkpoints. If None, the path is created automatically.
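
The layout expected for df can be sketched with pandas as follows. The run identifiers, sensor names, and values are made up for illustration; only the two index levels, run_id and sample, come from the description above.

```python
import numpy as np
import pandas as pd

# Two hypothetical runs with 5 samples each
index = pd.MultiIndex.from_product(
    [[1, 2], range(5)], names=["run_id", "sample"]
)

# Three made-up sensor columns filled with random values
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(10, 3)),
    index=index,
    columns=["sensor_1", "sensor_2", "sensor_3"],
)

# Selecting a single run by its run_id leaves a frame indexed by sample
run_1 = df.loc[1]
print(df.head())
```

A frame shaped like this could then be passed as the df argument of fit, with target left as None for anomaly detection.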

load_checkpoint(checkpoint_path: str)

Load checkpoint.

Parameters:
  • checkpoint_path (str) – Path to the checkpoint to load.

AutoEncoderMLP

class ice.anomaly_detection.models.autoencoder.AutoEncoderMLP(window_size: int, stride: int = 1, batch_size: int = 128, lr: float = 0.001, num_epochs: int = 10, device: str = 'cpu', verbose: bool = False, name: str = 'ae_anomaly_detection', random_seed: int = 42, val_ratio: float = 0.15, save_checkpoints: bool = False, threshold_level: float = 0.95, hidden_dims: list = [256, 128, 64])

Bases: BaseAnomalyDetection

The MLP autoencoder consists of an MLP encoder and an MLP decoder. Each batch is flattened from (B, L, C) to (B, L * C) for the computation and reshaped back from (B, L * C) to (B, L, C) for the output, where B is the batch size, L is the sequence length, and C is the number of sensors.
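
The reshaping step can be illustrated with NumPy. This sketch shows only the shape handling; the encoder and decoder layers themselves are omitted, and the sizes B, L, C are arbitrary example values.

```python
import numpy as np

B, L, C = 2, 4, 3  # batch size, sequence length, number of sensors (example values)

batch = np.arange(B * L * C, dtype=float).reshape(B, L, C)

# Flatten each window into a single vector before the MLP: (B, L, C) -> (B, L * C)
flat = batch.reshape(B, L * C)

# ... the encoder and decoder would operate on `flat` here ...

# Restore the window layout for the output: (B, L * C) -> (B, L, C)
restored = flat.reshape(B, L, C)

print(flat.shape, restored.shape)
```

Because both reshapes use the same element order, the round trip leaves the values in their original positions.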

Parameters:
  • window_size (int) – The window size to train the model.

  • stride (int) – The time interval between first points of consecutive sliding windows in training.

  • batch_size (int) – The batch size to train the model.

  • lr (float) – The learning rate to train the model.

  • num_epochs (int) – The number of epochs to train the model.

  • device (str) – The device on which to train the model. cpu and cuda are supported.

  • verbose (bool) – If true, show the progress bar in training.

  • name (str) – The name of the model for artifact storing.

  • random_seed (int) – Seed for random number generation to ensure reproducible results.

  • val_ratio (float) – Proportion of the dataset used for validation, between 0 and 1.

  • save_checkpoints (bool) – If true, store checkpoints.

  • threshold_level (float) – Takes a value from 0 to 1. It specifies the quantile in the distribution of errors on the training dataset at which the threshold value is set.

  • hidden_dims (list) – Dimensions of hidden layers in encoder/decoder.

AnomalyTransformer

class ice.anomaly_detection.models.transformer.AnomalyTransformer(window_size: int = 100, stride: int = 1, batch_size: int = 128, lr: float = 0.0001, num_epochs: int = 10, device: str = 'cpu', verbose: bool = False, name: str = 'transformer_anomaly_detection', random_seed: int = 42, val_ratio: float = 0.15, save_checkpoints: bool = False, threshold_level: float = 0.95, d_model: int = 256, n_heads: int = 8, e_layers: int = 3, d_ff: int = 256, dropout: float = 0.0, activation: str = 'gelu')

Bases: BaseAnomalyDetection

Anomaly Transformer was presented at ICLR 2022: “Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy”. https://openreview.net/forum?id=LzQQ89U1qm_

Parameters:
  • window_size (int) – The window size to train the model.

  • stride (int) – The time interval between first points of consecutive sliding windows in training.

  • batch_size (int) – The batch size to train the model.

  • lr (float) – The learning rate to train the model.

  • num_epochs (int) – The number of epochs to train the model.

  • device (str) – The device on which to train the model. cpu and cuda are supported.

  • verbose (bool) – If true, show the progress bar in training.

  • name (str) – The name of the model for artifact storing.

  • random_seed (int) – Seed for random number generation to ensure reproducible results.

  • val_ratio (float) – Proportion of the dataset used for validation, between 0 and 1.

  • save_checkpoints (bool) – If true, store checkpoints.

  • threshold_level (float) – Takes a value from 0 to 1. It specifies the quantile in the distribution of errors on the training dataset at which the threshold value is set.

  • threshold_value (float) – The threshold value, calculated after the model is trained. It sets the error limit above which a data sample is classified as an anomaly.

  • d_model (int) – Dimension of model.

  • n_heads (int) – Number of heads.

  • e_layers (int) – Number of encoder layers.

  • d_ff (int) – Dimension of MLP.

  • dropout (float) – The rate of dropout.

  • activation (str) – Activation function (‘relu’ or ‘gelu’).
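
The relationship between threshold_level and threshold_value described above can be sketched with NumPy. The training errors here are synthetic, and how the library computes per-sample errors is not specified in this section, so treat the snippet as an illustration of the quantile rule only.

```python
import numpy as np

# Synthetic stand-in for per-sample errors on the training dataset (assumption)
rng = np.random.default_rng(42)
train_errors = rng.exponential(scale=1.0, size=1000)

# threshold_level is the quantile of the training error distribution
threshold_level = 0.95
threshold_value = np.quantile(train_errors, threshold_level)

# A sample counts as an anomaly when its error exceeds threshold_value
new_errors = np.array([0.1, threshold_value + 1.0])
is_anomaly = new_errors > threshold_value

print(threshold_value, is_anomaly)
```

Raising threshold_level moves the threshold further into the tail of the training errors, so fewer samples are flagged.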

STGAT-MAD

class ice.anomaly_detection.models.stgat.STGAT_MAD(window_size: int, stride: int = 1, batch_size: int = 128, lr: float = 0.001, num_epochs: int = 10, device: str = 'cpu', verbose: bool = False, name: str = 'stgat_anomaly_detection', random_seed: int = 42, val_ratio: float = 0.15, save_checkpoints: bool = False, threshold_level: float = 0.95, embed_dim: Optional[int] = None, layer_numb: int = 2, lstm_n_layers: int = 1, lstm_hid_dim: int = 150, recon_n_layers: int = 1, recon_hid_dim: int = 150, dropout: float = 0.2)

Bases: BaseAnomalyDetection

STGAT-MAD was presented at ICASSP 2022: “Stgat-Mad : Spatial-Temporal Graph Attention Network For Multivariate Time Series Anomaly Detection”. https://ieeexplore.ieee.org/abstract/document/9747274/

Parameters:
  • window_size (int) – The window size to train the model.

  • stride (int) – The time interval between first points of consecutive sliding windows in training.

  • batch_size (int) – The batch size to train the model.

  • lr (float) – The learning rate to train the model.

  • num_epochs (int) – The number of epochs to train the model.

  • device (str) – The device on which to train the model. cpu and cuda are supported.

  • verbose (bool) – If true, show the progress bar in training.

  • name (str) – The name of the model for artifact storing.

  • random_seed (int) – Seed for random number generation to ensure reproducible results.

  • val_ratio (float) – Proportion of the dataset used for validation, between 0 and 1.

  • save_checkpoints (bool) – If true, store checkpoints.

  • threshold_level (float) – Takes a value from 0 to 1. It specifies the quantile in the distribution of errors on the training dataset at which the threshold value is set.

  • embed_dim (int) – Embedding dimension.

  • layer_numb (int) – Number of layers.

  • lstm_n_layers (int) – Number of LSTM layers.

  • lstm_hid_dim (int) – Hidden dimension of LSTM layers.

  • recon_n_layers (int) – Number of reconstruction layers.

  • recon_hid_dim (int) – Hidden dimension of reconstruction layers.

  • dropout (float) – The rate of dropout.

GSL-GNN

class ice.anomaly_detection.models.gnn.GSL_GNN(window_size: int, stride: int = 1, batch_size: int = 128, lr: float = 0.001, num_epochs: int = 10, device: str = 'cpu', verbose: bool = False, name: str = 'gnn_anomaly_detection', random_seed: int = 42, val_ratio: float = 0.15, save_checkpoints: bool = False, threshold_level: float = 0.95, alpha: float = 0.2, k: Optional[int] = None)

Bases: BaseAnomalyDetection

The GNN autoencoder consists of an encoder with graph convolutional layers and an MLP decoder. The graph describing the data is learned during training through trainable parameters.

Parameters:
  • window_size (int) – The window size to train the model.

  • stride (int) – The time interval between first points of consecutive sliding windows in training.

  • batch_size (int) – The batch size to train the model.

  • lr (float) – The learning rate to train the model.

  • num_epochs (int) – The number of epochs to train the model.

  • device (str) – The device on which to train the model. cpu and cuda are supported.

  • verbose (bool) – If true, show the progress bar in training.

  • name (str) – The name of the model for artifact storing.

  • random_seed (int) – Seed for random number generation to ensure reproducible results.

  • val_ratio (float) – Proportion of the dataset used for validation, between 0 and 1.

  • save_checkpoints (bool) – If true, store checkpoints.

  • threshold_level (float) – Takes a value from 0 to 1. It specifies the quantile in the distribution of errors on the training dataset at which the threshold value is set.

  • alpha (float) – Saturation rate for adjacency matrix.

  • k (int) – Limit on the number of edges in the adjacency matrix.
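
This section says only that the graph is built from trainable parameters, that alpha is a saturation rate, and that k limits the number of edges. One common construction that fits this description scores sensor pairs from two embedding matrices, squashes the scores with tanh scaled by alpha, and keeps the k strongest edges per node. The sketch below follows that pattern in NumPy; it is an assumption for illustration and not necessarily what GSL_GNN implements. The function learn_adjacency and the embeddings e1 and e2 are hypothetical.

```python
import numpy as np


def learn_adjacency(e1: np.ndarray, e2: np.ndarray, alpha: float, k: int) -> np.ndarray:
    """Build a sparse, non-negative adjacency matrix from two node embeddings.

    Hypothetical helper; in a real model e1 and e2 would be trainable parameters.
    """
    # Antisymmetric pairwise scores, so edges are directed
    scores = e1 @ e2.T - e2 @ e1.T
    # alpha controls how quickly tanh saturates; negative scores are clipped to 0
    adj = np.maximum(np.tanh(alpha * scores), 0.0)
    # Keep only the k largest entries in each row
    keep = np.argsort(adj, axis=1)[:, -k:]
    mask = np.zeros_like(adj)
    np.put_along_axis(mask, keep, 1.0, axis=1)
    return adj * mask


rng = np.random.default_rng(0)
num_sensors, embed_dim = 5, 4  # example sizes
e1 = rng.normal(size=(num_sensors, embed_dim))
e2 = rng.normal(size=(num_sensors, embed_dim))

adj = learn_adjacency(e1, e2, alpha=0.2, k=2)
print(adj.round(3))
```

In this sketch a larger alpha pushes edge weights toward 1 more quickly, while a smaller k yields a sparser graph.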