.. _custom_data:

###########
Custom Data
###########

Users can add their own datasets by following the inheritance scheme. A user dataset should be described by a class that inherits from the base dataset class :ref:`base_dataset`. The class should implement the method **set_name_public_link**, which takes no parameters other than the standard ``self``. The method should assign the name of the dataset to the class attribute **name** and a download link to the class attribute **public_link**. Currently, only download links from the Yandex Disk cloud service are supported. The download link should point to a **zip** archive that contains the files specific to each task. A minimal example of such a class is sketched at the end of this section, after the descriptions of the task-specific file formats.

Fault Diagnosis
"""""""""""""""

The dataset should be provided as a ZIP archive with a set of files in Comma-Separated Values (CSV) format; a toy example of building these files is sketched at the end of this section. The set of files includes:

1. df.csv - a table of sensor values at each point in time. The rows contain a sequence of values. The first two columns are run_id and sample. The run_id column contains the run identifiers of the industrial process, and the sample column contains the sequence element numbers. All other columns contain sensor values.
2. target.csv - a table of industrial process states. The table columns are run_id, sample, and target. The run_id column contains the run identifier of the industrial process, the sample column contains the sequence element number, and the target column contains the state number of the industrial process, where 0 is the normal state.
3. train_mask.csv - a table with the training-sample mask. The table columns are run_id, sample, and train_mask. The run_id column contains the run identifier of the industrial process, the sample column contains the sequence element number, and the train_mask column contains the training-sample flag, where 0 marks rows that are not part of the training sample and 1 marks rows that are.

Anomaly Detection
"""""""""""""""""

The dataset should be provided in the same way as for the Fault Diagnosis task.

RUL
"""

The data should be provided as a ZIP archive containing a set of files in Comma-Separated Values (CSV) format. The set of files includes:

1. train.csv - the rows contain a sequence of values. The first three columns are run_id, sample, and rul. The run_id column contains identifiers of the industrial process run, the sample column contains the sequence element numbers, and the rul column contains the remaining time until the device fails. All other columns contain sensor values.
2. test.csv - a table with the same structure as train.csv: the first three columns are run_id, sample, and rul, and all other columns contain sensor values.

HI
""

The data should be provided as a ZIP archive containing a set of files in Comma-Separated Values (CSV) format. The set of files includes:

case_n.csv - a set of n files with a table of sensor values at each moment in time, each file representing the degradation measurements of a device within a single process. The rows contain a sequence of values. The table columns contain characteristics of materials and processes (unchanged within the process), a set of sensor readings over time, and the measured degradation value. The run_id column contains identifiers of the industrial process run.
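The following is a minimal sketch of such a user dataset class. The import path ``ice.base`` and the base class name ``BaseDataset`` are assumptions used for illustration; adjust them to the actual base dataset class described in :ref:`base_dataset`, and replace the placeholder link with a real Yandex Disk download link.

.. code-block:: python

    from ice.base import BaseDataset  # hypothetical import path


    class MyCustomDataset(BaseDataset):
        """User dataset downloaded as a zip archive with task-specific CSV files."""

        def set_name_public_link(self):
            # Assign the dataset name and the Yandex Disk download link.
            self.name = 'my_custom_dataset'
            self.public_link = 'https://disk.yandex.ru/d/...'  # placeholder link

The next sketch illustrates the Fault Diagnosis file format with a toy dataset built in pandas; the values are made up and the sensor column names are arbitrary. The same approach applies to the other tasks with their respective column sets.

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Two runs, three time steps each, two sensors (toy data).
    run_id = [0, 0, 0, 1, 1, 1]
    sample = [0, 1, 2, 0, 1, 2]

    df = pd.DataFrame({
        'run_id': run_id,
        'sample': sample,
        'sensor_0': np.random.rand(6),
        'sensor_1': np.random.rand(6),
    })
    target = pd.DataFrame({
        'run_id': run_id,
        'sample': sample,
        'target': [0, 0, 1, 0, 0, 0],  # 0 is the normal state
    })
    train_mask = pd.DataFrame({
        'run_id': run_id,
        'sample': sample,
        'train_mask': [1, 1, 1, 0, 0, 0],  # 1 marks training rows
    })

    df.to_csv('df.csv', index=False)
    target.to_csv('target.csv', index=False)
    train_mask.to_csv('train_mask.csv', index=False)
    # The three CSV files are then packed into a zip archive and uploaded to the cloud.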
Additional preprocessing steps
""""""""""""""""""""""""""""""

In the case of a structurally complex dataset, or of parts of it that need to be processed together according to a specific rule, it is permissible to override the **_load** method together with the attributes **num_chunks** and **force_download** in the user class that inherits from :ref:`base_dataset`. The **num_chunks** attribute defines the number of data fragments to load, and **force_download** forces the file to be downloaded from the cloud even if the directory already contains a folder with the data.
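A minimal sketch of such an override is shown below. The import path and base class name are the same assumptions as in the earlier example, and passing **num_chunks** and **force_download** to **_load** as parameters is also an assumption made for illustration; follow the actual signature of **_load** in the base dataset class.

.. code-block:: python

    from ice.base import BaseDataset  # hypothetical import path


    class MyComplexDataset(BaseDataset):
        def set_name_public_link(self):
            self.name = 'my_complex_dataset'
            self.public_link = 'https://disk.yandex.ru/d/...'  # placeholder link

        def _load(self, num_chunks, force_download):
            # num_chunks: number of data fragments to load;
            # force_download: download from the cloud even if the directory
            # already contains a folder with the data.
            # Custom logic that downloads the fragments and preprocesses the
            # structurally linked parts of the dataset together goes here.
            ...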