
Dataset description

Participants are provided with a dataset consisting of EEG recordings from subjects exposed to various musical pieces that evoke specific emotions. The dataset includes EEG recordings with corresponding musical stimuli (Spotify ID) and annotated emotional states.

Data Collection.

We measured EEG data in a controlled lab, using a 32-channel EPOC Flex EEG recording system with saline sensors, at a sampling rate of 128 Hz. All electrodes are placed according to the international 10-20 system. The dataset contains data from 34 young subjects. Each subject listened to 16 trials, each approximately 90 seconds long. For each subject, 8 trials were chosen from their personal music, while the other 8 were randomly selected from other participants' preferences. The emotion felt in each trial is self-assessed after the listening phase, using the Geneva Emotion Wheel (https://www.unige.ch/cisa/gew). The wheel is divided into 20 emotion families, arranged in a circle according to a valence-power reference system.

Training Dataset.

The training set contains 294 trials, with data coming from 26 subjects. For each subject we have about 12 labelled trials, each of approximately 90 seconds.

Each trial comes with the following information:

For labels, we divided the above-mentioned circle into 4 groups that correspond to our emotional states:
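As an illustration of this grouping, and assuming the four groups correspond to the quadrants of the valence-power plane (an assumption on our part; the official assignment of the 20 emotion families to groups is given with the dataset release), a label could be derived as follows:

```python
def emotion_group(valence: float, power: float) -> int:
    """Map a (valence, power) point on the wheel to one of 4 quadrant
    groups. The quadrant numbering here is hypothetical: the actual
    grouping of the 20 GEW families is defined by the challenge.
    """
    if valence >= 0 and power >= 0:
        return 0  # positive valence, high power
    if valence < 0 and power >= 0:
        return 1  # negative valence, high power
    if valence < 0 and power < 0:
        return 2  # negative valence, low power
    return 3      # positive valence, low power
```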

Test Dataset.

The test set consists of two parts: held-out trials and held-out subjects. Data will be released to participants without the label (i.e. emotion) and subject information.

The first part will be used in both task 1 and task 2, while the second only in task 2:

For task 1, the held-out trials test set also contains 44 additional trials, either with or without stimulation information.

Preprocessing

We provide two versions of the dataset.

The first data version is the raw EEG data. The second version of the dataset has been preprocessed in EEGLab.

In particular, a FIR band-pass filter between 0.5 and 40 Hz was applied, and artifacts were removed using Independent Component Analysis (ICA). This version of the dataset is referred to as pruned in the dataset release.
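Participants who start from the raw data can reproduce a comparable band-pass step themselves. A minimal sketch using SciPy (the filter length and window are our own illustrative choices, not the settings used to produce the pruned release):

```python
import numpy as np
from scipy.signal import firwin, filtfilt

FS = 128  # sampling rate of the dataset, in Hz

# Linear-phase FIR band-pass between 0.5 and 40 Hz. The number of taps
# (odd, for a Type I filter) is an illustrative choice.
taps = firwin(numtaps=513, cutoff=[0.5, 40.0], pass_zero=False, fs=FS)

def bandpass(eeg: np.ndarray) -> np.ndarray:
    """Apply the band-pass to an (n_channels, n_samples) EEG array.
    filtfilt runs the filter forward and backward, so the output has
    zero phase shift relative to the input."""
    return filtfilt(taps, 1.0, eeg, axis=-1)
```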

Challenge participants are free to perform their own preprocessing on both versions of the datasets.

Files structure

Each EEG file is placed in the tree structure according to its data type (raw or pruned) and the split it belongs to (train, held-out trials test set, or held-out subjects test set).

Each file is named using the convention ID_eeg.EXT, where ID is the identifier of the trial and EXT is fif for both raw and pruned data.

Supposing the root directory is named dataset, we have the following structure:

dataset
├── splits_subject_identification.json
├── splits_emotion_recognition.json
├── raw
│   ├── train
│   │   ├── 1135903657_eeg.fif
│   │   └── ...
│   ├── test_trial 
│   └── test_subject       
└── pruned
    ├── train
    │   ├── 1135903657_eeg.fif
    │   └── ...
    ├── test_trial 
    └── test_subject

Although the dataset structure is consistent across both tasks, not all files within the test set folders will be used for each task at inference time.
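Given this layout and naming convention, a small helper can enumerate the trial IDs available for a given version and split (a sketch; the root path passed in is up to the participant):

```python
from pathlib import Path

def list_trial_ids(root: str, version: str, split: str) -> list[str]:
    """Collect trial IDs from files named ID_eeg.fif under
    root/version/split, e.g. dataset/raw/train."""
    folder = Path(root) / version / split
    return sorted(p.name.removesuffix("_eeg.fif")
                  for p in folder.glob("*_eeg.fif"))
```

For example, `list_trial_ids("dataset", "pruned", "test_trial")` lists the pruned held-out trials.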

The list of IDs that will serve as the test set for each task is specified in two separate JSON files, splits_emotion_recognition.json and splits_subject_identification.json:

Predictions made for IDs not included in these JSON files will be disregarded during evaluation.
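Since only listed IDs are scored, it can be worth filtering a submission against the relevant split file before uploading. A sketch, assuming the JSON file maps split names to lists of trial IDs (that structure is our assumption, not a documented guarantee):

```python
import json

def keep_scored_ids(predictions: dict, splits_path: str, split: str) -> dict:
    """Drop predictions whose trial ID is not in the given test split.
    `predictions` maps trial ID -> predicted label."""
    with open(splits_path) as f:
        splits = json.load(f)
    test_ids = set(splits[split])
    return {tid: label for tid, label in predictions.items()
            if tid in test_ids}
```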