
BIDS

If you are following the project workflow, by now you should have a project set up on the cluster and you might be in the process of organizing your raw data. The next step is to convert your data into BIDS format.

BIDS is an acronym that stands for Brain Imaging Data Structure. It is a standardized format for the organization and the description of neuroscientific data (see the schema below).
In a nutshell, BIDS provides a framework for standardizing file names, folder structures and metadata for different kinds of data such as neuroimaging, behavioral, physiological etc.
We recommend going through the BIDS specification page and the BIDS Starter Kit pages to understand the concepts and the philosophy behind BIDS in more depth.

.
├── CHANGES
├── dataset_description.json
├── participants.tsv
├── README
├── sub-01
│   ├── anat
│   │   ├── sub-01_T2starw.nii.gz
│   │   └── sub-01_T1w.nii.gz
│   └── func
│       ├── sub-01_task-mybrillianttask_bold.nii.gz
│       └── sub-01_task-mybrillianttask_events.tsv
└── task-mybrillianttask_bold.json
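
The naming scheme shown in the tree can be sketched in a few lines of Python. This is only an illustration: the helper `bids_path` and the short `ENTITY_ORDER` list are hypothetical names introduced here, and they cover only a handful of the entities defined in the BIDS specification.

```python
from pathlib import PurePosixPath

# Small, illustrative subset of BIDS entities, listed in the order
# in which they appear in a file name.
ENTITY_ORDER = ("sub", "ses", "task", "acq", "dir", "run")


def bids_path(datatype, suffix, extension, **entities):
    """Build a relative BIDS path such as sub-01/anat/sub-01_T1w.nii.gz."""
    if "sub" not in entities:
        raise ValueError("the 'sub' entity is required")
    # key-value pairs joined by underscores, followed by the suffix
    parts = [f"{key}-{entities[key]}" for key in ENTITY_ORDER if key in entities]
    filename = "_".join(parts + [suffix]) + extension
    # subject (and optional session) folders, then the datatype folder
    folders = [f"sub-{entities['sub']}"]
    if "ses" in entities:
        folders.append(f"ses-{entities['ses']}")
    return str(PurePosixPath(*folders, datatype, filename))


print(bids_path("anat", "T1w", ".nii.gz", sub="01"))
print(bids_path("func", "bold", ".nii.gz", sub="01", task="mybrillianttask"))
```

The point of the sketch is that every file name is a chain of `key-value` pairs followed by a suffix, and that the folder hierarchy repeats the subject label.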

Why BIDS?

  • It provides a logical and intelligible structure for data of different kinds.
  • It makes your data clear and reusable by other researchers, thus increasing the chance of collaborations and the chance of being cited.
  • It helps you organize your data without having to devise an efficient and flexible structure yourself.
  • It has been adopted by all the major tools for neuroscientific analysis.
  • It improves science and also your science.

How to BIDS

In the following tabs, we explain how to convert your unprocessed raw data into BIDS for different modalities. We focus only on specific tools that are either widely used or provide some advantages for the users.

Before you start converting your data, we strongly recommend going through the BIDS specification page for your modalities of interest. Make sure you read the common principles to grasp the jargon used in BIDS.

Using heudiconv to convert fMRI data to BIDS

Heudiconv is a Python library which helps you convert f/MRI data to BIDS with little effort. You do not need to be very proficient in Python to use it, although some basics will help. You can either follow this brief tutorial or use the tutorials provided by the heudiconv developers and users.

Step 1

To convert your data correctly, heudiconv relies on so-called heuristics.
Heuristics are rules that you provide to the software to specify which files you would like to convert.

You must specify these rules within a Python script that is then passed to heudiconv. You do not need to create this script from scratch: you can generate a template by running the following line in the terminal:

heudiconv --files your_dicom_file -o bids_output_dir -f convertall -s number_of_the_subject -c none

The previous command does not convert any of your data, but it creates, within the output directory, a hidden folder, .heudiconv, containing a template of the script heuristic.py and a dicominfo.tsv file, which lists all the information about the dicom files needed to specify the heuristics.

Check your dicoms

It is good practice to extract your dicom files and check the file names yourself to verify that they match the names stored by heudiconv in dicominfo.tsv.
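
To make this check easier, the columns of dicominfo.tsv that matter most for the heuristics can be printed with a short script. This is a minimal sketch: the function name `summarize_dicominfo` is hypothetical, and it assumes that the column names in dicominfo.tsv match the field names used in heuristic.py (series_id, protocol_name, dim3, dim4).

```python
import csv
from pathlib import Path


def summarize_dicominfo(tsv_path):
    """Return (series_id, protocol_name, dim3, dim4) for each row of dicominfo.tsv."""
    rows = []
    with Path(tsv_path).open(newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            rows.append(
                (row["series_id"], row["protocol_name"], int(row["dim3"]), int(row["dim4"]))
            )
    return rows


# Example call; the location of the file inside .heudiconv is an assumption,
# so check where it was created in your own output directory:
# for series in summarize_dicominfo("bids_output_dir/.heudiconv/01/info/dicominfo.tsv"):
#     print(series)
```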

Delete .heudiconv

Before running the actual BIDS conversion, make sure to delete the .heudiconv directory, otherwise the conversion will not produce a correct BIDS dataset.
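
If you prefer to script this step, the following sketch removes the hidden folder from a given output directory. The function name `remove_heudiconv_cache` is hypothetical; the only assumption is that .heudiconv sits directly inside the output directory, as described above.

```python
import shutil
from pathlib import Path


def remove_heudiconv_cache(bids_output_dir):
    """Delete <bids_output_dir>/.heudiconv if present; return True if it was removed."""
    cache = Path(bids_output_dir) / ".heudiconv"
    if cache.is_dir():
        shutil.rmtree(cache)
        return True
    return False


# Example call:
# remove_heudiconv_cache("bids_output_dir")
```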

Step 2

The following example shows a heuristic.py script tailored to a specific dataset. You can also use it as a template and adapt it to your own dataset.

Heudiconv heuristics
from __future__ import annotations

import logging
from typing import Optional

from heudiconv.utils import SeqInfo

lgr = logging.getLogger("heudiconv")


def create_key(
    template: Optional[str],
    outtype: tuple[str, ...] = ("nii.gz",),
    annotation_classes: None = None,
) -> tuple[str, tuple[str, ...], None]:
    """
    This function allows you to create
    the keys necessary to extract the files
    that you want to convert.
    """

    if template is None or not template:
        raise ValueError("Template must be a valid format string")
    return (template, outtype, annotation_classes)


def infotodict(
    seqinfo: list[SeqInfo],
) -> dict[tuple[str, tuple[str, ...], None], list[str]]:
    """Heuristic evaluator for determining which runs belong where

    allowed template fields - follow python string module:

    item: index within category
    subject: participant id
    seqitem: run number during scanning
    subindex: sub index within group
    """

    # Create keys: This section is specific to each project
    # you need to create your own keys and the info dictionary

    t1w = create_key("sub-{subject}/anat/sub-{subject}_acq-mprage_T1w")
    task = create_key("sub-{subject}/func/sub-{subject}_task-foodchoice_run-{item:01d}_bold")
    fmap_mag = create_key("sub-{subject}/fmap/sub-{subject}_acq-grefieldmapping_magnitude")
    fmap_phasediff = create_key("sub-{subject}/fmap/sub-{subject}_acq-grefieldmapping_phasediff")
    epi = create_key("sub-{subject}/fmap/sub-{subject}_dir-AP_run-{item:01d}_epi")


    # the dictionary info must contain your keys
    info: dict[tuple[str, tuple[str, ...], None], list[str]] = {
        t1w: [],
        task: [],
        fmap_mag: [],
        fmap_phasediff: [],
        epi: [],
    }

    for s in seqinfo:
        """
        The namedtuple `s` contains the following fields:

        * total_files_till_now
        * example_dcm_file
        * series_id
        * dcm_dir_name
        * unspecified2
        * unspecified3
        * dim1
        * dim2
        * dim3
        * dim4
        * TR
        * TE
        * protocol_name
        * is_motion_corrected
        * is_derived
        * patient_id
        * study_description
        * referring_physician_name
        * series_description
        * image_type
        """

        # The following conditional statements constitute
        # the core of the heuristics. 
        # Modify them according to the dicom file names and 
        # other features (like data size) that 
        # can help select the correct files

        if "t1_mpr" in s.protocol_name:
            info[t1w].append(s.series_id)
        elif 'fMRI_SMS2_2.2iso_66sl_TR2_run' in s.protocol_name and s.dim4 == 200:
            info[task].append(s.series_id)
        elif 'field_mapping' in s.protocol_name and s.dim3 == 132:
            info[fmap_mag].append(s.series_id)
        elif 'field_mapping' in s.protocol_name and s.dim3 == 66:
            info[fmap_phasediff].append(s.series_id)
        elif 'EPI' in s.protocol_name:
            info[epi].append(s.series_id)


    return info

The essential sections of the script that you need to modify are the keys and the actual heuristics. Keep in mind that you have to adapt both to your dataset:

  • Keys: The dicominfo.tsv file (in .heudiconv) contains the exact naming of the dicom files. Go through the list of files and decide which of them are important for your analysis.
    Once you have made your decision, you can create the proper keys by following the naming conventions explained in the BIDS tutorial. Keys create the BIDS compliant file names and structure.

    • Start with your anatomical images which all go into the anat folder:

      t1w = create_key("sub-{subject}/anat/sub-{subject}_acq-mprage_T1w")
      
      • The sub-{subject}/ folder where {subject} is a placeholder that heudiconv fills in with the correct subject ID.
      • The folder anat/ is dedicated to all the anatomical images.
      • In acq-<label> you provide the type of acquisition e.g. mprage (this is not mandatory).
      • The T1w suffix is the standard name for T1-weighted images. Keep in mind that you might have different anatomical images.
    • Set up the functional images which go into the func folder.

      task = create_key("sub-{subject}/func/sub-{subject}_task-foodchoice_run-{item:01d}_bold")
      
      • As for the anatomical image you need to add the sub-{subject}/ folder.
      • The func folder is dedicated to the functional images.
      • The task-<taskname> indicates the name of the task.
      • run-{item:01d} encodes the run number: {item} is a placeholder that heudiconv fills in automatically.
      • The bold suffix indicates the type of functional file we are dealing with.
    • Set up the field map images, which go into the fmap folder. While the previous anatomical and functional images are rather standard for fMRI experiments, the field maps can differ between setups and are sometimes absent.

      fmap_mag = create_key("sub-{subject}/fmap/sub-{subject}_acq-grefieldmapping_magnitude")
      fmap_phasediff = create_key("sub-{subject}/fmap/sub-{subject}_acq-grefieldmapping_phasediff")
      epi = create_key("sub-{subject}/fmap/sub-{subject}_dir-AP_run-{item:01d}_epi")
      
      • As for the other images you set up the subject folder sub-{subject}.
      • The fmap folder is dedicated to the fieldmap related images.
      • In this particular case, for fmap_mag and fmap_phasediff we specified the _acq-<label>_ entity, whose label changes from case to case.
      • The suffixes magnitude, phasediff and epi are mandatory for these types of fieldmap images. Please refer to the official page for all the possible cases.
      • The _dir-AP_ entity indicates the phase-encoding direction of the EPI image. Please refer to the official page for further information.
  • Heuristics: Now we need to create a few basic rules to tell heudiconv which files to extract.

    In the following example we simply match the name of a dicom series we would like to include in the BIDS conversion against the name contained in s.protocol_name. If there is a match, we add the series to the info dictionary.

    if "t1_mpr" in s.protocol_name: # name matching
        info[t1w].append(s.series_id) # append to the dictionary
    
    Sometimes matching the name is not enough to extract the correct file, so we can combine several rules, as in the following cases, in which we match both the protocol name and the image dimensions (s.dim3 or s.dim4).

    elif 'fMRI_SMS2_2.2iso_66sl_TR2_run' in s.protocol_name and s.dim4 == 200:
        info[task].append(s.series_id)
    
    elif 'field_mapping' in s.protocol_name and s.dim3 == 132:
        info[fmap_mag].append(s.series_id)
    elif 'field_mapping' in s.protocol_name and s.dim3 == 66:
        info[fmap_phasediff].append(s.series_id)
    

    As you might guess, you need to create rules on a case-by-case basis.
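
Before running a conversion, it can help to try the rules on a few hand-written entries. The sketch below is not part of heudiconv: `FakeSeq` is a hypothetical stand-in for the `s` namedtuple that carries only the fields used by the rules, `classify` repeats the conditions from the example heuristic, and the listed series are invented for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for heudiconv's SeqInfo, with only the fields used by the rules.
FakeSeq = namedtuple("FakeSeq", "series_id protocol_name dim3 dim4")


def classify(s):
    """Return the label of the first matching rule, or None if no rule matches."""
    if "t1_mpr" in s.protocol_name:
        return "t1w"
    if "fMRI_SMS2_2.2iso_66sl_TR2_run" in s.protocol_name and s.dim4 == 200:
        return "task"
    if "field_mapping" in s.protocol_name and s.dim3 == 132:
        return "fmap_mag"
    if "field_mapping" in s.protocol_name and s.dim3 == 66:
        return "fmap_phasediff"
    if "EPI" in s.protocol_name:
        return "epi"
    return None


# Invented example series; replace them with rows from your own dicominfo.tsv.
series = [
    FakeSeq("2-t1_mpr_sag", "t1_mpr_sag", 192, 1),
    FakeSeq("5-fMRI_run1", "fMRI_SMS2_2.2iso_66sl_TR2_run1", 66, 200),
    FakeSeq("6-fMRI_run1_aborted", "fMRI_SMS2_2.2iso_66sl_TR2_run1", 66, 35),
    FakeSeq("7-field_mapping", "field_mapping", 132, 1),
    FakeSeq("8-field_mapping", "field_mapping", 66, 1),
]
for s in series:
    print(s.series_id, "->", classify(s))
```

In this sketch the aborted run is left unclassified because its fourth dimension does not match, which is the reason for combining the name with a dimension check.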

Step 3

Now you just need to run heudiconv for the actual conversion with the following line. For further information about the heudiconv options, visit the software page or type heudiconv --help in the terminal:

heudiconv --files <file_containing_dicoms> -o <directory_for_bids/> -f <heuristic.py> -s <subject_number> -c dcm2niix -b --overwrite
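
With several subjects, the same call can be generated in a loop. The following is a minimal sketch that only builds and prints the commands, using the flags shown above; the helper name `heudiconv_command`, the subject labels and all paths are hypothetical.

```python
import shlex


def heudiconv_command(dicom_path, bids_dir, heuristic, subject):
    """Return the heudiconv call from Step 3 as an argument list for one subject."""
    return [
        "heudiconv",
        "--files", str(dicom_path),
        "-o", str(bids_dir),
        "-f", str(heuristic),
        "-s", str(subject),
        "-c", "dcm2niix",
        "-b",
        "--overwrite",
    ]


# Hypothetical subject-to-DICOM mapping; adapt labels and paths to your project.
dicoms = {
    "01": "sourcedata/sub-01_dicoms.tar.gz",
    "02": "sourcedata/sub-02_dicoms.tar.gz",
}
for subject, path in dicoms.items():
    cmd = heudiconv_command(path, "rawdata", "heuristic.py", subject)
    # To execute instead of printing: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```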

The following tree represents the result of the BIDS conversion.

    .
    ├── CHANGES
    ├── dataset_description.json
    ├── participants.json
    ├── participants.tsv
    ├── README
    ├── scans.json
    ├── sub-01
    │   ├── anat
    │   │   ├── sub-01_acq-mprage_T1w.json
    │   │   └── sub-01_acq-mprage_T1w.nii.gz
    │   ├── fmap
    │   │   ├── sub-01_acq-grefieldmapping_magnitude1.json
    │   │   ├── sub-01_acq-grefieldmapping_magnitude1.nii.gz
    │   │   ├── sub-01_acq-grefieldmapping_phasediff.json
    │   │   ├── sub-01_acq-grefieldmapping_phasediff.nii.gz
    │   │   ├── sub-01_dir-AP_run-1_epi.json
    │   │   └── sub-01_dir-AP_run-1_epi.nii.gz
    │   ├── func
    │   │   ├── sub-01_task-foodchoice_run-1_bold.json
    │   │   ├── sub-01_task-foodchoice_run-1_bold.nii.gz
    │   │   ├── sub-01_task-foodchoice_run-1_events.tsv
    │   │   ├── sub-01_task-foodchoice_run-2_bold.json
    │   │   ├── sub-01_task-foodchoice_run-2_bold.nii.gz
    │   │   ├── sub-01_task-foodchoice_run-2_events.tsv
    │   │   ├── sub-01_task-foodchoice_run-3_bold.json
    │   │   ├── sub-01_task-foodchoice_run-3_bold.nii.gz
    │   │   ├── sub-01_task-foodchoice_run-3_events.tsv
    │   │   ├── sub-01_task-foodchoice_run-4_bold.json
    │   │   ├── sub-01_task-foodchoice_run-4_bold.nii.gz
    │   │   ├── sub-01_task-foodchoice_run-4_events.tsv
    │   │   ├── sub-01_task-foodchoice_run-5_bold.json
    │   │   ├── sub-01_task-foodchoice_run-5_bold.nii.gz
    │   │   └── sub-01_task-foodchoice_run-5_events.tsv
    │   └── sub-01_scans.tsv
    └── task-foodchoice_bold.json

Note that heudiconv also creates events.tsv files, tabular files containing the header: onset duration trial_type response_time stim_file. Your task is to fill in these files with the logs acquired during the neural recording, according to the BIDS conventions. Additionally, you should create events.json files which contain the description of each column included in events.tsv.
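
Filling in these files can be scripted. The sketch below writes one events.tsv and one events.json; the function names, the trial values and the column descriptions are all invented for illustration, and missing values are written as n/a, the placeholder BIDS uses in tabular files.

```python
import csv
import json
from pathlib import Path

# Header created by heudiconv for the events files.
COLUMNS = ["onset", "duration", "trial_type", "response_time", "stim_file"]

# Hypothetical descriptions; write the ones that fit your own task.
DESCRIPTIONS = {
    "onset": {"Description": "Onset of the event", "Units": "s"},
    "duration": {"Description": "Duration of the event", "Units": "s"},
    "trial_type": {
        "Description": "Condition of the trial",
        "Levels": {"healthy": "Healthy food item", "tasty": "Tasty food item"},
    },
    "response_time": {"Description": "Time from onset to response", "Units": "s"},
    "stim_file": {"Description": "Stimulus file shown in the trial"},
}


def write_events(tsv_path, trials):
    """Write a list of trial dicts to events.tsv; missing values become 'n/a'."""
    with Path(tsv_path).open("w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=COLUMNS, delimiter="\t", restval="n/a", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(trials)


def write_events_sidecar(json_path, descriptions=DESCRIPTIONS):
    """Write the column descriptions to events.json."""
    Path(json_path).write_text(json.dumps(descriptions, indent=2))


# Example call with invented trials:
# write_events("sub-01_task-foodchoice_run-1_events.tsv",
#              [{"onset": 2.0, "duration": 1.5, "trial_type": "healthy", "response_time": 0.82}])
# write_events_sidecar("sub-01_task-foodchoice_run-1_events.json")
```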

Given the variety of EEG data formats, BIDS recommends two main formats (we also strongly suggest using them):

  • European Data Format: .edf
  • BrainVision format: .vhdr, .vmrk, .eeg

Other BIDS accepted formats, although not recommended by BIDS, are Biosemi: .bdf and EEGLAB: .fdt, .set.

EEG conversion with Fieldtrip (Matlab based)

The following code has been adapted from an example provided on the FieldTrip page. For further information, read the official documentation.

  • sub is the subject number as integer, runs is a vector containing the run numbers.
  • cfg: The general struct used by FieldTrip to pass information to data2bids, the actual FieldTrip converter. cfg can also carry the information for the metadata; please check the official documentation.
  • cfg.method: This field defines the action taken by the converter. If your EEG data format is not recommended by BIDS, meaning that it is neither the BrainVision nor the European data format, it is set to 'convert' and the data are converted to the BrainVision format. Otherwise it is set to 'copy' and the data are only copied and restructured according to BIDS.

This function assumes EEG data split into different runs, with one file per run. Often, however, raw data consist of one single recording per subject.

Fieldtrip bids conversion
function cfg = eeg2bids(sub, runs, datadir, ext)
%EEG2BIDS BIDS conversion with FieldTrip
%   sub     = integer
%   runs    = vector of integers
%   datadir = data directory
%   ext     = data extension (.bdf, .edf etc...)


for run = runs

    cfg = [];
    if strcmp(ext, '.vhdr') || strcmp(ext, '.edf')
        cfg.method = 'copy';
    else
        cfg.method = 'convert'; % this method specifies whether you want to convert data or copy and restructure them
    end
    cfg.datatype  = 'eeg';

    % specify the input file name
    cfg.dataset   = fullfile(datadir, ['sub-', num2str(sub, '%02.f'), '_run_', num2str(run), ext]);

    % specify the output directory
    cfg.bidsroot  = 'rawdata';
    cfg.run       = run;
    cfg.sub       = num2str(sub, '%02.f');

    % specify the information for the scans.tsv file
    cfg.scans.acq_time = datestr(now, 'yyyy-mm-ddThh:MM:SS');

    % specify some general information that will be added to the eeg.json file
    cfg.InstitutionName             = 'OvGU';
    cfg.InstitutionalDepartmentName = 'Institute of Psychology';
    cfg.InstitutionAddress          = 'Institute of Psychology, Universitaetsplatz, Magdeburg';

    % provide the task name (used in the BIDS files) and long description of the task
    cfg.TaskName        = 'guessmeaning';
    cfg.TaskDescription = 'Subjects were asked to derive the meaning of pseudowords from the context';

    % these are EEG specific
    cfg.eeg.PowerLineFrequency = 50;   % insert the power line frequency
    cfg.eeg.EEGReference       = 'average';

    data2bids(cfg);
end
end

By running the previous script you should obtain the following BIDS dataset:

rawdata
├── dataset_description.json
├── participants.tsv
├── README
└── sub-20
    ├── eeg
    │   ├── sub-20_task-guessmeaning_run-1_channels.tsv
    │   ├── sub-20_task-guessmeaning_run-1_eeg.eeg
    │   ├── sub-20_task-guessmeaning_run-1_eeg.json
    │   ├── sub-20_task-guessmeaning_run-1_eeg.vhdr
    │   ├── sub-20_task-guessmeaning_run-1_eeg.vmrk
    │   ├── sub-20_task-guessmeaning_run-1_events.tsv
    │   ├── sub-20_task-guessmeaning_run-2_channels.tsv
    │   ├── sub-20_task-guessmeaning_run-2_eeg.eeg
    │   ├── sub-20_task-guessmeaning_run-2_eeg.json
    │   ├── sub-20_task-guessmeaning_run-2_eeg.vhdr
    │   ├── sub-20_task-guessmeaning_run-2_eeg.vmrk
    │   └── sub-20_task-guessmeaning_run-2_events.tsv
    └── sub-20_scans.tsv

EEG conversion with MNE (Python based)
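
A minimal sketch of such a conversion, assuming that the mne and mne-bids packages are installed. The function name `eeg_to_bids` is hypothetical, and the task name, the 50 Hz power line frequency and the rawdata output folder are simply taken over from the FieldTrip example; adapt them to your own data and check the MNE-BIDS documentation for the available options.

```python
def eeg_to_bids(raw_file, subject, run, task="guessmeaning", bids_root="rawdata"):
    """Write one EEG recording to a BIDS dataset with MNE-BIDS.

    subject and run are integers; the imports are kept inside the function
    so that mne and mne-bids are only needed when it is actually called.
    """
    import mne
    from mne_bids import BIDSPath, write_raw_bids

    raw = mne.io.read_raw(raw_file)   # selects a reader based on the file extension
    raw.info["line_freq"] = 50        # power line frequency in Hz
    bids_path = BIDSPath(
        subject=f"{subject:02d}",
        task=task,
        run=run,
        datatype="eeg",
        root=bids_root,
    )
    write_raw_bids(raw, bids_path, overwrite=True)
    return bids_path


# Example call with a hypothetical file name:
# eeg_to_bids("sourcedata/sub-20_run_1.vhdr", subject=20, run=1)
```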

Behavioral data

BIDS for behavioral data follows conventions similar to those used for the events.tsv files with neural recordings. We recommend carefully reading the description provided on the BIDS website.

Behavioral data general rules

  • What kind of data: Any behavioral measures (with no concurrent neural recordings).
  • File types: Tabular data as .tsv with the following header: trial response response_time stim_file (further entries can be added; please refer to the BIDS page above). Metadata as .json files.
  • Where: Data must go under the beh/ folder.
  • How: Data must have the following format:
    <matches>_beh.tsv
    <matches>_beh.json
    where <matches> can be sub-012_task-mytaskname. In case you have multiple sessions and runs, your file might be sub-012_ses-1_task-mytaskname_run-1_beh.tsv, with the corresponding metadata in sub-012_ses-1_task-mytaskname_run-1_beh.json.

Save your behavioral data already in BIDS

It is convenient to save your raw behavioral files as BIDS compliant.

Given the large variety of structures and formats that researchers use for their behavioral experiments, we do not provide an example.

Physiological data

We recommend reading the dedicated page on the BIDS website.

Physio data general rules

  • What kind of data: Cardiac, respiratory and other continuous recordings
  • File types: Continuous recordings as compressed files .tsv.gz (no header). Metadata as .json files.
  • Where: Data can go under different <datatype>/ folders, such as func, anat, dwi, meg, eeg, ieeg, or beh
  • How: Data must have the following format:
    <matches>[_recording-<label>]_physio.tsv.gz
    <matches>[_recording-<label>]_physio.json
    where <matches> can be sub-012_task-mytaskname and _recording-<label> can be used to distinguish between two or more types of recordings, e.g. recording-breathing and recording-eyetracking. In case you have multiple sessions and runs, your file might be sub-012_ses-1_task-mytaskname_run-1_recording-breathing_physio.tsv.gz, with the corresponding metadata in sub-012_ses-1_task-mytaskname_run-1_recording-breathing_physio.json.
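
The rules above can be sketched as a small writer for one recording. The function name `write_physio`, the column names and the sample values are hypothetical. Because the compressed table has no header, the column names go into the .json file, together with the sampling frequency and the start time of the recording.

```python
import gzip
import json
from pathlib import Path


def write_physio(prefix, samples, columns, sampling_frequency, start_time=0.0):
    """Write <prefix>_physio.tsv.gz (no header) and <prefix>_physio.json."""
    tsv_path = Path(f"{prefix}_physio.tsv.gz")
    with gzip.open(tsv_path, "wt", newline="") as handle:
        for sample in samples:
            # one row per sample, one tab-separated value per column
            handle.write("\t".join(str(value) for value in sample) + "\n")
    sidecar = {
        "SamplingFrequency": sampling_frequency,  # in Hz
        "StartTime": start_time,                  # in seconds
        "Columns": list(columns),
    }
    json_path = Path(f"{prefix}_physio.json")
    json_path.write_text(json.dumps(sidecar, indent=2))
    return tsv_path, json_path


# Example call with invented values:
# write_physio("sub-012/func/sub-012_task-mytaskname_recording-breathing",
#              [(0.12, 0), (0.15, 1)], ["respiratory", "trigger"], 100.0)
```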

BIDS for eye-tracking is currently under development. You can find more about the BEP20 proposal in the draft of the paper and in the draft of the official BIDS web page.

If you want to convert your eye-tracking data using the structure proposed in BEP20, the best way is to use the Python-based library eye2bids. It works quite well with data collected with all the major eye-trackers. The library is already available on Cecile.

BEP20 is still under development

Keep in mind the BEP20 is still under development, thus something might change in the final version.

Populating metadata files

TODO

Validate your BIDS dataset

After you have completed the BIDS conversion, the last step is to validate your dataset. This step is very important to ensure that your dataset actually complies with BIDS.

The validation can be done on Cecile with bids-validator, which is installed there and can be used like any other software. If you do not know how to use software on Cecile, please refer to the software page.
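
Before running the validator, a quick look at the top level of the dataset can catch the most obvious omissions. The sketch below is in no way a replacement for bids-validator: the function name `precheck_dataset` is hypothetical, and it only checks that dataset_description.json exists with its two required fields (Name and BIDSVersion) and that at least one sub-* entry is present.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("Name", "BIDSVersion")


def precheck_dataset(bids_root):
    """Return a list of problems found at the top level of a BIDS dataset."""
    root = Path(bids_root)
    description = root / "dataset_description.json"
    if not description.is_file():
        return ["dataset_description.json is missing"]
    problems = []
    content = json.loads(description.read_text())
    for field in REQUIRED_FIELDS:
        if field not in content:
            problems.append(f"dataset_description.json lacks the required field {field}")
    if not any(root.glob("sub-*")):
        problems.append("no sub-* directory found")
    return problems


# Example call:
# print(precheck_dataset("rawdata"))
```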

Periodical validation checks

To maintain minimal standards on Cecile, your datasets will be validated against BIDS periodically. If a dataset is not valid, you will receive an email asking you to make it BIDS compliant.