BIDS
If you are following the project workflow, by now you should have a project set up on the cluster and you might be in the process of organizing your raw data. The next step is to convert your data into BIDS format.
BIDS is an acronym that stands for Brain Imaging Data Structure. It is a standardized format for the organization and the description of neuroscientific data (see the schema below).
In a nutshell, BIDS provides a framework for standardizing file names, folder structures, and metadata for different kinds of data, such as neuroimaging, behavioral, physiological, etc.
We recommend going through the BIDS specification page and the BIDS Starter Kit pages to understand in more depth the concepts and the philosophy behind BIDS.
.
├── CHANGES
├── dataset_description.json
├── participants.tsv
├── README
└── sub-01
    ├── anat
    │   ├── sub-01_T2starw.nii.gz
    │   └── sub-01_T1w.nii.gz
    ├── func
    │   ├── sub-01_task-mybrillianttask_bold.nii.gz
    │   └── sub-01_task-mybrillianttask_events.tsv
    └── task-mybrillianttask_bold.json
Why BIDS?
- It provides a logical and intelligible structure for data of different kinds.
- It makes your data clear and reusable by other researchers, increasing the chances of collaboration and of being cited.
- It helps you organize your data without having to devise an efficient and flexible structure yourself.
- It has been adopted by all the major tools for neuroscientific analysis.
- It improves science in general, and your own science in particular.
How to BIDS
In the following tabs, we explain how to convert your unprocessed raw data into BIDS for different modalities. We focus only on specific tools that are either widely used or offer clear advantages to users.
Before you start converting your data, we strongly recommend going through the BIDS specification pages relevant to your modalities of interest. Make sure you read the common principles to become familiar with the BIDS jargon.
Using heudiconv to convert fMRI data to BIDS
Heudiconv is a Python library that helps you convert f/MRI data to BIDS with little effort. You do not need to be proficient in Python to use it, although some basics help. You can either follow this brief tutorial or use the tutorials provided by the heudiconv developers and users.
Step 1
In order to convert your data correctly, heudiconv relies on so-called heuristics.
Heuristics are rules that you provide to the software to specify which files you would like to convert.
You define these rules in a Python script that is then passed to heudiconv. You do not need to create this script from scratch; you can generate a template by running the following line in the terminal:
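The exact call depends on how your DICOM files are organized, but a typical first pass looks something like the following (placeholders follow the same convention as the command in Step 3; convertall is heudiconv's built-in discovery heuristic and -c none skips the actual NIfTI conversion):
heudiconv --files <file_containing_dicoms> -o <directory_for_bids/> -f convertall -s <subject_number> -c none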
The previous command does not convert any of your data; it creates, within the output directory, a hidden folder, .heudiconv, containing a template of the heuristic.py script and a dicominfo.tsv file, which lists all the information about the DICOM files necessary for specifying the heuristics.
Check your dicoms
It is good practice to extract your DICOM files and check the file names yourself, to make sure that they match the names stored by heudiconv in dicominfo.tsv.
Delete .heudiconv
Before running the actual BIDS conversion, make sure to delete the .heudiconv directory; otherwise the conversion will not produce a correct BIDS dataset.
Step 2
The following example shows a heuristic.py script tailored to a specific dataset. You can use it as a template and adapt it to your own dataset.
from __future__ import annotations
import logging
from typing import Optional
from heudiconv.utils import SeqInfo
lgr = logging.getLogger("heudiconv")
def create_key(
template: Optional[str],
outtype: tuple[str, ...] = ("nii.gz",),
annotation_classes: None = None,
) -> tuple[str, tuple[str, ...], None]:
"""
This function allows you to create
the keys necessary to extract the files
that you want to convert.
"""
if template is None or not template:
raise ValueError("Template must be a valid format string")
return (template, outtype, annotation_classes)
def infotodict(
seqinfo: list[SeqInfo],
) -> dict[tuple[str, tuple[str, ...], None], list[str]]:
"""Heuristic evaluator for determining which runs belong where
allowed template fields - follow python string module:
item: index within category
subject: participant id
seqitem: run number during scanning
subindex: sub index within group
"""
# Create keys: This section is specific to each project
# you need to create your own keys and the info dictionary
t1w = create_key("sub-{subject}/anat/sub-{subject}_acq-mprage_T1w")
task = create_key("sub-{subject}/func/sub-{subject}_task-foodchoice_run-{item:01d}_bold")
fmap_mag = create_key("sub-{subject}/fmap/sub-{subject}_acq-grefieldmapping_magnitude")
fmap_phasediff = create_key("sub-{subject}/fmap/sub-{subject}_acq-grefieldmapping_phasediff")
    epi = create_key("sub-{subject}/fmap/sub-{subject}_dir-AP_run-{item:01d}_epi")
# the dictionary info must contain your keys
    info: dict[tuple[str, tuple[str, ...], None], list[str]] = {
        t1w: [],
        task: [],
        fmap_mag: [],
        fmap_phasediff: [],
        epi: [],
    }
for s in seqinfo:
"""
The namedtuple `s` contains the following fields:
* total_files_till_now
* example_dcm_file
* series_id
* dcm_dir_name
* unspecified2
* unspecified3
* dim1
* dim2
* dim3
* dim4
* TR
* TE
* protocol_name
* is_motion_corrected
* is_derived
* patient_id
* study_description
* referring_physician_name
* series_description
* image_type
"""
# The following conditional statements constitute
# the core of the heuristics.
# Modify them according to the dicom file names and
# other features (like data size) that
# can help select the correct files
if "t1_mpr" in s.protocol_name:
info[t1w].append(s.series_id)
elif 'fMRI_SMS2_2.2iso_66sl_TR2_run' in s.protocol_name and s.dim4 == 200:
info[task].append(s.series_id)
elif 'field_mapping' in s.protocol_name and s.dim3 == 132:
info[fmap_mag].append(s.series_id)
elif 'field_mapping' in s.protocol_name and s.dim3 == 66:
info[fmap_phasediff].append(s.series_id)
elif 'EPI' in s.protocol_name:
info[epi].append(s.series_id)
return info
The essential sections of the script that you need to modify are the keys and the actual heuristics; both must be adapted to your dataset:
- Keys: In the dicominfo.tsv file (inside .heudiconv) you find the exact naming of the DICOM files. You need to go through the list of files and decide which of them are important for your analysis. Once you have made your decision, you can create the proper keys by following the naming conventions explained in the BIDS tutorial. Keys create the BIDS-compliant file names and structure.
    - Start with your anatomical images, which all go into the anat folder:
        - The sub-{subject}/ folder, where {subject} is a placeholder that heudiconv fills in with the correct subject ID.
        - The folder anat/ is dedicated to all the anatomical images.
        - In acq-<label> you provide the type of acquisition, e.g. mprage (this is not mandatory).
        - The T1w suffix is the standard name for T1-weighted images. Keep in mind that you might have different anatomical images.
    - Set up the functional images, which go into the func folder:
        - As for the anatomical image, you need to add the sub-{subject}/ folder.
        - The func folder is dedicated to the functional images.
        - task-<taskname> indicates the name of the task; _run- is just a placeholder for the run number that heudiconv fills in automatically.
        - The bold suffix indicates the type of functional file we are dealing with.
    - Set up the field map images, which go into the fmap folder. The previous anatomical and functional images are rather standard for fMRI experiments; field maps can differ between setups and are sometimes absent.

        fmap_mag = create_key("sub-{subject}/fmap/sub-{subject}_acq-grefieldmapping_magnitude")
        fmap_phasediff = create_key("sub-{subject}/fmap/sub-{subject}_acq-grefieldmapping_phasediff")
        epi = create_key("sub-{subject}/fmap/sub-{subject}_dir-AP_run-{item:01d}_epi")

        - As for the other images, you set up the subject folder sub-{subject}.
        - The fmap folder is dedicated to the fieldmap-related images.
        - In this particular case, for fmap_mag and fmap_phasediff we specified the acq-<label>, which changes for different cases.
        - The suffixes magnitude, phasediff, and epi are mandatory for these types of fieldmap images; please refer to the official page for all the possible cases.
        - dir-AP indicates the phase-encoding direction of the EPI image; please refer to the official page for further information.
- Heuristics: Now we need to create a few basic rules that tell heudiconv which files to extract. In the following example we simply match the name of a DICOM we would like to include in the BIDS conversion against the names contained in s.protocol_name; if there is a match, we add the file to the info dictionary. Sometimes matching the file name alone is not enough to extract the correct file, so we can combine several rules, as in the following cases, where we match both the file name and the data dimensions (number of volumes or slices).

    if "t1_mpr" in s.protocol_name:  # name matching
        info[t1w].append(s.series_id)  # append to the dictionary
    elif 'fMRI_SMS2_2.2iso_66sl_TR2_run' in s.protocol_name and s.dim4 == 200:
        info[task].append(s.series_id)
    elif 'field_mapping' in s.protocol_name and s.dim3 == 132:
        info[fmap_mag].append(s.series_id)
    elif 'field_mapping' in s.protocol_name and s.dim3 == 66:
        info[fmap_phasediff].append(s.series_id)

As you might guess, you need to create rules on a case-by-case basis.
Step 3
Now you just need to run heudiconv for the actual conversion with the following line. For further information about the heudiconv options, visit the software page or type heudiconv --help in the terminal:
heudiconv --files <file_containing_dicoms> -o <directory_for_bids/> -f <heuristic.py> -s <subject_number> -c dcm2niix -b --overwrite
The following tree represents the result of the BIDS conversion.
.
├── CHANGES
├── dataset_description.json
├── participants.json
├── participants.tsv
├── README
├── scans.json
├── sub-01
│ ├── anat
│ │ ├── sub-01_acq-mprage_T1w.json
│ │ └── sub-01_acq-mprage_T1w.nii.gz
│ ├── fmap
│ │ ├── sub-01_acq-grefieldmapping_magnitude1.json
│ │ ├── sub-01_acq-grefieldmapping_magnitude1.nii.gz
│ │ ├── sub-01_acq-grefieldmapping_phasediff.json
│ │ ├── sub-01_acq-grefieldmapping_phasediff.nii.gz
│ │ ├── sub-01_dir-AP_run-1_epi.json
│ │ └── sub-01_dir-AP_run-1_epi.nii.gz
│ ├── func
│ │ ├── sub-01_task-foodchoice_run-1_bold.json
│ │ ├── sub-01_task-foodchoice_run-1_bold.nii.gz
│ │ ├── sub-01_task-foodchoice_run-1_events.tsv
│ │ ├── sub-01_task-foodchoice_run-2_bold.json
│ │ ├── sub-01_task-foodchoice_run-2_bold.nii.gz
│ │ ├── sub-01_task-foodchoice_run-2_events.tsv
│ │ ├── sub-01_task-foodchoice_run-3_bold.json
│ │ ├── sub-01_task-foodchoice_run-3_bold.nii.gz
│ │ ├── sub-01_task-foodchoice_run-3_events.tsv
│ │ ├── sub-01_task-foodchoice_run-4_bold.json
│ │ ├── sub-01_task-foodchoice_run-4_bold.nii.gz
│ │ ├── sub-01_task-foodchoice_run-4_events.tsv
│ │ ├── sub-01_task-foodchoice_run-5_bold.json
│ │ ├── sub-01_task-foodchoice_run-5_bold.nii.gz
│ │ └── sub-01_task-foodchoice_run-5_events.tsv
│ └── sub-01_scans.tsv
└── task-foodchoice_bold.json
Note that heudiconv also creates events.tsv files: tabular files with the header onset duration trial_type response_time stim_file. Your task is to complete these files with the logs acquired during the neural recording, according to the BIDS conventions. Additionally, you should create events.json files, which describe each column included in events.tsv.
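As an illustration only, the following sketch writes an events.tsv and a matching events.json with pandas; the log file and its column names (log.csv, stim_onset_s, rt_s, condition, image) are hypothetical and must be adapted to your own logging format, and you may prefer to fill in the stub files created by heudiconv instead of writing new ones.

import json
import pandas as pd

# hypothetical behavioral log with one row per trial
log = pd.read_csv("log.csv")

events = pd.DataFrame({
    "onset": log["stim_onset_s"],       # seconds from the start of the run
    "duration": 2.0,                    # stimulus duration in seconds
    "trial_type": log["condition"],
    "response_time": log["rt_s"],
    "stim_file": log["image"],
})
events.to_csv("sub-01_task-foodchoice_run-1_events.tsv",
              sep="\t", index=False, na_rep="n/a")

# sidecar describing each column of events.tsv
sidecar = {
    "trial_type": {"Description": "Experimental condition of the trial"},
    "response_time": {"Description": "Time from stimulus onset to response", "Units": "s"},
    "stim_file": {"Description": "Image presented on this trial"},
}
with open("task-foodchoice_events.json", "w") as f:
    json.dump(sidecar, f, indent=4)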
EEG BIDS recommended formats
Given the variety of EEG data formats, BIDS recommends two main formats (we also strongly suggest using one of these two):
- European Data Format: .edf
- BrainVision format: .vhdr, .vmrk, .eeg
Other formats accepted by BIDS, although not recommended, are Biosemi (.bdf) and EEGLAB (.fdt, .set).
EEG conversion with Fieldtrip (Matlab based)
The following code has been adapted from an example provided on the FieldTrip website; for further information, read the official documentation.
- sub is the subject number as an integer; runs is a vector containing the run numbers.
- cfg: the general struct used by FieldTrip to pass information to data2bids, the actual FieldTrip converter. cfg also lets you provide metadata; please check the official documentation.
- cfg.method: this field defines the action the converter should take. If your EEG data type is not recommended by BIDS, meaning it is neither the BrainVision nor the European Data Format, it is set to 'convert' and the data will be converted to the BrainVision format. Otherwise it is set to 'copy' and the data will only be copied and restructured according to BIDS.
This function assumes EEG data split into different runs; often, raw data consist of one single recording per subject.
function cfg = eeg2bids(sub, runs, datadir, ext)
%EEG2BIDS BIDS conversion with FieldTrip
% sub = integer
% runs = vector of integers
% datadir = data directory
% ext = data extension (.bdf, .edf etc...)
for run = runs
cfg = [];
    if strcmp(ext, '.vhdr') || strcmp(ext, '.edf')
cfg.method = 'copy';
else
cfg.method = 'convert'; % this method specifies whether you want to convert data or copy and restructure them
end
cfg.datatype = 'eeg';
% specify the input file name
    cfg.dataset = fullfile(datadir, ['sub-', num2str(sub, '%02.f'), '_run_', num2str(run), ext]);
% specify the output directory
cfg.bidsroot = 'rawdata';
cfg.run = run;
cfg.sub = num2str(sub, '%02.f');
% specify the information for the scans.tsv file
cfg.scans.acq_time = datestr(now, 'yyyy-mm-ddThh:MM:SS');
% specify some general information that will be added to the eeg.json file
cfg.InstitutionName = 'OvGU';
    cfg.InstitutionalDepartmentName = 'Institute of Psychology';
    cfg.InstitutionAddress = 'Institute of Psychology, Universitaetsplatz, Magdeburg';
% provide the task name (used in the BIDS files) and long description of the task
cfg.TaskName = 'guessmeaning';
cfg.TaskDescription = 'Subjects were asked to derive the meaning of pseudowords from the context';
% these are EEG specific
cfg.eeg.PowerLineFrequency = 50; % insert the power line frequency
cfg.eeg.EEGReference = 'average';
data2bids(cfg);
end
end
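For example, a hypothetical call for subject 20 with two runs of Biosemi data stored in a sourcedata folder (folder name and extension are assumptions) could look like:
cfg = eeg2bids(20, 1:2, 'sourcedata', '.bdf');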
By running the previous script you should obtain the following BIDS dataset:
rawdata
├── dataset_description.json
├── participants.tsv
├── README
└── sub-20
├── eeg
│ ├── sub-20_task-guessmeaning_run-1_channels.tsv
│ ├── sub-20_task-guessmeaning_run-1_eeg.eeg
│ ├── sub-20_task-guessmeaning_run-1_eeg.json
│ ├── sub-20_task-guessmeaning_run-1_eeg.vhdr
│ ├── sub-20_task-guessmeaning_run-1_eeg.vmrk
│ ├── sub-20_task-guessmeaning_run-1_events.tsv
│ ├── sub-20_task-guessmeaning_run-2_channels.tsv
│ ├── sub-20_task-guessmeaning_run-2_eeg.eeg
│ ├── sub-20_task-guessmeaning_run-2_eeg.json
│ ├── sub-20_task-guessmeaning_run-2_eeg.vhdr
│ ├── sub-20_task-guessmeaning_run-2_eeg.vmrk
│ └── sub-20_task-guessmeaning_run-2_events.tsv
└── sub-20_scans.tsv
EEG conversion with MNE (Python based)
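A minimal sketch using the mne and mne_bids Python packages is shown below; the file name, subject label, task name, and power line frequency are placeholders that you need to adapt to your own recordings.

import mne
from mne_bids import BIDSPath, write_raw_bids

# read the raw recording (use read_raw_edf, read_raw_bdf, ... for other formats)
raw = mne.io.read_raw_brainvision("sub-20_run_1.vhdr")
raw.info["line_freq"] = 50  # power line frequency, required by the BIDS validator

bids_path = BIDSPath(subject="20", task="guessmeaning", run=1,
                     datatype="eeg", root="rawdata")

# copies (or converts) the data and writes channels.tsv, events.tsv,
# the eeg.json sidecar, and the top-level BIDS files under rawdata/
write_raw_bids(raw, bids_path=bids_path, overwrite=True)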
Behavioral data
BIDS for behavioral data follows conventions similar to those used for the events.tsv files that accompany neural recordings.
We recommend reading carefully the description provided on the BIDS website.
Behavioral data general rules
- What kind of data: any behavioral measures (with no concurrent neural recordings).
- File types: tabular data as .tsv with the following header: trial response response_time stim_file (further columns can be added; please refer to the BIDS page above). Metadata as .json files.
- Where: data must go under the beh/ folder.
- How: data must have the following format: <matches>_beh.tsv and <matches>_beh.json, where <matches> can be sub-012_task-mytaskname. In case you have multiple sessions and runs, your files might be sub-012_ses-1_task-mytaskname_run-1_beh.tsv and the corresponding metadata sub-012_ses-1_task-mytaskname_run-1_beh.json.
Save your behavioral data already in BIDS
It is convenient to save your raw behavioral files in BIDS-compliant form right from the start.
Given the large variety of structures and formats that researchers use for their behavioral experiments, we do not provide an example.
Physiological data
We recommend reading the dedicated page on the BIDS website.
Physio data general rules
- What kind of data: cardiac, respiratory, and other continuous recordings.
- File types: continuous recordings as compressed .tsv.gz files (no header). Metadata as .json files.
- Where: data can go under different <datatype>/ folders, such as func, anat, dwi, meg, eeg, ieeg, or beh.
- How: data must have the following format: <matches>[_recording-<label>]_physio.tsv.gz and <matches>[_recording-<label>]_physio.json, where <matches> can be sub-012_task-mytaskname and _recording-<label> can be used to distinguish between two or more types of recordings, e.g. recording-breathing and recording-eyetracking. In case you have multiple sessions and runs, your files might be sub-012_ses-1_task-mytaskname_run-1_recording-breathing_physio.tsv.gz and the corresponding metadata sub-012_ses-1_task-mytaskname_run-1_recording-breathing_physio.json (a minimal example of such a sidecar is sketched after this list).
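For illustration, here is a minimal sketch of a physio sidecar written from Python; the file name, sampling frequency, start time, and column names are placeholders.

import json

# hypothetical sidecar for a breathing recording acquired during an fMRI run
physio_json = {
    "SamplingFrequency": 100.0,   # Hz
    "StartTime": 0.0,             # onset relative to the first neural data sample, in seconds
    "Columns": ["respiratory"],
}
with open("sub-012_task-mytaskname_recording-breathing_physio.json", "w") as f:
    json.dump(physio_json, f, indent=4)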
BIDS for eye-tracking is currently under development; you can find more about the BEP20 proposal in the draft of the paper and in the draft of the official BIDS web page.
In case you already want to convert your eye-tracking data using the structure proposed in BEP20, the best way is to use the Python-based library eye2bids. It works quite well with data collected with all the major eye-trackers. The library is already available on Cecile.
BEP20 is still under development
Keep in mind that BEP20 is still under development, so details might change in the final version.
Populating metadata files
TODO
Validate your BIDS dataset
After you have completed the BIDS conversion, the last step is to validate your dataset. This step is essential to ensure that your dataset actually complies with BIDS.
The validation can be done on Cecile by using the bids-validator. It is installed on Cecile and you can use it like any other software; if you do not know how to use software on Cecile, please refer to the software page.
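For example, assuming your converted dataset lives in a rawdata/ directory (adjust the path to your own setup), the call is simply:
bids-validator rawdata/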
Periodic validation checks
In order to maintain the minimal standards on Cecile, your datasets will be validated periodically; in case a dataset is not valid, you will receive an email asking you to make it BIDS compliant.