Data analysis
Coding is a prerequisite
Data analysis is inextricably linked to coding; if you are new to coding, we recommend reading the section on good coding practices before continuing with this one.
Data analysis is one of the core aspects of your research, and also the phase that requires the most intensive management.
How to build a well-structured analysis
Analyses can easily become cumbersome and difficult to maintain in ways we have not predicted. To avoid mistakes and to avoid wasting time deciphering our data structure, we recommend a simple and intuitive workflow that you may already have adopted:
- Chunk your analysis into sensible, logical macro-steps (e.g. preprocessing, time-frequency analysis, etc.).
- Create a folder for each macro-step and store in it all the scripts for its micro-steps (in the example below, the mne-preproc folder constitutes a macro-step, and the scripts preproc_step_1.py, preproc_step_2.py, etc. are micro-steps).
- Each macro-step folder should also contain a pipeline script (mne-preproc.py in the example) that combines the micro-steps, allowing you to run them all at once (see the sketch after the directory tree below).
- Each micro- and macro-step should be implemented to run one subject/instance at a time. This enables you to run all the subjects in parallel on the cluster.
- You should have a general pipeline script that combines all the macro-steps together.
code/
├── mne-main_pipeline/
│ ├── README
│ └── main_pipeline_script.py
├── mne-preproc/
│ ├── README
│ ├── mne-preproc.py
│ ├── preproc_step_1.py
│ ├── preproc_step_2.py
│ └── preproc_step_3.py
├── mne-time_freq/
│ ├── README
│ ├── mne-time_freq.py
│ ├── time_freq_step_1.py
│ └── time_freq_step_2.py
└── mne-stats/
├── README
└── mne-stats.py
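To make this concrete, here is a minimal sketch of what a macro-step pipeline script such as mne-preproc.py could look like. It assumes, purely for illustration, that each micro-step module exposes a run function taking a subject identifier; the names and the command-line interface are hypothetical, not a fixed convention.
# mne-preproc.py -- sketch of a macro-step pipeline script
# (the run(subject) interface of the micro-steps is a hypothetical assumption)
import argparse

import preproc_step_1
import preproc_step_2
import preproc_step_3


def main():
    parser = argparse.ArgumentParser(description="Run preprocessing for one subject")
    parser.add_argument("subject", help="subject identifier, e.g. sub-01")
    args = parser.parse_args()

    # run the micro-steps in their logical order, one subject at a time
    for step in (preproc_step_1, preproc_step_2, preproc_step_3):
        step.run(args.subject)


if __name__ == "__main__":
    main()
Because the script processes a single subject, you can submit one job per subject to the cluster (python mne-preproc.py sub-01, python mne-preproc.py sub-02, and so on) and let them all run in parallel.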
How to build a pipeline
At its core, an analysis pipeline is nothing more than a collection of steps executed in a logical sequence. Here we explicitly mean a script in its simplest form that combines all the steps in a clear and intelligible way. You can create a pipeline in many ways; we suggest a few options, with examples, that you might use. Keep in mind that analysis libraries or toolboxes often offer the option of creating a pipeline (e.g. Nilearn, SPM); if this is the case, we recommend using them.
Nipype is the most sophisticated and powerful tool for creating a pipeline in cognitive neuroscience. It is a Python library that enables you to create an analysis pipeline by combining different external tools (e.g. SPM, FSL) and/or custom scripts written in different languages such as Python, MATLAB and shell.
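As a minimal sketch, and assuming FSL is installed (the input file name here is hypothetical), a Nipype workflow that connects two processing nodes could look like this:
# sketch of a minimal Nipype workflow; requires FSL,
# and sub-01_T1w.nii.gz is a hypothetical input file
from nipype import Node, Workflow
from nipype.interfaces import fsl

# skull-strip the anatomical image, then smooth the result
skullstrip = Node(fsl.BET(in_file="sub-01_T1w.nii.gz"), name="skullstrip")
smooth = Node(fsl.IsotropicSmooth(fwhm=4), name="smooth")

# connect the output of the first node to the input of the second
wf = Workflow(name="preproc_wf", base_dir="work")
wf.connect(skullstrip, "out_file", smooth, "in_file")
wf.run()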
If you prefer to build a pipeline from your own scripts instead, the following example represents a minimal scaffold for a potential pipeline:
"""
Import all the necessary steps. Imagine that
the script preproc_step_1.py contains
a function or a class called step_1 and
so on for the other steps.
"""
from preproc_step_1 import step_1
from preproc_step_2 import step_2
from preproc_step_3 import step_3
class Pipeline:
    """
    Toy Pipeline scaffolding.
    It can be used and modified to accommodate
    different kinds of analyses.

    Arguments
    ---------
    input : str
        Path to the input data; here a .npy file,
        but it could be any kind of data.
    """

    def __init__(self, input):
        self.input = input
    def _load_data(self):
        """
        This method simply loads your data;
        in this toy example it loads a (1 x n) numpy array.
        """
        return np.load(self.input)
    def _check_io(self, data, conditions=None, errors=None):
        """
        This method checks whether inputs or outputs
        meet specific criteria, e.g. data type, shape
        or something else.
        It raises a ValueError as soon as the data fail
        to meet any one of the criteria.
        """
        for check, message in zip(conditions, errors):
            if check(data):
                print("Data are fine")
            else:
                raise ValueError(message)
    def __call__(self, pl_dict):
        """
        This method makes the class callable, and
        we use it to loop through the steps while
        checking inputs and outputs.

        Arguments
        ---------
        pl_dict : dict
            A dictionary containing one dictionary per step,
            with the name of each step as key. In each inner
            dictionary we specify: the function or method
            implementing that step, the parameters it needs,
            the criteria used to check the correctness of the
            input/output, and the error message to raise when
            a criterion is not satisfied. Each criterion is a
            lambda (or any other function) that returns a boolean.

        Returns
        -------
        data : np.array
            In this toy example a numpy array; in an actual
            scenario it can be whatever type you need.
        """
        data = self._load_data()
        for k, v in pl_dict.items():
            print(f"running {k}")
            # check whether criteria are met
            if v["criteria"] is not None:
                self._check_io(data, v["criteria"], v["error_message"])
            # extract the function/method from the dictionary
            func = v["method_name"]
            # extract the parameters from the dictionary
            par = v["par"]
            # run the step, passing the data and the parameter tuple
            data = func(data, par)
        return data
if __name__ == "__main__":
import numpy as np
DATA_POINTS = 20
PL_DICT = {
"step_1": {
"method_name": step_1,
"par": (0, 3),
"criteria": [
(lambda x: x.shape[1] == DATA_POINTS),
(lambda x: x.any().min() >= 0 and x.any().max() <= 1),
],
"error_message": [
"data points do not match",
"data are not with the range min >= 0 and <= 1",
],
},
"step_2": {
"method_name": step_2,
"par": ("mask", "ROI"),
"criteria": None,
"error_message": None,
},
"step_3": {
"method_name": step_3,
"par": (0.5, 1),
"criteria": None,
"error_message": None,
},
}
PREPROC_PL = Pipeline("data.npy")
print(PREPROC_PL(PL_DICT))
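For completeness, a micro-step compatible with this scaffolding could look like the sketch below; the clipping operation is purely illustrative, and what matters is that the function accepts the data and its parameter tuple:
# preproc_step_1.py -- toy micro-step; the computation is purely illustrative
import numpy as np


def step_1(data, par):
    """
    Clip the data to the (low, high) bounds given in par
    and return the result to the pipeline.
    """
    low, high = par
    return np.clip(data, low, high)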
List of pipelines by the major EEG/MEG analysis libraries:
Coming soon...
Why do I want to adopt this workflow?
This approach, besides being very intuitive, provides a modular and logical structure. Each micro-step can produce intermediate outputs that can be systematically checked for correctness, making it easier to spot potential mistakes and to keep track of inputs and outputs.