Benchmarking framework for machine learning with fNIRS

BenchNIRS supports:
  • loading of open access datasets

  • signal processing and feature extraction on fNIRS data

  • training, fine-tuning and evaluation of machine learning models (including deep learning)

  • production of training graphs, metrics and other useful figures for evaluation

  • benchmarking and comparison of machine learning models

  • supervised, self-supervised and transfer learning

  • much more!

Source code on GitLab

Recommendation checklist

Below is a proposed checklist of recommendations toward best practices for machine learning with fNIRS.

Is the generalisation goal consistent with the experimental design of the data collection (eg. no order effect)?
Is the training, validation, and test set split consistent with the generalisation goal (eg. unseen subject, unseen session)?
Has nested cross-validation been used, with the outer cross-validation (leaving out the test sets) for evaluation and the inner cross-validation (leaving out the validation sets) for hyperparameter selection?
Has hyperparameter selection been performed and has it been done only on the validation sets?
Have the test sets been used for evaluation only (eg. no hyperparameter selection performed on the test sets)?
Has it been ensured that test sets were not included when performing normalisation?
Has it been ensured that there was no overlap between training, validation, and test sets (especially when epoching with sliding windows)?
Have the appropriate metrics been used (eg. accuracy, precision, recall)?
Has a statistical analysis been performed to demonstrate the significance of the results compared to chance level and other models?
Have the number of classes and the chance level been clearly stated?
Have the inputs of the models been described (eg. input shape, number of examples per class)?
Has the cross-validation implementation been described (including data shuffling if used)?
Has hyperparameter selection been described (eg. selection method, hyperparameter ranges)?
Have the details of each model been provided (eg. architecture, hyperparameters)?
Has the statistical analysis of the results been described (eg. name of the tests, verification of assumptions, p-values)?
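As a minimal sketch of the split-related items above (illustrative code, not part of BenchNIRS): holding out whole subjects ensures that epochs from the same person — including overlapping sliding windows from the same recording — never end up in both the training and test sets of the same fold.

```python
import numpy as np

def subject_wise_folds(subjects, n_folds):
    """Yield (train_idx, test_idx) pairs such that no subject
    contributes epochs to both sets of the same fold."""
    unique = np.unique(subjects)
    for held_out in np.array_split(unique, n_folds):
        test_mask = np.isin(subjects, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# 6 subjects with 10 (possibly overlapping) epochs each
subjects = np.repeat(np.arange(6), 10)
folds = list(subject_wise_folds(subjects, n_folds=3))
```

The same idea generalises to unseen-session splits by grouping on session identifiers instead of subject identifiers.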
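The nested cross-validation items can be sketched as follows. The tiny k-NN model and the synthetic data here are placeholders, not BenchNIRS code; the point is the structure: the inner loop alone selects the hyperparameter, and the outer test folds are only ever used for scoring.

```python
import numpy as np

def knn_predict(x_tr, y_tr, x_te, k):
    """Tiny k-NN classifier (illustrative stand-in for a real model)."""
    d = np.linalg.norm(x_te[:, None, :] - x_tr[None, :, :], axis=-1)
    votes = y_tr[np.argsort(d, axis=1)[:, :k]]
    return (votes.mean(axis=1) > 0.5).astype(int)

def kfold(n, k, rng):
    """Shuffled k-fold index split."""
    return np.array_split(rng.permutation(n), k)

def nested_cv(x, y, ks, n_outer=5, n_inner=3, seed=0):
    rng = np.random.default_rng(seed)
    outer = kfold(len(x), n_outer, rng)
    scores = []
    for i, test_idx in enumerate(outer):
        dev_idx = np.concatenate([f for j, f in enumerate(outer) if j != i])
        inner = kfold(len(dev_idx), n_inner, rng)
        # Inner loop: hyperparameter selection on validation folds only
        best_k, best_acc = ks[0], -1.0
        for k in ks:
            accs = []
            for m, val_loc in enumerate(inner):
                tr_loc = np.concatenate([f for n, f in enumerate(inner) if n != m])
                tr, va = dev_idx[tr_loc], dev_idx[val_loc]
                accs.append((knn_predict(x[tr], y[tr], x[va], k) == y[va]).mean())
            if np.mean(accs) > best_acc:
                best_acc, best_k = np.mean(accs), k
        # Outer loop: the test fold is used for evaluation only
        pred = knn_predict(x[dev_idx], y[dev_idx], x[test_idx], best_k)
        scores.append((pred == y[test_idx]).mean())
    return float(np.mean(scores))

# Two well-separated synthetic classes
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.repeat([0, 1], 50)
acc = nested_cv(x, y, ks=[1, 3, 5])
```

Reporting the mean outer-fold score (rather than the best inner-fold score) avoids the optimistic bias that hyperparameter selection would otherwise introduce.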
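For the normalisation item, a common pattern is to compute the normalisation statistics on the training set only and reuse them unchanged on the test set. The data below is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(5.0, 2.0, size=(80, 4))  # eg. epochs x features
x_test = rng.normal(5.0, 2.0, size=(20, 4))

# Statistics come from the training set only, so no information
# about the test set leaks into the preprocessing
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)
x_train_norm = (x_train - mean) / std
x_test_norm = (x_test - mean) / std  # same statistics, not refitted
```

Fitting the normalisation on the full dataset before splitting would leak test-set information into training, inflating the evaluated performance.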
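For the statistical-analysis items, one simple option (among others) is a permutation test of the observed accuracy against a chance-level null distribution. Everything below is synthetic: labels, predictions, and the 70% accuracy are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=100)  # 2 classes, chance level = 0.5
y_pred = y_true.copy()
y_pred[:30] = 1 - y_pred[:30]          # exactly 30 errors -> 70% accuracy
observed = (y_pred == y_true).mean()

# Null distribution: accuracy of the same predictions scored against
# randomly permuted labels (ie. chance-level performance)
n_perm = 2000
null = np.array([(y_pred == rng.permutation(y_true)).mean()
                 for _ in range(n_perm)])
p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
```

The `+ 1` terms make the p-value estimate conservative, so it can never be exactly zero regardless of the number of permutations.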

Contributing to the repository

Contributions from the community to this repository are highly appreciated. We are mainly interested in contributions to:

  • improving the recommendation checklist

  • adding support for new open access datasets

  • adding support for new machine learning models

  • adding more fNIRS signal processing techniques

  • improving the documentation

  • tracking bugs

Contributions are encouraged in the form of issues (for reporting bugs or requesting new features) and merge requests (for fixing bugs and implementing new features). Please refer to this tutorial for creating merge requests from a fork of the repository.


This project is licensed under the GNU General Public License v3+. If you are using BenchNIRS, please cite this article.
