Open Data

The open datasets developed by our team are listed below.

Ghent Semi-spontaneous Speech Paradigm validation dataset

The purpose of the Ghent Semi-spontaneous Speech Paradigm (GSSP) dataset is to validate a newly developed speech acquisition paradigm by examining the speech style. The dataset contains more than 1000 raw audio recordings from over 80 participants. Each participant described 30 images with a consistent emotional load using the GSSP method and read aloud a fixed text seven times. The data was collected using an online web application whose code and accompanying analysis notebooks can be found on GitHub.

AI4NTD KK2.0 P1.5 STH & SCHm Dataset

Dataset for detecting Soil-Transmitted Helminths and Schistosoma mansoni eggs in Kato-Katz smears

Soil-transmitted helminth (STH) infections are caused by different species of parasitic worms, commonly known as roundworms, whipworms, and hookworms. Approximately 1.5 billion people are infected with soil-transmitted helminths worldwide, and infected children are nutritionally and physically impaired. The worms are transmitted by eggs present in human faeces, which contaminate the soil in areas where sanitation is poor.

The global targets to eliminate soil-transmitted helminthiasis morbidity depend on accurate assessment of the prevalence and intensity of infections in affected populations. The WHO roadmap sets the goal of eliminating STH and schistosomiasis (SCH) infections as a public health problem by 2030.

To support and accelerate this roadmap, we have been developing a low-cost slide scanner to digitise Kato-Katz thick stool smears. Using the scanner, we have collected and annotated a substantial dataset covering four types of helminth eggs.

Context-aware lifestyle monitoring

Real-world dataset for lifestyle monitoring through Human Activity Recognition (HAR) with a wrist-worn wearable device.

In recent years, advances in wearable technology have made these devices usable in a wide range of applications, including healthcare. Wearable devices are equipped with motion sensors that allow for activity tracking. They are comfortable, non-stigmatizing, and unobtrusive, which makes them ideal for continuous fitness and lifestyle monitoring and evaluation.

This dataset consists of accelerometer data from a wrist-worn wearable device (Empatica E4). The data was collected in the real world over several weeks, in the participants’ natural environment and without a collection protocol.

This dataset is licensed under CC BY-SA 4.0

Data Analytics for Health and Connected Care

Data Analytics for Health and Connected Care (DAHCC) provides both the means and the data to describe connected care applications, the sensors used to build them, and their link to the people involved in or affected by those applications (e.g. patients and healthcare professionals).

The DAHCC resource consists of three main components:

  • A large dataset of daily life activities, provided in both raw and knowledge graph format.
  • The DAHCC Ontology, capturing care, patient, daily life activity recognition, and lifestyle domain knowledge.
  • Connected Care Applications that show the potential of combining data with ontological metadata.