I have always found population studies, that is, studies conducted on representative samples of a population, fascinating, especially for their potential to turn varied information into beneficial policies and actions. So when I first learned about bioinformatics data analysis, I kept looking for ways to apply it to population-level data, particularly to answer health-related research questions. Seeing how big data can be transformed into interpretations that benefit the wider public is what draws me to this research area.
When I moved to Finland to follow my husband, I felt lucky, because Finland is one of the countries with comprehensive population data, ranging from the Cancer Registry, established in 1953, to the more recently added Register of Primary Health Care Visits, introduced in 2011. These records were originally intended to support new ways of modeling the complex relationships between health and risk factors using high-resolution longitudinal data. The findings are then used to develop preventive and personalized health care for the general public. For example, the National FINRISK Study, initiated in 1972, was highly successful in introducing preventive measures against the high incidence of cardiovascular diseases, as shown by the decline in risk factor levels and in coronary heart disease morbidity and mortality in the province of North Karelia, Eastern Finland, at the time [1]. In 1976, the study was expanded to the entire nation to target major non-communicable diseases more broadly. When I joined the Turku Data Science Research group in 2021, I got the opportunity to take part in analyzing this cohort, particularly the data collected in 2002 (FINRISK 2002), with the aim of exploring how the microbiome, the collection of microbes that live with us, influences long-term health status [2].
High-quality data is indeed the foundation of a successful health and care system. It can serve as a system enabler, integrating information in support of better healthcare policy. For example, it helps in deciding on the best care, researching and improving treatments, addressing health inequalities, managing contagious diseases, improving efficiency, and planning services for now and for the future. During the COVID-19 pandemic, insight from population data has been critical in helping public health and humanitarian leaders respond more effectively, particularly by analyzing preventive actions, the spread of the disease, population mobility, and the resilience of systems and people in coping with the virus. Every crisis calls for an effective response, and this is where a collective effort around population data can play an essential role.
Although many countries already leverage population data for better healthcare, some populations remain understudied. In genomic studies, for example, the vast majority (86%) of population data covers only individuals of European descent [3]. This could result in missed scientific opportunities and exacerbate health disparities. One well-known example is the difficulty of generalizing the polygenic risk score, an estimate of genetic risk, across populations. Despite its increased power to predict certain diseases, such as breast cancer and cardiovascular diseases, in populations of European descent, its accuracy decreases when it is applied to populations at increasing genetic distance from the study cohort. This shows that a lack of diversity in population data may lead to inaccurate risk assessment and a lack of interventions, especially in understudied populations. Comparing a diverse set of populations therefore opens up the possibility of gaining valuable insight into the complexity of certain health conditions.
On the optimistic side, several initiatives for understudied populations have started recently. In genomics, a number of flourishing studies have targeted low- and middle-income countries in Africa, Asia, and Latin America, as well as Indigenous populations in Australia. Several factors contribute to their success: (1) international project funding that promotes inclusion; (2) large-scale training that enables the local community to contribute to the project; (3) strategic collaborations with an established institute to provide both infrastructure and expert knowledge; and, importantly, (4) clear ethical guidelines that build trust between communities and researchers [3]. I am quite hopeful as well, because the Biomedical and Genome Science Initiative (BGSi) was also launched this year in my country, Indonesia. This could have a major impact, as Indonesia is the fourth most populous country in the world and home to 1,340 recognised ethnic groups.
Despite a massive increase in data collection, there has been relatively little progress in data analysis and application [4]. A joint effort by a large number of scientists with diverse sets of skills could be part of the solution. One example is the collaborative scientific competition, known as a Challenge, which offers a unique way of engaging researchers to collectively solve a complex problem and provides a framework for robust data analysis methodologies, including in health data settings. Recently, our research group has been taking part in organizing an Open Challenge that adopts data and problems from the FINRISK cohort, in which we invite everyone to provide novel insight into predicting heart failure using information on conventional risk factors and microbiome composition.
To summarize, population data has opened opportunities to substantially improve health outcomes, and ensuring the inclusion of diverse populations could accelerate that progress. To ensure such practice, many aspects need to be assessed, and successful projects can serve as excellent examples. Importantly, an appropriate governance framework must be developed and enforced to protect individuals and to ensure that healthcare delivery is tailored to the characteristics and values of the target communities. Finally, a collective effort could help accelerate the translation of data into improved health outcomes that benefit everyone.