EMBL-EBI: Building the Open Data Backbone for Modern Biology

EMBL’s European Bioinformatics Institute has become one of the world’s most important providers of open biological data. From its roots in European molecular biology, it now supports research, healthcare, industry and AI, while facing rising demands of scale, trust, funding, international collaboration and long-term digital resilience.

EMBL’s European Bioinformatics Institute, widely known as EMBL-EBI, occupies a distinctive place in global science. Based at the Wellcome Genome Campus near Cambridge, it is one of the six sites of the European Molecular Biology Laboratory, an intergovernmental organisation created to advance molecular biology research, training and technology. EMBL-EBI’s own story grew from a practical challenge: biological research was beginning to generate data at a pace that traditional publication and laboratory systems could not manage. From its origins in nucleotide sequence data services and its establishment in the UK in the 1990s, the institute has developed into a public digital infrastructure for the life sciences. Its resources now cover genomes, proteins, chemicals, gene expression, disease associations, samples, pathways, literature and much more. For business readers, its importance is not only scientific. EMBL-EBI shows how a mission-led organisation can create long-term economic and social value by making trusted information widely available.

That value is increasingly measurable. An independent economic impact report by Frontier Economics found that EMBL-EBI’s open data resources deliver multibillion-pound annual benefits, largely through time saved by researchers and organisations using its services. The report estimated productivity gains of more than £11.8 billion a year across public and private sectors, and benefits more than 100 times greater than the cost of maintaining the resources. Such figures help explain why open biodata has moved from being a specialist academic concern to becoming essential infrastructure for life sciences, healthcare, agriculture, biotechnology and environmental research. EMBL-EBI’s services include globally recognised resources such as the GWAS Catalog, AlphaFold Database, UniProt tools and the International Genome Sample Resource. These platforms help scientists interpret genetic variation, predict protein structures, map identifiers, submit functional genomics data and analyse vast datasets. The institute’s role is not simply to store information; it helps make data findable, usable, comparable and ready for reuse.

The current challenges facing bioinformatics are substantial. Biology is now a data-intensive discipline, with sequencing, imaging, clinical genomics, biodiversity programmes and high-throughput laboratory methods producing information at extraordinary speed. The pressure is not only technical, although storage, processing, security and service continuity matter greatly. The deeper challenge is trust. Researchers, clinicians and companies need data that is current, clearly licensed, well annotated and connected to evidence from literature and expert review. EMBL-EBI’s data stewardship principles directly address this requirement. Its commitment to open access, machine-readable licensing where possible, open-source software and long-term resilience reflects an understanding that scientific data must be maintained, not merely launched. At the same time, compliance expectations are growing, including around data provenance, privacy, benefit sharing and international regulation. In this environment, EMBL-EBI’s combination of expert curation, robust infrastructure and international governance is a competitive advantage for the communities it serves.

Artificial intelligence has raised the stakes further. The success of AlphaFold demonstrated how high-quality open biological data can help transform a field, with EMBL-EBI hosting protein structure predictions that are now used by scientists worldwide. Yet AI also places new demands on data providers. Models require scale, consistency and quality, but their outputs must still be evaluated by knowledgeable people and connected back to evidence. EMBL-EBI is responding by integrating text mining, machine learning, large language models and other AI approaches into curation and annotation, while retaining rigorous quality control. This is a measured response to innovation: automation can extend expert capacity, but it cannot replace scientific judgement. The institute is also investing in training, supporting users at different career stages and across sectors, from academic researchers to industry teams. Its international recruitment approach and multicultural workforce strengthen this capability, bringing together people with the technical and biological skills needed to manage a rapidly changing field.

EMBL-EBI’s future rests on keeping essential biological data open, reliable and genuinely useful worldwide. Strong stewardship will matter as researchers generate richer datasets across medicine, agriculture and biodiversity studies. AI will amplify discovery, but trusted curation must remain central to scientific confidence and reproducibility. International partnerships will help protect services from political, technical and financial uncertainty. The institute’s history suggests it will meet change with discipline, by serving scientists first.
