Innovative project from UCPH researchers: Using deep learning to automatically generate phylogenetic trees from digitised specimens

This interdisciplinary research is new in the science world and is already groundbreaking

May 14, 2025

Image: Roberta Eleanor Hunt, Postdoc in Computer Science working on the project PHYLORAMA.

14,000 beetles. 1 deep learning algorithm.

Beetles, the largest group of organisms on Earth, are now front and centre in collections-based research at the Natural History Museum of Denmark. The researchers behind the project are using digital images of specimens from the museum’s collection and a deep learning algorithm to generate phylogenetic trees.

Phylogenetic trees are important because they tell us how species originated, evolved, dispersed or went extinct. By combining computer science and entomology, researchers can trace the past evolution of biodiversity and predict how life on Earth may look in the future.

PHYLORAMA and digital data

The University of Copenhagen project is called PHYLORAMA, and its overall goal is to generate phylogenetic trees automatically and efficiently from specimen morphology. The data, mainly images of numerous identified species, come from the collections of the Natural History Museum of Denmark (NHMD). The project showcases a new form of research. Roberta Eleanor Hunt, Postdoc in Computer Science at NHMD, states: ‘There are only a couple of people in the world doing what we are trying to do, but there are a ton of people in computer science trying to do clustering and tree generation, and many others in biology focusing on improving traditional phylogenetic methods’. Roberta Eleanor Hunt and her supervisor, Kim Steenstrup Pedersen, Professor and Section Manager at NHMD, are using results from the established fields of deep clustering and phylogenetic research to tackle this difficult biological problem in a new way.

The deep learning algorithm Roberta Eleanor Hunt is creating attempts to capture relevant phylogenetic signals from images. This is a difficult problem, as Alexey Solodovnikov, Lecturer and curator at NHMD, explains: ‘Some beetles are small, some are large, some are dark or coloured, some are chubby or skinny and so on. So, this diversity could be informative, but at the same time it could be misleading because there is also convergence here. Not all large beetles are phylogenetically related, for example; they evolved large size independently’. He continues: ‘But ideally what Roberta works towards is that we try to make these algorithms clever and informed, so that they can extract only phylogenetically useful information from these images in the future’. This is what makes PHYLORAMA’s goal special and innovative.
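To give a rough feel for how a tree can be built once each specimen image has been turned into a list of numbers, here is a minimal sketch in Python. It is not the PHYLORAMA algorithm. It assumes that a deep learning model has already produced a feature vector for each beetle (the two-number vectors and the names beetle_A to beetle_D below are invented for illustration), and it uses a standard average-linkage clustering (UPGMA) to group the most similar specimens first. The function name upgma_newick is hypothetical.

```python
from itertools import combinations
from math import dist


def upgma_newick(features):
    """Cluster feature vectors by average linkage and return a Newick tree string."""
    # Each cluster maps a label to (Newick text, list of member vectors).
    clusters = {name: (name, [vec]) for name, vec in features.items()}

    def avg_distance(a, b):
        # Mean Euclidean distance between all members of two clusters.
        members_a, members_b = clusters[a][1], clusters[b][1]
        total = sum(dist(x, y) for x in members_a for y in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Find the two closest clusters and merge them into one.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: avg_distance(*pair))
        newick_a, members_a = clusters.pop(a)
        newick_b, members_b = clusters.pop(b)
        clusters[f"{a}+{b}"] = (f"({newick_a},{newick_b})", members_a + members_b)

    newick, _ = next(iter(clusters.values()))
    return newick + ";"


# Invented feature vectors standing in for the output of an image model.
toy_embeddings = {
    "beetle_A": (0.0, 0.0),
    "beetle_B": (0.1, 0.0),
    "beetle_C": (5.0, 5.0),
    "beetle_D": (5.0, 5.2),
}

print(upgma_newick(toy_embeddings))
# prints ((beetle_A,beetle_B),(beetle_C,beetle_D));
```

A tree like this only reflects how similar the numbers are. If the features mostly encode body size, unrelated large beetles would be grouped together, which is exactly the convergence problem described above and the reason the features themselves need to carry phylogenetic signal.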

No collection data, no project

For the PHYLORAMA project to succeed, scientific collection data is essential. Roberta Eleanor Hunt explains: ‘That is why it is really exciting that DaSSCo and other projects are trying to digitise and take more pictures. We are fairly sure that will make these algorithms a lot better’.

That goes to show that digitised collections can have a massive impact on biodiversity research. ‘I think the most important thing is to realize how huge the hidden potential of the biological collections is in the museum. And how to wake up this Sleeping Beauty, so to say. It’s a huge amount of information, but it’s very difficult to mobilize it in a high-throughput way’, says Alexey Solodovnikov. And even though DaSSCo is currently working hard to digitise everything through pictures, he hopes digitisation can become even faster and better in the future: ‘If you digitise specimens with more advanced technology like the micro CT scanners that, for example, are widely used in medicine, we could get more anatomical features that are not seen from the outside. So potentially, there are automatic ways to get more morphological data for phylogenetics’. With more data come more knowledge and more possibilities, which researchers will increasingly need to address Earth’s biodiversity challenges.

Data sharing by DaSSCo and a growing number of natural history museums around the world can significantly help researchers gain more insight into several aspects of biodiversity, especially since the deep learning algorithm can be transferred to all kinds of specimens. The beetles are only the beginning.

Info box:

  • Phylogenetic trees show evolutionary relationships between organisms.
  • Entomology is the study of insects.
  • Morphology is the part of biology that deals with the structure and form of organisms.
  • Deep learning algorithms are a type of machine learning, a subfield of artificial intelligence.

Go to the research paper here.
