How brilliant minds and computers detect HPV
There are more than 200 known types of human papillomavirus (HPV). Infections are very common and often produce no symptoms. While some cause benign skin lesions such as warts, others can lead to the development of malignant tumors that pose serious risks to health and life.
This is why HPV DNA testing – analyzing genetic material to detect the presence of the virus and precisely determine its type – is so important. It plays a key role in preventing cancerous changes and, when such changes do occur, in ensuring that appropriate treatment can be introduced as early as possible.
HPV is a major risk factor for cervical cancer. Persistent infection with oncogenic HPV types can also lead to cancers of the vulva, vagina, anus, and rectum, and is associated with cancers of the mouth and throat, particularly the oropharynx. Globally, HPV caused an estimated 620,000 cancer cases in women and 70,000 in men in 2019. With so many patients affected, researchers are working intensively on faster and more accurate methods for detecting viral genetic material.

Overcoming challenges
Among the scientists working to address this problem is Dr. Marek Nowicki of the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM UW) at the University of Warsaw. Together with his colleagues, he developed HPV-KITE, a sequence-analysis tool that can detect HPV genotypes in virtually any DNA or RNA next-generation sequencing dataset.
“Viruses in the HPV family vary in length – up to about 8,000 nucleotides, or ‘characters.’ In sequencing data, we virtually never see the complete viral sequences. But if enough fragments are present, we can classify the sample as HPV-positive, indicating the presence of HPV infection,” explains Dr. Marek Nowicki.
Accurately identifying HPV genetic sequences comes with three major challenges. First, HPV is not a single virus but a large group of more than 200 closely related types. Like close relatives, they are genetically similar yet distinct, and these subtle differences make precise identification difficult. Second, biological samples contain large amounts of non-viral DNA, which significantly increases both data volume and processing time. Finally, HPV shows substantial genetic diversity and sequence variability.
The algorithm developed by researchers at ICM UW is designed to handle all three of these challenges.
Idea + computing power = success
The method relies on analyzing k-mers – short sequences of length k made up of the letters A, C, T, and G, which represent the nucleotides in DNA. To determine whether a given sequence is a variant of HPV DNA, the algorithm computes the Tversky index, an asymmetric measure of similarity between the tested sequence and a known reference template. The calculations are sped up by distributing them across multiple computers working in parallel.
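To make the idea concrete, here is a minimal Python sketch of k-mer extraction and the Tversky index. It is an illustration under stated assumptions, not the HPV-KITE implementation: the function names, the value of k, the weights alpha and beta, and the toy sequences are all hypothetical. For two k-mer sets X (from the tested sequence) and Y (from the reference), the index is |X ∩ Y| / (|X ∩ Y| + alpha·|X \ Y| + beta·|Y \ X|).

```python
def kmers(sequence: str, k: int) -> set:
    """Return the set of all length-k substrings (k-mers) of a DNA sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def tversky_index(x: set, y: set, alpha: float = 1.0, beta: float = 0.0) -> float:
    """Tversky similarity of k-mer sets x (tested sequence) and y (reference).

    alpha penalizes k-mers found only in x, beta those found only in y.
    The defaults (alpha=1, beta=0) are an assumption made for this sketch:
    a short fragment is not penalized for the reference k-mers it does not cover.
    """
    common = len(x & y)
    only_x = len(x - y)
    only_y = len(y - x)
    denominator = common + alpha * only_x + beta * only_y
    return common / denominator if denominator else 0.0


# Toy sequences invented for illustration (not real HPV data).
reference = "ACGTACGTTAGCCGATAGCTAGGCTA"
read_from_reference = "TTAGCCGATAGC"   # an exact fragment of the reference
unrelated_read = "GGGGGGCCCCCCAAAA"    # shares no 5-mer with the reference

k = 5
ref_kmers = kmers(reference, k)
score_match = tversky_index(kmers(read_from_reference, k), ref_kmers)
score_other = tversky_index(kmers(unrelated_read, k), ref_kmers)

print(score_match, score_other)
```

With alpha = beta = 1 the same function reduces to the Jaccard index. The asymmetric weights are what allow a short fragment to score highly against a much longer reference, which matters when complete viral sequences are virtually never present in the data.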
“Most existing solutions run on a single machine or, at best, across multiple processing threads. In our solution, computations can be carried out on a single computer, a network of machines, or larger infrastructures such as computing clusters, the cloud, or a supercomputer. This is handled automatically, with load balancing, in order to obtain results as quickly as possible in terms of wall-clock time,” explains Dr. Nowicki, describing the parallel computing method used.
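The general pattern of splitting the work into many small chunks that idle workers pick up as they become free can be sketched in Python as follows. This is only an illustration of dynamic load balancing on one machine, not the way HPV-KITE distributes its computations: the function names, the chunk size, the worker count, and the toy sequences are hypothetical, and a thread pool stands in for the processes, cluster nodes, or cloud machines a real deployment would use.

```python
from concurrent.futures import ThreadPoolExecutor


def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def read_similarity(read, ref_kmers, k):
    """Fraction of the read's k-mers that also occur in the reference."""
    read_kmers = kmers(read, k)
    if not read_kmers:          # read shorter than k
        return 0.0
    return len(read_kmers & ref_kmers) / len(read_kmers)


def score_chunk(chunk, ref_kmers, k):
    """Score one chunk of reads; this is the unit of work given to a worker."""
    return [read_similarity(read, ref_kmers, k) for read in chunk]


def score_reads_parallel(reads, reference, k=5, workers=4, chunk_size=2):
    """Score all reads, spreading chunks over a pool of workers.

    Chunks wait in a shared queue and each idle worker takes the next one,
    so a slow chunk does not hold up the others (dynamic load balancing).
    Results are collected in submission order, so scores line up with reads.
    """
    ref_kmers = kmers(reference, k)
    chunks = [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]
    scores = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(score_chunk, chunk, ref_kmers, k) for chunk in chunks]
        for future in futures:
            scores.extend(future.result())
    return scores


# Toy data invented for illustration (not real HPV sequences).
reference = "ACGTACGTTAGCCGATAGCTAGGCTA"
reads = ["TTAGCCGATAGC", "GGGGGGCCCCCCAAAA", "ACGTACGT", "ACG"]

parallel_scores = score_reads_parallel(reads, reference)
print(parallel_scores)
```

Smaller chunks balance the load more evenly but add scheduling overhead, so the chunk size is a tuning parameter. For CPU-bound work in Python, a process pool or a cluster scheduler would replace the thread pool used here.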

Not just HPV
Researchers at the ICM UW have shown that their algorithm is among the fastest, most accurate, and easiest-to-use tools for detecting HPV genotypes in next-generation sequencing data. Importantly, it is also highly scalable – its performance improves as additional computational resources, such as processors, are added.
The method is not limited to HPV research: it can also be used to detect the DNA of a wide range of other microorganisms.
Read more:
https://heap-exposome.eu/2021/03/31/the-heap-interview-piotr-bala-talks-metagenomics/
https://icm.edu.pl/blog/2025/04/16/szybkie-wykrywanie-sekwencji-wirusow-hpv
https://academic.oup.com/bib/article/26/2/bbaf155/8109669
* Data from the Polish National Cancer Registry (onkologia.org.pl)
The text was originally published in Polish on the Serwis Naukowy UW website on June 23, 2025.
