17/04/24

Arabica coffee reference genome is sequenced with the participation of Brazilian researchers

Researchers from 16 countries, including Brazil, sequenced the reference genome of Arabica coffee, the world's most consumed coffee species. Three researchers are from Embrapa Coffee and another eight from institutions that make up the Coffee Research Consortium, which Embrapa coordinates. A scientific paper published on the 15th in Nature Genetics, a high-impact scientific journal, offers unprecedented information on the genome and population genomics of this species and reveals the diversification history of the cultivars grown today.

Embrapa Coffee researcher Alan Andrade explains that the group of scientists, of which he is a member, produced a complete structural genetic map of Coffea arabica at the highest quality achieved to date. “That led us to what we call the reference genome. In 2004, here in Brazil, we were pioneers in the functional sequencing of the genome of the arabica species. With the structural sequencing, we now know the order of the genes within the DNA sequences and the intergenic regions that make up the genome, which cannot be seen in functional sequencing.” As a result, it has become easier to identify the genes responsible for specific traits in coffee plants, such as resistance to disease and drought, berry size, aroma or flavor.

Researcher Luiz Filipe Pereira, also from Embrapa Coffee, says that important advances are already being made based on the results obtained. "As we have been immersed in this work for years, we have been developing several studies focused on Brazilian coffee farming using the data from this study."

He explained that the detailed genome makes it possible to identify variations in DNA bases associated with phenotypic traits such as disease resistance. “Thus, the analysis of the plants' DNA allowed us to quickly select those that have resistance, accelerating breeding and genetic improvement,” Pereira added.

Data from the new sequencing are also being applied in the development of technologies for coffee certification and traceability. The study also included the participation of Embrapa Coffee researcher Lilian Padilha, who worked with the Agronomic Institute (IAC) team. 

 

Evolution of arabica coffee

 

With the new genetic maps, the full genome sequences and structures of the species Coffea arabica, Coffea eugenioides and, once again, Coffea canephora were compared. The goal was to reveal the evolution of the species, the role of the genes and the mechanisms of gene regulation, and to identify sequence structures and elements that were conserved or differentiated. Gene families, evolutionary development, whole-genome duplication, and the selective pressures the species underwent were also analyzed.

According to the researchers, “modern genomic tools and a detailed understanding of the origin and history of breeding of contemporary varieties are vital for the development of new cultivars of arabica coffee, which are better adapted to climate change and agricultural practices”.

They re-sequenced the full genomes of 41 wild and cultivated accessions of this species, including an eighteenth-century specimen used by the Swedish naturalist Carl Linnaeus, which allowed an in-depth analysis of C. arabica's history and dissemination routes.

C. arabica is a polyploid species, more precisely an allotetraploid, as it carries 44 chromosomes. It is the result of a natural hybridization event between the ancestors of the current Coffea canephora (Robusta coffee) and Coffea eugenioides, which have 22 chromosomes each and are classified as diploids. This doubling of the entire genome is known as whole-genome duplication (WGD). Scientists have had difficulty pinpointing exactly when, and where, this allopolyploidization event occurred, with estimates ranging between 10,000 and 1 million years ago.

Through computational modeling, the researchers searched for signatures of the species' foundation by analyzing C. arabica genomes. The models show three population bottlenecks in the course of its history, the oldest of which occurred about 29,000 generations ago, or 610,000 years ago.
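The two figures quoted for the oldest bottleneck imply an average generation time of roughly 21 years. The short Python sketch below only reproduces that arithmetic from the numbers in this article; the function and variable names are illustrative assumptions, and it does not represent the demographic models used in the study.

```python
def implied_generation_time(years: float, generations: float) -> float:
    """Return the average number of years per generation implied by
    a pair of estimates for the same event (hypothetical helper)."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return years / generations


# Figures quoted in the article for the oldest population bottleneck.
OLDEST_BOTTLENECK_GENERATIONS = 29_000
OLDEST_BOTTLENECK_YEARS = 610_000

years_per_generation = implied_generation_time(
    OLDEST_BOTTLENECK_YEARS, OLDEST_BOTTLENECK_GENERATIONS
)
print(f"Implied generation time: about {years_per_generation:.0f} years")
```

The same helper can be used to check any other generations-to-years conversion, provided both estimates refer to the same event.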

This suggests that Arabica was formed some time between 360,000 and 610,000 years ago and that its population rose and fell through periods of warming and cooling of the Earth for thousands of years, before eventually being cultivated in Ethiopia and Yemen and then spreading across the globe.

Coffee plants were once thought to have been first cultivated in Ethiopia, but the varieties collected by researchers around the Great Rift Valley, which stretches from Southeast Africa to Asia, showed a clear geographical division. The wild varieties in the study all originated from the western side, while the cultivated varieties are all from the eastern side, closer to the Bab al-Mandab strait, which separates Africa from Yemen.

This would be in line with the evidence that coffee cultivation may have started primarily in Yemen around the 15th century, followed by a move to India, which supports the legend of the smuggling of “seven seeds” by the Indian monk Baba Budan around the year 1600. Thus, the diversity of Yemeni coffee could be the foundation for all the main arabica varieties today.

For scholars, polyploidy is a powerful evolutionary force that has shaped genome evolution in many eukaryotic lineages, possibly offering adaptive advantages in times of global change. However, contemporary Arabica cultivars descend from the Typica or Bourbon lineages, which have particularly low genetic diversity, are susceptible to many pests and diseases such as coffee rust, and can only be successfully cultivated in limited regions of the world.

In 1927, a spontaneous hybrid of C. arabica and C. canephora that is resistant to Hemileia vastatrix, the fungus that causes rust, was identified on the island of Timor. Based on the new Arabica reference genome, studies on plants of this lineage made it possible to identify a new target site to potentially improve resistance to pathogens such as this fungus. The new genome sequencing yielded other findings, such as which wild varieties are closest to the Arabica coffee that is cultivated nowadays. The scientists also discovered that the Typica variety, an ancient Dutch cultivar that originated in either India or Sri Lanka, is probably the mother of the Bourbon variety, which is widely used in the preparation of specialty coffees.

 

At the frontier of coffee genomics

Coffee has been grown in Brazil for nearly 300 years, and since the beginning of the 19th century the country has led the world's coffee production and exports. This leadership has been anchored by extensive research on coffee farming, dating back to the creation of IAC's Coffee Section in 1923. Research on the crop has continued without interruption ever since.

A few years later, in 1929, the creation of the Genetics Section kickstarted work on coffee genetics and breeding. Since then, dozens of institutions have begun conducting studies for the coffee sector or were created to serve it, such as Embrapa Coffee and the Coffee Research Consortium, which currently comprises about 40 research bodies whose work focuses on the crop.

With regard to the genetic sequencing of coffee plants, Embrapa has made important advances. In 2004, Alan Andrade, Carlos Colombo (a researcher at IAC) and Luiz Gonzaga (a researcher at the Paraná Rural Development Institute - IAPAR) coordinated the first functional sequencing of the arabica coffee genome, in a Coffee Research Consortium project funded by the São Paulo State Research Support Foundation (FAPESP). The project, in which Luiz Filipe Pereira also took part, generated what was then the largest coffee database in the world, with 200,000 DNA sequences.

The result of that work was decisive for the first full sequencing of Coffea canephora ten years later, by an international consortium composed of 11 countries, with significant participation by Andrade and Pereira.

Another important genome sequencing effort was that of the coffee leaf miner, one of the main coffee pests, which was concluded in 2022 in a project led by researchers from Embrapa Genetic Resources and Biotechnology, with the participation of researchers from Embrapa Tropical Agroindustry, Embrapa Coffee, Embrapa Cerrados, Embrapa Maize and Sorghum and the Federal University of Viçosa (UFV).

The illustrations in the present news item are featured in the paper "The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars".

Rose Lane César (MTb 2978/DF)
Embrapa Coffee

Press inquiries

Phone number: +55 61 3448-1551

Translation: Mariana Medeiros (13044/DF)
Superintendency of Communications

Further information on the topic
Citizen Attention Service (SAC)
www.embrapa.br/contact-us/sac/