How Pl@ntBERT helps us understand nature’s patterns
Plants do not live in isolation. Each ecosystem, from a forest to a meadow or a wetland, has its own “community” of species that live together and depend on each other in complex ways. Understanding how these species assemble and interact is one of the main challenges in ecology. This knowledge is crucial to protect biodiversity, restore damaged ecosystems, and track how nature is changing.

In our recent study published in Nature Plants, we developed Pl@ntBERT, a computer model that learns to read the “language” of plant communities. Just as large language models like ChatGPT learn patterns in words and sentences, Pl@ntBERT learns patterns in lists of plant species. It was trained on more than 1.4 million vegetation surveys from across Europe, representing over 14,000 species (most of the European flora). Each survey records which plant species grow together at a particular site and how abundant they are.
By analysing these millions of “sentences” of plant names, Pl@ntBERT learned the hidden rules (the “syntax”) that determine which species tend to co-occur. For example, certain grasses often appear with specific flowers in a meadow, while other species rarely share the same space. This kind of knowledge is difficult to describe using traditional statistics, but machine learning can uncover these subtle patterns automatically.
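To make the “sentence” idea concrete, here is a minimal Python sketch. It is not Pl@ntBERT's actual code: the toy surveys, the choice to order species by decreasing cover, and the function names `to_sentence` and `pair_counts` are all assumptions made for illustration. It turns each survey into an ordered list of species names and counts how often each pair of species is recorded together, which is the simplest form of the co-occurrence signal described above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy surveys: species name -> percentage cover at one site.
surveys = [
    {"Festuca rubra": 40, "Trifolium repens": 25, "Plantago lanceolata": 10},
    {"Festuca rubra": 30, "Plantago lanceolata": 20},
    {"Phragmites australis": 70, "Carex riparia": 15},
]


def to_sentence(survey):
    """Return species names ordered by decreasing cover (assumed ordering).

    Ties are broken alphabetically so the result is deterministic.
    """
    ordered = sorted(survey.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ordered]


def pair_counts(surveys):
    """Count how many surveys contain each unordered pair of species."""
    counts = Counter()
    for survey in surveys:
        # Sorting the names gives every pair one canonical (a, b) key.
        for a, b in combinations(sorted(survey), 2):
            counts[(a, b)] += 1
    return counts


sentences = [to_sentence(s) for s in surveys]
counts = pair_counts(surveys)
```

A language-model approach goes further than such pairwise counts, because it can take the whole list of species into account at once rather than two species at a time.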
We then tested Pl@ntBERT in two ways.
First, we asked it to fill in missing species from incomplete plant lists. When we removed a species from a community, Pl@ntBERT guessed which one was missing more accurately than traditional ecological models, improving on classical co-occurrence methods by over 16%. This means the model has learned meaningful relationships between species, even when some data are missing.
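The fill-in task itself can be sketched in a few lines of Python. The example below is not Pl@ntBERT and not the exact baseline used in the study; it is a hypothetical, deliberately simple co-occurrence scorer (the toy data and the name `rank_missing` are assumptions) that shows what "guessing the missing species" means in practice: hide one species, then rank every candidate by how often it has been seen with the species that remain.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy surveys, each reduced to the set of species present.
surveys = [
    {"Festuca rubra", "Trifolium repens", "Plantago lanceolata"},
    {"Festuca rubra", "Trifolium repens"},
    {"Festuca rubra", "Plantago lanceolata", "Trifolium repens"},
    {"Phragmites australis", "Carex riparia"},
]

# co[(a, b)] = number of surveys in which species a and b both occur.
co = Counter()
for survey in surveys:
    for a, b in permutations(survey, 2):
        co[(a, b)] += 1


def rank_missing(observed, co, vocabulary):
    """Rank species absent from `observed` by summed co-occurrence counts.

    Candidates with equal scores are ordered alphabetically.
    """
    candidates = vocabulary - observed
    scores = {c: sum(co[(c, o)] for o in observed) for c in candidates}
    return sorted(scores, key=lambda c: (-scores[c], c))


vocabulary = set().union(*surveys)

# A survey from which one species has been hidden.
incomplete = {"Festuca rubra", "Plantago lanceolata"}
ranking = rank_missing(incomplete, co, vocabulary)
```

Here the top-ranked candidate is the clover that accompanies the two remaining meadow species in the toy data. A model of this kind is then judged by how often the species that was really hidden appears at or near the top of the ranking.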
Second, we used Pl@ntBERT to identify habitat types (for example, distinguishing a coastal dune from a wet meadow or a forest) based solely on the species present. Here too, it outperformed existing expert systems and other machine learning methods. It correctly assigned habitats to vegetation plots about 92% of the time, showing that it can accurately recognize the ecological signature of different environments.
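As a rough illustration of what identifying a habitat "based solely on the species present" involves, the sketch below assigns a plot to the habitat of the most similar labelled reference plot and then measures accuracy as the share of plots labelled correctly. This is a hypothetical nearest-neighbour stand-in using Jaccard similarity, not the method used by Pl@ntBERT or by the expert systems it was compared with; the reference plots, test plots, and function names are invented for the example.

```python
# Hypothetical labelled reference plots: (set of species, habitat label).
labelled = [
    ({"Ammophila arenaria", "Eryngium maritimum"}, "coastal dune"),
    ({"Caltha palustris", "Juncus effusus", "Carex riparia"}, "wet meadow"),
    ({"Fagus sylvatica", "Anemone nemorosa"}, "forest"),
]


def jaccard(a, b):
    """Share of species common to both sets, out of all species in either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_habitat(species, labelled):
    """Return the habitat label of the most similar reference plot."""
    best = max(labelled, key=lambda item: jaccard(species, item[0]))
    return best[1]


def accuracy(plots, labelled):
    """Fraction of plots whose predicted habitat matches the known label."""
    correct = sum(predict_habitat(sp, labelled) == label for sp, label in plots)
    return correct / len(plots)


# Hypothetical test plots with known habitats.
test_plots = [
    ({"Ammophila arenaria", "Festuca rubra"}, "coastal dune"),
    ({"Juncus effusus", "Caltha palustris"}, "wet meadow"),
]

first_prediction = predict_habitat(test_plots[0][0], labelled)
score = accuracy(test_plots, labelled)
```

An accuracy figure such as the 92% reported above is computed the same way, only over a large set of real vegetation plots rather than two toy examples.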

Beyond accuracy, what makes Pl@ntBERT exciting is its ability to generalize. Because it learns from patterns in data rather than fixed rules, it can handle the enormous diversity of European habitats and adapt to new contexts. It can also suggest likely but unrecorded species in field surveys, helping ecologists detect possible omissions and improve data quality.
Importantly, our study does not claim that artificial intelligence can replace human expertise in ecology. Instead, Pl@ntBERT acts as a complementary tool, a way to see structure in the vast complexity of nature that would be impossible to capture by hand. Ecologists still provide the essential field observations and ecological understanding that guide and validate the model’s predictions.
Pl@ntBERT and all its code are freely available in an open-source GitHub repository so that other researchers, conservationists, and even nature enthusiasts can explore it. A simple online demo is also accessible through Hugging Face, allowing anyone to test how the model predicts missing species or identifies habitats from species lists.
By teaching computers to “speak” the language of plants, we hope to open new ways to monitor biodiversity and understand how ecosystems are organized. The approach can eventually be extended beyond Europe and could support projects like Pl@ntNet by improving how plant observations are interpreted in context.
Nature has its own grammar, and with Pl@ntBERT we are beginning to decode it.