How network science helps us understand the fundamentals of language

July 7, 2023 Hanna Landenmark

It is a busy time for the network sciences at PLOS. On June 20, we announced a new journal as an addition to our portfolio, PLOS Complex Systems. PLOS Complex Systems will be a community-led journal focused on research to understand the drivers and behaviors of complex systems, and will enable rapid dissemination of groundbreaking results, cross-fertilization of knowledge, and increased collaboration to address the fundamental questions that affect individuals and global societies. For more on this announcement, please see here.

In addition, PLOS will have a large presence at the NetSci2023 conference, 10-14 July. PLOS ONE Senior Editor Hugh Cowley will be in attendance, and is happy to meet with interested authors, reviewers and Editorial Board members. Attendees at this conference will have plenty of opportunities to hear talks by PLOS ONE Editorial Board members and Guest Editors, such as Renaud Lambiotte, Mirta Galesic, Marta Sales-Pardo, Hocine Cherifi, Alberto Aleta, Ceyhun Eksin, Dion O’Neale, Luis M. Rocha, Fabio Saracco, Petter Holme, Fragkiskos Papadopoulos and Tiago Peixoto.

In a paper published by PLOS ONE on June 23, 2023, Michael S. Vitevitch and Mary Sale of the University of Kansas explore whether or not languages may have a phonological “backbone” of words that would allow speakers to communicate with an essential number of words in many different situations. They found that the English language appears to have a kernel lexicon containing words that may be key to language development or rehabilitation, which they discovered using network simplification with phonological criteria. Below, we speak with Professor Vitevitch about the inspiration behind and outlook from this study.

Prof. Vitevitch’s research applies the mathematical tools of network science to language, and also examines various types of speech errors (including the tip of the tongue state) and auditory illusions (like the speech to song illusion). You can learn more about his research and obtain copies of his publications at his website: http://people.ku.edu/~mvitevit/

PLOS: Your study looks at the idea of a “phonological backbone”. What led to the idea that such a backbone would exist?

MV: Previous studies in my lab had identified “important” nodes in a network of phonological word-forms at the micro-level (i.e., identifying individual nodes that were “important”) and at the meso-level (i.e., identifying a subset of nodes that were “important”). When I read in PLOS ONE (Neal Z.P. (2022). Backbone: An R Package to Extract Network Backbones. PLOS ONE, 17 (5), https://doi.org/10.1371/journal.pone.0269137) about a new R package that would extract the backbone of a network to form a simplified sub-network of a more complex, denser network, I wondered if this technique could be used to identify “important” nodes at the macro-level in the phonological network (i.e., at the level of the whole network). We assumed that the nodes and connections that would “survive” the backbone extraction process would be those that were most “important” to the network. Previous studies had used other approaches—such as the most frequently occurring words in the language, or the words that are learned early in life—to identify an essential or kernel vocabulary, so we were really interested in seeing what a phonological criterion might produce.

PLOS: Were there any surprises about the features of the words that you found to constitute the backbone in this English lexicon?

MV: Our network was built by connecting words that sounded similar to each other by changing a sound, known as a phoneme, in one word to form another word. By adding, substituting, or deleting a phoneme in the word cat, you get the other words that would be connected in the network to cat, like at, scat, hat, cut, or can. That’s the only information encoded in the network. After extracting the backbone from the whole network of approximately 20,000 words, we found that the approximately 6,000 nodes that “survived” tended to be short words that occurred often in the language, and that they were still connected in a way that allowed you to get from one node to another in the backbone very quickly. We were surprised to find that even though information like the frequency with which a word occurs in the language wasn’t directly encoded in the network, the backbone contained words that occurred often in the language. Such words are recognized and produced more quickly and accurately than words that occur less often in the language, and tend to be acquired earlier in life, so our simple phonological criterion yielded a kernel vocabulary that was comparable in size and content to kernel vocabularies that had been identified using other criteria. The fact that all of these different approaches converge on a kernel vocabulary comparable in size and content suggests that these words might be important for many aspects of language processing in humans and perhaps in machines as well.
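For readers who want to see the construction rule concretely, here is a minimal Python sketch of the one-phoneme rule described above. It is an illustration only, not the authors' code: the tiny lexicon, the ARPAbet-style phoneme labels, the extra word dog, and the function names is_neighbor, build_network and path_length are all assumptions made for this example. It does not perform backbone extraction; it only builds the network and counts how many steps separate two words.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy lexicon: each word maps to a tuple of phonemes.
# The ARPAbet-style labels are an assumption made for this sketch;
# "dog" is added only to show a word with no neighbors here.
LEXICON = {
    "cat": ("k", "ae", "t"),
    "at": ("ae", "t"),
    "scat": ("s", "k", "ae", "t"),
    "hat": ("h", "ae", "t"),
    "cut": ("k", "ah", "t"),
    "can": ("k", "ae", "n"),
    "dog": ("d", "ao", "g"),
}

def is_neighbor(a, b):
    """True if b results from adding, deleting, or substituting one phoneme in a."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    # Addition/deletion: removing one phoneme from the longer form gives the shorter.
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def build_network(lexicon):
    """Return an undirected graph as a dict mapping each word to its neighbor set."""
    graph = {word: set() for word in lexicon}
    for w1, w2 in combinations(lexicon, 2):
        if is_neighbor(lexicon[w1], lexicon[w2]):
            graph[w1].add(w2)
            graph[w2].add(w1)
    return graph

def path_length(graph, start, goal):
    """Breadth-first search: number of steps from start to goal, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, dist = queue.popleft()
        if word == goal:
            return dist
        for nxt in graph[word] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None

network = build_network(LEXICON)
print(sorted(network["cat"]))
print(path_length(network, "scat", "can"))
```

In this toy lexicon, cat is linked to at, scat, hat, cut and can, matching the example in the interview, and scat reaches can in two steps by way of cat. Comparing every pair of words is fine for a handful of entries; a lexicon of roughly 20,000 words would likely call for a more efficient neighbor search.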

PLOS: What first made you interested in applying the study of networks to language learning and cognition?

MV: Back in the early 2000s I was teaching a graduate class on artificial neural networks (a different kind of network than the complex networks used in the present study), and I wanted a popular press book to use in the class to spark interest in the students before diving into the research papers that were heavier on mathematics. As I was preparing materials for the class, I read Barabási’s book Linked to see if it would be a suitable candidate for the popular press book for the class. I quickly realized that the book wasn’t about artificial neural networks (or what are often called deep-learning networks now), but I couldn’t put the book down because it kept making me think about a way to use this other type of network to map out the relationships among all the words in that part of memory known as the mental lexicon. Instead of just looking at a word and the words immediately around it that sounded similar, I now had a set of tools to see if words that were similar to a word a few steps away also might influence various language processes, such as the perception, production, or acquisition of words. Instead of the six degrees of Kevin Bacon, think of the six degrees of the word ‘cat’. That led to a new direction in my research, looking at how the structure of the phonological network influences various language and cognitive processes, that I’ve been pursuing for the past decade and a half now.

PLOS: You made the data available for this study through OSF. What made you choose this way of sharing your data?

MV: My co-author and I liked that OSF is a third party that is independent of any journal, university, or research institution, so we felt that this option would allow us to make our materials available to researchers regardless of what the future held (e.g., journal changing publishers, employment at another institution, retirement from the field, etc.). Hopefully, our materials will still be available and useful to researchers long after we are gone.

Disclaimer: Views expressed by contributors are solely those of individual contributors, and not necessarily those of PLOS.
