Understanding vagueness

  • 10 March 2022
  • 3 minutes

Gonville & Caius College Fellow Dr Guy Emerson is seeking precise predictions of vagueness.

Dr Emerson, a computational linguist, is the Executive Director of Cambridge Language Sciences, an interdisciplinary research centre. His research goals are to uncover what it means to know a language and to advance machine learning.

For someone who read mathematics as an undergraduate at Trinity College, Cambridge, a focus on something as difficult to define and predict as vagueness might seem incongruous.

“Language is full of vagueness. This applies to any concept you can think of – you don’t have clear boundaries,” Dr Emerson says.

“It might seem strange to have a precise mathematical model of vagueness, but the idea is whether we can predict how people might use a vague word. People communicate with each other despite the fact language is full of vagueness all the time. Can we try to quantify how people use vague language?”
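As a purely illustrative sketch of what a precise model of a vague word could look like, the Python snippet below treats "tall" as a graded concept: instead of a hard cut-off, it assigns each height a probability that a speaker would use the word. The function name `prob_tall`, the logistic shape, and the parameter values are assumptions made for this example, not a description of Dr Emerson's models.

```python
import math

def prob_tall(height_cm: float, midpoint_cm: float = 180.0, steepness: float = 0.25) -> float:
    """Hypothetical probability that a speaker would call this height 'tall'.

    midpoint_cm is the height at which usage is 50/50; steepness controls
    how blurry the boundary is. Both values are invented for illustration.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (height_cm - midpoint_cm)))

# No single height separates 'tall' from 'not tall'; usage shifts gradually.
for height in (165, 175, 180, 185, 195):
    print(f"{height} cm: P('tall') = {prob_tall(height):.2f}")
```

A smaller `steepness` gives a blurrier boundary, while a very large one approaches a sharp threshold, so a single number can express how vague the word is taken to be.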

Dr Emerson uses and develops machine learning tools from the field of artificial intelligence, but he acknowledges that these tools have had more success on some tasks than on others.

He says: “There’s been huge progress in machine learning in recent years. But when you get machine learning models that are doing things that seem superhuman, it tends to be when there’s a clearly defined task to solve and the model is given a huge amount of data to learn how to do that task. Language models are often trained on orders of magnitude more text than a human could read in their whole lifetime.

“But I don’t do so much work on the practical, applied side. I’m most interested in questions about modelling human language use and learning, and there the best machine learning models are still way behind. So I see it more as trying to develop a model of human language learning.

“There’s a theoretical side and an empirical side to my research. On the empirical side it’s implementing a computational model, running it on some data, evaluating it on some other data and you get some quantitative results at the end. And you can compare how well different models capture the data.

“On the theoretical side it’s trying to develop different models which could be more capable of modelling a certain phenomenon. Ideally the theory is borne out in the experiments. But it doesn’t always work out as neatly as you’d like!”
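The empirical loop described here (fit a model on some data, evaluate it on other data, compare models) can be sketched in a few lines of Python. Everything in the snippet is an assumption made for illustration: the judgement data are invented, and the two candidate models of "tall" (a graded one and a sharp cut-off with a small noise rate) are toy stand-ins, not the models used in Dr Emerson's research.

```python
import math

# Invented toy data, for illustration only: (height in cm, speaker said "tall"?)
TRAIN = [(160, False), (168, False), (174, False), (176, False), (178, True),
         (181, False), (182, True), (188, True), (195, True)]
HELD_OUT = [(165, False), (177, True), (179, False), (184, True), (192, True)]

def graded(height, midpoint, steepness):
    """Graded model: the probability of 'tall' rises smoothly with height."""
    return 1.0 / (1.0 + math.exp(-steepness * (height - midpoint)))

def sharp(height, cutoff, noise=0.05):
    """Sharp model: a hard cut-off, with a small fixed chance of deviating."""
    return 1.0 - noise if height >= cutoff else noise

def log_likelihood(predict, data):
    """Sum of log-probabilities the model assigns to the observed judgements."""
    total = 0.0
    for height, said_tall in data:
        # Clamp to avoid log(0) when a model is extremely confident.
        p = min(max(predict(height), 1e-9), 1.0 - 1e-9)
        total += math.log(p if said_tall else 1.0 - p)
    return total

def fit_graded(data):
    """Grid search for the (midpoint, steepness) that best fits the data."""
    candidates = [(m, s / 10) for m in range(160, 200) for s in range(1, 11)]
    return max(candidates,
               key=lambda c: log_likelihood(lambda h: graded(h, *c), data))

def fit_sharp(data):
    """Grid search for the cut-off height that best fits the data."""
    return max(range(160, 200),
               key=lambda c: log_likelihood(lambda h: sharp(h, c), data))

# Fit on one set of judgements, then score on judgements not seen in fitting.
midpoint, steepness = fit_graded(TRAIN)
cutoff = fit_sharp(TRAIN)
graded_score = log_likelihood(lambda h: graded(h, midpoint, steepness), HELD_OUT)
sharp_score = log_likelihood(lambda h: sharp(h, cutoff), HELD_OUT)

print(f"graded model: held-out log-likelihood = {graded_score:.2f}")
print(f"sharp model:  held-out log-likelihood = {sharp_score:.2f}")
```

The held-out log-likelihood is the quantitative result at the end: the model with the higher (less negative) score captures the unseen judgements better. With data this small the comparison means little; the sketch only shows the shape of the procedure.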

Dr Emerson’s research focuses on English, for practical reasons such as the availability of resources. His motivation is to understand language itself, not to develop a smart speaker-style device.

He adds: “One of the reasons I really enjoy this field is because there are so many connecting aspects to it: cognitive science, linguistics, philosophy of language, mathematics, computer science.

“Humans can learn words in many different ways – in a real-world context, or used in a sentence, or with a definition. A human very naturally will combine these sources of information.

“How do you develop a model which can combine these into a single representation of meaning – how do we get to that?”
