ARC Blog

The Next Generation of Search Engines

What are some of the problems with conventional keyword-based search engines? Today, William Short from the 2022 ARC Cohort explores the limitations of existing search tools and offers an alternative.

Search engines are a quintessentially modern invention, but people have always relied on special tools to make finding relevant information easier. In the third century BCE, the Greek poet Callimachus created the first bibliographic catalogue to help navigate the Great Library of Alexandria. The table of contents was introduced in the first century BCE, and the back-of-the-book index followed soon after the invention of the printing press in the mid-15th century. But the availability of almost inconceivable volumes of text on the Internet has made such tools indispensable – so much so that nowadays it is hard to imagine doing anything at all without the assistance of on-line search tools. No wonder that Google alone handles almost 2 trillion keyword searches per year!

Yet there are surprising statistics that suggest our search tools aren’t getting the job done. According to a poll conducted by Ipsos, fully 70% of Internet search users in the UK report feeling ‘very’ or ‘fairly’ frustrated with on-line search. (It’s not only general Internet searching: a 2019 study of users of Web-based consumer health information resources in the US found very similar rates of dissatisfaction.) What’s more, for businesses, the problem goes beyond mere feelings of frustration. Lookeen has estimated that time spent by employees on information retrieval tasks can cost companies nearly £11,000 per employee, per year. Another study found that in medical research settings, unsuccessful search tasks can cost organisations a staggering $4,000 per query!

But if the table of contents and subject index have remained popular, why is the use of search technologies so frustrating and costly? One explanation is the so-called ‘keyword selection problem’. Most search engines are designed to work by literal string matching. Whatever a user inputs as a query is matched – more or less exactly – against a set of documents. (Yes, some can perform ‘fuzzy’ matching, such as allowing wildcards, correcting spelling mistakes, or expanding queries through synonyms, but the basic mechanism is the same.) However, users cannot know beforehand how the information they want to find happens to be expressed in a given dataset. This places the burden on them to guess the precise wording that will satisfy their information need.
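
To make the ‘keyword selection problem’ concrete, here is a minimal sketch of literal string matching in Python. It is a toy illustration, not how any particular search engine is built: the two documents and the `keyword_search` function are invented for this example.

```python
# Toy corpus: both sentences are invented for illustration.
DOCUMENTS = [
    "The patient suffered a myocardial infarction last year.",
    "Heart attack symptoms include chest pain.",
]


def keyword_search(query: str, documents: list[str]) -> list[str]:
    """Return the documents containing the query as a literal,
    case-insensitive substring."""
    needle = query.lower()
    return [doc for doc in documents if needle in doc.lower()]


# Only the document that happens to use the user's exact wording is found.
print(keyword_search("heart attack", DOCUMENTS))
```

Both documents are about the same medical event, but the query ‘heart attack’ retrieves only the second. To find the first, the user would have to guess that the author wrote ‘myocardial infarction’ instead.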

The ‘keyword selection problem’ is not simply a terminological issue, though. As I see it, search also presents a deeper problem of expectations about what computers, as ‘intelligent’ machines, are supposed to be able to do for us. What I mean is that people think in terms of concepts, not keywords. We conceptualise things in ways that do not fit the rigid logic of Aristotelian categories, but instead depend on idiosyncratically human ways of understanding the world. Our conceptual systems are pervasively organised through metaphorical mappings, according to prototypes, and in terms of ad-hoc categories. So – to put things at their most general – we also expect our information-finding tools, whatever they are like, to be at least partially reflective of and responsive to how we think about and make sense of information, just as we expect information to be structured and organised according to human sense-making behaviours.

Keyword-based search engines do not satisfy this expectation, as they ‘make sense of’ information in a way that is fundamentally foreign to our habits of thought. Unlike the back-of-the-book index or table of contents, on-line search engines involve no intelligent agency anywhere in the process. Put differently, it is as if we want to attribute a mind to our search tools – and are disappointed when computers turn out to be thoughtless mimics. This disconnect can lead to lost productivity, incomplete or faulty information, and all sorts of missed opportunities, as well as to a frustrating cognitive dissonance.

Senseful AI is one of a new generation of so-called ‘semantic’ search engines that aim to make search more efficient, more productive, and more rewarding by leveraging people’s understanding and use of natural language. Backed by a rich linguistic and conceptual knowledge-base, Senseful AI enables text data to be queried by concepts and by relations between concepts, independently of how these concepts happen to be expressed in words. For example, a user searching for ‘parts of the body’ with a conventional keyword-based search engine will only obtain results that contain this exact wording. By contrast, Senseful AI understands the relationship of the body to its parts and can give results that include ‘arms’, ‘legs’, ‘heart’, ‘lungs’, etc. In situations where users cannot know beforehand the wording of relevant information, this kind of functionality can be hugely time- and effort-saving – helping, rather than hindering, users find what’s important to them.
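
The general idea of querying by a relation between concepts can be sketched in a few lines of Python. This is emphatically not Senseful AI’s implementation, which is not described here: the tiny `PART_OF` table, the sample documents, and the `concept_search` function are all hypothetical stand-ins for a far richer knowledge-base.

```python
import re

# Hypothetical, hand-written knowledge-base: a concept mapped to
# words that name its parts. A real system would be far larger.
PART_OF = {
    "body": {"arm", "arms", "leg", "legs", "heart", "lungs"},
}

# Toy corpus: all sentences are invented for illustration.
DOCUMENTS = [
    "She broke both arms in the fall.",
    "The heart pumps blood to the lungs.",
    "The committee met on Tuesday.",
]


def concept_search(concept: str, documents: list[str]) -> list[str]:
    """Return documents mentioning any known part of the given concept."""
    part_terms = PART_OF.get(concept, set())
    hits = []
    for doc in documents:
        # Split into lowercase words so 'arms' matches but 'alarms' does not.
        words = set(re.findall(r"[a-z]+", doc.lower()))
        if words & part_terms:
            hits.append(doc)
    return hits


# Finds the first two documents, though neither contains the word 'body'.
print(concept_search("body", DOCUMENTS))
```

Neither matching document contains the phrase ‘parts of the body’. The results come from the relation stored in the knowledge-base rather than from the wording of the query, which is the shift from keywords to concepts described above.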

Photo credit: Matthias Heyde via Unsplash