Monday, December 23, 2024

Google Scholar is not broken (yet), but there are alternatives


For many, Google Scholar has become a critical piece of research infrastructure. Yet revelations about the manipulability of its metrics and its inclusion of AI-generated papers have led some to ask whether it is still functional. Kirsten Elliott argues that these issues reflect the limitations of any academic search tool rather than a broken platform, but that for those done with Google Scholar there are alternatives.


For those who haven’t ventured into the ‘other place’, this post builds on some threads posted on Bluesky in response to discussions on that platform about Google Scholar’s “brokenness.”

The question of whether Google Scholar is broken has the obvious answer of “It depends”: on what it’s being used for, how it’s being used, and what alternatives are available.

Google Scholar has advantages over traditional academic databases like Scopus and Web of Science: it’s free to use, requires no login to search, and has more comprehensive coverage, especially of non-journal sources such as books and theses. These benefits are particularly important for unaffiliated scholars without institutional access to resources, and for those in the humanities.

Google Scholar is used for many different kinds of academic information-seeking: finding the full text of an article, exploratory searches on a broad topic, forwards citation chasing (i.e. looking at where a publication has been cited), finding citation metrics to demonstrate research impact, and even systematic review searching. For each of these purposes there are different criteria for whether it is the best tool, or even appropriate to use at all.


However, Google Scholar also has downsides. Where most other academic databases have inclusion criteria for what will and will not be indexed, typically applied at journal level, Google Scholar relies on web scraping. Publications excluded elsewhere on grounds of poor quality or integrity concerns are therefore likely to be picked up by Google Scholar. Even when there is clear evidence of citation manipulation, papers are not removed, as the case of Larry the Cat and his impressive h-index shows. As AI-generated publications proliferate, Google Scholar is particularly vulnerable to being swamped by fake research.
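The h-index is the largest number h such that h of an author’s papers have each been cited at least h times. That simplicity is also what makes it easy to inflate. The minimal Python sketch below uses invented citation counts (not Larry the Cat’s actual record) to show how a batch of fabricated papers citing a profile can lift its h-index from 0 to 12. The function name `h_index` is illustrative only.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers each have at least h citations."""
    counts = sorted(citation_counts, reverse=True)  # most-cited papers first
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break     # counts only decrease from here, so h cannot grow
    return h

# Invented profile: 12 papers with no genuine citations.
papers = [0] * 12
print(h_index(papers))  # 0

# Suppose 12 fabricated papers are indexed, each citing all 12 profile papers once.
inflated = [c + 12 for c in papers]
print(h_index(inflated))  # 12
```

Because an index built by web scraping counts citations from whatever it picks up, nothing in this calculation distinguishes genuine citing papers from fabricated ones.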

Another key difference from most academic databases is that Google Scholar, like Google, ranks results. The ranking algorithm is not transparent; studies have attempted to reverse-engineer it, but their findings date very quickly. The ranking is probably based on a combination of the number of citations, the number of times the search terms appear in the title and full text, and date, with more recent research appearing higher. Many users look only at the first few pages of results, as there are diminishing returns in looking further. Doing so may exacerbate both the Matthew Effect, whereby highly cited works are more likely to accrue future citations, and the bias towards English-language publications.


The ranking algorithm can produce unexpected results, such as a foundational work in a discipline disappearing from the first page, or a dissertation appearing unexpectedly high. Google Scholar searches are also not consistently reproducible. Anecdotally, this undermines trust in its results and creates a perception of brokenness. There is no perfect version of the algorithm that presents the “best” results for all possible searches, because what “best” means varies by purpose and discipline. Finding the most recent publications is far more important in medicine than in the humanities, for example.

One way to mitigate some of the problems with Google Scholar is to search it through the Publish or Perish software rather than directly. Doing so allows exact search terms to be saved, so searches can be accurately repeated at a later date. It also offers alternatives to the default Google Scholar ranking, including sorting by number of citations and by date.
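Publish or Perish handles this through its own interface, and the sketch below is not its code or file format. It is a minimal, hypothetical Python illustration of the two ideas: store the exact query string with the date it was run, and order results by a stated key (citations or date) rather than an undisclosed ranking. The `Result` type, the function names and the example records are all invented.

```python
from dataclasses import asdict, dataclass
import json

@dataclass(frozen=True)
class Result:
    title: str
    year: int
    citations: int

# Transparent sort keys; ties are broken by title so the order is fully reproducible.
SORT_KEYS = {
    "citations": lambda r: (-r.citations, r.title),  # most cited first
    "date": lambda r: (-r.year, r.title),            # most recent first
}

def sort_results(results, by):
    """Return results ordered by a named, documented key instead of an opaque ranking."""
    if by not in SORT_KEYS:
        raise ValueError(f"unknown sort key: {by!r}")
    return sorted(results, key=SORT_KEYS[by])

def save_search(query, run_on, results):
    """Serialise the exact query, the date it was run and its results as JSON."""
    return json.dumps(
        {"query": query, "run_on": run_on, "results": [asdict(r) for r in results]},
        indent=2,
    )

# Invented example records, for illustration only.
results = [
    Result("Foundational study", 1995, 900),
    Result("Recent thesis", 2023, 3),
    Result("Review article", 2015, 250),
]
print([r.title for r in sort_results(results, "citations")])
# ['Foundational study', 'Review article', 'Recent thesis']
print([r.title for r in sort_results(results, "date")])
# ['Recent thesis', 'Review article', 'Foundational study']
```

Saving the query alongside its results means a later run can be compared line by line with the earlier one, which is what makes changes in the underlying index visible.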

On a more philosophical level, there are objections to the lack of transparency in the data used and presented by Google Scholar. Key stakeholders in research such as universities and funders are increasingly advocating for open research. Whilst efforts so far have focussed primarily on openness of publications, the recent Barcelona Declaration applies the principles of openness to information about research, with signatories committing to “work with services and systems that support and enable open research information.” Google Scholar cannot meaningfully be said to do so given the opacity of the processes for inclusion and ranking of research outputs. The closed proprietary systems run by large profit-making companies like Elsevier and Clarivate clearly do not meet this criterion either.

Another point in the Barcelona Declaration is the “sustainability of infrastructures,” and that is a key concern for Google Scholar. It is unclear what its long-term funding model is, or whether it will be maintained in the future, so it might be wise to explore other options.


There are alternatives to Google Scholar which operate from an open research ethos and are free to use. Three prominent alternatives are The Lens, Matilda and OpenAlex.

The one I’ve used most is OpenAlex. One study has found it to have comparable coverage to Web of Science and Scopus, and my own limited testing found significantly more publications indexed from social sciences and humanities subjects. Its code is fully open, and the data is reusable. The system has thorough documentation and, in my experience, the OpenAlex team are responsive to feedback. These levels of transparency and engagement with the academic community are significant advantages over Google Scholar. OpenAlex is still relatively new, however, and the data is not perfect. The author disambiguation process, for example, struggles with authors like myself who have published across disciplines.
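Because OpenAlex’s data is open, it can also be queried directly through its public API. The minimal Python sketch below assumes the `/works` endpoint at `api.openalex.org` and its `search`, `sort`, `per-page` and optional `mailto` parameters as described in the OpenAlex API documentation; the helper names are hypothetical, and the current documentation should be checked before relying on it. Only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

OPENALEX_WORKS = "https://api.openalex.org/works"

def build_works_url(query, sort=None, per_page=25, mailto=None):
    """Build an OpenAlex /works search URL; the full URL doubles as a record of the exact search."""
    params = {"search": query, "per-page": per_page}
    if sort:
        params["sort"] = sort  # e.g. "cited_by_count:desc" or "publication_date:desc"
    if mailto:
        params["mailto"] = mailto  # optional contact address for OpenAlex's "polite pool"
    return f"{OPENALEX_WORKS}?{urllib.parse.urlencode(params)}"

def fetch_works(url):
    """Fetch one page of results as parsed JSON (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)

def summarise(payload):
    """Reduce an OpenAlex response to (title, year, citation count) tuples."""
    return [
        (w.get("display_name"), w.get("publication_year"), w.get("cited_by_count", 0))
        for w in payload.get("results", [])
    ]
```

For example, `summarise(fetch_works(build_works_url("citation chasing", sort="cited_by_count:desc")))` would list the most-cited matches. Saving the URL keeps the search itself repeatable, although the results can still change as OpenAlex adds and corrects records.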

What does all this mean going forward? For researchers, there is value in reflecting on current search practices and whether Google Scholar is still the best option for their purposes, given the caveats above and bearing in mind the limitations and biases of the other systems available. For academic librarians like myself, I would recommend exploring open research information systems and supporting the development of critical information literacy in our library users, incorporating into teaching about search tools an understanding of how the systems we use to find and access information are created and funded, and how that shapes the results.



The content generated on this blog is for information purposes only. This Article gives the views and opinions of the authors and does not reflect the views and opinions of the Impact of Social Science blog (the blog), nor of the London School of Economics and Political Science. Please review our comments policy if you have any concerns on posting a comment below.

Image Credit: sirtravelalot on Shutterstock.

