Talk – AI is not the problem – thinking about outcomes (updated)

On 25 April 2024, I gave a talk at the Open Science & Societal Impact conference titled “AI is not the problem – thinking about outcomes”. It was co-created with Jennifer Ding of the Turing Way, who is the real AI expert here and who wrote a great post about an outcomes-based approach to AI. There is extra material I couldn't fit into the talk, so I'm putting it here, along with a transcript and video recording of the talk.

The slides are published on Zenodo with DOI: 10.5281/zenodo.11051128

I also adapted this talk, linking it to reproducibility in science, for the Reproducibility by Design symposium on 26 June 2024 (at the life sciences department of the University of Bristol), kindly organised by Nick and Fiona.

I'll try to clean up this post with more context and details on a best-effort basis.

There is a video recording (of the April 2024 version) which is saved in a Zenodo item and viewable on the Internet Archive. The video is also embedded here:

Further reading

The talk cites various people and resources:

And here is the academic literature cited in the talk or otherwise relevant to it:

Ball, P. (2023). Is AI leading to a reproducibility crisis in science? Nature, 624(7990), 22–25. https://doi.org/10.1038/d41586-023-03817-6

RETRACTED Guo, X., Dong, L., & Hao, D. (2024). Cellular functions of spermatogonial stem cells in relation to JAK/STAT signaling pathway. Frontiers in Cell and Developmental Biology, 11. https://doi.org/10.3389/fcell.2023.1339390 (original PDF)

Hicks, M. T., Humphries, J., & Slater, J. (2024). ChatGPT is bullshit. Ethics and Information Technology, 26(2), 1–10. https://doi.org/10.1007/s10676-024-09775-5

Liesenfeld, A., & Dingemanse, M. (2024). Rethinking open source generative AI: open-washing and the EU AI Act. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 1774–1787. https://doi.org/10.1145/3630106.3659005

Messeri, L., & Crockett, M. J. (2024). Artificial intelligence and illusions of understanding in scientific research. Nature, 627(8002), 49–58. https://doi.org/10.1038/s41586-024-07146-0

Sauermann, H., & Franzoni, C. (2015). Crowd science user contribution patterns and their implications. Proceedings of the National Academy of Sciences, 112(3), 679–684. https://doi.org/10.1073/pnas.1408907112

Watermeyer, R., Lanclos, D., & Phipps, L. (2024). Does generative AI help academics to do more or less? Nature, 625(7995), 450–450. https://doi.org/10.1038/d41586-024-00115-7

Watermeyer, R., Phipps, L., Lanclos, D., & Knight, C. (2024). Generative AI and the automating of academia. Postdigital Science and Education, 6(2), 446–466. https://doi.org/10.1007/s42438-023-00440-6

White, M., Haddad, I., Osborne, C., Yanglet, X.-Y. L., Abdelmonsef, A., & Varghese, S. (2024). The model openness framework: Promoting completeness and openness for reproducibility, transparency, and usability in artificial intelligence (arXiv:2403.13784). arXiv. https://doi.org/10.48550/arXiv.2403.13784

Widder, D. G., West, S., & Whittaker, M. (2023). Open (for business): Big tech, concentrated power, and the political economy of open AI (SSRN Scholarly Paper 4543807). https://dx.doi.org/10.2139/ssrn.4543807

RETRACTED Zhang, M., Wu, L., Yang, T., Zhu, B., & Liu, Y. (2024). The three-dimensional porous mesh structure of Cu-based metal-organic-framework—Aramid cellulose separator enhances the electrochemical performance of lithium metal anode batteries. Surfaces and Interfaces, 46, 104081. https://doi.org/10.1016/j.surfin.2024.104081

Transcript

Thank you for the introduction. For this talk, I’m going to stay at a high level and offer my reflections on how to situate “AI” in open science as it relates to wider society. There is a lot of understandable concern about how this technology will affect scientific practice.

And we've seen some pretty egregious examples in academic science. Last month, this engineering paper published by Elsevier made the rounds because its introduction opens with “Certainly, here is a possible introduction for your topic…”. This is very likely a sentence generated by ChatGPT, a chatbot based on large language models, and it casts doubt on the rigour of the rest of the paper.

I think the most dramatic example is one published by Frontiers in February 2024, where it’s pretty obvious that much of the content is AI-generated, with a dramatic figure of a lab rat with giant gonads. You can also see some gibberish text in the annotations.

What’s remarkable is that these papers were seen by peer reviewers, editors, and copyeditors and were still published.

On the other side of this is that there is growing evidence of academics using tools like ChatGPT to write their peer reviews.

And in higher education, we know that some students use generative AI to write their essays. But now some instructors are using the same tools to grade those essays.

With that in mind, there are three things I’d like to cover today.

The first is that words matter. A lot. With all of the hype around “AI” right now, it’s important to realise that this is a big umbrella marketing term (instead of a technical term of art) for a bunch of different technologies.

And I really appreciate how Kate Crawford reminds us that these technologies are neither artificial nor intelligent. What we call AI is built on human labour, and it is certainly not intelligent in the way humans are.

In the context of open science, there are calls for open source AI that is transparent, reproducible, and reusable by others. I agree with this, but what counts as open source or open AI is also not clearly defined.

Last year Meta released a large language model called Llama 2 and marketed it as open source. However, the license for Llama 2 actually came with many restrictions on who can use it and how they can use it. We can agree or disagree with these restrictions, but these restrictions mean that Llama 2 is categorically not open source as it has been widely defined for software.

There’s a 2023 paper by Widder, West, and Whittaker about how ambiguity in words like AI and open source AI has created an opening for the big players to openwash their products. What happens here is that the word “open” becomes a very fuzzy term that feels good while meaning very little at the same time. And this furthers the power that these big players hold over technology and society.

All of this is to say that what people call open source AI is often neither open, artificial, nor intelligent! For the purposes of today’s meeting, I think this is a major problem because when a term is taken to mean everything, it ends up meaning nothing.

And the societal impact of this ambiguity is that the wider public will trust science even less than they already do.

What this means in practice is that we should be clear about what we mean when talking about AI. If there’s a specific underlying concept like machine learning, training large language models, and so on, then let us use more specific terms.

There is also cross-cutting work to collaboratively define terms like open source AI, and I believe the scientific research community should absolutely be part of this conversation. The Open Source Initiative is one of the leaders on this and I encourage everyone to check it out.

Having said that, even though clearly defined terminology can help us conceptualise and communicate issues around artificial intelligence, it is a necessary but insufficient step for addressing those issues, because effective communication doesn’t solve problems by itself.

Yes, words matter, and outcomes also matter. And once again, there is a lot of work in this space, on topics ranging from reproducibility, which is important in scientific research, to democracy, trustworthiness, inclusion, accountability, and safety.

I really like the work by the Mozilla Foundation, such as their thinking about trustworthy AI and the need for openness, competition, and accountability. There are so many outcomes for us to consider, and to make things more concrete, I want to focus on a real-world example which challenges us to think more deeply about what outcomes we want to see.

To make this point, we should realise that what’s often called “artificial intelligence” is foundationally similar to autocorrect or spell check, where your typing input is fed into a statistical model that suggests the correct spelling for a word. Now, I know this is simplifying things a bit, and I don’t want to minimise the amazing maths and computer science research that went into it, but the large language models underlying much of generative AI today are – at a high level – an autocorrect that runs some very, very sophisticated statistics on your input to produce natural-feeling outputs. It’s important to know this because enormous amounts of human labour go into labelling the huge datasets used to train these models.
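(An aside for readers of this post: to make the autocorrect analogy concrete, here is a tiny toy sketch in Python of “predict the next word from counts of what usually follows”. It is nothing like a real large language model in scale or architecture; the corpus and function names are just made up for illustration.)

```python
from collections import Counter, defaultdict

# A toy "autocorrect-style" next-word predictor: count which word tends to
# follow which in a tiny corpus, then suggest the most frequent follower.
# Real large language models are vastly larger and use neural networks,
# but the core idea of "predict the next token from statistics" is similar.

corpus = "open science is open research and open science is for everyone".split()

followers = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    followers[current_word][next_word] += 1

def suggest_next(word):
    """Return the most common word observed after `word`, or None."""
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(suggest_next("open"))     # -> science
print(suggest_next("science"))  # -> is
```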

Around this time last year (2023), workers for the companies behind ChatGPT, TikTok, and Facebook formed a union in response to the horrible working conditions they had to put up with.

Behind the “artificial intelligence” façade, many of these workers are effectively sweatshop workers who manually label training data.

For ChatGPT, these sweatshop workers were hired to tag and filter text that describes extremely graphic details like sexual abuse, murder, suicide, or torture.

This reminds us of how “artificial intelligence” is neither artificial nor intelligent, and it has become a smokescreen for deeper issues: labour is not being replaced by machines so much as displaced and made even more invisible.

So, when we think about what outcomes we want to see, we must consider underlying problems like outsourcing, labour rights, or colonialism.

But what does this have to do with scientific research?

Well, similar things are happening in research: what some people call “crowd science” is used as a research methodology, in which academic scientists crowdsource data collection and data labelling to online volunteers.

To be clear, there are positive things that can come from this, for example some scientists build crowdsourcing into science outreach and engagement activities, and there are ways to integrate crowd science into science education.

However, I’ve reviewed many scientific papers about this over the years, and some are really focused on how crowdsourcing is a way to shorten the time needed to process data, and to lower costs for the scientist.

Right now, a lot of this is being used to train machine learning models and other AI applications. And I feel there is a risk that parts of the scientific community are inadvertently perpetuating not just the hype around AI, but also the exploitation of people.

I give these examples because I think that we, as members of the scientific community, should go outside of the ivory tower and engage with wider efforts to think about what outcomes we’d like to see in a world with AI. For instance, what can we learn from labour movements to inform more equitable practices when doing crowd science?

This is just one possibility for thinking about outcomes for science.

And the third thing I want to cover is what AI means for open science. To do this I want to take us back to this extraordinary generated figure of a lab rat. One response that we might have to AI-generated papers or peer reviews is to ban the use of AI tools for scientific papers. Some publishers and journals have already implemented these policies. But I’m concerned about whether, and which, problems we actually solve if we focus on dealing with AI.

I fear that we might inadvertently think that we’ve “solved” the problem, when we are entrenching a much deeper problem.

For example, I wouldn’t be surprised if one of the big academic publishers released a new proprietary tool for detecting AI-generated text in submitted papers and reviews, and tied this feature into the journals they publish. On one hand, maybe the tool is really effective and would weed out these junk papers.

But “solutions” like this might concentrate even more power into these huge publishers, who are a big part of why peer review is so broken in the first place. And in this case, I think fixing peer review is more important than dealing with AI.

I think the broader lesson is that we should support existing open science efforts. For example, there are many tools to help fix peer review, such as preregistration, Registered Reports, and publishing preprints followed by open post-publication peer review. Groups like PREreview or journals like the Journal of Open Source Software have been doing this work for years.

We also have to tackle even deeper problems like job precarity in academic research, where some researchers move from one short-term job to another and some professors live in tents. And many of us have to deal with toxic workloads where we are expected to do even more for less pay.

And what’s most important to realise is that AI didn’t create these problems, just like how AI didn’t create sweatshops.

So what I want to suggest is that AI is not the problem. At least it often isn’t.

Instead, AI reminds us of existing systemic problems. And if we only focus on AI, then we risk making those problems much worse.

So, these are the three suggestions I want to make today: be clear and specific about what we mean when we talk about “AI”; think about, and engage with wider efforts to define, the outcomes we want to see; and support existing open science efforts that tackle the deeper systemic problems rather than focusing on AI itself.

I hope there was something useful in this talk and that it can provoke more conversations.

And if you’re interested in continuing the conversation, I want to point to the Turing Way community.

The Turing Way started as an online guide on open science practices, but over the past five years has turned into a global community of concerned researchers who reflect on some of the issues I talked about today.

For example, last year my co-author Jennifer Ding led a Turing Way Fireside Chat about open source AI, and the labour issues behind it.

I invite you to visit the Turing Way to talk about AI or other open science and open research topics.

With that, thank you very much for coming to my little show and tell today.

Addendum on reproducibility

Here are the additional points I made about reproducibility at the Reproducibility by Design symposium at the University of Bristol's life sciences department on 26 June 2024:

There are possible good uses of so-called “AI” to help with reproducibility (not everything is doom and gloom!).

For example, my colleague Shern Tee pointed me to the “Speech Schema Filling” tool made by Näsström, Götte, and Schumann (2024). This tool was developed by and for chemists to help them better document their experiments.

It uses speech recognition and a large language model running locally on your computer, so that you talk through each step in your experiment as you are doing it, and this tool records everything into an electronic lab notebook.

The remarkable thing is that this language model actually parses what you are saying and records the details of your experiment into a standardized structured data format (for chemistry) that can go with your lab notebook (see this example).

I think this is super cool because as long as you’re willing to talk into a microphone as you work, this tool makes documentation so much easier, and helps with data quality and reproducibility.
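To make the general pattern concrete, here is a rough Python sketch of the “speech in, structured record out” idea. This is not the actual code of the Speech Schema Filling tool; the function names transcribe_audio and extract_experiment_record are placeholders I made up, standing in for a real speech-recognition step and a locally running language model that fills a chemistry schema.

```python
import json
import re
from dataclasses import dataclass, asdict
from typing import Optional

# Placeholder for the speech-recognition step: a real tool would transcribe
# microphone audio here; we return a hard-coded utterance for the demo.
def transcribe_audio() -> str:
    return "add 25 millilitres of ethanol and stir for 10 minutes at 60 degrees"

@dataclass
class ExperimentStep:
    spoken_text: str
    amount: Optional[str]
    duration: Optional[str]
    temperature: Optional[str]

# Stand-in for the language-model step: a real tool would ask a locally
# running LLM to fill a chemistry schema; this crude regex extraction only
# illustrates the idea of turning free-form speech into a structured record.
def extract_experiment_record(utterance: str) -> ExperimentStep:
    amount = re.search(r"\d+\s*millilitres? of \w+", utterance)
    duration = re.search(r"\d+\s*minutes?", utterance)
    temperature = re.search(r"\d+\s*degrees?", utterance)
    return ExperimentStep(
        spoken_text=utterance,
        amount=amount.group(0) if amount else None,
        duration=duration.group(0) if duration else None,
        temperature=temperature.group(0) if temperature else None,
    )

# Append the structured record to a JSON-lines "lab notebook" file.
step = extract_experiment_record(transcribe_audio())
with open("lab_notebook.jsonl", "a") as notebook:
    notebook.write(json.dumps(asdict(step)) + "\n")
print(asdict(step))
```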

That said, considering that so-called “AI” and “open source AI” are neither open, artificial, nor intelligent, there is a recent conference paper (published in June 2024) that sampled 40 of the commonly used large language models for generative AI.

They evaluated the “openness” of these models with 14 measures covering the availability of underlying materials, documentation, and access (see Figure 2 in: https://doi.org/10.1145/3630106.3659005). The overwhelming majority of them are highly closed, so you have no idea what's happening under the hood. Notably, Meta's Llama 2, which was marketed as “open source”, is sixth from the bottom, and OpenAI's ChatGPT comes in last place.

I think this is bad for reproducibility, especially if we integrate them into the scientific process. And unfortunately we are starting to see this happen.

For example, I've seen real papers in real, highly prestigious journals proposing things such as (paraphrased):

In my view, if we build our science on top of really opaque “AI”, which most of the popularly used models are, then we are not doing science; we'd be doing alchemy (not to mention becoming even more beholden to the Big Tech companies who hold power over that technology).

And this alchemy would give us “illusions of understanding” as wonderfully described by Messeri & Crockett (2024) (https://doi.org/10.1038/s41586-024-07146-0). I believe this is a great risk to science.


This talk is open source and I published it on Zenodo.org with this DOI (10.5281/zenodo.11051128) along with a transcript, and I encourage you to check it out, fork it, turn it into what you like, and visit the Turing Way community where we can continue these conversations.

#talks


Unless otherwise stated, all original content in this post is shared under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).