OPINION | Tanya de Villiers-Botha: Risks and limitations of ChatGPT and Bing Chat

Researchers have previously noted that using deep learning to create general models that can process natural language and perform language-based tasks comes with unique risks. (iStock)

Chatbots like OpenAI's ChatGPT and Microsoft's Bing Chat exhibit bias, lack actual understanding, and have the potential to mislead. To understand how these risks arise, we need to look at how these systems are trained, writes Tanya de Villiers-Botha.

A little over two years ago, the co-leader of Google's Ethical AI team, Dr Timnit Gebru, was forced out of the company on the back of a peer-reviewed paper that she and members of her team wrote on some of the ethical risks of Large Language Models (LLMs), the type of AI model that underlies chatbots like OpenAI's ChatGPT, Microsoft's Bing Chat, and Google's Bard. The paper flags potential dangers arising from this technology and makes recommendations on how these might be mitigated. 

Some within Google reportedly found the paper "too bleak" and countered that the technology had been engineered to avoid at least some of the problems flagged. Two-odd years later, after the much-hyped introduction of two chatbots based on this technology, Gebru and her team seem largely vindicated. 

Deep learning 

In their paper, the researchers note that using deep learning to create general models that can process natural language and perform language-based tasks comes with unique risks, three of which are particularly salient in the context of the ChatGPT- and Bing-related stories currently in the news. (Microsoft’s limited-release Bing Chat purportedly makes use of a next-generation version of ChatGPT.) These are bias, lack of actual understanding, and the potential to mislead. To understand how these risks arise, we need to look at how these systems are trained. 

Large Language Models (LLMs) are trained on massive amounts of data relating to human language use so that they can appropriately and convincingly simulate such language use in response to prompts. In many ways, this technology is very successful. ChatGPT's facility with language is astounding. The technology can be useful in specific, contained contexts that require natural language-use abilities, such as transcription, translation and answering FAQs. Nevertheless, as is vividly demonstrated by Bing’s more unhinged recent outputs and by the enthusiastic attempts to jailbreak ChatGPT, the technology does not come without significant risks and limitations.


LLMs require enormous amounts of data to learn, which means that most of their training data is scraped off the internet, the largest and most readily accessible source of examples of "natural" human language use available. An almost inevitable result is that any biases inherent in that data are "baked into" the system. For example, part of the data used to train GPT-2, the forerunner to ChatGPT, came from scraping outbound links from Reddit, which has a relatively homogenous user base: 67% of its United States users are men, and 64% are aged between 18 and 29. ChatGPT has almost certainly ingested this source as well as Wikipedia, which has its own documented problems with bias.

Gebru et al. note that data from the web often skews towards English content created by young men from developed countries. Any source of data that consists of text or other information generated by a relatively homogenous group is likely to result in data that reflects the worldview and biases of that group. This is how racist, sexist, homophobic, etc., language enters the training sets of LLMs. Any such biases will be replicated, and sometimes amplified, by the model: it outputs what it learns.


OpenAI, the creators of ChatGPT, seem to have tried to mitigate this risk, both through reinforcement learning from human feedback (a process fraught with ethical issues), which attempts to have the system "unlearn" such biases, and through adding guardrails that prohibit it from outputting some of the more overtly offensive responses. Those who bemoan the purportedly "woke" values they see exemplified in the current guardrails seem to lose sight of the fact (or perhaps they don't) that without these in place, ChatGPT would not be unbiased; it would reflect the biases encoded in its training data. 

The apparent left-leaning bias of ChatGPT's guardrails is a somewhat clumsy attempt to compensate for existing underlying biases in the opposite direction. As Gebru et al. point out, a better fix for such biases partly lies in better, more carefully curated training data; however, this option is costly and time-consuming, as there simply isn't enough free, high-quality data available. 

Lack of understanding

A further and very significant problem is that LLMs do not actually understand language. Ultimately, they merely mimic the language use they have been trained on, which is why Gebru et al. call them "stochastic parrots". Advances in building and training these models have allowed for truly uncanny mimicry, but the illusion of understanding remains just that: an illusion. 

Ultimately, these systems remain mimics. They generate text based on statistical analyses that indicate what kind of text tends to follow a given bit of input text, based on their training data. Basically, they're autocomplete on steroids. 
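To make the "autocomplete on steroids" point concrete, here is a deliberately toy sketch in Python: a bigram model that continues a prompt with whichever word most often followed the previous word in its tiny training text. This is purely illustrative and bears no resemblance to the neural networks behind ChatGPT; the corpus and function names are invented for the example. But the core idea it demonstrates is the same one the column describes: the system continues text with statistically likely tokens, with no model of truth or meaning anywhere in the process.

```python
from collections import Counter, defaultdict

# A tiny, made-up "training corpus".
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count, for each word, which words followed it and how often.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def complete(word, length=4):
    """Greedily extend `word` with the most frequent continuation seen in training."""
    out = [word]
    for _ in range(length):
        counts = following.get(out[-1])
        if not counts:  # word never appeared mid-corpus; nothing to predict
            break
        out.append(counts.most_common(1)[0][0])
    return " ".join(out)

print(complete("the"))  # continues with statistically likely words, e.g. "the cat sat on the"
```

Note that the model "says" whatever its training text makes statistically likely, which is also why biased training data yields biased output: the counts simply reflect whatever the corpus contained.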

This lack of understanding underlies these systems' uneasy relationship with the truth and explains why the internet is currently awash with examples of outright nonsense from both ChatGPT and Bing Chat. None of this comes as much of a surprise to those familiar with the technology. 

What does come as a surprise is the apparent lack of awareness of (or concern about) these limitations on the part of those rushing to roll the technology out. Given that LLMs do not understand language and cannot distinguish between truth and falsehood, using them to power general-purpose web search seems neither wise nor feasible. We have no reason to think that we are even close to a technical solution to LLMs' falsehood problem; both Bing Chat and Google's Bard made factual errors in their product unveiling demonstrations.


Gebru et al. point out the further risk that LLM-based chatbots' facility in mimicking human-like language use, along with our tendency to anthropomorphise entities that act in human-like ways, can lead us to place too much faith in these bots, crediting them with insight, authority, and agency where there is none. Even some of the more technically informed testers of the limited-release Bing Chat report catching themselves thinking of the bot as intelligent and sentient while it, in turn, declares that it is in love with its users or becomes belligerent in exchanges. This makes it much more likely that people will be taken in by any falsehoods these systems generate and be emotionally affected by them. 


A related risk mentioned in the Gebru paper and elaborated on by Dr Gary Marcus is the fact that while these systems are not very good at reliably generating factual information, they are very good at generating massive amounts of convincing-sounding misinformation. This makes them ideally suited to mass propaganda, mass-spamming, mass-producing fake, mutually-reinforcing websites, and any use that requires language fluency but not veracity. Moreover, if the web becomes flooded with masses of well-written misinformation, web-linked LLMs will take that output as input, further exacerbating their misinformation problem.

The current crop of LLM-based chatbots is extremely impressive in some ways and truly terrible in others. Overall, they have the feel of being technical marvels in search of useful applications, and their limited roll-outs have the feel of mass product testing, driven by industry overconfidence in their abilities. 

What does seem clear is that, at least as far as web search and other applications requiring high accuracy and low bias are concerned, they are not ready to be released, and it is still an open question whether they ever will be. It remains to be seen whether Microsoft will be hurt by seemingly prematurely setting off the "AI arms race" to roll out LLM-based web search. 

In the meantime, society will be confronted with more ethical risks arising from this technology. It also remains to be seen whether the advantages to be had from LLMs outweigh the risks. It is worth reiterating that none of the limitations or risks mentioned here were unknown or inevitable, which further underscores the need for greater ethical awareness throughout the tech industry. 

*Dr Tanya de Villiers-Botha is a senior lecturer in the Department of Philosophy and head of the Unit for the Ethics of Technology in the Centre for Applied Ethics at Stellenbosch University.


Disclaimer: News24 encourages freedom of speech and the expression of diverse views. The views of columnists published on News24 are therefore their own and do not necessarily represent the views of News24.
