I’d like to begin this post by warning you not to treat this as a research article! This is an opinion piece, which is sometimes based on speculation and common-sense arguments rather than rigorous experiments (although sometimes scientific papers do feel more like opinion pieces, but we’ll simply disregard such cases!).
In recent months, ChatGPT (https://openai.com/blog/chatgpt/) has absolutely disrupted the Internet and, as happens with virtually anything, two camps have emerged. The first consists of people who are absolutely fascinated by what ChatGPT can do and by the possibilities it brings (let’s nickname this camp the early adopters). The second consists of more careful people who got their fair share of plausible but wrong outputs and are a bit more skeptical about using this technology off the shelf (let’s call this camp the early critics). And of course, personally, I simply belong to the camp of people who divide people into two camps! Getting back to the initial argument, I can see why both stances are in fact valid from their own points of view.
Early adopters vs early critics
The early adopters try to see how investing in ChatGPT could cut the costs of running their business and save some money down the road. There have obviously been a number of companies (https://openai.com/blog/gpt-3-apps/) that already adopted GPT-3 for their business, otherwise OpenAI wouldn’t run it as a paid service. I’d argue these companies would benefit equally from adopting ChatGPT as well. Interestingly, most of the featured GPT-3 use cases don’t rely on GPT-3’s ability to provide correct factual information. Let’s examine these ventures in more detail:
- Viable provides summaries of insights from surveys, help desk tickets, live chat logs, reviews, etc. The example from the blog post above is (and I quote): “For example, if asked, What’s frustrating our customers about the checkout experience?, Viable might provide the insight: Customers are frustrated with the checkout flow because it takes too long to load. They also want a way to edit their address in checkout and save multiple payment methods.” What is the harm if a generated insight uses wrong facts and draws wrong conclusions? Well, the decision makers will notice it and will not act on it (and if they do, they are simply bad decision makers!). Now what is the benefit if the generated insight is in fact legit? Well, you can grab it, act on it, improve your service, and get more happy customers; happy customers are returning customers willing to spend money on your service and bring you revenue! Do the potential benefits outweigh the potential harm? Yes! Could you rely on GPT-3 alone for decision making? No!
- Fable Studio uses GPT-3 to fuel interactive stories for their virtual beings. Now, these are stories: do they have to be factually correct? No, if the author decides that it’s not relevant. And if it is relevant, the author can correct it. A larger problem here is that the author needs to notice that something is incorrect, and these GPT models sound very, very plausible even when they produce factual bullshit. But again, in the world of fables, how harmful will it be if the author doesn’t notice it? Well, not very. How beneficial will it be if the generated story provides a good starting point for a book/video? Potentially extremely beneficial!
- Algolia is said to offer “semantic search”. Now, what that means and how it differs from Google is not entirely clear from just that description. However, the part where they use GPT-3 is (and I quote from OpenAI’s blog post again): “Algolia Answers helps publishers and customer support help desks query in natural language and surface nontrivial answers”. The benefit-harm argument for this use case is quite similar to the case of Viable.
Now, early critics point out that ChatGPT doesn’t always provide factually correct answers and unfortunately delivers wrong but plausible answers in a convincing and somewhat stubborn manner. On a side note, this is actually quite funny, because I’m working on the opposite problem of generating wrong but plausible answers for multiple-choice questions. We also tried using language models, and the most typical problem is that we get correct answers instead of wrong ones! These LLMs can never just do what we want them to! Sorry, I digressed… So are wrong but very convincing answers produced by ChatGPT a problem in general? Yes! In fact, OpenAI knows about the problem and recognizes that it’s challenging to fix (and I fully agree that it is, even more so if you work on the opposite problem!):
ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging, as: (1) during RL training, there’s currently no source of truth; (2) training the model to be more cautious causes it to decline questions that it can answer correctly; and (3) supervised training misleads the model because the ideal answer depends on what the model knows, rather than what the human demonstrator knows. (From the original OpenAI’s blog post on ChatGPT)
Is it something the scientific field should look at? Most definitely yes! Is it a problem if we want to build a fully automated pipeline? A resounding yes!!! Is it always a problem? No, as we have already seen for the 3 business applications of GPT-3 above (and one could easily replace GPT-3 there with ChatGPT, because they are birds of a feather).
ChatGPT for learning?
Is ChatGPT good for learning? It depends. Is it good for learning from scratch? No. Why? Because of the truthfulness problems that the early critics have pointed out. When you’re learning something new, it’s very useful to get correct information from the very beginning. Why? Because then you learn on top of it and it becomes a brick in your knowledge dome. Now, if one of the bricks is faulty, the whole structure is shaky. Depending on which brick it is, it’s not always easy to replace. For the case of knowledge, I find that the wrong stuff I learned first often sits like a bug in my head and makes me doubt myself countless times, even after I’ve “fixed” it and tried to replace it with the correct piece of knowledge. I want to break down a bit further the two most common learning use cases I’ve heard about from my friends and colleagues, and via social media, in recent months.
Learning a second language
When you learn a language, you essentially build something like a mapping in your brain from your native language to that other language (at least that’s how I think about it, and L2 researchers would probably disagree with me). So if I tell you that “apple” is “цибуля” (which actually means “onion”) in Ukrainian, you will trust me if you’re at the beginning of your language learning quest. If the mistake is simple like that, you could just look it up in a dictionary and prove me wrong. However, if it’s more elaborate, say that in Swedish you need to put adverbs before the main verb in subordinate clauses only if the conjunction is att (which is not true, you need to do that always, no matter the conjunction), then it’s harder to verify if you’re just learning and don’t have access to L2 expertise.
Now, when you’ve already spent some time learning the language and can “feel the language” to some extent, you can more often than not make gut-feeling judgements on whether a provided tip/translation is good or not. But before you get there, I’d recommend against using ChatGPT.
Learning to code
The argument here is similar to learning a second language, with one major difference: you can run the code and check whether it gives you the correct result! So even if you’re learning, there is an easy way to check whether the provided code is correct. Now, there are two caveats to this approach:
- If you ask ChatGPT why your code is correct/wrong, the explanation may be wrong in a very subtle way, so that you learn it wrong. This makes forums like StackOverflow preferable, because people with expertise judge the given answers by upvotes, which more often than not makes the quality of the answers higher than that of ChatGPT. No wonder StackOverflow has banned ChatGPT!
- If you ask ChatGPT to write more complex code for you, there are way more possibilities for bugs that you might miss when you test manually. This means someone has to write automated tests for your code. Now, if you ask ChatGPT to write the tests for you as well, it becomes a perpetual loop, because you’ll need to make sure those tests are correct. So some level of expertise is always necessary; you can’t blindly rely on the generated code snippets (see the sketch right after this list).
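To make that second caveat concrete, here is a minimal sketch (the function, the bug, and the tests are made up for illustration, not taken from any real ChatGPT session): plausible-looking generated code can pass a casual manual check and still fail on edge cases, and you only catch that if you already know what the correct answers should be.

```python
# Hypothetical "generated" code: looks plausible and works for most years,
# but silently mishandles century years (1900 is not a leap year, 2000 is).
def is_leap_year(year: int) -> bool:
    return year % 4 == 0  # subtle bug: ignores the 100-year and 400-year rules


# A few hand-written checks expose the problem, but writing them
# requires knowing the leap-year rules yourself.
def test_is_leap_year():
    assert is_leap_year(2024) is True
    assert is_leap_year(2023) is False
    assert is_leap_year(2000) is True
    assert is_leap_year(1900) is False  # fails: the generated code returns True


if __name__ == "__main__":
    test_is_leap_year()
```

And if you had asked the model to generate these tests too, a wrong rule could end up baked into both the code and the tests, which is exactly the perpetual loop mentioned above.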
ChatGPT replacing jobs?
There are so many types of jobs these days that I don’t even remotely know all of them, which is why I tend to believe that ChatGPT will most probably make some of them obsolete, yes. But then in return it will create some new jobs, like prompt engineers (not sure how long-lasting these will be though).
Will it replace jobs for which you need a qualification? I don’t think so. Until ChatGPT is able to fix a car or brew coffee, those kinds of jobs aren’t going anywhere. And honestly, even with coffee machines all over the place, coffee shops are still around!
Regarding the jobs that require higher education, take translators, for instance. Did Google Translate or DeepL remove the need for translators? No. Why? Because they rely on Machine Learning, and currently, no matter how good ML algorithms are, they never give you a 100% success rate unless applied to a very easy toy problem. So whenever you need absolute accuracy, say, in translating legal documents, you have to rely on humans for now (and I believe you’ll have to within the current paradigm of learning from data). Aren’t humans prone to mistakes, you ask? Yes, they are, but they can also be held accountable, unlike ChatGPT (or any other ML model).
Summary
Now to summarize, what characterizes applications for which ChatGPT/GPT-3 is a good fit?
- Factual truthfulness is not necessary. It’s nice to have, but it’s not catastrophic if some/all facts are wrong.
- The benefits of getting something useful outweigh by a considerable margin the harm of getting wrong and misleading information.
- The generated outputs are NOT the one and only source for decision making and are thus NOT part of any fully automated pipeline.
- If factual truthfulness is necessary, the expert knowledge to assess the output generated by the GPT models is readily available and used.
What potential application areas fit the aforementioned conditions?
- Working with customer insights. Again, getting useful generated insights from, say, summaries of help desk tickets could potentially lead to improving your service, making customers happier and thus willing to return, spend more money, and drive your revenue. How harmful are generated insights that are simply not true? This is where condition 3 must kick in: you can’t use these insights for decision making directly! But if you verify an insight in another way and it turns out to be worth acting on, then one of the GPT models just potentially saved you a lot of time and maybe even made you some money.
- Creative writing. Here obviously, you want to write a story that is interesting, not necessarily factually correct (the latter is in fact largely irrelevant for fiction, fantasy, fairy tales, and even science fiction sometimes).
- Chitchat. Built for pleasure, not for veracity!
- Speeding up a process someone is already qualified for. The closest example I can find here is machine translation systems (like DeepL or Google Translate), which currently speed up the translation process significantly. Can you rely on them entirely, though? Still no! If the quality of the translation matters, you still need a human to check!
The list of no-go applications is really endless, but basically it’s any application area where the aforementioned 4 conditions do not apply.