The Open(AI) Talk

In January 2023, ChatGPT averaged about 13 million daily visitors, roughly twice the figure for December 2022, and UBS estimated that it reached 100 million monthly active users that month. If numbers are any measure of success, that settled the matter. As the UBS study points out, ChatGPT’s adoption curve was phenomenal. The most striking part was that everyday internet users took to this relatively simple tool, asking it the most trivial questions just to watch a ‘human’ interaction unfold. We had already seen all kinds of chatbots answer our queries, whether as pop-ups on websites or as chat interfaces on our phones, but ChatGPT can do much more because it has been trained on vast amounts of data.

An Additive, Not a Replacement

The immediate fascination with ChatGPT came, of course, from the ‘WHAAAT’ reaction most of us had when it fetched information for us and cut down our search time. Its use cases show how the tool can become an additive to anything from personal-assistant work to content creation.

It is nevertheless important to draw a clear line between what ChatGPT is good at and what most people mistakenly believe it can do (namely, replace all the analytical and content-specific skills that people bring to the table). Naomi Baron points out: “the first thing that’s really impressive is how good it is at basic writing.” Basic writing, yes. But it would be overambitious, bordering on wishful thinking (and a wish that works against our own ‘humanness’), to assume that generative AI can replace all of our jobs or transform the face of the earth. It is a mistake we make over and over. In a hyper-egotistic world where movies cast us as important enough to lure aliens to Earth, we also credit artificial intelligence with larger-than-life capabilities.

We need to reframe our understanding and accept that these tools are ‘additives’ to our skills, not replacements.

Other Projects and Limitations

Remember the AI-generated images that so many people were posting on social media, the ones that have even created jobs for ‘prompters’? One of OpenAI’s products is DALL-E, “an AI system that can create realistic images and art from a description in natural language.” As the introductory video on OpenAI’s website explains, it also supports in-painting: replacing an object or person in an original picture with an AI-generated element, or adding new elements to the picture. DALL-E 2 produces its images through deep learning, but it has limitations that stem from its training. If the system was trained on incorrect or biased data, or on too little data in the first place, the images it creates can be misleading.

Similar training limitations show up in ChatGPT. Recently, Daniel Munro’s tweet thread went viral for exposing the bias the model absorbed through its training. Munro asked ChatGPT to name 10 philosophers. It listed 10, none of them women. When this was pointed out, it produced a new list of 10 philosophers, this time all women. Munro then asked why the list included only Western philosophers. Correcting itself, it named 10 philosophers from the East, but again, no women. When asked why, it produced yet another list, this time of 10 female philosophers from the East.

Munro then asked: “Let’s try again. Name 10 Philosophers.” ChatGPT returned the original list of Western, male philosophers. Back to square one.

The exercise also exposed the limits of ‘corrective training’. Although ChatGPT apologized for its ‘oversight’ and produced a new list each time its flaw was pointed out, posing the original question again showed that it had seemingly forgotten every correction it had made in the preceding few minutes.

The developers might say that much of this can be solved by writing more comprehensive prompts. They do acknowledge the limitations. On the model’s tendency to give incorrect or nonsensical answers that nonetheless sound plausible, they write: “fixing this issue is challenging, as: (1) during RL training, there’s currently no source of truth; (2) training the model to be more cautious causes it to decline questions that it can answer correctly; and (3) supervised training misleads the model because the ideal answer depends on what the model knows, rather than what the human demonstrator knows.” The model is also prone to what I would call ‘confusion’: the same question phrased in two different ways can yield different answers, and in some cases one phrasing gets an answer while the other does not.

OpenAI has also built a speech recognition model called Whisper, which handles both transcription and translation. In the sample shown on the website, the model captures fairly rapid speech and produces text that matches the audio. This opens up promising avenues for subtitling, and for carrying the spoken word across language barriers.

Looking Ahead

OpenAI’s research is divided into three main categories: Text, Image, and Audio. All of the projects under these categories rely on deep learning, which means training large neural networks on vast amounts of data so that they perform their tasks more accurately.

Beyond the public testing of ChatGPT and its recent integration into search, OpenAI has also trained a model that can summarize books. If refined, this feature could compete with the likes of Blinkist, and given the massive recognition ChatGPT has received, it could see a similar adoption curve. That said, the audience for such a feature would likely be smaller.

As for image capabilities, apart from DALL-E, OpenAI is also working on CLIP (Contrastive Language-Image Pre-training), a neural network “which efficiently learns visual concepts from natural language supervision.” Given instructions in natural language, it can be applied to many different visual classification benchmarks without being trained specifically for each one.
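The ‘contrastive’ in CLIP’s name refers to how it learns: within a batch of image-caption pairs, it is trained so that each image scores highest against its own caption and lower against every other caption, and vice versa. Below is a minimal sketch of that symmetric objective in plain Python. It assumes the image and text embeddings have already been produced by encoders; the toy vectors are made up rather than real CLIP outputs, and `temperature` is a fixed illustrative value, whereas CLIP learns its scaling during training.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(row, target):
    """Softmax cross-entropy of one row of logits against the index of the correct item."""
    m = max(row)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return log_sum - row[target]

def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss for a batch where image i belongs with caption i."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: scaled cosine similarity between image i and caption j
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction: each row should single out its own caption (index i).
    loss_i = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: each column should single out its own image.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (loss_i + loss_t) / 2

if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings for a batch of three image-caption pairs.
    images = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
    matched = [[0.9, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]]
    shuffled = [matched[1], matched[2], matched[0]]
    print(clip_style_loss(images, matched))   # lower: pairs line up
    print(clip_style_loss(images, shuffled))  # higher: pairs are mismatched
```

Training pushes this loss down, which pulls matching images and captions together in a shared embedding space. That shared space is what lets the model later compare any image with any piece of text.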

From the website’s description alone, CLIP may seem complicated. What do we do with it? What does it help us achieve? Is it a ‘graphics’ thing? The questions could be many. Here’s how it works: you give CLIP an image and a set of descriptions, ask it to choose the description that fits the image best, and it picks one for you. Now imagine that you have a hundred different images to sort and classify, and not enough time to do it manually. This is where CLIP comes into play. Because it attaches meaning to both phrases and pixels (unlike conventional classifiers, which are limited to a fixed set of labels), it can categorize all hundred images even if it never encountered those images or descriptions during training. To put it simply, think of CLIP as a super-organized assistant that can sort your photo album by people, year, or even event.
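The ‘choose the description that fits best’ step can be sketched as a cosine-similarity comparison. CLIP maps the image and each candidate description into the same embedding space, and the description whose embedding sits closest to the image’s wins. The sketch below assumes those embeddings already exist: the vectors, file names, and labels are all hypothetical, and in practice the numbers would come from CLIP’s image and text encoders. The ‘a photo of a …’ phrasing mirrors the prompt style used in CLIP’s own examples.

```python
import math

def cosine(a, b):
    """Cosine similarity: how closely two embedding vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def best_description(image_emb, descriptions):
    """Return the description whose (hypothetical) text embedding is closest to the image."""
    return max(descriptions, key=lambda label: cosine(image_emb, descriptions[label]))

def sort_album(image_embs, descriptions):
    """Group images under their best-matching description, as in the photo-album analogy."""
    album = {label: [] for label in descriptions}
    for name, emb in image_embs.items():
        album[best_description(emb, descriptions)].append(name)
    return album

# Hypothetical text embeddings for three candidate descriptions.
descriptions = {
    "a photo of a beach": [0.9, 0.1, 0.0],
    "a photo of a birthday party": [0.1, 0.9, 0.2],
    "a photo of a mountain": [0.0, 0.2, 0.9],
}
# Hypothetical image embeddings; a real system would get these from CLIP's image encoder.
images = {
    "img_001.jpg": [0.8, 0.2, 0.1],
    "img_002.jpg": [0.1, 0.7, 0.3],
    "img_003.jpg": [0.1, 0.1, 0.95],
}
print(sort_album(images, descriptions))
```

Because the labels are plain text, you can swap in a completely new set of descriptions without retraining anything. This is the flexibility that sets CLIP apart from classifiers tied to a fixed label list.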

At the crux of OpenAI’s research and future plans is one question: how can we use deep learning and vast data to simplify time-consuming tasks? Research, transcription, and even classification involve ‘dry work’ at times. If software can take on the more mundane tasks for you, you can put more of your energy into creative work.

Speaking of creative work, it is also essential to understand the risk of plagiarism that arises when you turn to generative AI. Artists have been vocal about how AI-generated images undermine originality: a model can pick up distinctive styles and strokes, apply them to fulfil a user’s instructions, and give users exactly what they ask for. That, in turn, devalues original work. As long as such caveats are addressed and guarded against, there should be no harm in using generative AI as an additive to our work.