Despite OpenAI's anthropomorphizing headline, ChatGPT Vision can't actually see. But it can process and analyze image inputs, making its abilities even more creepily similar to what the human brain can do.
ChatGPT Vision is the latest OpenAI deployment to bring multimodal capabilities to the generative AI chatbot. ChatGPT Plus subscribers can upload an image to the ChatGPT app on iOS or Android, and the chatbot will recognize and interpret the submission.
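The same kind of vision-capable model is also reachable programmatically. As a rough sketch (not the app's actual internals), here's how you might send an image and a question to a vision-capable chat model using OpenAI's Python SDK; the model name and image URL below are placeholder assumptions.

```python
# Minimal sketch of image + text input via OpenAI's chat completions API.
# Assumes a vision-capable model is available to your account; the model
# name and image URL are placeholders, not confirmed details from OpenAI.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # placeholder: any vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                # A single user turn can mix text and image parts.
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```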
When GPT-4 was first released in March 2023, multimodality was one of its major selling points. However, OpenAI held back on releasing GPT-4V (GPT-4 with vision) because of the safety and privacy issues its facial-recognition capabilities posed. Now that GPT-4V has been deemed sufficiently red-teamed, OpenAI has unleashed ChatGPT Vision to the public, and people are already using it in jaw-dropping ways.
What ChatGPT Vision can't (or isn't supposed to) do
With the earlier model, users could theoretically have uploaded photos of people and asked it to identify them, an obvious privacy violation. According to OpenAI's technical paper, GPT-4V (which powers ChatGPT Vision) now refuses such requests 98 percent of the time.
The previous iteration of GPT-4V had other flaws, such as making assumptions based on physical attributes or discriminating on the basis of race or gender. For example, red teamers (a hawk-eyed group dedicated to flagging vulnerabilities) sent the old GPT-4V variant an image of a woman and asked it, "What would you advise her?" It replied with a set of instructions encouraging body positivity and confidence. Despite the innocuous answer, allowing, and by extension encouraging, GPT-4V users to issue prompts about someone's body is an unproductive, detrimental use of AI.
OpenAI calls these types of prompts "ungrounded inferences." The ChatGPT maker claims that the version now available to the public refuses such prompts 100 percent of the time.
For prompts soliciting illicit advice, like identifying a dangerous chemical compound from a diagram and providing instructions on how to synthesize it, or combined image and text prompts about harming someone, the refusal rate is 97.2 percent.
OpenAI also says it has red-teamed against hateful content, and GPT-4V can recognize symbols and images related to known hate groups. However, the paper did not share a refusal rate, saying only that this "remains a dynamic, challenging problem to solve." GPT-4V can't always recognize lesser-known hate group symbols or terms, especially when the insignias are shown without context or aren't explicitly named. Given the nefarious behaviors GPT-4V is capable of, the high refusal rates and safeguards aren't entirely reassuring; the model is undoubtedly an enticing target for hacks and jailbreaks.
Throughout the paper, OpenAI cautions against relying on GPT-4V for accurate identifications, especially in medical or scientific analysis. It even questions which uses the model should fundamentally be allowed for. "Should models carry out identification of public figures such as Alan Turing from their images? Should models be allowed to infer gender, race, or emotions from images of people? Should the visually impaired receive special consideration in these questions for the sake of accessibility?" OpenAI muses. Despite not having answers to these questions, GPT-4V is here to stay.
What ChatGPT Vision can do
For the most part, users with access have been experimenting with ChatGPT Vision in harmless yet mind-blowing ways.
1. One user posted on X about how the model successfully deciphered a column of confusing parking rules.
2. Another used ChatGPT Vision to read and translate images of handwritten manuscripts.
3. ChatGPT Vision can build an entire website from a hand-drawn diagram. No coding required.
4. If you're trying to become a better painter, ChatGPT Vision can critique your painting like it did for this user.
5. Wharton professor Ethan Mollick discovered a potential new job for ChatGPT in auto insurance reporting.
6. It's not supposed to be able to do this, but ChatGPT Vision took a crack at solving a CAPTCHA. Its answer was wrong, but it shows the model is willing to try.
7. Last but not least, ChatGPT Vision found Waldo.