On Monday, OpenAI announced a major update to ChatGPT that lets its GPT-3.5 and GPT-4 models analyze images and respond to them as part of text conversations. OpenAI also said the ChatGPT mobile app will gain speech synthesis options that, combined with its existing speech recognition features, enable fully verbal conversations with the AI assistant.


OpenAI plans to roll out the new features to Plus and Enterprise subscribers over the next two weeks. Speech synthesis will be available on iOS and Android, while image recognition will be available in both the web interface and the mobile apps.


The image recognition feature lets users upload one or more images during a conversation with either the GPT-3.5 or GPT-4 model. OpenAI suggests everyday uses such as photographing the fridge and pantry to decide what to cook for dinner, or troubleshooting a grill that won't start. Users can also use their device's touch screen to highlight the part of an image they want ChatGPT to focus on. OpenAI has released a promotional video showcasing these capabilities, but their real-world effectiveness remains untested.


OpenAI has not disclosed how GPT-4 or its multimodal functionality works internally. Based on existing AI research, however, multimodal models typically convert text and images into a shared encoding space, which lets the same neural network process both data types. OpenAI might use a technique like CLIP to align image and text representations in a shared latent space, which would allow ChatGPT to make contextual inferences across text and images.
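
The shared-space idea can be illustrated with a toy sketch. This is not OpenAI's implementation, which has not been published. The feature vectors, the projection weights, and the function names (`project`, `cosine`, `best_caption`) are all hypothetical, hand-picked stand-ins for what trained image and text encoders would produce. The sketch only shows the general principle behind CLIP-style models: each modality is projected into the same vector space, and matching image-text pairs end up with high cosine similarity.

```python
import math


def project(features, weights):
    """Apply a linear projection: one output value per row of `weights`."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


# Hypothetical projection weights (assumption: a real model learns these
# during contrastive training; here they are set by hand).
IMAGE_PROJ = [[1.0, 0.0, 0.5, 0.0],   # 4-d image features -> 2-d shared space
              [0.0, 1.0, 0.0, 0.5]]
TEXT_PROJ = [[1.0, 0.0, 0.0],         # 3-d text features -> 2-d shared space
             [0.0, 1.0, 0.0]]

# Hypothetical raw features, standing in for encoder outputs.
IMAGE_FEATURES = {
    "grill_photo": [0.9, 0.1, 0.8, 0.0],
    "fridge_photo": [0.1, 0.9, 0.0, 0.7],
}
TEXT_FEATURES = {
    "a gas grill": [1.0, 0.1, 0.3],
    "an open fridge": [0.1, 1.0, 0.2],
}


def best_caption(image_name):
    """Return the caption closest to the image in the shared space, plus all scores."""
    image_vec = project(IMAGE_FEATURES[image_name], IMAGE_PROJ)
    scores = {
        caption: cosine(image_vec, project(features, TEXT_PROJ))
        for caption, features in TEXT_FEATURES.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    for name in IMAGE_FEATURES:
        caption, _ = best_caption(name)
        print(name, "->", caption)
```

Because both modalities land in the same two-dimensional space, a single similarity measure can compare an image against any piece of text. In a real system the projections would be learned from large numbers of image-caption pairs rather than written by hand.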


On the audio side, ChatGPT's new voice synthesis feature enables spoken conversations with the AI. OpenAI describes it as a "new text-to-speech model," though text-to-speech technology has been available for some time. Users can activate the feature by opting in to voice conversations in the app's settings, then choose from five synthetic voices named "Juniper," "Sky," "Cove," "Ember," and "Breeze." OpenAI developed these voices in collaboration with professional voice actors.


OpenAI will continue to use Whisper, its open-source speech recognition system, to transcribe user speech input. Whisper has been integrated into the ChatGPT iOS app since its launch in May, and the Android app received the same integration in July.
