
Memeing AI

October 31, 2023

There is a famous episode of Star Trek: The Next Generation called “Darmok” in which Captain Picard is abducted by the captain of an alien species and brought to a barren planet inhabited by a Predator-like monster. The alien captain wants Picard to help him hunt the monster, while Picard just wants to make peace with his opposite number. The trouble is, he can’t understand what the alien captain is saying. The Universal Translator, that magic device in Star Trek that turns all alien language into English, seems to be translating the alien captain’s words correctly. Phrases the alien captain says, like “Darmok and Jalad at Tanagra” and “Shaka, when the walls fell”, are grammatically correct but cannot be understood by Picard.

Eventually Picard figures it out. Rather than using language to communicate concepts directly, the aliens use metaphor and allegory to communicate meaning. In the aliens’ culture, Darmok and Jalad were mythical heroes who met on the island of Tanagra, fought and defeated a monster together, and left as friends. Saying “Darmok and Jalad at Tanagra” made perfect sense to the alien captain. It was a peace offering. But to Picard it was gibberish.

The episode got considerable attention because we use memes, GIFs and emoji in a similar way today; we use metaphors instead of written text. We are increasingly receiving chat data that contains these types of images. We can review this data manually, but as we look to our AI-enhanced future, we ask: can AI handle this type of data? We have already seen Gen AI summarising documents and creating helpful suggestions for our reviewers. What can it do with memes, GIFs and emoji? Can it understand their meaning, or will it be confused like Picard? And will it be able to learn?

Memes and GIFs in eDiscovery

Memes and GIFs are not new. They have been used on social media and in informal chats for as long as the Internet has existed. GIFs are widespread, available natively in Teams, Slack, WhatsApp and most other chat platforms. They are a popular shorthand in conversation and are increasingly found in eDiscovery matters. Have a look at your own Teams chats and see how many GIFs there are. I’m guessing there is at least one GIF-happy person on your team.

We will cover emoji and AI at some other time. Plenty of articles have been written on how to handle this type of data, and ProSearch product manager Jessica Lee has discussed emoji in depth.

A meme, also known as an image macro, is an idea or behaviour that spreads within a culture, often carrying symbolic meaning and representing a particular phenomenon or theme. Memes are often humorous or sarcastic and are used instead of the written word; they can even comfortably replace entire conversations. The meaning is not direct, it is implied. While you may think memes are too informal for corporate use, these images are still found in eDiscovery datasets. And let’s not mention that memes seem to be the communication medium of choice for some tech billionaires.

There are even communities churning out memes for eDiscovery.


‘Disaster Girl’ meme



‘Mother Ignoring Kid Drowning In A Pool’ meme

Reviewing Images with AI

Collecting and reviewing memes and GIFs is not that different from reviewing any other documents. They are simply images in .JPG or .GIF format, and they are displayed and reviewed in Relativity just fine. What is significant is the effect they have on AI.

ChatGPT, Bard and ProSearch’s nascent Gen AI offering work on text only. Image-specific AI models are being developed by the major tech companies, such as Google’s Vision AI, Microsoft’s Azure AI Vision and Amazon Rekognition. They can identify faces, scenes and even expressions, all in an effort to classify and search for images more easily. Reverse image search from Google and TinEye has been around for years. ProSearch has developed PrivacySuite, an AI system that classifies Personally Identifiable Information and includes a computer vision component to recognise ID cards.
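
To give a flavour of what these services actually do, here is a minimal sketch using Google’s Cloud Vision client library to pull labels out of an image. The library and call are real; the filename meme.jpg is a placeholder, and it assumes the google-cloud-vision package is installed and credentials are configured.

```python
# Minimal sketch: label a meme image with Google Cloud Vision.
# Assumes GOOGLE_APPLICATION_CREDENTIALS points at a valid
# service-account key and google-cloud-vision is installed.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# "meme.jpg" is a placeholder for an exported chat image.
with open("meme.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Label detection returns generic tags ("person", "fire", "smile")
# with confidence scores - not the cultural meaning of the meme.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(f"{label.description}: {label.score:.2f}")
```

Run against the ‘Disaster Girl’ image, a service like this will happily tell you there is a girl, a house and a fire, which is exactly the gap this article is about.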

Tests of these systems on memes have had mixed results. The descriptions are predictably robotic and suffer from ‘hallucinations’, much like their chat counterparts. Below is the ‘Disaster Girl’ meme as described by one of the larger computer vision AI models, Astica. The description is robotic, surprisingly accurate in some parts but completely inaccurate in others, and it entirely misses the point of what this image is. Informal tests by others have also concluded that GPT is impressive but still has a long way to go.

https://www.astica.org/vision/describe/

More specific work on memes has been sparse. Facebook had a project called Rosetta that analysed more than a billion memes and GIFs posted on its social network. The project appears to have been essentially an advanced OCR system that identified text for the purpose of screening hate speech and other questionable material. That was in 2018, an eon ago in terms of AI development.
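
Rosetta itself was never released, but the core idea, extracting the overlay text from an image so it can be indexed or screened, can be sketched with an open-source OCR engine such as Tesseract. This is an illustration of the technique, not Facebook’s actual pipeline; meme.jpg is again a placeholder filename.

```python
# Sketch of the Rosetta idea using open-source OCR.
# Assumes the Tesseract engine and the pytesseract and
# Pillow packages are installed.
from PIL import Image
import pytesseract

# Extract the caption text burned into the meme image.
text = pytesseract.image_to_string(Image.open("meme.jpg"))

# The recovered text can then be searched or screened like
# any other document text - but the joke itself is lost.
print(text)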

More recently, a paper by Priyadarshini combined text extraction with emotion identification, and Sharma described MEMEX, a proof-of-concept system trained to recognise the meaning of a meme. The latter processes the image and associated documentation (such as the surrounding chat conversation) to generate meaning and context. Both studies were much more limited than Facebook’s work but point to a new area of AI research: multimodal AI. This is a type of artificial intelligence that can process and understand information from multiple sources, such as text, images, audio and video. Exactly what memes are. Multimodal systems should be able to make more accurate and informed decisions than systems that can only process information from a single source.

ChatGPT-4 is multimodal. A preview of its image processing is available through the Bing Chat bot. Below is Bing’s description of the second eDiscovery meme. Although it gets some details wrong, it does recognise it as a joke. This feature should be made available soon through the Copilot service. Informal tests have shown that it has promise in identifying humour. It’s not perfect, but it is improving.
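
For those who want to experiment, multimodal models of this kind are typically called by sending an image alongside a text prompt. The sketch below uses OpenAI’s Python client; the model name and image URL are assumptions and should be swapped for whatever vision-capable model and image you have access to.

```python
# Sketch: asking a multimodal model to explain a meme.
# Assumes the openai package (v1+) is installed and the
# OPENAI_API_KEY environment variable is set.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    # Model name is an assumption; use any vision-capable model.
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this meme. Is it a joke, and why is it funny?"},
            # Placeholder URL - point this at the actual image.
            {"type": "image_url",
             "image_url": {"url": "https://example.com/meme.jpg"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```

The interesting part is not the plumbing but the prompt: asking “why is it funny?” forces the model to reach for context and cultural reference, which is exactly where these systems still stumble.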

Summary

With the Gen AI buzz still at high fervour, it may be tempting to think that AI understanding memes and GIFs is only a matter of time. However, these data types require understanding multiple facets of the data: the context, the surrounding conversation, the text in the image, as well as the scene or movie from which the meme originates. A meme is sarcasm, humour and cultural reference all rolled into one. It’s the ultimate CAPTCHA. Multimodal AI has promise and can describe elements of the images accurately, but it misses the one thing that bots can’t have: human experience. The memes above are funny because they are relatable to the real world.

Captain Picard recognised the “Darmok and Jalad at Tanagra” phrase as a peace offering because he understood the significance of the mythological characters’ actions. Sure, he needed AI to explain to him who the characters were and what they did, but he inferred the rest. As with ChatGPT text summarisation, image AI descriptions could boost productivity by helping human reviewers understand the basic elements of an image, but it would still be up to the human reviewer to connect the final dots.


Damir Kahvedžić

Damir Kahvedžić is a technology expert specialising in providing clients with technical assistance in eDiscovery and Forensics cases. He has a PhD in Cybercrime and Digital Forensics Investigations from the Centre for Cybercrime Investigation in UCD and holds a first-class Honours B.Sc. in Computer Science. Experienced in the use of industry-leading software such as Relativity, EnCase, NUIX, Cellebrite, Clearwell and Brainspace, Damir is also a PRINCE2 and PECB ISO 21500 qualified project manager. Damir has published both academic and technical papers at several international conferences and journals, including the European Academy of Law, the Digital Forensic Research Workshop (DFRWS) and the Journal of Digital Forensics and Law, amongst others.