Today we share the story of a fictional dating site for dogs and use it to explore how open-source LLMs compare to other moderation solutions for visual content.
Below we’ll take you, the reader, on a journey as a fictional solopreneur who finds great success with Woofler, a dating app for dogs.
First, we’ll find out together that moderating user-generated content at scale is costly and painful.
Then, we’ll explore different solutions to automate moderation, including open-source LLMs.
After realising that off-the-shelf AI solutions are not enough on their own, we’ll discuss the kind of partner you’d need to solve moderation efficiently: one that can leverage great AI to bring down costs while also using expert humans to ensure consistent accuracy.
Imagine you’re a solopreneur grinding away at the next big social app. Inspired by a late night’s Disney+ binging of Lady and the Tramp (original and creepy live-action remake), you’ve made it your noble mission to help dogs find true love in the modern era. Your big idea is Woofler, the dog-only dating app with a revolutionary new algorithm to match dogs with their soulmates.
After months of stealth mode, you launch on the app store and, against all odds, soar to the top spot, buoyed by celebrity endorsements from Airbud and Marley and Me’s Marley. You bask in your success and start fielding calls from investors.
But suddenly you notice something troubling—a flurry of negative reviews from upset dogs who have been matched with cats, or worse, catfish.😱
This is your entry into the tricky and nuanced world of content moderation: you must keep your community safe and fit for its intended purpose by removing unsuitable content, while avoiding a system of arbitrary censorship that frustrates users and limits expression through inscrutable automated decisions or ambiguous guidelines.
Stepping back to the real world, this is the problem faced by any dating app or community that reaches a certain critical mass. Any functionality for sharing images or text will bring trolls who post obscene images or abusive comments.
Beyond intentional bad actors, moderation also helps keep a community on topic and free from things like low-quality clickbait or endless reposts.
To get rid of the negative reviews of Woofler, you work on the following set of guidelines:
To maintain the integrity and focus of our platform, the following types of images are strictly prohibited:
We welcome and encourage the following types of images:
Even in this simplified example, some of the tricky areas of moderation are already visible:
You hire a small team of two moderators to help you enforce the guidelines in Woofler, but customers keep complaining because harmful profiles take many hours to be removed and are never taken down on weekends.
As a solopreneur, you are still suffering from negative reviews while paying for two additional full-time salaries. You could consider outsourcing your moderation efforts, but quality could drop and your cost would increase if you wanted fast moderation including weekends.
This sort of repetitive task of classifying images is where we should hope AI can be of assistance — not least because, in the real world, inappropriate content can be a lot more unpleasant to look at than images of cats.
Let’s explore what your options are to leverage AI for visual moderation in Woofler.
For prohibited content like nudity and violence, there are many off-the-shelf API-based solutions like AWS Rekognition and Unitary Standard, but these are lightweight detection models that can only take part of the problem away.
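To get a feel for what these detection APIs offer, here is a minimal sketch of calling AWS Rekognition’s moderation-labels endpoint with boto3 (the file name and confidence threshold are illustrative, and it assumes AWS credentials are already configured):

```python
import boto3

# Ask Rekognition for moderation labels (nudity, violence, etc.) in an image.
# "profile_photo.jpg" is a placeholder for an uploaded Woofler photo.
rekognition = boto3.client("rekognition")

with open("profile_photo.jpg", "rb") as f:
    response = rekognition.detect_moderation_labels(
        Image={"Bytes": f.read()},
        MinConfidence=60,  # only return labels the model is at least 60% sure about
    )

for label in response["ModerationLabels"]:
    print(label["Name"], round(label["Confidence"], 1))
```

This is useful for catching the worst content, but nothing in that response tells you whether the photo contains a cat rather than a dog, or whether a cat printed on a jumper is acceptable under your particular policy.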
Given we have a policy made up of written rules that we want to interpret and apply, our challenge falls more into the speciality of Visual LLMs (Vision Large Language Models) such as GPT-4V, which combine the sophisticated reasoning ability of LLMs with the capacity for visual perception.
Beyond AI offerings from big AI companies, there is also a growing ecosystem of open-weight Visual LLMs: models whose trained parameters (weights) are published for anyone to download and run. We’ll start our journey there.
Let’s say you decide to try out an open-source model for your Woofler moderation problem. You may be lucky enough to be a technically savvy founder who can do this yourself, or you may need help from a friend who knows a lot about AI. Either way, you stumble across LLaVA-1.5 (a Visual LLM based on Meta’s Llama2 LLM) and try out the smallest, 7-billion-parameter version of it with this prompt:
You are an expert content moderator at Woofler, a social media platform for dogs.
Your job is to review user-generated images and ensure they comply with the
platform's content policy.
Here is the policy:
<policy>...</policy>
Please return a json in the following format:
{
"description": string, // A brief description of the image
"violates_policy": string // "yes" if the image violates the policy, "no" otherwise
"reason": string, // The reason the image violates the policy or not
"breeds": list[string], // A list of dog breeds found in the image
}
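In case you are curious what “trying it out” looks like in practice, here is a minimal sketch using the Hugging Face transformers port of LLaVA-1.5-7B (the image path is a placeholder, and you would paste the full prompt above into `moderation_prompt`):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The moderation prompt shown above, wrapped in LLaVA-1.5's chat format.
moderation_prompt = "You are an expert content moderator at Woofler, ..."
prompt = f"USER: <image>\n{moderation_prompt}\nASSISTANT:"

image = Image.open("uploaded_profile_photo.jpg")
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=300)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```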
You try it on the following image of two adorable pet friends, a dog and a cat, and to your dismay the model confidently informs you that the image contains two dogs, the breed of the second one being a pitbull:
{"description": "Two dogs, one brown and one gray, are sitting on the floor next to each other. The brown dog has its tongue out and is licking the gray dog's face.", "violates_policy": "no", "reason": "The image shows dogs", "breeds": ["pitbull"]}
This illustrates one of the most common issues with LLMs and Visual LLMs - their propensity to just make things up or “hallucinate”. For LLaVA this seems to manifest as the model being primed by the wording of the prompt to expect certain images, in this case dogs (and Pitbull the Rapper), and then ignoring obvious things present in the image.
For this particular issue of mixing up cats and dogs, we can try switching to the newer LLaVA-NeXT-8B, which is based on the more powerful Llama3 LLM.
Fortunately, this second model can tell cats and dogs apart:
{"description": "A dog and a cat sitting next to each other on the floor, with a bowl between them.", "violates_policy": "yes", "reason": "The image features a cat, which is not allowed unless it is on apparel worn by the dog.", "breeds": ["Golden Retriever", "Domestic Shorthair"]}
LLaVA-NeXT also shows promise for identifying other content that violates or is allowed by the guidelines, such as catfish:
However, it still makes some bizarre errors, such as mistaking a hairless cat for a catfish, or making up a non-existent rule about dogs in human clothing. It also fails to acknowledge the cultural relevance of Pitbull the rapper.
After spending a couple of days fighting with open-source models, you’ve reached a point where you can automatically moderate some of the images, but the model still makes many mistakes, so you feel uncomfortable automating decisions.
You take a walk to think about what it all means, something you learnt from your users.
You realise you’re going to have to put the decision off for a few weeks: you’ve got a board meeting next week, an interview with TechCrunch tomorrow and some other urgent matters. But before you switch focus, you write up your learnings in a memo that you can come back to.
You’ve learnt that open-source VLMs get so close to the job of a human in so many cases - but crucially not consistently, and it’s consistency you need to build trust and keep dogs coming back to Woofler.
You could combine your VLM with your human moderators, but designing, running and continually tuning that hybrid pipeline is a product in itself, and you already have a dating app to build.
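To see the shape of that work, here is a sketch of one way such a hybrid pipeline could route images between the model and your moderators (the routing rules and audit rate are purely illustrative, not anyone’s production logic):

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelDecision:
    violates_policy: Optional[bool]  # None when the VLM's output couldn't be parsed
    reason: str

AUDIT_RATE = 0.05  # illustrative: fraction of auto-approvals spot-checked by humans

def route(decision: ModelDecision) -> str:
    """Decide whether the VLM acts alone or a human moderator gets involved."""
    if decision.violates_policy is None:
        return "human_review"  # the model didn't follow the format: never guess
    if decision.violates_policy:
        return "human_review"  # a human confirms before anything is removed
    if random.random() < AUDIT_RATE:
        return "human_review"  # continuously audit the model's approvals
    return "approve"
```

Getting the routing, the moderator tooling and the feedback loop right is where the real effort lies.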
At Unitary, we do exactly that. We build blended teams of AI agents and expert human moderators to replace your BPO, delivering better accuracy, faster response times, and scalable, cost-effective solutions. Get in touch to see Unitary in action!