Today we share the story of a fictional dating site for dogs and use it to explore how open-source LLMs compare to other moderation solutions for visual content.
Below we’ll take you, the reader, on a journey as a fictional solopreneur who finds great success with Woofler, a dating app for dogs.
First, we’ll find out together that moderating user-generated content at scale is costly and painful.
Then, we’ll explore different solutions to automate moderation, including open-source LLMs.
After realising that off-the-shelf AI solutions are not enough on their own, we’ll discuss the kind of partner you’d need to solve moderation efficiently: one that can leverage great AI to bring down costs while also using expert humans to ensure consistent accuracy.
Imagine you’re a solopreneur grinding away at the next big social app. Inspired by a late night’s Disney+ binging of Lady and the Tramp (original and creepy live-action remake), you’ve made it your noble mission to help dogs find true love in the modern era. Your big idea is Woofler, the dog-only dating app with a revolutionary new algorithm to match dogs with their soulmates.
After months of stealth mode, you launch on the app store and, against all odds, soar to the top spot, buoyed by celebrity endorsements from Airbud and Marley and Me’s Marley. You bask in your success and start fielding calls from investors.
But suddenly you notice something troubling—a flurry of negative reviews from upset dogs who have been matched with cats, or worse, catfish.😱
This is your entry into the tricky and nuanced world of content moderation: you must keep your community safe and fit for its intended purpose by removing unsuitable content, while avoiding a system of arbitrary censorship that frustrates users and limits expression through inscrutable automated decisions or ambiguous guidelines.
Stepping back to the real world, this is the problem faced by any dating app or community that reaches a certain critical mass. Any functionality for sharing images or text will bring trolls who post obscene images or abusive comments.
Beyond intentional bad actors, moderation also helps keep a community on topic and free from things like low-quality clickbait or endless reposts.
To get rid of the negative reviews of Woofler, you work on the following set of guidelines:
To maintain the integrity and focus of our platform, the following types of images are strictly prohibited:
We welcome and encourage the following types of images:
Even in this simplified example, some of the tricky areas of moderation are already visible:
You hire a small team of two moderators to help you enforce the guidelines in Woofler, but customers keep complaining because harmful profiles take many hours to be removed and are never taken down on weekends.
As a solopreneur, you are still suffering from negative reviews while paying for two additional full-time salaries. You could consider outsourcing your moderation efforts, but quality could drop and your cost would increase if you wanted fast moderation including weekends.
This sort of repetitive task of classifying images is where we should hope AI can be of assistance — not least because, in the real world, inappropriate content can be a lot more unpleasant to look at than images of cats.
Let’s explore what your options are to leverage AI for visual moderation in Woofler.
For prohibited content like nudity and violence, there are many off-the-shelf API-based solutions like AWS Rekognition and Unitary Standard, but these are lightweight detection models that can only take part of the problem away.
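To get a feel for what these detection APIs offer, here is a minimal sketch of calling AWS Rekognition’s moderation-labels endpoint with boto3 (the file name and confidence threshold are illustrative, and it assumes AWS credentials are already configured):

```python
import boto3

# Ask Rekognition for moderation labels (nudity, violence, etc.) in an image.
# "profile_photo.jpg" is a placeholder for an uploaded Woofler photo.
rekognition = boto3.client("rekognition")

with open("profile_photo.jpg", "rb") as f:
    response = rekognition.detect_moderation_labels(
        Image={"Bytes": f.read()},
        MinConfidence=60,  # only return labels the model is at least 60% sure about
    )

for label in response["ModerationLabels"]:
    print(label["Name"], round(label["Confidence"], 1))
```

This is useful for catching the worst content, but nothing in that response tells you whether the photo contains a cat rather than a dog, or whether a cat printed on a jumper is acceptable under your particular policy.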
Given we have a policy made up of written rules that we want to interpret and apply, our challenge falls more into the speciality of Visual LLMs (Vision Large Language Models) such as GPT-4V, which combine the sophisticated reasoning ability of LLMs with the capacity for visual perception.
Beyond AI offerings from big AI companies, there is also a growing ecosystem of open-weight Visual LLMs: models whose trained parameters (weights) are published for anyone to download and run. We’ll start our journey there.
Let’s say you decide to try out an open-source model for your Woofler moderation problem. You may be lucky enough to be a technically savvy founder who can do this yourself, or you may need help from a friend who knows a lot about AI. Either way, you stumble across LLaVA-1.5 (a Visual LLM based on Meta’s Llama2 LLM) and try out the smallest, 7-billion-parameter version of it with this prompt:
You are an expert content moderator at Woofler, a social media platform for dogs.
Your job is to review user-generated images and ensure they comply with the
platform's content policy.
Here is the policy:
<policy>...</policy>
Please return a json in the following format:
{
"description": string, // A brief description of the image
"violates_policy": string // "yes" if the image violates the policy, "no" otherwise
"reason": string, // The reason the image violates the policy or not
"breeds": list[string], // A list of dog breeds found in the image
}
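In case you are curious what “trying it out” looks like in practice, here is a minimal sketch using the Hugging Face transformers port of LLaVA-1.5-7B (the image path is a placeholder, and you would paste the full prompt above into `moderation_prompt`):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The moderation prompt shown above, wrapped in LLaVA-1.5's chat format.
moderation_prompt = "You are an expert content moderator at Woofler, ..."
prompt = f"USER: <image>\n{moderation_prompt}\nASSISTANT:"

image = Image.open("uploaded_profile_photo.jpg")
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=300)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```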
You try it on the following image of two adorable pet friends, a dog and a cat, and to your dismay the model confidently informs you that the image contains two dogs, the breed of the second one being a pitbull:
{"description": "Two dogs, one brown and one gray, are sitting on the floor next to each other. The brown dog has its tongue out and is licking the gray dog's face.", "violates_policy": "no", "reason": "The image shows dogs", "breeds": ["pitbull"]}
This illustrates one of the most common issues with LLMs and Visual LLMs - their propensity to just make things up or “hallucinate”. For LLaVA this seems to manifest as the model being primed by the wording of the prompt to expect certain images, in this case dogs (and Pitbull the Rapper), and then ignoring obvious things present in the image.
For this particular issue of mixing up cats and dogs, we can try switching to the newer LLaVA-NeXT-8B, which is based on the more powerful Llama3 LLM.
Fortunately, this second model can tell cats and dogs apart:
{"description": "A dog and a cat sitting next to each other on the floor, with a bowl between them.", "violates_policy": "yes", "reason": "The image features a cat, which is not allowed unless it is on apparel worn by the dog.", "breeds": ["Golden Retriever", "Domestic Shorthair"]}
LLaVA-NeXT also shows promise for identifying other content that violates or is allowed by the guidelines, such as catfish:
However, it still makes some bizarre errors, such as mistaking a hairless cat for a catfish, or making up a non-existent rule about dogs in human clothing. It also fails to acknowledge the cultural relevance of Pitbull the rapper.
After spending a couple of days fighting with open-source models, you’ve reached a point where you can automatically moderate some of the images, but the model still makes many mistakes, so you feel uncomfortable automating decisions.
You take a walk to think about what it all means, something you learnt from your users.
You realise you’re going to have to put the decision off for a few weeks: you’ve got a board meeting next week, an interview with TechCrunch tomorrow and some other urgent matters. But before you switch focus, you write up your learnings in a memo that you can come back to.
You’ve learnt that open-source VLMs get so close to the job of a human in so many cases - but crucially not consistently, and it’s consistency you need to build trust and keep dogs coming back to Woofler.
You could combine your VLM with your human moderators, but designing, running and continually tuning that hybrid pipeline is a product in itself, and you already have a dating app to build.
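To see the shape of that work, here is a sketch of one way such a hybrid pipeline could route images between the model and your moderators (the routing rules and audit rate are purely illustrative, not anyone’s production logic):

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelDecision:
    violates_policy: Optional[bool]  # None when the VLM's output couldn't be parsed
    reason: str

AUDIT_RATE = 0.05  # illustrative: fraction of auto-approvals spot-checked by humans

def route(decision: ModelDecision) -> str:
    """Decide whether the VLM acts alone or a human moderator gets involved."""
    if decision.violates_policy is None:
        return "human_review"  # the model didn't follow the format: never guess
    if decision.violates_policy:
        return "human_review"  # a human confirms before anything is removed
    if random.random() < AUDIT_RATE:
        return "human_review"  # continuously audit the model's approvals
    return "approve"
```

Getting the routing, the moderator tooling and the feedback loop right is where the real effort lies.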
At Unitary, we do exactly that. We build blended teams of AI agents and expert human moderators to replace your BPO, delivering better accuracy, faster response times, and scalable, cost-effective solutions. Get in touch to see Unitary in action!