Advancements in AI, specifically multimodal algorithms, present a new avenue for content moderation teams and brand safety experts. These algorithms process all of the modalities of a piece of content together, which means they can understand context, accurately and at scale.
Detecting harmful content in text or images alone is (conceptually) straightforward. However, as the internet continues to evolve and content becomes more complex, the same techniques no longer work – particularly when several modalities are involved at once.
Consider a typical meme. There are two elements – the image and a caption – either of which could contain defamatory, illegal or otherwise damaging content. A traditional content detection algorithm may be able to identify problems with the image or the caption and flag the asset based on that specific element alone.
Although relatively effective, this approach has several problems. First, two algorithms need to be employed to process the two elements of the meme – one for the image and a second for the caption. This means processing the asset twice, increasing resource usage and cost as well as the actual time taken to approve it.
Second, the algorithms operate independently, so neither can make any assessment about context. Context is crucial to classifying content correctly, preventing both false positives (benign content flagged or blocked by mistake) and false negatives (harmful content slipping through). False positives reduce user trust in your system – and your brand – so you want to steer away from them as much as possible. At the same time, a lack of context can let genuinely harmful content through, compromising your brand’s reputation. A meme whose image and caption are each innocuous in isolation but offensive in combination is exactly the case that two independent algorithms will miss.
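To make the difference concrete, here is a minimal Python sketch of the two architectures. The classifiers are hypothetical stubs returning dummy scores – the point is the structure: two independent, per-element passes versus a single joint pass over the image and caption together.

```python
from dataclasses import dataclass

@dataclass
class Meme:
    image_bytes: bytes
    caption: str

# --- Traditional pipeline: two independent, per-element passes ---

def image_classifier(image_bytes: bytes) -> float:
    """Stand-in for an image-only model; returns a harm score in [0, 1]."""
    return 0.10  # dummy score for illustration

def text_classifier(caption: str) -> float:
    """Stand-in for a text-only model; returns a harm score in [0, 1]."""
    return 0.15  # dummy score for illustration

def unimodal_moderate(meme: Meme, threshold: float = 0.5) -> bool:
    # Each element is judged in isolation, so a benign-looking image paired
    # with a benign-looking caption passes even if the combination is harmful.
    return max(image_classifier(meme.image_bytes),
               text_classifier(meme.caption)) >= threshold

# --- Multimodal pipeline: one joint pass over both elements ---

def multimodal_classifier(image_bytes: bytes, caption: str) -> float:
    """Stand-in for a joint image+text model that scores the combination."""
    return 0.82  # dummy score: the *pairing* is what makes it harmful

def multimodal_moderate(meme: Meme, threshold: float = 0.5) -> bool:
    return multimodal_classifier(meme.image_bytes, meme.caption) >= threshold

meme = Meme(image_bytes=b"...", caption="An innocuous-sounding caption")
print("unimodal flags it:  ", unimodal_moderate(meme))   # False – context missed
print("multimodal flags it:", multimodal_moderate(meme)) # True  – context captured
```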
The reality is that content now combines multiple modalities simultaneously. A user-generated video may include image-based content, on-screen captions, audio, text and subtitles – and all of them need to align with your brand safety guidelines or community rules.
For very low-traffic websites it may be possible to perform these multiple layers of analysis manually. For everyone else, keeping up with demand is virtually impossible, particularly as the volume of submitted content continues to grow rapidly. Today, solutions that can moderate different types of complex content at scale are needed more than ever.
Artificial Intelligence powered by multimodal algorithms offers a viable alternative. New tools are emerging that can process large volumes of content at speed, identifying and blocking brand-damaging content far faster (and more accurately) than human moderators. Only where the algorithm cannot make a definitive classification do human moderators need to get involved – and even then, the review queue is considerably shorter as a result of pre-filtering.
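One common way to implement that hand-off is a confidence band: the model's harm score decides the clear-cut cases automatically, and only the ambiguous middle band lands in the human queue. Here is a minimal sketch, assuming a single harm score between 0 and 1 and purely illustrative thresholds:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class ModerationResult:
    label: Literal["approve", "block", "human_review"]
    confidence: float

def route(harm_score: float, low: float = 0.2, high: float = 0.8) -> ModerationResult:
    """Auto-approve clearly safe content, auto-block clearly harmful content,
    and queue only the ambiguous middle band for human moderators."""
    confidence = abs(harm_score - 0.5) * 2  # distance from the undecided midpoint
    if harm_score <= low:
        return ModerationResult("approve", confidence)
    if harm_score >= high:
        return ModerationResult("block", confidence)
    return ModerationResult("human_review", confidence)

# Example: scores produced by a (hypothetical) multimodal classifier.
for score in (0.05, 0.45, 0.93):
    print(score, "->", route(score).label)
# 0.05 -> approve, 0.45 -> human_review, 0.93 -> block
```

The thresholds control the trade-off: widening the band sends more items to humans (fewer automated mistakes), while narrowing it automates more decisions (a smaller review queue).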
There are secondary benefits to this approach too. The cost of reviewing and approving content can be reduced or contained because there is no need to grow the moderation team. Similarly, moderators themselves are exposed to far less harmful content, thus helping to reduce the psychological impact of their valuable (but risky) work. Fewer people being exposed to less risk is a win-win for the business.
Ultimately, both of these factors, combined with the improved functionality of multimodal algorithms, will strengthen brand safety and reputation.
Read more about what multimodal algorithms really are and how they can be employed to improve content classification.