Digital safety: how a multi-faceted approach can help tackle real-world harm
An article by Akash Pugalia, Global President, Trust & Safety, Teleperformance and Farah Lalani, Global Vice President, Trust and Safety Policy, Teleperformance
The severity and volume of harmful content is increasing at an alarming rate.
Addressing these issues requires a multi-faceted approach across the industry.
We outline three key ways to help tackle real-world harm collaboratively.
User-generated content has democratized content creation and distribution, especially with the relatively recent rise of short- and long-form video applications. From cooking videos and funny dances to travel vlogs, the volume and diversity of content available have grown significantly.
Now, with generative artificial intelligence (AI) tools available to generate or alter images, videos, audio, and text, this phenomenon has been supercharged. While these advances have been beneficial in a number of ways, they have also increased the need to focus on tackling harmful content and to find better ways of effectively addressing current and emerging harms.
With a number of harms, including sextortion, cyberbullying, and animal cruelty, increasing in severity and volume, this is of significant importance. Addressing this requires a multi-faceted approach:
1. Content policies and effective enforcement
Platforms have been working to address a range of harms, setting necessary policies for acceptable use, or “community guidelines”, that are then enforced through a combination of AI and human moderation. This approach has proven successful: a new empirical study assessing the potential impact of the Digital Services Act found moderation to be promising in reducing harm – the extent to which this is the case depends on the time taken for removal, the methods used for reporting (e.g. trusted flaggers), and how quickly content spreads on the platform.
Many platforms have therefore been working to improve their proactive enforcement of policies through the use of machine learning models and AI classifiers, aiming to identify and then reduce the visibility of harmful content before it is even reported. While the use of Large Language Models (LLMs) presents new opportunities to leverage automation, trained moderators still outperform LLMs in many areas, for example those requiring more linguistic or cultural context, such as hate and harassment; according to one data set, there is still a performance gap of over 20% in this category.
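To make the proactive-enforcement pattern concrete, the sketch below shows one way a platform might route content based on classifier confidence: high-confidence violations are removed, mid-confidence content has its visibility reduced and is escalated to a trained moderator. The thresholds, policy labels, and `score_content` stub are hypothetical and stand in for a real model.

```python
# A minimal sketch of proactive enforcement routing, assuming a hypothetical
# classifier that returns per-policy confidence scores for a piece of content.
# Thresholds, policy labels, and the score_content stub are illustrative only.

from typing import Dict

REMOVE_THRESHOLD = 0.95  # high confidence: remove automatically
REVIEW_THRESHOLD = 0.70  # mid confidence: limit reach and queue for human review


def score_content(text: str) -> Dict[str, float]:
    """Stand-in for a trained classifier or LLM-based scorer."""
    # In practice this would call a machine learning model; here we return
    # fixed example scores so the sketch is runnable end to end.
    return {"hate_and_harassment": 0.82, "spam": 0.10}


def route_content(text: str) -> str:
    scores = score_content(text)
    top_policy, top_score = max(scores.items(), key=lambda kv: kv[1])
    if top_score >= REMOVE_THRESHOLD:
        return f"remove ({top_policy})"
    if top_score >= REVIEW_THRESHOLD:
        # Reduce distribution before a report arrives and escalate to a
        # trained moderator, who still outperforms automation in nuanced areas.
        return f"reduce visibility + human review ({top_policy})"
    return "allow"


if __name__ == "__main__":
    print(route_content("example user post"))
    # -> reduce visibility + human review (hate_and_harassment)
```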
While this approach continues to evolve, focus should be placed not just on content, but also on automated content creators. Sameer Hinduja, a criminology professor, points out that “hate raids have been previously facilitated through thousands of bot accounts, and it is reasonable to presume this generative AI may speed up, broaden, and render more effective such pointed attacks in channels, comment threads, message boards, and other environments.”
Many platforms have also been updating/adding detail to existing policies to cover new harms or threat vectors. For example, when it comes to AI-generated content, a number of platforms including Google, TikTok, and Twitch have been augmenting their policies in this area and others will likely follow suit in the near future.
Much of the user experience when it comes to safety can be attributed to the way that these policies are defined and operationalized: for example, whether policies are available in all languages for markets where the platform is active, whether policies are applied consistently across all users, the length of time it takes to action user reports, whether there are quick and effective mechanisms of redress for people who disagree with content moderation decisions, and other such considerations. Platforms must be nimble and modify or augment their policies frequently given the velocity of change we have seen in the field in such a short timespan.
2. Signals and enablers beyond content
Beyond looking at the content itself, it is important to triangulate this with other signals in order to tackle harms at a deeper level.
One of these signals is monetary flows. Creators of harmful content often have financial motivations, and those consuming this content have both the means and the mechanisms to pay for it, whether through direct payments or through attempts to monetize content on ad-driven platforms.
When it comes to payment mechanisms, while the use of credit cards to purchase illegal content has been hampered by the efforts of financial institutions, bad actors have found other payment mechanisms to use, including cryptocurrency. Those demanding animal harm content have used mobile payment mechanisms that allow instant payments to creators or distributors via a phone number or email address. Tracking and disrupting these monetary incentives through improved brand safety standards and review of financial transactions can help.
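As a rough illustration of what a transaction-level review might look for, the sketch below flags instant peer-to-peer or cryptocurrency payments sent to recipients already associated with harmful content. The payment fields, rules, and `flagged_recipients` set are hypothetical and not drawn from any specific payment provider.

```python
# Minimal sketch of transaction-level signal review, assuming a hypothetical
# feed of payments and a list of recipients previously linked to harmful
# content. Field names and rules are illustrative only.

from dataclasses import dataclass


@dataclass
class Payment:
    sender: str
    recipient: str
    amount: float
    method: str  # e.g. "card", "mobile_p2p", "crypto"


# Hypothetical set of recipients previously actioned for policy violations.
flagged_recipients = {"seller_123", "seller_456"}


def needs_review(p: Payment) -> bool:
    # Instant phone/email-based payments or crypto to a flagged recipient
    # are routed for closer review rather than blocked outright.
    return p.recipient in flagged_recipients and p.method in {"mobile_p2p", "crypto"}


payments = [
    Payment("buyer_a", "seller_123", 25.0, "mobile_p2p"),
    Payment("buyer_b", "shop_789", 40.0, "card"),
]
print([p.recipient for p in payments if needs_review(p)])  # ['seller_123']
```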
Leveraging user- and behaviour-level signals is another important avenue. Platforms can examine account popularity, track signals that activity is moving from public to private groups within the platform, and detect the creation of duplicate accounts set up to circumvent suspensions or bans.
Hinduja also points to “the importance of algorithmically identifying red flags in profile creation such as automatically-generated usernames (e.g. random alphanumeric strings), profile pictures or bio sketch information misappropriated from other authentic accounts, syntactical errors in ‘conversational speech’, anomalous or erratic behavioural signals related to time of day of participation as well as frequency and repetition of post content, and a questionable social network graph (e.g. connected to other proven bot accounts, suspicious ratio of followers/followed accounts and pace at which they were acquired).” Account authentication and verification mechanisms could help address some of these evasion tactics.
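The kinds of red flags Hinduja describes lend themselves to simple heuristic scoring before heavier verification steps are triggered. The sketch below is a minimal illustration of that idea; the weights, thresholds, and profile fields are hypothetical and would need tuning against real data.

```python
# A minimal sketch of heuristic red-flag scoring for suspected bot accounts,
# loosely following the signals described above. Weights, thresholds, and
# profile fields are hypothetical.

import math
import re
from collections import Counter


def looks_autogenerated(username: str) -> bool:
    """Crude check for random alphanumeric usernames: long, mixed, high-entropy."""
    if not re.fullmatch(r"[A-Za-z0-9_]{10,}", username):
        return False
    counts = Counter(username)
    entropy = -sum((c / len(username)) * math.log2(c / len(username)) for c in counts.values())
    return entropy > 3.0 and any(ch.isdigit() for ch in username)


def bot_risk_score(profile: dict) -> float:
    score = 0.0
    if looks_autogenerated(profile["username"]):
        score += 0.4
    # Suspicious follower/following ratio, as in accounts that mass-follow others.
    if profile["following"] > 0 and profile["followers"] / profile["following"] < 0.05:
        score += 0.3
    # Highly repetitive posting at an anomalous pace.
    if profile["posts_per_hour"] > 20:
        score += 0.3
    return min(score, 1.0)


profile = {"username": "x9q2k7mz41", "followers": 3, "following": 800, "posts_per_hour": 45}
print(bot_risk_score(profile))  # 1.0 -> candidate for an authentication/verification challenge
```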
3. Multi-stakeholder collaboration
Initiatives that aim to tackle these harms at an industry level will become increasingly important given that bad actors move across platforms to propagate harm. "For nearly a decade dozens of companies have been sharing PhotoDNA, a technology that helps detect existing child sexual abuse material (CSAM). That set the standard for the cross-industry cooperation that has become essential," said Anne Collier, executive director of The Net Safety Collaborative and safety adviser to social media platforms.
Initiatives like the Tech Coalition in the area of child safety, or the Global Internet Forum to Counter Terrorism (GIFCT) in tackling terrorist and violent extremist content (TVEC), aim to drive knowledge sharing and collaboration. On top of these coalitions that focus on specific harm types, groups like the Digital Trust and Safety Partnership are working to develop solutions across all policy areas.
Whether it is a typology of harms that can be leveraged by the industry, a hash-sharing database, or intelligence on bad actor trends across harm types, these efforts are incredibly important in bringing the industry together to address common challenges with common solutions.
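To illustrate the hash-sharing idea at its simplest, the sketch below checks uploads against an industry-shared list of hashes of known harmful material. Real systems such as PhotoDNA use perceptual hashing that is robust to resizing and re-encoding; for simplicity this illustration uses exact cryptographic hashes, and the shared list shown here is hypothetical.

```python
# A minimal sketch of matching content against an industry-shared hash list.
# Exact SHA-256 hashing is used here for simplicity; production systems rely
# on perceptual hashing to catch modified copies. The list is hypothetical.

import hashlib


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical hash list distributed through a cross-industry programme.
shared_hash_list = {
    content_hash(b"known-harmful-example-1"),
    content_hash(b"known-harmful-example-2"),
}


def matches_shared_list(uploaded: bytes) -> bool:
    """True if the upload exactly matches previously identified material."""
    return content_hash(uploaded) in shared_hash_list


print(matches_shared_list(b"known-harmful-example-1"))  # True
print(matches_shared_list(b"new-unrelated-upload"))     # False
```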
Teleperformance’s participation in the World Economic Forum’s Global Coalition for Digital Safety, its membership of the Trust & Safety Professional Association, and the launch of its Trust & Safety Advisory Council demonstrate an understanding of the need to look across borders and policy areas, as well as industry, when tackling harm online. Overall, companies will need to move beyond just a content lens to look at societal trends, changes in user behaviour, and technology advances in order to address the speed and scale of new threats at a more systemic level.