AI training data labeling. Revenue collapsed.
Appen operated the world's largest human data annotation workforce — over 1 million contractors labeling images, text, and audio to train AI models. Ironically, the AI models they helped build became capable of generating their own training data. RLHF (Reinforcement Learning from Human Feedback) automation and synthetic data generation destroyed demand for manual labeling. Revenue fell from $461M to under $100M. The stock collapsed 96% from its peak.
AI companies moved to synthetic data generation and automated RLHF pipelines, eliminating the need for Appen's army of human data labelers.
Peak: $461M revenue, 1M+ crowd workers
Major AI labs begin synthetic data experiments
Revenue begins declining, contract sizes shrink
Mass layoffs, revenue under $200M
Stock -96%, emergency restructuring
Effectively a zombie company
Delisted from ASX, remaining contracts wound down
Replace manual data labeling with LLM-powered annotation pipelines. Use frontier models to generate synthetic training data, auto-label datasets, and evaluate model outputs — reducing annotation costs by 90%+.
Define your annotation schema (categories, entity types, scoring rubrics)
Write a detailed system prompt with labeling instructions + 5-10 few-shot examples
Process your dataset through the Claude or GPT-4 API in batches
Route low-confidence outputs to human reviewers in Label Studio
Calculate inter-annotator agreement between the AI labels and a human-labeled gold set
Iterate on the prompt until agreement exceeds your quality threshold (typically 90%+)
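The six steps above can be sketched as a small pipeline. This is a minimal illustration, not a production system: `label_with_llm` is a keyword stub standing in for the real Claude or GPT-4 API call, and agreement is measured as simple percent match against the gold set (a real evaluation might use Cohen's kappa instead).

```python
def label_with_llm(text: str) -> tuple[str, float]:
    """Stand-in for the real API call (Anthropic or OpenAI client).
    Returns (label, confidence); here a keyword stub for illustration."""
    if "love" in text or "great" in text:
        return "POSITIVE", 0.95
    if "hate" in text or "broken" in text:
        return "NEGATIVE", 0.90
    return "NEUTRAL", 0.55  # low confidence

def run_pipeline(texts, confidence_threshold=0.8):
    """Auto-label confident cases; queue the rest for human review."""
    auto_labels, review_queue = {}, []
    for t in texts:
        label, conf = label_with_llm(t)
        if conf >= confidence_threshold:
            auto_labels[t] = label
        else:
            review_queue.append(t)  # route to Label Studio reviewers
    return auto_labels, review_queue

def agreement(ai_labels: dict, gold: dict) -> float:
    """Percent agreement between AI labels and the human gold set."""
    shared = [t for t in gold if t in ai_labels]
    if not shared:
        return 0.0
    return sum(ai_labels[t] == gold[t] for t in shared) / len(shared)
```

If `agreement` comes back below your threshold, revise the prompt or few-shot examples and re-run the gold set before labeling the full dataset.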
You are a data annotator. Classify the following text into exactly one category: [POSITIVE, NEGATIVE, NEUTRAL]. Respond with only the label. Text: "{{text}}" Label:
Generate 20 diverse examples of customer support conversations about [topic]. Each should include: a realistic customer message, the ideal agent response, and a sentiment label. Vary the tone, complexity, and customer frustration level. Output as JSON.
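Synthetic-data output should be validated before it enters a training set: models sometimes drop fields or emit out-of-schema values. A sketch of that validation step, assuming the JSON records use the field names `customer_message`, `agent_response`, and `sentiment` (the exact keys depend on how you phrase the prompt):

```python
import json

REQUIRED = {"customer_message", "agent_response", "sentiment"}  # assumed schema
SENTIMENTS = {"POSITIVE", "NEGATIVE", "NEUTRAL"}

def parse_synthetic_batch(raw: str):
    """Split a model's JSON output into schema-valid and rejected records."""
    records = json.loads(raw)
    valid, rejected = [], []
    for r in records:
        if isinstance(r, dict) and REQUIRED <= r.keys() and r["sentiment"] in SENTIMENTS:
            valid.append(r)
        else:
            rejected.append(r)  # log and optionally re-generate
    return valid, rejected
```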
Given the following prompt and two model responses, determine which response is better. Consider helpfulness, accuracy, safety, and conciseness. Prompt: {{prompt}} Response A: {{response_a}} Response B: {{response_b}} Better response (A or B): Reasoning:
For RLHF: generate candidate responses from two model variants, then use an AI judge to label which response in each pair is preferred
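Wiring that together, a preference-pair builder might look like the sketch below. All three `*_fn` arguments are stand-ins for real model API calls; the output uses the `{"prompt", "chosen", "rejected"}` shape common in RLHF/DPO training data, which is an assumption about your downstream trainer.

```python
def build_preference_pairs(prompts, model_a_fn, model_b_fn, judge_fn):
    """Create (chosen, rejected) pairs from two model variants, with an
    AI judge (returning "A" or "B") picking the winner for each prompt."""
    pairs = []
    for p in prompts:
        ra, rb = model_a_fn(p), model_b_fn(p)
        verdict = judge_fn(p, ra, rb)
        if verdict == "A":
            pairs.append({"prompt": p, "chosen": ra, "rejected": rb})
        elif verdict == "B":
            pairs.append({"prompt": p, "chosen": rb, "rejected": ra})
        # ties / unparseable verdicts are dropped rather than guessed
    return pairs
```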