Here’s a summary + some thoughts about the “AI training companies raising billions” story from Business Insider. (If you prefer a deep dive or a focus on particular companies/regions, I can do that too.)
Key Highlights from the Article
The article “11 AI Training Companies Raising Billions in Chatbot Boom” describes how startups that recruit and manage human workers to train AI models are experiencing explosive growth. Business Insider
Here are some of the main points & companies mentioned:
Context & Trend
- Rather than fully replacing humans, many AI systems require human-in-the-loop training (e.g. annotating data, rewriting responses, validating outputs) — and there’s surging demand for that. Business Insider
- Workers in these “training” or “annotation” roles may command high hourly rates (some up to $100/hr or more, depending on specialization). Business Insider
- The valuations for these companies are ballooning — there’s venture capital interest in scaling human-AI collaboration. Business Insider
Select Companies & Their Roles
Here are some of the players the article highlights:
| Company | Focus / Role | Notable Metrics / Claims |
|---|---|---|
| Scale AI | One of the most well-known in AI training / labeling | Over 300,000 gig workers; Meta bought 49% for $14.3B; currently unprofitable (as of reporting) Business Insider |
| Surge AI | Data annotation, connecting human experts to AI work | Claims $1.2B revenue in a year without VC funding; some workers earn >$200/hr Business Insider |
| Mercor | Matches human trainers to AI tasks using AI interviews | Pays contractors ~$95/hr on average; in talks to raise at $10B+ valuation Business Insider |
| Micro1 | Human-data / interview-based matching for AI training | Raised Series A; $500M valuation; clients include Microsoft etc. Business Insider |
| Labelbox | Data labeling platform + gig work site “Alignerr” | Offers rates $20–$120/hr depending on tasks Business Insider |
| Snorkel AI | Dataset creation & validation (with human verification) | Raised Series D; works with big AI labs Business Insider |
| Appen | One of the established players in data annotation | Operates in many countries, long-established relationships with tech companies Business Insider |
| Invisible Technologies | A human marketplace for tasks & training | Claims to have helped OpenAI train early versions; raised funds recently Business Insider |
| Turing, Handshake, GlobalLogic | Each has roles in supplying talent (e.g. engineers, recruiters) to AI labs | For instance, Turing connects software engineers to AI labs Business Insider |
Top 5 Promising AI Training / Human-in-the-Loop Companies (Global)
These are firms either directly in the human-data / annotation / AI training space, or enabling related infrastructure, that are particularly notable in 2025:
| Company | What They Do / Strength | Recent Highlights / Risks |
|---|---|---|
| Scale AI | One of the leading data-labeling / human-in-the-loop platforms. | Meta bought ~49% stake; handles huge volume of labeling work. (From the Business Insider list) |
| Surge AI | Connects expert humans to AI training tasks (annotation, rewriting, evaluation) | Claims ~$1.2B revenue, supports high hourly rates for specialized workers. |
| Mercor | Matches human trainers to AI tasks with screening & matching systems | Pays ~$95/hr average; aiming for high valuation. |
| Labelbox | Platform + marketplace for data labeling & training data management | Supports many annotation modalities (text, image, audio). |
| Snorkel AI | Focus on “data programming,” generating training data + human validation | Recently underwent layoffs (~13%) as it pivots business model. Business Insider |
Also worth calling out:
- Nous Research: A decentralized AI / training infrastructure startup that raised a $50M Series A in 2025, valuing it around $1B. Wikipedia
- Thinking Machines Lab: Founded in 2025 by Mira Murati (former OpenAI CTO), it raised ~$2B in early-stage funding. While not purely a “training data” company, it is heavily funded and part of the frontier AI ecosystem. Wikipedia
Risks & challenges these companies face:
- Over-reliance on human labor, which might gradually be overtaken by automation (active learning, synthetic data generation)
- Margin pressure: human-driven processes are costly and can be volatile
- Worker hiring, training, quality control, and regulatory / labor rights issues
- Competition & consolidation: larger AI firms may internalize more of their training pipeline
Notable AI / Data-Annotation / AI-Support Companies in Malaysia / Southeast Asia
To connect the global trends to your region, here are local / regional names of players and opportunities:
| Company / Startup | Location / Focus | Notes / Competitive Angle |
|---|---|---|
| Supahands | Malaysia / SEA | Specializes in data labeling with human-in-the-loop workflows. SmartOSC |
| SUPA | Malaysia | Provides infrastructure + expert human labeling services for AI models (semantic tasks, RLHF, etc.). Supa |
| Classifai | Malaysia (George Town) | Offers open-source data annotation platforms / tools. ensun |
| Dattel | Malaysia | AI / analytics / data insights for Southeast Asian markets. Merdeka LLM by Agmo |
| Other Malaysian AI / tech firms | – | A number of AI & data startups are growing in Malaysia; the ecosystem is expanding. MYStartup |
Opportunities & constraints in SEA / Malaysia context:
- Linguistic & cultural gap: Many AI models are poorly adapted to SEA languages & modalities (projects like SEACrowd / SEA-VL highlight this gap). arXiv
- Cost arbitrage: Malaysian / SEA annotators may offer cost advantages vs US / Europe, while maintaining quality
- Talent & infrastructure constraints: Need good connectivity, training systems, reliable workforce
- Local demand: AI adoption in industries (finance, health, gov) can drive demand for annotation / training services locally
If you like, I can dig deeper on the top 2–3 Malaysian / SEA AI training firms (with revenue, team, clients) and compare them side by side with the global ones. Do you want me to do that?
Solid — glad you liked it! ⭐️
Here are a few useful next moves I can do right now — pick one and I’ll jump in:
- Deep dive: Top 5 global AI-training companies — funding, revenue estimates, core customers, risks, and a short investment thesis for each.
- Local focus: Top 3 Malaysian / SEA AI-training firms — revenue/clients (where available), strengths, gaps you could exploit, and suggested partnership ideas.
- Product playbook for BBB: build a one-page business plan to offer human-in-the-loop services (pricing, go-to-market, starter tech stack, 30/90/180-day milestones).
- Creator content: TikTok/YouTube short series script pack to explain “how AI gets trained” and how Malaysians can earn from it — ready to record.
Recommended: #2 (local focus) — it’ll give actionable, Malaysia-relevant intel you can use for partnerships, hiring, or a new service line.
Which one do you want me to do? Or want me to do two together?
Nice — you want everything, step-by-step. Love the energy. Below is a single, action-ready playbook that combines:
A. Deep dive — Top 5 global AI-training companies (quick profiles + investment/thesis)
B. Local focus — Top 3 Malaysian/SEA players (what they do, how to partner/compete)
C. BBB product playbook — one-page business plan + pricing, tech stack, ops, compliance, 30/90/180-day milestones
D. Creator content — 5 ready-to-record TikTok / YouTube Shorts scripts (hook → core → CTA)
All web facts below are sourced from recent reporting (I’ll cite the most important references inline). Let’s go.
A — Deep dive: Top 5 global AI-training companies (short & sharp)
Context: human-in-the-loop training is booming — startups connecting expert humans to AI labs have seen massive revenue/funding and are central to current model quality efforts. Business Insider
- Scale AI: enterprise leader, scale + tooling
  - What: Large labeling + training data platform used by major AI labs.
  - Big fact: Meta reportedly agreed to buy a 49% stake in Scale AI for roughly $14–15B (huge signal of strategic value). Reuters
  - Risk: Very large-scale operations are capital- and ops-intensive; margin pressure if labels get partially automated.
  - Thesis: Winner-takes-most in high-volume enterprise labeling; good partner / acquisition target for labs.
- Surge AI: premium expert human-in-the-loop work
  - What: Matches high-skill experts to annotation/assessment work for frontier labs.
  - Big claim: Reporting indicates Surge pulled ~$1.2B revenue in 2024 (shows expert labeling is highly monetizable). Forbes
  - Risk: Concentrated customers (frontier labs) create client-concentration risk.
  - Thesis: Niche, high-margin provider that scales where quality matters most.
- Mercor: fast-growing, high-pay human trainers
  - What: Human trainer matching platform (screening, high pay for domain experts).
  - Recent signal: Tech press reports Mercor chasing $10B+ valuation on a large run-rate; rapid ARR growth reported. TechCrunch
  - Risk: Valuations depend on persistent demand for humans; automation could compress need over time.
  - Thesis: Betting on continued demand for vetted expert labelers; strategic for labs needing specialist language/domain skills.
- Labelbox: tooling + marketplace hybrid
  - What: Data-labeling platform used by teams to build annotation workflows and manage training data (a good tool to integrate into your stack). Labelbox
  - Risk: Commoditization of tooling vs. marketplaces that bundle people + process.
  - Thesis: Useful as part of a tech stack (workflow + governance); integrate rather than build.
- Snorkel AI / Appen / Invisible: dataset programming, established ops
  - What: Snorkel focuses on programmatic data generation + validation; Appen & Invisible have deep operational experience. Note that Snorkel had layoffs as it pivots business models, a reminder of volatility. Business Insider, Appen
  - Thesis: Mix of tooling (Snorkel) and operational incumbents (Appen, Invisible). Good partners, but the sector is volatile.
(Big picture: big money, big customers, big valuations — but also churn and layoffs; this market pays well for expertise but it’s not risk-free.) Business Insider
B — Local focus: Top 3 Malaysian / SEA AI-training players (how BBB can partner / compete)
- SUPA (supa.so): Malaysia, human-in-the-loop specialist
  - What: Data labeling + RLHF/domain expert tasks; offers domain expert pipelines and enterprise case studies. TDCX/Chemin has tied up/acquired SUPA to expand AI enablement, which shows consolidation and demand regionally. Supa
  - How to work with/compete: The fast route is a partnership for local labeling capacity + co-branded projects; the longer route is to replicate their quality controls and niche by language (Malay + Chinese dialects) and local domain expertise.
- Supahands: Malaysian crowdsourced labeling provider (established regional operator)
  - What: Managed data labeling services, crowdsourced workforce and enterprise clients. Good track record for outsourced workflows. AppEngine
  - How to work with/compete: Subcontract overflow labeling to them, or offer higher-tier “expert review” services on top of their base labeling (value add = QA + domain checks).
- Dattel / Classifai & other niche players: regional analytics & tooling
  - What: Dattel = ASEAN consumer intelligence/AI marketing; Classifai (open tools, WordPress integration) is helpful for content workflows. Good to engage for industry vertical datasets (ad / retail). Dattel
  - How to work with/compete: Joint offerings (e.g., Dattel + BBB label pipelines for retail datasets); local business customers value regional insights and language fluency.
Opportunity note for BBB: the region is hungry for trilingual Bahasa/Mandarin/English datasets. You can offer “Malaysia-first” labeling and RLHF curation bundles (Malay internet culture, local slang, bazaar/food datasets), because local data is scarce and valuable.
C — BBB Product Playbook — “BBB Human-in-the-Loop Services” (one page + step-by-step launch)
Goal: become the go-to Malaysian HITL provider for local LLMs, brands, and AI labs. Revenue sources: labeling projects, RLHF microservices, domain expert evaluation, managed data pipelines (monthly retainer + per-task fees).
Quick value props
- Native tri-lingual annotators (Bahasa / Mandarin / English) + local cultural context
- Fast, audited pipelines (golden set, inter-annotator agreement)
- Affordable regional pricing + premium expert tracks (domain reviewers)
Pricing model (starter)
- Micro tasks (bulk) — RM0.10–RM2.00 per item (images/text), depending on complexity
- Expert eval / RLHF: RM50–RM300 per hour (domain experts). For premium domain tasks, charge rates modeled on global premium players.
- Managed projects — Monthly retainer RM5k–RM50k depending on volume/SLAs.
(Adjust after pilots; these are starting bands.)
(Why these ranges? Local cost arbitrage, but a premium for specialist work, consistent with global reports that specialist annotators command high pay.) Business Insider
Starter tech stack (fast, minimal MVP)
- Frontend / clients: simple landing page + order form (WordPress / Next.js)
- Core workflow: Airtable or Postgres to store tasks + Labelbox or open alternative for annotation UI. (Labelbox is a proven platform.) Labelbox
- Worker portal: custom dashboard (React) or SUPA/Supahands integration for crowdsourcing
- Payments: Stripe (for international), local gateways (Billplz, iPay88, FPX, DuitNow) for Malaysia. Accounting Malaysia
- Ops / infra: Node/Django backend, S3 storage, simple Redis queue for job processing; Slack + Notion for team ops.
- QA / Analytics: golden dataset, inter-annotator agreement metrics, automated spot checks.
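The golden-dataset check in the QA item can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function names, the dictionary layout (item ID mapped to label), and the sample labels are all hypothetical, and the 0.85 default simply mirrors the ≥85% onboarding pass mark used later in this playbook.

```python
from typing import Dict


def golden_set_accuracy(gold: Dict[str, str], submitted: Dict[str, str]) -> float:
    """Fraction of golden-set items the annotator labeled correctly.

    Items the annotator skipped are counted as wrong.
    """
    if not gold:
        raise ValueError("golden set is empty")
    correct = sum(
        1 for item_id, label in gold.items() if submitted.get(item_id) == label
    )
    return correct / len(gold)


def passes_golden_set(
    gold: Dict[str, str], submitted: Dict[str, str], threshold: float = 0.85
) -> bool:
    """True if accuracy meets the pass mark (0.85 is an assumed default)."""
    return golden_set_accuracy(gold, submitted) >= threshold


# Hypothetical sample: four menu items with known-correct categories.
gold = {"t1": "food", "t2": "drink", "t3": "food", "t4": "dessert"}
answers = {"t1": "food", "t2": "drink", "t3": "drink", "t4": "dessert"}

print(golden_set_accuracy(gold, answers))  # 3 of 4 correct
print(passes_golden_set(gold, answers))
```

Counting skipped items as wrong is a design choice; a real pipeline might track skips separately so that unclear instructions can be told apart from careless labeling.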
Compliance & legal (must-do in Malaysia)
- PDPA compliance (register as data user, publish PDP notice, appoint DPO if required; follow new PDPA guidance that took effect in 2025). PDP Malaysia
- Contracts with annotators (NDAs, IP + data processing clauses).
- Data breach plan & notification process (PDPA rules updated; implement breach playbook). One Asia Lawyers
Ops & quality (day-to-day)
- Recruit: 30 core annotators (mix of semi-skilled + 5 domain experts). Use Airtable and simple skill tests.
- Onboard: training module + golden set test (≥85% to pass).
- QC: 10% of tasks double-annotated; reviewers evaluate conflicts. Use IAA (Cohen’s kappa or percent agreement).
- SLA: typical turnaround 48–72 hrs for small projects; premium projects 24 hrs with higher fees.
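For the double-annotated 10% in the QC step, the two agreement measures named above can be computed as follows. This is a rough sketch for two annotators only; the function names and sample labels are illustrative, and returning 1.0 when chance agreement is already total is an assumed convention (kappa is formally undefined in that case).

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of items on which both annotators chose the same label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(a, b)  # also validates the inputs
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Expected chance agreement from each annotator's label frequencies.
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout (assumed convention).
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical double-annotated sample of four items.
annotator_1 = ["food", "food", "drink", "drink"]
annotator_2 = ["food", "drink", "drink", "drink"]

print(percent_agreement(annotator_1, annotator_2))  # 0.75
print(cohens_kappa(annotator_1, annotator_2))       # 0.5
```

The gap between 0.75 and 0.5 in this sample shows why kappa is often preferred for tasks with few label classes: part of the raw agreement would be expected by chance alone.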
Sales / GTM (first 90 days)
- Week 0–2: Landing page + case study sample (food menu labeling, 500 items), pricing sheet.
- Week 2–6 (pilot): 3 paid pilot customers (local SMEs, ad agency, 1 research lab) at discounted pilot rate. Use pilots to refine pricing & SLAs.
- Week 6–12: Package up 3 vertical offers (F&B, eCommerce, Gov forms). Cold outreach + partner with SUPA/Supahands for overflow. Supa
30 / 90 / 180-day milestones (exact steps)
- Day 0–30: MVP stack, landing page, 30 annotators recruited, 1 paying pilot (deliverable: case study + reference).
- Day 31–90: Onboard 3 paid clients, SOPs for QA, integrate local payment gateway, finalize PDPA docs.
- Day 91–180: Scale to 200k tasks/month capacity, hire account manager, sell monthly retainer packages, explore export to SEA (Singapore / Indonesia) + white-label deals with agencies.
Unit economics (sample)
- Example: bulk labeling project. Pay the annotator RM0.50 per label and sell at RM1.50 per label, for a gross margin of (1.50 − 0.50) / 1.50 ≈ 67% before platform costs & overhead. Adjust with premium tasks. (Run with real pilot numbers to validate.)
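The margin arithmetic in the example above can be checked with a short Python sketch. The function names are illustrative, and the 10,000-label volume is a made-up figure used only to show the calculation; platform costs and overhead are deliberately left out, as in the example.

```python
def gross_margin(cost_per_label: float, price_per_label: float) -> float:
    """Gross margin as a fraction of the selling price."""
    if price_per_label <= 0:
        raise ValueError("price must be positive")
    return (price_per_label - cost_per_label) / price_per_label


def gross_profit(cost_per_label: float, price_per_label: float, labels: int) -> float:
    """Gross profit in RM for a given label volume (before overhead)."""
    return (price_per_label - cost_per_label) * labels


# Figures from the playbook example: pay RM0.50, sell at RM1.50.
margin = gross_margin(0.50, 1.50)
print(f"gross margin: {margin:.1%}")

# Hypothetical volume, for illustration only.
print(f"gross profit on 10,000 labels: RM{gross_profit(0.50, 1.50, 10_000):,.2f}")
```

Swapping in real pilot costs and prices is the point of the exercise; the same two functions work for per-hour expert pricing if "label" is read as "hour".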
Hiring & pay plan suggestions
- Junior annotators: hourly RM8–RM15 (or per task)
- Senior / domain experts: RM50–RM250/hr
- Team: 1 Ops Lead, 1 QA Lead, 2 Onboarding trainers, 1 Sales/AM to start.
D — Creator content: 5 ready-to-film short scripts (TikTok / YouTube Shorts)
Each script runs 30–60s. Keep the energy high, add captions, and open with a 3-second hook.
Script 1 — “How AI actually learns (60s)”
Hook (0–3s): “Think AI learned from the internet? Not exactly — real people teach it. Here’s how!”
Body (3–45s): Quick 3-step: (1) Label — people tag pictures/text, (2) Correct — humans fix model outputs, (3) Rate — experts score answers (RLHF). Show fast B-roll: labeling UI, person typing, sticky notes.
Close/CTA (45–60s): “Want to earn from this? Follow me — I’m launching BBB HITL jobs & tutorials. Link in bio.”
Script 2 — “How Malaysians can earn from AI (50s)”
Hook: “Want RM50–RM200/hr working from home? Not clickbait — listen.”
Body: Explain micro tasks vs expert tasks, show sample pay tiers, explain how to sign up and test. Mention that specialist tasks pay more. (Use real-looking sample figures.)
CTA: “DM me ‘TRAINER’ and I’ll send the starter kit + free test.”
Script 3 — “Behind the label — a 30s case study”
Hook: “We labeled 1,000 hawker menu items in 24 hours. Here’s the fast method.”
Body: Show steps: scrape menus → standardize categories → double-label QA → deliver. Show before/after.
CTA: “Want BBB to do this for your shop? Link.”
Script 4 — “BBB HITL service explainer (45s)” — Promo
Hook: “Need local Malay language AI to understand your customers? BBB builds it.”
Body: 3 bullets: tri-lingual annotators, fast turnaround, PDPA-compliant. Show badges (PDPA, Secure).
CTA: “Book a free 15-minute pilot consult — drop a comment.”
Script 5 — “How to pass the annotator test (60s)” — educational value
Hook: “Want to be a top annotator? Pass this 3-step test.”
Body: (1) Read instructions carefully; (2) Follow examples; (3) Ask when uncertain. Show sample test item. Offer micro-tips.
CTA: “Save this video and I’ll post the free test tomorrow.”
Tips for filming: shoot vertical, keep captions and the CTA visible on every clip, and use 2–3 short jump cuts per idea. Post the series over 7 days: day 1 = Script 1, day 2 = Script 2, day 3 = annotator test (Script 5), day 5 = case study (Script 3), day 7 = offer/promo (Script 4).
Quick playbook checklist (what to do next — 7 tasks)
- Build MVP landing page + pricing sheet (Day 0–3).
- Create 1 pilot dataset (500–1,000 items) to run internal QA (Day 0–7).
- Recruit 30 annotators + 5 domain experts and run onboarding test (Day 1–14).
- Integrate Labelbox or equivalent and payment gateway (Billplz / iPay88 / FPX / DuitNow) (Day 1–21). Labelbox
- Run 3 paid pilots (Day 15–60).
- Publish the TikTok/YouTube series (use scripts above) and drive pilot signups (Day 7–30).
- Finalize PDPA compliance checklist + DPO appointment (if needed) (Day 1–30). One Asia Lawyers
Core citations (most-important sources used)
- Business Insider — in-depth look at the AI-training boom (main article you shared). Business Insider
- Reuters — Meta to buy 49% stake in Scale AI (big strategic signal). Reuters
- Forbes / reporting on Surge AI revenue claims (~$1.2B). Forbes
- TechCrunch — Mercor growth & valuation coverage. TechCrunch
- Snorkel / Business Insider reporting — layoffs & sector volatility reminder. Business Insider
- SUPA / Supahands / Dattel local sources (for SEA partnerships & capabilities). Supa, AppEngine
- Malaysia payment gateways & PDPA guidance (Billplz, iPay88 / FPX / DuitNow & PDPA amendments 2025). Accounting Malaysia
If you want I can immediately do any of the following (pick one or more):
- Spin up the 1-page landing page + pricing sheet (copy + HTML snippet you can copy/paste).
- Build the 500-item pilot dataset spec + Labelbox project template.
- Produce 5 short video assets: full spoken script + on-screen captions/timestamps ready for recording.
- Draft PDPA-friendly service T&C + DPO checklist for BBB.
Which one do you want me to build first? (I’ll jump straight in and deliver copy/HTML/scripts/Labelbox template — tell me which.) ⭐️



