Methodology

How Bullsift scores YouTube video credibility. Every metric explained.

Last updated: March 23, 2026

Analysis Pipeline Overview

Bullsift uses a two-pass AI pipeline to analyze YouTube videos. Each pass serves a different purpose and runs different models at different cost tiers.

Pass 1: Quick Sift

  • What it does: Extracts the transcript, generates an AI summary, identifies 10-40 individual claims, and calculates a Slop Score
  • Speed: 2-3 seconds
  • Availability: All tiers including Free
  • Web search: No — uses AI general knowledge only

Pass 2: Deep Sift

  • What it does: Takes the most critical verifiable claims from Pass 1 and verifies each one against the open web
  • Speed: 15-30 seconds
  • Availability: Pro (50/month) and Power (150/month) tiers
  • Web search: Yes — 3-10 targeted web searches per claim batch, cross-referencing multiple independent sources

Slop Score

The Slop Score measures overall content quality on a scale from 0.0 (high quality) to 1.0 (maximum slop). It flags clickbait, AI-generated filler, speculation presented as fact, and content-farm patterns.

Unlike many AI metrics, the Slop Score is deterministic — it uses a fixed mathematical formula, not an LLM judgment call. This makes it consistent and reproducible: identical inputs always produce identical scores.

Scoring Factors

The Slop Score combines 10 weighted factors:

  • Title Clickbait (8-10%): ALL CAPS words, panic keywords, excessive punctuation
  • Content Slop (12-15%): Emotional word density, fear-inducing openers, engagement farming
  • Speculation Ratio (8-10%): Ratio of opinion and prediction claims to total claims
  • Hedging Quality (5-7%): Whether the creator qualifies uncertain statements (inverted — hedging reduces slop)
  • Source Attribution (5%): Whether claims cite sources (inverted — citations reduce slop)
  • Reputation Factor (10%): Channel professionalism and authority signals
  • Absolute Language (7-8%): Use of "always", "never", "worst", "best" without qualification
  • Host Pushback (8%): Whether interviewers challenge claims (inverted — pushback reduces slop)
  • Financial Self-Promotion (12-15%): Promoting own products while making directional predictions about those assets
  • Extreme Predictions (8-12%): Unhedged doomsday claims like "going to zero" or "inevitable collapse"

When YouTube comment data is available, a Comment Consensus factor (17% weight) is added, analyzing whether the video's own audience is flagging the content as misleading. Factor weights are automatically rebalanced when this signal is present.

Trust Scores

Bullsift produces three trust metrics for every analyzed video, each measuring a different dimension of credibility. All scores range from 0 (least credible) to 100 (most credible).

AI Trust Score

Derived from claim-level verdicts. Each claim is weighted by type: factual claims carry the most weight (50%), while opinion and prediction claims each contribute 25%. Verdicts map to credibility values — Supported/True scores 1.0, Partially Supported scores 0.7, Misleading scores 0.2, and Unsupported/False scores 0.0. The weighted average is converted to a 0-100 scale.
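The mapping above can be expressed as a short weighted average. The verdict values and claim-type weights come from the text; the function shape itself is an assumption.

```python
# Verdict credibility values and claim-type weights as described above.
VERDICT_VALUE = {
    "supported": 1.0, "true": 1.0,
    "partially_supported": 0.7,
    "misleading": 0.2,
    "unsupported": 0.0, "false": 0.0,
}
TYPE_WEIGHT = {"factual": 0.50, "opinion": 0.25, "prediction": 0.25}

def ai_trust_score(claims: list) -> float:
    """claims: (claim_type, verdict) pairs. Returns a 0-100 score."""
    num = den = 0.0
    for claim_type, verdict in claims:
        w = TYPE_WEIGHT.get(claim_type, 0.25)
        num += w * VERDICT_VALUE.get(verdict, 0.0)
        den += w
    return round(100 * num / den, 1) if den else 50.0  # no-claims fallback is assumed
```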

Community Trust Score

Calculated from weighted community voting. Pro members get 1x vote weight; Power members get 2x. The score is the proportion of weighted trust votes to total weighted votes, scaled to 0-100. With zero votes, the score defaults to 50 (neutral).
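In code, the weighted proportion looks like this (a minimal sketch following the 1x/2x weights and the neutral default described above):

```python
def community_trust_score(pro_trust: int, pro_total: int,
                          power_trust: int, power_total: int) -> float:
    """Pro votes carry 1x weight, Power votes 2x; zero votes defaults to neutral 50."""
    weighted_trust = 1 * pro_trust + 2 * power_trust
    weighted_total = 1 * pro_total + 2 * power_total
    if weighted_total == 0:
        return 50.0
    return round(100 * weighted_trust / weighted_total, 1)
```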

Combined Trust Score

The primary credibility metric shown to users. It blends the AI Trust Score with the Community Trust Score using a dynamic weighting system. With zero community votes, the AI score carries 90% of the weight. As votes accumulate, community weight increases — reaching a maximum of 50% at around 50 votes. This prevents early manipulation while ensuring that sufficient community consensus can influence the final score.
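The dynamic blend can be sketched as follows. The 10% community floor at zero votes and the 50% cap at around 50 votes come from the text; the linear ramp between those endpoints is an assumption.

```python
def combined_trust_score(ai_score: float, community_score: float,
                         vote_count: int) -> float:
    """Blend the AI and Community Trust Scores with vote-dependent weighting.

    Community weight ramps (here, linearly; the exact curve is assumed)
    from 10% at zero votes to a 50% cap at 50+ votes.
    """
    community_weight = min(0.10 + 0.40 * vote_count / 50, 0.50)
    return round((1 - community_weight) * ai_score
                 + community_weight * community_score, 1)
```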

Channel Baseline Trust Score

For channels with fewer than 5 community votes, Bullsift generates a Baseline Trust Score using three components: channel metadata (subscriber count, age, verification status — 40% weight), an AI assessment of channel credibility (45% weight), and an anti-slop heuristic score (15% weight). Baselines are auto-refreshed every 30 days.
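The three-component blend reduces to a fixed weighted sum (assuming each component is already normalized to a 0-100 scale, which the text does not state explicitly):

```python
def baseline_trust_score(metadata: float, ai_assessment: float,
                         anti_slop: float) -> float:
    """Combine the three baseline components with the 40/45/15 weights above."""
    return round(0.40 * metadata + 0.45 * ai_assessment + 0.15 * anti_slop, 1)
```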

Claim Verdicts

How Claims Are Extracted

During Pass 1, the AI extracts 10-40 individual claims depending on video length. Each claim is classified by category (factual, statistic, opinion, prediction, recommendation), tagged with a timestamp, scored for speaker confidence, and marked as verifiable or non-verifiable. Advertising claims are automatically filtered out, duplicates are removed, and vague pronouns are resolved to named entities.

How Claims Are Prioritized

Not all claims are sent to Deep Sift. A criticality scoring system ranks claims by importance. Statistics and health/financial claims score highest. Suspicious or low-confidence claims get a boost. Opinions and very short claims are deprioritized. The top-ranked verifiable claims are sent for web verification — 5 claims for Pro users, 10 for Power users.
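The prioritization rules above can be sketched as a simple scoring function. The point values here are hypothetical; only the ranking rules (what gets boosted, what gets deprioritized, and the 5/10 claim limits) come from the text.

```python
def criticality(claim: dict) -> float:
    """Rank a claim for Deep Sift. Point values are illustrative placeholders."""
    score = 0.0
    if claim.get("category") == "statistic":
        score += 3.0                              # statistics score highest
    if claim.get("domain") in ("health", "financial"):
        score += 3.0                              # health/financial claims score highest
    if claim.get("suspicious") or claim.get("confidence", 1.0) < 0.5:
        score += 1.5                              # suspicious/low-confidence boost
    if claim.get("category") == "opinion":
        score -= 2.0                              # opinions deprioritized
    if len(claim.get("text", "")) < 40:
        score -= 1.0                              # very short claims deprioritized
    return score

def select_for_deep_sift(claims: list, tier: str) -> list:
    """Top-ranked verifiable claims: 5 for Pro, 10 for Power."""
    limit = {"pro": 5, "power": 10}[tier]
    verifiable = [c for c in claims if c.get("verifiable")]
    return sorted(verifiable, key=criticality, reverse=True)[:limit]
```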

Verdict Categories

Each verified claim receives one of seven verdicts:

Supported — Direct corroborating evidence found from independent sources
True — Exact numbers, dates, or facts match authoritative sources
Partially Supported — Part of the claim is confirmed, or confirmed with significant caveats
Unsupported — Contradicting evidence found from independent sources
Misleading — Technically true but framed in a way that creates a false impression
Opinion — Subjective belief or values-based judgment that cannot be fact-checked
Unverifiable — No evidence found for or against; distinct from Unsupported

Anti-Hallucination Safeguards

Bullsift enforces strict rules to prevent AI hallucination in verdicts. The AI is prohibited from citing the video itself as evidence (circular reasoning). Only external sources — news articles, official websites, research papers, government records — count as evidence. If a person tells the same story on multiple podcasts, that counts as circular repetition, not independent corroboration. When no external evidence exists, the claim is marked Unverifiable rather than given a false verdict.

Deepfake & Vision AI Detection

Bullsift's Vision AI analyzes sampled frames from the video to detect AI-generated or manipulated visual content. The system samples 4 frames at different points in the video (10%, 30%, 50%, and 70% of total duration) and analyzes them for artifacts.
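The sampling positions translate directly into timestamps:

```python
def sample_timestamps(duration_seconds: float) -> list:
    """Frame sample points at 10%, 30%, 50%, and 70% of total duration."""
    return [round(duration_seconds * p, 2) for p in (0.10, 0.30, 0.50, 0.70)]
```

For a 10-minute video, frames are pulled at 1:00, 3:00, 5:00, and 7:00.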

What It Detects

  • Generative AI imagery — morphing, unnatural textures, structural inconsistencies in faces, hands, and backgrounds
  • Stock footage slop — disjointed random stock footage with grainy overlays, light leaks, and large text boxes typical of faceless content farms
  • AI slideshows — still images with pan/zoom effects or AI-warping animation

What It Does Not Flag

  • Professional motion graphics and recorded interviews
  • Financial dashboard screenshots (compression artifacts are normal)
  • Designed YouTube thumbnails with bold text and branding
  • Channel branding elements like logo animations and end screens

Fakeness Probability Scale

Results are reported as a probability score from 0 to 100. Scores of 0-15 indicate clearly human-produced content. Scores of 16-30 suggest minor concerns but likely human production. Scores above 60 indicate strong AI-generation signals. The system accounts for channel context — professional verified channels are evaluated with awareness that high-end motion graphics differ from generic AI slop, though established status does not grant a free pass.
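The published bands map to labels roughly as follows. Note that the text does not name the 31-60 mid-band, so that label is an assumption:

```python
def fakeness_label(score: int) -> str:
    """Map a 0-100 fakeness probability to the bands described above."""
    if score <= 15:
        return "clearly human-produced"
    if score <= 30:
        return "minor concerns, likely human"
    if score <= 60:
        return "mixed signals"   # the 31-60 band is not named in the text (assumed label)
    return "strong AI-generation signals"
```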

Channel Heuristics & Content Farm Detection

Bullsift runs a separate heuristic analysis on each channel to detect content-farm and bot-farm behavior. This score (0.0 to 1.0) feeds into the Slop Score and Baseline Trust calculations.

Detection Signals

  • Upload velocity — channels posting more than 2 videos per day receive the highest penalty (this upload rate is a signature of automated content generation)
  • Age/volume mismatch — a channel less than 90 days old with over 100 videos is flagged as suspicious
  • Low subscriber-to-video ratio — fewer than 5 subscribers per video (with 50+ videos) indicates mass-produced content with no audience retention
  • Engagement anomalies — abnormally low like-to-view ratios on high-view videos, or suspiciously high ratios that suggest manipulation

Authority Balancing

To prevent false positives on legitimate high-output publishers, authority signals like channel verification, high subscriber counts, and long channel age reduce the heuristic score. A verified channel with 1M+ subscribers receives substantial authority reduction even if upload velocity is high.
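The detection signals and authority balancing can be combined in a sketch like this. The thresholds come from the text; the penalty and reduction values are illustrative, not Bullsift's actual numbers.

```python
def farm_heuristic(channel: dict) -> float:
    """Content-farm likelihood from 0.0 (clean) to 1.0 (farm-like)."""
    score = 0.0
    uploads_per_day = channel["videos"] / max(channel["age_days"], 1)
    if uploads_per_day > 2:
        score += 0.40      # highest penalty: signature of automated generation
    if channel["age_days"] < 90 and channel["videos"] > 100:
        score += 0.25      # age/volume mismatch
    if channel["videos"] >= 50 and channel["subscribers"] / channel["videos"] < 5:
        score += 0.20      # low subscriber-to-video ratio
    if channel.get("engagement_anomaly"):
        score += 0.15      # like-to-view ratio anomalies
    # Authority balancing: verification plus a large audience reduces the score,
    # so high-output legitimate publishers are not flagged as farms.
    if channel.get("verified") and channel["subscribers"] >= 1_000_000:
        score *= 0.4
    return min(round(score, 4), 1.0)
```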

Global Claims Database

Bullsift maintains a global database of claims extracted from analyzed videos. When the same claim appears across multiple videos, it is canonicalized and tracked — similar to how Snopes tracks recurring claims.

Claim Matching

Claims are matched using a three-tier approach: text similarity matching catches obvious duplicates, semantic vector matching (using embeddings) catches claims that say the same thing in different words, and a cache layer prevents redundant re-verification of recently checked claims.
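The three tiers can be sketched as a single lookup function. Here `embed` is a placeholder for an embedding model, the 0.90 similarity threshold is assumed, and tier 1 is shown as normalized exact matching; the real text-similarity tier is likely fuzzier.

```python
import hashlib
import math

_cache: dict = {}   # tier 3: recently matched claims, keyed by normalized text hash

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_claim(text, known, embed, threshold=0.90):
    """Return the canonical claim matching `text`, or None if nothing matches."""
    norm = text.lower().strip()
    key = hashlib.sha256(norm.encode()).hexdigest()
    if key in _cache:                                  # tier 3: cache hit
        return _cache[key]
    for claim in known:                                # tier 1: text similarity
        if claim.lower().strip() == norm:
            _cache[key] = claim
            return claim
    vec = embed(text)                                  # tier 2: semantic vectors
    best, best_sim = None, 0.0
    for claim in known:
        sim = _cosine(vec, embed(claim))
        if sim > best_sim:
            best, best_sim = claim, sim
    if best_sim >= threshold:
        _cache[key] = best
        return best
    return None
```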

Temporal Truth Decay

Not all claims age equally. Bullsift categorizes claims by freshness — stable facts rarely need re-verification, while event-driven or fast-changing claims are automatically flagged for periodic re-checking. A background system monitors claim expiry and triggers re-verification when a claim becomes stale, ensuring that verdicts stay current as new evidence emerges.
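A staleness check of this kind reduces to comparing a verdict's age against a per-category freshness window. The categories come from the text; the TTL durations below are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Freshness categories from the text; the TTL windows are assumed values.
FRESHNESS_TTL = {
    "stable": timedelta(days=365),        # stable facts rarely re-verified
    "event_driven": timedelta(days=7),
    "fast_changing": timedelta(days=1),
}

def needs_reverification(verified_at, freshness, now=None):
    """True once a verdict has outlived its freshness window."""
    now = now or datetime.now(timezone.utc)
    return now - verified_at > FRESHNESS_TTL[freshness]
```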