Methodology
How Bullsift scores YouTube video credibility. Every metric explained.
Last updated: March 23, 2026
Analysis Pipeline Overview
Bullsift uses a two-pass AI pipeline to analyze YouTube videos. Each pass serves a different purpose and runs different models at different cost tiers.
Pass 1: Quick Sift
- What it does: Extracts the transcript, generates an AI summary, identifies 10-40 individual claims, and calculates a Slop Score
- Speed: 2-3 seconds
- Availability: All tiers including Free
- Web search: No — uses AI general knowledge only
Pass 2: Deep Sift
- What it does: Takes the most critical verifiable claims from Pass 1 and verifies each one against the open web
- Speed: 15-30 seconds
- Availability: Pro (50/month) and Power (150/month) tiers
- Web search: Yes — 3-10 targeted web searches per claim batch, cross-referencing multiple independent sources
Slop Score
The Slop Score measures overall content quality on a scale from 0.0 (high quality) to 1.0 (maximum slop). It flags clickbait, AI-generated filler, speculation presented as fact, and content-farm patterns.
Unlike many AI metrics, the Slop Score is deterministic — it uses a fixed mathematical formula, not an LLM judgment call. This makes it consistent and reproducible across identical inputs.
Scoring Factors
The Slop Score combines 10 weighted factors:
| Factor | Weight | What It Measures |
|---|---|---|
| Title Clickbait | 8-10% | ALL CAPS words, panic keywords, excessive punctuation |
| Content Slop | 12-15% | Emotional word density, fear-inducing openers, engagement farming |
| Speculation Ratio | 8-10% | Ratio of opinion and prediction claims to total claims |
| Hedging Quality | 5-7% | Whether the creator qualifies uncertain statements (inverted — hedging reduces slop) |
| Source Attribution | 5% | Whether claims cite sources (inverted — citations reduce slop) |
| Reputation Factor | 10% | Channel professionalism and authority signals |
| Absolute Language | 7-8% | Use of "always", "never", "worst", "best" without qualification |
| Host Pushback | 8% | Whether interviewers challenge claims (inverted — pushback reduces slop) |
| Financial Self-Promotion | 12-15% | Promoting own products while making directional predictions about those assets |
| Extreme Predictions | 8-12% | Unhedged doomsday claims like "going to zero" or "inevitable collapse" |
When YouTube comment data is available, a Comment Consensus factor (17% weight) is added, analyzing whether the video's own audience is flagging the content as misleading. Factor weights are automatically rebalanced when this signal is present.
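The factor combination above can be sketched as a plain weighted sum. The weights below pick the upper end of each published range so they total 1.0, and the rebalancing rule (scaling the base weights down so Comment Consensus takes 17%) is an assumption about how the redistribution works, not Bullsift's actual code:

```python
# Sketch of the Slop Score combination. Factor names come from the table
# above; the exact weights (upper ends of the ranges, summing to 1.0) and
# the rebalancing mechanics are assumptions.
BASE_WEIGHTS = {
    "title_clickbait": 0.10,
    "content_slop": 0.15,
    "speculation_ratio": 0.10,
    "hedging_quality": 0.07,
    "source_attribution": 0.05,
    "reputation_factor": 0.10,
    "absolute_language": 0.08,
    "host_pushback": 0.08,
    "financial_self_promotion": 0.15,
    "extreme_predictions": 0.12,
}

COMMENT_CONSENSUS_WEIGHT = 0.17

def slop_score(factors: dict, comment_consensus=None) -> float:
    """Combine per-factor scores (each 0.0-1.0) into one Slop Score.

    When comment data is present, base weights are scaled down so that
    Comment Consensus takes 17% and all weights still sum to 1.0.
    """
    weights = dict(BASE_WEIGHTS)
    if comment_consensus is not None:
        scale = 1.0 - COMMENT_CONSENSUS_WEIGHT
        weights = {k: w * scale for k, w in weights.items()}
        weights["comment_consensus"] = COMMENT_CONSENSUS_WEIGHT
        factors = {**factors, "comment_consensus": comment_consensus}
    return sum(weights[k] * factors.get(k, 0.0) for k in weights)
```

Inverted factors (hedging, attribution, pushback) would be flipped before entering this sum, so that a higher input always means more slop.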
Trust Scores
Bullsift produces three trust metrics for every analyzed video, each measuring a different dimension of credibility. All scores range from 0 (least credible) to 100 (most credible).
AI Trust Score
Derived from claim-level verdicts. Each claim is weighted by type: factual claims carry the most weight (50%), while opinion and prediction claims each contribute 25%. Verdicts map to credibility values — Supported/True scores 1.0, Partially Supported scores 0.7, Misleading scores 0.2, and Unsupported/False scores 0.0. The weighted average is converted to a 0-100 scale.
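The verdict mapping and type weights translate directly into a weighted average. A minimal sketch, assuming claims arrive as (type, verdict) pairs:

```python
# Illustrative AI Trust Score computation using the mappings stated above.
VERDICT_CREDIBILITY = {
    "supported": 1.0, "true": 1.0,
    "partially_supported": 0.7,
    "misleading": 0.2,
    "unsupported": 0.0, "false": 0.0,
}
TYPE_WEIGHT = {"factual": 0.50, "opinion": 0.25, "prediction": 0.25}

def ai_trust_score(claims) -> float:
    """claims: list of (claim_type, verdict) pairs. Returns a 0-100 score."""
    num = den = 0.0
    for claim_type, verdict in claims:
        w = TYPE_WEIGHT.get(claim_type, 0.25)
        num += w * VERDICT_CREDIBILITY[verdict]
        den += w
    return 100.0 * num / den if den else 50.0  # neutral with no claims
```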
Community Trust Score
Calculated from weighted community voting. Pro members get 1x vote weight; Power members get 2x. The score is the ratio of weighted trust votes to total weighted votes, scaled to 0-100. With zero votes, the score defaults to 50 (neutral).
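The weighted-vote arithmetic can be sketched in a few lines, assuming each vote carries the voter's tier and whether they trust the video:

```python
# Community Trust Score sketch: Pro votes count 1x, Power votes 2x.
def community_trust_score(votes) -> float:
    """votes: list of (tier, trusts) pairs. Returns a 0-100 score."""
    weight = {"pro": 1.0, "power": 2.0}
    total = sum(weight[tier] for tier, _ in votes)
    if total == 0:
        return 50.0  # neutral default with no votes
    trust = sum(weight[tier] for tier, trusts in votes if trusts)
    return 100.0 * trust / total
```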
Combined Trust Score
The primary credibility metric shown to users. It blends the AI Trust Score with the Community Trust Score using a dynamic weighting system. With zero community votes, the AI score carries 90% of the weight. As votes accumulate, community weight increases — reaching a maximum of 50% at around 50 votes. This prevents early manipulation while ensuring that sufficient community consensus can influence the final score.
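The blend can be sketched as follows. The endpoints (10% community weight at zero votes, 50% at around 50 votes) come from the text; the linear ramp between them is an assumption about the shape of the curve:

```python
# Combined Trust Score sketch. The 10% floor and 50% cap are from the
# methodology; the linear interpolation between them is assumed.
def combined_trust_score(ai: float, community: float, votes: int) -> float:
    """Blend the AI and Community scores (each 0-100) by vote count."""
    community_weight = 0.10 + 0.40 * min(votes, 50) / 50
    return (1.0 - community_weight) * ai + community_weight * community
```

Because the community weight is capped at 50%, even a flood of coordinated votes can never fully override the AI verdicts.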
Channel Baseline Trust Score
For channels with fewer than 5 community votes, Bullsift generates a Baseline Trust Score using three components: channel metadata (subscriber count, age, verification status — 40% weight), an AI assessment of channel credibility (45% weight), and an anti-slop heuristic score (15% weight). Baselines are auto-refreshed every 30 days.
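The three-component blend is a straightforward weighted sum, assuming each component is already normalized to 0-100:

```python
# Baseline Trust Score sketch using the stated 40/45/15 weighting.
def baseline_trust_score(metadata: float, ai_assessment: float,
                         anti_slop: float) -> float:
    """Each input is a 0-100 component score; returns a 0-100 baseline."""
    return 0.40 * metadata + 0.45 * ai_assessment + 0.15 * anti_slop
```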
Claim Verdicts
How Claims Are Extracted
During Pass 1, the AI extracts 10-40 individual claims depending on video length. Each claim is classified by category (factual, statistic, opinion, prediction, recommendation), tagged with a timestamp, scored for speaker confidence, and marked as verifiable or non-verifiable. Advertising claims are automatically filtered out, duplicates are removed, and vague pronouns are resolved to named entities.
How Claims Are Prioritized
Not all claims are sent to Deep Sift. A criticality scoring system ranks claims by importance. Statistics and health/financial claims score highest. Suspicious or low-confidence claims get a boost. Opinions and very short claims are deprioritized. The top-ranked verifiable claims are sent for web verification — 5 claims for Pro users, 10 for Power users.
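The prioritization steps above can be sketched as a scoring-and-sort pass. The point values, boosts, and penalties below are illustrative stand-ins, not Bullsift's actual tuning:

```python
# Hypothetical criticality ranking. Category scores, the health/financial
# boost, and the thresholds are assumed for illustration.
def criticality(claim: dict) -> float:
    score = {"statistic": 3.0, "factual": 2.0, "recommendation": 1.5,
             "prediction": 1.0, "opinion": 0.5}.get(claim["category"], 1.0)
    if claim.get("domain") in ("health", "financial"):
        score += 2.0  # high-stakes domains rank highest
    if claim.get("suspicious") or claim.get("confidence", 1.0) < 0.5:
        score += 1.0  # suspicious / low-confidence boost
    if len(claim.get("text", "")) < 30:
        score -= 1.0  # very short claims deprioritized
    return score

def select_for_deep_sift(claims, tier: str):
    """Top verifiable claims: 5 for Pro, 10 for Power."""
    limit = {"pro": 5, "power": 10}[tier]
    verifiable = [c for c in claims if c.get("verifiable")]
    return sorted(verifiable, key=criticality, reverse=True)[:limit]
```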
Verdict Categories
Each verified claim receives one of seven verdicts, ranging from fully supported through misleading to false, with Unverifiable reserved for claims where no external evidence exists.
Anti-Hallucination Safeguards
Bullsift enforces strict rules to prevent AI hallucination in verdicts. The AI is prohibited from citing the video itself as evidence (circular reasoning). Only external sources — news articles, official websites, research papers, government records — count as evidence. If a person tells the same story on multiple podcasts, that counts as circular repetition, not independent corroboration. When no external evidence exists, the claim is marked Unverifiable rather than given a false verdict.
Deepfake & Vision AI Detection
Bullsift's Vision AI analyzes sampled frames from the video to detect AI-generated or manipulated visual content. The system samples 4 frames at different points in the video (10%, 30%, 50%, and 70% of total duration) and analyzes them for artifacts.
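The sampling positions translate directly into timestamps, given a video duration:

```python
# Frame sample points at 10%, 30%, 50%, and 70% of total duration.
def sample_timestamps(duration_seconds: float) -> list:
    """Return the four timestamps (in seconds) where frames are sampled."""
    return [duration_seconds * fraction for fraction in (0.10, 0.30, 0.50, 0.70)]
```

Sampling from the body of the video rather than the first or last seconds avoids intros, outros, and end screens, which are excluded from flagging anyway.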
What It Detects
- Generative AI imagery — morphing, unnatural textures, structural inconsistencies in faces, hands, and backgrounds
- Stock footage slop — disjointed random stock footage with grainy overlays, light leaks, and large text boxes typical of faceless content farms
- AI slideshows — still images with pan/zoom effects or AI-warping animation
What It Does Not Flag
- Professional motion graphics and recorded interviews
- Financial dashboard screenshots (compression artifacts are normal)
- Designed YouTube thumbnails with bold text and branding
- Channel branding elements like logo animations and end screens
Fakeness Probability Scale
Results are reported as a probability score from 0 to 100. Scores of 0-15 indicate clearly human-produced content. Scores of 16-30 suggest minor concerns but likely human production. Scores above 60 indicate strong AI-generation signals. The system accounts for channel context — professional verified channels are evaluated with awareness that high-end motion graphics differ from generic AI slop, though established status does not grant a free pass.
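The bands above can be expressed as a simple mapping. Note that the text does not name the 31-59 range; the "elevated concerns" label below is a placeholder for that gap, not Bullsift's terminology:

```python
# Map the 0-100 fakeness probability to the bands described above.
def fakeness_label(score: int) -> str:
    if score <= 15:
        return "clearly human-produced"
    if score <= 30:
        return "minor concerns, likely human"
    if score < 60:
        return "elevated concerns"  # assumed label for the unnamed middle band
    return "strong AI-generation signals"
```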
Channel Heuristics & Content Farm Detection
Bullsift runs a separate heuristic analysis on each channel to detect content-farm and bot-farm behavior. This score (0.0 to 1.0) feeds into the Slop Score and Baseline Trust calculations.
Detection Signals
- Upload velocity — channels posting more than 2 videos per day receive the highest penalty (this upload rate is a signature of automated content generation)
- Age/volume mismatch — a channel less than 90 days old with over 100 videos is flagged as suspicious
- Low subscriber-to-video ratio — fewer than 5 subscribers per video (with 50+ videos) indicates mass-produced content with no audience retention
- Engagement anomalies — abnormally low like-to-view ratios on high-view videos, or suspiciously high ratios that suggest manipulation
Authority Balancing
To prevent false positives on legitimate high-output publishers, authority signals like channel verification, high subscriber counts, and long channel age reduce the heuristic score. A verified channel with 1M+ subscribers receives substantial authority reduction even if upload velocity is high.
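Putting the detection signals and authority balancing together, a sketch of the heuristic might look like this. The thresholds come from the signals listed above; the penalty sizes and the authority reduction multiplier are assumptions:

```python
# Illustrative content-farm heuristic (0.0 = clean, 1.0 = maximum
# suspicion). Penalty magnitudes and the 0.3 authority multiplier are
# assumed, not Bullsift's actual values.
def farm_heuristic(channel: dict) -> float:
    score = 0.0
    if channel["videos_per_day"] > 2:
        score += 0.4  # upload-velocity signature of automated generation
    if channel["age_days"] < 90 and channel["video_count"] > 100:
        score += 0.3  # age/volume mismatch
    if (channel["video_count"] >= 50
            and channel["subscribers"] / channel["video_count"] < 5):
        score += 0.3  # no audience retention
    # Authority balancing: verification plus scale reduces the score.
    if channel.get("verified") and channel["subscribers"] >= 1_000_000:
        score *= 0.3
    return min(score, 1.0)
```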
Global Claims Database
Bullsift maintains a global database of claims extracted from analyzed videos. When the same claim appears across multiple videos, it is canonicalized and tracked — similar to how Snopes tracks recurring claims.
Claim Matching
Claims are matched using a three-tier approach: text similarity matching catches obvious duplicates, semantic vector matching (using embeddings) catches claims that say the same thing in different words, and a cache layer prevents redundant re-verification of recently checked claims.
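The three tiers can be sketched as a cascade. Here `difflib` stands in for the text-similarity tier and a caller-supplied `embed` function stands in for the embedding model; both, along with the thresholds, are stand-ins rather than Bullsift's actual components:

```python
# Three-tier claim matching sketch: cache -> text similarity -> embeddings.
import difflib
import math

def text_similar(a: str, b: str, threshold: float = 0.9) -> bool:
    """Tier 1: catch obvious duplicates via character-level similarity."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def cosine(u, v) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def match_claim(claim, canonical, embed, cache, sim_threshold=0.85):
    """Return the canonical form of `claim`, or None if it is new."""
    if claim in cache:                      # tier 3: skip recently matched claims
        return cache[claim]
    for c in canonical:                     # tier 1: obvious text duplicates
        if text_similar(claim, c):
            cache[claim] = c
            return c
    claim_vec = embed(claim)                # tier 2: semantic match via embeddings
    for c in canonical:
        if cosine(claim_vec, embed(c)) >= sim_threshold:
            cache[claim] = c
            return c
    return None
```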
Temporal Truth Decay
Not all claims age equally. Bullsift categorizes claims by freshness — stable facts rarely need re-verification, while event-driven or fast-changing claims are automatically flagged for periodic re-checking. A background system monitors claim expiry and triggers re-verification when a claim becomes stale, ensuring that verdicts stay current as new evidence emerges.
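The expiry check reduces to comparing a claim's verification age against a per-category time-to-live. The category names and durations below are assumptions that illustrate the idea of differential decay:

```python
# Hypothetical freshness categories and TTLs. The text states only that
# stable facts rarely need re-checking while fast-changing claims do;
# the specific durations here are assumed.
from datetime import datetime, timedelta, timezone

TTL = {
    "stable": timedelta(days=365),
    "slow_changing": timedelta(days=90),
    "event_driven": timedelta(days=7),
    "fast_changing": timedelta(days=1),
}

def needs_reverification(verified_at: datetime, freshness: str,
                         now=None) -> bool:
    """True when a claim's last verification is older than its category TTL."""
    now = now or datetime.now(timezone.utc)
    return now - verified_at > TTL[freshness]
```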