Data Quality and Reliability Report

Confidence scoring, coverage diagnostics, and practical publication safeguards for startup and investor insights.

Snapshot reference: February 15, 2026

  • Companies with >=2 linked sources: 252,299
  • Companies with >=3 linked sources: 188,841
  • Core deep-intersection companies: 132,349
  • Premium-report-ready companies: 70,254

Executive Summary

This report defines what can be trusted, what should be qualified, and how to publish insight pages responsibly at scale. High entity volume alone does not guarantee publication quality. Quality comes from coverage depth, intersection depth, identifier reliability, and transparent caveats.

The dataset performs strongly on relationship depth. More than 252k companies are already linked to at least two related source families, and nearly 189k are linked to at least three. This supports large-scale generation of rich pages where each profile can include multi-dimensional context instead of single-field summaries.

Coverage varies by metric type. Structural fields such as country and sector are highly complete (98.8% and 97.9%), while funding-event amounts are populated for only 16.3% of funding records, so amount totals are far less complete than event counts. The practical rule is simple: counts and rankings can usually be reported with high confidence, while amount totals require explicit coverage notes.

Quality Diagnostics Charts

Company Intersection Depth
How many companies are linked across multiple source families
  • >=1 source: 280,500
  • >=2 sources: 252,299
  • >=3 sources: 188,841
  • >=4 sources: 118,347
  • all 9 sources: 428
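The chart reports cumulative thresholds, so the number of companies at exactly one, two, or three sources, and the share of the >=1 base that survives each threshold, can be derived by simple arithmetic. The sketch below uses only the counts shown in the chart; the function names are illustrative and not part of any existing pipeline.

```python
# Cumulative counts copied from the Company Intersection Depth chart.
at_least = {1: 280_500, 2: 252_299, 3: 188_841, 4: 118_347}


def exact_counts(cumulative):
    """Companies with exactly k sources, for each k whose next threshold is known."""
    depths = sorted(cumulative)
    return {k: cumulative[k] - cumulative[nxt] for k, nxt in zip(depths, depths[1:])}


def retention(cumulative):
    """Percentage of the lowest-threshold base that meets each threshold."""
    base = cumulative[min(cumulative)]
    return {k: round(100 * n / base, 1) for k, n in cumulative.items()}


exact = exact_counts(at_least)
shares = retention(at_least)

print(exact)   # {1: 28201, 2: 63458, 3: 70494}
print(shares)  # {1: 100.0, 2: 89.9, 3: 67.3, 4: 42.2}
```

The >=4 bucket cannot be split further from the published figures, so it is left as a cumulative count.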
Critical Coverage Ratios
Coverage by field family (counts converted to percentage)
  • Funding date: 93.8%
  • Funding amount: 16.3%
  • Company country: 98.8%
  • Company sector: 97.9%
Duplicate Name Groups
Text-name duplicates are expected; IDs remain stable
  • Investor name groups: 78,206
  • Company name groups: 6,941
Production-Ready Profile Tiers
How much data is available for richer pages
  • Deep intersection: 132,349
  • Premium tier: 70,254

Confidence Framework and Publishing Rules

Metric | Count | Confidence | Use Guidance
Funding records (total) | 478,975 | High | Raw event inventory
Funding records with date | 449,221 | High | Suitable for cycle timing and trend analysis
Funding records with amount | 77,863 | Medium | Use with explicit coverage caveat
Companies with country | 287,614 | High | Reliable for country-level comparisons
Companies with sector | 284,847 | High | Reliable for sector-level comparisons
Deep-intersection companies | 132,349 | High | Suitable for rich profile report generation
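The funding-date and funding-amount percentages in the Critical Coverage Ratios chart can be reproduced from the funding counts in this table. The sketch below shows the arithmetic. The 90% cut-off between "High" and "Medium" is an assumption made for illustration; the report does not state the threshold it uses.

```python
# Funding counts copied from the confidence table.
FUNDING_TOTAL = 478_975
WITH_DATE = 449_221
WITH_AMOUNT = 77_863


def coverage_pct(populated, total):
    """Percentage of records in which a field is populated, to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * populated / total, 1)


def confidence_label(pct, high_threshold=90.0):
    """Hypothetical labelling rule: the 90% threshold is an assumption."""
    return "High" if pct >= high_threshold else "Medium"


date_pct = coverage_pct(WITH_DATE, FUNDING_TOTAL)
amount_pct = coverage_pct(WITH_AMOUNT, FUNDING_TOTAL)

print(date_pct, confidence_label(date_pct))      # 93.8 High
print(amount_pct, confidence_label(amount_pct))  # 16.3 Medium
```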

Publication policy should reflect confidence level. High-confidence fields can appear in headlines and summaries. Medium-confidence fields should remain in chart or analysis sections with caveats. For strict quality control, entity-level pages should pass minimum intersection rules before publication (for example: funding + investors + people + core descriptive fields).
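A publication gate along these lines could be sketched as follows. The family names mirror the example in the paragraph above (funding, investors, people, core descriptive fields). The `EntityProfile` type, the placement labels, and the "withhold" default for unknown confidence levels are hypothetical choices, not an existing schema.

```python
from dataclasses import dataclass

# Minimum intersection rule taken from the example in the text.
REQUIRED_FAMILIES = frozenset({"funding", "investors", "people", "core_fields"})


@dataclass(frozen=True)
class EntityProfile:
    """Hypothetical shape: an entity ID plus the source families linked to it."""
    entity_id: str
    families: frozenset


def missing_families(profile, required=REQUIRED_FAMILIES):
    """Source families still needed before the entity page may be published."""
    return sorted(required - profile.families)


def is_publishable(profile, required=REQUIRED_FAMILIES):
    return not missing_families(profile, required)


def placement(confidence):
    """Where a field may appear, by confidence level (assumed labels)."""
    rules = {"High": "headline", "Medium": "chart_with_caveat"}
    return rules.get(confidence, "withhold")


full = EntityProfile("c-001", frozenset({"funding", "investors", "people", "core_fields"}))
thin = EntityProfile("c-002", frozenset({"funding", "core_fields"}))

print(is_publishable(full))    # True
print(missing_families(thin))  # ['investors', 'people']
print(placement("Medium"))     # chart_with_caveat
```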

Identifier policy is equally important. Do not merge or deduplicate records by display names alone. Use stable identifiers (`_id`, profile slug, public ID) and treat display names as presentational attributes. This prevents false merges and preserves analytical reliability across large-scale report generation.
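As a minimal illustration of this policy, the sketch below groups records by `_id` and reports shared display names without merging them. The sample records and field names other than `_id` are invented for the example.

```python
from collections import defaultdict

# Invented sample rows: two distinct companies share a display name,
# and one company appears twice under slightly different names.
records = [
    {"_id": "c-001", "name": "Acme Labs", "country": "US"},
    {"_id": "c-002", "name": "Acme Labs", "country": "DE"},
    {"_id": "c-001", "name": "ACME Labs Inc.", "country": "US"},
]


def group_by_id(rows):
    """Identity is decided by the stable ID only, never by the display name."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["_id"]].append(row)
    return dict(groups)


def duplicate_name_groups(rows):
    """Display names used by more than one distinct ID (reported, not merged)."""
    ids_by_name = defaultdict(set)
    for row in rows:
        ids_by_name[row["name"].casefold()].add(row["_id"])
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


entities = group_by_id(records)
shared_names = duplicate_name_groups(records)

print(len(entities))  # 2
print(shared_names)   # {'acme labs': {'c-001', 'c-002'}} (set order may vary)
```

A name-based merge would have collapsed the US and DE companies into one entity, which is exactly the false merge the policy is meant to prevent.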

  • Show coverage notes directly beside amount-based charts.
  • Enforce ID-based identity logic for all profile-level pages.
  • Use minimum quality tiers before generating deep profile reports.
  • Retain methodology and caveat sections on every indexable report page.

Year-Dimension Reliability Note

Year-dimension quality is strong wherever dates are available, which supports robust cycle analysis. For year-based reporting, prioritize date-complete series, and do not draw firm conclusions from amount-only subsets without stating the coverage ratio.
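One way to follow this guidance is to carry the amount coverage ratio alongside every yearly total, so the caveat travels with the number. The sketch below uses invented sample events and hypothetical field names (`year`, `amount`); it is an illustration, not the report's actual aggregation logic.

```python
from collections import defaultdict

# Invented sample funding events; None marks a missing amount.
events = [
    {"year": 2024, "amount": 5_000_000},
    {"year": 2024, "amount": None},
    {"year": 2024, "amount": None},
    {"year": 2025, "amount": 1_200_000},
    {"year": 2025, "amount": None},
]


def yearly_summary(rows):
    """Per-year event count, amount total, and share of events with an amount."""
    acc = defaultdict(lambda: {"events": 0, "with_amount": 0, "amount_total": 0})
    for row in rows:
        bucket = acc[row["year"]]
        bucket["events"] += 1
        if row["amount"] is not None:
            bucket["with_amount"] += 1
            bucket["amount_total"] += row["amount"]
    summary = {}
    for year in sorted(acc):
        bucket = acc[year]
        bucket["amount_coverage_pct"] = round(
            100 * bucket["with_amount"] / bucket["events"], 1
        )
        summary[year] = bucket
    return summary


summary = yearly_summary(events)
for year, row in summary.items():
    print(year, row["events"], row["amount_total"], f'{row["amount_coverage_pct"]}%')
```

Event counts per year can be published directly, while each amount total should be shown with its coverage percentage.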

Related Reports (Quality Tier Exploration)

Quality tiers are designed to control confidence at scale. The fastest way to understand impact is to keep the scenario fixed and vary only the quality tier. The examples below are generated for the most recent years in the dataset (2025, 2024).

Related Report | URL | Why This Helps
Investor Intelligence Report | /data/insights/report/investor-intelligence/ | Use quality tiers together with investor activity and market tags to improve matching and outreach efficiency.
Funding Cycle Report (Year Dimension) | /data/insights/report/funding-cycle/ | Date coverage is strong; use year-dimension reporting heavily and treat amount totals with caveats.
Hypercube Builder (8 Dimensions) | /data/insights/report/hypercube/ | Build deep scenario reports and choose a data-quality tier for confidence control.
Matrix Catalog (Country x Sector x Year) | /data/insights/report/matrix/ | High-detail intersection reports where quality caveats can be applied consistently.
Matrix Example (2025): United States x Internet | /data/insights/report/matrix/united-states/internet/2025/ | Same market slice with a newer year dimension, then use quality tiers for confidence control.
Quality Tier Compare (2025): Single-Source Coverage | /data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/single-source/ | Lower confidence tier for broader coverage; suitable for discovery with stronger caveats.
Quality Tier Compare (2025): Dual-Source Coverage | /data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/dual-source/ | Moderate confidence tier; better for lists and ranking comparisons.
Quality Tier Compare (2025): Triple-Source Coverage | /data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/triple-source/ | Higher confidence tier; useful for richer pages and stronger claims.
Quality Tier Compare (2025): Quad-Source Coverage | /data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/quad-source/ | Strong confidence tier; recommended default for published scenario reports.
Quality Tier Compare (2025): Nine-Source Verified | /data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/nine-source-verified/ | Highest confidence tier; best for premium pages and externally cited claims.
Matrix Example (2024): United States x Internet | /data/insights/report/matrix/united-states/internet/2024/ | Same market slice with a newer year dimension, then use quality tiers for confidence control.
Quality Tier Compare (2024): Single-Source Coverage | /data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/single-source/ | Lower confidence tier for broader coverage; suitable for discovery with stronger caveats.
Quality Tier Compare (2024): Dual-Source Coverage | /data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/dual-source/ | Moderate confidence tier; better for lists and ranking comparisons.
Quality Tier Compare (2024): Triple-Source Coverage | /data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/triple-source/ | Higher confidence tier; useful for richer pages and stronger claims.
Quality Tier Compare (2024): Quad-Source Coverage | /data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/quad-source/ | Strong confidence tier; recommended default for published scenario reports.
Quality Tier Compare (2024): Nine-Source Verified | /data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/nine-source-verified/ | Highest confidence tier; best for premium pages and externally cited claims.
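The tier slugs in these URLs suggest a simple mapping from the number of linked source families to a quality tier. The sketch below assumes each tier only requires a minimum source count (1, 2, 3, 4, and all 9); that reading is inferred from the slug names and the intersection-depth chart, and the real tiers may apply further checks, particularly for the "verified" tier.

```python
# Assumed minimum linked-source count per tier slug, highest tier first.
TIER_MIN_SOURCES = [
    ("nine-source-verified", 9),
    ("quad-source", 4),
    ("triple-source", 3),
    ("dual-source", 2),
    ("single-source", 1),
]


def highest_tier(linked_sources):
    """Strictest tier slug an entity qualifies for, or None with no sources."""
    for slug, minimum in TIER_MIN_SOURCES:
        if linked_sources >= minimum:
            return slug
    return None


def qualifies(linked_sources, tier_slug):
    """True if the entity meets the assumed minimum for the given tier."""
    minimum = dict(TIER_MIN_SOURCES)[tier_slug]
    return linked_sources >= minimum


print(highest_tier(5))             # quad-source
print(highest_tier(9))             # nine-source-verified
print(qualifies(3, "quad-source")) # False
```

Because the tiers are nested under this assumption, an entity that qualifies for a stricter tier also appears in every broader one, which is what makes the fixed-scenario comparison across tiers meaningful.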

FAQ

Can we safely generate thousands of pages?

Yes, especially when pages are gated by quality tier. More than 70k premium-tier entities are available for richer pages.

Why include caveats if the dataset is large?

Scale does not remove field-level coverage gaps; caveats preserve analytical integrity.

Should we deduplicate by name?

No. Use stable IDs for identity and keep name duplicates as expected real-world behavior.

Which metrics are safest for headline claims?

Entity counts and high-completeness structural dimensions such as country and sector.

How do we improve confidence over time?

Run periodic enrichment and normalization workflows, then regenerate metrics and report pages.
