Data Quality and Reliability Report

Confidence scoring, coverage diagnostics, and practical publication safeguards for startup and investor insights.

Snapshot reference: February 15, 2026


Companies with >=2 linked sources: 252,299
Companies with >=3 linked sources: 188,841
Core deep-intersection companies: 132,349
Premium-report-ready companies: 70,254

Executive Summary

This report defines what can be trusted, what should be qualified, and how to publish insight pages responsibly at scale. High entity volume alone does not guarantee publication quality. Quality comes from coverage depth, intersection depth, identifier reliability, and transparent caveats.

The dataset performs strongly on relationship depth. More than 252k companies are already linked to at least two related source families, and nearly 189k are linked to at least three. This supports large-scale generation of rich pages where each profile can include multi-dimensional context instead of single-field summaries.

Coverage varies by metric type. Structural fields such as country (98.8%) and sector (97.9%) are highly complete, while only 16.3% of funding records carry an amount, compared with 93.8% that carry a date. The practical rule is simple: counts and rankings can usually be reported with high confidence, while amount totals require explicit coverage notes.

Quality Diagnostics Charts

Company Intersection Depth

How many companies are linked across multiple source families

>=1 source: 280,500
>=2 sources: 252,299
>=3 sources: 188,841
>=4 sources: 118,347
All 9 sources: 428

FasterCapital
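The chart values are cumulative (each threshold includes the deeper ones), so the number of companies at exactly a given depth follows by subtraction. A small worked sketch using only the counts shown above; the function and variable names are illustrative, and the "all 9" figure is left out because the thresholds between 4 and 9 are not listed:

```python
# Cumulative counts from the chart: companies linked to at least k source families.
at_least = {1: 280_500, 2: 252_299, 3: 188_841, 4: 118_347}

def exact_depth_bands(cumulative):
    """Convert 'at least k' counts into 'exactly k' counts.

    Assumes the thresholds are consecutive integers. The deepest listed
    threshold stays open-ended ('k or more') because nothing beyond it is known.
    """
    ks = sorted(cumulative)
    bands = {}
    for k, next_k in zip(ks, ks[1:]):
        bands[f"exactly {k}"] = cumulative[k] - cumulative[next_k]
    bands[f"{ks[-1]} or more"] = cumulative[ks[-1]]
    return bands

bands = exact_depth_bands(at_least)
for label, count in bands.items():
    print(f"{label}: {count:,}")
```

The bands sum back to the ">=1 source" total, which is a quick consistency check on the chart figures.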

Critical Coverage Ratios

Coverage by field family (counts converted to percentage)

Funding date: 93.8%
Funding amount: 16.3%
Company country: 98.8%
Company sector: 97.9%
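The two funding ratios can be reproduced from the record counts listed in the confidence table later in this report (478,975 funding records in total, 449,221 with a date, 77,863 with an amount). A minimal sketch; the function name is illustrative. The company-level ratios are not recomputed here because the report does not state the total company count used as their denominator:

```python
def coverage_pct(populated, total):
    """Share of records with the field populated, as a percentage rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * populated / total, 1)

FUNDING_RECORDS = 478_975  # total funding records, from the confidence table

date_coverage = coverage_pct(449_221, FUNDING_RECORDS)
amount_coverage = coverage_pct(77_863, FUNDING_RECORDS)

print(f"Funding date: {date_coverage}%")      # Funding date: 93.8%
print(f"Funding amount: {amount_coverage}%")  # Funding amount: 16.3%
```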


Duplicate Name Groups

Text-name duplicates are expected; IDs remain stable

Investor name groups: 78,206
Company name groups: 6,941


Production-Ready Profile Tiers

How much data is available for richer pages

Deep intersection: 132,349
Premium tier: 70,254


Confidence Framework and Publishing Rules

Metric	Count	Confidence	Use Guidance
Funding records (total)	478,975	High	Raw event inventory
Funding records with date	449,221	High	Suitable for cycle timing and trend analysis
Funding records with amount	77,863	Medium	Use with explicit coverage caveat
Companies with country	287,614	High	Reliable for country-level comparisons
Companies with sector	284,847	High	Reliable for sector-level comparisons
Deep-intersection companies	132,349	High	Suitable for rich profile report generation

Publication policy should reflect confidence level. High-confidence fields can appear in headlines and summaries. Medium-confidence fields should remain in chart or analysis sections with caveats. For strict quality control, entity-level pages should pass minimum intersection rules before publication (for example: funding + investors + people + core descriptive fields).
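One way to make this policy enforceable is to encode it as data and check it before a page is generated. The sketch below is illustrative only: the section names, the field-family names, and the decision to block unknown confidence levels are assumptions, and it assumes High-confidence fields may also appear in chart and analysis sections.

```python
# Placement rules from the policy text: High may appear in headlines and
# summaries; Medium stays in chart or analysis sections and carries a caveat.
PLACEMENT = {
    "High": {"sections": ("headline", "summary", "chart", "analysis"), "caveat_required": False},
    "Medium": {"sections": ("chart", "analysis"), "caveat_required": True},
}

# Hypothetical family names for the example rule
# "funding + investors + people + core descriptive fields".
REQUIRED_FAMILIES = frozenset({"funding", "investors", "people", "core_fields"})

def may_appear_in(confidence, section):
    """True if a field with this confidence level may be shown in the given section."""
    rule = PLACEMENT.get(confidence)
    # Unknown confidence levels are blocked everywhere (a conservative assumption).
    return rule is not None and section in rule["sections"]

def passes_minimum_intersection(linked_families):
    """True if an entity has every required field family linked."""
    return REQUIRED_FAMILIES.issubset(linked_families)

print(may_appear_in("Medium", "headline"))                       # False
print(may_appear_in("High", "headline"))                         # True
print(passes_minimum_intersection({"funding", "investors"}))     # False
```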

Identifier policy is equally important. Do not merge or deduplicate records by display names alone. Use stable identifiers (`_id`, profile slug, public ID) and treat display names as presentational attributes. This prevents false merges and preserves analytical reliability across large-scale report generation.
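The difference between ID-based identity and name-based matching can be sketched as follows. Only `_id` comes from the text above; the `name` field and the sample records are hypothetical, made up for illustration:

```python
from collections import defaultdict

def group_by_stable_id(records, id_field="_id"):
    """Group records by stable identifier; display names never affect grouping."""
    groups = defaultdict(list)
    for record in records:
        groups[record[id_field]].append(record)
    return dict(groups)

def duplicate_name_groups(records, id_field="_id", name_field="name"):
    """List display names shared by more than one distinct ID (for review, not merging)."""
    ids_by_name = defaultdict(set)
    for record in records:
        ids_by_name[record[name_field].strip().casefold()].add(record[id_field])
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}

# Illustrative records, not real data.
records = [
    {"_id": "c-001", "name": "Acme Labs"},
    {"_id": "c-002", "name": "Acme Labs"},       # different company, same display name
    {"_id": "c-001", "name": "ACME Labs Inc."},  # same company under a newer name
]

entities = group_by_stable_id(records)
shared_names = duplicate_name_groups(records)
print(len(entities))   # 2 entities, even though the names collide and drift
print(shared_names)
```

Merging by name would have collapsed `c-001` and `c-002` into one entity and split the renamed `c-001` record into a separate one; grouping by ID avoids both errors while still surfacing the name collision for review.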

Show coverage notes directly beside amount-based charts.
Enforce ID-based identity logic for all profile-level pages.
Use minimum quality tiers before generating deep profile reports.
Retain methodology and caveat sections on every indexable report page.

Year-Dimension Reliability Note

Year-dimension quality is strong when dates are available, which supports robust cycle analysis. For year-based reporting, prioritize date-complete series and avoid drawing firm conclusions from amount-only subsets without stating the coverage ratio.
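A yearly aggregation that follows this note might count events from dated records only and report each amount total together with its coverage ratio. The sketch below is a hedged illustration: the `year` and `amount` field names and the sample records are assumptions, not the dataset's actual schema.

```python
from collections import defaultdict

def yearly_funding_summary(records):
    """Per-year event counts plus amount totals with their coverage ratio.

    Records without a year are skipped, so the series is date-complete.
    """
    buckets = defaultdict(lambda: {"events": 0, "with_amount": 0, "amount_total": 0.0})
    for rec in records:
        year = rec.get("year")
        if year is None:
            continue  # undated records cannot be placed on the year axis
        bucket = buckets[year]
        bucket["events"] += 1
        amount = rec.get("amount")
        if amount is not None:
            bucket["with_amount"] += 1
            bucket["amount_total"] += amount
    summary = {}
    for year in sorted(buckets):
        b = buckets[year]
        # Coverage ratio: share of that year's events that report an amount.
        summary[year] = {**b, "amount_coverage": b["with_amount"] / b["events"]}
    return summary

# Illustrative records, not real data.
sample = [
    {"year": 2025, "amount": 2_000_000},
    {"year": 2025, "amount": None},
    {"year": 2025},
    {"year": 2024, "amount": 500_000},
    {"year": None, "amount": 750_000},  # undated: excluded from the year series
]

summary = yearly_funding_summary(sample)
for year, row in summary.items():
    print(year, row["events"], row["amount_total"], f"{row['amount_coverage']:.0%}")
```

Publishing `amount_total` next to `amount_coverage` lets readers see how much of each year's activity the total actually represents.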

Related Reports (Quality Tier Exploration)

Quality tiers are designed to control confidence at scale. The fastest way to understand impact is to keep the scenario fixed and vary only the quality tier. The examples below are generated for the most recent years in the dataset (2025, 2024).

Related Report	URL	Why This Helps
Investor Intelligence Report	/data/insights/report/investor-intelligence/	Use quality tiers together with investor activity and market tags to improve matching and outreach efficiency.
Funding Cycle Report (Year Dimension)	/data/insights/report/funding-cycle/	Date coverage is strong; rely on year-dimension reporting and treat amount totals with caveats.
Hypercube Builder (8 Dimensions)	/data/insights/report/hypercube/	Build deep scenario reports and choose a data-quality tier for confidence control.
Matrix Catalog (Country x Sector x Year)	/data/insights/report/matrix/	High-detail intersection reports where quality caveats can be applied consistently.
Matrix Example (2025): United States x Internet	/data/insights/report/matrix/united-states/internet/2025/	The most recent year for this market slice; apply quality tiers for confidence control.
Quality Tier Compare (2025): Single-Source Coverage	/data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/single-source/	Lower confidence tier for broader coverage; suitable for discovery with stronger caveats.
Quality Tier Compare (2025): Dual-Source Coverage	/data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/dual-source/	Moderate confidence tier; better for lists and ranking comparisons.
Quality Tier Compare (2025): Triple-Source Coverage	/data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/triple-source/	Higher confidence tier; useful for richer pages and stronger claims.
Quality Tier Compare (2025): Quad-Source Coverage	/data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/quad-source/	Strong confidence tier; recommended default for published scenario reports.
Quality Tier Compare (2025): Nine-Source Verified	/data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/nine-source-verified/	Highest confidence tier; best for premium pages and externally cited claims.
Matrix Example (2024): United States x Internet	/data/insights/report/matrix/united-states/internet/2024/	The same market slice for the prior year; apply quality tiers for confidence control.
Quality Tier Compare (2024): Single-Source Coverage	/data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/single-source/	Lower confidence tier for broader coverage; suitable for discovery with stronger caveats.
Quality Tier Compare (2024): Dual-Source Coverage	/data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/dual-source/	Moderate confidence tier; better for lists and ranking comparisons.
Quality Tier Compare (2024): Triple-Source Coverage	/data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/triple-source/	Higher confidence tier; useful for richer pages and stronger claims.
Quality Tier Compare (2024): Quad-Source Coverage	/data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/quad-source/	Strong confidence tier; recommended default for published scenario reports.
Quality Tier Compare (2024): Nine-Source Verified	/data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/nine-source-verified/	Highest confidence tier; best for premium pages and externally cited claims.
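The tier-comparison URLs in the table share one scenario path and differ only in the final segment. A minimal sketch of generating such a set follows; the path pattern and tier slugs are copied from the table, while the function name and the parameter names (guesses at what each path segment represents) are assumptions:

```python
# Tier slugs exactly as they appear in the table above.
TIERS = ("single-source", "dual-source", "triple-source", "quad-source", "nine-source-verified")

BASE = "/data/insights/report/hypercube"

def tier_urls(country, sector, year, status, investor_type, category, deal_band):
    """Build one URL per quality tier for a fixed scenario.

    Parameter names are illustrative labels for the path segments.
    """
    scenario = "/".join([country, sector, str(year), status, investor_type, category, deal_band])
    return {tier: f"{BASE}/{scenario}/{tier}/" for tier in TIERS}

urls = tier_urls("united-states", "internet", 2025, "alive",
                 "venture-capital", "software", "5-24-deals")
for tier, url in urls.items():
    print(tier, url)
```

Holding the seven scenario segments fixed and iterating only over the tier keeps the comparison controlled, which is the approach described at the start of this section.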

FAQ

Can we safely generate thousands of pages?

Yes, especially when quality tiers are applied. More than 70k premium-report-ready companies are available for richer pages.

Why include caveats if the dataset is large?

Scale does not remove field-level coverage gaps; caveats preserve analytical integrity.

Should we deduplicate by name?

No. Use stable IDs for identity and keep name duplicates as expected real-world behavior.

Which metrics are safest for headline claims?

Entity counts and high-completeness structural dimensions such as country and sector.

How do we improve confidence over time?

Run periodic enrichment and normalization workflows, then regenerate metrics and report pages.