Data Quality and Reliability Report
Confidence scoring, coverage diagnostics, and practical publication safeguards for startup and investor insights.
Snapshot reference: February 15, 2026
Executive Summary
This report defines what can be trusted, what should be qualified, and how to publish insight pages responsibly at scale. High entity volume alone does not guarantee publication quality. Quality comes from coverage depth, intersection depth, identifier reliability, and transparent caveats.
The dataset performs strongly on relationship depth. More than 252k companies are already linked to at least two related source families, and nearly 189k are linked to at least three. This supports large-scale generation of rich pages where each profile can include multi-dimensional context instead of single-field summaries.
Coverage varies by metric type. Structural fields such as country and sector are highly complete, while funding-event amount values are less complete than funding-event counts. The practical rule is simple: counts and rankings can usually be reported with high confidence, while amount totals require explicit coverage notes.
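The count-versus-amount rule falls directly out of the field-coverage figures in the confidence table. A minimal sketch, using the snapshot's own record counts (the thresholds and variable names are illustrative, not product rules):

```python
# Field-coverage ratios from the February 15, 2026 snapshot.
TOTAL_FUNDING_RECORDS = 478_975
WITH_DATE = 449_221
WITH_AMOUNT = 77_863

def coverage(n_complete: int, n_total: int) -> float:
    """Share of records where the field is populated."""
    return n_complete / n_total

date_cov = coverage(WITH_DATE, TOTAL_FUNDING_RECORDS)      # ~94%: safe for trend analysis
amount_cov = coverage(WITH_AMOUNT, TOTAL_FUNDING_RECORDS)  # ~16%: explicit caveat required

print(f"date coverage:   {date_cov:.1%}")
print(f"amount coverage: {amount_cov:.1%}")
```

At roughly 16% amount coverage, any dollar total describes a minority subset of events, which is why counts and rankings remain the headline-safe metrics.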
Quality Diagnostics Charts
Confidence Framework and Publishing Rules
| Metric | Count | Confidence | Use Guidance |
|---|---|---|---|
| Funding records (total) | 478,975 | High | Raw event inventory |
| Funding records with date | 449,221 | High | Suitable for cycle timing and trend analysis |
| Funding records with amount | 77,863 | Medium | Use with explicit coverage caveat |
| Companies with country | 287,614 | High | Reliable for country-level comparisons |
| Companies with sector | 284,847 | High | Reliable for sector-level comparisons |
| Deep-intersection companies | 132,349 | High | Suitable for rich profile report generation |
Publication policy should reflect confidence level. High-confidence fields can appear in headlines and summaries. Medium-confidence fields should remain in chart or analysis sections with caveats. For strict quality control, entity-level pages should pass minimum intersection rules before publication (for example: funding + investors + people + core descriptive fields).
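The policy above can be expressed as two small gates. This is a sketch under assumptions: the source-family names mirror the example in the text (funding + investors + people + core descriptive fields), and the record shape is hypothetical.

```python
from dataclasses import dataclass, field

# Minimum-intersection rule from the text; family names are illustrative.
REQUIRED_FAMILIES = {"funding", "investors", "people", "core_fields"}

@dataclass
class CompanyRecord:
    name: str
    linked_families: set = field(default_factory=set)
    confidence: str = "medium"  # "high" | "medium" | "low"

def publishable_as_profile(rec: CompanyRecord) -> bool:
    """Entity-level pages must pass the intersection rule before publication."""
    return REQUIRED_FAMILIES <= rec.linked_families

def headline_safe(rec: CompanyRecord) -> bool:
    """Only high-confidence fields may appear in headlines and summaries."""
    return rec.confidence == "high"

rich = CompanyRecord("Acme", {"funding", "investors", "people", "core_fields"}, "high")
thin = CompanyRecord("Blip", {"funding"})
print(publishable_as_profile(rich), publishable_as_profile(thin))  # True False
```

Keeping the gate as a pure predicate makes it easy to audit: a page generator can log exactly which families a rejected entity was missing.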
Identifier policy is equally important. Do not merge or deduplicate records by display names alone. Use stable identifiers (`_id`, profile slug, public ID) and treat display names as presentational attributes. This prevents false merges and preserves analytical reliability across large-scale report generation.
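A minimal sketch of ID-based identity logic, assuming a hypothetical record shape with a stable `_id` field. Note that the two "Apex Labs" rows are distinct entities that happen to share a display name, while the name variant with the same `_id` is the same entity:

```python
# Merge only on the stable identifier, never on display name.
records = [
    {"_id": "c-001", "name": "Apex Labs", "country": "US"},
    {"_id": "c-002", "name": "Apex Labs", "country": "DE"},       # same name, distinct entity
    {"_id": "c-001", "name": "Apex Labs Inc.", "country": "US"},  # same entity, name variant
]

def dedupe_by_id(rows):
    """Keep one record per stable ID; display names stay presentational."""
    seen = {}
    for row in rows:
        # First occurrence wins here; a real pipeline would merge field values.
        seen.setdefault(row["_id"], row)
    return list(seen.values())

deduped = dedupe_by_id(records)
print(len(deduped))  # 2: the name collision survives, the ID duplicate is merged
```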
- Show coverage notes directly beside amount-based charts.
- Enforce ID-based identity logic for all profile-level pages.
- Use minimum quality tiers before generating deep profile reports.
- Retain methodology and caveat sections on every indexable report page.
Year-Dimension Reliability Note
Year-dimension quality is strong when dates are available, which supports robust cycle analysis. For year-based reporting, prioritize date-complete series and avoid drawing firm conclusions from amount-only subsets without stating the coverage ratio.
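One way to operationalize this rule is to drop date-incomplete rows from the year series and attach the per-year amount-coverage ratio alongside the counts, so amount totals can never ship without their caveat. The event tuples below are invented for illustration:

```python
from collections import defaultdict

# Illustrative funding events: (year or None, amount_usd or None).
events = [
    (2024, 5_000_000), (2024, None), (2024, None),
    (2025, 12_000_000), (2025, None), (None, 2_000_000),
]

def yearly_series(rows):
    """Year counts from date-complete records only, with the
    amount-coverage ratio attached to each year."""
    counts = defaultdict(int)
    with_amount = defaultdict(int)
    for year, amount in rows:
        if year is None:  # exclude date-incomplete rows from the year dimension
            continue
        counts[year] += 1
        if amount is not None:
            with_amount[year] += 1
    return {
        y: {"deals": counts[y], "amount_coverage": with_amount[y] / counts[y]}
        for y in counts
    }

series = yearly_series(events)
print(series[2024])  # deal count plus its amount-coverage ratio
```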
Related Reports (Quality Tier Exploration)
Quality tiers are designed to control confidence at scale. The fastest way to understand their impact is to hold the scenario fixed and vary only the quality tier. The examples below cover the two most recent years in the dataset (2025 and 2024).
| Related Report | URL | Why This Helps |
|---|---|---|
| Investor Intelligence Report | /data/insights/report/investor-intelligence/ | Use quality tiers together with investor activity and market tags to improve matching and outreach efficiency. |
| Funding Cycle Report (Year Dimension) | /data/insights/report/funding-cycle/ | Dates are strong coverage; use year-dimension reporting heavily and treat amount totals with caveats. |
| Hypercube Builder (8 Dimensions) | /data/insights/report/hypercube/ | Build deep scenario reports and choose a data-quality tier for confidence control. |
| Matrix Catalog (Country x Sector x Year) | /data/insights/report/matrix/ | High-detail intersection reports where quality caveats can be applied consistently. |
| Matrix Example (2025): United States x Internet | /data/insights/report/matrix/united-states/internet/2025/ | Market slice for the most recent year; apply quality tiers for confidence control. |
| Quality Tier Compare (2025): Single-Source Coverage | /data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/single-source/ | Lower confidence tier for broader coverage; suitable for discovery with stronger caveats. |
| Quality Tier Compare (2025): Dual-Source Coverage | /data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/dual-source/ | Moderate confidence tier; better for lists and ranking comparisons. |
| Quality Tier Compare (2025): Triple-Source Coverage | /data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/triple-source/ | Higher confidence tier; useful for richer pages and stronger claims. |
| Quality Tier Compare (2025): Quad-Source Coverage | /data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/quad-source/ | Strong confidence tier; recommended default for published scenario reports. |
| Quality Tier Compare (2025): Nine-Source Verified | /data/insights/report/hypercube/united-states/internet/2025/alive/venture-capital/software/5-24-deals/nine-source-verified/ | Highest confidence tier; best for premium pages and externally cited claims. |
| Matrix Example (2024): United States x Internet | /data/insights/report/matrix/united-states/internet/2024/ | Same market slice for the prior year; apply quality tiers for confidence control. |
| Quality Tier Compare (2024): Single-Source Coverage | /data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/single-source/ | Lower confidence tier for broader coverage; suitable for discovery with stronger caveats. |
| Quality Tier Compare (2024): Dual-Source Coverage | /data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/dual-source/ | Moderate confidence tier; better for lists and ranking comparisons. |
| Quality Tier Compare (2024): Triple-Source Coverage | /data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/triple-source/ | Higher confidence tier; useful for richer pages and stronger claims. |
| Quality Tier Compare (2024): Quad-Source Coverage | /data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/quad-source/ | Strong confidence tier; recommended default for published scenario reports. |
| Quality Tier Compare (2024): Nine-Source Verified | /data/insights/report/hypercube/united-states/internet/2024/alive/venture-capital/software/5-24-deals/nine-source-verified/ | Highest confidence tier; best for premium pages and externally cited claims. |
FAQ
Can we safely generate thousands of pages?
Yes, provided quality tiers gate what each page may claim. More than 70k premium entities qualify for richer pages.
Why include caveats if data is large?
Scale does not remove field-level coverage gaps; caveats preserve analytical integrity.
Should we deduplicate by name?
No. Use stable IDs for identity and keep name duplicates as expected real-world behavior.
Which metrics are safest for headline claims?
Entity counts and high-completeness structural dimensions such as country and sector.
How do we improve confidence over time?
Run periodic enrichment and normalization workflows, then regenerate metrics and report pages.
