Properties of a High Quality OSV Record
Version
1.0.0 (SEMVER)
Purpose
Describe the “good enough” OSV record that will be imported by OSV.dev
Out of scope
This does not discuss the problem of record bit rot over time, after initial successful import. Continuous revalidation, and the treatment of records that have already been successfully imported, will be dealt with separately in a companion document, Managing the Perishability of OSV Records.
Deferred to a future iteration: validating the existence of vulnerable functions in the `ecosystem_specific` field, if supplied.
Audience
- OSV record producers
- Downstream OSV.dev record consumers
Rationale
OSV.dev seeks to be a comprehensive, accurate and timely database of known vulnerabilities that is highly automation friendly. In order to meet this accuracy goal, a quality bar needs to be both defined and sustainably enforced.
Properties of a High Quality OSV Record
Valid
As a prerequisite, it is assumed that a record passes JSON Schema validation for the version of the OSV Schema it declares itself to comply with in the `schema_version` field, or 1.0.0 if it does not. It is also assumed that the vulnerability discussed in the OSV record is valid and affects the software described.
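The fallback rule above can be expressed as a small Python helper that decides which OSV Schema version a record should be checked against. This is a sketch, not OSV.dev's implementation. The function name `effective_schema_version` is hypothetical, and the helper accepts only plain MAJOR.MINOR.PATCH strings. The JSON Schema validation itself is left out, because it would need a JSON Schema library and the matching schema file.

```python
import re

# Per the rule above: records that omit schema_version are treated as 1.0.0.
DEFAULT_SCHEMA_VERSION = "1.0.0"

# Assumption: only plain MAJOR.MINOR.PATCH strings are accepted here;
# pre-release and build suffixes are deliberately not handled.
SEMVER_CORE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def effective_schema_version(record: dict) -> str:
    """Return the OSV Schema version a record should be validated against."""
    version = record.get("schema_version", DEFAULT_SCHEMA_VERSION)
    if not isinstance(version, str) or not SEMVER_CORE.match(version):
        raise ValueError(f"schema_version is not MAJOR.MINOR.PATCH: {version!r}")
    return version
```

A record without `schema_version` therefore resolves to `"1.0.0"`, while a malformed value fails before any schema lookup is attempted.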
Precise
A high quality OSV record allows a consumer of that record to answer the following questions in an automated way, at scale:
- “Does this vulnerability, as described, impact me?”
- “What version do I need to upgrade to, or what patches do I need to apply, for it not to impact me?”
- “Should I replace or remove this (potentially orphaned) package with known unfixed vulnerabilities?”
The definition of “impact” will vary depending on how fine-grained the information available is (i.e. package-level or symbol-level for software library packages). Package-level precision is the minimum standard.
- for version and commit ranges
  - `affected[].ranges[].events[].introduced` is defined
  - prefer `affected[].ranges[].events[].fixed` over `affected[].ranges[].events[].last_affected`
    - this minimizes false negatives
  - distinct ranges for `introduced..fixed` and/or `introduced..last_affected` (i.e. introduced and fixed versions or commits can't be the same)
  - values in `introduced` are before/less than `fixed`/`last_affected` according to the canonical package registry or project version control
- for version (`ECOSYSTEM` and `SEMVER`) ranges
  - the versions exist in the specified package ecosystem
- for commit (`GIT`) ranges
  - the commits exist in the specified `repo` (i.e. they are not from another GitHub fork)
- the `package.ecosystem`, and a unique identifier prefix for it, are defined in the OSV Schema
- the `package.name` exists within the defined `package.ecosystem`, and is canonically encoded to be unambiguous (i.e. normalized)
- Package URLs in the `package.purl` field conform to the Package URL (purl) specification
- `references` URLs return a 2xx or 3xx response at the time of publication
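Two of the checks above lend themselves to simple automation. The Python sketch below uses hypothetical helper names and is not OSV.dev's own code. It does two things:

- It checks the ordering rules for a `SEMVER` range.
- It does a shallow structural check of a Package URL (purl) string.

The range check rests on three assumptions. Events are listed as introduced/upper-bound pairs. Only MAJOR.MINOR.PATCH is compared, so pre-release and build suffixes are ignored. `introduced: "0"` is treated as "affected from the start". The purl check covers only the scheme, type and name, not the full specification. Neither helper confirms that versions or packages actually exist in a registry.

```python
import re


def semver_key(version: str) -> tuple:
    """Simplified ordering key: MAJOR.MINOR.PATCH only.

    Assumption: pre-release and build metadata are stripped, so this is
    NOT a full SemVer 2.0.0 comparison. Malformed versions raise ValueError.
    """
    core = version.split("+", 1)[0].split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return (major, minor, patch)


def semver_range_problems(events: list) -> list:
    """Check one SEMVER range's events against the range rules above.

    Assumption: events appear as introduced -> fixed/last_affected pairs.
    """
    problems = []
    current_introduced = None
    saw_introduced = False
    for event in events:
        if "introduced" in event:
            current_introduced = event["introduced"]
            saw_introduced = True
        for key in ("fixed", "last_affected"):
            if key not in event:
                continue
            upper = event[key]
            if current_introduced is None:
                problems.append(f"{key} {upper} has no preceding introduced")
            elif current_introduced != "0" and (
                semver_key(current_introduced) >= semver_key(upper)
            ):
                # Equal or inverted bounds; "0" always precedes any version.
                problems.append(f"introduced {current_introduced} is not before {key} {upper}")
            if key == "last_affected":
                problems.append(f"last_affected {upper} used; prefer fixed where a fix exists (advisory)")
            current_introduced = None
    if not saw_introduced:
        problems.append("no introduced event")
    return problems


# purl type: ASCII letters, digits, '.', '+', '-'; must not start with a digit.
PURL_TYPE = re.compile(r"^[A-Za-z.+-][A-Za-z0-9.+-]*$")


def purl_problems(purl: str) -> list:
    """Shallow structural purl check; NOT a full purl specification parser."""
    if not purl.startswith("pkg:"):
        return ["does not start with 'pkg:'"]
    # Drop the scheme, then any #subpath and ?qualifiers.
    rest = purl[len("pkg:"):].split("#", 1)[0].split("?", 1)[0]
    if "/" not in rest:
        return ["missing '/' between type and name"]
    purl_type, path = rest.split("/", 1)
    problems = []
    if not PURL_TYPE.match(purl_type):
        problems.append(f"invalid type {purl_type!r}")
    # The version, if any, follows the last '@'; the name is the last path segment.
    name = path.rsplit("@", 1)[0].rstrip("/").rsplit("/", 1)[-1]
    if not name:
        problems.append("empty name")
    return problems
```

Comparing numeric tuples rather than raw strings matters here. As strings, `"1.10.0"` sorts before `"1.9.0"`, which would hide an inverted range.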
Identifiable
- Where relevant, an `aliases` entry for the equivalent CVE record is present
- Where an OSV record consolidates multiple vulnerabilities in another ecosystem (or universe), multiple `related` identifiers are present
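A sketch of how the identifiability rules might be checked in Python. Some inputs are judgment calls that cannot be read off the record itself: whether a CVE alias is relevant, and how many upstream vulnerabilities a record consolidates. The caller passes both in as assumptions. The helper names are hypothetical, and the sketch assumes one `related` ID per consolidated vulnerability.

```python
import re

# A CVE ID: "CVE-", a four-digit year, then a sequence number of four or more digits.
CVE_ID = re.compile(r"^CVE-\d{4}-\d{4,}$")


def cve_aliases(record: dict) -> list:
    """Return the CVE IDs listed in the record's `aliases` field."""
    return [alias for alias in record.get("aliases", []) if CVE_ID.match(alias)]


def identifiability_problems(record: dict, expects_cve_alias: bool = False,
                             consolidates: int = 0) -> list:
    """Check the Identifiable rules.

    Caller-supplied assumptions: `expects_cve_alias` says whether an
    equivalent CVE record exists; `consolidates` is how many vulnerabilities
    from another ecosystem this record consolidates.
    """
    problems = []
    if expects_cve_alias and not cve_aliases(record):
        problems.append("no CVE ID in aliases")
    related = record.get("related", [])
    # Assumption: one related ID per consolidated vulnerability.
    if consolidates > 1 and len(related) < consolidates:
        problems.append(f"expected {consolidates} related IDs, found {len(related)}")
    return problems
```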
Examples
- GO-2024-2687
  - Has `introduced` and `fixed` versions
  - Has an alias to a CVE record ID
  - Has a purl
- OSV-2024-98
  - Has `introduced` and `fixed` commits
    - commits exist in repo
- DSA-5678-1
  - Has `introduced` and `fixed` versions
  - Has multiple `related` CVE record IDs
Appendix A: OSV Schema validation
(As at version 1.6.3, generated by Gemini from the OSV JSON schema)
Top-Level Information:
Required:
- id: A unique string identifier for the vulnerability.
- modified: A timestamp (in RFC3339 format, in UTC, ending in “Z”) indicating when the vulnerability information was last updated.
Optional, but validated when present:
- schema_version: A string specifying the version of the schema being used.
- published/withdrawn: Timestamps (in RFC3339 format, in UTC, ending in “Z”) for when the vulnerability was published or withdrawn.
- aliases/related: Arrays of strings for alternate identifiers or related vulnerabilities.
- summary/details: String descriptions of the vulnerability.
- severity: An array of objects detailing the severity using different scoring systems (e.g., CVSS v2, v3, or v4), if available.
- affected: An array of objects describing which packages are affected, including details like:
  - package: The ecosystem (e.g., npm, PyPI), name, and Package URL (PURL) of the affected package.
  - severity: Severity for the specific package (if different from the overall severity).
  - ranges: Information on the affected version ranges, commit ranges, or ecosystem-specific identifiers.
  - versions: A list of specific affected versions.
  - ecosystem_specific/database_specific: Additional data specific to the package ecosystem or the vulnerability database.
- references: An array of objects providing URLs to external resources about the vulnerability, categorized by type (e.g., advisory, article, discussion).
- credits: An array of objects giving credit to individuals or organizations involved in discovering, reporting, or fixing the vulnerability.
- database_specific: A flexible object for any extra information specific to the database using this schema.
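The two required top-level fields and the timestamp format can be checked with the Python standard library alone. This is a hypothetical helper for illustration, not a substitute for validating against the OSV JSON schema. It accepts only the "Z" suffix described above, and it uses `datetime.strptime` to reject impossible dates such as February 30.

```python
import re
from datetime import datetime

# RFC3339 date-time in UTC with a literal "Z" suffix and optional fractional seconds.
UTC_Z_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(\.\d+)?Z$")

REQUIRED_FIELDS = ("id", "modified")
TIMESTAMP_FIELDS = ("modified", "published", "withdrawn")


def is_utc_z_timestamp(value) -> bool:
    match = UTC_Z_TIMESTAMP.match(value) if isinstance(value, str) else None
    if match is None:
        return False
    try:
        # The regex only checks the shape; strptime rejects e.g. month 13 or Feb 30.
        datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True


def top_level_problems(record: dict) -> list:
    """Report missing required fields and badly formatted timestamps."""
    problems = [f"missing required field {name!r}"
                for name in REQUIRED_FIELDS if name not in record]
    for name in TIMESTAMP_FIELDS:
        if name in record and not is_utc_z_timestamp(record[name]):
            problems.append(f"{name} is not an RFC3339 UTC timestamp ending in 'Z'")
    return problems
```

Note that an offset such as `+00:00` is rejected even though it denotes UTC, matching the "ending in Z" wording above.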
Additional Validation Rules:
- timestamp: A custom definition that ensures timestamps adhere to the RFC3339 date-time format (e.g., “2023-11-15T12:34:56Z”).
- additionalProperties: false: This prevents any extra properties from being added to the JSON object beyond those defined in the schema.
- **Specific Requirements in `affected` Array:**
  - There are conditional validations based on the `type` of range, ensuring the correct properties are present (e.g., `repo` is required when `type` is `GIT`).
  - A logical check ensures that if `last_affected` is specified in `events`, then `fixed` cannot be present in the same `events` array.
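The two conditional rules on `affected` ranges can be sketched in Python as follows. The function name and message format are assumptions for illustration. A real validator would normally get these checks from the JSON schema itself rather than re-implementing them.

```python
def affected_range_problems(affected: list) -> list:
    """Apply the two conditional range rules to an OSV `affected` array."""
    problems = []
    for i, entry in enumerate(affected):
        for j, rng in enumerate(entry.get("ranges", [])):
            where = f"affected[{i}].ranges[{j}]"
            # Rule 1: GIT ranges must name the repository the commits live in.
            if rng.get("type") == "GIT" and "repo" not in rng:
                problems.append(f"{where}: GIT range is missing 'repo'")
            # Rule 2: one events array may use fixed or last_affected, not both.
            events = rng.get("events", [])
            has_fixed = any("fixed" in event for event in events)
            has_last_affected = any("last_affected" in event for event in events)
            if has_fixed and has_last_affected:
                problems.append(f"{where}: events contain both 'fixed' and 'last_affected'")
    return problems
```

Reporting the `affected[i].ranges[j]` path with each problem makes it easy to locate the offending range in large, multi-package records.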