DSPL 2.0

Contact: Omar Benjelloun, Natarajan Krishnaswami

Last updated: 2019-07-09

Introduction

The DSPL2 open format represents the structure and content of statistical datasets in a simple, standards-compatible way that enables search engines and other data exploration tools to provide useful functionality to users.

DSPL2 focuses on “statistical” datasets, i.e., datasets that contain aggregated numerical measures over dimensions such as location or demographic attributes. The data is generally provided as time series. This format may also be useful for other types of datasets.

Some of the key ideas of DSPL2:

Non-goals of this document:

Background

DSPL2 builds on a number of existing formats and vocabularies:

DSPL2 is closely related to the following formats and standards that address dataset authoring:

Overview

We introduce the following constructs:

While most of the constructs above describe “metadata” about the dataset, we also need to represent actual data. The preferred way of providing this data is to use CSV tables. An equivalent triple representation is also supported.

Diagram of DSPL 2.0 types and properties

Notes

Provenance

Many statistical datasets include data from multiple providers. The CreativeWork superclass for StatisticalDataset offers support for multiple authors and publishers at the dataset level, but does not address provenance for individual measures or timeseries observations. This is important for data aggregators, so it may be desirable to add support for provenance at a finer granularity in the future.

Identifiers

JSON-LD objects can be assigned URIs (or rather, IRIs) using the @id property. Within a file, these can refer to the file’s location using an empty identifier, or document-unique IDs using fragments like #age_year_slice. This can be used to give permanent identifiers to datasets, their dimensions, measures, and annotations, and to the timeseries within them.
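As a rough illustration of how such identifiers resolve, the sketch below joins document-relative IDs against a file location using Python's standard library. The URL `https://example.org/data/unemployment.json` is a hypothetical location chosen for this example, not one defined by DSPL2; a JSON-LD processor would normally perform this resolution itself.

```python
from urllib.parse import urljoin

# Hypothetical location of a DSPL2 JSON-LD file (an assumption for illustration).
BASE = "https://example.org/data/unemployment.json"


def resolve_id(base, identifier):
    """Resolve an @id value against the location of the containing file."""
    return urljoin(base, identifier)


# An empty identifier refers to the file itself.
dataset_iri = resolve_id(BASE, "")
# A fragment gives a document-unique ID within that file.
slice_iri = resolve_id(BASE, "#age_year_slice")
```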

Schema

A draft of the schema additions is at dspl2.jsonld. (The schema should follow this document; any discrepancies are errors.)

Vocabulary

StatisticalDataset

Type: Thing > CreativeWork > Dataset > StatisticalDataset

A dataset that contains statistical data. It has the following properties, in addition to those available to a Dataset:

Property Expected type Description
measure StatisticalMeasure A measure defined in this dataset.
dimension CategoricalDimension or TimeDimension A dimension defined in the dataset.
annotation StatisticalAnnotation or URL An annotation defined in this dataset, or the URL of a CSV table containing the annotation definitions. See section “[Annotations](#annotations)” for more details.
slice DataSlice A slice defined in this dataset.
footnote PropertyValue A footnote applying to this dataset.

Examples

{
  "@type": "StatisticalDataset",
  "@id": "#europe_unemployment",
  "name": "Unemployment in Europe (monthly)",
  "description": "Harmonized unemployment data for European countries.",
  "url": "http://epp.eurostat.ec.europa.eu/portal/page/portal/lang-en/employment_unemployment_lfs/introduction",
  "license": "https://ec.europa.eu/eurostat/about/policies/copyright",
  "creator":{
     "@type":"Organization",
     "url": "https://ec.europa.eu/eurostat",
     "name":"Eurostat"
  },
  "temporalCoverage":"1993-01/2010-12",
  "spatialCoverage":{
     "@type":"Place",
     "geo":{
       "@type":"GeoShape",
       "name": "European Union",
        "box":"34.633285 -10.468556 70.096054 34.597916"
     }
  },
  "measure": …,
  "dimension": …,
  "annotation": …,
  "slice": …
}

StatisticalMeasure

Type: Thing > Intangible > StatisticalMeasure

A quantifiable phenomenon or indicator being observed or calculated (e.g., average rainfall, population, percentage of forested land, …).

Property Expected type Description
dataset Dataset A dataset where this measure is defined.
unitCode Text or URL The unit of measurement given using the UN/CEFACT Common Code (3 characters) or a URL. Codes other than the UN/CEFACT Common Code may be used with a prefix followed by a colon.
unitText Text A string or text indicating the unit of measurement. Useful if you cannot provide a standard unit code for unitCode.
footnote PropertyValue A footnote applying to this measure.

Examples

{
  "@type": "StatisticalMeasure",
  "@id": "#unemployment_rate",
  "dataset": "#europe_unemployment",
  "sameAs": "https://www.wikidata.org/wiki/Q1787954",
  "name": [
    { "@value": "Unemployment rate (monthly)", "@language": "en" },
    { "@value": "Arbeitslosenquote (monatlich)", "@language": "de" }
  ],
  "description": "The unemployment rate represents unemployed persons as a percentage of the labour force. The labour force is the total number of people employed and unemployed.",
  "url": "http://ec.europa.eu/eurostat/product?code=une_rt_m&language=en",
  "unitCode": "P1"
}

DimensionProperty

Type: Thing > Intangible > DimensionProperty

Categorical dimensions can have properties in addition to the ones defined above. Each of these should have a DimensionProperty describing its name and type.

Property Expected type Description
propertyID Text The name of the property in a `DimensionValue`.
propertyType rdfs:Class or CategoricalDimension The type to use for the property's value, if using CSV files for slice data.

If the propertyType is the ID of a CategoricalDimension, the corresponding slice CSV values must be code values from that dimension’s codeList.
If it is one of the schema:DataType types, the corresponding slice CSV values must be of that type.
Otherwise, the type’s properties may be specified using a column named [*propertyID*].[*property of propertyType*], with [*propertyID*] alone being equivalent to [*propertyID*].name.

For example, for the following DimensionProperty:

{
  "@type": "DimensionProperty",
  "propertyID": "geo",
  "propertyType": "GeoCoordinates"
}

The corresponding columns in a slice data CSV file might look like:

…,geo.latitude,geo.longitude
…,32.318231,-86.902298
…,63.588753,-154.493062
…,34.048928,-111.093731
…,35.20105,-91.831833
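The dotted-column convention can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `expand_structured_columns`, the `codeValue` column, and the code `AL` are assumptions introduced here, and cell values are left as strings because no type conversion rule is assumed.

```python
import csv
import io


def expand_structured_columns(row, structured_types):
    """Group 'propertyID.subproperty' columns into nested objects.

    structured_types maps a propertyID to its propertyType, for properties
    whose type is neither a DataType nor a CategoricalDimension.
    """
    result = {}
    for column, cell in row.items():
        property_id, _, sub_property = column.partition(".")
        if property_id in structured_types:
            node = result.setdefault(
                property_id, {"@type": structured_types[property_id]}
            )
            # A bare [propertyID] column is equivalent to [propertyID].name.
            node[sub_property or "name"] = cell
        else:
            result[column] = cell
    return result


# The codeValue column and the code "AL" are hypothetical; the coordinates
# are taken from the first data row of the example above.
CSV_TEXT = "codeValue,geo.latitude,geo.longitude\nAL,32.318231,-86.902298\n"
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
expanded = expand_structured_columns(rows[0], {"geo": "GeoCoordinates"})
```

Here `expanded` holds a `geo` object of type `GeoCoordinates` with `latitude` and `longitude` keys alongside the untouched `codeValue`.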

TableMapping

Type: Thing > Intangible > TableMapping

This specifies the mapping from an entity to a particular header column name in a CSV file.

Property Expected type Description
sourceEntity URL The entity (Dimension, Measure, DimensionProperty) whose column name is being specified.
columnIdentifier Text The column name in a related CSV corresponding to this property.
value DataType The value to use for the specified `sourceEntity`; this specifies a constant-valued virtual column.

CategoricalDimension

Type: Thing > Intangible > CategoricalDimension

Categorical dimensions define the categories that measures can apply to. For instance, countries, genders, age groups, etc. A categorical dimension has a list of possible categories or values, called a codeList (see below).

A categorical dimension may correspond to an existing (schema.org) type, in which case that type should be set as the dimension’s equivalentType. The values in the codeList may specify that they have that type and set any of its properties.

Property Expected type Description
dataset Dataset A dataset where this dimension is defined.
equivalentType URL The type of the DimensionValue objects for this dimension
dimensionProperty DimensionProperty Type information for each additional property to use on this dimension's `DimensionValue`s.
parentProperty DimensionProperty The ID of a dimensionProperty for this dimension's parent dimension.
tableMapping TableMapping Column mappings for `dimensionProperty` `propertyID`s to `codeList` CSV header names.
codeList DimensionValue or URL The code values and additional properties for this dimension or the URL of a CSV file containing them.
footnote PropertyValue A footnote applying to this dimension.

When a code list is provided as a table, the data in the CSV table must follow these conventions:

If a tableMapping has a value property, this is used instead of the corresponding column in the CSV file, if present.
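One way to read this rule, together with the columnIdentifier property of TableMapping, is sketched below. The function `read_entity`, the sample rows, and the idea that the caller supplies a default column name are assumptions made for illustration; the text here does not prescribe how default column names are chosen.

```python
def read_entity(row, entity_id, default_column, table_mappings):
    """Return the value of an entity for one CSV row, honoring table mappings."""
    for mapping in table_mappings:
        if mapping.get("sourceEntity") != entity_id:
            continue
        if "value" in mapping:
            # Constant-valued virtual column: overrides any CSV column.
            return mapping["value"]
        return row[mapping.get("columnIdentifier", default_column)]
    # No mapping for this entity: fall back to the caller-supplied column.
    return row[default_column]


# A constant-valued mapping (the row is hypothetical).
constant_mappings = [
    {"@type": "TableMapping", "sourceEntity": "#countryProperty", "value": "US"}
]
state_row = {"codeValue": "AL", "containedInPlace": "ignored"}
parent = read_entity(
    state_row, "#countryProperty", "containedInPlace", constant_mappings
)

# A renamed column (the row is hypothetical).
renamed_mappings = [
    {"@type": "TableMapping", "sourceEntity": "#age", "columnIdentifier": "ages_code"}
]
slice_row = {"country": "at", "ages_code": "y25-74"}
age = read_entity(slice_row, "#age", "age", renamed_mappings)
```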

If a single parentProperty is specified and refers to a DimensionProperty whose propertyType is a CategoricalDimension, that dimension is called the parent dimension for this dimension, and its values are the parent values for the DimensionValues they occur in.

Examples

Below, the dimension’s values are defined in “genders.csv”.

{
  "@type": "CategoricalDimension",
  "@id": "#gender",
  "dataset": "#europe_unemployment",
  "codeList": "genders.csv"
}

Where the CSV table might begin like this:

"codeValue","name@en","name@fr","name@de"
"f","Women","Femmes","Frauen"
"m","Men","Hommes","Männer"
"nb","Non-binary","Non-binaire","Nichtbinär"
"tm","Transmasculine","Transmasculin","Transmaskulin"
"tf","Transfeminine","Transféminine","Transfeminin"
"ag","Agender","Agenre","Agender"

These have a parent-child relationship:

{
  "@type": "CategoricalDimension",
  "@id": "#country_group",
  "dataset": "#europe_unemployment",
  "codeList": [{
    "@type": "DimensionValue",
    "@id": "#country_group=eu",
    "dimension": "#country_group",
    "codeValue": "eu",
    "name": "European Union members"
  }, {
    "@type": "DimensionValue",
    "@id": "#country_group=non-eu",
    "dimension": "#country_group",
    "codeValue": "non-eu",
    "name": "non-European Union members"
  }]
},
{
  "@type": "CategoricalDimension",
  "@id": "#country",
  "dataset": "#europe_unemployment",
  "dimensionProperty": {
    "@type": "DimensionProperty",
    "@id": "#country_group_property",
    "propertyID": "country_group",
    "propertyType": "#country_group"
  },
  "parentProperty": "#country_group_property",
  "codeList": "countries.csv"
}

This demonstrates a virtual parent column:

{
  "@type": "CategoricalDimension",
  "@id": "#state",
  "dataset": "",
  "name": "US States",
  "equivalentType": "State",
  "dimensionProperty": [{
    "@type": "DimensionProperty",
    "@id": "#countryProperty",
    "propertyID": "containedInPlace",
    "propertyType": "#country"
  }, {
    "@type": "DimensionProperty",
    "description": "The centroid of the state.",
    "propertyID": "geo",
    "propertyType": "GeoCoordinates"
  }],
  "parentProperty": "#countryProperty",
  "tableMapping": {
    "@type": "TableMapping",
    "sourceEntity": "#countryProperty",
    "value": "US"
  },
  "codeList": "states.csv"
}

In the parent-child example above, the CSV file countries.csv has a column country_group with values of eu or non-eu.
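The parent relationship between #country and #country_group could be derived from such a table roughly as follows. This is only a sketch: the helper names are hypothetical, and the three rows are abbreviated from the countries.csv sample used elsewhere in this document.

```python
import csv
import io
from collections import defaultdict

# Abbreviated countries.csv: only the code and parent columns are kept.
COUNTRIES_CSV = """codeValue,country_group
at,eu
be,eu
bg,eu
"""


def parent_codes(rows, parent_property_id):
    """Map each code value to the code of its parent value."""
    return {row["codeValue"]: row[parent_property_id] for row in rows}


def children_by_parent(parents):
    """Invert the child-to-parent mapping into parent-to-children lists."""
    grouped = defaultdict(list)
    for child, parent in parents.items():
        grouped[parent].append(child)
    return dict(grouped)


rows = list(csv.DictReader(io.StringIO(COUNTRIES_CSV)))
# "country_group" is the propertyID of the dimension's parentProperty.
parents = parent_codes(rows, "country_group")
children = children_by_parent(parents)
```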

TimeDimension

Type: Thing > Intangible > TimeDimension

Time dimensions define times that time series measures apply to. For instance, years, years with quarters, dates, datetimes, etc.

A time dimension may correspond to an existing (schema.org) type, in which case that type should be set as the dimension’s equivalentType. The dateFormat property indicates how to parse timestamps in slice observations, for slices with observations in CSV files.

Property Expected type Description
dataset Dataset A dataset where this dimension is defined.
equivalentType URL The type of the DimensionValue objects for this dimension.
dateFormat Text A CLDR date format pattern to use for parsing values from a slice CSV file into the dimension’s equivalentType. See http://www.unicode.org/reports/tr35/tr35-dates.html#Date_Format_Patterns for details on these patterns.
footnote PropertyValue A footnote applying to this dimension.

Examples

{
  "@type": "TimeDimension",
  "@id": "#month",
  "dataset": "#europe_unemployment",
  "equivalentType": "xsd:gYearMonth",
  "dateFormat": "yyyy/MM"
}
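Python's standard library has no CLDR pattern parser, so the sketch below translates a small subset of CLDR fields into `strptime` directives in order to parse slice values. The supported subset (yyyy, MM, dd, HH, mm, ss) and the function names are assumptions for illustration; quoted literals and all other CLDR fields are not handled.

```python
import re
from datetime import datetime

# Assumed subset of CLDR date fields and their strptime equivalents.
_CLDR_TO_STRPTIME = {
    "yyyy": "%Y",
    "MM": "%m",
    "dd": "%d",
    "HH": "%H",
    "mm": "%M",
    "ss": "%S",
}


def cldr_to_strptime(pattern):
    """Translate a simple CLDR pattern such as 'yyyy/MM' to strptime syntax."""

    def convert(match):
        token = match.group(0)
        if token not in _CLDR_TO_STRPTIME:
            raise ValueError(f"unsupported CLDR field: {token}")
        return _CLDR_TO_STRPTIME[token]

    # Each run of one repeated letter is a CLDR field; other characters
    # (such as '/' or '-') are passed through as literals.
    return re.sub(r"([A-Za-z])\1*", convert, pattern)


def parse_time_value(text, date_format):
    """Parse a CSV cell using the dimension's dateFormat."""
    return datetime.strptime(text, cldr_to_strptime(date_format))


parsed = parse_time_value("2010/10", "yyyy/MM")
```

A production implementation would more likely rely on a library with full CLDR support; this sketch only shows where dateFormat fits into reading a slice CSV.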

StatisticalAnnotation

Type: Thing > Intangible > StatisticalAnnotation

Many statistical datasets annotate values in observations to provide contextual explanations or methodological details. The inherited properties (e.g., name, description, and URL) can be used to provide details.

Property Expected type Description
dataset Dataset A dataset where this annotation is defined.
codeValue Text The code to use for this annotation.

Examples

{
  "@type": "StatisticalAnnotation",
  "@id": "#annotation=p",
  "dataset": "#europe_unemployment",
  "description": "This value is a projection.",
  "codeValue": "p"
}

For examples of defining annotations, see section “Annotations - Examples - Defining Annotations”.

DimensionValue

Type: Thing > Intangible > StructuredValue > DimensionValue

A dimension’s permitted or observed value.

Values in a codeList must provide a code using the codeValue property, which is used to identify it elsewhere in the dataset (e.g., in observations). These can have additional properties:

In an observation, the values for dimensions without codes, such as the timestamp in time series data, should be set using the value property.

Property Expected type Description
dimension CategoricalDimension or TimeDimension The dimension this value belongs to.
codeValue Text The code from the dimension’s code list.
value Text or Number or DateTime The non-code value for a time dimension.
parent DimensionValue If `parentDimension` is specified, the value in the `parentDimension` for this value.
additionalProperty PropertyValue A property/value pair associated with the dimension value.

Examples

A DimensionValue with an inherited property (name):

{
  "@type": "DimensionValue",
  "@id": "#country_group=eu",
  "dimension": "#country_group",
  "codeValue": "eu",
  "name": [
    {"@language": "en", "@value": "European Union"},
    {"@language": "fr", "@value": "Union européenne"},
    {"@language": "de", "@value": "Europäische Union"}
  ]
}

A DimensionValue in #country’s codeList, with properties from an equivalentType and a parentProperty of country_group:

{
  "@type": ["DimensionValue", "Country"],
  "@id": "#country=at",
  "dimension": "#country",
  "codeValue": "at",
  "alternateName": "AT",
  "name": [
    {"@language": "en", "@value": "Austria"},
    {"@language": "fr", "@value": "Autriche"},
    {"@language": "de", "@value": "Österreich"}
  ],
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": 47.6965545,
    "longitude": 13.34598005
  },
  "country_group": "#country_group=eu"
}

Note: In the above example, parent is given by ID rather than as a DimensionValue object with codeValue of ‘eu’. Processing software should behave the same with either form.

Similarly, a DimensionValue in an Observation can refer to a DimensionValue defined in its CategoricalDimension’s codeList by code:

{
  "@type": "DimensionValue",
  "dimension": "#country",
  "codeValue": "at"
}

Or by ID:

{
  "@id": "#country=at"
}
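A consumer might resolve both reference styles against a dimension's codeList along these lines. The helper names are hypothetical and the sketch assumes an inline codeList; it is not a prescribed algorithm.

```python
def index_code_list(dimension):
    """Index a dimension's inline codeList by @id and by (dimension, code)."""
    by_id, by_code = {}, {}
    for value in dimension["codeList"]:
        if "@id" in value:
            by_id[value["@id"]] = value
        by_code[(dimension["@id"], value["codeValue"])] = value
    return by_id, by_code


def resolve_reference(reference, by_id, by_code):
    """Resolve a DimensionValue reference given by ID or by code."""
    if "@id" in reference:
        return by_id[reference["@id"]]
    return by_code[(reference["dimension"], reference["codeValue"])]


country = {
    "@type": "CategoricalDimension",
    "@id": "#country",
    "codeList": [
        {
            "@type": "DimensionValue",
            "@id": "#country=at",
            "dimension": "#country",
            "codeValue": "at",
            "name": "Austria",
        }
    ],
}
by_id, by_code = index_code_list(country)

# Both reference styles resolve to the same code list entry.
from_code = resolve_reference(
    {"@type": "DimensionValue", "dimension": "#country", "codeValue": "at"},
    by_id,
    by_code,
)
from_id = resolve_reference({"@id": "#country=at"}, by_id, by_code)
```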

Time dimensions need non-code values:

{
  "@type": "DimensionValue",
  "dimension": "#month",
  "value": {
    "@type": "xsd:gYearMonth",
    "@value": "2010-10"
  }
}

For an example of dimension value references in an observation, see “Observation - Examples”.

DataSlice

Type: Thing > Intangible > DataSlice

A slice is a grouping of statistical observations that share the same measures and dimensions. For example, a slice may contain population and GDP (measures) by country and year (dimensions).

Property Expected type Description
dataset Dataset The dataset this slice belongs to.
dimension CategoricalDimension or TimeDimension A dimension that is used in this slice.
measure StatisticalMeasure A measure that is used in this slice.
tableMapping TableMapping Column mappings for `measure`s or `dimension`s to `data` CSV header names.
data URL or Observation A CSV table containing the observations of the slice or a set of observations that belong to the slice.
footnote PropertyValue A footnote applying to this slice.

Slice data can be provided in two equivalent ways:

When data for a DataSlice is provided as a CSV table, it must follow these conventions:

Examples

{
  "@type": "DataSlice",
  "dataset": "#europe_unemployment",
  "dimension": [
    "#country",
    "#month",
    "#age"
  ],
  "measure": [
    "#unemployment",
    "#unemployment_rate"
  ],
  "tableMapping": {
    "@type": "TableMapping",
    "sourceEntity": "#age",
    "columnIdentifier": "ages_code"
  },
  "data": "country_total_byage.csv"
}

Observation

Type: Thing > Intangible > Observation

A statistical observation holds the individual measurements for a slice, for a given set of dimension values.

Property Expected type Description
slice DataSlice The slice the observation belongs to.
dimensionValue DimensionValue One or more dimension values for dimensions specified in the containing slice.
measureValue MeasureValue One or more measure values for measures specified in the containing slice.

Examples

{
  "@id": "#country=uk/month=2010-10/sex=m",
  "@type": "Observation",
  "dimensionValue": [
    {
      "@type": "DimensionValue",
      "dimension": "#country",
      "codeValue": "uk"
    },
    {
      "@type": "DimensionValue",
      "dimension": "#month",
      "value": {
        "@type": "xsd:gYearMonth",
        "@value": "2010-10"
      }
    },
    {
      "@type": "DimensionValue",
      "dimension": "#gender",
      "codeValue": "m"
    }
  ],
  "measureValue": …
}
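To make the relationship between a slice CSV row and an inline Observation more concrete, the sketch below builds one from the other. The function `row_to_observation` and the explicit column-to-entity dictionaries are assumptions for illustration; the sample row is taken from the country_total.csv sample in this document, reduced to the columns the slice declares, and measure values are parsed as floats for simplicity.

```python
import csv
import io


def row_to_observation(row, slice_id, categorical, time, measures):
    """Build an Observation object from one slice CSV row.

    categorical, time and measures each map a CSV column name to the ID
    of the dimension or measure stored in that column.
    """
    dimension_values = [
        {"@type": "DimensionValue", "dimension": dim_id, "codeValue": row[column]}
        for column, dim_id in categorical.items()
    ]
    # Time dimensions have no codes, so they use "value" instead.
    dimension_values += [
        {"@type": "DimensionValue", "dimension": dim_id, "value": row[column]}
        for column, dim_id in time.items()
    ]
    measure_values = [
        {"@type": "MeasureValue", "measure": measure_id, "value": float(row[column])}
        for column, measure_id in measures.items()
    ]
    return {
        "@type": "Observation",
        "slice": slice_id,
        "dimensionValue": dimension_values,
        "measureValue": measure_values,
    }


CSV_TEXT = "country,month,unemployment,unemployment_rate\nat,1993-01,171000,4.5\n"
row = next(csv.DictReader(io.StringIO(CSV_TEXT)))
observation = row_to_observation(
    row,
    "#country_total",
    categorical={"country": "#country"},
    time={"month": "#month"},
    measures={
        "unemployment": "#unemployment",
        "unemployment_rate": "#unemployment_rate",
    },
)
```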

MeasureValue

Type: Thing > Intangible > StructuredValue > QuantitativeValue > MeasureValue

A MeasureValue is a quantitative value that indicates the measure to which it corresponds. It may optionally be annotated with one or more of the annotations defined in the StatisticalDataset. For example, values for a population measure might use annotations to distinguish intercensal estimates from counts.

Property Expected type Description
measure StatisticalMeasure The measure for which this value is observed.
annotation StatisticalAnnotation An annotation defined in this statistical dataset.
value Boolean or Number or StructuredValue or Text The value property inherited from QuantitativeValue.

Examples

{
  "@type": "MeasureValue",
  "measure": "#unemployment",
  "value": 1448000
}

{
  "@type": "MeasureValue",
  "measure": "#unemployment_rate",
  "unitCode": "P1",
  "value": 8.5
}

For examples with annotations, see section “Annotations - Examples - Using Annotations”.

Annotations

Many statistical datasets use annotations on values in observations to provide contextual explanations or methodological details. The annotations appearing in a dataset can be defined using the StatisticalDataset’s annotation property, either as a URL to a CSV table or inline as JSON-LD objects. When annotations are provided as a table, the data in the CSV table must follow these conventions:

References to annotations appear as annotation values on MeasureValue instances. If these were provided in a CSV table for a slice,

Examples

Defining Annotations

An annotations CSV table might look like the one from the prior example:

codeValue,description@en
p,This value is a projection.
r,This value has been revised.
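The correspondence between such a table and JSON-LD annotation objects can be sketched as below. The reading of a `property@language` column header as a language-tagged value is inferred from the examples in this document, and the function name and the `#annotation=` ID prefix argument are assumptions for illustration.

```python
import csv
import io

ANNOTATIONS_CSV = """codeValue,description@en
p,This value is a projection.
r,This value has been revised.
"""


def row_to_node(row, node_type, id_prefix):
    """Turn one CSV row into a JSON-LD node with language-tagged values."""
    node = {"@type": node_type, "@id": id_prefix + row["codeValue"]}
    for column, cell in row.items():
        name, has_language, language = column.partition("@")
        if has_language:
            # A column such as description@en becomes a language-tagged value.
            node.setdefault(name, []).append(
                {"@language": language, "@value": cell}
            )
        else:
            node[name] = cell
    return node


annotations = [
    row_to_node(row, "StatisticalAnnotation", "#annotation=")
    for row in csv.DictReader(io.StringIO(ANNOTATIONS_CSV))
]
```

Each description is emitted as a one-element list here; in JSON-LD a single value and a one-element array are interchangeable.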

If the annotations are given as JSON, they might look like this:

{
  "@type": "StatisticalDataset",
  …
  "annotation": [
    {
      "@type": "StatisticalAnnotation",
      "@id": "#annotation=p",
      "description": {
        "@language": "en",
        "@value": "This value is a projection."
      },
      "codeValue": "p"
    },
    {
      "@type": "StatisticalAnnotation",
      "@id": "#annotation=r",
      "description": {
        "@language": "en",
        "@value": "This value has been revised."
      },
      "codeValue": "r"
    }
  ]
  …
}

Using Annotations

Annotations set on a MeasureValue (possibly more than one, as in the example) can be referred to by code:

{
  "@type": "MeasureValue",
  "measure": "#unemployment",
  "value": 1448000,
  "annotation": [
    {
      "@type": "StatisticalAnnotation",
      "codeValue": "p"
    },
    {
      "@type": "StatisticalAnnotation",
      "codeValue": "r"
    }
  ]
}

When using a CSV table of observations for a slice, an annotation column looks like this:

country,month,sex,unemployment,unemployment_rate,unemployment*
uk,2010-11,m,1465035,8.6,
uk,2010-10,m,1448000,8.5,p;r
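Reading the table above, the annotation column appears to be named after its measure with a trailing `*`, with multiple codes separated by `;`. Under that assumption, which is inferred from this example rather than stated here, a row could be turned into an annotated MeasureValue as sketched below; the function name is hypothetical.

```python
import csv
import io

CSV_TEXT = """country,month,sex,unemployment,unemployment_rate,unemployment*
uk,2010-11,m,1465035,8.6,
uk,2010-10,m,1448000,8.5,p;r
"""


def measure_value_from_row(row, measure_column):
    """Build a MeasureValue, attaching any annotation codes for the measure."""
    measure_value = {
        "@type": "MeasureValue",
        "measure": "#" + measure_column,
        "value": float(row[measure_column]),
    }
    # Assumed convention: "<measure>*" holds ';'-separated annotation codes.
    cell = row.get(measure_column + "*") or ""
    codes = [code for code in cell.split(";") if code]
    if codes:
        measure_value["annotation"] = [
            {"@type": "StatisticalAnnotation", "codeValue": code} for code in codes
        ]
    return measure_value


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
plain = measure_value_from_row(rows[0], "unemployment")
annotated = measure_value_from_row(rows[1], "unemployment")
```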

Full Examples

JSON-LD Files

An example with dimension and slice data in CSV files is at eurostat-unemployment-dspl-v1.json

The analogous dataset with all data expressed in JSON-LD is at eurostat-unemployment-dspl-v1-inline-big.json (172 MB)

A trimmed version with a single observation per slice is at eurostat-unemployment-dspl-v1-inline-small.json

With CSV Data

Below, all the dimensions, measures, annotations and slices are defined in CSV files:

{
  "@context": "http://schema.org",
  "@id": "#europe_unemployment",
  "@type": "StatisticalDataset",
  "name": "Unemployment in Europe (monthly)",
  "description": "Harmonized unemployment data for European countries.",
  "url": "http://epp.eurostat.ec.europa.eu/portal/page/portal/lang-en/employment_unemployment_lfs/introduction",
  "license": "https://ec.europa.eu/eurostat/about/policies/copyright",
  "creator":{
     "@type":"Organization",
     "url": "https://ec.europa.eu/eurostat",
     "name":"Eurostat"
  },
  "temporalCoverage":"1993-01/2010-12",
  "spatialCoverage":{
     "@type":"Place",
     "geo":{
       "@type":"GeoShape",
       "name": "European Union",
        "box":"34.633285 -10.468556 70.096054 34.597916"
     }
  },
  "measure": [
    {
      "@type": "StatisticalMeasure",
      "@id": "#unemployment",
      "dataset": "#europe_unemployment",
      "name": "Unemployment (monthly)",
      "description": "The total number of people unemployed",
      "url": "http://ec.europa.eu/eurostat/product?code=une_nb_m&language=en",
      "unitCode": "IE"
    }
  ],
  "dimension": [
    {
      "@type": "CategoricalDimension",
      "@id": "#country",
      "dataset": "#europe_unemployment",
      "codeList": "countries.csv",
      "equivalentType": "Country"
    },
    {
      "@type": "TimeDimension",
      "@id": "#month",
      "dataset": "#europe_unemployment",
      "equivalentType": "xsd:gYearMonth"
    }
  ],
  "annotation": "annotations.csv",
  "slice": {
    "@type": "DataSlice",
    "dimension": ["#country", "#month"],
    "measure": ["#unemployment", "#unemployment_rate"],
    "data": "country_total.csv"
  }
}

The country dimension’s code list countries.csv might begin like this:

"codeValue","alternateName","country_group","name@en","name@fr","name@de","latitude","longitude"
"at","AT","eu","Austria","Autriche","Österreich","47.6965545","13.34598005"
"be","BE","eu","Belgium","Belgique","Belgien","50.501045","4.47667405"
"bg","BG","eu","Bulgaria","Bulgarie","Bulgarien","42.72567375","25.4823218"

The annotation definitions table annotations.csv might begin like this:

codeValue,description
p,This value is a projection
r,This value has been revised

The slice observations table country_total.csv might begin like this:

country,seasonality,month,unemployment,unemployment_rate
at,nsa,1993-01,171000,4.5
uk,trend,2010-10,2455000,7.8

With JSON Data

Below, data for all properties is provided inline as JSON:

{
  "@context": "http://schema.org",
  "@id": "#europe_unemployment",
  "@type": "StatisticalDataset",
  "name": "Le Ch\u00f4mage en Europe (mensuel)",
  "description": "Harmonized unemployment data for European countries. This dataset was prepared by Google based on data downloaded from Eurostat.",
  "url": "http://epp.eurostat.ec.europa.eu/portal/page/portal/lang-en/employment_unemployment_lfs/introduction",
  "license": "https://ec.europa.eu/eurostat/about/policies/copyright",
  "creator":{
     "@type":"Organization",
     "url": "https://ec.europa.eu/eurostat",
     "name":"Eurostat"
  },
  "temporalCoverage":"1993-01/2010-12",
  "spatialCoverage":{
     "@type":"Place",
     "geo":{
       "@type":"GeoShape",
       "name": "European Union",
        "box":"34.633285 -10.468556 70.096054 34.597916"
     }
  },
  "dimension": [
    {
      "@id": "#country",
      "@type": "CategoricalDimension",
      "dataset": "#europe_unemployment",
      "equivalentType": "Country",
      "codeList": [
        {
          "@id": "#country=at",
          "@type": [
            "DimensionValue",
            "Country"
          ],
          "name": "Austria",
          "codeValue": "at",
          "dimension": "#country",
          "alternateName": "AT",
          "additionalProperty": [
            {
              "@type": "PropertyValue",
              "propertyID": "latitude",
              "value": 47.6965545
            },
            {
              "@type": "PropertyValue",
              "propertyID": "longitude",
              "value": 13.34598005
            }
          ]
        }
      ]
    },
    {
      "@id": "#month",
      "@type": "TimeDimension",
      "dataset": "#europe_unemployment",
      "equivalentType": "xsd:gYearMonth"
    }
  ],
  "measure": [
    {
      "@id": "#unemployment",
      "@type": "StatisticalMeasure",
      "name": "Unemployment (monthly)",
      "description": "The total number of people unemployed",
      "url": "http://ec.europa.eu/eurostat/product?code=une_nb_m&language=en",
      "dataset": "#europe_unemployment",
      "sameAs": "https://www.wikidata.org/wiki/Q41171",
      "unitCode": "IE"
    },
    {
      "@id": "#unemployment_rate",
      "@type": "StatisticalMeasure",
      "name": "Unemployment rate (monthly)",
      "description": "The unemployment rate represents unemployed persons as a percentage of the labour force. The labour force is the total number of people employed and unemployed.",
      "url": "http://ec.europa.eu/eurostat/product?code=une_rt_m&language=en",
      "dataset": "#europe_unemployment",
      "sameAs": "https://www.wikidata.org/wiki/Q1787954",
      "unitCode": "P1"
    }
  ],
  "annotation": [
    {
      "@id": "#annotation=p",
      "@type": "StatisticalAnnotation",
      "dataset": "#europe_unemployment",
      "codeValue": "p",
      "description": "This value is a projection"
    },
    {
      "@id": "#annotation=r",
      "@type": "StatisticalAnnotation",
      "dataset": "#europe_unemployment",
      "codeValue": "r",
      "description": "This value has been revised"
    }
  ],
  "slice": [
    {
      "@type": "DataSlice",
      "@id": "#country_total",
      "dataset": "#europe_unemployment",
      "dimension": ["#country", "#month"],
      "measure": ["#unemployment", "#unemployment_rate"],
      "data": [
        {
          "@id": "#country=at/month=1993-01",
          "@type": "Observation",
          "slice": "#country_total",
          "dimensionValue": [
            {
              "@type": "DimensionValue",
              "dimension": "#country",
              "codeValue": "at"
            },
            {
              "@type": "DimensionValue",
              "dimension": "#month",
              "value": "1993-01"
            }
          ],
          "measureValue": [
            {
              "@type": "MeasureValue",
              "measure": "#unemployment",
              "value": 171000,
              "unitCode": "IE",
              "annotation": "#annotation=p"
            },
            {
              "@type": "MeasureValue",
              "measure": "#unemployment_rate",
              "value": 4.5,
              "unitCode": "P1"
            }
          ]
        }
      ]
    }
  ]
}

JSON API considerations

Per the JSON-LD API best practices draft, a first consideration of JSON-LD APIs is to produce developer-friendly JSON.

The DSPL 2 JSON format described above has a number of inconvenient features for use as the output of a JSON API. These center around its use of arrays instead of objects for a number of properties, making lookups cumbersome. Several of these can be simplified by customizing value indexing in the JSON-LD context. Although the features are described separately, we anticipate they will be most useful when combined.

Language indexing

Fields like name and description support JSON-LD literal values for multilingual values:

    "name": [
        {"@value": "Germany", "@language": "en"},
        {"@value": "Deutschland", "@language": "de"},
        {"@value": "Allemagne", "@language": "fr"}
    ]

We can make this friendlier for data consumers by writing it as a map using language indexing. With this context,

    "@context": [
        {
            "description": {"@container": "@language"},
            "name": {"@container": "@language"}
        },
        "http://schema.org"
    ]

we can write:

    "name": {
        "en": "Germany",
        "de": "Deutschland",
        "fr": "Allemagne"
    }
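For a producer that already holds the array form, the conversion to a language map is mechanical. A minimal sketch, with a hypothetical function name and assuming at most one value per language:

```python
def to_language_map(values):
    """Convert a list of JSON-LD language-tagged values to a language map."""
    return {value["@language"]: value["@value"] for value in values}


names = [
    {"@value": "Germany", "@language": "en"},
    {"@value": "Deutschland", "@language": "de"},
    {"@value": "Allemagne", "@language": "fr"},
]
name_map = to_language_map(names)
# A consumer can now look up a translation directly, e.g. name_map["de"].
```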

Data indexing

Dimension code lists and annotations have a codeValue field, and footnote and additionalProperty a propertyID, which users will commonly want to look up:

    "annotation": [
        {
            "@id": "#annotation=p",
            "@type": "StatisticalAnnotation",
            "dataset": "#europe_unemployment",
            "codeValue": "p",
            "description": "This value is a projection"
        },
        {
            "@id": "#annotation=r",
            "@type": "StatisticalAnnotation",
            "dataset": "#europe_unemployment",
            "codeValue": "r",
            "description": "This value has been revised"
        }
    ]

We can apply JSON-LD 1.1 property-based data indexing in this case. E.g.,

    "@context": [
        {
            "@version": 1.1,
            "codeList": {
                "@container": "@index",
                "@index": "schema:codeValue"
            },
            "annotation": {
                "@container": "@index",
                "@index": "schema:codeValue"
            }
        },
        "http://schema.org"
    ]

lets us write

    "annotation": {
        "p": {
            "@id": "#annotation=p",
            "@type": "StatisticalAnnotation",
            "dataset": "#europe_unemployment",
            "description": "This value is a projection"
        },
        "r": {
            "@id": "#annotation=r",
            "@type": "StatisticalAnnotation",
            "dataset": "#europe_unemployment",
            "description": "This value has been revised"
        }
    }
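The effect of property-based indexing on the data itself can be imitated in a few lines of Python. This sketch only mirrors the resulting shape; it is not a JSON-LD processor, and the function name `index_by` is an assumption.

```python
def index_by(nodes, key):
    """Re-key a list of nodes by one property, dropping it from each node."""
    indexed = {}
    for node in nodes:
        rest = dict(node)  # copy so the input list is left unchanged
        indexed[rest.pop(key)] = rest
    return indexed


annotations = [
    {
        "@id": "#annotation=p",
        "@type": "StatisticalAnnotation",
        "dataset": "#europe_unemployment",
        "codeValue": "p",
        "description": "This value is a projection",
    },
    {
        "@id": "#annotation=r",
        "@type": "StatisticalAnnotation",
        "dataset": "#europe_unemployment",
        "codeValue": "r",
        "description": "This value has been revised",
    },
]
by_code = index_by(annotations, "codeValue")
```

With the map form, a consumer that sees the code `p` in an observation can read `by_code["p"]["description"]` without scanning a list.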

Node identifier indexing

In other cases, such as dimensions and measures, a DSPL 2 dataset might have lists of these values which would need to be looked up by ID, e.g., to resolve the dimensions or measures in a slice:

    "measure": [
        {
            "@id": "#unemployment",
            "@type": "StatisticalMeasure",
            "name": "Unemployment (monthly)",
            "description": "The total number of people unemployed",
            "url": "http://ec.europa.eu/eurostat/product?code=une_nb_m&language=en",
            "dataset": "#europe_unemployment",
            "sameAs": "https://www.wikidata.org/wiki/Q41171",
            "unitCode": "IE"
        },
        {
            "@id": "#unemployment_rate",
            "@type": "StatisticalMeasure",
            "name": "Unemployment rate (monthly)",
            "description": "The unemployment rate represents unemployed persons as a percentage of the labour force. The labour force is the total number of people employed and unemployed.",
            "url": "http://ec.europa.eu/eurostat/product?code=une_rt_m&language=en",
            "dataset": "#europe_unemployment",
            "sameAs": "https://www.wikidata.org/wiki/Q1787954",
            "unitCode": "P1"
        }
    ]

We can use [node identifier indexing](https://w3c.github.io/json-ld-syntax/#node-identifier-indexing) to write these as ID-keyed objects instead of lists. E.g.,

    "@context": [
        {
            "dimension": {
                "@container": "@id"
            },
            "measure": {
                "@container": "@id"
            },
            "slice": {
                "@container": "@id"
            }
        },
        "http://schema.org"
    ]

lets us write:

    "measure": {
        "#unemployment": {
            "@type": "StatisticalMeasure",
            "name": "Unemployment (monthly)",
            "description": "The total number of people unemployed",
            "url": "http://ec.europa.eu/eurostat/product?code=une_nb_m&language=en",
            "dataset": "#europe_unemployment",
            "sameAs": "https://www.wikidata.org/wiki/Q41171",
            "unitCode": "IE"
        },
        "#unemployment_rate": {
            "@type": "StatisticalMeasure",
            "name": "Unemployment rate (monthly)",
            "description": "The unemployment rate represents unemployed persons as a percentage of the labour force. The labour force is the total number of people employed and unemployed.",
            "url": "http://ec.europa.eu/eurostat/product?code=une_rt_m&language=en",
            "dataset": "#europe_unemployment",
            "sameAs": "https://www.wikidata.org/wiki/Q1787954",
            "unitCode": "P1"
        }
    }
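From a consumer's point of view, the benefit shows up when resolving the measures that a slice refers to. The sketch below contrasts the two lookups using abbreviated measure objects; the function names are hypothetical.

```python
def find_in_list(nodes, node_id):
    """List form: scan for the node with a matching @id."""
    return next(node for node in nodes if node["@id"] == node_id)


def resolve_slice_measures(measures_by_id, slice_node):
    """ID-keyed form: each reference in the slice is a direct key lookup."""
    return [measures_by_id[measure_id] for measure_id in slice_node["measure"]]


measure_list = [
    {"@id": "#unemployment", "name": "Unemployment (monthly)", "unitCode": "IE"},
    {"@id": "#unemployment_rate", "name": "Unemployment rate (monthly)", "unitCode": "P1"},
]
measures_by_id = {
    "#unemployment": {"name": "Unemployment (monthly)", "unitCode": "IE"},
    "#unemployment_rate": {"name": "Unemployment rate (monthly)", "unitCode": "P1"},
}
slice_node = {
    "@type": "DataSlice",
    "measure": ["#unemployment", "#unemployment_rate"],
}

scanned = find_in_list(measure_list, "#unemployment_rate")
resolved = resolve_slice_measures(measures_by_id, slice_node)
```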