This specification describes a mechanism for ensuring the authenticity and integrity of Linked Data documents using mathematical proofs.

This is an experimental specification and is undergoing regular revisions. It is not fit for production deployment.

Introduction

The term Linked Data is used to describe a recommended best practice for exposing, sharing, and connecting information on the Web using standards, such as URLs, to identify things and their properties. When information is presented as Linked Data, other related information can be easily discovered and new information can be easily linked to it. Linked Data is extensible in a decentralized way, greatly reducing barriers to large-scale integration. With the increase in usage of Linked Data for a variety of applications, there is a need to be able to verify the authenticity and integrity of Linked Data documents. This specification adds authentication and integrity protection to Linked Data documents through the use of mathematical proofs without sacrificing Linked Data features such as extensibility and composability.

Design Goals and Rationale

The Linked Data Proofs specification achieves the following design goals:

Simple for Developers
The proof format is designed to be easy to use for developers who don't have significant cryptography training. For example, cryptosuite identifiers are used instead of specific cryptographic parameters, which makes it difficult to accidentally produce a weak digital proof.
Syntax Agnostic
The proof mechanism can be used across a variety of RDF data syntaxes such as JSON-LD, N-Quads, and Turtle, without the need to regenerate the proof.
Agile
Since digital proof suites may be compromised without warning due to technological advancements, it is important that suites can be easily and quickly replaced. This specification provides algorithm agility while still keeping the digital proof format easy for developers to understand.
Extensible
Creating and deploying new proof suites is intended to be a fairly trivial undertaking, so that the proof format increases the rate of innovation in the digital proof space.

Terminology

The following terms are used to describe concepts involved in the generation and verification of Linked Data digital proofs.

linked data document
A document comprised of Linked Data.
signed linked data document
A linked data document that has been digitally signed.
linked data proof
A set of attributes that represent a Linked Data digital proof and the parameters required to verify it.
proof options
A set of options that is included in the proof data. These options may be a domain, nonce, or other data that is specific to the proof format.
proof suite
A specified set of cryptographic primitives typically consisting of a canonicalization algorithm, a message digest algorithm, and a proof algorithm that are bundled together by cryptographers for developers for the purposes of safety and convenience.
public key
A cryptographic key that can be used to verify digital proofs created with a corresponding private key.
private key
A cryptographic key that can be used to generate digital proofs.
domain
A string value that specifies the operational domain of a digital proof. This may be an Internet domain name like example.com, an ad hoc value such as mycorp-level3-access, or a very specific transaction value like 8zF6T$mqP. A signer may include a domain in its digital proof to restrict its use to a particular target, identified by the specified domain.
canonicalization algorithm
An algorithm that takes an input document that has more than one possible representation and always transforms it into a single deterministic representation. For example, alphabetically sorting a list of items is a type of canonicalization. This process is sometimes also called normalization.
message digest algorithm
An algorithm that takes an input message and produces a cryptographic output message that is often many orders of magnitude smaller than the input message. These algorithms are often 1) very fast and 2) non-reversible; they also 3) cause the output to change significantly when even one bit of the input message changes and 4) make it infeasible to find two different inputs for the same output.
proof algorithm
An algorithm that takes an input message and produces an output value where the receiver of the message can mathematically verify that the message has not been modified in transit and came from someone possessing a particular secret.
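
The relationship between a canonicalization algorithm and a message digest algorithm can be illustrated with a minimal Python sketch. It assumes a toy canonicalization (JSON serialization with sorted keys) in place of a real RDF canonicalization algorithm; the function names are hypothetical and are not defined by this specification.

```python
import hashlib
import json


def canonicalize(document: dict) -> str:
    """Toy canonicalization: serialize with sorted keys and fixed separators.

    Assumption: this stands in for a real RDF canonicalization algorithm and
    only illustrates the idea of one deterministic representation.
    """
    return json.dumps(document, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)


def digest(message: str) -> bytes:
    """Message digest of the UTF-8 encoded message using SHA-256."""
    return hashlib.sha256(message.encode("utf-8")).digest()


# Two representations of the same data, differing only in key order.
doc_a = {"title": "Hello World!", "@context": "https://w3id.org/identity/v1"}
doc_b = {"@context": "https://w3id.org/identity/v1", "title": "Hello World!"}

# Canonicalization makes both representations hash to the same value.
same_digest = digest(canonicalize(doc_a)) == digest(canonicalize(doc_b))

# Changing a single character changes the digest.
doc_c = dict(doc_a, title="Hello World?")
changed_digest = digest(canonicalize(doc_a)) != digest(canonicalize(doc_c))
```

Without the canonicalization step, the two key orderings would serialize to different byte strings and therefore produce different digests, even though they describe the same data.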

Linked Data Proof Overview

A linked data proof is comprised of information about the proof, parameters required to verify it, and the proof value itself. All of this information is provided using Linked Data vocabularies such as the [[!SECURITY-VOCABULARY]].

A linked data proof typically includes at least the following attributes:

type (required)
A URI that identifies the digital proof suite that was used to create the proof. For example: RsaSignature2018.
creator (required)
A URI that identifies the entity that created the proof such as a public/private key pair. The URI SHOULD be a URL that can be dereferenced to obtain a linked data document that contains a link identifying the entity that owns the proof material. Dereferencing the entity link SHOULD result in a Linked Data document that contains a link back to the URL identifier for the proof material, thereby proving ownership.
created (required)
An [[!ISO8601]] combined date and time string, generated by the Proof Algorithm.
domain (optional)
A string value specifying the restricted domain of the proof.
nonce (optional, but strongly recommended)
A string value that is included in the digital proof and MUST only be used once for a particular domain and window of time. This value is used to mitigate replay attacks.
proofValue (required)
The proof value generated by the Proof Algorithm.

Since this specification is based on Linked Data, the terms type, creator, created, domain, nonce, and proofValue above map to URLs. The vocabulary where these terms are defined is the [[SECURITY-VOCABULARY]].
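
The requirement that a nonce only be used once for a particular domain and window of time could be enforced by a verifier along the following lines. This is a sketch under stated assumptions: the NonceCache class, its method name, and the 300 second default window are hypothetical and are not defined by this specification.

```python
import time
from typing import Dict, Optional, Tuple


class NonceCache:
    """Remembers (domain, nonce) pairs for a fixed window of time.

    Hypothetical helper; the window length is an assumption.
    """

    def __init__(self, window_seconds: float = 300.0):
        self.window_seconds = window_seconds
        self._seen: Dict[Tuple[str, str], float] = {}

    def check_and_store(self, domain: str, nonce: str,
                        now: Optional[float] = None) -> bool:
        """Return True if the nonce is fresh for this domain, False if replayed."""
        now = time.time() if now is None else now
        # Forget entries that have fallen outside the window.
        self._seen = {key: seen_at for key, seen_at in self._seen.items()
                      if now - seen_at < self.window_seconds}
        key = (domain, nonce)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True
```

A verifier using this sketch would reject a proof whenever check_and_store returns False for the proof's domain and nonce values.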

A proof can be added to a Linked Data document like the following:

{
  "@context": "https://w3id.org/identity/v1",
  "title": "Hello World!"
}
      

by adding the parameters outlined in this section:

{
  "@context": "https://w3id.org/identity/v1",
  "title": "Hello World!",
  "proof": {
    "type": "RsaSignature2018",
    "creator": "https://example.com/i/pat/keys/5",
    "created": "2017-09-23T20:21:34Z",
    "domain": "example.org",
    "nonce": "2bbgh3dgjg2302d-d2b3gi423d42",
    "proofValue": "eyJ0eXAiOiJK...gFWFOEjXk"
  }
}
      

The proof example above uses the RsaSignature2018 proof suite to produce a verifiable digital proof.
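
A consumer might check that a proof node carries the required attributes before attempting verification. The following Python sketch assumes the proof node has already been parsed into a dictionary; check_proof_node is a hypothetical helper, and the date check only accepts the UTC form used in the examples of this document rather than every ISO8601 variant.

```python
from datetime import datetime

# Attributes marked "required" in the list above.
REQUIRED_ATTRIBUTES = ("type", "creator", "created", "proofValue")


def check_proof_node(proof: dict) -> list:
    """Return a list of problems found in a proof node (empty if none)."""
    problems = [f"missing required attribute: {name}"
                for name in REQUIRED_ATTRIBUTES if name not in proof]
    created = proof.get("created")
    if created is not None:
        try:
            # Assumption: only the "YYYY-MM-DDThh:mm:ssZ" form is accepted here.
            datetime.strptime(created, "%Y-%m-%dT%H:%M:%SZ")
        except (TypeError, ValueError):
            problems.append("created is not a combined date and time string")
    return problems


proof = {
    "type": "RsaSignature2018",
    "creator": "https://example.com/i/pat/keys/5",
    "created": "2017-09-23T20:21:34Z",
    "domain": "example.org",
    "nonce": "2bbgh3dgjg2302d-d2b3gi423d42",
    "proofValue": "eyJ0eXAiOiJK...gFWFOEjXk",
}
problems = check_proof_node(proof)
```

This check is purely structural; it says nothing about whether the proof value itself is valid.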

Create a separate section detailing an optional mechanism for authenticating public key ownership via bi-directional links. How to establish trust in key owner entities is out of scope but examples can be given.
Specify algorithm agility mechanisms (additional attributes from the security vocab can be used to indicate other signing and hash algorithms). Rewrite algorithms to be parameterized on this basis and move `RsaSignature2018` definition to a single supported mechanism; specify its identifier as a URL. In order to make it easy to specify a variety of combinations of algorithms, introduce a core type `LinkedDataProof` that allows for easy filtering/discovery of proof nodes. That type on its own doesn't specify any default proof or hash algorithms; those must be given via other properties in the nodes.
Add a note indicating that this specification should not be construed to indicate that public key owners should be restricted to a single public key or that systems that use this spec and involve real people should identify each person as only ever being a single entity rather than perhaps N entities with M keys. There are no such restrictions and in many cases those kinds of restrictions are ill-advised due to privacy considerations.
Add an explicit check on key type to prevent an attacker from selecting an algorithm that may abuse how the key is used/interpreted.
Add a note indicating that selective disclosure proof mechanisms can be compatible with Linked Data Proofs; for example, an algorithm could produce a Merkle tree from a canonicalized set of N-Quads and then sign the root hash. Disclosure would involve including the Merkle paths for each N-Quad that is to be revealed. This mechanism would merely consume the normalized output differently (this, and the proof mechanism, would be modifications to this core spec). It may also be necessary to generate proof parameters such as a private key/seed that can be used along with an algorithm to deterministically generate nonces that are concatenated with each N-Quad to prevent rainbow table or similar attacks.

Multiple Proofs

The Linked Data Proofs specification supports the concept of multiple proofs in a single document. There are two types of multi-proof approaches that are identified: Proof Sets and Proof Chains.

Proof Sets

A proof set is useful when the same data needs to be secured by multiple entities, but where the order of proofs does not matter, such as in the case of a set of signatures on a contract. A proof set, which has no order, is represented by associating a set of proofs with the proof key in a document.

{
  "@context": "https://w3id.org/identity/v1",
  "title": "Hello World!",
  "proof": [{
    "type": "RsaSignature2018",
    "creator": "https://example.com/i/pat/keys/5",
    "created": "2017-09-23T20:21:34Z",
    "domain": "example.org",
    "nonce": "2bbgh3dgjg2302d-d2b3gi423d42",
    "proofValue": "eyJ0eXAiOiJK...gFWFOEjXk"
  }, {
    "type": "RsaSignature2018",
    "creator": "https://example.com/i/kelly/keys/7f3j",
    "created": "2017-09-23T20:24:12Z",
    "domain": "example.org",
    "nonce": "83jj4hd62j49gk38",
    "proofValue": "eyiOiJJ0eXAK...EjXkgFWFO"
  }]
}
        

Proof Chains

A proof chain is useful when the same data needs to be signed by multiple entities and the order of when the proofs occurred matters, such as in the case of a notary counter-signing a proof that had been created on a document. A proof chain, where order must be preserved, is represented by associating an ordered list of proofs with the proofChain key in a document.

{
  "@context": "https://w3id.org/identity/v1",
  "title": "Hello World!",
  "proofChain": [{
    "type": "RsaSignature2018",
    "creator": "http://example.com/i/pat/keys/5",
    "created": "2017-09-23T20:21:34Z",
    "domain": "example.org",
    "nonce": "2bbgh3dgjg2302d-d2b3gi423d42",
    "proofValue": "eyiOiJKJ0eXA...OEjgFWFXk"
  }, {
    "type": "RsaSignature2018",
    "creator": "http://bank.example.com/notary/keys/7f3j",
    "created": "2017-09-23T20:24:12Z",
    "domain": "example.org",
    "nonce": "83jj4hd62j49gk38",
    "proofValue": "eyiOiJJ0eXAK...EjXkgFWFO"
  }]
}
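
The difference between a proof set and a proof chain can be sketched in Python as follows. The sketch assumes the document has been parsed into a dictionary; as_list, proof_set, proof_chain, and same_proof_set are hypothetical helpers, not functions defined by this specification.

```python
import json


def as_list(value) -> list:
    """Normalize a missing value, a single node, or an array to a list."""
    if value is None:
        return []
    return list(value) if isinstance(value, list) else [value]


def proof_set(document: dict) -> list:
    """Proofs under the proof key; their order carries no meaning."""
    return as_list(document.get("proof"))


def proof_chain(document: dict) -> list:
    """Proofs under the proofChain key; their order must be preserved."""
    return as_list(document.get("proofChain"))


def same_proof_set(a: list, b: list) -> bool:
    """Compare two proof sets while ignoring the order of their members."""
    def key(proof: dict) -> str:
        return json.dumps(proof, sort_keys=True)
    return sorted(map(key, a)) == sorted(map(key, b))
```

Two documents whose proof arrays differ only in ordering carry the same proof set, whereas reordering a proofChain array yields a different chain.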
        

Proof Suites

A Linked Data Proof is designed to be easy to use by developers and therefore strives to minimize the amount of information one has to remember to generate a proof. Often, just the proof suite name (e.g. RsaSignature2018) is required from developers to initiate the creation of a proof. These proof suites are often created or reviewed by people that have the requisite cryptographic training to ensure that safe combinations of cryptographic primitives are used.

This section details the cryptographic primitives that are available to proof suite developers.

At a minimum, a proof suite must have the following attributes:

id
A URL that identifies the proof suite. For example: https://w3id.org/security#RsaSignature2018.
type
The value ProofSuite.
canonicalizationAlgorithm
A URL that identifies the canonicalization algorithm to use on the document. For example: https://w3id.org/security#GCA2015.
digestAlgorithm
A URL that identifies the message digest algorithm to use on the canonicalized document. For example: https://www.ietf.org/assignments/jwa-parameters#SHA256
proofAlgorithm
A URL that identifies the proof algorithm to use on the data to be signed. For example: https://www.ietf.org/assignments/jwa-parameters#RS256

A complete example of a proof suite is shown in the next example:

{
  "id": "https://w3id.org/security#RsaSignature2018",
  "type": "ProofSuite",
  "canonicalizationAlgorithm": "https://w3id.org/security#GCA2015",
  "digestAlgorithm": "https://www.ietf.org/assignments/jwa-parameters#SHA256",
  "proofAlgorithm": "https://www.ietf.org/assignments/jws-parameters#RSASSA-PSS"
}
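
A minimal Python sketch of checking a proof suite description against the attribute list above is shown next. The helper names check_proof_suite and suite_registry are hypothetical; the sketch only checks that the attributes are present and does not dereference or interpret the algorithm URLs.

```python
# Minimum attributes of a proof suite, as listed in this section.
REQUIRED_SUITE_ATTRIBUTES = (
    "id",
    "type",
    "canonicalizationAlgorithm",
    "digestAlgorithm",
    "proofAlgorithm",
)


def check_proof_suite(suite: dict) -> list:
    """Return a list of problems found in a proof suite (empty if none)."""
    problems = [f"missing attribute: {name}"
                for name in REQUIRED_SUITE_ATTRIBUTES if name not in suite]
    if "type" in suite and suite["type"] != "ProofSuite":
        problems.append("type must be ProofSuite")
    return problems


def suite_registry(suites: list) -> dict:
    """Index the well-formed suites by their id URL."""
    return {suite["id"]: suite for suite in suites
            if not check_proof_suite(suite)}


rsa_suite = {
    "id": "https://w3id.org/security#RsaSignature2018",
    "type": "ProofSuite",
    "canonicalizationAlgorithm": "https://w3id.org/security#GCA2015",
    "digestAlgorithm": "https://www.ietf.org/assignments/jwa-parameters#SHA256",
    "proofAlgorithm": "https://www.ietf.org/assignments/jws-parameters#RSASSA-PSS",
}
registry = suite_registry([rsa_suite, {"id": "https://example.com/incomplete"}])
```

Keying the registry on the suite URL lets an implementation look up all three primitives from the single identifier that a developer supplies.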
      

Algorithms

The algorithms defined below are generalized in that they require a specific canonicalization algorithm, message digest algorithm, and proof algorithm to be used to achieve the algorithm's intended outcome.

Proof Algorithm

The proof parameters should be included as headers and values in the data to be signed.

The following algorithm specifies how to create a digital proof that can be later used to verify the authenticity and integrity of a linked data document. The required inputs are a linked data document (document), proof options (options), and a private key (privateKey). The proof options MUST contain an identifier for the public/private key pair, creator, and an ISO8601 combined date and time string, created, containing the current date and time, accurate to at least one second, in Coordinated Universal Time (UTC). A nonce and a domain may also be specified in the options. A signed linked data document is produced as output. Whenever this algorithm encodes strings, it MUST use UTF-8 encoding.

  1. Create a copy of document, hereafter referred to as output.
  2. Generate a canonicalized document by canonicalizing document according to a canonicalization algorithm (e.g. the GCA2015 [[!RDF-DATASET-NORMALIZATION]] algorithm).
  3. Create a value tbs that represents the data to be signed, and set it to the result of running the Create Verify Hash Algorithm, passing the information in options.
  4. Digitally sign tbs using the privateKey and the digital proof algorithm (e.g. JSON Web Proof using RSASSA-PKCS1-v1_5 algorithm). The resulting string is the proofValue.
  5. Add a proof node to output containing a linked data proof using the appropriate type and proofValue values as well as all of the data in the proof options (e.g. creator, created, and if given, any additional proof options such as nonce and domain).
  6. Return output as the signed linked data document.

Proof Verification Algorithm

This algorithm is highly specific to digital signatures and needs to be generalized to other proof mechanisms such as Equihash.

The following algorithm specifies how to check the authenticity and integrity of a signed linked data document by verifying its digital proof. This algorithm takes a signed linked data document, signed document, and outputs a true or false value based on whether or not the digital proof on signed document was verified. Whenever this algorithm encodes strings, it MUST use UTF-8 encoding.

Specify how the public key can be obtained (through some out-of-band process and passed in, or retrieved by dereferencing its URL identifier, etc.).
  1. Get the public key by dereferencing its URL identifier in the proof node of the default graph of signed document. Confirm that the linked data document that describes the public key specifies its owner and that its owner's URL identifier can be dereferenced to reveal a bi-directional link back to the key. Ensure that the key's owner is a trusted entity before proceeding to the next step.
  2. Let document be a copy of signed document.
  3. Remove any proof nodes from the default graph in document and save it as proof.
  4. Generate a canonicalized document by canonicalizing document according to the canonicalization algorithm (e.g. the GCA2015 [[!RDF-DATASET-NORMALIZATION]] algorithm).
  5. Create a value tbv that represents the data to be verified, and set it to the result of running the Create Verify Hash Algorithm, passing the information in proof.
  6. Pass the proofValue, tbv, and the public key to the proof algorithm (e.g. JSON Web Proof using RSASSA-PKCS1-v1_5 algorithm). Return the resulting boolean value.
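
The control flow of the Proof Algorithm and the Proof Verification Algorithm can be sketched as a round trip in Python. This is not an implementation of any real proof suite; it relies on loudly stated stand-ins: sorted-key JSON replaces GCA2015, and HMAC-SHA256 replaces the proof algorithm. HMAC is symmetric, so the same secret both signs and verifies, unlike the public/private key pair used by RsaSignature2018. The type value HmacStandIn and all function names are hypothetical, and key retrieval and trust (verification step 1) are left out by passing the secret in directly.

```python
import copy
import hashlib
import hmac
import json


def canonicalize(node: dict) -> bytes:
    # Stand-in for GCA2015: sorted-key JSON, UTF-8 encoded.
    return json.dumps(node, sort_keys=True, separators=(",", ":")).encode("utf-8")


def create_verify_hash(canonical_document: bytes, options: dict) -> bytes:
    # Compact form of the Create Verify Hash Algorithm: hash of the
    # canonicalized options followed by hash of the canonicalized document.
    opts = {k: v for k, v in options.items()
            if k not in ("type", "id", "proofValue")}
    return (hashlib.sha256(canonicalize(opts)).digest()
            + hashlib.sha256(canonical_document).digest())


def add_proof(document: dict, options: dict, secret: bytes) -> dict:
    output = copy.deepcopy(document)                       # step 1
    canonical = canonicalize(document)                     # step 2
    tbs = create_verify_hash(canonical, options)           # step 3
    proof_value = hmac.new(secret, tbs, hashlib.sha256).hexdigest()  # step 4
    output["proof"] = dict(options, type="HmacStandIn",
                           proofValue=proof_value)         # step 5
    return output                                          # step 6


def verify_proof(signed_document: dict, secret: bytes) -> bool:
    document = copy.deepcopy(signed_document)              # step 2
    proof = document.pop("proof", None)                    # step 3
    if proof is None or "proofValue" not in proof:
        return False
    canonical = canonicalize(document)                     # step 4
    tbv = create_verify_hash(canonical, proof)             # step 5
    expected = hmac.new(secret, tbv, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, proof["proofValue"])  # step 6


document = {"@context": "https://w3id.org/identity/v1", "title": "Hello World!"}
options = {
    "creator": "https://example.com/i/pat/keys/5",
    "created": "2017-09-23T20:21:34Z",
    "domain": "example.org",
    "nonce": "2bbgh3dgjg2302d-d2b3gi423d42",
}
signed = add_proof(document, options, b"demo-secret")
verified = verify_proof(signed, b"demo-secret")
```

Because the proof options are hashed together with the document, changing either the document or any option such as the nonce or domain after signing causes verification to fail in this sketch.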

Create Verify Hash Algorithm

This algorithm is too specific to digital signatures and needs to be generalized for algorithms such as Equihash.

The following algorithm specifies how to create the data that is used to generate or verify a digital proof. It takes a canonicalized linked data document, canonicalized document, a canonicalization algorithm, a message digest algorithm, and proof options, input options (by reference). The proof options MUST contain an identifier for the public/private key pair, creator, and an ISO8601 combined date and time string, created, containing the current date and time, accurate to at least one second, in Coordinated Universal Time (UTC). A nonce and a domain may also be specified in the options. Its output is data that can be used to generate or verify a digital proof (it is usually further hashed as part of the verification or signing process).

  1. Let options be a copy of input options.
  2. If type, id, or proofValue exists in options, remove the entry.
  3. If created does not exist in options, add an entry with a value that is an ISO8601 combined date and time string containing the current date and time accurate to at least one second, in Coordinated Universal Time (UTC). For example: 2017-11-13T20:21:34Z.
  4. Generate output by:
    1. Creating a canonicalized options document by canonicalizing options according to the canonicalization algorithm (e.g. the GCA2015 [[!RDF-DATASET-NORMALIZATION]] algorithm).
    2. Hash canonicalized options document using the message digest algorithm (e.g. SHA-256) and set output to the result.
    3. Hash canonicalized document using the message digest algorithm (e.g. SHA-256) and append it to output.
  5. This last step needs further clarification. Signing implementations usually automatically perform their own integrated hashing of an input message, i.e. signing algorithms are a combination of a raw signing mechanism and a hashing mechanism such as RS256 (RSA + SHA-256). Current implementations of RSA-based Linked Data Proof suites therefore do not perform this last step before passing the data to a signing algorithm as it will be performed internally. The Ed25519Proof2018 algorithm also does not perform this last step -- and, in fact, uses SHA-512 internally. In short, this last step should better communicate that the 64 bytes produced from concatenating the SHA-256 of the canonicalized options with the SHA-256 of the canonicalized document are passed into the signing algorithm with a presumption that the signing algorithm will include hashing of its own.
    Note: It is presumed that the 64-byte output will be used in a signing algorithm that includes its own hashing algorithm, such as RS256 (RSA + SHA-256) or EdDSA (Ed25519, which uses SHA-512).
  6. Return output.
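
The steps above can be sketched in Python as follows, with SHA-256 as the message digest algorithm. The sketch assumes a stand-in canonicalization (sorted-key JSON) in place of GCA2015, so its output will not match that of a conforming implementation; the function names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonicalize(node: dict) -> bytes:
    # Stand-in for GCA2015: sorted-key JSON, UTF-8 encoded.
    return json.dumps(node, sort_keys=True, separators=(",", ":")).encode("utf-8")


def create_verify_hash(canonicalized_document: bytes, input_options: dict) -> bytes:
    options = dict(input_options)                           # step 1
    for key in ("type", "id", "proofValue"):                # step 2
        options.pop(key, None)
    if "created" not in options:                            # step 3
        options["created"] = datetime.now(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ")
    canonicalized_options = canonicalize(options)           # step 4.1
    output = hashlib.sha256(canonicalized_options).digest()  # step 4.2
    output += hashlib.sha256(canonicalized_document).digest()  # step 4.3
    return output                                           # step 6


document = {"@context": "https://w3id.org/identity/v1", "title": "Hello World!"}
options = {
    "creator": "https://example.com/i/pat/keys/5",
    "created": "2017-09-23T20:21:34Z",
    "nonce": "2bbgh3dgjg2302d-d2b3gi423d42",
}
tbs = create_verify_hash(canonicalize(document), options)
```

With SHA-256 the result is the 64-byte value described in the note above: the first 32 bytes depend only on the proof options and the last 32 bytes depend only on the document.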

Security Considerations

The following section describes security considerations that developers implementing this specification should be aware of in order to create secure software.

TODO: We need to add a complete list of security considerations.