Privacy-Preserving Attestation

Published
Author(s)
Enimihil
Tags
#age-verification #credential #jwt #privacy #python #technology

In many places in contemporary society, we need to provide proof of something to third parties: proof of citizenship, proof of legal eligibility to work, proof of age, proof of disability, and so on.

Often, these proofs are provided by presenting identifying information about an individual that far exceeds the scope of the proof itself. When you hand a driver's license to a store clerk to prove your age to buy alcohol, the clerk is exposed to your driver's license number (or non-driver ID number), your exact birth date, the class of license you have, whether or not you are an organ donor, and your home address, among other personally identifying information.

None of this is strictly needed for the business to approve the sale; it needs only the fact that the purchaser is above the legal age threshold. Ideally, the business could verify only the age threshold, and would not even have access to the legal name, age or birthdate, home address, etc. of the purchaser. (This assumes those details aren't required to be recorded at the time of purchase.)

There are advantages to the business in not having access to personally identifying information (PII), chiefly avoiding the regulatory burden of handling it. Most businesses operate in a regulatory environment where any PII comes with requirements to protect it from disclosure to and access by unauthorized parties, to disclose to the individual how it is used, and to follow mandatory opt-out or opt-in rules for sharing it with third parties. Avoiding the collection of, or access to, PII would significantly reduce the burden on the business, allowing smaller and simpler operations.

This matters even more so in online services, where (so-called) child-protecting rules require age verification for access to services, or impose special rules for children under 13 accessing those services.

Current safeguards rely either on terms of service language that is rarely read or followed, or on post-hoc enforcement of violations. Many online services explicitly bar anyone under 13 from using them [FILL IN EXAMPLES], but do little (if anything) to actively prevent young children from using their services. They rely on the terms of service to insulate them from the burdens imposed by law: anyone under 13 using the service is violating the terms.

These laws, intended to protect children from too much online disclosure of their information (without parental involvement), indicate a strong privacy interest for children that disclosing PII for verification would undermine. If the PII of adults is sensitive, the PII of children using a service is even more so. Avoiding the collection of PII while still verifying that appropriate controls are in place provides a motivation for minimal attestation.

Pornography access is another concern. It has always been easily accessible via online services, but has always (in every jurisdiction I am aware of) been legally restricted to adults. Verification of age is in some cases by proxy, through the use of payment methods like credit cards, or by simply stating that one is of an appropriate age, again relying on terms of service to insulate the provider from liability for their users' deception.

'Adult content' also brings with it a strong privacy interest from the users of a service. Intimate relations are normally private, and many would not want to be required to disclose PII, or sometimes even to use a pseudonym.

Existing Solutions

Government issued identification

In most of the situations where an age check is required today (at least in the US), a government-issued identification with a birthdate is used. As mentioned in the introduction, this reveals a large amount of additional information. It is also a static credential that can be forged or stolen. Countermeasures like holographic seals or other anti-forgery mechanisms are common, but expensive. And a sufficiently similar-looking individual can claim to be the pictured person on photo IDs without much difficulty.

This works well enough for in-person retail transactions, because the worker checking the ID can be observed; if they were to take a photo, make a copy, or write down information from the ID, it would be suspicious. The risk of the extra revealed information being retained or misused is minimal (though not completely mitigated).

Also, family members impersonating each other is likely not a large vulnerability in this system, as family will often grant favors to each other anyway. For example, a seventeen-year-old younger sibling may be able to use a twenty-two-year-old's ID to gain entrance to a nightclub; but for purchasing alcohol, the twenty-two-year-old is as likely to share the alcohol they can legally purchase as their ID. (Though there are consequences in the law for doing so, this is not an uncommon practice.)

Proxy characteristics

Instead of directly requiring attestation or proof of age, a verifier can use a proxy characteristic: something (presumably easier to verify) that implies the required age threshold. For example, a valid credit card payment in the individual's name implies that they are over the age of majority (able to enter into contracts like credit card agreements), and they are therefore assumed to be over 18.

In person, this can be as simple as appearance-based assumptions, like facial hair, wrinkles, age spots, etc., that many people would use to gauge a person's age. These can be unreliable and error-prone, but are very quick and cheap to use. Many places that check IDs will allow their workers to exercise discretion if an individual looks far older than the minimum age threshold, both out of a desire to make the business interaction as easy as possible and to speed up the process.

Terms of Service

Many online services are subject to requirements that apply specifically to protecting younger children from accessing or using services, and that impose limits (often untenable) on the services provided to those young people. [EXAMPLE TOS LANGUAGE]

While the company may disclaim liability for specific cases, and require parents to take responsibility for their child's use of inappropriate services, this likely does not absolve them of their responsibilities when such users are later identified, in terms of how the data about those users is used. A more reliable mechanism that proactively prevented those post-hoc enforcement actions and data deletion or removal needs would benefit these service providers if the costs were low enough and the accuracy was high enough.

Self-attestation

A final, easily exploitable system for 'verifying' age thresholds would be to rely on the user's self-reported age. This appears to be what many adult websites used to do (and still do), as well as alcohol brand sites and advertisements (as advertising to those under the legal drinking age is not allowed in the US). This is presumably done because a requirement exists to have a system of verification, but the consequences of failing to verify accurately are low, possibly due to a lack of enforcement power or ability, or simply because the penalties are too minor to justify the greater costs of a more accurate system. Again, a cheaper and more accurate system would benefit these companies.

Potential Solutions

Official Paperwork

The most basic approach to an individual proof system for age would identify the specific age thresholds that matter, and have the government issue a credential containing only that information. This would have to include demographic and picture ID information, much like a driver's license, in order to be verifiable, as the subject of the credential would need to be matched to the person presenting it. (It does no good to show proof that "Bob P" is over 21 if the verifier cannot determine whether the person presenting the credential is that "Bob P".)

[Figure: example of a simple age-only credential, showing name, height, weight, and a placeholder identification picture. Each of "Over 13", "Over 18", "Over 21", "Over 55", and "Over 65" is marked "Yes" or "No", with a checkmark or an x, colored green or red respectively.]

Example age-only credential. This still requires privacy-sensitive information to be usefully verified against the identity of the individual presenting it.

The privacy leak of a name and photo may not be significant, and the challenges in verification (look-alikes and forgeries) would not be significantly mitigated. But this does reduce the level of exposure, as information like home address or driver's license status is not revealed.

The lack of an expiration for the credential is also important. A credential like this for age verification would (logically) expire when the credential information would change, which would reveal a birthday or year if done naively. Using assertions that can only ever go from False→True in the credential allows for a credential of any age to be used, even if it is not current. (Unless time travel is invented and it is possible to later in time be younger.)
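As a rough sketch of that idea, the issuer could reduce a birthdate to the threshold assertions before anything is printed or signed. The function names here are hypothetical, the thresholds are the ones from the example credential, and "over N" is assumed to mean "N or older".

```python
from datetime import date

# Thresholds from the example credential; any other set works the same way.
THRESHOLDS = (13, 18, 21, 55, 65)


def age_on(birthdate: date, today: date) -> int:
    """Whole years elapsed between birthdate and today."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)


def threshold_claims(birthdate: date, today: date) -> dict[str, bool]:
    """Reduce a birthdate to boolean claims; the birthdate itself is left out."""
    age = age_on(birthdate, today)
    # Assumption: "over N" means "N or older".
    return {f"over_{t}": age >= t for t in THRESHOLDS}


def still_valid(old: dict[str, bool], new: dict[str, bool]) -> bool:
    """Every assertion that was True on an old credential is still True later."""
    return all(new[name] for name, value in old.items() if value)


issued = threshold_claims(date(2000, 6, 15), today=date(2022, 11, 10))
later = threshold_claims(date(2000, 6, 15), today=date(2060, 1, 1))
print(issued)
print(still_valid(issued, later))
```

An old credential can understate what its holder is entitled to (a False that has since become True), but it can never overstate it, which is why no expiration date is needed for correctness.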

Signed Credential On Paperwork

Adding to this basic credential a digital signature of the attestation from an authoritative organization would provide a superior level of protection against forgery (but does not mitigate the other issues).

As an example (to build on further), using the JWT standard from the IETF (RFC 7519), an authority could generate and sign a JWT which could be embedded into the credential, and verified by the receiver of the credential, assuming they know the authority's public key.

I've used Python (3.10.8) with jwcrypto 1.4.2 and qrcode 7.3.1, as well as zbar version 0.23.90.

generate_authority_key.py (Source)

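#!/usr/bin/env python3
"""Generate the authority's EC keypair and write both halves as JWK JSON."""
from jwcrypto.jwk import JWK


def main():
    # A P-256 key matches the ES256 algorithm used to sign the credential.
    authority_key = JWK.generate(kty='EC', crv='P-256')
    with open('authority_key.pub', 'w') as pubkey:
        pubkey.write(authority_key.export_public())
    # The private half must stay with the authority.
    with open('authority_key.priv', 'w') as privkey:
        privkey.write(authority_key.export_private())


if __name__ == '__main__':
    main()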

generate_signed_jwt.py (Source)

#!/usr/bin/env python3
"""Sign the example age-only claims with the authority's private key."""
import time

from jwcrypto.jwk import JWK
from jwcrypto.jwt import JWT


def example_token():
    # Only "iat" is a registered JWT claim; the others are private claims
    # made up for this example.
    return JWT(header={"alg": "ES256"},
               claims={
                   "iat": time.time(),
                   "over_13": True,
                   "over_18": True,
                   "over_21": True,
                   "over_55": False,
                   "over_65": False,
                   "name": "Bob P",
                   "height": 69,
                   "weight": 129,
               })


def main():
    with open('authority_key.priv') as keyfile:
        privkey = JWK.from_json(keyfile.read())

    token = example_token()
    token.make_signed_token(privkey)

    # serialize() emits the compact header.payload.signature form.
    with open('example_credential.jose', 'w') as outfile:
        outfile.write(token.serialize())


if __name__ == '__main__':
    main()

make_svg_qrcode.py (Source)

#!/usr/bin/env python3
"""Render the signed credential as an SVG QR code."""
import qrcode
import qrcode.image.svg


def main():
    # SvgPathImage draws the whole code as a single SVG path.
    factory = qrcode.image.svg.SvgPathImage

    with open("example_credential.jose", "r") as infile:
        img = qrcode.make(infile.read(), image_factory=factory)

    img.save('example_credential_qrcode.svg')


if __name__ == '__main__':
    main()

read_qrcode_credential.py (Source)

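#!/usr/bin/env python3
"""Scan the QR code with zbarimg and save the decoded credential."""
import subprocess


def main():
    # check=True raises CalledProcessError if zbarimg fails or finds no code.
    proc = subprocess.run(
        ['zbarimg', '--raw', '-1', '--quiet', 'example_credential_qrcode.svg'],
        capture_output=True, check=True)

    # zbarimg ends its output with a newline, which is not part of the token.
    with open('example_credential_scanned.jose', 'wb') as outfile:
        outfile.write(proc.stdout.strip())


if __name__ == '__main__':
    main()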

verify_signed_jwt.py (Source)

#!/usr/bin/env python3
"""Verify the scanned credential against the authority's public key."""
import json
import pprint

from jwcrypto.jwk import JWK
from jwcrypto.jwt import JWT

printer = pprint.PrettyPrinter(indent=4, width=66)
pformat = printer.pformat


def main():
    # Strip whitespace in case the scanner left a trailing newline.
    with open('example_credential_scanned.jose', 'r') as tokenfile:
        token = JWT.from_jose_token(tokenfile.read().strip())

    with open('authority_key.pub', 'r') as pubkeyfile:
        pubkey = JWK.from_json(pubkeyfile.read())

    # Raises an exception if the signature does not verify.
    token.validate(pubkey)

    print("Header: ", pformat(json.loads(token.header)))
    print("Claims: ", pformat(json.loads(token.claims)))


if __name__ == '__main__':
    main()

This example ignores the important questions of designating the authority and of handling expiration, invalidation, and rotation of keys.

Running generate_authority_key.py generates the keypair used by the authority to generate the signed JWT containing the age-related information.

generate_signed_jwt.py uses our example "Bob P" to generate a signed JWT.

The contents of the example_credential.jose end up looking like this:

example_credential.jose (Source)

eyJhbGciOiJFUzI1NiJ9.eyJoZWlnaHQiOjY5LCJpYXQiOjE2NjgxMDM4ODguNjgxNTA4LCJuYW1lIjoiQm9iIFAiLCJvdmVyOjY1IjpmYWxzZSwib3Zlcl8xMyI6dHJ1ZSwib3Zlcl8xOCI6dHJ1ZSwib3Zlcl8yMSI6dHJ1ZSwib3Zlcl81NSI6ZmFsc2UsIndlaWdodCI6MTI5fQ.ELBMWtgSfVjChIQ_i6z-UZa-GI5AotnIFcMTcoleElUnz2Uh9qUU2GGDCefnmAieoHpolZK4a2voEISw0_4iog
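The compact form is three base64url segments (header, payload, signature) separated by dots. The signature protects the claims against tampering but does not hide them, so anyone who can read the token can read the name in it. A minimal sketch using only the standard library, with a made-up token and a placeholder signature standing in for a real credential:

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment whose '=' padding has been stripped."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw: bytes) -> str:
    """Encode bytes as base64url without padding, as compact JWTs do."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def peek(token: str) -> tuple[dict, dict]:
    """Return (header, claims) WITHOUT verifying the signature."""
    header_b64, payload_b64, _signature_b64 = token.strip().split(".")
    return (json.loads(b64url_decode(header_b64)),
            json.loads(b64url_decode(payload_b64)))


# Hypothetical token built only for this demo; the signature is a placeholder.
demo = ".".join([
    b64url_encode(json.dumps({"alg": "ES256"}).encode()),
    b64url_encode(json.dumps({"name": "Bob P", "over_21": True}).encode()),
    b64url_encode(b"not-a-real-signature"),
])
header, claims = peek(demo)
print(header, claims)
```

Nothing returned by peek() should be trusted; it only illustrates what a bystander with a QR scanner could learn.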

And running make_svg_qrcode.py results in (after adding a white background to the generated file):

example_credential_qrcode.svg

So the final output of verify_signed_jwt.py, which reads the value emitted from the QR code scanner, appears as:

Header:  {'alg': 'ES256'}
Claims:  {   'height': 69,
    'iat': 1668106455.4591463,
    'name': 'Bob P',
    'over_13': True,
    'over_18': True,
    'over_21': True,
    'over_55': False,
    'over_65': False,
    'weight': 129}

(A Python exception would be raised if the key was invalid or the signature did not match.)

But tying the credential to the individual presenting it remains a privacy-invading challenge. Thanks to the signature, the verifier can validate with more certainty that the authority issued this credential, but still has to use other mechanisms to check that the individual presenting the credential is the individual it refers to.

It may be sufficient to have an online verification system that can check, in real time, the validity of the issued credential (e.g. whether the credential has been reported lost or stolen, or is expired or revoked) to provide a stronger guarantee of validity. (A scheme like that standardized as PASSporT, RFC 8225.)

Online Challenge

Taking the idea of online verification even further, a challenge-response scheme could be used to prove to the attesting party that the presenter of the credential is the correct individual (perhaps by biometric means, by checking a non-privacy-preserving credential that can be presented only to the trusted authority, or by authenticating using a username/password/token that has been pre-arranged with the authority).

This last online challenge-response mechanism could be mapped onto existing protocols, like OAuth.
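One possible shape for the pre-arranged-token variant is an HMAC over a fresh nonce. This is only a sketch under that assumption: the Authority class, its method names, and the use of HMAC-SHA256 are all illustrative choices, not part of any scheme described here.

```python
import hashlib
import hmac
import secrets


class Authority:
    """Hypothetical attesting party holding a secret pre-arranged per user."""

    def __init__(self) -> None:
        self._secrets: dict[str, bytes] = {}
        self._pending: dict[str, bytes] = {}

    def enroll(self, user_id: str) -> bytes:
        """Create the shared secret handed to the user out of band."""
        secret = secrets.token_bytes(32)
        self._secrets[user_id] = secret
        return secret

    def challenge(self, user_id: str) -> bytes:
        """Issue a fresh random nonce for this verification attempt."""
        nonce = secrets.token_bytes(16)
        self._pending[user_id] = nonce
        return nonce

    def check(self, user_id: str, response: bytes) -> bool:
        """Accept only the correct HMAC over the outstanding nonce."""
        nonce = self._pending.pop(user_id, None)  # each nonce is single-use
        secret = self._secrets.get(user_id)
        if nonce is None or secret is None:
            return False
        expected = hmac.new(secret, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


def respond(secret: bytes, nonce: bytes) -> bytes:
    """User side: prove knowledge of the secret without sending it."""
    return hmac.new(secret, nonce, hashlib.sha256).digest()


authority = Authority()
bob_secret = authority.enroll("bob")
nonce = authority.challenge("bob")
print(authority.check("bob", respond(bob_secret, nonce)))
```

Because the nonce is discarded after one check, a captured response cannot be replayed. The relying party never sees the secret or the user identifier in this exchange; it would only learn the authority's verdict.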

Some terminology:

  • A relying party is the service that wants to verify the age of a user.

  • A user is the person whose age is to be verified by the relying party.

  • A verification provider is the service that the relying party invokes to verify the user's age.

The question, then, is: who is the verification provider?

One obvious answer is a government authority, which would be simple and universal (within a country or municipality that has established such an authority). If, at a national level, a law requires checking that an individual is over the age of 16, the national government could establish an authoritative service that allows citizens to sign up, use their national ID, etc. to prove that to third parties while preserving anonymity.

A government acting as a verification provider is not the only option, however. Any private entity that can take on the liability for correctness could act as such an authority. For example, a large employer might provide a service to attest to individual employees' certification to operate certain machinery without then collecting detailed location and work schedule information of their employees or contractors, which would be a liability if leaked externally.

Existing systems of verification, like proxy verification through payment channels available only to individuals over 18, could be used as well.

The method of verifying the user's information is not something that needs to be specified, just like with OAuth or OpenID generally. It could be as simple as logging into a service that knows all your details and that you trust to keep them safe. (Where you do not need to trust the relying party with anything extra, as is the whole point of this.)

Mapping new claims or profile data to OAuth or OpenID in the case where these are used for identification (whether or not pseudonymously) is not difficult. The relying party and verification provider need only agree on how the data is represented.

In the situation that a user wants to verify an attribute with a verification provider, but reveal no other information (including a persistent identifier associated with that verification) to the relying party, the verification provider would need to generate a unique value for any requested profile fields, every time.
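A small sketch of what "a unique value, every time" could look like on the verification provider's side. The function name and the use of a "sub" field are assumptions for illustration; the point is only that the identifier is random per response rather than derived from the user.

```python
import secrets


def one_time_response(requested_fields: list[str],
                      profile: dict[str, bool]) -> dict:
    """Answer only the requested fields under an identifier that is never reused."""
    return {
        # Fresh random value per response, so two verifications cannot be linked.
        "sub": secrets.token_urlsafe(16),
        **{name: profile[name] for name in requested_fields},
    }


# Hypothetical profile held by the verification provider.
profile = {"over_13": True, "over_18": True, "over_21": False}
first = one_time_response(["over_18"], profile)
second = one_time_response(["over_18"], profile)
print(first["sub"] != second["sub"])
```

Fields that were not requested (here over_13 and over_21) never leave the provider, and the relying party cannot tell whether two responses describe the same person.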

In any of these cases, the relying party acts as an OAuth 2.0 Client, the verification provider acts as the Authorization Server, and the user is the logical Resource Owner (but there is no Resource Server, as all metadata is exchanged as part of the interaction).
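To make the mapping concrete, here is a sketch of the first step of an OAuth 2.0 authorization code flow as the relying party might start it. The query parameters are the standard ones from RFC 6749; the endpoint, client ID, redirect URI, and the "age_over_18" scope name are invented for illustration and would have to be agreed between the relying party and the verification provider.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorization_url(authorize_endpoint: str, client_id: str,
                            redirect_uri: str, scope: str) -> tuple[str, str]:
    """Return the URL to send the user to, plus the state value to check later."""
    state = secrets.token_urlsafe(16)  # ties the eventual callback to this request
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


# All values below are hypothetical.
url, state = build_authorization_url(
    "https://verifier.example/authorize",
    client_id="relying-party-123",
    redirect_uri="https://shop.example/callback",
    scope="age_over_18",
)
print(url)
```

The user authenticates with the verification provider at that URL by whatever means the provider chooses, and the relying party only ever receives the outcome for the scope it asked for.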