Core concepts¶
This page explains the fundamental concepts that form the foundation of TrustDeck: domains, pseudonyms, algorithms, projects, and entities. These concepts define the data model and operational behavior of the pseudonymization system.
For details on API usage, see the Swagger UI.
For security and authorization, see Authentication (OIDC/JWT).
Overview¶
TrustDeck organizes pseudonymization around five core concepts:
| Concept | Purpose | Key code entities (examples) |
|---|---|---|
| Domains | Configuration units that define pseudonymization rules and form hierarchical structures | Domain, DomainDTO, DomainDBAccessService, DomainRESTController |
| Pseudonyms | Identifier-to-pseudonym mappings scoped to specific domains | Pseudonym, PseudonymDTO, PseudonymDBAccessService, PseudonymRESTController |
| Algorithms | Technical parameters for pseudonym generation (algorithm type, alphabet, length, etc.) | Algorithm, AlgorithmDTO, PseudonymizationFactory, Pseudonymizer |
| Projects | Entity management containers | Project, ProjectDTO, ProjectDBService, ProjectRESTController |
| Entities | Representations of real-life persons, samples, and other objects | EntityType, EntityInstance, EntityTypeRESTController, EntityInstanceRESTController, EntityTypeDTO, EntityInstanceDTO, EntityTypeDBService, EntityInstanceDBService |
Domain-pseudonym-algorithm relationship (high level)¶
A domain groups semantically similar pseudonyms and defines the ground rules for managing them.
A pseudonym is created within exactly one domain and stores the mapping for a specific identifier (together with its type).
An algorithm describes how pseudonyms are generated for that domain, and with which parameters.
In practice:
- Domains reference an algorithm configuration.
- Pseudonyms are created/read within a domain context.
- Algorithm choice and parameters can be inherited from the parent domain.
Domains¶
A domain is a configuration unit that defines how pseudonyms are managed. Each domain encapsulates:
- Validity period (start/end dates)
- Enforcement rules (whether to enforce validity constraints)
- Prefix (prepended to generated pseudonyms)
- Hierarchical relationship (optional parent domain)
Domain hierarchy¶
Domains form a tree structure. A domain may have a parent domain (superdomain) and can inherit configuration values from that parent.
Inheritance mechanism¶
When a domain is created with a parent, it can inherit properties. Each inheritable property has a corresponding *Inherited boolean flag.
| Property | Inherited flag | Default if not specified |
|---|---|---|
| validFrom | validFromInherited | Parent’s validFrom or now |
| validTo | validToInherited | Parent’s validTo or derived from validityTime |
| enforceStartDateValidity | enforceStartDateValidityInherited | Parent’s value or true |
| enforceEndDateValidity | enforceEndDateValidityInherited | Parent’s value or true |
| algorithm | algorithmInherited | Parent’s algorithm or RANDOM |
| alphabet | alphabetInherited | Parent’s alphabet or A–Z |
| pseudonymLength | pseudonymLengthInherited | Parent’s length or 16 |
| paddingCharacter | paddingCharacterInherited | Parent’s character or "0" |
| addCheckDigit | addCheckDigitInherited | Parent’s value or true |
| lengthIncludesCheckDigit | lengthIncludesCheckDigitInherited | Parent’s value or false |
| multiplePsnAllowed | multiplePsnAllowedInherited | Parent’s value or false |
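The resolution order implied by the table can be sketched as follows. This is a minimal illustration, not TrustDeck's implementation: the `DomainSketch` class and the `resolve` function are hypothetical names, only a subset of the properties is shown, and the sketch assumes the inherited flag is true whenever a value is taken from the parent.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

# Fallback defaults copied from the table above (subset only).
DEFAULTS: Dict[str, Any] = {
    "algorithm": "RANDOM",
    "pseudonymLength": 16,
    "paddingCharacter": "0",
    "addCheckDigit": True,
    "multiplePsnAllowed": False,
}


@dataclass
class DomainSketch:
    """Hypothetical stand-in for a domain; not TrustDeck's Domain class."""
    name: str
    parent: Optional["DomainSketch"] = None
    explicit: Dict[str, Any] = field(default_factory=dict)


def resolve(domain: DomainSketch, prop: str) -> Tuple[Any, bool]:
    """Return (effective value, inherited flag) for one property."""
    if prop in domain.explicit:
        return domain.explicit[prop], False      # explicitly set on this domain
    if domain.parent is not None:
        value, _ = resolve(domain.parent, prop)  # walk up the domain tree
        return value, True                       # assumption: flag marks parent-derived values
    return DEFAULTS[prop], False                 # no parent: fall back to the default


root = DomainSketch("Domain", explicit={"pseudonymLength": 8})
child = DomainSketch("ChildDomain", parent=root, explicit={"algorithm": "CONSECUTIVE"})

print(resolve(child, "pseudonymLength"))  # value taken from the parent
print(resolve(child, "algorithm"))        # value set on the child itself
```

In this sketch a child that omits a property receives the parent's effective value, even if the parent itself only fell back to the default.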
Domain creation¶
Domains are created through REST endpoints. Typical patterns:
- Standard domain creation: simplified creation with essential properties
- Complete domain creation: specify all domain properties explicitly
Domains apply defaults if values are not provided (e.g., default algorithm, alphabet, pseudonym length, etc.).
Pseudonyms¶
A pseudonym is a mapping between an identifier and a generated pseudonym string, always scoped to a specific domain.
A pseudonym record typically contains:
- Identifier (identifier + idType)
- Pseudonym (usually generated by TrustDeck, but can also be supplied by the caller)
- Validity period (validFrom, validTo)
- Inheritance flags (whether validity was inherited from domain)
- Domain reference (domainId)
Identifier structure¶
Identifiers are represented as a combination of:
- identifier (the actual identifying string)
- idType (the identifier type, e.g., EHR_ID, SSN, INSURANCE_ID)
Example (conceptual):
identifier: "123456"
idType: "PATIENT_ID"
Pseudonym validity and inheritance¶
Pseudonyms inherit their validity period from their domain unless one is explicitly specified. When the corresponding enforcement flags are set in the domain, a pseudonym's validity is automatically constrained so that it does not start before or end after the domain's validity period. Typical behavior:
- validFrom:
  - uses provided value if given (subject to enforcement rules)
  - otherwise inherits from domain
- validTo:
  - can be provided directly or derived from a validityTime parameter
  - may be capped by domain validity if enforcement is enabled
- inheritance flags:
  - set to true when value was inherited from domain
  - set to false when explicitly provided
Example (conceptual):
- Domain validTo = 2035-01-01 (enforced)
- Pseudonym request validTo = 2036-01-01
- Result validTo may be capped to 2035-01-01
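The capping behavior in this example can be sketched as below. The function name `resolve_valid_to` and its parameters are hypothetical, and the sketch assumes that a capped value still counts as explicitly provided (flag stays false); the actual service may handle that case differently.

```python
from datetime import date
from typing import Optional, Tuple


def resolve_valid_to(
    requested: Optional[date],
    domain_valid_to: date,
    enforce_end_date_validity: bool,
) -> Tuple[date, bool]:
    """Return (validTo, validToInherited) for a new pseudonym."""
    if requested is None:
        # Nothing provided: inherit the domain's end date.
        return domain_valid_to, True
    if enforce_end_date_validity and requested > domain_valid_to:
        # Provided value exceeds the domain's validity: cap it.
        return domain_valid_to, False
    return requested, False


# Values from the example above.
print(resolve_valid_to(date(2036, 1, 1), date(2035, 1, 1), True))
```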
Multiple pseudonyms per identifier¶
The domain property multiplePsnAllowed controls whether multiple pseudonyms can exist for the same identifier + idType in the same domain:
- false (default): enforce 1:1 mapping
- true: allow 1:n mappings
This can be helpful when, for example, a patient has multiple x-ray images of the same modality and you want to generate a unique pseudonym for each image.
Batch pseudonym operations¶
TrustDeck supports batch creation of pseudonyms via dedicated endpoints (see Swagger UI). Batch operations typically return a per-item status such as:
- INSERTION_SUCCESS
- INSERTION_DUPLICATE_IDENTIFIER
- INSERTION_DUPLICATE_PSEUDONYM
- INSERTION_ERROR
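How per-item statuses interact with multiplePsnAllowed can be illustrated with a small in-memory sketch. Everything here except the four status names is an assumption: the `insert_batch` function, the dictionary-based store, and the order in which the duplicate checks run are illustrative only and say nothing about how the REST endpoints are implemented.

```python
from typing import Dict, List, Sequence, Tuple

Store = Dict[Tuple[str, str], List[str]]  # (identifier, idType) -> pseudonyms in ONE domain


def insert_batch(store: Store, items: Sequence[tuple], multiple_psn_allowed: bool = False) -> List[str]:
    """Insert (identifier, idType, pseudonym) items and return one status per item."""
    statuses = []
    for item in items:
        try:
            identifier, id_type, pseudonym = item
            key = (identifier, id_type)
            existing = {p for psns in store.values() for p in psns}
            if key in store and not multiple_psn_allowed:
                # 1:1 mapping enforced: the identifier already has a pseudonym.
                statuses.append("INSERTION_DUPLICATE_IDENTIFIER")
            elif pseudonym in existing:
                # Pseudonym strings must be unique within the domain.
                statuses.append("INSERTION_DUPLICATE_PSEUDONYM")
            else:
                store.setdefault(key, []).append(pseudonym)
                statuses.append("INSERTION_SUCCESS")
        except Exception:
            # Malformed item or any other failure for this single item.
            statuses.append("INSERTION_ERROR")
    return statuses


store: Store = {}
batch = [
    ("123456", "EHR_ID", "D-abcd"),
    ("123456", "EHR_ID", "D-efgh"),  # same identifier again
    ("654321", "EHR_ID", "D-abcd"),  # same pseudonym again
    ("malformed",),                  # cannot be unpacked
]
print(insert_batch(store, batch))
```

A failure of one item does not abort the batch in this sketch; each item gets its own status.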
Cross-domain pseudonym linking¶
TrustDeck can link pseudonyms across domains by traversing the domain hierarchy and matching identifier/pseudonym relationships along a path. This makes it possible to retrieve a pseudonym that is linked to another one in the same domain tree but on a different level.
- The input is a starting domain together with either an identifier + idType or a pseudonym
- The user specifies a target domain
- The endpoint then tries to find the pseudonym in the target domain that is chained to the given one
Example:
- Input:
  - identifier: 123456
  - idType: EHR_ID
  - sourceDomain: Domain
  - targetDomain: GrandChildDomain
- Assumed domain structure: Domain > ChildDomain > GrandChildDomain
- Example pseudonym chain:
  - Domain Domain:
    - identifier: 123456
    - idType: EHR_ID
    - Pseudonym: D-abcd
  - Domain ChildDomain:
    - identifier: D-abcd
    - idType: Domain_PSN
    - Pseudonym: CD-1234
  - Domain GrandChildDomain:
    - identifier: CD-1234
    - idType: ChildDomain_PSN
    - Pseudonym: GCD-a1b2
- Output:
  - identifier: CD-1234
  - idType: ChildDomain_PSN
  - Pseudonym: GCD-a1b2
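The chain lookup in this example can be sketched as a walk along the domain path. The `find_linked` function and the nested-dictionary record layout are hypothetical, the sketch only walks downward from source to target, and it assumes the `<ParentDomainName>_PSN` idType convention seen in the example.

```python
from typing import Dict, List, Optional, Tuple

# domain name -> {(identifier, idType): pseudonym}
Records = Dict[str, Dict[Tuple[str, str], str]]


def find_linked(records: Records, path: List[str], identifier: str, id_type: str) -> Optional[dict]:
    """Follow the pseudonym chain along `path` (source domain first, target domain last)."""
    result = None
    for domain in path:
        pseudonym = records.get(domain, {}).get((identifier, id_type))
        if pseudonym is None:
            return None  # the chain is broken at this level
        result = {"identifier": identifier, "idType": id_type, "pseudonym": pseudonym}
        # The pseudonym of this level is the identifier of the next level.
        identifier, id_type = pseudonym, f"{domain}_PSN"
    return result


records: Records = {
    "Domain": {("123456", "EHR_ID"): "D-abcd"},
    "ChildDomain": {("D-abcd", "Domain_PSN"): "CD-1234"},
    "GrandChildDomain": {("CD-1234", "ChildDomain_PSN"): "GCD-a1b2"},
}
path = ["Domain", "ChildDomain", "GrandChildDomain"]
print(find_linked(records, path, "123456", "EHR_ID"))
```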
Pseudonymization algorithms¶
An algorithm defines the technical rules for generating pseudonyms. TrustDeck supports multiple algorithm types, for example:
| Algorithm | Characteristics |
|---|---|
| MD5 | Cryptographic (but broken) hash |
| SHA1/2/3 | Cryptographic hash |
| BLAKE3 | Cryptographic hash |
| xxHASH | Fast non-cryptographic hash |
| RANDOM | Random generation from a configurable alphabet |
| CONSECUTIVE | Sequential numbering |
Algorithm parameters (examples)¶
Common parameters include:
- Alphabet (for random-style generation or check digit calculation)
- Pseudonym length (how long the output pseudonym should be, excluding any prefix)
- Salt (for hashing algorithms)
- Padding character
- Check digit settings (whether enabled, and whether it counts toward length)
Self configuration of algorithms¶
Users often want the shortest possible pseudonyms for their use case, since shorter strings are less error-prone when handled manually.
When selecting a randomness-based algorithm, the user can let the algorithm configure itself. To do so, the user provides an estimated number of pseudonyms that should be available in the domain (e.g., 100 million for pseudonymizing persons in a large country). Because collisions can occur when generating a large number of random pseudonyms, the user also defines the probability with which pseudonymization should succeed. Lastly, the user defines the alphabet from which the random strings are generated.
The algorithm then calculates the minimum length required to satisfy these settings, so that the generated pseudonyms are only as long as they need to be.
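One way such a length calculation could work is shown below. This is a sketch under stated assumptions, not TrustDeck's actual formula: it interprets the success probability as the probability that no collision occurs among all generated pseudonyms, and it uses the standard birthday approximation P(no collision) ≈ exp(-n(n-1) / (2·a^L)) for n pseudonyms, alphabet size a, and length L. The function name `minimum_length` is hypothetical.

```python
import math


def minimum_length(expected_count: int, success_probability: float, alphabet: str) -> int:
    """Smallest length whose pseudonym space keeps the collision risk within bounds."""
    if expected_count < 1:
        raise ValueError("expected_count must be at least 1")
    if not 0.0 < success_probability < 1.0:
        raise ValueError("success_probability must be strictly between 0 and 1")
    size = len(set(alphabet))
    if size < 2:
        raise ValueError("alphabet needs at least two distinct characters")

    # Solve exp(-pairs / space) >= p for space: space >= pairs / -ln(p).
    pairs = expected_count * (expected_count - 1) / 2
    required_space = pairs / -math.log(success_probability)

    length = 1
    while size ** length < required_space:
        length += 1
    return length


# 100 million pseudonyms, 99 % chance of no collision, alphabet A-Z.
print(minimum_length(100_000_000, 0.99, "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))  # -> 13
```

A larger alphabet or a lower required probability shortens the result; a stricter probability lengthens it.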
Check digits (optional)¶
TrustDeck can apply a check digit (based on Luhn mod n) depending on domain configuration. This can improve detection of transcription errors when pseudonyms are manually handled.
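For reference, the generic Luhn mod N scheme can be sketched as follows. This is the textbook algorithm, not code taken from TrustDeck; the function names are hypothetical, and the sketch assumes an alphabet with an even number of characters (the case in which the scheme detects every single-character substitution).

```python
def _luhn_sum(text: str, alphabet: str, factor: int) -> int:
    """Weighted sum over the characters, processed from right to left."""
    n = len(alphabet)
    total = 0
    for char in reversed(text):
        addend = factor * alphabet.index(char)  # raises ValueError for unknown characters
        factor = 1 if factor == 2 else 2        # alternate the weight between 2 and 1
        total += addend // n + addend % n       # add the base-n "digits" of the addend
    return total


def check_character(text: str, alphabet: str) -> str:
    """Compute the check character to append to `text`."""
    n = len(alphabet)
    remainder = _luhn_sum(text, alphabet, factor=2) % n
    return alphabet[(n - remainder) % n]


def is_valid(text_with_check: str, alphabet: str) -> bool:
    """Verify a string whose last character is the check character."""
    return _luhn_sum(text_with_check, alphabet, factor=1) % len(alphabet) == 0


digits = "0123456789"
print(check_character("7992739871", digits))  # -> 3 (classic Luhn mod 10 example)
print(is_valid("79927398713", digits))        # -> True
```

With the alphabet A–Z, the same functions yield a letter as check character, so the pseudonym stays within its configured alphabet.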
Projects¶
A project is a container for entity management in the KING module. Projects provide:
- Organizational boundaries for entities
- Metadata (name, abbreviation, start/end dates)
- Configuration flags (e.g., whether it is used to store entities or pseudonyms)
- Access control via Keycloak roles/groups (depending on deployment)
Project–domain relationship¶
Projects and domains are separate concepts:
- Domains: pseudonymization containers
- Projects: entity management containers
Projects may reference domains when a project is configured to store pseudonyms.
Entities¶
Entities are part of the KING module and represent real-world objects you want to register and track (e.g., persons, biosamples, devices, documents). TrustDeck distinguishes between:
- Entity types: the schema/blueprint (what fields an entity has, validation rules, semantics)
- Entity instances: the actual stored records (data that conforms to an entity type)
In other words: EntityType = definition, EntityInstance = data.
Entity types¶
An entity type defines:
- a stable type name / identifier (e.g., Patient, Sample, StudySubject)
- a schema describing the allowed payload fields and constraints
- metadata (description, versioning information, etc., depending on your setup)
Think of an entity type as a JSON Schema-like contract that describes what “valid entity data” looks like.
To minimize having to recreate the same kind of entities over and over again, TrustDeck distinguishes two kinds of entity types: base types and project-specific entity types.
Base types can only be created by authorized personnel, such as administrators or PIs. Examples of base types are a person entity, a biosample entity, or a device entity. Base types should not define too many attributes, as projects might not need some of them. When a project wants to use a person entity, it can define its project-specific person entity by extending the base type and adding attributes that are specific to the project.
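The relationship between a base type and a project-specific type can be pictured with a small sketch. The dictionary layout, the `extend_entity_type` function, and the base attribute `dateOfBirth` are purely illustrative assumptions; TrustDeck's actual schema format may look different.

```python
from typing import Dict


def extend_entity_type(base: dict, name: str, extra_attributes: Dict[str, str]) -> dict:
    """Derive a project-specific entity type from a base type."""
    clashes = set(base["attributes"]) & set(extra_attributes)
    if clashes:
        # In this sketch, project types may add attributes but not redefine base ones.
        raise ValueError(f"attributes already defined by base type: {sorted(clashes)}")
    return {
        "name": name,
        "base": base["name"],
        "attributes": {**base["attributes"], **extra_attributes},
    }


# Hypothetical base type with deliberately few attributes.
base_person = {"name": "BasePerson", "attributes": {"dateOfBirth": "date"}}

# Project-specific type adding study-related attributes.
patient = extend_entity_type(
    base_person,
    "Patient",
    {"patientID": "string", "caseID": "string", "mainDiagnosis": "string", "admissionDate": "date"},
)
print(sorted(patient["attributes"]))
```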
Entity instances¶
An entity instance is a concrete record of a specific entity, created under a project and associated with a chosen entity type.
An entity instance typically contains:
- an instance identifier (unique ID, can be generated by TrustDeck)
- a reference to the entity type (which schema it follows)
- the payload/data (a JSON matching the type schema)
- lifecycle metadata (created/updated timestamps, status flags such as active/deleted, depending on implementation)
- optional links to pseudonyms/domains (depending on project configuration)
Projects and entities¶
Entity instances live inside projects. Projects act as containers and boundaries for:
- access control (who may read/write entities in that project)
- organization (grouping of related entities)
- optional storage behavior (whether to store entities and/or associated pseudonyms)
Projects and domains are separate concepts:
- Projects organize entities.
- Domains organize pseudonym mappings.
A project may be configured to store pseudonyms, which can be used to attach pseudonyms to entity instances or to derive pseudonyms during ingestion (depending on your workflow).
Example workflow (conceptual)¶
- Create a project: MyStudyProject
- Define an entity type (schema/blueprint):
  - Create EntityType named Patient based on BasePerson
  - Include fields like patientID, caseID, mainDiagnosis, admissionDate, etc.
- Register entity instances in the project:
  - Create instance of type Patient with payload (JSON)
  - Retrieve/list instances for downstream processing
Example payload (conceptual):
{
...
"patientID": "PAT-123456",
"caseID": "A4B3C2D1",
"mainDiagnosis": "ICD10GM-M54.5",
"admissionDate": "2026-01-01"
}