Building a Knowledge Graph for Compliance: Our Approach
How we structured 25 years of compliance expertise into a knowledge graph with 2.1 million nodes and 3.2 million relationships. This post covers the architecture decisions, the data model, and why graph databases are well suited to compliance mapping.
Why a Graph?
Compliance is inherently relational. A control in ISO 27001 maps to controls in NIST CSF, SOC 2, PCI DSS, and dozens of other frameworks. Those controls relate to risks, which relate to assets, which relate to business processes. Traditional relational databases can model these relationships, but each additional level of connection typically means another JOIN, so queries that span several levels become prohibitively slow.
Graph databases treat relationships as first-class citizens. A traversal such as "find all controls in NIST 800-53 that map to this ISO 27001 control and also satisfy this GDPR requirement" is a single query, not a series of expensive JOINs.
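To make that traversal concrete, here is a minimal sketch that answers the same question against a tiny in-memory graph. The control identifiers, the sample edges, the function names, and the labels and property names in the Cypher string are all assumptions made for illustration; none of them is taken from the production schema or from real mapping data.

```python
from collections import defaultdict

# Hypothetical shape of the equivalent Cypher query. The Control label and
# the id/framework properties are assumptions, not the production schema.
CYPHER_SKETCH = """
MATCH (iso:Control {id: $iso_id})-[:MAPS_TO]-(c:Control {framework: 'NIST 800-53'}),
      (c)-[:MAPS_TO]-(req:Control {id: $gdpr_id})
RETURN c.id
"""

# Placeholder MAPS_TO edges between (framework, control id) pairs.
# These are illustrative only and are not real mapping data.
EDGES = [
    (("ISO 27001", "A.5.15"), ("NIST 800-53", "AC-2")),
    (("ISO 27001", "A.5.15"), ("NIST 800-53", "AC-3")),
    (("NIST 800-53", "AC-3"), ("GDPR", "Art. 32")),
    (("NIST 800-53", "SC-13"), ("GDPR", "Art. 32")),
]


def build_adjacency(edges):
    """Treat MAPS_TO as undirected: index every edge from both ends."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def controls_satisfying_both(adjacency, first, second, framework):
    """Return controls in `framework` that are mapped to both `first` and `second`."""
    shared = adjacency[first] & adjacency[second]
    return sorted(node for node in shared if node[0] == framework)


adjacency = build_adjacency(EDGES)
matches = controls_satisfying_both(
    adjacency, ("ISO 27001", "A.5.15"), ("GDPR", "Art. 32"), "NIST 800-53"
)
print(matches)
```

In a graph database the same intersection is written as one pattern, which is the point of the comparison with JOIN-heavy SQL.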
Our Data Model
The knowledge graph is organised around five core element types, covering both nodes and the edges that connect them:
- Frameworks (692): Complete metadata including scope, jurisdiction, publishing body, versioning, and relationships to other frameworks
- Controls (14,190+): Individual control statements with identifiers, descriptions, guidance, and implementation notes
- Domains: Logical groupings of controls within each framework
- Relationships (3.2M): Typed edges including MAPS_TO, PART_OF, SUPERSEDES, COMPLEMENTS, and REQUIRES
- Cross-framework mappings (819K+): Control-to-control relationships classified as full match, partial overlap, or related
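One way to picture the element types listed above is as a handful of small record types. The sketch below is an assumption about how such a model could be written down in Python; the class names, field names, and the rule that only MAPS_TO edges carry a match strength are hypothetical, chosen to mirror the list rather than to reproduce the actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EdgeType(Enum):
    """The typed edges named in the data model."""
    MAPS_TO = "MAPS_TO"
    PART_OF = "PART_OF"
    SUPERSEDES = "SUPERSEDES"
    COMPLEMENTS = "COMPLEMENTS"
    REQUIRES = "REQUIRES"


class MatchStrength(Enum):
    """Classification applied to cross-framework control mappings."""
    FULL_MATCH = "full match"
    PARTIAL_OVERLAP = "partial overlap"
    RELATED = "related"


@dataclass(frozen=True)
class Framework:
    name: str
    jurisdiction: str
    publishing_body: str
    version: str


@dataclass(frozen=True)
class Control:
    framework: str   # name of the owning framework
    identifier: str
    description: str
    domain: str      # logical grouping within the framework


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    edge_type: EdgeType
    strength: Optional[MatchStrength] = None

    def __post_init__(self):
        # Assumed rule: a match strength only makes sense on a mapping edge.
        if self.strength is not None and self.edge_type is not EdgeType.MAPS_TO:
            raise ValueError("only MAPS_TO edges carry a match strength")


# Hypothetical identifiers, used purely to show the shapes.
mapping = Edge("FW-A:1.1", "FW-B:2.3", EdgeType.MAPS_TO, MatchStrength.PARTIAL_OVERLAP)
membership = Edge("FW-A:1.1", "FW-A:Domain-1", EdgeType.PART_OF)
print(mapping.edge_type.value, mapping.strength.value)
```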
The Mapping Process
Building 819,000+ cross-framework mappings required a combination of approaches:
Authoritative mappings: Where standards bodies publish official mappings (e.g., NIST provides CSF-to-800-53 mappings), we use those as the foundation.
Expert analysis: Our team has manually reviewed and validated thousands of control-to-control relationships, drawing on 25 years of consulting experience across these frameworks.
Structured inference: When Framework A maps to Framework B, and Framework B maps to Framework C, we can infer potential mappings between A and C. These inferred mappings are flagged and validated.
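The inference step can be sketched as a two-hop search over validated mappings. This is an illustrative version under simple assumptions, not the production pipeline: controls are (framework, id) tuples, every mapping is treated as undirected, and the function name and the "needs_review" status label are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def infer_candidate_mappings(validated):
    """Propose A-C mappings wherever validated A-B and B-C mappings exist.

    Candidates within the same framework, or already validated directly,
    are skipped. Every candidate is flagged for human review.
    """
    neighbours = defaultdict(set)
    for a, b in validated:
        neighbours[a].add(b)
        neighbours[b].add(a)

    known = {frozenset(pair) for pair in validated}
    candidates = {}
    for bridge, linked in neighbours.items():
        for a, c in combinations(sorted(linked), 2):
            if a[0] == c[0]:
                continue  # same framework, not a cross-framework mapping
            pair = frozenset((a, c))
            if pair in known:
                continue  # already validated directly
            entry = candidates.setdefault(
                pair,
                {"source": a, "target": c, "via": [], "status": "needs_review"},
            )
            entry["via"].append(bridge)

    for entry in candidates.values():
        entry["via"].sort()
    return sorted(candidates.values(), key=lambda e: (e["source"], e["target"]))


# Framework A maps to B, and B maps to C, as in the description above.
validated = [
    (("A", "A-1"), ("B", "B-7")),
    (("B", "B-7"), ("C", "C-3")),
]
inferred = infer_candidate_mappings(validated)
print(inferred)
```

Recording the bridging control in `via` keeps the reasoning behind each suggestion visible to whoever validates it.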
AI-assisted classification: For the long tail of less common frameworks, we use natural language processing to identify semantically similar controls and suggest mappings for human review.
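As a rough illustration of the suggestion step, the sketch below scores pairs of control statements and keeps those above a threshold for human review. It deliberately swaps in a plain bag-of-words cosine similarity as a stand-in for whatever NLP model is actually used; the control texts, identifiers, function names, and the 0.5 threshold are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two texts."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_mappings(source_controls, target_controls, threshold=0.5):
    """Return (source id, target id, score) suggestions for human review."""
    suggestions = []
    for source_id, source_text in source_controls.items():
        for target_id, target_text in target_controls.items():
            score = cosine_similarity(source_text, target_text)
            if score >= threshold:
                suggestions.append((source_id, target_id, round(score, 2)))
    # Highest-scoring suggestions first.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Hypothetical control statements from two less common frameworks.
source = {"X-1": "Encrypt sensitive data at rest"}
target = {
    "Y-1": "Sensitive data at rest must be encrypted",
    "Y-2": "Review visitor access logs monthly",
}
suggestions = suggest_mappings(source, target)
print(suggestions)
```

A word-overlap score like this misses paraphrases that share no vocabulary, which is one reason the suggestions go to a reviewer rather than straight into the graph.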
Architecture Decisions
We chose Neo4j as our graph database for its mature Cypher query language, strong community support, and proven performance with billion-edge graphs. The API layer sits in front of Neo4j, handling caching, authorisation, and query optimisation.
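The role of the API layer can be sketched with a thin wrapper that checks authorisation and caches results before anything reaches the database. This is a simplified assumption about the shape of such a layer: the class name, the role names, and the injected `run_query` callable are hypothetical, the backend here is a stub rather than a Neo4j driver, and query optimisation is left out entirely.

```python
class GraphApi:
    """Minimal API-layer sketch: authorise, then serve from cache or backend."""

    def __init__(self, run_query, allowed_roles):
        self._run_query = run_query          # callable(cypher, params) -> result
        self._allowed_roles = frozenset(allowed_roles)
        self._cache = {}

    def query(self, role, cypher, params):
        if role not in self._allowed_roles:
            raise PermissionError(f"role {role!r} may not query the graph")
        # Parameters must be hashable for this simple cache key.
        key = (cypher, tuple(sorted(params.items())))
        if key not in self._cache:
            self._cache[key] = self._run_query(cypher, params)
        return self._cache[key]


# Stub backend that counts how often the "database" is actually hit.
backend_calls = []


def fake_backend(cypher, params):
    backend_calls.append((cypher, params))
    return [{"id": "C-1"}]


api = GraphApi(fake_backend, allowed_roles={"analyst", "admin"})
query_text = "MATCH (c:Control {id: $id}) RETURN c.id AS id"
first = api.query("analyst", query_text, {"id": "C-1"})
second = api.query("analyst", query_text, {"id": "C-1"})
print(first, len(backend_calls))
```

Injecting the backend as a callable keeps the sketch runnable without a database; a real deployment would also need cache invalidation when mappings change.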