Constructing Knowledge Graphs Step-By-Step
While domain knowledge graphs require extensive, ongoing development, their construction involves a set of core technical steps:
- Data Gathering
Relevant datasets are identified and ingested from diverse sources, including databases, documents, websites, academic publications, and internal knowledge bases.
- Entity Extraction
Natural language processing techniques such as named entity recognition (NER) are applied to extract mentions of real-world entities such as people, organizations, locations, and medical conditions.
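To make the extraction step concrete, here is a minimal sketch that uses a fixed lookup table (a gazetteer) in place of a trained NER model. The `GAZETTEER` contents, the type labels, and the `extract_entities` helper are all hypothetical names chosen for illustration; a production pipeline would more likely rely on a statistical or neural NER component.

```python
import re

# Hypothetical gazetteer: surface form -> entity type.
# This lookup table stands in for a trained NER model.
GAZETTEER = {
    "Marie Curie": "PERSON",
    "Sorbonne": "ORGANIZATION",
    "Paris": "LOCATION",
    "leukemia": "MEDICAL_CONDITION",
}

def extract_entities(text):
    """Return (mention, type, start_offset) tuples sorted by position."""
    found = []
    for surface, label in GAZETTEER.items():
        # Word boundaries avoid matching inside longer words.
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.group(), label, match.start()))
    return sorted(found, key=lambda item: item[2])

sentence = "Marie Curie taught at the Sorbonne in Paris."
entities = extract_entities(sentence)
```

A lookup like this only finds mentions it already knows about, which is why learned models are usually preferred once annotated training data is available.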
- Relation Extraction
Statistical and linguistic techniques detect relationships between entities based on their contextual co-occurrence and semantic patterns within text.
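The co-occurrence idea can be sketched as counting how often two entities appear in the same sentence and keeping the pairs that clear a threshold. The sample `sentences`, the `MIN_COUNT` threshold, and the `cooccurrence_counts` helper are assumptions made for this illustration. Note that co-occurrence alone only proposes candidate pairs; deciding what the relationship actually is would still require linguistic patterns or a trained classifier.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each sentence reduced to the entity mentions found in it.
sentences = [
    ["Marie Curie", "Sorbonne", "Paris"],
    ["Marie Curie", "Sorbonne"],
    ["Sorbonne", "Paris"],
    ["Marie Curie", "Pierre Curie"],
]

def cooccurrence_counts(entity_sentences):
    """Count unordered entity pairs that share a sentence."""
    counts = Counter()
    for mentions in entity_sentences:
        # sorted(set(...)) gives each pair a canonical order and
        # counts it at most once per sentence.
        for pair in combinations(sorted(set(mentions)), 2):
            counts[pair] += 1
    return counts

counts = cooccurrence_counts(sentences)

# Assumed threshold: pairs seen together at least twice become candidates.
MIN_COUNT = 2
candidates = sorted(pair for pair, n in counts.items() if n >= MIN_COUNT)
```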
- Ontology Development
An ontology represents the structure and semantics of a domain. Ontologists collaboratively develop reference ontologies that represent key entities, properties, relationships, constraints, rules, and axioms within the domain.
- Knowledge Graph Construction
Extracted entities and relations are synthesized into an ontology-aligned knowledge graph and linked with existing structured data. Gaps are incrementally filled through machine learning and human curation.
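One way to picture ontology alignment is to check each extracted triple against the subject and object types the ontology allows for its relation, and to route anything that fails to human review. The `ONTOLOGY` schema, the `ENTITY_TYPES` table, and the `conforms` helper below are hypothetical and far simpler than a real ontology language such as OWL; this is a sketch of the idea, not a definitive implementation.

```python
# Hypothetical mini-ontology:
# relation -> (required subject type, required object type)
ONTOLOGY = {
    "works_at": ("Person", "Organization"),
    "located_in": ("Organization", "Location"),
}

# Hypothetical entity typing, e.g. produced by the extraction step.
ENTITY_TYPES = {
    "Marie Curie": "Person",
    "Sorbonne": "Organization",
    "Paris": "Location",
}

extracted = [
    ("Marie Curie", "works_at", "Sorbonne"),
    ("Sorbonne", "located_in", "Paris"),
    ("Paris", "works_at", "Sorbonne"),     # subject type violates the schema
    ("Marie Curie", "born_in", "Warsaw"),  # relation missing from the ontology
]

def conforms(triple):
    """True if the relation is known and both endpoint types match."""
    subject, relation, obj = triple
    if relation not in ONTOLOGY:
        return False
    domain, range_ = ONTOLOGY[relation]
    return (ENTITY_TYPES.get(subject) == domain
            and ENTITY_TYPES.get(obj) == range_)

# Conforming triples enter the graph; the rest wait for curation.
graph = {t for t in extracted if conforms(t)}
needs_review = [t for t in extracted if not conforms(t)]
```

Keeping rejected triples in a review queue rather than discarding them reflects the incremental curation described above: a rejected triple may point to a gap in the ontology rather than an extraction error.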
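- Storage
Specialized graph databases provide efficient storage and querying of interconnected entities. They fall into two main families: triplestores, which store RDF triples, and property graph databases. Popular options include Stardog and GraphDB (RDF triplestores), Neo4j (a property graph database), and Amazon Neptune (which supports both models).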
- Querying and Analysis
Graph queries and analytics surface insights through techniques such as pathfinding, pattern matching, link prediction, and community detection within knowledge graphs.
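Two of these techniques, pattern matching and pathfinding, can be sketched over a small in-memory list of triples. The `TRIPLES` data and the `match` and `shortest_path` helpers are hypothetical; a real deployment would express the same queries in the database's own query language rather than in application code.

```python
from collections import deque

# Hypothetical knowledge graph as (subject, relation, object) triples.
TRIPLES = [
    ("Marie Curie", "works_at", "Sorbonne"),
    ("Sorbonne", "located_in", "Paris"),
    ("Paris", "capital_of", "France"),
    ("Pierre Curie", "works_at", "Sorbonne"),
]

def match(subject=None, relation=None, obj=None):
    """Pattern matching: None acts as a wildcard for that position."""
    return [
        (s, r, o) for s, r, o in TRIPLES
        if subject in (None, s) and relation in (None, r) and obj in (None, o)
    ]

def shortest_path(start, goal):
    """Breadth-first search along directed edges; None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _, _, neighbor in match(subject=path[-1]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Pattern: who works at the Sorbonne?
colleagues = [s for s, _, _ in match(relation="works_at", obj="Sorbonne")]

# Path: how is Marie Curie connected to France?
route = shortest_path("Marie Curie", "France")
```

Link prediction and community detection generally need statistical or graph-algorithm libraries and are left out of this sketch.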