Session 4 · Wednesday, March 18, 2026

Knowledge Graph Design

From conversations to structure. Entities, facts, triples, and a guest who pioneered the science of telling whether two records describe the same person.

WHAT Layer ★ Guest: Jeff Jonas

Today's Session

Agenda

  1. From WHO to WHAT — The Transition
    You've built the social layer. Now we structure what you know. The Knowledge Graph is where opinion becomes fact.
  2. ★ Guest Speaker — Jeff Jonas
    Former IBM Fellow. National Geographic's "Wizard of Big Data." Founder of Senzing. A pioneer of entity resolution: the science of figuring out whether two records refer to the same real-world entity.
  3. Entities, Facts, Triples
    The building blocks of knowledge. How to decompose any domain into structured, queryable, composable units.
  4. Building Your 100-Triple Graph
    Hands-on workshop: take your project and build a domain knowledge graph with at least 100 triples.
  5. The "What We Don't Know" Exercise
    The most important graph is the one that maps your ignorance. What's missing? Where are the gaps?

Guest Speaker

Jeff Jonas

Data Scientist · Former IBM Fellow · Founder, Senzing
A pioneer of entity resolution

The Wizard of Big Data

The Origin Story

In the 1990s, Jeff Jonas built a system called NORA — Non-Obvious Relationship Awareness — for Las Vegas casinos. The problem: clever people using different names, Social Security numbers, and birth dates to avoid detection. His system found the hidden connections that no single database could reveal.

That technology helped shape modern entity resolution: the science of determining when two records refer to the same real-world entity, despite differences in how they're described. It now underpins fraud detection, national security, customer intelligence, and knowledge graphs of every kind.

Career Highlights

  • Created NORA for casino fraud detection
  • Sold company to IBM (2005)
  • IBM Fellow — led Context Computing
  • Modernized US voter registration with Pew
  • Built Singapore maritime domain awareness
  • Founded Senzing — democratizing entity resolution
  • 14 patents, featured in National Geographic

Why This Matters for You

  • Your graphs will break without entity resolution — is this the same property? Same person? Same company?
  • NORA = awareness refraction — finding non-obvious connections is exactly what the Trinity Graph does
  • Privacy tension — Jeff sits on boards of both intelligence orgs (USGIF) and privacy orgs (EFF, EPIC)
  • Every Inkwell venture needs this — BackyardOne, Block BMOS, Artiquity all face entity resolution problems

📖 Read after class: "To Know Entity Resolution Is To Love ER" — Jeff's primer on why this matters everywhere.

Entity Resolution

The Problem Jeff Solved

Entity resolution answers one question: "Is this the same thing?"

Without Entity Resolution

Record 1: Jon Smith, 123 Main St
Record 2: Jonathan Smith, 123 Main Street
Record 3: J. Smith, 123 Main St, Apt 2

→ System sees 3 different people
→ 3 customer records, 3 invoices, 3 profiles
→ Fraud goes undetected

With Entity Resolution

Record 1: Jon Smith, 123 Main St
Record 2: Jonathan Smith, 123 Main Street
Record 3: J. Smith, 123 Main St, Apt 2

→ System sees 1 person, 3 records
→ Complete picture, one graph node
→ Patterns become visible

💡 For your projects: Are "123 Main St" and "123 Main Street" the same property in BackyardOne? Are "The Weeknd" and "Abel Tesfaye" the same artist in Block BMOS? Is the same collector listed under two gallery names in Artiquity? Entity resolution is the invisible foundation of every knowledge graph.
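To make the "3 records, 1 person" idea concrete, here is a toy rule-based sketch in Python. It is not how NORA or Senzing work; the abbreviation table, the `match_key` rule (surname + first initial + street address), and the decision to ignore the apartment number are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real system would use a far larger one.
STREET_ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}

RECORDS = [
    {"id": 1, "name": "Jon Smith", "address": "123 Main St"},
    {"id": 2, "name": "Jonathan Smith", "address": "123 Main Street"},
    {"id": 3, "name": "J. Smith", "address": "123 Main St, Apt 2"},
]


def normalize_address(address: str) -> str:
    """Lowercase, drop anything after the first comma, expand abbreviations."""
    street_part = address.split(",")[0].lower()
    words = re.findall(r"[a-z0-9]+", street_part)
    return " ".join(STREET_ABBREVIATIONS.get(w, w) for w in words)


def match_key(record: dict) -> tuple:
    """Assumed matching rule: surname, first initial, normalized street address."""
    name_words = re.findall(r"[a-z]+", record["name"].lower())
    first, last = name_words[0], name_words[-1]
    return (last, first[0], normalize_address(record["address"]))


def resolve(records: list) -> dict:
    """Group record ids that share a match key."""
    entities = defaultdict(list)
    for record in records:
        entities[match_key(record)].append(record["id"])
    return dict(entities)


entities = resolve(RECORDS)
print(entities)
```

All three records land under one key, so the system sees one person with three records. The same rule would also merge "Jane Smith" at the same address, which is why real entity resolution weighs many more features than a single key.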

The Transition

WHO → WHAT

Week 1 was about people and connections.
Week 2 is about what those people know — and what they don't.

Two Graph Layers

Social Graph vs. Knowledge Graph

🟢 Social Graph (WHO)

Nodes: People, organizations, teams

Edges: knows, works_with, reports_to, trusts

Question it answers: Who is connected to whom?

You built this in Week 1.

🔵 Knowledge Graph (WHAT)

Nodes: Entities, concepts, facts, documents

Edges: is_a, has_property, causes, requires, contradicts

Question it answers: What do we know — and how sure are we?

We build this today.

💡 The power comes from connecting them. When a person in your social graph is linked to a fact in your knowledge graph, you know not just WHAT is true but WHO knows it — and how much you trust that source.
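One way to picture the link between the two layers is a small Python sketch. The people (Alice, Bob), the trust values, and the "take the most trusted source" rule are all made up for illustration; your own graph may score sources differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str


# Social layer: how much you trust each person (0.0 to 1.0, hypothetical values).
trust = {"Alice": 0.9, "Bob": 0.4}

# Knowledge layer: a fact, linked to the people who assert it.
permit_fact = Fact("LADBS", "publishes", "Permit Data")
asserted_by = {permit_fact: ["Alice", "Bob"]}


def confidence(fact: Fact) -> float:
    """Score a fact by its most trusted source; 0.0 if nobody asserts it."""
    sources = asserted_by.get(fact, [])
    return max((trust.get(person, 0.0) for person in sources), default=0.0)


print(confidence(permit_fact))  # 0.9
```

The fact lives in the WHAT layer, the sources live in the WHO layer, and the edge between them is what lets you ask "how sure are we?"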

The Building Block

What Is a Triple?

Every fact in a knowledge graph is stored as a triple: a subject, a predicate (relationship), and an object.

Subject → Predicate → Object
BackyardOne → solves → LA Property Research Fragmentation
LADBS → publishes → Permit Data
Permit Data → requires → Socrata API Access
BackyardOne → competes_with → ZIMAS Direct Access
LA Developer → needs → Zoning + Permit Cross-Reference

💡 Five triples. Already you can traverse: BackyardOne → solves → fragmentation, and the same developer who needs zoning data also needs permits — which come from LADBS via Socrata. The graph connects what spreadsheets can't.
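The five triples above can be sketched as plain Python data, with a simple traversal on top. The `Triple`, `outgoing`, and `path` names are hypothetical helpers for this example, not part of VanderBot.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


TRIPLES = [
    Triple("BackyardOne", "solves", "LA Property Research Fragmentation"),
    Triple("LADBS", "publishes", "Permit Data"),
    Triple("Permit Data", "requires", "Socrata API Access"),
    Triple("BackyardOne", "competes_with", "ZIMAS Direct Access"),
    Triple("LA Developer", "needs", "Zoning + Permit Cross-Reference"),
]


def outgoing(subject: str) -> list:
    """All (predicate, object) pairs leaving a subject."""
    return [(t.predicate, t.obj) for t in TRIPLES if t.subject == subject]


def path(start: str, max_hops: int = 3) -> list:
    """Follow the first outgoing edge from each node, up to max_hops."""
    hops, node = [], start
    for _ in range(max_hops):
        edges = outgoing(node)
        if not edges:
            break
        predicate, next_node = edges[0]
        hops.append((node, predicate, next_node))
        node = next_node
    return hops


print(outgoing("BackyardOne"))
print(path("LADBS"))
```

Starting from LADBS, the walk goes LADBS → publishes → Permit Data → requires → Socrata API Access. That two-hop chain is the kind of connection a spreadsheet row cannot express.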

Taxonomy

Entity Types for Your Project

Every domain has the same core entity categories. Map yours:

👤

Actors

People, orgs, systems that DO things

📦

Assets

Products, data, content, IP

Events

Things that happen with timestamps

📐

Concepts

Abstract ideas, frameworks, categories

📏

Constraints

Rules, regulations, limits, dependencies

Unknowns

Gaps, assumptions, open questions

💡 The Unknowns category is the most important. A graph that only maps what you know is dangerous. A graph that maps what you don't know is strategic.
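As a rough sketch, the six categories can be written as a Python enum and used to spot which ones your graph has not touched yet. The tagging of each entity below is a guess for illustration; you may classify the same entities differently.

```python
from collections import Counter
from enum import Enum


class EntityType(Enum):
    ACTOR = "Actors"
    ASSET = "Assets"
    EVENT = "Events"
    CONCEPT = "Concepts"
    CONSTRAINT = "Constraints"
    UNKNOWN = "Unknowns"


# Hypothetical tagging of entities from the BackyardOne examples.
entities = {
    "LA Developer": EntityType.ACTOR,
    "LADBS": EntityType.ACTOR,
    "Permit Data": EntityType.ASSET,
    "Socrata API Access": EntityType.CONSTRAINT,
    "Conversion rate to paid": EntityType.UNKNOWN,
}


def missing_types(tagged: dict) -> list:
    """Entity categories that have no entries yet."""
    counts = Counter(tagged.values())
    return [t.value for t in EntityType if counts[t] == 0]


print(missing_types(entities))  # ['Events', 'Concepts']
```

An empty category is a prompt, not a failure: it tells you where to look next.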

In Practice

What This Looks Like in Neo4j

Your VanderBot stores every triple in a Neo4j graph database. Here's what that looks like in Cypher, Neo4j's query language:

// Create entities
CREATE (b:Venture {name: "BackyardOne", stage: "PMF"})
CREATE (l:DataSource {name: "LADBS", api: "Socrata"})
CREATE (z:DataSource {name: "ZIMAS", type: "Zoning"})
CREATE (d:Actor {name: "LA Developer", segment: "Target User"})
// Create relationships (triples)
CREATE (b)-[:INGESTS]->(l)
CREATE (b)-[:INGESTS]->(z)
CREATE (d)-[:NEEDS]->(b)
CREATE (l)-[:PUBLISHES]->(p:Asset {name: "Permit Records"});
// Query (run as a separate statement): which venture does a developer need, and which sources does it ingest?
MATCH (d:Actor)-[:NEEDS]->(v)-[:INGESTS]->(s)
RETURN d.name, v.name, collect(s.name)

You don't need to write Cypher. Your VanderBot does this for you. But understanding the structure helps you ask better questions.
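If you are curious how a triple might be turned into Cypher, here is a minimal Python sketch. It is not VanderBot's actual implementation: the generic `:Entity` label, the `triple_to_cypher` helper, and the choice of `MERGE` (so repeated triples don't create duplicate nodes) are assumptions for this example. Node names go in as query parameters; the predicate is sanitized into a relationship type instead, because this sketch builds it directly into the query text.

```python
import re


def relationship_type(predicate: str) -> str:
    """Turn a predicate like 'competes with' into a type like COMPETES_WITH."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", predicate).strip("_").upper()
    if not cleaned or cleaned[0].isdigit():
        raise ValueError(f"cannot use {predicate!r} as a relationship type")
    return cleaned


def triple_to_cypher(subject: str, predicate: str, obj: str) -> tuple:
    """Return a parameterized MERGE statement and its parameters.

    The generic :Entity label is an assumption made for this sketch.
    """
    query = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        f"MERGE (s)-[:{relationship_type(predicate)}]->(o)"
    )
    return query, {"subject": subject, "object": obj}


query, params = triple_to_cypher("LADBS", "publishes", "Permit Data")
print(query)
print(params)
```

The printed query ends with `MERGE (s)-[:PUBLISHES]->(o)`, which is the same subject, predicate, object shape as the triples on the earlier slide.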

Workshop

Build Your 100-Triple Graph

Workshop — 30 Minutes

The 100-Triple Challenge

Instructions

  1. Open VanderBot and tell it: "I want to build a knowledge graph for [your project]. Start with the core entities."
  2. Map your Actors — Who are the users? Partners? Competitors? Regulators?
  3. Map your Assets — What data, products, or IP exists? What are you building?
  4. Map your Constraints — What regulations, dependencies, or technical limits apply?
  5. Map your Unknowns — What do you not know yet? What assumptions are you making?
  6. Connect everything — The edges matter more than the nodes. How do entities relate?

Target: 100 Triples

  • 20 triples: You've listed some things
  • 50 triples: You have a real structure
  • 100 triples: You have a queryable knowledge base
  • 200+ triples: You have an intelligence system

Good triples:

  • BackyardOne → requires → LADBS API access
  • Permit processing → takes → 4-6 months (LADBS)
  • ADU market → unknown → conversion rate to paid

🎯 Tip: Don't try to be comprehensive. Be specific. "BackyardOne needs data" is useless. "BackyardOne requires Socrata API rate limit of 1000 req/hr" is a real triple.
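The targets above can be read as thresholds. A small Python sketch of a progress check follows; the `milestone` helper and the "keep going" message for fewer than 20 triples are assumptions, and duplicates are counted once on the assumption that a repeated triple adds no new knowledge.

```python
# Thresholds taken from the target list above.
MILESTONES = [
    (200, "You have an intelligence system"),
    (100, "You have a queryable knowledge base"),
    (50, "You have a real structure"),
    (20, "You've listed some things"),
]


def milestone(triples: list) -> str:
    """Describe progress based on the number of distinct triples."""
    count = len(set(triples))
    for threshold, label in MILESTONES:
        if count >= threshold:
            return f"{count} triples: {label}"
    return f"{count} triples: keep going"


sample = [
    ("BackyardOne", "requires", "LADBS API access"),
    ("Permit processing", "takes", "4-6 months (LADBS)"),
    ("ADU market", "unknown", "conversion rate to paid"),
    ("BackyardOne", "requires", "LADBS API access"),  # duplicate, counted once
]
print(milestone(sample))  # 3 triples: keep going
```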

The Hard Part

What We Don't Know

Risk Analysis

Mapping Your Unknowns

For every project, there are four quadrants of knowledge:

✅ Known Knowns

Facts in your graph. Verified, sourced.

"LADBS publishes permit data via Socrata API"

🔵 Known Unknowns

Questions you know to ask. Gaps you can name.

"We don't know the conversion rate for property data → paid subscription"

🟠 Unknown Knowns

Things you know but haven't structured. Tribal knowledge, intuition.

"The team knows permit expediters use workarounds, but it's not documented"

🔴 Unknown Unknowns

Risks you haven't imagined. This is where failure lives.

"What if LADBS changes their API policy next month?"

💡 Assignment: Your "What We Don't Know" risk analysis is about surfacing the blue and orange quadrants — and imagining the red. The graph should make ignorance visible, not hide it.
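A minimal Python sketch of the four-quadrant analysis, using the example statements above. The `Quadrant` enum and `coverage` helper are hypothetical, and the tagging is done by hand; no tool can decide for you which quadrant a statement belongs in.

```python
from collections import Counter
from enum import Enum


class Quadrant(Enum):
    KNOWN_KNOWN = "Known Knowns"
    KNOWN_UNKNOWN = "Known Unknowns"
    UNKNOWN_KNOWN = "Unknown Knowns"
    UNKNOWN_UNKNOWN = "Unknown Unknowns"


# Statements from the four quadrants above, tagged by hand.
analysis = [
    ("LADBS publishes permit data via Socrata API", Quadrant.KNOWN_KNOWN),
    ("Conversion rate for property data to paid subscription", Quadrant.KNOWN_UNKNOWN),
    ("Permit expediters use undocumented workarounds", Quadrant.UNKNOWN_KNOWN),
    ("LADBS changes their API policy next month", Quadrant.UNKNOWN_UNKNOWN),
]


def coverage(items: list) -> dict:
    """Count statements per quadrant, including empty quadrants."""
    counts = Counter(quadrant for _, quadrant in items)
    return {q.value: counts[q] for q in Quadrant}


print(coverage(analysis))
```

A quadrant that stays at zero is the one to push on: it usually means the ignorance is there but not yet visible in the graph.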

Pod Status

Quick Check-In: Where Are You?

Each pod: 2 minutes. Share with the class:

  1. Project: What are you working on?
  2. Graph status: How many nodes/triples have you created?
  3. Biggest unknown: What's the one thing you most need to figure out?
  4. Help needed: Is there a connection, skill, or resource another pod might have?

💡 Listen to other pods. The graph is shared. If BackyardOne's data sources overlap with your project, that's a connection worth making.

Deliverables

Due Before Session 5 (Monday)

📐 Knowledge Architecture Document

2–3 pages. Map your project's entity types, key relationships, and data sources. Include a visual graph diagram (hand-drawn is fine, or export from VanderBot). Submit to Brightspace.

🔴 "What We Don't Know" Risk Analysis

1–2 pages. Four-quadrant analysis for your project. At least 5 Known Unknowns and 3 possible Unknown Unknowns. This is the hard one — and the most valuable.

💬 VanderBot: 100-Triple Milestone

Your project's knowledge graph should hit 100 triples by Monday. Ask your VanderBot: "How many triples do I have?" If you're under 100, keep going.

📚 Read: Survey of Knowledge Graph Embedding Models

Available on Brightspace. Focus on what embeddings are, why they matter for search and reasoning, and how they connect to the WHAT IF layer we're building toward.

"A fact without context is trivia.
A fact in a graph is intelligence."

— Session 4 · Knowledge Graph Design