Semantic Network Analyzer

Interactive tool for analyzing how different groups talk about issues. Compare word networks, centrality metrics, and semantic clusters across perspectives.

November 2024
React · Python · FastAPI · D3.js · NLP · TypeScript · NetworkX
Graph Metrics: 5 centrality types
Export Formats: CSV, PNG, Excel
Layouts: Force-directed + Clustered
Color Modes: Emphasis + Cluster

The Problem

User researchers, product managers, and academics face a common problem: you have text data from multiple groups (parents vs. teachers, customers vs. employees, designers vs. engineers) and you want to understand how they think differently without reading 100+ responses.

Traditional approaches don’t cut it. Reading manually doesn’t scale. Sentiment analysis tells you positive/negative, not how they think. LLMs are easy but lose structure and nuance. What’s missing: a tool that builds semantic networks (word co-occurrence graphs) and compares them across groups. Which words matter most to each group? Which concepts are emphasized vs. buried? How is their mental model different?

I built Semantic Network Analyzer to answer these questions visually.

The Approach

Backend (Python FastAPI) — Accepts uploaded CSV/Excel files (each file = one group perspective). Tokenizes text, removes stopwords, and normalizes words (plurals, synonyms). Builds co-occurrence networks: words that appear in the same sentence are connected by an edge. Computes graph metrics: degree, betweenness, closeness, and eigenvector centrality. Clusters the network using multiple algorithms. Returns JSON with network structure and metrics.
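The core of that pipeline can be sketched in a few lines of NetworkX. This is a minimal illustration, not the project's actual code: the stopword list, function names, and example text are all assumptions, and real input would come from the uploaded CSV/Excel files.

```python
import itertools
import re
from collections import Counter

import networkx as nx

# Tiny illustrative stopword list; the real tool would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are"}

def tokenize(sentence: str) -> list[str]:
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]

def build_network(text: str) -> nx.Graph:
    """Connect every pair of words that share a sentence; edge weight = count."""
    weights: Counter = Counter()
    for sentence in re.split(r"[.!?]+", text):
        tokens = sorted(set(tokenize(sentence)))
        for a, b in itertools.combinations(tokens, 2):
            weights[(a, b)] += 1
    graph = nx.Graph()
    for (a, b), w in weights.items():
        graph.add_edge(a, b, weight=w)
    return graph

def compute_metrics(graph: nx.Graph) -> dict:
    # The four centralities named above; each maps word -> score.
    return {
        "degree": nx.degree_centrality(graph),
        "betweenness": nx.betweenness_centrality(graph),
        "closeness": nx.closeness_centrality(graph),
        "eigenvector": nx.eigenvector_centrality(graph, max_iter=1000),
    }

text = "Parents value trust and safety. Trust builds safety over time."
g = build_network(text)
metrics = compute_metrics(g)
```

Note that "trust" and "safety" co-occur in both sentences, so their edge carries weight 2; that weight is exactly the interpretable signal the co-occurrence approach preserves.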

Frontend (React + TypeScript) — Renders an interactive graph with D3/Vis.js. Two layouts: force-directed (organic) and clustered (grouped by semantic clusters). Color modes: “Emphasis” (which words stand out per group) and “Cluster” (colored by semantic cluster). Filterable by perspective, score threshold, cluster, and edge weight. Side-by-side comparison view. Export: CSV (network data), PNG (visualization), Excel (detailed report).

Key Decisions & Trade-offs

Co-occurrence vs. semantic embeddings — Chose word co-occurrence (simple graph edges) over embedding distance. Less sophisticated, but more interpretable. Product teams need to understand why a word matters, not just that it’s important. “Revenue” appearing with “growth” is meaningful. Embeddings hide that signal.

Two visualization layouts — Included both force-directed (physics simulation) and clustered (predefined groups) layouts. Some people want organic “here’s how this naturally groups.” Others want clear communities. Both are useful; users toggle between them depending on the question they’re asking.

Emphasis color mode — Red for words emphasized in Group A, green for Group B, orange for balanced. Only shows two groups clearly (red/green), but this tool is comparative. Red/green makes the comparison instant. Yes, it’s a limitation; worth it for clarity.
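One plausible way to implement that mapping is to compare a word's relative frequency across the two groups and bucket the ratio. This is a hedged sketch under assumed thresholds; the function name, the `balanced_band` cutoff, and the example counts are illustrative, not the tool's actual values.

```python
def emphasis_color(word: str, freq_a: dict, freq_b: dict,
                   balanced_band: float = 1.5) -> str:
    """Red if Group A emphasizes the word, green for Group B, orange if balanced."""
    # Normalize raw counts to relative frequency within each group,
    # so a larger corpus doesn't automatically dominate.
    total_a = sum(freq_a.values()) or 1
    total_b = sum(freq_b.values()) or 1
    a = freq_a.get(word, 0) / total_a
    b = freq_b.get(word, 0) / total_b
    if a > b * balanced_band:
        return "red"      # emphasized by Group A
    if b > a * balanced_band:
        return "green"    # emphasized by Group B
    return "orange"       # roughly balanced

# Made-up word counts echoing the parents-vs-teachers example below.
parents = {"trust": 9, "safety": 7, "curriculum": 1}
teachers = {"curriculum": 8, "assessment": 6, "trust": 2}
```

With these counts, "trust" comes out red, "curriculum" green, and a word absent from both groups orange.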

Manual word unification — Curated word mappings (synonyms, plurals) in JSON config rather than generic NLP lemmatization. Doesn’t scale to 10K words, but “impact” and “influence” are different in everyday speech. Manual curation ensures domain accuracy.
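In practice the unification step can be as simple as a token-by-token dictionary lookup against the curated JSON config. The mapping entries below are made-up examples of the kind of domain decisions described above (fold plurals, merge only the synonyms you choose to merge); they are not the project's actual config.

```python
import json

# Hypothetical excerpt of a curated mappings config (normally loaded from a file).
MAPPINGS_JSON = """
{
  "customers": "customer",
  "clients": "customer",
  "client": "customer",
  "kids": "child",
  "children": "child"
}
"""

def unify(tokens: list[str], mappings: dict[str, str]) -> list[str]:
    # Replace each token with its canonical form; unmapped words pass through.
    return [mappings.get(t, t) for t in tokens]

mappings = json.loads(MAPPINGS_JSON)
print(unify(["clients", "love", "children"], mappings))
# → ['customer', 'love', 'child']
```

Because the config is explicit, a domain expert can decide that "clients" folds into "customer" while "impact" and "influence" stay separate, which is exactly the control generic lemmatization doesn't offer.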

Results & Impact

Users upload Excel files (one per group), configure word mappings, click “Analyze,” and view an interactive network graph with emphasis colors. Click on words to see context (which sentences contain them). Switch between layouts. Filter by cluster, score, edge weight. Export for presentations or further analysis.

Real example insight from usage: parents emphasize “trust,” “safety,” “guidance” — teachers emphasize “curriculum,” “standards,” “assessment.” Same domain (education), completely different mental models — visible immediately in the graph.

What I Learned

Interpretability beats sophistication. Embeddings are mathematically elegant, but co-occurrence is understandable. “These words appear together” is a story people get immediately.

Visualization is part of the algorithm. The color scheme (red/green emphasis) isn’t decoration; it’s the output format. Designing the visual made the analysis clearer.

Domain-specific word mappings matter. Generic NLP lemmatization misses nuance: it can fold “customers” into “customer,” but it can’t decide whether “customer,” “client,” “user,” and “consumer” should count as one concept or four. Manual config gave better results.

Two layouts, same data. Force-directed and clustered are different ways of seeing the same network. Users didn’t pick one; they toggled between them depending on the question.