Build a Personal AI Assistant – Part 3: Memory, Context Windows, and Semantic Recall

Real assistants remember what mattered last week. In Lumenly’s pilot, customers quickly lost trust when the assistant forgot custom pricing rules. This lesson gives your assistant durable memory: token-aware working context, episodic storage in SQLite, and semantic search using embeddings.
Architecture for Memory Layers
We’ll introduce three modules:
- WorkingMemory: trims conversation history based on tokens.
- EpisodicStore: persists full transcripts with metadata.
- SemanticMemory: stores summaries + embeddings for retrieval.
Data flow per turn:
User turn ─► WorkingMemory ─► LLM
    │                          │
    │                          ├─► Summary ─► SemanticMemory
    │                          └─► Transcript ─► EpisodicStore
    └─► Retrieval (pulls top matches from SemanticMemory into the next prompt)
Takeaway: Think of memory as a pipeline, not a single database.
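All three modules share the Message type defined in the earlier parts. For reference, here is a minimal sketch of what src/core/types.ts is assumed to contain; keep whatever definition you already have:
// src/core/types.ts (minimal sketch; match your existing definition)
export type Role = "system" | "user" | "assistant";

export interface Message {
  role: Role;
  content: string;
}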
Working Memory with Token Budgets
Add tiktoken bindings (the snake_case encoding_for_model export ships in the tiktoken package):
npm install tiktoken
src/core/workingMemory.ts:
import { encoding_for_model } from "tiktoken";
import { Message } from "./types";

export class WorkingMemory {
  private history: Message[] = [];
  // gpt-4o-mini resolves to the o200k_base encoding in recent tiktoken releases.
  private encoder = encoding_for_model("gpt-4o-mini");

  constructor(private readonly maxTokens = 1200) {}

  append(message: Message) {
    this.history.push(message);
    this.trim();
  }

  get() {
    return [...this.history];
  }

  // Walk backwards from the newest message and keep turns until the budget is spent.
  private trim() {
    let tokens = 0;
    const reversed: Message[] = [];
    for (let i = this.history.length - 1; i >= 0; i -= 1) {
      const msg = this.history[i];
      const cost = this.encoder.encode(msg.content).length;
      if (tokens + cost > this.maxTokens) break;
      reversed.push(msg);
      tokens += cost;
    }
    this.history = reversed.reverse();
  }
}
Swap this into Assistant to replace the old conversation store.
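A sketch of that wiring, assuming the Assistant shape from the earlier parts (the client field, model, and method names may differ in your code):
// src/core/assistant.ts (sketch – adapt to your Part 2 class)
import OpenAI from "openai";
import { WorkingMemory } from "./workingMemory";
import { Message } from "./types";

export class Assistant {
  private client = new OpenAI();
  private workingMemory = new WorkingMemory(); // defaults to a 1200-token budget

  async send(message: Message): Promise<Message> {
    this.workingMemory.append(message);
    const resp = await this.client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: this.workingMemory.get(),
    });
    const assistantMessage: Message = {
      role: "assistant",
      content: resp.choices[0].message.content ?? "",
    };
    this.workingMemory.append(assistantMessage);
    return assistantMessage;
  }
}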
Episodic Storage with SQLite
Install better-sqlite3:
npm install better-sqlite3
src/memory/episodic.ts:
import Database from "better-sqlite3";
import fs from "node:fs";
import { nanoid } from "nanoid";
import { Message } from "../core/types";

// better-sqlite3 does not create missing directories, so make sure data/ exists.
fs.mkdirSync("data", { recursive: true });

export class EpisodicStore {
  private db = new Database("data/assistant.db");
  // Exposed so callers can tag summaries with the current episode.
  readonly episodeId = nanoid();
  private turn = 0;

  constructor() {
    this.db
      .prepare(
        `CREATE TABLE IF NOT EXISTS turns(
          episode_id TEXT,
          turn INTEGER,
          role TEXT,
          content TEXT,
          created_at DATETIME DEFAULT CURRENT_TIMESTAMP
        )`,
      )
      .run();
  }

  log(message: Message) {
    this.db
      .prepare(
        `INSERT INTO turns(episode_id, turn, role, content) VALUES (?, ?, ?, ?)`,
      )
      .run(this.episodeId, this.turn++, message.role, message.content);
  }

  transcript(episodeId = this.episodeId) {
    return this.db
      .prepare(`SELECT role, content FROM turns WHERE episode_id = ? ORDER BY turn`)
      .all(episodeId) as { role: string; content: string }[];
  }
}
Call episodic.log() inside Assistant.send for both user and assistant messages.
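In the send sketch above, that amounts to a field plus two extra calls (the field and variable names are assumptions):
// inside the Assistant class (sketch)
private episodic = new EpisodicStore();

// inside Assistant.send:
this.episodic.log(message);           // persist the user turn
// ...after the model responds:
this.episodic.log(assistantMessage);  // persist the reply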
Semantic Memory and Summaries
You could install LangChain's embedding utilities, but we'll keep using OpenAI and call its embeddings endpoint directly:
// src/memory/semantic.ts
import OpenAI from "openai";
import fs from "node:fs";
import path from "node:path";
import { Message } from "../core/types";

const client = new OpenAI();
const storeDir = path.join("data", "semantic");
fs.mkdirSync(storeDir, { recursive: true });

export async function summarizeTurn(messages: Message[]): Promise<string> {
  const resp = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "Summarize in <=80 words focusing on commitments." },
      ...messages,
    ],
  });
  return resp.choices[0].message.content ?? "";
}

export async function storeSummary(episodeId: string, summary: string) {
  const embedding = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: summary,
  });
  const payload = {
    episodeId,
    summary,
    vector: embedding.data[0].embedding,
  };
  const file = path.join(storeDir, `${episodeId}-${Date.now()}.json`);
  fs.writeFileSync(file, JSON.stringify(payload));
}

export async function searchMemories(query: string, topK = 3) {
  const qVec = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });
  const queryVec = qVec.data[0].embedding;
  return fs
    .readdirSync(storeDir)
    .map((file) => JSON.parse(fs.readFileSync(path.join(storeDir, file), "utf-8")))
    .map((entry) => ({
      summary: entry.summary,
      score: cosine(entry.vector, queryVec),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

function cosine(a: number[], b: number[]) {
  const dot = a.reduce((acc, v, i) => acc + v * b[i], 0);
  const magA = Math.sqrt(a.reduce((acc, v) => acc + v * v, 0));
  const magB = Math.sqrt(b.reduce((acc, v) => acc + v * v, 0));
  return dot / (magA * magB);
}
Integrate inside Assistant.send once the assistant message exists:
const summary = await summarizeTurn([message, assistantMessage]);
await storeSummary(this.episodic.episodeId, summary);
Takeaway: Summaries create bite-sized vectors and keep storage costs sane.
Retrieval Middleware
Before sending a user message to the LLM, fetch semantically similar summaries:
const memories = await searchMemories(message.content);
const memoryMessages = memories.map((mem) => ({
  role: "system" as const,
  content: `Relevant past insight: ${mem.summary}`,
}));
const combinedHistory = [...memoryMessages, ...this.workingMemory.get(), message];
If you want to avoid duplication, dedupe summaries by hashing their content, as sketched below.
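One lightweight way to do that with Node's built-in crypto module (a sketch; this dedupe helper is not part of the modules above):
import { createHash } from "node:crypto";

// Drop memories whose summary text hashes to a value we've already kept.
function dedupe<T extends { summary: string }>(memories: T[]): T[] {
  const seen = new Set<string>();
  return memories.filter((mem) => {
    const hash = createHash("sha256").update(mem.summary).digest("hex");
    if (seen.has(hash)) return false;
    seen.add(hash);
    return true;
  });
}
Run the retrieved memories through dedupe before mapping them to system messages.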
Takeaway: Inject relevant memories as system messages so the model treats them as authoritative context.
CLI Commands for Memory Ops
Extend cli.ts:
program
  .command("memories")
  .argument("<query>", "search term")
  .action(async (query) => {
    const hits = await searchMemories(query);
    hits.forEach((hit) =>
      console.log(`${hit.score.toFixed(2)} ▸ ${hit.summary}`),
    );
  });

program
  .command("transcript")
  .option("--episode <id>")
  .action((opts) => {
    const store = new EpisodicStore();
    // Without --episode, a fresh store only knows its own (empty) episode,
    // so pass an id from the turns table to inspect a past conversation.
    store.transcript(opts.episode).forEach((row) =>
      console.log(`[${row.role}] ${row.content}`),
    );
  });
Takeaway: Memory inspection commands give humans confidence before production.
Testing Memory Behavior
Add tests:
import { describe, it, expect } from "vitest";
import { WorkingMemory } from "../src/core/workingMemory";

describe("WorkingMemory", () => {
  it("respects token budgets", () => {
    const wm = new WorkingMemory(50);
    for (let i = 0; i < 20; i += 1) {
      wm.append({ role: "user", content: `message ${"!".repeat(i)}` });
    }
    expect(wm.get().length).toBeLessThan(20);
  });
});
For semantic memory, mock embeddings by stubbing client.embeddings.create. Integration tests can run nightly with the real API.
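A sketch of that stub with Vitest module mocking (the fake embedding values are arbitrary and the import path is an assumption; adjust to your layout):
import { describe, it, expect, vi } from "vitest";

// Stub the OpenAI client so embedding calls are deterministic and offline.
vi.mock("openai", () => ({
  default: class {
    embeddings = {
      create: async ({ input }: { input: string }) => ({
        data: [{ embedding: [input.length, 1, 0] }],
      }),
    };
  },
}));

import { searchMemories } from "../src/memory/semantic";

describe("searchMemories", () => {
  it("returns at most topK results", async () => {
    const hits = await searchMemories("pricing", 2);
    expect(hits.length).toBeLessThanOrEqual(2);
  });
});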
Verification Checklist
- npm run cli:chat retains context beyond 30 turns without exceeding token limits.
- data/assistant.db contains transcripts.
- data/semantic/*.json stores summary vectors.
- npm run cli memories "pricing" returns relevant summaries.
- npm run test -- --runInBand covers the working memory logic.
- Observability (metrics.ndjson) now includes the new metrics (e.g., memory.summary_tokens), as sketched below.
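If you kept the NDJSON metrics file from the earlier parts, recording the summary size could look roughly like this (recordMetric and the file name are assumptions; adapt to your existing helper):
import fs from "node:fs";
import { encoding_for_model } from "tiktoken";

// Append one JSON line per observation, matching the metrics.ndjson format.
function recordMetric(name: string, value: number) {
  const line = JSON.stringify({ name, value, ts: new Date().toISOString() });
  fs.appendFileSync("metrics.ndjson", line + "\n");
}

// Call this right after summarizeTurn resolves.
export function recordSummaryTokens(summary: string) {
  const encoder = encoding_for_model("gpt-4o-mini");
  recordMetric("memory.summary_tokens", encoder.encode(summary).length);
  encoder.free();
}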
Once these pass, your assistant can remember what matters—ready for API/tool integrations in Part 4.



