
How I built Mochaic: Designing an AI chat system for SMEs

March 20, 2026 · 18 min read · AI/ML

Recent UK research finds that while 56% of businesses using AI see productivity gains, 77% experience no notable revenue change. The main issue is not whether AI works, but whether it integrates effectively into actual business workflows.

AI tools are impressive individually, yet businesses operate through specific processes and constraints. For SMEs, adoption is uneven, hinging on context, skills and readiness. Most failures occur because supporting systems are lacking, not due to model weakness.

Mochaic: what it is and why I built it

My goal was to build a scalable, adaptable, streamlined chatbot that SMEs can integrate into their operations as a production-grade system, whether for internal workflows or customer-facing experiences on their website. The main objective is to deliver accurate and timely information to the right users, using AI to provide continuous, around-the-clock support.

Mochaic is a small monorepo: a Next.js web app (apps/web/mochaic-chatbot) talks to a FastAPI API (apps/api). The browser never calls the Python service directly with secrets; instead, it posts to a same-origin Next.js route (/api/chat), which forwards the payload to the backend with an API key header. The backend validates that key, applies rate limits and input bounds, and then calls OpenAI and returns the updated conversation. Company-specific copy for the system prompt can live in SQLAlchemy-backed storage (SQLite by default; PostgreSQL via DATABASE_URL).

Repository layout and responsibilities:

  • apps/web/mochaic-chatbot/: UI, client state, sanitisation, optional local persistence (lib/chat-messages.ts).
  • app/api/chat/route.ts: Server-side proxy to FastAPI; holds timeout and error mapping.
  • apps/api/main.py: FastAPI app: routes, middleware, rate limiting, chat orchestration.
  • apps/api/services.py: OpenAI client usage, system prompt assembly, message sanitisation, retries.
  • apps/api/schemas.py: Pydantic request/response models (ChatRequest, ChatMessage, config DTOs).
  • apps/api/auth.py: X-API-Key validation against env-configured keys.
  • apps/api/database.py: Engine, CompanyConfig model, migration from env when DB is empty.
  • apps/api/tests/: Pytest integration tests against the FastAPI app.

Frontend: Next.js, React and the chat widget

The widget is a client component ("use client" in ChatWidget.tsx). It owns local React state for messages, input, loading and errors. Messages are typed (ChatMessage: role, content, timestamp, id, messageType). Outbound requests target /api/chat (a relative URL), which keeps the API key on the server, not in the bundle.

Rendering safety: assistant/user content passes through SafeHTML (lib/sanatize.tsx) and DOMPurify-style patterns to mitigate XSS. Utilities (lib/utils.ts) provide helpers such as cn, IDs, and clipboard behaviour.

Persistence: lib/chat-messages.ts serialises messages to localStorage under a single STORAGE_KEY, converting values such as Date to ISO strings on save and back on load. This is appropriate for client-only continuity across refreshes, not for authoritative server history. It is currently optional and can be extended to suit specific SME needs, though in my opinion it is worth enabling in most deployments.
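The round-trip concern (serialising Date objects to ISO strings on save, parsing them back on load) is language-neutral; here is a minimal Python sketch of the same idea. The function names are illustrative, not the actual lib/chat-messages.ts API:

```python
import json
from datetime import datetime, timezone

def save_messages(messages: list[dict]) -> str:
    """Serialise messages, converting datetime timestamps to ISO strings."""
    return json.dumps([
        {**m, "timestamp": m["timestamp"].isoformat()} for m in messages
    ])

def load_messages(raw: str) -> list[dict]:
    """Parse stored JSON, restoring ISO strings to datetime objects."""
    return [
        {**m, "timestamp": datetime.fromisoformat(m["timestamp"])}
        for m in json.loads(raw)
    ]

msgs = [{"id": "1", "role": "user", "content": "hi",
         "timestamp": datetime(2026, 3, 20, tzinfo=timezone.utc)}]
assert load_messages(save_messages(msgs)) == msgs  # lossless round-trip
```

The key point is that the store holds only JSON-safe primitives, so a refresh can faithfully rebuild the typed message objects.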

The BFF pattern: app/api/chat/route.ts:

This route handler implements the backend-for-frontend (BFF) pattern: it resolves the backend URL (BACKEND_API_URL, falling back to NEXT_PUBLIC_API_URL), attaches an X-API-Key header when MOCHAIC_API_KEY is set, and POSTs to {backend}/api/chat with an AbortController timeout (FRONTEND_TIMEOUT_MS). It maps HTTP status codes to user-facing strings and logs technical detail server-side only, which is a clean separation between operator diagnostics and visitor copy.

Backend: FastAPI application surface

On startup, a lifespan handler runs init_db() and migrate_from_env_vars() so an empty database can be bootstrapped from environment variables.
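A hedged sketch of that startup shape: an async context manager that runs the bootstrap steps before the app serves traffic. FastAPI itself is omitted here, and init_db / migrate_from_env_vars are stand-ins for the real functions:

```python
import asyncio
from contextlib import asynccontextmanager

startup_log: list[str] = []

def init_db() -> None:
    # Stand-in: create tables if they do not exist.
    startup_log.append("init_db")

def migrate_from_env_vars() -> None:
    # Stand-in: seed config rows from environment variables.
    startup_log.append("migrate_from_env_vars")

@asynccontextmanager
async def lifespan(app):
    # Runs once before the app starts serving; FastAPI calls this
    # automatically when the app is built with FastAPI(lifespan=lifespan).
    init_db()
    migrate_from_env_vars()
    yield  # the application serves requests here
    # Teardown (e.g. closing pools) would go after the yield.

async def main() -> None:
    async with lifespan(app=None):
        pass

asyncio.run(main())
assert startup_log == ["init_db", "migrate_from_env_vars"]
```

Putting both calls in the lifespan (rather than at import time) means tests and tooling can import the module without touching the database.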

Cross-cutting concerns:

  • CORS origins from CORS_ORIGIN (comma-separated), with credentials allowed for configured origins.
  • slowapi rate limiting, with the chat-requests-per-minute budget read from env.
  • Request body size middleware: a 10MB cap to reduce the DoS surface.
  • Request ID middleware and structured JSON or text logging.
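The body-size cap is the simplest of these to illustrate: reject on the declared Content-Length before reading the body at all. A minimal, framework-free sketch (the real version is middleware in main.py; the names here are illustrative):

```python
MAX_BODY_BYTES = 10 * 1024 * 1024  # 10MB cap, matching the article

def check_body_size(headers: dict[str, str]) -> tuple[int, str]:
    """Return an (HTTP status, reason) pair before the body is read."""
    try:
        length = int(headers.get("content-length", "0"))
    except ValueError:
        return 400, "invalid Content-Length"
    if length > MAX_BODY_BYTES:
        # Rejecting on the declared length avoids buffering a huge body.
        return 413, "request body too large"
    return 200, "ok"

assert check_body_size({"content-length": "512"}) == (200, "ok")
assert check_body_size({"content-length": str(11 * 1024 * 1024)})[0] == 413
```

Real middleware must also guard against bodies that exceed their declared length, but the early Content-Length check removes the cheapest attack.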

Chat endpoint (POST /api/chat): depends on verify_api_key; it enforces a maximum message count and per-message length, maps Pydantic messages to plain dicts and calls chat_with_ai. The response is a ChatResponse carrying the full message list (including the new assistant turn), optionally trimmed to a window (for example, the last 50 messages).
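The input bounds are worth making concrete. A sketch of the validation the endpoint performs: cap the message count, cap per-message length, then reduce to plain dicts (the limits below are illustrative; the real values come from env):

```python
MAX_MESSAGES = 50         # illustrative limits; the real values
MAX_MESSAGE_CHARS = 4000  # are read from environment variables

def validate_and_map(messages: list[dict]) -> list[dict]:
    """Enforce bounds, then keep only the role/content shape the LLM needs."""
    if len(messages) > MAX_MESSAGES:
        raise ValueError(f"too many messages (max {MAX_MESSAGES})")
    for m in messages:
        if len(m["content"]) > MAX_MESSAGE_CHARS:
            raise ValueError(f"message too long (max {MAX_MESSAGE_CHARS} chars)")
    return [{"role": m["role"], "content": m["content"]} for m in messages]

out = validate_and_map([{"role": "user", "content": "hi", "id": "1"}])
assert out == [{"role": "user", "content": "hi"}]  # extra fields stripped
```

Dropping client-side fields such as id and timestamp before the model call keeps the token budget for actual conversation content.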

Configuration API: GET/PUT /api/config (plus a debug helper) exposes CompanyConfig for runtime branding and contact details, so some setups can update prompt text without redeploying.

Health: GET /health reports whether critical env (e.g. the OpenAI key, API keys) appears configured, which is useful for orchestrators and load balancers.
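A sketch of that kind of readiness report: it checks whether critical env vars appear set without ever leaking their values (the exact response shape and variable list here are assumptions):

```python
import os

CRITICAL_ENV = ["OPENAI_API_KEY", "MOCHAIC_API_KEY"]  # assumed names

def health() -> dict:
    """Report configuration presence, never the secret values themselves."""
    missing = [name for name in CRITICAL_ENV if not os.environ.get(name)]
    return {
        "status": "ok" if not missing else "degraded",
        "missing_env": missing,
    }

os.environ["OPENAI_API_KEY"] = "sk-test"
os.environ.pop("MOCHAIC_API_KEY", None)
assert health() == {"status": "degraded", "missing_env": ["MOCHAIC_API_KEY"]}
```

Returning the names of missing variables (but never their values) gives an operator enough to fix a bad deploy from the load balancer's probe alone.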

LLM integration: services.py

get_company_config() (with @lru_cache) resolves company fields from the database with env fallbacks (e.g. the sales email overridden by SALES_CONTACT_EMAIL).

get_system_prompt() substitutes placeholders ({COMPANY_NAME}, {CURRENT_DATE}, etc.) into the default or env-driven OPENAI_SYSTEM_PROMPT.
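A minimal sketch of that substitution. The placeholder names come from the article; the default template text is invented for illustration:

```python
from datetime import date

DEFAULT_PROMPT = (
    "You are a helpful assistant for {COMPANY_NAME}. "
    "Today's date is {CURRENT_DATE}."
)

def get_system_prompt(template: str, company_name: str) -> str:
    """Fill the {PLACEHOLDER} slots in the default or env-driven template."""
    return (template
            .replace("{COMPANY_NAME}", company_name)
            .replace("{CURRENT_DATE}", date.today().isoformat()))

prompt = get_system_prompt(DEFAULT_PROMPT, "Acme Ltd")
assert "Acme Ltd" in prompt and "{" not in prompt  # all slots filled
```

Plain .replace() (rather than str.format) is deliberate in this sketch: prompt text often contains literal braces, which str.format would choke on.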

sanitize_messages() normalises roles, drops empty lines, and strips known filler assistant greetings so the model context stays lean.
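A hedged sketch of that normalisation pass (the filler-greeting list here is illustrative, not the real one):

```python
FILLER_GREETINGS = {"hello! how can i help you today?"}  # illustrative

def sanitize_messages(messages: list[dict]) -> list[dict]:
    """Normalise roles, drop empties, strip known filler assistant turns."""
    cleaned = []
    for m in messages:
        role = m.get("role", "").strip().lower()
        content = (m.get("content") or "").strip()
        if role not in {"system", "user", "assistant"} or not content:
            continue  # drop unknown roles and empty messages
        if role == "assistant" and content.lower() in FILLER_GREETINGS:
            continue  # filler greetings add tokens but no context
        cleaned.append({"role": role, "content": content})
    return cleaned

msgs = [
    {"role": "User", "content": " hi "},
    {"role": "assistant", "content": "Hello! How can I help you today?"},
    {"role": "assistant", "content": ""},
]
assert sanitize_messages(msgs) == [{"role": "user", "content": "hi"}]
```

Every dropped message is one fewer token spent on boilerplate, which matters once conversations grow toward the history window.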

chat_with_ai() builds the OpenAI messages array (system message unless already present), reads model and max_tokens from env, and invokes client.chat.completions.create with a retry loop on transient errors (rate limit, connection, timeout). Failures surface as RuntimeError with logging suitable for operators.
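The retry loop generalises beyond OpenAI: retry only transient failures, back off between attempts, and re-raise as a single operator-facing error. A sketch with the transient exception classes stubbed out (the real code catches the SDK's rate-limit, connection and timeout errors):

```python
import time

class TransientError(Exception):
    """Stand-in for rate-limit / connection / timeout exceptions."""

def with_retries(call, attempts: int = 3, base_delay: float = 0.01):
    """Run `call`, retrying transient failures with exponential backoff."""
    last = None
    for attempt in range(attempts):
        try:
            return call()
        except TransientError as exc:
            last = exc
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s
    raise RuntimeError("LLM call failed after retries") from last

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("rate limited")
    return "ok"

assert with_retries(flaky) == "ok"
assert calls["n"] == 3  # failed twice, succeeded on the third attempt
```

Non-transient errors (bad request, invalid key) deliberately fall through without retrying, since repeating them only burns the rate budget.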

Note: today the module may initialise the OpenAI client at import time and require the OpenAI API key early; operationally, some teams prefer lazy initialisation so /health and non-LLM paths can start without a key.
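The lazy alternative is a small change: defer client construction to first use so importing the module (and serving /health) never needs the key. A sketch of the caching shape, with the real constructor (OpenAI(api_key=...) in openai>=1.0) replaced by a stand-in factory:

```python
import os

_client = None  # module-level cache; unset at import time

def make_client(api_key: str):
    # Stand-in for OpenAI(api_key=api_key) in the real module.
    return {"api_key": api_key}

def get_client():
    """Construct the client on first use, not at import time."""
    global _client
    if _client is None:
        key = os.environ.get("OPENAI_API_KEY")
        if not key:
            raise RuntimeError("OPENAI_API_KEY is not configured")
        _client = make_client(key)
    return _client

# Nothing at import time touches the key, so /health can start without it.
os.environ["OPENAI_API_KEY"] = "sk-test"
assert get_client() is get_client()  # cached after first construction
```

The error also moves from a confusing import-time crash to a clear runtime message on the first chat request.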

Data model and configuration:

CompanyConfig stores fields such as website name, company name, contact channels, description, services (a JSON string in the DB, a list in the API) and sales email; importantly, this schema can easily be amended to suit the needs of the SME. migrate_from_env_vars seeds one row when the table is empty and the env vars suggest real configuration, bridging a 12-factor deploy today and database-backed, multi-tenant-style config later.
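A stdlib sketch of that seeding logic: insert one row only when the table is empty and env vars suggest real configuration. The real code uses SQLAlchemy; the column and env var names here are illustrative:

```python
import os
import sqlite3

def migrate_from_env_vars(conn: sqlite3.Connection) -> bool:
    """Seed one CompanyConfig row from env iff the table is empty."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS company_config "
        "(company_name TEXT, sales_email TEXT)"
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM company_config").fetchone()
    name = os.environ.get("COMPANY_NAME")
    if count or not name:  # already seeded, or no real config in env
        return False
    conn.execute(
        "INSERT INTO company_config VALUES (?, ?)",
        (name, os.environ.get("SALES_CONTACT_EMAIL")),
    )
    conn.commit()
    return True

os.environ["COMPANY_NAME"] = "Acme Ltd"
db = sqlite3.connect(":memory:")
assert migrate_from_env_vars(db) is True   # seeds the first row
assert migrate_from_env_vars(db) is False  # idempotent on re-run
```

The emptiness check is what makes the migration safe to run on every startup: once a row exists, env vars stop being authoritative and the database takes over.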

Closing thoughts:

Architecturally, Mochaic is a deliberately thin stack: no heavy agent framework, clear separation of concerns (UI to Next BFF to FastAPI to OpenAI) and defence in depth (API key, rate limits, size limits, sanitisation, structured logs). That keeps the system explainable and extensible; whether you add streaming, server-side conversation storage, or RAG later, the seams between layers are already well defined.