Electron AI Desktop Apps: Complete Integration Guide

Building AI-powered desktop applications with Electron

Electron remains the fastest path from web-stack team to desktop AI app — and several of the most-used AI desktop products ship on it (ChatGPT's desktop app, Claude's desktop app, Cursor among them). The framework's costs (bundle size, RAM) are real but well-understood; this guide focuses on the AI-specific architecture: where API calls live, how to stream into your renderer, and the local-model story.

The architecture rule: AI calls live in the main process

Electron gives you a Node.js main process and Chromium renderer processes. Two non-negotiables for AI apps:

API keys never enter the renderer — it's a web page; anything there is extractable. Keys live in the main process, stored via safeStorage (OS-level encryption), and renderers request completions over IPC.

Renderers get a narrow, typed bridge — contextIsolation: true (default) plus a preload script exposing exactly the AI operations you support, nothing more.

typescript
// main.ts
import { ipcMain, safeStorage } from 'electron';
import OpenAI from 'openai';

// loadKey() is an app-specific helper (not shown here) that decrypts the stored key.
ipcMain.handle('ai:ask', async (_e, prompt: string) => {
  const client = new OpenAI({ apiKey: loadKey() });   // decrypted via safeStorage
  const resp = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: prompt }],
  });
  return resp.choices[0].message.content;
});
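
The handler above calls loadKey() without showing it. One way it might look is sketched below, under stated assumptions: createKeyStore, StringCipher, and the file name provider-key.bin are hypothetical names invented for illustration, and the cipher is passed in so the logic can be exercised outside Electron. Electron's safeStorage provides the three methods the interface lists.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Structural subset of Electron's safeStorage that this sketch relies on.
interface StringCipher {
  isEncryptionAvailable(): boolean;
  encryptString(plainText: string): Buffer;
  decryptString(encrypted: Buffer): string;
}

// Hypothetical helper: keeps the provider key on disk as an encrypted blob.
function createKeyStore(cipher: StringCipher, filePath: string) {
  return {
    saveKey(apiKey: string): void {
      if (!cipher.isEncryptionAvailable()) {
        // Refuse rather than silently fall back to plain text.
        throw new Error('OS-level encryption is unavailable');
      }
      fs.mkdirSync(path.dirname(filePath), { recursive: true });
      fs.writeFileSync(filePath, cipher.encryptString(apiKey), { mode: 0o600 });
    },
    loadKey(): string {
      return cipher.decryptString(fs.readFileSync(filePath));
    },
  };
}

// In main.ts, once the app is ready, this could be wired up roughly as:
//   const store = createKeyStore(
//     safeStorage,
//     path.join(app.getPath('userData'), 'provider-key.bin'), // assumed file name
//   );
//   const loadKey = store.loadKey;
```

Passing the cipher in rather than importing safeStorage directly is a design choice made for testability here, not something the framework requires.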

typescript
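// preload.ts: the only surface the renderer sees
import { contextBridge, ipcRenderer, type IpcRendererEvent } from 'electron';
contextBridge.exposeInMainWorld('ai', {
  ask: (prompt: string) => ipcRenderer.invoke('ai:ask', prompt),
  askStream: (prompt: string) => ipcRenderer.invoke('ai:askStream', prompt),
  // Return an unsubscribe function instead of leaking the ipcRenderer object.
  onToken: (cb: (t: string) => void) => {
    const listener = (_e: IpcRendererEvent, t: string) => cb(t);
    ipcRenderer.on('ai:token', listener);
    return () => ipcRenderer.removeListener('ai:token', listener);
  },
});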
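
Since the renderer is treated as untrusted, the main-process handler may also want to check what arrives over IPC before forwarding it to a provider. The following is a minimal sketch of that idea; parsePrompt, PromptResult, and the MAX_PROMPT_CHARS limit are hypothetical names and values chosen for illustration, not part of Electron or the OpenAI SDK.

```typescript
// Hypothetical limit; tune it to the model's context window and your cost budget.
const MAX_PROMPT_CHARS = 8_000;

type PromptResult =
  | { ok: true; prompt: string }
  | { ok: false; reason: string };

// Validates an untrusted IPC argument before it reaches the provider SDK.
function parsePrompt(input: unknown, maxChars: number = MAX_PROMPT_CHARS): PromptResult {
  if (typeof input !== 'string') {
    return { ok: false, reason: 'prompt must be a string' };
  }
  const prompt = input.trim();
  if (prompt.length === 0) {
    return { ok: false, reason: 'prompt is empty' };
  }
  if (prompt.length > maxChars) {
    return { ok: false, reason: `prompt exceeds ${maxChars} characters` };
  }
  return { ok: true, prompt };
}

// Inside an ipcMain.handle callback it might be used as:
//   const parsed = parsePrompt(rawPrompt);
//   if (!parsed.ok) throw new Error(parsed.reason);
```

A rejected promise from ipcRenderer.invoke surfaces the thrown error in the renderer, so the UI can show the reason without the main process ever calling the provider.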

Streaming tokens over IPC

ipcRenderer.invoke resolves exactly once, so it cannot deliver tokens as they arrive; for token streaming, push events from main to the renderer with event.sender.send:

typescript
// main.ts
ipcMain.handle('ai:askStream', async (event, prompt: string) => {
  const client = new OpenAI({ apiKey: loadKey() });   // same key handling as 'ai:ask'
  const stream = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: prompt }],
    stream: true,
  });
  for await (const chunk of stream) {
    if (event.sender.isDestroyed()) return;           // window closed: stop paying
    const token = chunk.choices[0]?.delta?.content ?? '';
    if (token) event.sender.send('ai:token', token);
  }
  // Sending to a destroyed webContents throws, so guard the final event too.
  if (!event.sender.isDestroyed()) event.sender.send('ai:done');
});

The renderer subscribes via the preload bridge and appends tokens — same UX as SSE in a web app, no HTTP layer needed. The isDestroyed() check is the desktop equivalent of the disconnect check in our FastAPI streaming recipe: without it, closed windows keep billing you.
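
On the renderer side, the subscription described above could look roughly like the sketch below. It assumes the preload bridge's onToken returns an unsubscribe function (the bridge has to be written that way); AiBridge and createTokenSink are hypothetical names used only for illustration.

```typescript
// Shape of the preload bridge as seen from the renderer (an assumption).
interface AiBridge {
  onToken(cb: (t: string) => void): () => void;   // returns an unsubscribe function
}

// Accumulates streamed tokens and hands the running text to the UI.
function createTokenSink(ai: AiBridge, render: (text: string) => void) {
  let text = '';
  const unsubscribe = ai.onToken((t) => {
    text += t;
    render(text);          // e.g. set textContent or update framework state
  });
  return {
    getText: () => text,
    stop: unsubscribe,     // call when the view unmounts or the answer is done
  };
}

// In renderer code this might be used as:
//   const sink = createTokenSink(window.ai, (t) => { output.textContent = t; });
//   await window.ai.askStream(prompt);
//   sink.stop();
```

Unsubscribing matters in long-lived windows: without it, every new question adds another listener and each token is rendered several times.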

Local models: Ollama sidecar or node bindings

Talk to Ollama (recommended): main process calls localhost:11434, detects/launches Ollama as needed (child_process.spawn), streams via the same IPC pattern. Model management UX (which models, pull progress, disk usage) is yours to build — Ollama vs LM Studio vs Jan covers what users expect.

Embed via node-llama-cpp: llama.cpp bindings run GGUF models in-process with GPU support — fully self-contained apps, at the cost of shipping model weights and owning hardware variance. Run inference in a worker thread or utility process so the main process stays responsive.
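
For the Ollama route, the main process has to turn Ollama's streamed response (newline-delimited JSON) into tokens before forwarding them over IPC. The sketch below shows one way to do that, assuming Node 18+ for the global fetch; createNdjsonParser and streamOllamaChat are hypothetical helper names, and the default model name is only a placeholder for whatever the user has pulled.

```typescript
// One line of Ollama's streaming /api/chat response (only the fields used here).
interface OllamaChatChunk {
  message?: { role: string; content: string };
  done: boolean;
}

// Splits incoming text into complete NDJSON lines, buffering any partial line.
function createNdjsonParser(onChunk: (chunk: OllamaChatChunk) => void) {
  let buffer = '';
  return (text: string): void => {
    buffer += text;
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';   // last element is an incomplete line (or '')
    for (const line of lines) {
      if (line.trim()) onChunk(JSON.parse(line) as OllamaChatChunk);
    }
  };
}

// Streams a chat reply from a local Ollama server, calling onToken per fragment.
async function streamOllamaChat(
  prompt: string,
  onToken: (t: string) => void,
  model = 'llama3.2',                    // assumption: any locally pulled model
  baseUrl = 'http://localhost:11434',
): Promise<void> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, messages: [{ role: 'user', content: prompt }], stream: true }),
  });
  if (!res.ok || !res.body) throw new Error(`Ollama request failed: ${res.status}`);

  const feed = createNdjsonParser((c) => {
    if (c.message?.content) onToken(c.message.content);
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    feed(decoder.decode(value, { stream: true }));
  }
}
```

Inside an IPC handler, onToken would typically be (t) => event.sender.send('ai:token', t), so the renderer code stays identical whether the tokens come from a cloud provider or a local model.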

Electron vs Tauri for AI apps, honestly

| | Electron | Tauri |
|---|---|---|
| Team skills | All JS/TS | Rust core required |
| Bundle / RAM | ~100MB+, heavier | MBs, lighter |
| Rendering | Identical Chromium everywhere | System webview (varies per OS) |
| Local inference | node-llama-cpp / sidecar | Rust bindings (llama.cpp, Candle); stronger fit |
| Ecosystem maturity | Very deep (updater, crash reporting, signing) | Good and growing |

Honest rule: all-web-stack team shipping fast → Electron; binary size/RAM as product values, or Rust on the team → Tauri. For AI specifically, Electron's renderer consistency matters more than usual — AI UIs lean on modern CSS/canvas, and debugging webview differences across three OSes is time not spent on the product.

Production notes

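Auto-update (electron-updater) is effectively mandatory for AI apps: models, prompts, and provider APIs change often, and a client that cannot update breaks when a provider retires a model. Keep app updates and any local-model downloads separate, so a small app release never forces users to re-download multi-gigabyte weights.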

Offline behavior: detect connectivity and degrade visibly (queue requests, or fall back to a local model) — desktop users expect apps to work on planes.

Spend visibility: usage accrues per installed user on your key (or theirs) — track per-user token counts and expose them in-app; surprise bills kill desktop AI products.
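
The per-user token accounting mentioned above can start very small. The sketch below is an in-memory accumulator keyed by user; UsageTracker and UsageTotals are hypothetical names, while prompt_tokens and completion_tokens mirror the usage object returned with OpenAI chat completions. A real app would persist the totals rather than keep them in memory.

```typescript
// Token counts as reported in the `usage` field of a chat completion response.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

interface UsageTotals {
  requests: number;
  promptTokens: number;
  completionTokens: number;
}

const emptyTotals = (): UsageTotals => ({ requests: 0, promptTokens: 0, completionTokens: 0 });

// In-memory per-user accumulator (illustrative only; not persisted).
class UsageTracker {
  private totals = new Map<string, UsageTotals>();

  // Adds one request's usage and returns a copy of the updated totals.
  record(userId: string, usage: Usage): UsageTotals {
    const t = this.totals.get(userId) ?? emptyTotals();
    t.requests += 1;
    t.promptTokens += usage.prompt_tokens;
    t.completionTokens += usage.completion_tokens;
    this.totals.set(userId, t);
    return { ...t };
  }

  // Returns a copy so callers cannot mutate the tracker's internal state.
  get(userId: string): UsageTotals {
    return { ...(this.totals.get(userId) ?? emptyTotals()) };
  }
}

// In the 'ai:ask' handler this might be called as:
//   if (resp.usage) tracker.record(currentUserId, resp.usage);
```

For streamed completions the OpenAI API only reports usage when the request sets stream_options: { include_usage: true }, in which case the counts arrive on the final chunk; other providers report usage differently, so treat the field names as provider-specific.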

FAQ

Can I use the Vercel AI SDK in the renderer? UI hooks yes — but route actual provider calls through main-process IPC so keys stay out of the renderer. Don't call providers directly from renderer code even with a user-supplied key, unless you're comfortable with that key living in a web context.

Voice features? whisper.cpp via node bindings (or an Ollama-adjacent server) in the main/utility process; stream transcription results over the same IPC events.

*Last updated: June 2026.*
