AI-Powered Search and Autocomplete with Elasticsearch and LLMs

Semantic search, neural reranking, personalized suggestions, and query understanding

Modern search goes far beyond keyword matching. An architecture that combines Elasticsearch with AI models has six parts:

1) Query understanding: an NLP pipeline that classifies query intent (navigational, informational, transactional), recognizes entities, and corrects spelling.
2) Hybrid retrieval: Elasticsearch BM25 for exact keyword matches plus dense vector search (approximate kNN) for semantic similarity, with the two result lists merged by Reciprocal Rank Fusion (RRF).
3) Neural reranking: a cross-encoder model scores the top 50 candidates and returns the top 10. Options include the Cohere Rerank API and BGE-Reranker.
4) Personalization: adjust results using user history and department or role, and A/B test different ranking strategies.
5) Autocomplete: retrieval-augmented completion that draws on recent queries, product names, and NLP-based intent prediction, with suggestions personalized from user history.
6) Query expansion: use an LLM to generate 3-5 semantically related terms, run the expanded queries, and merge the results. This can significantly improve recall in specialized domains.

Elasticsearch setup: index each document both as text fields scored with BM25 and as a dense vector field searched with kNN. Use ELSER (Elastic Learned Sparse EncodeR), Elastic's learned sparse retrieval model, for out-of-the-box semantic search without a separate embedding model.

Performance: with an HNSW index, kNN search runs in <50ms at p99 for 10M documents. The target for total query time, including reranking, is <200ms.
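
The hybrid retrieval and reranking steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the field names `content` and `embedding` are hypothetical, the request bodies assume the Elasticsearch 8.x `_search` API, the RRF constant `k=60` is a commonly used default rather than a value from this article, and `score_fn` stands in for whichever cross-encoder or rerank API is used.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def bm25_query(text: str, size: int = 50) -> dict:
    """Request body for a BM25 match query on a hypothetical `content` text field."""
    return {"size": size, "query": {"match": {"content": text}}}


def knn_query(vector: Sequence[float], k: int = 50, num_candidates: int = 200) -> dict:
    """Request body for approximate kNN on a hypothetical `embedding` dense_vector field."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": list(vector),
            "k": k,
            "num_candidates": num_candidates,
        }
    }


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists: score(d) = sum over lists of 1 / (k + rank), rank from 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank(
    query: str,
    candidates: Dict[str, str],
    score_fn: Callable[[str, str], float],
    top_n: int = 10,
) -> List[str]:
    """Score each (query, document text) pair and keep the top_n document IDs.

    `score_fn` is a placeholder for a cross-encoder or rerank API call.
    """
    scored = {doc_id: score_fn(query, text) for doc_id, text in candidates.items()}
    return sorted(scored, key=lambda d: (-scored[d], d))[:top_n]


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]   # IDs as returned by the BM25 query
    dense_hits = ["c", "a", "d"]  # IDs as returned by the kNN query
    print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
```

RRF uses only rank positions, so the BM25 and vector scores never need to be normalized onto a common scale. That makes it a convenient way to merge the two lists before the more expensive reranking step.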
