Building AI Recommendation Systems for E-Commerce: Beyond Collaborative Filtering

Modern approaches to personalization that drive conversion and retention

Advanced · about 22 minutes

Learn how to build and deploy production recommendation systems using modern AI techniques—from two-tower neural networks and session-based recommendations to LLM-powered conversational shopping.

Tags: AI, recommendation systems, e-commerce, deep learning, personalization, collaborative filtering

The Business Impact of Recommendations

A widely cited McKinsey estimate attributes about 35% of Amazon's purchases to its recommendation systems. Netflix has estimated that personalization saves it more than $1 billion a year by reducing subscriber churn. For e-commerce companies, recommendations are often credited with 26-30% of total revenue.

Yet many companies still rely on simple collaborative filtering of the kind that was standard around 2010. Modern recommendation systems add learned embeddings, session models, and feature-rich ranking on top of it.

Architecture: Two-Stage Retrieval and Ranking

Production recommendation systems use a two-stage architecture:

Stage 1: Candidate Retrieval
Goal: Reduce millions of items to hundreds of candidates
Speed: Must be < 10ms
Method: Approximate nearest neighbor (ANN) search on item embeddings

Stage 2: Ranking
Goal: Rank hundreds of candidates by predicted conversion probability
Speed: Can be < 100ms (larger model)
Method: Deep learning ranking model with rich features

This architecture enables both scale (retrieval over millions to billions of items) and accuracy (a complex ranking model that only has to score a few hundred candidates).
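
The two stages can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not a production design: the function names (`retrieve`, `rank`, `recommend`), the exhaustive dot-product search standing in for an ANN index, and the lookup table standing in for a ranking model are all hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec: Vector, item_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: keep the k items whose embeddings score highest for the user.

    Exhaustive search for clarity; a real system would query an ANN index.
    """
    return heapq.nlargest(k, item_vecs, key=lambda item_id: dot(user_vec, item_vecs[item_id]))


def rank(candidates: List[str], score_fn: Callable[[str], float], n: int) -> List[str]:
    """Stage 2: order the small candidate set with a more expensive scorer."""
    return sorted(candidates, key=score_fn, reverse=True)[:n]


def recommend(user_vec, item_vecs, score_fn, k=100, n=10):
    return rank(retrieve(user_vec, item_vecs, k), score_fn, n)


# Toy data: three items in a 2-d embedding space, plus a stand-in ranking score.
item_vecs = {"shoes": [1.0, 0.0], "boots": [0.9, 0.1], "lamp": [0.0, 1.0]}
predicted_conversion = {"shoes": 0.02, "boots": 0.05, "lamp": 0.09}

top = recommend([1.0, 0.0], item_vecs, predicted_conversion.get, k=2, n=2)
print(top)  # ['boots', 'shoes']
```

Note that "lamp" has the highest predicted conversion but never reaches the ranker, because retrieval filtered it out. Retrieval recall therefore caps what the ranking stage can achieve.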

Stage 1: Two-Tower Neural Network for Retrieval

python
import tensorflow as tf
import tensorflow_recommenders as tfrs
class TwoTowerModel(tfrs.Model):
    """
    Separate towers for user and item representations
    Enables efficient ANN search at inference time
    """
    
    def __init__(self, user_model, item_model, items_dataset):
        super().__init__()
        self.user_tower = user_model
        self.item_tower = item_model
        
        # Retrieval task
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=items_dataset.batch(128).map(item_model)
            )
        )
    
    def compute_loss(self, features, training=False):
        user_embeddings = self.user_tower(features["user_id"])
        item_embeddings = self.item_tower(features["item_id"])
        return self.task(user_embeddings, item_embeddings)


# User tower: embeds the user ID (user history and other features could be added here)
user_model = tf.keras.Sequential([
    tf.keras.layers.StringLookup(vocabulary=unique_user_ids, mask_token=None),
    tf.keras.layers.Embedding(len(unique_user_ids) + 1, 64),
    tf.keras.layers.Dense(32, activation='relu')
])

# Item tower: embeds the item ID (item features could be added here)
item_model = tf.keras.Sequential([
    tf.keras.layers.StringLookup(vocabulary=unique_item_ids, mask_token=None),
    tf.keras.layers.Embedding(len(unique_item_ids) + 1, 64),
    tf.keras.layers.Dense(32, activation='relu')
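])

# unique_user_ids, unique_item_ids, items_dataset, and cached_train are assumed
# to have been prepared from the interaction data beforehand.
model = TwoTowerModel(user_model, item_model, items_dataset)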
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))
model.fit(cached_train, epochs=3)
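
The retrieval task above trains with in-batch negatives: each user in a batch is scored against every item in the batch, and a softmax cross-entropy loss rewards the matching item. The sketch below is a plain-Python illustration of that idea, not TFRS's actual implementation; the function name and the mean reduction are assumptions chosen for readability.

```python
import math
from typing import List, Sequence


def in_batch_softmax_loss(user_embs: Sequence[Sequence[float]],
                          item_embs: Sequence[Sequence[float]]) -> float:
    """Mean cross-entropy where row i's positive is item i and all other
    items in the batch act as negatives."""
    losses: List[float] = []
    for i, u in enumerate(user_embs):
        # Dot-product logits between this user and every item in the batch.
        logits = [sum(a * b for a, b in zip(u, v)) for v in item_embs]
        # Numerically stable log-sum-exp over the batch of items.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


users = [[1.0, 0.0], [0.0, 1.0]]
# Each user's positive item points in the same direction as the user.
aligned = in_batch_softmax_loss(users, [[1.0, 0.0], [0.0, 1.0]])
# Swapping the items makes every positive pair orthogonal.
misaligned = in_batch_softmax_loss(users, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < misaligned)  # True
```

Because the loss only needs dot products between the two towers' outputs, item embeddings can be precomputed and indexed for ANN search at serving time.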

Stage 2: Ranking with a Learning-to-Rank Model

python
import lightgbm as lgb
import pandas as pd


def train_ranking_model(interactions_df: pd.DataFrame) -> lgb.Booster:
    """
    LambdaRank model for Learning-to-Rank
    Optimizes for NDCG (ranking quality metric)
    """
    features = [
        # User features
        'user_age_days', 'user_total_orders', 'user_avg_order_value',
        'user_preferred_category', 'user_price_sensitivity',
        
        # Item features
        'item_price', 'item_category', 'item_rating', 'item_review_count',
        'item_inventory_level', 'item_days_since_launch',
        
        # Interaction features (cross features)
        'user_item_category_affinity', 'user_item_price_match',
        'user_viewed_similar', 'user_cart_abandonment_similar',
        
        # Context features
        'hour_of_day', 'day_of_week', 'platform', 'search_query_match'
    ]
    
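    # LambdaRank optimizes ranking directly. LightGBM expects the rows of each
    # query to be contiguous and `group` to list query sizes in row order,
    # so sort by query_id before building the dataset.
    interactions_df = interactions_df.sort_values('query_id', kind='stable')
    # String-valued columns (e.g. item_category, platform) are assumed to have
    # been cast to pandas 'category' dtype or numerically encoded already.
    train_data = lgb.Dataset(
        interactions_df[features],
        label=interactions_df['clicked'],
        group=interactions_df.groupby('query_id', sort=True).size().values
    )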
    
    params = {
        'objective': 'lambdarank',
        'metric': 'ndcg',
        'eval_at': [5, 10],
        'num_leaves': 63,
        'learning_rate': 0.05
    }
    
    model = lgb.train(params, train_data, num_boost_round=500)
    return model
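
Since the model is tuned for NDCG, it helps to see how that metric is computed. The following is a minimal sketch using the common exponential gain (2^rel - 1); the helper names are hypothetical, and libraries may differ in details such as tie handling.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: gains are discounted by log2 of the position."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Labels listed in the order the model ranked the items (1 = clicked).
print(ndcg_at_k([1, 0, 0], k=5))            # 1.0
print(round(ndcg_at_k([0, 1, 0], k=5), 4))  # 0.6309
```

Moving the single clicked item from position 1 to position 2 drops NDCG from 1.0 to about 0.63, which is why the metric rewards getting the top of the list right.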

Session-Based Recommendations

python
# GRU4Rec: session-based recommendations without user history
import torch
import torch.nn as nn
class GRU4Rec(nn.Module):
    """
    Predicts next item in session based on current session sequence
    Works for anonymous and first-time users, which mitigates the user cold-start problem
    """
    def __init__(self, num_items: int, hidden_size: int = 100, num_layers: int = 1):
        super().__init__()
        self.embedding = nn.Embedding(num_items + 1, hidden_size, padding_idx=0)
        self.gru = nn.GRU(hidden_size, hidden_size, num_layers, batch_first=True)
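        # One score per ID (0..num_items) so that output index i lines up with
        # item ID i; index 0 is the padding ID and should be ignored when ranking.
        self.output_layer = nn.Linear(hidden_size, num_items + 1)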
    
    def forward(self, session_items: torch.Tensor) -> torch.Tensor:
        # session_items: [batch_size, session_length]
        embedded = self.embedding(session_items)
        gru_output, _ = self.gru(embedded)
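        # Hidden state at the final time step; assumes sessions are left-padded
        # so that the last position holds the most recent real item.
        last_hidden = gru_output[:, -1, :]
        scores = self.output_layer(last_hidden)
        return scores  # Unnormalized logits over items; apply softmax for probabilities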


# For a new user, recommend from the current session's behavior.
# This works in the first session; no prior history is required.
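
The forward pass reads the last time step of each sequence, so sessions of different lengths need to be padded on the left, and each session has to be turned into (prefix, next item) training pairs. The helpers below are a hypothetical sketch of that preprocessing; they assume item IDs start at 1, with 0 reserved for padding to match `padding_idx=0`.

```python
from typing import List, Tuple


def left_pad(session: List[int], length: int, pad_id: int = 0) -> List[int]:
    """Keep the most recent `length` items and pad on the left."""
    recent = session[-length:]
    return [pad_id] * (length - len(recent)) + recent


def next_item_examples(session: List[int], length: int) -> List[Tuple[List[int], int]]:
    """Turn one session into (padded prefix, next item) training pairs."""
    return [(left_pad(session[:i], length), session[i]) for i in range(1, len(session))]


examples = next_item_examples([5, 9, 2], length=3)
print(examples)  # [([0, 0, 5], 9), ([0, 5, 9], 2)]
```

Each padded prefix becomes one row of `session_items`, and the target item ID is the class label for a cross-entropy loss over the model's output scores.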

LLM-Powered Conversational Recommendations

python
import anthropic


class ConversationalShoppingAssistant:
    def __init__(self, product_catalog: list, vector_store):
        self.catalog = product_catalog
        self.vector_store = vector_store
        self.client = anthropic.Anthropic()
        self.conversation_history = []
    
    def recommend(self, user_message: str, user_profile: dict) -> str:
        # Search for relevant products
        relevant_products = self.vector_store.search(user_message, top_k=20)
        
        # Format context
        product_context = "\n".join([
            f"- {p['name']}: {p['description']} (${p['price']}) - {p['rating']} stars"
            for p in relevant_products[:10]
        ])
        
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })
        
        response = self.client.messages.create(
            model="claude-opus-4-5",
            max_tokens=1000,
            system=f"""You are a helpful shopping assistant. 
User profile: {user_profile['preferences']}
Budget: up to ${user_profile['budget']}
Available products matching their request:
{product_context}
Provide personalized recommendations with clear reasoning.""",
            messages=self.conversation_history
        )
        
        assistant_message = response.content[0].text
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })
        
        return assistant_message


# Reported result: 40% higher click-through vs. traditional recommendations
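
The system prompt states the user's budget, but the retrieved products are passed through unfiltered. One option is to drop over-budget items before building the prompt context. The helper below is a hypothetical sketch: `build_product_context` is not part of any library, and it assumes each product is a dict with the `name`, `description`, `price`, and `rating` keys used above.

```python
from typing import Dict, List


def build_product_context(products: List[Dict], budget: float, limit: int = 10) -> str:
    """Drop products above the budget, then format up to `limit` of them."""
    affordable = [p for p in products if p["price"] <= budget]
    return "\n".join(
        f"- {p['name']}: {p['description']} (${p['price']:.2f}) - {p['rating']} stars"
        for p in affordable[:limit]
    )


# Made-up catalog entries for illustration only.
products = [
    {"name": "Trail Runner", "description": "Lightweight shoe", "price": 89.0, "rating": 4.6},
    {"name": "Summit Boot", "description": "Waterproof boot", "price": 210.0, "rating": 4.8},
]
context = build_product_context(products, budget=100)
print(context)
# - Trail Runner: Lightweight shoe ($89.00) - 4.6 stars
```

Filtering in code rather than relying on the model to respect the budget keeps the prompt shorter and makes the constraint easy to test.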

A/B Testing Recommendation Systems

python
# Bandits for recommendation exploration.
# Simplified ε-greedy sketch; libraries such as Vowpal Wabbit provide full
# contextual bandit learners.
import random


def contextual_bandit_recommendations(user_features: dict, item_pool: list) -> list:
    """
    Contextual bandits balance exploration (trying new items) 
    with exploitation (showing proven items)
    
    Result: Continuously improving recommendations without formal A/B tests
    """
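    # ε-greedy: explore 10% of the time. This simple policy ignores
    # user_features while exploring; a true contextual bandit would use them.
    k = min(10, len(item_pool))  # avoid sampling more items than exist
    if random.random() < 0.10:
        return random.sample(item_pool, k)  # Explore
    else:
        # top_ranked_items is assumed to be defined elsewhere (e.g. the Stage 2 ranker)
        return top_ranked_items(user_features, item_pool, n=k)  # Exploit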
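
The function above chooses between exploring and exploiting but never learns from the outcome. For recommendations to improve over time, observed rewards have to feed back into the item estimates. The class below is a minimal, non-contextual sketch of that feedback loop; the class name, the simulated click rates, and the seeds are all assumptions for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List


class EpsilonGreedyRecommender:
    """Non-contextual epsilon-greedy over items, with running mean rewards."""

    def __init__(self, items: List[str], epsilon: float = 0.1, seed: int = 0):
        self.items = list(items)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows: Dict[str, int] = defaultdict(int)
        self.mean_reward: Dict[str, float] = defaultdict(float)

    def select(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)  # explore
        return max(self.items, key=lambda i: self.mean_reward[i])  # exploit

    def update(self, item: str, reward: float) -> None:
        """Incremental mean: new_mean = old_mean + (reward - old_mean) / n."""
        self.shows[item] += 1
        self.mean_reward[item] += (reward - self.mean_reward[item]) / self.shows[item]


# Simulated click rates; hypothetical values, not measurements.
true_ctr = {"a": 0.05, "b": 0.30, "c": 0.10}
bandit = EpsilonGreedyRecommender(list(true_ctr), epsilon=0.1, seed=42)
env = random.Random(7)
for _ in range(5000):
    item = bandit.select()
    bandit.update(item, 1.0 if env.random() < true_ctr[item] else 0.0)

print(dict(bandit.shows))  # impressions per item after the simulation
```

In a simulation like this, the item with the highest click rate would be expected to collect most of the impressions once its estimate pulls ahead, while the exploration share keeps the other estimates from going stale.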

Production Deployment Considerations

| Component | Technology | Scale |
|---|---|---|
| Embedding store | Pinecone / Weaviate | Billions of items |
| Feature store | Feast / Tecton | Real-time features |
| Model serving | TFServing / Triton | < 10ms latency |
| A/B testing | Optimizely / internal | Traffic split |
| Monitoring | Evidently AI | Data drift detection |
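
To make the monitoring row concrete, data drift is usually quantified by comparing a feature's distribution at training time with its distribution in live traffic. The sketch below computes the population stability index (PSI), one common drift statistic. It is shown purely as an illustration and is not a claim about what Evidently AI computes internally; the function name, bin edges, and sample values are assumptions.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population stability index between two samples over shared bin edges."""

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            # The number of edges at or below v is the index of v's bin.
            counts[sum(v >= e for e in edges)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


training_prices = [10, 20, 30, 40]
live_prices = [30, 40, 50, 60]
print(psi(training_prices, training_prices, edges=[25]))  # 0.0
print(psi(training_prices, live_prices, edges=[25]) > 0)  # True
```

A PSI of zero means the binned distributions match; larger values indicate a bigger shift, and teams typically alert once it crosses a threshold they have chosen for the feature.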

Key Takeaways

Two-tower + ranking is the production standard for large-scale recommendations

Session-based models mitigate the cold-start problem for new and anonymous users

LLM conversational recommendations are reported to lift click-through by 40% on complex queries

Contextual bandits continuously optimize without full A/B test cycles

Feature freshness matters—real-time features outperform batch-computed features
