# AI Churn Prediction Architecture

This document outlines the architecture, machine learning model, database schema, and integration processes for our AI Churn Prediction engine. 

## 1. Machine Learning Overview

We are utilizing a **LightGBM (Light Gradient Boosting Machine)** for our predictive engine (`LGBMClassifier`). 
- **Model Type**: Supervised Binary Classification.
- **Algorithm Details**: `n_estimators=100`, `learning_rate=0.05`, `max_depth=5`.
- **Target Variable**: Whether a customer will churn within the next 30 days (1 = churned/expired, i.e. `status=7`; 0 = active/retained).
- **Inference Speed**: LightGBM is highly optimized for fast batch inference on structured tabular data, so the daily prediction pipeline completes in seconds rather than hours.

## 2. Database Schema & Tables

The AI pipeline integrates with the existing transactional database through four dedicated tables built for the Churn Engine:

### A. `churn_feature_store`
Acts as the central repository for pre-computed user features. This ensures ML scripts do not run expensive nested queries against production transactional tables.
* **Core Identifiers**: `sms_cust_id`, `app_id`.
* **Behavior features**: `customer_tenure_days`, `days_since_last_login`, `total_subscriptions`, `active_subscription_count`, `days_until_expiry`, `subscription_gap_days`.
* **Financial features**: `total_revenue`, `avg_order_value`, `failed_payment_count_30d`, `failed_payment_count_90d`, `payment_failure_rate`, `coupon_user`.
* **Device features**: `device_type`, `platform`.
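Several of these features are derived rather than raw. A minimal sketch of how a feature row might be assembled (column names match the schema above; the input dict, date handling, and helper function are illustrative, as the real extraction happens in the Artisan command described in Stage 1):

```python
from datetime import date

def build_feature_row(cust, today=date(2024, 6, 1)):
    """Compute derived churn features for one customer.

    `cust` is an illustrative dict of raw aggregates pulled from the
    transactional tables.
    """
    failed_30d = cust["failed_payments_30d"]
    total_payments = max(cust["total_payment_attempts"], 1)  # avoid div-by-zero
    return {
        "sms_cust_id": cust["sms_cust_id"],
        "app_id": cust["app_id"],
        "customer_tenure_days": (today - cust["signup_date"]).days,
        "days_since_last_login": (today - cust["last_login"]).days,
        "days_until_expiry": (cust["expiry_date"] - today).days,
        "failed_payment_count_30d": failed_30d,
        "payment_failure_rate": round(failed_30d / total_payments, 4),
    }

row = build_feature_row({
    "sms_cust_id": 101, "app_id": 7,
    "signup_date": date(2023, 6, 1),
    "last_login": date(2024, 5, 20),
    "expiry_date": date(2024, 6, 15),
    "failed_payments_30d": 2,
    "total_payment_attempts": 10,
})
```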

### B. `churn_training_labels`
Stores the ground truth labels across historical observation periods. This allows the model to learn what subscriber metrics "looked like" right before they churned.
* **Columns**: `customer_id`, `app_id`, `churned` (boolean label), `observation_date`, `churn_date`.
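A hedged sketch of the labeling rule these columns imply: for a given observation date, a customer is labeled churned if their churn date falls within the following 30-day window (mirroring the model's 30-day target; the function name and `window_days` parameter are illustrative):

```python
from datetime import date

def label_customer(observation_date, churn_date, window_days=30):
    """Return 1 if the customer churned within `window_days` after the
    observation date, else 0. churn_date=None means still active."""
    if churn_date is None:
        return 0
    delta = (churn_date - observation_date).days
    return 1 if 0 <= delta <= window_days else 0
```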

### C. `churn_predictions`
The final output table. This acts as the serving layer for the Admin Dashboard and Filament resources.
* **Columns**: `customer_id`, `app_id`, `churn_probability` (0.0 to 1.0), `risk_level` (`low`, `medium`, `high`, `critical`).
* **Explainability**: `top_factors` (JSON object outlining reasons driving the risk score, e.g., "Approaching Expiration", "Low Revenue Engagement").
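The mapping from `churn_probability` to `risk_level` is not specified here; a plausible bucketing sketch (the threshold values are assumptions, not the production cutoffs):

```python
def risk_level(churn_probability: float) -> str:
    """Map a churn probability in [0, 1] to one of the four risk levels.
    Threshold values are illustrative, not the production configuration."""
    if churn_probability >= 0.85:
        return "critical"
    if churn_probability >= 0.60:
        return "high"
    if churn_probability >= 0.30:
        return "medium"
    return "low"
```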

### D. `churn_model_registry`
A tracking table for auditing ML performance and active model deployments.
* **Columns**: `model_version`, `model_path`, `accuracy`, `precision_score`, `recall_score`, `f1_score`, `auc_roc`, `is_active`, `trained_at`.

## 3. Data Flow & Tech Integration

To prevent degraded performance on the core PHP application, the integration operates asynchronously across **three decoupled stages**:

### Stage 1: Feature Extraction (PHP / Laravel Artisan)
- A Laravel console command (`php artisan churn:sync-features --batch=100`) aggregates raw transactional data from `tbl_customer_subscriptions`, `tbl_order`, and `customer_devices`.
- It calculates tenure, financial trends, and payment-failure counts in PHP memory, then bulk upserts the compiled features into `churn_feature_store`. The process is sequential and resumable, so an interrupted batch can safely be re-run.
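The upsert pattern behind this stage can be sketched as below. The production command is PHP/Laravel writing to the main database; Python with an in-memory SQLite table is used here purely for illustration, and the column subset is abbreviated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE churn_feature_store (
        sms_cust_id INTEGER,
        app_id INTEGER,
        customer_tenure_days INTEGER,
        payment_failure_rate REAL,
        PRIMARY KEY (sms_cust_id, app_id)
    )
""")

UPSERT_SQL = """
    INSERT INTO churn_feature_store
        (sms_cust_id, app_id, customer_tenure_days, payment_failure_rate)
    VALUES (?, ?, ?, ?)
    ON CONFLICT(sms_cust_id, app_id) DO UPDATE SET
        customer_tenure_days = excluded.customer_tenure_days,
        payment_failure_rate = excluded.payment_failure_rate
"""

# The first batch inserts; re-running the same customers refreshes rows in
# place, which is what makes the sync command safely resumable.
conn.executemany(UPSERT_SQL, [(101, 7, 366, 0.2), (102, 7, 45, 0.0)])
conn.executemany(UPSERT_SQL, [(101, 7, 367, 0.25)])  # next day's re-sync
conn.commit()
```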

### Stage 2: Model Training (Python)
- `generate_labels.py` replays the last 6 months of history, creating observation points that record which customers churned and which stayed.
- `train_model.py` reads the feature store and the historical labels, trains the LightGBM model, saves the serialized `.pkl` artifact, and registers its evaluation metrics (accuracy, F1 score, AUC-ROC, etc.) in `churn_model_registry`.
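The metrics stored in `churn_model_registry` can be computed along these lines (standard scikit-learn metric functions; the version string, artifact path, and toy labels are illustrative placeholders, and the record keys mirror the registry columns listed earlier):

```python
from datetime import datetime, timezone
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Toy held-out labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.4, 0.8, 0.9, 0.3, 0.2]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

registry_record = {
    "model_version": "v1.0.0",             # illustrative version string
    "model_path": "models/churn_v1.pkl",   # illustrative artifact path
    "accuracy": accuracy_score(y_true, y_pred),
    "precision_score": precision_score(y_true, y_pred),
    "recall_score": recall_score(y_true, y_pred),
    "f1_score": f1_score(y_true, y_pred),
    "auc_roc": roc_auc_score(y_true, y_prob),  # AUC uses probabilities
    "is_active": True,
    "trained_at": datetime.now(timezone.utc).isoformat(),
}
```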

### Stage 3: Daily Prediction Generation (Python)
- A scheduled daily script (`predict_churn.py`) sweeps the `churn_feature_store` using the `get_db_connection` mapping.
- It scores the currently active subscriptions, producing an updated `churn_probability` and `risk_level` for each.
- Output is written into `churn_predictions`, immediately propagating the values back to the application.
- The Admin Dashboard (`AiChurnManagement`) reads directly from `churn_predictions` to build charts and list records, avoiding heavy analytical SQL during page loads and keeping dashboard response times low.
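The serving pattern in the last step amounts to cheap reads against the pre-computed table. A sketch of the kind of aggregation the dashboard can issue (in-memory SQLite and sample rows for illustration; the abbreviated columns follow the `churn_predictions` schema above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE churn_predictions (
        customer_id INTEGER, app_id INTEGER,
        churn_probability REAL, risk_level TEXT
    )
""")
conn.executemany(
    "INSERT INTO churn_predictions VALUES (?, ?, ?, ?)",
    [(1, 7, 0.92, "critical"), (2, 7, 0.71, "high"),
     (3, 7, 0.65, "high"), (4, 7, 0.12, "low")],
)

# A cheap GROUP BY on the serving table -- no joins against the
# transactional tables at page-load time.
counts = dict(conn.execute(
    "SELECT risk_level, COUNT(*) FROM churn_predictions GROUP BY risk_level"
).fetchall())
# counts -> {'critical': 1, 'high': 2, 'low': 1}
```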
