System Capabilities

PRSense is built on a layered architecture designed to give your repository a persistent memory.

Layer 1: Memory (Storage & Ingestion)

These components are responsible for capturing and storing the “state” of your engineering history.

1. Persistent Memory Store (Storage)

What it does: Stores the vectorized meaning of every PR, Issue, and Decision.
Tech: Supports PostgreSQL (Production) with pgvector for finding semantic similarities in milliseconds, and SQLite (Development) for easy setup.
Why it matters: Unlike a linter, whose findings vanish when the run ends, this memory persists across runs.

2. Semantic Indexing (Embeddings)

What it does: Converts code changes (Diffs) and natural language (Descriptions) into mathematical vectors.
Privacy First: choose where embeddings are computed.
- OpenAI: Hosted API; best for accuracy (95%).
- ONNX (Local): Runs 100% offline on your own server. No code ever leaves it.
Why it matters: Enables finding “similar intent” even if the words are completely different.
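
"Similar intent" between two embeddings is commonly measured with cosine similarity. The sketch below assumes that metric; the function name and the toy three-dimensional vectors are illustrative and not part of the PRSense API.

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Hypothetical helper for illustration only.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat similarity with a zero vector as 0 instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for real embeddings, which have hundreds of dimensions.
const fixLogin = [0.9, 0.1, 0.0];
const repairSignIn = [0.8, 0.2, 0.1];
const updateReadme = [0.0, 0.1, 0.9];

// The two login-related vectors score closer to each other than to the README one.
console.log(cosineSimilarity(fixLogin, repairSignIn) > cosineSimilarity(fixLogin, updateReadme));
```

Because the score depends only on the angle between vectors, two descriptions with different wording but similar meaning can still land close together.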

3. Cross-Repo Awareness

What it does: Connects the memory of multiple repositories (e.g., frontend, backend, microservices).
Why it matters: Detects when a change in one repo contradicts or duplicates work in another.

Layer 2: Recall (Intelligence & Search)

These components allow humans and agents to query the memory.

4. Duplicate Detection

What it does: Automatically flags incoming PRs that look like previous work.
Precision: Uses a multi-stage funnel (Bloom Filter -> Vector Search -> Reranking) to keep false positives below 5%.

5. Explainable “Why” (Score Breakdown)

What it does: It doesn’t just say “Duplicate”. It proves it.
Output: “92% Similarity: Text matches ‘Fix login’ (0.95), File overlap ‘auth.ts’ (0.80)”.
Why it matters: Builds trust with engineers. Black boxes get ignored; Explainable AI gets adopted.
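
One way to produce such a breakdown is a weighted average of per-signal scores. This is a minimal sketch, not PRSense's actual scoring code: the `Signal` shape and the weights 0.8 and 0.2 are assumptions, chosen so that the arithmetic reproduces the 92% in the example output above.

```typescript
// One signal contributing to the overall similarity score.
// Field names and weights are assumptions for illustration.
interface Signal {
  label: string;  // human-readable explanation of the signal
  score: number;  // similarity in [0, 1]
  weight: number; // relative importance; normalized below
}

// Builds the explanation string from a weighted average of the signals.
function explain(signals: Signal[]): string {
  const totalWeight = signals.reduce((sum, s) => sum + s.weight, 0);
  if (totalWeight <= 0) throw new Error("at least one signal needs a positive weight");
  const overall = signals.reduce((sum, s) => sum + s.score * s.weight, 0) / totalWeight;
  const parts = signals.map((s) => `${s.label} (${s.score.toFixed(2)})`);
  return `${Math.round(overall * 100)}% Similarity: ${parts.join(", ")}`;
}

const message = explain([
  { label: "Text matches 'Fix login'", score: 0.95, weight: 0.8 },
  { label: "File overlap 'auth.ts'", score: 0.8, weight: 0.2 },
]);
// 0.8 * 0.95 + 0.2 * 0.80 = 0.92
console.log(message);
```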

6. Semantic Search API

What it does: A natural language interface for your codebase history.
Query: “Have we ever properly fixed the race condition in the payment webhook?”
Result: “Yes, see PR #402 and PR #891.”

Layer 3: Operations (Performance & Integration)

7. The Bloom Filter Guard

What it does: A probabilistic data structure that identifies clearly unique PRs in about 2 ms, so they skip the vector search and reranking stages.
Why it matters: Adds almost no latency (about 2 ms) to 90% of your CI/CD runs.
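
For readers unfamiliar with Bloom filters, here is a minimal sketch of the data structure. It is not PRSense's implementation: the class name, the seeded FNV-1a hashing, and the string fingerprint are assumptions. The key property is that a "no" answer is definitive, while a "yes" only means "possibly seen" and must be confirmed by the later stages.

```typescript
// Minimal Bloom filter: several hash functions over a fixed-size bit array.
class BloomFilter {
  private readonly bits: Uint8Array;
  private readonly size: number;
  private readonly hashCount: number;

  constructor(size: number, hashCount: number) {
    this.size = size;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(Math.ceil(size / 8));
  }

  // FNV-1a 32-bit hash; the seed varies the starting value per hash function.
  private hash(value: string, seed: number): number {
    let h = (2166136261 ^ seed) >>> 0;
    for (let i = 0; i < value.length; i++) {
      h ^= value.charCodeAt(i);
      h = Math.imul(h, 16777619) >>> 0;
    }
    return h % this.size;
  }

  add(value: string): void {
    for (let seed = 0; seed < this.hashCount; seed++) {
      const index = this.hash(value, seed);
      this.bits[index >> 3] |= 1 << (index & 7);
    }
  }

  // false: definitely never added. true: possibly added (false positives can occur).
  mightContain(value: string): boolean {
    for (let seed = 0; seed < this.hashCount; seed++) {
      const index = this.hash(value, seed);
      if ((this.bits[index >> 3] & (1 << (index & 7))) === 0) return false;
    }
    return true;
  }
}

// Hypothetical fingerprint: a normalized PR title.
const seen = new BloomFilter(1024, 3);
seen.add("fix login redirect");
console.log(seen.mightContain("fix login redirect")); // true: added items are never missed
```

A PR whose fingerprint gets a "no" can be passed through immediately; only "possibly seen" PRs pay for the more expensive search.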

8. Batch Processing

What it does: Allows backfilling history (indexing the last 5 years of PRs) in minutes.
Why it matters: Day 1 value. You don’t have to wait for new data; you learn from the past immediately.

Advanced Configuration

Fine-tune the system for your specific needs.

9. Embedding Cache

What it does: Caches embeddings to avoid re-computing identical PRs.
Why it matters: Reduces OpenAI API costs by 90% and speeds up indexing.
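
The caching idea can be sketched as a wrapper around any embedding function. The names `Embedder`, `withCache`, and `fakeEmbed` are hypothetical, and the cache here is keyed on the exact input text and held in memory; a real deployment would likely persist it.

```typescript
type Embedder = (text: string) => Promise<number[]>;

// Wraps an embedder so identical inputs are embedded only once.
// Caching the promise (not the resolved value) also deduplicates concurrent requests.
function withCache(embed: Embedder): Embedder {
  const cache = new Map<string, Promise<number[]>>();
  return (text: string) => {
    const cached = cache.get(text);
    if (cached !== undefined) return cached;
    const pending = embed(text);
    cache.set(text, pending);
    // Do not keep failed requests in the cache, so a later call can retry.
    pending.catch(() => {
      cache.delete(text);
    });
    return pending;
  };
}

// Stand-in embedder that counts calls; a real one would call OpenAI or a local ONNX model.
let calls = 0;
const fakeEmbed: Embedder = async (text) => {
  calls++;
  return [text.length];
};

const cachedEmbed = withCache(fakeEmbed);
cachedEmbed("Fix login redirect");
cachedEmbed("Fix login redirect"); // served from the cache
cachedEmbed("Update README");
console.log(calls); // 2
```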

10. Configurable Weights

What it does: Tune the importance of Text vs. Code Diff vs. File Paths.
Why it matters: Customize detection behavior (e.g., “Ignore descriptions, focus only on code”).

11. Dry-Run Mode

What it does: Simulate detection without saving to the database.
Why it matters: Safely test configuration changes in CI/CD before deploying.

Layer 4: Application & Workflows (v1.1.0+)

Built on top of the Repository Memory, these workflows automate engineering intelligence.

12. Knowledge Graph

What it does: Maps the relationships between Authors, Files, and PRs over time.
Why it matters: Allows you to instantly query “Who owns this file?” or “What parts of the codebase does this author usually touch?” without scraping git blame output.
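
As an illustration of the "Who owns this file?" query, the sketch below builds a file-to-author map from merged PRs and treats the author with the most PRs touching a file as its owner. The data shapes, function names, and that ownership heuristic are all assumptions.

```typescript
// A merged PR reduced to the fields the ownership query needs (assumed shape).
interface MergedPr {
  author: string;
  files: string[];
}

// file -> author -> number of PRs by that author touching the file
type OwnershipGraph = Map<string, Map<string, number>>;

function buildOwnership(prs: MergedPr[]): OwnershipGraph {
  const graph: OwnershipGraph = new Map();
  for (const pr of prs) {
    for (const file of pr.files) {
      const authors = graph.get(file) ?? new Map<string, number>();
      authors.set(pr.author, (authors.get(pr.author) ?? 0) + 1);
      graph.set(file, authors);
    }
  }
  return graph;
}

// Returns the author with the most PRs touching the file, or undefined if unknown.
function ownerOf(graph: OwnershipGraph, file: string): string | undefined {
  let best: string | undefined;
  let bestCount = 0;
  const authors = graph.get(file);
  if (authors !== undefined) {
    authors.forEach((count, author) => {
      if (count > bestCount) {
        best = author;
        bestCount = count;
      }
    });
  }
  return best;
}

const graph = buildOwnership([
  { author: "alice", files: ["src/auth.ts", "src/login.ts"] },
  { author: "bob", files: ["src/auth.ts"] },
  { author: "alice", files: ["src/auth.ts"] },
]);
console.log(ownerOf(graph, "src/auth.ts")); // "alice"
```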

13. AI-Powered PR Descriptions (Local Context)

What it does: Auto-generates PR descriptions based on Diff heuristics and local embedded search of similar historical PRs.
Why it matters: Better descriptions without sending your proprietary code to a third-party LLM.

14. Custom Rules Engine

What it does: Allows defining YAML/JSON rules to block or warn on PRs (e.g. require security team review for auth/*).
Why it matters: Moves from “detection” to “enforcement” natively in the PR lifecycle.
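
To make the idea concrete, here is a sketch of how a path-based rule might be evaluated once the YAML/JSON has been parsed. The `Rule` fields and the simplified glob (where `*` matches any run of characters) are assumptions, not the PRSense rule schema.

```typescript
// Assumed rule shape, e.g. as parsed from a JSON rules file. Field names are hypothetical.
interface Rule {
  name: string;
  pathPattern: string; // simplified glob: "*" matches any run of characters
  action: "block" | "warn";
  message: string;
}

// Converts the simplified glob into an anchored regular expression.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Returns every rule triggered by at least one changed file.
function evaluate(rules: Rule[], changedFiles: string[]): Rule[] {
  return rules.filter((rule) => {
    const matcher = globToRegExp(rule.pathPattern);
    return changedFiles.some((file) => matcher.test(file));
  });
}

const rules: Rule[] = [
  {
    name: "auth-review",
    pathPattern: "auth/*",
    action: "block",
    message: "Security team review required",
  },
];
const violations = evaluate(rules, ["auth/session.ts", "README.md"]);
console.log(violations.map((r) => `${r.action}: ${r.message}`));
```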

15. Stale PR Detection

What it does: Automatically flags inactive PRs based on customizable thresholds.
Why it matters: Keeps the repository clean and ensures reviews don’t slip through the cracks.
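
A threshold check of this kind can be sketched in a few lines. The `OpenPr` shape, the function name, and the 30-day threshold are illustrative assumptions.

```typescript
interface OpenPr {
  number: number;
  lastActivity: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns PRs with no activity for at least `thresholdDays`.
function findStale(prs: OpenPr[], thresholdDays: number, now: Date = new Date()): OpenPr[] {
  const cutoff = now.getTime() - thresholdDays * DAY_MS;
  return prs.filter((pr) => pr.lastActivity.getTime() <= cutoff);
}

// A fixed "now" keeps the example deterministic.
const now = new Date("2024-06-30T00:00:00Z");
const stale = findStale(
  [
    { number: 1, lastActivity: new Date("2024-06-25T00:00:00Z") }, // 5 days idle
    { number: 2, lastActivity: new Date("2024-05-01T00:00:00Z") }, // 60 days idle
  ],
  30,
  now,
);
console.log(stale.map((pr) => pr.number)); // [2]
```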

16. Smart Triage & Auto-Labeling

What it does: Classifies incoming PRs into categories (bug, feature, refactor, docs, etc.) with confidence scores, and suggests reviewers based on file ownership history.
Why it matters: Saves maintainers 5-10 minutes per PR on manual triage. Labels are applied automatically via webhook.

17. Impact Scoring

What it does: Calculates a risk score (0-100) for each PR based on factors like files changed, lines modified, blast radius, and author experience.
Why it matters: Surfaces high-risk PRs that need extra review, preventing production incidents.
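
The source does not give the scoring formula, so the following is only a sketch of how the listed factors could be combined into a 0-100 score. The field names, saturation caps, and weights are all assumptions.

```typescript
// Inputs named after the factors listed above; units and caps are assumptions.
interface PrStats {
  filesChanged: number;
  linesModified: number;
  dependentModules: number; // stand-in for "blast radius"
  authorMergedPrs: number;  // stand-in for "author experience"
}

// Scales a value into [0, 1], saturating at `cap`.
const saturate = (value: number, cap: number): number =>
  Math.min(Math.max(value, 0) / cap, 1);

// Weighted sum of normalized factors; the weights add up to 1.
function impactScore(pr: PrStats): number {
  const risk =
    0.25 * saturate(pr.filesChanged, 50) +
    0.25 * saturate(pr.linesModified, 1000) +
    0.3 * saturate(pr.dependentModules, 20) +
    0.2 * (1 - saturate(pr.authorMergedPrs, 100)); // less experience means more risk
  return Math.round(risk * 100);
}

const smallFix = impactScore({ filesChanged: 1, linesModified: 10, dependentModules: 0, authorMergedPrs: 100 });
const bigRewrite = impactScore({ filesChanged: 50, linesModified: 1000, dependentModules: 20, authorMergedPrs: 0 });
console.log(smallFix < bigRewrite); // true
```

Saturating each factor keeps one extreme value (for example, a generated file with many lines) from dominating the score.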

18. Multi-Provider Support

What it does: Full provider abstraction supporting GitHub and GitLab out of the box.
Why it matters: PRSense works wherever your team hosts code — not locked to GitHub.

19. Notification System

What it does: Sends real-time alerts to Slack and Discord when duplicates, high-risk PRs, or rule violations are detected.
Why it matters: Teams get notified in their existing workflow tools, not just in GitHub comments.

20. Zero-Click AI Descriptions

What it does: When a PR is opened with an empty description, the webhook automatically generates one using the DescriptionGenerator and posts it as a comment.
Why it matters: Every PR gets a meaningful description with no extra effort from the developer.

Layer 5: Multi-Provider Infrastructure

v2.0.0 transformed PRSense from a GitHub-centric tool into a truly provider-agnostic platform.

21. GitLab Webhook Processing

What it does: Receives GitLab Merge Request Hook events at /api/webhook/gitlab, verifies via X-Gitlab-Token, runs the full duplicate detection pipeline, and dispatches Slack/Discord alerts.
Why it matters: GitLab teams get the same real-time duplicate detection that GitHub teams have enjoyed since v1.0.
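
GitLab sends the configured secret in the X-Gitlab-Token header and the event type in X-Gitlab-Event. The verification step might look like the sketch below; the function names are hypothetical, and the headers are assumed to arrive with lowercased names, as Node.js delivers them.

```typescript
// Compares two strings without exiting early on the first mismatch,
// so timing does not reveal how many leading characters matched.
function safeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Accepts a request only if it is a Merge Request Hook carrying the shared secret.
function isValidGitLabWebhook(
  headers: Record<string, string | undefined>,
  secret: string,
): boolean {
  const token = headers["x-gitlab-token"];
  const event = headers["x-gitlab-event"];
  return token !== undefined && safeEqual(token, secret) && event === "Merge Request Hook";
}

const accepted = isValidGitLabWebhook(
  { "x-gitlab-token": "s3cret", "x-gitlab-event": "Merge Request Hook" },
  "s3cret",
);
console.log(accepted); // true
```

On Node.js, `crypto.timingSafeEqual` is a better choice than a hand-written comparison; the inline version is used here only to keep the example dependency-free.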

22. BYOK (Bring Your Own Key)

What it does: Users supply their own OpenAI API keys via the dashboard, stored securely per-organization.
Why it matters: Removes the single biggest friction point for adoption — teams control their own costs and rate limits.

23. API Key Management

What it does: Create, list, and revoke organization-scoped API keys (prs_live_... / prs_test_...) with secure hashing.
Why it matters: Enables programmatic access to PRSense for CI/CD pipelines and custom integrations.

24. Webhook Management

What it does: CRUD endpoints for configuring Slack/Discord notification webhooks per organization, with event filtering (detection.duplicate, detection.possible, etc.).
Why it matters: Teams get full control over what triggers alerts and where they go.
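
Event filtering can be pictured as a subscription list per webhook. In this sketch the `NotificationWebhook` shape and the example.com URLs are placeholders; only the event names come from the text above.

```typescript
// Assumed shape of a per-organization notification webhook.
interface NotificationWebhook {
  url: string;
  events: string[]; // event names this webhook subscribes to
}

// Returns the URLs that should receive a given event.
function recipients(event: string, webhooks: NotificationWebhook[]): string[] {
  return webhooks
    .filter((hook) => hook.events.some((name) => name === event))
    .map((hook) => hook.url);
}

const webhooks: NotificationWebhook[] = [
  { url: "https://example.com/hooks/slack", events: ["detection.duplicate", "detection.possible"] },
  { url: "https://example.com/hooks/discord", events: ["detection.duplicate"] },
];
console.log(recipients("detection.possible", webhooks)); // ["https://example.com/hooks/slack"]
```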

Layer 6: Semantic Code Analysis & Refactoring (New in v2.1.0 and v2.2.0)

PRSense v2.1.0 and v2.2.0 mark the evolution from duplicate detection to a comprehensive Code Intelligence and Refactoring platform.

25. AST Parsing Engine

What it does: Uses the native TypeScript Compiler API to parse code into Abstract Syntax Trees, then measures cyclomatic complexity and detects duplicate logic blocks.
Why it matters: Moves beyond simple text diffs to truly understand the structure and health of your code.

26. Code Health Score

What it does: Computes a 0-100 metric for your codebase based on cyclomatic complexity and code duplication.
Why it matters: Helps engineering managers and CTOs track technical debt over time.
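
The exact formula is not stated here, so the sketch below shows one plausible way to fold the two inputs into a 0-100 score. The input names, the 50/50 split between penalties, and the complexity cap of 21 are assumptions.

```typescript
interface HealthInputs {
  averageComplexity: number; // mean cyclomatic complexity per function
  duplicationRatio: number;  // fraction of lines in duplicated blocks, in [0, 1]
}

function healthScore({ averageComplexity, duplicationRatio }: HealthInputs): number {
  // Complexity 1 (straight-line code) costs nothing; 21 or more costs the full 50 points.
  const complexityPenalty = Math.min(Math.max(averageComplexity - 1, 0) / 20, 1) * 50;
  // Duplication costs up to 50 points, linearly.
  const duplicationPenalty = Math.min(Math.max(duplicationRatio, 0), 1) * 50;
  return Math.round(100 - complexityPenalty - duplicationPenalty);
}

// Average complexity 11 costs 25 points; 20% duplication costs 10 points.
console.log(healthScore({ averageComplexity: 11, duplicationRatio: 0.2 })); // 65
```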

27. LLM Style Profiling

What it does: Analyzes your repository’s code and generates a CodebaseStyleProfile that captures your team’s naming conventions and architecture patterns.
Why it matters: PRSense enforces your specific architectural rules on future PRs, not generic rules.

28. Live Refactor Hub

What it does: Automatically analyzes your code to find technical debt and architectural flaws, then provides a “1-click PR” interface to fix them using LLMs.
Why it matters: Moves from just reporting technical debt to actively fixing it.

29. DeepSeek & OpenAI Support

What it does: Lets you run all LLM-backed features on either DeepSeek or OpenAI models, both high-quality and affordable integrations.
Why it matters: Gives you complete control over your AI infrastructure costs while maintaining top-tier accuracy.