Star on GitHub

System Capabilities

PRSense is built on a layered architecture designed to give your repository a persistent memory.

Layer 1: Memory (Storage & Ingestion)

These components are responsible for capturing and storing the “state” of your engineering history.

1. Persistent Memory Store (Storage)

  • What it does: Stores the vectorized meaning of every PR, Issue, and Decision.
  • Tech: Supports PostgreSQL (Production) with pgvector for finding semantic similarities in milliseconds, and SQLite (Development) for easy setup.
  • Why it matters: Unlike a linter that runs and dies, this memory persists forever.

2. Semantic Indexing (Embeddings)

  • What it does: Converts code changes (Diffs) and natural language (Descriptions) into mathematical vectors.
  • Privacy First:
    • OpenAI: Best for accuracy (95%).
    • ONNX (Local): Runs 100% offline on your machine. No code ever leaves your server.
  • Why it matters: Enables finding “similar intent” even if the words are completely different.

3. Cross-Repo Awareness

  • What it does: Connects the memory of multiple repositories (e.g., frontend, backend, microservices).
  • Why it matters: Detects when a change in one repo contradicts or duplicates work in another.

These components allow humans and agents to query the memory.

4. Duplicate Detection

  • What it does: Automatically flags incoming PRs that look like previous work.
  • Precision: Uses a multi-stage funnel (Bloom Filter -> Vector Search -> Reranking) to ensure <5% false positives.

5. Explainable “Why” (Score Breakdown)

  • What it does: It doesn’t just say “Duplicate”. It proves it.
  • Output: “92% Similarity: Text matches ‘Fix login’ (0.95), File overlap ‘auth.ts’ (0.80)”.
  • Why it matters: Builds trust with engineers. Black boxes get ignored; Explainable AI gets adopted.

6. Semantic Search API

  • What it does: A natural language interface for your codebase history.
  • Query: “Have we ever properly fixed the race condition in the payment webhook?”
  • Result: “Yes, see PR #402 and PR #891.”

Layer 3: Operations (Performance & Integration)

7. The Bloom Filter Guard

  • What it does: A probabilistic data structure that instantly rejects (in 2ms) any unique PRs.
  • Why it matters: Ensures the system adds zero latency to 90% of your CI/CD runs.

8. Batch Processing

  • What it does: Allows backfilling history (indexing the last 5 years of PRs) in minutes.
  • Why it matters: Day 1 value. You don’t have to wait for new data; you learn from the past immediately.

Advanced Configuration

Fine-tune the system for your specific needs.

9. Embedding Cache

  • What it does: Caches embeddings to avoid re-computing identical PRs.
  • Why it matters: Reduces OpenAI API costs by 90% and speeds up indexing.

10. Configurable Weights

  • What it does: Tune the importance of Text vs. Code Diff vs. File Paths.
  • Why it matters: Customize detection behavior (e.g., “Ignore descriptions, focus only on code”).

11. Dry-Run Mode

  • What it does: Simulate detection without saving to the database.
  • Why it matters: Safely test configuration changes in CI/CD before deploying.

Layer 4: Application & Workflows (v1.1.0+)

Built on top of the Repository Memory, these workflows automate engineering intelligence.

12. Knowledge Graph

  • What it does: Maps the relationships between Authors, Files, and PRs over time.
  • Why it matters: Allows you to instantly query “Who owns this file?” or “What parts of the codebase does this author usually touch?” without scraping git blames.

13. AI-Powered PR Descriptions (Local Context)

  • What it does: Auto-generates PR descriptions based on Diff heuristics and local embedded search of similar historical PRs.
  • Why it matters: Better descriptions without sending your proprietary code to a 3rd party LLM.

14. Custom Rules Engine

  • What it does: Allows defining YAML/JSON rules to block or warn on PRs (e.g. require security team review for auth/*).
  • Why it matters: Moves from “detection” to “enforcement” natively in the PR lifecycle.

15. Stale PR Detection

  • What it does: Automatically flags inactive PRs based on customizable thresholds.
  • Why it matters: Keeps the repository clean and ensures reviews don’t slip through the cracks.

16. Smart Triage & Auto-Labeling

  • What it does: Classifies incoming PRs into categories (bug, feature, refactor, docs, etc.) with confidence scores, and suggests reviewers based on file ownership history.
  • Why it matters: Saves maintainers 5-10 minutes per PR on manual triage. Labels applied automatically via webhook.

17. Impact Scoring

  • What it does: Calculates a risk score (0-100) for each PR based on factors like files changed, lines modified, blast radius, and author experience.
  • Why it matters: Surfaces high-risk PRs that need extra review, preventing production incidents.

18. Multi-Provider Support

  • What it does: Full provider abstraction supporting GitHub, GitLab, and Bitbucket out of the box.
  • Why it matters: PRSense works wherever your team hosts code — not locked to GitHub.

19. Notification System

  • What it does: Sends real-time alerts to Slack and Discord when duplicates, high-risk PRs, or rule violations are detected.
  • Why it matters: Teams get notified in their existing workflow tools, not just in GitHub comments.

20. Zero-Click AI Descriptions

  • What it does: When a PR is opened with an empty description, the webhook automatically generates one using the DescriptionGenerator and posts it as a comment.
  • Why it matters: Every PR gets a meaningful description — no developer friction required.

Layer 5: Multi-Provider Infrastructure (New in v2.0.0)

v2.0.0 transforms PRSense from a GitHub-centric tool into a truly provider-agnostic platform.

21. GitLab Webhook Processing

  • What it does: Receives GitLab Merge Request Hook events at /api/webhook/gitlab, verifies via X-Gitlab-Token, runs the full duplicate detection pipeline, and dispatches Slack/Discord alerts.
  • Why it matters: GitLab teams get the same real-time duplicate detection that GitHub teams have enjoyed since v1.0.

22. Bitbucket Webhook Processing

  • What it does: Receives Bitbucket pullrequest:created and pullrequest:updated events at /api/webhook/bitbucket, runs detection, stores results, and dispatches alerts.
  • Why it matters: Bitbucket Cloud teams can now use PRSense natively without any adapters or bridges.

23. BYOK (Bring Your Own Key)

  • What it does: Users supply their own OpenAI API keys via the dashboard, stored securely per-organization.
  • Why it matters: Removes the single biggest friction point for adoption — teams control their own costs and rate limits.

24. API Key Management

  • What it does: Create, list, and revoke organization-scoped API keys (prs_live_... / prs_test_...) with secure hashing.
  • Why it matters: Enables programmatic access to PRSense for CI/CD pipelines and custom integrations.

25. Webhook Management

  • What it does: CRUD endpoints for configuring Slack/Discord notification webhooks per organization, with event filtering (detection.duplicate, detection.possible, etc.).
  • Why it matters: Teams get full control over what triggers alerts and where they go.