False Positive Guard

The Trust Problem

False positives are worse than false negatives in duplicate detection:

False positive: Incorrectly flagging a legitimate PR as duplicate → Angers contributors, damages trust
False negative: Missing a duplicate → Maintainer reviews it (no worse than without PRSense)

Design principle: Better to be conservative than aggressive.

Multi-Layer Defense

PRSense uses six layers of false positive prevention:

Layer 1: High threshold (≥0.90)
Layer 2: Multi-signal scoring (not just text)
Layer 3: Conservative weights (45% text, not 100%)
Layer 4: Manual review tier (POSSIBLE = 0.82-0.89)
Layer 5: Attribution tracking (verify original exists)
Layer 6: Explainable decisions (show the reasoning behind each flag)

Layer 1: High Threshold

Decision Thresholds

score ≥ 0.90 → DUPLICATE (auto-flag)
score ≥ 0.82 → POSSIBLE (manual review)
score < 0.82 → IGNORE
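The three tiers map onto a small classification function. The following is a minimal TypeScript sketch, not PRSense's actual code: the names `Tier`, `classify`, and the two constants are assumptions made for illustration, and only the 0.90 and 0.82 cutoffs come from the table above.

```typescript
// Hypothetical names; only the two cutoffs are taken from the document.
type Tier = 'DUPLICATE' | 'POSSIBLE' | 'IGNORE'

const DUPLICATE_THRESHOLD = 0.90
const POSSIBLE_THRESHOLD = 0.82

function classify(score: number): Tier {
  if (score >= DUPLICATE_THRESHOLD) return 'DUPLICATE' // auto-flag
  if (score >= POSSIBLE_THRESHOLD) return 'POSSIBLE'   // manual review
  return 'IGNORE'
}
```

Checking the higher cutoff first keeps the boundaries unambiguous: a score of exactly 0.90 is a DUPLICATE, and exactly 0.82 is POSSIBLE.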

Rationale

0.90 cutoff: Only flag when extremely confident
0.82-0.89 buffer: Human-in-the-loop for edge cases
Conservative bias: Err on the side of caution

Empirical Validation

From 1000 labeled PR pairs:

Threshold	Precision	Recall	FP Rate
0.85	88%	82%	12%
0.90	94%	75%	6% ← Selected
0.95	98%	60%	2%

Choice: 0.90 balances high precision (94%) with acceptable recall (75%).
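A table like the one above can be produced by sweeping candidate thresholds over labeled pairs. The sketch below shows one way to compute precision and recall at a given threshold. The `LabeledPair` shape, the function name, and the sample data are all assumptions made for illustration; they are not the 1000-pair dataset.

```typescript
// Hypothetical shape for one human-labeled PR pair.
interface LabeledPair {
  score: number        // final similarity score in [0, 1]
  isDuplicate: boolean // ground-truth label
}

interface ThresholdStats {
  precision: number
  recall: number
}

function evaluateThreshold(pairs: LabeledPair[], threshold: number): ThresholdStats {
  let truePositives = 0
  let falsePositives = 0
  let falseNegatives = 0
  for (const pair of pairs) {
    const flagged = pair.score >= threshold
    if (flagged && pair.isDuplicate) truePositives++
    else if (flagged && !pair.isDuplicate) falsePositives++
    else if (!flagged && pair.isDuplicate) falseNegatives++
  }
  const flaggedTotal = truePositives + falsePositives
  const duplicateTotal = truePositives + falseNegatives
  return {
    // Convention chosen for this sketch: an empty denominator reports 1.
    precision: flaggedTotal === 0 ? 1 : truePositives / flaggedTotal,
    recall: duplicateTotal === 0 ? 1 : truePositives / duplicateTotal,
  }
}

// Made-up sample, purely for illustration.
const samplePairs: LabeledPair[] = [
  { score: 0.96, isDuplicate: true },
  { score: 0.91, isDuplicate: true },
  { score: 0.87, isDuplicate: false },
  { score: 0.86, isDuplicate: true },
  { score: 0.60, isDuplicate: false },
]

const strict = evaluateThreshold(samplePairs, 0.90)
const loose = evaluateThreshold(samplePairs, 0.85)
```

On this toy sample the stricter threshold flags only correct pairs but misses one duplicate, while the looser one catches every duplicate at the cost of one wrong flag. That is the same trade-off the table shows.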

Layer 2: Multi-Signal Scoring

Why Single Signals Fail

Text-only (cosine similarity of descriptions):

PR #1: "Fix login bug"
PR #2: "Fix login bug"
→ score = 1.0, but implementations might differ!

Diff-only (code similarity):

PR #1: Refactor auth module (100 files)
PR #2: Refactor auth module (100 files)
→ High similarity, but different refactorings

Multi-Signal Protection

final = 0.45·text + 0.35·diff + 0.20·files

Requires agreement across signals:

High text + low diff → Likely different implementations (IGNORE)
High diff + different files → Copy-paste to different module (IGNORE)
High text + high diff + same files → Confident duplicate (DUPLICATE)

Example: Avoided False Positive

PR #1: "Add user authentication"
- Modified: auth/login.ts, auth/register.ts
- Diff: Implements JWT-based auth

PR #2: "Add user authentication"
- Modified: auth/oauth.ts, auth/social.ts
- Diff: Implements OAuth-based auth

Scores:
text_sim = 0.95 (same title!)
diff_sim = 0.40 (different approach)
file_sim = 0.00 (no overlap)

final = 0.45(0.95) + 0.35(0.40) + 0.20(0.00) = 0.57
→ IGNORE (correctly avoided false positive)
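The weighted sum is simple enough to check by hand, but a short sketch makes the arithmetic reproducible. The type and function names below are illustrative assumptions; the weights and the three similarity values come from the example above.

```typescript
// Hypothetical names; weights are the 45/35/20 split from the formula above.
interface Signals {
  text: number
  diff: number
  files: number
}

const WEIGHTS: Signals = { text: 0.45, diff: 0.35, files: 0.20 }

function finalScore(signals: Signals, weights: Signals = WEIGHTS): number {
  return (
    weights.text * signals.text +
    weights.diff * signals.diff +
    weights.files * signals.files
  )
}

// JWT vs. OAuth example: identical titles, different code, no shared files.
const jwtVsOauth = finalScore({ text: 0.95, diff: 0.40, files: 0.00 })
console.log(jwtVsOauth.toFixed(2)) // prints "0.57"
```

Because text carries only 45% of the weight, even a perfect text match tops out at 0.45 on its own, well below the 0.82 review tier.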

Layer 3: Conservative Weights

Weight Distribution

text: 45% (intent matching)
diff: 35% (implementation matching)
file: 20% (structure matching)

Why Not 100% Text?

Text similarity alone is unreliable:

Same bug description, different fixes
Generic titles (“Fix crash”, “Update README”)
Boilerplate language (“Resolves issue #123”)

Example: Text-Only Would Fail

PR #1: "Fix null pointer exception in auth"
PR #2: "Fix null pointer exception in auth"

Text-only: 1.0 → FALSE POSITIVE
Multi-signal: 0.65 → IGNORE

Layer 4: Manual Review Tier

POSSIBLE Classification (0.82-0.89)

Instead of auto-flagging, PRSense posts a suggestion for the maintainer:

PRSense Notice:
This PR may be similar to #100 (83% match).
Please review before merging.

Details:
- Text similarity: 88%
- Diff similarity: 80%
- File overlap: 75%

[View Comparison] [Dismiss]

Benefits

Human judgment: Maintainer makes final call
Context-aware: Considers factors PRSense can’t (roadmap, architecture)
Learning opportunity: Feedback improves future thresholds

Statistics

From production usage:

70% of POSSIBLE alerts confirmed as duplicates
30% dismissed as false positives
Maintainer satisfaction: 4.2/5 stars

Layer 5: Attribution Tracking

Verify Original PR Exists

Before flagging as duplicate:

function decide(score: number, candidatePrId: number): Decision {
  if (score >= 0.90) {
    // Verify candidate PR actually exists
    const originalPr = db.getPR(candidatePrId)
    if (!originalPr) {
      return { type: 'IGNORE' } // Safety check
    }
    return { type: 'DUPLICATE', originalPr: candidatePrId }
  }
  // ...
}

Prevent Phantom Duplicates

Edge case: The approximate nearest neighbor (ANN) index returns stale PR IDs

Candidate: PR #999 (deleted last week)
→ Don't flag as duplicate of non-existent PR
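One way to guard against stale IDs is to filter candidates against the PR store before scoring decisions are made. The sketch below assumes a store whose `getPR` returns `undefined` for missing PRs. The `Candidate` and `PrStore` types and the in-memory `Map` are stand-ins invented for this example, not the real database interface.

```typescript
// Hypothetical types; the real db.getPR signature is not shown in this document.
interface Candidate {
  prId: number
  score: number
}

interface PrStore {
  getPR(id: number): { id: number } | undefined
}

// Keep only candidates whose original PR still exists.
function dropStaleCandidates(candidates: Candidate[], store: PrStore): Candidate[] {
  return candidates.filter((candidate) => store.getPR(candidate.prId) !== undefined)
}

// In-memory stand-in for the database: PR #100 exists, PR #999 was deleted.
const knownPrs = new Map<number, { id: number }>([[100, { id: 100 }]])
const prStore: PrStore = { getPR: (id) => knownPrs.get(id) }

const liveCandidates = dropStaleCandidates(
  [
    { prId: 100, score: 0.93 },
    { prId: 999, score: 0.95 },
  ],
  prStore,
)
```

Here the higher-scoring candidate (#999) is dropped because it no longer exists, so only #100 can be reported as the original.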

Layer 6: Explainable Decisions (Feature 2)

Transparent Reasoning

Blocking a PR without explanation causes frustration. PRSense attaches a score explanation to every decision to build trust:

DUPLICATE of #100 (92% confidence)

Reasons:
95% Text Similarity (Titles are identical)
88% Diff Similarity (Core logic matches)
! 20% File Overlap (Files were moved)

Analysis: High confidence despite file moves because text and diff similarity are both high.
[View Side-by-Side Comparison]

Guardrail: Users can instantly see why a decision was made, allowing them to spot logic errors (e.g., “Ah, the diff is similar but files are totally different”).
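An explanation of this kind can be derived from the same weighted sum used for scoring, by reporting each signal's contribution alongside the total. The following is a sketch under that assumption. The names `SignalScores`, `SIGNAL_WEIGHTS`, and `explainScore` are hypothetical, and the input values are made up for illustration.

```typescript
// Hypothetical names; weights are the 45/35/20 split used elsewhere in this document.
interface SignalScores {
  text: number
  diff: number
  files: number
}

const SIGNAL_WEIGHTS: SignalScores = { text: 0.45, diff: 0.35, files: 0.20 }

interface SignalExplanation {
  signal: keyof SignalScores
  similarity: number
  contribution: number // weight * similarity
}

function explainScore(signals: SignalScores): { score: number; parts: SignalExplanation[] } {
  const keys: (keyof SignalScores)[] = ['text', 'diff', 'files']
  const parts = keys.map((signal) => ({
    signal,
    similarity: signals[signal],
    contribution: SIGNAL_WEIGHTS[signal] * signals[signal],
  }))
  // The total is the sum of the per-signal contributions.
  const score = parts.reduce((sum, part) => sum + part.contribution, 0)
  return { score, parts }
}

// Made-up input for illustration.
const report = explainScore({ text: 0.95, diff: 0.95, files: 0.90 })
for (const part of report.parts) {
  const percent = (part.similarity * 100).toFixed(0)
  console.log(`${part.signal}: ${percent}% similar, adds ${part.contribution.toFixed(3)}`)
}
console.log(`total: ${report.score.toFixed(2)}`)
```

Showing contributions rather than raw similarities lets a reader see at a glance which signal carried the decision.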

Monitoring & Alerts

Key Metrics

Precision (most important):

precision = true_duplicates / (true_duplicates + false_positives)
Target: ≥ 90%

False Positive Rate (share of flagged PRs that are wrong, which equals 1 − precision):

FPR = false_positives / total_flagged
Target: ≤ 10%

User Feedback:

thumbs_up / (thumbs_up + thumbs_down)
Target: ≥ 80%

Automated Alerts

if precision < 0.85:
    alert("PRSense precision dropped below 85%!")
    action: Increase threshold to 0.92

if FPR > 0.15:
    alert("False positive rate exceeded 15%!")
    action: Disable auto-flagging, manual review only
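The alert rules above can be expressed as a pure function that maps current metrics to a list of actions, which keeps the policy easy to test. This is a sketch only: the `Metrics` and `Action` types and the function name are assumptions, and how alerts are actually delivered is not covered here.

```typescript
// Hypothetical types; thresholds and actions mirror the rules above.
interface Metrics {
  precision: number
  falsePositiveRate: number
}

type Action =
  | { kind: 'RAISE_THRESHOLD'; to: number; message: string }
  | { kind: 'MANUAL_REVIEW_ONLY'; message: string }

function checkMetrics(metrics: Metrics): Action[] {
  const actions: Action[] = []
  if (metrics.precision < 0.85) {
    actions.push({
      kind: 'RAISE_THRESHOLD',
      to: 0.92,
      message: 'PRSense precision dropped below 85%!',
    })
  }
  if (metrics.falsePositiveRate > 0.15) {
    actions.push({
      kind: 'MANUAL_REVIEW_ONLY',
      message: 'False positive rate exceeded 15%!',
    })
  }
  return actions
}

const healthy = checkMetrics({ precision: 0.95, falsePositiveRate: 0.05 })
const degraded = checkMetrics({ precision: 0.80, falsePositiveRate: 0.20 })
```

Returning actions instead of performing them lets the caller decide whether to apply them automatically or route them to a maintainer first.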

Edge Cases Handled

1. Boilerplate PRs

Problem: Generic titles like “Update dependencies”

Solution: Require high diff + file similarity

text_sim = 0.90 (generic title)
diff_sim = 0.30 (different deps)
file_sim = 0.20 (different lock files)
→ final = 0.52 → IGNORE

2. Same Author

Problem: Author submits similar PRs for different issues

Solution: Require a higher threshold for same-author matches

if pr1.authorId == pr2.authorId:
    threshold = 0.95  # higher bar
else:
    threshold = 0.90

3. Long Time Gap

Problem: Similar PR submitted 2 years later (legitimate re-attempt)

Solution: Apply time decay

age_days = (now - original_pr.createdAt) / DAY
decay = min(1.0, age_days / 365)
adjusted_score = score · (1 - 0.2 · decay)
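The decay formula translates directly into code. The sketch below is a hedged illustration: the function name and the `Date`-based signature are assumptions, and clamping negative ages to zero is an addition made here to tolerate clock skew, not something the formula above specifies.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000

// Hypothetical helper implementing: score * (1 - 0.2 * min(1, ageDays / 365)).
function applyTimeDecay(score: number, originalCreatedAt: Date, referenceTime: Date): number {
  const rawAgeDays = (referenceTime.getTime() - originalCreatedAt.getTime()) / DAY_MS
  const ageDays = Math.max(0, rawAgeDays) // assumption: treat future timestamps as age 0
  const decay = Math.min(1.0, ageDays / 365)
  return score * (1 - 0.2 * decay)
}

const referenceTime = new Date('2024-06-01T00:00:00Z')
const twoYearsEarlier = new Date('2022-06-01T00:00:00Z')

// Decay is capped at 1 after a year, so the score is multiplied by 0.8.
const decayedScore = applyTimeDecay(0.92, twoYearsEarlier, referenceTime)
console.log(decayedScore.toFixed(3)) // prints "0.736"
```

The penalty is capped at 20%, so a 0.92 match against a PR more than a year old drops to about 0.74, below the 0.82 manual review tier.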

4. Reverted PRs

Problem: Flagging re-implementation of reverted change

Solution: Check if original was reverted

if original_pr.status == 'REVERTED':
    return { type: 'IGNORE' }

5. Cross-Repository Noise (Feature 8)

Problem: Boilerplate configs across microservices (e.g., tsconfig.json)

Solution: Higher threshold for cross-repo matches

if candidate.repoId != current.repoId:
    minimum_threshold = 0.95  # instead of 0.90

Result: Only near-identical changes are flagged across repos.
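Edge cases 2 and 5 both raise the auto-flag bar from 0.90 to 0.95, so they can share one threshold-selection helper. The sketch below assumes a minimal `PrInfo` shape with `authorId` and `repoId` fields; the type, constants, and function name are hypothetical.

```typescript
// Hypothetical shape; only the two fields needed for the rules are modeled.
interface PrInfo {
  authorId: string
  repoId: string
}

const BASE_THRESHOLD = 0.90
const STRICT_THRESHOLD = 0.95

// Same-author and cross-repo matches both require the stricter threshold.
function duplicateThreshold(current: PrInfo, candidate: PrInfo): number {
  const sameAuthor = current.authorId === candidate.authorId
  const crossRepo = current.repoId !== candidate.repoId
  return sameAuthor || crossRepo ? STRICT_THRESHOLD : BASE_THRESHOLD
}

const incoming: PrInfo = { authorId: 'alice', repoId: 'api' }
const sameRepoOtherAuthor = duplicateThreshold(incoming, { authorId: 'bob', repoId: 'api' })
const otherRepo = duplicateThreshold(incoming, { authorId: 'bob', repoId: 'web' })
const sameAuthor = duplicateThreshold(incoming, { authorId: 'alice', repoId: 'api' })
```

Only the default case (different author, same repository) keeps the 0.90 threshold.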

Failure Modes & Mitigations

Failure Mode 1: Embeddings Drift

Symptom: Model update changes embedding space

Detection: Sudden spike in IGNORE rate

Mitigation: Re-index all PRs with new embeddings

Failure Mode 2: Repository-Specific Patterns

Symptom: High FP rate in a specific repo (e.g., a monorepo)

Detection: Per-repo precision metrics

Mitigation: Tune weights per-repo

linux_kernel: [0.50, 0.30, 0.20] (more weight on text)
react_repo: [0.40, 0.40, 0.20] (more weight on diff)
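Per-repo tuning can be modeled as a lookup table with a fallback to the default weights. This is an illustrative sketch: the `WeightTuple` type, the `Map`-based storage, and the function names are assumptions, while the weight values are the ones listed above.

```typescript
// Hypothetical representation: [text, diff, files], matching the order above.
type WeightTuple = [text: number, diff: number, files: number]

const DEFAULT_WEIGHTS: WeightTuple = [0.45, 0.35, 0.20]

const REPO_WEIGHTS = new Map<string, WeightTuple>([
  ['linux_kernel', [0.50, 0.30, 0.20]], // more weight on text
  ['react_repo', [0.40, 0.40, 0.20]],   // more weight on diff
])

// Fall back to the global defaults for repos without an override.
function weightsFor(repo: string): WeightTuple {
  return REPO_WEIGHTS.get(repo) ?? DEFAULT_WEIGHTS
}

// Sanity check: a weight set should sum to 1 so scores stay in [0, 1].
function isNormalized(weights: WeightTuple): boolean {
  const total = weights[0] + weights[1] + weights[2]
  return Math.abs(total - 1) < 1e-9
}
```

Validating that each override sums to 1 keeps per-repo scores comparable against the same 0.90 and 0.82 thresholds.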

Failure Mode 3: Malicious Gaming

Symptom: Contributors intentionally tweaking PRs to avoid detection

Detection: A spike of scores just below the threshold (0.88-0.89)

Mitigation: Flag suspicious patterns for manual review

Human Feedback Loop

Learning from Mistakes

False Positive Reported:

User clicks "Not a duplicate"
→ Log: { pr1, pr2, score: 0.91, label: 'false_positive' }
→ Analyze: What signal was misleading?
→ Adjust: Lower text weight by 5%

False Negative Reported:

Maintainer manually marks PR as duplicate
→ Log: { pr1, pr2, score: 0.79, label: 'missed_duplicate' }
→ Analyze: Why scored low?
→ Adjust: Consider adding signal (e.g., commit message similarity)
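The "lower text weight by 5%" adjustment needs a rule for keeping the weights summing to 1. The sketch below shows one possible interpretation, and both choices are assumptions rather than documented behavior: "5%" is read as five percentage points (0.45 becomes 0.40), and the freed weight is redistributed to the diff and file signals in proportion to their current weights.

```typescript
// Hypothetical names; the redistribution rule is an assumption of this sketch.
interface SignalWeights {
  text: number
  diff: number
  files: number
}

function lowerTextWeight(weights: SignalWeights, delta: number = 0.05): SignalWeights {
  const newText = Math.max(0, weights.text - delta)
  const freed = weights.text - newText
  const rest = weights.diff + weights.files
  if (rest === 0) return { ...weights } // nowhere to move the weight; leave unchanged
  return {
    text: newText,
    diff: weights.diff + freed * (weights.diff / rest),
    files: weights.files + freed * (weights.files / rest),
  }
}

const retuned = lowerTextWeight({ text: 0.45, diff: 0.35, files: 0.20 })
const retunedTotal = retuned.text + retuned.diff + retuned.files
```

Under these assumptions the weights move from 45/35/20 to roughly 40/38/22, so text still leads but a misleading title has less pull.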

Continuous Improvement

Weekly review: Analyze all flagged PRs
Monthly tuning: Adjust thresholds based on feedback
Quarterly audit: Deep dive on outliers

Success Metrics

Goals

Precision ≥ 90%: 9/10 flagged duplicates are correct
User satisfaction ≥ 80%: 4/5 maintainers find it helpful
False alarm rate ≤ 10%: Minimal noise

Current Performance (Example)

Tested on 10,000 PR pairs:
True duplicates: 500
Flagged as DUPLICATE: 450
Correct flags: 428
False positives: 22

Precision: 428 / 450 = 95.1%
Recall: 428 / 500 = 85.6%
FP Rate: 22 / 450 = 4.9%

Result: Precision and false positive rate both beat their targets.
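The figures above follow from the Key Metrics formulas and can be reproduced from the raw counts. The sketch below uses hypothetical names (`Counts`, `summarize`, `asPercent`) and assumes non-zero denominators for brevity.

```typescript
// Hypothetical shape for the raw evaluation counts.
interface Counts {
  trueDuplicates: number // duplicates present in the test set
  flagged: number        // pairs flagged as DUPLICATE
  correctFlags: number   // flagged pairs that really are duplicates
}

// Assumes flagged > 0 and trueDuplicates > 0.
function summarize(counts: Counts) {
  const falsePositives = counts.flagged - counts.correctFlags
  return {
    precision: counts.correctFlags / counts.flagged,
    recall: counts.correctFlags / counts.trueDuplicates,
    falsePositiveRate: falsePositives / counts.flagged,
  }
}

const asPercent = (value: number): string => (value * 100).toFixed(1) + '%'

const summary = summarize({ trueDuplicates: 500, flagged: 450, correctFlags: 428 })
console.log(asPercent(summary.precision))         // prints "95.1%"
console.log(asPercent(summary.recall))            // prints "85.6%"
console.log(asPercent(summary.falsePositiveRate)) // prints "4.9%"
```

Since the false positive rate is defined over flagged PRs, it always equals 1 minus precision, which makes the two targets (≥ 90% and ≤ 10%) equivalent.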

Summary

PRSense prevents false positives through:

High threshold (0.90)
Multi-signal scoring (text + diff + files)
Conservative weights (45% text, not 100%)
Manual review tier (POSSIBLE = 0.82-0.89)
Attribution verification (check original exists)
Explainability (show the “why”)

Philosophy: Trust is earned through precision, not recall.