Historical Storage Guide

Track security scans over time for trend analysis, regression detection, and compliance reporting.

Overview

The Historical Storage feature stores scan results in a local SQLite database, enabling:

Trend Analysis: Track finding counts over time (critical/high/medium/low)
Regression Detection: Compare current scan with previous runs
Compliance Reporting: Prove security posture improvements over time
Dashboard Data Layer: Future support for time-series visualizations
Multi-Branch Tracking: Compare security across dev/staging/prod branches

Key Features:

SQLite Database: Zero-configuration, file-based storage (.jmo/history.db)
Full Finding History: Stores all CommonFinding v1.2.0 fields with compliance mappings
Automatic Aggregation: Severity counts updated via database triggers
Git Integration: Tracks commit hash, branch, tag, dirty status
CI/CD Metadata: Captures CI provider, build ID, environment variables
Multi-Target Support: Works across all 6 target types (repos, images, IaC, URLs, GitLab, K8s)

Quick Start

Auto-store during scan (recommended):

# Store scan results automatically after completion
jmo scan --repo ./myapp --profile balanced --store-history

# Custom database location
jmo scan --repo ./myapp --profile balanced --store-history --history-db ./scans.db

Manual storage after scanning:

# Run scan first
jmo scan --repo ./myapp --profile balanced --results-dir ./results

# Store results manually
jmo history store --results-dir ./results --profile balanced

# Specify database path
jmo history store --results-dir ./results --profile balanced --db ./scans.db

Security & Privacy Features

The historical storage system includes comprehensive security and privacy controls designed with a privacy-first, defense-in-depth approach.

Privacy-First Defaults

By default, JMo does NOT collect hostname or username metadata. This ensures your personal information stays private without requiring any configuration.

Default Behavior:

CI metadata collected (ci_provider, ci_build_id, ci_run_number) - Non-PII, useful for tracking builds
Hostname NOT collected - Your machine name stays private
Username NOT collected - Your OS username stays private

Opt-In to Metadata Collection:

If you want to track which machine ran scans (useful for multi-developer teams or debugging), use the --collect-metadata flag:

# Opt-in to hostname/username collection
jmo scan --repo ./myapp --store-history --collect-metadata

When to use --collect-metadata:

Team environments where you want to track which developer's machine ran the scan
Debugging scan environment differences (production vs staging builders)
Compliance requirements to log scan operator identity

When to skip --collect-metadata (default):

Personal projects where privacy matters
Shared CI/CD runners where hostname/username is meaningless
Any scenario where you don't want PII in the database

CI Metadata (Always Collected):

CI metadata is always collected because it's non-PII and critical for tracking build context:

ci_provider: Detected CI system (github-actions, gitlab-ci, jenkins, etc.)
ci_build_id: Build number or job ID
ci_run_number: Run attempt number (for retries)

Secret Redaction

Automatic secret redaction prevents sensitive data from being stored in the history database.

How it works:

Secret scanners (trufflehog, noseyparker, semgrep-secrets) automatically redact sensitive fields
Redacted fields: Raw, RawV2, snippet, lines
Non-secret scanners (trivy, semgrep, checkov) retain full raw data
Redaction is automatic and always enabled for secret scanners

Example:

# Trufflehog finding BEFORE storage (in findings.json)
{
  "id": "abc123...",
  "ruleId": "aws-access-key",
  "raw": {
    "Raw": "AKIAIOSFODNN7EXAMPLE",  # Actual secret exposed
    "RawV2": "arn:aws:iam::123456789012:user/example"
  }
}

# AFTER storage in history.db (raw_finding column)
{
  "id": "abc123...",
  "ruleId": "aws-access-key",
  "raw": {
    "Raw": "[REDACTED]",  # Secret removed
    "RawV2": "[REDACTED]"
  }
}

Disable raw finding storage entirely:

If you want maximum privacy and don't need raw finding data in the history database, use --no-store-raw-findings:

# Don't store ANY raw finding data in history database
jmo scan --repo ./myapp --store-history --no-store-raw-findings

When to use --no-store-raw-findings:

Compliance requirements prohibit storing any potential secret data
Database size is a concern (reduces storage by ~40%)
You only need aggregate statistics (severity counts, trends)

Database Schema Impact:

The findings.raw_finding column is nullable
Redacted secrets show "[REDACTED]" placeholder
--no-store-raw-findings stores NULL in raw_finding column

Finding Data Encryption

Encrypt raw finding data using Fernet symmetric encryption for defense-in-depth security.

Setup:

# 1. Generate encryption key (32-byte base64-encoded)
export JMO_ENCRYPTION_KEY=$(python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")

# 2. Run scan with encryption enabled
jmo scan --repo ./myapp --store-history --encrypt-findings

# 3. Key is persisted in environment or CI secrets
echo $JMO_ENCRYPTION_KEY  # Save this securely!

Key Requirements:

Environment variable: JMO_ENCRYPTION_KEY must be set
Key format: Base64-encoded, minimum 32 bytes
Key derivation: SHA-256 hashing ensures proper Fernet key length

Error if key missing:

jmo scan --repo ./myapp --store-history --encrypt-findings
# ERROR: JMO_ENCRYPTION_KEY environment variable not set. Required for --encrypt-findings.

What gets encrypted:

Only the raw_finding column in the findings table
Metadata (severity, rule_id, file paths, timestamps) remains unencrypted for querying
Encryption uses Fernet symmetric encryption (AES-128-CBC with HMAC authentication)

Decryption on retrieval:

# Export JMO_ENCRYPTION_KEY in your shell
export JMO_ENCRYPTION_KEY="your-key-here"

# Findings are automatically decrypted when queried
jmo history query --severity CRITICAL

# JSON export also decrypts findings
jmo history export scan-report.json --scan-id abc123 --include-findings

When to use --encrypt-findings:

Shared databases (multi-user access, need to protect raw findings)
Compliance requirements for encryption at rest
Defense-in-depth strategy (redaction + encryption + file permissions)

When to skip --encrypt-findings:

Single-user local databases (file permissions sufficient)
Performance-critical environments (encryption adds ~10-20ms overhead)
Key management complexity outweighs benefits

File Permissions Hardening

Automatic file permissions ensure only the database owner can read/write the history database.

Default Behavior (Unix/Linux/macOS):

# After first scan with --store-history
ls -la .jmo/history.db
# -rw------- 1 user user 2.4M Nov 04 14:30 .jmo/history.db
#  ^^^ Owner-only read/write (0o600)

File Permissions:

Unix/Linux/macOS: 0o600 (owner read/write, no group/other access)
Windows: NTFS permissions applied (owner full control)
Enforcement: Applied automatically on database creation and connection

Security Benefits:

Prevents other users on shared systems from reading scan results
Protects against accidental disclosure if database copied
Complements encryption (defense-in-depth)

Override (NOT recommended):

File permissions are enforced automatically and cannot be disabled. If you need to share the database, use proper access controls:

# BAD: Weakening permissions
chmod 644 .jmo/history.db  # Will be reset to 0o600 on next connection

# GOOD: Export data for sharing
jmo history export shared-report.json --include-findings
chmod 644 shared-report.json  # Share the export, not the database

Defense-in-Depth Strategy

Combine all three security layers for maximum protection:

# 1. Generate encryption key
export JMO_ENCRYPTION_KEY=$(python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")

# 2. Run scan with all security flags
jmo scan --repo ./myapp --store-history \
  --encrypt-findings \           # Layer 1: Encrypt raw findings
  --no-store-raw-findings \      # Layer 2: Don't store raw data at all
  # --collect-metadata omitted   # Layer 3: No PII collection (default)

# 3. Verify database permissions
ls -la .jmo/history.db
# -rw------- 1 user user ...    # Layer 4: File permissions

Security Layers:

Privacy-first defaults: No PII collection (hostname/username)
Secret redaction: Automatic for secret scanners
Encryption: Fernet symmetric encryption for raw findings
File permissions: Owner-only access (0o600 on Unix)

Recommended Configurations by Use Case:

Use Case	`--collect-metadata`	`--encrypt-findings`	`--no-store-raw-findings`	Rationale
Personal projects	No	No	Yes	Maximum privacy, minimal overhead
Team development	Yes	No	No	Track developers, file permissions sufficient
Shared CI/CD runners	No	Yes	No	Encrypt sensitive data, no PII
Compliance audits	Yes	Yes	Yes	Full auditability with maximum security
Enterprise (multi-tenant)	Yes	Yes	No	Encryption + PII for full context

Environment Variables Reference

Variable	Purpose	Default	Required For
`JMO_ENCRYPTION_KEY`	Fernet encryption key (base64)	Not set	`--encrypt-findings`

Generating encryption keys:

# Python (cryptography library)
python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

# OpenSSL (alternative)
openssl rand -base64 32

# Store in shell profile for persistence
echo "export JMO_ENCRYPTION_KEY='your-key-here'" >> ~/.bashrc
source ~/.bashrc

CLI Integration Summary

All security flags work with jmo scan and jmo history store:

# jmo scan (integrated)
jmo scan --repo ./myapp \
  --store-history \
  --collect-metadata \
  --encrypt-findings \
  --no-store-raw-findings

# jmo history store (manual)
jmo history store --results-dir ./results \
  --collect-metadata \
  --encrypt-findings \
  --no-store-raw-findings

Flag Compatibility:

--encrypt-findings + --no-store-raw-findings: Compatible (encrypts NULL column, no impact)
--collect-metadata + --encrypt-findings: Compatible (metadata unencrypted, findings encrypted)
All three flags together: Compatible (maximum security)

CLI Commands

`jmo history store`

Store scan results in history database.

jmo history store --results-dir RESULTS_DIR [OPTIONS]

Options:

--results-dir DIR - Results directory containing summaries/findings.json (REQUIRED)
--profile PROFILE - Profile name (fast/slim/balanced/deep, default: balanced)
--commit HASH - Git commit hash (auto-detected if in Git repo)
--branch NAME - Git branch name (auto-detected if in Git repo)
--tag TAG - Git tag (auto-detected if in Git repo)
--db PATH - Database path (default: .jmo/history.db)

Example:

jmo history store --results-dir ./results --profile balanced --branch main

`jmo history list`

List stored scans with summary statistics.

jmo history list [OPTIONS]

Options:

--branch NAME - Filter by Git branch
--profile PROFILE - Filter by profile (fast/slim/balanced/deep)
--since TIMESTAMP - Filter by timestamp (Unix epoch or ISO 8601 format)
--limit N - Limit results (default: 50)
--json - Output as JSON instead of table
--db PATH - Database path (default: .jmo/history.db)

Example:

# List all scans
jmo history list

# List scans on main branch
jmo history list --branch main

# List last 10 scans in JSON format
jmo history list --limit 10 --json

# List scans since yesterday (Unix timestamp)
jmo history list --since 1730592000

Sample Output:

+-------------+---------------------+----------+-----------+------------+------------+--------+----------------+
| Scan ID     | Timestamp           | Branch   | Profile   |   Findings |   Critical |   High | Duration (s)   |
+=============+=====================+==========+===========+============+============+========+================+
| a1b2c3d4... | 2025-11-02 14:30:15 | main     | balanced  |         42 |          3 |     12 | 245.2          |
| e5f6g7h8... | 2025-11-01 09:15:42 | main     | balanced  |         38 |          2 |     10 | 238.7          |
| i9j0k1l2... | 2025-10-31 16:20:03 | dev      | fast      |         15 |          0 |      5 | 89.3           |
+-------------+---------------------+----------+-----------+------------+------------+--------+----------------+

`jmo history show`

Show detailed information for a specific scan.

jmo history show SCAN_ID [OPTIONS]

Arguments:

SCAN_ID - Full or partial UUID (e.g., a1b2c3d4 or full a1b2c3d4-...)

Options:

--json - Output as JSON
--db PATH - Database path (default: .jmo/history.db)

Example:

# Show scan details (accepts partial UUID)
jmo history show a1b2c3d4

# Full output in JSON
jmo history show a1b2c3d4-5e6f-7890-abcd-1234567890ab --json

Sample Output:

Scan ID:       a1b2c3d4-5e6f-7890-abcd-1234567890ab
Timestamp:     2025-11-02 14:30:15 (1730559015)
Profile:       balanced
Branch:        main
Commit:        abc1234567890def
Tag:           v1.2.3
Dirty:         No

Targets:       myapp, backend-api
Target Type:   repos
Tools:         trivy, semgrep, trufflehog, checkov

Findings:      42 total
  - CRITICAL:  3
  - HIGH:      12
  - MEDIUM:    18
  - LOW:       9
  - INFO:      0

Metadata:
  - Hostname:  builder-01
  - Username:  ci-user
  - CI Provider: github-actions
  - Build ID:  67890
  - Duration:  245.2 seconds

`jmo history compare`

Compare two historical scans from the SQLite database.

jmo history compare SCAN_ID_1 SCAN_ID_2 [OPTIONS]

Arguments:

SCAN_ID_1 - First scan ID (typically baseline or older scan)
SCAN_ID_2 - Second scan ID (typically current or newer scan)

Options:

--severity LEVEL - Filter by severity levels (CRITICAL, HIGH, MEDIUM, LOW, INFO)
--only {new,fixed,modified} - Show only specific change types
--format {json,md,html} - Output format (default: console)
--db PATH - Database path (default: .jmo/history.db)

Use Cases:

Compare baseline scan vs current scan
Track remediation progress over time
Detect security regressions

Example:

# List available scans
jmo history list

# Compare two scans
jmo history compare abc123 def456

# Show only new HIGH/CRITICAL findings
jmo history compare abc123 def456 --severity HIGH CRITICAL --only new

# Generate HTML report
jmo history compare abc123 def456 --format html > comparison.html

See Also:

jmo diff - Compare result directories
jmo trends compare - Compare against baseline with statistics

`jmo history query`

Query findings across stored scans.

jmo history query [OPTIONS]

Options:

--scan-id ID - Filter by specific scan (full or partial UUID)
--severity LEVEL - Filter by severity (CRITICAL/HIGH/MEDIUM/LOW/INFO)
--rule-id ID - Filter by rule ID (e.g., CVE-2024-1234, CWE-79)
--path PATTERN - Filter by file path pattern (supports wildcards)
--limit N - Limit results (default: 100)
--json - Output as JSON
--db PATH - Database path (default: .jmo/history.db)

Example:

# All critical findings
jmo history query --severity CRITICAL

# Findings in specific file
jmo history query --path "src/auth/*.py"

# Findings for specific rule
jmo history query --rule-id CVE-2024-9999

# Combined filters
jmo history query --severity HIGH --path "src/*" --limit 50

`jmo history prune`

Remove old scans from history database.

jmo history prune [OPTIONS]

Options:

--older-than SECONDS - Delete scans older than N seconds
--keep-scans N - Keep only the N most recent scans
--dry-run - Show what would be deleted without deleting
--db PATH - Database path (default: .jmo/history.db)

Example:

# Delete scans older than 90 days (7776000 seconds)
jmo history prune --older-than 7776000

# Keep only last 100 scans
jmo history prune --keep-scans 100

# Preview what would be deleted
jmo history prune --older-than 2592000 --dry-run

`jmo history export`

Export scan history to JSON file.

jmo history export OUTPUT_FILE [OPTIONS]

Arguments:

OUTPUT_FILE - Path to output JSON file

Options:

--scan-id ID - Export specific scan only
--include-findings - Include full finding details (default: metadata only)
--db PATH - Database path (default: .jmo/history.db)

Example:

# Export all scan metadata
jmo history export scans.json

# Export specific scan with findings
jmo history export scan-a1b2c3d4.json --scan-id a1b2c3d4 --include-findings

`jmo history stats`

Show database statistics and trends.

jmo history stats [OPTIONS]

Options:

--db PATH - Database path (default: .jmo/history.db)

Example:

jmo history stats

Sample Output:

Database: .jmo/history.db
Size:     2.4 MB
------------------------------------------------------------------------
Scans:            127
Findings:         3,842
Date Range:       2024-08-15 to 2025-11-02

Scans by Profile:
  balanced      89 scans
  fast          28 scans
  deep          10 scans

Findings by Severity:
  CRITICAL        42  (1.1%)
  HIGH           385  (10.0%)
  MEDIUM         892  (23.2%)
  LOW          1,823  (47.4%)
  INFO           700  (18.2%)

Top Tools:
  trivy                   1,245 findings
  semgrep                   982 findings
  trufflehog                615 findings
  checkov                   412 findings

`jmo history vacuum`

Optimize the SQLite history database by reclaiming unused space and rebuilding indexes.

jmo history vacuum [OPTIONS]

Options:

--db PATH - Database path (default: .jmo/history.db)

Description:

Optimize the SQLite history database by:

Reclaiming unused space
Rebuilding indexes
Improving query performance

Use Cases:

After pruning old scans (jmo history prune)
Database growing too large
Query performance degradation
Scheduled maintenance

Example:

# Vacuum database
jmo history vacuum

# Typical output:
# Database vacuumed successfully
# Space reclaimed: 15.2 MB -> 8.4 MB (45% reduction)
# Query performance improved

See Also:

jmo history prune - Remove old scans
jmo history verify - Check database integrity

`jmo history verify`

Verify SQLite history database integrity.

jmo history verify [OPTIONS]

Options:

--db PATH - Database path (default: .jmo/history.db)

Description:

Verify SQLite history database integrity by:

Checking for corruption
Validating foreign key constraints
Ensuring schema consistency
Testing read/write operations

Use Cases:

Troubleshooting database errors
Post-upgrade verification
Scheduled health checks
Before database backup

Example:

# Verify database integrity
jmo history verify

# Successful output:
# Database integrity check passed
# Foreign key constraints valid
# Schema version: 1.0.0
# Read/write test successful

# Failed output (if corrupted):
# Database corruption detected
# Recommendation: Restore from backup or reinitialize

See Also:

jmo history vacuum - Optimize database
Troubleshooting - SQLite issues

Database Schema

The history database uses SQLite with the following schema:

Tables:

scans - Scan metadata (timestamp, profile, branch, tools, severity counts, CI metadata)
findings - Individual findings (fingerprint, severity, rule, location, message, full CommonFinding JSON)
compliance_mappings - Framework mappings (OWASP, CWE, CIS, NIST, PCI-DSS, MITRE ATT&CK)
schema_version - Database schema version for migrations

Key Features:

Foreign Key Constraints: CASCADE deletion (deleting scan removes findings)
Automatic Triggers: Severity counts auto-updated on INSERT/UPDATE/DELETE
Indices: Optimized for common queries (timestamp DESC, branch, severity, rule_id)
Views: latest_scan_by_branch, finding_history for quick queries
WAL Mode: Write-Ahead Logging for concurrency and crash resilience

Schema (Security & Privacy):

findings.raw_finding column: Changed from NOT NULL to nullable to support --no-store-raw-findings flag
scans.hostname column: Changed from always populated to NULL by default (requires --collect-metadata opt-in)
scans.username column: Changed from always populated to NULL by default (requires --collect-metadata opt-in)
Encryption support: raw_finding column can store encrypted data (Fernet format) when --encrypt-findings used
File permissions: Database file automatically set to 0o600 (owner-only) on Unix systems

Database Location:

Default: .jmo/history.db (relative to working directory)
Custom: Use --db PATH or --history-db PATH flags
CI/CD: Recommended .jmo/ directory (gitignored by default)
Security: File permissions automatically enforced (0o600 on Unix)

Workflow Examples

Daily Development Workflow

# Morning: Baseline scan
jmo scan --repo ./myapp --profile balanced --store-history --branch dev

# Afternoon: After changes
jmo scan --repo ./myapp --profile balanced --store-history --branch dev

# Compare with previous scan
jmo history list --branch dev --limit 2

Pre-Release Compliance Workflow

# Run comprehensive scan before release
jmo scan --repo ./myapp --profile deep --store-history --tag v1.2.3

# Generate compliance report
jmo history query --severity CRITICAL --json > critical-findings.json

# Verify no critical findings
if [ $(jq '.findings | length' critical-findings.json) -gt 0 ]; then
  echo "FAIL: Critical findings detected"
  exit 1
fi

Multi-Branch Comparison

# Scan production branch
jmo scan --repo ./myapp --profile balanced --store-history --branch main

# Scan staging branch
jmo scan --repo ./myapp --profile balanced --store-history --branch staging

# Compare results
jmo history list --branch main --limit 1
jmo history list --branch staging --limit 1

Historical Trend Analysis

# Weekly scans stored over 3 months
jmo scan --repo ./myapp --profile balanced --store-history

# View trends
jmo history list --branch main --limit 12  # Last 12 scans

# Export for external analysis
jmo history export --include-findings monthly-report.json

CI/CD Integration

GitHub Actions

- name: Run security scan with history
  run: |
    jmo scan --repo . --profile balanced --store-history

    # Upload database as artifact for trend tracking
    tar -czf history-db.tar.gz .jmo/history.db

- name: Upload history database
  uses: actions/upload-artifact@v4
  with:
    name: scan-history
    path: history-db.tar.gz
    retention-days: 90

GitLab CI

security_scan:
  script:
    - jmo scan --repo . --profile balanced --store-history --db scans.db
    - jmo history stats --db scans.db
  artifacts:
    paths:
      - scans.db
    expire_in: 3 months

Best Practices

Use --store-history flag for automatic storage (no manual history store needed)
Consistent profiles - Use same profile for trend comparisons (balanced vs balanced)
Regular pruning - Run jmo history prune monthly to limit database size
Git integration - Run scans in Git repos for automatic branch/commit tracking
CI artifact storage - Upload .jmo/history.db as CI artifact for persistence
Database backups - Back up .jmo/history.db before major schema changes

Performance Benchmarks

Validated: November 2025 Test Environment: Linux (WSL2), Python 3.12, SQLite 3.x

The historical storage database is production-ready with performance exceeding all targets by 9-117x:

Operation	Volume	Performance	Target	Status
Store findings	1,000 findings	0.017s	<2s	117x faster
Query scans	10,000 scans	0.052s (52ms)	<500ms	9.6x faster
Batch insert	10,000 findings	0.151s	<5s	33x faster
Recent scans query	100 scans	<1ms	N/A	Sub-millisecond
Single scan lookup	1 scan	<0.1ms	N/A	Near-instant

Scalability Projections:

Small Deployment (1-10 repos, daily scans): <1 MB database, <1ms queries
Medium Deployment (10-50 repos, multiple scans/day): ~20-50 MB, <10ms queries
Large Deployment (100+ repos, CI/CD integration): ~200-500 MB, <50ms queries
Enterprise (1000+ repos, continuous scanning): ~2-5 GB, <100ms queries

Supported Scale: Up to 1 million scans without modification. For larger deployments, see Future Enhancements for sharding/archival options.

Index Usage: All 5 critical queries use indices (100% coverage verified). No full table scans on common operations.

Throughput:

Insert: ~60,000 findings/second (batch operations)
Query: ~190,000 scans/second (list operations)
Single lookup: Near-instant (primary key lookup)

Troubleshooting

Issue: "Database is locked" error

Cause: Multiple processes writing to database simultaneously
Fix: Ensure only one scan writes at a time, or use separate database files

Issue: "No findings.json found"

Cause: Trying to store before report phase completes
Fix: Ensure jmo report completes before jmo history store, or use --store-history flag which handles timing automatically

Issue: Database growing too large

Cause: Hundreds of scans accumulating
Fix: Run jmo history prune --keep-scans 100 to retain last 100 scans

Issue: Git context not captured

Cause: Not running scan in Git repository
Fix: Run scans from Git repo root, or manually specify --branch and --commit

Advanced History Queries

Python API for custom integrations

Phase 7 adds 9 specialized query functions to scripts/core/history_db.py designed for future integrations with interactive dashboards, AI-powered remediation systems, and compliance reporting tools.

React Dashboard Integration

These functions provide optimized, single-query data fetching for interactive web dashboards built with React and Recharts:

`get_dashboard_summary(conn, scan_id)` - Dashboard-Ready Summary

Single-query summary reducing multiple round-trips:

from scripts.core.history_db import get_connection, get_dashboard_summary

conn = get_connection(".jmo/history.db")
summary = get_dashboard_summary(conn, "abc123")

# Returns:
{
    "scan": {...},  # Full scan metadata
    "severity_counts": {"CRITICAL": 5, "HIGH": 12, "MEDIUM": 18, ...},
    "top_rules": [
        {"rule_id": "CVE-2024-1234", "count": 8, "severity": "HIGH"},
        ...
    ],
    "tools_used": ["trivy", "semgrep", "trufflehog"],
    "findings_by_tool": {"trivy": 45, "semgrep": 32, ...},
    "compliance_coverage": {
        "total_findings": 100,
        "findings_with_compliance": 85,
        "coverage_percentage": 85.0
    }
}

Performance: ~5-10ms for single scan

`get_timeline_data(conn, branch, days=30)` - Time-Series Trends

Optimized for Recharts line/area charts showing severity trends:

timeline = get_timeline_data(conn, branch="main", days=30)

# Returns list of daily data points:
[
    {
        "date": "2025-11-01",
        "scan_id": "abc123",
        "CRITICAL": 3,
        "HIGH": 12,
        "MEDIUM": 18,
        "LOW": 25,
        "INFO": 5,
        "total": 63
    },
    ...
]

Performance: 30 days: ~10-20ms, 90 days: ~20-40ms

`get_finding_details_batch(conn, fingerprints)` - Lazy Loading

Batch fetch finding details for drill-down views:

# User clicks on "12 HIGH findings" in dashboard
fingerprints = ["abc123", "def456", "ghi789", ...]
findings = get_finding_details_batch(conn, fingerprints)

# Returns list of full CommonFinding objects

Performance: 100 findings: ~10-20ms, 1000 findings: ~50-100ms

`search_findings(conn, query, filters=None)` - Full-Text Search

Search across findings with filters:

# Search for SQL injection findings
results = search_findings(
    conn,
    query="SQL injection",
    filters={
        "severity": "HIGH",
        "branch": "main",
        "date_range": ("2025-11-01", "2025-11-30"),
        "limit": 50
    }
)

Performance: Simple query: ~5-10ms, With filters: ~10-20ms

MCP Server Integration

These functions provide AI-ready data formats for Model Context Protocol servers enabling Claude to suggest remediation strategies:

`get_finding_context(conn, fingerprint)` - Full Context for AI Remediation

from scripts.core.history_db import get_finding_context

context = get_finding_context(conn, "abc123")

# Returns:
{
    "finding": {...},  # Current finding
    "history": [...],  # Same finding in past scans (up to 10)
    "similar_findings": [...],  # Related findings (up to 5)
    "remediation_history": [...],  # If fixed before, when/how?
    "compliance_impact": {
        "owasp": ["A03:2021"],
        "cwe": [{"id": "CWE-89", "rank": 3}],
        ...
    }
}

Use Case: AI assistant provides context-aware remediation:

User: "How do I fix finding abc123?"

Claude (using MCP Server):
- "This SQL injection (CWE-89) has appeared 3 times in past 60 days"
- "Previous fix: Use parameterized queries (resolved 2024-10-15)"
- "Compliance impact: OWASP A03:2021, PCI DSS 6.5.1"
- "Suggested fix: [code snippet]"

Performance: ~10-30ms

`get_scan_diff_for_ai(conn, scan_id_1, scan_id_2)` - AI-Optimized Diff

diff = get_scan_diff_for_ai(conn, "scan1", "scan2")

# Returns:
{
    "new_findings": [...],  # With priority scoring (1-10)
    "resolved_findings": [...],  # With "likely_fix" heuristics
    "unchanged_findings_count": 42,
    "priority_sorted": True  # Sorted by priority DESC
}

# Priority scoring formula:
# - CRITICAL: base 9-10
# - HIGH: base 7-8
# - + compliance frameworks (1-2 points)
# - + recent recurrence (1 point)

Use Case: AI prioritizes remediation tasks:

Claude: "Top 3 priorities from this diff:
1. [Priority 10] CVE-2024-9999 (CRITICAL, PCI DSS, CIS)
2. [Priority 9] SQL injection in auth.py (HIGH, OWASP A03)
3. [Priority 8] Hardcoded AWS key (HIGH, recurring 3x)"

Performance: ~20-50ms

`get_recurring_findings(conn, branch, min_occurrences=3)` - Whack-a-Mole Detection

Identifies findings that keep reappearing (systemic issues):

recurring = get_recurring_findings(conn, branch="main", min_occurrences=3)

# Returns:
[
    {
        "fingerprint": "abc123",
        "rule_id": "hardcoded-secret",
        "occurrence_count": 5,
        "first_seen": "2025-09-01 10:30:15",
        "last_seen": "2025-11-01 14:20:03",
        "avg_days_between_fixes": 12.5,
        "finding": {...}  # Full CommonFinding
    },
    ...
]

Use Case: AI suggests process improvements:

Claude: "Warning: 'hardcoded-secret' has recurred 5 times (avg 12.5 days between fixes).
This indicates a systemic issue. Consider:
1. Pre-commit hooks (detect before push)
2. Developer training on secrets management
3. Secrets scanning in CI/CD pipeline"

Performance: 100 scans: ~50-100ms

Compliance Reporting

These functions enable framework-specific compliance dashboards and trend analysis:

`get_compliance_summary(conn, scan_id, framework="all")` - Multi-Framework Summary

summary = get_compliance_summary(conn, "abc123", framework="all")

# Returns:
{
    "framework_summaries": {
        "owasp_top10_2021": {
            "A01:2021": {"count": 5, "severities": {"HIGH": 3, "MEDIUM": 2}},
            "A02:2021": {"count": 8, "severities": {"CRITICAL": 2, "HIGH": 6}},
            ...
        },
        "cwe_top25_2024": {
            "CWE-79": {"count": 12, "rank": 1, "severities": {...}},
            ...
        },
        "cis_controls_v8_1": {...},
        "nist_csf_2_0": {...},
        "pci_dss_4_0": {...},
        "mitre_attack": {...}
    },
    "coverage_stats": {
        "total_findings": 100,
        "findings_with_compliance": 85,
        "coverage_percentage": 85.0,
        "by_framework": {
            "owasp": 72,
            "cwe": 68,
            "cis": 45,
            "nist": 52,
            "pci": 38,
            "mitre": 15
        }
    }
}

Single-framework query:

# OWASP Top 10 only
owasp_summary = get_compliance_summary(conn, "abc123", framework="owasp")

Performance: Single framework: ~10-20ms, All frameworks: ~50-100ms

`get_compliance_trend(conn, branch, framework, days=30)` - Improvement Tracking

Track compliance improvements over time:

trend = get_compliance_trend(conn, branch="main", framework="owasp", days=30)

# Returns:
{
    "trend": "improving",  # or "degrading", "stable", "insufficient_data"
    "data_points": [
        {
            "scan_id": "abc123",
            "timestamp": "2025-11-01 10:30:15",
            "framework_findings": 15,
            "categories": {
                "A01:2021": 2,
                "A02:2021": 5,
                ...
            }
        },
        ...
    ],
    "insights": [
        "OWASP findings reduced by 40% over 30 days (25 -> 15)",
        "A02:2021 (Cryptographic Failures) improved 60% (10 -> 4)",
        "A03:2021 (Injection) stable at 3 findings"
    ],
    "summary_stats": {
        "oldest_count": 25,
        "newest_count": 15,
        "reduction_percentage": 40.0,
        "avg_findings_per_scan": 18.5
    }
}

Use Case: Compliance dashboard showing progress:

OWASP Top 10 Compliance (Last 30 Days)
--------------------------------------
Status: Improving (40% reduction)

Key Improvements:
  - A02:2021 (Cryptographic Failures): 60% reduction
  - A06:2021 (Vulnerable Components): 30% reduction

Stable Issues:
  - A03:2021 (Injection): 3 findings (no change)

Performance: 30 days: ~20-50ms, 90 days: ~50-100ms

Performance Summary

All Phase 7 functions target <100ms response times:

Function	Typical Data	Performance	Target
`get_dashboard_summary`	1 scan	5-10ms	<50ms
`get_timeline_data`	30 days	10-20ms	<50ms
`get_finding_details_batch`	100 findings	10-20ms	<100ms
`search_findings`	Simple query	5-10ms	<50ms
`get_finding_context`	1 finding + history	10-30ms	<100ms
`get_scan_diff_for_ai`	2 scans	20-50ms	<100ms
`get_recurring_findings`	100 scans	50-100ms	<100ms
`get_compliance_summary` (single)	1 scan	10-20ms	<50ms
`get_compliance_summary` (all)	1 scan	50-100ms	<100ms
`get_compliance_trend`	30 days	20-50ms	<100ms

All functions use indices for optimal performance. No full table scans on common operations.

Example Use Cases

Use Case 1: React Dashboard Component

// Dashboard.jsx
import React, { useState, useEffect } from 'react';
import { LineChart, Line, XAxis, YAxis, Tooltip } from 'recharts';

function SecurityDashboard({ scanId }) {
  const [summary, setSummary] = useState(null);
  const [timeline, setTimeline] = useState([]);

  useEffect(() => {
    // Fetch data from Python backend API
    fetch(`/api/history/dashboard/${scanId}`)
      .then(res => res.json())
      .then(data => setSummary(data));

    fetch(`/api/history/timeline?branch=main&days=30`)
      .then(res => res.json())
      .then(data => setTimeline(data));
  }, [scanId]);

  return (
    <div>
      <h2>Security Scan Summary</h2>
      <div>
        <span>CRITICAL: {summary?.severity_counts.CRITICAL}</span>
        <span>HIGH: {summary?.severity_counts.HIGH}</span>
      </div>

      <LineChart width={800} height={400} data={timeline}>
        <XAxis dataKey="date" />
        <YAxis />
        <Line type="monotone" dataKey="CRITICAL" stroke="#dc2626" />
        <Line type="monotone" dataKey="HIGH" stroke="#ea580c" />
        <Tooltip />
      </LineChart>
    </div>
  );
}

Backend API endpoint (Flask example):

from flask import Flask, jsonify
from scripts.core.history_db import get_connection, get_dashboard_summary, get_timeline_data

app = Flask(__name__)

@app.route('/api/history/dashboard/<scan_id>')
def dashboard_summary(scan_id):
    conn = get_connection(".jmo/history.db")
    summary = get_dashboard_summary(conn, scan_id)
    return jsonify(summary)

@app.route('/api/history/timeline')
def timeline(branch, days=30):
    conn = get_connection(".jmo/history.db")
    data = get_timeline_data(conn, branch, days)
    return jsonify(data)

Use Case 2: MCP Server for Claude Integration

# mcp_server.py
from mcp import Server
from scripts.core.history_db import get_connection, get_finding_context, get_scan_diff_for_ai

server = Server("jmo-security-mcp")

@server.tool("get_finding_remediation")
async def get_remediation(fingerprint: str) -> dict:
    """Provide AI with full context for remediation suggestions."""
    conn = get_connection(".jmo/history.db")
    context = get_finding_context(conn, fingerprint)

    return {
        "finding": context["finding"]["message"],
        "history": f"Seen {len(context['history'])} times before",
        "last_fix": context["remediation_history"][0] if context["remediation_history"] else None,
        "compliance": context["compliance_impact"],
        "suggestion": generate_fix_suggestion(context)  # AI-powered
    }

@server.tool("prioritize_findings")
async def prioritize_findings(scan1: str, scan2: str) -> list:
    """Return prioritized list of new findings for remediation."""
    conn = get_connection(".jmo/history.db")
    diff = get_scan_diff_for_ai(conn, scan1, scan2)

    return [
        {
            "priority": f["priority"],
            "rule_id": f["rule_id"],
            "path": f["location"]["path"],
            "message": f["message"]
        }
        for f in diff["new_findings"][:10]  # Top 10
    ]

Use Case 3: Compliance Reporting Dashboard

# compliance_report.py
from scripts.core.history_db import get_connection, get_compliance_summary, get_compliance_trend

def generate_compliance_report(scan_id, output_format="html"):
    conn = get_connection(".jmo/history.db")

    # Get current compliance status
    summary = get_compliance_summary(conn, scan_id, framework="all")

    # Get trends for each framework
    frameworks = ["owasp", "cwe", "cis", "nist", "pci", "mitre"]
    trends = {}
    for framework in frameworks:
        trends[framework] = get_compliance_trend(conn, "main", framework, days=90)

    # Generate HTML report
    html = f"""
    <h1>Compliance Report</h1>
    <h2>Current Status</h2>
    <ul>
        <li>OWASP Top 10: {summary['framework_summaries']['owasp_top10_2021']}</li>
        <li>CWE Top 25: {summary['framework_summaries']['cwe_top25_2024']}</li>
    </ul>

    <h2>90-Day Trends</h2>
    <ul>
        <li>OWASP: {trends['owasp']['trend']} ({trends['owasp']['insights'][0]})</li>
        <li>CWE: {trends['cwe']['trend']} ({trends['cwe']['insights'][0]})</li>
    </ul>
    """

    return html

Future Integration Examples

These functions are designed for extensibility:

Grafana Dashboards: Query functions provide Prometheus-style metrics
Slack/Teams Bots: Real-time compliance trend alerts
Jupyter Notebooks: Data science analysis of security posture
GitHub Actions: Automated compliance gate checks
Security Information and Event Management (SIEM): Export findings to Splunk/ELK

See scripts/core/history_db.py for complete function signatures and implementation details.

Future Enhancements

Time-series dashboard - Interactive charts showing trends over time
Automated comparisons - Diff between scans with highlighted regressions
Metrics API - REST API for external monitoring systems
Custom tags - Label scans with custom metadata (environment, team, project)
Alert thresholds - Notify when findings exceed baselines
Import command - jmo history import for loading external scan data
Schema migrations - Automatic database upgrades for future versions

For Developers: The history database API (scripts/core/history_db.py) is designed for extensibility. Future features can use:

list_scans(branch, since, profile) - Time-series data for trend analysis
get_findings_for_scan(scan_id, severity) - Finding details for comparisons
get_database_stats() - Aggregate statistics for dashboards

See scripts/core/history_db.py for complete API documentation.

Historical Storage Guide

Table of Contents

Overview

Quick Start

Security & Privacy Features

Privacy-First Defaults

Secret Redaction

Finding Data Encryption

File Permissions Hardening

Defense-in-Depth Strategy

Environment Variables Reference

CLI Integration Summary

CLI Commands

jmo history store

jmo history list

jmo history show

jmo history compare

jmo history query

jmo history prune

jmo history export

jmo history stats

jmo history vacuum

jmo history verify

Database Schema

Workflow Examples

Daily Development Workflow

Pre-Release Compliance Workflow

Multi-Branch Comparison

Historical Trend Analysis

CI/CD Integration

GitHub Actions

GitLab CI

Best Practices

Performance Benchmarks

Troubleshooting

Issue: "Database is locked" error

Issue: "No findings.json found"

Issue: Database growing too large

Issue: Git context not captured

Advanced History Queries

Python API for custom integrations

React Dashboard Integration

get_dashboard_summary(conn, scan_id) - Dashboard-Ready Summary

get_timeline_data(conn, branch, days=30) - Time-Series Trends

get_finding_details_batch(conn, fingerprints) - Lazy Loading

search_findings(conn, query, filters=None) - Full-Text Search

MCP Server Integration

get_finding_context(conn, fingerprint) - Full Context for AI Remediation

get_scan_diff_for_ai(conn, scan_id_1, scan_id_2) - AI-Optimized Diff

get_recurring_findings(conn, branch, min_occurrences=3) - Whack-a-Mole Detection

Compliance Reporting

get_compliance_summary(conn, scan_id, framework="all") - Multi-Framework Summary

get_compliance_trend(conn, branch, framework, days=30) - Improvement Tracking

Performance Summary

Example Use Cases

Use Case 1: React Dashboard Component

Use Case 2: MCP Server for Claude Integration

Use Case 3: Compliance Reporting Dashboard

Future Integration Examples

Future Enhancements

`jmo history store`

`jmo history list`

`jmo history show`

`jmo history compare`

`jmo history query`

`jmo history prune`

`jmo history export`

`jmo history stats`

`jmo history vacuum`

`jmo history verify`

`get_dashboard_summary(conn, scan_id)` - Dashboard-Ready Summary

`get_timeline_data(conn, branch, days=30)` - Time-Series Trends

`get_finding_details_batch(conn, fingerprints)` - Lazy Loading

`search_findings(conn, query, filters=None)` - Full-Text Search

`get_finding_context(conn, fingerprint)` - Full Context for AI Remediation

`get_scan_diff_for_ai(conn, scan_id_1, scan_id_2)` - AI-Optimized Diff

`get_recurring_findings(conn, branch, min_occurrences=3)` - Whack-a-Mole Detection

`get_compliance_summary(conn, scan_id, framework="all")` - Multi-Framework Summary

`get_compliance_trend(conn, branch, framework, days=30)` - Improvement Tracking