Internal Data Harvesting

Internal data harvesting is a post-compromise technique in which attackers who have gained internal access explore the available data to identify what would cause the greatest business impact. Instead of exfiltrating everything, they select and harvest high-value assets, such as financial records, PII, or credentials, that can be abused for leverage, extortion, or further compromise.

What makes this technique dangerous is its focus on evading detection. By limiting the volume and frequency of access, attackers avoid triggering alerts tied to bulk data transfer, CPU spikes, or abnormal storage usage. They often operate through normal-looking service accounts, query internal APIs, or access systems that their compromised identity is already permitted to use, all while staying below traditional detection thresholds. The goal is to maximize impact while minimizing the chance of being caught. Detecting it requires moving beyond volume-based triggers to contextual behavior analysis, for example flagging access that is technically allowed but inconsistent with how the identity normally behaves.
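
The idea of access that is "technically allowed but behaviorally inconsistent" can be sketched as a novelty check against each identity's history. This is a minimal illustration, not a production detector: the `AccessEvent` fields, the account name `svc-reporting`, and the resource names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    identity: str   # account performing the access (hypothetical field)
    resource: str   # table, bucket, or API endpoint touched
    allowed: bool   # outcome of the normal authorization check


def find_inconsistent_access(history, new_events):
    """Return permitted events that touch resources the identity never used before."""
    baseline = {}
    for event in history:
        baseline.setdefault(event.identity, set()).add(event.resource)
    return [
        event for event in new_events
        if event.allowed
        and event.resource not in baseline.get(event.identity, set())
    ]


history = [
    AccessEvent("svc-reporting", "sales.orders", True),
    AccessEvent("svc-reporting", "sales.invoices", True),
]
new_events = [
    AccessEvent("svc-reporting", "sales.orders", True),  # routine access
    AccessEvent("svc-reporting", "hr.payroll", True),    # allowed, but new for this account
]

flagged = find_inconsistent_access(history, new_events)
print([event.resource for event in flagged])  # ['hr.payroll']
```

A volume-based rule would see nothing unusual here, because both new events are single, authorized reads.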

Examples in the Wild

Notable Internal Data Harvesting Attacks:

Equifax Data Breach (2017)
In the Equifax breach, attackers spent roughly 76 days after the initial compromise systematically identifying and querying high-value databases, ultimately exposing records on about 147 million people. Rather than exfiltrating everything immediately, they mapped internal systems to locate the most sensitive consumer data, including SSNs, birthdates, and addresses.

Capital One Data Breach (2019)
In the Capital One breach, the attacker (Paige Thompson) enumerated cloud storage resources and accessed the specific S3 buckets that held customer data. The most valuable datasets included credit card application data covering roughly 100 million people in the United States.

SolarWinds SUNBURST - Post-Compromise Data Harvesting
After the initial SolarWinds compromise, the attackers selectively targeted the email servers and cloud environments of specific organizations. They avoided bulk data extraction and focused instead on high-value intelligence targets, including government agencies and security companies.

Attack Mechanism

Internal Data Harvesting Techniques:

Selective Data Enumeration
  - Systematic database and file system exploration
  - Identifying high-value data repositories
  - Mapping data classification and sensitivity levels
  - Prioritizing targets based on business impact
Stealthy Access Patterns
  - Using legitimate service accounts for data access
  - Mimicking normal user behavior and timing
  - Accessing data through authorized APIs and interfaces
  - Staying below detection thresholds
Strategic Data Identification
  - Searching for financial records and payment data
  - Identifying personally identifiable information (PII)
  - Locating credentials and authentication tokens
  - Finding intellectual property and trade secrets
Evasive Harvesting Methods
  - Limiting data volume per access session
  - Spacing out data access over time
  - Using normal business hours for access
  - Leveraging existing permissions and roles
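
From the defender's side, the evasive methods above explain why a per-session volume rule stays silent. The sketch below compares such a rule with a cumulative count over a look-back window. All thresholds, field names, and the `svc-batch` account are assumptions chosen for illustration, not recommended values.

```python
from collections import defaultdict

PER_SESSION_ROW_LIMIT = 10_000  # hypothetical bulk-transfer threshold
WINDOW_DAYS = 30                # hypothetical look-back window
WINDOW_ROW_LIMIT = 50_000       # hypothetical cumulative threshold


def volume_alerts(sessions):
    """Classic rule: flag any single session that reads too many rows."""
    return [s for s in sessions if s["rows"] > PER_SESSION_ROW_LIMIT]


def cumulative_alerts(sessions, today):
    """Flag accounts whose sensitive reads summed over the window exceed the limit."""
    totals = defaultdict(int)
    for s in sessions:
        if s["sensitive"] and 0 <= today - s["day"] < WINDOW_DAYS:
            totals[s["account"]] += s["rows"]
    return sorted(acct for acct, total in totals.items() if total > WINDOW_ROW_LIMIT)


# Twenty small sessions, one per day, each well under the per-session limit.
sessions = [
    {"account": "svc-batch", "day": day, "rows": 4_000, "sensitive": True}
    for day in range(1, 21)
]

print(volume_alerts(sessions))                # []
print(cumulative_alerts(sessions, today=21))  # ['svc-batch']
```

No single session crosses the bulk threshold, yet the account has read 80,000 sensitive rows in the window, which only the cumulative view surfaces.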

Real-World Examples

# Equifax Internal Data Harvesting
attack_pattern:
  - phase: "Initial Access"
    method: "Apache Struts vulnerability exploitation"
  - phase: "Internal Harvesting"
    targets: ["ACIS database", "Dispute databases", "Legacy systems"]
    data_types: ["SSN", "birthdates", "addresses", "credit_data"]
    timeline: "76 days of undetected access"

# Capital One Cloud Data Harvesting
attack_pattern:
  - phase: "Initial Access"
    method: "SSRF via misconfigured WAF"
  - phase: "Internal Harvesting"
    targets: ["S3 buckets", "Customer databases"]
    data_types: ["credit_applications", "bank_account_numbers"]
    approach: "Selective bucket enumeration"

# Strategic Harvesting Indicators
detection_patterns:
  - pattern: "Unusual data access patterns"
    indicators: ["Off-hours database queries", "Systematic table enumeration"]
  - pattern: "Privilege escalation for data access"
    indicators: ["Service account abuse", "Permission requests"]
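
The "Systematic table enumeration" indicator can be approximated by measuring how many of an account's queries in a time window touch a table for the first time. This is a rough sketch under assumed inputs: the function names, the `min_tables` and `min_score` cutoffs, and the table names are hypothetical.

```python
def enumeration_score(queried_tables):
    """Fraction of queries that hit a table not yet seen in this window."""
    seen = set()
    first_touches = 0
    for table in queried_tables:
        if table not in seen:
            seen.add(table)
            first_touches += 1
    return first_touches / len(queried_tables) if queried_tables else 0.0


def looks_like_enumeration(queried_tables, min_tables=5, min_score=0.8):
    """Flag windows that touch many distinct tables with few repeat queries."""
    distinct = len(set(queried_tables))
    return distinct >= min_tables and enumeration_score(queried_tables) >= min_score


# Routine workload: the same two tables queried repeatedly.
routine = ["orders", "orders", "customers", "orders", "orders", "customers"]
# Sweep: each query touches a different table exactly once.
sweep = ["accounts", "addresses", "cards", "customers", "disputes", "payments"]

print(looks_like_enumeration(routine))  # False
print(looks_like_enumeration(sweep))    # True
```

Legitimate jobs such as backups or schema migrations also touch many tables once, so a check like this would need an allowlist or a per-account baseline before it could be used for alerting.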

Detection Challenges

Why Traditional Security Tools Fail:

Legitimate Access Patterns
  - Attackers use authorized accounts and permissions
  - Data access appears normal to automated systems
  - No obvious indicators of malicious intent
  - Blends with regular business operations
Volume-Based Detection Limitations
  - Traditional tools focus on bulk data transfer
  - Selective harvesting stays below alert thresholds
  - Gradual data access over time avoids detection
  - No immediate performance impact
Context-Aware Analysis Requirements
  - Need to understand normal vs. abnormal patterns
  - Requires behavioral analysis beyond technical indicators
  - Must consider business context and data sensitivity
  - Challenging to implement without false positives
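
One way to combine "normal vs. abnormal" with data sensitivity is to score how far today's activity deviates from an account's own baseline, then apply a stricter cutoff to more sensitive data. The sketch below uses a simple z-score. The threshold values and sensitivity labels are assumptions, and real baselines are rarely this well behaved.

```python
from statistics import mean, pstdev

# Hypothetical cutoffs: sensitive data alerts on smaller deviations.
Z_THRESHOLDS = {"high": 3.0, "low": 6.0}


def deviation_score(baseline_counts, today_count):
    """Z-score of today's record count against the account's own history."""
    mu = mean(baseline_counts)
    sigma = pstdev(baseline_counts)
    if sigma == 0:
        return 0.0 if today_count == mu else float("inf")
    return (today_count - mu) / sigma


def should_alert(baseline_counts, today_count, sensitivity):
    """Alert when the deviation reaches the cutoff for this sensitivity level."""
    return deviation_score(baseline_counts, today_count) >= Z_THRESHOLDS[sensitivity]


# Daily count of records read by one account over the past week.
baseline = [10, 12, 11, 9, 13, 10, 12]

print(should_alert(baseline, 16, "high"))  # True: modest rise, sensitive data
print(should_alert(baseline, 16, "low"))   # False: same rise, low-sensitivity data
print(should_alert(baseline, 12, "high"))  # False: within normal variation
```

Tuning the cutoffs per sensitivity level is one lever for the false-positive problem: the same deviation is escalated only where the business impact justifies the analyst time.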

Required Detection Strategy:

# Behavioral analysis rules
- rule: "Unusual Data Access Pattern"
  condition: |
    user.access_pattern != historical_baseline AND
    accessed_data.sensitivity == "high" AND
    access_time.distribution == "systematic"
  severity: high

# Contextual analysis
- rule: "Privilege-Data Mismatch"
  condition: |
    user.normal_role != data_access_level OR
    user.department != data_owner_department OR
    access_justification == "missing"
  severity: critical
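
The two rules above are written as pseudo-conditions. A minimal sketch of how they might be evaluated in code follows. The `AccessContext` fields are hypothetical stand-ins for whatever a real pipeline would derive from logs, identity data, and a data catalog, and the role check is simplified to a single boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessContext:
    deviates_from_baseline: bool   # access pattern differs from historical baseline
    sensitivity: str               # classification of the accessed data
    systematic_timing: bool        # accesses are evenly or methodically spaced
    role_permits_data: bool        # user's normal role matches the data access level
    user_department: str
    owner_department: str
    justification: Optional[str]   # ticket or approval reference, if any


def evaluate(ctx):
    """Return (rule name, severity) for every rule the context triggers."""
    findings = []
    if ctx.deviates_from_baseline and ctx.sensitivity == "high" and ctx.systematic_timing:
        findings.append(("Unusual Data Access Pattern", "high"))
    if (not ctx.role_permits_data
            or ctx.user_department != ctx.owner_department
            or not ctx.justification):
        findings.append(("Privilege-Data Mismatch", "critical"))
    return findings


suspicious = AccessContext(True, "high", True, True, "finance", "hr", None)
routine = AccessContext(False, "high", False, True, "hr", "hr", "TICKET-123")

print(evaluate(suspicious))
print(evaluate(routine))  # []
```

Because the second rule joins its conditions with OR, any cross-department access or missing justification triggers a critical finding, so in practice it would likely need tuning to keep alert volume manageable.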