Internal Data Harvesting
Info
Tactic: Expanding Control
Internal data harvesting is a post-compromise technique in which attackers who have gained internal access explore the available data to work out what will cause the greatest business impact. Instead of exfiltrating everything, they selectively harvest high-value assets, such as financial records, PII, or credentials, that can be abused for leverage, extortion, or further compromise.
What makes this technique especially dangerous is its focus on detection evasion. By limiting the volume and frequency of access, attackers avoid triggering alerts tied to bulk data transfer, CPU spikes, or abnormal storage usage. They often operate through normal-looking service accounts, query internal APIs, or access systems that their compromised identity is already permitted to use, all while staying below traditional detection thresholds. The aim is to maximize impact while minimizing the chance of being caught. Detection therefore requires moving beyond volume-based triggers to contextual behavior analysis that flags access patterns that are technically allowed but behaviorally inconsistent.
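To make "technically allowed but behaviorally inconsistent" concrete, the following minimal sketch compares recent accesses against the set of resources each account touched during a baseline period. The account names, resource names, and the `(account, resource)` log shape are hypothetical, and a real deployment would need a richer baseline than a plain set.

```python
from collections import defaultdict

def build_baseline(history):
    """Map each account to the set of resources it touched during the baseline period."""
    baseline = defaultdict(set)
    for account, resource in history:
        baseline[account].add(resource)
    return baseline

def novel_accesses(baseline, recent):
    """Return (account, resource) pairs that were never seen in the baseline.

    Every access here is assumed to be permitted; only its novelty is judged.
    """
    return [(a, r) for a, r in recent if r not in baseline.get(a, set())]

# Hypothetical log entries: (account, resource)
history = [("svc-report", "sales_db.orders"), ("svc-report", "sales_db.invoices")]
recent = [("svc-report", "sales_db.orders"), ("svc-report", "hr_db.payroll")]

flagged = novel_accesses(build_baseline(history), recent)
print(flagged)  # [('svc-report', 'hr_db.payroll')]
```

The check ignores volume entirely, which is the point: a single read of an unfamiliar resource is flagged even though it would never trip a bulk-transfer alert.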
Examples in the Wild
Notable Internal Data Harvesting Attacks:
Equifax Data Breach (2017): After the initial compromise, attackers spent roughly 76 days systematically identifying and accessing high-value databases, ultimately exposing records on about 147 million people. Rather than immediately exfiltrating everything, they mapped internal systems to locate the most sensitive consumer data, including SSNs, birthdates, and addresses.
Capital One Data Breach (2019): The attacker (Paige Thompson) enumerated cloud storage resources and accessed specific S3 buckets containing customer data. The most valuable datasets she located included credit card application data for roughly 100 million people in the United States.
SolarWinds SUNBURST - Post-Compromise Data Harvesting: After the initial SolarWinds compromise, attackers selectively targeted the email servers and cloud environments of specific organizations. They avoided bulk data extraction and focused instead on high-value intelligence targets, including government agencies and security companies.
Attack Mechanism
Internal Data Harvesting Techniques:
- Selective Data Enumeration
    - Systematic database and file system exploration
    - Identifying high-value data repositories
    - Mapping data classification and sensitivity levels
    - Prioritizing targets based on business impact
- Stealthy Access Patterns
    - Using legitimate service accounts for data access
    - Mimicking normal user behavior and timing
    - Accessing data through authorized APIs and interfaces
    - Staying below detection thresholds
- Strategic Data Identification
    - Searching for financial records and payment data
    - Identifying personally identifiable information (PII)
    - Locating credentials and authentication tokens
    - Finding intellectual property and trade secrets
- Evasive Harvesting Methods
    - Limiting data volume per access session
    - Spacing out data access over time
    - Using normal business hours for access
    - Leveraging existing permissions and roles
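From the defender's side, the evasive methods above explain why a per-session volume alert can stay silent. The sketch below is illustrative only: the thresholds, the 30-day window, and the session records are all assumptions. It shows small, evenly spaced sessions that never cross a per-session limit but add up to a large cumulative total.

```python
from datetime import datetime, timedelta

PER_SESSION_LIMIT_MB = 500    # assumed bulk-transfer alert threshold
CUMULATIVE_LIMIT_MB = 2000    # assumed threshold for the whole look-back window
WINDOW = timedelta(days=30)   # assumed look-back window

def per_session_alerts(sessions):
    """Classic volume rule: flag any single session above the limit."""
    return [s for s in sessions if s["mb"] > PER_SESSION_LIMIT_MB]

def cumulative_alert(sessions, now):
    """Sum transfer volume over the look-back window and compare to the limit."""
    total = sum(s["mb"] for s in sessions if now - s["time"] <= WINDOW)
    return total > CUMULATIVE_LIMIT_MB, total

# Hypothetical account: one 120 MB session per day for 25 days
now = datetime(2024, 1, 31)
sessions = [{"time": now - timedelta(days=d), "mb": 120} for d in range(25)]

print(per_session_alerts(sessions))     # []
print(cumulative_alert(sessions, now))  # (True, 3000)
```

A long-window aggregate is only one possible counter; it trades faster detection for sensitivity to slow harvesting.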
Real-World Examples
# Equifax Internal Data Harvesting
attack_pattern:
  - phase: "Initial Access"
    method: "Apache Struts vulnerability exploitation"
  - phase: "Internal Harvesting"
    targets: ["ACIS database", "Dispute databases", "Legacy systems"]
    data_types: ["SSN", "birthdates", "addresses", "credit_data"]
    timeline: "76 days of undetected access"

# Capital One Cloud Data Harvesting
attack_pattern:
  - phase: "Initial Access"
    method: "SSRF via misconfigured WAF"
  - phase: "Internal Harvesting"
    targets: ["S3 buckets", "Customer databases"]
    data_types: ["credit_applications", "bank_routing_numbers"]
    approach: "Selective bucket enumeration"

# Strategic Harvesting Indicators
detection_patterns:
  - pattern: "Unusual data access patterns"
    indicators: ["Off-hours database queries", "Systematic table enumeration"]
  - pattern: "Privilege escalation for data access"
    indicators: ["Service account abuse", "Permission requests"]
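Two of the indicators listed above, off-hours database queries and systematic table enumeration, can be checked against a query log. The following is a rough sketch under stated assumptions: the event fields (`account`, `table`, `time`), the business-hours range, and the enumeration threshold are all hypothetical.

```python
from datetime import datetime

BUSINESS_HOURS = range(8, 19)   # assumed working hours, 08:00 to 18:59
ENUMERATION_THRESHOLD = 5       # assumed: distinct tables per account before flagging

def off_hours(events):
    """Return query events whose timestamp falls outside business hours."""
    return [e for e in events if e["time"].hour not in BUSINESS_HOURS]

def enumerating_accounts(events):
    """Return accounts that touched an unusually wide set of distinct tables."""
    tables = {}
    for e in events:
        tables.setdefault(e["account"], set()).add(e["table"])
    return sorted(a for a, t in tables.items() if len(t) >= ENUMERATION_THRESHOLD)

# Hypothetical query log: a service account walks six tables at 02:00,
# while a regular user queries one table during the day.
events = [
    {"account": "svc-backup", "table": f"customers_{i}", "time": datetime(2024, 3, 4, 2, i)}
    for i in range(6)
] + [
    {"account": "alice", "table": "orders", "time": datetime(2024, 3, 4, 10, 0)},
    {"account": "alice", "table": "orders", "time": datetime(2024, 3, 4, 10, 30)},
]

print(len(off_hours(events)))        # 6
print(enumerating_accounts(events))  # ['svc-backup']
```

An attacker who deliberately works during business hours defeats the first check, so the breadth check is the more durable of the two.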
Detection Challenges
Why Traditional Security Tools Fail:
- Legitimate Access Patterns
    - Attackers use authorized accounts and permissions
    - Data access appears normal to automated systems
    - No obvious indicators of malicious intent
    - Blends with regular business operations
- Volume-Based Detection Limitations
    - Traditional tools focus on bulk data transfer
    - Selective harvesting stays below alert thresholds
    - Gradual data access over time avoids detection
    - No immediate performance impact
- Context-Aware Analysis Requirements
    - Need to understand normal vs. abnormal patterns
    - Requires behavioral analysis beyond technical indicators
    - Must consider business context and data sensitivity
    - Challenging to implement without false positives
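One way to fold data sensitivity into the analysis and limit false positives is to weight each unfamiliar access by how sensitive the data is. This is a minimal sketch, not a tuned model: the weights, the alert score, and the `novel` flag (meaning "not seen in this account's baseline") are assumptions made for illustration.

```python
SENSITIVITY_WEIGHT = {"low": 1, "medium": 3, "high": 10}  # assumed weights
ALERT_SCORE = 15                                          # assumed alert cutoff

def risk_score(accesses):
    """Sum sensitivity weights over accesses that are new for the account."""
    return sum(SENSITIVITY_WEIGHT[a["sensitivity"]] for a in accesses if a["novel"])

# Hypothetical accounts: many unfamiliar low-sensitivity reads
# versus two unfamiliar high-sensitivity reads.
accesses = {
    "svc-report": [{"sensitivity": "low", "novel": True}] * 10,
    "svc-sync": [{"sensitivity": "high", "novel": True}] * 2,
}

alerts = sorted(acct for acct, items in accesses.items()
                if risk_score(items) >= ALERT_SCORE)
print(alerts)  # ['svc-sync']
```

Here ten unfamiliar reads of low-sensitivity data score 10 and stay quiet, while two unfamiliar reads of high-sensitivity data score 20 and alert.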
Required Detection Strategy:
# Behavioral analysis rules
- rule: "Unusual Data Access Pattern"
  condition: |
    user.access_pattern != historical_baseline AND
    accessed_data.sensitivity == "high" AND
    access_time.distribution == "systematic"
  severity: high

# Contextual analysis
- rule: "Privilege-Data Mismatch"
  condition: |
    user.normal_role != data_access_level OR
    user.department != data_owner_department OR
    access_justification == "missing"
  severity: critical
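The rule conditions above are pseudocode, so the sketch below shows one possible reading of them in Python. It assumes that a "systematic" time distribution means nearly constant gaps between accesses (measured by the coefficient of variation), that timestamps are seconds, and that the dictionary field names are hypothetical stand-ins for the rule's attributes.

```python
from statistics import mean, pstdev

def is_systematic(timestamps, max_cv=0.1):
    """Treat access as 'systematic' when gaps between events are nearly constant.

    max_cv is an assumed cutoff on the coefficient of variation of the gaps.
    """
    if len(timestamps) < 3:
        return False  # too few events to judge regularity
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(gaps)
    return average > 0 and pstdev(gaps) / average <= max_cv

def unusual_data_access(deviates_from_baseline, sensitivity, timestamps):
    """'Unusual Data Access Pattern': all three conditions must hold (AND)."""
    return deviates_from_baseline and sensitivity == "high" and is_systematic(timestamps)

def privilege_data_mismatch(access):
    """'Privilege-Data Mismatch': any single condition is enough (OR)."""
    return (
        access["user_role"] != access["data_access_level"]
        or access["user_department"] != access["data_owner_department"]
        or not access.get("justification")
    )

# Hypothetical inputs: hourly, machine-like reads versus irregular, human-like reads
scripted = [0, 3600, 7200, 10800, 14400]
irregular = [0, 50, 4000, 4100, 20000]
print(is_systematic(scripted), is_systematic(irregular))  # True False

request = {"user_role": "analyst", "data_access_level": "analyst",
           "user_department": "finance", "data_owner_department": "finance",
           "justification": ""}
print(privilege_data_mismatch(request))  # True
```

In the example, the role and department both match, yet the empty justification alone triggers the mismatch rule, mirroring the OR logic in the rule text.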