Internal Data Harvesting

Info

Tactic: Expanding Control

Internal data harvesting is a post-compromise technique where attackers, after gaining internal access, strategically explore available data to identify what will cause the greatest business impact. Instead of exfiltrating everything, they selectively identify and harvest high-value assets, such as financial records, PII, or credentials, that can be abused for leverage, extortion, or further compromise.

What makes this technique especially dangerous is its focus on evading detection. By limiting the volume and frequency of access, attackers avoid triggering alerts tied to bulk data transfer, CPU spikes, or abnormal storage usage. They often operate through normal-looking service accounts, query internal APIs, or access systems already permitted by their compromised identity, all while staying below traditional detection thresholds. The goal is to maximize impact while minimizing the chance of being caught. Detection therefore requires moving beyond volume-based triggers to contextual behavior analysis that flags access patterns that are technically allowed but behaviorally inconsistent.
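To make the threshold problem concrete, here is a minimal sketch. The per-session alert limit and the session sizes are made-up values for illustration, not figures from any real product or incident. It shows a per-session volume rule staying silent while the cumulative total climbs well past the same limit.

```python
from itertools import accumulate

# Hypothetical value chosen for illustration only.
PER_SESSION_LIMIT = 10_000  # records per session that would trigger a volume alert


def volume_alerts(session_sizes, limit=PER_SESSION_LIMIT):
    """Return the indexes of sessions that exceed the per-session limit."""
    return [i for i, size in enumerate(session_sizes) if size > limit]


def cumulative_totals(session_sizes):
    """Return the running total of records read across sessions."""
    return list(accumulate(session_sizes))


# Thirty small sessions, each far below the alert limit.
sessions = [800] * 30
alerts = volume_alerts(sessions)
total = cumulative_totals(sessions)[-1]
print(alerts, total)
```

No single session trips the rule, yet the account has read more than twice the limit in aggregate, which is why volume-only triggers miss this pattern.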

Examples in the Wild

Notable Internal Data Harvesting Attacks:

Equifax Data Breach (2017): The Equifax breach demonstrated sophisticated internal data harvesting. After the initial compromise, attackers spent about 76 days systematically identifying and accessing high-value databases holding records on roughly 147 million people. Rather than immediately exfiltrating everything, they mapped internal systems to locate the most sensitive consumer data, including SSNs, birthdates, and addresses.

Capital One Data Breach (2019): The Capital One breach showcased selective data harvesting in the cloud. The attacker (Paige Thompson) enumerated cloud storage resources to identify and access specific S3 buckets holding the most valuable datasets, including credit card application data for roughly 100 million people.

SolarWinds SUNBURST - Post-Compromise Data Harvesting: Following the initial SolarWinds compromise, attackers demonstrated sophisticated internal data harvesting by selectively targeting specific organizations' email servers and cloud environments. They avoided bulk data extraction, instead focusing on high-value intelligence targets including government agencies and security companies.

Attack Mechanism

Internal Data Harvesting Techniques:

  1. Selective Data Enumeration
     - Systematic database and file system exploration
     - Identifying high-value data repositories
     - Mapping data classification and sensitivity levels
     - Prioritizing targets based on business impact

  2. Stealthy Access Patterns
     - Using legitimate service accounts for data access
     - Mimicking normal user behavior and timing
     - Accessing data through authorized APIs and interfaces
     - Staying below detection thresholds

  3. Strategic Data Identification
     - Searching for financial records and payment data
     - Identifying personally identifiable information (PII)
     - Locating credentials and authentication tokens
     - Finding intellectual property and trade secrets

  4. Evasive Harvesting Methods
     - Limiting data volume per access session
     - Spacing out data access over time
     - Using normal business hours for access
     - Leveraging existing permissions and roles
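
Defenders can get ahead of the enumeration and data identification steps above by running the same inventory first, so they know which repositories an attacker would prioritize. The following is a minimal sketch under stated assumptions: the regex patterns, category names, and example schema are all hypothetical, and a real inventory would follow the organization's own classification policy.

```python
import re

# Hypothetical name patterns; illustrative only, not an exhaustive classifier.
SENSITIVITY_PATTERNS = {
    "pii": re.compile(r"ssn|social_security|birth|dob|address|passport", re.I),
    "financial": re.compile(r"card|iban|routing|account_number|salary|invoice", re.I),
    "credentials": re.compile(r"password|passwd|secret|token|api_key|private_key", re.I),
}


def classify_column(name):
    """Return the sorted sensitivity categories whose pattern matches the column name."""
    return sorted(cat for cat, pat in SENSITIVITY_PATTERNS.items() if pat.search(name))


def rank_tables(schema):
    """Rank tables by how many sensitive columns they contain, highest first."""
    scores = {
        table: sum(1 for col in cols if classify_column(col))
        for table, cols in schema.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical schema used only to exercise the functions.
schema = {
    "customers": ["id", "ssn", "date_of_birth", "home_address"],
    "sessions": ["id", "auth_token", "created_at"],
    "products": ["id", "title", "price"],
}
ranking = rank_tables(schema)
print(ranking)
```

Name matching is crude and will miss poorly named columns, but even a rough ranking tells the defender where tighter monitoring should go first.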

Real-World Examples

# Equifax Internal Data Harvesting
# Keys are named per incident so the two patterns can live in one YAML document
# without a duplicate top-level key.
equifax_attack_pattern:
  - phase: "Initial Access"
    method: "Apache Struts vulnerability exploitation"
  - phase: "Internal Harvesting"
    targets: ["ACIS database", "Dispute databases", "Legacy systems"]
    data_types: ["SSN", "birthdates", "addresses", "credit_data"]
    timeline: "76 days of undetected access"

# Capital One Cloud Data Harvesting
capital_one_attack_pattern:
  - phase: "Initial Access"
    method: "SSRF via misconfigured WAF"
  - phase: "Internal Harvesting"
    targets: ["S3 buckets", "Customer databases"]
    data_types: ["credit_applications", "bank_routing_numbers"]
    approach: "Selective bucket enumeration"

# Strategic Harvesting Indicators
detection_patterns:
  - pattern: "Unusual data access patterns"
    indicators: ["Off-hours database queries", "Systematic table enumeration"]
  - pattern: "Privilege escalation for data access"
    indicators: ["Service account abuse", "Permission requests"]
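
The "Systematic table enumeration" indicator above can be approximated by counting how many tables an account reads that fall outside its usual set. This is a minimal sketch, assuming a query log already reduced to `(account, table)` pairs and a precomputed per-account baseline. The function names, account names, and the cutoff of 3 new tables are hypothetical.

```python
from collections import defaultdict


def new_table_counts(events, baseline):
    """Count, per account, the distinct tables accessed that are absent from its baseline.

    events: iterable of (account, table) tuples from a query log.
    baseline: dict mapping account -> set of tables it normally reads.
    """
    seen = defaultdict(set)
    for account, table in events:
        if table not in baseline.get(account, set()):
            seen[account].add(table)
    return {account: len(tables) for account, tables in seen.items()}


def flag_enumeration(events, baseline, max_new_tables=3):
    """Return sorted accounts touching more than max_new_tables unfamiliar tables."""
    counts = new_table_counts(events, baseline)
    return sorted(a for a, n in counts.items() if n > max_new_tables)


# Hypothetical baseline and log entries.
baseline = {"svc-report": {"orders", "invoices"}, "svc-backup": {"orders"}}
events = [
    ("svc-report", "orders"),
    ("svc-report", "users"),
    ("svc-report", "payments"),
    ("svc-report", "credentials"),
    ("svc-report", "disputes"),
    ("svc-backup", "orders"),
    ("svc-backup", "users"),
]
flagged = flag_enumeration(events, baseline)
print(flagged)
```

The check ignores volume entirely, so it can fire on an account that reads only a few rows from each unfamiliar table.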

Detection Challenges

Why Traditional Security Tools Fail:

  1. Legitimate Access Patterns
     - Attackers use authorized accounts and permissions
     - Data access appears normal to automated systems
     - No obvious indicators of malicious intent
     - Blends with regular business operations

  2. Volume-Based Detection Limitations
     - Traditional tools focus on bulk data transfer
     - Selective harvesting stays below alert thresholds
     - Gradual data access over time avoids detection
     - No immediate performance impact

  3. Context-Aware Analysis Requirements
     - Need to understand normal vs. abnormal patterns
     - Requires behavioral analysis beyond technical indicators
     - Must consider business context and data sensitivity
     - Challenging to implement without false positives

Required Detection Strategy:

# Behavioral analysis rules
- rule: "Unusual Data Access Pattern"
  condition: |
    user.access_pattern != historical_baseline AND
    accessed_data.sensitivity == "high" AND
    access_time.distribution == "systematic"
  severity: high

# Contextual analysis
- rule: "Privilege-Data Mismatch"
  condition: |
    user.normal_role != data_access_level OR
    user.department != data_owner_department OR
    access_justification == "missing"
  severity: critical
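
The two rules above are written as pseudocode. One way they might be evaluated is sketched below. The sketch makes several assumptions that are not in the rules themselves: "systematic" timing is interpreted as nearly uniform gaps between accesses (coefficient of variation at or below an illustrative 0.1), roles and data are compared as numeric levels, and every function and parameter name is hypothetical.

```python
from statistics import mean, pstdev


def is_systematic(access_times, max_cv=0.1):
    """Treat access as 'systematic' when gaps between accesses are nearly uniform.

    Uses the coefficient of variation (std dev / mean) of inter-access gaps.
    Fewer than three gaps is treated as too little evidence.
    """
    gaps = [b - a for a, b in zip(access_times, access_times[1:])]
    if len(gaps) < 3 or mean(gaps) <= 0:
        return False
    return pstdev(gaps) / mean(gaps) <= max_cv


def unusual_data_access(deviates_from_baseline, sensitivity, access_times):
    """Mirror the 'Unusual Data Access Pattern' rule: all three conditions must hold."""
    return deviates_from_baseline and sensitivity == "high" and is_systematic(access_times)


def privilege_data_mismatch(role_level, data_level, department, owner_department, justification):
    """Mirror the 'Privilege-Data Mismatch' rule: any one condition is enough."""
    return (
        role_level != data_level
        or department != owner_department
        or not justification
    )


# Hypothetical timestamps in seconds: one access exactly every hour.
hourly = [0, 3600, 7200, 10800, 14400]
# Hypothetical irregular, human-looking timestamps.
irregular = [0, 50, 4000, 4100, 20000]

print(unusual_data_access(True, "high", hourly))
print(unusual_data_access(True, "high", irregular))
print(privilege_data_mismatch(2, 2, "marketing", "finance", "TICKET-1"))
```

Because the mismatch rule joins its conditions with OR, it will fire often in practice. Tuning which conditions are required together is one way to manage the false positives noted earlier.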