Intelligence data collection
There is no intelligence without data. Once the intelligence team has been planned and directed, the next step is to collect the data needed to fulfill the requirements assembled in the planning phase. Collecting from several different sources yields a richer pool of information and a more effective intelligence product. Intelligence data sources can be divided into internal and external sources (detailed in Chapter 7, Threat Intelligence Data Sources):
- Internal sources: Internal sources constitute, or should constitute, the foundation of the data. It is essential to understand the internal information first before turning to external sources. Internal data includes network element logs and records of past incident responses. The most common internal collection consists of intrusion analysis data, often organized using the Lockheed Martin Cyber Kill Chain, such as internal malware analysis data (one of the most valuable sources of threat intelligence), domain information, and TLS/SSL certificates.
- External sources: External sources are essential data collection points because they provide new visibility into threats. Those sources include external malware analysis and online sandbox tools, technical blogs and magazines, the dark web, and other useful sources such as open source and counterintelligence data. Malware zoos are also an essential part of external sources. By using an online sandbox system or a malware analysis tool, intelligence analysts can collect useful information about adversaries' signatures to enrich the intelligence database.
As we will see in Chapter 7, Threat Intelligence Data Sources, collected data is placed into lists of indicators of compromise (IOCs). Those indicators include, but are not limited to, domain information, IP addresses, TLS/SSL certificate information, file hashes, network scanning information, vulnerability assessment information, malware analysis results, packet inspection information, social media news (in raw format), email addresses, email senders, email links, and email attachments. The more relevant data that is collected, the richer the intelligence repository and the more effective the intelligence product.
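To make the idea of an IOC list concrete, the following is a minimal sketch of how raw values collected from internal and external sources might be normalized, typed, and deduplicated. The `Indicator` record, the `classify` and `build_ioc_list` helpers, the regular expressions, and the sample feed values are all assumptions made for illustration; they are not part of any particular threat intelligence platform, and a production system would handle many more indicator types.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical, deliberately simple patterns; real feeds need stricter parsing.
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
HEX_RE = re.compile(r"^[0-9a-f]+$")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DOMAIN_RE = re.compile(r"^(?=.{1,253}$)([a-z0-9-]{1,63}\.)+[a-z]{2,63}$")


@dataclass(frozen=True)
class Indicator:
    ioc_type: str  # "ip", "domain", "email", "md5", "sha1", or "sha256"
    value: str     # normalized (trimmed, lowercased) value
    source: str    # label of the feed that first reported it


def classify(raw: str) -> Optional[str]:
    """Guess the indicator type of a raw string, or None if unrecognized."""
    value = raw.strip().lower()
    try:
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        pass
    if HEX_RE.match(value) and len(value) in HASH_LENGTHS:
        return HASH_LENGTHS[len(value)]
    if EMAIL_RE.match(value):
        return "email"
    if DOMAIN_RE.match(value):
        return "domain"
    return None


def build_ioc_list(feeds: Dict[str, List[str]]) -> List[Indicator]:
    """Merge raw values from several sources into one deduplicated IOC list."""
    seen = set()
    indicators = []
    for source, raw_values in feeds.items():
        for raw in raw_values:
            ioc_type = classify(raw)
            if ioc_type is None:
                continue  # skip values that are not recognizable indicators
            value = raw.strip().lower()
            if (ioc_type, value) in seen:
                continue  # already reported by an earlier source
            seen.add((ioc_type, value))
            indicators.append(Indicator(ioc_type, value, source))
    return indicators


# Illustrative feeds using documentation-reserved addresses and domains.
feeds = {
    "internal": ["192.0.2.10", " Evil.Example.com ", "not an ioc"],
    "external": ["evil.example.com", "d41d8cd98f00b204e9800998ecf8427e",
                 "phish@example.org"],
}
iocs = build_ioc_list(feeds)
```

Because internal sources are processed first in this sketch, an indicator seen both internally and externally keeps its internal label, which mirrors the idea that internal data forms the foundation of the repository.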
Suppose an attacker sends an email to a person in the organization, who downloads and opens the attachment. A trojan is installed on the system and opens a communication channel to the adversary. The relevant data needs to be available to detect and react to such an incident. For example, the threat intelligence analyst can use network, domain, and protocol information to detect the trojan's traffic and stop it from communicating with the adversary or spreading further.
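The detection step in this scenario can be sketched as a lookup of outbound connection records against collected indicators. This is only an illustration under stated assumptions: the record fields (`src_host`, `domain`, `dest_ip`), the helper names `domain_matches` and `find_suspect_connections`, and the sample data are hypothetical, and real network logs would need parsing into this shape first.

```python
from typing import Dict, Iterable, List, Set


def domain_matches(queried: str, bad_domains: Set[str]) -> bool:
    """True if the queried name is a listed domain or one of its subdomains."""
    labels = queried.strip().lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c" against the indicator set.
    return any(".".join(labels[i:]) in bad_domains for i in range(len(labels)))


def find_suspect_connections(
    records: Iterable[Dict[str, str]],
    bad_domains: Set[str],
    bad_ips: Set[str],
) -> List[Dict[str, str]]:
    """Return the records whose destination matches a known indicator."""
    hits = []
    for rec in records:
        ip_hit = rec.get("dest_ip") in bad_ips
        domain_hit = domain_matches(rec.get("domain", ""), bad_domains)
        if ip_hit or domain_hit:
            hits.append(rec)
    return hits


# Hypothetical outbound connection records and indicator sets.
records = [
    {"src_host": "ws-17", "domain": "cdn.example.net", "dest_ip": "198.51.100.7"},
    {"src_host": "ws-42", "domain": "c2.evil.example.com", "dest_ip": "203.0.113.5"},
    {"src_host": "ws-08", "domain": "updates.example.org", "dest_ip": "192.0.2.10"},
]
suspects = find_suspect_connections(
    records,
    bad_domains={"evil.example.com"},
    bad_ips={"192.0.2.10"},
)
infected_hosts = [rec["src_host"] for rec in suspects]
```

Here the hosts `ws-42` and `ws-08` are flagged: the first by a subdomain of a listed domain, the second by a listed IP address. Without the domain and IP indicators in the repository, neither connection would stand out, which is why collecting the right data matters.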
Therefore, collecting the right data is critical. This links directly back to the first step, planning and direction: if the intelligence framework was poorly chosen, reacting to such a threat (adversary) will take considerable time and effort. When selecting a framework, a CTI analyst should therefore project the number of data sources they intend to integrate into the system and choose a platform that can accommodate big data.