Mitigating cyberrisks with smart log management

By Ivan Kornoukhov and Henning Soller

During the COVID-19 pandemic, cyberattacks have become increasingly targeted. Hackers, sometimes posing as an acquaintance, authority figure, or crisis resource, manipulate people into sharing information. And since people are conducting more mobile-banking transactions, hackers have sought to gain access to user credentials to tap open-banking application programming interfaces (APIs). In all, security experts estimate that pandemic-related cyberattacks could create massive damage if left unchecked.

While companies already collect, aggregate, and correlate tens of gigabytes (if not terabytes) of logs daily, each of these areas (collection, processing, and correlation) has challenges that prevent organizations from fully employing log-level data to mitigate cyberrisks. However, new technology is flipping the script. Log APIs and data lakes are just some of the technologies that can help companies better use log data to identify suspect or fraudulent attempts to log in to apps or accounts. This method consists of aggregating the data, processing it so that it’s usable, and then analyzing it to uncover patterns that are out of the ordinary, helping companies to understand the vector of attacks and preemptively respond to impending threats. Employing smart log management means companies can tackle collection, processing, and correlation to boost their cyber defense and get more from their data.

Improve log collection

Log collection is often hindered when applications or devices are not allowed to gather or share logs or when certain systems don’t enable or manage logging. In addition, the volume of log data takes up a significant amount of server space. However, the API economy is shifting log collection into a routine, real-time business. Log APIs can simplify the IT landscape by eliminating the need for third-party log-collection tools, and they can improve performance by allowing the server to transfer log data directly to an underlying database instead of performing separate collections and transformations. Handling log calls in the API can also drastically reduce the volume of logs collected because users get only what they request. A call, for instance, might specify that the API return only sessions of remote users on a given weekend or only those above a certain network-traffic threshold.
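The kind of call described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real log API: the Session record, its fields, and the query_sessions function with its remote_only, since, until, and traffic_above parameters are hypothetical names invented for the example, and the sample data is made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Session:
    """Hypothetical session log record; the field names are assumptions."""
    user: str
    remote: bool
    start: datetime
    bytes_transferred: int


def query_sessions(
    sessions: Iterable[Session],
    *,
    remote_only: bool = False,
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
    traffic_above: Optional[int] = None,
) -> Iterator[Session]:
    """Yield only the sessions the caller asked for, filtering at the source."""
    for s in sessions:
        if remote_only and not s.remote:
            continue
        if since is not None and s.start < since:
            continue
        if until is not None and s.start >= until:
            continue
        if traffic_above is not None and s.bytes_transferred <= traffic_above:
            continue
        yield s


# Illustrative data only.
SESSIONS = [
    Session("ana", True, datetime(2021, 3, 6, 23, 15), 4_000_000),
    Session("ben", False, datetime(2021, 3, 7, 9, 0), 90_000_000),
    Session("cho", True, datetime(2021, 3, 9, 14, 30), 75_000_000),
]

# Remote users on the weekend of March 6-7, 2021 (end bound is exclusive).
weekend_remote = list(query_sessions(
    SESSIONS,
    remote_only=True,
    since=datetime(2021, 3, 6),
    until=datetime(2021, 3, 8),
))

# Any session above a 50 MB network-traffic threshold.
heavy = list(query_sessions(SESSIONS, traffic_above=50_000_000))
```

Because the filter runs where the logs live, only the matching records (one session in the first query, two in the second) would need to cross the network or be stored downstream.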

APIs also enable data streaming. While log data streaming is not new (it’s been around since the 1980s), today’s platforms and solutions can send actionable alerts in real time rather than only after collecting and aggregating hundreds of data sources. Solutions resembling security information and event management (SIEM) platforms have also emerged; they can receive data in real time and correlate it on the spot, either using a rule-based approach or looking for undefined anomalies.
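To make the rule-based variant of on-the-spot correlation concrete, here is a minimal Python sketch. It assumes events arrive in time order and applies one illustrative rule: alert when a user accumulates several failed logins within a short window. The Event record, the rule itself, and the max_failures and window_s parameters are assumptions made for the example, not features of any particular SIEM product.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """Hypothetical streamed log event; the field names are assumptions."""
    ts: float      # seconds since an arbitrary start
    user: str
    action: str    # e.g. "login_failed" or "login_ok"


def failed_login_alerts(
    events: Iterable[Event],
    max_failures: int = 3,
    window_s: float = 60.0,
) -> Iterator[str]:
    """Emit an alert as soon as a user reaches max_failures within window_s."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    for e in events:
        if e.action != "login_failed":
            continue
        q = recent[e.user]
        q.append(e.ts)
        # Drop failures that have fallen out of the sliding window.
        while q and e.ts - q[0] > window_s:
            q.popleft()
        if len(q) >= max_failures:
            yield f"{e.user}: {len(q)} failed logins within {window_s:.0f}s"
            q.clear()  # start counting afresh after alerting


# Illustrative stream only; events are processed one at a time, in order.
STREAM = [
    Event(0, "alice", "login_failed"),
    Event(5, "bob", "login_failed"),
    Event(10, "alice", "login_failed"),
    Event(20, "alice", "login_failed"),
    Event(100, "bob", "login_failed"),
    Event(200, "bob", "login_failed"),
]

alerts = list(failed_login_alerts(STREAM))
```

The generator keeps only a small per-user window in memory, so an alert can be raised the moment the third failure arrives instead of after a batch collection cycle.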

Enhance log processing

Log processing has long been difficult for companies to do well. Raw log data is massive and rapidly growing, demanding extensive processing capacity and large budgets for log-management platforms. And because the logs are unstructured text data, attempting to unify the format for usability has been another challenge. A 2019 McKinsey survey on cyber found that 80 to 98 percent of logs are simply noise—which makes it tough to balance the requirements for storing relevant logs while not being overwhelmed with nonrelevant data.

However, today companies can improve their log-processing power with data lakes. These systems have no practical data limits and can store compressed logs, which means organizations have even greater storage capacity. Companies can also deploy simple apps to cleanse data from each of the log sources as it arrives in the lake, for example by removing records with a failed HTTP status or duplicate records. Streaming data to a cleaning app first could also significantly reduce licensing and/or processing-capacity costs and facilitate data mining for patterns and insights.
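A cleansing app of this kind can be very small. The Python sketch below is an illustration under stated assumptions: records are treated as dictionaries with a status field, "failed" is taken to mean an HTTP status of 400 or above, and duplicates are records whose fields are all identical. The cleanse and compress functions and the sample records are hypothetical.

```python
import gzip
import json
from typing import Dict, Iterable, Iterator, List


def cleanse(records: Iterable[Dict]) -> Iterator[Dict]:
    """Drop records with a failed HTTP status and exact duplicates."""
    seen = set()
    for rec in records:
        # Assumption: any status of 400 or above counts as failed.
        if rec.get("status", 0) >= 400:
            continue
        key = json.dumps(rec, sort_keys=True)  # canonical form for comparison
        if key in seen:
            continue
        seen.add(key)
        yield rec


def compress(records: Iterable[Dict]) -> bytes:
    """Serialize records as JSON lines and gzip them for storage in the lake."""
    payload = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    return gzip.compress(payload.encode("utf-8"))


# Illustrative data only.
RAW: List[Dict] = [
    {"ts": "2021-03-06T10:00:00Z", "path": "/login", "status": 200},
    {"ts": "2021-03-06T10:00:00Z", "path": "/login", "status": 200},   # duplicate
    {"ts": "2021-03-06T10:00:05Z", "path": "/report", "status": 500},  # failed
    {"ts": "2021-03-06T10:00:09Z", "path": "/account", "status": 200},
]

cleaned = list(cleanse(RAW))
blob = compress(cleaned)
```

In this sample, two of the four raw records survive cleansing, and the compressed blob can be restored with gzip.decompress when forensics or data-science teams need the detail.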

One company, for instance, increased its storage capacity, streamlined its storage process, and cut subscription costs by investing in data lakes. The organization cut endpoints (user computers), reduced the number of logs pulled from software-as-a-service (SaaS) platforms, and decreased network log volume by 95 percent in the SIEM platform. Ultimately, by leaving hundreds of terabytes’ worth of logs generated each year compressed in the data lake, the organization saved hundreds of thousands of dollars in SIEM subscription costs on an annual basis. Storing the logs in this fashion also made it easier for forensics and data-science teams to access the data because of the reduced volume that needed to be processed over the network. This solution provides the best of both worlds: gaining enterprise insights faster at lower cost while storing the extensive log data for more advanced pattern-recognition exercises.

Boost log correlation and analysis capabilities

Data collection and processing are the easier challenges for companies to tackle; combing through the data to find actionable insights is harder. Most out-of-the-box tools cannot autonomously build systems that identify patterns and anomalies and clearly outline the attacks’ signatures.

However, data science and threat-intelligence databases present opportunities to boost these capabilities. A threat-intelligence database can feed a company’s SIEM platform with patterns for recognizing potential issues in the logs collected. When the system finds a log that matches a pattern, it can alert the information-security personnel and trigger the appropriate automated response.

Those databases, containing hashes (“fingerprints” to help identify malware), blacklisted IP addresses, URLs, and other data, can come from free or open-source feeds or from a paid threat-intelligence platform. In our experience, a threat-intelligence platform that includes commercial APIs supplying curated intelligence data (often inclusive of public logs) is a better enterprise solution. The collected data sets provide the basis for a pattern search for similar events within the existing logs. Therefore, the implementation of the required correlation analysis, known as security orchestration, automation, and response (SOAR), becomes more targeted.
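As a rough illustration of how such indicators can drive a pattern search, the Python sketch below checks each log entry against a small in-memory feed of file hashes, IP addresses, and URLs. Everything here is an assumption made for the example: the ThreatFeed structure, the log field names (file_sha256, src_ip, url), and the sample indicators, which use reserved documentation addresses and a hash computed from placeholder bytes instead of real threat data.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class ThreatFeed:
    """Hypothetical indicator sets, as a feed or platform might supply them."""
    file_hashes: FrozenSet[str]
    ip_addresses: FrozenSet[str]
    urls: FrozenSet[str]


def match_indicators(entry: Dict[str, str], feed: ThreatFeed) -> List[str]:
    """Return the indicator types that this log entry matches."""
    hits = []
    if entry.get("file_sha256") in feed.file_hashes:
        hits.append("file_hash")
    if entry.get("src_ip") in feed.ip_addresses:
        hits.append("ip_address")
    if entry.get("url") in feed.urls:
        hits.append("url")
    return hits


def correlate(entries: Iterable[Dict[str, str]], feed: ThreatFeed) -> List[Dict]:
    """Collect an alert for every entry that matches at least one indicator."""
    alerts = []
    for entry in entries:
        hits = match_indicators(entry, feed)
        if hits:
            alerts.append({"entry": entry, "matched": hits})
    return alerts


# Placeholder "fingerprint": the hash of stand-in bytes, not real malware.
SAMPLE_HASH = hashlib.sha256(b"stand-in for a malware sample").hexdigest()

FEED = ThreatFeed(
    file_hashes=frozenset({SAMPLE_HASH}),
    ip_addresses=frozenset({"203.0.113.7"}),
    urls=frozenset({"http://malicious.example/payload"}),
)

LOGS = [
    {"src_ip": "198.51.100.20", "url": "http://intranet.example/home"},
    {"src_ip": "203.0.113.7", "url": "http://malicious.example/payload"},
    {"src_ip": "198.51.100.21", "file_sha256": SAMPLE_HASH},
]

alerts = correlate(LOGS, FEED)
```

In a production setting, each alert would be handed to the security team or to an automated response step; here the result is simply a list of matched entries, which keeps the sketch self-contained.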

With cyberattacks on the rise and work-from-home arrangements creating new vulnerabilities, now is the time for companies to invest in smart log management. And because more advanced technologies such as APIs, SIEM systems, and threat-intelligence platforms are available, it is easier to get started. The return on more secure systems (risk mitigation, cost savings, and reputation management) makes smart log management a worthy initiative.

Ivan Kornoukhov is an alumnus of McKinsey’s Moscow office, and Henning Soller is a partner in the Frankfurt office.