Big Data and cybersecurity: pushing the SIEM envelope

If you’ve ever been the victim of a robbery (let’s hope not), you’ll know first-hand the feeling of anger and impotence that comes over you, not least because of your own lack of foresight or inability to prevent it. Last summer the gated development where I live was burgled. Under cover of night the burglars got into four or five lumber rooms and made off with some valuable objects. Their method was pretty homespun but nonetheless effective: a wad of folded paper wedged into the latch, so that the lumber-room security door could later be opened without forcing it. That simple. Simple to do and simple to counter. Whenever I enter the lumber-room area nowadays I make sure there is no wad of paper wedged in the latch. But this is a reactive response: we defend ourselves against the form of attack we have suffered in the past. As soon as the “evildoers” turn to another modus operandi, we are defenseless and will be robbed anew.

Security Information and Event Management (SIEM) systems are nowadays most companies’ stock response for safeguarding their systems and data, but they represent a similarly reactive approach. In other words, SIEMs limit themselves to detecting events that are already known to imply a security risk (is anyone browsing the deep web?). Going back to the analogy, a traditional SIEM will look out for wads of folded paper wedged in the latch because someone has warned that this jeopardizes security. Such rules are based on past experience (your own or others’) or on putting yourself in the attacker’s shoes (how would I breach the security of this environment?).

But there is a third way: an intelligent system would be capable of detecting that a door with a wad of folded paper is an aberration and hence poses a risk to security.

Can we anticipate new forms of attack without having to wait to suffer them (often with irreparable consequences)? Can we, in short, try to forestall cybercriminals’ attacks proactively? In the case of cybersecurity, thanks to Big Data and headway made in artificial intelligence, the answer is “yes”.

Big Data to the rescue

IT security systems generate a volume of information that, until fairly recently, could not be mined and used in real time. Much of this information was processed in forensic investigations to determine the sources and causes of an attack. Although it is true that SIEM manufacturers have been evolving and developing their products to come up with an increasingly sophisticated response to attacks, Big Data is going to mark a watershed moment in surveillance systems like SIEMs. And all thanks to the famous 3Vs of Big Data:

  • Volume: a great amount of data, sometimes only processable at network packet level.
  • Velocity: data has to be analyzed in real time, or at least with minimal latency.
  • Variety: different formats and data sources.

When aberration is bad

In our daily lives we hate monotony, always doing the same thing, drab uniformity. In the security world it’s just the opposite: problems usually crop up when something strays outside the norm, flouting the habitual, expected forms of behavior. And it is precisely here, in the analysis of patterns, trends and deviations, where predictive analytics comes into its own.

The combination of data analysis with Big Data, i.e., the abundance of data (volume), its heterogeneity (variety) and the need for a quick response (velocity), is bringing about a game-changing shift in traditional SIEMs.


Let’s assume for argument’s sake that we’re capable of automatically characterizing and classifying the behavior of company staff from their internet browsing habits, i.e., the sites they visit, the volume of data, their activity timetable, etc.

  • It stands to reason that web content and marketing personnel will clock up heavy traffic at particular moments (such as when uploading a promotional video to the web) and on given sites (social media, corporate website and blogs …)
  • The upload traffic generated by a software developer is usually more limited, and large volumes of data are unlikely to be uploaded.
  • A member of the security team might have to visit “unrecommendable” sites in their investigations, doing so from “authorized” computers and addresses.

It is impossible to watch over every user in an organization with thousands of employees, unless suspicion already hangs over some of them. But if we automatically classify users by their behavior in information systems and networks, on the basis of their activity during a given period, and if we then crosscheck this information against, for example, human resources databases (position in the firm, department, workplace, …) or participation in projects, we would be capable of ascertaining whether they are carrying out any “abnormal” network activity at any given time.
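The crosschecking idea can be illustrated with a minimal Python sketch. It is not how any particular SIEM does it: the user names, the department mapping, the upload figures and the threshold are all hypothetical, and the median-based deviation score is simply one easy way of comparing a user against their peer group.

```python
from statistics import median

# Hypothetical HR extract: user -> department (illustrative names only).
HR_DEPARTMENT = {
    "ana": "marketing", "luis": "marketing", "eva": "marketing",
    "raul": "development", "sara": "development",
    "ivan": "development", "marta": "development",
}

# Hypothetical daily upload volume per user, in megabytes.
UPLOAD_MB = {
    "ana": 900, "luis": 1100, "eva": 1000,
    "raul": 20, "sara": 25, "ivan": 30, "marta": 950,
}


def flag_abnormal_users(upload_mb, department_of, threshold=5.0):
    """Return users whose upload volume is far from their department's norm.

    The norm is the department median; the spread is the median absolute
    deviation (MAD). The threshold is an assumed tuning parameter.
    """
    by_department = {}
    for user, volume in upload_mb.items():
        by_department.setdefault(department_of[user], []).append(volume)

    flagged = []
    for user, volume in upload_mb.items():
        peers = by_department[department_of[user]]
        center = median(peers)
        mad = median([abs(v - center) for v in peers])
        scale = mad if mad > 0 else 1.0  # avoid dividing by zero
        if abs(volume - center) / scale > threshold:
            flagged.append(user)
    return sorted(flagged)


print(flag_abnormal_users(UPLOAD_MB, HR_DEPARTMENT))
```

In this made-up data, an upload volume that would be routine in marketing stands out only because HR says the user is a developer, which is exactly the value the crosscheck adds.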


Crosschecking information against these databases gives us further detection capabilities. There are known cases of user accounts being used when the holder no longer belongs to the firm, which immediately raises suspicions of fraudulent or malicious use.
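A check of this kind boils down to joining authentication events with HR records. The sketch below assumes a hypothetical HR table holding each account’s last working day and a hypothetical list of login events; real field names and formats will differ from one environment to another.

```python
from datetime import date

# Hypothetical HR records: account -> last working day (None = still employed).
HR_END_DATE = {
    "jlopez": None,
    "mgarcia": date(2016, 3, 31),
    "pruiz": None,
}

# Hypothetical authentication events: (account, day of login).
LOGIN_EVENTS = [
    ("jlopez", date(2016, 4, 4)),
    ("mgarcia", date(2016, 3, 30)),  # before leaving: legitimate
    ("mgarcia", date(2016, 4, 5)),   # after leaving: suspicious
]


def logins_after_departure(events, end_dates):
    """Return the login events that happened after the holder left the firm."""
    alerts = []
    for account, day in events:
        last_day = end_dates.get(account)
        if last_day is not None and day > last_day:
            alerts.append((account, day))
    return alerts


print(logins_after_departure(LOGIN_EVENTS, HR_END_DATE))
```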

It is important here to stress the word “automatically” since this is an intelligent algorithm that classifies groups and their membership without the need for human intervention.
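To give a flavor of what “automatically” can mean, here is a small k-means clustering sketch in Python. The article does not say which algorithm is actually used, so k-means is only a stand-in; the two features (share of upload traffic and share of social-media traffic, both scaled to 0..1) and the sample values are invented for illustration.

```python
from math import dist

# Hypothetical per-user features: (upload share, social-media share), 0..1.
PROFILES = {
    "ana": (0.90, 0.80),
    "luis": (0.95, 0.85),
    "raul": (0.05, 0.10),
    "sara": (0.10, 0.05),
    "ivan": (0.08, 0.12),
}


def kmeans(points, k, iterations=20):
    """Group points into k clusters and return one cluster label per point."""
    # Deterministic start: first point, then the point farthest from
    # the centroids already chosen.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda i: dist(p, centroids[i])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels


users = list(PROFILES)
labels = kmeans([PROFILES[u] for u in users], k=2)
print(dict(zip(users, labels)))
```

Nobody tells the algorithm which users belong together; the groups emerge from the data, and only afterwards are they compared with what is known about each user.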

It should also be pointed out here that the overriding interest is security. Browsing of online newspapers, sporting press, social networking sites or streaming services might pose a problem of work efficiency and proper dedication to the task in hand, and this might be worrying to departments like human resources. But the interest in tools of this type resides in protecting the company’s infrastructure and information security, forestalling any undermining of infrastructure and critical data. Although obviously they will have a certain deterrent effect too.

A New-Generation SIEM: SIEM_NG

Drawing on its wealth of cybersecurity experience and know-how in Big Data projects, GMV has developed the new-generation SIEM called SIEM_NG. This SIEM enables the company to forecast events that might jeopardize any organization’s security. Rather than excluding or replacing the current market SIEMs (HP’s ArcSight SIEM, QRadar, Splunk, LogRhythm, …) it complements them by adding on advanced analytical and predictive capacities.

SIEM_NG allows all the following:

  • Processing huge amounts of information from diverse and heterogeneous sources
  • Analyzing complex behavior and case studies well beyond the possibilities of a traditional SIEM
  • Swift configuration and adaptation to the environment, allowing the rapid creation of new rules or modification of existing ones.
  • A scalable solution based on Open Source technologies
  • Affordable implementation cost and a rapid learning curve, using as it does a flexible architecture based on standard technologies.


As the invisible gorilla experiment shows, if we concentrate only on the events we are primed to watch out for, we run the risk of missing unexpected events, and these might well turn out to be fatal for the security of our systems.

The advances in data processing technology (Big Data and artificial intelligence) give the SIEM à la carte analytical and predictive capacities to suit each particular company’s needs and environment. Features such as automatic user classification represent a quantitative and qualitative leap forwards in cybersecurity prevention and detection capabilities. But that is not all:

  • Detection of active users and services that should not be operative
  • Complex event processing (CEP) for detecting password-recovery fraud (to give only one example)
  • Prevention of data exfiltration
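
A CEP rule correlates several events over a time window rather than judging each one in isolation. As a purely illustrative Python sketch, assume (this pattern is an assumption, not something described above) that the fraud shows up as a single IP address requesting password recovery for many different accounts within a few minutes. The event layout, window length and limit are likewise hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def detect_recovery_bursts(events, window=timedelta(minutes=5), max_accounts=3):
    """Return IPs that request recovery for more than max_accounts accounts
    within the sliding window. Events are (timestamp, ip, account) tuples
    and must be sorted by timestamp."""
    recent = defaultdict(deque)  # ip -> recent (timestamp, account) pairs
    alerted = []
    for ts, ip, account in events:
        queue = recent[ip]
        queue.append((ts, account))
        # Drop requests that have fallen out of the time window.
        while ts - queue[0][0] > window:
            queue.popleft()
        distinct_accounts = {acc for _, acc in queue}
        if len(distinct_accounts) > max_accounts and ip not in alerted:
            alerted.append(ip)
    return alerted


base = datetime(2016, 5, 1, 9, 0)
EVENTS = [
    (base, "10.0.0.7", "a"),
    (base, "10.0.0.9", "x"),
    (base + timedelta(minutes=1), "10.0.0.7", "b"),
    (base + timedelta(minutes=2), "10.0.0.7", "c"),
    (base + timedelta(minutes=3), "10.0.0.7", "d"),
    (base + timedelta(minutes=30), "10.0.0.9", "y"),
]
print(detect_recovery_bursts(EVENTS))
```

A dedicated CEP engine would express the same rule declaratively and evaluate it over live streams; the sketch only shows the windowed correlation at its core.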

GMV’s in-house solution SIEM_NG is now proving to be enormously valuable wherever it is set up. It represents the crucial transition from a reactive to a proactive approach. Even so, despite the huge advance in event-detection capability that automation and artificial intelligence bring, any SIEM needs to be operated by skilled and experienced personnel. The generated alerts have to be analyzed to rule out false positives, and the results of that analysis have to be fed back into the system so that it can improve its forecasting capacity. As with any SIEM, it is not enough for the system to detect events and then just let them pile up. In cybersecurity matters, machine intelligence has to go hand in hand with the intelligence of the experts.

Author: Ángel Gavín Alarcón

The author’s views are entirely his own and may not reflect the views of GMV
