
Klemens Arro: Are data leaks our new reality?

Photo: cloudtech-klemens-arro

We have seen a lot of different leaks this year, including a few that are connected to Estonian companies. Is this a new reality that we should just accept or simply a temporary problem that we should patiently wait to pass?

I would say it is a new reality, one that we are living in already. Data leaks won’t disappear, just like software problems and the increasingly common cyberattacks won’t disappear.

Leaks are not actually a new thing, but people have a much better understanding of the value of data nowadays. Additionally, new methods have been developed for discovering leaks and there has been much more media coverage of such discoveries. There is an upside to this: thanks to the growing number of publications about data leaks, general awareness of these issues among customers and companies has been increasing as well. In turn, this helps online environments become more secure.

If you think about the leaks that have been discovered in the past few months, would you say that they all share a common characteristic, something that connects them all?

If we are talking about the leaks that took place in Estonia in the last few months, then they definitely share a pattern. But if we are talking about leaks in a broader, more international sense, then there is no common pattern there.

The three most common causes behind leaks are accidental human errors, processes for handling data that are lacking in security, and problems with infrastructure management. And generally, these issues stem from two root causes: cost savings and ignorance.

At the same time, if we look at how much data volumes and their worth have changed, then the old saying “I am not so rich as to afford buying cheap things” should be clear to everyone. In other words, skimping on important systems that handle your data will not be cheaper in the end. If you think about the financial repercussions of new laws, then it becomes immediately clear that it is a lot cheaper to invest from the get-go in building a proper system and the processes needed to manage it.


Artificial intelligence can be found in more and more areas and fields. Everything is becoming more automated and losing the human element. So why has AI been unable to make information systems more secure?

I would claim the opposite. It has already done so in multiple areas and systems. But we must keep in mind that AI is not a magical solution for everything and similarly, information leaks and other cybersecurity issues are not all the same, despite all of them falling under the same category.

Today’s AI is used for resolving specific issues in cybersecurity and data handling, but it is currently impossible to create an AI that could resolve all the cybersecurity issues in the world.

But we could look to sci-fi to see what the future could bring, and a future AI might be able to do all of that. Since AI is increasingly being used in the creation of systems, it could, in theory, set up mechanisms that learn how to protect themselves from the ground up.

Google, for example, has been using a special AI for a while now – one that can create other AIs. However, so far, it has only been used in lab experiments, but they hope to implement it on a wider scale soon.

In many ways, AI is actually already protecting our data, but it is doing so in very specific cases and so, the human factor is still very important here.


Let us look at the example of the phishing attacks that started in spring, the ones targeting both banks and Mobile-ID. We can clearly see that the end-user bears a big responsibility in these cases – if the user themselves presses a wrong button and types in their PIN code somewhere where they should not, then the service provider cannot really help protect the user from themselves. Is the situation the same for large information systems or is the bulk of the responsibility in these cases on the company who actually handles and manages the data?

I believe this is another important thing that the average user should be taught, because it really is difficult to protect someone who just gives an attacker everything they need. To make matters worse, the average person just shrugs these things off and thinks that they have nothing to hide anyway. Whether someone has something to hide or not has not been the issue for years. The problem lies in what the attacker could do after they manage to successfully gain access to someone’s e-mail as a result of a direct attack. And even more, that person could then be used as a springboard for getting access to people and systems that could be used to cause even more problems.

But phishing has also become quite clever – even long-time professionals sometimes need a moment to figure out that what they are looking at really is phishing. This is especially true for direct attacks.

In the case of larger information systems, everything gets even more complicated, because the company’s various partners will also have access to the data, which is being handled by these partners on very different levels.

The newer, so-called cloud-native and security-first architecture models try to lower the risks involved in multiple parties having access to data by building access handling at the level of individual data objects deep into the architecture.

An oversimplification of the most common process is this: one person is responsible for the physical safety of the servers; the second is responsible for the security of the servers on the levels of the operating system, the software, and the network traffic; another is responsible for the software that has access to the user database. And there is also a separate layer of managing and handling rights and accesses to data objects. In newer architectural models, handling data object rights is actually done on an infrastructure level, regardless of how many partners are involved in the various system layers.

Of course, the physical safety of servers will always remain a priority. The risks involved are quite low in the case of properly set up cloud services, since the data is encrypted before it is saved and is divided into small parts across different servers and hard drives. In these newer models, the software layer does not manage rights and access to data objects itself, which means that a software malfunction cannot result in someone gaining access to the whole database – access is only given to the data objects concerning the current authorised session.
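To make the idea of access being limited to "the data objects concerning the current authorised session" more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `Session`, `DataObject` and `ObjectStore` are hypothetical names invented for this example, ownership is reduced to a single `owner_id` field, and a real platform would enforce the same check in its infrastructure rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """An authorised session (hypothetical); it carries only the user's identity."""
    user_id: str


@dataclass(frozen=True)
class DataObject:
    """A stored record together with the identity of its owner."""
    owner_id: str
    payload: str


class ObjectStore:
    """Hypothetical storage layer that checks access for every single object."""

    def __init__(self):
        self._objects = {}

    def put(self, object_id, obj):
        self._objects[object_id] = obj

    def get(self, session, object_id):
        obj = self._objects.get(object_id)
        # Deny by default: a missing object and someone else's object
        # produce the same error, so nothing leaks through the response.
        if obj is None or obj.owner_id != session.user_id:
            raise PermissionError("no access to this data object")
        return obj

    def list_for(self, session):
        # There is deliberately no method that returns everything; the only
        # listing available is scoped to the session's own objects.
        return {
            object_id: obj
            for object_id, obj in self._objects.items()
            if obj.owner_id == session.user_id
        }


# Made-up example data.
store = ObjectStore()
store.put("order-1", DataObject(owner_id="alice", payload="fuel, 40 l"))
store.put("order-2", DataObject(owner_id="bob", payload="coffee"))
alice = Session(user_id="alice")
```

With this layout, `store.get(alice, "order-1")` returns Alice's record, while `store.get(alice, "order-2")` raises `PermissionError`. A bug in the calling software can therefore expose, at most, the objects belonging to the session that triggered it.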


Legally speaking, how should these responsibilities be resolved?

According to law, at least right now, the responsibility is shared among everyone. This is another oversimplification, but everyone who has access to the data is responsible for the data that gets moved around via their accesses. This goes for everyone, from the manager of the physical server all the way to the end-user.


Wouldn’t it be easier to buy in data handling services?

If a company lacks a strong and clear understanding of data handling, then it’s definitely worth it to buy this service in or at the very least, to hire an external specialist who can help the company come up with and establish the solutions they need to better protect their data. They can provide this help in all cases, whether during the creation of new systems and processes, changing pre-existing ones, or while pre-existing data records are being mapped out.

But buying in a service doesn’t mean that the company is suddenly released from its obligations. Outsourcing just means that there is an additional responsible party who should also contribute to lowering any possible risks.


How can the average user protect their own data? For example, if they are a customer of a large gas station chain and use the company’s client card when they use its services, the transaction will leave a mark in any case and the customer has no way of being actively involved in what happens to their data.

Unfortunately, in these types of cases, the customer can’t do much to protect their data. If someone is concerned about how their data is being used, then the law gives them the right to contact the organisation who is handling their data and to ask for copies of all the data regarding them specifically. They also have the right to ask the organisation to delete all data about them from their systems.


Does this kind of data (for example, customer history) even have a significant importance? Does it make a difference if something like this gets leaked or not?

I believe most of us think that this kind of data isn’t particularly important, because most of us engage in activities such as buying fuel or a coffee and hot dog. But if this type of data is added up over time, it’s surprising just how much information you can glean about a person. This is the same for any kind of data that depicts our habits as users.

Since data can gain new value when combined with new data and more context, any kind of data leak is a problem.

Of course, there are also a few specific cases where just one line of leaked data could cause huge issues.


If a company is just starting to think about how to protect their customers’ data better, what is the very first thing they should do?

The first thing that needs to be done is to clarify what kind of data the company handles and in what quantities. This involves all data, from the largest data records to the oldest marketing campaigns, which could still have connected databases lying around somewhere on older servers. Then the data needs to be categorised: everything that is unnecessary or illegal to keep must be deleted, and all the necessary data must be divided into groups based on how sensitive it is. After that, each group must be looked at individually and the best possible solution for protecting it should be implemented. Often, this means separating data from their personalised connections, which removes the need to acquire new hardware or software. If, for example, a database with some kinds of transactions gets leaked, then at least none of them are connected to any personalised data.
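Separating data from their personalised connections can be sketched roughly as follows. This is one possible approach, not necessarily the one used in practice by the interviewee: it replaces the customer identifier with a keyed hash (HMAC-SHA256 from Python's standard library) and keeps the identifying details in a separate table. The field names, the example records and the `SECRET_KEY` value are all made up for illustration; a real key would live in a secrets manager, away from the transaction data.

```python
import hashlib
import hmac


def pseudonymise(customer_id: str, secret_key: bytes) -> str:
    """Replace a customer identifier with a keyed token that cannot be
    recomputed without the secret key."""
    return hmac.new(secret_key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


def split_records(records, secret_key):
    """Split records into a transaction table without personal details
    and a separate identity table keyed by the pseudonym."""
    transactions = []
    identities = {}
    for record in records:
        token = pseudonymise(record["customer_id"], secret_key)
        identities[token] = {
            "customer_id": record["customer_id"],
            "name": record["name"],
        }
        transactions.append(
            {"customer": token, "item": record["item"], "amount": record["amount"]}
        )
    return transactions, identities


# Made-up example data and key (assumptions for illustration only).
SECRET_KEY = b"example-key-keep-the-real-one-in-a-secrets-manager"
records = [
    {"customer_id": "C-1001", "name": "Jane Example", "item": "fuel", "amount": 52.30},
    {"customer_id": "C-1001", "name": "Jane Example", "item": "coffee", "amount": 2.50},
]
transactions, identities = split_records(records, SECRET_KEY)
```

If only `transactions` leaks, the rows can still be grouped per customer but carry no name or customer number. The identity table and the key need to be stored and protected separately, because as long as they exist the transactions can be linked back to a person and still count as personal data under the GDPR.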


Who should be tasked with taking care of this?

Data handling is a strategic decision and it should be made on a board level. But even the CIO or the Head of IT can’t escape having to deal with it. Ideally, the whole board, the CIO and the Head of IT should all be involved in data handling questions.


What is the most secure type of data storage? Would you say clouds, a personal server, a hybrid of the two, or something completely different?

On the one hand, it depends on the particular situation, and on the other, on legal regulations, which may impose some limitations.

For example, the average organisation is usually unable to invest as much money into protecting their data as a large cloud service provider could. But it’s also important to keep in mind that at the end of the day, even the most secure and the best system is only as secure as it’s been set up to be. And further, its security depends on how it is used. That’s why it is cheaper to hire a specialist in the field right away, since they will immediately be able to spot any possible risks, decide how to best lower those risks, and make the necessary adjustments.


How do you recognise a good data storage solution? How do you pick one?

It’s very difficult to pinpoint one or two aspects of what makes a system truly secure.

For example, we have noticed that some SaaS providers claim their service is secure because it uses an encrypted connection (HTTPS) and its servers are hosted on Amazon Web Services (AWS). Although both are good things to have in general, they don’t really say anything about actual security. HTTPS should be a staple of any web service, and using AWS doesn’t automatically mean the service has been implemented securely.

On the other end of the spectrum are SaaS services whose security policy is so detailed that you get a very clear overview of their technical security solutions, processes etc.

The best advice I can give is to do your research – get better acquainted with the company offering the service, their history, their origin (for example, it is better to avoid companies from countries that have a history of not respecting human rights), learn more about their security and privacy policies, and finally, find out whether they are audited, by whom, and how.


The data storage locations of big global cloud service providers are either kept a secret or, if it is known where they are located, then it is never in Estonia. How important is the location of the data?

If the data is located within the European Union, which has very strong data protection regulations, then the exact location isn’t that important for the average company. And thanks to the General Data Protection Regulation (GDPR), companies located outside the EU must also follow the requirements set out in this regulation if they handle the personal data of people in the EU. Even so, it is much more likely for a data centre in the EU to follow the legislation here than for a US service provider to do so, as they might not have even heard of the GDPR.

And there are situations where our law requires the data to be kept within the territory of the Republic of Estonia or at least be replicated here as well. This means that some Estonian companies can’t, for example, use Amazon’s or Google’s cloud services. Luckily, this will change as soon as AWS’s more important services become available in data centres located in Estonia.


How much is security affected by the network connection?

The security of a connection is extremely important, both in the internal and the external networks. This is because all data moves through both, regardless of how sensitive it is. Unfortunately, a lot of the time, attention is given to the external network and people forget that the internal network also needs to be encrypted and follow the highest security measures.
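As a small illustration of treating internal connections as strictly as external ones, the sketch below builds a TLS client configuration with Python's standard `ssl` module. It assumes that internal services present certificates issued by a company-internal certificate authority; the `ca_file` parameter and the function name `strict_client_context` are hypothetical and only serve this example.

```python
import ssl


def strict_client_context(ca_file=None):
    """Build a TLS client context that verifies certificates and hostnames.

    `ca_file` is an optional path to an internal CA bundle (an assumption
    for this sketch); when omitted, the system's trusted CAs are used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = strict_client_context()
```

The point of the sketch is that the same context, with certificate and hostname verification left switched on, would be used for calls between internal services as well as for calls leaving the company network.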


Please give us 3 recommendations that could help all companies avoid huge mistakes!

First, ensure that you have an overview of what kind of data the company is handling and where it is located. Without this information, it is impossible to create the correct processes or to implement the proper measures for avoiding leaking sensitive data.

Second, make sure that the processes involved in data storage and access are well thought out.

Thinking of web services, my third recommendation would be to create a secure process between the development and production environments to ensure that publishing software or its updates doesn’t cause huge damage instead of bringing profit.

What is sensitive data? What isn’t?

The GDPR describes special categories of personal data (known as sensitive data in earlier regulations) as follows:

  1. data that describes political opinions, religious or philosophical beliefs, excluding data about membership of legal persons in private law that have been registered in accordance with the law;
  2. data that describes ethnic or racial origin;
  3. data concerning health or disabilities;
  4. data about heredity;
  5. biometric data (especially images of a person’s fingerprints, palm prints, and irises, and their genetic data);
  6. data concerning a person’s sex life;
  7. data concerning trade union membership;
  8. data concerning the commission of a crime or being the victim of a crime prior to a public court hearing, a verdict regarding the crime, or the closing of the case.

The new General Data Protection Regulation went into effect in January this year: (in Estonian).

Five danger factors to consider when handling a client’s data

  1. low awareness of cyber hygiene among the employees
  2. an incomplete overview of the data
  3. insufficiently adjusted services
  4. insufficient security for the infrastructure, servers, or the software
  5. weak or non-existent access policies
