I could tell you a lot about caching: how it has sped up page loads, how it makes "heavy" pages easier to work with, what it means for the future of the web, and how Google urges webmasters to cache everything they possibly can. But I am a security specialist, so I will talk about something else: how cached sites are stored on your computer, how a site's cache can be stolen together with valuable information, and how that cache can threaten your anonymity.
When you load a website from a server, your computer receives a set of code. This is what the code of the Google site looks like.
This code is processed by your browser, turning it into the website that you are used to seeing.
The average page on a modern website weighs around 5 MB. This is not much, but users expect sites to load as quickly as possible, and webmasters (the people who create and manage sites), encouraged by Google, do everything they can to speed up repeat loads of the site.
Go to google.com. Now close it and open it again. Has anything changed? Nothing at all? Then you will agree that there is no point in downloading the site a second time; it is enough to keep a copy on your computer. That is the whole idea of caching. Loading from your own computer is almost always faster than loading from a server, and caching also reduces the overall load on the world's networks.
By the way, browsers are not the only ones that cache sites; proxy servers do it too. The proxy server fetches the site once and, on repeated requests, serves users the copy stored on the proxy. This significantly speeds up loading for the user.
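The idea of "fetch once, then serve the stored copy" can be sketched in a few lines. This is only a toy model, not how any real browser or proxy is implemented; the names `PageCache` and `fake_origin` are hypothetical and exist purely for illustration.

```python
from typing import Callable, Dict


class PageCache:
    """Toy cache: contact the server once per URL, then reuse the stored copy."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                  # hypothetical function that contacts the server
        self._store: Dict[str, str] = {}     # URL -> stored page body
        self.origin_requests = 0             # how many times the server was actually contacted

    def get(self, url: str) -> str:
        if url not in self._store:
            # Cache miss: go to the server and remember the answer.
            self.origin_requests += 1
            self._store[url] = self._fetch(url)
        # Cache hit (or freshly stored copy): no network round trip needed.
        return self._store[url]


def fake_origin(url: str) -> str:
    """Stand-in for a real web server (an assumption made for this example)."""
    return f"<html>content of {url}</html>"


cache = PageCache(fake_origin)
first = cache.get("https://example.com/")
second = cache.get("https://example.com/")   # served from the stored copy
```

The second call returns the same page without touching the server, which is where both the speed-up and the reduced network load come from.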
If someone replaces the cached copy of a site at the server level, users can receive a malicious version of the site. This attack is called Web Cache Poisoning; we will talk about it in more detail in the chapter on proxy servers.
The browser cache is like a new PIN for a bank card: the first time you enter it from a piece of paper, and after that you try to keep it in your head, because recalling it is faster and more convenient than looking for the paper. You will need the paper again only if the PIN changes.
Why might I need to delete the cache? This is how it is explained in one of the blogs on the Internet.
The trouble is that these blogs are mostly written by people who have only 100 GB of hard drive space, three quarters of which is already occupied by a folder named XXX. In practice, cached sites rarely take up more than 1 GB. You can limit the amount of space the browser may use for stored data yourself, so there is no need to delete the entire cache.
The cache has some features that you should remember. One of them is the limited space allotted to the browser on the hard drive. When that space runs out, new data gradually pushes out the old.
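How new data can "push out" old data is easiest to see in a small model. The sketch below assumes a least-recently-used eviction policy; that is an assumption made for illustration, since real browsers apply their own eviction heuristics. The class name `SizeLimitedCache` is hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class SizeLimitedCache:
    """Toy cache with a byte budget; evicts the least recently used entries first."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def put(self, url: str, body: bytes) -> None:
        if url in self._entries:
            # Replacing an entry: release the space held by the old copy first.
            self.used -= len(self._entries.pop(url))
        self._entries[url] = body
        self.used += len(body)
        # Over budget: drop the oldest entries until the cache fits again.
        while self.used > self.max_bytes and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self.used -= len(evicted)

    def get(self, url: str) -> Optional[bytes]:
        if url not in self._entries:
            return None
        self._entries.move_to_end(url)   # mark as recently used
        return self._entries[url]


small = SizeLimitedCache(max_bytes=10)
small.put("a", b"12345")
small.put("b", b"12345")
small.get("a")               # "a" is now the most recently used entry
small.put("c", b"12345")     # budget exceeded, so the oldest entry ("b") is evicted
```

With a budget like this in place, the cache never grows past its limit, which is why wiping it just to free disk space is rarely necessary.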
Honestly, I do not want to say anything bad about the author, but it upsets me when people write articles just for the sake of writing them. They take keywords such as "browser cache" and "delete cache" and build an article around them, borrowing material from a similar blog written by the same kind of "expert". Therefore, the advice in this chapter will be very unusual.
Tip
Do not read doubtful websites and blogs about computer optimization and network security.
Browser Cache Poisoning
This attack sounds rather threatening, but today it is not that popular. As a rule, the attacker's goal is to replace the JavaScript files stored in the cache with malicious ones, which are then executed on the victim's computer. Like other MITM attacks, Browser Cache Poisoning can be carried out, for example, by compromising the Wi-Fi network the victim is connected to, or by gaining access to a VPN or proxy that they use.
Browser Cache Poisoning gives an attacker the ability to run malicious scripts in the victim's browser. The goal could be to collect data, capture logins, passwords and other information entered in the browser, download malware, or redirect the victim to the attacker's website.
The limited popularity of Browser Cache Poisoning is explained by its low efficiency compared with other types of attacks. For example, a modern MITM attack lets the attacker redirect a user to the attacker's site or simply intercept information without extra steps such as infecting the cache.
Cache timing
Put simply, this is an attack based on measuring how long a resource takes to load when it may already be in the browser cache. Little is written about this attack on the Internet, and what is written tends to get lost in the weeds of its implementation and practical application. The main objective of the attack is to find out whether a particular site has ever been opened.
I will describe the general idea with the simplest and clearest example. You take a friend's computer and open one of the popular porn sites on it. Your friend may have cleared the history, so the browser no longer remembers that the site was ever opened. But if the browser did open this site before, there is a strong chance it still holds a cached version.
You open the site in private browsing mode and in normal mode and compare the load times. If they are the same, the site is being opened for the first time. If the site loads quickly in normal mode but takes longer in private mode, the computer has a cached version that speeds up loading in normal mode. That is the whole point of cache timing.
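The comparison described above comes down to a simple decision rule. The sketch below is a minimal illustration, not a real attack tool: it takes load times that were already measured, and the function name `looks_cached`, the use of the median, and the 0.5 threshold are all assumptions chosen for the example.

```python
from statistics import median
from typing import Sequence


def looks_cached(private_ms: Sequence[float],
                 normal_ms: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Guess whether a cached copy exists from two sets of load times.

    private_ms: load times in private mode (no cache from earlier sessions).
    normal_ms:  load times in normal mode (the cache may be used).
    threshold:  hypothetical cut-off; normal mode must take less than this
                fraction of the private-mode time to count as "cached".
    """
    baseline = median(private_ms)   # median smooths out network jitter
    observed = median(normal_ms)
    return observed < baseline * threshold


# Normal mode is much faster: a cached copy is probably present.
visited = looks_cached(private_ms=[100, 110, 95], normal_ms=[20, 25, 30])

# Both modes take about the same time: probably a first visit.
not_visited = looks_cached(private_ms=[100, 110, 95], normal_ms=[98, 105, 102])
```

Several measurements are taken for each mode because a single load time depends heavily on the state of the network at that moment.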
Geo-Inference Attack
This is one kind of timing attack; its purpose is to determine the user's country, city, address and browser language without their permission. Geo-Inference Attack is an extremely interesting attack, and with your permission, I will give it a little more attention.
How is Geo-Inference Attack arranged?
Go to Google.com. If you are a resident of Russia, you will be redirected to the regional site google.ru. It is saved in your browser's cache.
And this is the picture which is stored in the cache of a user from Germany.
And this is a user from Japan.
They are not especially original, but these are different pictures. The attacker collects every variant of the Google logo; they will be needed to carry out the Geo-Inference Attack.
The attacker's page makes the user's browser load these pictures. The first time, each picture is loaded with the instruction "do not check the cache", and the load time is measured. This tells us how long the image takes to load without the cache. Let it be 10 milliseconds.
Then we load the same image again, this time allowing the cache. The browser checks whether the image is in the cache and, if it is, takes it from there, which shortens the load time to, say, 3 milliseconds. If we see that a user with a German IP address has never loaded the German version of Google, but the Google Russia picture is in their cache, this is hardly a German resident, do you agree?
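The inference step can be sketched as follows. The measurement itself would happen in a script running in the victim's browser; the example below only shows how the two timings per image might be interpreted. The function name `infer_cached_regions`, the 0.5 threshold and the sample numbers are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def infer_cached_regions(timings_ms: Dict[str, Tuple[float, float]],
                         threshold: float = 0.5) -> List[str]:
    """Return the regional sites whose logo appears to be in the cache.

    timings_ms maps a regional site to a pair of load times in milliseconds:
    (load time with the cache bypassed, load time with the cache allowed).
    threshold is a hypothetical cut-off: the second load must take less than
    this fraction of the first to count as a cache hit.
    """
    return sorted(
        region
        for region, (no_cache_ms, with_cache_ms) in timings_ms.items()
        if with_cache_ms < no_cache_ms * threshold
    )


# Made-up measurements for a user whose IP address is German.
measurements = {
    "google.de": (10.0, 9.6),      # no speed-up: the German logo is not cached
    "google.ru": (10.0, 3.0),      # large speed-up: the Russian logo is cached
    "google.co.jp": (11.0, 10.2),  # no speed-up: the Japanese logo is not cached
}
cached = infer_cached_regions(measurements)
```

In this made-up case only the Russian logo shows the speed-up, which contradicts the German IP address.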
How is the city determined? In exactly the same way: there are services whose sites differ depending on the user's city. But the most interesting part is determining the address, or rather, the addresses visited. You have probably already guessed that sites like Google Maps and Yandex.Maps cache data, so it is possible to find out which addresses the user has loaded.
How to protect against Geo-Inference Attack?
The only effective method is to disable caching for sites that have a unique version for each country or city, such as Google. Private browsing mode helps only to some extent. The browser forgets all data at the end of the private session, but if during the session you visit Google and then a site with the Geo-Inference Attack script, the attack will still work.
There are a couple more methods of protection. The first is to mislead the attacker with another country's version of Google. For example, while in Russia, you can use a Ukrainian IP address and the Ukrainian version of Google. The attacker will see Google Ukraine, the Russian language and a Ukrainian IP address, so why would they doubt it?
The second method of protection is to restrict script execution on sites. The attacker uses scripts to measure how quickly images load from the cache; if scripts are not allowed to run, the Geo-Inference Attack cannot be carried out. Restricting script execution is certainly the right step from a security point of view; we will talk about it in more detail later.
Compromised site cache
This is one of the most dangerous threats that the browser cache can carry. Suppose a site was hacked and your browser holds a cached version of it with malicious content. Even after the owners regain control of the site, you may keep loading the version that was saved on your hard drive. It is still highly likely to contain malicious content that poses a threat to you. Therefore, when you load a previously compromised site, clearing the cache is required.
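Why a cleaned-up site can still be served from the old copy is easier to see with the freshness rule used in HTTP caching: a response stored with `Cache-Control: max-age=N` may be reused without contacting the server until N seconds have passed. The sketch below is a deliberate simplification of that rule; the helper names and the one-week lifetime are assumptions for the example, and real browsers also apply revalidation and heuristic freshness that are not modelled here.

```python
import re
from typing import Optional

DAY = 24 * 60 * 60  # seconds in a day


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Extract the max-age value from a Cache-Control header, if present."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None


def served_from_cache(cache_control: str, stored_at: float, now: float) -> bool:
    """Simplified check: would the stored copy be reused without asking the server?"""
    if "no-store" in cache_control or "no-cache" in cache_control:
        return False                 # not stored, or must be revalidated first
    max_age = max_age_seconds(cache_control)
    if max_age is None:
        return False                 # simplification: heuristic freshness is ignored
    return (now - stored_at) < max_age


# Hypothetical timeline: the malicious file is cached on day 0 with a
# one-week lifetime, and the site owners clean the server on day 1.
header = "public, max-age=604800"
still_malicious_day_3 = served_from_cache(header, stored_at=0, now=3 * DAY)
still_malicious_day_8 = served_from_cache(header, stored_at=0, now=8 * DAY)
```

Under these assumptions the browser keeps using the malicious copy on day 3, even though the server was fixed on day 1, and only asks the server again after the week has passed. Clearing the cache removes the stored copy immediately.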
Do you use only large sites and think that they do not get hacked? There are well-known cases of popular sites being hacked, such as MySQL.com, linuxmint.com and many others. Attackers certainly do not always alter a site in a way that makes users cache malicious content: MySQL.com simply infected its visitors, and on linuxmint.com the download link was changed to point to a compromised Linux Mint image. Still, the risk of caching malicious content is very high.
Forensic Browser Cache Analysis
Browser cache analysis can show which sites you have visited and when, and therefore it is used in comprehensive forensic analysis of a device.
Moreover, the cache often shows whether you were logged in to a site, since the interface of many sites, the social network Facebook for example, differs before and after login. This is largely why we added browser cache removal to the comprehensive browser cleanup in Panic Button, our program for emergency erasure of valuable information.