Google & Your Website – A Blind Alliance
Assume you own a website, “onlineshopperdotcom”. When you search Google with the keywords “online shopper website”, you will likely get a sneak peek of your website in the results, alongside other websites matching those keywords. That is quite universal, as we all want our websites to be crawled and indexed by Google; it is common to all e-commerce websites. Notice the alliances at play:
A. Your website “onlineshopperdotcom” is directly allied with Google.
B. Your website & your web server (where you have all usernames & passwords saved) are directly allied with each other.
C. Alarmingly, Google is indirectly allied to your web server.
You might be convinced this is normal and not expect a phishing attack that uses Google to retrieve information from your web server. But give it a second thought: instead of searching “online shopper website” on Google, what if I search “online shopper website usernames and passwords”? Will Google be able to give me the list of usernames and passwords for the online shopper website? As a security consultant, my answer is “MAYBE, SOMETIMES!” But if you use Google dorks (carefully crafted search operators for querying Google), the answer becomes a big “YES!” whenever your website is left with misconfigured security settings.
Google Dorks can be intimidating.
Google pops in as a serving guardian until you see the other side of it. Google may have answers to all your queries, but you need to frame your questions properly, and that is where GOOGLE DORKS pitch in. A dork is not complicated software to install, execute, and wait on for results; it is simply a combination of search operators (intitle, inurl, site, intext, allinurl, etc.) with which you can ask Google for exactly what you are after.
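To make these operators concrete, here are a few illustrative queries; the domain and search terms are hypothetical examples, not real targets:

```
intitle:"index of"       -> pages whose title contains "index of"
inurl:admin              -> pages with "admin" somewhere in the URL
site:example.com         -> results restricted to a single domain
intext:"confidential"    -> pages whose body text contains the phrase
allinurl:login php       -> URLs containing all of the listed words
```

Each operator narrows the result set along one dimension (title, URL, domain, or body text), and they can be combined in a single query.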
For example, if your objective is to download PDF documents related to Java, the normal Google search would be “java pdf document free download” (“free” being the mandatory keyword without which no Google search is complete). But with Google dorks, your search becomes “filetype:pdf intext:java”. With these operators, Google understands far better what exactly you are looking for, and you get more accurate results. That seems promising for an effective Google search.
However, attackers can use these same operators for a very different purpose: to steal or extract information from your website or server. Assuming I need usernames and passwords that are cached on servers, I can use a simple query such as “filetype:xls passwords site:in”. This returns Google results of cached content from different websites in India that have usernames and passwords saved in spreadsheets. It is as simple as that. In relation to the online shopper website, the query “filetype:xls passwords inurl:onlineshopper.com” might return results that would dismay anyone. In simple terms, your private or sensitive information becomes available on the internet, not because someone hacked it, but because Google was able to retrieve it free of cost.
How to prevent this?
Web robots (often referred to as wanderers, crawlers, or spiders) are programs that traverse the web automatically. Search engines like Google, Bing, and Yahoo use these robots to scan websites and extract information, and the file named “robots.txt” is how a website instructs them.
robots.txt is a file that tells search engines what they may and may not access on your website. It is a kind of control you have over search engines. Configuring robots.txt is not rocket science; you only need to decide which information should and should not be allowed into search engines. A sample robots.txt configuration will look like this.
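A minimal sketch of such a file follows; the directory and file names are hypothetical placeholders that you would replace with the paths on your own server that should stay out of search results:

```
# Applies to all crawlers
User-agent: *
# Hypothetical sensitive paths - keep these out of search indexes
Disallow: /admin/
Disallow: /backups/
Disallow: /reports/passwords.xls

# Everything not disallowed above remains crawlable
```

Note that robots.txt is only a request to well-behaved crawlers, not an access control: the disallowed files are still publicly reachable by anyone who knows the URL, so truly sensitive data should be removed from the web root or placed behind authentication.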