Before we start hunting for bugs, let’s take some time to understand how the Internet works. Finding web vulnerabilities is all about exploiting weaknesses in the underlying technology, so every good hacker should understand that technology clearly. If you are already familiar with these processes, you can move on to the material on Internet security. The following question is a good starting point: what happens when you type www.google.com into your browser? In other words, how does your browser get from a domain name like google.com to the web page you’re looking for? Let’s find out.
Part 1: Client-server model
The Internet consists of two types of devices: clients and servers. Clients request resources or services, and servers provide those resources and services. When you visit a website using a browser, it acts as a client and requests a web page from the web server. The web server will then send your browser a web page (picture below):
A web page is nothing more than a collection of resources, or files, sent by a web server. At a minimum, the server will send your browser a text file written in Hypertext Markup Language (HTML), the language that tells your browser what to display. Most web pages also include Cascading Style Sheets (CSS) files that control how the page looks. Many web pages also contain JavaScript (JS) files, which allow sites to animate the web page and respond to user input without going through the server.
For example, JavaScript can resize images as users scroll and validate user input on the client side before sending it to the server. Finally, your browser can receive embedded resources such as images and videos. Your browser will combine these resources to display the web page you see.
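To make the idea of a page as a bundle of resources more concrete, here is a minimal Python sketch that lists the extra files an HTML document asks the browser to fetch. The sample markup and its file names are invented for illustration, and a real browser understands many more tags and attributes than the three handled here.

```python
from html.parser import HTMLParser


class ResourceLister(HTMLParser):
    """Collects stylesheet, script, and image URLs referenced by a page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("rel") == "stylesheet":
            self.resources.append(("css", attributes.get("href")))
        elif tag == "script" and "src" in attributes:
            self.resources.append(("js", attributes["src"]))
        elif tag == "img" and "src" in attributes:
            self.resources.append(("image", attributes["src"]))


# Hypothetical page: the file names below are made up for this example.
SAMPLE_HTML = """
<html>
  <head>
    <link rel="stylesheet" href="/style.css">
    <script src="/app.js"></script>
  </head>
  <body>
    <img src="/logo.png" alt="logo">
  </body>
</html>
"""

lister = ResourceLister()
lister.feed(SAMPLE_HTML)
for kind, url in lister.resources:
    print(kind, url)
```

Each entry in the list corresponds to one more request the browser would send before it can show the finished page.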
Servers don’t just return web pages to the user. Web APIs allow applications to request data from other systems. This lets applications communicate with each other and exchange data and resources in a controlled way. For example, Twitter’s APIs allow other websites to send requests to Twitter’s servers to obtain data such as lists of public tweets and their authors. APIs power many Internet functions beyond this one, and we will return to them, and to their security, in future sections.
Part 2: Domain name system | Internet ports
First, every device connected to the Internet has a unique Internet Protocol (IP) address that other devices can use to find it. However, IP addresses consist of numbers and letters that are difficult for humans to remember. For example, the older IPv4 address format looks like this: 123.45.67.89. The newer IPv6 format looks even more complex: 2001:db8::ff00:42:8329.
This is where the Domain Name System (DNS) comes to the rescue. A DNS server functions like a phone book for the Internet, converting domain names into IP addresses (picture below). When you enter a domain name in a browser, a DNS server must first resolve that domain name to an IP address. Your browser asks the DNS server: “What IP address is this domain on?”
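As a rough sketch of both ideas in Python, the snippet below checks which IP version the two example addresses use and defines a small helper that asks the operating system's resolver for the addresses behind a host name. The function names ip_version and resolve are chosen for this example only.

```python
import ipaddress
import socket


def ip_version(address):
    """Return 4 or 6 depending on the format of the IP address string."""
    return ipaddress.ip_address(address).version


def resolve(hostname):
    """Ask the system resolver (and through it, DNS) for a host's IP addresses."""
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr);
    # the IP address is the first element of sockaddr.
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in results})


print(ip_version("123.45.67.89"))            # 4
print(ip_version("2001:db8::ff00:42:8329"))  # 6
```

Calling resolve("www.google.com") on a machine with network access returns one or more addresses. The exact values depend on where and when you ask, so none are shown here.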
Internet ports
Once your browser receives the correct IP address, it will try to connect to that IP address through a port. A port is a logical division on a device that identifies a specific network service. We identify ports by their numbers, which range from 0 to 65535.
Ports allow a server to provide multiple services to the Internet at the same time. Because there are conventions for the traffic received on specific ports, port numbers also allow the server to quickly forward incoming Internet messages to the appropriate service for processing. For example, if an Internet client connects to port 80, the web server understands that the client wants to access its web services (picture below).
By default, we use port 80 for HTTP messages and port 443 for HTTPS, the encrypted version of HTTP.
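The way a client picks a port can be sketched in a few lines of Python. This is a simplified illustration: the helper name port_for is made up, and it only knows the two default ports mentioned above.

```python
from urllib.parse import urlsplit

# Conventional default ports named in the text.
DEFAULT_PORTS = {"http": 80, "https": 443}


def port_for(url):
    """Return the port a client would connect to for the given URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # an explicit port in the URL overrides the default
    return DEFAULT_PORTS[parts.scheme]


print(port_for("http://www.google.com/"))   # 80
print(port_for("https://www.google.com/"))  # 443
print(port_for("http://localhost:8080/"))   # 8080
```

A URL with any other scheme raises a KeyError in this sketch, because only HTTP and HTTPS are covered.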
Part 3: HTTP requests and responses
Once a connection is established, the browser and server communicate via the Hypertext Transfer Protocol (HTTP). HTTP is a set of rules that define how Internet messages are structured and interpreted, and how web clients and web servers should exchange information.
When your browser wants to communicate with the server, it sends an HTTP request to the server. There are different types of HTTP requests, the most common being GET and POST. By convention, GET requests retrieve data from the server, and POST requests submit data to it. Other common HTTP methods include OPTIONS, used to ask which HTTP methods are allowed for a given URL; PUT, used to update a resource; and DELETE, used to delete a resource.
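To see the GET and POST convention in action without touching a real site, here is a small self-contained Python sketch. It starts a throwaway web server on your own machine, submits a piece of data with POST, and reads it back with GET. Everything in it (the /notes path, the NOTES list, the DemoHandler class, the 201 reply) is a choice made for this example rather than part of any real service.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

NOTES = []  # hypothetical in-memory storage for this demo


class DemoHandler(BaseHTTPRequestHandler):
    def _reply(self, status, text):
        body = text.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # GET retrieves data: return everything stored so far.
        self._reply(200, "\n".join(NOTES))

    def do_POST(self):
        # POST submits data: read the request body and store it.
        length = int(self.headers.get("Content-Length", 0))
        NOTES.append(self.rfile.read(length).decode("utf-8"))
        self._reply(201, "stored")

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def send(port, method, path, body=None):
    """Send one request to the local demo server; return (status, body)."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, path, body=body)
    response = conn.getresponse()
    result = (response.status, response.read().decode("utf-8"))
    conn.close()
    return result


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

post_result = send(port, "POST", "/notes", body="hello")
get_result = send(port, "GET", "/notes")
print(post_result)
print(get_result)

server.shutdown()
server.server_close()
```

The server here decides what each method does. Nothing in HTTP itself forces GET to be read-only, which is why the text calls it a convention.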
Here is an example of a GET request that asks the server for the home page of www.google.com:
GET / HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0
Accept: text/html,application/xhtml+xml,application/xml
Accept-Language: en-US
Accept-Encoding: gzip, deflate
Connection: close
Let’s go through the structure of this request, since you will come across many such requests in this series of articles. All HTTP requests consist of a request line, request headers, and an optional request body. The previous example contains only the request line and headers.
The request line is the first line of an HTTP request. It specifies the request method, the requested URL, and the HTTP version used. Here you can see that the client is sending an HTTP GET request for the home page of www.google.com using HTTP version 1.1.
The remaining lines are the HTTP request headers. They are used to pass additional information about the request to the server. This allows the server to customize the results sent to the client. In the previous example, the Host header specifies the hostname of the request. The User-Agent header contains information about the operating system and version of the requesting software, such as the user’s web browser. The Accept, Accept-Language, and Accept-Encoding headers tell the server what format the responses should be in. The Connection header tells the server whether the network connection should remain open after the server responds.
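The structure described above can be pulled apart with a few lines of Python. This is a simplified sketch, not a full HTTP parser: it lowercases header names (they are case-insensitive) and keeps only the last value if a header repeats. The function name parse_request is made up for this example.

```python
# The example request from above, with the line endings HTTP uses on the wire.
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Host: www.google.com\r\n"
    "User-Agent: Mozilla/5.0\r\n"
    "Accept: text/html,application/xhtml+xml,application/xml\r\n"
    "Accept-Language: en-US\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "Connection: close\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a raw HTTP/1.1 request into its first line, headers, and body."""
    # An empty line separates the headers from the optional body.
    head, _, body = raw.partition("\r\n\r\n")
    first_line, *header_lines = head.split("\r\n")
    method, target, version = first_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


method, target, version, headers, body = parse_request(RAW_REQUEST)
print(method, target, version)  # GET / HTTP/1.1
print(headers["host"])          # www.google.com
print(repr(body))               # ''
```

The empty body matches what the text says: this GET request carries only its first line and headers.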
You may see several other common headers in requests. The Cookie header is used to send cookies from the client to the server. The Referer header indicates the address of the previous web page that linked to the current page. The Authorization header contains the credentials used to authenticate the user to the server.
Once the server receives the request, it will try to fulfill it. The server will return all the resources used to build your web page using HTTP responses. An HTTP response contains several elements: an HTTP status code indicating whether the request was successful; HTTP headers, which are pieces of information that browsers and servers use to communicate with each other about authentication, content format, and security policies; and the HTTP response body, which is the actual web content that you requested. Web content can include HTML code, CSS style sheets, JavaScript code, images, and more.
Here is an example HTTP response:
Notice the 200 OK message on the first line (1). This is the status code. An HTTP status code in the 200 range indicates a successful request. A status code in the 300 range indicates a redirect to another page, while a code in the 400 range indicates an error on the client side, such as a request for a page that does not exist. A code in the 500 range means that there was an error on the server itself.
As a bug hunter, you should always keep an eye on these status codes, as they can tell you a lot about how the server is behaving. For example, status code 403 means that you are forbidden from accessing the resource. This could mean that the page holds sensitive data that you could reach if you managed to bypass the access controls.
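The ranges above translate directly into code. The following Python sketch uses the standard library's HTTPStatus to look up the official reason phrase for a few codes and a small helper, status_class (a name invented here), to sort each code into the categories described in the text.

```python
from http import HTTPStatus


def status_class(code):
    """Map a status code to the broad category described in the text."""
    if 200 <= code <= 299:
        return "success"
    if 300 <= code <= 399:
        return "redirect"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    return "other"


for code in (200, 302, 403, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```

When you scan a lot of responses, grouping them this way makes the unusual ones, such as a 403 on a page you expected to be public, easier to spot.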
The next few lines in the response, each made up of a name and a value separated by a colon (:), are the HTTP response headers. They allow the server to pass additional information about the response to the client. In this case, the Date header shows that the response was sent on Tue, 31 Aug 2021 17:38:14 GMT (2). The Content-Type header specifies the file type of the response body. In this case, the Content-Type of this page is text/html (3). The Server header identifies the server software as Google Web Server (gws) (4), and the Content-Length is 190,532 bytes (5). Typically, additional response headers describe the content: its format, its language, and the security policies that apply to it.
In addition to these, you may encounter several other common response headers. The Set-Cookie header is sent by the server to the client to set a cookie. The Location header specifies the URL to which the page should be redirected. The Access-Control-Allow-Origin header specifies which origins can access the page’s content. The Content-Security-Policy header controls the origins of the resources that the browser is allowed to load, and the X-Frame-Options header specifies whether a page can be loaded inside an iframe.
The data after the empty line is the response body. It contains the actual content of the web page, such as HTML and JavaScript code. Once your browser has all the information it needs to build the web page, it will render everything for you.
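To tie the pieces of a response together, here is a minimal Python sketch that splits a raw response into its status line, headers, and body. The sample response is assembled only from the values mentioned in the text, with an assumed HTTP/1.1 version and a one-line placeholder where the real page body would be, so its Content-Length does not match the placeholder. The function name parse_response is made up for this example.

```python
# Assembled from the values the text mentions. The HTTP version and the
# body are assumptions: the real body would be 190,532 bytes of HTML.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Tue, 31 Aug 2021 17:38:14 GMT\r\n"
    "Content-Type: text/html\r\n"
    "Server: gws\r\n"
    "Content-Length: 190532\r\n"
    "\r\n"
    "<!doctype html>"
)


def parse_response(raw):
    """Split a raw HTTP/1.1 response into status, reason, headers, and body."""
    # The empty line marks where the headers end and the body begins.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only: values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


status, reason, headers, body = parse_response(RAW_RESPONSE)
print(status, reason)           # 200 OK
print(headers["content-type"])  # text/html
print(headers["server"])        # gws
```

Real tools such as browsers and proxies do far more than this, but the basic shape (status line, headers, empty line, body) is the same in every response you will inspect.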