The web refers to the World Wide Web (commonly abbreviated as WWW), a sub-concept of the Internet. It is a system that connects web resources such as specially formatted documents (e.g. HTML), images, and videos to each other through the Internet and hypertext.
People often use the terms web and Internet interchangeably, for example saying “using the Internet” to mean browsing websites, but the Internet and the web are different concepts. The Internet is a large global network connected through the TCP/IP protocol suite. The Internet and hypertext already existed before the web was created, but no one had thought of a way to connect documents using these technologies. Around 1989, Tim Berners-Lee proposed the web as a way for scientists to share data more easily. The birth and development of the web have made it possible for people all over the world to connect, share information, and communicate.
In this article, you will learn basic knowledge about the web and the HTTP protocol used on the web.
What is a web resource?
The object requested through the web is called a web resource, a term that covers all content used on the web. Web resources include HTML, CSS, JavaScript, text, images, and so on.
Identifying web resources
Web resources are identified through URIs.
A URI (Uniform Resource Identifier) is, as the name suggests, an identifier that uniquely identifies a web resource on the Internet.
You may have already heard of the term URL. URI and URL are often used interchangeably, but there are some differences between the two terms.
A Uniform Resource Locator (URL) indicates the location of a resource on the Internet. In this context, a resource refers to a single file. In other words, a URL gives the location of a file such as a document, image, or video that can be accessed on the web.
The address below, which displays the user’s profile photo (myphoto.jpg), is a URL, and it is also a URI.
https://www.bugbountyclub.com/profile/myphoto.jpg
So what about the format that allows you to view specific posts from a blog implemented in PHP as shown below?
https://www.bugbountyclub.com/blog.php?category_no=1&article_no=1
In the address above, this article uses URL and URI with different meanings.
Here, the URL extends only as far as the PHP file, https://www.bugbountyclub.com/blog.php. The whole address, including ?category_no=1&article_no=1 (this part is called the query string), which identifies a specific post stored in the backend data storage through the blog.php file, is called a URI. Note that this is an informal distinction: formally, RFC 3986 treats every URL as a URI, and the query string is a component of both.
[Figure: URLs and URIs]
In other words, a URL is one kind of URI, and URI is the broader concept.
As such, the meanings of URL and URI differ slightly. This article uses the term URI, but if you are a beginner and do not need to distinguish the two, you can simply say URL.
What does the URI structure look like?
You’ve seen that URIs are used to identify and request resources across the web. So you need to understand what the structure of a URI looks like.
URIs typically follow the format below, in which some components are optional:
Scheme://Username:Password@Host:Port/Path?Query#Fragment
Here’s what each component means:
- Scheme: Indicates which protocol will be used to request resources. For the web, HTTP and HTTPS are used, and protocols such as FTP and file are also used.
- Username: If the requested resource requires authentication, this refers to the user name to access the resource.
- Password: If the requested resource requires authentication, this refers to the user password to access the resource.
- Host: The computer (server) from which the client requests resources.
- Port: The port number used to reach a specific service on the server. By default, HTTP uses port 80 and HTTPS uses port 443.
- Path: The path to the resource on the host.
- Query: Used when passing data to the web server in a GET request.
- Fragment: Identifies a specific part of the resource. Browsers use it to scroll to a specific element within an HTML page.
The following is an example of URI classification according to the above format. Each part is distinguished and interpreted using the light purple shaded area as a separator.
[Figure: Format of URI]
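The same breakdown can be sketched in code. The following is a minimal example, assuming Python and its standard urllib.parse module; the URI itself (user name, password, host, port, and path) is made up purely for illustration and is not taken from the article.

```python
from urllib.parse import urlsplit

# Hypothetical URI that exercises every component of the format above.
uri = "https://alice:secret@www.example.com:8443/blog/post.php?category_no=1&article_no=1#comments"

parts = urlsplit(uri)

# Map each component name from the list above to the parsed value.
components = {
    "scheme": parts.scheme,
    "username": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:9} {value}")
```

Components that are missing from a URI come back as None or an empty string, which reflects the fact that most parts of the format are optional.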
How the web works and what it’s made of
Let’s look at an illustration of what happens when a user visits the Bug Bounty Club website.
When the user enters the URL (https://www.bugbountyclub.com) in the address bar of the web browser and navigates to it, the web browser first looks up the IP address for the entered web address from a DNS server (this step is not shown in the picture above). The web browser then requests a copy of the web page from the web server via HTTP. The web server that receives the request finds the web page (document) corresponding to the request in the running web application and sends it to the web browser as a response, and the web browser displays the received web page.
Here you can see the five components that make up the web.
- Web Client: Refers to the entity making the request, i.e. the user.
- Web Browser: Software used by users to send requests to a web server.
- HTTP (Hypertext Transfer Protocol): A communication protocol for transmitting information over the web.
- Web Server: An entity that provides web pages corresponding to requests from web browsers.
- Web Application: An application that can be accessed through a web browser.
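To make the roles of these components concrete, here is a rough sketch, assuming Python's standard http.server and http.client modules. It runs a tiny web server on the local loopback address and lets a client fetch one page from it. The handler name HelloHandler, the helper fetch_once, and the page content are all hypothetical, and the DNS lookup is skipped because the client connects to an IP address directly.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Plays the role of the web application: builds a page for each request."""

    def do_GET(self):
        body = b"<html><body>Hello, web!</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_once():
    # Web server: listen on a free loopback port (port 0 lets the OS choose one).
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # "Browser": send an HTTP request and read the response.
        conn = HTTPConnection(host, port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        status, page = response.status, response.read().decode()
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return status, page


status, page = fetch_once()
print(status, page)
```

A real browser would additionally resolve the host name through DNS, render the HTML, and request any images, style sheets, and scripts the page refers to.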
Let’s take a closer look at the components of the web.
web client
This refers to the user who makes a request using a web browser.
web browser
According to Wikipedia, the definition of web browser is:
“A web browser (or browser) is software for accessing information on the web. When a user requests a web page of a specific website, the web browser receives the necessary content from the web server and displays it on the user’s device. (remainder omitted)”
In other words, it is a type of application used to visit a website, search for documents, and use various functions of the website.
Popular web browsers at the time of writing include Google’s Chrome, Apple’s Safari, Microsoft’s Edge, and Opera. Of these, the browser with the highest market share worldwide is Google’s Chrome.
HTTP (Hypertext Transfer Protocol)
HTTP is the core communication protocol of the web. It belongs to layer 7 (the Application Layer) of the OSI 7-layer model and was designed for sending and receiving HTML documents. HTTP follows the traditional client/server model and exchanges information through message-based requests and responses. In addition, HTTP is stateless and, in its basic form, connectionless: after the server sends a response to the client’s request, it does not store any state about the client, and it may close the connection instead of maintaining it (HTTP/1.1 can keep a connection open for reuse, as described for the Connection header later).
[Figure: HTTP/1.1]
Due to these characteristics, web applications use sessions and cookies to track users, but we will discuss this later.
So what is HTTPS?
HTTPS stands for Hypertext Transfer Protocol Secure (originally HTTP over Secure Sockets Layer) and can be thought of as HTTP with security added through SSL or its successor, TLS. Plain HTTP traffic is not encrypted, which leaves it open to eavesdropping and man-in-the-middle attacks, while HTTPS traffic is protected by encryption. For this reason, HTTPS is recommended over HTTP these days, and most web applications are served over HTTPS. By default, HTTP uses TCP port 80 and HTTPS uses TCP port 443, but these ports can be changed as needed.
What is HTTP/2?
HTTP/2 is a newer version that addresses the limitations of HTTP/1.1. Unlike HTTP/1.1, which handles one request and response at a time on a single connection, HTTP/2 can process multiple requests and responses in parallel over a single connection. In addition, header compression reduces overhead by eliminating the duplicate header values that are repeated across consecutive requests in HTTP/1.1.
HTTP request
A typical HTTP request is divided into four parts: request line, request header, blank line, and message body.
request line
The request line is the top line and consists of Request Method, Request-URI, and HTTP-Version separated by spaces as shown below.
Request Method {Space} Request URI {Space} HTTP Version
In the HTTP request shown in the example above, the content below becomes the request line.
POST /account HTTP/1.1
- POST: HTTP request method
- /account: Request URI
- HTTP/1.1: HTTP version
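As a small illustration of this structure, the request line can be split on spaces into its three parts. This is only a sketch in Python; the function name parse_request_line is made up for the example, and real servers apply stricter validation.

```python
def parse_request_line(line: str) -> dict:
    """Split an HTTP request line into its three space-separated parts."""
    method, request_uri, version = line.strip().split(" ")
    return {"method": method, "request_uri": request_uri, "version": version}


request_line = parse_request_line("POST /account HTTP/1.1")
print(request_line)
```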
There are the following types of HTTP request methods:
- OPTIONS: Used to determine which HTTP request methods are supported for the requested resource. The server responds to the client by listing the available methods in the Allow header.
- HEAD: Similar to a GET request, but responds without including a Body in the response. Used to determine in advance whether the requested resource exists.
- GET: Used when requesting a specific resource on a web server. (It can also transmit data, through the query string.)
- POST: Used when transmitting data to a web server to perform a specific action such as saving or changing a resource. Mainly used with HTML forms. (It is also used when requesting resources.)
- PUT: Used when uploading resources such as files to the server. It can be used by attackers to upload malicious script files to servers.
- DELETE: Deletes a specific resource on the server.
- TRACE: Performs a message loop-back test along the path of the target resource. Returns the request as is.
- CONNECT: Establishes a tunnel with the target server.
The most commonly used methods in web applications are GET and POST. You must be familiar with these two methods, so let’s take a closer look at each of them.
GET request
For example, when a request is made to view a specific post on the Bug Bounty Club blog, the following request is sent to the web server. In other words, the user visited the page https://www.bugbountyclub.com/blog?category_no=1&article_no=1 through a web browser.
GET /blog?category_no=1&article_no=1 HTTP/1.1
Host: www.bugbountyclub.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0
Accept: text/html, application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
...Omitted...
If you look at the request line as described above, you can see that the method is GET, the request URI is /blog?category_no=1&article_no=1, and the HTTP version is 1.1. This request simply reads a resource (a post) that exists on the server, so it uses the GET method to retrieve it. One thing to note is that the request URI includes parameters and values that identify the resource: category_no=1&article_no=1. This part is called the query string. The ? in front of the query string is a delimiter that separates it from the path, and the & inside the query string separates the individual parameters. That is, in the example above, the request is sent to the web server with two parameters, category_no and article_no, each having a value of 1. One more thing you can check is that there is no message body below the request line and request headers.
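The way the ? and & delimiters break up the request URI can be sketched with Python's standard urllib.parse module. This is only an illustration of the parsing rules described above, not how any particular web server implements them.

```python
from urllib.parse import parse_qs, urlsplit

request_uri = "/blog?category_no=1&article_no=1"

# "?" separates the path from the query string.
parts = urlsplit(request_uri)

# "&" separates parameters; "=" separates each name from its value.
# parse_qs returns a list per name because a parameter may appear more than once.
params = parse_qs(parts.query)

print(parts.path)
print(params)
```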
POST request
Let’s look at a case where a user logs in to a website that implements a form-based login method as follows.
When the user enters the login ID and password in the login form and clicks the Log In button, the web browser sends the following request to the web server.
POST /login HTTP/1.1
Host: www.bugbountyclub.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Content-Type: application/x-www-form-urlencoded
Content-Length: 19
...omitted...
id=foo&password=bar
If you look at the request line, you can see that we are requesting the /login page using the POST method, again with HTTP version 1.1. What is different from the GET request seen above is that the parameters and values for the login ID and password, id=foo&password=bar, are included in the message body at the bottom. You can also see that the message body is separated from the request line and the request headers by an empty line. Also, remember that the value of the Content-Type header in the request headers is application/x-www-form-urlencoded before moving on to the next step.
There are other forms of POST requests as well. In general, an HTML form sends a different type of POST request depending on the value given to its enctype attribute.
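For reference, a body in this format can be produced with Python's standard urllib.parse.urlencode. The sketch below rebuilds the body from the login example and derives the matching Content-Length. The second call uses a made-up field (q) only to show how special characters are escaped.

```python
from urllib.parse import urlencode

# Form fields from the login example above.
form_fields = {"id": "foo", "password": "bar"}

# application/x-www-form-urlencoded: name=value pairs joined by "&".
body = urlencode(form_fields).encode("ascii")

headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(body)),  # length of the body in bytes
}

# Characters with a special meaning are escaped: space becomes "+", "&" becomes "%26".
escaped = urlencode({"q": "a b&c"})

print(body, headers["Content-Length"], escaped)
```

The computed length is 19 bytes, which agrees with the Content-Length header in the request shown above.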
<form action="target" method="POST" enctype="some value" >
If the enctype attribute is omitted, the request is sent by default in the form we looked at first, but if enctype="multipart/form-data" is specified, the following POST request is sent to the web server. For comparison, we applied enctype="multipart/form-data" to the same login page as in the example above.
POST /login HTTP/1.1
Host: www.codelivly.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary4nAP0jkXBQK2Owkk
Content-Length: 6671
...omitted...
------WebKitFormBoundary4nAP0jkXBQK2Owkk
Content-Disposition: form-data; name="id"

foo
------WebKitFormBoundary4nAP0jkXBQK2Owkk
Content-Disposition: form-data; name="password"

bar
------WebKitFormBoundary4nAP0jkXBQK2Owkk--
If you look at the Content-Type header, you can see that a unique string is assigned to the boundary parameter, and the parameters in the message body are separated by this boundary. This form is mainly used when uploading files.
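How the boundary separates the parameters can be sketched as follows. This is a simplified illustration in Python for plain text fields only: the helper name build_multipart is hypothetical, and real browsers also handle file parts, file names, and per-part content types.

```python
def build_multipart(fields: dict, boundary: str) -> bytes:
    """Build a multipart/form-data body for simple text fields."""
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")  # each part starts with "--" + boundary
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")               # blank line between part headers and value
        lines.append(value)
    lines.append(f"--{boundary}--")    # closing delimiter has a trailing "--"
    lines.append("")
    return "\r\n".join(lines).encode("utf-8")


boundary = "----WebKitFormBoundary4nAP0jkXBQK2Owkk"
body = build_multipart({"id": "foo", "password": "bar"}, boundary)
content_type = f"multipart/form-data; boundary={boundary}"

print(content_type)
print(body.decode())
```

Note that each delimiter line in the body has two more hyphens than the boundary value declared in the Content-Type header.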
One thing to keep in mind when developing for the web is that you must use the POST method when sensitive information is transmitted to the web server. If the above login page were implemented with the GET method, the following GET request would be made when the user logs in, and the user’s login ID and password would be exposed as is in the URL. Information exposed in URLs like this can be exploited by malicious attackers.
https://www.codelivly.com/login?id=foo&password=bar
request header
Basically, both requests and responses contain various information through headers. Since there are headers commonly used in requests and responses, let’s check them at once by looking at the HTTP response.
HTTP response
A typical HTTP response looks like the following and has a structure similar to that of an HTTP request.
[Figure: HTTP response message]
status line
The status line is the top line and consists of the HTTP version, status code, and reason, also separated by a space.
HTTP Version {Space} Status Code {Space} Reason
In the example, the line below is the status line, meaning it uses HTTP 1.1 version, and the status code and reason are 200 OK, showing that the request was successful.
HTTP/1.1 200 OK
- HTTP/1.1: HTTP version
- 200: status code
- OK: reason
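The status line can be split the same way as the request line, with one difference: the reason may contain spaces, so only the first two spaces act as separators. A minimal Python sketch, with a hypothetical function name:

```python
def parse_status_line(line: str) -> dict:
    """Split an HTTP status line into version, status code, and reason."""
    # maxsplit=2 keeps multi-word reasons such as "Not Found" in one piece.
    version, code, reason = line.strip().split(" ", 2)
    return {"version": version, "status_code": int(code), "reason": reason}


ok = parse_status_line("HTTP/1.1 200 OK")
missing = parse_status_line("HTTP/1.1 404 Not Found")
print(ok, missing)
```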
status code
HTTP status codes are three-digit integer codes that can be broadly classified into five types based on the first number.
- 1xx : For simple information purposes only.
- 2xx : means the request was successful.
- 3xx : means redirect.
- 4xx : Indicates a client-side error.
- 5xx : Indicates a server-side error.
There are many status codes in each of the above categories, but the representative status codes below are frequently seen when testing web applications, so you must be familiar with them.
- 200 OK : means the request was successful.
- 201 Created : This means that the request was successful and a new resource was created. Used as a result of PUT and POST requests.
- 301 Moved Permanently : Indicates that the requested URI has been permanently changed. The changed URI is displayed in the Location header in response to the client.
- 302 Found : Indicates that the requested URI has temporarily changed. It also responds with the changed URI in the Location header.
- 400 Bad Request : This means that the client’s request could not be processed by the server due to a syntax error.
- 401 Unauthorized : This means that the client sent a request that requires authentication without authenticating. The server responds with a WWW-Authenticate header that indicates the authentication method.
- 403 Forbidden : This means that the client does not have permission to access the requested resource.
- 404 Not Found : This means that the resource requested by the client could not be found.
- 500 Internal Server Error : This means that the server cannot properly process the client’s request due to an error on the server side.
- 503 Service Unavailable : This means that the server is temporarily unable to handle the request, for example because it is overloaded or under maintenance, or because the web server itself is running but the web application behind it cannot respond.
For information about other status codes, see the Mozilla MDN Web Docs.
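The classification by first digit is easy to express in code. The sketch below uses Python's standard http.HTTPStatus enum to look up the reason for a code; the CLASSES table and the status_class helper are made up for this example.

```python
from http import HTTPStatus

# First digit of the status code -> category, following the list above.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the category of a three-digit HTTP status code."""
    return CLASSES.get(code // 100, "unknown")


for code in (200, 301, 404, 503):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```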
HTTP headers
Now let’s take a look at the HTTP headers that we’ve put off for a while.
HTTP requests and responses carry additional information in headers. Each header consists of a name on the left and a value on the right, separated by a colon (:), as follows.
Header Name: Value
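A rough sketch of reading headers in this format is shown below, in Python. The function name parse_headers is hypothetical, and the sample input reuses headers from the POST example earlier in the article. Header names are lowercased because they are case-insensitive.

```python
def parse_headers(raw: str) -> dict:
    """Parse 'Name: Value' lines until the blank line that ends the headers."""
    headers = {}
    for line in raw.splitlines():
        if not line.strip():
            break  # an empty line separates the headers from the message body
        # Split on the first colon only, since values may contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


raw = (
    "Host: www.bugbountyclub.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 19\r\n"
    "\r\n"
    "id=foo&password=bar"
)
headers = parse_headers(raw)
print(headers)
```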
HTTP headers can be classified into four categories depending on the context in which they are used.
General Header
This header is used in both requests and responses.
- Cache-Control : Specifies the caching mechanism for requests and responses.
- Connection : Determines whether to maintain the connection between the server and client after sending the request. It has one of the following values: keep-alive (maintain the connection) or close (close the connection).
- Date : Indicates the creation date and time of the HTTP message.
- Transfer-Encoding : Specifies the form of encoding (for example, chunked) used to transfer the message body safely to the recipient.
Entity Header
Used in requests and responses, this is a header related to the content in the message body area.
- Content-Encoding : Determines the encoding method to use for the content.
- Content-Language : Specifies the language for the user. If a web page implemented in English is served to Koreans, the header value may be ko-KR.
- Content-Length : Indicates the length of content in bytes.
- Content-Location : Indicates the location replacing the requested content. It is different from the Location header, which is one of the response headers.
- Content-Type : Indicates the type of content. MIME TYPEs such as text/html and application/json are used.
Request Header
Header used in HTTP requests.
- Accept : Indicates the type of content that the client can understand. MIME TYPE is also used.
- Accept-Encoding : Indicates which encoding schemes (typically compression algorithms) the client can understand. The server chooses one of the values in this header and reports its choice to the client in the Content-Encoding header.
- Authorization : Used when transmitting user identification information to the server through HTTP authentication .
- Cookie : Used to send back to the server the value of the Set-Cookie header received from the server. It is used to identify users in web applications that use cookie-based session mechanism authentication.
- Host : Indicates the domain name and port of the server to which the request will be sent.
- If-Match : This is a header for a conditional request. It carries the ETag value (a response header) that the client previously received from the web server for the resource. If the ETag value sent in the If-Match header matches the ETag value of the web resource stored on the server, the request succeeds.
- If-None-Match : Carries an ETag value like the If-Match header. If the ETag value included in this header matches the ETag value of the web resource stored on the server, the server responds with 304 Not Modified to tell the client to use its cached copy. If it does not match, the current version of the web resource is sent again by the server.
- If-Modified-Since : Used by the caching mechanism to ensure that the cached resource matches the latest version (date information in Last-Modified) stored on the server. Similar to If-None-Match.
- If-Unmodified-Since : The request is accepted if the date information included in this header is more recent than the resource’s Last-Modified information stored on the server.
- Referer : Indicates which page the current request is being sent from. In other words, it contains the URL value immediately before the current request occurred.
- User-Agent : Indicates information such as the user’s browser type, version, and operating system.
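The interplay between the ETag response header and the If-None-Match request header can be sketched like this. This is a simplified Python model of the server side, not a real server: the helper names are made up, and deriving the ETag from a SHA-256 hash of the content is just one possible choice assumed for the example.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Hypothetical ETag scheme: a quoted prefix of the content's SHA-256 hash."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(request_headers: dict, content: bytes):
    """Return (status code, response headers, body) for a GET of `content`."""
    etag = make_etag(content)
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is still current: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, content


# First visit: no cached copy, so the full resource is returned.
first = respond({}, b"hello")
# Second visit: the client sends back the ETag it received earlier.
second = respond({"If-None-Match": first[1]["ETag"]}, b"hello")
# The resource changed on the server, so the old ETag no longer matches.
third = respond({"If-None-Match": first[1]["ETag"]}, b"hello, again")

print(first[0], second[0], third[0])
```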
Response Header
Headers used in HTTP responses.
- Access-Control-Allow-Origin : Determines which hosts can share cross-domain resources through CORS (Cross Origin Resource Sharing).
- ETag : Used by the caching mechanism to identify the version of the resource.
- Expires : Indicates the resource’s caching expiration date. The client will use the client’s copy until the date and time indicated in this header.
- Location : Indicates the URL to redirect the request to.
- Pragma : With the no-cache directive, requires a cached copy to be validated with the server before it is served to the client.
- Server : Contains information such as type and version of software used as a web server.
- Set-Cookie : Used when creating a cookie and sending it to the client. Afterwards, the client automatically sends this value to the server through the Cookie header every time it makes a request.
- WWW-Authenticate : Defines the authentication method that should be used to access the requested resource.
- X-Frame-Options : Determines whether the responded resource can be included in the form of a frame in another web page through frame-related tags, etc. Used to defend against clickjacking attacks.
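As an illustration of how Set-Cookie and Cookie work together, the sketch below uses Python's standard http.cookies module. The cookie name session_id and its value are invented for the example, and a real browser applies many more rules (domain, path, and expiry matching) before sending a cookie back.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie response header.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"        # hypothetical session identifier
outgoing["session_id"]["path"] = "/"
outgoing["session_id"]["httponly"] = True
set_cookie_value = outgoing["session_id"].OutputString()

# Client side: parse the Set-Cookie value and keep only name=value pairs,
# which is what goes back to the server in the Cookie request header.
incoming = SimpleCookie()
incoming.load(set_cookie_value)
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in incoming.items()
)

print("Set-Cookie:", set_cookie_value)
print("Cookie:", cookie_header)
```

Attributes such as Path and HttpOnly tell the client how to handle the cookie. They are not echoed back to the server.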
web server
A web server is software or hardware (computer) that statically or dynamically provides web resources requested by a web browser through HTTP.
Here we will look at it from a software perspective. Please refer to the definition of web server given in Wikipedia.
“A web server is server software, or dedicated hardware for running this software, that can serve requests from clients on the World Wide Web. A web server can typically contain one or more websites. A web server supports HTTP and many other related protocols. The main function of a web server is to store, process, and deliver web pages to clients using the Hypertext Transfer Protocol (HTTP). Pages served are most commonly HTML documents and may contain images, style sheets, and scripts in addition to text (remainder omitted)” – Source: Wikipedia.
As you can see, a web server is also a type of software, so it runs on an operating system such as Linux or Windows. Most web servers support server-side scripting functions such as PHP or ASP.
Types of web servers include Apache HTTP Server, NGINX, IIS (Internet Information Services), Node.js (which has a built-in HTTP server module), and GWS (Google Web Server).
web application
It is an application that a web client (user) can access and use through a web browser, and runs on a web server. It is also called a web app for short. Web apps have the advantage of being able to be accessed and used from anywhere with just a web browser without the need to install a separate program on the local computer. Of course, some web apps run only on specific browsers, but most run regardless of the type of web browser.
Representative web apps include online shopping malls, online banking, and email programs such as Gmail, as well as programs for creating word processing documents, presentations, and spreadsheets.
Web application vs website
Strictly speaking, a web application interacts with the user, operates dynamically in response to user requests, and performs various functions, while a website simply provides a number of static pages that are not interactive. However, most websites these days implement functions that receive and process user input, such as search and comments, so the boundary has become blurred.
That’s all. Have a nice day, everyone!