1 Goal
The project is to build a web client tool that collects information about a web server. The purpose of this project is twofold:
to provide students with hands-on experience with socket programming in Python,
to help students understand the application-layer protocols HTTP/HTTPS. Note that HTTPS is not a standalone protocol; it is HTTP over Transport Layer Security (TLS). In this assignment, your main focus is HTTP, not TLS.
2 Background
2.1 HTTP
HTTP stands for Hypertext Transfer Protocol and is used for communication in web services. The web client initiates a conversation by opening a connection to a web server. Once the connection is set up, the client sends an HTTP request to the web server, and the server sends an HTTP response back to the client.
An HTTP request consists of two parts: a header and a body. Whether a body follows the header is specified in the header. The first line of any request header consists of three fields:
the method field: it can take on several different values, including GET, POST, HEAD, and so on,
the URL field: it identifies a network resource, e.g., “http://www.csc.uvic.ca/index.html”,
the HTTP version field.
The response from a server also has two parts: a header and a body. The first line of a response header consists of:
the HTTP version field,
the status code field,
the phrase field.
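The request format described above can be sketched as follows. This is only an illustration: the function name build_request and the choice of headers are assumptions, and the sketch sends the file path together with a Host header rather than a full URL in the first line.

```python
def build_request(method, path, host, version="HTTP/1.1"):
    """Assemble a request header: request line, header lines, blank line."""
    lines = [
        f"{method} {path} {version}",  # method, URL, and HTTP version fields
        f"Host: {host}",               # names the server being addressed
        "Connection: close",           # ask the server to close when done
    ]
    # Every line ends with CRLF; an empty line marks the end of the header.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request("GET", "/index.html", "www.csc.uvic.ca")
print(request)
```

A GET request like this one has no body, so the blank line is the last thing sent.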
Two main status codes are 200 and 404. The status code 200 means that the request succeeded and the information is returned in the response. The status code 404 means that the requested document does not exist on this server. Two example response messages are: “HTTP/1.0 404 Not Found” and “HTTP/1.0 200 OK data data data ...”. Two other status codes are also useful for this assignment: 505 (“HTTP Version Not Supported”) and 302 (“Found”), which is used for URL redirection.
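As a rough illustration of how the first line of a response could be taken apart, the sketch below splits it into the version, status code, and phrase fields. The names parse_status_line and MEANINGS are hypothetical, and the table covers only the codes discussed in this handout.

```python
def parse_status_line(line):
    """Split a line such as 'HTTP/1.0 404 Not Found' into its three fields."""
    parts = line.strip().split(" ", 2)   # the phrase itself may contain spaces
    version = parts[0]
    code = int(parts[1])
    phrase = parts[2] if len(parts) > 2 else ""
    return version, code, phrase


# Short descriptions for the status codes mentioned in this assignment.
MEANINGS = {
    200: "request succeeded",
    301: "redirection",
    302: "redirection",
    404: "document does not exist",
    505: "HTTP version not supported",
}

for sample in ("HTTP/1.0 200 OK", "HTTP/1.0 404 Not Found"):
    version, code, phrase = parse_status_line(sample)
    print(version, code, phrase, "->", MEANINGS.get(code, "other"))
```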
2.2 URI
URI stands for Uniform Resource Identifier and is also known as the combination of Uniform Resource Locators (URL) and Uniform Resource Names (URN). It is a formatted string that identifies a network resource. It generally has the format: protocol://host[:port]/filepath. When a port is not specified, the default HTTP port number is 80, and the default HTTPS port number is 443.
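One possible way to parse this format with the re module is sketched below. The names URI_PATTERN and parse_uri are hypothetical, and treating a missing protocol as http and a missing filepath as "/" are assumptions made for the sketch, not requirements of the assignment.

```python
import re

# protocol://host[:port]/filepath, with protocol, port, and filepath optional.
URI_PATTERN = re.compile(
    r"^(?:(?P<protocol>[A-Za-z][A-Za-z0-9+.-]*)://)?"  # optional protocol://
    r"(?P<host>[^/:]+)"                                # host name
    r"(?::(?P<port>\d+))?"                             # optional :port
    r"(?P<path>/.*)?$"                                 # optional /filepath
)

DEFAULT_PORTS = {"http": 80, "https": 443}


def parse_uri(uri):
    """Return (protocol, host, port, path) for a URI string."""
    match = URI_PATTERN.match(uri.strip())
    if match is None:
        raise ValueError(f"cannot parse URI: {uri!r}")
    protocol = (match.group("protocol") or "http").lower()  # assumed default
    host = match.group("host")
    if match.group("port"):
        port = int(match.group("port"))
    else:
        port = DEFAULT_PORTS.get(protocol, 80)
    path = match.group("path") or "/"
    return protocol, host, port, path


print(parse_uri("http://www.csc.uvic.ca/index.html"))
print(parse_uri("www.uvic.ca"))
```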
2.3 Cookies
An HTTP cookie is a small piece of data that a server sends to the user’s web browser. The browser may store it and send it back with the next request to the same server. Typically, it is used to tell whether two requests came from the same browser, for example to keep a user logged in. It remembers stateful information for the stateless HTTP protocol. Cookies have many applications on the web, such as tracking, authentication, and web analytics. For this reason, cookies also raise many concerns about security and privacy breaches.
The textbook includes a simple introduction to cookies. More detailed information can be found at: https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies. Python includes a dedicated module to handle cookies: https://docs.python.org/3/library/http.cookies.html. Nevertheless, you are not allowed to use this module because it defeats the purpose of this assignment: understanding the nuts and bolts of HTTP.
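Since the dedicated module is off limits, the value of a Set-Cookie header can be examined with plain string operations. The following is a minimal sketch under that assumption; the function name parse_set_cookie and the returned dictionary layout are hypothetical, and the sketch assumes a non-empty header value.

```python
def parse_set_cookie(header_value):
    """Return the cookie name and its expires/domain attributes, if present."""
    # Attributes are separated by ';'. The first piece is always name=value.
    pieces = [p.strip() for p in header_value.split(";") if p.strip()]
    name = pieces[0].split("=", 1)[0]
    info = {"name": name, "expires": None, "domain": None}
    for attribute in pieces[1:]:
        key, _, value = attribute.partition("=")
        key = key.strip().lower()        # attribute names are case-insensitive
        if key in ("expires", "domain"):
            info[key] = value.strip()
    return info


sample = "uvic_bar=deleted; expires=Thu, 04-Jan-2018 00:00:01 GMT; Max-Age=0; path=/;"
print(parse_set_cookie(sample))
```

Splitting on ";" rather than "," matters here because the expires date itself contains a comma.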
3 Project Description
You are required to build a smart web client tool, called SmartClient, in Python. Note that for consistency, programs in other languages will not be accepted! Given the URL of a web page, your SmartClient needs to find out the following information regarding the web server:
1. whether or not the web server supports http2,
2. the cookie name, the expire time (if any), and the domain name (if any) of cookies that the web server will use,
3. whether or not the requested web page is password-protected.
Your program first accepts a URI from stdin and parses it. Then it connects to a server, sends an HTTP request, and receives an HTTP response. You should also implement a routine that prints out the response from the server, marking the header and the body. When you finish the client, you can try to connect to any HTTP server. For instance, type “www.uvic.ca” as the input to the client program and see what response you get. As an example output, after you run your code with
% python SmartClient.py www.uvic.ca
your SmartClient may output the request sent and the response received from the server (optional), e.g.,
---Request begin---
GET http://www.uvic.ca/index.html HTTP/1.1
Host: www.uvic.ca
Connection: Keep-Alive
---Request end---
HTTP request sent, awaiting response...
---Response header ---
HTTP/1.1 200 OK
Date: Tue, 03 Jan 2017 22:42:27GMT
Expires: Thu, 23 Nov 2017 08:52:00GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: SESSID_UV_128004=VD3vOJhqL3YUbmaZSTJre1; path=/; domain=www.uvic.ca
Set-Cookie: uvic_bar=deleted; expires=Thu, 04-Jan-2018 00:00:01 GMT; Max-Age=0; path=/;
Keep-Alive: timeout=5, max=100
Connection: close
Content-Type: text/html; charset=UTF-8
Set-Cookie: www_def=2548525198.20480.0000;path=/
Set-Cookie: TS01a564a5=0183e07534a2511a2dcd274bee873845d67a2c07b7074587c948f80a42c427b1f7ea
Set-Cookie: TS01c8da3c=0183e075346a73ab4544c7b9ba9d7fa022c07af441fc6214c4960d6a9d0db2896;
Set-Cookie: TS014bf86f=0183e075347c174a4754aeb42d669781e0fafb1f43d3eb2783b1354159a9ad8d81f7
--- Response body ---
Body Body .... (the actual content)
Note that some lines in the above output were truncated. Your code may need to send multiple requests in order to find out the required information. In particular, if you get an HTTP response with code 302 or 301, you need to send further HTTP requests to the new URI provided by the Location header. Your code should output the final results (mandatory), for example:
website: www.uvic.ca
1. Supports http2: no
2. List of Cookies:
cookie name: SESSID_UV_128004, domain name: www.uvic.ca
cookie name: uvic_bar, expires time: Thu, 04-Jan-2018 00:00:01 GMT; domain name: .uvic.ca
cookie name: www_def,
cookie name: TS01a564a5
cookie name: TS01c8da3c, domain name: www.uvic.ca
cookie name: TS014bf86f, domain name: .uvic.ca
3. Password-protected: no
Note that the above output may be outdated and does not necessarily reflect the
ground truth of the current configuration of www.uvic.ca.
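The handling of a received response, including the redirection case for codes 301 and 302, might be sketched as follows. The function names split_response, next_step, and print_response are hypothetical. Treating status code 401 as the sign of a password-protected page is an assumption of this sketch; the handout does not say which code to check.

```python
def split_response(raw):
    """Split raw response bytes into (status_line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")   # a blank line ends the header
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = []              # a list keeps repeated headers such as Set-Cookie
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    return status_line, headers, body


def next_step(raw):
    """Decide what the client should do after receiving this response."""
    status_line, headers, _ = split_response(raw)
    code = int(status_line.split()[1])
    if code in (301, 302):
        # Redirection: the new URI is carried in the Location header.
        location = [value for name, value in headers if name == "location"]
        return ("redirect", location[0] if location else None)
    if code == 401:
        # Assumption: 401 Unauthorized indicates a password-protected page.
        return ("password-protected", None)
    return ("done", None)


def print_response(raw):
    """Print the response with the header and the body marked."""
    status_line, headers, body = split_response(raw)
    print("---Response header ---")
    print(status_line)
    for name, value in headers:
        print(f"{name}: {value}")
    print("--- Response body ---")
    print(body.decode("iso-8859-1"))


example = b"HTTP/1.1 302 Found\r\nLocation: https://www.uvic.ca/\r\n\r\n"
print_response(example)
print(next_step(example))
```

In a full client, a "redirect" result would lead to parsing the new URI and sending another request, ideally with a limit on the number of redirections followed.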
3.1 Other Notes
1. Regarding other printouts: Anything not specified in Assignment 1 is optional. For example, you can decide whether or not to print out the IP address, port number, and so on. When TAs test your code, if your code works without any problem, you are fine even if you do not print out anything not required in Assignment 1. Nevertheless, if your code does not work, TAs will not spend time figuring out what is wrong, and you will get a zero mark on the required function (refer to the table in Section 5 of Assignment 1). In this case, if your code includes some printouts showing intermediate results, TAs will have an idea of how far you have gotten and may give you partial marks based on their judgement.
2. Regarding the Python network/http packages: those that you are allowed to use include socket, ssl, re, time, os, sys, select, threading, etc.; those that you are not allowed to use include httplib, TCPClient, hyper, requests, and other similar third-party packages, since these packages define classes that already implement the client side of the HTTP and HTTPS protocols. By calling the functions in these packages, you essentially ask Python to do all the work for you. This defeats the purpose of a networking course.
3. Regarding the readme file: The readme file is important. Without it, TAs will not know how to compile and run your code. It would waste our time to deal with your complaint if TAs cannot run your code and give you a zero.
4. For more information on HTTP, HTML, URI, etc., please refer to http://www.w3.org. It is the home page of the W3 Consortium, and you will find many useful links to subjects related to the World Wide Web.
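Using only the allowed socket and ssl modules from note 2, one possible way to check for http2 support is to offer the "h2" protocol during the TLS handshake and see what the server selects. This is a sketch of one approach, not the required method; the function names make_alpn_context and supports_http2 are hypothetical, and the sketch covers only servers reachable over TLS.

```python
import socket
import ssl


def make_alpn_context():
    """Create a TLS context that offers HTTP/2 ("h2") and HTTP/1.1."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


def supports_http2(host, port=443, timeout=5.0):
    """Return True if the server selects "h2" during the TLS handshake."""
    context = make_alpn_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # selected_alpn_protocol() returns None if nothing was negotiated.
            return tls.selected_alpn_protocol() == "h2"


# Example call (requires network access):
#     supports_http2("www.uvic.ca")
```

The call is left as a comment because its result depends on the current configuration of the server being contacted.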