Important
This assignment must be completed individually. The solution submitted for grading must be yours. Please refer to the Academic Integrity section of our class syllabus.
Overview of Domain Name Resolution
The Domain Name database is perhaps the largest distributed database on the Internet. As of 2012, there were an estimated 10 million DNS servers on the Internet, each managing a minute fraction of the entire distributed database. Collectively, these millions of servers work seamlessly and are accessible via a query/response mechanism specified by the DNS protocol in the following documents:
- RFC1034: Domain Names - Concepts and Facilities
- RFC1035: Domain Names - Implementation and Specification
Logically, these millions of servers form a 3-level tree of nodes, referred to as the domain name space in RFC1034. These nodes are structured as follows:
- Root Name Servers (13 nodes) which maintain DNS records of TLD name servers
- TLD Name Servers (approximately 1500+ nodes) which maintain DNS records of Authoritative name servers for the TLD domains (.com, .org, .net, .edu, .int, .gov, .mil, .app, .art, .audio, ...)
- Authoritative Name Servers (10M+ nodes per 2012 estimate) which maintain DNS records of a particular domain and are usually the last nodes consulted when resolving a domain name to its IP address(es)
DNS Resolvers and DNS Records
A DNS resolver is a (client) program that consults one or more DNS servers in order to resolve a domain name (such as www.gvsu.edu) to an IPv4 address (104.17.88.18) or an IPv6 address (2606:4700::6811:5812). Each DNS server typically stores many types of records, but for this assignment only the following types matter:
| Type | Description |
|---|---|
| A | IPv4 address of a domain |
| AAAA | IPv6 address of a domain |
| NS | a name server associated with a domain |
| CNAME | a canonical name (alias) of a domain |
Name Resolution Algorithm
As explained in lecture, domain name resolvers can operate either recursively or iteratively. A more technical explanation of the two modes can be found in Section 4.3.1 of RFC1034.
To resolve a domain name, for instance www.cis.gvsu.edu, an iterative client must consult at least three different servers (one at each level):
1. Send what-is-NS(edu) to one of the root servers, which will respond with a list of TLD servers.
TIP
Since we have only 13 root servers on the entire Internet, you can hardcode the IPv4 address of one of them in your program.
2. Send what-is-NS(gvsu.edu) to one of the edu TLD servers, which will respond with a list of authoritative servers
3. Send what-is-A(www.cis.gvsu.edu) to one of the authoritative servers of gvsu.edu, which SHOULD respond with a list of IPv4 addresses
The same 3-step sequence also applies to shorter domain names, such as github.io, google.com, or gvsu.edu:
1. Send what-is-NS(io) to one of the root servers, which will respond with a list of TLD servers
2. Send what-is-NS(github.io) to one of the .io TLD servers, which will respond with a list of authoritative servers
3. Send what-is-A(github.io) to one of the authoritative servers of github.io, which SHOULD respond with a list of IPv4 addresses
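The three-step sequence above can be sketched as a small helper that lists the questions an iterative resolver asks, in order. The function name plan_queries is hypothetical, and the sketch assumes (as in both examples) that the authoritative servers are delegated exactly one label below the TLD; real zones may be delegated at other depths.

```python
def plan_queries(domain_name):
    """Return the (name, record type) questions for the 3-step sequence."""
    labels = domain_name.rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError("expected at least two labels, e.g. 'gvsu.edu'")
    tld = labels[-1]               # e.g. "edu"
    zone = ".".join(labels[-2:])   # e.g. "gvsu.edu"
    full = ".".join(labels)        # the name the user typed
    return [
        (tld, "NS"),   # step 1: ask a root server
        (zone, "NS"),  # step 2: ask a TLD server
        (full, "A"),   # step 3: ask an authoritative server
    ]


if __name__ == "__main__":
    for name, rtype in plan_queries("www.cis.gvsu.edu"):
        print(f"what-is-{rtype}({name})")
```

This only plans the questions; sending them and reacting to the answers is the job of the resolver loop.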
WARNING
When sending an A-type question to a name server (step 3 in the two examples above), ideally a resolver program receives an A-type answer (IP4 address) and this completes the IP address resolution. However, the response from a name server may be a different type:
- An NS-type answer containing the details of (alternate) name servers, in which case your resolver program must resend the A-type question to the said name server
- A CNAME-type answer containing a host alias in which case your resolver program must start over as if the user has entered a new domain name
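The three possible outcomes described in this warning can be sketched as a single decision function. This is an illustration only: next_action is a hypothetical name, records are simplified to (rtype, rdata) tuples rather than dnslib objects, and the sketch assumes NS records arrive in the Authority section, as they do in typical referral responses.

```python
def next_action(qname, answers, authorities):
    """Decide what to do with the response to an A-type question.

    answers and authorities are lists of (rtype, rdata) tuples, a
    simplified stand-in for parsed resource records (an assumption).
    """
    addresses = [rdata for rtype, rdata in answers if rtype == "A"]
    if addresses:
        return ("done", addresses)        # resolution is complete
    aliases = [rdata for rtype, rdata in answers if rtype == "CNAME"]
    if aliases:
        return ("restart", aliases[0])    # start over with the alias target
    servers = [rdata for rtype, rdata in authorities if rtype == "NS"]
    if servers:
        return ("ask", servers)           # resend the A question to these servers
    return ("fail", qname)                # nothing usable in the response
```

Returning a small tag plus data keeps the decision separate from the socket code, so the main loop can stay short.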
TIP
In Section 3.1 of RFC1034, each dot-separated part of a domain name (such as "edu" or "gvsu") is called a label.
Section 4.3.2 of RFC1034 gives additional explanation of the resolution algorithm.
Host Name Aliases
Resolving a domain name which has (many) aliases requires more work. For instance,
- lms.gvsu.edu is an alias of gvsu.blackboard.com
- but gvsu.blackboard.com is an alias of learn-prod.<...>.amazonaws.com
In this scenario, your resolver has to perform the following steps:
1. Send what-is-NS(edu) to one of the root servers
2. Send what-is-NS(gvsu.edu) to one of the .edu TLD servers
3. Send what-is-A(lms.gvsu.edu) to one of the authoritative servers of gvsu.edu. At this point you expect an A-type response, but instead you receive a CNAME response (gvsu.blackboard.com)
4. Send what-is-NS(com) to one of the root servers
5. Send what-is-NS(blackboard.com) to one of the .com TLD servers
6. Send what-is-A(gvsu.blackboard.com) to one of the authoritative servers. You expect an A-type response, but receive another CNAME response (learn-prod.<...>.amazonaws.com)
7. Send what-is-NS(com) to one of the root servers
8. Send what-is-NS(amazonaws.com) to one of the .com TLD servers
9. Send what-is-A(learn-prod.<...>.amazonaws.com) to one of the authoritative servers. At this point, you should receive an A-type response
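The alias-following behaviour in the steps above can be sketched as a loop that restarts with the CNAME target until an A record appears. Everything here is illustrative: resolve_with_aliases and FAKE_RECORDS are hypothetical names, the table uses made-up example domains instead of real DNS data, and each call to lookup stands in for a full root/TLD/authoritative sequence. The loop guard is an added safeguard, not something the steps above require.

```python
# Hypothetical stand-in for real DNS lookups: name -> (rtype, rdata).
FAKE_RECORDS = {
    "lms.example.edu": ("CNAME", "campus.example.com"),
    "campus.example.com": ("CNAME", "prod.example.net"),
    "prod.example.net": ("A", "192.0.2.10"),
}


def resolve_with_aliases(name, lookup):
    """Follow CNAME answers until an A record is found.

    lookup(name) must return an (rtype, rdata) tuple. Returns the
    address and the list of names visited along the way.
    """
    seen = []
    while name not in seen:
        seen.append(name)
        rtype, rdata = lookup(name)
        if rtype == "A":
            return rdata, seen
        if rtype != "CNAME":
            raise LookupError(f"unexpected {rtype} record for {name}")
        name = rdata  # start over as if the user had typed the alias target
    raise LookupError(f"alias loop detected at {name}")


if __name__ == "__main__":
    address, chain = resolve_with_aliases("lms.example.edu", FAKE_RECORDS.__getitem__)
    print(" -> ".join(chain), "=", address)
```

Because the loop has no fixed iteration count, it handles a chain of any length while still terminating on a circular alias.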
TIP
Refer to Section 5.2.2 of RFC1034 for more details on working with aliases.
Starter Code
The following starter code uses functions and classes provided by the dnslib module. You may have to first install the module by typing the following command in your terminal:
pip install dnslib
Important
You have to modify the get_dns_record() function in the starter code (its parameters, return value(s), overall structure, etc.) to match the overall design of your program. The starter code is essentially a demonstration of how to use the dnslib module.
Nevertheless,
- DO NOT modify the following code at line 12:
q.header.rd = 0 # Do not recurse
The rd flag stands for "recursion desired". Since we are implementing iterative queries, we have to force the remote name servers not to recursively resolve the domain name.
- DO NOT modify the order of the for-loops towards the end of the function
Using repr() in Python
Using the repr() Python function throughout the code is a useful technique for debugging the internal representation of a Python object. For instance, the following statement shows the name of the fields in a Python object.
print(x)  # print x as a string
print(repr(x))  # print the internal object structure of x
For example, the output
<DNS RR: 'a.edu-servers.net' rtype=A, rclass=IN ttl=172800 rdata='192.5.6.30'>
indicates that the RR object has the rtype, rclass, ttl, rdata properties which can be accessed from your code as follows:
# Line 49-51 of the starter code
a = RR.parse(____)
if a.rtype == QTYPE.A:
TIP
Lines 47-58 of dns.py on GitHub show other symbolic constants for QTYPE.
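The difference between print(x) and print(repr(x)) can be tried without dnslib. The sketch below uses a hypothetical Record dataclass (not dnslib's RR class) whose field names merely imitate the properties mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Record:  # hypothetical stand-in, NOT dnslib's RR class
    rname: str
    rtype: str
    ttl: int
    rdata: str

    def __str__(self):
        # Human-friendly form used by print(x)
        return f"{self.rname} {self.ttl} {self.rtype} {self.rdata}"


rec = Record("a.edu-servers.net.", "A", 172800, "192.5.6.30")
print(rec)        # uses __str__
print(repr(rec))  # uses the dataclass-generated __repr__, listing every field
```

The repr() form names every field, which is what makes it handy for discovering which attributes an unfamiliar object exposes.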
Running The Starter Code
TIP
It is strongly recommended that you keep Wireshark running and apply the filter udp.port == 53 (your client uses UDP) to monitor the communication between your program and the various domain name server(s).
The starter code uses a hardcoded address of the ICANN root server l.root-servers.net at IP 199.7.83.42. If the program fails due to a timeout, try a different root server. The first call to get_dns_record()
get_dns_record(sock, "edu", ROOT_SERVER, "NS")
produces the following output that shows all the available TLD servers for the edu top-level domain:
DNS query <DNS Header: id=0x4eb3 type=QUERY opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=0 ar=0>
<DNS Question: 'edu.' qtype=NS qclass=IN>
DNS header <DNS Header: id=0x4eb3 type=RESPONSE opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=13 ar=12>
Question-0 <DNS Question: 'edu.' qtype=NS qclass=IN>
Authority-0 <DNS RR: 'edu.' rtype=NS rclass=IN ttl=172800 rdata='a.edu-servers.net.'>
Authority-1 <DNS RR: 'edu.' rtype=NS rclass=IN ttl=172800 rdata='b.edu-servers.net.'>
... more output
Additional-0 <DNS RR: 'a.edu-servers.net.' rtype=A rclass=IN ttl=172800 rdata='192.5.6.30'> Name: a.edu-servers.net.
Additional-1 <DNS RR: 'a.edu-servers.net.' rtype=AAAA rclass=IN ttl=172800 rdata='2001:503:a83e::2:30'> Name: a.edu-servers.net.
Additional-2 <DNS RR: 'b.edu-servers.net.' rtype=A rclass=IN ttl=172800 rdata='192.33.14.30'> Name: b.edu-servers.net.
Additional-3 <DNS RR: 'b.edu-servers.net.' rtype=AAAA rclass=IN ttl=172800 rdata='2001:503:231d::2:30'> Name: b.edu-servers.net.
...
Explanations of the output:
- The rcode=NOERROR indicates that the query was successfully executed. Section 4.1.1 of RFC 1035 shows other possible numeric values of rcode. In the dnslib module, these values are symbolically represented as a Python bi-directional map (lines 66-69 of dns.py).
- The last four numbers on line 3 (q=1, a=0, ns=13, ar=12) indicate that the DNS response includes
  - 1 query (in the Question section)
  - 13 resource records in the Authority section, each record has type NS (Name Server). Specifically, it shows the available .edu TLD servers (obtained by consulting the ROOT server 199.7.83.42)
  - 12 resource records in the Additional Records section, each record has type A (IPv4 address) or AAAA (IPv6 address)
- Line 4 of the output is printed from the Question section (the for-loop at lines 38-41 of the starter code)
- There are no records from the Answer section (the for-loop at lines 48-52 of the starter code)
- Lines 5-7 are printed from the Authority section (the for-loop at lines 54-57 of the starter code)
- Lines 9-12 are printed from the Additional Records section which shows the associated IP address of each name server (from the for-loop at lines 60-62 of the starter code)
Query Failures
The last two function calls are provided to show what happens when the server cannot find the answer to your queries:
get_dns_record(sock, "gvsu.edu", "8.8.8.8", "NS") # (1)
get_dns_record(sock, "www.gvsu.edu", "8.8.8.8", "A") # (2)
They produce the following output that indicates "Server Failure" (rcode=SERVFAIL):
DNS query <DNS Header: id=0xe286 type=QUERY opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=0 ar=0>
<DNS Question: 'gvsu.edu.' qtype=NS qclass=IN>
DNS header <DNS Header: id=0xe286 type=RESPONSE opcode=QUERY flags=RA rcode='SERVFAIL' q=1 a=0 ns=0 ar=0>
Query failed
DNS query <DNS Header: id=0xbc6a type=QUERY opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=0 ar=0>
<DNS Question: 'www.gvsu.edu.' qtype=A qclass=IN>
DNS header <DNS Header: id=0xbc6a type=RESPONSE opcode=QUERY flags=RA rcode='SERVFAIL' q=1 a=0 ns=0 ar=0>
Query failed
Refer to Section 4.1.1 of RFC1035 for a complete list of error codes.
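As a rough illustration of where rcode comes from, the sketch below extracts the 4-bit response code from the 16-bit flags word of a DNS header and maps it to its usual mnemonic. The names RCODES and rcode_name are hypothetical; only the six codes defined in Section 4.1.1 of RFC1035 are listed.

```python
# Response codes 0-5 from RFC1035 Section 4.1.1, with their usual mnemonics.
RCODES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def rcode_name(flags):
    """Return the mnemonic for the RCODE stored in the low 4 bits of flags."""
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")


if __name__ == "__main__":
    # 0x8082 has QR=1 (response), RA=1 and RCODE=2
    print(rcode_name(0x8082))
```

A resolver would typically treat NXDOMAIN as "the domain does not exist" and SERVFAIL as a cue to try another server.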
DNS Messages
Both DNS clients and servers communicate using messages of the same format; query and response messages include the following five sections:
- Header (always present, 12 bytes, i.e. 96 bits, in size)
- Question
- Answer: resource records answering the question
- Authority: resource records pointing toward an authority
- Additional: resource records holding additional information
Sections 2-5 (after the header) vary in size and may not be present. If they are, their order will always follow the above sequence. For instance,
- The authority section will never show up BEFORE the answer section.
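The fixed-size header can be unpacked with Python's struct module. The sketch below is a minimal illustration, assuming the standard RFC1035 layout of six 16-bit big-endian fields; parse_header and the dictionary keys are hypothetical names chosen to mirror the q/a/ns/ar counts printed by the starter code.

```python
import struct

# Six unsigned 16-bit big-endian fields:
# id, flags, qdcount, ancount, nscount, arcount
HEADER = struct.Struct("!6H")


def parse_header(message):
    """Unpack the header found at the start of a DNS message."""
    ident, flags, qd, an, ns, ar = HEADER.unpack_from(message)
    return {
        "id": ident,
        "qr": flags >> 15,        # 0 = query, 1 = response
        "rd": (flags >> 8) & 1,   # recursion desired
        "rcode": flags & 0xF,     # response code
        "q": qd, "a": an, "ns": ns, "ar": ar,
    }


if __name__ == "__main__":
    # Hand-built header resembling the response shown earlier
    sample = HEADER.pack(0x4EB3, 0x8000, 1, 0, 13, 12)
    print(len(sample), parse_header(sample))
```

The four counts tell the parser how many records to read from each of the sections that follow the header.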
DNS Resource Records
When viewed as a conventional SQL database, the entire domain name database is like a giant table of SQL records. Each DNS record (which shows up in the Answer, Authority, and Additional sections above) defines the following six fields/columns (see RFC1034 Section 3.6 for more details):
- Owner: the domain name associated with this record
- Type: type of record. For this assignment, only the A, AAAA, NS, and CNAME types are relevant to your client (see RFC1035 Section 3.2.2 for more details)
- Class: not relevant for this assignment
- TTL: time to live (in seconds), not relevant for this assignment
- RDLENGTH: length of the RDATA field
- RDATA: extra data whose content depends on the record type
| Type | RDATA content | Description |
|---|---|---|
| A | 32-bit IPv4 address | The record includes IPv4 address |
| NS | Host name | The record includes the host name of a name server |
| CNAME | Domain name | The record includes an alias name of a domain |
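To make the RDATA column concrete, here is a small sketch that turns the raw RDATA bytes of an address record into printable text using the standard ipaddress module. The function name decode_address is hypothetical, and the sketch covers only A and AAAA records.

```python
import ipaddress


def decode_address(rtype, rdata):
    """Convert raw RDATA bytes of an A or AAAA record to text."""
    if rtype == "A" and len(rdata) == 4:
        return str(ipaddress.IPv4Address(rdata))   # dot-decimal notation
    if rtype == "AAAA" and len(rdata) == 16:
        return str(ipaddress.IPv6Address(rdata))   # compressed IPv6 notation
    raise ValueError(f"cannot decode {len(rdata)} bytes as {rtype}")


if __name__ == "__main__":
    print(decode_address("A", bytes([192, 5, 6, 30])))
    print(decode_address("AAAA", bytes.fromhex("20010503a83e00000000000000020030")))
```

NS and CNAME records carry an encoded domain name instead, which needs a separate decoder that understands length-prefixed labels.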
Caching
Notice that in the 9-step example above, steps 4 and 7 send the same query. If the result of step 4 is saved in a cache within your resolver program, step 7 can be avoided, which makes name resolution faster.
Likewise, after resolving www.cis.gvsu.edu for the first time, your resolver program should have kept the result of what-is-NS(gvsu.edu). In that case, steps 1 and 2 in the above 9-step example can be skipped.
To take advantage of cache copies, your client should check the available cache copy from the longest subdomain first. For instance, when resolving lms.gvsu.edu your client should:
- First check if a cached copy of the lms.gvsu.edu IP address is available
- Otherwise, check if a cached copy of the gvsu.edu name server IP address is available
- Otherwise, check if a cached copy of the .edu name server IP address is available
- Otherwise, consult one of the root name servers and so on.
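The longest-subdomain-first lookup order above can be sketched as follows. The names cache_lookup_order and find_in_cache are hypothetical, and the cache is assumed to be a plain dictionary keyed by domain name; your own design may store richer values.

```python
def cache_lookup_order(domain_name):
    """List the cache keys to try, longest subdomain first."""
    labels = domain_name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def find_in_cache(cache, domain_name):
    """Return (key, value) for the longest cached suffix, or (None, None)."""
    for suffix in cache_lookup_order(domain_name):
        if suffix in cache:
            return suffix, cache[suffix]
    return None, None  # nothing cached: start from a root server


if __name__ == "__main__":
    print(cache_lookup_order("lms.gvsu.edu"))
```

The key that matches tells the resolver which level of the tree it can skip to.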
Program Requirements
In this assignment you are to develop an iterative name resolver client. The domain names to resolve will be typed directly by the user. So, you are NOT writing a "local" name server that receives incoming requests from "local" programs. Your program will only send queries to other name servers and handle the corresponding responses.
Important
The goal of this assignment is to develop a program which communicates with various name servers (at the three levels in the namespace tree), not just to obtain the IPv4 address of a given host/domain name, which can be done by a one-line function call:
import socket
ip_addr = socket.gethostbyname("computing.gvsu.edu")
Your program is required to include code for sending/receiving DNS messages using a UDP socket, as well as parsing the incoming DNS messages.
Implement an iterative domain name resolver in Python (3.12 or newer). Your code shall be designed to use while loop(s) that send DNS messages and parse their corresponding responses. Avoid code bloat as the result of copying chunks of code multiple times. Instead, organize them into function(s).
RFC 1034 specifies that the protocol can be implemented using either TCP or UDP. For this assignment, your client shall use a UDP socket. Specifically, create only one socket and design the program to run in a loop prompting the user to enter a domain name. The loop stops when the user types .exit
from socket import socket, AF_INET, SOCK_DGRAM

if __name__ == '__main__':
    sock = socket(AF_INET, SOCK_DGRAM)
    sock.settimeout(2)
    while True:
        domain_name = input("Enter a domain name or .exit > ")
        if domain_name == '.exit':
            break
        while _____:
            # Use the function get_dns_record(____) (from the starter code)
            # to resolve the IP address of the domain name in question
    sock.close()
Your client shall print the following informational output:
- Which TLD name server(s) are consulted, whether it is obtained from the cache or from a root server
- Which Authoritative name server(s) are consulted, whether it is obtained from the cache or from a TLD name server
- The IP address resolution of the domain name in question, whether it is obtained from the cache or from an authoritative name server
- When a domain name has an alias
WARNING
Also, remove unnecessary debugging output
Your client shall print a sufficient error message when the user queries a non-existing domain
When the user types a domain which has an associated alias, your client shall continue to "follow" the alias until it eventually resolves to an IP address.
IMPORTANT
Your client should be designed to handle a theoretically unlimited chain of aliases.
Add additional logic to consult alternative name servers if one shows no response after a timeout interval
Use appropriate Python data structures (dictionary/map) to implement a caching strategy that stores both the name server addresses and the resolved IP addresses.
- Use the .list command to show the cache. The output should be numbered starting from 1; the number will be used by the .remove command below
- Use the .clear command to remove all cached copies
- Use the .remove N command to delete a specific cache copy, where N is an integer shown in the .list output above. Implement error handling when the user attempts to remove a non-existing cache copy (N is non-positive or too big)
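One possible shape for the three cache commands is sketched below. It assumes the cache is a single dictionary whose insertion order defines the numbering; handle_command and the exact message strings are hypothetical, and your own program may print differently.

```python
def handle_command(cache, line):
    """Apply .list, .clear or .remove N to cache; return the lines to print."""
    parts = line.split()
    if parts == [".list"]:
        # Entries are numbered from 1 in dictionary insertion order
        return [f"{i}. {name} -> {value}"
                for i, (name, value) in enumerate(cache.items(), start=1)]
    if parts == [".clear"]:
        cache.clear()
        return ["cache cleared"]
    if len(parts) == 2 and parts[0] == ".remove":
        try:
            n = int(parts[1])
        except ValueError:
            return [f"'{parts[1]}' is not a number"]
        keys = list(cache)
        if not 1 <= n <= len(keys):      # non-positive or too big
            return [f"no cache entry numbered {n}"]
        del cache[keys[n - 1]]
        return [f"removed {keys[n - 1]}"]
    return [f"unknown command: {line}"]
```

Returning the output lines instead of printing them directly makes the command handling easy to check without user input.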
IP addresses shall be displayed in dot-decimal notation, e.g. 175.23.28.184
Your program should be able to resolve domain names with any number of subdomains, for example:
- github.io
- www.gvsu.edu
- 8ck2mf8.x.incapdns.net
- some-random-name651234.with.many-dots.here.and-there.app
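The requirement above to consult alternative name servers after a timeout can be sketched as a small retry helper. This is an illustration under assumptions: query_with_failover is a hypothetical name, and send_query stands for whatever function in your program sends one query over the UDP socket and raises TimeoutError when the socket timeout expires (socket.timeout is an alias of TimeoutError in the Python versions this assignment targets).

```python
def query_with_failover(servers, send_query):
    """Try each server in turn; return (server, reply) from the first to answer.

    send_query(server) is assumed to raise TimeoutError when the
    server does not respond within the socket timeout.
    """
    for server in servers:
        try:
            return server, send_query(server)
        except TimeoutError:
            continue  # this server is silent, move on to the next one
    raise TimeoutError("no name server responded")
```

Passing the sending function as a parameter keeps the retry logic independent of how the query is built and parsed.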
Extra Credit Options
- (2 pts) In addition to resolving to IPv4 address(es), also resolve the domain name to IPv6 address(es)
- (3 pts) When multiple name servers are available and one of the servers is non-responsive, resend queries to alternate name servers exhaustively
- (4-8 points) Implement the iterative domain name resolver without any external DNS module (dnslib, dnspython, ...). Warning: if you decide to take this challenge, keep two separate copies of your program: one that uses dnslib and another without dnslib. A separate handout has been provided to give you some idea on parsing DNS messages using your own Python code.
WARNING
To be eligible for the minimum extra credit (4 points) your code must be able to parse the incoming DNS response messages and print the correct information. The extra credit will not be given for "effort".
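For the option without dnslib, the outgoing half of the problem (building a query) is the simpler one and might look like the sketch below. It assumes the RFC1035 wire format: a 12-byte header, a name encoded as length-prefixed labels, then a 16-bit type and class. The names QTYPES, encode_name and build_query are hypothetical, and this is not the content of the separate handout; parsing responses (including compressed names) is the harder part and is not shown.

```python
import struct

# Record type numbers from RFC1035 (A, NS, CNAME) and RFC3596 (AAAA)
QTYPES = {"A": 1, "NS": 2, "CNAME": 5, "AAAA": 28}


def encode_name(domain_name):
    """Encode 'gvsu.edu' as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in domain_name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(domain_name, qtype, ident):
    """Build a one-question query with all flags cleared (so RD = 0)."""
    header = struct.pack("!6H", ident, 0x0000, 1, 0, 0, 0)
    question = encode_name(domain_name) + struct.pack("!2H", QTYPES[qtype], 1)  # class IN = 1
    return header + question


if __name__ == "__main__":
    print(build_query("edu", "NS", 0x4EB3).hex())
```

The resulting bytes can be passed to sendto() on the UDP socket, addressed to port 53 of the chosen name server.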
Grading Rubrics (Tentative)
| Feature | Point |
|---|---|
| Client accepts multiple user input | 2 |
| Handling of .exit command | 2 |
| Fetch and parse DNS records from Root Name Servers | 3 |
| Fetch and parse DNS records from TLD Name Servers | 3 |
| Fetch and parse DNS records from Authoritative Name Servers | 3 |
| Resolve Domain Name to IP4 address(es) | 3 |
| Resolve Domain Alias(es) to IP4 address(es) | 5 |
| Correct lookup order of cached information | 2 |
| Skip appropriate name server(s) when details are found in cache | 3 |
| Handling of .list to show cache | 2 |
| Handling of .clear to clear cache | 2 |
| Handling of .remove to remove a specific cache entry | 2 |
| Show informational output on the source of address resolution (cache or query) | 2 |
| Show error message on non-existing domain names | 2 |
| Penalty of code bloat due to poor code design/organization | -4 max |
| (Extra) Resolve domain name to both IPv4 and IPv6 | 2 |
| (Extra) Use multiple name servers exhaustively | 3 |
| (Extra) Implementation without dnslib | 4-8 |