Important
This assignment must be completed individually. The solution submitted for grading must be yours. Please refer to the Academic Integrity section of our class syllabus.
Overview of Domain Name Resolution
In terms of geographical coverage, the Domain Name database is perhaps the largest distributed database on the Internet. As of 2012, an estimated 10 million DNS servers were on the Internet, each managing a minute fraction of the entire distributed database. More than a decade later, that number has likely increased significantly. Collectively, these millions of servers work seamlessly and are accessible via a query/response mechanism specified by the DNS protocol in the following documents:
- RFC1034: Domain Names - Concepts and Facilities
- RFC1035: Domain Names - Implementation and Specification
Logically, these millions of servers form a 3-level tree of nodes, referred to as the domain name space in RFC1034. The nodes are structured as follows:
- Root Name Servers (13 nodes), which maintain DNS records of TLD name servers
- TLD Name Servers (1500+ nodes), which maintain DNS records of Authoritative name servers for the TLD domains (.com, .org, .net, .edu, .int, .gov, .mil, .app, .art, .audio, ...)
- Authoritative Name Servers (10M+ nodes per the 2012 estimate), which maintain DNS records of a particular domain and are usually the last nodes consulted when resolving a domain name to its IP address(es)
DNS Record
The distributed Domain Name Database stores many types of records, but for this assignment only the following types matter:
- Type A: associated with an IPv4 address
- Type AAAA: associated with an IPv6 address
- Type NS: associated with the details of a name server
- Type CNAME: associated with a host alias
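On the wire, each record type is identified by a 16-bit numeric code. The sketch below lists the codes for the four types above (A, NS, and CNAME are defined in RFC1035 Section 3.2.2; AAAA was added later by RFC 3596). The names `RecordType` and `type_name` are illustrative only and are not part of the starter code.

```python
from enum import IntEnum


class RecordType(IntEnum):
    """Numeric TYPE codes of the four record types used in this assignment."""
    A = 1       # IPv4 address
    NS = 2      # name server
    CNAME = 5   # canonical name (the target of an alias)
    AAAA = 28   # IPv6 address


def type_name(code: int) -> str:
    """Return the symbolic name of a TYPE code, or a placeholder if unknown."""
    try:
        return RecordType(code).name
    except ValueError:
        return f"TYPE{code}"
```

A helper like `type_name` can make informational output easier to read when a response contains record types your client does not handle.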
Name Resolution Algorithm
As explained in lecture, domain name resolvers can operate either recursively or iteratively. A more technical explanation of the two modes can be found in Section 4.3.1 of RFC1034.
To resolve a domain name, for instance www.cis.gvsu.edu, an iterative client must consult at least three nodes (one at each level):
- First, the client (C) sends an NS-type question to one of the 13 root servers, asking for the details of the name servers which manage the .edu top-level domain. Upon a successful query, the selected root server returns a list of name servers L1.
- Next, the client (C) selects one name server X from the list L1.
- Next, the client (C) sends X an NS-type question, asking for the details of the name servers which manage the .gvsu.edu domain. Upon a successful query, X returns a list of name servers L2.
- Next, the client (C) selects one name server Y from the list L2.
- Next, the client (C) sends Y an A-type question to resolve the IP address of www.cis.gvsu.edu. Upon a successful query, Y returns a list of IPv4 addresses.
In total, the above address resolution procedure requires at least three DNS queries and three DNS responses.
TIP
Section 3.1 of RFC1034 specifies that each node is associated with a label (which is the subdomain derived from the full domain name). For instance, in the above scenario edu becomes the label associated with node X, and gvsu.edu the label associated with node Y.
Section 4.3.2 of RFC1034 gives additional explanation of the resolution algorithm.
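The subdomains mentioned in this tip can be derived mechanically from the full domain name. The following is a minimal sketch, assuming the name is a plain dot-separated string; the function name `zone_suffixes` is hypothetical.

```python
def zone_suffixes(domain_name: str) -> list[str]:
    """List every suffix of a domain name, from the TLD up to the full name."""
    labels = domain_name.rstrip(".").split(".")
    # Walk from the rightmost label (the TLD) toward the leftmost one.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(zone_suffixes("www.cis.gvsu.edu"))
```

The first two entries of the result are the names used in the two NS-type questions, and the last entry is the name used in the A-type question.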
The same sequence also applies to shorter domain names, such as github.io:
- First, the client (C) sends an NS-type question to one of the 13 root servers, asking for the details of the name servers in charge of the .io top-level domain. Upon a successful query, the root server returns a list of name servers (L1).
- Next, the client (C) selects one name server (R) from L1 and sends it an NS-type question, asking for the details of the name servers in charge of the .github.io domain. Upon a successful query, R returns a list of name servers (L2).
- Next, the client (C) selects one name server (S) from L2 and sends it an A-type question to resolve the IP address of github.io.
Important
In both scenarios (resolving the longer name www.cis.gvsu.edu and the shorter name github.io), the client sends:
- two NS-type questions and
- one A-type question
to three different name servers.
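To illustrate the three-question pattern without touching the network, the sketch below walks a made-up table of "servers". Everything in it is hypothetical: the server names, the table `FAKE_SERVERS`, and the addresses (taken from the 192.0.2.0/24 documentation range). A real client would replace `fake_query` with actual DNS queries.

```python
# Hypothetical stand-in for real name servers: each "server" maps a
# (name, question type) pair to the list of values it would return.
FAKE_SERVERS = {
    "root": {("edu", "NS"): ["edu-tld"], ("io", "NS"): ["io-tld"]},
    "edu-tld": {("gvsu.edu", "NS"): ["gvsu-auth"]},
    "io-tld": {("github.io", "NS"): ["github-auth"]},
    "gvsu-auth": {("www.cis.gvsu.edu", "A"): ["192.0.2.10"]},
    "github-auth": {("github.io", "A"): ["192.0.2.20"]},
}


def fake_query(server, name, qtype):
    """Look up what the pretend server would answer (empty list if nothing)."""
    return FAKE_SERVERS.get(server, {}).get((name, qtype), [])


def resolve(domain_name):
    """Ask two NS-type questions and one A-type question, one per level."""
    labels = domain_name.split(".")
    questions = [
        (labels[-1], "NS"),             # TLD, asked of a root server
        (".".join(labels[-2:]), "NS"),  # domain, asked of a TLD server
        (domain_name, "A"),             # host, asked of an authoritative server
    ]
    server = "root"
    trace = []
    for name, qtype in questions:
        trace.append((server, name, qtype))
        records = fake_query(server, name, qtype)
        if not records:
            return [], trace            # the walk failed at this level
        if qtype == "A":
            return records, trace
        server = records[0]             # select one name server from the list
    return [], trace
```

Calling `resolve("www.cis.gvsu.edu")` and `resolve("github.io")` both produce a trace of exactly three questions sent to three different servers.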
Record Types of NS-Type Questions
When sending an NS-type question to a name server, the answer may include both the name and the IP address of each name server, in which case your client shall use the IP address for subsequent queries. However, sometimes the answer includes only the names of the name servers (with no associated IP addresses). Fortunately, the dnslib module (used in the starter code) works fine when you supply the name of the server.
Response Types of A-Type Questions
When sending an A-type question to a name server, ideally your client gets an A-type answer and this completes the IP address resolution. However, occasionally, a name server may respond with:
- An NS-type answer containing the details of (alternate) name servers, in which case your client must resend the A-type question to one of those name servers
- A CNAME-type answer containing a host alias (explained in the next section)
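The three possible outcomes can be expressed as a small decision function. This is only a sketch under simplifying assumptions: records are represented as `(rtype, rdata)` tuples rather than dnslib objects, and NS records are looked for in both the Answer and Authority sections, since referrals (like the starter code output for the root server) typically carry them in the Authority section.

```python
def classify_response(answer, authority):
    """Decide the next step from the records of a response.

    answer, authority: lists of (rtype, rdata) tuples, e.g. ("A", "192.0.2.1").
    """
    addresses = [rdata for rtype, rdata in answer if rtype == "A"]
    if addresses:
        return ("resolved", addresses)        # done: IPv4 address(es) found
    aliases = [rdata for rtype, rdata in answer if rtype == "CNAME"]
    if aliases:
        return ("alias", aliases[0])          # restart resolution for the alias
    servers = [rdata for rtype, rdata in answer + authority if rtype == "NS"]
    if servers:
        return ("referral", servers)          # resend the question to one of these
    return ("failed", None)
```

Checking for A records first means a response that carries both a CNAME and the final address is treated as resolved.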
Host Name Aliases
The DNS distributed database permits host names to have aliases. This feature lets Internet users work with easy-to-remember names instead of longer, cryptic ones. More importantly, it also enables virtual hosting to be decoupled from the name of the domain (company) which serves the virtual host. The following example shows that:
- our GVSU Bb Ultra is hosted at Blackboard (as gvsu.blackboard.com)
- and Blackboard itself runs on Amazon Web Services.
In practice, all access to lms.gvsu.edu must be redirected to AWS. This "redirection" is implemented by creating a host name alias from lms.gvsu.edu to the hostname on AWS.
A longer sequence may be required if the domain name to be resolved involves one or more aliases. For instance, lms.gvsu.edu is an alias of gvsu.blackboard.com, which is in turn an alias of learn-prod.<...>.amazonaws.com. In this aliasing scenario, during the third step in the above sequence, the .gvsu.edu authoritative server will respond with the alias(es) instead of the IP address. Hence, the resolution sequence will typically go as follows:
1. First, the client (C) queries one of the 13 root servers for .edu name servers (X)
2. Next, C queries X for .gvsu.edu name servers (Y)
3. Next, C queries Y to resolve the IP address of lms.gvsu.edu, but Y returns the alias gvsu.blackboard.com
4. Next, C queries one of the 13 root servers for .com name servers (G)
5. Next, C queries G for blackboard.com name servers (H)
6. Next, C queries H to resolve the IP address of gvsu.blackboard.com, but H returns the alias learn-prod.<...>.amazonaws.com
7. Next, C queries one of the 13 root servers for .com name servers (G)
8. Next, C queries G for amazonaws.com name servers (Z)
9. Next, C queries Z to resolve the IP address of learn-prod.<...>.amazonaws.com
TIP
Refer to Section 5.2.2 of RFC1035 for more details on working with aliases.
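Following a chain of aliases is naturally a loop that keeps replacing the current name until an address appears. The sketch below assumes a hypothetical in-memory table `RECORDS` in place of real queries (the names use reserved example domains and documentation addresses), and it adds a guard against aliases that point back at each other.

```python
# Hypothetical records standing in for the results of real DNS queries.
RECORDS = {
    "portal.example.edu": ("CNAME", "campus.example.com"),
    "campus.example.com": ("CNAME", "host-1.example.net"),
    "host-1.example.net": ("A", "192.0.2.30"),
    "loop-a.example": ("CNAME", "loop-b.example"),
    "loop-b.example": ("CNAME", "loop-a.example"),
}


def follow_aliases(name, records):
    """Follow CNAME records until an A record is found.

    Returns (chain_of_names, address). Raises ValueError on an alias loop
    and KeyError if a name has no record at all.
    """
    chain = [name]
    seen = {name}
    while True:
        rtype, value = records[name]
        if rtype == "A":
            return chain, value
        if value in seen:
            raise ValueError(f"alias loop detected at {value}")
        seen.add(value)
        chain.append(value)
        name = value
```

Because the loop has no fixed iteration count, it handles chains of any length; the `seen` set is what keeps a misconfigured cycle from running forever.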
Caching
Without caching enabled, the above sequence requires a total of 18 DNS messages (9 queries + 9 responses). With caching enabled, step (7) above can be skipped, and step (8) will (re)use the IP address of (G) obtained from step (4). Furthermore, if your client had resolved www.cis.gvsu.edu prior to resolving lms.gvsu.edu, and the IP address of the name server for gvsu.edu is in the cache, steps (1) and (2) above can be skipped.
Caching can also be applied to the host IP address, so that repeated queries can be resolved directly from the cache without consulting any external name servers. For instance, once the first query for lms.gvsu.edu has been resolved, repeated queries for lms.gvsu.edu can be answered from your client's cached copy.
To take advantage of cached copies, your client should check the cache starting from the longest subdomain. For instance, when resolving lms.gvsu.edu your client should:
- First, check if a cached copy of the lms.gvsu.edu IP address is available
- Otherwise, check if a cached copy of the gvsu.edu name server IP address is available
- Otherwise, check if a cached copy of the .edu name server IP address is available
- Otherwise, consult one of the root name servers, and so on.
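The lookup order above can be sketched as follows, assuming two plain dictionaries: a hypothetical `host_cache` mapping host names to resolved addresses and a hypothetical `ns_cache` mapping domains to name server addresses. Your own design may well use a single dictionary or different keys.

```python
def cache_starting_point(domain_name, host_cache, ns_cache):
    """Find the most specific cached information for a domain name.

    Returns (kind, key, value), where kind is "address", "name server",
    or "root" when nothing useful is cached.
    """
    if domain_name in host_cache:
        return ("address", domain_name, host_cache[domain_name])
    labels = domain_name.split(".")
    # Try the longest suffix first. The full name is included because it may
    # itself be a domain with cached name servers (e.g. github.io).
    for start in range(len(labels)):
        suffix = ".".join(labels[start:])
        if suffix in ns_cache:
            return ("name server", suffix, ns_cache[suffix])
    return ("root", None, None)
```

The returned `kind` also tells the caller which informational message to print (cache hit versus a fresh query to a root server).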
DNS Messages
Both DNS clients and servers communicate using messages of the same format; query and response messages include the following five sections:
- Header (always present, 12 bytes in size)
- Question
- Answer: resource records answering the question
- Authority: resource records pointing toward an authority
- Additional: resource records holding additional information
The sections after the header may be absent, but when they are present, their sizes vary and they must appear in the order specified above.
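The header consists of six 16-bit fields in network byte order (RFC1035 Section 4.1.1): the message ID, a flags word, and one count for each of the four sections that follow. As a rough sketch of how it could be unpacked with the standard `struct` module (the class and function names are illustrative, not taken from the starter code):

```python
import struct
from typing import NamedTuple


class DnsHeader(NamedTuple):
    message_id: int
    flags: int
    qdcount: int   # entries in the Question section
    ancount: int   # records in the Answer section
    nscount: int   # records in the Authority section
    arcount: int   # records in the Additional section


def parse_header(message: bytes) -> DnsHeader:
    """Unpack the fixed-size header at the start of a DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS message needs at least a 12-byte header")
    # "!6H" = network (big-endian) byte order, six unsigned 16-bit integers
    return DnsHeader(*struct.unpack("!6H", message[:12]))
```

The four counts tell the parser how many entries to read from each section, which is why the sections must appear in a fixed order.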
DNS Resource Records
When viewed as a conventional SQL database, the entire domain name database is like a giant table of SQL records. Each DNS record (which shows up in the Answer, Authority, and Additional sections above) defines the following six fields/columns (see RFC1034 Section 3.6 for more details):
Owner: the domain name associated with this record
Type: type of record. For this assignment, only the A, AAAA, NS, and CNAME are relevant to your client (see RFC1035 Section 3.2.2 for more details)
Class: not relevant for this assignment
TTL: time to live (in seconds), not relevant for this assignment
RD Length: length of the RDATA field
RDATA: extra data whose content depends on the record type
Type | RDATA content | Description |
---|---|---|
A | 32-bit IPv4 address | The record includes an IPv4 address |
NS | Host name | The record includes the host name of a name server |
CNAME | Domain name | The record includes an alias name of a domain |
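For address records, the RDATA bytes can be turned into printable form with the standard `ipaddress` module. The sketch below covers only the A and AAAA types (the function name is hypothetical); NS and CNAME RDATA hold encoded domain names and need a separate name decoder.

```python
import ipaddress


def rdata_to_text(rtype: str, rdata: bytes) -> str:
    """Convert raw A or AAAA RDATA into its usual text notation."""
    if rtype == "A":
        # 4 bytes -> dot-decimal notation
        return str(ipaddress.IPv4Address(rdata))
    if rtype == "AAAA":
        # 16 bytes -> colon-separated hexadecimal notation
        return str(ipaddress.IPv6Address(rdata))
    raise ValueError(f"unsupported record type: {rtype}")


print(rdata_to_text("A", bytes([192, 5, 6, 30])))
```

When dnslib is used, this conversion is already done for you; the sketch is mainly relevant when parsing messages by hand.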
Program Requirements
In this assignment you are to develop an iterative name resolver client. The domain names to resolve will be typed directly by the user. So, you are NOT writing a "local" name server that receives incoming requests from "local" programs. Your program will only send queries to other name servers and handle the corresponding responses.
Important
The goal of this assignment is to develop a program which communicates with various name servers (at the three levels in the namespace tree), not merely to obtain the IPv4 address of a given host/domain name, which can be done with a one-line function call:
import socket
ip_addr = socket.gethostbyname("computing.gvsu.edu")
Your program is required to include code for sending/receiving DNS messages using a UDP socket, as well as parsing the incoming DNS messages.
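To give an idea of what a DNS query looks like at the byte level, here is a minimal sketch that builds a query message by hand, following the header and question layouts in RFC1035 Sections 4.1.1 and 4.1.2. The function names are illustrative; the starter code relies on dnslib to do this work instead.

```python
import struct


def encode_name(domain_name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    encoded = b""
    for label in domain_name.rstrip(".").split("."):
        if not label:
            continue
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError("a label may not exceed 63 bytes")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_query(domain_name: str, qtype: int, query_id: int) -> bytes:
    """Build a one-question query with all flags cleared (no recursion)."""
    flags = 0x0000                       # QR=0 (query), RD=0 (iterative)
    header = struct.pack("!6H", query_id, flags, 1, 0, 0, 0)
    question = encode_name(domain_name) + struct.pack("!2H", qtype, 1)  # class IN = 1
    return header + question
```

The resulting bytes could then be sent with `sock.sendto(packet, (server_ip, 53))` and the reply read with `sock.recvfrom(...)`, where `server_ip` is whichever name server your client has selected.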
Implement an iterative domain name resolver in Python (3.12 or newer). Your code shall be designed to use while loop(s) that send DNS messages and parse their corresponding responses. Avoid code bloat caused by copying chunks of code multiple times. Instead, organize them into function(s).
RFC 1034 specifies that the protocol can be implemented using either TCP or UDP. For this assignment, your client shall use a UDP socket. Specifically, create only one socket and design the program to run in a loop prompting the user to enter a domain name. The loop stops when the user types .exit
if __name__ == '__main__':
    sock = socket(AF_INET, SOCK_DGRAM)
    sock.settimeout(2)
    while True:
        domain_name = input("Enter a domain name or .exit > ")
        if domain_name == '.exit':
            break
        while _____:
            # Use the function get_dns_record(____) (from the starter code
            # below) to resolve the IP address of the domain name in question
    sock.close()
Your client shall print the following informational output:
- Which TLD name server(s) are consulted, and whether each was obtained from the cache or from a root server
- Which Authoritative name server(s) are consulted, and whether each was obtained from the cache or from a TLD name server
- The resolved IP address of the domain name in question, and whether it was obtained from the cache or from an authoritative name server
- The alias, when a domain name has one
WARNING
Also, remove unnecessary debugging output
Your client shall print a meaningful error message when the user queries a non-existing domain
When the user types a domain which has an associated alias, your client shall continue to "follow" the alias until it eventually resolves to an IP address.
IMPORTANT
Your client should be designed to handle a theoretically unlimited chain of aliases.
Add logic to consult alternative name servers if one does not respond within the timeout interval
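One possible shape for the timeout fallback is a loop over the candidate servers that moves on whenever a query times out. In the sketch below, `send_query` is a hypothetical callable supplied by the caller (for example, a wrapper around your own query function); it is assumed to raise `TimeoutError`, which is what a socket with `settimeout()` raises in Python 3.10 and newer.

```python
def query_with_fallback(servers, send_query):
    """Try each name server in turn until one of them responds.

    servers: list of server addresses to try, in order.
    send_query: callable taking one server address and returning a response;
                expected to raise TimeoutError when the server stays silent.
    Returns (server_that_answered, response).
    """
    for server in servers:
        try:
            return server, send_query(server)
        except TimeoutError:
            continue                      # silent server: try the next one
    raise TimeoutError("none of the name servers responded")
```

Passing the query function in as a parameter keeps the retry logic in one place, so it can be reused at the root, TLD, and authoritative levels.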
Use appropriate Python data structures (dictionary/map) to implement a caching strategy that stores both the name server addresses and the resolved IP addresses.
- Use the .list command to show the cache. The output should be numbered starting from 1; the numbers will be used by the .remove command below
- Use the .clear command to remove all cached entries
- Use the .remove N command to delete a specific cached entry, where N is an integer shown in the .list output above. Implement error handling for when the user attempts to remove a non-existing cached entry (N is non-positive or too big)
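The numbering used by the cache commands can rely on the fact that Python dictionaries preserve insertion order. The following is a minimal sketch, assuming a single dictionary keyed by name; the function names and message wording are placeholders, not required output.

```python
def list_cache(cache: dict) -> list[str]:
    """Return one numbered line per cache entry, starting from 1."""
    return [
        f"{number}. {name} -> {value}"
        for number, (name, value) in enumerate(cache.items(), start=1)
    ]


def remove_entry(cache: dict, argument: str) -> str:
    """Remove the N-th entry (1-based) and return a message for the user."""
    try:
        number = int(argument)
    except ValueError:
        return f"'{argument}' is not a number"
    keys = list(cache)
    if not 1 <= number <= len(keys):
        return f"No cache entry numbered {number}"
    removed = keys[number - 1]
    del cache[removed]
    return f"Removed {removed}"
```

With this layout, the .clear command reduces to a call to `cache.clear()`.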
IP addresses shall be displayed in dot-decimal notation, e.g. 175.23.28.184
Your program should be able to resolve domain names with any number of subdomain levels:
- github.io
- www.gvsu.edu
- 8ck2mf8.x.incapdns.net
- some-random-name651234.with.many-dots.here.and-there.app
Extra Credit Options
- (2 pts) In addition to resolving to IPv4 address(es), also resolve the domain name to IPv6 address(es)
- (3 pts) When multiple name servers are available and one of the servers is non-responsive, resend queries to alternate name servers exhaustively
- (4-8 pts) Implement the iterative domain name resolver without any external DNS module (dnslib, dnspython, ...). Warning: if you decide to take this challenge, keep two separate copies of your program: one that uses dnslib and another without dnslib. A separate handout has been provided to give you some idea on parsing DNS messages using your own Python code.
WARNING
To be eligible for the minimum extra credit (4 points), your code must be able to parse the incoming DNS response messages and print the correct information. Extra credit will not be given for "effort".
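One of the trickier parts of parsing responses by hand is that domain names may be compressed: a length byte whose top two bits are set marks a 2-byte pointer to an earlier offset in the message (RFC1035 Section 4.1.4). The sketch below shows one way to decode such names; the function name `read_name` and the loop guard are illustrative choices, not taken from the separate handout.

```python
def read_name(message: bytes, offset: int) -> tuple[str, int]:
    """Decode a (possibly compressed) domain name starting at offset.

    Returns (name, offset_just_after_the_name_in_the_original_position).
    """
    labels = []
    resume_at = None          # where parsing continues after the first pointer
    hops = 0
    while True:
        length = message[offset]
        if length & 0xC0 == 0xC0:
            # Compression pointer: the lower 14 bits give the target offset.
            pointer = ((length & 0x3F) << 8) | message[offset + 1]
            if resume_at is None:
                resume_at = offset + 2
            offset = pointer
            hops += 1
            if hops > len(message):
                raise ValueError("compression pointer loop")
            continue
        offset += 1
        if length == 0:       # zero-length label terminates the name
            break
        labels.append(message[offset:offset + length].decode("ascii"))
        offset += length
    name = ".".join(labels) + "."
    return name, (resume_at if resume_at is not None else offset)
```

The second return value matters because the fields that follow a name (type, class, TTL, and so on) start right after the pointer, not after the text the pointer refers to.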
Starter Code
The following starter code uses functions and classes provided by the dnslib module. You may have to first install the module by typing the following command in your terminal:
pip install dnslib
Important
You have to modify the get_dns_record() function in the starter code (its parameters, return value(s), overall structure, etc.) to match the overall design of your program. The starter code is essentially a demonstration of how to use the dnslib module.
Nevertheless, DO NOT modify the following code at line 12:
q.header.rd = 0 # Do not recurse
The rd flag stands for "recursion desired". Since we are implementing iterative queries, we must ask the remote name servers not to resolve the domain name recursively.
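At the byte level, RD is a single bit in the 16-bit flags word of the header (RFC1035 Section 4.1.1). The sketch below shows the bit positions of the one-bit flags and how RD could be set or cleared; the helper names are hypothetical and independent of dnslib.

```python
# Bit masks of the single-bit flags within the 16-bit header flags word.
FLAG_BITS = {
    "AA": 0x0400,  # authoritative answer
    "TC": 0x0200,  # truncated message
    "RD": 0x0100,  # recursion desired
    "RA": 0x0080,  # recursion available
}


def set_recursion_desired(flags: int, desired: bool) -> int:
    """Return the flags word with the RD bit set or cleared."""
    if desired:
        return flags | FLAG_BITS["RD"]
    return flags & ~FLAG_BITS["RD"] & 0xFFFF


def flag_names(flags: int) -> list[str]:
    """List the names of the single-bit flags that are set."""
    return [name for name, bit in FLAG_BITS.items() if flags & bit]
```

This matches the `flags=` field in the sample output: a query sent with RD cleared shows no flags, while a response from a recursive resolver shows `RA`.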
Using the repr() Python function throughout the code is a useful technique for debugging the internal representation of a Python object. For instance, the output
<DNS RR: 'a.edu-servers.net' rtype=A, rclass=IN ttl=172800 rdata='192.5.6.30'>
indicates that the RR object has the rtype, rclass, ttl, and rdata properties, which can be accessed from your code:
# Line 49-51 of the starter code
a = RR.parse(____)
if a.rtype == QTYPE.A:
TIP
Lines 47-58 of dns.py on GitHub show other symbolic constants for QTYPE.
Running The Starter Code
The starter code uses a hardcoded address of the ICANN root server l.root-servers.net at IP 199.7.83.42. If the program fails due to a timeout, try a different root server. The first call to get_dns_record()
get_dns_record(sock, "edu", ROOT_SERVER, "NS")
produces the following output, which shows all the available TLD servers for the edu top-level domain:
DNS query <DNS Header: id=0x4eb3 type=QUERY opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=0 ar=0>
<DNS Question: 'edu.' qtype=NS qclass=IN>
DNS header <DNS Header: id=0x4eb3 type=RESPONSE opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=13 ar=12>
Question-0 <DNS Question: 'edu.' qtype=NS qclass=IN>
Authority-0 <DNS RR: 'edu.' rtype=NS rclass=IN ttl=172800 rdata='a.edu-servers.net.'>
Authority-1 <DNS RR: 'edu.' rtype=NS rclass=IN ttl=172800 rdata='b.edu-servers.net.'>
... more output
Additional-0 <DNS RR: 'a.edu-servers.net.' rtype=A rclass=IN ttl=172800 rdata='192.5.6.30'> Name: a.edu-servers.net.
Additional-1 <DNS RR: 'a.edu-servers.net.' rtype=AAAA rclass=IN ttl=172800 rdata='2001:503:a83e::2:30'> Name: a.edu-servers.net.
Additional-2 <DNS RR: 'b.edu-servers.net.' rtype=A rclass=IN ttl=172800 rdata='192.33.14.30'> Name: b.edu-servers.net.
Additional-3 <DNS RR: 'b.edu-servers.net.' rtype=AAAA rclass=IN ttl=172800 rdata='2001:503:231d::2:30'> Name: b.edu-servers.net.
...
Explanations of the output:
- The rcode=NOERROR indicates that the query was successfully executed. Section 4.1.1 of RFC 1035 shows other possible numeric values of rcode. In the dnslib module, these values are symbolically represented as a Python bi-directional map (lines 66-69 of dns.py).
- The last four numbers on line 3 (q=1, a=0, ns=13, ar=12) indicate that the DNS response includes
  - 1 query (in the Question section)
  - 13 resource records in the Authority section, each of type NS (Name Server). Specifically, they show the available .edu TLD servers (obtained by consulting a root server)
  - 12 resource records in the Additional Records section, each of type A (IPv4 address) or AAAA (IPv6 address)
- Line 4 of the output is printed from the Question section (the for-loop at lines 38-41 of the starter code)
- There are no records from the Answer section (the for-loop at lines 48-52 of the starter code)
- Lines 5-7 are printed from the Authority section (the for-loop at lines 54-57 of the starter code)
- Lines 9-12 are printed from the Additional Records section, which shows the associated IP address of each name server (from the for-loop at lines 60-62 of the starter code)
Query Failures
The last two function calls show what happens when the server cannot find the answer to your queries:
get_dns_record(sock, "gvsu.edu", "8.8.8.8", "NS") # (1)
get_dns_record(sock, "www.gvsu.edu", "8.8.8.8", "A") # (2)
They produce the following output, which indicates "Server Failure" (rcode=SERVFAIL):
DNS query <DNS Header: id=0xe286 type=QUERY opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=0 ar=0>
<DNS Question: 'gvsu.edu.' qtype=NS qclass=IN>
DNS header <DNS Header: id=0xe286 type=RESPONSE opcode=QUERY flags=RA rcode='SERVFAIL' q=1 a=0 ns=0 ar=0>
Query failed
DNS query <DNS Header: id=0xbc6a type=QUERY opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=0 ar=0>
<DNS Question: 'www.gvsu.edu.' qtype=A qclass=IN>
DNS header <DNS Header: id=0xbc6a type=RESPONSE opcode=QUERY flags=RA rcode='SERVFAIL' q=1 a=0 ns=0 ar=0>
Query failed
Refer to Section 4.1.1 of RFC1035 for a complete list of error codes.
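For reference, the response code occupies the lowest four bits of the header flags word. The sketch below maps the six codes defined in RFC1035 Section 4.1.1 to the mnemonics that appear in the output above; the helper names are illustrative.

```python
# Response codes from RFC1035 Section 4.1.1, with their common mnemonics.
RCODE_NAMES = {
    0: "NOERROR",   # no error condition
    1: "FORMERR",   # format error
    2: "SERVFAIL",  # server failure
    3: "NXDOMAIN",  # name error: the domain name does not exist
    4: "NOTIMP",    # not implemented
    5: "REFUSED",   # refused
}


def describe_rcode(flags: int) -> str:
    """Extract the 4-bit response code from a header flags word."""
    code = flags & 0x000F
    return RCODE_NAMES.get(code, f"RCODE{code}")
```

A client could use the NXDOMAIN case to decide when to print the error message for a non-existing domain.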
Grading Rubrics (Tentative)
Feature | Point |
---|---|
Client accepts multiple user input | 2 |
Handling of .exit command | 2 |
Fetch and parse DNS records from Root Name Servers | 3 |
Fetch and parse DNS records from TLD Name Servers | 3 |
Fetch and parse DNS records from Authoritative Name Servers | 3 |
Resolve Domain Name to IPv4 address(es) | 3 |
Resolve Domain Alias(es) to IPv4 address(es) | 5 |
Correct lookup order of cached information | 2 |
Skip appropriate name server(s) when details are found in cache | 3 |
Handling of .list to show cache | 2 |
Handling of .clear to clear cache | 2 |
Handling of .remove to remove a specific cache entry | 2 |
Show informational output on the source of address resolution (cache or query) | 2 |
Show error message on non-existing domain names | 2 |
Penalty of code bloat due to poor code design/organization | -4 max |
(Extra) Resolve domain name to both IPv4 and IPv6 | 2 |
(Extra) Use multiple name servers exhaustively | 3 |
(Extra) Implementation without dnslib | 4-8 |