
Important

This assignment must be completed individually. The solution submitted for grading must be yours. Please refer to the Academic Integrity section of our class syllabus.

Overview of Domain Name Resolution

In terms of geographical coverage, the Domain Name database is perhaps the largest distributed database on the Internet. As of 2012, there were an estimated 10 million DNS servers on the Internet, each managing a tiny fraction of the entire distributed database. More than a decade later, that number has likely grown significantly. Collectively, these millions of servers work seamlessly and are accessible through the query/response mechanism specified by the DNS protocol in the following documents:

  • RFC1034: Domain Names - Concepts and Facilities
  • RFC1035: Domain Names - Implementation and Specification

Logically, these millions of servers form a 3-level tree of nodes, referred to as the domain name space in RFC1034. These nodes are structured as follows:

  • Root Name Servers (13 nodes) which maintain DNS records of TLD name servers
  • TLD Name Servers (1500+ nodes) which maintain DNS records of Authoritative name servers for the TLD domains (.com, .org, .net, .edu, .int, .gov, .mil, .app, .art, .audio, ...)
  • Authoritative Name Servers (10M+ nodes per 2012 estimate) which maintain DNS records of a particular domain and are usually the last nodes consulted when resolving a domain name to its IP address(es)

DNS Record

The distributed Domain Name Database stores many types of records, but for this assignment only the following types matter:

  • Type A: is associated with an IPv4 address
  • Type AAAA: is associated with an IPv6 address
  • Type NS: is associated with the details of a name server
  • Type CNAME: is associated with a host alias

Name Resolution Algorithm

As explained in lecture, domain name resolvers can operate either recursively or iteratively. A more technical explanation of the two modes can be found in Section 4.3.1 of RFC1034.

To resolve a domain name, for instance www.cis.gvsu.edu, an iterative client must consult at least three nodes (one at each level):

  1. First, the client (C) sends an NS-type question to one of the 13 root servers, asking for the details of the name servers which manage the .edu top-level domain. Upon a successful query, the selected root server returns a list of name servers L1.
  2. Next, the client (C) selects one name server X from the list L1.
  3. Next, the client (C) sends X an NS-type question asking for the details of the name servers which manage the .gvsu.edu domain. Upon a successful query, X returns a list of name servers L2.
  4. Next, the client (C) selects one name server Y from the list L2.
  5. Finally, the client (C) sends Y an A-type question to resolve the IP address of www.cis.gvsu.edu. Upon a successful query, Y returns a list of IPv4 addresses.

In total, the above address resolution procedure requires at least three DNS queries and three DNS responses.

TIP

Section 3.1 of RFC1034 specifies that each node in the domain name space has a label, and that the domain name of a node is the list of labels on the path from that node up to the root. For instance, in the above scenario, name server X serves the node whose label is edu, and name server Y serves the node whose label is gvsu (and whose domain name is therefore gvsu.edu).

Section 4.3.2 of RFC1034 gives additional explanation of the resolution algorithm.
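To make the label terminology concrete, the short sketch below splits a domain name into its labels and lists the domain names of the nodes on the path from the host up to its top-level domain. The helper names labels() and suffix_domains() are hypothetical; they are not part of the starter code or of dnslib.

```python
def labels(domain: str) -> list[str]:
    """Split a domain name into its labels, ignoring a trailing root dot."""
    return [part for part in domain.rstrip(".").split(".") if part]


def suffix_domains(domain: str) -> list[str]:
    """Domain names of the nodes from the given name up to its TLD, longest first."""
    parts = labels(domain)
    return [".".join(parts[i:]) for i in range(len(parts))]


if __name__ == "__main__":
    print(labels("www.cis.gvsu.edu"))          # ['www', 'cis', 'gvsu', 'edu']
    print(suffix_domains("www.cis.gvsu.edu"))  # ['www.cis.gvsu.edu', 'cis.gvsu.edu', 'gvsu.edu', 'edu']
```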

The same sequence also applies to shorter domain names, such as github.io:

  1. First, the client (C) sends an NS-type question to one of the 13 root servers, asking for the details of the name servers in charge of the .io top-level domain. Upon a successful query, the root server returns a list of name servers (L1).
  2. Next, the client (C) selects one name server (R) from L1 and sends it an NS-type question asking for the details of the name servers in charge of the .github.io domain. Upon a successful query, R returns a list of name servers (L2).
  3. Finally, the client (C) selects one name server (S) from L2 and sends it an A-type question to resolve the IP address of github.io.

Important

In both scenarios (for resolving longer name www.cis.gvsu.edu and shorter name github.io), the client sends:

  • two NS-type questions and
  • one A-type question

to three different name servers.
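The pattern shared by both scenarios can be sketched as an ordered list of (question name, question type) pairs. This is a minimal sketch that assumes the simplified three-level model described above (the TLD, then the last two labels, then the full name); in the real namespace a zone can be delegated at other depths. query_plan() is a hypothetical helper, not part of the starter code.

```python
def query_plan(domain: str) -> list[tuple[str, str]]:
    """Return the (qname, qtype) pairs an iterative client sends, in order.

    Assumes the three-level model: NS for the TLD (asked of a root server),
    NS for the second-level domain (asked of a TLD server), then A for the
    full name (asked of an authoritative server).
    """
    parts = domain.rstrip(".").split(".")
    tld = parts[-1]
    second_level = ".".join(parts[-2:])
    return [(tld, "NS"), (second_level, "NS"), (".".join(parts), "A")]


if __name__ == "__main__":
    print(query_plan("www.cis.gvsu.edu"))  # [('edu', 'NS'), ('gvsu.edu', 'NS'), ('www.cis.gvsu.edu', 'A')]
    print(query_plan("github.io"))         # [('io', 'NS'), ('github.io', 'NS'), ('github.io', 'A')]
```

Each pair would be sent to a server taken from the previous response: the first to a root server, the second to a server from L1, and the third to a server from L2.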

Record Types of NS-Type Questions

When sending an NS-type question to a name server, the response may include both the names and the IP addresses of the name servers, in which case your client shall use the IP addresses for subsequent queries. Sometimes, however, the response includes only the names of the name servers (with no associated IP addresses). Fortunately, the dnslib module (used in the starter code) works fine when you supply the name of the server instead of its IP address.
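One way to pick the next server to contact is sketched below. It assumes a hypothetical representation of each resource record as an (owner, type, rdata) tuple of strings, rather than dnslib's RR objects, and pairs every NS record in the Authority section with an A record for the same host in the Additional section when one exists. If no IPv4 address is available, it falls back to the server's name, as described above. name_server_targets() is a hypothetical helper.

```python
Record = tuple[str, str, str]  # hypothetical: (owner name, record type, RDATA as text)


def name_server_targets(authority: list[Record], additional: list[Record]) -> list[str]:
    """Return, per NS record, the server's IPv4 address if present, else its name."""
    glue: dict[str, str] = {}
    for owner, rtype, rdata in additional:
        if rtype == "A":
            glue.setdefault(owner.lower(), rdata)  # keep the first IPv4 address per host
    targets = []
    for _owner, rtype, rdata in authority:
        if rtype == "NS":
            targets.append(glue.get(rdata.lower(), rdata))
    return targets


if __name__ == "__main__":
    # Adapted from the .edu referral shown later in this handout; the address of
    # b.edu-servers.net. is deliberately omitted to show the fallback to a name.
    authority = [("edu.", "NS", "a.edu-servers.net."), ("edu.", "NS", "b.edu-servers.net.")]
    additional = [("a.edu-servers.net.", "A", "192.5.6.30"),
                  ("a.edu-servers.net.", "AAAA", "2001:503:a83e::2:30")]
    print(name_server_targets(authority, additional))  # ['192.5.6.30', 'b.edu-servers.net.']
```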

Response Types of A-Type Questions

When sending an A-type question to a name server, ideally your client gets an A-type answer and this completes the IP address resolution. However, occasionally, a name server may respond with:

  • An NS-type answer containing the details of (alternate) name servers, in which case your client should resend the A-type question to one of those name servers
  • A CNAME-type answer containing a host alias (explained in the next section below)
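The possible outcomes of an A-type question can be told apart with a check like the one below. It is a sketch under the same hypothetical (owner, type, rdata) tuple representation, and it assumes a referral's NS records arrive in the Authority section, which is where referrals normally place them. A records are checked first because a server that returns a CNAME may also include the addresses of the canonical name in the same answer. classify_a_response() and its result labels are hypothetical.

```python
Record = tuple[str, str, str]  # hypothetical: (owner name, record type, RDATA as text)


def classify_a_response(answer: list[Record],
                        authority: list[Record]) -> tuple[str, list[str]]:
    """Decide the next step after an A-type question: done, follow an alias, or re-ask."""
    addresses = [rdata for _owner, rtype, rdata in answer if rtype == "A"]
    if addresses:
        return "ADDRESS", addresses          # resolution complete
    aliases = [rdata for _owner, rtype, rdata in answer if rtype == "CNAME"]
    if aliases:
        return "ALIAS", aliases[:1]          # restart the lookup for the canonical name
    servers = [rdata for _owner, rtype, rdata in authority if rtype == "NS"]
    if servers:
        return "REFERRAL", servers           # resend the A question to one of these
    return "NO_DATA", []                     # nothing usable in the response
```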

Host Name Aliases

The DNS distributed database permits host names to have aliases. This feature lets Internet users use easy-to-remember names instead of long, cryptic ones. More importantly, it also decouples the name of a virtual host from the domain (company) that actually serves it. The following example illustrates this:

  • our GVSU Bb Ultra is hosted at Blackboard (as gvsu.blackboard.com)
  • and Blackboard itself runs on the Amazon Web Services.

In practice, all access to lms.gvsu.edu must be redirected to AWS, and this "redirection" is implemented simply by creating host name aliases that lead from lms.gvsu.edu to a host name on AWS.

A longer sequence may be required if the domain name to be resolved involves one or more aliases. For instance, lms.gvsu.edu is an alias of gvsu.blackboard.com which is also an alias of learn-prod.<...>.amazonaws.com. In this aliasing scenario, during the third step in the above sequence, the .gvsu.edu authoritative server will respond with the alias(es), instead of the IP address. Hence, the resolution sequence will typically go as follows:

  1. First, the client (C) queries one of the 13 root servers for .edu name servers (X)
  2. Next, C queries X for .gvsu.edu name servers (Y)
  3. Next, C queries Y to resolve the IP address of lms.gvsu.edu but Y returns the alias gvsu.blackboard.com
  4. Next, C queries one of the 13 root servers for .com name servers (G)
  5. Next, C queries G for blackboard.com name servers (H)
  6. Next, C queries H to resolve the IP address of gvsu.blackboard.com but H returns the alias learn-prod.<...>.amazonaws.com
  7. Next, C queries one of the 13 root servers for .com name servers (G)
  8. Next, C queries G for amazonaws.com name servers (Z)
  9. Next, C queries Z to resolve the IP address of learn-prod.<...>.amazonaws.com

TIP

Refer to Section 5.2.2 of RFC1035 for more details on working with aliases.

Caching

Without caching, the above sequence requires a total of 18 DNS messages (9 queries + 9 responses). With caching enabled, step (7) above can be skipped, and step (8) can (re)use the IP address of G obtained in step (4). Furthermore, if your client had resolved www.cis.gvsu.edu before resolving lms.gvsu.edu and the IP address of a gvsu.edu name server is still in the cache, steps (1) and (2) above can be skipped.

Caching can also be applied to host IP addresses, so that repeated queries are resolved directly from the cache without consulting any external name servers. For instance, once the first query for lms.gvsu.edu has been resolved, repeated queries for lms.gvsu.edu can be answered from your client's cached copy.

To take advantage of cached copies, your client should check the cache starting from the longest subdomain. For instance, when resolving lms.gvsu.edu your client should:

  • First check if a cached copy of lms.gvsu.edu IP address is available
  • Otherwise, check if a cached copy of gvsu.edu name server IP address is available
  • Otherwise, check if a cached copy of the .edu name server IP address is available
  • Otherwise, consult one of the root name servers and so on.

DNS Messages

Both DNS clients and servers communicate using messages of the same format; query and response messages include the following five sections:

  1. Header (always present, 12 bytes (96 bits) in size)
  2. Question
  3. Answer: resource records answering the question
  4. Authority: resource records pointing toward an authority
  5. Additional: resource records holding additional information

The sections after the header may be empty (the header records how many entries each one holds); when they are present, their sizes vary and they must appear in the order listed above.

DNS Resource Records

When viewed as a conventional SQL database, the entire domain name database is like a giant table of SQL records. Each DNS record (which shows up in the Answer, Authority, and Additional sections above) defines the following six fields/columns (see RFC1034 Section 3.6 for more details):

  • Owner: the domain name associated with this record

  • Type: type of record. For this assignment, only the A, AAAA, NS, and CNAME are relevant to your client (see RFC1035 Section 3.2.2 for more details)

  • Class: not relevant for this assignment

  • TTL: time to live (in seconds), not relevant for this assignment

  • RDLENGTH: length (in bytes) of the RDATA field

  • RDATA: extra data whose content depends on the record type

    Type   RDATA content           Description
    A      32-bit IPv4 address     The record holds an IPv4 address
    AAAA   128-bit IPv6 address    The record holds an IPv6 address
    NS     Host name               The record holds the host name of a name server
    CNAME  Domain name             The record holds the canonical name; the owner name is the alias
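dnslib decodes resource records for you, but if you attempt the no-dnslib extra credit you must decode these fields yourself. A minimal sketch follows; read_name() and read_rr() are hypothetical helpers, and the dictionary they return is an illustrative format. Owner names (and NS/CNAME RDATA) may use the compression pointers of RFC 1035 Section 4.1.4, and A records are converted to dot-decimal text with socket.inet_ntoa().

```python
import socket
import struct

TYPE_NAMES = {1: "A", 2: "NS", 5: "CNAME", 28: "AAAA"}


def read_name(msg: bytes, offset: int) -> tuple[str, int]:
    """Decode a (possibly compressed) domain name; return (name, offset after it)."""
    labels = []
    end = None                                # where parsing resumes after a pointer
    jumps = 0
    while True:
        length = msg[offset]
        if (length & 0xC0) == 0xC0:           # top two bits set: compression pointer
            if end is None:
                end = offset + 2
            offset = ((length & 0x3F) << 8) | msg[offset + 1]
            jumps += 1
            if jumps > 64:
                raise ValueError("compression pointer loop")
            continue
        offset += 1
        if length == 0:                       # zero-length root label ends the name
            break
        labels.append(msg[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels), (end if end is not None else offset)


def read_rr(msg: bytes, offset: int) -> tuple[dict, int]:
    """Decode one resource record at offset; return (record, offset of the next one)."""
    owner, offset = read_name(msg, offset)
    rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", msg, offset)
    offset += 10                              # TYPE(2) + CLASS(2) + TTL(4) + RDLENGTH(2)
    rdata_raw = msg[offset:offset + rdlength]
    if rtype == 1:                            # A: 4 bytes -> dot-decimal text
        rdata = socket.inet_ntoa(rdata_raw)
    elif rtype == 28:                         # AAAA: 16 bytes -> IPv6 text
        rdata = socket.inet_ntop(socket.AF_INET6, rdata_raw)
    elif rtype in (2, 5):                     # NS / CNAME: a (possibly compressed) name
        rdata, _ = read_name(msg, offset)
    else:
        rdata = rdata_raw.hex()
    record = {"name": owner, "type": TYPE_NAMES.get(rtype, str(rtype)),
              "class": rclass, "ttl": ttl, "rdata": rdata}
    return record, offset + rdlength
```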

Program Requirements

In this assignment you are to develop an iterative name resolver client. The domain names to resolve will be typed directly by the user. So, you are NOT writing a "local" name server that receives incoming requests from "local" programs. Your program will only send queries to other name servers and handle the corresponding responses.

Important

The goal of this assignment is to develop a program that communicates with various name servers (at the three levels of the namespace tree), not merely to obtain the IPv4 address of a given host/domain name, which can be done with a one-line function call:

python
import socket
ip_addr = socket.gethostbyname("computing.gvsu.edu")

Your program is required to include code for sending/receiving DNS messages using a UDP socket, as well as parsing the incoming DNS messages.

  1. Implement an iterative domain name resolver in Python (3.12 or newer). Design your code around while loop(s) that send DNS messages and parse the corresponding responses. Avoid code bloat caused by copying chunks of code multiple times; instead, organize the repeated logic into function(s).

  2. RFC 1034 specifies that the protocol can be implemented over either TCP or UDP. For this assignment, your client shall use a UDP socket. Specifically, create only one socket, and design the program to run in a loop that prompts the user to enter a domain name. The loop stops when the user types .exit

    python
    from socket import socket, AF_INET, SOCK_DGRAM

    if __name__ == '__main__':
        sock = socket(AF_INET, SOCK_DGRAM)   # the single UDP socket used for all queries
        sock.settimeout(2)                   # wait at most 2 seconds for each reply
        while True:
            domain_name = input("Enter a domain name or .exit > ")

            if domain_name == '.exit':
                break

            while _____:
                # Use the function get_dns_record(____) (from the starter code
                # below) to resolve the IP address of the domain name in question
                ...
        sock.close()
  3. Your client shall print the following informational output:

    • Which TLD name server(s) are consulted, and whether their details were obtained from the cache or from a root server
    • Which Authoritative name server(s) are consulted, and whether their details were obtained from the cache or from a TLD name server
    • The IP address resolution of the domain name in question, and whether it was obtained from the cache or from an authoritative name server
    • A notice when a domain name has an alias

    WARNING

    Also, remove unnecessary debugging output

  4. Your client shall print an informative error message when the user queries a non-existent domain

  5. When the user types a domain which has an associated alias, your client shall continue to "follow" the alias until it eventually resolves to an IP address.

    IMPORTANT

    Your client should be designed to handle an arbitrarily long chain of aliases.

  6. Add logic to consult alternative name servers if one does not respond within the timeout interval

  7. Use appropriate Python data structures (dictionary/map) to implement a caching strategy that stores both the name server addresses and the resolved IP addresses.

    • Use the .list command to show the cache. The output should be numbered starting from 1; these numbers are used by the .remove command below
    • Use the .clear command to remove all cached entries
    • Use the .remove N command to delete a specific cache entry, where N is an integer shown in the .list output above. Implement error handling for attempts to remove a non-existent cache entry (N is non-positive or too big)
  8. IP addresses shall be displayed in dot-decimal notation, e.g. 175.23.28.184

  9. Your program should be able to resolve domain names with any number of subdomains, for example:

    • github.io
    • www.gvsu.edu
    • 8ck2mf8.x.incapdns.net
    • some-random-name651234.with.many-dots.here.and-there.app
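The cache commands of requirement 7 could be handled by a function like the one below. It is a sketch that assumes the cache is a plain dict, whose insertion order Python preserves, so the numbering printed by .list stays consistent with .remove; handle_cache_command() and its exact messages are hypothetical.

```python
def handle_cache_command(command: str, cache: dict) -> str:
    """Run .list, .clear or .remove N against the cache and return the text to print."""
    parts = command.split()
    if parts == [".list"]:
        if not cache:
            return "Cache is empty"
        # Number entries from 1 in insertion order; .remove uses the same numbering
        return "\n".join(f"{i}. {key} -> {value}"
                         for i, (key, value) in enumerate(cache.items(), start=1))
    if parts == [".clear"]:
        cache.clear()
        return "Cache cleared"
    if len(parts) == 2 and parts[0] == ".remove":
        try:
            n = int(parts[1])
        except ValueError:
            return f"Error: {parts[1]!r} is not an integer"
        if not cache:
            return "Error: the cache is empty"
        if not 1 <= n <= len(cache):           # rejects non-positive and too-big N
            return f"Error: no cache entry #{n} (valid range is 1-{len(cache)})"
        key = list(cache)[n - 1]
        del cache[key]
        return f"Removed entry #{n}: {key}"
    return f"Error: unknown command {command!r}"
```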

Extra Credit Options

  • (2 pts) In addition to resolving to IPv4 address(es), also resolve the domain name to IPv6 address(es)
  • (3 pts) When multiple name servers are available and one of the servers is non-responsive, resend queries to alternate name servers exhaustively
  • (4-8 points) Implement the iterative domain name resolver without any external DNS module (dnslib, dnspython, ...). Warning: if you decide to take this challenge, keep two separate copies of your program: one that uses dnslib and another without dnslib. A separate handout has been provided to give you some idea on parsing DNS messages using your own Python code.

    WARNING

    To be eligible for the minimum extra credit (4 points), your code must be able to parse the incoming DNS response messages and print the correct information. The extra credit will not be given for "effort".
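For requirement 6 and the exhaustive-retry extra credit, one approach is to try the candidate servers in order and move on whenever the socket times out. The sketch below keeps the network exchange behind a caller-supplied function so the retry logic can be read (and tested) on its own; query_first_responsive() and udp_exchange() are hypothetical names. It relies on the fact that, after sock.settimeout(2), a recvfrom() that waits too long raises socket.timeout, which has been an alias of TimeoutError since Python 3.10.

```python
import socket
from collections.abc import Callable


def query_first_responsive(servers: list[str],
                           send_and_receive: Callable[[str], bytes]) -> tuple[str, bytes]:
    """Try each server in turn; return (server, reply) from the first one that answers."""
    for server in servers:
        try:
            return server, send_and_receive(server)
        except TimeoutError:                  # socket.timeout is an alias since Python 3.10
            print(f"No response from {server}; trying the next name server")
    raise TimeoutError(f"none of the {len(servers)} name server(s) responded")


def udp_exchange(sock: socket.socket, query: bytes, server: str) -> bytes:
    """Send one query over UDP (port 53) and wait for a reply, honoring sock's timeout."""
    sock.sendto(query, (server, 53))
    reply, _addr = sock.recvfrom(4096)
    # A fuller version would also check that the reply's ID matches the query's ID.
    return reply
```

In the main loop this might be called as query_first_responsive(servers, lambda server: udp_exchange(sock, query, server)), where servers is a list such as L1 or L2 and query is the message being sent.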

Starter Code

The following starter code uses functions and classes provided by the dnslib module. You may have to first install the module by typing the following command in your terminal:

bash
pip install dnslib

Important

You have to modify the get_dns_record() function in the starter code (its parameters, return value(s), overall structure, etc.) to match the overall design of your program. The starter code is essentially a demonstration of how to use the dnslib module.

Nevertheless, DO NOT modify the following code at line 12:

python
q.header.rd = 0    # Do not recurse

The rd flag stands for "recursion desired". Since we are implementing iterative queries, we must tell the remote name servers not to resolve the domain name recursively.

Using the repr() Python function throughout the code is a useful technique for debugging the internal representation of a Python object. For instance, the output

<DNS RR: 'a.edu-servers.net' rtype=A rclass=IN ttl=172800 rdata='192.5.6.30'>

indicates that the RR object has the rtype, rclass, ttl, rdata properties which can be accessed from your code:

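python
from dnslib import RR, QTYPE   # classes provided by the dnslib module

# Lines 49-51 of the starter code
a = RR.parse(____)
if a.rtype == QTYPE.A:
    ...  # a.rdata holds the IPv4 address of this record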

TIP

Lines 47-58 of dns.py on GitHub show other symbolic constants for QTYPE.

Running The Starter Code

The starter code uses the hardcoded address of the ICANN root server l.root-servers.net (IP 199.7.83.42). If the program fails due to a timeout, try a different root server. The first call to get_dns_record()

python
get_dns_record(sock, "edu", ROOT_SERVER, "NS")

produces the following output that shows all the available TLD servers for the edu top-level domain:

DNS query <DNS Header: id=0x4eb3 type=QUERY opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=0 ar=0>
<DNS Question: 'edu.' qtype=NS qclass=IN>
DNS header <DNS Header: id=0x4eb3 type=RESPONSE opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=13 ar=12>
Question-0 <DNS Question: 'edu.' qtype=NS qclass=IN>
Authority-0 <DNS RR: 'edu.' rtype=NS rclass=IN ttl=172800 rdata='a.edu-servers.net.'>
Authority-1 <DNS RR: 'edu.' rtype=NS rclass=IN ttl=172800 rdata='b.edu-servers.net.'>
... more output

Additional-0 <DNS RR: 'a.edu-servers.net.' rtype=A rclass=IN ttl=172800 rdata='192.5.6.30'> Name: a.edu-servers.net.
Additional-1 <DNS RR: 'a.edu-servers.net.' rtype=AAAA rclass=IN ttl=172800 rdata='2001:503:a83e::2:30'> Name: a.edu-servers.net.
Additional-2 <DNS RR: 'b.edu-servers.net.' rtype=A rclass=IN ttl=172800 rdata='192.33.14.30'> Name: b.edu-servers.net.
Additional-3 <DNS RR: 'b.edu-servers.net.' rtype=AAAA rclass=IN ttl=172800 rdata='2001:503:231d::2:30'> Name: b.edu-servers.net.
...

Explanations of the output:

  • The rcode=NOERROR indicates that the query was successfully executed. Section 4.1.1 of RFC 1035 shows the other possible numeric values of rcode. In the dnslib module, these values are represented symbolically as a Python bi-directional map (see lines 66-69 of dns.py).

  • The last four numbers on line 3 (q=1, a=0, ns=13, ar=12) indicate that the DNS response includes

    • 1 question (in the Question section)
    • 13 resource records in the Authority section, each record has type NS (Name Server). Specifically, it shows the available .edu TLD servers (obtained by consulting a ROOT server)
    • 12 resource records in the Additional Records section, each record has type A (IPv4 address) or AAAA (IPv6 address)
  • Line 4 of the output is printed from the Question section (the for-loop at lines 38-41 of the starter code)

  • There are no records from the Answer section (the for-loop at lines 48-52 of the starter code)

  • Lines 5-7 are printed from the Authority section (the for-loop at lines 54-57 of the starter code)

  • Lines 9-12 are printed from the Additional Records section which shows the associated IP address of each name server (from the for-loop at lines 60-62 of the starter code)
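The q, a, ns, and ar numbers come directly from the last four 16-bit fields of the 12-byte header, and the flags word carries bits such as RD together with the 4-bit rcode. If you implement the extra credit without dnslib, reading them with the standard struct module might look like the sketch below; parse_header() and its dictionary keys are hypothetical, chosen to mirror the dnslib output above.

```python
import struct


def parse_header(msg: bytes) -> dict[str, int]:
    """Unpack the fixed 12-byte DNS header (RFC 1035 Section 4.1.1)."""
    qid, flags, qdcount, ancount, nscount, arcount = struct.unpack_from("!6H", msg, 0)
    return {
        "id": qid,
        "qr": flags >> 15,            # 1 = response, 0 = query
        "aa": (flags >> 10) & 1,      # authoritative answer
        "rd": (flags >> 8) & 1,       # recursion desired (0 in our queries)
        "ra": (flags >> 7) & 1,       # recursion available
        "rcode": flags & 0xF,         # 0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN, ...
        "q": qdcount,                 # entries in the Question section
        "a": ancount,                 # resource records in the Answer section
        "ns": nscount,                # resource records in the Authority section
        "ar": arcount,                # resource records in the Additional section
    }
```

After the question(s), a parser would read a records from the Answer section, then ns from the Authority section, then ar from the Additional section, in that order.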

Query Failures

The last two function calls are provided to show what happens when the server cannot answer your queries:

get_dns_record(sock, "gvsu.edu", "8.8.8.8", "NS")      # (1)
get_dns_record(sock, "www.gvsu.edu", "8.8.8.8", "A")   # (2)

produce the following output, which indicates "Server Failure" (rcode=SERVFAIL):

DNS query <DNS Header: id=0xe286 type=QUERY opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=0 ar=0>
<DNS Question: 'gvsu.edu.' qtype=NS qclass=IN>
DNS header <DNS Header: id=0xe286 type=RESPONSE opcode=QUERY flags=RA rcode='SERVFAIL' q=1 a=0 ns=0 ar=0>
Query failed

DNS query <DNS Header: id=0xbc6a type=QUERY opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=0 ar=0>
<DNS Question: 'www.gvsu.edu.' qtype=A qclass=IN>
DNS header <DNS Header: id=0xbc6a type=RESPONSE opcode=QUERY flags=RA rcode='SERVFAIL' q=1 a=0 ns=0 ar=0>
Query failed

Refer to Section 4.1.1 of RFC1035 for a complete list of error codes.
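Requirement 4 (non-existent domains) and the failures above both come down to the 4-bit RCODE field. The sketch below maps the codes defined in Section 4.1.1 of RFC1035 to conventional mnemonics such as NOERROR and SERVFAIL seen in the output above; describe_rcode() and check_response() are hypothetical helpers, the choice of exception types is illustrative, and the sketch starts from the raw 16-bit flags word rather than a dnslib header object.

```python
# RCODE values defined in RFC 1035 Section 4.1.1, with conventional mnemonics
RCODE_NAMES = {
    0: "NOERROR",   # no error condition
    1: "FORMERR",   # format error: the server could not interpret the query
    2: "SERVFAIL",  # server failure
    3: "NXDOMAIN",  # name error: the queried domain name does not exist
    4: "NOTIMP",    # not implemented
    5: "REFUSED",   # refused for policy reasons
}


def describe_rcode(flags: int) -> str:
    """Return the mnemonic for the RCODE held in the low 4 bits of the header flags."""
    rcode = flags & 0xF
    return RCODE_NAMES.get(rcode, f"RCODE {rcode}")


def check_response(flags: int, domain: str) -> None:
    """Raise a readable error when a query did not succeed."""
    name = describe_rcode(flags)
    if name == "NXDOMAIN":
        raise LookupError(f"{domain}: no such domain name")
    if name != "NOERROR":
        raise RuntimeError(f"{domain}: query failed ({name})")
```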

Grading Rubrics (Tentative)

Feature                                                                         Points
Client accepts multiple user input                                              2
Handling of .exit command                                                       2
Fetch and parse DNS records from Root Name Servers                              3
Fetch and parse DNS records from TLD Name Servers                               3
Fetch and parse DNS records from Authoritative Name Servers                     3
Resolve Domain Name to IPv4 address(es)                                         3
Resolve Domain Alias(es) to IPv4 address(es)                                    5
Correct lookup order of cached information                                      2
Skip appropriate name server(s) when details are found in cache                 3
Handling of .list to show cache                                                 2
Handling of .clear to clear cache                                               2
Handling of .remove to remove a specific cache entry                            2
Show informational output on the source of address resolution (cache or query)  2
Show error message on non-existing domain names                                 2
Penalty of code bloat due to poor code design/organization                      -4 max
(Extra) Resolve domain name to both IPv4 and IPv6                               2
(Extra) Use multiple name servers exhaustively                                  3
(Extra) Implementation without dnslib                                           4-8