Important

This assignment must be completed individually. The solution submitted for grading must be yours. Please refer to the Academic Integrity section of our class syllabus.

Overview of Domain Name Resolution

The Domain Name database is perhaps the largest distributed database on the Internet. As of 2012, an estimated 10 million DNS servers were operating on the Internet, each managing a minute fraction of the entire distributed database. Collectively, these servers work together seamlessly and are accessible via the query/response mechanism specified by the DNS protocol in the following documents:

  • RFC1034: Domain Names - Concepts and Facilities
  • RFC1035: Domain Names - Implementation and Specification

Logically, these millions of servers form a 3-level tree of nodes, referred to as the domain name space in RFC1034. These nodes are structured as follows:

  • Root Name Servers (13 nodes) which maintain DNS records of TLD name servers
  • TLD Name Servers (approximately 1500+ nodes) which maintain DNS records of Authoritative name servers for the TLD domains (.com, .org, .net, .edu, .int, .gov, .mil, .app, .art, .audio, ...)
  • Authoritative Name Servers (10M+ nodes per the 2012 estimate), which maintain the DNS records of a particular domain and are usually the last nodes consulted when resolving a domain name to its IP address(es)

DNS Resolvers and DNS Records

A DNS resolver is a (client) program that consults one or more DNS servers in order to resolve a domain name (such as www.gvsu.edu) to an IPv4 address (104.17.88.18) or an IPv6 address (2606:4700::6811:5812). Each DNS server typically stores many types of records, but for this assignment only the following types matter:

Type     Description
A        IPv4 address of a domain
AAAA     IPv6 address of a domain
NS       a name server associated with a domain
CNAME    a canonical name (alias) of a domain

Name Resolution Algorithm

As explained in lecture, domain name resolvers can operate either recursively or iteratively. A more technical explanation of the two modes can be found in Section 4.3.1 of RFC1034.

To resolve a domain name, for instance www.cis.gvsu.edu, an iterative client must consult at least three different servers (one at each level):

  1. Send what-is-NS(edu) to one of the root servers which will respond with a list of TLD servers.

    TIP

    Since there are only 13 root servers on the entire Internet, you can hardcode the IPv4 address of one of them in your program.

  2. Send what-is-NS(gvsu.edu) to one of the edu TLD servers which will respond with a list of authoritative servers

  3. Send what-is-A(www.cis.gvsu.edu) to one of the authoritative servers of gvsu.edu, which SHOULD respond with a list of IPv4 addresses

The same 3-step sequence also applies to shorter domain names, such as github.io, google.com, or gvsu.edu:

  1. Send what-is-NS(io) to one of the root servers which will respond with a list of TLD servers
  2. Send what-is-NS(github.io) to one of the .io TLD servers which will respond with a list of authoritative servers
  3. Send what-is-A(github.io) to one of the authoritative servers of github.io, which SHOULD respond with a list of IPv4 addresses
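
The three-step pattern above can be sketched as a small helper that lists the questions an iterative resolver would ask. The name query_plan is hypothetical, and the sketch assumes exactly one query per level, as in the two examples; real delegations do not always follow this shape.

```python
def query_plan(domain: str) -> list[tuple[str, str, str]]:
    """Return (record type, name asked, server level) for each step."""
    labels = domain.rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError("expected at least a second-level domain name")
    tld = labels[-1]                 # e.g. "edu"
    zone = ".".join(labels[-2:])     # e.g. "gvsu.edu"
    return [
        ("NS", tld, "root"),
        ("NS", zone, "TLD"),
        ("A", ".".join(labels), "authoritative"),
    ]


for rtype, name, level in query_plan("www.cis.gvsu.edu"):
    print(f"what-is-{rtype}({name}) -> {level} server")
```

The helper only plans the questions; sending them and reacting to unexpected answers is a separate concern.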

WARNING

When sending an A-type question to a name server (step 3 in the two examples above), ideally a resolver program receives an A-type answer (IPv4 address), which completes the IP address resolution. However, the response from a name server may be of a different type:

  • An NS-type answer containing the details of (alternate) name servers, in which case your resolver program must resend the A-type question to one of those name servers
  • A CNAME-type answer containing a host alias, in which case your resolver program must start over as if the user had entered a new domain name
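
One way to organize these three outcomes is a single decision function. The sketch below is illustrative only: next_action is a hypothetical name, records are modeled as plain (rtype, rdata) tuples rather than dnslib objects, and it assumes that a referral's NS records arrive in the Authority section.

```python
def next_action(answers, authority):
    """Decide what to do after sending an A-type question.

    `answers` and `authority` are lists of (rtype, rdata) tuples; this
    layout is an assumption made for the sketch.
    """
    a_records = [rdata for rtype, rdata in answers if rtype == "A"]
    if a_records:
        return ("done", a_records)        # resolution is complete
    cnames = [rdata for rtype, rdata in answers if rtype == "CNAME"]
    if cnames:
        return ("restart", cnames[0])     # resolve the alias target from scratch
    name_servers = [rdata for rtype, rdata in authority if rtype == "NS"]
    if name_servers:
        return ("ask", name_servers)      # resend the question to these servers
    return ("fail", None)


print(next_action([("CNAME", "alias.example.com")], []))
```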

TIP

Host Name Aliases

Resolving a domain name which has (many) aliases requires more work. For instance,

  • lms.gvsu.edu is gvsu.blackboard.com
  • but gvsu.blackboard.com is an alias to learn-prod.<...>.amazonaws.com

In this scenario, your resolver has to perform the following steps:

  1. Send what-is-NS(edu) to one of the root servers
  2. Send what-is-NS(gvsu.edu) to one of the .edu TLD servers
  3. Send what-is-A(lms.gvsu.edu) to one of the authoritative servers of gvsu.edu. At this point you expect an A-type response, but instead you receive a CNAME response (gvsu.blackboard.com)
  4. Send what-is-NS(com) to one of the root servers
  5. Send what-is-NS(blackboard.com) to one of the .com TLD servers
  6. Send what-is-A(gvsu.blackboard.com) to one of the authoritative servers. You expect an A-type response, but receive another CNAME response (learn-prod.<...>.amazonaws.com)
  7. Send what-is-NS(com) to one of the root servers
  8. Send what-is-NS(amazonaws.com) to one of the .com TLD servers
  9. Send what-is-A(learn-prod.<...>.amazonaws.com) to one of the authoritative servers. At this point, you should receive an A-type response
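
The "start over on every CNAME" behavior in the steps above amounts to a loop. The following is a minimal sketch, not a prescribed design: follow_aliases and lookup are hypothetical names, lookup stands in for a complete root-to-authoritative resolution of one name, and the host names and the address (taken from the 192.0.2.0/24 documentation range) are made up.

```python
def follow_aliases(name, lookup):
    """Follow CNAME records until an address list is found.

    `lookup(name)` is a hypothetical callable that returns either
    ("A", [addresses]) or ("CNAME", target_name).
    """
    chain = [name]
    while True:
        rtype, value = lookup(name)
        if rtype == "A":
            return chain, value
        if value in chain:
            # An alias pointing back into the chain would loop forever.
            raise RuntimeError(f"alias loop detected at {value}")
        name = value
        chain.append(name)


# Made-up records that mimic a two-hop alias chain.
FAKE_RECORDS = {
    "portal.example.edu": ("CNAME", "tenant.example.com"),
    "tenant.example.com": ("CNAME", "origin.example.net"),
    "origin.example.net": ("A", ["192.0.2.10"]),
}

chain, addresses = follow_aliases("portal.example.edu", FAKE_RECORDS.__getitem__)
print(" -> ".join(chain), addresses)
```

Because the loop has no hop limit, it handles chains of any length; the membership check only guards against circular aliases.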

TIP

Refer to Section 5.2.2 of RFC1034 for more details on working with aliases.

Starter Code

The following starter code uses functions and classes provided by the dnslib module. You may have to first install the module by typing the following command in your terminal:

pip install dnslib


Important

You have to modify the get_dns_record() function in the starter code (its parameters, return value(s), overall structure, etc.) to match the overall design of your program. The starter code is essentially a demonstration of how to use the dnslib module.

Nevertheless,

  1. DO NOT modify the following code at line 12:

    q.header.rd = 0    # Do not recurse

    The rd flag stands for "recursion desired". Since we are implementing iterative queries, we have to tell the remote name servers not to resolve the domain name recursively.

  2. DO NOT modify the order of the for-loops towards the end of the function

Using repr() in Python

Calling the Python repr() function throughout the code is a useful technique for inspecting the internal representation of a Python object while debugging. For instance, the second statement below shows the names of the fields in a Python object.

print(x)            # print x as a string
print(repr(x))      # print the internal object structure of x

The second statement prints output such as

<DNS RR: 'a.edu-servers.net' rtype=A, rclass=IN ttl=172800 rdata='192.5.6.30'>

which indicates that the RR object has the rtype, rclass, ttl, and rdata properties, which can be accessed from your code as follows:

python
# Line 49-51 of the starter code
a = RR.parse(____)
if a.rtype == QTYPE.A:

TIP

Lines 47-58 of dns.py on GitHub show other symbolic constants for QTYPE.

Running The Starter Code

TIP

It is strongly recommended that you keep Wireshark running and apply the filter udp.port == 53 to monitor the communication between your program and the various domain name servers.

The starter code uses the hardcoded address of the ICANN root server l.root-servers.net at IP 199.7.83.42. If the program fails due to a timeout, try a different root server. The first call to get_dns_record()

python
get_dns_record(sock, "edu", ROOT_SERVER, "NS")

produces the following output, which shows all the available TLD servers for the edu top-level domain:

DNS query <DNS Header: id=0x4eb3 type=QUERY opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=0 ar=0>
<DNS Question: 'edu.' qtype=NS qclass=IN>
DNS header <DNS Header: id=0x4eb3 type=RESPONSE opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=13 ar=12>
Question-0 <DNS Question: 'edu.' qtype=NS qclass=IN>
Authority-0 <DNS RR: 'edu.' rtype=NS rclass=IN ttl=172800 rdata='a.edu-servers.net.'>
Authority-1 <DNS RR: 'edu.' rtype=NS rclass=IN ttl=172800 rdata='b.edu-servers.net.'>
... more output

Additional-0 <DNS RR: 'a.edu-servers.net.' rtype=A rclass=IN ttl=172800 rdata='192.5.6.30'> Name: a.edu-servers.net.
Additional-1 <DNS RR: 'a.edu-servers.net.' rtype=AAAA rclass=IN ttl=172800 rdata='2001:503:a83e::2:30'> Name: a.edu-servers.net.
Additional-2 <DNS RR: 'b.edu-servers.net.' rtype=A rclass=IN ttl=172800 rdata='192.33.14.30'> Name: b.edu-servers.net.
Additional-3 <DNS RR: 'b.edu-servers.net.' rtype=AAAA rclass=IN ttl=172800 rdata='2001:503:231d::2:30'> Name: b.edu-servers.net.
...

Explanations of the output:

  • The rcode=NOERROR indicates that the query was successfully executed. Section 4.1.1 of RFC 1035 shows other possible numeric values of rcode. In the dnslib module, these values are symbolically represented as a Python bi-directional map (lines 66-69 of dns.py).

  • The last four numbers on line 3 (q=1, a=0, ns=13, ar=12) indicate that the DNS response includes

    • 1 query (in the Question section)
    • 13 resource records in the Authority section, each record has type NS (Name Server). Specifically, it shows the available .edu TLD servers (obtained by consulting the ROOT server 199.7.83.42)
    • 12 resource records in the Additional Records section, each record has type A (IPv4 address) or AAAA (IPv6 address)
  • Line 4 of the output is printed from the Question section (the for-loop at lines 38-41 of the starter code)

  • There are no records from the Answer section (the for-loop at lines 48-52 of the starter code)

  • Lines 5-7 are printed from the Authority section (the for-loop at lines 54-57 of the starter code)

  • Lines 9-12 are printed from the Additional Records section which shows the associated IP address of each name server (from the for-loop at lines 60-62 of the starter code)
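
To continue the resolution, the NS host names from the Authority section have to be paired with their addresses from the Additional Records section. A rough sketch of that pairing follows; name_server_addresses is a hypothetical helper, and records are modeled as plain (owner, rtype, rdata) tuples copied from the sample output rather than as dnslib objects.

```python
def name_server_addresses(authority, additional):
    """Map each NS host in the Authority section to its IPv4 addresses."""
    ns_hosts = [rdata for _owner, rtype, rdata in authority if rtype == "NS"]
    glue = {}
    for owner, rtype, rdata in additional:
        if rtype == "A":                      # ignore AAAA records here
            glue.setdefault(owner, []).append(rdata)
    # A name server without a matching A record maps to an empty list.
    return {host: glue.get(host, []) for host in ns_hosts}


authority = [
    ("edu.", "NS", "a.edu-servers.net."),
    ("edu.", "NS", "b.edu-servers.net."),
]
additional = [
    ("a.edu-servers.net.", "A", "192.5.6.30"),
    ("a.edu-servers.net.", "AAAA", "2001:503:a83e::2:30"),
    ("b.edu-servers.net.", "A", "192.33.14.30"),
    ("b.edu-servers.net.", "AAAA", "2001:503:231d::2:30"),
]
servers = name_server_addresses(authority, additional)
print(servers)
```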

Query Failures

The last two function calls are provided to show what happens when the server cannot find the answer to your queries:

get_dns_record(sock, "gvsu.edu", "8.8.8.8", "NS")      # (1)
get_dns_record(sock, "www.gvsu.edu", "8.8.8.8", "A")   # (2)

They produce the following output, which indicates "Server Failure" (rcode=SERVFAIL):

DNS query <DNS Header: id=0xe286 type=QUERY opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=0 ar=0>
<DNS Question: 'gvsu.edu.' qtype=NS qclass=IN>
DNS header <DNS Header: id=0xe286 type=RESPONSE opcode=QUERY flags=RA rcode='SERVFAIL' q=1 a=0 ns=0 ar=0>
Query failed

DNS query <DNS Header: id=0xbc6a type=QUERY opcode=QUERY flags= rcode='NOERROR' q=1 a=0 ns=0 ar=0>
<DNS Question: 'www.gvsu.edu.' qtype=A qclass=IN>
DNS header <DNS Header: id=0xbc6a type=RESPONSE opcode=QUERY flags=RA rcode='SERVFAIL' q=1 a=0 ns=0 ar=0>
Query failed

Refer to Section 4.1.1 of RFC1035 for a complete list of error codes.
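
For reference, the six response codes defined in that section can be kept in a small lookup table. The sketch below is independent of dnslib; the mnemonics are the commonly used ones (the output above shows NOERROR and SERVFAIL), and rcode_from_flags is a hypothetical helper that reads the code from the low four bits of the 16-bit flags word.

```python
RCODES = {
    0: "NOERROR",   # no error condition
    1: "FORMERR",   # format error
    2: "SERVFAIL",  # server failure
    3: "NXDOMAIN",  # name error: the domain name does not exist
    4: "NOTIMP",    # not implemented
    5: "REFUSED",   # refused
}


def rcode_from_flags(flags: int) -> str:
    """Return the mnemonic for the RCODE stored in the low 4 bits of flags."""
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")


# 0x8082 has QR=1 (response), RA=1 and RCODE=2, like the failed queries above.
print(rcode_from_flags(0x8082))
```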

DNS Messages

Both DNS clients and servers communicate using messages of the same format; query and response messages include the following five sections:

  1. Header (always present, 12 bytes in size)
  2. Question
  3. Answer: resource records answering the question
  4. Authority: resource records pointing toward an authority
  5. Additional: resource records holding additional information

Sections 2-5 (after the header) vary in size and may not be present. If they are, their order always follows the above sequence. For instance,

  • The authority section will never show up BEFORE the answer section.
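
To make the layout concrete, here is a minimal sketch that builds a query (header plus one question) and unpacks a header using only the standard struct module. It follows the RFC 1035 message format, but build_query and parse_header are hypothetical names and this is not how dnslib is implemented internally.

```python
import struct

QTYPES = {"A": 1, "NS": 2, "CNAME": 5, "AAAA": 28}
RD_BIT = 0x0100  # "recursion desired" bit in the 16-bit flags word


def build_query(name: str, qtype: str, query_id: int, recurse: bool = False) -> bytes:
    """Build a DNS query: a 12-byte header followed by one question."""
    flags = RD_BIT if recurse else 0
    # Six 16-bit fields: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!6H", query_id, flags, 1, 0, 0, 0)
    qname = b""
    for label in filter(None, name.rstrip(".").split(".")):
        encoded = label.encode("ascii")
        qname += bytes([len(encoded)]) + encoded   # length-prefixed label
    qname += b"\x00"                               # root label ends the name
    question = qname + struct.pack("!2H", QTYPES[qtype], 1)  # QCLASS 1 = IN
    return header + question


def parse_header(message: bytes) -> dict:
    """Unpack the six 16-bit header fields of a DNS message."""
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", message[:12])
    return {"id": ident, "qr": flags >> 15, "rd": (flags >> 8) & 1,
            "rcode": flags & 0xF, "q": qd, "a": an, "ns": ns, "ar": ar}


query = build_query("edu", "NS", query_id=0x4EB3)
print(len(query), parse_header(query))
```

The q, a, ns, and ar counts in the parsed header tell the reader how many records to expect in each of the four sections that follow.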

DNS Resource Records

When viewed as a conventional SQL database, the entire domain name database is like a giant table of SQL records. Each DNS record (which shows up in the Answer, Authority, and Additional sections above) defines the following six fields/columns (see RFC1034 Section 3.6 for more details):

  • Owner: the domain name associated with this record

  • Type: type of record. For this assignment, only the A, AAAA, NS, and CNAME types are relevant to your client (see RFC1035 Section 3.2.2 for more details)

  • Class: not relevant for this assignment

  • TTL: time to live (in seconds), not relevant for this assignment

  • RDLENGTH: length of the RDATA field

  • RDATA: extra data whose content depends on the record type

    Type     RDATA content          Description
    A        32-bit IPv4 address    The record includes an IPv4 address
    NS       Host name              The record includes the host name of a name server
    CNAME    Domain name            The record includes an alias name of a domain
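
When the raw RDATA bytes are parsed by hand, the 4 bytes of an A record (or the 16 bytes of an AAAA record) can be turned into printable notation with the standard ipaddress module. This is a small sketch; decode_address is a hypothetical helper name.

```python
import ipaddress


def decode_address(rtype: str, rdata: bytes) -> str:
    """Convert the RDATA of an A or AAAA record to its text form."""
    if rtype == "A":
        if len(rdata) != 4:
            raise ValueError("an A record carries exactly 4 bytes")
        return str(ipaddress.IPv4Address(rdata))
    if rtype == "AAAA":
        if len(rdata) != 16:
            raise ValueError("an AAAA record carries exactly 16 bytes")
        return str(ipaddress.IPv6Address(rdata))
    raise ValueError(f"{rtype} records do not hold an IP address")


print(decode_address("A", bytes([192, 5, 6, 30])))
```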

Caching

Notice that in the 9-step example above, steps 4 and 7 sent the same queries. If the result of step (4) is saved into a cache within your resolver program, step 7 can be avoided. Hence, the name resolution can be made faster.

Likewise, after resolving www.cis.gvsu.edu for the first time, your resolver program should have kept the result of what-is-NS(gvsu.edu), in which case steps (1) and (2) in the above 9-step example can be skipped.

To take advantage of cache copies, your client should check the available cache copy from the longest subdomain first. For instance, when resolving lms.gvsu.edu your client should:

  • First check if a cached copy of lms.gvsu.edu IP address is available
  • Otherwise, check if a cached copy of gvsu.edu name server IP address is available
  • Otherwise, check if a cached copy of the .edu name server IP address is available
  • Otherwise, consult one of the root name servers and so on.
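
The lookup order above can be sketched with a dictionary keyed by (name, record type). This key layout and the closest_cached name are assumptions made for illustration, and the gvsu.edu address is a made-up value from the 192.0.2.0/24 documentation range.

```python
def closest_cached(domain: str, cache: dict):
    """Return (name, rtype, value) for the best cache hit, or None.

    `cache` maps (name, rtype) to a list of addresses; this layout is an
    assumption of the sketch, not a required design.
    """
    labels = domain.rstrip(".").split(".")
    full = ".".join(labels)
    if (full, "A") in cache:                  # already fully resolved
        return full, "A", cache[(full, "A")]
    for start in range(len(labels)):          # longest suffix first
        suffix = ".".join(labels[start:])
        if (suffix, "NS") in cache:
            return suffix, "NS", cache[(suffix, "NS")]
    return None                               # caller starts at a root server


cache = {
    ("edu", "NS"): ["192.5.6.30"],
    ("gvsu.edu", "NS"): ["192.0.2.53"],       # made-up address
}
print(closest_cached("lms.gvsu.edu", cache))
```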

Program Requirements

In this assignment you are to develop an iterative name resolver client. The domain names to resolve will be typed directly by the user, so you are NOT writing a "local" name server that receives incoming requests from "local" programs. Your program will only send queries to other name servers and handle the corresponding responses.

Important

The goal of this assignment is to develop a program that communicates with various name servers (at the three levels of the namespace tree), not just to obtain the IPv4 address of a given host/domain name, which can be done with a one-line function call:

python
import socket
ip_addr = socket.gethostbyname("computing.gvsu.edu")

Your program is required to include code for sending/receiving DNS messages using a UDP socket, as well as parsing the incoming DNS messages.

  1. Implement an iterative domain name resolver in Python (3.12 or newer). Your code shall be designed to use while loop(s) that send DNS messages and parse their corresponding responses. Avoid code bloat caused by copying chunks of code multiple times; organize them into function(s) instead.

  2. RFC 1034 specifies that the protocol can be implemented using either TCP or UDP. For this assignment, your client shall use a UDP socket. Specifically, create only one socket and design the program to run in a loop that prompts the user to enter a domain name. The loop stops when the user types .exit

    python
    if __name__ == '__main__':
        sock = socket(AF_INET, SOCK_DGRAM)
        sock.settimeout(2)
        while True:
            domain_name = input("Enter a domain name or .exit > ")

            if domain_name == '.exit':
                break

            while _____:
                # Use the function get_dns_record(____) (from the starter code
                # below) to resolve the IP address of the domain name in question
        sock.close()

  3. Your client shall print the following informational output:

    • Which TLD name server(s) are consulted, whether it is obtained from the cache or from a root server
    • Which Authoritative name server(s) are consulted, whether it is obtained from the cache or from a TLD name server
    • The IP address resolution of the domain name in question, whether it is obtained from the cache or from an authoritative name server
    • When a domain name has an alias

    WARNING

    Also, remove unnecessary debugging output

  4. Your client shall print an informative error message when the user queries a non-existing domain

  5. When the user types a domain which has an associated alias, your client shall continue to "follow" the alias until it eventually resolves to an IP address.

    IMPORTANT

    Your client should be designed to handle a theoretically unlimited chain of aliases.

  6. Add additional logic to consult alternative name servers if one shows no response after a timeout interval

  7. Use appropriate Python data structures (dictionary/map) to implement caching strategy that stores both the name servers address and the resolved IP addresses.

    • Use the .list command to show the cache. The output should be numbered starting from 1; the numbers will be used by the .remove command below
    • Use the .clear command to remove all cache entries
    • Use the .remove N command to delete a specific cache entry, where N is an integer shown in the .list output above. Implement error handling for when the user attempts to remove a non-existing cache entry (N is non-positive or too big)
  8. IP addresses shall be displayed in dot-decimal notation, e.g. 175.23.28.184

  9. Your program should be able to resolve domain names with any number of subdomain levels

    • github.io
    • www.gvsu.edu
    • 8ck2mf8.x.incapdns.net
    • some-random-name651234.with.many-dots.here.and-there.app
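
The .list and .remove N commands from requirement 7 can lean on the fact that Python dictionaries preserve insertion order. The following is a minimal sketch under that assumption; list_cache and remove_entry are hypothetical names, the message wording is illustrative, and the gvsu.edu address is made up.

```python
def list_cache(cache: dict) -> list[str]:
    """Return one numbered line per cache entry, starting from 1."""
    return [f"{n}. {key} -> {value}"
            for n, (key, value) in enumerate(cache.items(), start=1)]


def remove_entry(cache: dict, arg: str) -> str:
    """Remove the N-th entry (as numbered by list_cache) and report the result."""
    try:
        n = int(arg)
    except ValueError:
        return f"'{arg}' is not an integer"
    if not 1 <= n <= len(cache):
        return f"no cache entry numbered {n}"
    key = list(cache)[n - 1]     # dicts keep insertion order
    del cache[key]
    return f"removed {key}"


demo = {"edu": "192.5.6.30", "gvsu.edu": "192.0.2.53"}   # second address is made up
print("\n".join(list_cache(demo)))
print(remove_entry(demo, "5"))
```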

Extra Credit Options

  • (2 pts) In addition to resolving to IPv4 address(es), also resolve the domain name to IPv6 address(es)
  • (3 pts) When multiple name servers are available and one of the servers is non-responsive, resend queries to alternate name servers exhaustively
  • (4-8 pts) Implement the iterative domain name resolver without any external DNS module (dnslib, dnspython, ...). Warning: if you decide to take this challenge, keep two separate copies of your program: one that uses dnslib and another without dnslib. A separate handout has been provided to give you some ideas on parsing DNS messages using your own Python code.

    WARNING

    To be eligible for the minimum extra credit (4 points), your code must be able to parse the incoming DNS response messages and print the correct information. Extra credit will not be given for "effort".
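
Requirement 6 and the second extra-credit option both come down to a retry loop over the available name servers. The sketch below shows one possible shape: ask_any and send_query are hypothetical names, and send_query is assumed to raise TimeoutError when the socket times out (in Python 3.10 and newer, socket.timeout is an alias of TimeoutError).

```python
def ask_any(servers, send_query):
    """Try each server in order; return (server, reply) from the first to answer.

    `send_query(server)` is a hypothetical callable that sends one query and
    raises TimeoutError if no reply arrives in time.
    """
    failures = []
    for server in servers:
        try:
            return server, send_query(server)
        except TimeoutError:
            failures.append(server)      # remember it and try the next one
    raise TimeoutError(f"no response from: {', '.join(failures)}")


def fake_send(server):
    # Stand-in for a real UDP exchange: the first address never answers.
    if server == "192.0.2.1":
        raise TimeoutError
    return b"reply"


print(ask_any(["192.0.2.1", "192.0.2.2"], fake_send))
```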

Grading Rubrics (Tentative)

Feature                                                                         Points
Client accepts multiple user input                                              2
Handling of .exit command                                                       2
Fetch and parse DNS records from Root Name Servers                              3
Fetch and parse DNS records from TLD Name Servers                               3
Fetch and parse DNS records from Authoritative Name Servers                     3
Resolve Domain Name to IPv4 address(es)                                         3
Resolve Domain Alias(es) to IPv4 address(es)                                    5
Correct lookup order of cached information                                      2
Skip appropriate name server(s) when details are found in cache                 3
Handling of .list to show cache                                                 2
Handling of .clear to clear cache                                               2
Handling of .remove to remove a specific cache entry                            2
Show informational output on the source of address resolution (cache or query)  2
Show error message on non-existing domain names                                 2
Penalty for code bloat due to poor code design/organization                     -4 max
(Extra) Resolve domain name to both IPv4 and IPv6                               2
(Extra) Use multiple name servers exhaustively                                  3
(Extra) Implementation without dnslib                                           4-8