MIT-licensed

URL parser tester

This page parses a given URL with several available parsers, and compares their outputs. To support a variety of programming languages, we make heavy use of WebAssembly and Web Workers. Unfortunately, this may lead to some random crashes/reloads on Safari; Firefox and Chromium-based browsers are preferred.

Know of any other parser that you'd like to see here? File an issue.

Interesting examples

Here are some URLs that parsers tend to diverge on. Parsers that act differently from others are in parentheses.

Here's a list of URLs that used to differ, but have since reached convergence:

Detailed methodology

Parsers sometimes use different names for the same part of a URL. For consistency, we use the names from the WHATWG URL API, which is available in browsers and Node.js. This translation may not always be exact; below we document how we derived each parser's output.
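
To make the comparison concrete, here is a minimal Python sketch of how outputs that have been normalized to the WHATWG property names could be compared field by field. The `WHATWG_FIELDS` tuple, the `differing_fields` helper, and the sample outputs are hypothetical illustrations, not this page's actual code.

```python
# Property names taken from the WHATWG URL API, as used throughout this page.
WHATWG_FIELDS = (
    "href", "protocol", "username", "password",
    "hostname", "port", "pathname", "search", "hash",
)

def differing_fields(a: dict, b: dict) -> list:
    """Return the WHATWG property names on which two parser outputs disagree."""
    return [name for name in WHATWG_FIELDS if a.get(name) != b.get(name)]

# Hypothetical outputs from two parsers for the same input (made up for illustration).
parser_a = {"href": "http://example.com/", "hostname": "example.com", "pathname": "/"}
parser_b = {"href": "http://example.com", "hostname": "example.com", "pathname": ""}

print(differing_fields(parser_a, parser_b))  # ['href', 'pathname']
```

Comparing on a fixed list of names keeps the check independent of whatever extra fields an individual parser happens to expose.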

Go net/url
Go net/http

This is Go's built-in net/url package. The parser is based on RFC 3986, with some compatibility fixes. We compiled it to WebAssembly using Go's built-in compiler support. net/url itself does not support IDNA, but the built-in net/http package does through the golang.org/x/net/idna package. We added a "Go net/http" entry to emulate how net/http handles a URL.
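
For readers unfamiliar with IDNA: it maps non-ASCII hostnames to an ASCII "xn--" form before they are used on the network. The sketch below uses Python's built-in `idna` codec purely as an illustration. That codec implements the older IDNA 2003 rules, so it is not a stand-in for golang.org/x/net/idna, and the `to_ascii_host` helper name is made up for this example.

```python
def to_ascii_host(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII (Punycode-based) form."""
    # Python's built-in "idna" codec follows IDNA 2003; other libraries may
    # implement newer rules and can disagree on edge cases.
    return hostname.encode("idna").decode("ascii")

print(to_ascii_host("bücher.example"))  # xn--bcher-kva.example
print(to_ascii_host("example.com"))     # example.com
```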

The part mappings are as follows. Go's url.URL object has multiple accessors for the path, query, and fragment components, each with a different level of encoded-ness; we choose the same fields/methods as the URL.String() serialization method.

Property    url.URL field/method
href        String()
protocol    Scheme
username    User.Username()
password    User.Password()
hostname    Hostname()
port        Port()
pathname    Opaque || EscapedPath()
search      RawQuery
hash        EscapedFragment()

Node.js legacy

This is Node.js's legacy URL parser, written in JavaScript and based on RFC 3986. Developers have been encouraged to switch to the modern parser based on the WHATWG URL Standard since version 8 (released in 2017). We copied the parser, along with some required internal Node.js source files, and bundled them using esbuild for use here.

Compared to the official Node.js binaries, the version presented here may differ slightly in how it handles IDNA. This is because Node.js generally uses ICU4C's IDNA support (which is difficult to compile to WebAssembly), whereas here we have replaced it with tr46, a pure JavaScript implementation.

The part mappings are as follows:

Property    Legacy urlObject property
href        href
protocol    protocol
username    auth.split(":")[0]
password    auth.split(":")[1…].join(":")
hostname    hostname
port        port
pathname    pathname
search      search
hash        hash

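
The username/password split in the table can be sketched in Python as follows. This is only an illustration of the rule, not the tool's code: the `split_auth` name is hypothetical, and it assumes the `[1…]` notation means "every element from index 1 onward", so that colons inside the password are preserved.

```python
def split_auth(auth):
    """Split an "auth" string into (username, password) at the first colon."""
    if auth is None:
        # No userinfo in the URL at all.
        return None, None
    pieces = auth.split(":")
    username = pieces[0]
    # Re-join the remaining pieces so a password containing ":" stays intact.
    password = ":".join(pieces[1:])
    return username, password

print(split_auth("user:pa:ss"))  # ('user', 'pa:ss')
print(split_auth("user"))        # ('user', '')
```
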
Python urlparse

This combines Python's built-in urllib.parse module with the requote_uri() function from the Python library Requests. Python's urllib uses various RFCs (primarily 1738 and 1808) as the basis for its parser. To run Python in the browser, we use Pyodide, a build of the Python interpreter compiled to WebAssembly.

Since the parser does no normalization by default, we use the popular Requests library's requote_uri() for parity with other parsers listed here. The part mappings are as follows:

Property    ParseResult properties
href        geturl()
protocol    scheme
username    username
password    password
hostname    hostname
port        port
pathname    path
search      query
hash        fragment

Note: We ignore the params part, which exists in RFC 1738 but has no equivalent in other parsers and was removed in RFC 3986.
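
The mapping above can be tried directly with the standard library. The following is a minimal sketch, assuming `urlparse()` is the entry point; the `to_whatwg_parts` name is hypothetical, and the `requote_uri()` step is left out because Requests is a third-party package.

```python
from urllib.parse import urlparse

def to_whatwg_parts(url: str) -> dict:
    """Map a urllib.parse ParseResult onto WHATWG-style property names."""
    result = urlparse(url)
    return {
        "href": result.geturl(),
        "protocol": result.scheme,
        "username": result.username,
        "password": result.password,
        "hostname": result.hostname,  # urllib lowercases this value
        "port": result.port,          # an int, or None when absent
        "pathname": result.path,
        "search": result.query,
        "hash": result.fragment,
        # result.params is deliberately ignored, as described in the note above.
    }

parts = to_whatwg_parts("https://user:pw@Example.com:8080/a/b?x=1#frag")
print(parts["hostname"], parts["port"], parts["pathname"])  # example.com 8080 /a/b
```

Note that `scheme`, `query`, and `fragment` come back without the `:`, `?`, and `#` delimiters that a browser URL object includes in `protocol`, `search`, and `hash`, so a real comparison would need to account for that.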

Python requests

This captures how the Python library Requests deals with request URLs. Requests uses urllib3, which is based on RFC 3986, to parse incoming URLs. However, it does some additional normalization on top of urllib3, such as applying the requote_uri() function. IDNA support in both Requests and urllib3 is provided through the idna package. The part mappings are as follows:

Property    urllib3.util.Url properties
href        url
protocol    scheme
username    auth.split(":")[0]
password    auth.split(":")[1…].join(":")
hostname    host
port        port
pathname    path
search      query
hash        fragment

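
To give a feel for the kind of normalization requote_uri() performs, here is a rough, standard-library-only approximation. It is not Requests' code: the `approx_requote` name and the `SAFE_CHARS` constant are assumptions made for this sketch, and the real function does additional work that is omitted here, so results can differ.

```python
from urllib.parse import quote

# Characters left untouched: URL delimiters, plus "%" so existing escapes survive.
SAFE_CHARS = "!#$%&'()*+,/:;=?@[]~"

def approx_requote(uri: str) -> str:
    """Percent-encode characters that are not allowed to appear raw in a URI."""
    return quote(uri, safe=SAFE_CHARS)

print(approx_requote("http://example.com/a b?q=ü"))  # http://example.com/a%20b?q=%C3%BC
print(approx_requote("http://example.com/a%20b"))    # http://example.com/a%20b
```
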
libcurl

This is libcurl's URL API. curl uses RFC 3986 as the basis for its parser, with some features of the WHATWG URL Standard mixed in, as detailed on its URL Syntax documentation page. We created a simple C application "frontend" for the API and compiled it to WebAssembly using Emscripten. While curl does support IDNA using the libidn2 library, the functionality is not exposed through the URL API.

When parsing the URL, we use the CURLU_NON_SUPPORT_SCHEME and CURLU_URLENCODE flags. When getting individual parts of the URL, we pass 0 as flags. The part mappings are as follows:

Property    CURLUPart
href        CURLUPART_URL
protocol    CURLUPART_SCHEME
username    CURLUPART_USER
password    CURLUPART_PASSWORD
hostname    CURLUPART_HOST
port        CURLUPART_PORT
pathname    CURLUPART_PATH
search      CURLUPART_QUERY
hash        CURLUPART_FRAGMENT

Note: We ignore CURLUPART_OPTIONS, used for IMAP/POP3/SMTP "login options." We also do not list CURLUPART_ZONEID separately as it is included in CURLUPART_HOST.

spec-url
spec-url absolute

This is the JavaScript spec-url library, a reference implementation of Alwin Blok's URL Specification. Blok's specification is designed to be a rephrasing of the WHATWG URL Standard in more mathematical terms. We used esbuild to generate a bundle for the library.

The actual parsing steps done by this tool are similar to the proposed parse-resolve-and-normalise algorithm in Blok's specification. If no base URL is specified, "web-mode" is used, and the "force resolve" step in the algorithm is not done.

The absolute variant optimizes for use of the input string as an "absolute URL," at the risk of losing some information. Concretely, the absolute variant always forces the parser output. The absolute variant is closer to how the WHATWG URL Standard operates, while the normal variant is closer to how Go's net/url and Node.js' legacy parser operate.

The part mappings are derived from Blok's specification:

Property    Field/function
href        print()
protocol    scheme
username    user
password    pass
hostname    host
port        port
pathname    root + (dirs && (dirs.join("/") + "/")) + file
search      query
hash        hash

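
The pathname expression in the table can be read as "root, then each directory followed by a slash, then the file". A minimal Python sketch of that reading follows. The `join_pathname` name is hypothetical, and the sketch assumes that `root` is either "/" or empty, that `dirs` is a list of strings or missing, and that a missing `file` contributes nothing.

```python
def join_pathname(root="", dirs=None, file=""):
    """Assemble a pathname from root, dirs and file components."""
    # Each directory is followed by "/"; no dirs means no directory part.
    dir_part = "/".join(dirs) + "/" if dirs else ""
    return root + dir_part + file

print(join_pathname("/", ["a", "b"], "c.txt"))  # /a/b/c.txt
print(join_pathname("/", None, "index"))        # /index
print(join_pathname("", ["a"], ""))             # a/
```
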
Rust url

This is Rust's url crate, created by the Servo Project. It should be highly compatible with the WHATWG URL Standard, with complete IDNA support. We compiled it to WebAssembly using wasm-pack and wasm-bindgen.

The part mappings are as follows:

Property    url::Url method
href        as_str()
protocol    scheme()
username    username()
password    password()
hostname    host_str()
port        port()
pathname    path()
search      query()
hash        fragment()

whatwg-url

This is the JavaScript whatwg-url library, designed from scratch to be a reference implementation of the WHATWG URL Standard. We load the latest (nightly) bundle of the JavaScript whatwg-url library, which is also used for its own URL Viewer program. This utility is, to a large extent, inspired by URL Viewer. URL part mapping is trivial, as whatwg-url exposes the same properties as a browser URL object.

your browser

For comparison, we also parse every URL with your own browser's URL class.