MIT-licensed

URL parser tester

This page parses a given URL with several parsers and compares their outputs. To support a variety of programming languages, we make heavy use of WebAssembly and Web Workers. Unfortunately, this may cause occasional crashes or reloads in Safari; Firefox and Chromium-based browsers are preferred.

Detailed methodology

Parsers sometimes use different names for the same part of a URL. For consistency, we use the names from the WHATWG URL API, which is available in browsers and Node.js. This translation may not always be exact; below we document how we derived the output of each parser.

Python urlparse+requests
This combines Python's built-in urllib.parse module with the requote_uri() function from the Requests library. Python's urllib uses various RFCs (primarily 1738 and 1808) as the basis for its parser. To run Python in the browser, we use Pyodide, which compiles Python to WebAssembly.
Since urllib.parse does no normalization by default, we apply requote_uri() from the popular Requests library for parity with the other parsers listed here. The part mappings are as follows:
Property    ParseResult property
href        geturl()
protocol    scheme
username    username
password    password
hostname    hostname
port        port
pathname    path
search      query
hash        fragment
Note: We ignore the params part, which exists in RFC 1738 but has no equivalent in other parsers and was removed in RFC 3986.
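As a rough illustration of the mapping above, the sketch below uses only urllib.parse from the standard library. The function name to_whatwg_parts is hypothetical, and the requote_uri() normalization step is left out so that the example has no third-party dependency. Values are returned exactly as ParseResult provides them (no trailing ":" on the scheme, no leading "?" or "#"), which may differ from how this page formats them.

```python
from urllib.parse import urlparse


def to_whatwg_parts(url: str) -> dict:
    """Map urllib.parse's ParseResult onto WHATWG-style property names.

    Hypothetical helper for illustration only: it skips the requote_uri()
    normalization step and ignores the params part.
    """
    result = urlparse(url)
    return {
        "href": result.geturl(),
        "protocol": result.scheme,
        "username": result.username,
        "password": result.password,
        "hostname": result.hostname,
        "port": result.port,  # int or None; raises ValueError for an invalid port
        "pathname": result.path,
        "search": result.query,
        "hash": result.fragment,
    }


if __name__ == "__main__":
    example = "https://user:pw@example.com:8080/a/b?x=1#top"
    for name, value in to_whatwg_parts(example).items():
        print(f"{name:9} {value!r}")
```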
libcurl
This is libcurl's URL API. curl uses RFC 3986 as the basis for its parser, with some features of the WHATWG URL Standard mixed in, as detailed on its URL Syntax documentation page. We created a simple C application "frontend" for the API and compiled it to WebAssembly using Emscripten. While curl does support IDNA through the libidn2 library, the functionality is not exposed through the URL API.
When parsing the URL, we use CURLU_NON_SUPPORT_SCHEME and CURLU_URLENCODE flags. When getting individual parts of the URL, we pass 0 as flags. The part mappings are as follows:
Property    CURLUPart
href        CURLUPART_URL
protocol    CURLUPART_SCHEME
username    CURLUPART_USER
password    CURLUPART_PASSWORD
hostname    CURLUPART_HOST
port        CURLUPART_PORT
pathname    CURLUPART_PATH
search      CURLUPART_QUERY
hash        CURLUPART_FRAGMENT
Note: We ignore CURLUPART_OPTIONS, used for IMAP/POP3/SMTP "login options." We also do not list CURLUPART_ZONEID separately as it is included in CURLUPART_HOST.
Go net/url
Go net/http
This is Go's built-in net/url package. The parser is based on RFC 3986, with some compatibility fixes. We compiled it to WebAssembly using the Go compiler's built-in WebAssembly support. net/url itself does not support IDNA, but the built-in net/http package does, through the golang.org/x/net/idna package. We added a "Go net/http" entry to emulate how net/http handles a URL.
The part mappings are as follows:
Property    url.URL field/method
href        String()
protocol    Scheme
username    User.Username()
password    User.Password()
hostname    Hostname()
port        Port()
pathname    EscapedPath() or Opaque
search      RawQuery
hash        EscapedFragment()
Node.js legacy
This is Node.js's legacy URL parser, written in JavaScript based on RFC 3986. Developers have been encouraged to switch to the modern parser based on the WHATWG URL Standard since version 8 (released in 2017). We copied the parser as well as some required internal Node.js source files and bundled them using Browserify for use here.
Compared to the official Node.js binaries, the version presented here could differ slightly when handling IDNA. This is because Node.js generally uses ICU4C's IDNA support (which is difficult to compile to WebAssembly), while here we have replaced it with tr46, a pure JavaScript implementation.
The part mappings are as follows:
Property    Legacy urlObject property
href        href
protocol    protocol
username    auth.split(':')[0]
password    auth.split(':')[1…].join(':')
hostname    hostname
port        port
pathname    pathname
search      search
hash        hash
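The username and password rows split the legacy parser's single auth string on colons: everything before the first colon is the username, and the rest (rejoined with colons) is the password. The sketch below mirrors that logic in Python for illustration. The function name split_auth is hypothetical, and treating a missing auth value as two empty strings is our assumption here, not something the legacy parser specifies.

```python
from typing import Optional, Tuple


def split_auth(auth: Optional[str]) -> Tuple[str, str]:
    """Split an "auth" string into (username, password).

    Mirrors the mapping above: the first colon-separated piece is the
    username; the remaining pieces, rejoined with ":", are the password.
    Assumption: a missing auth value (None) yields two empty strings.
    """
    if auth is None:
        return "", ""
    pieces = auth.split(":")
    return pieces[0], ":".join(pieces[1:])


if __name__ == "__main__":
    print(split_auth("user:pa:ss"))
    print(split_auth("user"))
```

Because only the first colon acts as the separator, a password that itself contains colons is kept whole.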
spec-url
spec-url absolute
This is the JavaScript spec-url library, a reference implementation of Alwin Blok's URL Specification. Blok's specification is designed to be a rephrasing of the WHATWG URL Standard in more theoretical terms. We used Browserify to generate a bundle for the library.
Since spec-url does not yet provide a high-level parsing function, we have to describe exactly what we do here. The parsing steps performed by this tool are similar to the proposed parse-resolve-and-normalise algorithm in Blok's specification. If no base URL is specified, "web-mode" is used, and the "force resolve" step in the algorithm is not done.
The absolute variant optimizes for use of the input string as an "absolute URL," at the risk of losing some information. Concretely, the absolute variant always applies the "force" step to the parser output. The absolute variant is closer to how the WHATWG URL Standard operates, while the normal variant is closer to how Go's net/url operates.
The part mappings are derived from Blok's specification:
Property    Field/function
href        print()
protocol    scheme
username    user
password    pass
hostname    host
port        port
pathname    root + (dirs && (dirs.join('/') + '/')) + file
search      query
hash        hash
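The pathname row concatenates three optional pieces: the root, the directory list joined and terminated with slashes, and the file name. The Python sketch below illustrates that composition. The function name build_pathname is hypothetical, and the value shapes are assumptions: root is taken to be "/" or absent, dirs a list of strings or absent, and file a string or absent.

```python
from typing import Optional, Sequence


def build_pathname(
    root: Optional[str],
    dirs: Optional[Sequence[str]],
    file: Optional[str],
) -> str:
    """Compose a pathname from root, dirs and file.

    Assumed shapes (not taken from the spec-url API): each argument is
    None when the corresponding part is absent from the parsed URL.
    """
    # Every directory is followed by a slash, so the joined list gets one
    # trailing slash; an absent dirs value contributes nothing.
    dirs_part = ("/".join(dirs) + "/") if dirs is not None else ""
    return (root or "") + dirs_part + (file or "")


if __name__ == "__main__":
    print(build_pathname("/", ["a", "b"], "c.html"))
    print(build_pathname(None, ["a"], None))
```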
Rust url
This is Rust's url crate, created by the Servo Project. It should be highly compatible with the WHATWG URL Standard, with complete IDNA support. We compiled it to WebAssembly using wasm-pack and wasm-bindgen.
The part mappings are as follows:
Property    url::Url method
href        as_str()
protocol    scheme()
username    username()
password    password()
hostname    host_str()
port        port()
pathname    path()
search      query()
hash        fragment()
whatwg-url
This is the JavaScript whatwg-url library, designed from scratch to be a reference implementation of the WHATWG URL Standard. We load the latest (nightly) bundle of the library, which is also used by its own URL Viewer program. This utility is, to a large extent, inspired by URL Viewer. URL part mapping is trivial, as whatwg-url exposes the same properties as a browser URL object.
your browser
For comparison, we also parse every URL with your own browser's URL class. No part mapping is needed, as the property names used on this page are those of the browser URL object itself.