URL parser tester

This page parses a given URL with several available parsers, and compares their outputs. To support a variety of programming languages, we make heavy use of WebAssembly and Web Workers. Unfortunately, this may lead to some random crashes/reloads on Safari; Firefox and Chromium-based browsers are preferred.

Know of any other parser that you'd like to see here? File an issue.

Interesting examples

Here are some URLs that parsers tend to diverge on. Parsers that act differently from others are in parentheses.

Empty fragment: https://example.com/# (Python, libcurl, Go)
Path normalization: https://example.com/foo/../bar (Python urlparse, Go, Node.js legacy)
Query encoding: https://example.com/?<>" (libcurl and Go)
file URL with host (Windows UNC paths): file://server01/sage/jobcosting (libcurl and Firefox)
file URL path normalization with Windows drive letter: file:///C:/.. (Firefox vs. other browsers)
Invalid percent encoding: https://example.com/%xyz (Python and Go)
Invalid IDNA encoding: https://xn--abc.com/ (Node.js legacy, Rust url, whatwg-url, Safari)
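
As a rough illustration of why Python shows up in several of the rows above, the following sketch runs three of the example URLs through the standard library's urllib.parse alone. It deliberately leaves out the requote_uri() step that this page applies on top, so it is not a reproduction of the "Python urlparse" entry, and the variable names are made up for the example.

```python
from urllib.parse import urlparse

# Empty fragment: the trailing "#" is not kept when the URL is reassembled.
empty_fragment = urlparse("https://example.com/#").geturl()

# Path normalization: dot segments are left exactly as written.
dot_segments = urlparse("https://example.com/foo/../bar").path

# Invalid percent encoding: "%xy" is passed through without complaint.
bad_percent = urlparse("https://example.com/%xyz").path

print(empty_fragment)  # https://example.com/
print(dot_segments)    # /foo/../bar
print(bad_percent)     # /%xyz
```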

Here's a list of URLs that used to differ, but have since reached convergence:

IDNA 2003 vs. 2008: https://faß.de/ (~~Chrome <110~~, ~~Go <1.18~~)
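
The difference between the two IDNA versions can be seen with Python's standard library alone. This is only a sketch: the built-in "idna" codec implements IDNA 2003 (RFC 3490), and the IDNA 2008 style result is approximated here by Punycode-encoding the label by hand rather than by running a full IDNA 2008 implementation.

```python
# IDNA 2003: the built-in "idna" codec maps "ß" to "ss" before encoding.
idna2003 = "faß.de".encode("idna").decode("ascii")

# IDNA 2008 keeps "ß" as a distinct character. As a stand-in for a real
# IDNA 2008 library, Punycode-encode the label directly and add the
# "xn--" prefix ourselves.
label = "faß".encode("punycode").decode("ascii")
idna2008_style = "xn--" + label + ".de"

print(idna2003)        # fass.de
print(idna2008_style)  # xn--fa-hia.de
```

The two results name different hosts, which is why parsers that disagreed on the IDNA version also disagreed on this URL.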

Detailed methodology

Parsers sometimes have different names for the same part of a URL. Here, we made things consistent by using the names from the WHATWG URL API, available in browsers and Node.js. This translation may not always be exact; below we document how we derived the output of each parser.

Go net/url

Go net/http

This is Go's built-in net/url package. The parser is based on RFC 3986, with some compatibility fixes. We compiled it to WebAssembly using Go's built-in compiler support. net/url itself does not support IDNA, but the built-in net/http package does through the golang.org/x/net/idna package. We added a "Go net/http" entry to emulate how net/http handles a URL.

The part mappings are as follows. Go's url.URL object has multiple accessors for the path, query, and fragment components, each with a different level of encoding; we choose the same fields/methods that the URL.String() serialization method uses.

Property	url.URL field/method
href	String()
protocol	Scheme
username	User.Username()
password	User.Password()
hostname	Hostname()
port	Port()
pathname	Opaque || EscapedPath()
search	RawQuery
hash	EscapedFragment()

Node.js legacy

This is Node.js's legacy URL parser, written in JavaScript based on RFC 3986. Developers have been encouraged to switch to the modern parser based on the WHATWG URL Standard since version 8 (released in 2017). We copied the parser as well as some required internal Node.js source files and bundled them using esbuild for use here.

Compared to the official Node.js binaries, the version presented here could differ slightly when handling IDNA. This is because Node.js generally uses ICU4C's IDNA support (which is difficult to compile to WebAssembly), while here we have replaced it with tr46, a pure JavaScript implementation.

The part mappings are as follows:

Property	Legacy urlObject property
href	href
protocol	protocol
username	auth.split(":")[0]
password	auth.split(":")[1…].join(":")
hostname	hostname
port	port
pathname	pathname
search	search
hash	hash
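
The username and password rows above split the auth string at its first colon, so a password may itself contain colons. A minimal sketch of the same idea in Python (split_auth is a hypothetical helper name, not part of any parser listed here):

```python
from typing import Tuple


def split_auth(auth: str) -> Tuple[str, str]:
    """Split "user:password" at the first colon only."""
    pieces = auth.split(":")
    username = pieces[0]
    # Rejoin everything after the first colon so that colons inside
    # the password are preserved.
    password = ":".join(pieces[1:])
    return username, password


print(split_auth("user:pa:ss"))  # ('user', 'pa:ss')
print(split_auth("user"))        # ('user', '')
```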

Python urlparse

This combines Python's built-in urllib.parse module with the requote_uri() function from the Requests library. Python's urllib uses various RFCs (primarily 1738 and 1808) as the basis for its parser. To run Python in the browser, we use Pyodide, which compiles Python to WebAssembly.

Since the parser does no normalization by default, we use the popular Requests library's requote_uri() for parity with other parsers listed here. The part mappings are as follows:

Property	ParseResult properties
href	geturl()
protocol	scheme
username	username
password	password
hostname	hostname
port	port
pathname	path
search	query
hash	fragment

Note: We ignore the params part, which exists in RFC 1738 but has no equivalent in other parsers and was removed in RFC 3986.
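
The table above can be read as a small translation function. The sketch below uses only urllib.parse and skips the requote_uri() step, so it shows the mapping itself rather than the exact output of this page; whatwg_style_parts is a name invented for the example.

```python
from urllib.parse import urlparse


def whatwg_style_parts(url: str) -> dict:
    """Map a urllib.parse.ParseResult onto WHATWG-style property names."""
    result = urlparse(url)
    return {
        "href": result.geturl(),
        "protocol": result.scheme,
        "username": result.username,
        "password": result.password,
        "hostname": result.hostname,
        "port": result.port,
        "pathname": result.path,
        "search": result.query,
        "hash": result.fragment,
        # result.params is intentionally ignored, as noted above.
    }


parts = whatwg_style_parts("https://user:pw@Example.com:8080/foo/../bar?x=1#frag")
for name, value in parts.items():
    print(name, "=", repr(value))
```

Note that urllib.parse lowercases hostname but leaves the host inside geturl() as typed, and that port comes back as an integer rather than a string.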

Python requests

This captures how the Python library Requests deals with request URLs. Requests uses urllib3, which is based on RFC 3986, to parse incoming URLs. However, it does some additional normalization on top of urllib3, such as applying the requote_uri() function. IDNA support in both Requests and urllib3 is provided through the idna package. The part mappings are as follows:

Property	urllib3.util.Url properties
href	url
protocol	scheme
username	auth.split(":")[0]
password	auth.split(":")[1…].join(":")
hostname	host
port	port
pathname	path
search	query
hash	fragment

libcurl

This is libcurl's URL API. curl uses RFC 3986 as the basis for its parser, with some features of the WHATWG URL Standard mixed in, as detailed on its URL Syntax documentation page. We created a simple C application "frontend" for the API and compiled it to WebAssembly using Emscripten. While curl does support IDNA using the libidn2 library, the functionality is not exposed through the URL API.

When parsing the URL, we use CURLU_NON_SUPPORT_SCHEME and CURLU_URLENCODE flags. When getting individual parts of the URL, we pass 0 as flags. The part mappings are as follows:

Property	CURLUPart
href	CURLUPART_URL
protocol	CURLUPART_SCHEME
username	CURLUPART_USER
password	CURLUPART_PASSWORD
hostname	CURLUPART_HOST
port	CURLUPART_PORT
pathname	CURLUPART_PATH
search	CURLUPART_QUERY
hash	CURLUPART_FRAGMENT

Note: We ignore CURLUPART_OPTIONS, used for IMAP/POP3/SMTP "login options." We also do not list CURLUPART_ZONEID separately as it is included in CURLUPART_HOST.

spec-url

spec-url absolute

This is the JavaScript spec-url library, a reference implementation of Alwin Blok's URL Specification. Blok's specification is designed to be a rephrasing of the WHATWG URL Standard in more mathematical terms. We used esbuild to generate a bundle for the library.

The actual parsing steps done by this tool are similar to the proposed parse-resolve-and-normalise algorithm in Blok's specification. If no base URL is specified, "web-mode" is used, and the "force resolve" step in the algorithm is not done.

The absolute variant optimizes for use of the input string as an "absolute URL," at the risk of losing some information. Concretely, the absolute variant always applies the "force resolve" step to the parser output. The absolute variant is closer to how the WHATWG URL Standard operates, while the normal variant is closer to how Go's net/url and Node.js's legacy parser operate.

The part mappings are derived from Blok's specification:

Property	Field/function
href	print()
protocol	scheme
username	user
password	pass
hostname	host
port	port
pathname	root + (dirs && (dirs.join("/") + "/")) + file
search	query
hash	hash
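
The pathname row reassembles a path from its root, directory list, and file name. A loose Python rendering of that expression follows; it assumes that an absent field contributes nothing, that dirs is a list of strings, and that a present dirs list always gets a trailing slash. The function name and the field representation are illustrative and not taken from spec-url itself.

```python
from typing import List, Optional


def assemble_pathname(root: str = "",
                      dirs: Optional[List[str]] = None,
                      file: str = "") -> str:
    """Join root, dirs and file the way the pathname row describes."""
    # Each directory is followed by a slash; an absent dirs adds nothing.
    dir_part = "/".join(dirs) + "/" if dirs is not None else ""
    return root + dir_part + file


print(assemble_pathname("/", ["foo", ".."], "bar"))  # /foo/../bar
print(assemble_pathname("/", None, "index.html"))    # /index.html
```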

Rust url

This is Rust's url crate, created by the Servo Project. It should be highly compatible with the WHATWG URL Standard, with complete IDNA support. We compiled it to WebAssembly using wasm-pack and wasm-bindgen.

The part mappings are as follows:

Property	url::Url method
href	as_str()
protocol	scheme()
username	username()
password	password()
hostname	host_str()
port	port()
pathname	path()
search	query()
hash	fragment()

whatwg-url

This is the JavaScript whatwg-url library, designed from scratch to be a reference implementation of the WHATWG URL Standard. We load the latest (nightly) bundle of the library, which is also used for its own URL Viewer program. This utility is, to a large extent, inspired by URL Viewer. URL part mapping is trivial, as whatwg-url exposes the same properties as a browser URL object.

your browser

For comparison, we also parse every URL with your own browser's URL class.