Percent-encoding (URI percent-encoding, RFC 3986 §2)

Percent-encoding (a.k.a. URL encoding) represents a byte that is unsafe or reserved in a given URI component as '%' followed by two hexadecimal digits giving the byte's value. RFC 3986 recommends uppercase hex digits when producing encodings; decoders treat '%c3' and '%C3' as equivalent. Non-ASCII text is first encoded to bytes (UTF-8 in modern usage) and each byte is then percent-encoded, so 'é' (U+00E9 -> bytes C3 A9) becomes '%C3%A9'. Unreserved characters (A–Z a–z 0–9 and '-' '.' '_' '~') never need encoding and should be left as-is.
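The steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `UNRESERVED` and `percent_encode` are hypothetical, and the sketch assumes the strictest policy, in which every byte outside the unreserved set is encoded (real encoders usually leave some reserved characters alone depending on the URI component).

```python
# Unreserved set from RFC 3986 section 2.3: letters, digits, '-', '.', '_', '~'.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Percent-encode every byte of `text` that is not unreserved.

    Hypothetical helper for illustration: text is encoded to UTF-8 first,
    then each byte is either passed through or written as %XX (uppercase hex).
    """
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)             # unreserved ASCII passes through
        else:
            out.append(f"%{byte:02X}")  # everything else becomes %XX
    return "".join(out)
```

With this policy, `percent_encode("café")` yields `caf%C3%A9`, matching the 'é' example above.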

encoding · kind: encoding · status: standard · verification: verified · tier: A · encoding@1

aka: URL encoding · URI encoding · percent encoding · pct-encoding

test vectors

RFC3986

input (encoding) | output (encoding) | note
a b (utf8) | a%20b (ascii) | Space (0x20) -> '%20'. The canonical percent form of a space in a path/query component.
café (utf8) | caf%C3%A9 (ascii) | Non-ASCII via UTF-8 then percent: 'é' (U+00E9) = bytes C3 A9 -> '%C3%A9'. ASCII 'caf' passes through unencoded.
a/b?c#d&e=f (utf8) | a%2Fb%3Fc%23d%26e%3Df (ascii) | Reserved gen-delims/sub-delims as DATA: '/'->%2F, '?'->%3F, '#'->%23, '&'->%26, '='->%3D.
AZaz09-._~ (utf8) | AZaz09-._~ (ascii) | Unreserved set (RFC 3986 §2.3) passes through unchanged: letters, digits, and '-' '.' '_' '~'.
caf%C3%A9 (ascii) | café (utf8) | Round-trip: percent-decode('caf%C3%A9') interprets the two %-octets as UTF-8 bytes C3 A9 -> 'é', giving 'café'.
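The vectors above can be checked against Python's standard library as a quick sanity test. This is a sketch under stated assumptions: `VECTORS` and `check_vectors` are hypothetical names, `safe=""` is passed so that '/' is treated as data (by default `quote` leaves '/' unencoded), and Python 3.7 or later is assumed, since earlier versions encoded '~'.

```python
from urllib.parse import quote, unquote

# (plain text, expected percent-encoded form), copied from the test vectors.
VECTORS = [
    ("a b", "a%20b"),
    ("café", "caf%C3%A9"),
    ("a/b?c#d&e=f", "a%2Fb%3Fc%23d%26e%3Df"),
    ("AZaz09-._~", "AZaz09-._~"),
]


def check_vectors() -> bool:
    """Return True if every vector encodes and decodes as listed."""
    for plain, encoded in VECTORS:
        # safe="" encodes reserved characters such as '/' as data.
        if quote(plain, safe="") != encoded:
            return False
        # unquote decodes the %-octets as UTF-8 by default.
        if unquote(encoded) != plain:
            return False
    return True
```

Decoding is case-insensitive in the hex digits, so `unquote("caf%c3%a9")` also gives 'café'.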

agent: curl -H 'accept: application/json' wire.phall.io/encoding/percent-encoding or /encoding/percent-encoding.json