Punycode (RFC 3492) — the IDNA ToASCII building block

Punycode is a reversible encoding that represents a Unicode string using only the ASCII characters allowed in host names (letters, digits, and hyphen: the 'LDH' set). It is an instance of the more general Bootstring algorithm and is the core transformation inside IDNA's ToASCII operation: each label of an internationalized domain name that contains non-ASCII characters is Punycode-encoded and prefixed with the ACE marker 'xn--', while all-ASCII labels pass through unchanged. The raw Punycode of 'münchen' is 'mnchen-3ya'; the full IDNA label is 'xn--mnchen-3ya'.
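To make the label transformation concrete, here is a minimal sketch of the RFC 3492 encoding procedure in Python. The function names (`punycode_encode`, `_adapt`, `_digit`) are illustrative and not part of any library; the constants are the Punycode parameters from RFC 3492. The sketch assumes the input is already lowercased and normalized, and it omits the overflow checks the RFC requires for fixed-width integers, since Python integers are unbounded.

```python
# Punycode parameters from RFC 3492, section 5.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def _digit(d):
    """Map a digit value 0..35 to 'a'..'z' (0..25) or '0'..'9' (26..35)."""
    return chr(d + 97) if d < 26 else chr(d - 26 + 48)


def _adapt(delta, numpoints, first):
    """Bias adaptation step from RFC 3492, section 6.1."""
    delta = delta // DAMP if first else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def punycode_encode(text):
    """Return the raw Punycode of `text` (no 'xn--' prefix)."""
    # Copy the basic (ASCII) code points first, then a delimiter if any exist.
    basic = [c for c in text if ord(c) < 128]
    out = list(basic)
    h = b = len(basic)  # h = code points handled so far, b = basic count
    if b:
        out.append("-")

    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(text):
        # Next smallest code point that has not been handled yet.
        m = min(ord(c) for c in text if ord(c) >= n)
        delta += (m - n) * (h + 1)
        n = m
        for c in text:
            cp = ord(c)
            if cp < n:
                delta += 1
            elif cp == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    if k <= bias:
                        t = TMIN
                    elif k >= bias + TMAX:
                        t = TMAX
                    else:
                        t = k - bias
                    if q < t:
                        break
                    out.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                out.append(_digit(q))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(out)


def to_ace_label(label):
    """Hypothetical helper: add the ACE prefix only to non-ASCII labels."""
    return label if label.isascii() else "xn--" + punycode_encode(label)
```

With this sketch, `punycode_encode('münchen')` yields the raw form `'mnchen-3ya'` and `to_ace_label('münchen')` yields `'xn--mnchen-3ya'`, while `to_ace_label('de')` returns `'de'` unchanged. A full ToASCII implementation also performs mapping, normalization, and validity checks before this step, which are not shown here.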

encoding · kind: encoding · status: standard · verification: verified · tier: A · encoding@1

aka: RFC 3492 punycode · Bootstring · IDNA ACE · xn-- encoding

test vectors

RFC3492

input | output | note
münchen (utf8) | xn--mnchen-3ya (ascii) | IDNA ToASCII of the single label 'münchen' = 'xn--mnchen-3ya'. The raw RFC 3492 Punycode (without the 'xn--' ACE prefix) is 'mnchen-3ya'.
münchen.de (utf8) | xn--mnchen-3ya.de (ascii) | Full domain via IDNA ToASCII: only the non-ASCII label is encoded; '.de' passes through -> 'xn--mnchen-3ya.de'.
bücher.example (utf8) | xn--bcher-kva.example (ascii) | Second domain example: 'bücher' -> 'xn--bcher-kva' (raw Punycode 'bcher-kva'), '.example' unchanged.
xn--mnchen-3ya.de (ascii) | münchen.de (utf8) | Round-trip: IDNA ToUnicode('xn--mnchen-3ya.de') = 'münchen.de'.
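The vectors above can be reproduced with Python's built-in `idna` codec, as in the sketch below. The helper names `to_ascii` and `to_unicode` and the `VECTORS` list are illustrative. One caveat: Python's `idna` codec implements the older IDNA 2003 rules (RFC 3490), which differ from IDNA 2008 for some characters; for the labels in this table the two are assumed to agree.

```python
# Test vectors copied from the table: (Unicode form, ASCII-compatible form).
VECTORS = [
    ("münchen", "xn--mnchen-3ya"),
    ("münchen.de", "xn--mnchen-3ya.de"),
    ("bücher.example", "xn--bcher-kva.example"),
]


def to_ascii(domain):
    """ToASCII on each label, using the standard-library 'idna' codec."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain):
    """ToUnicode on each label, using the standard-library 'idna' codec."""
    return domain.encode("ascii").decode("idna")


def failing_vectors():
    """Return the vectors that do not round-trip as the table states."""
    return [
        (uni, ace)
        for uni, ace in VECTORS
        if to_ascii(uni) != ace or to_unicode(ace) != uni
    ]


if __name__ == "__main__":
    print(failing_vectors() or "all vectors pass")
```

The raw Punycode form without the ACE prefix is available through the separate `punycode` codec, for example `'münchen'.encode('punycode')`.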

provenance

see also

agent: curl -H 'accept: application/json' wire.phall.io/encoding/punycode or /encoding/punycode.json