Mungomash LLC
String Escape Translator

Encode any text as a string literal — or decode an escaped literal back to plain text

Private: runs in your browser. Your strings stay on this page; nothing is sent to Mungomash, and no third-party API is contacted for the analysis.

Why every language escapes strings differently

The reason “just escape it” isn’t a single answer is that string literals are part of every programming language’s grammar, and grammars don’t agree. The same character that’s totally fine inside a Python string literal might break a JSON parser, or vice versa. Each language picks its own set of meta-characters — characters that have to be replaced with an escape sequence to appear in the literal at all — and its own escape syntax for the rest of Unicode.

Three rules cover most languages most of the time: (1) the surrounding quote character has to be escaped, (2) the backslash itself has to be escaped if backslashes are the escape character, (3) certain control characters (newline, tab, NUL) have to be escaped because they break the literal’s meaning. Beyond those, the rules diverge.
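The three rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `escape_basic` and `CONTROL_ESCAPES` are hypothetical names, and the output follows a generic C-like convention rather than any one language's exact grammar.

```python
# Hypothetical helper: applies the three common rules and nothing else.
CONTROL_ESCAPES = {"\n": "\\n", "\t": "\\t", "\0": "\\0"}

def escape_basic(text: str, quote: str = '"') -> str:
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")               # rule 2: escape the escape character
        elif ch == quote:
            out.append("\\" + quote)         # rule 1: escape the surrounding quote
        elif ch in CONTROL_ESCAPES:
            out.append(CONTROL_ESCAPES[ch])  # rule 3: escape control characters
        else:
            out.append(ch)                   # everything else passes through
    return quote + "".join(out) + quote

print(escape_basic('say "hi"\n'))
```

Everything a real encoder adds on top of this (Unicode escapes, other control characters, language-specific meta-characters) is where the languages start to disagree.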

The big differences worth remembering

JSON’s minimal escape set. JSON allows only eight short escapes (\", \\, \/, \b, \f, \n, \r, \t) plus \uNNNN for everything else. There is no \xNN; that’s a Python / C / JavaScript thing. There is no \u{NNNNN} either: for code points above U+FFFF (most emoji), JSON requires the UTF-16 surrogate pair (\uD83D\uDE00 for U+1F600). The strict minimum set is what makes JSON portable across every parser; the trade-off is that JSON literals for non-ASCII text look ugly compared to the same text in a JavaScript or Python literal.
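Python's standard `json` module shows this behavior directly. The sample string is made up for illustration; `ensure_ascii` is the module's own switch between the fully escaped form and literal non-ASCII.

```python
import json

sample = 'é "q" \U0001F600'   # U+00E9, quotes, and U+1F600 (an emoji)

escaped = json.dumps(sample)                      # ASCII-only output
literal = json.dumps(sample, ensure_ascii=False)  # non-ASCII kept as-is

print(escaped)   # the emoji appears as a \uXXXX\uXXXX surrogate pair
print(literal)   # only the double quotes are escaped
```

Both outputs decode to the same string with `json.loads`; the escaped form is just the one that survives ASCII-only transport.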

JavaScript’s \u{...}. Modern JavaScript (ES2015+) accepts \u{1F600} for any code point, no surrogate-pair math required. Older runtimes (and JSON) require the surrogate-pair form. This is the main difference between “modern JS source” and “portable JSON” literal output for high code points.

Python’s three Unicode escape forms. Python accepts \xNN (2-digit hex; a code point up to U+00FF in a str literal, a raw byte value in a bytes literal), \uNNNN (4-digit hex, BMP code point), and \UNNNNNNNN (uppercase U, 8-digit hex, any code point) as separate forms in the same literal. It also has \N{LATIN SMALL LETTER E WITH ACUTE} for named lookup, which is verbose but memorable.
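A quick way to see that these forms are interchangeable is to write the same character in each one. The variable names below are arbitrary.

```python
# Four spellings of U+00E9 in Python 3 str literals.
two_digit = "\xe9"
four_digit = "\u00e9"
eight_digit = "\U000000e9"
named = "\N{LATIN SMALL LETTER E WITH ACUTE}"

assert two_digit == four_digit == eight_digit == named

# Above U+FFFF only the 8-digit (or named) form applies; it is still one code point.
emoji = "\U0001F600"
print(len(emoji))

# In a bytes literal, \xNN is a raw byte, so U+00E9 takes two of them in UTF-8.
print(b"\xc3\xa9".decode("utf-8") == two_digit)
```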

Java’s UTF-16-only escapes. Java strings are sequences of UTF-16 code units. The only Unicode escape is \uNNNN; for code points above U+FFFF you write the surrogate pair (\uD83D\uDE00 for U+1F600). There’s no \xNN, no \u{...}, and no \UNNNNNNNN. (Note: Java’s \uNNNN is processed at the lexer level, before any other parsing, so surprising things happen if you put one inside a comment.)
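The surrogate-pair arithmetic is short enough to write out. The sketch below is in Python and the function name `java_escape_codepoint` is hypothetical; it only shows how a code point maps to one or two \uNNNN units.

```python
def java_escape_codepoint(cp: int) -> str:
    """Return the Java-style \\uNNNN escape(s) for one Unicode code point."""
    if cp <= 0xFFFF:
        return f"\\u{cp:04X}"        # BMP: a single UTF-16 code unit
    v = cp - 0x10000                 # 20 bits remain after removing the offset
    high = 0xD800 + (v >> 10)        # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits -> low (trail) surrogate
    return f"\\u{high:04X}\\u{low:04X}"

print(java_escape_codepoint(0x1F600))
```

The same pair is what JSON and pre-ES2015 JavaScript need for code points above U+FFFF.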

Rust’s \u{...}. Same shape as modern JavaScript but the braces are required; there’s no bare \uNNNN form. In string literals \xNN is also restricted to \x00 through \x7F (the ASCII range); for anything above that, use \u{...}.
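As a rough sketch of what an encoder targeting Rust string literals has to do, here is a Python function (hypothetical name `rust_escape`) that keeps \xNN inside the ASCII range and uses the braced form for everything above it. It covers only the cases discussed here, not every corner of Rust's grammar.

```python
def rust_escape(text: str) -> str:
    """Render text as a Rust-style string literal, escaping all non-ASCII."""
    simple = {"\\": "\\\\", '"': '\\"', "\n": "\\n",
              "\r": "\\r", "\t": "\\t", "\0": "\\0"}
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in simple:
            out.append(simple[ch])
        elif cp < 0x20 or cp == 0x7F:
            out.append(f"\\x{cp:02X}")     # \xNN stays within 00..7F
        elif cp > 0x7F:
            out.append(f"\\u{{{cp:X}}}")   # braces required, 1 to 6 hex digits
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'

print(rust_escape("caf\u00e9 \U0001F600"))
```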

Bash’s three quoting modes. Single-quoted ('...') is fully literal — no escapes, no expansion, but you can’t include a literal single quote at all. Double-quoted ("...") does parameter expansion ($VAR), command substitution ($(cmd)), and a few escapes (\$, \\, \") but newlines and non-ASCII pass through literally. ANSI-C quoted ($'...') is the C-style escape mode — \n, \t, \xNN, \uNNNN, \UNNNNNNNN all work. For literal text from variables, double-quote. For data with control characters or non-ASCII you can’t put on the command line directly, ANSI-C quote.
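Two of these modes can be illustrated from Python. `shlex.quote` is the standard-library helper for single-quoted shell words; note how it works around the "no single quote inside single quotes" rule by closing the quote, adding a double-quoted apostrophe, and reopening. `ansi_c_quote` is a hypothetical helper sketching the $'...' form; it assumes a Bash new enough to understand \u and \U, and it does not handle NUL, which Bash strings cannot hold.

```python
import shlex

def ansi_c_quote(text: str) -> str:
    """Sketch of Bash ANSI-C quoting ($'...') with all non-ASCII escaped."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == "'":
            out.append("\\'")
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\t":
            out.append("\\t")
        elif cp < 0x20 or cp == 0x7F:
            out.append(f"\\x{cp:02X}")   # other control characters
        elif cp > 0xFFFF:
            out.append(f"\\U{cp:08X}")   # 8-digit form above the BMP
        elif cp > 0x7F:
            out.append(f"\\u{cp:04X}")   # 4-digit form inside the BMP
        else:
            out.append(ch)
    return "$'" + "".join(out) + "'"

print(shlex.quote("it's"))
print(ansi_c_quote("tab\there, it's \u00e9"))
```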

HTML attributes vs HTML entities. Inside an HTML attribute (value="..."), only the four characters that would break the markup need entity-encoding (&, <, >, "). Non-ASCII passes through fine if the document is UTF-8. The full “HTML entities” output, in contrast, encodes every non-ASCII character as either a named entity (&eacute;) or a numeric reference (&#233;), which is useful when the document encoding is something other than UTF-8 or when you want to be paranoid about portability.
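The two output styles can be approximated with the Python standard library. `html.escape` handles the markup-breaking characters (with `quote=True` it also escapes the apostrophe, one more than the four listed above), and the `xmlcharrefreplace` error handler turns every remaining non-ASCII character into a decimal numeric reference. This is a sketch of the idea, not the page's own implementation.

```python
import html

raw = 'Tom & "Jerry" <\u00e9>'

# Attribute-safe form: markup characters escaped, non-ASCII left alone.
attr_safe = html.escape(raw, quote=True)

# Fully ASCII form: remaining non-ASCII becomes numeric references.
ascii_only = attr_safe.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(attr_safe)
print(ascii_only)
```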

When “escape non-ASCII” matters

By default this page emits non-ASCII characters literally where the target language allows it. JavaScript, Python 3, Go, Rust, modern Java source, JSON, and Bash all accept literal UTF-8 in string literals (assuming the source file is UTF-8, which is the universal modern default). The output is shorter and easier to read; for most modern code, this is what you want.

Toggle Escape non-ASCII on when: the target source file’s encoding is unknown or ASCII-only; the literal will travel through a tool that mangles non-ASCII (some legacy diff viewers, some grep alternatives, some terminal pagers); or you want a self-documenting literal where every character is visible as its code point. With the toggle on, every code point above U+007F emits as \uNNNN / \u{NNNN} / \xNN \xNN \xNN (the byte-form for languages without a code-point form, like C without C99 or Bash double-quoted).
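To make the difference between the code-point form and the byte form concrete, here is a small Python sketch. Both function names are hypothetical. The first leans on Python's built-in `unicode_escape` codec, which also escapes backslashes and control characters; the second writes each non-ASCII character as its UTF-8 bytes.

```python
def python_escape_non_ascii(text: str) -> str:
    """Code-point form, Python style: \\xNN, \\uNNNN, or \\UNNNNNNNN."""
    return text.encode("unicode_escape").decode("ascii")

def utf8_byte_escapes(text: str) -> str:
    """Byte form: each non-ASCII character becomes its UTF-8 bytes as \\xNN."""
    out = []
    for ch in text:
        if ord(ch) <= 0x7F:
            out.append(ch)
        else:
            out.extend(f"\\x{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)

print(python_escape_non_ascii("\u00e9\u20ac\U0001F600"))
print(utf8_byte_escapes("a\u20ac"))   # U+20AC is three bytes in UTF-8
```

One caveat with the byte form in C: a \x escape there consumes every hex digit that follows, so a literal digit or a-f letter directly after the last \xNN needs to be split off (for example by ending the literal and starting an adjacent one).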

Decode mode — reading what you found in a log

Decode mode is for when you’re looking at an escaped literal in a log line, an error message, or a config file and want to see what it actually says. Pick the language the literal came from in the “Treat input as” dropdown, paste the escaped form, and the page renders the decoded raw text plus the same value re-encoded in every other language — useful for porting a string from one source to another.

Common decode shapes the page handles: JSON / JavaScript / Python \uNNNN escapes; Python \xNN bytes (also valid in JS, C, ANSI-C bash) and \UNNNNNNNN 8-hex code points; HTML named entities (&eacute;, &copy;, &mdash;) and numeric references (&#39;, &#x27;); Java \uNNNN with surrogate pairs; and the full set of ANSI-C bash escapes. If the input is malformed (e.g. \u with fewer than four hex digits), the page surfaces an inline error pointing at the offending position.
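Outside the page, two of these decode shapes are easy to reproduce with the Python standard library. The wrapper names below are hypothetical; `json.loads`, `html.unescape`, and the `pos` attribute on `json.JSONDecodeError` are the real standard-library pieces. The last function shows one way a decoder can report where malformed input went wrong.

```python
import html
import json
from typing import Optional

def decode_json_literal(literal: str) -> str:
    """Decode a quoted JSON string literal, including surrogate pairs."""
    return json.loads(literal)

def decode_html(text: str) -> str:
    """Decode named entities and numeric character references."""
    return html.unescape(text)

def malformed_position(literal: str) -> Optional[int]:
    """Return the error offset for a bad JSON literal, or None if it parses."""
    try:
        json.loads(literal)
    except json.JSONDecodeError as err:
        return err.pos
    return None

print(decode_json_literal('"caf\\u00e9 \\ud83d\\ude00"'))
print(decode_html("&eacute; &copy; &mdash; &#39; &#x27;"))
print(malformed_position('"\\u12"'))   # \u with only two hex digits
```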

For the per-character breakdown of any decoded value — what code points and bytes it’s actually made of — cross over to the Text Encoding Inspector.