File Encoding Converter

How to Use This Tool

Upload a Text File

Click Upload File or drag and drop a text file onto the input area. 35+ text-based extensions are accepted (.txt, .csv, .tsv, .log, .json, .xml, .yaml, .md, source code like .js / .ts / .py / .java / .c / .cpp, web files like .html / .css / .php, subtitle files like .srt / .ass / .vtt, and more). Binary files are rejected. Maximum file size is 50 MB.

Verify the Detected Encoding

The source encoding is auto-detected on upload using a multi-strategy pipeline (BOM > null-byte UTF-16 sniff > UTF-8 validation > CJK byte-range scoring > frequency heuristics for Cyrillic/Greek/Arabic/Hebrew/Central European > ISO-8859-15/Windows-1252 fallback). The detected encoding, confidence level (High / Medium / Low with reason), BOM presence, and file size appear in the detection panel. The source preview shows the first 500 characters so you can verify the result visually - if the text looks garbled, change the Source Encoding dropdown and click Refresh preview.

Choose Target Encoding & BOM Mode

Pick the target encoding from the dropdown (any of the 30 supported encodings). When the target is UTF-8 or UTF-16, a BOM control appears with three modes: Auto (UTF-16 gets BOM, UTF-8 does not), Include (force BOM), or Strip (remove BOM). The Output Preview panel updates live as you change source, target, or BOM - no extra click required.

Convert & Download

Click Convert & Download to write the converted bytes to a file with the original filename and save it to your device. Click Change File to load a different file, or Clear All to reset every field and start fresh.

Frequently Asked Questions

Is my file uploaded to a server?

No. Your file is never sent to our servers - all encoding detection, conversion, BOM handling, and preview rendering happen directly in your browser on your device using JavaScript and the browser-native TextDecoder / TextEncoder APIs. There are no network requests at all.

How many encodings are supported?

30 encodings across every major writing system: Unicode (UTF-8, UTF-16 LE, UTF-16 BE), ASCII, Western European (ISO-8859-1, ISO-8859-15, Windows-1252), Central European (ISO-8859-2, Windows-1250), South European (ISO-8859-3), North European (ISO-8859-4), Cyrillic (ISO-8859-5, Windows-1251, KOI8-R, IBM866), Arabic (ISO-8859-6, Windows-1256), Greek (ISO-8859-7, Windows-1253), Hebrew (ISO-8859-8, Windows-1255), Turkish (Windows-1254), Baltic (Windows-1257), Japanese (Shift_JIS, EUC-JP, ISO-2022-JP), Korean (EUC-KR), Simplified Chinese (GB2312, GBK), and Traditional Chinese (Big5). Both reading (decoding) and writing (encoding) are fully supported for every one of them.

How does auto-detection work?

The detector runs a hand-written priority pipeline: BOM detection first (highest confidence), then null-byte sniffing for UTF-16 LE / BE, then UTF-8 multi-byte sequence validation, then ASCII detection, then ISO-2022-JP escape-sequence scan, then byte-range scoring for the five CJK encodings (Shift_JIS / EUC-JP / EUC-KR / GBK / Big5), then frequency heuristics for Cyrillic (KOI8-R vs Windows-1251 vs ISO-8859-5), Greek (Windows-1253), Arabic (Windows-1256), Hebrew (Windows-1255), and Central European (Windows-1250), and finally an ISO-8859-15 vs Windows-1252 fallback based on the presence of the euro sign byte.
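As an illustration of the null-byte sniffing step, the sketch below guesses UTF-16 byte order from where the zero bytes fall in a BOM-less file. The function name `sniffUtf16`, the 30% threshold, and the rule that the opposite byte position must contain no zeros are all assumptions made for this example; the tool's actual thresholds are not documented here.

```typescript
// Hypothetical sketch: guess UTF-16 byte order from the position of zero bytes.
// Mostly-Latin UTF-16 LE text has zeros at odd offsets (high byte comes second);
// UTF-16 BE has them at even offsets (high byte comes first).
function sniffUtf16(bytes: Uint8Array, minRatio = 0.3): "utf-16le" | "utf-16be" | null {
  const pairs = Math.floor(bytes.length / 2);
  if (pairs === 0) return null;

  let zeroEven = 0; // zeros at offsets 0, 2, 4, ...
  let zeroOdd = 0;  // zeros at offsets 1, 3, 5, ...
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    if (bytes[i] === 0) zeroEven++;
    if (bytes[i + 1] === 0) zeroOdd++;
  }

  // Assumed rule: enough zeros on one side and none on the other.
  if (zeroOdd / pairs >= minRatio && zeroEven === 0) return "utf-16le";
  if (zeroEven / pairs >= minRatio && zeroOdd === 0) return "utf-16be";
  return null;
}
```

Text that is mostly non-Latin (for example CJK in UTF-16) contains few zero bytes, so a check like this returns null for it and the later pipeline stages have to decide.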

Why is the confidence sometimes "Low"?

Confidence is High when a Byte Order Mark is present (definitive proof of encoding), Medium when valid UTF-8 multi-byte sequences are found (almost always correct), and Low when only single-byte heuristics matched. Single-byte encodings like ISO-8859-x and Windows-125x have overlapping byte ranges, so a small file in one of them is genuinely ambiguous - check the source preview and override the source encoding manually if it looks wrong.

Why do I see "?" characters in the converted output?

Those are unmappable characters - your text contains a character that exists in the source encoding but has no equivalent in the target encoding. For example, converting a UTF-8 file containing emoji to ISO-8859-1 will replace the emoji with "?" because ISO-8859-1 only has 256 code points. To preserve every character, pick a Unicode target like UTF-8 or UTF-16.
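The "?" substitution can be pictured with a small sketch. It builds a character-to-byte map by decoding each of the 256 possible byte values with TextDecoder, then encodes text through that map and falls back to 0x3F ("?") for anything missing. The helper names `buildReverseMap` and `encodeSingleByte` are made up for this example, and it assumes a runtime whose TextDecoder supports the requested legacy label (browsers do; Node.js needs full ICU data).

```typescript
// Hypothetical sketch: reverse map for a single-byte encoding, built via TextDecoder.
function buildReverseMap(label: string): Map<string, number> {
  const decoder = new TextDecoder(label);
  const map = new Map<string, number>();
  for (let byte = 0; byte < 256; byte++) {
    const ch = decoder.decode(new Uint8Array([byte]));
    // Skip bytes that decode to the replacement character (unassigned in this encoding).
    if (ch.length === 1 && ch !== "\ufffd" && !map.has(ch)) {
      map.set(ch, byte);
    }
  }
  return map;
}

function encodeSingleByte(text: string, label: string): Uint8Array {
  const map = buildReverseMap(label);
  const out: number[] = [];
  // Array.from iterates by code point, so one emoji becomes one "?" rather than two.
  for (const ch of Array.from(text)) {
    out.push(map.get(ch) ?? 0x3f); // 0x3F is "?"
  }
  return Uint8Array.from(out);
}
```

For instance, encoding "é€" followed by an emoji to windows-1252 with this sketch yields the bytes E9 80 3F: the first two characters exist in the target encoding, the emoji does not.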

What about the BOM (Byte Order Mark)?

When the target is UTF-8 or UTF-16, a BOM control appears with three modes. Auto includes a BOM for UTF-16 (where it indicates byte order) and omits it for UTF-8 (where it is optional and breaks some tools). Include forces a BOM into the output. Strip removes any BOM from the output. Use Strip for PHP files and shell scripts, which can break on a leading BOM, and Include for CSV files you want Excel to open as UTF-8.

What file types and sizes are accepted?

Only text-based files are accepted - 35+ extensions including .txt, .csv, .tsv, .log, source code (.js, .ts, .py, .java, .c, .cpp, .html, .css, .php, .rb, .go, .rs), data formats (.json, .xml, .yaml, .yml, .ini, .env, .toml, .conf, .properties), documents (.md, .rst, .tex, .rtf), .sql, subtitle files (.srt, .sub, .ass, .vtt), and .svg. Binary files like images, PDFs, ZIPs, and Office documents are rejected with an error. Maximum file size is 50 MB to keep the browser tab responsive.

Will my file persist between visits?

No. This tool intentionally does not use localStorage - your file lives only in browser memory while the tab is open. Closing the tab, refreshing the page, or clicking Clear All wipes everything. Nothing is ever written to disk except the file you explicitly download with Convert & Download.

Convert File Encoding Online Free - No Upload Required

Convert text file encoding online for free with this browser-based file encoding converter. Upload a text file, auto-detect its encoding using a multi-strategy pipeline, preview the source content, pick a target encoding, control the BOM, and download the converted file - all directly in your browser. Your files are never sent to our servers. No registration or software installation required.

This online encoding converter supports 30 encodings across every major writing system: Unicode (UTF-8, UTF-16 LE/BE), ASCII, Western European (ISO-8859-1, ISO-8859-15, Windows-1252), Central European (ISO-8859-2, Windows-1250), Cyrillic (KOI8-R, Windows-1251, ISO-8859-5, IBM866), Arabic, Greek, Hebrew, Turkish, Baltic, Japanese (Shift_JIS, EUC-JP, ISO-2022-JP), Korean (EUC-KR), Simplified Chinese (GB2312, GBK), and Traditional Chinese (Big5). Features include BOM detection and control (Auto / Include / Strip) for UTF-8 and UTF-16 targets, hand-written CJK byte-range detection, frequency-based Cyrillic disambiguation (KOI8-R vs Windows-1251), confidence levels with reasoning, live source and output preview, drag-and-drop upload, file type and 50 MB size validation, and a memory-only privacy model with no localStorage and no network calls.

Features Explained

30 Encodings Across Every Major Writing System

Supports UTF-8, UTF-16 LE/BE, ASCII, ISO-8859-1 / -2 / -3 / -4 / -5 / -6 / -7 / -8 / -15, Windows-1250 / 1251 / 1252 / 1253 / 1254 / 1255 / 1256 / 1257, KOI8-R, IBM866, Shift_JIS, EUC-JP, ISO-2022-JP, EUC-KR, GB2312, GBK, and Big5. Both reading (decoding) and writing (encoding) are fully supported for every one of them - this is not a UTF-8-only converter pretending to support other formats.

Multi-Strategy Auto-Detection Pipeline

The detector runs a hand-written priority pipeline: BOM detection first (highest confidence), then null-byte sniffing for UTF-16 LE / BE, then UTF-8 multi-byte sequence validation, then ASCII detection, then ISO-2022-JP escape-sequence scan, then byte-range scoring for the five CJK encodings, then frequency heuristics for Cyrillic / Greek / Arabic / Hebrew / Central European, and finally an ISO-8859-15 vs Windows-1252 fallback based on the presence of the euro sign byte.
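A priority pipeline of this kind can be sketched as an ordered list of strategies where the first one that recognizes the bytes wins, with a fallback that always answers. The sketch below wires up only two stages (BOM and the Western European fallback) to stay short. All names (`Strategy`, `detect`, `bomStrategy`, `westernFallback`) are hypothetical, and the euro-sign rule is a guess: it treats 0x80 (the euro sign in Windows-1252) as evidence for Windows-1252 and 0xA4 (the euro sign in ISO-8859-15) as evidence for ISO-8859-15.

```typescript
type Confidence = "High" | "Medium" | "Low";

interface Detection {
  encoding: string;
  confidence: Confidence;
  reason: string;
}

// A strategy returns a result when it recognizes the bytes, otherwise null.
type Strategy = (bytes: Uint8Array) => Detection | null;

const bomStrategy: Strategy = (bytes) => {
  const hit = (encoding: string): Detection => ({ encoding, confidence: "High", reason: "BOM detected" });
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return hit("utf-8");
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return hit("utf-16le");
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return hit("utf-16be");
  return null;
};

// Assumed fallback rule, not the tool's documented one.
const westernFallback = (bytes: Uint8Array): Detection => {
  const has = (b: number): boolean => bytes.indexOf(b) !== -1;
  const encoding = !has(0x80) && has(0xa4) ? "iso-8859-15" : "windows-1252";
  return { encoding, confidence: "Low", reason: "fallback heuristic" };
};

// Run strategies in priority order; the first non-null result wins.
function detect(
  bytes: Uint8Array,
  strategies: Strategy[],
  fallback: (bytes: Uint8Array) => Detection
): Detection {
  for (const strategy of strategies) {
    const result = strategy(bytes);
    if (result !== null) return result;
  }
  return fallback(bytes);
}
```

Keeping each stage as an independent function makes the ordering explicit: adding the UTF-8, ASCII, CJK, or frequency stages would mean inserting more entries into the `strategies` array in priority order.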

BOM Detection (Highest Confidence)

Files starting with the UTF-8 BOM (EF BB BF), the UTF-16 LE BOM (FF FE), or the UTF-16 BE BOM (FE FF) are detected with High confidence from the first two or three bytes alone. The detection panel shows "BOM detected" so you know the result is definitive.
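A BOM check of this kind is only a few byte comparisons. The following is a minimal sketch, with `detectBom` and `BomMatch` as illustrative names rather than the tool's own identifiers:

```typescript
type BomEncoding = "utf-8" | "utf-16le" | "utf-16be";

interface BomMatch {
  encoding: BomEncoding;
  bomLength: number; // number of bytes the BOM occupies
}

// Returns the encoding implied by a leading BOM, or null if none is present.
function detectBom(bytes: Uint8Array): BomMatch | null {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return { encoding: "utf-8", bomLength: 3 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return { encoding: "utf-16le", bomLength: 2 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return { encoding: "utf-16be", bomLength: 2 };
  }
  return null;
}
```

Returning the BOM length alongside the encoding lets a caller skip those bytes before decoding the rest of the file.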

UTF-8 Multi-Byte Validation

Files without a BOM are scanned byte by byte and validated against the UTF-8 specification: a 2-byte sequence must start with 110xxxxx followed by one continuation byte (10xxxxxx), a 3-byte sequence with 1110xxxx followed by two, and a 4-byte sequence with 11110xxx followed by three. If every multi-byte sequence is well-formed and at least one exists, the file is reported as UTF-8 with Medium confidence.
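The structural check described above can be sketched as follows. `scanUtf8` and `looksLikeUtf8` are hypothetical names, and the sketch checks only the lead-byte and continuation-byte shapes; a stricter validator would also reject overlong forms, surrogate code points, and values above U+10FFFF.

```typescript
interface Utf8Scan {
  wellFormed: boolean;    // every sequence had the expected shape
  multiByteCount: number; // number of multi-byte sequences seen
}

function scanUtf8(bytes: Uint8Array): Utf8Scan {
  let multiByteCount = 0;
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    let continuation: number;
    if (b < 0x80) continuation = 0;                 // 0xxxxxxx: ASCII
    else if ((b & 0xe0) === 0xc0) continuation = 1; // 110xxxxx
    else if ((b & 0xf0) === 0xe0) continuation = 2; // 1110xxxx
    else if ((b & 0xf8) === 0xf0) continuation = 3; // 11110xxx
    else return { wellFormed: false, multiByteCount }; // stray continuation or invalid lead

    // Each continuation byte must exist and match 10xxxxxx.
    for (let k = 1; k <= continuation; k++) {
      if (i + k >= bytes.length || (bytes[i + k] & 0xc0) !== 0x80) {
        return { wellFormed: false, multiByteCount };
      }
    }
    if (continuation > 0) multiByteCount++;
    i += continuation + 1;
  }
  return { wellFormed: true, multiByteCount };
}

// Mirrors the rule in the text: well-formed AND at least one multi-byte sequence.
function looksLikeUtf8(bytes: Uint8Array): boolean {
  const scan = scanUtf8(bytes);
  return scan.wellFormed && scan.multiByteCount > 0;
}
```

Requiring at least one multi-byte sequence is what separates this stage from plain ASCII detection: a pure-ASCII file is well-formed UTF-8 but gives no positive evidence for it.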

CJK Encoding Disambiguation

Five different CJK encodings (Shift_JIS, EUC-JP, EUC-KR, GBK / GB2312, Big5) are scored by counting valid lead-byte / trail-byte pairs in the file. The encoding with the highest score wins, provided the score crosses a minimum threshold to filter noise. ISO-2022-JP is detected separately by scanning for its escape sequences, which start with ESC followed by "$" or "(".
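To make the scoring idea concrete, here is a reduced sketch that compares only two of the five encodings, Shift_JIS and EUC-JP. The byte ranges are the commonly published two-byte ranges for each encoding; the scoring rule (+1 per valid pair, -1 per invalid high byte), the threshold of 2, and the names `PairRule`, `scorePairs`, and `bestCjkGuess` are assumptions for illustration. EUC-JP's 0x8E and 0x8F prefixed sequences are deliberately ignored to keep the sketch short.

```typescript
interface PairRule {
  name: string;
  isLead: (b: number) => boolean;
  isTrail: (b: number) => boolean;
  isSingle: (b: number) => boolean; // non-ASCII bytes that are valid on their own
}

const inRange = (b: number, lo: number, hi: number): boolean => b >= lo && b <= hi;

const RULES: PairRule[] = [
  {
    name: "shift_jis",
    isLead: (b) => inRange(b, 0x81, 0x9f) || inRange(b, 0xe0, 0xfc),
    isTrail: (b) => inRange(b, 0x40, 0x7e) || inRange(b, 0x80, 0xfc),
    isSingle: (b) => inRange(b, 0xa1, 0xdf), // half-width katakana
  },
  {
    name: "euc-jp",
    isLead: (b) => inRange(b, 0xa1, 0xfe),
    isTrail: (b) => inRange(b, 0xa1, 0xfe),
    isSingle: () => false,
  },
];

function scorePairs(bytes: Uint8Array, rule: PairRule): number {
  let score = 0;
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    if (b < 0x80 || rule.isSingle(b)) {
      i += 1; // ASCII or a valid single byte: neutral
    } else if (rule.isLead(b) && i + 1 < bytes.length && rule.isTrail(bytes[i + 1])) {
      score += 1; // valid lead/trail pair
      i += 2;
    } else {
      score -= 1; // high byte that does not fit this encoding
      i += 1;
    }
  }
  return score;
}

// Highest score wins, provided it reaches the (assumed) minimum threshold.
function bestCjkGuess(bytes: Uint8Array, minScore = 2): string | null {
  let best: string | null = null;
  let bestScore = minScore - 1;
  for (const rule of RULES) {
    const score = scorePairs(bytes, rule);
    if (score > bestScore) {
      best = rule.name;
      bestScore = score;
    }
  }
  return best;
}
```

The byte ranges of these encodings overlap heavily, which is why pair counting alone can tie or mislead on short inputs and why a threshold is needed at all.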

Cyrillic / Greek / Arabic / Hebrew Frequency Analysis

For non-CJK files, byte-frequency heuristics distinguish KOI8-R from Windows-1251 (the two place Cyrillic letters in different byte ranges) and pick up ISO-8859-5 Cyrillic, Windows-1253 Greek, Windows-1256 Arabic, and Windows-1255 Hebrew based on the dominant byte ranges typical of each script.
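One well-known way to separate KOI8-R from Windows-1251 relies on letter case: Windows-1251 puts lowercase Cyrillic in 0xE0-0xFF and uppercase in 0xC0-0xDF, while KOI8-R does the opposite. Since running text is mostly lowercase, the busier range hints at the encoding. The sketch below applies that idea; it is an assumed illustration (including the name `guessCyrillic`), not necessarily the exact heuristic this tool uses.

```typescript
// Hypothetical sketch: compare how many bytes fall in each Cyrillic letter block.
function guessCyrillic(bytes: Uint8Array): "windows-1251" | "koi8-r" | null {
  let highBlock = 0; // 0xE0-0xFF: lowercase in Windows-1251, uppercase in KOI8-R
  let lowBlock = 0;  // 0xC0-0xDF: uppercase in Windows-1251, lowercase in KOI8-R
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b >= 0xe0) highBlock++;
    else if (b >= 0xc0) lowBlock++;
  }
  // No evidence, or a tie: leave the decision to other heuristics.
  if (highBlock === lowBlock) return null;
  return highBlock > lowBlock ? "windows-1251" : "koi8-r";
}
```

Text written entirely in capitals would fool this check, which is one reason such results are best treated as Low-confidence guesses.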

Detection Confidence Levels

Three levels with explicit reasoning: High ("BOM detected") for files with a Byte Order Mark, Medium ("valid UTF-8 byte sequences") for valid UTF-8 without BOM, and Low ("fallback heuristic") for single-byte encodings where the result is a best-effort guess. The confidence label appears in the detection details panel so you know how much to trust the result.

Detection Details Panel

After upload (or after clicking Detect Encoding manually), a panel shows the detected encoding name, confidence level with reason, BOM presence (yes / no), and exact file size. Useful when you just want to identify an unknown file's encoding without converting it.

True Target-Encoding Output

Single-byte encodings use a reverse character-to-byte map built from the browser's TextDecoder. Multi-byte CJK encodings use two-byte reverse lookup tables seeded by iterating valid lead/trail byte pairs through TextDecoder. UTF-16 LE/BE are written byte by byte with manual surrogate-pair handling. Unmappable characters (e.g. emoji in ISO-8859-1) are replaced with '?'.
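As an illustration of the UTF-16 path, the sketch below writes each code unit byte by byte and splits code points above U+FFFF into a surrogate pair by hand. `encodeUtf16` and its parameters are hypothetical names; the surrogate arithmetic itself is the standard UTF-16 formula.

```typescript
// Hypothetical sketch: encode a string to UTF-16 LE or BE, optionally with a BOM.
function encodeUtf16(text: string, littleEndian: boolean, withBom: boolean): Uint8Array {
  const out: number[] = [];

  // Write one 16-bit code unit in the requested byte order.
  const pushUnit = (unit: number): void => {
    const hi = (unit >> 8) & 0xff;
    const lo = unit & 0xff;
    if (littleEndian) out.push(lo, hi);
    else out.push(hi, lo);
  };

  if (withBom) pushUnit(0xfeff); // becomes FF FE (LE) or FE FF (BE)

  // Array.from iterates by code point, so astral characters arrive whole.
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0) ?? 0xfffd;
    if (cp >= 0x10000) {
      const v = cp - 0x10000;
      pushUnit(0xd800 + (v >> 10));   // high surrogate
      pushUnit(0xdc00 + (v & 0x3ff)); // low surrogate
    } else {
      pushUnit(cp);
    }
  }
  return Uint8Array.from(out);
}
```

A plain number array keeps the sketch readable; for files near the 50 MB limit, a preallocated Uint8Array would likely be the more memory-friendly choice.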

BOM Control: Auto / Include / Strip

When the target is UTF-8 or UTF-16, a BOM mode toggle appears. Auto includes a BOM for UTF-16 (where it indicates byte order) and strips it for UTF-8 (where it's optional and breaks some tools). Include forces a BOM. Strip removes any BOM. Use Strip for PHP / Bash scripts that break with BOM, Include for CSV files you want Excel to open as UTF-8.

Live Source Preview (First 500 Characters)

The source area shows the first 500 characters of your file decoded with the currently selected source encoding. If the preview is garbled, change the Source Encoding dropdown and click Refresh preview to re-decode and visually verify until the text looks right.

Auto-Updating Output Preview

The output preview re-renders automatically every time you change the source encoding, target encoding, or BOM mode (via a useEffect). No extra Generate or Preview Output button - the converted text appears as soon as you tweak any setting, so you can iterate fast.

Drag & Drop Upload with Highlight

Drag a text file from your file explorer directly onto the input area. The drop zone highlights while you drag. Files are validated for type and size on drop just like clicked uploads, with clear error messages for rejected files.

File Type & Size Validation

Only text-based files are accepted - 35+ extensions are whitelisted (.txt, .csv, .tsv, .log, source code, web files, data formats, subtitles, .svg). Binary files like images, PDFs, ZIPs, and Office documents are rejected before they're read, with a clear error message naming the unsupported extension. Files over 50 MB are also rejected to keep the browser tab responsive.
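Validation like this can run on the file's name and size alone, before any bytes are read. The sketch below shows the general shape; the `ALLOWED_EXTENSIONS` set is a shortened, illustrative subset of the whitelist, `validateFile` and `FileInfo` are hypothetical names, the error wording is made up, and treating 50 MB as 50 × 1024 × 1024 bytes is an assumption.

```typescript
// Assumption: "50 MB" is interpreted as 50 * 1024 * 1024 bytes.
const MAX_BYTES = 50 * 1024 * 1024;

// Illustrative subset only; the real whitelist has 35+ entries.
const ALLOWED_EXTENSIONS = new Set<string>([
  ".txt", ".csv", ".tsv", ".log", ".json", ".xml", ".yaml", ".md", ".srt", ".svg", ".env",
]);

// Matches the name/size fields of a browser File object.
interface FileInfo {
  name: string;
  size: number;
}

// Returns an error message for rejected files, or null when the file is acceptable.
function validateFile(file: FileInfo): string | null {
  const dot = file.name.lastIndexOf(".");
  const ext = dot >= 0 ? file.name.slice(dot).toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    return `Unsupported file type: ${ext || "(no extension)"}`;
  }
  if (file.size > MAX_BYTES) {
    return "File exceeds the 50 MB limit";
  }
  return null;
}
```

Because a browser File object exposes `name` and `size` without reading its contents, a check like this rejects a large binary upload at no cost.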

Original Filename Preserved on Download

When you click Convert & Download, the converted file is saved with the original filename - not a generic "converted.txt". This means a UTF-8-to-Windows-1252 conversion of report.csv is saved as report.csv, ready to drop into the same workflow that produced the input.

Memory-Only Privacy (No localStorage)

Unlike most tools on this site, the File Encoding Converter intentionally does not persist anything to localStorage. Your file lives only in browser memory while the tab is open. Closing the tab, refreshing, or clicking Clear All wipes everything - the only output is the file you explicitly download.

Who Is This Tool For?

Software Developers

Fix encoding issues in source files, build scripts, and config files. Convert legacy Windows-1252 sources to UTF-8 in one click.

Backend Developers

Normalize CSV / JSON / XML data dumps to UTF-8 before importing into databases that expect a specific encoding.

Web Developers

Ensure HTML, CSS, and JavaScript files use the correct encoding for proper rendering across browsers and locales. Strip troublesome UTF-8 BOMs from PHP files.

Mobile App Developers

Convert localization files (.strings, .properties, .json) between encodings when integrating translations from teams using different tooling.

Localization Engineers

Receive client files in regional encodings (Shift_JIS, GB2312, Windows-1251) and convert them to UTF-8 for modern translation memory tools.

i18n Engineers

Audit and normalize string catalogs across languages and platforms - especially when consolidating legacy CJK files into a single Unicode workflow.

Translators

Open files from agencies and clients that arrive in legacy encodings without garbled text. Convert deliverables back to whatever encoding the client requested.

Subtitle & Caption Editors

Convert .srt, .ass, .ssa, .sub, and .vtt subtitle files between encodings to fix character display issues in media players that expect specific encodings.

Manga & Anime Fan-Translators

Convert Japanese script files between Shift_JIS, EUC-JP, ISO-2022-JP, and UTF-8 when working across editor tools, scanlation pipelines, and online repositories.

Game Developers

Convert CJK asset files, level data, and dialogue scripts between regional encodings and UTF-8 when porting games or merging contributions from international teams.

Data Engineers

Clean up CSV / TSV exports from legacy systems before feeding them into ETL pipelines that assume UTF-8. Spot mojibake before it pollutes your warehouse.

ETL Developers

Pre-process source files arriving in mixed encodings into a single Unicode standard before staging into your target database.

Database Administrators

Convert SQL dumps and seed files between encodings when migrating databases with different default character sets (latin1 -> utf8mb4 and similar).

Data Analysts

Fix CSV files where accented characters or non-Latin scripts show as garbled symbols in Excel, Google Sheets, or pandas.

System Administrators

Convert log files, config files, and shell scripts between encodings when migrating between Windows, Linux, and macOS systems.

DevOps Engineers

Strip UTF-8 BOMs from YAML / TOML / Dockerfile / .env files that some parsers (Bash, older Python) refuse to handle correctly.

Security Researchers

Decode suspect files from samples and exfil dumps that arrive in unknown encodings - the detection details panel identifies the encoding even when the file extension lies.

Forensic Analysts

Identify the original encoding of recovered text files from non-English systems before extracting evidence into a UTF-8 case file.

Tech Support Staff

Help customers whose files arrive with mojibake by converting their attachments to a known-good encoding before opening them.

Archivists & Digital Preservationists

Migrate historical text collections from legacy single-byte encodings to UTF-8 for long-term preservation and modern access.

Librarians (Digital Collections)

Normalize bibliographic and metadata files (MARC, plain-text catalogs) to UTF-8 for inclusion in modern library systems.

Legacy System Maintainers

Convert COBOL data files, EBCDIC-style exports (after upstream conversion), and mainframe text dumps into UTF-8 for analysis on modern tooling.

Open Source Maintainers

Reproduce contributor bug reports involving mojibake by converting their attached files into the encoding their build expects.

Students & Researchers

Open non-English text corpora, historical documents, and academic datasets in the right encoding instead of staring at boxes and question marks.

Tips for Encoding Conversion

Check the source preview first

If the source preview shows garbled text (mojibake), the auto-detected encoding is wrong. Pick a different Source Encoding and click Refresh preview until the text looks correct, then proceed with conversion.

Trust High-confidence BOM detection

When the detection panel says "High (BOM detected)", the encoding is definitive - the file literally starts with bytes that identify its encoding. Never override that without a very good reason.

UTF-8 is usually the right target

UTF-8 supports every Unicode character and is the default for the modern web, GitHub, JSON, XML, and almost every modern tool. Convert to UTF-8 unless you have a specific reason to target a legacy encoding.

Strip the UTF-8 BOM for code files

PHP, Bash, older Python, and many shell tools break with a UTF-8 BOM. Pick UTF-8 as the target and set BOM mode to Strip for source code, .env files, YAML, JSON, and shell scripts.

Include a UTF-8 BOM for Excel CSVs

When a CSV is opened directly (for example by double-clicking it), Microsoft Excel reliably reads it as UTF-8 only if it starts with a BOM. For CSV files you intend to open in Excel, pick UTF-8 as the target and set BOM mode to Include.

CJK files need the right family pick

Japanese files might be Shift_JIS, EUC-JP, or ISO-2022-JP. Chinese files might be GB2312, GBK, or Big5. The auto-detector is good but not perfect - if the source preview is garbled, manually try each CJK encoding from the dropdown.

Watch for "?" in the output preview

If the converted output shows "?" characters, your text contains characters the target encoding can't represent. Switch to a Unicode target (UTF-8 or UTF-16) to preserve everything, or accept the lossy conversion if the target is required.

Small files are harder to detect

Auto-detection needs enough bytes to find statistical patterns. For files under ~100 bytes, confidence drops to Low and the result may be a fallback guess. Verify the source preview manually for tiny files.

Use the detection panel as a standalone tool

If you only need to identify a file's encoding (not convert it), upload the file and read the detection panel - the encoding name, confidence, BOM presence, and file size are everything you need to label or document a mystery file.

Live output preview means no extra clicks

Change the Source Encoding, Target Encoding, or BOM mode and the output preview re-renders instantly - no Generate button. Iterate fast until the converted text looks correct, then click Convert & Download.

The downloaded file keeps your filename

Convert & Download saves the converted bytes under the original filename, so converting report.csv to UTF-8 gives you a file that is still named report.csv. No need to rename anything.

Memory-only - close the tab to wipe

This tool intentionally has no localStorage. Your file lives only in browser memory while the tab is open. Closing the tab, refreshing, or clicking Clear All removes everything - nothing is ever saved to disk except the file you explicitly download.

Supported Input Formats

The tool accepts text-based files only. Binary files like images, PDFs, ZIPs, and Office documents are rejected before being read. The maximum file size is 50 MB. Once uploaded, the file's encoding is auto-detected and you can convert between any of the 30 supported encodings.

Accepted File Extensions

Category	Extensions
Plain text & docs	.txt, .md, .rst, .tex, .rtf, .log
Tabular data	.csv, .tsv
Data formats	.json, .xml, .yaml, .yml, .ini, .env, .toml, .conf, .properties
Source code	.js, .ts, .py, .java, .c, .cpp, .php, .rb, .go, .rs, .sql
Web	.html, .htm, .css, .svg
Subtitles	.srt, .sub, .ass, .vtt

Supported Encodings (30 total)

Region / Script	Encodings	Auto-Detected
Unicode	utf-8, utf-16le, utf-16be	Yes (BOM + heuristics)
ASCII	ascii	Yes (when all bytes < 128)
Western European	iso-8859-1, iso-8859-15, windows-1252	Yes (fallback heuristic)
Central European	iso-8859-2, windows-1250	Yes (frequency heuristic)
South / North European	iso-8859-3, iso-8859-4	No (manual select)
Cyrillic	iso-8859-5, windows-1251, koi8-r, ibm866	Yes (frequency disambiguation)
Arabic	iso-8859-6, windows-1256	Yes (frequency heuristic)
Greek	iso-8859-7, windows-1253	Yes (frequency heuristic)
Hebrew	iso-8859-8, windows-1255	Yes (frequency heuristic)
Turkish	windows-1254	No (manual select)
Baltic	windows-1257	No (manual select)
Japanese	shift_jis, euc-jp, iso-2022-jp	Yes (CJK byte-range scoring + ESC scan)
Korean	euc-kr	Yes (CJK byte-range scoring)
Simplified Chinese	gb2312, gbk	Yes (CJK byte-range scoring)
Traditional Chinese	big5	Yes (CJK byte-range scoring)

Encodings marked "No (manual select)" are fully supported for both reading and writing - they are simply not part of the auto-detection pipeline because their byte ranges overlap too heavily with similar encodings. Pick them manually from the Source Encoding dropdown when you know the file's encoding.

BOM Modes (UTF-8 / UTF-16 Targets Only)

Mode	UTF-8 Target	UTF-16 LE / BE Target
Auto (default)	No BOM (modern web standard)	BOM included (byte order indicator)
Include	EF BB BF prepended	FF FE (LE) or FE FF (BE) prepended
Strip	Any existing BOM removed	Any existing BOM removed
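The table above can be read as a small decision function applied to the already-encoded bytes. The sketch below is one way to express it, with `applyBomMode`, `BomMode`, and `UnicodeTarget` as hypothetical names: it first drops any BOM already at the front, then prepends one if the mode calls for it.

```typescript
type BomMode = "auto" | "include" | "strip";
type UnicodeTarget = "utf-8" | "utf-16le" | "utf-16be";

const BOMS: Record<UnicodeTarget, number[]> = {
  "utf-8": [0xef, 0xbb, 0xbf],
  "utf-16le": [0xff, 0xfe],
  "utf-16be": [0xfe, 0xff],
};

function startsWithBytes(bytes: Uint8Array, prefix: number[]): boolean {
  return bytes.length >= prefix.length && prefix.every((b, i) => bytes[i] === b);
}

function applyBomMode(encoded: Uint8Array, target: UnicodeTarget, mode: BomMode): Uint8Array {
  const bom = BOMS[target];
  // Normalize first: remove an existing BOM so it is never doubled.
  const body = startsWithBytes(encoded, bom) ? encoded.subarray(bom.length) : encoded;

  // Auto: BOM for UTF-16, none for UTF-8. Include: always. Strip: never.
  const wantBom = mode === "include" || (mode === "auto" && target !== "utf-8");
  if (!wantBom) return body;

  const out = new Uint8Array(bom.length + body.length);
  out.set(bom, 0);
  out.set(body, bom.length);
  return out;
}
```

Stripping before deciding means Include is idempotent: running it on output that already has a BOM does not add a second one.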

Privacy & Security

This free file encoding converter runs entirely in your browser. Your file, the detected encoding, the source preview, the output preview, and the converted bytes are never sent to our servers - all reading, detection, conversion, and BOM handling happen on your device using JavaScript and the browser-native TextDecoder / TextEncoder APIs. There are no network requests at all.

Memory-only privacy model: unlike most tools on this site, the File Encoding Converter intentionally does not use localStorage. Your file lives only in browser memory while the tab is open. Closing the tab, refreshing the page, or clicking Clear All wipes everything immediately. The only output that ever leaves the browser is the file you explicitly choose to save with Convert & Download. We have no logs, no analytics, no tracking, and no database.