By default, this script will normalize the input text to Unicode Normalization Form C (NFC); trim leading and trailing whitespace; replace two annoying diacritics specific to the romanization of Persian (this can be disabled); remove any remaining combining diacritics (in practice, few if any should remain after NFC normalization); replace any unusual horizontal space character (including tab) with a normal space; collapse runs of multiple spaces into one; remove any space before or after a line break; collapse runs of multiple empty lines into one; and replace any non-breaking hyphen or figure dash with a normal hyphen.
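The default steps can be approximated with a handful of regular-expression passes. This is a sketch, not the script's actual source, and it rests on several assumptions: the function name `cleanText` and its `persianReplacements` option are hypothetical (the two Persian diacritics are not named above, so they are left as a caller-supplied map); "combining diacritics" is taken to mean the Combining Diacritical Marks block (U+0300–U+036F); "unusual horizontal space" is taken to mean tab plus the Unicode space separators other than U+0020; line endings are unified to `\n`; and trimming is done last so that no earlier step can leave stray whitespace at the edges.

```javascript
// Sketch of the default cleaning steps; NOT the original script's code.
// `persianReplacements` is a HYPOTHETICAL option: a map of strings to replace,
// standing in for the two Persian-romanization diacritics (pass {} to disable).
function cleanText(input, { persianReplacements = {} } = {}) {
  // Normalize to Unicode NFC and unify line endings to "\n" (assumption).
  let text = input.normalize("NFC").replace(/\r\n?/g, "\n");

  // Caller-supplied replacements; split/join avoids "$" patterns in replace().
  for (const [from, to] of Object.entries(persianReplacements)) {
    text = text.split(from).join(to);
  }

  return text
    // Remove leftover combining marks (assumed: the U+0300–U+036F block).
    .replace(/[\u0300-\u036f]/g, "")
    // Tab and Unicode space separators other than U+0020 become a normal space.
    .replace(/[\t\u00a0\u1680\u2000-\u200a\u202f\u205f\u3000]/g, " ")
    // Collapse runs of spaces into one.
    .replace(/ {2,}/g, " ")
    // Remove spaces immediately before or after a line break.
    .replace(/ *\n */g, "\n")
    // Collapse two or more consecutive empty lines into a single empty line.
    .replace(/\n{3,}/g, "\n\n")
    // Non-breaking hyphen (U+2011) and figure dash (U+2012) become "-".
    .replace(/[\u2011\u2012]/g, "-")
    // Trim last, so no earlier step can leave whitespace at the edges.
    .trim();
}
```

For example, `cleanText("q\u0301")` returns `"q"`, because NFC has no precomposed q-with-acute and the leftover combining accent is then stripped, whereas `cleanText("e\u0301")` returns the single precomposed `"é"`.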
With “extras” enabled, the script will also replace single and double curly quotes with their straight equivalents, and replace each en or em dash with two or three normal hyphens, respectively.
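The “extras” step amounts to four character-class replacements. A minimal sketch, with the function name `applyExtras` as a hypothetical; it assumes “curly quotes” means only U+2018/U+2019 and U+201C/U+201D, so other typographic quotes such as „ or « » are left untouched.

```javascript
// Hypothetical sketch of the "extras" replacements.
function applyExtras(text) {
  return text
    // Left/right single quotes (U+2018, U+2019); U+2019 is also the curly apostrophe.
    .replace(/[\u2018\u2019]/g, "'")
    // Left/right double quotes (U+201C, U+201D).
    .replace(/[\u201c\u201d]/g, '"')
    // Em dash (U+2014) becomes three hyphens.
    .replace(/\u2014/g, "---")
    // En dash (U+2013) becomes two hyphens.
    .replace(/\u2013/g, "--");
}
```

Because the em and en dashes are distinct code points, the order of the last two replacements does not matter.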
There is also an option to convert all letters in the input text to lowercase. (The JavaScript method in question will attempt to do this in a locale-sensitive manner.)
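Assuming the method referred to is `String.prototype.toLocaleLowerCase` (the standard locale-aware lowercasing method in JavaScript; the text above does not name it), the option reduces to a single call. The wrapper name `lowercaseOutput` is hypothetical. Locale sensitivity matters mainly for a few languages such as Turkish, where dotted and dotless i are distinct letters.

```javascript
// Hypothetical wrapper around the standard locale-aware lowercasing method.
// With `locale` undefined, the host's default locale is used.
function lowercaseOutput(text, locale) {
  return text.toLocaleLowerCase(locale);
}

// Turkish: capital dotted İ (U+0130) lowercases to a plain "i" under "tr",
// whereas locale-independent toLowerCase() yields "i" plus a combining dot.
const turkish = lowercaseOutput("\u0130", "tr");
```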
Naïve counts of characters, words, and paragraphs (a paragraph being any block separated by an empty line) are shown for the cleaned output text.
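“Naïve” presumably means simple splitting rather than linguistic word segmentation. One plausible sketch, with the function name `naiveCounts` and the exact counting rules as assumptions: characters are counted as Unicode code points (so an astral character such as 𝑥 counts once, though a multi-code-point emoji sequence still counts as several), words are runs of non-whitespace, and paragraphs are blocks separated by one or more empty lines.

```javascript
// Hypothetical counting helper; the counting rules are assumptions.
function naiveCounts(text) {
  const trimmed = text.trim();
  return {
    // Spreading a string iterates by code point, not by UTF-16 code unit.
    characters: [...text].length,
    // Any run of whitespace separates words.
    words: trimmed === "" ? 0 : trimmed.split(/\s+/).length,
    // Paragraphs are separated by at least one empty line.
    paragraphs: trimmed === "" ? 0 : trimmed.split(/\n\s*\n/).length,
  };
}
```

For instance, `naiveCounts("Hello world.\n\nSecond paragraph here.")` reports 36 characters, 5 words, and 2 paragraphs.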
User input is persisted to browser localStorage. Don’t worry—there’s no backend.
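Persisting input needs only the Web Storage API's `getItem` and `setItem`. A minimal sketch, assuming a hypothetical storage key `text-cleaner-input` and hypothetical helper names; the storage object is passed in, so the same code can run against `window.localStorage` in a browser or against the small in-memory stand-in defined here for testing.

```javascript
// HYPOTHETICAL key name; the real page's key is not documented.
const STORAGE_KEY = "text-cleaner-input";

function saveInput(storage, text) {
  try {
    storage.setItem(STORAGE_KEY, text);
  } catch (err) {
    // setItem can throw (e.g. QuotaExceededError); losing persistence is acceptable.
  }
}

function loadInput(storage) {
  // getItem returns null when nothing has been saved yet.
  return storage.getItem(STORAGE_KEY) ?? "";
}

// Minimal in-memory object with the same getItem/setItem shape, for use outside a browser.
function createMemoryStorage() {
  const map = new Map();
  return {
    getItem: (key) => (map.has(key) ? map.get(key) : null),
    setItem: (key, value) => { map.set(key, String(value)); },
  };
}
```

In a page, one might call `saveInput(localStorage, textarea.value)` from the textarea's `input` event listener and fill the textarea with `loadInput(localStorage)` on load (`textarea` being a hypothetical reference to the input element). Nothing in this flow leaves the browser.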
Other features may be added over time, as they occur to me. In any case, the source files are available on GitHub under the MIT License.
A plague-year diversion by Theo Beers
Last updated 5 Apr. 2022