<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jonathan Hedley &#187; javascript</title>
	<atom:link href="http://jonathanhedley.com/tag/javascript/feed" rel="self" type="application/rss+xml" />
	<link>http://jonathanhedley.com</link>
	<description>Winning at everything so that you don&#039;t have to.</description>
	<lastBuildDate>Wed, 18 Aug 2010 10:25:32 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Convert Microsoft Word to plain text</title>
		<link>http://jonathanhedley.com/articles/2008/03/convert-microsoft-word-to-plain-text</link>
		<comments>http://jonathanhedley.com/articles/2008/03/convert-microsoft-word-to-plain-text#comments</comments>
		<pubDate>Mon, 17 Mar 2008 08:05:00 +0000</pubDate>
		<dc:creator>Jonathan Hedley</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[microsoft word]]></category>
		<category><![CDATA[plain text]]></category>
		<category><![CDATA[workflow]]></category>

		<guid isPermaLink="false">http://jonathanhedley.com/articles/2008/03/convert-microsoft-word-to-plain-text</guid>
		<description><![CDATA[This script converts text from Microsoft Word into plain text. Paste in your Word text, hit clean, and get lovely clean text out.]]></description>
			<content:encoded><![CDATA[<p><em>This is a repost of an entry from 2004. This Word-cleaning functionality is showing up in more and more web editors, but people might still find this useful.</em></p>
<p>Most of the time when I&#8217;m writing content for the web (for this blog, or a forum comment, or whatever), I&#8217;ll write in Microsoft Word for the spell check and other features that aren&#8217;t in a standard <code>textarea</code> widget, and then I&#8217;ll cut and paste into the form on the site.</p>
<p>The problem is that the paste carries all of the high (non-ASCII) characters that MS Word inserts (&#8220;smart quotes&#8221; and the like) straight through to the site, and most sites aren&#8217;t set up to handle them. They expect plain (&#8220;Latin&#8221;) text.</p>
<p class="a1"><b>A solution</b>: this script converts text copied from MS Word into plain text. Paste your input into the top box, press <b>Clean</b>, and the scrubbed text will appear in the lower box.</p>
<p>(If you want to clean up Word HTML, rather than just create plain text, I suggest that you use <a href="http://infohound.net/tidy/" target="_top">HTML Tidy</a> with the &#8220;clean&#8221; and &#8220;Word 2000&#8221; boxes checked.)</p>
<div class="leftSpan">
<!-- BEGIN SNIPPET -->
<pre>
<script language="JavaScript">
var swapCodes   = new Array(8211, 8212, 8216, 8217, 8220, 8221, 8226, 8230); // decimal codes, as reported by charCodeAt()
var swapStrings = new Array("--", "--", "'",  "'",  "\"",  "\"",  "*",  "...");
function cleanWordClipboard(input) {
    // debug for new codes
    // for (var i = 0; i < input.length; i++)  alert("'" + input.charAt(i) + "': " + input.charCodeAt(i));
    var output = input;
    for (var i = 0; i < swapCodes.length; i++) {
        // build a global regex such as /\u2013/g from the hex form of each code (every code here is four hex digits)
        var swapper = new RegExp("\\u" + swapCodes[i].toString(16), "g");
        output = output.replace(swapper, swapStrings[i]);
    }
    return output;
}
</script>
<form name="wordCleaner" onsubmit="return false;">
<textarea style="width: 720px; height: 150px" id="wordInput">&bull; &ldquo;Double Quotes&rdquo;,
&bull; &lsquo;Single quotes&rsquo;,
&bull; Ellipsis &hellip;,
&bull; em-dash &mdash;</textarea>&nbsp;

<textarea style="width: 720px; height: 150px" id="wordOutput"></textarea>&nbsp;
<button onClick='document.getElementById("wordOutput").value = cleanWordClipboard(document.getElementById("wordInput").value)'>Clean</button>
</form>
</pre>
<!-- END SNIPPET -->
</div>
<p>Web developers: feel free to <span class="a2">use this code on your own forms</span> to clean your users&#8217; input (although you&#8217;d probably be better off doing it server-side). Just change the <code>onClick</code> handler to convert the text in place.</p>
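<p>As a rough illustration of the in-place variant, here is a minimal sketch. The names <code>wordReplacements</code>, <code>cleanWordText</code> and <code>cleanFieldInPlace</code> are made up for this example and are not part of the script above; the replacement table simply mirrors the same eight character codes. It uses a single regular expression with a lookup table rather than one pass per character, which is a design choice of this sketch, not a requirement.</p>

```javascript
// Hypothetical lookup table: the same eight characters the script above
// swaps, keyed by the character itself (decimal code in the comment).
var wordReplacements = {
    "\u2013": "--",  // en dash (8211)
    "\u2014": "--",  // em dash (8212)
    "\u2018": "'",   // left single quote (8216)
    "\u2019": "'",   // right single quote (8217)
    "\u201C": "\"",  // left double quote (8220)
    "\u201D": "\"",  // right double quote (8221)
    "\u2022": "*",   // bullet (8226)
    "\u2026": "..."  // ellipsis (8230)
};

// Replace every listed character in one pass; other text is left untouched.
function cleanWordText(input) {
    return input.replace(/[\u2013\u2014\u2018\u2019\u201C\u201D\u2022\u2026]/g, function (ch) {
        return wordReplacements[ch];
    });
}

// Assumes `field` is anything with a string `value` property,
// such as a textarea element. Overwrites the value and returns the field.
function cleanFieldInPlace(field) {
    field.value = cleanWordText(field.value);
    return field;
}
```

<p>On a form, something like <code>onblur="cleanFieldInPlace(this)"</code> on the <code>textarea</code> should clean the text as soon as the user leaves the field. The same <code>cleanWordText</code> function has no browser dependencies, so it could also run server-side if your server happens to run JavaScript.</p>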
<p><script language="Javascript">document.getElementById("wordOutput").value = cleanWordClipboard(document.getElementById("wordInput").value)</script></p>
<div class="rhs">
<div class="a1">
<p>Obviously, after this step you'll want to convert those now clean but dumb quotes into smart HTML quotes; for that, use <a href="http://daringfireball.net/projects/smartypants/">SmartyPants</a>.</p>
<p>Unless, of course, you think that <a href="http://www.metafilter.com/69543/looking-for-some-dumb-quotes">smart quotes are pretentious</a>. In that case, you're all set!</p>
</div>
<p class="a2"><a href="http://static.jonathanhedley.com/2008/03/word-to-plain-text.js" title="Word to plain text Javascript">Download the source</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jonathanhedley.com/articles/2008/03/convert-microsoft-word-to-plain-text/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
