<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.office.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Manipulating and Massaging Data in Excel</title><link>http://blogs.office.com/b/microsoft-excel/archive/2007/11/12/manipulating-and-massaging-data-in-excel.aspx</link><description>Today’s author: Chad Rothschiller, a program manager on the Excel team. Chad is going to discuss using formulas to 'clean up' data in Excel. Overview Excel is a great tool to use when you need to take data in one format, manipulate it into another format</description><dc:language>en-US</dc:language><generator>Telligent Community 1.5.134.15456 (Build: 5.5.134.15456)</generator><item><title>re: Manipulating and Massaging Data in Excel</title><link>http://blogs.office.com/b/microsoft-excel/archive/2007/11/12/manipulating-and-massaging-data-in-excel.aspx#6085</link><pubDate>Wed, 12 Dec 2007 12:48:56 GMT</pubDate><guid isPermaLink="false">53587256-c606-4c9b-bad4-97c86b12ce62:6085</guid><dc:creator>TechieBird</dc:creator><description>&lt;p&gt;David/Chad&lt;/p&gt;
&lt;p&gt;Great post - these are techniques I use all the time, and I frequently get asked to help colleagues &amp;#39;clean&amp;#39; their data. &amp;nbsp;Nice to see that someone has had time to put all this information together so now I have a place I can send people when I don&amp;#39;t have time to help!&lt;/p&gt;
&lt;p&gt;-TB &amp;nbsp;8&amp;gt;&lt;/p&gt;
&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.office.com/aggbug.aspx?PostID=6085" width="1" height="1"&gt;</description></item><item><title>re: Manipulating and Massaging Data in Excel</title><link>http://blogs.office.com/b/microsoft-excel/archive/2007/11/12/manipulating-and-massaging-data-in-excel.aspx#6086</link><pubDate>Wed, 05 Dec 2007 23:01:28 GMT</pubDate><guid isPermaLink="false">53587256-c606-4c9b-bad4-97c86b12ce62:6086</guid><dc:creator>Chad Rothschiller</dc:creator><description>&lt;p&gt;Charles - could you send an example of each type in a workbook? Use &lt;a rel="nofollow" target="_new" href="http://blogs.msdn.com/excel/contact.aspx"&gt;blogs.msdn.com/.../contact.aspx&lt;/a&gt; to start an email conversation and I can take a look.&lt;/p&gt;
&lt;p&gt;--Chad&lt;/p&gt;
&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.office.com/aggbug.aspx?PostID=6086" width="1" height="1"&gt;</description></item><item><title>re: Manipulating and Massaging Data in Excel</title><link>http://blogs.office.com/b/microsoft-excel/archive/2007/11/12/manipulating-and-massaging-data-in-excel.aspx#6087</link><pubDate>Wed, 05 Dec 2007 22:41:02 GMT</pubDate><guid isPermaLink="false">53587256-c606-4c9b-bad4-97c86b12ce62:6087</guid><dc:creator>Charles</dc:creator><description>&lt;p&gt;I have a spreadsheet of email addresses that all appear to have the @ sign in them - but when some of them reach their destination they @ sign has been replaced with the HTML code %40. &amp;nbsp;There is no visual difference between those that appear this way and those that don&amp;#39;t.&lt;/p&gt;
&lt;p&gt;Is there any way to clean this?&lt;/p&gt;
&lt;p&gt;Thanks!&lt;/p&gt;
&lt;p&gt;Charles&lt;/p&gt;
&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.office.com/aggbug.aspx?PostID=6087" width="1" height="1"&gt;</description></item><item><title>re: Manipulating and Massaging Data in Excel</title><link>http://blogs.office.com/b/microsoft-excel/archive/2007/11/12/manipulating-and-massaging-data-in-excel.aspx#6088</link><pubDate>Wed, 05 Dec 2007 19:30:19 GMT</pubDate><guid isPermaLink="false">53587256-c606-4c9b-bad4-97c86b12ce62:6088</guid><dc:creator>David Gainer</dc:creator><description>&lt;p&gt;msn999 - this is a known issue that we are hoping to fix in the upcoming SP1.&lt;/p&gt;
&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.office.com/aggbug.aspx?PostID=6088" width="1" height="1"&gt;</description></item><item><title>re: Manipulating and Massaging Data in Excel</title><link>http://blogs.office.com/b/microsoft-excel/archive/2007/11/12/manipulating-and-massaging-data-in-excel.aspx#6089</link><pubDate>Tue, 04 Dec 2007 19:05:34 GMT</pubDate><guid isPermaLink="false">53587256-c606-4c9b-bad4-97c86b12ce62:6089</guid><dc:creator>msn999</dc:creator><description>&lt;p&gt;Sorry, newbees and don&amp;#39;t know where to blog.&lt;/p&gt;
&lt;p&gt;when I plot my data as an exponential curve in Excel, the Excel 2007 gave a wrong equation (y=18159e0.0236x)&lt;/p&gt;
&lt;p&gt;while Excel 2003 (y=181596e0.0236x) give correct equation&lt;/p&gt;
&lt;p&gt;Raw data:&lt;/p&gt;
&lt;p&gt;x	y&lt;/p&gt;
&lt;p&gt;0	200000&lt;/p&gt;
&lt;p&gt;25	302500&lt;/p&gt;
&lt;p&gt;49	535833&lt;/p&gt;
&lt;p&gt;74	1035000&lt;/p&gt;
&lt;p&gt;96	1863333&lt;/p&gt;
&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.office.com/aggbug.aspx?PostID=6089" width="1" height="1"&gt;</description></item><item><title>re: Manipulating and Massaging Data in Excel</title><link>http://blogs.office.com/b/microsoft-excel/archive/2007/11/12/manipulating-and-massaging-data-in-excel.aspx#6090</link><pubDate>Thu, 29 Nov 2007 19:59:12 GMT</pubDate><guid isPermaLink="false">53587256-c606-4c9b-bad4-97c86b12ce62:6090</guid><dc:creator>Chad Rothschiller</dc:creator><description>&lt;p&gt;&amp;lt;p&amp;gt;Excel Novice - I&amp;#39;d just copy the CLEAN function down the whole column. To cover multiple columns, just keep adding new columns of formulas off to the right, for as many as you need to clean.&amp;lt;/p&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;p&amp;gt;Robert LaSalle - You&amp;#39;re essentially talking about Data Quality, Cleansing, Standardization, and Deduplication. I have seen companies spend tens of thousands of dollars to clean up their marketing databases / address lists. It is just plain irritating to send someone a piece of mail 3 different times. It gives the impression you don&amp;#39;t know who they are. You have a similar problem with names: is S. Thorpe the same as Steven Thorpe? I would say it depends. If the address is different, then certainly they’re not the same. If the address is the same, then, well, what if Steven has a wife Shelly? But at least you know it&amp;#39;s the same household.&amp;lt;/p&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;p&amp;gt;The real way to solve this problem is to first normalize the address (N becomes North, St becomes Street (in most cases), and so on) and parse it out into sub fields (like house number, direction, street name, street type). Then you can use matching rules to decide if they&amp;#39;re the same. You can even weight certain fields as stronger indications of a match than others. e.g. it isn&amp;#39;t that big of a deal if the direction is missing on 1 of the records you&amp;#39;re comparing, but it certainly isn&amp;#39;t a match if the street name doesn&amp;#39;t match. To do all this, you&amp;#39;ll need specialized software that might cost you upwards of $100K (see &amp;lt;a href=&amp;quot;&lt;a rel="nofollow" target="_new" href="http://www.trilliumsoftware.com/&amp;quot;"&gt;www.trilliumsoftware.com/&amp;quot;&lt;/a&gt; target=&amp;quot;_new&amp;quot; rel=&amp;quot;nofollow&amp;quot;&amp;gt;&lt;a rel="nofollow" target="_new" href="http://www.trilliumsoftware.com&amp;lt;/a&amp;gt;"&gt;http://www.trilliumsoftware.com&amp;lt;/a&amp;gt;&lt;/a&gt; for an example of such software).&amp;lt;/p&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;p&amp;gt;To do a one-off job in Excel, you&amp;#39;re going to have to play with the data and eyeball it. You might consider making a copy of your data, then removing all the directional pieces from the address (N/S/E/W/NE/NW/SE/SW/North/South/East/West/etc), and maybe even the suffix data as well (Ave/Avenue/St/Street/Drive/Dr/etc). Then sort by those fields and look for duplicates, looking at first name/last name to help, and so on. Then go back to the original data to find and remove possible duplicates. You could also try sorting by last name then first name, and looking for duplicates there.&amp;lt;/p&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;p&amp;gt;To quickly flag duplicates in a list, here&amp;#39;s what I sometimes do. Say there are sorted addresses in A1:A10. In B2 I&amp;#39;d put this formula: =IF(A2=A1,&amp;quot;X&amp;quot;,&amp;quot;&amp;quot;), and then copy it down. The X will flag any duplicates and you can quickly work through them.&amp;lt;/p&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;p&amp;gt;OK, enough on that topic!&amp;lt;/p&amp;gt;&lt;/p&gt;
&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.office.com/aggbug.aspx?PostID=6090" width="1" height="1"&gt;</description></item><item><title>re: Manipulating and Massaging Data in Excel</title><link>http://blogs.office.com/b/microsoft-excel/archive/2007/11/12/manipulating-and-massaging-data-in-excel.aspx#6091</link><pubDate>Mon, 26 Nov 2007 15:36:37 GMT</pubDate><guid isPermaLink="false">53587256-c606-4c9b-bad4-97c86b12ce62:6091</guid><dc:creator>Robert LaSalle</dc:creator><description>&lt;p&gt;I have an excel spreadsheet with 2000 lines of data. Some of the data is the same address. One may be writtes as 301 N Howeth St, one may be written 301 Howeth St, one may be written 301 Howeth. I want to have a way to run a script and find these similar rows and ask me if I want to delete any so I can get rid of the dups. Any help on this? rlasalle@mfgtx.com&lt;/p&gt;
&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.office.com/aggbug.aspx?PostID=6091" width="1" height="1"&gt;</description></item><item><title>re: Manipulating and Massaging Data in Excel</title><link>http://blogs.office.com/b/microsoft-excel/archive/2007/11/12/manipulating-and-massaging-data-in-excel.aspx#6092</link><pubDate>Tue, 20 Nov 2007 16:21:45 GMT</pubDate><guid isPermaLink="false">53587256-c606-4c9b-bad4-97c86b12ce62:6092</guid><dc:creator>Excel Novice</dc:creator><description>&lt;p&gt;Any way to use CLEAN to clean-up non-printable characters in the whole cloumn or from multiple columns?&lt;/p&gt;
&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.office.com/aggbug.aspx?PostID=6092" width="1" height="1"&gt;</description></item><item><title>re: Manipulating and Massaging Data in Excel</title><link>http://blogs.office.com/b/microsoft-excel/archive/2007/11/12/manipulating-and-massaging-data-in-excel.aspx#6093</link><pubDate>Tue, 20 Nov 2007 16:16:21 GMT</pubDate><guid isPermaLink="false">53587256-c606-4c9b-bad4-97c86b12ce62:6093</guid><dc:creator>Excel Novice</dc:creator><description>&lt;p&gt;Any way to use CLEAN to clean-up non-printable characters in the whole cloumn or from multiple columns?&lt;/p&gt;
&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.office.com/aggbug.aspx?PostID=6093" width="1" height="1"&gt;</description></item><item><title>re: Manipulating and Massaging Data in Excel</title><link>http://blogs.office.com/b/microsoft-excel/archive/2007/11/12/manipulating-and-massaging-data-in-excel.aspx#6094</link><pubDate>Sat, 17 Nov 2007 02:21:37 GMT</pubDate><guid isPermaLink="false">53587256-c606-4c9b-bad4-97c86b12ce62:6094</guid><dc:creator>Charlie Ellis </dc:creator><description>&lt;p&gt;Thanks for bringing the issue with Data Tables to our attention. &amp;nbsp;This is a known issue and we&amp;#39;re working on getting a fix out as a part of a future release. &amp;nbsp;We&amp;#39;ll update you on when the fix will be available when we have more information to share.&lt;/p&gt;
&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.office.com/aggbug.aspx?PostID=6094" width="1" height="1"&gt;</description></item></channel></rss>