Clean up imported or pasted data in Excel

The data in your beautiful spreadsheet doesn’t necessarily start out that way, especially if you’ve copied or imported it in from some other place. By using some Excel functions and formatting techniques, you can make your names, dates and numbers look just the way you want.

Chad Rothschiller, an Excel Program Manager, put up a post on the Excel blog about this very subject a while back: Manipulating and massaging data in Excel. In his post, Chad covered, in detail, several Excel worksheet functions you can use to manipulate data.  For example, converting text to all lowercase, or to name (or title) case, extracting data from cells and combining them into one cell, and lots more.

I took Chad’s sample data and copied it into a text file that I saved with a .txt file name extension. I then ran the Excel Text Import Wizard and pointed it to my file, and imported it into a worksheet. The worksheet came out looking like this. Columns A through F are shown here, and columns G through L are shown in the next graphic.

Spreadsheet data before cleaning up, columns A through F

 And this:

Spreadsheet data before cleaning up, columns G through L

As you can see, people’s names are all lowercase and are separated into separate columns. I’d rather have them as “last, first.” Instead of retyping them or copying and pasting, I can combine or “concatenate” them into that “last, first” form. Also notice that the SSN values don’t have hyphens, and that the DOB values don’t look much like dates at all.

I took Chad’s workbook and saved it to a SkyDrive folder on*. Then, I used a feature in SkyDrive to “embed” that very same workbook in this blog post, below, where you can take a closer look at it and see how I followed Chad’s steps to clean things up.

* is a free Microsoft web site!

Take a tour of this embedded workbook

So, here’s my embedded Excel workbook with the imported data. Below the original rows are the  cleaned up cells (notice the red font in the cleaned up cells), along with explanations of how I did these. Things are color-coded so that you’ll see how I got from point A to point B, with explanatory text nearby.



You can see by looking at the rows near the top that the names are messy (not capitalized, middle name BEFORE first name). And the birthdates in column E (DOB) are actually numbers, not dates! Also, the phone numbers are missing hyphens —  in fact, they’d look even better with the area code in parentheses. These problems can all be fixed either by applying formatting or by using functions on the cells.

Changing gears a bit, a salute to blog reader ruve1k, who suggested a great formula that combines the LEFT, MID, RIGHT, and DATE functions to elegantly convert the values in the DOB column to the m/d/yyyy format! I liked it so much I just updated the workbook. Rest assured, we’re definitely listening to you!

As you peruse this worksheet, be sure to follow the color coding to see the before and after data. And read the nearby text in the same colored cells for the “how-to” information.

Even better, if you have Excel on your computer…

To really get a feel for how this all works, download the embedded workbook by clicking the View full-size workbook button at the bottom of the embedded workbook (in the black bar, at the bottom right).

Then, in the new window that appears, right above the worksheet grid, click the green Download button and save the workbook to your computer. You’ll be able to better see the formulas and try things out yourself.

So open it in Excel, and it’s all yours…experiment away!

The Download button for this file in SkyDrive


Summing up

After just a little work, people’s names look the way I want, and Social Security numbers, dates, and phone numbers are easily identifiable by their format. I did this all by applying formats in some places and using functions in others.

Chad addresses all of these issues in his blog post. In addition, he covers how to deal with unwanted spaces and oddball characters that you can’t even see that can make your data look weird. He uses the TRIM and CLEAN functions to fix those. So don’t forget to read his post! Here’s the link again:

Manipulating and massaging data in Excel


— Gary Willoughby