Back
Excel

Flash Fill

This post is brought to you by Chad Rothschiller a Program Manager on the Excel team. To try out this new feature download the Office Preview here.

Before coming to Microsoft I spent a good deal of time messing around with data. I would extract first names, last names, and email addresses from chunks of text. I would join first names and last names together into a single column of text. Sometimes I only needed first name initials, put together with the last name. I had to change the format & display of dates, social security numbers, and dollar amounts. Because the data came from a database, I often had to make sure the names had Proper Casing (e.g. not ALL CAPS, not all lower).

Today I’m proud to be writing about a new feature in Excel 2013 called Flash Fill that makes all of those tasks very simple for even the most novice user. Several years ago, I wrote a blog post demonstrating how to do string manipulations like the above examples. The problem is that the solutions relied heavily on using formulas, which can feel quite complex and intimidating for people not as familiar with Excel. Now, with Flash Fill, you can do it all without formulas. In fact, it’s easier than that. Now you can do it all by providing a few examples.

Here’s what Flash Fill looks like in action:

clip_image001

Let’s work through the examples in my original post, but using Flash Fill as the solution. I updated a few of the values from the original data set to make things more interesting, you can get it here. After pasting the data into Excel, you’ll need to split it out into separate columns. I used Text to Columns feature, using “comma” as the separator.

clip_image002
Note: I used bold formatting on the top row so they stand out. But it also helps Excel to understand that the top row is a header row so that Flash Fill won’t use it as part of your examples when generating pattern rules.

Step 1: Format social security numbers like this: 123-45-6789

Insert a new column between SSN and Last Name and title it “SSN”. Notice below that as you start typing the formatted SSN, Excel will help you complete the column based on the value you entered in the prior row.

  1. Make sure the inserted column is wide enough to display the formatted value (80 pixels should be enough; this isn’t required to use Flash Fill, but it looks better that way)
  2. Type the first social security number in B2, with hyphens: 413-66-4341
  3. Start typing the second SSN value: 2

At this point Flash Fill recognizes what you’re doing and makes additional suggestions:

clip_image003

If you like the suggestions (in this case, I do!), just hit ENTER to accept them.

Now, if you’re paranoid like me, then at step 2. You won’t trust yourself to re-type the SSN correctly. Instead you can double click in A2 (goes into “cell edit mode”), copy the value, then double click cell B2 (back into “edit mode”), and paste the value. Then, before hitting ENTER to commit the value, put the cursor between the 3rd and 4th digit, and type the hyphen. Ditto for between the 5th and 6th character. Now hit enter, and continue to step 3.

Step 2: Format names like this: Rothschiller, Chad B. (in other words: <Last Name>, <First Name> <Middle Initial>.)

Insert another column between First Name and DOB, call it Name.

  1. Make sure the new Name column is wide enough for everyone’s names (162 pixels should be enough; again, not required, just looks nice)
  2. Enter into cell F2: Freehafer, Nancy D.
  3. Type the “C” for Cencini, and see Excel Flash Fill make suggestions for the rest of the dataclip_image004 
  4. If you like the results (not perfect, but I’ll take them), hit ENTER to accept. clip_image005
     
    In this case, Flash Fill wasn’t sure what to do for rows without a middle name or initial, because the only example I provided used a middle initial.
    By providing additional examples, called revisions, I can teach Flash Fill what I want for those cases that it got wrong.
  5. Go down to cell F5 (first blank cell)
  6. Type: Sergienko, Mariya, then hit ENTER
    Notice that while editing that cell, the Flash Fill visuals come back for the Flash Filled range: clip_image006

   

After hitting enter, Flash Fill adds my new example, creates a new set of rules, and re-applies the new set of rules to the entire range (excluding the values you typed), which includes the other row without a middle name / initial (Guissani, Laura, 2nd from the bottom):clip_image007 

Notice that all the extra spaces in the original name values were removed. Flash Fill ignored the extra whitespace and cleaned up my results.

Step 3: Convert DOB from 19610222 to 2/22/1961 (more generally, from YYYYMMDD to M/D/YYYY)

Let’s convert the DOB column to date format so it’s more readable and so Excel can understand the values as dates. Insert a new column called DOB.

  1. Make sure the column is wide enough so you can see what you’re doing (75 pixels would work).
  2. Select the entire range where the new dates will go, and format them as mm/dd/yyyy (Format Cells / Number / Custom, then enter “mm/dd/yyyy” into the Type: text box).clip_image008
  3. Type the first 3 dates.clip_image009
  4. Notice that Flash Fill hasn’t triggered. We decided to disable automatic Flash Fill for numeric data. After all, there are only 10 unique characters in our numbering system, and it’s just too easy to come up with random patterns that don’t make good sense. For this case, providing additional examples and using the ribbon button is the way to go.
  5. On the Data tab, click the Flash Fill button:clip_image010

…and the results:clip_image011

Now that we have converted to dates, we can change the date format to whatever we like, say m/d/yyyy, by using the Number Format dialog box again.clip_image012

You can use the same steps to convert Date of Letter, so I’ll skip that step.

Notice that there were a few extra steps when dealing with dates (and numbers in general). The algorithm engine we’re using looks at everything as text. So it doesn’t know any special rules that go along with numbers (for example, it’s OK to drop leading zeros when dealing with numbers, but not so much when you’re dealing with Postal Codes or other IDs). If I hadn’t initially formatted the new DOB column with two digits for month and two digits for day, then the algorithm would see me converting “02” to “2” (say, “February”), and figure that I only want the 2nd digit of the month number. But then it would also convert “12” to “2”, because it doesn’t know that there’s a special number rule that says it’s OK to drop leading zeros, but not other leading digits. Looking to the future, this is definitely an area where we can push the feature forward and get better results faster.

Step 4: Format phone numbers like (206) 555-1212

Insert a new column after Home Phone, and call it Phone (I made it 96 pixels so I could see the suggestions).

  1. Start typing! (206) 695-9457
  2. When you start typing the “(“ for the second row, the suggestions kick in (Note: while we’re working with numbers here, since it’s mixed with text, Excel treats them as text, not numbers, so you will automatically get suggestions). clip_image013
  3. Hit enter to accept, and you’re done!

Step 5: Put the leading zero back on Postal Codes (like 01001)

This is going to be very similar to the date step. Once you’ve got your target column set up (I called mine Zip), select the whole range, and format it as TEXT. We do this so that Excel doesn’t treat the number as a number and re-drop the leading zero.

1. Format the Zip range as TEXT.clip_image014

2. Start typing Postal Codes, and see the suggestion UI.clip_image015

I do see some of the suggestions are a bit off, but I’m going to take it anyway, and then go revise the ones it didn’t get right. Flash Fill is probably thinking that instead of a “hard coded” leading zero, I want to extract that digit from somewhere else in my data.

3. After accepting, I go to the one that needs to be revised (in my case, it’s in cell S5), and type the desired value. clip_image016

4. …and hit enter. clip_image017

Notice that with that single revision, the other values were updated as well.

What about the other columns in your source data?

I didn’t give detailed steps for these because getting the desired results is very straight forward, once you know how to trigger Flash Fill.

  • E-Mail Address values can be converted to lower case. Just insert a new column and start typing lower-case versions of the values and Flash Fill will suggest to do the rest of them for you.
  • Extra spaces can be removed from Address. Just start typing the values without the spaces in a new column and let Flash Fill do the rest.
  • State values can be made UPPER CASE by typing a couple UPPER CASE values in a new column.

Some final tips on using Flash Fill

You might have noticed some patterns to how I was working with my data. Here are a couple things you should know to get the most out of Flash Fill.

  • The Flash Fill algorithm for recognizing patterns doesn’t treat numbers or digits any differently than text.

    • It doesn’t know that zeros to the left of the left-most non-zero digit are insignificant and can be dropped (or shouldn’t be dropped, if it’s a Postal Code).
    • It doesn’t know that 1 might correspond to “January”, 2 to “February”, and so on.
    • Having said the above, Flash Fill does a good job with numbers and dates, though in some cases you might have to think more explicitly about how to get the desired results.
  • Flash Fill does better when the relationship between the source data and the example output you provide is clear, not ambiguous. For example, if your source data looks like this:

clip_image018

Then it would be better to provide an example suffix value for the 2nd or 3rd row of data (“EF” or “YZ”, respectively), rather than the first (“AB”). The reason is because it’s unclear whether the “AB” value would come from before or after the hyphen in the source data.

  • To get automatic suggestions, you have to be editing right next to related data.

    • You can’t have one or more blank columns between the cell you’re editing and the data its related to.
    • If you’re in doubt, just select the cell, and hit CTRL + A (select the contiguous range). If the resulting selection is the whole sheet, then Excel doesn’t think the cell is next to any data. If the select is some set of data that’s related, then your cell is in the right spot.
  • To get automatic suggestions, you have to make two sequential edits, one right on top of the other. “Two sequential edits” means that you don’t do something else between edit #1 and edit #2 (e.g. switch to another sheet) .
  • If your table of data has headers at the top, make sure Excel recognizes those as headers, so that it excludes them when generating Flash Fill rules.
    • The way I like to see whether Excel thinks I have headers is to put the active cell in the data range and hit CTRL + T (create Table). The dialog that comes up has a checkbox saying “My table has headers”. If that box is checked, then Excel’s automatic header detection logic thinks there are headers. Otherwise, Excel doesn’t think you have headers.
    • The easiest way to “tell” Excel you have headers is to change your range into a Table (just hit OK on the dialog after hitting CTRL + T).

Thanks everyone for reading and providing comments. If you come across a pattern that doesn’t work well with Flash Fill, please let us know. We’re always looking for ways to improve, and your feedback is valued and appreciated!

Join the conversation

10 comments
  1. Chad – this is a great feature. Thanks.
    Say I have 2000 cells and I flash Fill, You get 95% of them, but can not figure out 5% of them and leave them blank. There is a little UI dropdown that pops up where I can select the 100 blank cells. Does selecting just the missing cells help when providing the second rule, or were you figuring we might want to change the fill color in this case?

    • The selection doesn’t matter when providing the second example. That revsion will be added to the example set, a new rule (or set of rules) will be re-created from the new example set and applied over the entire original Flash Filled range.

      Yes, we were thinking people might want to fill those a certain color, or you could also tab through them to review them (though 100 is probably too many for that). Note that you get a similar option for selecting the changed cells, and that’s also available in the status bar at the bottom.

      Finally, another way to review your results is to sort the Flash Filled range (sorting is one of the few operations that doesn’t automatically dismiss the Flash Filled range from being dynamic/active) and look at the results that way.

  2. When your list of data is complex and the Flash Fill is not able to "understand" your rule, it’s better to sort your list before to use the Flash Fill. This tip solves a lot of problems ;)

    • Yes Frederic, the more uniform your data, the better. One trick to help isolate where the Flash Fill rule will be applied is to explicitly select only the range you want filled in (which would include a couple of good examples you provide). So if you have 40 rows of data, and after sorting it looks like the first 10 are similar in shape/structure, then provide an example, then select those first 10 rows, and hit CTRL + E (or Data tab / Flash Fill) to fill only the first 10 rows. Repeat for the other sets of similar data.

      Another option would be to use a filter to narrow down to only the data you want to Flash Fill (weed out the rows you don’t want Flash Fill to operate on) – - the feature will only work on the non-filtered (visible) data.

  3. Chad – another tip. You noted that Flash Fill doesn’t know that 1 might correspond to “January”, 2 to “February”, and so on. However, if you convert the column of dates to use the Long Date format, you can get Flash Fill to give you "January", "February" and so on.

  4. Fill to the end is very good. But my Excel 2013 Preview doesn’t slow down scrolling at the end of the table, like Excel 2010 did. Is that final decision to omit that feature? That would be a disaster.

  5. FlashFill looks really interesting, but this touches on something related that I’ve run into a bunch of times.
    In the other blog post you reference you have:
    1. In P2 enter this formula: =DATE(MID(E2,1,4),MID(E2,5,2),MID(E2,7,2))
    2. Fill that formula down across the rows of data (P2:P10 in this case)
    But, what I’ve always wanted to be able to do is basically to create a user-defined function (but not vb-script).
    So for the above, I’d write define a function like so:

    define MyFn(cell) DATE(MID(cell,1,4),MID(cell, 5,2),MID(cell, 7,2))

    Then in P2:P10 I’d fill with “=MyFn(E2)”

    The closest I’ve come to this is to define a table and have excel automatically fill-down when you edit the top row.

Comments are closed.