You can use your favorite social network to register or link an existing account:
Or use your email address to register without a social network:
Sign in with these social networks:
Or enter your username and password
Forgot your password?
Yes, please link my existing account with for quick, secure access.
No, I would like to create a new account with my profile information.
Tips
How-to
News
Videos
Stories
This post is brought to you by Chad Rothschiller a Program Manager on the Excel team. To try out this new feature download the Office Preview here.
Before coming to Microsoft I spent a good deal of time messing around with data. I would extract first names, last names, and email addresses from chunks of text. I would join first names and last names together into a single column of text. Sometimes I only needed first name initials, put together with the last name. I had to change the format & display of dates, social security numbers, and dollar amounts. Because the data came from a database, I often had to make sure the names had Proper Casing (e.g. not ALL CAPS, not all lower).
Today I’m proud to be writing about a new feature in Excel 2013 called Flash Fill that makes all of those tasks very simple for even the most novice user. Several years ago, I wrote a blog post demonstrating how to do string manipulations like the above examples. The problem is that the solutions relied heavily on using formulas, which can feel quite complex and intimidating for people not as familiar with Excel. Now, with Flash Fill, you can do it all without formulas. In fact, it’s easier than that. Now you can do it all by providing a few examples.
Here’s what Flash Fill looks like in action:
Let’s work through the examples in my original post, but using Flash Fill as the solution. I updated a few of the values from the original data set to make things more interesting, you can get it here. After pasting the data into Excel, you’ll need to split it out into separate columns. I used Text to Columns feature, using “comma” as the separator.
Note: I used bold formatting on the top row so they stand out. But it also helps Excel to understand that the top row is a header row so that Flash Fill won’t use it as part of your examples when generating pattern rules.
Insert a new column between SSN and Last Name and title it “SSN”. Notice below that as you start typing the formatted SSN, Excel will help you complete the column based on the value you entered in the prior row.
At this point Flash Fill recognizes what you’re doing and makes additional suggestions:
If you like the suggestions (in this case, I do!), just hit ENTER to accept them.
Now, if you’re paranoid like me, then at step 2. You won’t trust yourself to re-type the SSN correctly. Instead you can double click in A2 (goes into “cell edit mode”), copy the value, then double click cell B2 (back into “edit mode”), and paste the value. Then, before hitting ENTER to commit the value, put the cursor between the 3rd and 4th digit, and type the hyphen. Ditto for between the 5th and 6th character. Now hit enter, and continue to step 3.
Insert another column between First Name and DOB, call it Name.
After hitting enter, Flash Fill adds my new example, creates a new set of rules, and re-applies the new set of rules to the entire range (excluding the values you typed), which includes the other row without a middle name / initial (Guissani, Laura, 2nd from the bottom): Notice that all the extra spaces in the original name values were removed. Flash Fill ignored the extra whitespace and cleaned up my results.
Let’s convert the DOB column to date format so it’s more readable and so Excel can understand the values as dates. Insert a new column called DOB.
…and the results:
Now that we have converted to dates, we can change the date format to whatever we like, say m/d/yyyy, by using the Number Format dialog box again.
You can use the same steps to convert Date of Letter, so I’ll skip that step.
Notice that there were a few extra steps when dealing with dates (and numbers in general). The algorithm engine we’re using looks at everything as text. So it doesn’t know any special rules that go along with numbers (for example, it’s OK to drop leading zeros when dealing with numbers, but not so much when you’re dealing with Postal Codes or other IDs). If I hadn’t initially formatted the new DOB column with two digits for month and two digits for day, then the algorithm would see me converting “02” to “2” (say, “February”), and figure that I only want the 2nd digit of the month number. But then it would also convert “12” to “2”, because it doesn’t know that there’s a special number rule that says it’s OK to drop leading zeros, but not other leading digits. Looking to the future, this is definitely an area where we can push the feature forward and get better results faster.
Insert a new column after Home Phone, and call it Phone (I made it 96 pixels so I could see the suggestions).
This is going to be very similar to the date step. Once you’ve got your target column set up (I called mine Zip), select the whole range, and format it as TEXT. We do this so that Excel doesn’t treat the number as a number and re-drop the leading zero.
1. Format the Zip range as TEXT.
2. Start typing Postal Codes, and see the suggestion UI.
I do see some of the suggestions are a bit off, but I’m going to take it anyway, and then go revise the ones it didn’t get right. Flash Fill is probably thinking that instead of a “hard coded” leading zero, I want to extract that digit from somewhere else in my data.
3. After accepting, I go to the one that needs to be revised (in my case, it’s in cell S5), and type the desired value.
4. …and hit enter.
Notice that with that single revision, the other values were updated as well.
I didn’t give detailed steps for these because getting the desired results is very straight forward, once you know how to trigger Flash Fill.
You might have noticed some patterns to how I was working with my data. Here are a couple things you should know to get the most out of Flash Fill.
Then it would be better to provide an example suffix value for the 2nd or 3rd row of data (“EF” or “YZ”, respectively), rather than the first (“AB”). The reason is because it’s unclear whether the “AB” value would come from before or after the hyphen in the source data.
Thanks everyone for reading and providing comments. If you come across a pattern that doesn’t work well with Flash Fill, please let us know. We’re always looking for ways to improve, and your feedback is valued and appreciated!
Comments: (10) Collapse
You can get it here www.microsoft.com/.../try-office-preview
Chad - this is a great feature. Thanks.
Say I have 2000 cells and I flash Fill, You get 95% of them, but can not figure out 5% of them and leave them blank. There is a little UI dropdown that pops up where I can select the 100 blank cells. Does selecting just the missing cells help when providing the second rule, or were you figuring we might want to change the fill color in this case?
When your list of data is complex and the Flash Fill is not able to "understand" your rule, it's better to sort your list before to use the Flash Fill. This tip solves a lot of problems ;)
Very interesting post and it prompted me to try the feature out alongside the new WebService() function. I've blogged about my experiments here
workerthread.wordpress.com/.../excel-2013-preview-webservice-function-and-flash-fill
Chad - another tip. You noted that Flash Fill doesn’t know that 1 might correspond to “January”, 2 to “February”, and so on. However, if you convert the column of dates to use the Long Date format, you can get Flash Fill to give you "January", "February" and so on.
The selection doesn't matter when providing the second example. That revsion will be added to the example set, a new rule (or set of rules) will be re-created from the new example set and applied over the entire original Flash Filled range.
Yes, we were thinking people might want to fill those a certain color, or you could also tab through them to review them (though 100 is probably too many for that). Note that you get a similar option for selecting the changed cells, and that's also available in the status bar at the bottom.
Finally, another way to review your results is to sort the Flash Filled range (sorting is one of the few operations that doesn't automatically dismiss the Flash Filled range from being dynamic/active) and look at the results that way.
Thanks Bill. Yes, date formatting is really powerful in Excel, though ultimately we want Flash Fill to do all the heavy lifting for you. Here's a pointer to an article on custom number formatting: office.microsoft.com/.../number-format-codes-HP005198679.aspx
Yes Frederic, the more uniform your data, the better. One trick to help isolate where the Flash Fill rule will be applied is to explicitly select only the range you want filled in (which would include a couple of good examples you provide). So if you have 40 rows of data, and after sorting it looks like the first 10 are similar in shape/structure, then provide an example, then select those first 10 rows, and hit CTRL + E (or Data tab / Flash Fill) to fill only the first 10 rows. Repeat for the other sets of similar data.
Another option would be to use a filter to narrow down to only the data you want to Flash Fill (weed out the rows you don't want Flash Fill to operate on) - - the feature will only work on the non-filtered (visible) data.
Excel Flash Fill is a brilliant time saver, i wish i have it years ago. It automatically separates data pasted from sites into separate columns. Really cool!
FlashFill looks really interesting, but this touches on something related that I’ve run into a bunch of times.
In the other blog post you reference you have:
1. In P2 enter this formula: =DATE(MID(E2,1,4),MID(E2,5,2),MID(E2,7,2))
2. Fill that formula down across the rows of data (P2:P10 in this case)
But, what I’ve always wanted to be able to do is basically to create a user-defined function (but not vb-script).
So for the above, I’d write define a function like so:
define MyFn(cell) DATE(MID(cell,1,4),MID(cell, 5,2),MID(cell, 7,2))
Then in P2:P10 I’d fill with “=MyFn(E2)”
The closest I’ve come to this is to define a table and have excel automatically fill-down when you edit the top row.