Pulling RSS data into Excel (or: Using Excel to Search Craigslist) - Part 1

image

 

This blog post is brought to you by Dan Battagin a Lead Program Manager on the Excel team.

OK, so I'm going to talk a bit about a relatively unknown feature in Excel: XML data import. It was introduced in Excel 2003, but we've done a pretty good job hiding it since Excel 2007 by putting it on the Developer tab of the Ribbon. I'm going to make the topic even a bit more geeky by using it in conjunction with VBA.

In exchange, I'm going to produce, at the end, a nice little solution that you can use to search Craigslist...across multiple Craigslist sites, all from a single Excel sheet. Oh, and I'll give you the workbook too, in case that's all you want. Cool ehh?

Right, so let's set the stage. Craigslist is an amazingly cool marketplace - everything's free, and so both buyers and sellers can get a pretty good deal. A problem that I've run into though is that the local selection isn't always great, and so I find myself searching a few different sites for the item I want. Sound familiar?

Let's build a sheet that looks something like this, that allows us to search as many (or few) craigslist sites as we want for a given item:

clip_image002

Searching the entirety of craigslist - nice.

To do this, we'll take a few overall steps:

  1. Using XML import (for RSS), setup the results table for the workbook (for a single site).
  2. Add the "Search box" to parameterize the XML import.
  3. Add the ability to search multiple sites in one fell swoop.
  4. Add the "Status indicator" so we know how much longer the search will take.

With that, let's get ready to rumble!

Using XML Import to Setup the Results Table

To get started, let's setup the results table - this is actually quite easy - it just entails us getting the URL to the RSS results for a search. Something like this:

http://seattle.craigslist.org/search/?areaID=2&subAreaID=&query=delta+jointer&catAbb=sss&format=rss

To get the data into Excel, follow these steps:

  1. Enable the Developer ribbon, if it's not already by clicking File | Options | Customize Ribbon and then check the "Developer" checkbox in the right-hand listbox.
  2. Select cell B5 in your sheet (I'm positioning us here due to future steps in this blog post)
  3. Click Developer | XML > Source, then click XML Maps... | Add... from the task pane that opens
  4. Paste in the URL above and click Open, then OK the XML Schema warning that is shown, then click OK again.
  5. Drag the node labeled "ns3:item" into B5 of your sheet. You should now be looking at something like this:

clip_image004

XML Mapping in Excel - probably a new feaure for you!

Whew, we've got our "data connection" into Excel now, and we just need to clean up the Table a bit. Again, this isn't hard, but there are a few steps to take...

  1. We only need a few columns. Rename them as follows by just typing in the table header, and reorder them if you want (I put Date after Linky):
    • ns3:title > Title
    • ns3:link > Linky
    • ns3:description > Description
    • ns3:date > RawPostDate
  2. Delete all the other columns in the table by right-clicking on them and choosing to Delete Table Column
  3. Add a couple new new table columns wherever you want in the table. Call them "Date" and "Location"
  4. For each of the new columns, enter the following formulas in the first row under the table headers:

(by the way - you'll see two things: Firstly, the new "@" notation for referring to the current table row, which we added in Excel 2010 to make working with tables easier. Secondly, you may get some #VALUE!'s in cells - don't worry about those for now)

 

Column<:o:p>

Value<:o:p>

Date<:o:p>

=VALUE(LEFT([@RawPostDate], SEARCH("T", [@RawPostDate])-1))

Location<:o:p>

=LEFT(RIGHT([@Linky], LEN([@Linky])-LEN("http://")), (SEARCH(".", [@Linky])-1)-LEN("http://"))

5. OK, now we're ready for some data. Just right-click on the table and choose Refresh. Ignore any errors, they are meaningless for our purposes, and we'll take care of them later.

6. Next, let's get the table looking a bit better by:

  1. Select a single cell in the table | right-click | XML | XML Map Properties... and
  1. uncheck "Adjust column width"
  2. select "Append new data to existing XML tables"
  • Select the entire table and click Home | Cells | Format |Row Height, and set the height to 15.

7. Lastly, a couple cool formatting "tricks."

  • Select the Linky column and right-click | Format Cells. Select the Number tab | Custom, and use the following format, exactly as shown, which tells Excel to always display the text "link" in the cells (oooooo, aaaaaahhhhh):
    ;;;"link"
  • Select the Date column and right-click | Format Cells. Select Date, and choose a format to your liking.

And just like that, we've got a search results table. Cool beans.

clip_image005

RSS data in Excel!

Adding the Search Box to Parameterize the XML Import

OK, the search results are cool, but I mean really...it's just a baby step in the right direction. After all, it's hard-coded to a single search term right now. To change that, we'll add a search box, and let the user update the results for the table. That's going to involve a bit of VBA...

  1. Select cells B2:E3 and merge them.
  2. Select the merged cell and click Formulas | Define Name, and call the name rngSearchTerm.
  3. Insert an image or shape or button that will be the "start" search button.
  4. Right click on the "start" button and choose "Assign Macro", then name the macro "RunSearch" and click Edit in the dialog that comes up.
  5. Enter the following VBA for the RunSearch method:

 6. Close the VBA editor and search away.

Now that's pretty cool - a parameterizable search that uses XML maps to import data from the Craigslist RSS feed. Next time, we'll look at updating the example so that we pull data from multiple Craigslist sites, and we'll add a little progress meter.

Office Blogs Comments

Comments: (18) Collapse

  • Would you be nice enough to provide an example of a similar parametric search on a SharePoint List's View?

  • I absolutely cannot wait for the next post.  I remarkably underutilize XML, and this is a great example.

  • When I pasted in the URL I got an error message stating: "Cannot load the specified XML or schema source."

    Any way to fix this?

  • Dan: There is no need to uncover the Developer tab first, you can simply use the File, Open dialog and type/paste the url to the rss in the filename box. Excel will prompt what to do with the XML file, select "Use the XML Source taskpane".

    See this related article: www.jkp-ads.com/articles/xmlandexcel00.asp

  • @Ruby

    I had the same problem (using xl2003).  I followed the tutorial using my LinkedIn RSS feed instead.

  • By far one of the neatest blogs yet.  You now have me out there searching for RSS feeds that I can incorporate.  This was definitely a feature that I didn't even know existed!

  • Thank you for this. It's got just the right level of detail to help a power user learn cool new stuff.  I've always been a bit scared of XML but this makes it easy.

  • Hey folks - thanks for the comments.  Couple responses -

    @Ruby, @Joe - thanks for the heads up - I wrote this post a while ago and it looks like CraigsList updated their URL format.  I've updated the post (the URL and the sample VBA) to fix the problem and work with the new format.

    @JK - yep, true.  Like with most everything in Excel, there are many ways to do the same thing :).  I used the Developer ribbon so that people could learn about XML lists, and possibly use them in the future.

    Cheers!

  • When is the next installment?  You have me hanging on the edge of my seat....

  • @Ken Next installement is next week :)

  • This is great! I can't wait until you post part 2.

  • Outstanding post, I've gone back and created some shim code to convert from the CSV format that some legacy sites export to XML, but one question, I pivot the data and it seems like there is something strange happening with formatted data.  A value is a percentage with Excel seems to recognize (shows format as Percent), if I do any simple formula (divide by 1.0) it behaves like a number, BUT in the pivot it seems to be acting like a string, because only Count works and any calculation (e.g. average shows divide by zero).  I this something unique to XML map, or the more likely case am I doing something stupid?

  • Great Post.  Thanks.  When I used your custom format for the Linky column, it returned the text "Link" but is not a hyperlink to the item.  What did I do wrong?  Also the two formulas I copied into the two cells below the Date and Test Column headers disappear when I run the macro and no vaule shows in either.  

  • @ John L

    Don't just copy and paste the formulas for the Date and Location from this page into your excel sheet.  Type it in manually, and be sure to take out the '@' symbols.

    I think that is what you are talking about...and I hope it helps

  • David,

    Thanks.  That took care of one problem.  Any ideas on getting the Linky cells to have the hyperlink?  I typed this for the cell formatting: (oooooo, aaaaaahhhhh):;;;"link" and the cells show the word Link but they're not hyperlinks.  I'm using Office 2007 if that makes any difference.

1 2  Next >

Comments: (loading) Collapse