Retrieving Information From Excel 2007 Files

Ever needed a sample code to programmatically extract data from Excel 2007 table?

 

Given an excel table like this (in book1.xlsx):

 

Name

ID

Color

Ball

1

White

Phone

2

Green

Cup

3

Yellow

 

 

Here’s a function to retrieve the contents of the table:

 

/// <summary>

/// Retrieves an excel table from a given sheet in an xlsx file

/// </summary>

/// <param name="fileName">xlsx file name</param>

/// <param name="sheetName">Sheet name (e.g. "Sheet1"</param>

/// <returns>table[rows][cols] containing all the table information for the given sheet in the given file

/// Cells with no information are returned as null. The table starts at the first element containing data in the sheet.</returns>

public List<List<string>> XLGetTable(string fileName, string sheetName);

 

And a sample usage of the above function:

 

List<List<string>> table = XLGetTable(@"book1.xlsx", "Sheet1");

 

int rows, cols;

if (table.Count == 0) {

    rows = 0;

    cols = 0;

}

else {

    rows = table.Count;

    cols = table[0].Count;

}

 

Debug.WriteLine(String.Format("rows = {0}, cols = {1}", rows, cols));

 

for (int row = 0; row < rows; row++) {

    for (int col = 0; col < cols; col++) {

        Debug.Write(String.Format("{0} ({1}, {2})\t\t", table[row][col], row, col));

    }

    Debug.WriteLine(String.Empty);

}

 

Output for the above sample in the VS Debug Window:

 

rows = 4, cols = 3

Name (0, 0) ID (0, 1) Color (0, 2)

Ball (1, 0) 1 (1, 1) White (1, 2)

Phone (2, 0) 2 (2, 1) Green (2, 2)

Cup (3, 0) 3 (3, 1) Yellow (3, 2)

 

See the attached ExcelHelper.cs for the full code of the above sample.

 

The sample is based on 2007 Office System Sample: Open XML File Format Code Snippets for Visual Studio 2005.

The Office 2007 files are using the Open XML file format. They are zip files with embedded data.

For more information check out the Ecma Office Open XML Formats architecture guide.

If you look inside an xlsx file (rename it to Zip and then open it) or uncomment the debugging code in the sample; you will see that the Excel data sheet is partially represented with XML similar to this one:

 

<?

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

- <worksheet xmlns="**http://schemas.openxmlformats.org/spreadsheetml/2006/main**" xmlns:r="**http://schemas.openxmlformats.org/officeDocument/2006/relationships**"\

<dimension ref="B3:D6" />

- <sheetViews>

- <sheetView tabSelected="1" workbookViewId="0">

<selection activeCell="B3" sqref="B3:D3" />

</sheetView>

</sheetViews>

<sheetFormatPr defaultRowHeight="15" />

- <sheetData>

- <row r="3" spans="2:4">

- <c r="B3" t="s">

<v>0</v>

</c>

- <c r="C3" t="s">

<v>3</v>

</c>

- <c r="D3" t="s">

<v>2</v>

</c>

</row>

- <row r="4" spans="2:4">

- <c r="B4" t="s">

<v>1</v>

</c>

- <c r="C4">

<v>1</v>

</c>

- <c r="D4" t="s">

<v>4</v>

</c>

</row>

- <row r="5" spans="2:4">

- <c r="B5" t="s">

<v>5</v>

</c>

- <c r="C5">

<v>2</v>

</c>

- <c r="D5" t="s">

<v>6</v>

</c>

</row>

- <row r="6" spans="2:4">

- <c r="B6" s="1" t="s">

<v>7</v>

</c>

- <c r="C6" s="1">

<v>3</v>

</c>

- <c r="D6" s="1" t="s">

<v>8</v>

</c>

</row>

</sheetData>

<pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3" />

<pageSetup orientation="portrait" horizontalDpi="4294967293" verticalDpi="0" r:id="rId1" />

- <tableParts count="1">

<tablePart r:id="rId2" />

</tableParts>

</worksheet>

 

 

You probably notice that there are no strings like “Ball”, “Phone” or “Cup” in the above XML file. That is because the strings are stored in a separate file to avoid redundancy. The sheet XML contains references to the actual strings.

 

Xlsx files can be read using functions from the System.IO.Packaging namespace. The attached code opens the xlsx file, retrieves the “main document part” - worksheet, looks for the given sheet name, and extracts the XML to construct the table information.

 

ExcelHelper.cs