
Parse and show URLs for hyperlink cells in XLSX #717

Open
Hasan-75 wants to merge 11 commits into ExcelDataReader:develop from Hasan-75:feature/extract-cell-hyperlinks-xlsx


Hasan-75 (Contributor) commented Aug 23, 2025

This PR addresses issue #663 (XLSX link cell type).

For cells that contain a hyperlink in XLSX files, the reader parses both the display text and the URL.

Parsing hyperlinks with AsDataSet():

Hyperlink cells can be read as display text, as a URL, or as both.
When using HyperlinkParsingOption.Tuple, each hyperlink cell is returned as a Tuple<object, object> containing (DisplayText, URL).
You can then transform the tuple into whichever representation you need.

Example code:

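// Example: transform hyperlink cells so that only the URL is returned.
// An explicit delegate type is used so this also compiles before C# 10.
Func<IExcelDataReader, int, object, object> transformHyperlinkValue = (reader, index, value) =>
    value is Tuple<object, object> hyperlink
        ? hyperlink.Item2  // Item2 holds the URL; Item1 holds the display text
        : value;           // non-hyperlink cells pass through unchanged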

var ds = reader.AsDataSet(new ExcelDataSetConfiguration
{
    UseColumnDataType = false,
    ConfigureDataTable = _ => new ExcelDataTableConfiguration
    {
        UseHeaderRow = firstRowNamesCheckBox.Checked,
        HyperlinkParsingOption = HyperlinkParsingOption.Tuple,
        TransformValue = transformHyperlinkValue
    }
});

Example Output:

For a cell containing a hyperlink with display text "Click here" and URL "https://example.com":

| HyperlinkParsingOption | Result |
| --- | --- |
| DisplayText | "Click here" |
| URL | "https://example.com" |
| Tuple | ("Click here", "https://example.com") |
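The mapping in the table can be sketched as a small pure function over the tuple. This is only an illustration: `HyperlinkView` is a local stand-in for the PR's `HyperlinkParsingOption` (whose actual definition is not shown in this thread), and `HyperlinkProjection.Project` is a hypothetical helper, not part of ExcelDataReader.

```csharp
using System;

// Local stand-in for the PR's HyperlinkParsingOption (assumption: the real enum is not shown here).
public enum HyperlinkView { DisplayText, Url, Tuple }

public static class HyperlinkProjection
{
    // Projects a (DisplayText, URL) tuple onto the representation named by `view`.
    // Values that are not hyperlink tuples pass through unchanged.
    public static object Project(object value, HyperlinkView view)
    {
        if (value is not Tuple<object, object> hyperlink)
            return value;

        return view switch
        {
            HyperlinkView.DisplayText => hyperlink.Item1, // display text only
            HyperlinkView.Url => hyperlink.Item2,         // URL only
            _ => hyperlink,                               // keep both
        };
    }
}
```

A helper like this could be called from a `TransformValue` callback when the same dataset needs different representations in different columns.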

Note:
✅ Only external (absolute) hyperlinks are supported (e.g. http://example.com, mailto:user@example.com, file:///C:/docs/file.pdf).

❌ Internal document links are not supported (e.g. #Sheet2!A1, #MyNamedRange).
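One way to tell the two kinds of target apart is sketched below, under the assumption that internal links start with `#` and external ones parse as absolute URIs. The PR's actual check is not shown in this thread, and `HyperlinkTargets.IsExternal` is a hypothetical name.

```csharp
using System;

public static class HyperlinkTargets
{
    // Returns true for absolute targets such as http:, mailto:, or file: URIs.
    // Targets starting with '#' (sheet or named-range references) are treated as internal.
    public static bool IsExternal(string target)
    {
        if (string.IsNullOrWhiteSpace(target) || target.StartsWith("#", StringComparison.Ordinal))
            return false;

        return Uri.TryCreate(target, UriKind.Absolute, out _);
    }
}
```

With this sketch, `IsExternal("mailto:user@example.com")` is true and `IsExternal("#Sheet2!A1")` is false.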


Screenshot (left: the XLSX file; right: the parsed dataset view)

Hasan-75 marked this pull request as draft August 24, 2025 12:06
Hasan-75 marked this pull request as ready for review August 24, 2025 12:56
Hasan-75 mentioned this pull request Aug 24, 2025
hughbe (Contributor) commented Oct 30, 2025

Hey. I'm the author of excel-mapper, a library built on ExcelDataReader that lets you read Excel files into C# objects.

We support mapping Uri values automatically. But it would be great for users to be able to read a URI from the text of the cell.

For example, we've seen cells whose text value is "link" and that are hyperlinked.

It would be great to have this reviewed.

Hasan-75 (Contributor, Author) commented

@hughbe Can you please share a sample XLSX file? I can take a look. Thanks!

appel1 (Collaborator) commented Oct 31, 2025

I haven't had time to properly go through this. Sorry.

But can't we have a proper object (or struct) with the hyperlink info instead of object? You would get it from GetHyperlink if the cell has a hyperlink, or null if it doesn't or if parsing URLs is not enabled in the configuration. And unless there's a huge performance impact, can't hyperlink information always be parsed?

And what about .xls support?

hughbe (Contributor) commented Oct 31, 2025

GetHyperlink(i) would work nicely and match what we do with GetNumberFormatString. For formats that don't support this (csv), we'd return null?

Here's an anonymised example: Urls_Example.xlsx

The user is ingesting a table where the ID is hyperlinked and wants to read the URI.
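The GetHyperlink(i) shape discussed in this thread could look roughly like the sketch below. Everything in it is hypothetical: `CellHyperlink`, `IHyperlinkSource`, and `NoHyperlinkSource` are illustrative names, not ExcelDataReader types, and nothing here reflects a decision made in the PR.

```csharp
using System;

// Hypothetical value type carrying the hyperlink info (not an ExcelDataReader type).
public readonly struct CellHyperlink
{
    public CellHyperlink(string displayText, Uri target)
    {
        DisplayText = displayText;
        Target = target;
    }

    public string DisplayText { get; }
    public Uri Target { get; }
}

// Hypothetical reader surface, loosely mirroring how GetNumberFormatString takes a column index.
public interface IHyperlinkSource
{
    // Returns the hyperlink for column i of the current row,
    // or null when the cell has none or the format cannot carry hyperlinks.
    CellHyperlink? GetHyperlink(int i);
}

// A CSV-like source has no hyperlink information, so it always answers null.
public sealed class NoHyperlinkSource : IHyperlinkSource
{
    public CellHyperlink? GetHyperlink(int i) => null;
}
```

A nullable struct keeps the "no hyperlink" case explicit without allocating an object per cell, which is one possible answer to the performance question raised above.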

