question

sreeprasad-9894 avatar image
0 Votes"
sreeprasad-9894 asked karenpayneoregon answered

How to get duplicate values in one excel coloumn and count using c# and excel interoperbility

hi there, I have to get count of repeated values in a particular column dynamically using c#

lets say like

coloumn A
........................
MOUSE
CAT
CAT
TIGER
CAT

here i need to select each column cell value and find how many times that value occurs

dotnet-csharpoffice-excel-itpro
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Paul-5034 avatar image
0 Votes"
Paul-5034 answered Paul-5034 commented

I'm not sure what format you're dealing with as input, but if you can get each cell into a collection of strings it should just involve grouping by the value:

 var columnA = new string[] { "MOUSE", "CAT", "CAT", "TIGER", "CAT" };
    
 var groups = columnA.GroupBy(a => a)
     .Select(grp => $"{grp.Key} occurs {grp.Count()} times");
    
 Console.WriteLine(string.Join("\n", groups));

Produces:

MOUSE occurs 1 times
CAT occurs 3 times
TIGER occurs 1 times

· 4
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

column A is having 100 data suppose. What i need to do i have one input value CAT in my first iteration i will check all CAT values and replace to CATZZ. Like tTIGER,RAT etc...what i can do

0 Votes 0 ·

Do you mean for this input:
MOUSE
CAT
CAT
TIGER
CAT

You want this as output?:
MOUSE 1
CAT 1
CAT 2
TIGER 1
CAT 3

0 Votes 0 ·

Not exactly i will tell simply with an Insurance policy example ..consider my excel

!129311-image.png
1.Concider I have a function named "UpdatePolicyNoWithSSN"

2.this function will take one row of excel and its new SSN number
3.i will query to databese with this SSN and I will get a new POLICY NUMBER
4.Find the Old POLICY_NO of excel and Check any duplicates, if yes Replace all with NEW POLICY_NO but not replace id=01
5.Create a new excel sheet with changed POLICY_NO with all ID except ID=01.

EXPECTED RESULT IS BELOW
!129293-image.png


MYCODE WHICH IS NOT GETTING EXPECTED OUTPUT


       for (int i = 2; i <= lastUsedRow; i++)
             {
                 Range SSN = (Range)xlWorksheet.Cells[i, 10];
                 string NewPolicynumber = "HP" + GetNewPolicyNumber(SSN.Value);
                    
                 for (int j = 2; j <= countOfEveryDuplicates; j++)
                 {

                     Range ID = (Range)xlWorksheet.Cells[j, 17];
                     Range OldpolicyNumb = (Range)xlWorksheet.Cells[j, 2];
                     Range cell = (Range)xlWorksheet.Cells[j, 2];
                     if ((string)cell.Value == OldpolicyNumb.Value && ID.Value != "01") 
                     {
                         xlWorksheet.Cells[j, 2] = NewPolicynumber;
                     }

                   }

                }

I


0 Votes 0 ·
image.png (14.3 KiB)
image.png (20.9 KiB)
image.png (22.3 KiB)
Show more comments
karenpayneoregon avatar image
0 Votes"
karenpayneoregon answered

A recommendation would be, if working with .xlsx style of Excel is to look at SpreadSheetLight (free and on NuGet with two versions one for .NET Framework, one for .NET Core Framework).

See also

129306-excelfinddups.png

Conceptual starter which works source code.

 using System;
 using System.Collections.Generic;
 using SpreadsheetLight;
    
 namespace SpreadsheetLightDataGridViewExport.Classes
 {
     public class SpreadSheetLightOperations
     {
         /// <summary>
         /// Find duplicate string values and return their row index
         /// </summary>
         /// <param name="fileName">Excel file to read</param>
         /// <param name="sheetName">WorkSheet to work with</param>
         /// <param name="search">Text to find</param>
         /// <param name="columnIndex">Column index to search</param>
         /// <returns>
         /// Named Value Tuple
         /// items      - list of indices
         /// exception  - reports run time exceptions, 99 percent of the time it's from open the file outside of the
         ///              application while working on it in code.
         /// </returns>
         public static (List<int> items, Exception exception) FindDuplicates(string fileName, string sheetName, string search, int columnIndex)
         {
             var indicesList = new List<int>();
    
             try
             {
                 using (var document = new SLDocument(fileName, sheetName))
                 {
                        
                     var stats = document.GetWorksheetStatistics();
                     var test = stats.EndRowIndex;
                     for (int index = 1; index < stats.EndRowIndex + 1; index++)
                     {
                         if (document.GetCellValueAsString(index, columnIndex).EqualsIgnoreCase(search))
                         {
                             indicesList.Add(index);
                         }
                     }
                 }
    
                 return (indicesList, null);
                    
             }
             catch (Exception exception)
             {
                 return (indicesList, exception);
             }
               
         }
     }
 }

Form code source code

 using System;
 using System.IO;
 using System.Linq;
 using System.Windows.Forms;
 using SpreadsheetLightDataGridViewExport.Classes;
    
 namespace SpreadsheetLightDataGridViewExport
 {
     public partial class PoliciesForm : Form
     {
         public PoliciesForm()
         {
             InitializeComponent();
         }
    
         private void PoliciesForm_Load(object sender, EventArgs e)
         {
             PoliciesListBox.DataSource = Enumerable.Range(1, 6).Select(x => $"OLD0{x}").ToList();
         }
    
         private void FindDuplicatesButton_Click(object sender, EventArgs e)
         {
             ResultsListBox.Items.Clear();
                
             var policy = PoliciesListBox.Text;
             //ExcelOperations
    
             var fileName = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "FindDups.xlsx");
             var sheetName = "Polices";
    
             /*
              * pass in file name, text to find and the column index, in this case A column
              */
             var (items, exception) = SpreadSheetLightOperations
                 .FindDuplicates(fileName, sheetName, policy, 1);
                
             if (exception != null)
             {
                 MessageBox.Show($"Error\n{exception.Message}");
             }
             else
             {
                 if (items.Count >1)
                 {
                     foreach (var item in items)
                     {
                         ResultsListBox.Items.Add(item.ToString());
                     }
                 }
                 else
                 {
                     MessageBox.Show($"No rows found for {PoliciesListBox.Text}");
                 }
             }
         }
     }
 }



excelfinddups.png (20.6 KiB)
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.