question

kirukim-9614 avatar image
0 Votes"
kirukim-9614 asked kirukim-9614 edited

How to get all the values in the second column that are related to the first one directly or not?

Urgent!!!Need help!
It's been days trying to solve this
I wanna write code(python or M for Power Bi) that will allow me to iterate over those 2 columns and check the values and get all the values from the second column that may have a relationship with the first one directly or not as shown in the second picture.

83823-4.png83852-3.png


power-query-not-supported
4.png (6.0 KiB)
3.png (7.2 KiB)
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Appears to be a duplicate of Iterate over a table using M


0 Votes 0 ·
kirukim-9614 avatar image
0 Votes"
kirukim-9614 answered

@Lz-3068 It took me all week and I couldn't upload my data after applying the first code I keept only the two columns to try it and It not finished yet, Now what I think to do is to duplicate my table and apply some filters on the "TYPE_APPELANT" column and upload table by table I'll post my data to avoid more effort.
Sorry for taking so long
https://drive.google.com/file/d/10nEORIdTVuZRW3xf_idJIPbKvHBS2uzz/view?usp=sharing

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Lz-3068 avatar image
0 Votes"
Lz-3068 answered kirukim-9614 commented

Hi @kirukim-9614
(was busy with a customer engagement). Had a quick look at the .csv you shared (> 160k records) that appears to have a bunch of duplicates. With such number of records and what you expect to do this will potentially generate millions of records...

In this workbook (a couple of query options are disabled) I've taken a different approach (merging instead of selecting rows) that's more efficient (tested with your first 1000 records). Note that it starts by removing duplicates.
With these first 1000 records > 646k rows were generated (this took about 90 mins on a Core i7 & Excel/PQ 64bit). I haven't tried with your 160k records as I anticipate this will take hourssss to complete - assuming this can complete...

TBH I don't see what else to do/to suggest as I don't see how to approach the problem differently and/or where optimizations are possible. However I'm not an expert and folks with more experience/expertise might have better ideas...

Hope this helps a bit anyway


· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

kirukim-9614 avatar image
0 Votes"
kirukim-9614 answered Lz-3068 commented

@Lz-3068 The workbook is empty, I can't see anything but anyway I don't know wich code did u used to have the result u talked about.
I can wait a week if it will give me the result i want I'm already late but hope it will work finally

· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

@kirukim-9614
With the potential millions of records you should get this won't fit on a worksheet => this must be loaded to the Data Model (Power Pivot). Reason why I setup QueryResult as a Connection only + Add this data to the Data Model. So right-click on QueryResult > Refresh

0 Votes 0 ·
kirukim-9614 avatar image
0 Votes"
kirukim-9614 answered

@Lz-3068 I can't load the Data Model (Power Pivot) cause in my version of excel 2013 it not allowed and I can't change the excel version, I'm just wondering if I should apply the same code that you added recently (Get Data, AddAppele and the QueryResult) and wait for it to be done.
I've a i5-3700U CPU and I tried the same code on a collegue laptop an i7 but I take alot too and I just stopped it

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Lz-3068 avatar image
0 Votes"
Lz-3068 answered

Hi @kirukim-9614

Q: I'm just wondering if I should apply the same code that you added recently (Get Data, AddAppele and the QueryResult) and wait for it to be done
A: Why would you do this given that the output won't fit on an Excel sheet and you can't load to the Data Model?

Alternatively you could do it with Power BI Desktop (pref. the 64 bit version). However, the key question is: what do you want to do afterwards with this output (again, this is going to generate millions of rows...)?
(PBI Desktop allows exporting to .csv)

I'm quite busy but should be able to put something together for you if this can help. Looking back at one of your last replies I saw you considered treating each [TYPE_APPELANT] seperately and this could definitively help as this would reduce the number of records to match (Appelants <-> Appeles)
The way to go would be to split CROSS_REF.csv by [TYPE_APPELANT] to generate 16 .csv. The PBI query would go over each .csv to apply to logic previously discussed and finally combine all records as a single table

Let me know...

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

kirukim-9614 avatar image
0 Votes"
kirukim-9614 answered

@Lz-3068 I need to create a network graph to show me each root (file) and the files related to it and and one of those files name may have a relation with athor files and i need to know the root files the files into it and the files related to those files. It's like a complicated request.
For the thing with the seperate types , I give it some time but it took a lot of time to so i gived up on it I don't know if it's related to the machine cause I was keep in it nights for all the tries

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Lz-3068 avatar image
0 Votes"
Lz-3068 answered kirukim-9614 edited

@kirukim-9614

In this .zip file:
- VBAPQ_SplitCrossRef.xlsm: allows splitting your CROSSREF.csv by [TYPE_APPELANT] and save each sheet as a CSV. Update the 2 parameters (in red) at the top of Sheet1 before clicking the buttons, one after the other
- TEST_FILE.csv: to be copied in a folder (i.e. ...\Testing) for testing purpose (see Recommendation below)
- PBI_AppelantsAppeles.pbix: the PowerBI solution I talked about earlier

When you open the .pibx > Swith to the Data view > Go to the Home tab > Transform Data > Edit parameters:

PIC

  • Set the path to the folder containing the Csv files (i.e. ...\Testing)

  • Set DevelopmentMode to True (initially)

  • Right-click somewhere in the data table > Refresh data

When DevelopmentMode parameter is set to True, only the top 10 records (after removing duplicates) of each CSV are kept. This will allow you to check you can do your next step on a limited number of records

Recommendation
To check the query does what you want, only copy TEST_FILE.csv in the ...\Testing folder initially

Notes
- After testing with TRA.csv, buffering a couple of tables speed up things a bit
- I have no idea at all how much time it will take you to get what you want with the 16 Csv but I wouldn't be suprised this takes days...

Hope this helps...



demo.png (65.3 KiB)
· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

@Lz-3068 I get this error while splitting the sheets when I click at "create Sheets & Queries"

0 Votes 0 ·
Lz-3068 avatar image
0 Votes"
Lz-3068 answered kirukim-9614 edited

@kirukim-9614

The error is due to the Excel version you use (2013). After checking the doc. the WorkbookQuery object was introduced with Office 2016. So none of the Subs in the VBA Create module will work with your version. That's not the most important problem. I asked a friend of mine to have a look to the query I suggested you. And from day1 it doesn't do what you expect :-(

89668-demo.png

Add the orange row to the TEST_FILE.csv I shared yesterday and refresh the query in Power BI (it does the same thing as the Excel version) => The following APPELANT_KEY-APPELE_KEY pairs are missing: E-F, F-E - something you and I should have checked earlier

I'm stuck with this problem and don't see what else to suggest - really sorry about that. Wish you'll find a solution sooner or later (feel free to unmark this thread)



demo.png (17.8 KiB)
· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

@Lz-3068 Thank you very much you really lost your days for me I really appreciate it.

0 Votes 0 ·

@kirukim-9614. Appreciate your feedback & understanding. I didn't waste time at all (IMHO contributing is an opportunity to learn something). I'm more concerned I couldn't help you :( Take care...

1 Vote 1 ·

Thank you @Lz-3068 for your help your a helpful person glade you're in this community..Take care

0 Votes 0 ·