Excel remove duplicates
Re: Excel remove duplicates
Ok, understand, thanks.
Re: Excel remove duplicates
My plan to sort records in DBF via INDEX UNIQUE failed because my index key is longer than 240 characters,
I also tried Sort, it takes it, but again I would have to deal with comparing a lot of values (35 fields).
I haven't tried the activeX SORT method yet because I'm very curious as to why RemoveDuplicates doesn't want to take my Columns as an array.
I've tried different syntaxes but it doesn't want to accept anything except a single number
For example i tryed this :
x:=array(3)
x[1]:=0
x[2]:=1
x[3]:=2
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates((x),Header:=xlYes)
or :
oSheet:Range("A1:F15"):RemoveDuplicates(Columns:=Array(1,2), Header:=xlYes)
oSheet:Range("A1:F15"):RemoveDuplicates("A:C", Header:=xlYes)
oSheet:Range("A1:F15"):RemoveDuplicates(oSheet:Columns("A:C"), Header:=xlYes)
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates(,Header:=xlYes)
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates(("A:E"),Header:=xlYes)
still get error Invalid index
Does anyone have an example of the source code (not VBA, because it doesn't take it) where the RemoveDuplicates function is used with the columns and xl parameters?
Syntax of VBA is RemoveDuplicates Columns:=varArray1(index), Header:=xlNo
I also tried Sort, it takes it, but again I would have to deal with comparing a lot of values (35 fields).
I haven't tried the activeX SORT method yet because I'm very curious as to why RemoveDuplicates doesn't want to take my Columns as an array.
I've tried different syntaxes but it doesn't want to accept anything except a single number
For example i tryed this :
x:=array(3)
x[1]:=0
x[2]:=1
x[3]:=2
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates((x),Header:=xlYes)
or :
oSheet:Range("A1:F15"):RemoveDuplicates(Columns:=Array(1,2), Header:=xlYes)
oSheet:Range("A1:F15"):RemoveDuplicates("A:C", Header:=xlYes)
oSheet:Range("A1:F15"):RemoveDuplicates(oSheet:Columns("A:C"), Header:=xlYes)
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates(,Header:=xlYes)
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates(("A:E"),Header:=xlYes)
still get error Invalid index
Does anyone have an example of the source code (not VBA, because it doesn't take it) where the RemoveDuplicates function is used with the columns and xl parameters?
Syntax of VBA is RemoveDuplicates Columns:=varArray1(index), Header:=xlNo
Re: Excel remove duplicates
Hi Victorio,
I was wondering what you want to do? It can't be that problem with Xbase++.
There are 34 colums/fields which you want to test if they are duplicated? Do you want to check if all fields are the same, or if ONE of these fields is the same?
An indexkey can't be that long, but you could use an array, depending on the size of the file. If you create an array with { recno() , field1+field2+... } then you can search for duplicates in the array. When you find a duplicate you have the recno() to process as you want.
Is you data in a dbf at start or is it an xls you want to process? Maybe post a sample of your data?
I was wondering what you want to do? It can't be that problem with Xbase++.
There are 34 colums/fields which you want to test if they are duplicated? Do you want to check if all fields are the same, or if ONE of these fields is the same?
An indexkey can't be that long, but you could use an array, depending on the size of the file. If you create an array with { recno() , field1+field2+... } then you can search for duplicates in the array. When you find a duplicate you have the recno() to process as you want.
Is you data in a dbf at start or is it an xls you want to process? Maybe post a sample of your data?
Re: Excel remove duplicates
Hi Chris,
At the beginning I have a DBF (Visual Foxpro structure) and later I generate XLSX from it
I originally wanted to solve it through Excel, but I don't know because of the problem.
I need to delete records that have the same 35 out of 36 fields.
I wanted to avoid working with arrays because I don't know how fast it will be with large databases.
But yesterday I tried to program a different way using DBF using Var2Bin() , it looks like it should work, but I'm just going to test it:
So, this is only first try without code optimization
At the beginning I have a DBF (Visual Foxpro structure) and later I generate XLSX from it
I originally wanted to solve it through Excel, but I don't know because of the problem.
I need to delete records that have the same 35 out of 36 fields.
I wanted to avoid working with arrays because I don't know how fast it will be with large databases.
But yesterday I tried to program a different way using DBF using Var2Bin() , it looks like it should work, but I'm just going to test it:
Code: Select all
SELECT 1
use d:\#POKUSY\TEST_Excel\POKUS2 EXCLUSIVE
go top
do while !eof()
replace unik with .T. //
skip
enddo
go top
prvy:=1
duplicit:=0
lastrec:=""
for i=1 to 36
lastrec:=lastrec+Var2Bin(FieldGet(i))
next
do while !eof()
aktrec:=""
for i=1 to 36
aktrec:=aktrec+Var2Bin(FieldGet(i))
next
if aktrec==lastrec .and. prvy!=1
replace unik with .F.
duplicit++
endif
if prvy=1
prvy=0
endif
lastrec:=aktrec
skip
enddo
close all
Re: Excel remove duplicates
Hi Viktor,
Since you want to compare 35 out of 36 fields for equality, my Excel code can be modified not to compare each column values as it is now, but to compare values of entire rows, or values of a range of columns. So, there will be only one compare statement per each row, and no need to sum values of columns in the range, as in your test program. I believe you test program will work too.
Slavko
Since you want to compare 35 out of 36 fields for equality, my Excel code can be modified not to compare each column values as it is now, but to compare values of entire rows, or values of a range of columns. So, there will be only one compare statement per each row, and no need to sum values of columns in the range, as in your test program. I believe you test program will work too.
Slavko
Best regards,
Slavoljub Damnjanovic
SD-SoftDesign, Alaska Software Technology Partner
https://www.sd-softdesign.com
https://www.sd-softdesign.rs
Slavoljub Damnjanovic
SD-SoftDesign, Alaska Software Technology Partner
https://www.sd-softdesign.com
https://www.sd-softdesign.rs
Re: Excel remove duplicates
How many chars?!
As it is only for unique, you should change method how you determine uniqueness of some record data.
Put everything into expression into function and calculate SHA (hash value) of fields you want to account for unique.
Hash functions are extremely fast in any but musem grade computers.
FUNCTION MySomethingUq()
LOCAL cSomething := ""
LOCAL cHash := ""
//concat long string
cSomething := SOMEFIELD+SOMEFIELD2+DTOS(SOMEDATE)+STRZERO(SOMENUM,15,4)
cHash := Char2Hash( cSomething, 256 ) //you might use 128 too as this is only for comparison not security use
RETURN cHash
DBUseArea(.t.,,"somefile","hashwoes",.t.,.t.)
hashwoes->(dbcreateindex("someuniq","MySomethingUq()",{|| hashwoes->(MySomethingUq()) }, .T.))
Links:
https://doc.alaska-software.com/content ... 2hash.html
As it is only for unique, you should change method how you determine uniqueness of some record data.
Put everything into expression into function and calculate SHA (hash value) of fields you want to account for unique.
Hash functions are extremely fast in any but musem grade computers.
FUNCTION MySomethingUq()
LOCAL cSomething := ""
LOCAL cHash := ""
//concat long string
cSomething := SOMEFIELD+SOMEFIELD2+DTOS(SOMEDATE)+STRZERO(SOMENUM,15,4)
cHash := Char2Hash( cSomething, 256 ) //you might use 128 too as this is only for comparison not security use
RETURN cHash
DBUseArea(.t.,,"somefile","hashwoes",.t.,.t.)
hashwoes->(dbcreateindex("someuniq","MySomethingUq()",{|| hashwoes->(MySomethingUq()) }, .T.))
Links:
https://doc.alaska-software.com/content ... 2hash.html
Victorio wrote: ↑Tue Feb 27, 2024 8:09 am My plan to sort records in DBF via INDEX UNIQUE failed because my index key is longer than 240 characters,
I also tried Sort, it takes it, but again I would have to deal with comparing a lot of values (35 fields).
I haven't tried the activeX SORT method yet because I'm very curious as to why RemoveDuplicates doesn't want to take my Columns as an array.
I've tried different syntaxes but it doesn't want to accept anything except a single number
For example i tryed this :
x:=array(3)
x[1]:=0
x[2]:=1
x[3]:=2
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates((x),Header:=xlYes)
or :
oSheet:Range("A1:F15"):RemoveDuplicates(Columns:=Array(1,2), Header:=xlYes)
oSheet:Range("A1:F15"):RemoveDuplicates("A:C", Header:=xlYes)
oSheet:Range("A1:F15"):RemoveDuplicates(oSheet:Columns("A:C"), Header:=xlYes)
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates(,Header:=xlYes)
oSheet:Range("A1:F15"):Columns("A:F"):RemoveDuplicates(("A:E"),Header:=xlYes)
still get error Invalid index
Does anyone have an example of the source code (not VBA, because it doesn't take it) where the RemoveDuplicates function is used with the columns and xl parameters?
Syntax of VBA is RemoveDuplicates Columns:=varArray1(index), Header:=xlNo
Re: Excel remove duplicates
Yes, i think about use HASH, but now works fine also with Var2Bin without problems.
First I sort DBF to other temporary DBF, then in cycle delete sign duplicates, then delete it,
and final i append to new database.
When I use PACK, program often crashed, because I use append, with this eliminate deleted records.
And modified database I can process with DC_WorkArea2Excel.
At this moment problem is solved, (only still do not know if it is possible direct use RemoveDiplicities in excel.)
Thanks for solution ideas,
Viktor
First I sort DBF to other temporary DBF, then in cycle delete sign duplicates, then delete it,
and final i append to new database.
When I use PACK, program often crashed, because I use append, with this eliminate deleted records.
And modified database I can process with DC_WorkArea2Excel.
At this moment problem is solved, (only still do not know if it is possible direct use RemoveDiplicities in excel.)
Thanks for solution ideas,
Viktor