Friday, December 7, 2018

codingdirectional: Remove the duplicate file from nested folders with python

In this article we will continue to develop our Python application, which removes duplicate files within a folder. In the last chapter we removed a duplicate file from another folder; this time we will remove all the duplicate files within nested folders by slightly modifying the previous program. First of all, we will edit the main file so that it replaces the forward slashes in the selected folder path with backslashes to suit the Windows file path format.
from tkinter import *
from tkinter import filedialog
from Remove import Remove

win = Tk() # 1 Create instance
win.title("Multitas") # 2 Add a title
win.resizable(0, 0) # 3 Disable resizing the GUI
win.configure(background='black') # 4 change background color

# 5 Create a label
aLabel = Label(win, text="Remove duplicate", anchor="center")
aLabel.grid(column=0, row=1)
aLabel.configure(foreground="white")
aLabel.configure(background="black")

# 6 Create a selectFile function to be used by button
def selectFile():

    filename = filedialog.askopenfilename(initialdir="/", title="Select file")
    if(filename != ''):
        filename = filename.split('/')[-1] # keep only the file name, the dialog returns the path with '/' separators
        folder = filedialog.askdirectory() # 7 open a folder then create and start a new remove thread to delete the duplicate file
        if(folder != ''):
            folder = folder.replace('/', '\\')
            remove = Remove(folder, aLabel, filename)
            remove.start()

# 8 Adding a Button
action = Button(win, text="Select File", command=selectFile)
action.grid(column=0, row=0) # 9 Position the button
action.configure(background='brown')
action.configure(foreground='white')

win.mainloop()  # 10 start GUI
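The path handling inside selectFile can also be sketched with the standard library instead of manual string operations. This is only a minimal sketch: the helper name split_dialog_paths is hypothetical, and it assumes, as the code above does, that the dialogs return paths written with forward slashes. It uses ntpath, the Windows flavour of os.path, so the result is the same on any platform.

```python
import ntpath  # the Windows implementation of os.path, importable on any platform


def split_dialog_paths(file_path, folder_path):
    """Hypothetical helper: return (file name, Windows style folder path)."""
    # basename keeps only the last path component, like split('/')[-1]
    filename = ntpath.basename(file_path)
    # normpath turns every '/' into '\\', like replace('/', '\\')
    folder = ntpath.normpath(folder_path)
    return filename, folder


# example call with made-up paths
name, folder = split_dialog_paths("C:/Users/me/docs/report.txt", "C:/Users/me/backup")
```

On Windows itself os.path is the same module as ntpath, so os.path.basename and os.path.normpath would behave the same way there.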
Next we will modify the Remove class so that it removes all the duplicate files within nested folders.
import threading
import os

class Remove(threading.Thread):

   def __init__(self, massage, aLabel, filename):

      threading.Thread.__init__(self)
      self.massage = massage
      self.label = aLabel
      self.filename = filename

   def run(self):

      # start the search from the folder which the user has selected
      self.delete_duplicate(self.massage)

   def delete_duplicate(self, folder): # remove the duplicate file in this folder, then search its sub folders

      for name in os.listdir(folder):
         path = os.path.join(folder, name)   # use the full path so there is no need to call os.chdir, which changes the current folder of the whole program
         if(os.path.isfile(path)):
            if(name == self.filename):
               os.remove(path)
         elif(os.path.isdir(path)):   # only go into real folders
            self.delete_duplicate(path)
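The recursive search in the Remove class can also be written with os.walk, which visits a folder and all of its sub folders without any explicit recursion. The function below is only a sketch of that alternative, not part of the program above; the name remove_by_name and the returned list of removed paths are assumptions made for illustration.

```python
import os


def remove_by_name(root_folder, filename):
    """Hypothetical helper: delete every file called filename under root_folder."""
    removed = []
    # os.walk yields (folder, sub folder names, file names) for each folder in the tree
    for current_folder, _subfolders, files in os.walk(root_folder):
        if filename in files:
            path = os.path.join(current_folder, filename)
            os.remove(path)
            removed.append(path)
    return removed
```

Returning the removed paths makes it possible to show the result on the label or to check it in a test, which the thread version above does not do.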
After you have selected a file and then a second folder to search, you can just sit back and wait for the program to find and remove every file with the same name in that second folder, in the folders inside it, and in the folders nested deeper still. The program has successfully removed all the duplicate files within folders that have fewer than 50 files each, without any noticeable delay and with only one new thread. Will the program slow down if there are lots of folders and files to search? We don’t know yet, and we will only modify it if we need more threads to handle the job. Our next goal is to remove a file if and only if its content is the same as that of the selected one. Remember what I wrote in the previous chapter? We simply cannot assume that two files with the same name and the same extension contain the same content. So stay tuned for the next chapter.
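As a rough idea of what the content check could look like, here is a minimal sketch that uses filecmp.cmp from the standard library with shallow=False, so that the bytes of the two files are compared and not just their size and modification time. It is not the code of the next chapter; the function name remove_same_content and the choice to pass the full path of the selected file are assumptions.

```python
import filecmp
import os


def remove_same_content(root_folder, selected_file):
    """Hypothetical helper: delete files with the same name AND the same content."""
    name = os.path.basename(selected_file)
    selected_abs = os.path.abspath(selected_file)
    removed = []
    for current_folder, _subfolders, files in os.walk(root_folder):
        if name not in files:
            continue
        candidate = os.path.join(current_folder, name)
        if os.path.abspath(candidate) == selected_abs:
            continue  # never delete the selected file itself
        # shallow=False forces a comparison of the file contents
        if filecmp.cmp(selected_file, candidate, shallow=False):
            os.remove(candidate)
            removed.append(candidate)
    return removed
```

A file that shares the name but holds different content is left in place, which is the behaviour described above as the next goal.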

from Planet Python