Discuss Dropbox Developer & API

Searching for a specific file type

Bin2
New member | Level 2

Hi there,

I'll ask for some patience and forgiveness in advance. I'm about 2 weeks into Python development, so I'm likely missing some obvious approaches - please don't assume a lot of knowledge on my part if you can help.

 

Objective: I've been tasked with 'crawling' through folders on Dropbox via the API to look for certain image types (specific file extensions only - *.dco for reference - as I won't have knowledge of the file names), and then extracting the path and filename (and then 'do stuff'). Locally I have already completed this code (in other words, if the files are on my computer it works fine) - but now it needs to work in Dropbox as well because the data sets will be quite large. I cannot assume that the files in the folders will be the types I want - hence I need to search by filename extension.

 

I have access to the Dropbox account and authorization sorted. I've created a folder and put in some temp files (the PDFs and JPEGs provided by Dropbox for testing). I can query the folder and return a list of results via files_list_folder.

I can get a list of files, extensions and paths via the code below - however, the issue I am having is that I cannot filter the data by file extension, and the rudimentary methods I am using are not working.

 

While I can get a list of files, and even 'copy' them to another list, I cannot find a way to parse the list to give me path_lower (the directory and filename), which would give me the extension (i.e. find all *.jpg files). I must be missing something about how the data is constructed (I understand it's instance/object based). I assume I'm not hitting on the correct keyword combinations to extract the data, so I'm looking for some help in identifying where I'm going wrong. Thanks in advance!

 

import dropbox
from dropbox import Dropbox

my_client = Dropbox(token)
# list the whole account recursively, including media info
folderfile_list = my_client.files_list_folder('', recursive=True, include_media_info=True)

#this gives me a nice list of items - however I dont seem to be able to *do* anything with it
for item in folderfile_list.entries:
    if isinstance(item, dropbox.files.FileMetadata):
        name = item.name
        fileID = item.id
        fileHash = item.content_hash
        path = item.path_lower
        print(name, path)

#This does return the search results I want - but its not iterable - so I dont seem to be able to do anything with it
files_search = my_client.files_search('', '*.pdf')
print(files_search)

type(files_search)
Out[325]: <class 'dropbox.files.SearchResult'>


#this returns nothing
for files in folderfile_list.entries:
    if files.path_lower == '*.jpg':
        print("yes")

#this returns nothing
for item in folderfile_list.entries:
    if item.path_lower == '*.jpg':
        print("I got it")
    else:
        print("still nothing")

#this also doesnt work
import fnmatch
pattern ='*.jpg'
matching = fnmatch.filter(folderfile_list.entries, pattern)
print(matching)

fname = []
for i in folderfile_list.entries:
    fname.append(i)
print(fname[1])

import fnmatch
pattern ='*.jpg'
matching = fnmatch.filter(fname, pattern)
print(matching)
#this did work - however I cannot find a file TYPE with this - the specific file
#name I can find - but not the file extension
#In other words if I change this to *.pdf - it does not get a 'happy' result :(
for files in fname:
    if files.path_lower == '/test folder/strategy-session-hotel.pdf':
        print("happy")
        print(files.path_lower)
    else:
        print('unhappy')

 

 

 

2 Replies

Bin2
New member | Level 2

I believe I have solved my own problem - in case anyone else needs it. It's not pretty - but it works.

 

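import fnmatch

# dbx is the authenticated Dropbox client (my_client in the original post)
spot = []
holder = dbx.files_list_folder('/Test Folder')
print(holder)
# collect the lower-cased path of every entry as a plain string
for files in holder.entries:
    spot.append(files.path_lower)

print(spot)

# fnmatch works here because spot holds strings, not Metadata objects
pattern = '*.jpg'
matching = fnmatch.filter(spot, pattern)
print(matching)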

['/test folder/az-car-rental.jpg', '/test folder/il-car-rental.jpg', '/test folder/car-rental-invoice.jpg', '/test folder/dinner-receipt.jpg', '/test folder/lunch-receipt.jpg', '/test folder/meal-receipt.jpg', '/test folder/meetup-dinner.jpg', '/test folder/team-offsite-lunch.jpg', '/test folder/training-airfare.jpg', '/test folder/training-hotel-invoice.jpg', '/test folder/travel-meal.jpg']

Greg-DB
Dropbox Staff

I'm glad to hear you already got this working. You have the right idea in that you can call files_list_folder to list the contents of a folder, and then check the Metadata.path_lower (or Metadata.name) for the returned entries to see if the file extension is one you're interested in.

 

Note though that you should also implement files_list_folder_continue to make sure you can receive all of the entries. Check out the files_list_folder documentation for more information.
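
For anyone following along, here is a rough sketch of what that pagination loop could look like. It assumes `client` is an authenticated `dropbox.Dropbox` instance; the helper name `iter_folder_entries` is made up for illustration and is not part of the SDK. The idea is to keep calling files_list_folder_continue with the returned cursor until has_more is false.

```python
def iter_folder_entries(client, path="", recursive=True):
    """Yield every entry under `path`, following the cursor across pages.

    `client` is assumed to be an authenticated dropbox.Dropbox instance;
    this helper itself is hypothetical, not an SDK function.
    """
    result = client.files_list_folder(path, recursive=recursive)
    while True:
        # hand back the entries from the current page
        yield from result.entries
        if not result.has_more:
            break
        # fetch the next page using the cursor from the previous result
        result = client.files_list_folder_continue(result.cursor)


# Possible usage (requires the dropbox package and a valid token):
# import dropbox
# dbx = dropbox.Dropbox(token)
# for entry in iter_folder_entries(dbx, "/Test Folder"):
#     print(entry.path_lower)
```

Writing it as a generator means the caller can filter entries as they arrive rather than building the full list first, which may help with large folders.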

 

Also, one alternative for your file extension check may be to use the 'endswith' method like this:

files.path_lower.endswith(".jpg")
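
As a slightly fuller sketch of the same idea: `str.endswith` also accepts a tuple, so several extensions can be checked in one call. The function name `paths_with_extension` and the sample objects below are made up for illustration; real entries would come from files_list_folder.

```python
from types import SimpleNamespace


def paths_with_extension(entries, extensions=(".jpg",)):
    """Return path_lower for each entry whose path ends with one of `extensions`."""
    exts = tuple(ext.lower() for ext in extensions)
    matches = []
    for entry in entries:
        # path_lower can be missing or None, so guard before matching
        path = getattr(entry, "path_lower", None)
        if path and path.endswith(exts):
            matches.append(path)
    return matches


# Stand-in objects for illustration only; they mimic the path_lower attribute
sample = [
    SimpleNamespace(path_lower="/test folder/lunch-receipt.jpg"),
    SimpleNamespace(path_lower="/test folder/strategy-session-hotel.pdf"),
]
print(paths_with_extension(sample, (".jpg", ".jpeg")))
# prints ['/test folder/lunch-receipt.jpg']
```

Because path_lower is already lower-cased, lower-casing the extensions makes the check effectively case-insensitive.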