Dropbox API Support & Feedback

Find help with the Dropbox API from other developers.

Dropbox Python API and Google Cloud Composer

ohienmhen
Explorer | Level 3

I am attempting to orchestrate downloads using the Dropbox Python API and Google Cloud's Cloud Composer. I am trying to download the files from my Dropbox account, unzip them, and then upload them to my BigQuery warehouse for analysis. Does anyone have a more efficient way of accomplishing this? I am currently stuck at downloading my files from Dropbox using the API. My calls appear to succeed, but the files do not appear on my local drive. I have tried every combination I can think of for the download path, including Cloud Storage URLs and local file paths. A sample of the code to download the files is as follows:

 

def download_files_in_folder():
    """Download a file from Dropbox to the local machine."""
    dbx = connect_to_dropbox()
    files = dbx.files_list_folder(dbxPath)

    for entry in files.entries:
        print(entry.path_lower)
        dbx.files_download_to_file(r"Users\damian.ohienmhen\\"+entry.name,entry.path_lower)
        print('downloaded file')
 
Can anyone help me?
 
Addendum: I ran the code as a standalone Python script from the Google Cloud command line. The following code worked:
 
for entry in files.entries:
    print(entry.path_lower)
    dbx.files_download_to_file(r"/home/dohmhen/"+entry.name+entry.name,entry.path_lower)
    print('downloaded file')
Is there a way to specify a file download path while running it in the Cloud Composer environment? This same script fails in the environment with the following message:
 
FileNotFoundError: [Errno 2] No such file or directory: '/home/dohmhen/2022-06-29_7-20-04 pm_hk_1656544804.zip2022-06-29_7-20-04 pm_hk_1656544804.zip'
3 Replies

Greg-DB
Dropbox Staff

Using the files_download_to_file method is a valid way to download a file from Dropbox to your local filesystem. When calling that, the first parameter 'download_path' should be the local filesystem path where you want to save the downloaded file. Regardless of where you're running this, that first parameter is how you specify where on the local filesystem to save the file. (If you'd prefer to handle the data directly, that is, without having the SDK save it to the local filesystem for you, you can use the methods without "to_file", e.g., files_download, as shown in this example.)
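To illustrate the in-memory option mentioned above, here is a minimal sketch. It assumes `dbx` is an already authenticated `dropbox.Dropbox` client and that the file at `dropbox_path` is a zip archive. `files_download` returns a metadata object together with an HTTP response whose `content` attribute holds the file bytes. The helper name `list_zip_members` is hypothetical.

```python
import io
import zipfile


def list_zip_members(dbx, dropbox_path):
    """Download a zip from Dropbox into memory and return its member names."""
    # files_download returns a (metadata, response) pair; nothing is
    # written to the local filesystem.
    _metadata, response = dbx.files_download(dropbox_path)
    # Wrap the raw bytes so zipfile can read them like a file.
    with zipfile.ZipFile(io.BytesIO(response.content)) as archive:
        return archive.namelist()
```

This keeps the whole file in memory, so it is probably only a good fit for small archives.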

 

Refer to any output/error you are getting when you run this to see if there's an issue. If you're not getting any exception thrown, that should indicate that the download succeeded.

 

In your first sample, it looks like you may not be supplying a correct path value for the path you expect though. For instance, you don't have a leading slash so this would be a relative path, while you perhaps intended for it to be an absolute path for your home folder. It looks like there may be an issue with the escaping slashes as well.

 

In your second sample, it looks like writing to the local filesystem failed, perhaps because "/home/dohmhen" doesn't exist on that machine. (It looks like you're also writing entry.name to the path twice.)
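The path problems described for both samples can be checked without calling Dropbox at all. The sketch below uses a hypothetical file name and the standard library's `ntpath` and `posixpath` modules, so the result is the same whichever operating system runs it.

```python
import ntpath
import posixpath

name = "example.zip"  # hypothetical file name, for illustration only

# First sample: no leading slash or drive letter, so the path is relative
# to the current working directory under both Windows and POSIX rules.
first = r"Users\damian.ohienmhen\\" + name
print(ntpath.isabs(first), posixpath.isabs(first))  # False False

# The raw string keeps both trailing backslashes literally.
print(first)

# Second sample: entry.name is appended twice, so the file name is doubled.
second = r"/home/dohmhen/" + name + name
print(posixpath.basename(second))  # example.zipexample.zip
```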

 

In any case, if you want this to save a file to your home folder on the local filesystem, you'd likely be better off with something like this:

 

dbx.files_download_to_file(os.path.join(os.path.expanduser("~"), entry.name), entry.path_lower)
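A slightly more defensive variant of that call is sketched below. It assumes `dbx` is an authenticated client and `entry` is a file entry from `files_list_folder`. The helper name `download_entry` and the `dest_dir` parameter are hypothetical. It creates the destination folder if needed and prints the absolute path, which makes it easier to see where the file actually landed.

```python
import os


def download_entry(dbx, entry, dest_dir=None):
    """Save one Dropbox file under dest_dir and return its absolute local path."""
    # Default to the home folder of whichever user runs the process.
    dest_dir = dest_dir or os.path.expanduser("~")
    # Create the folder first so a missing directory does not raise
    # FileNotFoundError when the SDK opens the target file.
    os.makedirs(dest_dir, exist_ok=True)
    local_path = os.path.abspath(os.path.join(dest_dir, entry.name))
    dbx.files_download_to_file(local_path, entry.path_lower)
    # Report exactly where the file was written and how large it is.
    size = os.path.getsize(local_path)
    print(f"saved {entry.path_lower} -> {local_path} ({size} bytes)")
    return local_path
```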

 

 

As for optimizing this, if you want to download an entire folder, you can use the files_download_zip_to_file method (or files_download_zip) instead. That will allow you to download the folder as a zip in one call, instead of making one call per file.
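As a rough sketch of that approach, assuming `dbx` is an authenticated client: the helper name `download_folder_as_zip` and the local archive name `dropbox_folder.zip` are hypothetical, and the extraction step uses the standard `zipfile` module rather than anything from the Dropbox SDK.

```python
import os
import zipfile


def download_folder_as_zip(dbx, folder_path, dest_dir):
    """Download a Dropbox folder as one zip, extract it, and list its members."""
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, "dropbox_folder.zip")  # hypothetical name
    # One API call for the whole folder instead of one call per file.
    dbx.files_download_zip_to_file(zip_path, folder_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

If the folder itself contains zip files, as in this thread, the extracted members would still be zip archives and would need a second extraction step. The zip endpoint also has size limits, so it is worth checking the API documentation before relying on it for large folders.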

ohienmhen
Explorer | Level 3

I updated the code as follows and ran it both as a standalone script and in my Airflow environment:

 

for entry in files.entries:
    print(entry.path_lower)
    dbx.files_download_to_file(os.path.join(os.path.expanduser("~"), entry.name), entry.path_lower)
    print('downloaded file')
 
It works when I run it as a standalone script. In the Google Cloud Composer environment, no error was logged, i.e. the task ran successfully. However, I cannot find the files that were supposed to download to my computer. Is it because I am using a cloud environment to run the script?
 
This is the complete log produced from my Google Cloud Composer Environment (i.e. Airflow)
 
*** Reading remote log from gs://us-central1-fitnessdata-3bac80e2-bucket/logs/fitness_tracker/list/2022-07-05T19:35:17.822704+00:00/1.log.
[2022-07-05 19:35:34,499] {taskinstance.py:671} INFO - Dependencies all met for <TaskInstance: fitness_tracker.list 2022-07-05T19:35:17.822704+00:00 [queued]>
[2022-07-05 19:35:34,566] {taskinstance.py:671} INFO - Dependencies all met for <TaskInstance: fitness_tracker.list 2022-07-05T19:35:17.822704+00:00 [queued]>
[2022-07-05 19:35:34,567] {taskinstance.py:881} INFO - 
--------------------------------------------------------------------------------
[2022-07-05 19:35:34,568] {taskinstance.py:882} INFO - Starting attempt 1 of 1
[2022-07-05 19:35:34,568] {taskinstance.py:883} INFO - 
--------------------------------------------------------------------------------
[2022-07-05 19:35:34,626] {taskinstance.py:902} INFO - Executing <Task(PythonOperator): list> on 2022-07-05T19:35:17.822704+00:00
[2022-07-05 19:35:34,647] {standard_task_runner.py:54} INFO - Started process 81707 to run task
[2022-07-05 19:35:35,863] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', 'fitness_tracker', 'list', '2022-07-05T19:35:17.822704+00:00', '--job_id', '1961', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/exercise.py', '--cfg_path', '/tmp/tmp3j0kew8z']
[2022-07-05 19:35:35,870] {standard_task_runner.py:78} INFO - Job 1961: Subtask list
[2022-07-05 19:35:36,807] {logging_mixin.py:120} INFO - Running <TaskInstance: fitness_tracker.list 2022-07-05T19:35:17.822704+00:00 [running]> on host airflow-worker-7b8cd9bbf6-ptgxv
[2022-07-05 19:35:36,927] {dropbox_client.py:471} INFO - Request to files/list_folder
[2022-07-05 19:35:37,358] {logging_mixin.py:120} INFO - /apps/rungap/export/2022-06-29_7-20-04 pm_hk_1656544804.zip
[2022-07-05 19:35:37,359] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:38,096] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:38,098] {logging_mixin.py:120} INFO - /apps/rungap/export/2022-06-26_4-34-36 pm_hk_1656275676.zip
[2022-07-05 19:35:38,098] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:38,599] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:38,604] {logging_mixin.py:120} INFO - /apps/rungap/export/2022-06-25_12-00-07 pm_hk_1656172807.zip
[2022-07-05 19:35:38,604] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:39,114] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:39,115] {logging_mixin.py:120} INFO - /apps/rungap/export/2022-06-16_7-52-59 pm_hk_1655423579.zip
[2022-07-05 19:35:39,115] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:39,523] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:39,524] {logging_mixin.py:120} INFO - /apps/rungap/export/2022-05-23_6-08-26 pm_hk_1653343706.zip
[2022-07-05 19:35:39,524] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:39,980] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:39,980] {logging_mixin.py:120} INFO - /apps/rungap/export/2022-05-10_7-38-44 pm_hk_1652225924.zip
[2022-07-05 19:35:39,981] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:41,998] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:41,999] {logging_mixin.py:120} INFO - /apps/rungap/export/2022-04-20_6-32-35 pm_hk_1650493955.zip
[2022-07-05 19:35:42,000] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:42,447] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:42,448] {logging_mixin.py:120} INFO - /apps/rungap/export/2022-04-08_4-33-54 pm_hk_1649450034.zip
[2022-07-05 19:35:42,449] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:43,070] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:43,071] {logging_mixin.py:120} INFO - /apps/rungap/export/2022-03-31_7-59-07 pm_hk_1648771147.zip
[2022-07-05 19:35:43,071] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:43,570] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:43,570] {logging_mixin.py:120} INFO - /apps/rungap/export/2022-03-24_1-42-20 pm_hk_1648143740.zip
[2022-07-05 19:35:43,571] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:43,983] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:43,984] {logging_mixin.py:120} INFO - /apps/rungap/export/2022-03-23_4-51-28 pm_hk_1648068688.zip
[2022-07-05 19:35:43,985] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:44,400] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:44,401] {logging_mixin.py:120} INFO - /apps/rungap/export/2022-03-01_6-02-44 pm_hk_1646175764.zip
[2022-07-05 19:35:44,401] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:44,884] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:44,885] {logging_mixin.py:120} INFO - /apps/rungap/export/2022-01-25_4-47-32 pm_hk_1643147252.zip
[2022-07-05 19:35:44,885] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:45,329] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:45,330] {logging_mixin.py:120} INFO - /apps/rungap/export/2021-12-23_5-44-12 pm_hk_1640299452.zip
[2022-07-05 19:35:45,331] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:45,852] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:45,853] {logging_mixin.py:120} INFO - /apps/rungap/export/2021-12-19_3-36-59 pm_hk_1639946219.zip
[2022-07-05 19:35:45,854] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:46,495] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:46,495] {logging_mixin.py:120} INFO - /apps/rungap/export/2021-12-16_4-19-31 pm_hk_1639689571.zip
[2022-07-05 19:35:46,496] {dropbox_client.py:471} INFO - Request to files/download
[2022-07-05 19:35:47,121] {logging_mixin.py:120} INFO - downloaded file
[2022-07-05 19:35:47,140] {python_operator.py:114} INFO - Done. Returned value was: None
[2022-07-05 19:35:47,205] {taskinstance.py:1058} INFO - Marking task as SUCCESS.dag_id=fitness_tracker, task_id=list, execution_date=20220705T193517, start_date=20220705T193534, end_date=20220705T193547
[2022-07-05 19:35:48,918] {local_task_job.py:102} INFO - Task exited with return code 0

Greg-DB
Dropbox Staff

It looks like there wasn't an error, so the files should have been successfully saved to the home folder. I can't offer help with interacting with the filesystem in your environment though, as that's not made by Dropbox, so I recommend referring to the documentation for your environment for that.

 

For reference, you can see how the SDK writes to the specified location on the local filesystem here.
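One way to narrow this down from inside the task is to log where the process is running and what the path resolves to. The log above shows the task ran on host airflow-worker-7b8cd9bbf6-ptgxv, so "~" would be resolved on that worker rather than on the poster's own computer. The sketch below uses only the standard library, and the helper name `describe_download` is hypothetical.

```python
import os
import socket


def describe_download(local_path):
    """Return details about where a downloaded file ended up."""
    full_path = os.path.abspath(local_path)
    exists = os.path.isfile(full_path)
    return {
        "host": socket.gethostname(),      # machine that ran the task
        "cwd": os.getcwd(),                # base for any relative path
        "home": os.path.expanduser("~"),   # what "~" means for this user
        "path": full_path,
        "exists": exists,
        "size": os.path.getsize(full_path) if exists else None,
    }
```

Printing this dictionary after each download would show up in the Airflow task log. As an assumption to verify against the Cloud Composer documentation: Composer describes a `/home/airflow/gcs/data` folder on workers that is synchronized with the `data/` folder of the environment's bucket, so writing there may make the files reachable from Cloud Storage.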
