Dropbox API Support & Feedback

[Python V2] How to do batch upload?

Explorer | Level 4

Hi! 

I am trying to upload several files using the upload session and batch operations in the Dropbox SDK for Python. I'm trying to do something like this: 

import dropbox

dbx = dropbox.Dropbox("<API KEY>")

commit_info = []
for df in list_pandas_df: 
    df_raw_str = df.to_csv(index=False)
    upload_session = dbx.files_upload_session_start(df_raw_str.encode())
    commit_info.append(
        dropbox.files.CommitInfo(path="/path/to/db/folder.csv"))
dbx.files_upload_session_finish_batch(commit_info)

But I don't fully understand from the documentation how I should pass the commit data and the session info to the "files_upload_session_finish_batch" function. The "files_upload_session_finish" function takes both a commit and a cursor, while the documentation seems to say that the batch version only takes a list of commits. 

My files are not particularly big, but there are a lot of them, which is why I'm not using any of the append operations. Should I be using a cursor at all? I'm a little bit lost here :( 

Re: [Python V2] How to do batch upload?

Dropboxer

[Cross-linking for reference: https://stackoverflow.com/questions/54758978/dropbox-python-api-upload-multiple-files ]

Apologies for the confusion! The Python SDK documentation unfortunately doesn't do a good job identifying the types expected in certain parameters like this; I'll ask the team to work on improving that in the future.

The `files_upload_session_finish_batch` method does work differently than the `files_upload_session_finish` method. The `files_upload_session_finish_batch` method expects a list of `UploadSessionFinishArg`, where each one encapsulates the cursor and commit info together.

Here's a basic working example that shows how to do this:

import dropbox

ACCESS_TOKEN = "..."

dbx = dropbox.Dropbox(ACCESS_TOKEN)

local_file_path = "..."

upload_entry_list = []

for i in range(5):
    # Open in binary mode so the SDK receives bytes and the offset is a byte count
    with open(local_file_path, "rb") as f:
        data = f.read()  # assuming small files
    upload_session_start_result = dbx.files_upload_session_start(data, close=True)
    # The offset is the total number of bytes uploaded to this session
    cursor = dropbox.files.UploadSessionCursor(session_id=upload_session_start_result.session_id,
                                               offset=len(data))
    commit = dropbox.files.CommitInfo(path="/test_329517/%s" % i)
    upload_entry_list.append(dropbox.files.UploadSessionFinishArg(cursor=cursor, commit=commit))

print(dbx.files_upload_session_finish_batch(upload_entry_list))

# then use files_upload_session_finish_batch_check to check on the job
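A minimal sketch of that checking step follows; it was not part of the original reply. It assumes the SDK's union accessors for these types: the value returned by `files_upload_session_finish_batch` exposes `is_complete()`, `get_complete()` and `get_async_job_id()`, the status returned by `files_upload_session_finish_batch_check(async_job_id)` exposes `is_in_progress()`, `is_complete()` and `get_complete()`, and the finished result carries an `entries` list. The helper name `wait_for_batch` and its polling and timeout defaults are hypothetical choices, and newer SDK releases may offer different batch routes.

```python
import time


def wait_for_batch(dbx, launch, poll_interval=1.0, timeout=60.0):
    """Return the per-file result entries of a finished upload batch.

    `launch` is the value returned by dbx.files_upload_session_finish_batch(...).
    """
    # The batch may already be complete, with no async job to poll.
    if launch.is_complete():
        return launch.get_complete().entries

    job_id = launch.get_async_job_id()
    deadline = time.monotonic() + timeout
    while True:
        status = dbx.files_upload_session_finish_batch_check(job_id)
        if status.is_complete():
            return status.get_complete().entries
        if not status.is_in_progress():
            # Neither complete nor in progress: an unexpected status.
            raise RuntimeError("unexpected status for batch job %s" % job_id)
        if time.monotonic() >= deadline:
            raise TimeoutError("batch job %s still in progress" % job_id)
        # Still running: wait before asking again.
        time.sleep(poll_interval)

# Example usage (assumes dbx and upload_entry_list from the loop above):
#   launch = dbx.files_upload_session_finish_batch(upload_entry_list)
#   for entry in wait_for_batch(dbx, launch):
#       if entry.is_success():
#           print("uploaded", entry.get_success().path_display)
#       else:
#           print("failed:", entry.get_failure())
```

Polling with a deadline keeps a stuck job from blocking the script forever; the per-entry check matters because individual files in a batch can fail while others succeed.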

 

Re: [Python V2] How to do batch upload?

Explorer | Level 4

Hi Greg! 

Thanks again for replying so promptly. I have some follow-up questions. I'm looping through Python objects rather than files, which is why I first convert each pd.DataFrame to a string and then pass it to the dbx.files_upload_session_start() function. 

Since each upload is complete in a single call, and I'm not passing a file object (via a context manager or a StringIO), I didn't specify any offset in the cursor. Now that I run the loop, I get the following error: 

Traceback (most recent call last):
  File "/Users/ivan/.pyenv/versions/weather_data/lib/python3.6/site-packages/dropbox/stone_serializers.py", line 337, in encode_struct
    field_value = getattr(value, field_name)
  File "/Users/ivan/.pyenv/versions/weather_data/lib/python3.6/site-packages/dropbox/files.py", line 10278, in offset
    raise AttributeError("missing required field 'offset'")
AttributeError: missing required field 'offset'

 

What are the real advantages of using the batch operations to upload files? It seems like a convoluted approach for objects in memory, as opposed to files. 

Thanks again for your help.

Re: [Python V2] How to do batch upload?

Dropboxer

Regardless of where the data is coming from, the `UploadSessionCursor` object does require an `offset`, in order "to make sure upload data isn’t lost or duplicated in the event of a network error". In your case the `offset` would be the number of bytes you uploaded, i.e. the length of the encoded string (`len(df_raw_str.encode())`), which equals the length of the string itself only when the text is plain ASCII. 
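To make the byte count concrete (an illustration added here, not from the thread): the offset counts bytes, so for CSV text it should be the length of the encoded payload. The hypothetical helper `encode_with_offset` below returns both, assuming UTF-8; the sample string is made up.

```python
def encode_with_offset(text, encoding="utf-8"):
    """Encode text for upload and return (payload, offset).

    The cursor offset when finishing a session is the total number of
    bytes uploaded to it, which can differ from len(text).
    """
    payload = text.encode(encoding)
    return payload, len(payload)


csv_text = "city,temp\nMünchen,21\n"
payload, offset = encode_with_offset(csv_text)
print(len(csv_text), offset)  # prints: 21 22 ("ü" is two bytes in UTF-8)

# In the question's loop this would become (assumes dbx, df and dropbox as there):
#   payload, offset = encode_with_offset(df.to_csv(index=False))
#   start = dbx.files_upload_session_start(payload, close=True)
#   cursor = dropbox.files.UploadSessionCursor(
#       session_id=start.session_id, offset=offset)
```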

The main advantage of using `files_upload_session_finish_batch` is to minimize the number of "locks" needed when uploading multiple files. The Data Ingress Guide covers this in more detail. This applies whether you upload from memory or from files.

The main advantage of using "upload sessions" to begin with is to enable apps to upload large files. If you're just uploading small files, you can certainly do so just using `files_upload`, but you'd need to do so serially to avoid lock contention. Based on the code you provided, you're already uploading serially though, so it may not make much of a difference if you wish to switch to `files_upload` for simplicity.
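For comparison, a minimal sketch of the serial `files_upload` route (added here, not from the thread). It relies only on the client's `files_upload(f, path, ...)` method; the helper name `upload_serially` and the example paths are hypothetical. Extra keyword arguments, such as a `mode` for overwriting existing files, are passed through unchanged.

```python
def upload_serially(dbx, csv_by_path, encoding="utf-8", **upload_kwargs):
    """Upload each CSV string with its own files_upload call, in order.

    csv_by_path maps a Dropbox destination path to CSV text. Extra keyword
    arguments (e.g. mode=...) go straight to files_upload. Returns the
    values returned by each call, in the same order.
    """
    results = []
    # One request at a time, as suggested above, to avoid lock contention.
    for path, text in csv_by_path.items():
        results.append(dbx.files_upload(text.encode(encoding), path, **upload_kwargs))
    return results

# Usage with the DataFrames from the question (paths are hypothetical):
#   csv_by_path = {"/data/frame_%d.csv" % i: df.to_csv(index=False)
#                  for i, df in enumerate(list_pandas_df)}
#   upload_serially(dbx, csv_by_path)
```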
