Forum Discussion

New member | Level 2

2 years ago

Solved

reading parquet file using python sdk

Hi, I am trying to read a parquet file using pandas and vaex. I can successfully read a .csv, but I get the following error message when I try to download the parquet file with dbx.files_download : ...

API

Greg-DB
2 years ago
Thanks for following up and sharing that. For reference, the URL parameters for shared links like that are documented here.

notoriusjack

New member | Level 2

2 years ago

When I try:

md, zipFile = dbx.files_download_zip('/County_test.parquet')
with ZipFile(zipFile, 'r') as zip:

The line "with ZipFile(zipFile, 'r') as zip:" fails with AttributeError: 'Response' object has no attribute 'seek'

I can't work out how to solve this.

Greg-DB

Dropbox Community Moderator

2 years ago

The files_download_zip method works like the files_download method, in that the second value it returns is the response object. To access the data from the response object, you would access the 'content' field like you did in your other code snippet. So, in this case, it would be 'zipFile.content'.

Beyond that, refer to the documentation for ZipFile, BytesIO, pandas, etc., for information on using those. Those aren't made by Dropbox so I can't offer support for those in particular.

notoriusjack
New member | Level 2
2 years ago
Thank you for your support. I just managed to do it with dbx.files_download_zip, but it takes more or less the same time to process.
Do you know if pandas or vaex support reading the data directly from a file in Dropbox?

import io
from zipfile import ZipFile

import pandas as pd
import vaex

# dbx is an already authenticated dropbox.Dropbox client
md, zipFile = dbx.files_download_zip('/County.parquet')
file_list = []
with ZipFile(io.BytesIO(zipFile.content), 'r') as zip_ref:
    for file in zip_ref.infolist():
        if file.filename.endswith('.parquet'):
            pd_df = pd.read_parquet(zip_ref.open(file.filename))  # this works
            vdf = vaex.from_pandas(pd_df)
            del pd_df
            file_list.append(vdf)
conc_df = vaex.concat(file_list)
print(conc_df)
- Greg-DB
  Dropbox Community Moderator
  2 years ago
  I can't offer support for pandas or vaex themselves as they are not made by Dropbox. I suggest referring to the documentation for those for information on their capabilities.
  - notoriusjack
    New member | Level 2
    2 years ago
    I understand, and I found out it's possible. I put this here as it might help someone else.
    
    You can use "Copy link" (set the permissions as you like) and pass the URL to pandas.read_csv or pandas.read_parquet to read the dataset.
    However, the copied link will have a 'dl' parameter equal to 0; you have to change it to 1 to make it work. Example:
    # this does not work
    df = pd.read_parquet('https://www.dropbox.com/s/somecode/part.0.parquet?dl=0')
    
    # this works
    df = pd.read_parquet('https://www.dropbox.com/s/somecode/part.0.parquet?dl=1')
    Thank you again for helping out
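For anyone who wants to automate the 'dl' change described above, the following is a minimal sketch using only the standard library. The function name force_direct_download is hypothetical, and the URL is the placeholder from the post, not a real link.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_direct_download(shared_link):
    """Return the shared link with its 'dl' query parameter set to 1."""
    parts = urlsplit(shared_link)
    # Keep every other query parameter and drop any existing 'dl' value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dl"
    ]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(force_direct_download("https://www.dropbox.com/s/somecode/part.0.parquet?dl=0"))
```

The returned string can then be passed to pd.read_parquet or pd.read_csv in place of the original link.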
