
Forum Discussion

notoriusjack
New member | Level 2
2 years ago

reading parquet file using python sdk

Hi, I am trying to read a parquet file using pandas and vaex. I can successfully read a .csv, but I get the following error message when I try to download the parquet file with dbx.files_download:

dropbox.exceptions.ApiError: ApiError('4ce7ef4f93544a4fa18c29478e1869a8', DownloadError('path', LookupError('not_file', None)))

 

Full code:

import io

import dropbox
import pandas as pd
import vaex

ACCESS_TOKEN = "My_Token"

# Initialize the Dropbox API client
dbx = dropbox.Dropbox(ACCESS_TOKEN)

# download csv file from dropbox
metadata, f_csv = dbx.files_download('/County_test.csv')

# this works for csv and pandas
with io.BytesIO(f_csv.content) as stream:
    df = pd.read_csv(stream, index_col=0)
print(df)

# this works for csv and vaex
with io.BytesIO(f_csv.content) as stream:
    df = vaex.read_csv(stream, index_col=0)
print(df)

# download parquet file from dropbox FAILS
metadata, f_parquet = dbx.files_download('/County_test.parquet')

# this part NOT tested yet
# (index_col dropped: pd.read_parquet does not take that argument)
with io.BytesIO(f_parquet.content) as stream:
    df = pd.read_parquet(stream)
print(df)
  • Greg-DB
    Dropbox Staff

    I see you're getting a 'not_file' error, which means: "We were expecting a file, but the given path refers to something that isn’t a file."

    It looks like the ".parquet" item is not a file, but rather a sort of folder, possibly referred to as a "package" or "bundle" in some environments.

    That being the case, to download it you would instead need to either use files_download_zip (or files_download_zip_to_file) and then unzip the downloaded zip file, or walk through the contents using files_list_folder/files_list_folder_continue and download each nested item individually using files_download (or files_download_to_file). The first option, files_download_zip, is probably better/faster.
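The second option described above, walking the folder, could be sketched roughly as follows. This is only a sketch under stated assumptions: `iter_parquet_paths` is a hypothetical helper name, and filtering on a `.parquet` suffix is an assumption about how the part files inside the folder are named. It follows `has_more` and `cursor` so that a folder with more entries than fit in one page is still listed completely.

```python
def iter_parquet_paths(dbx, folder_path, suffix=".parquet"):
    """Yield the lowercased path of every entry in folder_path whose name ends with suffix.

    dbx is expected to be a dropbox.Dropbox client. The helper name and the
    suffix filter are illustrative assumptions, not part of the SDK.
    """
    result = dbx.files_list_folder(folder_path)
    while True:
        for entry in result.entries:
            # Skip marker or checksum files that some writers place next to the data.
            if entry.name.endswith(suffix):
                yield entry.path_lower
        if not result.has_more:
            break
        # Fetch the next page of entries using the cursor from the previous page.
        result = dbx.files_list_folder_continue(result.cursor)
```

Each yielded path can then be passed to files_download one at a time. That is one HTTP request per part file, which is likely why this route is slower than a single zip download.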

    • notoriusjack
      New member | Level 2

      Thanks for replying. I tried one of your solutions, but it's really slow.

       

      file_list = []
      for entry in dbx.files_list_folder(path="/County.parquet").entries:
          print(entry.path_lower)
          _, dwnld_file = dbx.files_download(entry.path_lower)
          with io.BytesIO(dwnld_file.content) as stream:
              pd_df = pd.read_parquet(stream)  # this works
          vdf = vaex.from_pandas(pd_df)
          del pd_df
          file_list.append(vdf)
      conc_df = vaex.concat(file_list)
      print(conc_df)

      I have also tried dbx.files_download_zip, but I can't find a way to read the data it returns. I get this error: 'utf-8' codec can't decode byte 0x82 in position 12: invalid start byte.
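A 'utf-8' decode error like this usually suggests that the zip payload is being treated as text somewhere, although the code that raised it is not shown here. A zip archive is binary, so one possible approach is to take the raw bytes from the response and open them with the standard zipfile module. The sketch below assumes that files_download_zip returns a (metadata, response) pair whose response exposes .content as bytes, as files_download does in the code above. `parquet_members` and `download_parquet_parts` are hypothetical helper names, and the `.parquet` suffix filter is an assumption about the part file names.

```python
import io
import zipfile


def parquet_members(zip_bytes, suffix=".parquet"):
    """Return {member_name: raw_bytes} for each file in the zip whose name ends with suffix."""
    members = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything that is not a data file.
            if info.is_dir() or not info.filename.endswith(suffix):
                continue
            members[info.filename] = archive.read(info)
    return members


def download_parquet_parts(dbx, folder_path):
    """Download folder_path as one zip and return its parquet members as bytes."""
    # Use the binary body (.content); do not decode it as text.
    _, response = dbx.files_download_zip(folder_path)
    return parquet_members(response.content)
```

Each value in the returned dict can be wrapped in io.BytesIO and passed to pd.read_parquet, in the same way as the per-file loop above, but with a single download request instead of one per part file.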

       
