Dropbox API Support & Feedback

Find help with the Dropbox API from other developers.

Using Dropbox with Pyspark Dataframe


New member | Level 2

Hi all,

I am trying to load a CSV file from Dropbox into a PySpark DataFrame with the following code:

import io
import dropbox
access_token = 'XX'
dbx = dropbox.Dropbox(access_token)
metadata, res = dbx.files_download(path_Const + 'tbEstado.csv')
dEstado = (spark.read.format('csv').option('delimiter', ',').option('header', 'true')
           .load(io.StringIO(res.text)))
dEstado.show(2)
 

But I am getting this error: 

Py4JJavaError                             Traceback (most recent call last)
<ipython-input-142-9f00764596d5> in <module>()
     27 #print(result)
     28 dEstado = (spark.read.format('csv').option('delimiter', ',').option('header', 'true')
---> 29          .load(result))
     30 dEstado.show(2)
     31 

/content/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(
Py4JJavaError: An error occurred while calling o1477.load.
: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:42)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:332)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
	at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:49
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:23
	at java.lang.Thread.run(Thread.java:74

How can I fix this? Also, how can I read multiple files from a Dropbox folder and load them into a PySpark DataFrame?

Thanks

1 Accepted Solution


Re: Using Dropbox with Pyspark Dataframe

Dropboxer

Using the files_download method (or files_download_to_file) is the right way to download a file from Dropbox using the Python SDK, and it looks like you already have that part working. files_download gives you the file metadata together with a Python requests Response object that you can read the content from.
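
As a rough sketch of the download step described above: files_download returns a (metadata, response) pair, and the raw bytes are available as response.content. Spark's load() takes a path string or a list of path strings rather than an in-memory object such as io.StringIO, so one option is to write the bytes to a local file first. The helper name download_to_temp_csv is made up for illustration, and the commented-out Spark call assumes Spark runs in local mode so that it can see the temporary file.

```python
import os
import tempfile


def download_to_temp_csv(dbx, dropbox_path, directory=None):
    """Download one Dropbox file and save it locally; return the local path.

    `dbx` is expected to be a dropbox.Dropbox client (hypothetical helper,
    not part of the Dropbox SDK).
    """
    # files_download returns (FileMetadata, requests Response).
    metadata, res = dbx.files_download(dropbox_path)
    directory = directory or tempfile.mkdtemp()
    local_path = os.path.join(directory, metadata.name)
    # res.content holds the raw bytes of the downloaded file.
    with open(local_path, "wb") as f:
        f.write(res.content)
    return local_path


# Possible usage (assumes an existing `dbx` client and `spark` session):
# local_path = download_to_temp_csv(dbx, path_Const + 'tbEstado.csv')
# dEstado = spark.read.csv(local_path, header=True, sep=',')
# dEstado.show(2)
```

Writing to a file keeps the Spark call on a code path it supports (reading from a path). On a multi-node cluster the file would need to live on storage that every executor can reach.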

 

The error itself seems to be occurring on the Py4J/Spark side of things. That's not made by Dropbox though, so I'm afraid I can't offer help with that. Perhaps someone else on the forum here has experience with that, but otherwise you may be better served on a forum for that in particular, or something more general like StackOverflow.
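
For the second part of the question (several files in one Dropbox folder), the Dropbox side could look roughly like the sketch below. It lists the folder with files_list_folder, follows pagination with files_list_folder_continue, and saves each CSV with files_download_to_file. The helper name download_folder_csvs and the folder path in the usage comment are hypothetical, entries are filtered by file name only, and the final Spark call is an untested assumption that relies on Spark running in local mode.

```python
import os
import tempfile


def download_folder_csvs(dbx, folder_path, directory=None):
    """Download every .csv entry in a Dropbox folder; return local paths.

    `dbx` is expected to be a dropbox.Dropbox client (hypothetical helper,
    not part of the Dropbox SDK).
    """
    directory = directory or tempfile.mkdtemp()

    # List the folder; large folders come back in pages.
    result = dbx.files_list_folder(folder_path)
    entries = list(result.entries)
    while result.has_more:
        result = dbx.files_list_folder_continue(result.cursor)
        entries.extend(result.entries)

    local_paths = []
    for entry in entries:
        # Name-based filter only; it does not distinguish files from folders.
        if not entry.name.lower().endswith(".csv"):
            continue
        local_path = os.path.join(directory, entry.name)
        dbx.files_download_to_file(local_path, entry.path_lower)
        local_paths.append(local_path)
    return local_paths


# Possible usage (assumes an existing `dbx` client and `spark` session):
# paths = download_folder_csvs(dbx, '/some/folder')
# df = spark.read.csv(paths, header=True, sep=',')
```

Passing the list of paths to a single read call yields one DataFrame, which only makes sense if all the files share the same columns.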

