Forum Discussion

tmachado
New member | Level 2
5 years ago

Using Dropbox with Pyspark Dataframe

Hi all,

I am trying to load a CSV file from Dropbox into a PySpark DataFrame with the following code:

import io
import dropbox
access_token = 'XX'
dbx = dropbox.Dropbox(access_token)
metadata, res = dbx.files_download(path_Const + 'tbEstado.csv')
dEstado = (spark.read.format('csv').option('delimiter', ',').option('header', 'true')
           .load(io.StringIO(res.text)))
dEstado.show(2)
 

But I am getting this error: 

Py4JJavaError                             Traceback (most recent call last)
<ipython-input-142-9f00764596d5> in <module>()
     27 #print(result)
     28 dEstado = (spark.read.format('csv').option('delimiter', ',').option('header', 'true')
---> 29          .load(result))
     30 dEstado.show(2)
     31 

/content/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(
Py4JJavaError: An error occurred while calling o1477.load.
: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:42)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:332)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
	at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

How can I fix this? Also, how can I read multiple files from a Dropbox folder and load them into a PySpark DataFrame?

Thanks

  • Greg-DB
    Dropbox Staff

    Using the files_download method (or files_download_to_file) is the right way to download a file from Dropbox using the Python SDK, and it looks like you already have that part working. That gives you a Python requests Response object that you can read from.
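As a rough sketch of reading from that response: `res.content` holds the raw bytes and `res.text` the decoded string (both are standard attributes of a requests Response). The example below saves the bytes to local files and also covers the multiple-files question by paging through `files_list_folder`. The helper names `download_to_local` and `download_folder_csvs`, and the choice to filter by a `.csv` suffix, are illustrative assumptions and not part of the Dropbox SDK.

```python
import os


def download_to_local(dbx, dropbox_path, local_dir):
    """Download one Dropbox file and return the local path it was saved to."""
    # files_download returns a (FileMetadata, requests.Response) pair.
    metadata, res = dbx.files_download(dropbox_path)
    local_path = os.path.join(local_dir, metadata.name)
    with open(local_path, "wb") as f:
        f.write(res.content)  # raw bytes of the downloaded file
    return local_path


def download_folder_csvs(dbx, folder_path, local_dir):
    """Download every entry in a Dropbox folder whose name ends in .csv."""
    local_paths = []
    result = dbx.files_list_folder(folder_path)
    while True:
        for entry in result.entries:
            # Assumption: anything named *.csv is a file, not a subfolder.
            if entry.name.lower().endswith(".csv"):
                local_paths.append(
                    download_to_local(dbx, entry.path_lower, local_dir)
                )
        if not result.has_more:
            break
        # Large folders are returned in pages; continue from the cursor.
        result = dbx.files_list_folder_continue(result.cursor)
    return local_paths
```

With `dbx = dropbox.Dropbox(access_token)` as in the question, calling `download_folder_csvs(dbx, folder, some_local_dir)` would return a list of local file paths, where `folder` and `some_local_dir` are placeholders for your own Dropbox folder path and an existing local directory.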

     

    The error itself seems to be occurring on the Py4J/Spark side of things. Those aren't made by Dropbox though, so I'm afraid I can't offer help with that. Perhaps someone else on the forum here has experience with them, but otherwise you may be better served on a forum dedicated to Spark, or something more general like StackOverflow.
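A possible direction on the Spark side, offered as a sketch and not a confirmed fix: Spark's `DataFrameReader.load` is documented to take a path string or a list of path strings, not an in-memory object such as `io.StringIO`, so one option is to save the downloaded bytes to disk first and pass the resulting path(s). The helper name `load_csvs` is hypothetical, and the sketch assumes Spark runs in local mode (as a single-machine notebook setup typically does) so that it can see files saved on the same machine.

```python
def load_csvs(spark, local_paths):
    """Read one or more local CSV files into a single Spark DataFrame.

    local_paths may be a single path string or a list of path strings.
    """
    return (
        spark.read.format("csv")
        .option("delimiter", ",")
        .option("header", "true")
        .load(local_paths)
    )
```

For example, `dEstado = load_csvs(spark, local_path)` followed by `dEstado.show(2)`, where `local_path` is wherever the downloaded `tbEstado.csv` was written. Passing a list of paths reads all of the files into one DataFrame, which assumes they share the same columns.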
