Forum Discussion
tmachado
5 years ago | New member | Level 2
Using Dropbox with Pyspark Dataframe
Hi all,
I am trying to load a CSV file from Dropbox into a PySpark DataFrame with the following code:
import io
import dropbox

access_token = 'XX'
dbx = dropbox.Dropbox(access_token)
metadata, res = dbx.files_download(path_Const + 'tbEstado.csv')
dEstado = (spark.read.format('csv')
           .option('delimiter', ',')
           .option('header', 'true')
           .load(io.StringIO(res.text)))
dEstado.show(2)
But I am getting this error:
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-142-9f00764596d5> in <module>()
     27 #print(result)
     28 dEstado = (spark.read.format('csv').option('delimiter', ',').option('header', 'true')
---> 29 .load(result))
     30 dEstado.show(2)
     31

/content/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling o1477.load. : java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
    at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:42)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:332)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
How can I correct that? And how can I read multiple files from a Dropbox folder and load them into a PySpark DataFrame?
Thanks
Greg-DB (Dropbox Staff)
Using the files_download method (or files_download_to_file) is the right way to download a file from Dropbox using the Python SDK, and it looks like you already have that part working. That gives you a Python requests Response object that you can read from.
The error itself seems to be occurring on the Py4J/Spark side of things. That's not made by Dropbox though, so I'm afraid I can't offer help with that. Perhaps someone else on the forum here has experience with that, but otherwise you may be better served on a forum for that in particular, or something more general like StackOverflow.
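For anyone landing on this thread later: the `ClassCastException` comes from passing an in-memory `io.StringIO` object to `spark.read.load`, which expects a path string (or a list of path strings). One workaround, sketched below under the assumption that Spark can read from the driver's local disk, is to write the downloaded bytes to a temporary file and give Spark that path. The `save_bytes_to_temp_csv` helper is hypothetical (not part of the Dropbox SDK or PySpark), and the Dropbox/Spark calls are shown as comments since they need a real access token and a running Spark session; `path_Const` is the variable from the question.

```python
import tempfile

def save_bytes_to_temp_csv(data: bytes) -> str:
    """Write raw CSV bytes to a local temporary file and return its path."""
    with tempfile.NamedTemporaryFile(mode="wb", suffix=".csv", delete=False) as f:
        f.write(data)
        return f.name

# --- Usage sketch (needs a real token and a SparkSession) ---
# import dropbox
# dbx = dropbox.Dropbox(access_token)
#
# # Single file: download, save locally, then point Spark at the local path.
# metadata, res = dbx.files_download(path_Const + 'tbEstado.csv')
# dEstado = (spark.read.format('csv')
#            .option('delimiter', ',')
#            .option('header', 'true')
#            .load(save_bytes_to_temp_csv(res.content)))
#
# # Multiple files: list the folder, download each CSV, then load all paths at once.
# local_paths = []
# for entry in dbx.files_list_folder(path_Const).entries:
#     if isinstance(entry, dropbox.files.FileMetadata) and entry.name.endswith('.csv'):
#         _, res = dbx.files_download(entry.path_lower)
#         local_paths.append(save_bytes_to_temp_csv(res.content))
# df_all = spark.read.option('header', 'true').csv(local_paths)
```

Note that `spark.read.csv` accepts a list of paths, so the downloaded files can be combined into one DataFrame in a single call, provided they share the same schema.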