Dropbox API Support & Feedback

Find help with the Dropbox API from other developers.

Using Dropbox with Pyspark Dataframe

tmachado
New member | Level 2

Hi all,

I am trying to load a CSV file from Dropbox into a PySpark DataFrame with the following code:

import io
import dropbox

access_token = 'XX'
dbx = dropbox.Dropbox(access_token)
metadata, res = dbx.files_download(path_Const + 'tbEstado.csv')
dEstado = (spark.read.format('csv').option('delimiter', ',').option('header', 'true')
           .load(io.StringIO(res.text)))
dEstado.show(2)
 

But I am getting this error: 

Py4JJavaError                             Traceback (most recent call last)
<ipython-input-142-9f00764596d5> in <module>()
     27 #print(result)
     28 dEstado = (spark.read.format('csv').option('delimiter', ',').option('header', 'true')
---> 29          .load(result))
     30 dEstado.show(2)
     31 

/content/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(
Py4JJavaError: An error occurred while calling o1477.load.
: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:42)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:332)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
	at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

How can I fix this? And how can I read multiple files from a Dropbox folder and load them into a PySpark DataFrame?

Thanks

Accepted Solutions

Greg-DB
Dropbox Staff

Using the files_download method (or files_download_to_file) is the right way to download a file from Dropbox using the Python SDK, and it looks like you already have that part working. files_download returns a (metadata, response) tuple, where the response is a Python requests Response object that you can read the file's contents from.
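As one possible way to read from that Response on the Spark side, here is a minimal sketch, not an official Dropbox or Spark recipe. It assumes `spark` is an existing SparkSession, the file is UTF-8 text, and no quoted field contains a line break (splitting on line breaks would cut such a field in two); `response_to_csv_lines` and `csv_lines_to_df` are hypothetical helper names. The idea is that `DataFrameReader.load()` expects a path or a list of paths rather than a file-like object such as `io.StringIO`, while `spark.read.csv()` also accepts an RDD of strings holding CSV rows.

```python
def response_to_csv_lines(res, encoding="utf-8"):
    """Decode the body of the requests Response returned by
    dbx.files_download() into a list of CSV lines.

    Assumption: the file is text in `encoding` (UTF-8 by default).
    """
    return res.content.decode(encoding).splitlines()


def csv_lines_to_df(spark, lines, delimiter=",", header=True):
    """Build a Spark DataFrame from CSV lines held in memory.

    DataFrameReader.load() wants a path (or list of paths), not an
    io.StringIO object, so parallelize the lines into an RDD of strings,
    which spark.read.csv() accepts as input.
    Caveat: quoted fields containing newlines are not handled, because
    the text was split on line breaks.
    """
    rdd = spark.sparkContext.parallelize(lines)
    return spark.read.csv(rdd, sep=delimiter, header=header)
```

With the names from the question, usage would look roughly like `metadata, res = dbx.files_download(path_Const + 'tbEstado.csv')`, then `dEstado = csv_lines_to_df(spark, response_to_csv_lines(res))` and `dEstado.show(2)`. This keeps the whole file in driver memory, so it suits small files; for large ones, downloading to storage Spark can read by path is the safer route.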

 

The error itself seems to be occurring on the Py4J/Spark side of things. Those aren't made by Dropbox, so I'm afraid I can't offer help with that. Perhaps someone else on the forum has experience with it; otherwise you may be better served on a forum dedicated to Spark, or on a more general site like StackOverflow.
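For the second question (reading several files from one folder), here is a minimal sketch under stated assumptions, not an official recipe. It assumes `dbx` is an authenticated `dropbox.Dropbox` client, `spark` is an existing SparkSession, the CSV files sit directly in the folder (no recursion) and share the same columns; `list_csv_paths` and `read_dropbox_csv_folder` are hypothetical helper names. It pages through `files_list_folder` and `files_list_folder_continue`, saves each CSV locally with `files_download_to_file`, and passes the list of local paths to `spark.read.csv`, which with `header=True` treats the first line of each file as that file's header.

```python
import os


def list_csv_paths(dbx, folder):
    """Return the Dropbox paths of the .csv files directly inside `folder`.

    files_list_folder() returns results in pages, so keep calling
    files_list_folder_continue() with the cursor while has_more is True.
    """
    result = dbx.files_list_folder(folder)
    entries = list(result.entries)
    while result.has_more:
        result = dbx.files_list_folder_continue(result.cursor)
        entries.extend(result.entries)
    # Assumption: no subfolder has a name ending in ".csv". With the SDK
    # you could instead check isinstance(entry, dropbox.files.FileMetadata).
    return [e.path_lower for e in entries if e.name.lower().endswith(".csv")]


def read_dropbox_csv_folder(dbx, spark, folder, local_dir):
    """Download every CSV in a Dropbox folder into `local_dir`, then read
    them all into one Spark DataFrame by passing a list of paths."""
    os.makedirs(local_dir, exist_ok=True)
    local_paths = []
    for dropbox_path in list_csv_paths(dbx, folder):
        local_path = os.path.join(local_dir, os.path.basename(dropbox_path))
        # Signature: files_download_to_file(download_path, path)
        dbx.files_download_to_file(local_path, dropbox_path)
        local_paths.append(local_path)
    if not local_paths:
        raise ValueError("no .csv files found in %r" % folder)
    return spark.read.csv(local_paths, header=True)
```

Local paths only work when every Spark executor can read them, as in local mode; on a multi-node cluster you would copy the files to shared storage (for example HDFS) and pass those paths to `spark.read.csv` instead.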
