<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Using Dropbox with Pyspark Dataframe in Dropbox API Support &amp; Feedback</title>
    <link>https://www.dropboxforum.com/t5/Dropbox-API-Support-Feedback/Using-Dropbox-with-Pyspark-Dataframe/m-p/428641#M22805</link>
    <description>Using Dropbox with Pyspark Dataframe</description>
    <pubDate>Wed, 10 Jun 2020 20:56:32 GMT</pubDate>
    <dc:creator>tmachado</dc:creator>
    <dc:date>2020-06-10T20:56:32Z</dc:date>
    <item>
      <title>Using Dropbox with Pyspark Dataframe</title>
      <link>https://www.dropboxforum.com/t5/Dropbox-API-Support-Feedback/Using-Dropbox-with-Pyspark-Dataframe/m-p/428641#M22805</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;I am trying to load a CSV file from Dropbox into a PySpark DataFrame with the following code:&lt;/P&gt;
&lt;PRE&gt;import dropbox
access_token = 'XX'
dbx = dropbox.Dropbox(access_token)
metadata, res = dbx.files_download(path_Const + 'tbEstado.csv')
dEstado = (spark.read.format('csv').option('delimiter', ',').option('header', 'true')
           .load(io.StringIO(res.text)))
dEstado.show(2)&lt;/PRE&gt;
&lt;P&gt;But I am getting this error:&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;Py4JJavaError                             Traceback (most recent call last)
&amp;lt;ipython-input-142-9f00764596d5&amp;gt; in &amp;lt;module&amp;gt;()
     27 #print(result)
     28 dEstado = (spark.read.format('csv').option('delimiter', ',').option('header', 'true')
---&amp;gt; 29          .load(result))
     30 dEstado.show(2)
     31 

/content/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--&amp;gt; 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(
&lt;/PRE&gt;
&lt;PRE&gt;&lt;SPAN&gt;Py4JJavaError&lt;/SPAN&gt;&lt;SPAN&gt;: An error occurred while calling o1477.load.
: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:42)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:332)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
	at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;How can I fix this? And how can I read multiple files from a Dropbox folder and load them into a PySpark DataFrame?&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jun 2020 20:56:32 GMT</pubDate>
      <guid>https://www.dropboxforum.com/t5/Dropbox-API-Support-Feedback/Using-Dropbox-with-Pyspark-Dataframe/m-p/428641#M22805</guid>
      <dc:creator>tmachado</dc:creator>
      <dc:date>2020-06-10T20:56:32Z</dc:date>
    </item>
    <item>
      <title>Re: Using Dropbox with Pyspark Dataframe</title>
      <link>https://www.dropboxforum.com/t5/Dropbox-API-Support-Feedback/Using-Dropbox-with-Pyspark-Dataframe/m-p/428655#M22806</link>
      <description>&lt;P&gt;Using &lt;A href="https://dropbox-sdk-python.readthedocs.io/en/latest/api/dropbox.html#dropbox.dropbox.Dropbox.files_download" target="_self"&gt;the&amp;nbsp;files_download method&lt;/A&gt; (or&amp;nbsp;&lt;A href="https://dropbox-sdk-python.readthedocs.io/en/latest/api/dropbox.html#dropbox.dropbox.Dropbox.files_download_to_file" target="_self"&gt;files_download_to_file&lt;/A&gt;) is the right way to download a file from&amp;nbsp;Dropbox using the Python SDK, and it looks like you already have that part working. That gives you a &lt;A href="https://2.python-requests.org/en/master/api/#requests.Response" target="_self"&gt;Python requests Response object&lt;/A&gt; that you can read from.&lt;/P&gt;
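&lt;P&gt;As a minimal sketch of reading from that response (an illustration only, not something from the Dropbox SDK docs): the downloaded bytes are available as res.content, and since Spark's DataFrameReader.load() takes a path string or a list of path strings rather than a file-like object, one option is to write the bytes to a local file first. The helper name save_bytes_for_spark and the sample bytes are hypothetical stand-ins, and this assumes Spark runs in local mode so it can read the driver's filesystem.&lt;/P&gt;

```python
import os
import tempfile


def save_bytes_for_spark(content, suffix=".csv"):
    """Write downloaded bytes to a temporary file and return its path.

    Hypothetical helper: the caller is responsible for deleting the file.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(content)
    return path


# Stand-in for res.content from dbx.files_download(...); made-up sample data.
sample = b"id,nome\n1,Acre\n2,Bahia\n"
csv_path = save_bytes_for_spark(sample)

# With a live SparkSession (assumed to be named spark), the path could then
# be handed to the reader, for example:
# dEstado = spark.read.format('csv').option('header', 'true').load(csv_path)
```

&lt;P&gt;For several files, the same helper could be called once per download and the resulting list of paths passed to load().&lt;/P&gt;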
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The error itself seems to be occurring on the Py4J/Spark side of things. Those aren't made by Dropbox, though, so I'm afraid I can't offer help with them. Perhaps someone else on this forum has experience with them; otherwise, you may be better served by a forum dedicated to Spark, or a more general one like Stack Overflow.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jun 2020 21:03:52 GMT</pubDate>
      <guid>https://www.dropboxforum.com/t5/Dropbox-API-Support-Feedback/Using-Dropbox-with-Pyspark-Dataframe/m-p/428655#M22806</guid>
      <dc:creator>Greg-DB</dc:creator>
      <dc:date>2020-06-10T21:03:52Z</dc:date>
    </item>
  </channel>
</rss>

