Analyzing and Visualizing the WeRateDogs Project

  • This report describes three insights from the analysis of the WeRateDogs project, each with its own visualizations.
  • In particular, it focuses on the retweet_count column and how it correlates with other columns.

1 Distribution of tweets by retweet_count

We first need to understand how the number of tweets is distributed across retweet_count.
As the histogram 'Whole tweets as per retweet_count' shows, only a very small number of tweets are retweeted extremely often.

  • The first histogram shows that only a very small number of tweets have a large retweet_count.
  • Most tweets appear to have a retweet_count below 10000.
[Figure: histogram 'Whole tweets as per retweet_count']
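You can also check this distribution without a plot by binning retweet_count directly and measuring the share of tweets below 10000. The sketch below is a minimal version of that idea. It assumes the cleaned data is a pandas DataFrame with an integer `retweet_count` column. The helper names `retweet_histogram` and `share_below`, the bin width, and the toy rows are illustrative and do not come from the original notebook.

```python
import numpy as np
import pandas as pd


def retweet_histogram(df: pd.DataFrame, bin_width: int = 10000) -> pd.Series:
    """Count tweets per retweet_count bin of width `bin_width` (hypothetical helper)."""
    counts = df["retweet_count"]
    # Edges run from 0 to the first multiple of bin_width above the maximum.
    upper = (int(counts.max()) // bin_width + 1) * bin_width
    edges = np.arange(0, upper + bin_width, bin_width)
    # right=False gives half-open bins [lo, hi), so a tweet with exactly 10000
    # retweets falls in the second bin, consistent with a >= 10000 split.
    binned = pd.cut(counts, bins=edges, right=False)
    # value_counts on a categorical keeps empty bins; sort_index restores bin order.
    return binned.value_counts().sort_index()


def share_below(df: pd.DataFrame, threshold: int = 10000) -> float:
    """Fraction of tweets whose retweet_count is below `threshold` (hypothetical helper)."""
    return float((df["retweet_count"] < threshold).mean())


# Toy data for illustration only; not the WeRateDogs dataset.
toy = pd.DataFrame({"retweet_count": [120, 850, 2300, 4100, 9800, 15000, 72000]})
print(retweet_histogram(toy))
print(share_below(toy))
```

Suppose `df` is the 2068-tweet frame behind this report (1970 + 98 tweets). Then `share_below(df)` would be 1970 / 2068, roughly 0.95, which supports the claim that most tweets sit below 10000.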
  • To examine the relationship between the number of tweets and retweet_count more closely, the tweets are divided into two groups at retweet_count = 10000, since most tweets fall below that value:
    • df.retweet_count < 10000: 1970 tweets belong here
    • in this group, the number of tweets decreases more gradually as retweet_count increases than in the other group
    • df.retweet_count >= 10000: only 98 tweets belong here
    • in this group, the number of tweets drops off sharply as retweet_count increases
[Figures: histograms of retweet_count for the two groups]
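The two-group split can be sketched with boolean masks on the same DataFrame layout. `split_by_retweets` is a hypothetical helper name, and the toy values are made up. The conditions `<` and `>=` are complementary, so each tweet with a non-missing retweet_count lands in exactly one group. The group sizes should therefore add up to the total (1970 + 98 = 2068 tweets in this report).

```python
import pandas as pd


def split_by_retweets(df: pd.DataFrame, threshold: int = 10000):
    """Split tweets into (below, at_or_above) groups on retweet_count (hypothetical helper)."""
    below = df[df["retweet_count"] < threshold]
    at_or_above = df[df["retweet_count"] >= threshold]
    return below, at_or_above


# Toy data for illustration only; not the WeRateDogs dataset.
toy = pd.DataFrame({"retweet_count": [120, 850, 2300, 9999, 10000, 15000, 72000]})
low, high = split_by_retweets(toy)
print(len(low), len(high))

# Each group can then get its own histogram, e.g. low["retweet_count"].hist(bins=20),
# which requires matplotlib to be installed.
```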

2 Distribution of tweets by retweet_count and favorite_count

Here we look at how tweets are distributed by retweet_count and favorite_count, and compute the correlation coefficient between the two.

  • According to the scatter plot of retweet_count against favorite_count, the two have a strong positive relationship.
  • The correlation coefficient is approximately 0.86.
[Figure: scatter plot of retweet_count vs. favorite_count]
  • Actual correlation coefficient value
0.8560484868962263
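The report does not say how this value was computed. Assuming pandas was used, `Series.corr` returns it directly, and it computes the Pearson coefficient by default. The helper name and the toy rows below are illustrative only.

```python
import pandas as pd


def retweet_favorite_corr(df: pd.DataFrame) -> float:
    """Pearson correlation between retweet_count and favorite_count (hypothetical helper)."""
    # Series.corr uses method="pearson" unless another method is passed.
    return float(df["retweet_count"].corr(df["favorite_count"]))


# Toy data for illustration only: favorite_count = 2 * retweet_count + 50 exactly.
toy = pd.DataFrame({
    "retweet_count": [100, 200, 300, 400],
    "favorite_count": [250, 450, 650, 850],
})
print(retweet_favorite_corr(toy))
```

The toy series is perfectly linear, so the result is 1.0 up to floating-point error. Real data such as the report's scatter plot gives a value below 1.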

If a tweet's dog is classified as doggo, floofer, pupper or puppo, are that tweet's retweet_count and favorite_count higher than those of other tweets?

  • Correlation coefficient between retweet_count and favorite_count, by dog stage (doggo, floofer, pupper or puppo)
    • Two groups are defined as below, and they have nearly identical correlation coefficients
    • Tweets with at least one of doggo, floofer, pupper or puppo
    • about 0.88
    • Tweets with none of doggo, floofer, pupper or puppo
    • about 0.88
  • Conclusion: for tweets classified as doggo, floofer, pupper or puppo, the relationship between retweet_count and favorite_count does not differ from that of other tweets
    • Even though both groups have the same correlation coefficient (about 0.88), the shapes of their scatter plots could still differ.
    • However, the scatter plots of the two groups turn out to be similar.
[Figure: scatter plot of retweet_count vs. favorite_count for tweets with at least one dog stage]
  • Actual correlation coefficient for the group with at least one of doggo, floofer, pupper or puppo
0.8782001471840263
[Figure: scatter plot of retweet_count vs. favorite_count for tweets with no dog stage]
  • Actual correlation coefficient for the group with none of doggo, floofer, pupper or puppo
0.879759492217311
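The dog-stage grouping can be sketched as follows. The column names doggo, floofer, pupper and puppo come from the report. The code assumes each column stores the stage name when it applies and the string "None" (or a missing value) otherwise; this is an assumption about the cleaned data and is also flagged in the code. The helper names and toy rows are hypothetical.

```python
import pandas as pd

STAGES = ["doggo", "floofer", "pupper", "puppo"]


def stage_mask(df: pd.DataFrame) -> pd.Series:
    """True where at least one dog-stage column is set (hypothetical helper).

    ASSUMPTION: a stage column holds its stage name when it applies and the
    string "None" (or a missing value) when it does not.
    """
    has_stage = df[STAGES].notna() & (df[STAGES] != "None")
    return has_stage.any(axis=1)


def corr_by_stage(df: pd.DataFrame) -> dict:
    """Pearson retweet/favorite correlation for tweets with and without a stage."""
    mask = stage_mask(df)
    result = {}
    for label, group in (("with_stage", df[mask]), ("without_stage", df[~mask])):
        result[label] = float(group["retweet_count"].corr(group["favorite_count"]))
    return result


# Toy data for illustration only; not the WeRateDogs dataset.
toy = pd.DataFrame({
    "doggo":   ["doggo", "None", "None", "None", "None", "None"],
    "floofer": ["None", "None", "None", "None", "None", "None"],
    "pupper":  ["None", "pupper", "None", "None", "None", "None"],
    "puppo":   ["None", "None", "puppo", "None", "None", "None"],
    "retweet_count":  [100, 200, 300, 150, 250, 350],
    "favorite_count": [300, 500, 800, 400, 700, 900],
})
print(corr_by_stage(toy))
```

Suppose the stage columns were already converted to booleans during cleaning. In that case `stage_mask` reduces to `df[STAGES].any(axis=1)`.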