6 posts tagged

redash

Using parameters in Redash

The most convenient and useful thing in Redash is parameters. Parameters can be both in a report and in dashboards.
Parameter – is an element of the interface, controlled by a user producing a report.

Parameter in a report can be of the following types:

Each type explained separately:

  • Text – text input field, can be used in constructions of LIKE type, such parameter can’t be applied in public dashboards
  • Number – number, entered by a user
  • Dropdown list – list of values, from which a user can select just one or several values (not long ago, an option of multi-entry of parameters in a dropdown list appeared)
  • Query dropdown list – similarly to the previous one, however the values will be taken from the existing query results
  • Date / Date and Time / Date and Time (with seconds) – fields of date entry
  • Date Range / Date and Time Range / Date and Time Range (with seconds) – fields of date ranges entry.
    It’s convenient to use for the following construction
between '{{parameter.start}}' and '{{parameter.end}}'

In case of dashboard, a situation might arise, when a parameter is named the same way in several queries, then it will become common for all the dashboard, it’s handy.

One of not obvious, but rather useful solutions: how to make a parameter disabled?
Let’s assume, that parameter of “dropdown list” type is named parameter and we want to set it for a column geo of a table, then the code of the query will look approximately like that:

WHERE
    ('{{parameter}}' = 'Disabled' or geo = '{{parameter}}')

At that, surely, ‘Disabled’ should be added to the values of the dropdown list.

 No comments    27   1 mon   parameter   redash

How to calculate Retention?

In this post we will discover, how to properly construct a report on Retention with application of Redash and SQL language.
For starters, let’s explain in a nutshell what the metric Retention rate is, why it is important,

Retention rate

Retention rate metric is widespread and is particularly popular within the mobile industry, since it allows to understand how well a product engages the users into daily use. Let’s recall (or discover), how Retention is calculated:

Retention of day X – is N% of users that will return to the product on day X. In other words, if on some specific day (day 0) 100 new users came, and 15 returned on the first day, then Retention of the 1st day will be equal to 15/100=15%.
Most commonly, Retention of days 1, 3, 7 and 30 are singled out as the most descriptive metrics of a product, however it’s useful to address Retention curve as a whole and make conclusions, proceeding from it.

Retention curve

In the end, we are interested in construction of such curve, that shows the retention of users from day 0 to day 30.

Retention rate curve from day 0 do day 30

Rolling Retention (RR)

Besides classic Retention rate, Rolling Retention (hereinafter, RR) is allocated. At calculation of RR, apart from day X, all the subsequent days are also considered. Thus, RR of the 1st day – the amount of users who returned on the 1st and subsequent days.

Let’s compare Retention and Rolling Retention of the 10th day:
Retention10 — the amount of users, who returned on the 10th day / the amount of users, who installed the app 10 days ago * 100%.
Rolling Retention10 — the amount of users, who returned on the 10th day or later / the amount of users, who installed the app 10 days ago * 100%.

Granularity (retention of time periods)

In some industries and respective tasks, it is useful to understand the Retention of a specific day (most often, in the mobile industry), in other cases it is useful to understand the retention of users on various time intervals: for example, weekly or monthly periods (oftentimes, it’s handy in e-commerce, retail).

An example of cohorts by months and monthly Retention respective thereto

How to build a Retention report on SQL language?

We have sorted out above how to calculate Retention in formulas. Now let’s apply it with SQL language.
Let’s assume, that we have two tables: user — storing data about users’ identifiers and meta-information, client_session — information on visits of the mobile app by users.
Only these two tables will be present in the query, so you can easily adapt the query to yourself.
note: within this code, I am using Impala as DBMS.

Collecting the size of cohorts

SELECT from_unixtime(user.installed_at, "yyyy-MM-dd") AS reg_date,
          ndv(user.id) AS users
   FROM USER
   WHERE from_unixtime(user.installed_at)>=date_add(now(), -60)
     AND from_unixtime(user.installed_at)<=date_add(now(), -31)
   GROUP BY 1

Let’s sort out this pretty simple query: for every day we calculate the number of unique users for the period [60 days ago; 31 days ago].
In order not to mess with documentation: command ndv() in Impala is analogue of a command count(distinct).

Calculating the number of returned users on each cohort

SELECT from_unixtime(user.installed_at, "yyyy-MM-dd") AS reg_date,
          datediff(cast(cs.created_at AS TIMESTAMP), cast(installed_at AS TIMESTAMP)) AS date_diff,
          ndv(user.id) AS ret_base
   FROM USER
   LEFT JOIN client_session cs ON cs.user_id=user.id
   WHERE 1=1
     AND datediff(cast(cs.created_at AS TIMESTAMP), cast(installed_at AS TIMESTAMP)) between 0 and 30
     AND from_unixtime(user.installed_at)>=date_add(now(), -60)
     AND from_unixtime(user.installed_at)<=date_add(now(), -31)
   GROUP BY 1, 2

In this query, the key part is contained in the command datediff: now we are calculating for each cohort and for each datediff the number of unique users with the very same command ndv() (practically, the number of users, who returned within the days from 0 to 30).

Great, now we have the size of cohorts and the number of returned users.

Combining all together

SELECT reg.reg_date AS date_registration,
       reg.users AS cohort_size,
       cohort.date_diff AS day_difference,
       cohort.ret_base AS retention_base,
       cohort.ret_base/reg.users AS retention_rate
FROM
  (SELECT from_unixtime(user.installed_at, "yyyy-MM-dd") AS reg_date,
          ndv(user.id) AS users
   FROM USER
   WHERE from_unixtime(user.installed_at)>=date_add(now(), -60)
     AND from_unixtime(user.installed_at)<=date_add(now(), -31)
   GROUP BY 1) reg
LEFT JOIN
  (SELECT from_unixtime(user.installed_at, "yyyy-MM-dd") AS reg_date,
          datediff(cast(cs.created_at AS TIMESTAMP), cast(installed_at AS TIMESTAMP)) AS date_diff,
          ndv(user.id) AS ret_base
   FROM USER
   LEFT JOIN client_session cs ON cs.user_id=user.id
   WHERE 1=1
     AND datediff(cast(cs.created_at AS TIMESTAMP), cast(installed_at AS TIMESTAMP)) between 0 and 30
     AND from_unixtime(user.installed_at)>=date_add(now(), -60)
     AND from_unixtime(user.installed_at)<=date_add(now(), -31)
   GROUP BY 1, 2) cohort ON reg.reg_date=cohort.reg_date
    ORDER BY 1,3

We have received the query, that calculates Retention for each cohort, and, eventually, the result can be displayed as follows:

Retention rate, calculated for each cohort of users

Construction of the sole Retention curve

Let’s modify our query a bit and obtain the data for construction of one Retention curve:

SELECT 
       cohort.date_diff AS day_difference,
       avg(reg.users) AS cohort_size,
       avg(cohort.ret_base) AS retention_base,
       avg(cohort.ret_base)/avg(reg.users)*100 AS retention_rate
FROM
  (SELECT from_unixtime(user.installed_at, "yyyy-MM-dd") AS reg_date,
          ndv(user.id) AS users
   FROM USER
   WHERE from_unixtime(user.installed_at)>=date_add(now(), -60)
     AND from_unixtime(user.installed_at)<=date_add(now(), -31)
   GROUP BY 1) reg
LEFT JOIN
  (SELECT from_unixtime(user.installed_at, "yyyy-MM-dd") AS reg_date,
          datediff(cast(cs.created_at AS TIMESTAMP), cast(installed_at AS TIMESTAMP)) AS date_diff,
          ndv(user.id) AS ret_base
   FROM USER
   LEFT JOIN client_session cs ON cs.user_id=user.id
   WHERE 1=1
     AND datediff(cast(cs.created_at AS TIMESTAMP), cast(installed_at AS TIMESTAMP)) between 0 and 30
     AND from_unixtime(user.installed_at)>=date_add(now(), -60)
     AND from_unixtime(user.installed_at)<=date_add(now(), -31)
   GROUP BY 1,2) cohort ON reg.reg_date=cohort.reg_date
    GROUP BY 1        
    ORDER BY 1

Now, we have average by all the cohorts Retention rate, calculated for each day.

More on the subject

 No comments    23   3 mon   analysis   redash   sql   visualisation

Building a funnel-report in redash

So, we’ve been planning to review Funnel-visualization of a report in Redash.
First and foremost, let’s build a request to the data source that we’ve created – Google Analytics.

The following text needs to be placed in the request console:

{
    "ids": "ga:128886640",
    "start_date": "30daysAgo",
    "end_date": "yesterday",
    "metrics": "ga:users,ga:goal1Completions,ga:goal2Completions,ga:goal3Completions"
}

In this request we are asking API Google Analytics to provide data for the last 30 days on the account GA: 128886640. We want to see the number of users and the number of completion of goals 1, 2 and 3.

As a result, our table will look as follows:

ga:users ga:goal1Completions ga:goal2Completions ga:goal3Completions
3,926 105 41 32

Great, that’s right what we need in order to build a funnel.
Now I will tell you about one very useful Redash feature: query-results. In order to connect tables with results of queries’ execution, we go to Data Sources and search for query-results (beta). Connecting new data source.
Now we have an opportunity to refer to results of Redash queries. Thus, for instance, we can use the results of a requests to Google Analytics API.

How to do it?
We need to choose a data source query-results on the left:

Drop down menu with selection of data sources (in the console – on the left)

Now we’ll learn to make funnel-visualization. For this purpose, we write the following SQL-query:

select 'Add a good to the shopping cart' as step_name, ga_goal1Completions as goalCompletion from query_8
union all
select 'View the shopping cart' as step_name, ga_goal2Completions from query_8
union all
select 'Order processing' as step_name, ga_goal3Completions from query_8

In this case query_8 – is the very table with results of request to the data source Google Analytics.

Let’s set visualization:

Carefully selecting parameters, in order to achieve the desired result

As a result, we receive the funnel of conversions from one goal to another:

You can display this funnel in the dashboard and add filters / parameters thereto.
 No comments    12   2018   google analytics   redash   visualisation

How to connect Google Analytics to Redash?

In this article we will take a look at how to connect the data source Google Analytic to the service Redash [We have already examined Redash and its opportunities more thoroughly in the previous notes].

Creating service account in Google

Moving to console of service accounts.

Creating new service account

In the window of account creating we insert the name, forming a new key afterwards. We select that we need JSON key and then press “Create”.

Integrating Analytics API

For the service account we’ve created, we need to integrate Analytics API.

When we’ve set everything up, Analytics API should be of green colour

Adding service user to Google Analytics

Next, we need to create the service user we’ve created to Google Analytics. The user will look approximately as follows:
user@PROJECT-ID.iam.gserviceaccount.com.
It is necessary to add the user with rights to Reading and View.

Creating new data source in Redash

Moving to settings (Settings) -> Adding new data source

Connecting new data source.

We are interested in data source Google Analytics, therefore we search “google”:

Searching google analytics in data sources.

Let’s recall where we’ve saved JSON file, we are going to need it now

Selecting the JSON file created before

Writing a query to the new data source

The query in Redash looks as follows:

{
    "ids": "ga:128886640",
    "start_date": "30daysAgo",
    "end_date": "yesterday",
    "metrics": "ga:users,ga:newUsers,ga:goal1Starts,ga:goal2Completions,ga:goal3Starts,ga:transactions,ga:transactionRevenue", 
    "dimensions": "ga:date"
}

How to know parameters for query execution?

Google has a great resource Query Explorer, in which one can find all the required metrics and measurements, that are available in Google Analytics.

I hope, this instruction was useful for you, further on we will find out how to make a goals funnel in Redash, basing on the data from Google Analytics.

 No comments    19   2018   google analytics   redash

Visualization of data in Redash

It is easy and handy to visualize the information in Redash, and in this post I will review the examples of data display on various charts. All the examples can be found in time series, constructed based on the data for each month.
Since I am all for analytics, along with graphs we will be exploring useful business indicators. Let’s start with quite a traditional metric for retail/e-commerce AOV (Average Order Value) – the average order value (in this case, for one month). The indicator allows to track changes, connected to consumer behaviour (whether they started buying less or more on average).

Example of a bar chart in Redash on the basis of AOV (Average Order Value) indicator

In terms of bar chart display, everything is pretty common, but handy – there is an opportunity to manage the colours of the chart, data labelling, format of data labelling (remove or display the data after comma).
Oftentimes, dynamics is much more evident, if looking either at traditional chart or at so-called area-chart. In this case, we are exploring new users’ dynamics, as well as which part of MAU (Monthly Active Users) belongs to new users.

This is a stacked chart – meaning that data of 2 rows are summarized, and one is shown above another.

In this example our chart is as informative as possible – we make one understand, which share belongs to new users, and, by stacking, we showcase the number of active users per month (in essence, killing two birds with one stone).
Actually, we could present the data in a slightly different way. For example, mix of various chart types is quite popular. Let’s imagine that MAU is represented by a bar chart (green on the chart), and the share of new users from MAU by red line, that is on an auxiliary (right) axis.

Two types of charts on one graph.

With Redash you can make pivot tables, display funnels and cohorts, and also use maps to display geo-data.
In the next posts I will tell you about a chart for funnel construction (however, prior to that we will learn how to involve google analytics).

 No comments    598   2018   redash   visualisation
Earlier Ctrl + ↓