
Working with Materialized Views in Clickhouse

Estimated read time – 7 min

This time we’ll illustrate how to pass data on Facebook ad campaigns to Clickhouse tables with Python and implement materialized views. What are materialized views, you may ask? Clickhouse is often used to handle large amounts of data, and the time spent waiting for a response from a table with raw data keeps growing. Usually we would address this with an ETL process or by creating aggregate tables, which are inconvenient because they have to be updated regularly. Clickhouse offers another way to meet the challenge: materialized views.
Materialized views store data on disk, kept up to date according to the SELECT query that defines the view. Whenever data is inserted into the source table, that SELECT transforms it and populates the materialized view.

Setting Up Amazon EC2 instance
We need to connect the Python script that we created in this article to Clickhouse. The script will make queries, so let’s open several ports. In your AWS Dashboard go to Network & Security — Security Groups. Our instance belongs to the launch-wizard-1 group. Click it and pay attention to the Inbound rules; set them as shown in this screenshot:

Setting up Clickhouse
It’s time to set up Clickhouse. Let’s edit the config.xml file using the nano text editor:

cd /etc/clickhouse-server
sudo nano config.xml

Learn more about nano’s shortcuts here, in case exiting the editor is a mystery too :)

Uncomment this line to make your database accessible from any IP address:
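The uncommented line itself was lost in formatting here; in a default Clickhouse config.xml this is the listen_host setting, so the result should look roughly like this:

```xml
<listen_host>::</listen_host>
```

With :: Clickhouse listens on all IPv4 and IPv6 addresses, so make sure the security group rules above limit who can actually reach ports 8123 and 9000.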

Create a table and its materialized view
Open a terminal window to create the database and its tables:

CREATE DATABASE db1
USE db1

We’ll refer to the same example of collecting data from Facebook. The data on ad campaigns may often change and be updated; with this in mind, we want to create a materialized view that would automatically update the aggregate tables containing the cost data. Our Clickhouse table will look almost the same as the DataFrame used in the previous post. We picked ReplacingMergeTree as the engine for our table: it removes duplicates that share the same sorting key:

CREATE TABLE facebook_insights(
	campaign_id UInt64,
	clicks UInt32,
	spend Float32,
	impressions UInt32,
	date_start Date,
	date_stop Date,
	sign Int8
) ENGINE = ReplacingMergeTree
ORDER BY (date_start, date_stop)

And then, create a materialized view:

CREATE MATERIALIZED VIEW fb_aggregated
ENGINE = SummingMergeTree()
ORDER BY date_start
POPULATE AS
	SELECT campaign_id,
	       date_start,
	       sum(spend * sign) as spent,
	       sum(impressions * sign) as impressions,
	       sum(clicks * sign) as clicks
	FROM facebook_insights
	GROUP BY date_start, campaign_id

More details are available in the Clickhouse blog.

Unfortunately for us, Clickhouse doesn’t include a familiar UPDATE method, so we need a workaround. The Yandex team suggests inserting the outdated rows with a negative sign first and then using the sign for reversing: with this approach, the old data cancels out when summing.
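To see the arithmetic behind this trick, here is a tiny standalone sketch (plain Python, no Clickhouse needed; the numbers are invented):

```python
# Rows of (campaign_id, spend, sign): the stale row is re-inserted with
# sign = -1, and the corrected row arrives with sign = +1.
rows = [
    (1, 100.0, 1),    # original (stale) row
    (1, 100.0, -1),   # cancelling row
    (1, 120.0, 1),    # updated row
]

# This is what sum(spend * sign) in the materialized view effectively computes:
spent = sum(spend * sign for _, spend, sign in rows)
print(spent)  # 120.0 (only the updated value survives)
```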

Let’s start writing the script by importing a new library called clickhouse_driver, which allows making queries to Clickhouse from Python:

We are using an updated version of the script from “Collecting Data on Facebook Ad Campaigns”, but it will work fine if you just combine this code with the previous one.

from datetime import datetime, timedelta
from clickhouse_driver import Client
from clickhouse_driver import errors

An object of the Client class enables us to make queries with the execute() method. Type your public DNS in the host field, port 9000, specify default as the user, and the database for the connection.

client = Client(host='', user='default', password=' ', port='9000', database='db1')

To ensure that everything works as expected, we need to write the following query that will print out names of all databases stored on the server:

client.execute('SHOW DATABASES')

In case of success the query will return this list:

[('_temporary_and_external_tables',), ('db1',), ('default',), ('system',)]

For example, we want to get data for the past three days. Create several datetime objects with the datetime library and convert them to strings using the strftime() method:

date_start = - timedelta(days=3)
date_end = - timedelta(days=1)
date_start_str = date_start.strftime("%Y-%m-%d")
date_end_str = date_end.strftime("%Y-%m-%d")

This query returns all table columns for a certain period:

SQL_select = f"select campaign_id, clicks, spend, impressions, date_start, date_stop, sign from facebook_insights where date_start > '{date_start_str}' AND date_start < '{date_end_str}'"

Make a query and pass the result to old_data_list. Then replace each row’s sign with -1 and append the rows to new_data_list:

new_data_list = []
old_data_list = client.execute(SQL_select)

for elem in old_data_list:
    elem = list(elem)
    elem[-1] = -1          # sign is the last column
    new_data_list.append(elem)

Finally, the algorithm: insert the old data with sign = -1, run OPTIMIZE so that ReplacingMergeTree removes the duplicates, and then INSERT the fresh data with sign = 1.

SQL_query = 'INSERT INTO facebook_insights VALUES'
client.execute(SQL_query, new_data_list)

SQL_optimize = "OPTIMIZE TABLE facebook_insights"
client.execute(SQL_optimize)

# the insight_* lists come from the Facebook collection script in the previous post
for i in range(len(insight_campaign_id_list)):
    client.execute(SQL_query, [[insight_campaign_id_list[i],
                                insight_clicks_list[i],
                                insight_spend_list[i],
                                insight_impressions_list[i],
                                datetime.strptime(insight_date_start_list[i], '%Y-%m-%d').date(),
                                datetime.strptime(insight_date_stop_list[i], '%Y-%m-%d').date(),
                                1]])

Get back to Clickhouse and run the following query to view the first 20 rows:
SELECT * FROM facebook_insights LIMIT 20

And run SELECT * FROM fb_aggregated LIMIT 20 to compare it with our materialized view:

Nice work! Now we have a materialized view that will be updated each time the data in the facebook_insights table changes. The trick with the sign operator allows distinguishing already-processed data and prevents it from being summed twice, while the ReplacingMergeTree engine removes the duplicates.

 No comments    660   2020   Amazon Web Services   Analytics engineering   AWS   clickhouse   data analytics   python

Clickhouse as a consumer for Amazon MSK

Estimated read time – 4 min

Disclaimer: this note is rather technical, so it might be less interesting to readers with a business background.

This blog hasn’t addressed the topic of Clickhouse yet, although it’s one of the fastest databases, developed at Yandex. A brief account without going into details: Clickhouse is arguably the most efficiently implemented columnar DBMS in terms of its code; the system is described quite thoroughly in the documentation and in multiple videos on Youtube (one, two, three).

Over the last four years I’ve been using Clickhouse in my practice as an analyst and as an expert in building analytical reporting. I’ve mostly used Redash for visualization tasks (parameterized reports, dashboards), as it offers the most convenient interface for accessing Clickhouse data.
However, just recently Looker, which I wrote about previously, added the ability to connect Clickhouse as a data source. It’s worth noting that Tableau has supported connections to Clickhouse for quite a while.

The architecture of an analytical service based on Clickhouse is predominantly cloud-based, and that was the case in the task reviewed here. Let’s assume you have an EC2 instance allocated in Amazon (on which you’ve installed Clickhouse) and a separate Kafka cluster (the Amazon MSK solution).

The task: connect Clickhouse as a consumer to obtain information from the brokers of your Kafka cluster. The Amazon MSK documentation describes quite thoroughly how to connect to a Kafka cluster, so I won’t repeat that information. In my case the guide helped: the topics were created by a producer on the machine with Clickhouse installed, and were read from it by a consumer.

However, a problem arose: when connecting Clickhouse to Kafka as a consumer, the following error occurred:

2020.02.02 18:01:56.209132 [ 46 ] {e7124cd5-2144-4b1d-bd49-8a410cdbd607} <Error> executeQuery: std::exception. Code: 1001, type: cppkafka::HandleException, e.what() = Local: Timed out, version = (official build) (from (in query: SELECT * FROM events), Stack trace (when copying this message, always include the lines below):

For a long time I searched the Clickhouse documentation for a possible cause of this error, in vain. My next idea was to check how a local Kafka broker works from the same machine. I installed a Kafka client, connected Clickhouse, sent data to a topic and on to Clickhouse, and managed to read it easily: the Clickhouse consumer works with a local broker, so it works in general.

Having spoken with all my acquaintances who are experts in infrastructure and Clickhouse, we weren’t able to identify the cause of the problem offhand. We checked the firewall and network settings; everything was open. It was also confirmed by the fact that messages could be sent from the local machine to the topic of the remote broker with the bin/ producer command and read from there with the bin/ consumer command.

Thereafter I got the idea to appeal to the main guru and developer of Clickhouse, Alexey Milovidov. Alexey eagerly tried to answer the questions and proposed a number of hypotheses that we checked (for instance, tracing network connections), but even after a more low-level audit we didn’t manage to localize the problem. Then Alexey recommended turning to Mikhail Filimonov from Altinity. Mikhail turned out to be an extremely responsive expert and proposed one hypothesis after another for testing (in parallel providing tips on better ways of testing).

As a result of our joint efforts, we discovered that the problem arises in the librdkafka library, since the other package that uses it, kafkacat, fails to connect to the broker with the very same error (Local: Timed out).

After examining the connection through bin/ and the connection parameters, Mikhail advised adding the following line to /etc/clickhouse-server/config.xml:
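The line itself was stripped from this page; based on the standard Amazon MSK setup (brokers accept TLS connections) and the librdkafka symptoms above, the missing setting is most likely the Kafka security protocol, something like:

```xml
<kafka>
    <security_protocol>ssl</security_protocol>
</kafka>
```

Treat this as a reconstruction, not the verbatim original: check the Clickhouse Kafka engine settings against your MSK cluster’s listener configuration.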


And, oh, what a miracle! Clickhouse connected to the cluster and pulled the required data from the broker.

I hope this recipe and my experience will save you time and effort when studying a similar problem :)

P.S. Clickhouse actually has a very friendly community and a Telegram chat where you can ask for advice and will most likely get help.


Looker Overview

Estimated read time – 10 min

Today we are going to talk about the BI platform Looker, which I had a chance to work with during 2019.

Here is a short outline of the article for quick and convenient navigation:

  1. What is Looker?
  2. Which DBMS you can connect to via Looker and how?
  3. Building of Looker ML data model
  4. Explore Mode (data research on the model built)
  5. Building of reports and their saving in Look
  6. Examples of dashboards in Looker

What is Looker?

Looker’s creators position it as business-intelligence software and a big-data analytics platform that helps to explore, analyze and share business analytics in real time.
Looker is a really convenient tool and one of the few BI products that allows working with pre-built data cubes in real time (actually, relational tables described in a Look ML model).
An engineer working with Looker needs to describe the data model in the Look ML language (something between CSS and SQL), publish this data model, and then set up reporting and dashboards.
Look ML itself is pretty simple; the links between data objects are set by a data engineer, which consequently allows using the data without knowledge of SQL (to be precise, the Looker engine generates the SQL code itself on the user’s behalf).

Just recently, in June 2019, Google announced acquisition of Looker platform for $2.6 billion.

Which DBMS you can connect to via Looker and how?

The selection of DBMS that Looker works with is pretty wide. You can see the various connections in the screenshot below, as of October 2019:

Available DBMS for connection

You can easily set a connection to the database via web-interface:

Web-interface of connection to DBMS

Regarding database connections, I’d like to highlight two facts. First, unfortunately, Clickhouse support from Yandex was missing at the time, and seemed unlikely to appear, considering that Looker was acquired by a competitor, Google.
updated: actually, Looker supports Clickhouse since December 2019
The second nuisance is that you can’t build one data model that spans different DBMS. Looker has no built-in storage that could combine query results (unlike, say, Redash).
This means the analytical architecture should be built within one DBMS (preferably a fast one, or on aggregated data).

Building of Looker ML data model

In order to build a report or a dashboard in Looker, you first need to set up a data model. The syntax of the Look ML language is quite thoroughly described in the documentation. Personally, I can add that the model description doesn’t require long immersion for a specialist who knows SQL; rather, one needs to readjust one’s approach to data model preparation. The Look ML language looks a lot like CSS:

Console of Look ML model creation

The data model defines the links between tables, keys, granularity, and which fields are facts and which are dimensions. For facts, the aggregation is specified. Naturally, when creating a model one can use various IF / CASE expressions.
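As an illustration only (the table and field names below are invented, not from a real project), a minimal Look ML view with dimensions and an aggregated measure might look like this:

```lookml
view: orders {
  sql_table_name: public.orders ;;

  dimension: id {
    primary_key: yes
    type: number
    sql: ${TABLE}.id ;;
  }

  # a dimension group: date/week/month "measurements" from one field
  dimension_group: created {
    type: time
    timeframes: [date, week, month]
    sql: ${TABLE}.created_at ;;
  }

  # a "fact" with its aggregation spelled out
  measure: total_revenue {
    type: sum
    sql: ${TABLE}.amount ;;
  }
}
```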

Explore mode

This is probably Looker’s main killer feature, since it allows any business department to get data without involving analysts / data engineers. And that’s probably also why accounts with Explore mode are billed separately.

In fact, Explore mode is an interface that allows using the configured Look ML data model to select the required metrics and dimensions and build a custom report / visualization.
For example, we want to understand how many actions of any kind were performed in Looker’s interface last week. Using Explore mode, we select the Date field and apply a filter to it: last week (Looker is quite smart here, and it’s enough to write ‘Last week’ in the filter). Then we choose “Category” from the dimensions and Quantity as the metric. After pressing the Run button, the ready report is generated.

Building report in Looker

Then, using the data received in the table form, you can set up the visualization of any type.
For example, Pie chart:

Applying visualization to report

Building of reports and their saving in Look

Sometimes you may want to save a data set / visualization built in Explore and share it with colleagues; for this purpose Looker has a separate entity, the Look. That is a ready-made report with selected filters / dimensions / facts.

Example of the saved Look

Examples of dashboards in Looker

As the collection of saved Looks grows, you often want a ready overview of key metrics that could be displayed on a single page.
Dashboards fit this purpose perfectly. A dashboard is created either on the fly or from previously created Looks. One of the dashboard “tricks” is the configuration of parameters that change across the whole dashboard and can be applied to all its Looks at the same time.

Interesting features in one line

  • In Looker you can link to other reports, and with this function you can create a dynamic parameter that is passed on via the link.
    For example, you’ve created a report dividing revenue by country, and in this report you can link to a dashboard for a single country. Following the link, the user sees the dashboard for the specific country they clicked on.
  • On every Looker page there is a chat where the support service answers very promptly.
  • Looker can’t merge data across different DBMS; however, it can combine data at the level of ready-made Looks (in our case this function worked really strangely).
  • While working with various models, I came across an extremely non-trivial use of SQL for counting unique values in a non-normalized data table; Looker calls it symmetric aggregates.
    The SQL, indeed, looks very non-trivial:

 SELECT
  order_items.order_id AS "order_items.order_id",
  order_items.sale_price AS "order_items.sale_price",
  COALESCE(CAST( (
    SUM(DISTINCT CAST(FLOOR(COALESCE(users.age, 0)*(1000000*1.0)) AS DECIMAL(38,0))
      + CAST(STRTOL(LEFT(MD5(CONVERT(VARCHAR, users.id)), 15), 16) AS DECIMAL(38,0)) * 1.0e8
      + CAST(STRTOL(RIGHT(MD5(CONVERT(VARCHAR, users.id)), 15), 16) AS DECIMAL(38,0)))
    - SUM(DISTINCT CAST(STRTOL(LEFT(MD5(CONVERT(VARCHAR, users.id)), 15), 16) AS DECIMAL(38,0)) * 1.0e8
      + CAST(STRTOL(RIGHT(MD5(CONVERT(VARCHAR, users.id)), 15), 16) AS DECIMAL(38,0)))
  ) AS DOUBLE PRECISION) / CAST((1000000*1.0) AS DOUBLE PRECISION), 0)
  / NULLIF(COUNT(DISTINCT CASE WHEN users.age IS NOT NULL THEN users.id
  ELSE NULL END), 0) AS "users.average_age"
 FROM order_items AS order_items
 LEFT JOIN users AS users ON order_items.user_id = users.id

  • When purchasing Looker, the JumpStart Kit is mandatory; it costs no less than $6k. Within this kit you receive support and consulting from Looker during tool implementation.

Metrics for marketing analytics

Estimated read time – 19 min


Today we have a long-read, supported by the Telegram channel Russian marketing, on the subject of analytical metrics in marketing. In this article we’ll discuss what marketing analytics is needed for, which metrics one should use when calculating marketing efficiency, and how to structure the work of building marketing reporting. Moreover, we’ll touch upon high-level KPIs, discuss a quite popular framework, and sort out how to compute important analytical indicators. The article came out rather voluminous and uses various abbreviations, so we couldn’t do without a glossary.

  • Glossary
    • Revenue / Income / Sales — revenue, income (rub. / $ / euro)
    • GMV — Gross merchandise value (rub. / $ / euro)
    • MAU — Monthly active users (persons)
    • WAU — Weekly active users (persons)
    • DAU — Daily active users (persons)
    • Requests — Requests (of advertising) (units)
    • Impressions — Displays (of advertising) (units)
    • Clicks — Clicks (on advertising) (units)
    • FR — Fill rate ( =Impressions / Requests) (%)
    • CTR — click through rate ( =Clicks / Impressions) (%)
    • C1 — conversion to first purchase (%)
    • R, R1, R3, R7 — retention (of the 1st, the 3rd, the 7th day) (%)
    • RR (rolling retention) (%)
    • Churn — Outflow (%)
    • ARPU — average revenue per user (rub. / $ / euro)
    • ARPPU — average revenue per paying user (rub. / $ / euro)
    • cARPU — cumulative average revenue per users (rub. / $ / euro)
    • LTV (lifetime value) / CLV (customer lifetime value) — Customer lifetime value
    • ROI — return on investment (%)
    • ROAS — return on advertising spend (%)
    • ROMI — return on marketing investment (%)
    • CPA — cost per action (for example, purchase or installation of an app) (rub. / $ / euro)
    • CPC — cost per click (rub. / $ / euro)
    • CPO — cost per order (rub. / $ / euro)
    • CPS — cost per sale (rub. / $ / euro)
    • CPM (cost per mille) — cost per thousand advertising displays (rub. / $ / euro)
    • CAC — customer acquisition cost (rub. / $ / euro)
    • CARC — customer acquisition and retention cost (rub. / $ / euro)
    • SAC — share of advertising costs (%)

What marketing analytics is needed for?

To figure out the analytical metrics, we first need to understand why we need marketing analytics and which questions it can answer. In general, marketing analytics is the research and measurement of marketing activity in quantitative indicators. Most often, the aim is to evaluate marketing efficiency and calculate the return on the company’s marketing investments.
Marketing analytics helps to find answers to the following questions:

  • How efficiently is the marketing budget allocated?
  • What ROMI do various marketing channels provide?
  • Which target audience converts most effectively?
  • Which communication channels are the most / the least profitable?
  • What is the biggest source of the company’s profit?

Marketing analysis is best initiated by identifying the key business indicators and the links between them; we’ll talk about this a bit later. In general, work on building marketing analytics resembles creating a system of proper metrics, then planning, measuring, and reacting to changes in these metrics. The PDCA cycle is described more thoroughly in W. Deming’s book “Out of the Crisis”, which I recommend getting familiar with.

Key principles of building proper analytics in marketing

Applying a systematic approach to the analysis of data affecting marketing activity can help marketers solve problems, eliminate pain points, and provide recommendations for further strategic marketing steps. A systematic approach implies abiding by a number of key principles, without which the analytics will turn out incomplete.

The analysis of marketing data should be performed by a professional who is familiar with the basics of mathematical statistics and econometrics and, obviously, can calculate and interpret the results and draw conclusions relevant to the specific business (i.e. understands the industry). Only then can the analytics be fruitful; otherwise, incorrect conclusions drawn from the data can even exacerbate the situation, leading not to budget optimization but to its devastation.

When solving a task, one should look at the data affecting the problem from various points of view. Different indicators and different aggregations of the data make it possible to look at the problem objectively. Ideally, the same conclusion should be confirmed by the data at least twice.

When studying current problems, one shouldn’t operate on outdated retrospective data; the world is changing extremely fast, and so is the situation on the market / in a company. An analysis held a year ago might yield absolutely different results today, so reports and the data in them need to be updated on a regular basis.

The results of the analysis should be clear to a business person not familiar with technical terminology. In a perfect scenario, each report helps to sort a problem out wisely and pushes the reader toward obvious conclusions. A situation where the reader is forced to dig into a huge pile of graphs, unclear charts, and pages of numbers without any conclusions is unacceptable.

These principles will surely help in hiring competent analysts and building proper reporting.

How to structure indicators?

The system of metrics that helps evaluate marketing effectiveness can be built from several considerations. One of the key approaches to structuring is the customer lifecycle. Let’s try to sort it out and discuss one of the interesting frameworks for building such a system of metrics. The following major stages can be identified in the life cycle of a customer:
1) Audience acquisition — a marketer’s work starts even before the potential audience turns into the company’s clients
2) Engagement — the stage of converting users who came to a website / mobile app into registered customers
3) Monetization — the stage of forming paying users (out of the registered ones)
4) Retention / Churn — activities dedicated to developing and retaining the acquired audience and decreasing the churn level

Method AARRR / Pirate Metrics

In 2007 Dave McClure developed and proposed a method called AARRR, a system of metrics allowing start-ups to get to the bottom of their business indicators. Another name you may come across is “pirate metrics”, because the abbreviation is pronounced in a pirate way: “aarrr!”.
So, let’s sort the approach out and discuss the metrics corresponding to each step of the “funnel”. The abbreviation consists of 5 key marketing stages:

  • Acquisition — corresponds to stage 1 above
  • Activation — corresponds to stage 2 above
  • Retention — corresponds to stage 4 above
  • Revenue — corresponds to stage 3 above
  • Referral — recommendations (a more recently introduced stage)

At the entrance to the funnel sits the target audience we want to acquire. Then we do everything possible to register a potential buyer and turn him into a client (by that moment, a person who visited our site / app should have realized the value of our product). Thereafter, the client makes purchases and returns to us again and again. In the end, if he really likes our product, he will recommend it to his friends and acquaintances.

AARRR-funnel, pirate metrics (image source)

On every level of the funnel you need to choose metrics, describing transition from one state to another, that we can calculate and analyze. Let’s have an insight into each of the steps and metrics that correspond thereto. We’ll be studying each based on the example of real organizations, in order for calculation of indicators to be maximally comprehensible in practice.


Acquisition

Reaching potential customers is the key stage of forming a new audience. Let’s examine this crucial stage using the example of some mobile app and its traffic acquisition channels. Oftentimes, the audience comes to an application from several different sources:

  1. Organic traffic: search Google, Yandex, Bing, etc
  2. Organic mobile traffic: search in Apple Store / Google Play
  3. Commercial traffic: advertising in Facebook / Instagram, context advertising (Adwords), mobile advertising networks.

We’ll discuss it using the example of an ad on Facebook. Each advertisement is targeted at a potential audience, which is called “Reach” in Facebook terminology. We can optimize ad impressions for clicks / conversions / etc. Our task is to get the most effective audience possible for a minimal amount of money. Consequently, we need to select metrics that will help us evaluate efficiency. Let’s study them:

  • Impressions — the number of ad displays; by itself the indicator won’t say much and is strongly tied to the size of the potential audience, but we’ll need it to understand other metrics.
  • Clicks — the number of clicks on the ad; in absolute figures it also depends on the number of impressions.
  • Installs — the number of clients who installed the mobile app.
  • CTR — click-through rate, calculated as the Clicks / Impressions ratio; shows how efficient our ad is in terms of audience interest, in other words, how clickable it is.
  • CR (conversion rate) (= Installs / Clicks) — conversion level; shows what percent of users installed the app out of those who clicked on the ad.
  • Spend — the amount of money we’ve spent on this ad.
  • CPC (= Spend / Clicks) — the cost of one click. This indicator should be used in comparison with other ads / market benchmarks.
  • CPM (= Spend / Impressions * 1000) — the cost of a thousand impressions of an ad. Also used for comparison with other ads / benchmarks.
  • CPI (= Spend / Installs) — the unit cost of one install.
  • Revenue — the final revenue we receive from this advertisement / campaign (you need tools for proper attribution).
  • ROAS (= Revenue / Spend) — return on advertising spend, gross income per dollar spent. The metric shows the efficiency of an advertising campaign in terms of the money invested in it. For example, if ROAS equals 300%, every $1 spent earned $3; if ROAS equals 30%, every $1 spent earned 30 cents.
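To make the formulas above concrete, here is a quick sketch in Python (all the campaign numbers are made up for illustration):

```python
# Hypothetical campaign numbers
impressions = 100_000
clicks = 2_000
installs = 200
spend = 400.0      # $
revenue = 1_200.0  # $ attributed to this campaign

ctr = clicks / impressions * 100   # 2.0 (%)
cr = installs / clicks * 100       # 10.0 (%)
cpc = spend / clicks               # 0.2 ($ per click)
cpm = spend / impressions * 1000   # 4.0 ($ per 1000 impressions)
cpi = spend / installs             # 2.0 ($ per install)
roas = revenue / spend * 100       # 300.0 (%): every $1 spent earned $3
```

A row of such values per creative is exactly what goes into the comparison table below.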

Thus, we already have quite a good palette of metrics to work with: examining their dynamics, comparing advertisements with each other and across traffic sources. For example, a simple table containing these indicators is already a first approximation to understanding advertising efficiency.

Facebook Campaign Efficiency

Advertisement Spend ($) Installs CPI Impressions CPM Clicks CTR CPC ROAS
Creative-1 x x x x x x x x x
Creative-2 x x x x x x x x x

This table can be rearranged so that dates go vertically and the campaign is selected from a filter; then we’ll start seeing the dynamics of the key traffic acquisition indicators.

Summary: We can measure the CTR of various banners and understand which of them is more appealing to the audience. This indicator can be used in A/B testing of the same banner to select the most effective variant. When calculating effectiveness, keep CPC in mind along with CTR, in order to select not just the most clickable banner, but also not the most expensive one.

The key KPIs in terms of money are CPI / ROAS: the first shows how cheaply or expensively we procure traffic, and the second how well the procured traffic is monetized.


Activation

Let’s assume we are developing a mobile game. What can user activation mean in this case? We have acquired users who installed the game on their smartphones. Our next task is to register the user (make him a player) and propose an introduction tour for him to pass.
At this stage, two metrics can be considered key: conversion into a registered user (= Registrations / Installs) and conversion into users who passed the tutorial (= Tutorial Users / Installs).

Consequently, these two metrics will show whether we demand too much from a user at the registration stage or, vice versa, whether registration is very simple. The second metric shows how comprehensible the introduction to the game is, whether users are interested in passing the introduction tour, and whether we require a reasonable number of actions from them.

Moreover, the last metric can be decomposed: if a user needs to perform several actions within the tutorial, we can examine the funnel of conversions into each action and find the problematic spots in new-user activation. After activating our audience, we need to keep it, so that we can make money thereafter.
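With invented numbers, the two activation conversions are simply:

```python
installs = 10_000
registrations = 6_500
tutorial_users = 4_000

reg_conversion = registrations / installs * 100        # 65.0 (%)
tutorial_conversion = tutorial_users / installs * 100  # 40.0 (%)
```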


Retention

Any organization would prefer to have an active base of loyal clients who make repeat orders on a regular basis. That’s why it’s extremely important to track a few key metrics: retention rate (or rolling retention) and churn. I covered building retention and rolling retention reports more thoroughly in one of the recent blog articles.

Sticky Factor, the level of user engagement, can be considered another crucial and fundamental metric. The Sticky Factor for a week is calculated rather easily: DAU / WAU * 100%. Let’s sort it out in more detail using the previous example. As before, we have a table client_session, which stores the activity timestamps created_at for every user_id. The Sticky Factor is then easily calculated with the following SQL query:

SELECT daily.dau / weekly.wau * 100 AS sticky
FROM
-- Calculating the average DAU for a week
( SELECT avg(dau) AS dau
  FROM (SELECT from_unixtime(cs.created_at, "yyyy-MM-dd") AS event_date,
               ndv(cs.user_id) AS dau
        FROM client_session cs
        WHERE from_unixtime(cs.created_at) >= date_add(now(), -7)
          AND from_unixtime(cs.created_at) <= now()
        GROUP BY 1) d) daily,
-- Calculating WAU for the same week
( SELECT ndv(cs.user_id) AS wau
  FROM client_session cs
  WHERE from_unixtime(cs.created_at) >= date_add(now(), -7)
    AND from_unixtime(cs.created_at) <= now() ) weekly

Along with the fundamental metrics, we should also address the metrics connected to client-retention tools. These are the tools of direct marketing: SMS, email, and push notifications. Each tool is usually described by the following metrics: number of messages sent, number of messages delivered, and number of users returned. Together they show the efficiency of each tool.
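As a quick sketch (all numbers are invented), these per-channel efficiency figures are simple ratios:

```python
def channel_efficiency(sent, delivered, returned):
    """Per-channel funnel rates: delivery rate and return (re-engagement) rate, in %."""
    return {
        "delivery_rate": delivered / sent * 100,
        "return_rate": returned / delivered * 100,
    }

# Hypothetical push-notification campaign: 10,000 sent, 9,000 delivered, 450 users returned.
push = channel_efficiency(10_000, 9_000, 450)
print(push)  # {'delivery_rate': 90.0, 'return_rate': 5.0}
```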


Finally, we have reached the key metric that interests every business user: money. Revenue and profit are the money we receive from users who buy our product. In absolute figures this metric is not very indicative on its own, but it is important for understanding future trends.

Most often, the following relative metrics are used to describe user behaviour:
ARPU (= Revenue / Users) — average revenue per user
cARPU (= cumulative Revenue / Users) — cumulative average revenue per user
ARPPU (= Revenue / Payers) — average revenue per paying user
Avg Receipt (= Revenue / Purchases) — average check
LTV / CLV — aggregate revenue per user (lifetime value of a customer)
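The formulas above translate directly into code. Here is a minimal sketch with invented revenue figures:

```python
def arpu(revenue, users):
    return revenue / users          # average revenue per user

def arppu(revenue, payers):
    return revenue / payers         # average revenue per paying user

def avg_receipt(revenue, purchases):
    return revenue / purchases      # average check

# Hypothetical month: $5,000 revenue, 1,000 users, 100 payers, 250 purchases.
print(arpu(5000, 1000))        # 5.0
print(arppu(5000, 100))        # 50.0
print(avg_receipt(5000, 250))  # 20.0
```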

I plan to dedicate a separate post to LTV, since it is quite a broad subject. In this post we'll look at ARPU, cumulative ARPU, and their connection to LTV. ARPU shows how much we earn per user on average over some period of time (usually a week or a month). This is useful information, but it may not be enough. The task of efficient marketing is to attract users who bring the company more money than was spent on attracting them. So if we modify ARPU and look at cumulative ARPU over, say, 30, 60, 90, or 180 days, we get quite a good approximation of a user's LTV. It is even better to build a curve of cumulative ARPU by day.
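Building the cumulative ARPU curve can be sketched as a running total of a cohort's revenue divided by the cohort size (the cohort size and daily revenue below are invented for illustration):

```python
from itertools import accumulate

def cumulative_arpu(daily_revenue, cohort_size):
    # Running total of the cohort's revenue divided by the cohort size:
    # each point is the average revenue earned per user by that day of life.
    return [total / cohort_size for total in accumulate(daily_revenue)]

# Hypothetical cohort of 200 users and its revenue over the first 5 days of life.
curve = cumulative_arpu([100, 60, 40, 30, 20], cohort_size=200)
print(curve)  # [0.5, 0.8, 1.0, 1.15, 1.25]
```

The curve grows monotonically, and with enough days it approximates the cohort's LTV.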

Curve of cumulative ARPU

By adding CPI as a horizontal line, we obtain a chart that is extremely useful for understanding. The intersection of the two lines marks the day from which the revenue from a user exceeds the cost of acquiring them, i.e. acquisition becomes effective. In the example above that is the 56th day of the customer's life. Solving this task is essentially a break-even analysis, but keep in mind that the company also bears indirect costs, which must be included to calculate the break-even point correctly.
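Finding the intersection point can be sketched as the first day of life on which cumulative ARPU covers CPI (the curve and CPI below are illustrative only; in the article's example the answer was day 56):

```python
def breakeven_day(cumulative_arpu, cpi):
    """Return the 1-based day of life on which cumulative ARPU first covers CPI, or None."""
    for day, value in enumerate(cumulative_arpu, start=1):
        if value >= cpi:
            return day
    return None

# Hypothetical cumulative ARPU curve with diminishing daily growth.
curve = [0.3, 0.55, 0.75, 0.9, 1.02, 1.12, 1.2]
print(breakeven_day(curve, cpi=1.0))  # 5 — acquisition pays off on day 5
```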


The best scenario of client interaction, and the highest reward for the company, is when clients recommend its products to friends, relatives, and acquaintances. As for metrics, two stand out: the number of activated invited users per client, and NPS.

The number of activated referrals lets us afford a higher CAC / CPI. For example, suppose we acquire a user for $1 and want to keep the effective cost at that level. We implement referral-link mechanics and find that an average user now invites two more users, so each paid acquisition yields three users and the effective cost per user becomes $1 / 3 = $0.33. Consequently, we can afford to spend up to $3 per acquired user while keeping the effective CAC at a level acceptable to us.
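The arithmetic above generalizes easily: the acquisition cost is divided by the paid user plus however many users their referrals bring.

```python
def effective_cac(paid_cac, referrals_per_user):
    """Effective cost per user when each paid user invites extra users."""
    return paid_cac / (1 + referrals_per_user)

print(round(effective_cac(1.0, 2), 2))  # 0.33 — the $1 / 3 example from the text
print(effective_cac(3.0, 2))            # 1.0 — we can bid $3 and keep a $1 effective CAC
```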

NPS (Net Promoter Score) is a metric that shows the level of customer loyalty. Its calculation is described in detail on Wikipedia, so we won't dwell on it here. Let's just note that it is recommended to measure NPS on a regular basis, using direct marketing communication channels.
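For reference, the standard NPS formula is the share of promoters (scores 9–10) minus the share of detractors (scores 0–6); a minimal sketch with an invented survey sample:

```python
def nps(scores):
    """Net Promoter Score from 0-10 survey answers."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Hypothetical survey of 10 respondents: 4 promoters, 3 passives, 3 detractors.
print(nps([10, 9, 9, 8, 8, 7, 6, 5, 10, 3]))  # 10.0 (40% promoters - 30% detractors)
```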

Hierarchy of metrics within an organization

We have examined the crucial metrics of every stage of AARRR quite thoroughly; now all that is left is to figure out how to structure these indicators into a good dashboard.

To solve this task, it makes sense to decompose the company's goals, and the metrics corresponding to them, into levels. Often each level corresponds to a department of the company and represents that department's KPIs. In the simplest case, we take the company's main high-level goal, profit, and decompose it into its components: revenue and costs.

Hierarchy of metrics within an organization

A good example is SkyEng, a school of English; in the video you can get familiar with SkyEng's thoroughly designed metric structure.

An alternative is to build the dashboard structure on the basis of the AARRR framework described above. Schematically, such a dashboard looks as follows:


Today we have studied the key marketing metrics that help track changes at every stage of the marketing funnel, show the efficiency of each stage, and serve as a useful tool in any marketer's work.


2019   analytics   Analytics engineering   longread   marketing