20.07.2023

7 Tips: How to manage Big Data more effectively?

Author: Jakub Rojek
Reading time: 6 min.
Data is the new currency of business. Without proper collection, analysis and interpretation of data, companies lose direction and make chaotic decisions. Jakub Rojek, responsible for the development of Big Data and cloud technologies at Tigers, shares tips on how to manage Big Data more effectively.

I believe that companies are not taking full advantage of the potential of Big Data.

While they often refer to themselves as "data-driven," on closer inspection it becomes apparent that many of them do not meet this standard. That's why I created this article, which is based on my experience and provides value to:

  • Business owners, who will learn how they can benefit from effective data management; 
  • Analysts, who will receive practical tips and tools to make better use of Big Data; 
  • Anyone interested in data management who understands that a data-driven approach makes real business sense.

Here are some tips to help you manage large data sets more effectively.

Introduction

After the introduction of Google Analytics 4, many people began to pay attention to BigQuery as a way around the API query limits in Looker Studio. Unfortunately, most people ended their Big Data adventure at this stage, settling for the native connection. Stopping short of data engineering limits our options if we really want to lean heavily on the #datadriven approach and develop towards AI/ML.

At Tigers, we decided to go a step further and exploit the full potential of data collected from various sources. Our goal is not only to extract information from Google systems such as Google Analytics 4 or Google Ads, but also to explore other platforms such as Facebook or LinkedIn. We also pay a lot of attention to data extracted directly from CRM/ERP/Ecommerce systems, which are often key elements in terms of sales results.

#1 Prepare the appropriate toolstack

Choosing the right toolstack and technology is the key first step in effectively managing Big Data, both on the ETL side and in the context of Business Intelligence (BI) tools. Aligning the right tools with the needs and characteristics of the data ensures efficiency, consistency and precision of analysis.

For ETL tools, a valuable step is to check the schema of the data provided by the API. Software providers sometimes describe the data source but omit some endpoints or tables, so it's crucial to read the API documentation carefully to get a full picture of the available data.
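As a quick sanity check, a short script can compare the fields an API actually returns against what your connector exposes. Here is a minimal sketch, assuming a hypothetical reporting endpoint, a placeholder token and an invented response shape:

```python
import requests

# Hypothetical endpoint and token for illustration; substitute the real
# documentation URL and credentials of your data source.
API_URL = "https://api.example-ads-platform.com/v1/reports"
TOKEN = "YOUR_ACCESS_TOKEN"

response = requests.get(
    API_URL,
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"fields": "spend,impressions,clicks", "limit": 1},
)
response.raise_for_status()
payload = response.json()

# Print the field names and types of the first record to compare against
# what the ETL vendor exposes -- keys missing from the connector often
# mean a missing endpoint or table.
first_row = payload["data"][0]  # assumed response shape
for field, value in first_row.items():
    print(f"{field}: {type(value).__name__}")
```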

I have also come across solutions that provided every possible metric from a given source but broke each metric into a separate table. On the one hand, this may have its pluses, but in practice it meant connecting each metric (table) to the BI tool as a new source. As a result, we ended up with ~100 different sources, which was not a comfortable setup, especially when mixing data from multiple sources within a single chart.

That's why it's important that an ETL tool enables transformation at the level of the tool itself, before the data is loaded into the data warehouse. Such functionality makes it possible to organize and transform data into a consistent, uniform format, which greatly facilitates subsequent analysis and ensures data quality. Advanced transformation functions that allow filtering, merging, aggregation or generation of unique identifiers, according to analysis needs, are also useful, as in the sketch below.
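To make this concrete, here is a minimal pandas sketch of the kind of pre-load transformation described above. The per-metric tables are invented stand-ins for the "one table per metric" layout; in a real pipeline they would come from the ETL tool's staging area:

```python
import pandas as pd

# Illustrative stand-ins for the "one table per metric" layout.
impressions = pd.DataFrame(
    {"date": ["2023-07-01"], "campaign_id": [42], "impressions": [1000]}
)
clicks = pd.DataFrame(
    {"date": ["2023-07-01"], "campaign_id": [42], "clicks": [37]}
)
spend = pd.DataFrame(
    {"date": ["2023-07-01"], "campaign_id": [42], "spend": [12.50]}
)

# Merge the metric tables on their shared keys so the BI tool sees a
# single consistent source instead of ~100 separate ones.
wide = impressions.merge(clicks, on=["date", "campaign_id"]).merge(
    spend, on=["date", "campaign_id"]
)

# Generate a unique row identifier, useful for deduplication downstream.
wide["row_id"] = wide["date"] + "_" + wide["campaign_id"].astype(str)
print(wide)
```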

#2 Check distributed data sources

While it is possible to use Google Analytics 4 to analyze multiple channels, it should be noted that the situation becomes more complicated when dealing with diverse data sources and less popular CMS/ERP systems. In such cases, it can be difficult or even impossible to use GA4 to achieve full consistency and precision in analyses. 

We should treat Google Analytics 4 as a tool for analyzing trends and general patterns. For precise analysis of key results, let's try to acquire the data that "lies" closest to the source.

For example: Company X runs campaigns in two advertising channels. It also sells through Amazon and runs one brick-and-mortar store. When the end of the month comes, the analyst has to spend a lot of time going through the report from each channel, none of which are standardized, and the work in Excel begins. On top of that, when reviewing the results, the analyst sees that the revenue reported by Google Ads, Facebook Ads and Amazon differs considerably from what GA4 presents, and there are still results from the brick-and-mortar store system, which does not integrate with GA4 and has to be transcribed manually. Since the purpose of the analysis is to provide sales results for management, the analyst chooses to take most of the data directly from the ERP.

Of course, this example is not meant to discourage detailed analysis of each channel individually, as that yields interesting conclusions and insights. It's meant to show how much work it takes to get the data from the various channels into one place and standardize it before the analysis can even begin. On top of that, there are often various technical limitations.

The ideal solution is a single place where the analyst can automatically pull data from various sources, model it accordingly and visualize it. Is this realistic? By all means!

#3 Enter analytical dashboards

At Tigers, we offer such a solution in the form of our advanced dashboards, which give clients access to a full cross-section of data and information in one place. We are able to consolidate data from various sources, transform and model it to provide clients with a complete picture of their business.

For this purpose, we mainly use Google Cloud Platform resources and technology, along with external tools supporting ETL processes such as Hevo, SyncWith, Mage and Databricks. For insiders: despite Tiger Instinct, we also work heavily in Python.

Creating dedicated dashboards that meet client needs is an integral part of our work. Each client has unique requirements and business goals, so we design and customize our dashboards to provide them with accurate, valuable information.

Tigers personalized dashboard view

#4 Pay attention to unique metrics

Unique metrics such as "Reach" often present challenges in Big Data analysis. An example is the reach metric in the context of Facebook Ads. If we look at the reach of each ad campaign and add up the values, we get a certain number (let's denote it as X). However, when we look at the summary of this metric, we get a different value (let's label it Y). This is because unique metrics go through a process of data deduplication.

This may sound complicated, but consider an example in which two ads from the same advertiser are displayed to one person. When the person views both ads, the ad dashboard assigns a value of 1 to each ad. It would seem that the reach is 2, since two ads were seen. However, Facebook knows that both ads were shown to the same person and performs deduplication, i.e. eliminates the repetition. As a result, we get a value of 1 in the reach summary, as the sketch below illustrates.
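A toy Python illustration of this deduplication, with made-up ad and user IDs:

```python
# Two ads shown to the same user.
impressions_log = [
    {"ad_id": "ad_1", "user_id": "u_123"},
    {"ad_id": "ad_2", "user_id": "u_123"},
]

# Per-ad reach: each ad reached 1 unique user, so the naive sum is 2 (X).
per_ad_reach = {}
for row in impressions_log:
    per_ad_reach.setdefault(row["ad_id"], set()).add(row["user_id"])
naive_sum = sum(len(users) for users in per_ad_reach.values())

# Deduplicated reach: one distinct person across all ads, so Y is 1.
deduplicated = len({row["user_id"] for row in impressions_log})

print(naive_sum, deduplicated)  # 2 1
```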

In the context of building a custom Big Data system based on raw data that is updated daily, these kinds of unique metrics can be challenging. If we collect reach for each day, the sum of those values leads to the incorrect results I mentioned earlier.

One solution to this problem is to retrieve unique metrics at the level of the entire month, rather than for individual days. In this way, we reduce the impact of the deduplication process on the results, while maintaining adequate precision and consistency in the data.
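As a sketch, the Facebook Marketing API's insights endpoint accepts a time_increment parameter that can request monthly granularity. The account ID, token and API version below are placeholders, and the exact parameters should be verified against the current documentation:

```python
import requests

# Placeholder account ID and token for illustration only.
ACCOUNT_ID = "act_1234567890"
TOKEN = "YOUR_ACCESS_TOKEN"

response = requests.get(
    f"https://graph.facebook.com/v19.0/{ACCOUNT_ID}/insights",
    params={
        "fields": "reach",
        "time_range": '{"since":"2023-07-01","until":"2023-07-31"}',
        # 'monthly' asks the platform to deduplicate within the month,
        # instead of returning daily rows that must not simply be summed.
        "time_increment": "monthly",
        "access_token": TOKEN,
    },
)
response.raise_for_status()
print(response.json()["data"])
```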

#5 Optimize operations

Another key aspect that is important when building a Big Data environment is query optimization, especially for custom SQL queries. This issue becomes especially important when dealing with large data warehouses, where the amount of data is huge.

For those who use BigQuery as an alternative to API query limits from Google Analytics 4 in Looker Studio, optimization may not be a primary concern. However, for those who have advanced in the topic of Big Data, optimization becomes a key element.

One optimization technique is table partitioning. With partitioning, we can limit queries to a specific date range, which avoids scanning the entire table. Queries thus become more efficient and faster, improving the performance of the entire system, as in the sketch below.
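A minimal sketch with the google-cloud-bigquery client, assuming a hypothetical events table partitioned on the event_date column:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical project, dataset and table names. Because the table is
# assumed to be partitioned by event_date, the WHERE clause lets BigQuery
# prune partitions and scan one week of data instead of the whole table.
sql = """
    SELECT event_date, user_id, event_name
    FROM `my_project.analytics.events`
    WHERE event_date BETWEEN '2023-07-01' AND '2023-07-07'
"""
rows = client.query(sql).result()
print(f"Rows returned: {rows.total_rows}")
```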

Indexes are another optimization tool. They allow relevant data to be found quickly, eliminating the need to scan entire tables, and make queries more efficient, especially on large databases. (In BigQuery specifically, clustering and search indexes play this role.)

Aggregate queries are also an effective optimization. Using aggregating functions such as SUM, AVG or COUNT calculates values at the database level, which reduces the amount of data sent to the application and improves performance.

In addition, introducing simple constraints, such as the "LIMIT" clause, allows you to retrieve only a certain number of rows from the query results. This can significantly reduce query execution time and the amount of data that must be processed. The sketch below combines these techniques.
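A sketch combining these techniques in one query: a filter on the assumed partition column, database-level aggregation and a LIMIT. Table and column names are again illustrative:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Aggregate at the database level (SUM), filter on the partition column,
# and cap the result set with LIMIT so only 10 rows leave the warehouse.
sql = """
    SELECT campaign_id, SUM(spend) AS total_spend
    FROM `my_project.marketing.ad_costs`
    WHERE event_date BETWEEN '2023-07-01' AND '2023-07-31'
    GROUP BY campaign_id
    ORDER BY total_spend DESC
    LIMIT 10
"""
for row in client.query(sql).result():
    print(row.campaign_id, row.total_spend)
```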


#6 Use AI/ML

Data, rather than sitting in an archive, becomes more valuable when it is in motion. We're talking about using it in the context of artificial intelligence (AI) and machine learning (ML), particularly in the areas of forecasting and estimation. As part of our multi-account capabilities, we aim to develop benchmarking for businesses from different areas (in anonymized form, of course). We want to provide solutions that not only analyze data and generate forecasts, but also explore and understand relationships in business data.

Market analysis allows us to monitor changes in customer preferences and behavior, and to identify growing or declining trends. Based on this information, we can make better decisions on marketing strategy, resource allocation and optimization of operations. For this purpose, we also use solutions from the Google family, such as Vertex AI and BigQuery ML, and I can briefly reveal that we are working on a solution based on our own libraries.
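As an illustration rather than our production setup: BigQuery ML can train a time-series model and produce a forecast directly in SQL. The dataset, table and column names below are assumptions, and ARIMA_PLUS is just one of the available model types:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Train a time-series model on an assumed daily revenue table.
client.query("""
    CREATE OR REPLACE MODEL `my_project.analytics.revenue_forecast`
    OPTIONS(
        model_type = 'ARIMA_PLUS',
        time_series_timestamp_col = 'date',
        time_series_data_col = 'revenue'
    ) AS
    SELECT date, revenue
    FROM `my_project.analytics.daily_revenue`
""").result()

# Forecast the next 30 days from the trained model.
forecast = client.query("""
    SELECT forecast_timestamp, forecast_value
    FROM ML.FORECAST(MODEL `my_project.analytics.revenue_forecast`,
                     STRUCT(30 AS horizon))
""").result()
for row in forecast:
    print(row.forecast_timestamp, row.forecast_value)
```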

#7 Approach data from a strategic perspective

Although this point is not particularly revelatory, I feel a great need to include it in this guide.

Big Data should not be viewed merely as a collection of information, but as a strategic business resource.

Use data analysis as a tool for strategic decision-making. Look at data in the context of business goals and strategy. What conclusions can be drawn from data analysis? What trends and opportunities can be identified? What actions should be taken to use this information to improve marketing strategy or optimize operational processes?

Remember that data is an invaluable resource that can lead to innovation and growth. Approach it with a long-term perspective and use it to build a business strategy for success in the marketplace.


So what's next?

I hope that with this article I have inspired you to look at the data in your company. Maybe an idea has already popped into your head about how to put these tips into practice?

Whether you are an eCommerce manager, analyst or freelancer, if you have a specific case related to Big Data and would like to consult on it and get recommendations - email me at jakub.rojek@tigers.pl or book a time for a consultation directly in the calendar below.


Happy to answer your questions!
