Tealium DataAccess: The Good Stuff


Welcome back to the Tealium DataAccess conversation. Previously, we discussed the birth of DataAccess and its purpose. This post dives deeper into the various elements of the solution, what each element offers, and the type of data being stored.

First, the *Store and *Database (DB) services are hosted on the Amazon S3 and Amazon Redshift cloud services, respectively. These storage and database solutions are highly scalable and perform well with large data volumes. Tealium will provide credentials and basic connection support.

Now, let’s talk about event-level data, which comes in two forms. The first is web event data, generated whenever the Tealium library is triggered. This data is managed by the Tealium Collect tag and includes standard pageviews, click events triggered by utag.link(), and dynamic pageviews triggered by utag.view(). The second type comes from system-generated events such as vendor cookie syncs with, for example, Google, Criteo, The Trade Desk, MediaMath, and more.
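As a rough sketch of the two explicit calls mentioned above: utag.link() and utag.view() each take a data object. The wrapper functions and the attribute names (event_name, link_id, page_name) are illustrative assumptions, and the tracker argument stands in for the global utag object so the sketch can also run outside a browser.

```javascript
// Hypothetical wrappers around the Tealium tracking calls.
// `tracker` is expected to expose link() and view(), as the global `utag` does.
function trackClick(tracker, linkId) {
  // Click events are sent with link().
  tracker.link({ event_name: "click", link_id: linkId });
}

function trackVirtualPage(tracker, pageName) {
  // Dynamic pageviews are sent with view().
  tracker.view({ page_name: pageName });
}

// Stand-in tracker that records calls instead of sending them anywhere.
function createRecordingTracker() {
  const calls = [];
  return {
    calls,
    link: (data) => calls.push({ type: "link", data }),
    view: (data) => calls.push({ type: "view", data }),
  };
}

const recorder = createRecordingTracker();
trackClick(recorder, "add_to_cart");
trackVirtualPage(recorder, "checkout step 2");
console.log(recorder.calls);
```

On a page where utag.js is loaded, the same wrappers could be called with the real utag object in place of the recorder.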

It’s important to know that this event-level data is declared in, and based on, the open JSON standard format. Code for processing this data is readily available in many programming languages. Here is a basic example of a JSON data layer declared on page types like the home page and product page.

// Homepage

// Product Detail Page
   page_name:"tealium t-shirt",
   product_name:"tame the digital beast",

As you can see, it is easy to read and the single-level object makes it easy to parse.
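To illustrate how little code a single-level object needs, here is a minimal sketch that parses one event and walks its key/value pairs. The payload string is a hypothetical example built from the product page attributes above, and parseEvent is an illustrative helper, not part of any Tealium library.

```javascript
// Hypothetical event payload, modeled on the product page attributes.
const rawEvent =
  '{"page_name":"tealium t-shirt","product_name":"tame the digital beast"}';

// Parse the JSON text and return its key/value pairs.
// Throws a TypeError if the payload is not a JSON object.
function parseEvent(jsonText) {
  const event = JSON.parse(jsonText);
  if (event === null || typeof event !== "object" || Array.isArray(event)) {
    throw new TypeError("expected a JSON object");
  }
  return Object.entries(event);
}

// No recursion is needed because the object has a single level.
for (const [key, value] of parseEvent(rawEvent)) {
  console.log(`${key} = ${value}`);
}
```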

Let’s dig into the DataAccess event-level offerings.

EventDirect: This is a client-side HTTP POST of the JSON data layer directly to your own data warehouse. If you find yourself thinking, “I already have a data warehouse onsite so I just want to ingest that data and aggregate it with my existing data,” then this may be the solution for you. In this scenario, Tealium is responsible for sending the data, but the responsibility for receiving and ingesting it (via an ETL) falls on the customer.
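Since the receiving side is the customer's job, here is a minimal sketch of what handling one incoming POST body might look like. Everything here is an assumption for illustration: the function name, the received_at field, and the array that stands in for a real ETL or loading step. A production receiver would also need an HTTP server, authentication, and error handling suited to your warehouse.

```javascript
// Sketch of the receiving side of an EventDirect-style POST.
// `warehouse` is a stand-in (a plain array) for a real ETL/loading step.
function receiveEventPost(requestBody, warehouse) {
  let event;
  try {
    event = JSON.parse(requestBody);
  } catch {
    return { status: 400, message: "body is not valid JSON" };
  }
  if (event === null || typeof event !== "object" || Array.isArray(event)) {
    return { status: 400, message: "expected a JSON object" };
  }
  // Stamp the arrival time (hypothetical field) so rows can be ordered later.
  warehouse.push({ ...event, received_at: new Date().toISOString() });
  return { status: 200, message: "stored" };
}

const warehouseRows = [];
const result = receiveEventPost('{"page_name":"tealium t-shirt"}', warehouseRows);
console.log(result.status, warehouseRows.length);
```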

EventStore: This is semi-structured data (again, think JSON) stored on Amazon S3 and accessible to tools such as Splunk. Splunk can ingest these JSON objects so that you can search and analyze the data. The JSON format is also fairly easy to import into Hadoop, MongoDB, and other similar systems. If you find yourself thinking, “I don’t have a data repository, but I already have a tool to query and analyze my data,” then this solution may be for you. Alternatively, if you need a tool to query your data and want to purchase a Splunk license separately, Tealium can help facilitate this.

EventDB: This is structured data stored in an Amazon Redshift data warehouse and accessible to tools such as Tableau. Structured data more closely mimics a typical relational database management system. To expand on this, the JSON data above is used to automatically create a database schema, and the data is then formatted and passed to our Redshift instance. This offer resembles a cloud-based BI tool. So if you find yourself thinking, “I don’t have a data repository and I need a tool to query and visualize my data,” then this solution may be for you.
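To give a feel for how a schema can be derived from a flat JSON event, here is a simplified sketch. It is not Tealium's actual schema generation: the type mapping, the table name, and the sample attributes product_price and quantity are all assumptions, and attribute names are assumed to already be valid column identifiers.

```javascript
// Map a JavaScript value to a column type (assumed, simplified mapping).
function columnType(value) {
  if (typeof value === "number") {
    return Number.isInteger(value) ? "BIGINT" : "DOUBLE PRECISION";
  }
  if (typeof value === "boolean") return "BOOLEAN";
  return "VARCHAR(256)";
}

// Build a CREATE TABLE statement from a single-level event object.
function createTableSql(tableName, event) {
  const columns = Object.entries(event).map(
    ([name, value]) => `  ${name} ${columnType(value)}`
  );
  return `CREATE TABLE ${tableName} (\n${columns.join(",\n")}\n);`;
}

// Hypothetical sample event with a string, a decimal, and an integer.
const sampleEvent = {
  page_name: "tealium t-shirt",
  product_price: 19.99,
  quantity: 2,
};
console.log(createTableSql("events", sampleEvent));
```

Because the data layer is a single-level object, each attribute maps to exactly one column, which is what makes this kind of automatic derivation practical.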

Now that we’ve reviewed the three Event* offers, let’s talk about audience-level data and associated DataAccess offers currently available. The Audience products vary in regard to visitor profile storage, and the data is centered around the attributes applied to visitor profiles within AudienceStream.

AudienceDirect: This is a server-side HTTP POST of the JSON visitor profile directly to your own data warehouse, with the endpoint often being the same data repository as EventDirect. Think of AudienceDirect as a custom connector (responsible for triggering API calls) within AudienceStream. The most common trigger point for this connector is the end of a visit, when the visitor profile has been fully updated, although it can also be triggered whenever an Audience is joined or left. Here is an example JSON visitor profile:

       "Customer ID":"8675309"
       "Product Viewer",
       "T-shirt Buyer"
       "Tommy Tutones"

Please note, since there is the possibility of triggering multiple API calls, S3 may have multiple entries per visitor.

AudienceDB: This is the structured visitor profile data from AudienceStream stored in an Amazon Redshift data warehouse and accessible to tools such as Tableau. The database will only contain one entry per visitor (note the difference from AudienceDirect) and will represent the latest active profile for the visitor. AudienceDB data will post to the same Redshift instance as EventDB.
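The difference between multiple posted entries per visitor and a single latest row can be sketched as follows. The field names visitor_id, updated_at, and badges are hypothetical, and this is only an illustration of the "latest profile wins" idea, not how AudienceDB is implemented.

```javascript
// Keep only the most recent profile per visitor.
// `visitor_id` and `updated_at` are hypothetical field names.
function latestProfiles(profiles) {
  const latest = new Map();
  for (const profile of profiles) {
    const current = latest.get(profile.visitor_id);
    if (!current || profile.updated_at > current.updated_at) {
      latest.set(profile.visitor_id, profile);
    }
  }
  return [...latest.values()];
}

// Three posted profiles, two of them for the same visitor.
const postedProfiles = [
  { visitor_id: "v1", updated_at: 1, badges: ["Product Viewer"] },
  { visitor_id: "v1", updated_at: 2, badges: ["Product Viewer", "T-shirt Buyer"] },
  { visitor_id: "v2", updated_at: 1, badges: [] },
];
console.log(latestProfiles(postedProfiles));
```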

A final point for consideration: when EventDB and AudienceDB are purchased together, the data can be married to provide a full event-visitor level view. Think of the possibilities!
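As a small illustration of marrying the two data sets, the sketch below attaches the matching visitor profile to each event. The shared key visitor_id and the sample rows are assumptions; in Redshift itself this would more likely be expressed as a SQL join across the two tables.

```javascript
// Attach the matching visitor profile to each event (in-memory join).
// `visitor_id` is a hypothetical shared key between the two data sets.
function joinEventsToVisitors(events, visitors) {
  const byId = new Map(visitors.map((v) => [v.visitor_id, v]));
  return events.map((event) => ({
    ...event,
    visitor: byId.get(event.visitor_id) ?? null, // null when no profile exists
  }));
}

const eventRows = [
  { visitor_id: "v1", page_name: "tealium t-shirt" },
  { visitor_id: "v3", page_name: "homepage" },
];
const visitorRows = [{ visitor_id: "v1", badges: ["T-shirt Buyer"] }];

console.log(joinEventsToVisitors(eventRows, visitorRows));
```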
