Slay the jargon dragons at your company with The Marketer’s Data Dictionary: our user-friendly guide to 36 terms you need to know.

When it comes to data, accessibility is everything.

So, we’ve put together a data dictionary to help marketers make sense of complicated martech language and slay the jargon dragons standing in the way of meaningful, inclusive conversations about data at work.

Types of data

1st party data

1st party data is the data you have collected and are able to use.

2nd party data

2nd party data is the data you can receive from agreed partners.

3rd party data

3rd party data is the data collected from everything and everywhere else, well beyond your own interactions.

We’ve expanded on first, second and third party data here.

Persistent ID

A consistent identifier (e.g. a customer number or email address) used to follow a customer across different devices and channels, such as mobile web, in-app and desktop.

De-identified ID

Personal information that cannot be associated with a specific individual. “De-identification” refers to removing personal information like an email address or a name.
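
As a rough illustration of de-identification, the sketch below strips personal fields from a customer record. The field names and the sample record are hypothetical, and which fields count as personal information will depend on your own data and legal obligations.

```python
# Hypothetical list of fields treated as personal information.
PII_FIELDS = {"name", "email", "phone", "date_of_birth", "address"}


def de_identify(record: dict) -> dict:
    """Return a copy of the record with the personal fields removed."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}


customer = {
    "name": "Jane Citizen",
    "email": "jane@example.com",
    "loyalty_tier": "gold",
    "total_orders": 4,
}

print(de_identify(customer))  # {'loyalty_tier': 'gold', 'total_orders': 4}
```

The original record is left untouched, so the de-identified copy can be shared while the full record stays where it is.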

Hashed data

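Hashed data is data that has been run through a one-way function to produce a fixed-length string of characters that can’t practically be reversed to reveal the original value. (Strictly speaking, hashing is not encryption, because there is no key that decrypts it.) For example, a value in hashed form looks like BB8C71F261C69B19446FD88243F8E579820C5D536CCD2572A5D284EEF6081D0.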

When a CDP like Lexer sends email addresses to Google to create an audience, we send the hashed emails and Google matches them against its own database of hashed emails, so the raw email addresses are never transferred.
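
To make this concrete, here is a minimal sketch of hashing an email address with Python’s standard hashlib module. SHA-256 and the normalisation step (trimming spaces and lowercasing) are assumptions for illustration; always check which algorithm and formatting the receiving platform expects.

```python
import hashlib


def hash_email(email: str) -> str:
    """Normalise an email address and return its SHA-256 hash as uppercase hex."""
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest().upper()


a = hash_email("Jane@Example.com ")
b = hash_email("jane@example.com")

print(a == b)  # True: the same address always produces the same hash
print(len(a))  # 64: a SHA-256 hash is always 64 hex characters long
```

Because the same input always gives the same hash, two parties can compare hashed lists and find the overlap without swapping the raw addresses.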


Personally Identifiable Information (PII)

Personally Identifiable Information (PII) is data like an email address, name, date of birth, physical address or phone number that can be used to confidently identify a specific person. (It’s important you handle it correctly, especially on social media.)

Bringing data together


The process of bringing together data from a range of sources to form a comprehensive profile of a customer, by connecting data points using deterministic or probabilistic matching.

Deterministic matching

Deterministic matching looks for an exact match between two different pieces of data. For example, an email address or phone number in two data sets can be used to make an exact match of two records.
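
A minimal sketch of deterministic matching on email address might look like the following. The two data sets and their field names are made up for illustration.

```python
def normalise_email(email: str) -> str:
    """Trim spaces and lowercase so the same address always compares equal."""
    return email.strip().lower()


def deterministic_match(crm_records, shop_records):
    """Pair up records from two data sets that share exactly the same email."""
    by_email = {normalise_email(r["email"]): r for r in crm_records}
    matches = []
    for record in shop_records:
        key = normalise_email(record["email"])
        if key in by_email:
            matches.append((by_email[key], record))
    return matches


crm = [
    {"id": "C1", "email": "jane@example.com"},
    {"id": "C2", "email": "sam@example.com"},
]
shop = [
    {"order": 501, "email": "JANE@example.com"},
    {"order": 502, "email": "alex@example.com"},
]

for crm_row, shop_row in deterministic_match(crm, shop):
    print(crm_row["id"], shop_row["order"])  # C1 501
```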

Probabilistic matching

Probabilistic or “fuzzy” matching calculates the likelihood of a match using a scoring system across a range of data points. For example, two customer records with the same address and date of birth are very likely (say, 99.9%) the same person, but two records that share only a common name, like John Smith, are not very likely the same person. Usually, a combination of 2-3 data points like address, DOB, name, and transactional data is used.
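
The scoring idea can be sketched as below. The weights and the threshold are invented for illustration only; a real matching system would tune them against its own data and usually compares values more loosely than a straight equality check.

```python
# Illustrative points per matching field; these are assumptions, not real values.
WEIGHTS = {"address": 40, "date_of_birth": 40, "name": 20}
MATCH_THRESHOLD = 70


def match_score(a: dict, b: dict) -> int:
    """Add up the points for every field that is present and equal in both records."""
    return sum(
        points
        for field, points in WEIGHTS.items()
        if a.get(field) is not None and a.get(field) == b.get(field)
    )


def is_probable_match(a: dict, b: dict) -> bool:
    return match_score(a, b) >= MATCH_THRESHOLD


rec1 = {"name": "John Smith", "address": "1 High St", "date_of_birth": "1980-02-01"}
rec2 = {"name": "J. Smith", "address": "1 High St", "date_of_birth": "1980-02-01"}
rec3 = {"name": "John Smith", "address": "9 Low Rd", "date_of_birth": "1975-07-30"}

print(match_score(rec1, rec2), is_probable_match(rec1, rec2))  # 80 True
print(match_score(rec1, rec3), is_probable_match(rec1, rec3))  # 20 False
```

Same address and date of birth clears the threshold even though the name is written differently, while a shared name on its own does not.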

Data cleansing

Data cleansing involves detecting and correcting corrupt or inaccurate data. An example would be removing the value of ‘John’ from a column called ‘Age’.
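
The ‘John’ in the ‘Age’ column example could be handled with a small check like this one. The rows are made up, and the 0 to 120 range is an assumed definition of a plausible age.

```python
def clean_age(value):
    """Return the age as a whole number, or None when the value is not a plausible age."""
    try:
        age = int(str(value).strip())
    except ValueError:
        return None
    return age if 0 <= age <= 120 else None


rows = [
    {"name": "Jane", "age": "34"},
    {"name": "Sam", "age": "John"},
    {"name": "Alex", "age": " 41 "},
]

cleaned = [{**row, "age": clean_age(row["age"])} for row in rows]
print([row["age"] for row in cleaned])  # [34, None, 41]
```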

Data enrichment

The process of creating a richer view of each customer record by adding data from external sources. Lexer enriches each customer record with valuable data from partners like Experian, Roy Morgan, and Mastercard to give you extra data points like their Mosaic profile or purchasing habits.

Housing data

Data lakes

Data Lakes store data in its raw format: unstructured, inconsistent and not easily queried. Companies can build data lakes by using Infrastructure-as-a-Service (IaaS) clouds including Amazon Web Services (AWS) and Microsoft Azure.

Data warehouse

A massive database where the structure is defined before the data is captured. Think a ginormous Excel spreadsheet, with rows and column titles specified in advance. Data Warehouses are easier to analyze than data lakes but generally require technical data skills and specialized software.

Analysing data

Machine learning

Machine learning is a method of data analysis where systems learn from data, identify patterns and make decisions with minimal human intervention. Examples include regression analysis (looking at trends in data to predict what happens next) and classification (predicting which class a data point belongs to, e.g. this email looks like spam, so it goes to your spam folder).

Customer churn modeling

Churn modeling is a method of identifying which data points indicate that someone is likely to stop buying from you. It’s a classic application of predictive modeling, typically built with regression or classification techniques.

Lifetime value

How much money a customer has spent with you, adding up all spend, across channels, removing discounts and refunds. This can be a challenge if you don’t have all transaction data in one place – this is where a CDP comes in really handy. Sometimes lifetime value can also mean how much a customer will spend with you in the future.
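
As a simple sketch of the historical version of lifetime value, the example below adds up spend per customer across channels and subtracts discounts and refunds. The transactions and field names are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical transactions pulled together from different channels.
transactions = [
    {"customer_id": "C1", "channel": "online", "amount": "120.00", "discount": "20.00", "refund": "0.00"},
    {"customer_id": "C1", "channel": "store", "amount": "80.00", "discount": "0.00", "refund": "30.00"},
    {"customer_id": "C2", "channel": "online", "amount": "45.50", "discount": "5.50", "refund": "0.00"},
]


def lifetime_value(transactions):
    """Sum net spend (amount minus discount minus refund) for each customer."""
    totals = defaultdict(Decimal)
    for t in transactions:
        net = Decimal(t["amount"]) - Decimal(t["discount"]) - Decimal(t["refund"])
        totals[t["customer_id"]] += net
    return dict(totals)


ltv = lifetime_value(transactions)
print(ltv["C1"])  # 150.00
print(ltv["C2"])  # 40.00
```

Decimal is used instead of float so that currency amounts add up without rounding surprises.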

RFM model (Recency, frequency & monetary model)

RFM modeling ranks customers by the recency and frequency of their purchases and how much they’ve spent with you in the past. It’s great for identifying high-value customers for loyalty campaigns and re-engaging lapsed customers.
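
One simple way to sketch RFM ranking is to give each customer points for recency, frequency and spend, then add them up. The orders, the reference date and the points scheme are all assumptions for illustration; many teams use quintile scores instead.

```python
from datetime import date

# Hypothetical orders: (customer ID, order date, amount spent).
orders = [
    ("C1", date(2024, 5, 20), 120),
    ("C1", date(2024, 6, 1), 80),
    ("C2", date(2023, 11, 3), 300),
    ("C3", date(2024, 5, 28), 40),
    ("C3", date(2024, 4, 2), 35),
    ("C3", date(2024, 2, 14), 50),
]


def rfm_table(orders, today):
    """Work out days since last order, number of orders and total spend per customer."""
    table = {}
    for customer_id, order_date, amount in orders:
        row = table.setdefault(
            customer_id, {"recency_days": None, "frequency": 0, "monetary": 0}
        )
        days = (today - order_date).days
        if row["recency_days"] is None or days < row["recency_days"]:
            row["recency_days"] = days
        row["frequency"] += 1
        row["monetary"] += amount
    return table


def points_for(table, field, higher_is_better):
    """Give each customer 1..N points for one field, where N goes to the best customer."""
    worst_to_best = sorted(
        table, key=lambda c: table[c][field], reverse=not higher_is_better
    )
    return {customer: points for points, customer in enumerate(worst_to_best, start=1)}


table = rfm_table(orders, today=date(2024, 6, 10))
recency = points_for(table, "recency_days", higher_is_better=False)
frequency = points_for(table, "frequency", higher_is_better=True)
monetary = points_for(table, "monetary", higher_is_better=True)

totals = {c: recency[c] + frequency[c] + monetary[c] for c in table}
ranking = sorted(totals, key=totals.get, reverse=True)
print(ranking)  # ['C1', 'C3', 'C2']
```

Here C2 has spent the most but has not bought for months, so it lands at the bottom of the ranking, which is the kind of lapsed customer a re-engagement campaign would target. Ties are not handled specially in this sketch.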


Lookalike audiences

A way of taking an audience and expanding it to include people with similar qualities. Lookalikes may be used for prospecting and to ensure you’re reaching the largest and most relevant audience possible.

Using data in your day to day

Audience / customer onboarding

Audience onboarding involves uploading your customer data to an outside platform like Facebook and having them match it to their database.

Programmatic advertising

Programmatic advertising is the automated process of buying and selling ad space using a few tools, including a Demand Side Platform (DSP). To learn more about how a DSP fits into your marketing stack, check out Navigating Martech: DSP.

Single customer view

A comprehensive view of a customer across all channels. Single customer view is the end goal of bringing various customer data points together. A CDP achieves this rapidly and allows marketers and customer service teams to use this data to deliver personalization and contextualized customer care.

Multichannel marketing

In multichannel marketing, a brand may use different channels to interact with customers, but each channel is managed separately and with a different strategy.

Omnichannel marketing

Omnichannel marketing seamlessly connects all consumer touch points to create a consistent and progressive customer experience. It is centered around customer-centric measurements like lifetime value and loyalty. Excited? Read our guide to the Top 5 benefits of omnichannel marketing.


When it comes to Martech, the number of tools and abbreviations out there can be pretty overwhelming, which is why we developed Navigating Martech: The Complete Guide to the key platforms in the industry today. For now, let’s dive into five commonly used systems.

Customer Relationship Management (CRM)

Manages the sales and service history with every known customer. Originating from 1:1 sales workflows, a CRM can provide cross-channel contact history and servicing tools.

Data Onboarder

Expert in connecting email addresses (PII – known customers) to cookies (unknown prospects), so marketers can create addressable segments to target or suppress across the digital advertising ecosystem.

Tag Management System (TMS)

Manages the integration of tags from third-party software into owned digital properties.

Data Management Platform (DMP)

A DMP provides a centralized dataset that aggregates cookie browsing behavior (unknown prospects) to create large, de-identified audiences for ad targeting across digital channels.

Dynamic Creative Optimisation (DCO)

A system that analyses multiple data points on each visitor or email recipient and selects the ideal creative to serve from a library of dynamic or pre-created messages – all in real-time.


We’re passionate about data compliance, security, and management. We’re also certified and regularly audited and love helping our clients do the same. To learn more about our approach read our Privacy and Information Security policy.


General Data Protection Regulation (GDPR)

The General Data Protection Regulation (GDPR) is a European Union regulation designed to give individuals greater control of their personal data. It is centered on individuals and places a hefty responsibility on organizations to provide greater transparency and accountability. Want to know more? Here’s a simple summary on how GDPR may affect your business.


Infosec

Short for information security, infosec refers to the processes and tools companies use to protect their data.

ISO 27001

ISO 27001 is a global information security standard for an information security management system or ISMS. (We’re certified!)


SFTP

SFTP stands for SSH File Transfer Protocol (often called Secure File Transfer Protocol) and is a way of safely sending data over a secure connection. (If you’re sending PII or sensitive customer data over email, you may be breaking privacy laws and you are putting your customers’ data at risk.)


SOC 2

SOC 2 is an audit that assesses whether you’re securely managing your data. It focuses specifically on controls around unusual system activity (for example, logins from unusual devices or locations), authorized and unauthorized system configuration changes, and user access levels.

Amazon S3

A cloud storage service that keeps huge amounts of data as objects grouped into buckets, which you can access through the Amazon Web Services (AWS) API.

Slay the jargon dragons at your company

We hope our data dictionary helps you slay the jargon dragon at your company, leading to more meaningful and inclusive discussions about how data can drive value at every level.

To learn more about making data part of your every day, have a read of the 2018 Data Culture Study. You can also dive even deeper into the martech space with Navigating Martech: The Complete Guide.