The Partnership Between Administrative and Synthetic Data

Article Overview

Administrative data provides valuable insights into software performance and user behavior. However, privacy and security concerns may limit who can use this data and how it can be used. The confidentiality of personal data, access control, data retention, data sharing, and data anonymization are all critical considerations when using administrative data. The partnership between administrative and synthetic data offers an effective solution to these concerns.

Synthetic data generated from administrative data can mimic the statistical behavior of the original data while substantially reducing privacy risk. Statistical modeling, random sampling, and data augmentation are three common ways to generate it. Using administrative data to generate synthetic data allows businesses to create high-quality data sets for testing and analysis while protecting personal data and complying with privacy laws. As data continues to play a crucial role in business operations, generating high-quality synthetic data will become an increasingly valuable skill for businesses.


The value of administrative data

Administrative data provides valuable insights into software performance and user behavior. It is data collected in the course of an organization’s regular operation, or to supplement it, such as transactional, customer, and usage data. Its value lies in its representation of real-world use cases and its ability to help identify potential issues before they become critical problems.

Security concerns when using administrative data

However, there are several privacy and security concerns when using administrative data. These concerns include the following:

  1. Confidentiality: Administrative data may contain sensitive information such as personal identifiers, financial information, or health data. If this information is not protected adequately, it can be misused or accessed by unauthorized individuals, leading to privacy breaches and identity theft.
  2. Access Control: Administrative data should be accessed only by authorized individuals with a legitimate need to know the information. Access control mechanisms, such as authentication and authorization protocols, should be implemented to prevent unauthorized access and ensure data security.
  3. Data Retention: Administrative data may contain personal information that should be retained only as long as necessary. Organizations should have a data retention policy that outlines how long data will be stored and when it will be disposed of securely.
  4. Data Sharing: Administrative data may be shared with third-party organizations for research or other purposes. It is essential to ensure that data-sharing agreements are in place and that the third-party organization has appropriate security measures to protect the data.
  5. Data Anonymization: To protect the privacy of individuals, administrative data may need to be anonymized, both in the data itself and in any outputs, products, and reports derived from it. This involves removing personal identifiers or replacing them with pseudonyms so that individuals cannot be identified. If data is not anonymized, a published output can potentially be reverse engineered to reveal personal information about the people whose data was used in a report.
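
As a rough illustration of the identifier replacement described in item 5, the sketch below swaps a customer identifier for a keyed-hash pseudonym and drops another direct identifier. The record layout, field names, and `SECRET_KEY` are hypothetical, and the approach (HMAC-SHA256 from the Python standard library) is one possible choice, not a prescribed method.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would come from a secrets manager,
# never from source code.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; keep the full digest if collisions matter.
    return digest[:16]


def pseudonymize_records(rows):
    """Replace customer_id with a pseudonym and drop the email field."""
    return [
        {
            "customer_pseudonym": pseudonymize(row["customer_id"]),
            "purchase_total": row["purchase_total"],
        }
        for row in rows
    ]


# Hypothetical administrative records (illustrative values only).
records = [
    {"customer_id": "C-1001", "email": "ana@example.com", "purchase_total": 42.50},
    {"customer_id": "C-1002", "email": "raj@example.com", "purchase_total": 17.25},
    {"customer_id": "C-1001", "email": "ana@example.com", "purchase_total": 8.00},
]

safe_records = pseudonymize_records(records)
```

The same customer always maps to the same pseudonym, so records can still be linked for analysis. Pseudonymized data can still be re-identified through the remaining fields or by anyone who holds the key, so on its own this step is weaker than full anonymization.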

Why partner administrative and synthetic data?

Although administrative data offers powerful insights into real-world use cases, security and privacy concerns restrict how it can be used. Because access must be limited to a small audience, these restrictions can in turn prevent robust performance, integration, and functionality testing of systems.

These concerns are where the partnership between administrative data and synthetic data becomes valuable. Generating synthetic data is one way to create data sets that mimic the behavior of real data with far less of the privacy risk that comes with using actual data. Administrative data is an excellent source for generating synthetic data because it reflects real-world use cases and behavior, so synthetic data sets derived from it can closely mimic real-world data.


Utilizing synthetic data reduces the risk of exposing critical data and compromising the privacy and security of federal, civilian, and business data.


Here are some ways administrative data can be used to generate synthetic data:

  • Statistical modeling: Statistical models can be built using administrative data and then used to generate synthetic data sets that mimic the statistical properties of the original data. This approach allows businesses to generate data that accurately reflects the characteristics of the original data.

  • Random sampling: Random sampling can be used to generate synthetic data sets that mimic the behavior of the original data. Values are drawn at random from the administrative data, for example by sampling each field independently, and recombined into new records. Selecting whole records at random would only reproduce real data, so the sampling has to recombine values rather than copy complete rows.

  • Data augmentation: Data augmentation involves adding noise or variability to the original administrative data to generate synthetic data sets that reflect the behavior of the original data. This approach is useful when creating data sets similar to the original data but with slight, controlled variations.
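
The three approaches above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under stated assumptions, not a production generator: the data values, the column layout, the choice of a normal distribution, and the `noise_scale` parameter are all hypothetical.

```python
import random
import statistics

# Hypothetical administrative column: session durations in seconds (made-up values).
original = [12.0, 15.5, 9.8, 20.1, 14.3, 11.7, 18.9, 13.2, 16.4, 10.6]

# Hypothetical multi-column records: (plan, region, session duration).
rows = [
    ("basic", "north", 12.0),
    ("premium", "south", 20.1),
    ("basic", "east", 9.8),
    ("trial", "north", 15.5),
]

rng = random.Random(42)  # fixed seed so the sketch is reproducible


def synth_from_model(values, n, rng):
    """Statistical modeling: fit a normal distribution, then draw new values from it."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def synth_by_sampling(records, n, rng):
    """Random sampling: resample each column independently, so a synthetic
    row mixes values that may come from different real rows."""
    columns = list(zip(*records))
    return [tuple(rng.choice(col) for col in columns) for _ in range(n)]


def synth_by_augmentation(values, rng, noise_scale=0.05):
    """Data augmentation: perturb each real value with small Gaussian noise."""
    spread = statistics.stdev(values)
    return [v + rng.gauss(0.0, noise_scale * spread) for v in values]


model_based = synth_from_model(original, 100, rng)
sampled = synth_by_sampling(rows, 5, rng)
augmented = synth_by_augmentation(original, rng)
```

Each function trades fidelity against privacy differently. Independent column sampling discards correlations between fields and can still reproduce a real row by chance, and lightly perturbed values stay close to the real ones, so the output of any of these would still need a privacy review before release.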

The value of generating synthetic data from administrative data lies in producing high-quality data sets that carry far less privacy risk than the originals. Because the synthetic data sets reflect the behavior of the original data, they support more realistic testing and analysis. Synthetic data generated from administrative data can also be used to train machine learning models without exposing the underlying sensitive records, which helps organizations comply with privacy laws.
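
One way to check whether a synthetic data set is both useful and reasonably safe is to compare its summary statistics with the original and to look for rows copied verbatim. The sketch below assumes numeric columns held as lists and records held as tuples. The function names and sample values are hypothetical, and these two checks are illustrative rather than a complete quality or privacy assessment.

```python
import statistics


def fidelity_report(original, synthetic):
    """Compare basic summary statistics of an original and a synthetic numeric column."""
    return {
        "mean_diff": abs(statistics.mean(original) - statistics.mean(synthetic)),
        "stdev_diff": abs(statistics.stdev(original) - statistics.stdev(synthetic)),
    }


def exact_copies(original_rows, synthetic_rows):
    """Return synthetic rows that also appear verbatim in the original data."""
    seen = set(original_rows)
    return [row for row in synthetic_rows if row in seen]


# Hypothetical numeric column and its synthetic counterpart.
original_values = [10.0, 12.0, 14.0, 16.0, 18.0]
synthetic_values = [11.0, 13.0, 14.0, 15.0, 17.0]
report = fidelity_report(original_values, synthetic_values)

# Hypothetical records: (plan, region, session duration).
original_rows = [("basic", "north", 12.0), ("premium", "south", 20.1)]
synthetic_rows = [("basic", "south", 12.0), ("premium", "south", 20.1)]
copies = exact_copies(original_rows, synthetic_rows)
```

Small differences in the report suggest the synthetic column preserves the basic shape of the original. An empty `copies` list only rules out verbatim leakage; it does not show that individuals cannot be re-identified.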

As data becomes more central to business operations, the ability to generate high-quality synthetic data from administrative sources will become an increasingly valuable skill for businesses.