technical Archives - TechGDPR

Is an IP address considered personal data?

AJ Richter — Tue, 24 Mar 2026 07:33:49 +0000

The concept of personal data lies at the heart of the General Data Protection Regulation (GDPR), shaping the scope of its protections and obligations. Among the most debated examples of potentially personal data are IP addresses. While often perceived as neutral technical data, regulatory authorities and courts within the European Union have clarified that IP addresses can constitute personal data when they enable identification, directly or indirectly. Understanding why IP addresses fall within the GDPR’s scope requires examining legal interpretation, regulatory guidance, and practical realities of online data processing.

What qualifies as personal data?

Article 4.1 of the GDPR defines personal data as “any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”

The EDPB explicitly identifies IP addresses as being personal data due to their ability to identify individual data subjects. If an IP address is successfully anonymized, then under the GDPR it is no longer considered personal data.

The French Data Protection Authority (CNIL) ruled on a case dealing with the transfer of personal data to a company outside the EU. In the decision, the CNIL wrote:

“It should be noted that online identifiers, such as IP addresses or information stored in cookies can commonly be used to identify a user, particularly when combined with other similar types of information. This is illustrated by Recital 30 GDPR, according to which the assignment of online identifiers such as IP addresses and cookie identifiers to natural persons or their devices may “leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.” In the particular case where the controller would claim to not have the ability to identify the user through the use (alone or combined with other data points) of such identifiers, he would be expected to disclose the specific means deployed to ensure the anonymity of the collected identifiers. Without such details, they cannot be considered anonymous.”

What is an IP address?

An IP address is a numerical label that identifies a device attached to the Internet and tells the network where to send the information that device requests. The two main formats are IPv4 and IPv6. Originally, IPv4 was the sole way of identifying devices, but it does not allow for as many unique addresses as are needed in the modern age.

IPv4 addresses take the form xxx.xxx.xxx.xxx, where each xxx is a decimal number from 0 to 255. IPv6 addresses are written in hexadecimal (for example, 2001:db8::ff00:42:8329), which means each digit can be 0-9 or A-F. Static IP addresses remain constant, while dynamic IP addresses can change over time. An IP address can reveal the approximate location of a device and, when combined with records held by an Internet service provider (ISP), can be linked to a specific subscriber.
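As a brief illustration of the two formats, the sketch below uses Python's standard `ipaddress` module to parse one address of each kind. The IPv4 address is an assumption chosen for this example (it comes from a range reserved for documentation), and the helper name `describe` is hypothetical.

```python
import ipaddress


def describe(address: str) -> dict:
    """Parse an IP address string and report its version and size in bits."""
    ip = ipaddress.ip_address(address)  # raises ValueError on malformed input
    return {
        "version": ip.version,      # 4 or 6
        "bits": ip.max_prefixlen,   # 32 for IPv4, 128 for IPv6
        "canonical": str(ip),       # normalised text form
    }


if __name__ == "__main__":
    print(describe("192.0.2.123"))
    print(describe("2001:db8::ff00:42:8329"))
```

Note that a string such as 123.456.789.123 is rejected by the parser, because each IPv4 segment must fit in one byte (0 to 255).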

The GDPR perspective on IP addresses

The GDPR explicitly includes “online identifiers” (e.g., IP addresses) as personal data when they can identify a person. Even if the controller doesn’t hold the identifying data itself, an IP address qualifies as personal data if there are means reasonably likely to be used (e.g., legal processes to obtain ISP logs) to link it to a person. This logic comes from the CJEU case Breyer (C-582/14), which was decided under the GDPR’s predecessor, Directive 95/46/EC. Recital 26 of the GDPR carries the same test forward: “To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly.”

IP addresses can be personal data if the controller has legal ways to obtain additional information from an ISP that identifies someone. What matters is the objective possibility of identifying a data subject: the GDPR is less concerned with whether identification is probable or has actually happened than with whether it is objectively possible. EDPB decisions affirm that online identifiers like IP addresses are often treated as personal data because they can be combined with other information to profile or identify a data subject.

Personal data vs PII

Personal data, in the context of the GDPR, covers a much wider range of information than personally identifiable information (PII), commonly used in North America. In other words, while all PII is considered personal data, not all personal data is PII. For more information about PII vs personal data, read our blog post on the matter.

Device IDs, IP addresses, and cookies are considered personal data under the GDPR. Under the common definition of PII, however, they are not PII, because on their own they cannot be used to identify or trace a person.

PII includes any information that can be used to re-identify anonymous data. Information that is anonymous and cannot be used to trace the identity of an individual is non-PII. Device IDs, cookies, and IP addresses are not considered PII in most of the United States. But some states, like California, do classify this data as personal information. California classifies aliases and account names as personal information as well.

Controllers must treat IP addresses as personal data

For organizations, this means IP addresses cannot be treated as neutral technical data. Controllers must:

Identify a lawful basis for processing (e.g. consent, legitimate interest, contract performance).
Provide transparency in privacy notices, clearly explaining why IP addresses are collected, who receives them (e.g., third-party providers), and how long they are retained.
Apply data minimisation and storage limitation, ensuring IP data is only collected when necessary and retained for no longer than required.

In practice, this is highly relevant when embedding third-party services such as Google Fonts or analytics tools. Whenever a website loads resources from Google servers, the user’s IP address is transmitted to Google by default. Even when using Google Analytics with IP anonymisation enabled, the IP address is initially collected before truncation. The anonymisation feature represents a commitment by Google not to further process the full IP address, but technically, the IP is still transmitted during the request phase. From a strict GDPR perspective, this transmission itself constitutes processing.

ePrivacy Directive

IP address collection via cookies or similar tracking technologies also engages the ePrivacy Directive. Where IP processing is linked to tracking or storing information on a user’s device, prior consent is generally required unless the processing is “strictly necessary” for providing the requested service. This creates a dual compliance requirement: organizations must assess both a GDPR lawful basis and ePrivacy consent obligations.

Anonymisation, pseudonymisation & risks

Pseudonymisation can reduce risks and demonstrate accountability, but it does not remove GDPR applicability. Organizations must still implement appropriate technical and organisational safeguards. IP addresses are commonly pseudonymised by obscuring part of the address:

For IPv4 addresses, the last segment is replaced with a zero or removed.
- Example: 192.0.2.123 → 192.0.2.0
For IPv6 addresses, a similar approach is applied, truncating the last portion.

Guidance from the European Data Protection Board makes clear that true anonymisation must be irreversible. Simple IP truncation or masking reduces identifiability, but it is typically considered pseudonymisation, not anonymisation, because re-identification may still be possible, especially when the truncated address is combined with other data points. GDPR obligations therefore continue to apply unless re-identification is demonstrably impossible.
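The truncation approach described in this section can be sketched with Python's standard `ipaddress` module. This is a minimal illustration, not a recommended configuration: the function name `truncate_ip` is hypothetical, and the prefix lengths (keeping 24 bits for IPv4 and 48 bits for IPv6) are assumptions, since the text does not say how much of an IPv6 address should be removed.

```python
import ipaddress


def truncate_ip(address: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the trailing bits of an address, keeping only the leading prefix.

    The default prefix lengths are illustrative assumptions, not legal guidance.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address; host bits are masked to zero.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(truncate_ip("192.0.2.123"))
    print(truncate_ip("2001:db8::ff00:42:8329"))
```

Every address in the same /24 block maps to the same truncated value, which is why the result is less identifying than the full address but can still narrow down a user when combined with other data.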

Real-world examples

Analytics and server logs: IP addresses used for traffic analysis remain personal data.
Security and abuse detection: Legitimate interest may apply, but retention must be limited.
Advertising and profiling: IP-based tracking combined with cookies generally requires prior consent and careful transparency measures.

Conclusion

Under the GDPR, personal data encompasses far more than obvious identifiers such as names or identification numbers. It includes any information that can reasonably be linked to an individual. IP addresses, whether static or dynamic, fall within this definition when identification is objectively possible, even if that identification is indirect or requires additional data from third parties. Reach out to TechGDPR for any help with understanding the nuances of data protection legislative requirements.

The post Is an IP address considered personal data? appeared first on TechGDPR.

AI Data Retention Strategy under the GDPR and the EU AI Act: Reconciling the Regulatory Clock

AJ Richter — Wed, 26 Nov 2025 15:11:23 +0000

Artificial Intelligence (AI) is reshaping industries, but organizations developing AI systems face a critical, often overlooked strategic risk: managing the retention of training data in compliance with European Union (EU) law. The GDPR emphasizes rapid deletion of personal data, while the EU AI Act requires long-term archival of system documentation. Navigating these conflicting requirements is essential for legal compliance, operational efficiency, and risk mitigation. An effective AI data retention strategy under the GDPR and the EU AI Act is now essential for organisations developing, deploying, or governing artificial intelligence systems in the European Union.

Executive Summary: The Dual Compliance Imperative and Strategic Findings

Organisations that leverage advanced data processing, particularly those developing complex Artificial Intelligence (AI) systems, face a critical and often unrecognized strategic risk: the prolonged retention of training data. European Union (EU) law establishes conflicting imperatives regarding data lifecycle management, creating a fundamental compliance challenge. The General Data Protection Regulation (GDPR) mandates personal data erasure as soon as the data is no longer required for its established purpose, while the newly implemented EU AI Act demands lengthy archival of system documentation.

The GDPR is the primary constraint on personal data, and the AI Act governs long-term retention of non-personal audit and system records.

The Inescapable Regulatory Conflict: Delete Now vs. Document for a Decade

The core of the conflict lies in the tension between personal data protection and system accountability. The GDPR is clear: personal data must be erased once its specific processing purpose is fulfilled. This is enforced by the Storage Limitation Principle (Article 5(1)(e)). Retention beyond this defined necessity, even if the data might be useful for future research or system retraining, is deemed a direct violation unless a new, distinct, and lawful purpose is established.

Conversely, the EU AI Act introduces stringent requirements for system traceability, particularly for High-Risk AI Systems (HRAS). Providers of HRAS must maintain comprehensive technical documentation, quality management system records, and conformity declarations for 10 years after the system is placed on the market or put into service (Article 18, EU AI Act). This requirement applies to system records, ensuring long-term accountability, but does not override the fundamental protection afforded to individuals’ data under the GDPR.

The GDPR Foundation: The “Storage Limitation” Principle

The entire framework of data retention under EU law rests on the GDPR’s Storage Limitation Principle (Article 5(1)(e)). This foundational rule dictates that personal data must be kept “for no longer than is necessary for the purposes for which the personal data are processed.” This is the core principle driving all retention decisions.

Personal data shall be:
(e) kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed; personal data may be stored for longer periods insofar as the personal data will be processed solely for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) subject to implementation of the appropriate technical and organisational measures required by this Regulation in order to safeguard the rights and freedoms of the data subject (‘storage limitation’);
GDPR Article 5(1)(e)

The GDPR does not set generic retention times, instead placing the full burden on the data controller to define, document, and justify a specific deletion timeline for every category of data. If personal data (which is defined broadly to include information beyond PII, like cookie IDs) is used to train a system, the retention clock starts ticking. Organisations leveraging advanced data processing face a critical strategic risk: retaining training data for too long. The GDPR is unambiguous: personal data must be erased once its specific processing purpose is fulfilled. Retention beyond that, even for potential future research, is a direct violation unless a new, distinct, and lawful purpose is established.

Defining the Critical Strategic Risk for GDPR non-compliance

The strategic risk is precisely defined by failing to establish, document, and legally justify a specific deletion timeline for every category of personal data used in the training process. The absence of generic retention times in the GDPR places the full burden of definition and justification squarely upon the data controller.

This environment forces organizations to confront a critical trade-off: is the unproven, speculative future value of raw personal data worth the risk of fines and potential data breaches? The calculation strongly favors deletion, for the following reasons:

Failing to define and document specific deletion timelines exposes organizations to GDPR violations.
Retaining data for future retraining or academic purposes is legally indefensible once the initial training purpose is fulfilled, unless a new, distinct, and lawful purpose is established.
Financial penalties for non-compliance can exceed the cost of implementing compliant, minimal-data systems.
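One way to make a documented deletion timeline concrete is to keep it as data that can be checked automatically. The sketch below is a minimal illustration under stated assumptions: the categories, purposes, and retention periods are invented for the example, and the names `RetentionRule`, `deletion_due`, and `overdue_categories` are hypothetical. The GDPR itself prescribes none of these values; the controller has to justify its own.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    category: str        # category of personal data
    purpose: str         # the documented processing purpose
    retention_days: int  # period the controller has justified (illustrative)


def deletion_due(rule: RetentionRule, collected_on: date) -> date:
    """Return the date by which data in this category must be deleted."""
    return collected_on + timedelta(days=rule.retention_days)


def overdue_categories(rules, collected_on, today):
    """List categories whose documented retention period has already lapsed."""
    return [
        rule.category
        for rule in rules
        if deletion_due(rule, collected_on[rule.category]) < today
    ]


# Hypothetical schedule: categories and periods are made up for this example.
RULES = [
    RetentionRule("training_chat_logs", "initial model training", 180),
    RetentionRule("cookie_ids", "evaluation set construction", 30),
]
```

A scheduled job could call `overdue_categories` daily and trigger deletion for whatever it returns, which also leaves a record that the timeline was enforced.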

The EU AI Act Layer: Traceability and Documentation

The EU AI Act introduces a layered approach to retention centered on system accountability rather than individual personal data. The rules are tied to the system’s risk profile, with High-Risk AI Systems (HRAS) (EU AI Act, Chapter 3) having the most stringent obligations.

Data Governance (Article 10) for HRAS requires that training, validation, and testing data sets be relevant, sufficiently representative, and, to the best extent possible, free of errors. While not a direct retention rule, this implicitly requires maintaining data sets for a period necessary for auditing and quality checks during the development phase.

The most critical requirement is Documentation Retention (Article 18): HRAS providers must keep key records (Technical Documentation, Quality Management System, etc.) for 10 years after the system is placed on the market. This 10-year period covers documentation, quality records, and conformity declarations, not the raw personal data itself. It is vital to understand that this does not override the GDPR’s Storage Limitation Principle (Article 5(1)(e)).

Raw personal data used for training must still be deleted sooner. However, the requirement for Record-Keeping (Logging) (Article 12) means that systems must automatically record events and usage logs. While these logs should ideally be anonymised, their retention period must be “appropriate”, which extends the record-keeping timeline for non-personal data. This mandates a long-term, non-personal data retention strategy that must be carefully integrated with the strict, short deletion cycles required by the GDPR for raw personal data.

Blending the GDPR and EU AI Act Requirements

The intersection of the GDPR and the EU AI Act necessitates a blended compliance strategy, particularly concerning purpose and identification. The GDPR’s Purpose Limitation principle (Article 5(1)(b)) demands that the purpose for processing, such as system training, be explicitly defined. This definition directly dictates the maximum legal retention period for personal data.

Personal data shall be:
(b) collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes; further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes shall, in accordance with Article 89(1), not be considered to be incompatible with the initial purposes (‘purpose limitation’);
GDPR Article 5(1)(b)

Implementing De-Identification in Your AI Data Retention Strategy under the GDPR and the EU AI Act

The best path for long-term data use is de-identification:

Pseudonymisation only reduces identifiability; the data remains personal data under the GDPR and the Storage Limitation Principle still applies.
Anonymisation is the only legal release valve. If the data is permanently and irreversibly stripped of identifiers, it is no longer considered personal data (GDPR Recital 26). Therefore, it can be retained indefinitely.

It’s critical to remember that while the raw personal data must be deleted, the trained system itself (the output) can be retained.
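To illustrate why pseudonymised data stays within the GDPR's scope, the sketch below replaces an identifier with a keyed hash using Python's standard `hmac` module. Keyed hashing is only one possible pseudonymisation technique and is an assumption of this example; the function name `pseudonymise` and the sample key are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 token (hex encoded)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"example-key-keep-this-separate"  # illustrative only; never hard-code real keys
    first = pseudonymise("user@example.com", key)
    second = pseudonymise("user@example.com", key)
    # The same person always maps to the same token, so records remain linkable,
    # and anyone holding the key can recompute the token from a known identifier.
    print(first == second)
```

Because the mapping is stable and can be reproduced by whoever holds the key, the output is pseudonymous rather than anonymous, and the retention limits discussed above still apply to it.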

Reconciling the GDPR’s Right to Erasure with the EU AI Act Traceability

The most direct legal challenge is reconciling the GDPR’s Right to Erasure (Article 17) with the ongoing need for system traceability under the AI Act. If a system is trained on personal data, the controller must maintain the technical ability to honor an erasure request.

This is the Purpose Limitation Conflict: if the initial purpose (training) is complete, retaining the raw personal data is a violation of the GDPR. Developers must implement technical solutions like secure deletion protocols immediately after a system is finalised. Using robust, irreversible anonymisation is the only way to retain data sets without triggering the GDPR’s strict retention clock.

When facing overlapping regulations, the GDPR always acts as the primary constraint on personal data. Its Storage Limitation Principle sets the hard ceiling for raw personal data retention, regardless of the EU AI Act’s documentation rules.

The crucial legal distinction is that PII and other personal data used to create the system must be subject to rigorous deletion procedures the moment the training purpose ends. The technical documentation, metadata, and system logs (which should contain no personal data) are then subject to the EU AI Act’s extended 10-year retention rules. This hierarchy demands that the deletion process (the GDPR) must happen first, leaving only the audit trail (EU AI Act) behind.

The documentation required under the EU AI Act must serve dual purposes: it must confirm the system’s data quality (EU AI Act) and must also provide evidence of the deletion or robust anonymization event, confirming that the GDPR timeline was honored.
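The idea of deleting first and leaving only an audit trail behind can be sketched as a function that removes a raw dataset file and returns a record describing the deletion event. This is a simplified illustration under several assumptions: the record fields and the name `delete_with_evidence` are hypothetical, `os.remove` only unlinks the file and is not a secure-erase procedure, and a real pipeline would also have to cover backups, replicas, and derived copies.

```python
import hashlib
import os
from datetime import datetime, timezone


def delete_with_evidence(path: str, system_id: str) -> dict:
    """Delete a raw dataset file and return an audit record of the event.

    The record holds only a file fingerprint, size, and timestamp, not the data.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Hash in chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    size = os.path.getsize(path)
    os.remove(path)  # plain unlink; NOT a secure wipe
    return {
        "system_id": system_id,
        "dataset_sha256": digest.hexdigest(),
        "dataset_bytes": size,
        "deleted_at": datetime.now(timezone.utc).isoformat(),
    }
```

The returned record could be stored alongside the technical documentation, so that the long-lived audit trail shows when the raw data was removed without containing the data itself.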

Table: GDPR and EU AI Act retention requirements compared

Aspect	GDPR (Personal Data Protection)	EU AI Act (HRAS Accountability)
Asset	Raw PII, Pseudonymous Data, Identifiable Metadata.	Technical Documentation, QMS, System Logs (Non-Personal), Conformity Records.
Core Principle	Storage Limitation (Delete when purpose ends).	Accountability & Traceability (Document for 10 years).
Max Retention Period	Defined by Controller’s Justified Purpose (Short/Medium Term).	10 years after the system is placed on the market.
Legal Hierarchy	Primary binding constraint on identifiability.	Governs the necessary audit trail after GDPR constraints are met.
Highest Penalty Risk	4% Global Annual Turnover (Financial).	Operational disruption, market access denial.

The Financial & Operational Cost of AI Data

Compliance is not just a cost, but a powerful risk mitigator. Storing raw personal data beyond the necessary period is a direct violation of the GDPR’s Storage Limitation Principle. This exposes an organisation to fines of up to EUR 20 million or 4% of global annual turnover, whichever is higher (GDPR Article 83(5)).
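As a worked example of the exposure, GDPR Article 83(5) caps fines for breaches of the basic processing principles at EUR 20 million or 4% of total worldwide annual turnover, whichever is higher. The short sketch below computes that ceiling; the function name and the turnover figures are hypothetical, and the result is the statutory maximum, not a prediction of any actual fine.

```python
def gdpr_fine_ceiling_eur(worldwide_annual_turnover_eur: int) -> float:
    """Upper limit of an Article 83(5) fine: the higher of EUR 20M or 4% of turnover."""
    four_percent = worldwide_annual_turnover_eur * 4 / 100
    return float(max(20_000_000, four_percent))


if __name__ == "__main__":
    # Illustrative turnover figures only.
    print(gdpr_fine_ceiling_eur(100_000_000))    # 4% is EUR 4M, so the EUR 20M floor applies
    print(gdpr_fine_ceiling_eur(1_000_000_000))  # 4% is EUR 40M, which exceeds EUR 20M
```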

Beyond the fines, excessive data retention creates massive operational liability. Longer storage times mean higher infrastructure costs and a larger surface area for security breaches. Every day the data is held, the probability of a costly Data Subject Request (DSR) increases, demanding expensive legal and technical personnel to fulfill. Compliant, timely deletion is ultimately the most financially responsible strategy.

Should you store raw personal data for training?

Organisations often retain raw data for perceived future utility, perhaps for retraining a system. The GDPR forces a hard strategic trade-off: is the speculative future value of that raw personal data worth the immediate, tangible risk of massive fines and data breaches?

The EU AI Act demands auditable records, but these should be built from fully anonymised data or non-personal metadata. The cost calculation is simple: the threat of financial penalties for retaining personal data too long is a much greater potential cost than developing a compliant, data-minimal system. A mature data strategy prioritises de-identification and deletion over retention, significantly reducing the organisation’s regulatory and financial exposure.

Data Type	Legal Status	Retention Requirement	Effect on AI Systems
Raw Personal Data (PII)	Personal data under the GDPR	Must be deleted as soon as the training purpose ends (Article 5(1)(e))	Limits availability for retraining; requires technical deletion pipelines; increases compliance complexity if data spans multiple systems
Pseudonymised Data	Still personal data under the GDPR	Same as raw personal data; cannot retain for 10-year audit	Provides limited utility for internal processing, but retention beyond purpose is legally risky; still triggers Data Subject Requests and fines if not deleted
Irreversibly Anonymised Data	Non-personal data (Recital 26)	Can be retained indefinitely	Supports long-term model auditing, retraining, bias checks, and the EU AI Act traceability; safe to store for 10-year audit requirements
Metadata / Technical Documentation	Non-personal data	Retention required for 10 years under the EU AI Act (Articles 10, 18)	Supports HRAS compliance; ensures traceability without exposing personal data; must be designed to avoid inclusion of PII
System Logs	Non-personal / anonymized	Retention period must be “appropriate,” often aligned with the EU AI Act 10-year audit	Enables audit and monitoring; must be anonymized to avoid GDPR violations; operational impact includes storage and secure access management

Strategic Recommendations

The regulatory landscape governing AI development in the EU is defined by a critical tension:

the immediate obligation to protect individual privacy (GDPR) and
the extended obligation to ensure system safety and traceability (EU AI Act).

Compliant data management requires recognizing the GDPR’s Storage Limitation Principle as the absolute constraint on personal data retention, regardless of the EU AI Act’s documentation timelines. The solution is architectural separation, where raw personal data is subject to automated deletion, and the audit trail is constructed exclusively from non-personal, irreversibly anonymized assets.

TL;DR

Under the GDPR, personal data must be deleted once its specific purpose is fulfilled. This limits how long raw training data can be stored.
For AI developers, this means models cannot indefinitely rely on historical raw personal data. This can potentially impact retraining strategies and model evolution.


How Privacy Enhancing Technologies (PETs) Can Help Organizations Stay GDPR Compliant

AJ Richter — Tue, 13 May 2025 09:22:00 +0000

Safeguarding personal information is now more important than ever. 95% of customers will not engage with companies that cannot offer adequate safeguards for their data. With data protection regulations like the General Data Protection Regulation (GDPR), organizations are under constant pressure to protect sensitive data while ensuring compliance. Privacy Enhancing Technologies (PETs) have emerged as powerful tools to achieve this balance. These technologies not only help secure personal data but also support GDPR compliance by minimizing risks and enhancing confidentiality.

But what are PETs exactly, and how can they help organizations meet GDPR standards? PETs are crucial to securing data and play a critical role in modern data privacy.

What Are Privacy Enhancing Technologies (PETs)?

Privacy Enhancing Technologies (PETs) are a set of tools and techniques designed to protect personal data throughout its lifecycle. PETs can help reduce the risk to individuals while enabling further analysis of personal data without a controller necessarily sharing it, or a processor having access to it. They aim to minimize the exposure of sensitive information while still enabling data processing. PETs can be categorized based on their primary function: minimization, confidentiality, and control.

Some of the key types of PETs are as follows:

Anonymization: This technique removes or alters personal identifiers so data cannot be traced back to an individual. Under the GDPR, true anonymization is considered irreversible, allowing the data to be stored and used without further GDPR constraints.
Pseudonymization: Unlike anonymization, pseudonymization replaces private identifiers with artificial labels. Although it is reversible under strict controls, it adds a layer of protection by decoupling personal identifiers from the dataset. It is very important to understand that pseudonymized data is not the same as anonymized data.
Encryption: Encryption converts data into a coded format, accessible only with a specific decryption key. This ensures that even if the data is intercepted, it remains unreadable to unauthorized parties.
Synthetic data: This allows organizations to create artificial data that mimics real data but preserves user privacy. Synthetic data is often used in AI and machine learning as well as software testing and development.
Differential privacy: This is a mathematical concept that adds randomness or noise to data analysis, making it more difficult to identify individuals.
Confidential computing: This form of data processing prevents unauthorized access to data during computation. It is often used in cloud computing and for healthcare and financial services.
Federated learning: This machine learning approach allows multiple organizations to train algorithms collaboratively without sharing raw data, enhancing both privacy and compliance.
Trusted execution environments: Secure hardware or software environments within a system that provide an isolated area of execution of sensitive operations and protect code and data from external tampering.
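To make the differential privacy entry in the list above more concrete, the sketch below answers a counting query with added Laplace noise, a standard mechanism for this technique. It is a minimal illustration under stated assumptions: the function names, the sample ages, and the choice of epsilon are hypothetical, and a production system would also need to track the privacy budget across repeated queries.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the values matching the predicate.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for value in values if predicate(value))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [18, 25, 31, 47, 52, 64]  # made-up data
    rng = random.Random()            # unseeded: each run gives a different answer
    print(private_count(ages, lambda age: age >= 30, epsilon=1.0, rng=rng))
```

A smaller epsilon means more noise and stronger protection for any single individual, at the cost of less accurate results.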

By using these technologies, organizations can significantly reduce the risk of data breaches and support GDPR’s core principles. PETs help to ensure that an individual’s data is better protected to avoid any potential data breaches or misuse of data.

GDPR Principles Supported by PETs

The GDPR is built around principles that prioritize data protection at every stage of processing. PETs offer a practical path to compliance by reinforcing these key principles.

The key GDPR Principles can be reinforced through the usage of PETs:

Data Minimization (Article 5(1)(c)): PETs like anonymization and pseudonymization ensure that only necessary personal data is processed, reducing exposure. Techniques like differential privacy also enable organizations to analyze data sets without exposing individual identities, aligning with GDPR’s minimization principle.
Integrity and Confidentiality (Article 5(1)(f)): Technologies such as encryption protect data against unauthorized access, maintaining its confidentiality and integrity. Homomorphic encryption, for instance, allows for computations on encrypted data without revealing its contents, offering enhanced protection.
Data Protection by Design and by Default (Article 25): Implementing PETs as part of system design supports privacy by design, a core requirement of the GDPR. This includes pseudonymizing or encrypting data by default, ensuring that privacy safeguards are active even before processing begins.

Organizations can further strengthen their compliance by incorporating PETs into Data Protection Impact Assessments (DPIAs), identifying and addressing potential risks before processing begins. DPIAs help document how PETs mitigate risks by offering a transparent view of data processing activities.

PETs and International Data Transfers

Cross-border data transfers are a major concern under the GDPR, especially after the Schrems II ruling. PETs help address these challenges by adding layers of security to data during transit. Technologies like encryption and federated learning ensure that sensitive information remains protected even during international exchanges. PETs act as supplementary measures to meet the requirements of GDPR Chapter V (Articles 44-50), reducing risks during cross-border transfers and maintaining compliance with European standards.

For example, federated learning allows machine learning models to be trained across multiple locations without sharing raw data. This reduces exposure and facilitates compliance with strict European data protection laws. Encryption further ensures that even if data is intercepted during transfer, it remains unreadable without the right decryption keys.
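The federated learning idea can be sketched as a weighted average of model parameters that each location computes on its own data. This follows the common federated averaging pattern and is an assumption of this example, since the text does not name a specific algorithm; the function name and the numbers are hypothetical. Only parameter lists and sample counts are exchanged here, though in practice model updates can still leak information and are often combined with other safeguards.

```python
def federated_average(client_updates):
    """Combine per-client model weights into one global model.

    client_updates: list of (weights, num_samples) pairs, where weights is a
    list of floats trained locally and num_samples is the local dataset size.
    """
    if not client_updates:
        raise ValueError("at least one client update is required")
    total_samples = sum(num_samples for _, num_samples in client_updates)
    dimension = len(client_updates[0][0])
    return [
        sum(weights[i] * num_samples for weights, num_samples in client_updates)
        / total_samples
        for i in range(dimension)
    ]


if __name__ == "__main__":
    # Two hypothetical sites; the second holds three times as much data.
    site_a = ([1.0, 2.0], 1)
    site_b = ([3.0, 4.0], 3)
    print(federated_average([site_a, site_b]))
```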

Real-World Applications of PETs

PETs are already being used across various industries to maintain privacy and GDPR compliance.

Here are some core examples of PET usage:

Healthcare: Differential privacy allows hospitals to share patient data for research while protecting confidentiality.
Technology: Companies like Google and Apple use federated learning to improve their services without centralizing user data. Apple also uses differential privacy.
Finance: Secure computation enables financial institutions to analyze sensitive data while maintaining strict confidentiality.

Implementing PETs requires careful planning and collaboration across IT, legal, and privacy teams. Legal ambiguities around anonymization, integration with legacy systems, and the complexity of deployment can pose challenges. However, conducting DPIAs, aligning strategies with GDPR Article 32, and ongoing training for staff help smooth the integration process. Regular audits and collaborative cross-functional efforts also contribute to effective implementation.

PETs as a Strategic Enabler for GDPR Compliance

Privacy Enhancing Technologies are not just compliance tools; they are strategic assets that enable secure, responsible data processing. For organizations striving to meet GDPR standards, PETs offer a practical path to data minimization, enhanced confidentiality, and secure international transfers.

Implementing PETs as part of your data privacy strategy not only reduces compliance risks but also fosters trust with clients and partners. By embracing these technologies, businesses can navigate the complexities of GDPR with confidence and accountability.

The post How Privacy Enhancing Technologies (PETs) Can Help Organizations Stay GDPR Compliant appeared first on TechGDPR.

Self-Hosting AI: For Privacy, Compliance, and Cost Efficiency

AJ Richter — Wed, 12 Mar 2025 11:12:08 +0000

Self-hosting AI models is the future of privacy and compliance. By hosting AI models on personal hardware, individuals and businesses can improve data security while meeting strict regulations like the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA). Most people use hosted artificial intelligence (AI) services such as ChatGPT by OpenAI or Gemini by Google. These are known as cloud-based AI models, and the computation is done on servers operated by the AI providers. Self-hosting your AI means that you remain in control of all of the data. Unlike cloud-based AI services, self-hosting keeps all data within the user’s direct control. This significantly reduces the risks of unauthorized access, data breaches, and non-compliance with regulatory frameworks.

What does self-hosting an AI model mean?

To be explicit: when one self-hosts AI models, the computation occurs directly on hardware one owns (e.g. one can run Ollama on a laptop). This control allows for enhanced privacy and security. If you host an AI model on your device, there is no need for the data to ever leave your device. Therefore, the risk of data breaches or unauthorized access decreases drastically. In addition, the data does not need to travel over a network to a remote server. This can decrease latency and produce a faster response, although the speed depends on the hardware. Latency can best be understood as how much time passes between when a question is asked to an AI model and when a response is received.
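As a rough sketch of what this looks like in practice, the snippet below sends a prompt to a locally running Ollama server and measures the latency as defined above. It assumes Ollama is installed and listening on its default local port, and that a model has already been pulled; the model name `llama3.2` is an assumption chosen for illustration.

```python
import json
import time
import urllib.request

# Ollama's default local endpoint; the request never leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, model: str = "llama3.2"):
    """Return the model's answer and the latency in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        answer = json.loads(resp.read())["response"]
    return answer, time.perf_counter() - start


# Usage (requires a running Ollama server with the model pulled):
# answer, seconds = ask_local_model("Summarize Article 4(1) GDPR in one sentence.")
```

Because the endpoint is `localhost`, the prompt and the response stay on the device, which is the privacy property the paragraph describes.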

Most modern computers can run smaller AI models with no issue, but larger models tend to be more resource intensive. There are many resources available that allow one to examine the free open-source models and their hardware compatibility. The benefits of using an open-source model can be greater privacy and transparency. Keeping the processing local also reduces the risk of data breaches and supports a better level of compliance when processing sensitive data using AI models.

Why and how to invest in self-hosting AI models?

To run usable AI models, hardware plays a crucial role. Self-hosting AI models benefits greatly from a graphics processing unit (GPU), as running AI solely on a central processing unit (CPU) leads to slower computations and, as mentioned above, higher latency.

The key benefits of self-hosting AI models are:

Improved Performance: GPUs significantly enhance processing speed, allowing AI models to generate responses faster.
Cost Savings Over Time: While the initial investment in hardware may be high, self-hosting eliminates recurring cloud subscription fees—leading to long-term financial benefits.
Data Control & Privacy: Self-hosting removes dependence on third-party cloud providers, ensuring full control over sensitive data.
Regulatory Compliance: Self-hosting reduces the risk of breaches and helps meet strict regulations like the GDPR and the HIPAA.
Avoids External Policy Changes: Cloud-based AI providers frequently update pricing models, governance rules, and data policies. Self-hosting AI models provide stability and predictability in data management.
Eliminates Token Costs: Using AI services from major providers (e.g., OpenAI, Google) requires purchasing tokens, making usage costs unpredictable. Self-hosting avoids reliance on fluctuating pricing. As the chart on fluctuating AI token costs shows, these prices change constantly, which leaves users of AI that is not self-hosted at the mercy of the prices dictated by the service provider.

[Figure: Fluctuating AI Token Costs]

By investing in local AI infrastructure, businesses and individuals regain autonomy over AI processing, ensuring cost efficiency, data privacy, and long-term stability. Investing in the hardware means that one is not at the whims of a service provider for a virtual cloud instance. It allows for complete control over the data and, over time, lowers the total cost of running AI.
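The trade-off between an upfront hardware investment and recurring cloud fees can be expressed as a simple break-even calculation. The sketch below is illustrative only: the function name and every figure in it are hypothetical assumptions, not real prices.

```python
import math


def break_even_months(hardware_cost, monthly_running_cost, monthly_cloud_cost):
    """Months until self-hosting becomes cheaper than the cloud, or None if never."""
    monthly_saving = monthly_cloud_cost - monthly_running_cost
    if monthly_saving <= 0:
        return None  # the cloud stays cheaper at these prices
    return math.ceil(hardware_cost / monthly_saving)


# Hypothetical figures: a 2400 GPU workstation, 30 per month for electricity,
# compared with 180 per month in cloud API and subscription fees.
months = break_even_months(2400.0, 30.0, 180.0)
print(f"break-even after {months} months")
```

With these assumed numbers the hardware pays for itself after 16 months; actual results depend heavily on usage volume and current provider pricing.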

How can using self-hosting AI help with regulatory compliance?

Self-hosting AI models is a crucial step toward ensuring compliance with data protection regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA), while also reducing reliance on big tech companies. Under Article 9 of the GDPR, special categories of personal data, such as health information, biometric data, and data revealing racial or ethnic origin, require strict protection: processing them is prohibited unless a specific exception, such as explicit consent, applies. By self-hosting AI models, organizations retain full control over such data, minimizing the risk of unauthorized access and third-party breaches.

Studies have shown that developing AI models within institutional boundaries, particularly in healthcare, enhances privacy and regulatory compliance. It allows for more ethical and secure AI deployment. Furthermore, reliance on centralized AI models controlled by major corporations raises concerns about monopolized access to data. This can potentially lead to biased decision-making and limited transparency. Self-hosting AI fosters greater ethical responsibility, ensuring that data governance aligns with user interests rather than corporate agendas.

Case study: DeepSeek

At the beginning of 2025, the introduction of DeepSeek R1 sent a huge shock through the AI sphere. DeepSeek, a Chinese startup, was able to create and train an open-source AI model for a fraction of the cost of its competitors. It is free to download and use. Since DeepSeek is based in China, there were growing concerns about using chat.deepseek.com or the application because of where the data is sent. However, if one self-hosts DeepSeek R1, the data is not sent anywhere beyond the controller’s own hardware. Running DeepSeek as a self-hosted AI model is a simple and cost-effective way to explore the benefits of self-hosted AI, including privacy, performance, and cost savings.

Why is DeepSeek good for privacy?

But, do self-hosted AI models perform worse?

Short answer: No. A Swiss study showed that using a small local Deep Neural Net (DNN) alongside a remote large-scale AI model can help reduce the prediction cost by half without affecting the system’s accuracy. For context, in 2022 GPT-3 models cost $0.48 per request. The study first sent each input to a locally hosted DNN. If the local response was trustworthy, the input was not forwarded to GPT-3. If the local response was not trustworthy, GPT-3 computed the response instead. The local DNN was able to generate a correct prediction for 48% of the inputs while losing very little accuracy. Self-hosted AI models can therefore save money by saving tokens and avoiding expensive remote calls, with very little loss of accuracy.
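The routing logic described above can be sketched as a simple cascade. This is not the study’s implementation: the confidence score, the 0.8 threshold, and both stub models are hypothetical stand-ins, used only to show how a local model can answer first and fall back to a remote model when it is unsure.

```python
from typing import Callable, Tuple


def cascade(
    prompt: str,
    local_model: Callable[[str], Tuple[str, float]],
    remote_model: Callable[[str], str],
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Answer locally when the local model is confident, otherwise go remote."""
    answer, confidence = local_model(prompt)
    if confidence >= threshold:
        return answer, "local"  # no remote call, so no token cost
    return remote_model(prompt), "remote"


# Hypothetical stubs: the local model is only confident on short prompts.
def local_stub(prompt: str) -> Tuple[str, float]:
    return "local answer", 0.95 if len(prompt) < 10 else 0.4


def remote_stub(prompt: str) -> str:
    return "remote answer"


prompts = ["easy", "a much harder question"]
routes = [cascade(p, local_stub, remote_stub)[1] for p in prompts]
print(routes)
```

Every request that is served locally avoids a paid remote call, which is where the cost savings reported in the study come from.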

Why should businesses adopt self-hosting AI?

In a world where AI is increasingly intertwined with daily life, the decision to self-host AI models offers a powerful alternative to cloud-based solutions. By self-hosting AI models on personal hardware, one can improve:

Data Security: Eliminates external risks by keeping information in-house.
Regulatory Compliance: Easier to meet industry-specific privacy laws.
Cost Efficiency: Reduces long-term expenses related to cloud computing and API usage.
Customization & Flexibility: Empowers users to fine-tune models to their specific needs, ensuring greater transparency and understanding of how AI systems operate.
Improved Performance: Faster response times and reduced latency lead to better user experiences.

With advancements in open-source models like DeepSeek R1, running self-hosted AI models is more accessible than ever. This allows users to benefit from high-performance models without sacrificing privacy or autonomy. As AI continues to evolve, self-hosting AI models stands as a viable and increasingly necessary choice for those who prioritize control, security, and ethical responsibility in their AI usage.

The post Self-Hosting AI: For Privacy, Compliance, and Cost Efficiency appeared first on TechGDPR.

Why should software developers care about GDPR compliance?

AJ Richter — Wed, 14 Feb 2024 14:27:29 +0000

Software developers often view ensuring GDPR compliance as a blocker, as they are left trying to figure out what personal data is and how to maintain compliance. In a recent study by Alhazmi and Arachchilage, software developers cite multiple reasons that make approaching GDPR compliance tricky. Some reasons listed include a lack of clear best implementation practices, a lack of familiarity with the legislation and a lack of guidance. Understanding what to look for and what to prioritize likely constitutes the first hurdle. There are many reasons why software developers should acknowledge privacy and ensure regulatory compliance, including GDPR compliance. Software developers play a key role in ensuring GDPR compliance.

GDPR compliance as a market differentiator

Companies serious about GDPR compliance understand its role in maintaining their market position. Those who are proactive are quicker at placing themselves on a purchaser’s list of adequate suppliers. When processing data from people in Europe, the GDPR applies. It forces an organization to implement measures and maintain records of compliance. Even if an organization is not currently processing that data, building in regulatory compliance early supports future collaborations and partnerships with larger organizations and ensures the trust of product users.

Whether a software developer operates in a B2C, B2B or B2B2C context is irrelevant. The processing of personal data anywhere on that chain of services needs to comply with GDPR requirements. Thus, achieving and maintaining compliance allows an organization to be a supplier that implementing clients consider. For instance, a software developer at a small startup is able to integrate fundamental privacy by design and by default principles into their design. This includes practices such as implementing end-to-end security, hashing, and other cryptographic measures.
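As one example of the hashing practice mentioned above, the sketch below stores passwords as salted PBKDF2 hashes using only the Python standard library. The iteration count and function names are assumptions for illustration; the appropriate work factor should follow current security guidance.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; tune to current guidance


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return a random salt and the salted PBKDF2-HMAC-SHA256 digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
```

Only the salt and digest are stored, so a database leak does not directly expose users’ passwords.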

Transparency makes the product more competitive if it is to be implemented through partnerships or sold as a SaaS. Procurement negotiations might still bring up specific questions and feature requests to be added to the agreements your organization signs as a vendor. By prioritizing compliance, any solution developed is more likely to remain on the list of suppliers worth considering especially if the negotiation deals with business in the EU. Implementing privacy preserving design features allows an organization the competitive edge of transparency.

Major fines

Tech giants such as Facebook, Google and Amazon regularly face severe fines for non-compliance. These fines are essentially caused by deliberate ambiguity in their data processing and in the fulfillment of their transparency requirements. Worse, they disregard their data controller obligations and get fined for a combination of hidden processing practices and dark patterns. In May 2023, Meta was hit with a 1.2 billion euro fine for lack of GDPR compliance. This is the largest fine to date. Amazon was fined 746 million euros in 2021 for failing to collect user consent for advertising. When companies get fined, several factors come into play, potentially including their willingness to cooperate and implement corrective actions. However, the constant factors are a lack of transparency, misleading patterns and a lack of a legitimate basis for processing.

However, most businesses are small-to-medium-sized enterprises (SMEs). This term is defined by the European Commission as, among other criteria, a company with fewer than 250 employees. For an SME, GDPR compliance is harder to achieve due to proportionally reduced resources or access to expertise. Therefore, an SME that achieves compliance can win back some of the competitive advantage that larger players hold on operational costs. Tech giants are consistently pressured to maintain compliance due to their increased visibility. Compliance, when managed efficiently, is therefore a defining competitive advantage for smaller companies.

GDPR compliance as a political or social issue

When tech-savvy individuals go online, they tend to protect their own privacy by using strong passwords, enabling MFA where available, and using pseudonyms and single-use email addresses where possible. With the help of a few high-profile breaches and updates to app marketplace practices and communication strategies, the average user has become more aware of online privacy risks. Software developers tend to implement best security practices in their own use of software and apps. As a result, they are particularly well suited to understand the need for security. They are also specifically instructed to implement strong security practices and privacy design patterns such as content security policies for websites. As creators of technology, software developers have an ethical responsibility to protect the privacy of individuals and empower them to use their software or services more privately.

By implementing best design practices such as the minimization of cookies, the forced use of MFA, the encryption of user data, and a privacy by default approach to design, designers create privacy-preserving environments. While the expectation might be that less tech-savvy individuals are likely to show relative indifference about their own privacy, one study entitled Caring is not enough: the importance of Internet skills for online privacy protection argues that even if people do care, they also need to be educated on how to protect their own privacy. It is not uncommon to feel helpless protecting one’s own data or safely using the internet. Typically, a lot of the burden for security falls, wrongfully, on the individual.

Should the average user be expected to know how to make use of encryption to feel safe online?

For many, cookie banners are annoying interfaces, easily brushed away by clicking the “Accept all” button. Configuring a cookie banner to not set non-essential cookies by default makes the organization compliant with that requirement. It also provides users with a choice. Amongst other principles, privacy by default also requires the developer to ensure the most private settings are set by default. Software designers familiar with ePrivacy requirements are able to notify the marketing team that silent opt-ins are illegal in the EU. This allows the organization to engage in discussions as to whether to design for compliance or to accept the risk. In accepting the risk, an organization increases user distrust for the benefit of tracking, profiling and advertising KPIs.
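A privacy by default cookie configuration can be sketched in a few lines. The cookie names, categories, and the `Consent` class below are hypothetical; the point is that every non-essential category starts switched off and is only enabled after the user opts in.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical mapping of cookie names to consent categories.
COOKIE_CATEGORIES = {
    "session_id": "essential",
    "csrf_token": "essential",
    "analytics_id": "analytics",
    "ad_tracker": "marketing",
}


@dataclass
class Consent:
    """User choices; non-essential categories default to off (no silent opt-in)."""
    analytics: bool = False
    marketing: bool = False


def cookies_to_set(consent: Consent) -> List[str]:
    """Return only the cookies the user's current consent allows."""
    allowed = {"essential"}
    if consent.analytics:
        allowed.add("analytics")
    if consent.marketing:
        allowed.add("marketing")
    return [name for name, category in COOKIE_CATEGORIES.items() if category in allowed]


print(cookies_to_set(Consent()))                # before any user choice
print(cookies_to_set(Consent(analytics=True)))  # after opting in to analytics
```

A visitor who ignores or dismisses the banner is treated as `Consent()`, so only essential cookies are set.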

As digitization continues, the selling of user data and the mishandling of personal information have become pervasive in the tech field. This trend occurs without much regard for its significance. It has regrettably become normalized even though it is against the GDPR. This is likely due in part to many companies operating solely within the US. At the moment, the US does not have a federal law similar to the GDPR. Regardless, this precedent is pervasive.

People should have the right to use and access the internet and software-related tools and services without being seen as a commodity. Through the use of tracking elements and the abuse of consumer metrics, individuals are becoming commodified and sold as such. Individuals should not be so easily manipulated and tracked through their actions online. When software developers prioritize GDPR compliance, they are able to help prevent the commodification of individuals by their company.

GDPR compliance in software development as an intellectual challenge

It is easy to do things in an insecure manner. It would be easier to access one’s phone to text people if one didn’t have a password, but most individuals likely have a password on their phone to prevent strangers from accessing the content on their device. Therefore, the easiest solution is not always the best solution. This stems from the common dilemma of convenience versus privacy that one is confronted with daily. Instead of seeing this as an issue, one should frame it as a challenge. An issue bears the connotation of an obligation or nuisance. If one views compliance as an intellectual challenge of how to protect others, the problem becomes more intriguing and fun to solve.

Individuals are motivated to do things either intrinsically or extrinsically. When a supervisor informs a developer that they must make the system compliant with the GDPR, that is the definition of an extrinsic motivator, as it is external. Intrinsic motivation, however, is a powerful and compelling motivator. It is part of the reason why computer games are fun to learn.

An intellectual challenge has a better and more enthralling connotation. This idea has been theorized since the 1950s, and academics have postulated through research that intrinsic motivation is correlated with how challenging the activity is. Since those with a background in computer science are confronted with technical problems to solve all the time, compliance is best viewed as an intellectual challenge: to avoid the easiest solution and instead create the most secure one.

Concluding thoughts

Compliance is the law. As a software developer, one will likely need to work to implement or maintain compliance with the GDPR. It is easy to see it as a tedious endeavor handed down by a higher-up, who might not necessarily understand the ramifications of the technical assignment they are bestowing. Instead, one should view the GDPR through an intrinsically motivated lens as an intellectual challenge to protect the rights of individuals. There are other reasons why a software developer should care about the GDPR. These include but are not limited to securing contracts and helping others with less knowledge of proper internet privacy practices.

The joy of the internet and technology should benefit and be enjoyed by all individuals, regardless of their technical background and without the fear of loss of rights. The question should not be: “does one engage with technology and in doing so give up their right to privacy?” Rather, the burden should fall less on less technical users, and privacy should be built into technology inherently.

If you are interested in taking your GDPR knowledge to the next level, dive into TechGDPR’s specialized training for developers. This course is designed to equip you with the skills and understanding needed to navigate GDPR compliance within your projects. It will help you ensure your software is up to standard and gain a competitive edge. Discover more and enroll today at GDPR for Developers – Online Course.

The post Why should software developers care about GDPR compliance? appeared first on TechGDPR.