Automated Log and Response Data Retention

The log retention system ensures that audit logs are automatically anonymized after the configured retention period, helping maintain GDPR compliance while preserving essential historical records.

1. Who Can Manage It

Only system administrators can configure and adjust the data retention period, which is set through configuration keys. Regular admins and end users have no access to this setting.


2. What It Applies To

The configuration key LOG_RETENTION_PERIOD_DAYS defines how long audit logs (LogEntry objects) are kept in their original, identifiable form before being anonymized.
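As a rough illustration of how a retention period in days translates into a cutoff timestamp, here is a minimal Python sketch. The helper names (anonymization_cutoff, is_due_for_anonymization) are hypothetical and not part of the product, and the strict "older than" comparison is an assumption; the value 180 is the default described in section 3.

```python
from datetime import datetime, timedelta, timezone

# Default value documented in section 3; the real value comes from configuration.
LOG_RETENTION_PERIOD_DAYS = 180


def anonymization_cutoff(now, retention_days=LOG_RETENTION_PERIOD_DAYS):
    """Return the timestamp before which entries are due for anonymization."""
    return now - timedelta(days=retention_days)


def is_due_for_anonymization(created_at, now, retention_days=LOG_RETENTION_PERIOD_DAYS):
    """True if the entry is strictly older than the retention period (assumed boundary)."""
    return created_at < anonymization_cutoff(now, retention_days)


if __name__ == "__main__":
    now = datetime(2024, 7, 1, tzinfo=timezone.utc)
    print(anonymization_cutoff(now).date())
    print(is_due_for_anonymization(datetime(2023, 12, 1, tzinfo=timezone.utc), now))
```

With a longer retention period the cutoff moves further into the past, so fewer entries qualify on any given day.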

✔️ Impacted data:

  • Audit logs that may contain personal information (e.g. user identifiers, metadata linked to actions in the system).

Not impacted:

  • ML logs (requests, completions, responses)

  • Pipeline events

  • Task results

  • User chat history or conversations

These elements are neither deleted nor anonymized by this configuration.


3. How It Works

  1. Default Retention:

    • By default, LOG_RETENTION_PERIOD_DAYS is set to 180 days.

    • After this period, audit log entries are anonymized.

  2. Anonymization Process:

    • A scheduled daily job (schedule_all_companies_anonymization) runs across all companies.

    • It triggers the anonymize_company_logs_task() task.

    • That task calls CompanyDataLogAnonymizationService, which anonymizes LogEntry objects older than the retention period.

    • The event record is kept, but personal data is replaced with anonymous values.

  3. Important Note:

    • No log entries are deleted; only the personal values inside them are overwritten.

    • The database size does not decrease, since anonymization only replaces values without removing rows.

    • The benefit is compliance with data protection regulations (e.g. GDPR), not storage optimization.
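The flow described in this section can be sketched in plain Python. This is a simplified, in-memory illustration under stated assumptions, not the actual implementation: the LogEntry dataclass, its fields, the placeholder value and the function names run_daily_job and anonymize_company_logs are hypothetical stand-ins for the real scheduled job, task and service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION_DAYS = 180            # documented default for LOG_RETENTION_PERIOD_DAYS
ANONYMOUS_VALUE = "anonymized"  # hypothetical placeholder value


@dataclass
class LogEntry:
    """Simplified stand-in for an audit log row (field names are assumptions)."""
    company_id: int
    action: str
    actor: Optional[str]
    created_at: datetime
    anonymized: bool = False


def anonymize_company_logs(entries, company_id, now, retention_days=RETENTION_DAYS):
    """Overwrite personal values on one company's expired entries; return the count changed."""
    cutoff = now - timedelta(days=retention_days)
    changed = 0
    for entry in entries:
        if entry.company_id != company_id or entry.anonymized:
            continue
        if entry.created_at < cutoff:
            entry.actor = ANONYMOUS_VALUE  # the event itself (action, timestamp) is kept
            entry.anonymized = True
            changed += 1
    return changed


def run_daily_job(entries, now):
    """Stand-in for the daily job: process every company found in the log."""
    company_ids = sorted({entry.company_id for entry in entries})
    return {cid: anonymize_company_logs(entries, cid, now) for cid in company_ids}


if __name__ == "__main__":
    now = datetime(2024, 7, 1, tzinfo=timezone.utc)
    logs = [
        LogEntry(1, "login", "alice", now - timedelta(days=200)),
        LogEntry(1, "update", "bob", now - timedelta(days=10)),
    ]
    print(run_daily_job(logs, now), len(logs))
```

The list has the same length before and after the run, which mirrors the note above: anonymization overwrites values in place and does not reduce the number of stored rows. Running the job a second time changes nothing, because already anonymized entries are skipped.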


4. Version Information

  • The anonymization mechanism has been present in the system for several releases.

  • In the Rapid Rabbit release, related fixes improved the overall reliability of scheduled tasks, ensuring anonymization runs consistently.