    Known Issues

    Documented issues caused by external changes (e.g. Microsoft updates) that affect Autopilot Monitor. Each entry describes the impact, what still works, and whether a workaround exists.

    2026-05-19 Resolved

    Follow-up: Safe Deletion in Place — Cleanup Incident Resolved

    Following the cleanup incident from 2026-04-16, the safeguards announced at the time are now in place. There are two ways data can be removed from Autopilot Monitor today, and both always create a backup first and offer a way back.

    When sessions are deleted from the admin UI

    • Before anything is removed, the system collects a list of everything that belongs to the session — events, analysis results, software inventory, and related entries.
    • The deletion runs in steps and is safe to interrupt and resume. The same session cannot accidentally be deleted twice or come back through late agent traffic.
    • Recovery: a deletion can be reversed afterwards, either completely or only for parts of the session. Counts and summaries are corrected automatically.
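The admin-UI behaviour described above can be illustrated with a small sketch. It is not the actual Autopilot Monitor implementation: the `Store` class, the function names, and the in-memory dictionaries are hypothetical stand-ins chosen only to show the three ideas (manifest and backup first, resumable and idempotent steps, a marker that rejects late agent writes).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the real storage backend."""
    rows: dict = field(default_factory=dict)      # key -> row payload
    backups: dict = field(default_factory=dict)   # session id -> {key: payload}
    tombstones: set = field(default_factory=set)  # deleted session ids


def delete_session(store: Store, session_id: str, max_steps: Optional[int] = None) -> int:
    """Delete one session in resumable steps; returns rows removed by this call."""
    # 1. Collect the manifest once and keep it as the backup before removing anything.
    if session_id not in store.backups:
        store.backups[session_id] = {
            key: row for key, row in store.rows.items() if row["session"] == session_id
        }
    # 2. Mark the session so late agent traffic cannot bring it back.
    store.tombstones.add(session_id)
    # 3. Delete step by step. Rows that are already gone are skipped,
    #    so resuming or repeating the call is harmless.
    removed = 0
    for key in store.backups[session_id]:
        if max_steps is not None and removed >= max_steps:
            break  # simulated interruption
        if store.rows.pop(key, None) is not None:
            removed += 1
    return removed


def ingest(store: Store, key: str, row: dict) -> bool:
    """Accept an agent write unless its session has been deleted."""
    if row["session"] in store.tombstones:
        return False
    store.rows[key] = row
    return True


def restore_session(store: Store, session_id: str, keys: Optional[set] = None) -> int:
    """Restore every backed-up row, or only the given keys (partial recovery)."""
    backup = store.backups.get(session_id, {})
    selected = {k: v for k, v in backup.items() if keys is None or k in keys}
    store.rows.update(selected)
    store.tombstones.discard(session_id)
    return len(selected)
```

Calling `delete_session` with a small `max_steps` and then again without it mimics an interrupted and resumed deletion; a third call removes nothing.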

    When deletions happen behind the scenes (maintenance)

    • The original incident was a maintenance cleanup. These operations now run through a guided procedure with several stop points along the way.
    • A small sample is checked first and has to be explicitly confirmed before anything else happens.
    • A full backup of all affected entries is written to disk before any delete. If the amount is much larger than expected, the procedure stops on its own.
    • The deletion then runs in two test rounds (first one entry, then ten) before the rest follows — each round is verified against the live data.
    • Recovery: the backup contains everything that was removed and can be used to restore entries one-by-one or all at once.
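As a rough illustration of the guided maintenance procedure above, the following sketch strings the stop points together. All names (`guarded_cleanup`, `CleanupAborted`, `tolerance`, the JSON backup file) are assumptions made for this example, and the in-memory dictionary stands in for the real database; the actual procedure may order or implement the checks differently.

```python
import json
from pathlib import Path


class CleanupAborted(Exception):
    """Raised when a stop point refuses to continue."""


def guarded_cleanup(rows, should_delete, expected_count, confirm_sample,
                    backup_path, sample_size=5, tolerance=2.0):
    """Delete matching entries from `rows` (a dict) behind several stop points."""
    candidates = [key for key, row in rows.items() if should_delete(row)]

    # Stop point: a small sample has to be explicitly confirmed.
    sample = {key: rows[key] for key in candidates[:sample_size]}
    if not confirm_sample(sample):
        raise CleanupAborted("sample was not confirmed")

    # Stop point: halt if far more entries match than expected.
    if len(candidates) > expected_count * tolerance:
        raise CleanupAborted(
            f"{len(candidates)} candidates, expected about {expected_count}")

    # Full backup of all affected entries is written to disk before any delete.
    Path(backup_path).write_text(json.dumps({key: rows[key] for key in candidates}))

    # Two test rounds (1 entry, then 10), then the rest. After each round the
    # live row count is compared with the expected one (a stand-in for a real check).
    total_before = len(rows)
    deleted = 0
    remaining = list(candidates)
    for round_size in (1, 10, len(remaining)):
        batch, remaining = remaining[:round_size], remaining[round_size:]
        for key in batch:
            del rows[key]
        deleted += len(batch)
        if len(rows) != total_before - deleted:
            raise CleanupAborted("live data does not match the expected state")
        if not remaining:
            break
    return deleted


def restore(rows, backup_path, keys=None):
    """Restore all backed-up entries, or only the listed keys."""
    backup = json.loads(Path(backup_path).read_text())
    selected = {k: v for k, v in backup.items() if keys is None or k in keys}
    rows.update(selected)
    return len(selected)
```

The point of the design is that every abort happens before the bulk delete, and that the backup file alone is enough to undo the operation entry by entry or in full.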

    What this does and does not change

    The events lost on 2026-04-16 cannot be recovered — that data is gone. What these procedures guarantee is that any future deletion has a backup and a way back, and that a single mistake cannot cause the same kind of incident again.

    No action needed on your side.

    2026-04-16 Breaking Change

    Event Data Loss Due to Faulty Cleanup Operation

    A faulty cleanup operation deleted a significant number of events from the database. As a result, some sessions no longer have a complete or correct timeline.

    Affected sessions may show missing events, incomplete phases, or incorrect completion states. There is no way to restore the deleted data retroactively.

    Operational safeguards are being put in place to block and verify cleanup operations before execution and prevent similar incidents in the future.

    2026-04-16 Info

    Transparency Note — Root Cause of Today's Event Data Loss

    I want to share a quick note with full transparency regarding an incident from today.

    If you notice missing events in older session timelines, this is the reason: during a storage migration to a new layout that improves performance, I identified some orphaned event entries from the very early days of the platform and attempted to remove them. Unfortunately, part of the filter logic was not applied as expected, and I did not catch the issue quickly enough. As a result, a larger number of historical event entries were deleted.

    The impact is mainly limited to older session timelines. The platform is designed so that events are most valuable during the enrollment itself and shortly afterwards for troubleshooting and analysis. Over time, they become less critical and are used more for reporting purposes. In addition, the default configuration already removes sessions and events after 90 days. Even so, the impact is real, and I want to be open about it.

    This happened while I was implementing a new table structure to address one of the root causes identified during Private Preview: performance limitations caused by the previous storage design. The new layout significantly improves that area, but in this case the migration also caused this incident.

    I take this seriously. While issues like this can happen during a Private Preview, especially in a fast-moving product, I know that does not reduce the impact. At the moment, I also do not have a separate staging environment, so larger refactorings currently have to be carried out on the live system. That is not ideal, and I will put additional operational safeguards and procedures in place to reduce the risk of this happening again.

    Thank you for your understanding, your patience, and your trust. I hope this message reflects the level of transparency I want to maintain while building the platform together with early adopters.

    2026-04-16 Known Issue

    Agent Changes in Progress — Possible Detection Issues

    The agent is currently undergoing active changes. During this period, detection and classification issues may occur (e.g. sessions not completing correctly, events being misclassified, or sessions being falsely classified as WhiteGlove).

    This is actively being worked on. An update will follow once everything is running correctly again.

    2026-04-06 Resolved

    Delivery Optimization Data Restored via OS-Level Collection

    Autopilot Monitor now collects Delivery Optimization data directly from the OS using Get-DeliveryOptimizationStatus, bypassing the IME log entirely. This restores DO metrics (BytesFromPeers, PeerCaching %, download progress) for all devices — including those running IME ≥ 1.101.

    The new OS-level collector works alongside the existing IME log parsing. If both sources provide data for the same app, deduplication keeps the data from the IME log.
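The priority rule for the two sources can be sketched as a simple merge. This is only an illustration: the function name, the per-app dictionaries, and the `source` field are hypothetical and not taken from the agent's code.

```python
def merge_do_metrics(ime_metrics, os_metrics):
    """Merge per-app DO metrics from both collectors; IME log data wins on conflict."""
    # Start with everything the OS-level collector reported.
    merged = {app_id: {**m, "source": "os"} for app_id, m in os_metrics.items()}
    # Entries parsed from the IME log overwrite OS-level entries for the same app.
    for app_id, m in ime_metrics.items():
        merged[app_id] = {**m, "source": "ime_log"}
    return merged
```

On a device running IME ≥ 1.101, `ime_metrics` would simply be empty, so every app falls back to the OS-level values.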

    No action needed — devices running the latest agent version automatically benefit from this change.

    2026-04-05 Breaking Change

    IME 1.101.x Removes Delivery Optimization Telemetry from Logs

    Starting with IME version 1.101.x, Microsoft no longer writes Delivery Optimization (DO) telemetry data to the IME log. This is a Microsoft-side change, not an Autopilot Monitor bug.

    What changed

    • Old IME (1.99.x): Wrote [DO TEL] = {JSON with all DO stats} after every download — full peer caching metrics were available.
    • New IME (1.101.x): Only writes [Win32App DO] DO download and decryption is successfully done — no DO telemetry JSON at all.
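To make the difference between the two log styles concrete, here is a minimal sketch of a line classifier. It assumes the markers appear in the log text exactly as quoted above; the surrounding line layout (timestamps, component fields) is not shown in this entry, and the function name and return labels are hypothetical.

```python
import json
import re

# Old IME style: "[DO TEL] = {...}" followed by a JSON object with the DO stats.
DO_TEL = re.compile(r"\[DO TEL\] = (\{.*\})")
# New IME style: a plain completion message without any telemetry payload.
DO_DONE = "[Win32App DO] DO download and decryption is successfully done"


def parse_do_line(line):
    """Return (kind, payload) for one IME log line."""
    match = DO_TEL.search(line)
    if match:
        try:
            return ("telemetry", json.loads(match.group(1)))
        except json.JSONDecodeError:
            return ("unparsed", None)
    if DO_DONE in line:
        return ("completed_without_telemetry", None)
    return ("other", None)
```

A log that yields only `completed_without_telemetry` results and no `telemetry` results is a strong hint that the device runs the newer IME and that DO metrics have to come from elsewhere.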

    What the new IME still logs

    • [Win32App] Downloaded file size 37,187,491.00 — file size only
    • [Win32App DO] start creating a new download job, FileId = ... — OriginalSize in ContentInfo JSON
    • [Win32App DO] DO Job set priority is BG_JOB_PRIORITY_FOREGROUND — download mode only

    What is lost

    The detailed DO stats — BytesFromPeers, PeerCaching %, LanPeers, GroupPeers, and all other DO telemetry fields — are no longer present in the IME logs. For IME ≥ 1.101 there is currently no way to extract DO telemetry from the logs because the data is simply not written anymore.

    Impact on Autopilot Monitor

    Sessions from devices running IME ≥ 1.101 will not show Delivery Optimization metrics in the timeline. Older IME versions continue to work as before. We are monitoring whether Microsoft re-introduces this data in a future IME release.