March 6, 2026

How We Built Page-Level Analytics for Shared PDFs

When you share a PDF via email or Google Drive, you lose all visibility the moment someone downloads it. Even link-based solutions only tell you "someone opened the link." Not which pages they read, how long they spent on each one, or their scroll depth.

We built CloakShare to fix this. But page-level analytics for PDFs isn't trivial. This post walks through our approach: server-side rendering with Poppler, canvas-based viewing, and real-time engagement tracking.

The goal

  • Know exactly which pages a viewer read
  • Track time spent on each page (not just total time)
  • Measure scroll depth per page
  • Calculate completion rate
  • Aggregate per viewer across multiple sessions

Why it's hard

PDFs are static binary files. JavaScript can't reliably read PDF internals. You can embed a PDF in an iframe, but then the browser downloads it — game over. PDF.js works but bottlenecks on large documents and gives you limited control over the viewing experience.

The solution: render each PDF page as an image server-side, serve those images through a controlled viewer, and track every interaction.

Step 1: The rendering pipeline

When a user uploads a PDF, the file goes to S3-compatible storage and a background rendering job is created with status pending.

-- Worker atomically claims the job (no duplicate rendering)
UPDATE rendering_jobs
SET status = 'rendering', claimed_at = CURRENT_TIMESTAMP
WHERE id = ? AND status = 'pending'

The WHERE status = 'pending' clause ensures only one worker processes each job. If two workers race, one gets zero rows and moves on.
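In JavaScript terms, the claim semantics look roughly like this. This is an in-memory sketch of the SQL above, not our worker code; the `jobs` array and field names are illustrative:

```javascript
// In-memory sketch of the atomic claim: a job is claimed only if it is
// still 'pending' at the moment the worker checks (mirrors the WHERE clause).
function claimJob(jobs, jobId, now = Date.now()) {
  const job = jobs.find((j) => j.id === jobId);
  if (!job || job.status !== 'pending') {
    return null; // another worker won the race: zero rows updated
  }
  job.status = 'rendering';
  job.claimedAt = now;
  return job;
}
```

In the real system the check-and-set happens inside the database as a single statement, so two workers can never both observe 'pending'.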

Step 2: PDF to images with Poppler

We use Poppler's pdftoppm to convert each page to an image. Poppler is GPL licensed, but we invoke it as a separate process, so the copyleft stops at the process boundary; linking MuPDF would pull its AGPL terms into our MIT-licensed codebase.

# Spawn Poppler process
pdftoppm -r 150 -jpeg proposal.pdf page

# Output: page-1.jpg, page-2.jpg, page-3.jpg ...

150 DPI is the sweet spot: sharp on all devices, reasonable file size. The intermediate JPEGs are then compressed with Sharp:

// Compress to WebP with Sharp
await sharp('page-1.jpg')
  .resize({ width: 1600, withoutEnlargement: true })
  .webp({ quality: 85 })
  .toFile('page-1.webp');

// Thumbnail for navigation sidebar
await sharp('page-1.jpg')
  .resize({ width: 400 })
  .webp({ quality: 80 })
  .toFile('thumb-1.webp');

// ~400KB JPEG → ~80KB WebP per page

Rendered images are uploaded to storage under renders/{linkId}/page-{n}.webp. An SSE endpoint streams rendering progress to the client in real time.
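The SSE wire format is just text frames, so the progress stream is cheap to produce. A minimal sketch; the `progress` event name and `pagesDone` field are illustrative, not our exact schema:

```javascript
// Format one Server-Sent Events frame: a named event plus a JSON data payload.
// The blank line (double newline) terminates the frame.
function sseFrame(event, data) {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Push rendering progress to a connected HTTP response stream
// (the response must have Content-Type: text/event-stream).
function sendProgress(res, pagesDone, totalPages) {
  res.write(sseFrame('progress', { pagesDone, totalPages }));
}
```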

Step 3: The canvas-based viewer

The viewer is a custom Web Component built in vanilla TypeScript. 7KB gzipped. We chose Canvas over PDF.js or iframes for three reasons:

  • No downloads — images are drawn to Canvas, not served as attachable files
  • Watermark security — overlay is composited per frame, can't be removed via DevTools
  • Full control — we own the interaction loop, making analytics trivial

Shadow DOM encapsulation prevents the host page's CSS or JS from interfering with the viewer. Pages are lazy-loaded — only the visible page and its neighbors are decoded.
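The lazy-loading window is simple to express. A sketch of the idea, with an illustrative helper name:

```javascript
// Only the current page and its immediate neighbors stay decoded;
// everything else can be evicted to bound memory on long documents.
function pagesToDecode(current, totalPages) {
  const window = [current - 1, current, current + 1];
  return window.filter((p) => p >= 1 && p <= totalPages);
}
```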

Step 4: Tracking engagement

When a page becomes visible, an IntersectionObserver starts a timer. It tracks:

  • Page number
  • Duration (seconds the page was in viewport)
  • Scroll depth (what % of the page was visible)
  • Device type (desktop, mobile, tablet)
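Feeding IntersectionObserver callbacks into a per-page stopwatch can be sketched like this. A simplified accumulator, not our viewer code; the real implementation also pauses timers on tab blur:

```javascript
// Accumulates visible time per page. start()/stop() are driven by
// IntersectionObserver callbacks as pages enter and leave the viewport.
class PageTimer {
  constructor() {
    this.seconds = new Map();  // page -> total seconds visible
    this.openedAt = new Map(); // page -> timestamp when it became visible
  }
  start(page, now) {
    this.openedAt.set(page, now);
  }
  stop(page, now) {
    const openedAt = this.openedAt.get(page);
    if (openedAt === undefined) return; // page was never visible
    this.openedAt.delete(page);
    const prev = this.seconds.get(page) ?? 0;
    this.seconds.set(page, prev + (now - openedAt) / 1000);
  }
}
```

Durations accumulate across visits, so scrolling back to a page adds to its total rather than resetting it.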

On page change, tab blur, or navigation away, the viewer POSTs metrics to /v1/viewer/:token/track:

// Viewer sends per-page metrics
{
  "page": 3,
  "duration": 45,
  "scrollDepth": 0.92,
  "email": "viewer@company.com",
  "device": "desktop"
}
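Delivering that payload reliably is the tricky part: a plain fetch() fired during unload can be cancelled by the browser, which is why navigator.sendBeacon exists. A sketch; the route shape follows the endpoint above, and `buildTrackBody` is an illustrative helper:

```javascript
// Serialize per-page metrics for the tracking endpoint.
function buildTrackBody(metrics) {
  return JSON.stringify(metrics);
}

// On pagehide/visibilitychange, sendBeacon queues the request so it survives
// the page being torn down; fall back to a keepalive fetch where unavailable.
function flushMetrics(token, metrics) {
  const url = `/v1/viewer/${token}/track`;
  const body = buildTrackBody(metrics);
  if (navigator.sendBeacon) return navigator.sendBeacon(url, body);
  return fetch(url, { method: 'POST', body, keepalive: true });
}
```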

Step 5: Server-side aggregation

The server stores per-page engagement in the views table:

// views.pageDetails (JSON column)
[
  {"page": 1, "seconds": 12, "scrollDepth": 1.0},
  {"page": 2, "seconds": 45, "scrollDepth": 0.92},
  {"page": 3, "seconds": 8,  "scrollDepth": 0.15}
]

// Completion rate = pages with 3+ seconds / total pages
completionRate = engagedPages / totalPages

We use a 3-second threshold for "engagement." Scrolling past a page takes 1-2 seconds, so anything under 3 seconds is likely a skim, not a read.
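Put together, the completion-rate calculation is a one-pass filter over pageDetails. A sketch using the 3-second rule above; the function name is ours:

```javascript
// A page counts as "read" only if the viewer spent at least 3 seconds on it.
const ENGAGEMENT_THRESHOLD_SECONDS = 3;

function completionRate(pageDetails, totalPages) {
  const engagedPages = pageDetails.filter(
    (p) => p.seconds >= ENGAGEMENT_THRESHOLD_SECONDS
  ).length;
  return engagedPages / totalPages;
}
```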

The analytics API

GET /v1/links/:id/analytics

{
  "total_views": 47,
  "unique_viewers": 12,
  "avg_completion_rate": 0.82,
  "viewers": [
    {
      "email": "alice@acme.com",
      "views": 3,
      "completion_rate": 0.95,
      "avg_duration_seconds": 420,
      "pages_viewed": 19,
      "device": "desktop"
    },
    {
      "email": "bob@acme.com",
      "views": 1,
      "completion_rate": 0.35,
      "avg_duration_seconds": 89,
      "pages_viewed": 7,
      "device": "mobile"
    }
  ]
}
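Rolling a viewer's sessions up into the per-viewer summary above is a straightforward reduce. A sketch; the session field names (`completionRate`, `durationSeconds`) are illustrative, while the output keys mirror the response:

```javascript
// Aggregate one viewer's sessions into the per-viewer summary the API returns.
function summarizeViewer(email, sessions) {
  const views = sessions.length;
  const total = (key) => sessions.reduce((sum, s) => sum + s[key], 0);
  return {
    email,
    views,
    completion_rate: total('completionRate') / views,
    avg_duration_seconds: total('durationSeconds') / views,
  };
}
```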

What this enables

  • Sales: "Alice read 19/20 slides. Bob skipped pricing. He needs a custom demo."
  • Fundraising: "Partner X viewed twice, 40 minutes total. Partner Y abandoned at page 3."
  • Education: "Maria completed 100%. Tom only read 25%. He needs support."
  • Content: "Pages 6-8 get skipped by 80% of readers. Move the important stuff earlier."

Architecture decisions

Why Poppler, not MuPDF? Poppler is GPL, but we only shell out to the pdftoppm binary, so its license does not extend to our code. MuPDF is AGPL, which would require releasing anything that links it under AGPL as well: a dealbreaker for an MIT-licensed project.

Why WebP? 30% smaller than JPEG, 50% smaller than PNG, no perceptible quality loss at 85%. For a 20-page deck, that's ~1.6 MB instead of ~8 MB.

Why Canvas, not PDF.js? PDF.js gives you a viewer but you lose control. Canvas lets us composite watermarks per frame and own the entire interaction loop for analytics.

Why 3-second threshold? Scrolling past a page takes 1-2 seconds. Below 3 seconds is noise. Above 3 seconds means the viewer actually stopped to read.

Try it yourself

CloakShare is open source. Explore the rendering pipeline, viewer code, and analytics queries at github.com/cloakshare/cloakshare.

Or try the hosted version at cloakshare.dev — upload a PDF and share the link. Your dashboard will show per-page engagement per viewer.