# PDF Timetable Conversion Workflow

Schedule Lens can convert supported timetable PDFs directly in the browser. The PDF file stays on the user's computer: the site reads the file with PDF.js, extracts text coordinates, converts the result into the standard `schedule.json` shape, and immediately loads it into the query UI.

## Recommended Website Workflow

1. Open Schedule Lens.
2. Choose **匯入 PDF 課表** (Import PDF timetable).
3. Select a text-based/vector timetable PDF.
4. Wait for the browser to finish each progress step: read file, parse pages, build schedule, and complete.
5. Use the filters as usual, or choose **匯出課表 JSON** (Export timetable JSON) to save the converted `schedule.json`.

## Supported PDFs

This browser converter is intended for machine-generated, text-based timetable PDFs whose layout matches the HKHorazon/ClassSchedule coordinate grid.
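The coordinate-grid idea can be sketched in a few lines. The following is a minimal Python illustration, not the browser parser itself. The column and row boundaries (`DAY_COLUMNS`, `PERIOD_ROWS`) and the `TextItem` type are hypothetical placeholders, because the real HKHorazon/ClassSchedule geometry is not documented here. The sketch also assumes y is measured from the top of the page. PDF.js reports positions in PDF user space, where y usually grows upward, so a real implementation would need to flip that axis.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass

# Hypothetical grid geometry in PDF points (not the real layout values).
DAY_COLUMNS = [100.0, 220.0, 340.0, 460.0, 580.0, 700.0]  # left edges of Mon..Fri, then the right edge
PERIOD_ROWS = [80.0, 130.0, 180.0, 230.0, 280.0]  # top edges of periods 1..4, then the bottom edge
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]


@dataclass
class TextItem:
    text: str
    x: float  # left edge of the text
    y: float  # distance from the top of the page (assumed orientation)


def locate_cell(item: TextItem) -> tuple[str, int] | None:
    """Return (day, period) for a text item, or None if it lies outside the grid."""
    col = bisect_right(DAY_COLUMNS, item.x) - 1
    row = bisect_right(PERIOD_ROWS, item.y) - 1
    if 0 <= col < len(DAYS) and 0 <= row < len(PERIOD_ROWS) - 1:
        return DAYS[col], row + 1
    return None


def group_by_cell(items: list[TextItem]) -> dict[tuple[str, int], list[str]]:
    """Collect the text of every item that falls inside a grid cell."""
    cells: dict[tuple[str, int], list[str]] = {}
    for item in items:
        cell = locate_cell(item)
        if cell is not None:
            cells.setdefault(cell, []).append(item.text)
    return cells
```

For example, `group_by_cell([TextItem("MB209", 150.0, 100.0), TextItem("Footer", 10.0, 500.0)])` keeps only the room code, placed in `("Mon", 1)`. Text outside the grid, such as headers and footers, is simply dropped. This is also why a PDF with a different layout can yield zero sessions.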

Supported:

- Landscape timetable PDFs with selectable text.
- PDFs where each teacher's timetable appears as text objects, not a screenshot.
- Standard room codes made of uppercase letters followed by three digits, such as `MB209`, `G501`, or `N302`.
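The room-code rule above can be approximated with a regular expression. This is a sketch under the assumption that a code is one or more uppercase letters followed by exactly three digits. The browser parser's actual pattern may be stricter or looser.

```python
import re

# Assumed pattern: uppercase letter prefix + exactly three digits (e.g. MB209, G501, N302).
# Word boundaries prevent matches inside longer tokens such as "MB2091".
ROOM_CODE = re.compile(r"\b[A-Z]+\d{3}\b")


def find_room_codes(text: str) -> list[str]:
    """Return every room-code-like token found in a cell's text."""
    return ROOM_CODE.findall(text)
```

For example, `find_room_codes("G501 / N302")` returns both codes, while lowercase (`mb209`) or four-digit (`MB2091`) tokens are not matched.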

Not supported in this version:

- Scanned PDFs or photos embedded in a PDF.
- Image-only PDFs that have no OCR text layer.
- Timetables with a substantially different grid or coordinate layout.

If the browser reports that no text layer was found, run OCR first or import an existing `schedule.json`.

## Privacy Model

The conversion runs inside the browser. The PDF is not uploaded to a server, and no backend API is added for conversion.

PDF.js is served from `class-schedule/vendor/pdfjs/` so the conversion flow does not depend on a third-party CDN at runtime.

## Python Converter Fallback

The Python converter remains in `class-schedule/converter/` for development comparison, regression checks, and advanced troubleshooting. It is not the normal user workflow.

Use it only when you need to compare browser parser output against the original Python extraction pipeline:

```bash
cd class-schedule/converter
python -m venv .venv
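# Activate the venv (bash on macOS/Linux). On Windows, use .venv\Scripts\activate (cmd) or .venv\Scripts\Activate.ps1 (PowerShell).
source .venv/bin/activate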
pip install -r requirements.txt
python run_converter.py ./samples/課表.pdf --out ./output/schedule.json
```
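To compare the two pipelines, you can diff the exported browser JSON against the Python output. The following is a minimal sketch that assumes both files store sessions in a top-level `sessions` array of JSON objects. That schema is an assumption, and the file names in the usage line are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def session_keys(schedule: dict) -> set[str]:
    """Canonical string per session, so key order does not affect the comparison."""
    # Assumption: sessions live in a top-level "sessions" array (hypothetical schema).
    return {
        json.dumps(session, sort_keys=True, ensure_ascii=False)
        for session in schedule.get("sessions", [])
    }


def compare_schedules(browser: dict, python: dict) -> tuple[set[str], set[str]]:
    """Return (sessions only in the browser output, sessions only in the Python output)."""
    browser_keys = session_keys(browser)
    python_keys = session_keys(python)
    return browser_keys - python_keys, python_keys - browser_keys


def load_json(path: str) -> dict:
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Usage: `only_browser, only_python = compare_schedules(load_json("browser-schedule.json"), load_json("output/schedule.json"))`. Only sessions are compared, because `issues` and `audit` are browser diagnostics and are expected to differ.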

## Troubleshooting

### Browser says no text layer was found

The PDF is likely a scanned image or photo. This version does not include OCR. Convert it with an OCR tool first, then try again.
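You can confirm the diagnosis outside the browser. The browser uses PDF.js, but `pdfplumber` is already listed in the converter's `requirements.txt`, so a rough Python-side check looks like the following. This is a sketch: "has a text layer" is approximated as "at least one page yields non-whitespace text".

```python
from __future__ import annotations


def has_text_layer(page_texts: list[str | None]) -> bool:
    """True if at least one page yields non-whitespace text."""
    return any(text and text.strip() for text in page_texts)


def pdf_page_texts(path: str) -> list[str | None]:
    """Extract the text of each page with pdfplumber (imported lazily)."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return [page.extract_text() for page in pdf.pages]
```

Usage: `has_text_layer(pdf_page_texts("timetable.pdf"))`, where the file name is a placeholder. A `False` result points to an image-only PDF that needs OCR first.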

### Browser parses zero sessions

The PDF may be text-based but not aligned with the supported timetable grid. Confirm it is a landscape timetable using the expected day columns and period rows.
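One quick check is page orientation, since the supported layout is landscape. The following sketch flags pages whose height exceeds their width, using pdfplumber's `page.width` and `page.height`. It assumes those values reflect how the page is displayed. A page that relies on a `/Rotate` entry may need extra handling.

```python
from __future__ import annotations


def portrait_pages(page_sizes: list[tuple[float, float]]) -> list[int]:
    """Return 1-based numbers of pages whose height exceeds their width."""
    return [i for i, (width, height) in enumerate(page_sizes, start=1) if height > width]


def pdf_page_sizes(path: str) -> list[tuple[float, float]]:
    """Read (width, height) for each page with pdfplumber (imported lazily)."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return [(page.width, page.height) for page in pdf.pages]
```

An A4 landscape page is roughly 842 × 595 points, so `portrait_pages([(842.0, 595.0), (595.0, 842.0)])` flags only page 2.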

### Some sessions look wrong

Export the converted JSON and inspect the `issues` and `audit` sections. The browser parser records suspicious or ignored candidate cells to make layout problems easier to diagnose.
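A quick way to triage an exported file is to count what each section contains before reading it in detail. The following sketch assumes top-level `sessions`, `issues`, and `audit` keys. The exact shape of those sections is an assumption. `len()` works whether each section is an array or an object.

```python
from __future__ import annotations

import json


def summarize_diagnostics(schedule: dict) -> dict[str, int]:
    """Count entries in each section; missing sections count as zero."""
    return {key: len(schedule.get(key, [])) for key in ("sessions", "issues", "audit")}


def load_schedule(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Usage: `print(summarize_diagnostics(load_schedule("schedule.json")))`. A high `issues` count relative to `sessions` usually points to a layout mismatch rather than a few isolated bad cells.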

### Need the legacy Python path

Install `pdfplumber` through `class-schedule/converter/requirements.txt` and run `run_converter.py`. The Python path is a fallback for developers, not a requirement for normal website users.
