---
title: "grayleafspotr Workflow"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{grayleafspotr Workflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo    = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.width  = 7,
  fig.height = 4.5
)
library(grayleafspotr)
```

## Abstract

Gray leaf spot (GLS) is a fungal disease of maize (*Zea mays*) caused by
*Cercospora zeae-maydis* and *Cercospora zeina*. In laboratory settings,
fungal isolates are grown on petri dishes and photographed at successive
time points to characterise colony expansion, morphology, and structural
organisation. Manual measurement of these traits is time-consuming and
operator-dependent.

`grayleafspotr` provides a fully automated pipeline for quantitative
phenotyping of GLS colonies from time-lapse plate photographs. The package
uses a bundled SmallUNet deep-learning segmentation model to identify colonies
in each image, then extracts morphometric and texture features that describe
colony growth, shape, and internal organisation over time. The resulting tidy
data frame can be explored with included template `ggplot2` visualisations or
passed to any downstream R analysis.

Python dependencies are managed automatically through `basilisk`; no manual
Python environment setup is required.

---

## 1. Package overview

### What the package does

Given a folder of plate photographs, `grayleafspotr`:

1. **Detects the petri dish** boundary using a classical Hough circle transform.
2. **Segments the colony** inside the dish with a SmallUNet model
   (`best_area_w_0.7.pt`).
3. **Extracts per-image features**: area (mm²), equivalent diameter,
   circularity, eccentricity, edge roughness, crack coverage, texture
   entropy, and a radial intensity profile.
4. **Returns a tidy result** object — a `grayleafspot_run` — containing a
   data frame indexed by filename and imaging day.
5. **Provides template plots** — one function per figure, all returning
   `ggplot2` objects ready for further customisation.
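
The shape features in step 3 have standard geometric definitions. The sketch below shows how area, equivalent diameter, and circularity could be derived from pixel measurements. The helper `shape_metrics()` and its arguments are hypothetical, the pixel-to-millimetre scaling from the dish diameter is an assumption, and the formulas the package uses internally may differ.

```r
# Hypothetical helper (not part of grayleafspotr): convert pixel measurements
# to millimetres and derive two common shape descriptors.
shape_metrics <- function(area_px, perimeter_px, dish_diameter_px,
                          plate_diameter_mm = 90) {
  # Assumed scaling: the detected dish spans plate_diameter_mm millimetres.
  mm_per_px    <- plate_diameter_mm / dish_diameter_px
  area_mm2     <- area_px * mm_per_px^2
  perimeter_mm <- perimeter_px * mm_per_px
  data.frame(
    area_mm2    = area_mm2,
    # Diameter of the circle that has the same area as the colony
    diameter_mm = 2 * sqrt(area_mm2 / pi),
    # Equals 1 for a perfect circle and decreases for irregular outlines
    circularity = 4 * pi * area_mm2 / perimeter_mm^2
  )
}

# Idealised input: a circular colony of radius 100 px in a dish 900 px wide
m <- shape_metrics(area_px = pi * 100^2, perimeter_px = 2 * pi * 100,
                   dish_diameter_px = 900)
m
```

With a 90 mm plate imaged at 900 px across, each pixel covers 0.1 mm, so the 200 px wide colony has an equivalent diameter of 20 mm.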

### Input requirements

| Requirement | Detail |
|---|---|
| Image formats | JPEG, PNG, BMP, TIFF, WEBP |
| Naming convention | Encode the day with a `d\d+` token: `*_d04_*.jpg` for day 4 |
| Plate diameter | Standard 90 mm (adjustable via `plate_diameter_mm`) |
| Folder structure | One folder per experiment; one image per plate per time point |

Example filenames and the day values they produce:

```
20260210_P001_06-FEB_WT_PCBM_SUB_d04_TOP.jpg  →  day 4
20260212_P001_06-FEB_WT_PCBM_SUB_d06_TOP.jpg  →  day 6
20260216_P001_06-FEB_WT_PCBM_SUB_d10_TOP.jpg  →  day 10
```
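
To check in advance which day a filename will yield, the token can be extracted with a regular expression. This is a minimal sketch, not the package's parser: `parse_day()` is a hypothetical helper, and it assumes the token is delimited by `_` or `-` (or sits at the start or end of the file stem), which is stricter than a bare `d\d+` match.

```r
# Hypothetical helper: extract the imaging day from filenames.
parse_day <- function(filenames) {
  stem <- tools::file_path_sans_ext(basename(filenames))
  # Capture the digits of a delimited d<digits> token
  hits <- regmatches(
    stem,
    regexec("(?:^|[_-])d([0-9]+)(?:[_-]|$)", stem, perl = TRUE)
  )
  # Element 1 is the full match, element 2 the captured digits;
  # filenames without a token give NA
  vapply(hits, function(h) {
    if (length(h) >= 2) as.integer(h[2]) else NA_integer_
  }, integer(1))
}

days <- parse_day(c(
  "20260210_P001_06-FEB_WT_PCBM_SUB_d04_TOP.jpg",
  "20260216_P001_06-FEB_WT_PCBM_SUB_d10_TOP.jpg",
  "no_day_token.jpg"
))
days
```

Files that return `NA` lack a recognisable day token and are worth renaming before analysis.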

### Workflow at a glance

```
Plate images (folder)
        │
        ▼
grayleafspot_analyze()  /  grayleafspot_run()
        │
        ├─ Dish geometry detection  (classical Hough transform)
        ├─ SmallUNet segmentation   (best_area_w_0.7.pt)
        └─ Feature extraction       (area, shape, texture, cracks, radial)
        │
        ▼
grayleafspot_run S3 object
        │
        ├─ $results   per-image feature data frame
        └─ $run       run manifest (paths, timestamp, engine)
        │
        ▼
Template ggplot2 plots  /  tidy data  /  custom analysis
```
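
For intuition about the dish-geometry step, the sketch below recovers a circle's centre and radius from points on its rim. It uses an algebraic least-squares (Kåsa) fit, which is a different and simpler technique than the Hough transform named above; `fit_circle()` is a hypothetical helper and the rim points are synthetic.

```r
# Hypothetical helper: algebraic least-squares (Kasa) circle fit.
fit_circle <- function(x, y) {
  # (x - a)^2 + (y - b)^2 = r^2 rearranges to
  # x^2 + y^2 = 2a*x + 2b*y + (r^2 - a^2 - b^2),
  # which is linear in the three unknown coefficients.
  A <- cbind(x, y, 1)
  coefs <- qr.solve(A, x^2 + y^2)  # least-squares solution
  a <- coefs[[1]] / 2
  b <- coefs[[2]] / 2
  c(cx = a, cy = b, radius = sqrt(coefs[[3]] + a^2 + b^2))
}

# Synthetic rim points on a circle centred at (50, 40) with radius 30
theta <- seq(0, 2 * pi, length.out = 13)[-13]
rim_x <- 50 + 30 * cos(theta)
rim_y <- 40 + 30 * sin(theta)
fit <- fit_circle(rim_x, rim_y)
fit
```

On real photographs the rim points would come from an edge detector and contain outliers, which is one reason a voting scheme such as the Hough transform is often preferred.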

---

## 2. Installation

Install from Bioconductor:

```r
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("grayleafspotr")
```

Python dependencies (NumPy, OpenCV, PyTorch, scikit-image, etc.) are
installed automatically by `basilisk` the first time the analysis pipeline is
invoked. This one-time setup may take a few minutes; subsequent calls reuse
the cached environment without reinstalling anything.

---

## 3. Bundled example data

The package ships pre-computed results for three example plate images so that
all plotting and data-wrangling functions can be explored without running the
full pipeline.

### 3.1 Load the example run

```{r}
example_run <- example_grayleafspot_results()
```

### 3.2 Inspect the feature table

```{r}
example_run$results[, c("filename", "day", "area_mm2", "diameter_mm",
                         "circularity", "crack_coverage_pct", "qc_status")]
```

### 3.3 Locate bundled source images

The three plate photographs used to generate this example are also included in
the package and are accessible via `system.file()`:

```{r}
image_dir <- system.file("extdata", "testdata", "06FEB", package = "grayleafspotr")
list.files(image_dir)
```

---

## 4. Template visualisations

Every plotting function accepts a `grayleafspot_run` object and returns a
`ggplot2` object that can be customised with standard `ggplot2` calls.

### Colony expansion over time

Colony equivalent diameter (mm) plotted against imaging day.

```{r}
plot_colony_expansion(example_run)
```

### Growth rate and edge roughness

Daily growth increment (mm/day) alongside a measure of colony edge
irregularity.

```{r}
plot_growth_roughness(example_run)
```
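
A daily growth increment of this kind can be computed from consecutive diameter measurements. The sketch below assumes a data frame with `day` and `diameter_mm` columns, as in the feature table; the values are invented for illustration, and the plotting function may compute its increment differently.

```r
# Invented values for illustration only
toy <- data.frame(day = c(4, 6, 10), diameter_mm = c(12, 18, 30))

# Sort by day, then divide the change in diameter by the elapsed days.
# The first time point has no predecessor, so its increment is NA.
toy <- toy[order(toy$day), ]
toy$growth_mm_per_day <- c(NA, diff(toy$diameter_mm) / diff(toy$day))
toy
```

Dividing by `diff(toy$day)` matters when imaging days are unevenly spaced, as they are in the bundled example (days 4, 6, and 10).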

### Crack coverage and count

Percentage of the colony mask classified as cracked, alongside the total
crack count, a proxy for structural stress.

```{r}
plot_stress_remodeling(example_run)
```
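
Crack coverage can be thought of as the share of colony pixels that also belong to a crack mask. The sketch below uses two small logical matrices as stand-ins for the segmentation output; how the pipeline actually derives its crack mask is not shown here, so treat this only as an illustration of the ratio.

```r
# Toy 10 x 10 masks standing in for segmentation output
colony <- matrix(FALSE, nrow = 10, ncol = 10)
colony[3:8, 3:8] <- TRUE            # 36 colony pixels

crack <- matrix(FALSE, nrow = 10, ncol = 10)
crack[5, 3:8] <- TRUE               # one 6-pixel crack inside the colony

# Percentage of colony pixels that are also crack pixels
crack_coverage_pct <- 100 * sum(crack & colony) / sum(colony)
crack_coverage_pct
```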

### Texture organisation

Texture entropy and the centre-to-edge entropy gradient reflect internal
colony heterogeneity.

```{r}
plot_texture_organization(example_run)
```
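
One common way to quantify texture entropy is the Shannon entropy of the grey-level histogram. The sketch below implements that definition for intensities scaled to [0, 1]; `shannon_entropy()` is a hypothetical helper, and the package may use a different (for example, locally windowed) entropy measure.

```r
# Hypothetical helper: Shannon entropy (in bits) of a grey-level histogram.
shannon_entropy <- function(x, n_bins = 256) {
  # Assign each intensity in [0, 1] to one of n_bins equal-width bins
  breaks <- seq(0, 1, length.out = n_bins + 1)
  bins   <- findInterval(x, breaks, rightmost.closed = TRUE, all.inside = TRUE)
  counts <- tabulate(bins, nbins = n_bins)
  # Empty bins contribute nothing to the sum
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log2(p))
}

shannon_entropy(rep(0.3, 100))   # uniform patch: 0 bits
shannon_entropy(c(0, 0, 1, 1))   # two equally frequent grey levels: 1 bit
```

A centre-to-edge gradient could then be formed by applying the same function separately to inner and outer pixels and taking the difference.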

### Shape versus stress

Eccentricity (0 = circular, approaching 1 = highly elongated) plotted against
crack coverage to reveal coupling between morphology and structural
remodelling.

```{r}
plot_shape_vs_stress(example_run)
```
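
For reference, the eccentricity of an ellipse follows from its major and minor axis lengths. The sketch below applies that textbook formula; `eccentricity()` is a hypothetical helper, and it assumes the package fits an ellipse to the colony mask, which the text above does not state.

```r
# Hypothetical helper: eccentricity of an ellipse from its axis lengths.
eccentricity <- function(major_axis, minor_axis) {
  stopifnot(all(major_axis > 0), all(minor_axis <= major_axis))
  sqrt(1 - (minor_axis / major_axis)^2)
}

eccentricity(10, 10)   # circle: 0
eccentricity(10, 6)    # moderately elongated ellipse: 0.8
```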

### Feature correlation heatmap

Pearson correlations between all numeric features.

```{r}
plot_feature_heatmap(example_run)
```
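
The matrix behind such a heatmap can be reproduced with base R. The sketch below selects the numeric columns of a toy feature table and computes pairwise Pearson correlations; the values are invented, and the set of columns the plotting function includes is not specified here.

```r
# Invented feature table for illustration only
toy <- data.frame(
  filename    = c("a_d04.jpg", "a_d06.jpg", "a_d10.jpg", "a_d12.jpg"),
  day         = c(4, 6, 10, 12),
  diameter_mm = c(12, 18, 30, 36),
  circularity = c(0.95, 0.90, 0.80, 0.75)
)

# Keep numeric columns only, then correlate every pair
is_num  <- vapply(toy, is.numeric, logical(1))
cor_mat <- cor(toy[, is_num], method = "pearson",
               use = "pairwise.complete.obs")
round(cor_mat, 2)
```

With only a handful of images per run, as in the bundled example, individual correlation coefficients are poorly constrained and are best read as a qualitative summary.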

### Radial intensity profile

Mean pixel intensity as a function of normalised radial distance from the
colony centre.

```{r}
plot_radial_profile(example_run)
```
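
The idea of a radial profile can be illustrated in base R by binning pixels according to their normalised distance from the centre and averaging within each bin. `radial_profile()` is a hypothetical helper, the number of bins is an arbitrary choice, and the test image is synthetic.

```r
# Hypothetical helper: mean intensity per normalised radial bin.
radial_profile <- function(img, cx, cy, radius, n_bins = 10) {
  # Distance of every pixel from the centre, scaled so the edge is at 1
  r <- sqrt((col(img) - cx)^2 + (row(img) - cy)^2) / radius
  inside <- r <= 1
  breaks <- seq(0, 1, length.out = n_bins + 1)
  bin <- cut(r[inside], breaks = breaks, include.lowest = TRUE)
  data.frame(
    r_mid          = (breaks[-1] + breaks[-length(breaks)]) / 2,
    mean_intensity = as.numeric(tapply(img[inside], bin, mean))
  )
}

# Synthetic image whose brightness falls off linearly from the centre
n   <- 101
d   <- sqrt((col(matrix(0, n, n)) - 51)^2 + (row(matrix(0, n, n)) - 51)^2)
img <- 1 - d / 100
prof <- radial_profile(img, cx = 51, cy = 51, radius = 50)
prof
```

For this synthetic image the mean intensity decreases steadily from the first bin to the last.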

---

## 5. Work with tidy data

Convert the run object to a plain data frame for any downstream workflow:

```{r}
growth_data <- as_grayleafspot_growth_data(example_run)
growth_data
```
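
As an example of a downstream analysis, the sketch below estimates a linear expansion rate per isolate by regressing diameter on day. It assumes a data frame with `day` and `diameter_mm` columns; the `isolate` column and all values are invented for illustration and are not part of the bundled example.

```r
# Invented data: two isolates imaged on the same three days
toy <- data.frame(
  isolate     = rep(c("A", "B"), each = 3),
  day         = rep(c(3, 5, 7), times = 2),
  diameter_mm = c(10, 16, 22, 8, 12, 16)
)

# Fit diameter ~ day within each isolate and keep the slope (mm/day)
rates <- vapply(split(toy, toy$isolate), function(d) {
  coef(lm(diameter_mm ~ day, data = d))[["day"]]
}, numeric(1))
rates
```

A straight-line fit is only a rough summary; colonies that approach the dish edge or slow down over time would call for a nonlinear growth model.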

### Custom plot

Because every template function returns a `ggplot2` object, you can layer
on additional components:

```{r}
plot_colony_expansion(example_run) +
  ggplot2::labs(
    title    = "Colony expansion — 06-FEB experiment",
    subtitle = "WT strain, PCBM substrate"
  ) +
  ggplot2::theme_classic()
```

---

## 6. Analyze your own images

### 6.1 Prepare your image folder

Place all plate photographs for one experiment in a single directory. The
filename must contain a `d\d+` token that encodes the imaging day:

```
my_experiment/
├── isolate_A_d03_rep1.jpg
├── isolate_A_d05_rep1.jpg
├── isolate_A_d07_rep1.jpg
└── ...
```
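
Before launching a long run, it can help to confirm that every image carries a day token. The sketch below lists files with the supported extensions and flags those without one; `check_image_folder()` is a hypothetical helper, and its token pattern assumes `_`, `-`, or `.` delimiters rather than reproducing the package's own parser.

```r
# Hypothetical helper: list image files and flag missing day tokens.
check_image_folder <- function(input_dir) {
  files <- list.files(
    input_dir,
    pattern = "\\.(jpe?g|png|bmp|tiff?|webp)$",
    ignore.case = TRUE
  )
  data.frame(
    filename      = files,
    has_day_token = grepl("(^|[_-])d[0-9]+([_.-]|$)", files)
  )
}

# Demonstration on a temporary folder with empty placeholder files
demo_dir <- file.path(tempdir(), "my_experiment_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(
  demo_dir,
  c("isolate_A_d03_rep1.jpg", "isolate_A_day5_rep1.jpg", "notes.txt")
)))
check <- check_image_folder(demo_dir)
check
```

Here `isolate_A_day5_rep1.jpg` is flagged because `day5` is not a `d` followed directly by digits, and `notes.txt` is ignored because it is not an image.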

### 6.2 Run the analysis — simple entry point

`grayleafspot_run()` is the recommended entry point for most users. It
returns the raw JSON payload as a named list.

```{r, eval=FALSE}
res <- grayleafspot_run(
  input_dir  = "my_experiment",
  output_dir = "outputs",
  run_name   = "isolate_A"
)

res$results   # per-image feature table (data frame)
res$run       # run manifest
```

### 6.3 Full-featured alternative: `grayleafspot_analyze()`

`grayleafspot_analyze()` returns a `grayleafspot_run` S3 object that can be
passed directly to the template plotting functions:

```{r, eval=FALSE}
run <- grayleafspot_analyze(
  input_dir  = "my_experiment",
  output_dir = "outputs",
  run_name   = "isolate_A"
)

plot_colony_expansion(run)
```

> **Why are these chunks not evaluated?** They require actual plate images
> on disk and invoke the Python pipeline, which cannot run during package
> build. The executable examples in sections 3–5 above use the bundled
> pre-computed results and demonstrate the same data structures and plotting
> functions.

### 6.4 Reload saved results

Every run writes outputs to a timestamped sub-folder. Reload a previous run
without re-running the pipeline:

```{r, eval=FALSE}
run <- read_grayleafspot_results("outputs/20260427T142731Z_localunet")
plot_colony_expansion(run)
```
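
When an output directory holds several runs, the most recent one can be located by sorting the folder names. This sketch assumes every run folder starts with a UTC timestamp of the form shown above (`YYYYMMDDTHHMMSSZ`), which sorts chronologically as text; `latest_run_dir()` is a hypothetical helper and the second folder name in the demonstration is invented.

```r
# Hypothetical helper: newest timestamped run folder inside output_dir.
latest_run_dir <- function(output_dir) {
  dirs <- list.dirs(output_dir, full.names = TRUE, recursive = FALSE)
  # Keep only folders whose names begin with a timestamp
  stamped <- dirs[grepl("^[0-9]{8}T[0-9]{6}Z", basename(dirs))]
  if (length(stamped) == 0) return(NA_character_)
  stamped[order(basename(stamped), decreasing = TRUE)][1]
}

# Demonstration with empty placeholder folders in a temporary directory
demo_out <- file.path(tempdir(), "outputs_demo")
for (run_name in c("20260301T090000Z_localunet", "20260427T142731Z_localunet")) {
  dir.create(file.path(demo_out, run_name), recursive = TRUE,
             showWarnings = FALSE)
}
newest <- latest_run_dir(demo_out)
basename(newest)
```

The returned path can then be passed to `read_grayleafspot_results()`.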

---

## 7. Developer note: Python override

Normal users do not need to configure Python. Developers maintaining a local
virtual environment (e.g. `rvenv_arm_311`) can bypass basilisk by setting:

```r
# ~/.Rprofile — developer use only
Sys.setenv(GRAYLEAFSPOTR_PYTHON = "/path/to/rvenv_arm_311/bin/python")
```

When `GRAYLEAFSPOTR_PYTHON` is set, the pipeline uses that interpreter
directly instead of the basilisk-managed environment.
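
A quick way to see which interpreter a session would use is to inspect the environment variable directly. The sketch below only reads `GRAYLEAFSPOTR_PYTHON` and checks that the path exists; `describe_python_source()` is a hypothetical helper and does not call any package internals.

```r
# Hypothetical helper: report whether the developer override is active.
describe_python_source <- function() {
  py <- Sys.getenv("GRAYLEAFSPOTR_PYTHON", unset = "")
  if (!nzchar(py)) {
    return("basilisk-managed environment")
  }
  if (!file.exists(py)) {
    warning("GRAYLEAFSPOTR_PYTHON points to a missing file: ", py)
  }
  paste("override:", py)
}

describe_python_source()
```

Running this at the top of a session catches a stale or mistyped override before the pipeline is invoked.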

---

## Session information

```{r}
sessionInfo()
```