6 kNN smoothing

scDNA-Seq experiments are often sparsely sequenced, and the resulting low-coverage data are inherently noisy. To mitigate this, CopyKit can smooth each cell's copy number profile using that cell's k nearest neighbors.

To do so, CopyKit aggregates the genomic bin counts of each cell with those of its k nearest neighbors and then re-segments the resulting copy number profiles.
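The aggregation idea can be illustrated with a minimal base R sketch. It is not CopyKit's implementation: the helper `knn_smooth_counts()` is hypothetical, neighbors are found with plain Euclidean distances on raw counts (an assumption; CopyKit may use a different feature space and neighbor search), and the subsequent re-segmentation step is not shown.

```r
# Hypothetical helper illustrating kNN count aggregation; NOT CopyKit's code.
# counts: bins x cells matrix of raw read counts.
knn_smooth_counts <- function(counts, k) {
  stopifnot(k >= 1, k < ncol(counts))
  # Pairwise Euclidean distances between cells (columns). Assumption:
  # CopyKit may compute neighbors on a different representation.
  d <- as.matrix(dist(t(counts)))
  diag(d) <- Inf  # a cell is not counted as its own neighbor
  smoothed <- counts
  for (j in seq_len(ncol(counts))) {
    nn <- order(d[, j])[seq_len(k)]  # indices of the k nearest cells
    # Sum the counts of the cell and its neighbors, bin by bin
    smoothed[, j] <- rowSums(counts[, c(j, nn), drop = FALSE])
  }
  smoothed
}

# Toy example: two pairs of similar cells
counts <- cbind(a = c(1, 1, 1), b = c(1, 1, 2),
                c = c(10, 10, 10), d = c(10, 10, 11))
sm <- knn_smooth_counts(counts, k = 1)
sm[, "a"]  # a + b
```

With k = 1, each cell is pooled only with its most similar partner (a with b, c with d), so the smoothed profiles keep the two groups distinct.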

To obtain good results, choose the number of neighbors (k) carefully. We recommend conservative k values, below the number of cells in the smallest subclone, since a larger k would force cells of that subclone to borrow counts from cells of other subclones. We also recommend visually inspecting the smoothed cells and comparing them to their original profiles.
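The sizing rule above can be expressed as a small check, assuming subclone labels are already available (for example from a preliminary clustering; both the labels and the helper `suggest_max_k()` are hypothetical). If the smallest subclone has n cells, the largest k strictly below n is n - 1.

```r
# Hypothetical helper: largest k that stays below the size of the
# smallest subclone (k < n_min, i.e. k <= n_min - 1).
suggest_max_k <- function(subclone_labels) {
  sizes <- table(subclone_labels)
  max_k <- min(sizes) - 1L
  if (max_k < 1L) stop("smallest subclone has fewer than 2 cells")
  as.integer(max_k)
}

# Hypothetical labels for 58 cells in three subclones
labels <- c(rep("c1", 40), rep("c2", 12), rep("c3", 6))
max_k <- suggest_max_k(labels)
max_k
```

A value at or below this bound could then be passed to `knnSmooth()`, presumably through its `k` argument (the log line "Smoothing cells using k = 4" suggests such a parameter; check `?knnSmooth` before relying on it).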

tumor <- knnSmooth(tumor)

## Finding neighbors.
## Smoothing cells using k = 4
## Running variance stabilization transformation: ft
## Smoothing outlier bins.
## Running segmentation algorithm: CBS for genome hg38
## Merging levels.
## Done.
## Replacing segment_ratios assay.
## Replacing logr assay.
## Done.

We recommend running this step after filtering. However, we have noticed that smoothing can also rescue cells with low sequencing depth. If you use it for that purpose, inspect the results carefully, because smoothing such cells may increase noise.
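Because `knnSmooth()` replaces the `segment_ratios` and `logr` assays, comparing smoothed cells to their original profiles requires keeping a copy of the object before smoothing. The base R sketch below ranks cells for visual inspection by the Pearson correlation between their original and smoothed profiles. The helper `flag_changed_cells()` and the 0.9 threshold are hypothetical choices; they only prioritize which cells to look at and do not replace the visual check.

```r
# Hypothetical helper, not part of CopyKit.
# before, after: bins x cells matrices with the same bins and cell order.
flag_changed_cells <- function(before, after, min_cor = 0.9) {
  stopifnot(identical(dim(before), dim(after)))
  r <- vapply(seq_len(ncol(before)),
              function(j) suppressWarnings(cor(before[, j], after[, j])),
              numeric(1))
  names(r) <- colnames(before)
  # Flag low correlations, and NA (e.g. a flat profile) for manual review
  r[is.na(r) | r < min_cor]
}

# Toy example: cell "b" changes shape after smoothing
before <- cbind(a = c(1, 1, 2, 2), b = c(1, 2, 1, 2))
after  <- cbind(a = c(1, 1, 2, 2), b = c(2, 1, 2, 1))
flagged <- flag_changed_cells(before, after)

# Possible usage with CopyKit (keep a copy first):
# tumor_raw <- tumor
# tumor <- knnSmooth(tumor)
# flag_changed_cells(SummarizedExperiment::assay(tumor_raw, "segment_ratios"),
#                    SummarizedExperiment::assay(tumor, "segment_ratios"))
```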