Uri Keich

School of Mathematics and Statistics F07
University of Sydney NSW 2006
Australia

+61 2 9351 2307
[email protected]

Research Interest

In the last few years I have mostly been working on the competition-based approach to multiple testing, an area I was introduced to through my interest in the analysis of tandem mass spectrometry (MS/MS) data. Notably, this topic gained significant interest in the statistics and machine learning communities when Barber and Candès, generalizing it to sequential hypothesis testing, used it as part of their knockoff filter.

Occasionally I still dabble in computational statistics as well.


Current and Recent Teaching:

STAT 2911 - Probability and Statistical Models (Advanced): Semester 1, 2024 (past: 2009-2023)

MATH1905 - Statistical Thinking about Data (Advanced) (past: 2019, 2021, 2022)

MATH1933 - Special Studies Program (Convolution, Fast Fourier Transform, Complexity and Numerical Precision) (past: 2022)

MSH2 - Probability (past: 2017-2019)

MATH1005 - Statistical Thinking about Data (past: 2018)

MSH8 - Statistical Methods in Bioinformatics: (past: 2009-2017)

STAT 3914 - Applied Statistics (Advanced): (past: 2010-2011, 2013-2015)

STAT 3014 - Applied Statistics (Advanced): (past: 2013-2015)

MATH 1907 - Mathematics (Special Studies Program) B: (past: 2010)


Software:

multicomp - An R package implementing false discovery rate (FDR) control using multiple competition, co-developed with and maintained by Kristen Emery (see the relevant RECOMB 2020 paper and its supplementary material, or the arXiv version, all co-authored with Kristen Emery, Syamand Hasam, and Bill Noble).

R code of aFFT-C and sisFFT – aFFT-C accurately convolves two non-negative vectors (see Accurate pairwise convolutions of non-negative vectors via FFT below), and sisFFT accurately computes small tail probabilities of sums of iid lattice-valued random variables (see Accurate Small Tail Probabilities of Sums of iid Lattice-Valued Random Variables via FFT below).

Python code of aFFT-C and sisFFT – a Python version of the R code above

dbSearchFDR - An R package for controlling the FDR in imperfect matches to an incomplete database (accompanying paper).

ALICO – alignment constrained sampling

GIMSAN – a novel tool for de novo motif finding that includes a reliable significance analysis

SADMAMA (new version 17/2/2010) – computational tool for motif scanning and for detection of significant variation in binding affinity across two sets of sequences

The FAST package – Fourier transform based Algorithms for Significance Testing of ungapped multiple alignments

csFFT/sFFT – computing the p-value of the information content (entropy score) of a sequence motif

BagFFT – computing the exact p-value of the llr statistic for the multinomial goodness-of-fit test


Education:

Ph.D. in Mathematics, Courant Institute, New York University
Thesis title: Stationary Approximations to Non-Stationary Stochastic Processes.
Advisor: Prof. H. P. McKean

M.Sc. in Mathematics, Department of Mathematics, Technion - Israel Institute of Technology
Thesis title: A Generalization of the "Ahlswede Daykin Inequality".
Advisor: Prof. R. Aharoni

B.Sc. in Computer Science and Mathematics, Hebrew University of Jerusalem


Professional Experience:

2009 - present:
Associate Professor in the School of Mathematics and Statistics at the University of Sydney (Senior Lecturer 2009-2015)

2003 - 2009:
Assistant Professor at the Computer Science Department of Cornell University
2001 - 2003:
Project scientist at the Department of Computer Science and Engineering of the University of California, San Diego
1999 - 2000:
Assistant Professor at the Department of Mathematics of the University of California, Riverside
1996 - 1999:
Von Karman Instructor at the Applied Mathematics Department of the California Institute of Technology
1991 - 1996:
Research and Teaching assistant at the Courant Institute of New York University

Publications:

Solivais AJ., Boekweg H., Smith LM., Noble WS., Shortreed MR., Payne SH., Keich U. Improved detection of differentially abundant proteins through FDR-control of peptide-identity-propagation. Submitted (bioRxiv version).

Wen B., Freestone J., Riffle M., MacCoss MJ., Noble WS., Keich U. Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment. Submitted (bioRxiv version).

Lu Y., Noble WS., Keich U. A BLAST from the past: revisiting blastp's E-value. Bioinformatics, accepted (bioRxiv version).

Freestone J., Kall L., Noble WS., Keich U. Semi-supervised Learning While Controlling the FDR with an Application to Tandem Mass Spectrometry Analysis. Lecture Notes in Computer Science (LNCS), 14758, 448-453 (Extended Abstract RECOMB 2024) (bioRxiv version).

Freestone J., Noble WS., Keich U. Analysis of tandem mass spectrometry data with CONGA: Combining Open and Narrow searches with Group-wise Analysis. Journal of Proteome Research, 23(6), 1894-1906, 2024 (paper) (bioRxiv version).

Freestone J., Noble WS., Keich U. Re-investigating the correctness of decoy-based false discovery rate control in proteomics tandem mass spectrometry. Journal of Proteome Research, 23(6), 1907-1914, 2024 (paper) (bioRxiv version).

Lin A., See D., Fondrie WE., Keich U., Noble WS. Target-decoy false discovery rate estimation using Crema. Proteomics, 24(8), 2024 (paper) (bioRxiv version).

Ebadi A., Freestone J., Noble WS., Keich U. Bridging the False Discovery Gap. Journal of Proteome Research, 22(7), 2172-2178, 2023 (paper).

Ebadi A., Luo D., Freestone J., Noble WS., Keich U. Bounding the FDP in competition-based control of the FDR. Submitted, 2023 (arXiv version).

Rajchert A. and Keich U. Controlling the False Discovery Rate via Competition: is the +1 needed? Statistics and Probability Letters, 109819, 197, 2023 (paper) (arXiv version).

Luo D., Ebadi A., Emery K., He Y., Noble WS., Keich U. Competition-based control of the false discovery proportion. Biometrics, 2023 (paper) (arXiv version).

Hasam S., Emery K., Noble WS., Keich U. A Pipeline for Peptide Detection Using Multiple Decoys. Methods Mol Biol., 2426:25-34, 2023 (Invited Chapter).

Freestone J., Short T., Noble WS., Keich U. Group-walk: a rigorous approach to group-wise false discovery rate analysis by target-decoy competition. Bioinformatics, 38(Supplement 2):ii82–ii88, 09, 2022 (ECCB paper) (bioRxiv version).

Lin A., Short T., Noble WS., Keich U. Improving peptide-level mass spectrometry analysis via double competition. Journal of Proteome Research, 21 (10): 2412–2420, 2022 (paper) (bioRxiv version).

Heil LR., Fondrie WE., McGann CD., Federation AJ., Noble WS., MacCoss MJ., Keich U. Building Spectral Libraries from Narrow-Window Data-Independent Acquisition Mass Spectrometry Data. Journal of Proteome Research, 21 (6): 1382–1391, 2022 (paper) (PMC version).

Lin A., Plubell DL., Keich U., Noble WS. Accurately Assigning Peptides to Spectra When Only a Subset of Peptides Are Relevant. Journal of Proteome Research, 20 (8): 4153–4164, 2021 (paper) (PMC version).

Peres N., Lee AR., Keich U. Exactly Computing the Tail of the Poisson-Binomial Distribution. ACM Transactions on Mathematical Software, 47 (4): 1–19, 2021 (paper) (arXiv version).

Emery K. and Keich U. Controlling the FDR in variable selection via multiple knockoffs. arXiv, 1911.09442V2, 2019 (arXiv).

Emery K., Hasam S., Noble WS., Keich U. Multiple competition based FDR control and its application to peptide detection. Lecture Notes in Computer Science (LNCS, RECOMB 2020), 12074: 54-71, 2020 (paper) (arXiv version).

Keich U., Tamura K., Noble WS. Averaging Strategy To Reduce Variability in Target-Decoy Estimates of False Discovery Rate. Journal of Proteome Research, 18 (2): 585-593, 2018 (paper).

Keich U. and Noble WS. Controlling the FDR in imperfect matches to an incomplete database. Journal of the American Statistical Association, 113:523, 973-982, 2018 (paper).

Noble WS. and Keich U. Response to "Mass spectrometrists should search for all peptides, but assess only the ones they care about". Nature Methods, 29;14(7):644, 2017 (response).

Keich U. and Noble WS. Progressive calibration and averaging for tandem mass spectrometry statistical confidence estimation: Why settle for a single decoy? Lecture Notes in Computer Science (LNCS, RECOMB 2017), 10229: 99-116, 2017 (paper).

Wilson H. and Keich U. Accurate Small Tail Probabilities of Sums of iid Lattice-Valued Random Variables via FFT. Journal of Computational and Graphical Statistics, 26(1): 223-229, 2017 (paper).

Wilson H. and Keich U. Accurate pairwise convolutions of non-negative vectors via FFT. Computational Statistics & Data Analysis, 101: 300-315, 2016 (paper).

Manescu D. and Keich U. A Symmetric Length-Aware Enrichment Test. Journal of Computational Biology, 23(6):508-25, 2016 (paper).

Keich U., Kertesz-Farkas A., Noble WS. Improved False Discovery Rate Estimation Procedure for Shotgun Proteomics. Journal of Proteome Research, 14 (8): 3148-61, 2015 (paper).

Kertesz-Farkas A., Keich U., Noble WS. Tandem Mass Spectrum Identification via Cascaded Search. Journal of Proteome Research, 14 (8): 3027-38, 2015 (paper).

Manescu D. and Keich U. A symmetric length-aware enrichment test. Best Paper Award, RECOMB 2015, LNBI 9029: 224–242, 2015 (preprint).

Keich U. and Noble WS. On the Importance of Well-Calibrated Scores for Identifying Shotgun Proteomics Spectra. Journal of Proteome Research, 14(2):1147–1160, 2015 (paper).

Tanaka E., Bailey TL., Keich U. Improving MEME via a two-tiered significance analysis. Bioinformatics, 30(14): 1965-1973, 2014 (paper).

Liachko I., Youngblood RA., Keich U., Dunham MJ. High-resolution mapping, characterization, and optimization of autonomously replicating sequences in yeast. Genome Research, 23(4):698-704, 2013 (paper) (co-corresponding author).

Liachko I., Tanaka E., Cox K., Chung SC., Yang L., Seher A., Hallas L., Cha E., Kang G., Pace H., Barrow J., Inada M., Tye BK., Keich U. Novel Features of ARS Selection in Budding Yeast Lachancea kluyveri. BMC Genomics, 12:633, 2011 (abstract).

Tanaka E., Bailey T., Grant CE., Noble WS., Keich U. Improved similarity scores for comparing motifs. Bioinformatics, 27(12):1603-9, 2011 (abstract).

Gupta N., Bandeira N., Keich U., Pevzner PA. Target-Decoy Approach and False Discovery Rate: When Things May Go Wrong. Journal of The American Society for Mass Spectrometry, Vol. 22, No. 7: 1111-1120, 2011 (paper).

Ng P. and Keich U. Alignment Constrained Sampling. Journal of Computational Biology, Vol. 18: No. 2, 2011 (paper).

Bhaskar A. and Keich U. Confidently Estimating the Number of DNA Replication Origins. Statistical Applications in Genetics and Molecular Biology, Vol. 9: Iss. 1, Article 28, 2010 (paper).

Liachko I., Bhaskar A., Li C., Chung S.C.C., Tye B.K., and Keich U. A Comprehensive Genome-Wide Map of Autonomously Replicating Sequences in a Naive Genome. PLoS Genetics, May 2010 Issue. (paper).

Oliver H.F., Orsi R.H., Ponnala L., Keich U., Wang W., Sun Q., Cartinhour S.W., Filiatrault M.J., Wiedmann M., and Boor K.J. Deep RNA sequencing of L. monocytogenes reveals overlapping and extensive stationary phase and sigma B-dependent transcriptomes, including multiple highly transcribed noncoding RNAs. BMC Genomics, 10:641, 2009. (paper).

Nagarajan N. and Keich U. Reliability and efficiency of algorithms for computing the significance of the Mann-Whitney test. Computational Statistics, 24(4):605-622, 2009. (paper).

Ng P. and Keich U. Factoring local sequence composition in motif significance analysis. Genome Informatics, 21:15-26, 2008. (preprint).

Keich U., Gao H., Garretson JS., Bhaskar A., Liachko I., Donato J., Tye B. Computational detection of significant variation in binding affinity across two sets of sequences with application to the analysis of replication origins in yeast. BMC Bioinformatics, 9:372, 2008. (paper).

Ng P. and Keich U. GIMSAN: a Gibbs motif finder with significance analysis. Bioinformatics, 24(19):2256-7, 2008. (paper).

Keich U. and Ng P. A conservative parametric approach to motif significance analysis. Genome Informatics, 19:61-72, 2007. (preprint)

Nagarajan N. and Keich U. FAST: Fourier transform based Algorithms for Significance Testing of ungapped multiple alignments. Bioinformatics, 24(4):577-8, 2008. (paper).

Ng P., Nagarajan N., Jones N., and Keich U. Apples to apples: improving the performance of motif finders and their significance analysis in the Twilight Zone. Bioinformatics, 22(14):e393-401, ISMB 2006. (preprint)

Nagarajan N., Ng P., Keich U. Refining motif finders with E-value calculations. Proceedings of the 3rd RECOMB Satellite Workshop on Regulatory Genomics, Singapore. 73-84, 2006. (preprint)

Keich U., Nagarajan N. A fast and numerically robust method for exact multinomial goodness-of-fit test. Journal of Computational and Graphical Statistics, 15(4):779-802, 2006. (preprint)

Nagarajan N., Jones N., and Keich U. Computing the p-value of the information content from an alignment of multiple sequences. Bioinformatics, Vol. 21, Suppl 1, i311-i318, ISMB 2005. (preprint) (Erratum)

Buhler J., Keich U., Sun Y. Designing Seeds for Similarity Search in Genomic DNA. Journal of Computer and System Sciences, Volume 70, Issue 3, May 2005, Pages 342-363. (preprint)

Keich U. and Nagarajan N. A Faster Reliable Algorithm to Estimate the p-Value of the Multinomial llr Statistic. Proceedings of the 4th International Workshop on Algorithms in Bioinformatics (WABI 2004), September 2004, Bergen, Norway. (preprint)

Keich U. sFFT: a faster accurate computation of the p-value of the entropy score. Journal of Computational Biology, Volume 12, Number 4, May 2005, Pages 416-430. (preprint)

Zhi D., Keich U., Pevzner P., Heber S., and Tang H. Correcting base-assignment errors in repeat regions of shotgun assembly. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4(1):54-64, (2007). (preprint)

Keich U., Li M., Ma B., and Tromp J. On Spaced Seeds for Similarity Search. Discrete Applied Mathematics, 138(3):253-263, 2004. (preprint)

Buhler J., Keich U., Sun Y. Designing Seeds for Similarity Search in Genomic DNA. Proceedings of the Seventh Annual International Conference on Research in Computational Molecular Biology (RECOMB-2003), April 2003, Berlin, Germany. (preprint)

Eskin E., Keich U., Gelfand M.S., Pevzner P.A. Genome-Wide Analysis of Bacterial Promoter Regions. Proceedings of the Pacific Symposium on Biocomputing (PSB-2003), January 2003, Kaua'i, Hawaii. (preprint)

Keich U. and Pevzner P.A. Finding motifs in the twilight zone. Bioinformatics, Vol. 18 (2002), Issue 10, 1374-1381. (preprint)

Keich U. and Pevzner P.A. Subtle motifs: defining the limits of motif finding algorithms. Bioinformatics, Vol. 18 (2002), Issue 10, 1382-1390. (preprint)

Keich U. and Pevzner P.A. Finding motifs in the twilight zone. Proceedings of the Sixth Annual International Conference on Research in Computational Molecular Biology (RECOMB-2002), April 2002, Washington DC, USA, ACM Press. (preprint)

Keich U. A Stationary Tangent - the Discrete and Non-smooth Cases. Journal of Time Series Analysis, March 2003, vol. 24, no. 2, pp. 173-192(20). (preprint)

Cwikel M. and Keich U. Optimal decompositions for the K-functional for a couple of Banach lattices. Arkiv för Matematik, 39 (2001), No. 1, 27-64. (preprint)

Keich U. A Possible Definition of A Stationary Tangent. Stochastic Processes and Their Applications, 88 (2000), No. 1, 1-36. (preprint)

Keich U. Krein's Strings, the Symmetric Moment Problem, and Extending a Real Positive Definite Function. Communications on Pure and Applied Mathematics, 52 (1999), no. 10, 1315-1334. (preprint)

Keich U. On Lp Bounds for Kakeya Maximal Functions and the Minkowski Dimension in R2. Bulletin of the London Mathematical Society, 31 (1999), 213-221. (preprint)

Keich U. Absolute Continuity Between the Wiener and Stationary Gaussian Measures. Pacific Journal of Mathematics, Vol. 88 (1999), No. 1, 95-108. (preprint)

Keich U. The Entropy Distance Between the Wiener and Stationary Gaussian Measures. Pacific Journal of Mathematics, Vol. 88 (1999), No. 1, 109-128. (preprint)

Aharoni R. and Keich U. A Generalization of the Ahlswede Daykin Inequality. Discrete Mathematics, 152 (1996), 1-12. (preprint)