Abstract: Machine learning and data science are revolutionizing how humans gain knowledge. Algorithmic tools are making breakthroughs in virtually all scientific areas, including computer vision, precision medicine, and geoscience. Despite these empirical successes, the theoretical properties of machine learning methods are often poorly understood. We present a geometry-based mathematical framework for unsupervised learning. By considering data-dependent metrics on high-dimensional, noisy data, we reveal intrinsically low-dimensional structures in the data. Our algorithms come with performance guarantees, on both accuracy and dependence on parameters, that improve on known results when the intrinsic dimension of the data is small relative to the ambient dimension. In particular, our algorithms are provably robust to large amounts of noise. Our proofs combine percolation theory, manifold learning, and spectral graph analysis. Beyond performance guarantees, we present efficient implementations of our algorithms that scale quasilinearly in the number of data points, demonstrating that our methods apply in the "big data" regime. The proposed algorithms are validated on a variety of synthetic and real datasets, and applications to hyperspectral data and other remotely sensed images will be discussed at length.
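The abstract does not name the data-dependent metric it uses. As an illustrative assumption only, the sketch below computes one well-known example: the longest-leg path distance. The distance between two points is the smallest achievable "longest hop" over all paths joining them, so points connected by a chain of short hops are close even when they are far apart in Euclidean distance. This minimax path distance equals the largest edge on the path between the two points in a Euclidean minimum spanning tree, which is what the code exploits. The function name `longest_leg_path_distances` is hypothetical and is not taken from the authors' work.

```python
import math


def longest_leg_path_distances(points):
    """Hypothetical helper: all-pairs longest-leg path distance.

    For points x and y this is the minimum, over all paths from x to y in the
    complete Euclidean graph, of the longest edge on the path. It equals the
    largest edge on the x-y path in a minimum spanning tree (MST).
    """
    n = len(points)
    # Prim's algorithm on the complete graph with Euclidean edge weights.
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge connecting each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    adj = [[] for _ in range(n)]  # MST adjacency list: (neighbor, edge weight)
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append((parent[u], best[u]))
            adj[parent[u]].append((u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    # From each source, walk the tree and record the longest edge seen so far.
    D = [[0.0] * n for _ in range(n)]
    for s in range(n):
        stack = [(s, -1, 0.0)]
        while stack:
            u, prev, longest = stack.pop()
            D[s][u] = longest
            for v, w in adj[u]:
                if v != prev:
                    stack.append((v, u, max(longest, w)))
    return D


# Two clusters on a line: {0, 1, 2} and {10, 11}.
pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
D = longest_leg_path_distances(pts)
print(D[0][2], D[0][4])  # within-cluster distance 1.0, between-cluster distance 8.0
```

In the example, the Euclidean distances from the first point are 2 and 11, while the longest-leg distances are 1 and 8, so points in the same cluster become uniformly close. This sketch takes quadratic time and memory, so it does not reflect the quasilinear implementations the abstract refers to.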