The multi-armed bandit (MAB) problem is one of the simplest instances of sequential or adaptive decision making, in which a learner repeatedly selects options from a given set of alternatives in an online manner. More specifically, the agent selects one option at a time and observes a numerical (and typically noisy) reward signal providing information on the quality of that option, which informs its future selections. This thesis studies adaptive decision making under different circumstances. The first half of the thesis studies learning from pairwise comparisons. The choice of algorithm depends on the experimenter's objective; we study two objectives: finding the best item, and approximately ranking the given set of items. In the second half of the thesis, we study the problem of learning from user clicks. A variety of models have been proposed to simulate user behavior on a search-engine results page, and we study learning in cold-start scenarios under two such models: the dependent-click model and the position-based model. Finally, we study learning in warm-start circumstances, where partial prior information about the quality of the items is available. In this setting, our algorithm gives the experimenter a means to control the exploration of the bandit algorithm. In all cases, we propose algorithms and prove theoretical guarantees on their performance. We also experimentally demonstrate gains over non-adaptive baselines and state-of-the-art adaptive algorithms.
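The interaction protocol described above (select an arm, observe a noisy reward, update, repeat) can be sketched with a standard index policy. The following is a minimal illustrative sketch, not an algorithm from this thesis; it implements the classical UCB1 rule, and the Bernoulli arm means are made-up toy values.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Run the UCB1 index policy against a bandit given by `pull`,
    a function mapping an arm index to a reward in [0, 1]."""
    counts = [0] * n_arms   # number of times each arm was played
    sums = [0.0] * n_arms   # cumulative reward of each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # play each arm once to initialize
        else:
            # Index = empirical mean + exploration bonus; the bonus
            # shrinks as an arm is played more, so play concentrates
            # on empirically good arms while still exploring.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        r = pull(arm)       # observe a noisy reward for the chosen arm
        counts[arm] += 1
        sums[arm] += r
    return counts, sums

# Toy instance: three Bernoulli arms with (hypothetical) means 0.2, 0.5, 0.8.
means = [0.2, 0.5, 0.8]
random.seed(0)
counts, sums = ucb1(
    lambda a: 1.0 if random.random() < means[a] else 0.0,
    n_arms=3,
    horizon=2000,
)
```

Over a long enough horizon, the play counts concentrate on the arm with the highest mean, which is the adaptive behavior the abstract alludes to.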