Work

Old professional webpage here (≤2014).


Highlights


  • Levin Tree Search with Context Models (pdf_long) describes a machine learning + search algorithm for deterministic search problems. In contrast to neural networks, we use algorithms and ideas from the online-learning / compression literature, which ensure that the loss function is convex. This also makes inference very fast (a single CPU), though training still requires significant compute. Convexity is good for convergence guarantees and reproducibility, but more importantly it makes clear what the model can learn and what it cannot. Distinguished paper award at IJCAI 2023. [source code and more]
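    The convexity point can be illustrated with a small sketch (a toy example of mine, not the paper's implementation): a policy written as a mixture of fixed context models has a log loss that is convex in the mixture weights, so plain projected gradient descent converges reliably.

```python
import numpy as np

# Toy sketch (not the paper's code): the per-step log loss of a mixture
#   loss(w) = -log(sum_i w_i * p_i(action))
# is convex in the mixture weights w, so projected gradient descent on the
# probability simplex finds the optimal weights.

def project_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def train(pred_rows, steps=200, lr=0.5):
    """Projected gradient descent on the convex total log loss.

    pred_rows[t, i] = probability model i assigned to the action taken at t.
    """
    w = np.full(pred_rows.shape[1], 1.0 / pred_rows.shape[1])
    for _ in range(steps):
        grad = -(pred_rows / (pred_rows @ w)[:, None]).sum(axis=0)
        w = project_simplex(w - lr * grad / len(pred_rows))
    return w

# Invented data: model 1 is consistently better, so it should get most weight.
rng = np.random.default_rng(0)
preds = np.column_stack([rng.uniform(0.1, 0.4, 50), rng.uniform(0.6, 0.9, 50)])
w = train(preds)
```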

  • Isotuning With Applications To Scale-Free Online Learning (pdf_long) extends previous work on designing adaptive, scale-free online-learning algorithms, where the only parameter to set is a multiplicative constant. It develops the theory of three central ideas (isotuning, online correction, null updates), which are applied to various algorithms: FTRL, Mirror Descent, Prod, AdaptMLProd, BOA, (Ada)Hedge, and Soft-Bayes. (arXiv 2021)
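    As a generic illustration of the scale-free flavor (not the paper's isotuning recipe), here is online gradient descent with the adaptive learning rate η_t = D/√(Σ_s ‖g_s‖²): rescaling all losses by a constant rescales η inversely, so the iterates are unchanged.

```python
import numpy as np

# Illustrative sketch only: a scale-free adaptive learning rate for online
# gradient descent. Multiplying every gradient by a constant c multiplies the
# accumulated squared norm by c^2, hence eta by 1/c, so eta * g (and thus the
# trajectory of iterates) is invariant to the scale of the losses.

def ogd_scale_free(grads, D=1.0):
    """Run OGD on a sequence of gradient vectors; return all iterates."""
    x = np.zeros(grads.shape[1])
    sq_sum = 0.0
    xs = []
    for g in grads:
        sq_sum += float(g @ g)
        eta = D / np.sqrt(sq_sum)  # adaptive: no scale parameter to tune
        x = x - eta * g
        xs.append(x.copy())
    return np.array(xs)

rng = np.random.default_rng(1)
G = rng.normal(size=(100, 3))
xs1 = ogd_scale_free(G)
xs2 = ogd_scale_free(1000.0 * G)  # same trajectory despite rescaled losses
```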

  • Policy-Guided Heuristic Search with Guarantees (pdf_long) describes the PHS algorithm, an extension of Levin Tree Search (see below) that also learns and makes use of a heuristic function. We provide search-cost guarantees that relate to the quality of the policy and of the heuristic. Furthermore, if the heuristic is PHS-admissible, the guarantee is at least as good as the LTS guarantee, which doesn't use a heuristic. [AAAI 2021] [github]
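    A minimal sketch of the idea, using a PHS*-style priority (g(n)+h(n))/π(n) with unit edge costs on an invented toy domain (the paper's evaluation functions and guarantees are more general):

```python
import heapq

# Hedged sketch of policy-guided best-first search in the spirit of PHS,
# on a made-up toy problem: build a target bit-string one bit at a time.
# The toy policy favors the correct next bit; the heuristic counts the
# bits still missing (exact here, so the search goes straight to the goal).

TARGET = "10110"

def policy_prob(prefix, bit):
    """Toy policy (an assumption): 0.9 on the correct next bit, 0.1 otherwise."""
    return 0.9 if TARGET[len(prefix)] == bit else 0.1

def phs_search(target):
    # Frontier entries: (priority, g, node, pi = product of action probs)
    frontier = [(0.0, 0, "", 1.0)]
    expanded = 0
    while frontier:
        _, g, node, pi = heapq.heappop(frontier)
        expanded += 1
        if node == target:
            return node, expanded
        if len(node) == len(target):
            continue  # wrong full-length string: dead end
        for bit in "01":
            child = node + bit
            child_pi = pi * policy_prob(node, bit)
            h = len(target) - len(child)          # unit-cost steps remaining
            priority = (g + 1 + h) / child_pi     # PHS*-style evaluation
            heapq.heappush(frontier, (priority, g + 1, child, child_pi))
    return None, expanded

solution, expanded = phs_search(TARGET)
```

    With the exact heuristic and a good policy, only the nodes on the solution path are expanded.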

  • Logarithmic Pruning is All You Need (pdf_long)

    The strong Lottery Ticket Hypothesis, proven by Malach et al. with polynomial bounds, says that every sufficiently overparameterized network contains a subnetwork that, even without training, achieves accuracy comparable to the trained large network. This theorem, however, relies on a number of strong assumptions and guarantees only a polynomial factor on the size of the large network compared to the target function. We remove the most limiting assumptions of this previous work while providing significantly tighter bounds: the overparameterized network only needs a number of neurons per weight of the target subnetwork that is logarithmic in all variables but depth. [NeurIPS 2020]


  • Iterative Budgeted Exponential Search (pdf_long) received a special mention at the IJCAI 2019 opening ceremony, as it was the result of merging two independent submissions to IJCAI. The IBEX framework solves two long-standing problems of heuristic search: the re-expansion problem of IDA* and that of A* and B/B'. The DovIBEX variant relaxes some assumptions and may even solve problems outside heuristic search. Nathan Sturtevant has some nice demos explaining the problem and solution (IBEX and others). [IJCAI_2019] [slides]
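    The budget-doubling idea can be sketched as follows. This is a heavily simplified illustration of threshold-plus-budget search, not the actual IBEX algorithm, which chooses thresholds far more carefully:

```python
# Simplified sketch only: depth-first search under a cost threshold AND a
# node-expansion budget. If the budget runs out first, double it; if the
# threshold is exhausted, raise it to the smallest pruned f-value. Geometric
# budget growth keeps the total work within a constant factor of the last
# iteration, avoiding the worst-case re-expansion blowups mentioned above.

def budgeted_search(succ, cost, h, start, is_goal):
    threshold, budget = h(start), 1
    while True:
        expanded = [0]                      # counter shared with the DFS
        next_threshold = [float("inf")]

        def dfs(node, g):
            if expanded[0] >= budget:
                return "budget"
            expanded[0] += 1
            if is_goal(node):
                return node
            for child in succ(node):
                f = g + cost(node, child) + h(child)
                if f > threshold:
                    next_threshold[0] = min(next_threshold[0], f)
                    continue
                result = dfs(child, g + cost(node, child))
                if result is not None:
                    return result
            return None

        result = dfs(start, 0)
        if result == "budget":
            budget *= 2                     # out of budget: retry with more
        elif result is not None:
            return result, expanded[0]
        else:
            threshold = next_threshold[0]   # raise threshold, keep budget

# Invented toy domain: walk from 0 to 10 on the integer line.
goal, steps = budgeted_search(
    succ=lambda n: (n - 1, n + 1),
    cost=lambda a, b: 1,
    h=lambda n: abs(10 - n),
    start=0,
    is_goal=lambda n: n == 10,
)
```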


  • Levin Tree Search (pdf_long) is a new tree/graph search algorithm that uses a policy to guide the search, and comes with strict guarantees on search time that depend on the quality of the policy. One can then use this guarantee as a loss function to optimize the search time. [NeurIPS_2018] [test_set]
    • Update 2021: See the PHS paper above (AAAI 2021) for a more general bound on the search cost, and a better learning procedure that uses this bound as a surrogate loss function.
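    A toy sketch of the guiding idea, on an invented domain: expand nodes in best-first order of d(n)/π(n), where d is the depth and π the policy's probability of the path to n, and check that the goal is found within roughly d(goal)/π(goal) expansions:

```python
import heapq

# Hedged sketch of the Levin Tree Search priority d(n)/pi(n) on a made-up
# toy problem: build a target bit-string one bit at a time. The search-time
# guarantee mentioned above says the goal is found within about
# d(goal) / pi(goal) expansions; the run below checks that bound.

TARGET = "0101"

def policy_prob(prefix, bit):
    """Toy policy (an assumption): 0.7 on the correct next bit, 0.3 otherwise."""
    return 0.7 if TARGET[len(prefix)] == bit else 0.3

def levin_tree_search(target):
    frontier = [(0.0, "", 1.0)]    # (depth / pi, node, pi)
    expanded = 0
    while frontier:
        _, node, pi = heapq.heappop(frontier)
        expanded += 1
        if node == target:
            return expanded
        if len(node) == len(target):
            continue               # wrong full-length string: dead end
        for bit in "01":
            child = node + bit
            child_pi = pi * policy_prob(node, bit)
            heapq.heappush(frontier, (len(child) / child_pi, child, child_pi))
    return expanded

expanded = levin_tree_search(TARGET)
bound = len(TARGET) / 0.7 ** len(TARGET)   # d(goal) / pi(goal) ~ 16.7
```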


  • Soft-Bayes (pdf) is the first tractable algorithm for the universal portfolio problem whose guarantee does not depend on the largest gradient. It requires O(N) computation steps per round for N experts, achieves O(√(NT ln N)) regret over T rounds, and has restarting guarantees. [ALT_2017]
    • Update 2021: Appendix F of the Isotuning paper shows that by using the isotuning learning rate, Soft-Bayes has an improved regret bound of O(√(mT ln N)) where m is the number of 'good' predictors, without having to track them explicitly (by contrast to the 2017 paper).
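    A minimal sketch of the update (toy data invented for illustration), assuming the multiplicative form w ← w · (1 − η + η x/(w·x)), a convex mixture of the Bayes update and no update: it keeps every weight positive, stays on the simplex, and costs O(N) per round.

```python
import numpy as np

# Sketch under the stated assumption about the update's form:
#   w_{t+1,i} = w_{t,i} * (1 - eta + eta * x_{t,i} / (w_t . x_t))
# with payoffs x_{t,i} >= 0 (e.g. predicted probabilities). The new weights
# sum to (1 - eta) * 1 + eta * (w.x)/(w.x) = 1, so the simplex is preserved.

def soft_bayes(payoffs, eta):
    n = payoffs.shape[1]
    w = np.full(n, 1.0 / n)
    for x in payoffs:
        w = w * (1.0 - eta + eta * x / (w @ x))  # O(N) per round
    return w

# Invented toy run: expert 1 consistently earns higher payoffs.
rng = np.random.default_rng(2)
X = np.column_stack([rng.uniform(0.1, 0.3, 500), rng.uniform(0.7, 0.9, 500)])
w = soft_bayes(X, eta=0.05)
```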


  • Safely Interruptible Agents (pdf) builds upon the idea of corrigibility and Stuart Armstrong's principle of value indifference to build agents that learn not to care about being repeatedly interrupted: the user/supervisor can repeatedly take control of the agent while the agent remains indifferent to preventing it, and any opportunity cost will shift the balance toward not trying to prevent it. The concept is proven to work for both tabular MDP agents and universal intelligent agents. [UAI_2016]


  • Agents and Devices (pdf) is a fun little Bayesian experiment to define how an observer should see some object as a device or as an agent. An agent is better described as optimizing an objective function, whereas a device is better described as following a simple policy. [unpublished]


See my Google Scholar page for all other papers.


Awards