Monday
10 July 2023
08:50 Welcome
Dave Woods (University of Southampton)
09:00 Discrete choice experiments (Chair: Martina Vandebroek, KU Leuven)

Design Replication In Partial-Profile Choice Experiments [slides]

Heiko Großmann (Universität Magdeburg)

On The Impact Of Decision Rule Assumptions In Experimental Designs On Preference Recovery [slides]

Sander van Cranenburgh (TU Delft)

Optimal Designs For Discrete Choice Models Via Graph Laplacians

Frank Röttger (Université de Genève)

Bayesian \(D\)- And \(I\)-Optimal Designs For Choice Experiments Involving Mixtures And Process Variables [slides]

Mario Becerra (KU Leuven)
10:30 Coffee break
11:00 Optimal subsampling (Chair: Selin Ahipasaoglu, University of Southampton)

Treess: A Model-Free Tree-Based Subdata Selection Method For Prediction

John Stufken (George Mason University)

Scale-Invariant Optimal Sampling And Variable Selection With Rare-Events Data [slides]

HaiYing Wang (University of Connecticut)

Subsampling In Large Graphs

Ping Ma (University of Georgia)

Efficient Subsampling For Exponential Family Models [slides]

Subhadra Dasgupta (Ruhr-Universität Bochum)
12:30 Lunch break
14:00 Subsampling, dynamic and non-linear models (Chair: Anatoly Zhigljavsky, University of Cardiff)

Optimal Subsampling Design For Polynomial Regression In One Covariate [slides]

Torsten Reuter (Universität Magdeburg)

Model Robust Subsampling Approach For Generalised Linear Models In Big Data Settings

Amalan Mahendran (Queensland University of Technology)

Adaptive And Robust Experimental Design For Linear Dynamic Models Using The Kalman Filter [slides]

Arno Strouwen

Optimum Design For Ill-Conditioned Models: \(K\)–Optimality And Stable Parameterizations [slides]

Anthony Atkinson (London School of Economics)
15:30 Coffee break
16:00 Clinical trials I (Chair: Chiara Tommasi, Università degli Studi di Milano)

Optimal Design For Inference On The Threshold Of A Biomarker

Rosamarie Frieri (Università di Bologna)

Posterior Alternatives With Informative Early Stopping

Nancy Flournoy (University of Missouri)

Design And Inference For Enrichment Trials With A Continuous Biomarker

Bill Rosenberger (George Mason University)

About \(C\)- And \(D\)-Optimal Dose-Finding Designs For Bivariate Outcomes [slides]

Frank Miller (Linköping and Stockholm Universities)
17:30 mODa 14
Martina Vandebroek & Peter Goos
17:45 Finish of Day 1

Tuesday
11 July 2023
09:00 Topics in randomisation and optimal design (Chair: Alan Vazquez, University of Arkansas)

Valid Restricted Randomization For Small Experiments [slides]

Rosemary Bailey (University of St Andrews)

\(D\)-Optimal And Nearly \(D\)-Optimal Exact Designs For Binary Response On The Ball

Martin Radloff (Universität Magdeburg)

A Methodology To Augment Designs

Carlos de la Calle Arroyo (Universidad de Navarra)

\(D\)-Optimal Two-Level Designs For Linear Main-Effects Models: What If \(N\) Is Not A Multiple Of 4?

Peter Goos (KU Leuven)
10:30 Coffee break
11:00 Optimal Sensor Location for Spatiotemporal Processes and Networks (Chair: Christine Müller, Technische Universität Dortmund)

\(A\)-Optimal Designs For State Estimation In Networks [slides]

Kirsten Schorning (Technische Universität Dortmund)

Optimal Experimental Design For Infinite-Dimensional Bayesian Inverse Problems Under Model Uncertainty

Alen Alexanderian (North Carolina State University)

Bayesian Optimal Experimental Design Using Transport Maps

Karina Koval (Universität Heidelberg)

Convex Relaxation For Sensor Selection In The Presence Of Correlated Measurement Noise

Dariusz Uciński (University of Zielona Góra)
12:30 Lunch break
14:00 Posters

New Insights Into Adaptive Enrichment Designs

Rosamarie Frieri (Università di Bologna)

Design Problems In Statistical Ecology

Linda Haines (University of Cape Town)

Adopting Tolerance Region For The Calibration Problem

Camelia Trandafir (Universidad Pública de Navarra)

Estimation Of The Emax Model: Maximum Likelihood, Correction And Design

Caterina May (Università del Piemonte Orientale)

Optimal Designs For Nonlinear State-Space Models With Applications In Chemical Manufacturing

Dasha Semochkina (University of Southampton)

Sequential Optimal Planning Of Response Surface Experiments

Olga Egorova (King's College London)

How To Determine The Minimal Sample Size In Balanced 3-Way Anova Models Where No Exact \(F\)-Test Exists [poster]

Bernhard Spangl (Universität für Bodenkultur Wien)

On The Polytope Of Optimal Approximate Designs

Lenka Filova (Comenius University Bratislava)

Designing Experiments On Networks

Vasiliki Koutra (King's College London)

A Randomized Exchange Algorithm For Problems With General Atomic Information Matrices

Pál Somogyi (Comenius University Bratislava)

Optimal Allocation Of Time Points For Consensus Emergence Models

Yvette Baurne (Lunds Universitet)

Sequential Bayesian Design Using A Laplace-Parameterised Policy

Emma Rowlinson (University of Manchester)

Hierarchical Experiments And Likelihood Approximations

Theodora Nearchou (University of Southampton)

Bayesian Sequential Multi-Arm Multi-Stage Design With Time Trend Effect

Ziyan Wang (University of Southampton)
15:30 Coffee break
16:00 Kernel methods (Chair: David Ginsbourger, University of Bern)

Some Memories Of Jerry Sacks

Henry Wynn (London School of Economics)

OLSE And BLUE For The Location Model And Energy Minimization

Anatoly Zhigljavsky (University of Cardiff)

Kernel Relaxation For Space-Filling Design [slides]

Luc Pronzato (Université Côte d'Azur)

Prediction In Regression Models With Continuous Observations [slides]

Andrey Pepelyshev (University of Cardiff)

Energy-Based Sampling For PSD-Matrix Approximation

Matthew Hutchings (University of Cardiff)
17:45 Finish of Day 2

Wednesday
12 July 2023
09:00 Orthogonal minimally aliased response surface and order-of-addition designs (Chair: Peter Goos, KU Leuven)

Constructing Large OMARS Designs By Concatenating Definitive Screening Designs [slides]

Alan Vazquez (University of Arkansas)

Multi-Criteria Evaluation And Selection Of Experimental Designs From A Catalog

José Núñez Ares (KU Leuven)

Order-Of-Addition Orthogonal Arrays To Study The Effect Of Treatment Ordering [slides]

Eric Schoen (KU Leuven)

Symmetric Order-Of-Addition Experiments [slides]

Nicholas Rios (George Mason University)
10:30 Coffee break
11:00 Active learning (Chair: Jesús López-Fidalgo, Universidad de Navarra)

Implementation Strategies For Model-Robust Designs And Active Learning

Xiaojian Xu (Brock University)

Accounting For Outliers In Optimal Subsampling Methods [slides]

Laura Deldossi (Università Cattolica del Sacro Cuore)

Deriving Nearly Optimal Subdata

Min Yang (University of Illinois Chicago)

A Covariate Distribution-Based Optimality Criterion For Subdata Selection

Álvaro Cía Mina (Universidad de Navarra)
12:30 Lunch break (packed lunch)
13:30 Excursion to Winchester

Thursday
13 July 2023
09:00 Computational challenges in experimental design (Chair: Radoslav Harman, Comenius University Bratislava)

On Gaussian Process Multiple-Fold Cross-Validation [slides]

David Ginsbourger (University of Bern)

A Modified Frank–Wolfe Algorithm For The \(D_k\)-Optimal Design Problem [slides]

Selin Ahipasaoglu (University of Southampton)

Mixed-Integer Linear Programming For Computing Optimal Designs

Radoslav Harman (Comenius University Bratislava)

A Convex Approach To Optimum Design Of Experiments With Correlated Observations [slides]

Werner Müller (Johannes Kepler Universität Linz)
10:30 Coffee break
11:00 Design for prediction and missing data (Chair: Vasiliki Koutra, King's College London)

Optimizing The Design Of A Pharmacokinetic Trial For A Novel Tuberculosis Drug In Children Living With Or Without HIV Using Population Nonlinear Mixed-Effect Models

Andrew Hooker (Uppsala Universitet)

Optimal Designs For Prediction In Random Coefficient Regression With One Observation Per Individual [slides]

Maryna Prus (Linköping University and Universität Hohenheim)

Sparse Polynomial Prediction

Hugo Maruri Aguilar (Queen Mary University of London)

Designing Follow Up Samples – A Comprehensive Approach To Detect MNAR Missingness Efficiently

Adetola Adediran (University of Southampton)
12:30 Lunch break
14:00 Clinical Trials II (Chair: Katrin Roth, Bayer AG)

Longitudinal Model For A Dose-Finding Study For A Rare Disease Treatment [slides]

Sergei Leonov (CSL Behring)

Optimal Relevant Subset Designs In Nonlinear Models

Adam Lane (Cincinnati Children's Hospital Medical Center)

Group Sequential Tests - Beyond Exponential Family [slides]

Sergey Tarima (Medical College of Wisconsin)

How Can We Learn About The Biomarker-Negative Subgroup In A Biomarker-Guided Trial? 

Anastasia Ivanova (University of North Carolina at Chapel Hill)
15:30 Coffee break
16:00 Industrial panel (Chair: Tobias Mielke, Janssen)

Challenges and pitfalls in applying optimal design theory in clinical dose finding studies [slides]

Katrin Roth (Bayer AG)

Optimal Design of Experiments in Drug Development [slides]

Alex Sverdlov (Novartis)
Andrew Hooker (Uppsala Universitet)
Ralf-Dieter Hilgers (RWTH Aachen University)
Alun Bedding (Roche)
Sergei Leonov (CSL Behring)
Olivier Collignon (GlaxoSmithKline)
19:30 Workshop reception and dinner

Friday
14 July 2023
09:00 Distances, networks and Bayesian design (Chair: Luc Pronzato, Université Côte d'Azur)

Distance In Big Dimensions [slides]

Jon Gillard (University of Cardiff)

Sampling And Low-Rank Approximation  [slides]

Bertrand Gauthier (University of Cardiff)

Design Of Experiments For Networks, And Networks For Experimental Design [slides]

Ben Parker (Brunel University London)

Bayesian Optimal Design Using INLA

Noha Youssef (The American University in Cairo)
10:30 Coffee break
11:00 Optimal designs for model discrimination (Chair: Werner Müller, Johannes Kepler Universität Linz)

Connection Between Likelihood Tests And Discrimination Designs

Chiara Tommasi (Università degli Studi di Milano)

Discrimination Between Gaussian Process Models: Active Learning And Static Constructions [slides]

Markus Hainy (Johannes Kepler Universität Linz)

Computing \(T\)-Optimal Designs Via Nested Semi-Infinite Programming And Two-Fold Adaptive Discretization [slides]

David Mogalle (Fraunhofer-Gesellschaft)

Construction Of Maxi-Min Efficiency Designs [slides]

Juan Rodríguez-Díaz (Universidad de Salamanca)
12:30 Lunch break
14:00 Workshop closes

Abstracts

Adetola Adediran

University of Southampton

Designing Follow Up Samples – A Comprehensive Approach To Detect MNAR Missingness Efficiently

The presence of missing data may lead to bias and inefficiencies in analyses. The missing not at random (MNAR) mechanism is the most complex type of missingness. Determining whether it is present is of utmost importance, as suitable adjustments must be made to correct for bias. However, it is not possible to detect MNAR from the incomplete data alone, and a follow-up sample of some of the missing observations must be made. In this work, we explore how different designs of the follow-up sample impact the power of a test for MNAR. We provide an algorithm for designing the follow-up sample that significantly improves the power of this test compared with random sampling. We also explore the efficiency and robustness of our designs through simulation studies.

Selin Ahipasaoglu

University of Southampton

A Modified Frank–Wolfe Algorithm For The \(D_k\)-Optimal Design Problem

We study a first-order method to find the minimum cross-sectional area ellipsoidal cylinder containing a finite set of points. This problem arises in optimal design in statistics when one is interested in a subset of the parameters, and is referred to as the \(D_k\)-optimal design problem. We provide convex formulations of this problem and its dual, and analyze a method based on the Frank–Wolfe algorithm for their solution. Under suitable conditions on the behaviour of the method, we establish global and local convergence properties. However, difficulties may arise when a certain submatrix loses rank, and we describe a technique for dealing with this situation.
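
As background, here is a minimal sketch (illustrative only, not the speaker's \(D_k\) algorithm) of the classical Frank–Wolfe (Fedorov–Wynn) iteration for the standard \(D\)-optimal design problem on a finite candidate set, to which the modified method is related; the candidate set and all names are assumptions for the example.

    # Minimal sketch: Frank-Wolfe / Fedorov-Wynn iteration for standard
    # D-optimality on a finite candidate set X, rows = candidate points in R^p.
    import numpy as np

    def d_optimal_fw(X, iters=2000):
        n, p = X.shape
        w = np.full(n, 1.0 / n)                 # uniform starting design
        for _ in range(iters):
            M = X.T @ (w[:, None] * X)          # M(w) = sum_i w_i x_i x_i^T
            d = np.einsum('ij,jk,ik->i', X, np.linalg.inv(M), X)  # d(x_i, w)
            j = int(np.argmax(d))               # vertex direction: worst point
            a = (d[j] - p) / (p * (d[j] - 1.0)) # exact step for the log-det objective
            w *= 1.0 - a
            w[j] += a
        return w

    rng = np.random.default_rng(1)
    w = d_optimal_fw(rng.standard_normal((200, 3)))
    # Kiefer-Wolfowitz equivalence theorem: max_i d(x_i, w) -> p at the optimum.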

Alen Alexanderian

North Carolina State University

Optimal Experimental Design For Infinite-Dimensional Bayesian Inverse Problems Under Model Uncertainty

We consider optimal experimental design for infinite-dimensional Bayesian inverse problems governed by PDEs that contain secondary model uncertainties, in addition to the uncertainty in the inversion parameters. Our focus will be mainly on linear inverse problems with reducible secondary model uncertainties; these are parametric uncertainties that can be reduced through parameter inference. For such problems, we seek experimental designs that minimize the posterior uncertainty in the inversion parameters, while accounting for the uncertainty in secondary parameters. To this end, we derive a marginalized \(A\)-optimality criterion and develop an efficient computational approach for its optimization. We illustrate our approach for estimating an uncertain time-dependent source in a contaminant transport model with an uncertain initial state as secondary uncertainty. We will also discuss extensions to the cases of nonlinear inverse problems and irreducible modeling uncertainties.

Anthony Atkinson

London School of Economics

Optimum Design For Ill-Conditioned Models: \(K\)–Optimality And Stable Parameterizations

Least squares estimation of the parameters in ill-conditioned nonlinear models may lead, via multiple optima and computational inefficiency, to virtually collinear parameter estimates; model building is then problematic. A strategy suggested by Gavin Ross is to reparameterize the one-factor model, replacing the original set of parameters by another with increased orthogonality. Ross finds such stable parameters as functions of the response at selected values of the factor, the points being chosen in an ad hoc manner for each application. In this paper we propose a systematic strategy for model reparameterization based on a carefully chosen set of points. This is illustrated with the support points of locally \(K\)–optimal experimental designs, to generate a set of symbolic equations that allow the construction of a transformation to a set of parameters with better orthogonality properties. Recognizing the difficulties in generalizing the technique to complex models, we propose a related alternative approach based on a first-order Taylor approximation of the model. Our approach is tested with both linear and nonlinear models. The Variance Inflation Factor and the condition number, as well as the orientation and eccentricity of the parametric confidence region, are used for comparisons.

Rosemary Bailey

University of St Andrews

Valid Restricted Randomization For Small Experiments

If there is no inherent blocking factor in a small experiment, it may be decided not to use blocks, in order to have more degrees of freedom for the residual and hence more power for detecting treatment differences. However, in that case, complete randomization may produce a long run of plots with the same treatment. How should that be avoided? One common suggestion is simply to discard the undesirable layout and randomize again. This introduces bias, as it makes comparisons between neighbouring plots more likely to contribute to the estimators of treatment differences.

When there is a single error term in the analysis of variance, a method of randomization is called strongly valid if the expected mean square for any subset of treatment comparisons is equal to the expected mean square for error if there are no differences between treatments. Here all these mean squares are averaged over all possible outcomes of the randomization. See Grundy & Healy (1950) and Yates (1948).

One way of achieving strongly valid randomization is to choose a permutation at random from a doubly-transitive permutation group. Applying a random permutation from such a group to a carefully chosen initial layout has the potential to avoid some bad patterns. See Bailey (1983) and Grundy & Healy (1950).

In the context of clinical trials with only two treatments and sequential recruitment of patients, there is a second method using Hadamard matrices. See Bailey & Nelson (2003). Using this avoids the risk of large treatment imbalance if the trial is terminated early, as well as of long runs of a single treatment.

Yates (1948) proposed the term restricted randomization for any valid method that does not use the set of all possible layouts. Unfortunately, Youden (1972) introduced the term constrained randomization for the same thing. His method implicitly uses resolved balanced incomplete-block designs.

In my talk I shall describe recent joint work with Josh Paik using this method to produce a catalogue of tables which give a method of valid randomization for small experiments with a single line of experimental units.

Bailey, R. A.: Restricted randomization. Biometrika 70 (1983), 183–198.
Bailey, R. A. and Nelson, P. R.: Hadamard randomization: a valid restriction of random permuted blocks. Biometrical Journal 45 (2003), 554–560.
Grundy, P. M. and Healy, M. J. R.: Restricted randomization and quasi-Latin squares. Journal of the Royal Statistical Society, Series B 12 (1950), 286–291.
Yates, F.: Contribution to the discussion of the paper “The validity of comparative experiments” by F. J. Anscombe. Journal of the Royal Statistical Society, Series A 111 (1948), 204–205.
Youden, W. J.: Randomization and experimentation. Technometrics 14 (1972), 13–22.

Yvette Baurne

Lunds Universitet

Optimal Allocation Of Time Points For Consensus Emergence Models

We study the optimal allocation of time points for consensus emergence models. The models are of interest within the social sciences and the study of organisations and groups, to study how within-group variance changes over time. More specifically, we consider two consensus emergence models: (i) the heterogeneous consensus emergence model and (ii) the homogeneous consensus emergence model. The models are mixed models with a three-level nested structure, where the levels correspond to groups, individuals, and repeated measurements. Model (i) is a linear mixed model with an exponential decay of the noise variance. Model (ii) is a type of growth curve model, with exponential decay at the second level. The design problem is non-trivial because of the exponential functions and the interest in variance components, making it similar to the design problem for nonlinear models: the optimal design will depend on the parameter values of interest. We formulate the problem for complete data and extend it to incorporate intermittent missing data. As a first step, we assume non-informative missing data.
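
A schematic form consistent with this description (illustrative notation only, not necessarily the authors' exact parameterisation) is a three-level model whose level-1 residual variance shrinks exponentially in time,

\[
y_{tij} = \mu_j + u_{ij} + \varepsilon_{tij}, \qquad \operatorname{Var}(\varepsilon_{tij}) = \sigma^2 e^{-\lambda t}, \quad \lambda > 0,
\]

so that within-group variance decreases as consensus emerges; the design question is then where to place the measurement occasions \(t\).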

Mario Becerra

KU Leuven

Bayesian \(D\)- And \(I\)-Optimal Designs For Choice Experiments Involving Mixtures And Process Variables

Many products and services can be described as mixtures of ingredients. For example, a drink may be a mixture of mango juice, lime juice, and blackcurrant syrup. Usually, the researchers’ interest is in one or several characteristics of the mixture. In this work, the characteristic of interest is the preference of consumers.

Consumer preferences can be quantified by using discrete choice experiments in which respondents are asked to choose between sets of alternatives. Choice experiments are well-suited to collect data for quantifying preferences for mixtures of ingredients.

In addition to the proportions of ingredients, the preference for a mixture may depend on characteristics other than its composition alone. For example, the ideal cocktail composition may depend on the temperature at which it is served. To cope with this kind of complication, the choice model for mixtures must be extended to deal with the additional characteristics, typically called process variables. Because the ingredient proportions sum to one, standard linear regression models are inestimable in these cases. Hence, dedicated models are necessary, such as Scheffé models.

As experiments in general are expensive and time-consuming, efficient experimental designs are required to provide reliable statistical modeling. Two optimality metrics have usually been studied: \(D\)-optimality and \(I\)-optimality. The Bayesian version of these metrics is obtained by assigning a prior distribution to the parameters of the model and averaging over the prior. We will show and compare the properties of Bayesian \(D\)- and \(I\)-optimal designs for choice experiments with mixtures of ingredients and process variables assuming a Scheffé model for utility description.
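
Schematically, with \(M(\xi,\theta)\) the information matrix of design \(\xi\) at parameter vector \(\theta\) and \(\pi\) the prior (notation assumed here for illustration), the two Bayesian criteria take the form

\[
\Phi_D(\xi) = \int \log\bigl|M(\xi,\theta)\bigr|\,\pi(\theta)\,d\theta,
\qquad
\Phi_I(\xi) = \int\!\!\int_{\mathcal X} f(x)^\top M^{-1}(\xi,\theta) f(x)\,d\mu(x)\,\pi(\theta)\,d\theta,
\]

so that a Bayesian \(D\)-optimal design maximises the expected log determinant, while a Bayesian \(I\)-optimal design minimises the prediction variance averaged over the region of interest \(\mathcal X\) (with measure \(\mu\)) and over the prior.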

Álvaro Cía Mina

Universidad de Navarra

A Covariate Distribution-Based Optimality Criterion For Subdata Selection

Downsizing the data volume through subsampling is a widely employed technique to efficiently compute estimators in regression models. While existing methods predominantly focus on reducing parameter estimation errors, the primary practical objective of statistical models is often to minimize prediction errors. In this study, we introduce a novel subdata selection method for linear models based on the distribution of covariates. Specifically addressing scenarios with large samples where acquiring labels for the response variable is expensive, our proposed approach is supported by theoretical justifications and aligns with standard linear optimality criteria. Sequential selection is also considered. As anticipated by the theory, our method exhibits a reduction in prediction mean squared error compared to existing methods. Through simulations, we illustrate the performance of our innovative approach and its potential to enhance prediction accuracy in linear models.

Subhadra Dasgupta

Ruhr-Universität Bochum

Efficient Subsampling For Exponential Family Models

We propose a novel two-stage subsampling algorithm based on optimal design principles. In the first stage, we use a density-based clustering algorithm to identify an approximating design space for the predictors from an initial subsample and determine an optimal approximate design on this design space. In the second stage, we use matrix distances such as the Procrustes, Frobenius, and square-root distance to define the remaining subsample, such that its points are “closest” to the support points of the optimal design. Our approach reflects the specific nature of the information matrix as a weighted sum of non-negative definite Fisher information matrices evaluated at the design points, and is applicable to a large class of regression models, including models where the Fisher information is of rank larger than \(1\).

Carlos de la Calle Arroyo

Universidad de Navarra

A Methodology To Augment Designs

Optimal experimental designs usually have few support points, and these are often extreme, lying on the boundary of the design space. For models with a single explanatory variable, it is common for the number of distinct points to equal the number of parameters to be estimated, which does not allow proper testing of model adequacy. For this and other reasons, such as constraints, particular domain practice or the need for robust estimation, optimal designs are often used as a reference against which to measure the efficiency of the designs used in practice.

In this work, as an alternative, a methodology for design augmentation is proposed. Based on the equivalence theorem, the procedure allows the experimenter to control the efficiency when adding points to a given design. Starting from the optimal design, or from a regulated experimental plan, the experimenter can then add points while controlling the efficiency, in order to enhance the initial design as desired. The full procedure is presented for \(D\)-optimality, while an analogous solution for \(D_S\)- and \(L\)-optimality is tentatively provided. Its software implementation is available within the R package optedr.

Laura Deldossi

Università Cattolica del Sacro Cuore

Accounting For Outliers In Optimal Subsampling Methods

Nowadays, in many different fields, massive data are available and, for several reasons, it might be convenient to analyze just a subset of the data. The application of the \(D\)-optimality criterion can be helpful to optimally select a subsample of observations. However, it is well known that \(D\)-optimal support points lie on the boundary of the design space and, if they go hand in hand with extreme response values, they can have a severe influence on the estimated linear model (leverage points with high influence). To overcome this problem, firstly, we propose a non-informative “exchange” procedure that enables us to select a “nearly” \(D\)-optimal subset of observations without high leverage values.

Then, we provide an informative version of this exchange procedure, where besides high leverage points the outliers in the responses (which are not necessarily associated with high leverage points) are also avoided. This is possible because, unlike other design situations, in subsampling from big datasets the response values may be available.

Finally, both the non-informative and informative selection procedures are adapted to \(I\)-optimality, with the goal of getting accurate predictions.
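
To fix ideas, here is a toy sketch of the quantities involved (illustrative only, not the authors' exchange algorithm): leverage scores are computed from the hat matrix, the most extreme rows are screened out, and a subsample is then chosen greedily by the \(D\)-criterion among the remaining rows; all sizes and thresholds below are assumptions.

    # Toy illustration (not the authors' exchange algorithm): screen out the
    # highest-leverage rows, then pick a subsample greedily by the D-criterion.
    import numpy as np

    rng = np.random.default_rng(0)
    N, p, k = 10000, 5, 100                     # data size, predictors, subsample size
    X = rng.standard_normal((N, p))

    h = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X), X)     # leverages h_i
    pool = set(np.flatnonzero(h < np.quantile(h, 0.99)).tolist())  # drop top 1%

    seed = rng.choice(list(pool), size=p, replace=False)  # assumed nonsingular start
    chosen = seed.tolist()
    pool -= set(chosen)
    for _ in range(k - p):
        Minv = np.linalg.inv(X[chosen].T @ X[chosen])
        cand = np.fromiter(pool, dtype=int)
        # det(M + x x^T) = det(M) (1 + x^T M^{-1} x): pick the largest increment
        gain = np.einsum('ij,jk,ik->i', X[cand], Minv, X[cand])
        best = int(cand[np.argmax(gain)])
        chosen.append(best)
        pool.discard(best)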

Olga Egorova

King’s College London

Sequential Optimal Planning Of Response Surface Experiments

This work explores stage-by-stage planning of factorial experiments with multiple objectives, incorporating both the quality of the inference from the fitted model and protection against potential model misspecification. We adapt a stratum-by-stratum design search strategy and the Pareto front approach to obtain a set of optimal designs that assist practitioners in making informed decisions.

Lenka Filova

Comenius University Bratislava 

On The Polytope Of Optimal Approximate Designs

We study the problem of non-uniqueness of optimal approximate designs (OADs) in regression experiments with uncorrelated observations. We show that the set of all OADs can be typically characterized by a polytope P whose extreme points form a finite set A. We call the elements of the set A ‘atomic optimal designs’ and show that, for models with non-unique OADs, the list of all atomic optimal designs enables us to choose the OAD that best corresponds to the needs of the experimenter. For example, it can be used to construct OADs with minimal support, or OADs optimizing a secondary criterion. We illustrate the results on the \(k\)-way second-degree model without interactions.

Nancy Flournoy

University of Missouri

Posterior Alternatives With Informative Early Stopping

Prominent Bayesian scholars (e.g., Berry and Ho, 1988) argue that Bayesian philosophy permits multiple interim decisions to stop or continue a trial without adjustment or penalty. In contrast, Frequentist practice is to adjust stopping boundaries to control Type 1 error, recognizing that without adjustment it converges to one as the number of interim tests increases.

It is standard Bayesian practice that when more data become available, the posterior distribution is updated with new information and the posterior becomes the prior for the next posterior analysis. Yet in experiments with informative interim stopping decisions, standard Bayesian practice is not to condition the sampling density on interim decisions that are made. The consequence is that information in the decision is lost and the likelihood is invariant to the decision. Because it is also standard Bayesian philosophy that an analysis be performed on the experiment that was actually run, and not on experiments that might have been run, we propose incorporating information about the interim decision into the post-interim-decision posterior.

We explore the consequences of conditioning the sampling density on the interim decision for subsequent posterior analyses in the context of a two-stage design with an early stopping option. Our discussion avoids relying on distributions of Frequentist statistics which are fixed under the Bayesian paradigm.

Roberto Fontana

Politecnico di Torino

Design Of Experiments And Machine Learning With Application To Industrial Experiments

In the context of product innovation, there is an emerging trend to use Machine Learning (ML) models with the support of Design Of Experiments (DOE). The paper aims firstly to review the most suitable designs and ML models to use jointly in an Active Learning (AL) approach. It then reviews ALPERC, a novel AL approach, and proves the validity of this method through a case study on amorphous metallic alloys, where this algorithm is used in combination with a Random Forest model.

Rosamarie Frieri

Università di Bologna

Optimal Design For Inference On The Threshold Of A Biomarker

Enrichment designs with a continuous biomarker require the estimation of a threshold to determine the subpopulation benefitting from the treatment. This paper provides the optimal allocation for inference in a two-stage enrichment design for treatment comparisons when a continuous biomarker is suspected to affect patient response. Several design criteria, associated with different trial objectives, are optimized under balanced or Neyman allocation and under equality of the biomarker's first two empirical moments. Moreover, we propose a new covariate-adaptive randomization procedure that converges to the optimum with the fastest available rate. Theoretical and simulation results show that this strategy improves the efficiency of a two-stage enrichment clinical trial, especially with smaller sample sizes and under heterogeneous responses.

Rosamarie Frieri

Università di Bologna

New Insights Into Adaptive Enrichment Designs

The transition towards personalized medicine is happening and the new experimental framework is raising several challenges, from a clinical, ethical, logistical, regulatory, and statistical perspective. To face these challenges, innovative study designs with increasing complexity have been proposed. In particular, adaptive enrichment designs are becoming more attractive for their flexibility. However, these procedures rely on an increasing number of parameters that are unknown at the planning stage of the clinical trial, so the study design requires particular care. This review is dedicated to adaptive enrichment studies with a focus on design aspects. While many papers deal with methods for the analysis, the sample size determination and the optimal allocation problem have been overlooked. We discuss the multiple aspects involved in adaptive enrichment designs that contribute to their advantages and disadvantages. The decision-making process of whether or not it is worth enriching should be driven by clinical and ethical considerations as well as scientific and statistical concerns.

Bertrand Gauthier

University of Cardiff

Sampling And Low-Rank Approximation 

Integral operators with positive-semidefinite (PSD) kernels play a central role in the theory of reproducing kernel Hilbert spaces (RKHSs) and their applications. As an important instance, this class of operators encompasses the PSD matrices. In this talk, we will describe how trace-class integral operators with PSD kernels can be isometrically represented as potentials, or kernel embeddings of measures, in RKHSs with squared-modulus kernels. We will then discuss the connections between the approximation of such operators and the approximation of integral functionals on RKHSs with squared-modulus kernels, and present some sampling-based approximation strategies leveraging the properties of the considered framework.

Jon Gillard

University of Cardiff

Distance In Big Dimensions

The ‘gold-standard’ distance measure in multivariate statistics is the Mahalanobis distance, but this requires knowledge or computation of an inverse covariance matrix. In big dimensions, computing such an inverse is challenging and, in any case, it is likely to be ill-conditioned. In this talk we introduce and discuss two approaches to quantifying distance which avoid inversion of a covariance matrix: the so-called k-simplicial distances and k-minimal-variance distances. The family of k-simplicial distances includes the Euclidean distance, the Mahalanobis distance, Oja’s simplex distance and many others. We introduce a new family of distances which we call k-minimal-variance distances. Each of these distances is constructed using polynomials in the sample covariance matrix, with the aim of providing an alternative to the inverse covariance matrix, particularly applicable when the data are degenerate. We will describe some applications of the considered distances, including outlier detection and clustering.

David Ginsbourger

University of Bern

On Gaussian Process Multiple-Fold Cross-Validation

In this talk I will give an overview of some recent results pertaining to Gaussian Process multiple-fold cross-validation.

In the first part, the focus will be put on results from (arXiv:2101.03108, joint work with Cedric Schärer), where block inversion is employed to efficiently calculate multiple-fold cross-validation residuals and their covariances. I will discuss implications of the resulting cross-validation covariance structure on model diagnostics and also how its knowledge helps clarifying some connections between cross-validation-based parameter estimation and MLE.

In the second part I will focus on the impact of grouping the observations, referred to as “fold design”, on the estimation of kernel hyperparameters. In particular, numerical results from a joint work with Athénaïs Gautier and Cédric Travelletti on an inverse problem from geosciences will be used to illustrate how the way of designing folds may affect the estimation of a correlation length of interest.

Overall, presented results enable fast multiple-fold cross-validation, have direct consequences in GP model diagnostics, and pave the way to future work on hyperparameter fitting as well as on the promising field of goal-oriented fold design.

Peter Goos

KU Leuven

\(D\)-Optimal Two-Level Designs For Linear Main-Effects Models: What If \(N\) Is Not A Multiple Of 4?

Two-level orthogonal arrays are known to be D-optimal for main-effects models when the number of runs is a multiple of 4. Complete catalogs of non-isomorphic orthogonal arrays have been enumerated and investigated to identify those orthogonal arrays that minimize the aliasing between main effects and two-factor interactions and the aliasing among two-factor interactions. It turns out that many non-isomorphic D-optimal designs for main-effects models also exist when the number of runs is not a multiple of 4, and that some of these designs involve substantially less aliasing between the main effects and the two-factor interactions, as well as among the two-factor interactions. In this presentation, we will discuss how we identify non-isomorphic D-optimal designs for main-effects models when the number of runs is one less than a multiple of 4, and investigate the differences between these designs.

Heiko Großmann

Universität Magdeburg 

Design Replication In Partial-Profile Choice Experiments

For design problems in linear models where a finite group of transformations acts transitively on a finite design space, it is well known that for convex optimality criteria which are invariant under the group, optimal approximate designs can be constructed by symmetrizing a given design. The underlying ideas can also be used to address some practical issues which arise in the area of partial-profile discrete choice experiments. In these experiments, there exist potentially many qualitative factors, of which only a subset is used in each question of a choice questionnaire. Certain exact designs for these experiments possess a high efficiency but are “rigid” in the sense that they only use a small number of all possible subsets of the factors. When using such a design as the basis for a survey, where the number of potential respondents would allow several replications of the design, simply repeating the rigid base design does not seem to be advisable. In order to ameliorate this issue, we propose to use replications where each replication of the design uses a different permutation of the factors. For the rigid base designs we consider, this approach leads to replicated designs with a better coverage of the design space and higher statistical efficiency. Moreover, the replicated designs appear to be robust against efficiency losses due to non-response. We illustrate the general ideas by referring to an example from an actual choice experiment.

Linda Haines

University of Cape Town

Design Problems In Statistical Ecology

The density of animals in regions ranging from small areas such as game reserves to vast areas over which animals are threatened with extinction is of key importance in ecology and conservation. It is not generally possible to conduct a census of animals in a region of interest and recourse is therefore made to observations, such as counts and times of arrival, at chosen study sites or transects across the region. Such surveys are expensive in terms of cost and human effort and design therefore plays a crucial role in the planning process. In this poster, I will highlight design issues relating to the estimation of animal density from observations based on site visits and capture-recapture. More specifically, I will introduce two interesting examples of ecological settings in which subtle design strategies could and should be formulated.

Markus Hainy

Johannes Kepler Universität Linz

Discrimination Between Gaussian Process Models: Active Learning And Static Constructions

We consider the design and analysis of experiments to discriminate between two Gaussian process models with different covariance kernels, such as those widely used in computer experiments, kriging, sensor location and machine learning. Two frameworks are considered. First, we study sequential constructions, where successive design points are selected, either as additional points to an existing design or from the beginning of observation. We investigate criteria such as the symmetrised Kullback-Leibler divergence between the two models when some observations have already been collected and the mean squared error of both models when no prior observations are available. Furthermore, we consider static criteria, such as the familiar log-likelihood ratios and the Fréchet distance between the covariance functions of the two models. Other distance-based criteria, simpler to compute than previous ones, are introduced. These new criteria can be easily generalised in terms of design measures, so we can use the framework of approximate design and provide a necessary condition for the optimality of a design measure.
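
For concreteness, when the observation vector at a given design is \(N(0,\Sigma_0)\) under one kernel and \(N(0,\Sigma_1)\) under the other (taking zero means for brevity; notation assumed here), the symmetrised Kullback-Leibler criterion has the closed form

\[
\mathrm{KL}_{\mathrm{sym}}(\Sigma_0,\Sigma_1) = \tfrac{1}{2}\,\mathrm{tr}\bigl(\Sigma_1^{-1}\Sigma_0 + \Sigma_0^{-1}\Sigma_1\bigr) - n,
\]

with \(n\) the number of observations; the log-determinant terms of the two directed divergences cancel.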

Radoslav Harman

Comenius University Bratislava

Mixed-Integer Linear Programming For Computing Optimal Designs

Because the optimal exact design problem is a discrete optimization problem, it can be solved by general methods of integer programming. One approach to speeding up the solution is to use specialized solvers, provided that a more structured formulation of the problem is known. For instance, for many design criteria a mixed-integer second-order cone programming formulation of the optimal exact design problem was proposed in Sagnol & Harman (2015). We show (see Harman & Rosa, 2023) that for some criteria this mathematical programming specialization can be taken even further. In particular, we provide a mixed-integer linear programming (MILP) formulation for optimal replication-free designs with respect to a wide class of criteria, including \(A\)-, \(I\)-, \(G\)- and \(MV\)-optimality. We also show that the MILP formulation can be extended to exact designs with replications and demonstrate some unique advantages of the MILP approach.
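
The underlying discrete problem can be written, for a replication-free exact design of size \(n\) on candidate points \(x_1,\dots,x_N\) (standard notation, assumed here for illustration), as

\[
\max_{w \in \{0,1\}^N} \; \Phi\Bigl(\sum_{i=1}^N w_i\, M(x_i)\Bigr) \quad \text{subject to} \quad \sum_{i=1}^N w_i = n,
\]

where \(M(x_i)\) is the elemental information matrix of a trial at \(x_i\); the talk's contribution is a mixed-integer linear reformulation of this problem for the criteria listed above.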

Andrew Hooker

Uppsala Universitet

Optimizing The Design Of A Pharmacokinetic Trial For A Novel Tuberculosis Drug In Children Living With Or Without HIV Using Population Nonlinear Mixed-Effect Models

Pharmacokinetic (PK) studies in children are usually small and have ethical constraints due to the medical complexities of drawing blood in this special population. Often, population PK models for the drug(s) of interest are available in adults, and these models can be extended to incorporate the expected deviations seen in children. As a consequence, there is increasing interest in the use of optimal design methodology to design PK sampling schemes in children that maximize information using a small sample size and a limited number of sampling times per dosing period. As a case study, we use the novel tuberculosis drug delamanid (DLM), which is used for the treatment of multidrug-resistant tuberculosis (MDR-TB). MDR-TB is resistant to the two drugs isoniazid and rifampicin, which constitute the backbone of the first-line regimen for treating TB. Further complicating the development of DLM is its interaction with drugs for the treatment of HIV, a common co-infection in MDR-TB patients, which reduces the efficacy of MDR-TB treatment. For children with both MDR-TB and HIV, we show how applications of optimal design methodology with nonlinear mixed-effect PK models can result in highly efficient and model-robust designs for estimating PK parameters using a limited number of sampling measurements. Using population PK models developed from available data on adults living with and without HIV, and limited data on children without HIV, competing designs were derived and assessed based on robustness to model uncertainty when extended to children living with HIV.

Matthew Hutchings

University of Cardiff

Energy-Based Sampling For PSD-Matrix Approximation

The low-rank approximation of large-scale matrices through column sampling is a core technique in machine learning and scientific computing. As its name suggests, this approach consists in defining a low-rank approximation of a given matrix from a subset of its columns, naturally raising questions related to the characterisation of subsets leading to accurate approximations. In practice, the column sampling problem (CSP) is made difficult by its combinatorial nature and by the cost inherent to the assessment of the approximation errors. In this talk, we will present a pseudoconvex differentiable relaxation of the CSP for positive-semidefinite (PSD) matrices. The considered relaxation is based on the isometric representation of weighted PSD matrices as potentials, significantly reducing the numerical cost related to the exploration of the underlying approximation landscape. We will describe some sampling strategies which leverage the proposed relaxation, and illustrate their behaviour on a series of examples. The considered framework can be extended to kernel-matrix and integral-operator approximation, and is intrinsically related to the optimal design of experiments in second-order random field models.
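
For orientation, the basic column-sampling (Nyström-type) approximation underlying the CSP can be sketched as follows: given a PSD matrix \(K\) and a column index set \(I\), the rank-\(|I|\) approximation is \(\hat K = K_{:,I}\, K_{I,I}^{+}\, K_{I,:}\). A minimal sketch (illustrative sizes, not the talk's relaxation):

    # Background sketch: column-sampling (Nystrom) approximation of a PSD matrix.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((300, 40))
    K = A @ A.T                                  # a PSD matrix (rank <= 40)

    I = rng.choice(K.shape[0], size=60, replace=False)
    C = K[:, I]                                  # sampled columns
    W = K[np.ix_(I, I)]                          # principal submatrix
    K_hat = C @ np.linalg.pinv(W) @ C.T          # rank-|I| approximation

    err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
    print(f"relative Frobenius error: {err:.2e}")  # ~0 here since |I| > rank(K)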

Anastasia Ivanova

University of North Carolina at Chapel Hill

How Can We Learn About The Biomarker-Negative Subgroup In A Biomarker-Guided Trial? 

In a clinical trial with a predefined subgroup, it is assumed that the biomarker-positive subgroup has the same or a higher treatment effect compared to its complement, the biomarker-negative subgroup. In these trials the treatment effect is usually evaluated in the biomarker-positive subgroup and in the whole population. Statistical testing of the treatment effect in the biomarker-negative subgroup is usually not done, since it requires a larger sample size. As a result, the new intervention can be shown to be effective in the overall population even though it is only effective in the biomarker-positive group. What can we do to improve decision making in such trials?

Vasiliki Koutra

King’s College London

Designing Experiments On Networks

This work deals with the investigation of the role of network symmetries in design performance and the development of methods that utilise them to inform a more computationally effective design search when units are connected. The majority of real-world networks display high degrees of symmetry, meaning that they have nontrivial automorphism groups (groups of node permutations that do not alter the network structure). That is, they contain a certain amount of structural redundancy. Thus, we study the role of decomposing the network based on its symmetries in the search for an optimal design in large networks.

Karina Koval

Universität Heidelberg

Bayesian Optimal Experimental Design Using Transport Maps

Solving the Bayesian optimal experimental design (BOED) problem often involves optimizing an expectation of a utility function or optimality criterion. For Bayesian inverse problems characterized by non-Gaussian posteriors, a closed-form expression for the criterion is typically unavailable. Thus, access to a computationally efficient approximation is crucial for numerical solution of the optimal design problem. We present a flexible sample-based approach for approximating the expected utility function and solving the BOED problem that is based on transportation of measures. Our discussion is supplemented with a numerical example for optimal sensor placement.

Adam Lane

Cincinnati Children’s Hospital Medical Center

Optimal Relevant Subset Designs In Nonlinear Models

Fisher (1934) argued that certain ancillary statistics form a relevant subset, a subset of the sample space on which inference should be restricted, and showed that conditioning on such ancillary statistics reduces the dimension of the data without a loss of information. The use of ancillary statistics in post-data inference has received significant attention; however, their role in the design of experiments has not been well characterized. Ancillary statistics are unknown prior to data collection and as a result cannot be incorporated into the design a priori. Conversely, in sequential experiments the ancillary statistics based on the data from the preceding observations are known and can be used to determine the design assignment of the current observation. The main results of this work describe the benefits of incorporating ancillary statistics, specifically, the ancillary statistic that constitutes a relevant subset, into adaptive designs.

Sergei Leonov

CSL Behring

Longitudinal Model For A Dose-Finding Study For A Rare Disease Treatment

Dose-finding studies in rare diseases are faced with unique challenges, including low patient numbers, limited understanding of the dose-exposure-response relationship, and variability around the endpoints. In addition, patient exposure to placebo is often not feasible. To describe the disease progression for different dose groups, we introduce a longitudinal model for the change from baseline of a clinical endpoint. We build a nonlinear mixed effects model using techniques which have become popular over the past two decades in the design and analysis of population pharmacokinetic/pharmacodynamic studies. To evaluate the operating characteristics of the proposed design, we derive the Fisher information matrix and validate analytical results via simulations. Alternative considerations, such as trend analysis, are discussed as well.

Ping Ma

University of Georgia

Subsampling In Large Graphs

In the past decades, many large graphs with millions of nodes have been collected/constructed. The high computational cost and significant visualization difficulty hinder the analysis of large graphs. Researchers have developed many graph subsampling approaches to provide a rough sketch that preserves global properties. By selecting representative nodes, these graph subsampling methods can help researchers estimate the graph statistics, e.g., the number of communities, of the large graph from the subsample. However, the available subsampling methods, e.g., degree node sampler and random walk sampler, tend to leave out minority communities because nodes with high degrees are more likely to be sampled.

In this talk, I present a novel subsampling method via an analog of Ricci curvature on manifolds, the Ollivier-Ricci curvature.

Amalan Mahendran

Queensland University of Technology

Model Robust Subsampling Approach For Generalised Linear Models In Big Data Settings

Subsampling is a computationally efficient and scalable method to support timely insights and informed decision making in big data settings. An integral component of subsampling is determining what subsample should be extracted from the big data for analysis. Recent subsampling approaches propose determining subsampling probabilities for each data point based on optimality criteria from experimental design, but we suggest this is of limited use in practice, as these probabilities rely on an assumed model for the big data. To overcome this limitation, we propose a model robust approach where a set of models is considered, and the subsampling probabilities are evaluated as the weighted average of the probabilities that would be obtained if each model were considered singly. Theoretical support for such an approach is provided, and results from a simulation study and two real-world applications show that our model robust approach outperforms current subsampling practices.
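
A stylised sketch of the averaging idea (illustrative leverage-based scores, not the paper's GLM optimality criteria; the candidate model set is an assumption):

    # Stylised sketch of model-robust subsampling: mix per-model subsampling
    # probabilities with equal model weights, then draw the subsample.
    import numpy as np

    rng = np.random.default_rng(0)
    N, k = 100000, 500
    x = rng.uniform(-1, 1, N)

    # Assumed candidate model set: straight-line and quadratic mean models.
    models = [np.column_stack([np.ones(N), x]),
              np.column_stack([np.ones(N), x, x**2])]

    def probs_for(Xm):
        """Subsampling probabilities proportional to leverage under model Xm."""
        h = np.einsum('ij,jk,ik->i', Xm, np.linalg.inv(Xm.T @ Xm), Xm)
        return h / h.sum()

    p_robust = np.mean([probs_for(Xm) for Xm in models], axis=0)  # equal-weight mix
    idx = rng.choice(N, size=k, replace=False, p=p_robust)        # the subsample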

Hugo Maruri Aguilar

Queen Mary University of London

Sparse Polynomial Prediction

In numerical analysis, sparse grids are point configurations that are used in stochastic finite element approximation, numerical integration and interpolation. This paper is concerned with the construction of polynomial interpolator models in sparse grids. Our proposal stems from the fact that a sparse grid is an echelon design with a hierarchical structure that identifies a single model. We then formulate the model and show that it can be written using inclusion-exclusion formulæ. At this point, we deploy efficient methodologies from the algebraic literature that can simplify the computations considerably. The methodology uses Betti numbers to reduce the number of terms in the inclusion-exclusion while achieving the same result as with exhaustive formulæ.

Caterina May

Università del Piemonte Orientale

Estimation Of The Emax Model: Maximum Likelihood, Correction And Design

The Emax model is a dose-response model commonly applied in clinical trials, agriculture and environmental experiments. It is known in the literature that maximum likelihood estimation of the model parameters often encounters computational problems. Moreover, the MLE is biased, which significantly affects estimation in real-life situations where the sample size is small. In this work we study and compare different methods for estimating the Emax model. We also take into account the effect of the choice of experimental design.
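
For reference, the three-parameter Emax model referred to here is the standard dose-response form

\[
E(y \mid d) \;=\; E_0 + \frac{E_{\max}\, d}{ED_{50} + d},
\]

where \(E_0\) is the placebo response, \(E_{\max}\) the maximum effect and \(ED_{50}\) the dose producing half of it.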

Frank Miller

Linköping and Stockholm Universities 

About \(C\)- And \(D\)-Optimal Dose-Finding Designs For Bivariate Outcomes

In many cases, clinical dose-finding trials consider both efficacy and safety. In order to optimise the trial design, we analyse bivariate models for these two outcomes. We focus on models of Emax type for the outcomes, with different numbers of parameters. We discuss results for the number of design points of the locally \(c\)- and \(D\)-optimal designs. The considered family of models offers the opportunity to use symmetry properties and transformations of the design space. Using these methods, we can derive algebraic results for the optimal designs. We illustrate how the optimal design and its number of design points depend on the model parameters. Since the locally optimal designs depend on unknown model parameters, we handle this issue using sequential designs.

David Mogalle

Fraunhofer-Gesellschaft

Computing \(T\)-Optimal Designs Via Nested Semi-Infinite Programming And Two-Fold Adaptive Discretization

The \(T\)-criterion for model discrimination represents a bi-level optimization problem which can be recast as a semi-infinite one. However, its solution is very unstable or time-consuming for non-linear models and non-convex lower- and upper-level problems. If one considers only a finite number of possible design points, a numerically well-tractable linear semi-infinite optimization problem arises. Since this is only an approximation of the original model discrimination problem, we propose an algorithm which alternately and adaptively refines discretizations of the parameter space as well as of the design space and thus solves a sequence of linear semi-infinite programs. We prove convergence of our method and its subroutine, and show on the basis of discrimination tasks from process engineering that our approach is stable and can outperform known methods.
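
In the standard Atkinson-Fedorov formulation (notation assumed here), with a fixed model \(\eta_1\) and a rival family \(\eta_2(\cdot,\theta_2)\), the \(T\)-criterion is

\[
T(\xi) \;=\; \min_{\theta_2 \in \Theta_2} \int_{\mathcal X} \bigl(\eta_1(x) - \eta_2(x,\theta_2)\bigr)^2 \, \xi(dx),
\]

whose inner minimisation over \(\theta_2\) and outer maximisation over the design \(\xi\) produce the bi-level structure the talk addresses.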

Werner Müller

Johannes Kepler Universität Linz

A Convex Approach To Optimum Design Of Experiments With Correlated Observations

Optimal design of experiments for correlated processes is an increasingly relevant and active research topic. Existing methods offer only limited means of judging the quality of the resulting designs. To fill this gap, we complement the virtual noise approach with a convex formulation, leading to an equivalence theorem comparable to that for the uncorrelated case and to an algorithm giving an upper performance bound against which alternative design methods can be judged. Moreover, a method for generating exact designs follows naturally. We exclusively consider estimation problems on a finite design space with a fixed number of elements. A comparison on some classical examples from the literature, as well as a real application, is provided.

Theodora Nearchou

University of Southampton

Hierarchical Experiments And Likelihood Approximations

This work aims to approximate likelihood inference for models with intractable likelihood by combining various approximations to the likelihood, including cheap crude approximations and more computationally expensive accurate approximations, to obtain statistically valid and efficient inference. To do this, we use techniques from the field of computer experiments, where Gaussian process (GP) models can be used as a surrogate for the computer model output. We apply multi-level GP model-based approximations where two levels of computer model are available, corresponding to two levels of likelihood approximations. Through a generalised linear mixed model example, we illustrate that the likelihood approximation can be predicted using few observations from the complex approximation and more from the simple approximation. Moreover, we present a methodology for choosing the design for our two-level GP approximation, with the aim of finding the point that maximises the likelihood using expected gain in utility.
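
A common two-level formulation in this setting, in the spirit of Kennedy and O'Hagan's autoregressive model (the talk's exact specification may differ; notation illustrative), links the cheap and accurate log-likelihood approximations \(\ell_0\) and \(\ell_1\) via

\[
\ell_1(\psi) \;=\; \rho\, \ell_0(\psi) + \delta(\psi), \qquad \ell_0 \sim \mathrm{GP}(m_0, k_0), \;\; \delta \sim \mathrm{GP}(m_\delta, k_\delta),
\]

with \(\delta\) an independent discrepancy process, so that many cheap evaluations pin down \(\ell_0\) while a few expensive ones calibrate \(\rho\) and \(\delta\).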

José Núñez Ares

KU Leuven

Multi-Criteria Evaluation And Selection Of Experimental Designs From A Catalog

In recent years, several researchers have published catalogs of experimental plans. First, there are several catalogs of orthogonal arrays, which allow experimenting with two-level factors as well as multi-level factors. The catalogs of orthogonal arrays with two-level factors include alternatives to the well-known Plackett-Burman designs. Second, recently, a catalog of orthogonal minimally aliased response surface designs (or OMARS designs) appeared. OMARS designs bridge the gap between the small definitive screening designs and the large central composite designs, and they are economical designs for response surface modeling. The catalogs contain dozens, thousands or millions of experimental designs, depending on the number of runs and the number of factors, and choosing the best design for a particular problem is not a trivial matter. In this presentation, we introduce a multi-objective method based on graphical tools to select a design. Our method analyzes the trade-offs between the different experimental quality criteria and the design size, using techniques from multi-objective optimization. Our procedure presents an advantage compared to the optimal design methodology, which usually considers only one criterion for generating an experimental design. Additionally, we will show how our methodology can be used for both screening and optimization experimental design problems. Finally, we will demonstrate a novel software solution, illustrating its application for a few industrial experiments.

Ben Parker

Brunel University London

Design Of Experiments For Networks, And Networks For Experimental Design

How can we design experiments on networks, and how can ideas from network science help us to find designs?

Much recent research exists on statistical analysis of network data; typically we wish to infer the influence of some network on some response we measure. However, in this talk we consider designing experiments on experimental units linked by some intrinsic network structure, in order to get maximum information about our interventions (treatments).

Networks in experiments may be obvious, or less so, and might correspond to:

  • social networks: for example, we may give an advert to users connected by a social network and try to infer its effectiveness (as in A/B testing);
  • spatial networks: for example, in agricultural experiments we may apply a treatment to one plot of wheat, and the effect of the treatment spreads to adjacent experimental units;
  • temporal networks: we can even incorporate temporal elements into the definition; for example, in a crossover experiment, a drug administered at a particular time might still have an effect afterwards.

In this talk, we briefly review some of the work we, and others, have done on experiments on networks, and talk about the vital importance of including the network structure in our model where it exists. We show that by not taking into account network structure, we can design experiments which have very low efficiency and/or produce biased results, and provide some guidelines for performing robust experiments on networked data.

Happily, many of the techniques we adopt for designing experiments on networks also can be used for experiments where no obvious network is found. We show that tools in network science can also be useful in helping us find good designs for a wide variety of experiments and models.

Andrey Pepelyshev

University of Cardiff

Prediction In Regression Models With Continuous Observations

We consider the problem of predicting values of a random process or field satisfying a linear model \(y(x)=\theta^\top f(x) + \varepsilon(x)\), where errors \(\varepsilon(x)\) are correlated. This is a common problem in kriging, where the case of discrete observations is standard. By focussing on the case of continuous observations, we derive expressions for the best linear unbiased predictors and their mean squared error. Our results are also applicable in the case where the derivatives of the process \(y\) are available, and either a response or one of its derivatives needs to be predicted. The theoretical results are illustrated by several examples, in particular for the popular Matérn \(3/2\) kernel.
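For reference, the Matérn \(3/2\) kernel mentioned above has the standard closed form

$$ K(x,x') = \sigma^2\left(1 + \frac{\sqrt{3}\,\|x-x'\|}{\rho}\right)\exp\left(-\frac{\sqrt{3}\,\|x-x'\|}{\rho}\right), $$

with process variance \(\sigma^2\) and length-scale \(\rho\); processes with this kernel are exactly once (mean-square) differentiable, which is what makes first derivatives of \(y\) available in the sense used above.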

Luc Pronzato

Université Côte d’Azur

Kernel Relaxation For Space-Filling Design

The packing radius, covering radius and \(L_r\)-quantisation error (\(r>1\)) of a sampling design \({\mathbf X}_n\) in a compact subset \({\mathscr X}\) of \(\mathbb{R}^d\) form natural measures of its space-filling performance in \({\mathscr X}\). They are also key factors for the derivation of error bounds for function approximation or integration over \({\mathscr X}\).

Maximising an \(\ell_q\) relaxation of the packing radius is equivalent to minimising the discrete energy for the Riesz kernel \(K({\mathbf x},{\mathbf x}')=\|{\mathbf x}-{\mathbf x}'\|^{-q}\), that is, to constructing a set of \(q\)-Fekete points. Other kernels \(K\) can also be used, and a continuous version of the problem can be considered: it corresponds to the construction of a minimum-energy probability measure \(\mu_K^+\) on \({\mathscr X}\) for \(K\), a problem for which a simple Equivalence Theorem can be formulated.
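A minimal sketch of the discrete side of this problem, assuming a finite candidate grid (a simple point-exchange on the Riesz energy; the continuous measure formulation is not reproduced):

```python
import numpy as np

def riesz_energy(X, q=2.0):
    """Discrete Riesz energy sum_{i<j} ||x_i - x_j||^(-q) of a design X (n x d)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return np.sum(D[iu] ** (-q))

def exchange_fekete(candidates, n, q=2.0, seed=None):
    """Greedy point exchange: start from a random n-subset of the candidate
    grid and swap points while the Riesz energy decreases, giving
    approximate q-Fekete points."""
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(candidates), n, replace=False))
    improved = True
    while improved:
        improved = False
        for pos in range(n):
            best = riesz_energy(candidates[idx], q)
            for j in range(len(candidates)):
                if j in idx:
                    continue
                trial = idx.copy()
                trial[pos] = j
                e = riesz_energy(candidates[trial], q)
                if e < best:
                    best, idx, improved = e, trial, True
    return candidates[idx]

# Example: 10 well-spread points in [0,1]^2 from a 20 x 20 candidate grid.
g = np.linspace(0, 1, 20)
grid = np.array([(a, b) for a in g for b in g])
design = exchange_fekete(grid, n=10, q=2.0, seed=1)
```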

Kernel relaxation can also be introduced in the minimisation of the covering radius, leading to the notion of discrete polarisation. We show that when the minimum-energy probability measure \(\mu_K^+\) has full support \({\mathscr X}\), it is also optimal for the continuous polarisation problem, this support condition being intimately related to the existence of the continuous BLUE in the location problem where \(K\) defines the error correlation.

Finally, we also consider a continuous relaxed version of the minimisation of the \(L_r\)-quantisation error, and show that the optimal solution is obtained by maximising a concave functional. A necessary and sufficient condition for optimality, i.e., an Equivalence Theorem, is given.

Maryna Prus

Linköping University and Universität Hohenheim

Optimal Designs For Prediction In Random Coefficient Regression With One Observation Per Individual

The subject of this work is random coefficient regression models with only one observation per observational unit (individual). An analytical solution in the form of optimality conditions is proposed for optimal designs for the prediction of individual random effects for a group of selected individuals. The behaviour of optimal designs is illustrated by the example of linear regression models.

Martin Radloff

Universität Magdeburg 

\(D\)-Optimal And Nearly \(D\)-Optimal Exact Designs For Binary Response On The Ball

In this talk the results of Radloff and Schwabe (Stat Papers 60:165–177, 2019) will be extended to a special class of symmetric intensity functions. This includes binary response models with logit and probit link. To evaluate the positions and the weights of the two non-degenerate orbits on the \(k\)-dimensional ball, a system of three equations usually has to be solved. The symmetry allows this system to be reduced to a single equation. As a further result, the number of support points can be reduced to the minimal number. These minimally supported designs are highly efficient.

Torsten Reuter

Universität Magdeburg 

Optimal Subsampling Design For Polynomial Regression In One Covariate

Improvements in technology lead to the increasing availability of large data sets, which makes the need for data reduction and informative subsamples ever more important. We construct \(D\)-optimal subsampling designs for polynomial regression in one covariate for invariant distributions of the covariate. We study quadratic regression more closely for specific distributions. In particular, we make statements on the shape of the resulting optimal subsampling designs and on the effect of the subsample size on the design. We propose a generalization of the IBOSS method to quadratic regression which does not require prior knowledge of the distribution of the covariate and which performs remarkably well compared to the optimal subsampling design.
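For orientation, the classical IBOSS rule in one covariate keeps the \(k\) smallest and \(k\) largest covariate values; a toy sketch follows, together with a hypothetical quadratic variant that also retains central points (an illustrative guess at the flavour of such a generalization, not the authors' rule):

```python
import numpy as np

def iboss_1d(x, k):
    """Classical IBOSS in one covariate: keep the k smallest and k largest
    x-values, mimicking the two-end structure of D-optimal designs for
    linear regression."""
    order = np.argsort(x)
    return np.concatenate([order[:k], order[-k:]])

def iboss_quadratic(x, k):
    """Hypothetical quadratic extension: the D-optimal design for quadratic
    regression on an interval puts mass at both ends and the midpoint, so
    also keep the k points closest to the sample median.
    (Illustrative only; not the authors' exact rule.)"""
    order = np.argsort(x)
    mid = np.argsort(np.abs(x - np.median(x)))[:k]
    return np.unique(np.concatenate([order[:k], order[-k:], mid]))

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
subsample = iboss_quadratic(x, k=500)   # indices of the selected subdata
```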

Nicholas Rios

George Mason University

Symmetric Order-Of-Addition Experiments

In an Order-of-Addition (OofA) experiment, the order in which \(m\) components are added to a system influences a response. The goal of an OofA experiment is to find an optimal permutation; i.e., one that maximizes or minimizes the response. Existing models for Order-of-Addition experiments, such as the Pairwise-Ordering (PWO) model, assume that ordering effects are asymmetric. This assumption is unsuitable for many problems in networking, such as the famous Travelling Salesman Problem, where the cost of traversing a network is the same moving forwards as it is backwards. In this scenario, a permutation can be viewed as a Hamiltonian path on an undirected graph. A model is proposed that uses the edges of Hamiltonian paths to represent the effect of a permutation. Optimal designs for this model are derived that only require a fraction of all possible Hamiltonian paths to be examined. The model is shown to be effective at finding optimal paths in a drone delivery problem.
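As a sketch of the edge-based coding described above (an illustrative encoding, not necessarily the authors' exact model matrix): each unordered pair \(\{i,j\}\) gets an indicator of whether \(i\) and \(j\) are adjacent in the sequence.

```python
from itertools import combinations
import numpy as np

def edge_model_row(perm):
    """Encode a permutation as the indicator vector of the undirected edges
    of its Hamiltonian path: the entry for pair {i,j} is 1 iff i and j are
    adjacent in perm."""
    m = len(perm)
    pairs = list(combinations(range(m), 2))
    edges = {frozenset(e) for e in zip(perm, perm[1:])}
    return np.array([1.0 if frozenset(p) in edges else 0.0 for p in pairs])

# Example with m = 4 components: the path 2-0-1-3 uses edges {0,2}, {0,1}, {1,3}.
row = edge_model_row((2, 0, 1, 3))
```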

Juan Rodríguez-Díaz

Universidad de Salamanca

Construction Of Maxi-Min Efficiency Designs

Maxi-min efficiency criteria are commonly applied to solve the issue of parameter dependence in non-linear problems. They can also be used to take into consideration several tasks expressed by different component-wise criteria. Maxi-min efficiency criteria are, however, difficult to manage because of their lack of differentiability. As a consequence, maxi-min efficiency designs are frequently built through heuristic and ad hoc algorithms, without the possibility of checking for their optimality. The main contribution of this study is to prove that the maxi-min efficiency optimality is equivalent to a Bayesian criterion, which is differentiable. In addition, we provide an analytic method to find the prior probability associated with a maxi-min efficient design, making feasible the application of the equivalence theorem. Two illustrative examples show how the proposed theory works.

Samuel Rosa

Comenius University Bratislava 

Mixed-Integer Linear Programming For Computing Optimal Designs

Because the optimal exact design problem is a discrete optimization problem, it can be solved by general methods of integer programming. One approach to speeding up the solution is to use specialized solvers, provided that a more structured formulation of the problem is known. For instance, for many design criteria a mixed-integer second-order cone programming formulation of the optimal exact design problem was proposed in Sagnol & Harman (2015). We show (see Harman & Rosa, 2023) that for some criteria this mathematical programming specialization can be taken even further. In particular, we provide a mixed-integer linear programming (MILP) formulation for optimal replication-free designs with respect to a wide class of criteria, including \(A\)-, \(I\)-, \(G\)- and \(MV\)-optimality. We also show that the MILP formulation can be extended to exact designs with replications and demonstrate some unique advantages of the MILP approach.
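Schematically, the replication-free exact design problem in question is the integer program

$$ \min_{n_1,\ldots,n_K}\ \Phi\Big(\sum_{i=1}^{K} n_i M_i\Big) \quad\text{subject to}\quad \sum_{i=1}^{K} n_i = N,\qquad n_i\in\{0,1\}, $$

where \(M_i\) is the elementary information matrix of the \(i\)-th candidate point and \(n_i\) indicates whether that point is selected (a schematic statement only; the specific reformulation that turns this into a MILP for the criteria listed above is the subject of the talk).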

Bill Rosenberger

George Mason University

Design And Inference For Enrichment Trials With A Continuous Biomarker

We describe the philosophical approaches to two-stage enrichment designs, in which a benefitting subpopulation is targeted in a second stage, after a first stage identifies the threshold of a predictive, continuous biomarker. The design issue we address is sample size estimation for the first and second stages, and the consequences of poorly estimating the threshold. These design issues are established based on an approach where the two stages are conducted and analyzed separately, and stage two is considered a confirmatory trial. Another approach is to combine the data from the two stages, and we demonstrate how to do that by testing two hypotheses simultaneously with test statistics that (we show) have an asymptotic normal distribution. While a bivariate normal model is used to give insights into the predictive nature of the biomarker, and to visualize some closed-form solutions, in principle other models can certainly be used (but perhaps yield fewer insights). As in many ongoing, long-term research projects, our work probably raises more questions than it answers!

Katrin Roth

Bayer AG

Challenges And Pitfalls In Applying Optimal Design Theory In Clinical Dose Finding Studies

In designing clinical dose finding studies, challenges arise not only through operational limitations like pre-defined tablet strengths, but also as different objectives need to be addressed and the planning assumptions are highly variable. For example, in trials using MCP-Mod one objective is to test for a dose-response relationship using a contrast test, while another objective is to estimate the dose-response relationship. Maximizing the power of the contrast test and maximizing the precision in estimating the dose-response curve generally require different optimality criteria and thus result in different optimal designs. Also, an efficient design across several candidate models needs to be found as the shape of the dose-response curve might be unknown. Even if a design that is efficient regarding both the contrast test and the estimation of several candidate models is chosen, deviations from the planning assumptions or just unfortunate data constellations can lead to unfavorable results. These may include convergence issues in estimating specific models, or large confidence intervals in case the chosen design was inefficient for the resulting estimated parameters. We will present an example of a real dose-finding study where we encountered such challenges and will provide some initial ideas how to improve the design in such situations.

Frank Röttger

Université de Genève

Optimal Designs For Discrete Choice Models Via Graph Laplacians

In this work, we connect design theory for discrete choice experiments with Laplacian matrices of connected graphs. We rewrite the \(D\)-optimality criterion in terms of Laplacians via Kirchhoff’s matrix tree theorem, and show that its dual has a simple description via the Cayley–Menger determinant of the Farris transform of the Laplacian matrix. This results in a drastic reduction of complexity and allows us to implement a gradient descent algorithm to find locally \(D\)-optimal designs. For the subclass of Bradley–Terry paired comparison models, we find a direct link to maximum likelihood estimation for Laplacian-constrained Gaussian graphical models. This implies that every locally \(D\)-optimal design is a rational function in the parameter when the design is supported on a chordal graph. Finally, we study the performance of our algorithm and demonstrate its application on real data.
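To make the Laplacian connection concrete, here is a minimal numerical sketch for the Bradley–Terry case, assuming the standard parametrisation in which a comparison of items \(i\) and \(j\) contributes the Fisher weight \(p_{ij}(1-p_{ij})\) to edge \(\{i,j\}\); by the matrix tree theorem the criterion value is the log of the weighted spanning-tree count. The gradient-descent machinery of the paper is not reproduced.

```python
import numpy as np
from itertools import combinations

def bt_laplacian(weights, probs):
    """Weighted graph Laplacian of a paired-comparison design: edge {i,j}
    carries design weight w_ij times the Bradley-Terry Fisher weight
    p_ij * (1 - p_ij)."""
    m = probs.shape[0]
    L = np.zeros((m, m))
    for i, j in combinations(range(m), 2):
        w = weights[i, j] * probs[i, j] * (1 - probs[i, j])
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

def local_log_D(weights, pi):
    """log D-criterion via Kirchhoff's theorem: log det of the Laplacian with
    one row and column deleted (= log weighted spanning-tree count)."""
    P = pi[:, None] / (pi[:, None] + pi[None, :])   # choice probabilities
    L = bt_laplacian(weights, P)
    return np.linalg.slogdet(L[1:, 1:])[1]

# Uniform design on all pairs of m = 4 items with worth parameters pi:
m = 4
w = np.full((m, m), 1.0 / (m * (m - 1) / 2))
val = local_log_D(w, np.array([1.0, 1.2, 0.8, 1.5]))
```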

Emma Rowlinson

University of Manchester

Sequential Bayesian Design Using A Laplace-Parameterised Policy

Policy-based approaches offer a promising direction for sequential Bayesian design. These approaches use a design policy, which maps from the current state of knowledge to the next proposed design. The design policy must first be trained by tuning its parameters to maximise the expected utility of the whole sequential experiment, referred to as the total expected information gain (total EIG). This can potentially enable better performance compared to traditional “myopic” or one-step look-ahead approaches.

A key question is how to represent the current state of knowledge, or equivalently, how to parameterise the design policy. Previous works have considered the use of neural network policies with relatively little structure. Our work explores the use of a Laplace approximation to the posterior to parameterise the current state of knowledge. This approach allows us to incorporate more refined structure into the policy, potentially offering clearer interpretability and more efficient policy training. Our research also explores whether this Laplace-parameterised policy achieves higher total EIG in sequential Bayesian design problems.

Motivated by recent results demonstrating that Laplace-based importance sampling gives highly accurate approximations to the EIG utility in Bayesian design, we use Approximate Laplace Importance Sampling (ALIS) to estimate the intractable total EIG utility.
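For orientation, a plain nested Monte Carlo EIG estimator is sketched below (a simpler, less efficient baseline than ALIS; all model functions are hypothetical placeholders):

```python
import numpy as np

def nested_mc_eig(design, sample_prior, simulate_y, log_lik,
                  n_outer=500, n_inner=200, seed=None):
    """Nested Monte Carlo estimate of the expected information gain:
    EIG(d) = E[ log p(y | theta, d) - log E_{theta'}[ p(y | theta', d) ] ]."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_outer):
        theta = sample_prior(rng)
        y = simulate_y(theta, design, rng)
        inner = np.array([log_lik(y, sample_prior(rng), design)
                          for _ in range(n_inner)])
        # Inner average computed in log space (log-sum-exp for stability).
        log_evidence = np.logaddexp.reduce(inner) - np.log(n_inner)
        total += log_lik(y, theta, design) - log_evidence
    return total / n_outer
```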

To implement our methodology, we have utilised PyTorch and its automatic differentiation package, torch.autograd. We apply the proposed approach to a location finding example, where we endeavour to determine the true locations of sources, given measurements of a signal. The poster presented will summarise progress on this work so far.

Eric Schoen

KU Leuven

Order-Of-Addition Orthogonal Arrays To Study The Effect Of Treatment Ordering

The sequence in which a set of \(m\) treatments is applied can be modeled by relative-position factors that indicate whether treatment \(i\) is carried out before or after treatment \(j\), or by the absolute position of treatment \(i\) in the sequence. A design with the same normalized information matrix as the design with all \(m!\) sequences is called an order-of-addition orthogonal array. In a recent paper, Peng, Mukerjee and Lin proved that such an array is \(D\)- and \(G\)-optimal for the main-effects model involving the relative-position factors. We prove that such designs are also \(I\)-optimal for this model and \(D\)-, \(G\)- and \(I\)-optimal for the first-order model in the absolute-position factors. We propose a methodology for a complete or partial enumeration of non-equivalent order-of-addition orthogonal arrays.
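For concreteness, a small sketch of the relative-position (pairwise-ordering) coding and of the benchmark information matrix of the full design with all \(m!\) sequences (assumed coding conventions; the enumeration methodology itself is not reproduced):

```python
from itertools import combinations, permutations
import numpy as np

def pwo_row(perm):
    """Relative-position coding of a sequence: factor z_{ij} = +1 if
    treatment i comes before treatment j, and -1 otherwise."""
    pos = {t: k for k, t in enumerate(perm)}
    return np.array([1.0 if pos[i] < pos[j] else -1.0
                     for i, j in combinations(sorted(perm), 2)])

def pwo_matrix(perms):
    return np.array([pwo_row(p) for p in perms])

# Normalized information matrix of the full design with all m! sequences;
# an order-of-addition orthogonal array reproduces this matrix in fewer runs.
m = 4
Z = pwo_matrix(list(permutations(range(m))))
X = np.hstack([np.ones((len(Z), 1)), Z])   # intercept + main effects
M_full = X.T @ X / len(Z)
```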

Kirsten Schorning

Technische Universität Dortmund

\(A\)-Optimal Designs For State Estimation In Networks

We consider two models for estimating the expected states of nodes in networks where the observations at the nodes are given by random states plus measurement errors. In the first model, we assume independent successive observations at the nodes, and the design question is how often each node should be observed to obtain a precise estimate of the expected states. In the second model, all nodes are observed simultaneously, and the design question is to determine which nodes require higher measurement precision than others. Both models lead to the same design problem. We derive explicit \(A\)-optimal designs for the simplest network, the star configuration. Moreover, we consider the network with wheel configuration and derive some conditions which simplify the numerical calculation of the corresponding \(A\)-optimal designs.

Dasha Semochkina

University of Southampton

Optimal Designs For Nonlinear State-Space Models With Applications In Chemical Manufacturing

One of the first stages in chemical manufacturing is to create and calibrate a realistic model of the process of interest. Many chemical reactions can be expressed as nonlinear state-space models, which are also widely used in other areas, including statistics, econometrics, information engineering and signal processing. State-space models depend on parameters to be calibrated and on some control parameters. We address the problem of systematic optimal experimental design for the control parameters for this class of models. We construct locally \(D\)-optimal designs by incorporating the calculation of the determinant of the Fisher information matrix. This allows us to identify a set of control parameters such that, if experiments are run under those conditions, the remaining parameters can be estimated with the highest possible precision.

Pál Somogyi

Comenius University Bratislava 

A Randomized Exchange Algorithm For Problems With General Atomic Information Matrices

The optimal design algorithm REX (see Harman et al, 2020) is simple and efficient, yet it is limited to the basic optimality criteria and rank-one elementary information matrices. This contribution proposes a generalization of the REX algorithm, expanding its capabilities to compute optimal approximate designs with respect to all Kiefer’s optimality criteria and elementary information matrices of any rank. In addition to the generalization of the REX algorithm, we also present some applications of optimal design problems with general-rank elementary information matrices, such as multivariate response models and optimal augmentation of designs. Numerical results confirm the stability and rapid convergence of the proposed algorithm.

Harman, R., Filová, L.,Richtárik, P. (2020): A Randomized Exchange Algorithm for Computing Optimal Approximate Designs of Experiments, Journal of the American Statistical Association, Vol. 115, No. 529, 348-361.

Bernhard Spangl

Universität für Bodenkultur Wien

How To Determine The Minimal Sample Size In Balanced 3-Way ANOVA Models Where No Exact \(F\)-Test Exists

We consider balanced three-way ANOVA models to test the hypothesis that the fixed factor \(A\) has no effect. The other factors are fixed or random. For most of these models (including all balanced 1-way and 2-way ANOVA models) an exact \(F\)-test exists. Details on the determination of the minimal sample size and on an in-depth structural result can be found in Spangl et al. (2023).

For the two models $$ A \times \boldsymbol{B} \times \boldsymbol{C} \qquad\text{and}\qquad (A \succ \boldsymbol{B}) \times \boldsymbol{C} $$ (bold letters indicate random factors), however, an exact \(F\)-test does not exist. Approximate \(F\)-tests can be obtained by Satterthwaite’s approximation. The approximate \(F\)-test involves mean squares that have to be simulated. To approximate the power of the test, we simulate data such that the null hypothesis is false and compute the rate of rejections. This rate then approximates the power of the test.
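As a rough illustration of the simulation scheme just described (a generic Monte Carlo power estimate; the data generator and the Satterthwaite-based test are hypothetical placeholders):

```python
import numpy as np

def simulated_power(simulate_data, test_rejects, n_sim=10_000, seed=None):
    """Estimate the power of a test by simulation: draw data under a chosen
    (false) null hypothesis, apply the test, and return the rejection rate."""
    rng = np.random.default_rng(seed)
    rejections = sum(test_rejects(simulate_data(rng)) for _ in range(n_sim))
    return rejections / n_sim

# Here `simulate_data` would generate balanced three-way ANOVA data with the
# chosen variance components, and `test_rejects` would apply the approximate
# F-test with Satterthwaite's degrees of freedom; both are placeholders.
```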

In this talk we aim to determine the minimal sample size of the two models mentioned above, given a prespecified power, and we

  • give a heuristic that the number of replicates \(n\) should be kept small \((n=2)\). This suggestion is backed by all simulation results. This leaves the numbers of levels of the random factors \(B\) and \(C\) to determine the sample size.
  • determine the active and inactive variance components for both ANOVA models by using a surrogate fractional factorial model with variance components as factors.
  • determine the worst combination of active variance components for both models by using a surrogate response surface model based on a Box-Behnken design. The special structure of the Box-Behnken design ensures that the used models have similar total variance.

Additionally, we propose three practical methods that help reduce the number of simulations required to determine the minimal sample size.

The first method searches for models with minimal sample size in the proximity of a preselected ray on the grid spanned by the numbers of levels of the random factors \(B\) and \(C\).

The second method optimizes the sample size subject to a prespecified power by using response surface methods.

The third method reconstructs the power surface by fitting a parametric model to simulated empirical power values. This reconstruction is then used to determine the model with minimal sample size given a prespecified power.

We compare the proposed methods, present some examples and, finally, give recommendations about which method to choose.

Spangl, B., Kaiblinger, N., Ruckdeschel, P. & Rasch, D. (2023). Minimal sample size in balanced ANOVA models of crossed, nested, and mixed classifications. Communications in Statistics – Theory and Methods, 52(6), 1728–1743.

Arno Strouwen

Adaptive And Robust Experimental Design For Linear Dynamic Models Using The Kalman Filter

Current experimental design techniques for dynamical systems often only incorporate measurement noise, while dynamical systems also involve process noise. To construct experimental designs we need to quantify their information content. The Fisher information matrix is a popular tool to do so. Calculating the Fisher information matrix for linear dynamical systems with both process and measurement noise involves estimating the uncertain dynamical states using a Kalman filter. The Fisher information matrix, however, depends on the true but unknown model parameters. In this presentation we combine two methods to solve this issue and develop a robust experimental design methodology. First, Bayesian experimental design averages the Fisher information matrix over a prior distribution of possible model parameter values. Second, adaptive experimental design allows for this information to be updated as measurements are being gathered. This updated information is then used to adapt the remainder of the design.
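As a sketch of the Bayesian-averaging step described above (only the averaging over the prior is shown; the function computing the Fisher information matrix via the Kalman filter is a hypothetical placeholder):

```python
import numpy as np

def bayesian_d_criterion(design, fim, prior_draws):
    """Monte Carlo Bayesian D-criterion: average the log-determinant of the
    Fisher information matrix over draws from the parameter prior.

    `fim(design, theta)` is assumed to return the information matrix for a
    given design and parameter value, e.g. computed by running a Kalman
    filter through the experiment (not implemented here)."""
    logdets = [np.linalg.slogdet(fim(design, theta))[1] for theta in prior_draws]
    return float(np.mean(logdets))
```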

John Stufken

George Mason University

Treess: A Model-Free Tree-Based Subdata Selection Method For Prediction

With ever larger datasets, there is a growing need for methods that select just a small portion of the entire dataset (subdata) so that reliable inferences can be obtained by analyzing only the selected subdata. Many of the subdata selection methods that have been proposed in recent years are based on model assumptions for the data. While these methods can work extremely well when the model assumptions hold, they may yield poor results if the assumptions are wrong. In addition, subdata that is good for one task may not be so good for another. In this presentation we introduce and discuss a model-free Tree-based Subdata Selection method (TreeSS) that focuses on selecting subdata that performs well for prediction.

Sergey Tarima

Medical College of Wisconsin

Group Sequential Tests - Beyond Exponential Family

We consider group sequential tests powered for multiple ordered alternative hypotheses with a predetermined \(\alpha\)-spending function. Theorem 1 shows that if a fixed-sample-size likelihood ratio test is monotone with respect to a one-dimensional test statistic, then a group sequential test composed of interim cumulative likelihood ratio tests continues to be monotone. This group sequential test is most powerful at a given \(\alpha\)-spending function and predetermined interim sample sizes. We work directly with heavy-tailed data, which provides an enhanced opportunity for the application of group sequential designs in studies of computer network traffic, high-frequency trading, risk management and insurance, where the use of heavy-tailed distributions is common. This theorem extends our previous results (Metrika, 2022) from the exponential family to non-exponential distributions with monotone likelihood ratio.

When the likelihood ratio is not monotone for finite sample sizes, locally most powerful tests can be constructed if a test is locally most powerful for a fixed sample size against a local alternative. A two-stage Cauchy example is used to show how to build such tests using either a likelihood ratio or an MLE.

Fisher information decomposes into design and sampling components. The design component is lost to the likelihood. We describe how this affects the Cramér–Rao lower bound for post-testing estimators.

In summary, if a one-parameter distribution for the data is either known or assumed, MLE-based group sequential tests allow for separately powered multiple ordered alternatives that are most powerful for this set of hypotheses in either finite-sample or local asymptotic settings. Such constructions can be used to mitigate the over- and under-powering associated with alternative hypotheses that may be mis-specified at the design stage.

Chiara Tommasi

Università degli Studi di Milano

Connection Between Likelihood Tests And Discrimination Designs

The goal of this study is to show the connection between hypothesis testing and optimal design for discriminating between homoscedastic and heteroscedastic regression models. More specifically, we consider the following heteroscedastic regression model for the response \(Y\):
$$ y_i=\eta(\boldsymbol{x}_i;\boldsymbol{\beta})+\varepsilon_i,\qquad \varepsilon_i\sim N\big(0,\,\sigma^2 h(\boldsymbol{x}_i;\boldsymbol{\gamma})\big),\qquad i=1,\ldots,n, $$ where \(\eta(\boldsymbol{x}_i;\boldsymbol{\beta})\) is the mean response, \(\boldsymbol{\beta}\) is a vector of regression coefficients and \(\sigma^2 h(\boldsymbol{x}_i;\boldsymbol{\gamma})\) is the error variance, depending on an unknown constant \(\sigma^2\) and a continuous positive function \(h(\cdot;\cdot)\), completely known except for a parameter vector \(\boldsymbol{\gamma}\in \mathbb{R}^s\). Let \(\boldsymbol{\gamma}_0\) be such that \(h(\boldsymbol{x};\boldsymbol{\gamma}_0) = 1\) (homoscedastic case). To test \(H_0:\, \boldsymbol{\gamma}=\boldsymbol{\gamma}_0\) against a local alternative \(H_1:\, \boldsymbol{\gamma} = \boldsymbol{\gamma}_0 + \boldsymbol{\lambda}/\sqrt{n}\) (with \(\boldsymbol{\lambda}\neq \boldsymbol{0}\)), a likelihood-based test (such as the log-likelihood ratio, score or Wald statistic) is usually applied.

We aim at designing an experiment with the goal of maximizing (in some sense) the asymptotic power of a likelihood-based test. Only a few papers relate optimal design to hypothesis testing (see, for instance, Dette and Titoff (2009) and the references therein); herein, we justify the use of the \(D_s\)-criterion and of KL-optimality (López-Fidalgo, Tommasi and Trandafir, 2007) to design an experiment with the inferential goal of checking for heteroscedasticity. Both the \(D_s\)- and KL-criteria are proved to be related to the noncentrality parameter of the asymptotic chi-squared distribution of a likelihood test.
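Schematically, by standard local asymptotics, each of these statistics converges under \(H_1\) to a noncentral chi-squared distribution with \(s\) degrees of freedom and noncentrality parameter

$$ \delta(\xi) = \boldsymbol{\lambda}^\top\Big(I_{\gamma\gamma}(\xi) - I_{\gamma\beta}(\xi)\, I_{\beta\beta}^{-1}(\xi)\, I_{\beta\gamma}(\xi)\Big)\boldsymbol{\lambda}, $$

the Schur complement of the \(\gamma\)-block of the per-observation Fisher information under the design \(\xi\); maximizing the asymptotic power of the test thus amounts to maximizing \(\delta(\xi)\) over designs.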

References
Dette, H. and Titoff, S. (2009). Optimal discrimination designs. The Annals of Statistics, 37(4), 2056–2082.
López-Fidalgo, J., Tommasi, C. and Trandafir, P. C. (2007). An optimal experimental design criterion for discriminating between non-normal models. Journal of the Royal Statistical Society Ser. B, 69(2), 231–242.

Camelia Trandafir

Universidad Pública de Navarra

Adopting Tolerance Region For The Calibration Problem

It is known that the simple linear calibration problem is based on the evaluation of the “future calibrating value”, which in practice means obtaining an optimal ratio estimate under different optimality criteria. Several approaches have been proposed to address the problem. For these “future” observations, however, a high proportion is required to fall within a certain interval with a given probability level, which is exactly the definition of a tolerance region (TR). There are different lines of thought on tolerance regions, and under each approach different solutions are offered for the calibration problem; these are presented in this paper.

Dariusz Uciński

University of Zielona Góra 

Convex Relaxation For Sensor Selection In The Presence Of Correlated Measurement Noise

Best sensor selection is of paramount importance when monitoring spatiotemporal phenomena using large-scale sensor networks. A technique of this type is developed for maximizing the parameter estimation accuracy when the system in question is modelled by a partial differential equation and the measurement noise is correlated. The weighted least squares method is used for estimation, and the trace of the covariance matrix of the resulting estimator is adopted as the measure of estimation accuracy. This design criterion is to be minimized by choosing a set of spatiotemporal measurement locations from among a given finite set of candidate locations. To make this combinatorial problem computationally tractable, its relaxed formulation is considered. The pivotal role here is played by the decomposition of the covariance kernel matrix into the sum of a positive matrix and a scalar multiple of the identity matrix. Optimal solutions are found using an efficient simplicial decomposition which alternates between updating the design weights using a version of the multiplicative algorithm and computing a closed-form solution to a simple linear programming problem. The sequence of iterates monotonically decreases the value of the original convex design criterion. As the resulting relaxed solution is a measure on the set of candidate measurements and not a specific subset, randomization and a restricted exchange algorithm are used to convert it into a nearly optimal subset of selected sensors. A simulation experiment is reported to validate the proposed approach.
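For orientation, the classical multiplicative algorithm in its textbook \(D\)-optimal form is sketched below (the talk uses a version adapted to the trace criterion under correlated noise, which is not reproduced here):

```python
import numpy as np

def multiplicative_d(F, n_iter=500):
    """Classical multiplicative algorithm for approximate D-optimal weights
    on a finite candidate set: w_i <- w_i * f_i' M(w)^{-1} f_i / p,
    where M(w) = sum_i w_i f_i f_i'. The weights stay normalized because
    sum_i w_i f_i' M^{-1} f_i = trace(I_p) = p."""
    n, p = F.shape
    w = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        M = F.T @ (w[:, None] * F)
        d = np.einsum('ij,jk,ik->i', F, np.linalg.inv(M), F)  # f_i' M^-1 f_i
        w = w * d / p
    return w

# Quadratic regression on a grid: the weights concentrate near -1, 0 and 1.
x = np.linspace(-1, 1, 101)
F = np.column_stack([np.ones_like(x), x, x**2])
w = multiplicative_d(F)
```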

Sander van Cranenburgh

TU Delft

On The Impact Of Decision Rule Assumptions In Experimental Designs On Preference Recovery

Efficient experimental designs aim to maximise the information obtained from stated choice data to estimate discrete choice models’ parameters statistically efficiently. Almost without exception, efficient experimental designs are based on the assumption that decision-makers use a Random Utility Maximisation (RUM) decision rule. When using such designs, researchers (implicitly) assume that the decision rule used to generate the efficient design has no impact on respondents’ choice behaviour. However, recent research has called this assumption into question. This study investigates whether the decision rule assumption underlying an experimental design affects respondents’ choice behaviour. To do so, we conducted four stated choice experiments, two based on experimental designs optimised for utility maximisation and the other two based on experimental designs optimised for a mixture of RUM and Random Regret Minimisation (RRM). We evaluate the model fits of RUM and RRM models across the four data sets and investigate whether some choice tasks particularly invoke RUM or RRM decision rules. For the latter, we develop a new sampling-based approach that avoids the confounding between preference and decision rule heterogeneity that plagues latent class-based methods. We find no evidence that RUM-optimised designs particularly invoke RUM-consistent choice behaviour. However, we find compelling evidence that some choice tasks invoke RUM-consistent behaviour while others invoke RRM-consistent behaviour. This implies that respondents’ choice behaviour and choice modelling outcomes are not exogenous to the choice tasks selected by the choice modeller.

Alan Vazquez

University of Arkansas

Constructing Large OMARS Designs By Concatenating Definitive Screening Designs

Orthogonal minimally aliased response surface (OMARS) designs permit the screening of quantitative factors at three levels using an economical number of runs. In these designs, the main effects are orthogonal to each other and to the quadratic effects and two-factor interactions of the factors, and these second-order effects are never fully aliased. Complete catalogs of OMARS designs with up to seven factors have been obtained using an enumeration algorithm. However, the algorithm is computationally demanding for constructing good OMARS designs with many factors and runs. To overcome this issue, we propose a construction method for large OMARS designs that concatenates two definitive screening designs. The method ensures the core properties of an OMARS design and improves the good statistical features of its parent designs. The concatenation employs an algorithm that minimizes the aliasing among the second-order effects using foldover techniques and column permutations for one of the parent designs. We study the statistical properties of the new OMARS designs and compare them to alternative designs in the literature. Our method increases the collection of OMARS designs for practical applications.
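A toy version of the concatenation search might look as follows (a simple random search over column-wise sign switches, i.e. foldovers, and column permutations of the second parent; the authors' actual algorithm is more sophisticated):

```python
import numpy as np

def second_order_cols(D):
    """Columns for quadratic effects and two-factor interactions of design D."""
    m = D.shape[1]
    quad = D**2
    ints = [D[:, i] * D[:, j] for i in range(m) for j in range(i + 1, m)]
    return np.column_stack([quad] + ints)

def aliasing(D):
    """Sum of squared off-diagonal correlations among second-order effects."""
    X = second_order_cols(D)
    sd = X.std(axis=0)
    R = np.atleast_2d(np.corrcoef(X[:, sd > 1e-12], rowvar=False))
    return np.sum(np.triu(R, 1) ** 2)

def concatenate_dsd(D1, D2, n_tries=2000, seed=None):
    """Random search over sign switches and column permutations of the second
    parent design, keeping the concatenation with the least aliasing.
    D1 and D2 are (runs x factors) three-level matrices with entries in
    {-1, 0, +1}."""
    rng = np.random.default_rng(seed)
    m = D1.shape[1]
    best, best_val = None, np.inf
    for _ in range(n_tries):
        signs = rng.choice([-1, 1], size=m)
        perm = rng.permutation(m)
        D = np.vstack([D1, signs * D2[:, perm]])
        val = aliasing(D)
        if val < best_val:
            best, best_val = D, val
    return best, best_val
```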

Ziyan Wang

University of Southampton

Bayesian Sequential Multi-Arm Multi-Stage Design With Time Trend Effect

Multi-arm multi-stage (MAMS) designs evaluate multiple treatments simultaneously, resulting in a more efficient and ethical drug development process compared to two-arm trials. Randomization, a fundamental aspect of trial design, ensures unbiased treatment allocation. In this study, we explore the impact of different randomization methods within the Bayesian sequential MAMS design context. Furthermore, the time trend problem poses a critical challenge in MAMS trials. Understanding the potential changes in treatment effects over time is essential for valid trial outcomes. To address this issue, we investigate the influence of time trends on Bayesian MAMS designs using adaptive randomization. We evaluate various existing models that incorporate adjustments for the time trend effect in Bayesian MAMS designs. By integrating these models into the analysis, we can mitigate the impact of time trends. Our study highlights the importance of selecting appropriate randomization methods and employing models that account for time trends in Bayesian sequential MAMS trials. These considerations enhance the efficiency and ethical conduct of the trial.

HaiYing Wang

University of Connecticut

Scale-Invariant Optimal Sampling And Variable Selection With Rare-Events Data

Subsampling is a particularly effective approach to solving the computational challenges posed by massive rare-events data, offering the possibility of a significant reduction in computational burden with little sacrifice in asymptotic estimation efficiency. In case of estimation efficiency loss due to overly aggressive subsampling, an optimal subsampling method can help minimize the information loss. However, optimal subsampling has never been investigated in the context of variable selection. Existing optimal subsampling probabilities based on \(A\)- and \(L\)-optimality depend on the scale of the covariates and may produce inconsistent results for the same data under different scale transforms. This scale-dependence issue may cause more serious problems in variable selection when there are inactive covariates, because the contribution of the inactive covariates may be arbitrarily amplified if an inappropriate scale transform is used. To resolve this issue and fill the aforementioned gap in the literature, we investigate variable selection for rare-events data. We first prove the oracle properties of the full-data adaptive lasso estimator with massive rare-events data, which justifies the use of subsampling controls. We then propose a scale-invariant optimal subsampling function to minimize the prediction error of the inverse probability weighted (IPW) adaptive lasso. Both the optimal subsampling function and the adaptive lasso require a pilot estimator, and the two procedures are naturally integrated. We also propose an estimator based on maximum sampled conditional likelihood with an adaptive lasso penalty to further improve the estimation efficiency. The oracle properties of the proposed estimator are also investigated. Numerical experiments based on simulated and real data are carried out to investigate the performance of the proposed methods.

Xiaojian Xu

Brock University

Implementation Strategies For Model-Robust Designs And Active Learning

We discuss implementation strategies for model-robust experimental designs and active learning, which protect against possible model departures within an \(L_2\)-type neighbourhood. Accounting for both variance minimization and bias reduction, the minimax efficient designs necessarily have to be absolutely continuous. In the present study, we propose two new implementation methods: one is deterministic, a “cluster design”, whereas the other is non-deterministic, a “randomized design”. We compare these implementation strategies with other existing ones in the literature.

Min Yang

University of Illinois Chicago

Deriving Nearly Optimal Subdata

Big data brings unprecedented challenges for analysis due to its extraordinary size. One strategy for analyzing such massive data is data reduction: instead of analyzing the full dataset, a selected subdata set is analyzed. Various subdata selection methods have been proposed, often based on the characterization of an optimal design. There are, however, significant differences between optimal design and subdata selection: (i) an optimal design point may not exist in a given full data set, and (ii) while a point can be selected multiple times in an optimal design, it can be selected only once in subdata selection. While the trade-off between computational complexity and statistical efficiency has been studied, little is known about how efficient the selected subdata is in terms of statistical efficiency. To answer this question, we need to find an optimal subdata. Deriving an optimal subdata, however, is an NP-hard problem. In this talk, a novel framework for deriving nearly optimal subdata, under any given statistical model and regardless of optimality criterion or parameters of interest, will be introduced. This framework has three benefits: (i) it reveals the structure of nearly optimal subdata for any given full data under various set-ups (model, optimality criterion, parameter of interest); (ii) it measures statistical efficiency with high accuracy; and (iii) it provides a tool for deriving a nearly optimal subset in active learning, where statistical efficiency is the main concern.

Noha Youssef

The American University in Cairo

Bayesian Optimal Design Using INLA

Constructing Bayesian optimal designs for generalized linear models and non-linear ones is a rich area of research. In such cases, optimizing the utility functions requires the evaluation of intractable integrals. In this work we propose utilizing INLA (the Integrated Nested Laplace Approximation) to obtain the Bayesian optimal design. INLA is an R package for approximating posterior distributions and is used as an alternative to MCMC in numerous applications.

To assess the performance of INLA for finding the Bayesian optimal design, we compare it with acebayes, another R package, which uses the approximate coordinate exchange algorithm along with MCMC techniques for design selection. Illustrative examples will be presented throughout this work to assess the performance of both approaches.

Anatoly Zhigljavsky

University of Cardiff

OLSE And BLUE For The Location Model And Energy Minimization

In this talk, we will make connections between the following areas: (a) simple and ordinary kriging with kernel \(K\) for prediction of values of a random field indexed by a set \(X\); (b) energy minimization for \(K\); and (c) parameter estimation with the Ordinary Least Squares Estimator (OLSE) and the Best Linear Unbiased Estimator (BLUE) in the location model with observations whose correlation is defined by \(K\). All three areas are well studied in the modern literature. Although both kriging and design for correlated observations are well-known and well-studied areas in the DOE community, the effect of replacing the OLSE with the BLUE as the parameter estimator in kriging prediction models has not been adequately addressed in the DOE literature. Likewise, there is much research on energy minimization, but the implications of this well-researched field for the properties of the OLSE and the BLUE in regression models have not been much discussed in the DOE literature. The present paper aims at closing these gaps by concentrating fully on the interplay between the three areas above. We emphasize the special role of the constant function and illustrate our results with examples. At the end of the talk, several conjectures will be formulated.
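A minimal numerical illustration of the OLSE/BLUE contrast in the location model (standard formulas, assuming a known covariance kernel \(K\)):

```python
import numpy as np

def olse_blue_location(y, K):
    """Location model y = theta * 1 + eps with Cov(eps) = K: the OLSE is the
    sample mean, while the BLUE is the generalized least squares estimator
    (1' K^{-1} y) / (1' K^{-1} 1)."""
    one = np.ones(len(y))
    Kinv_one = np.linalg.solve(K, one)
    olse = y.mean()
    blue = Kinv_one @ y / (Kinv_one @ one)
    return olse, blue

# Exponential kernel on [0, 1]: the two estimators generally differ.
x = np.linspace(0, 1, 50)
K = np.exp(-10 * np.abs(x[:, None] - x[None, :]))
rng = np.random.default_rng(0)
y = 2.0 + np.linalg.cholesky(K) @ rng.standard_normal(50)
print(olse_blue_location(y, K))
```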