Past projects
A selected list of Dr. Yuhua Su's statistical consulting and data analysis projects
Statistics and data analysis for Doctor of Nursing Practice (DNP) capstone project
Two-sample t-tests and paired t-tests were performed for a Doctor of Nursing Practice (DNP) capstone project.
Multivariate data analysis in nursing research
Canonical correlation analysis was performed. A comprehensive analysis report was created. Radar plots were also used to illustrate analysis results.
Diagnostic test evaluation
Sensitivity, specificity, positive predictive value, negative predictive value, and classification accuracy were computed for a screening tool. Exact (Clopper-Pearson) confidence intervals were computed for each estimate.
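As an illustration only, the sketch below shows one way such measures and their exact intervals can be computed from a 2 x 2 table. The function names (binom_cdf, clopper_pearson, diagnostic_summary) and the counts are hypothetical and are not taken from the project; the interval is found by bisection on the binomial distribution, which is equivalent to the usual beta-quantile formulation.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for the proportion x / n."""

    def solve(k, target):
        # binom_cdf(k, n, p) decreases in p, so bisect for cdf == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= x | p) = alpha / 2, i.e. P(X <= x - 1 | p) = 1 - alpha / 2.
    lower = 0.0 if x == 0 else solve(x - 1, 1 - alpha / 2)
    # Upper bound: P(X <= x | p) = alpha / 2.
    upper = 1.0 if x == n else solve(x, alpha / 2)
    return lower, upper


def diagnostic_summary(tp, fp, fn, tn):
    """Return {measure: (estimate, lower, upper)} for a 2 x 2 table."""
    counts = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
        "accuracy": (tp + tn, tp + fp + fn + tn),
    }
    return {
        name: (x / n, *clopper_pearson(x, n)) for name, (x, n) in counts.items()
    }


# Hypothetical counts, for illustration only.
summary = diagnostic_summary(tp=45, fp=10, fn=5, tn=90)
for name, (est, lo, hi) in summary.items():
    print(f"{name}: {est:.3f} ({lo:.3f}, {hi:.3f})")
```

Exact intervals of this kind are conservative, so their actual coverage is at least the nominal level.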
Survival analysis of breast cancer patients
Various survival analyses, such as the Kaplan-Meier method, log-rank tests, and the Cox proportional hazards model, were performed. High-resolution charts were created.
Statistics and psychology
In this statistical consulting project, chi-square tests of independence, two-sample t-tests, repeated-measures ANOVAs with interaction effects, Pearson's correlations, and ANCOVAs were performed to determine the impacts of an intervention on prospective memory performance.
Statistics and psychology
To find the best predictors of personal adjustment, Pearson’s correlation coefficients, simple linear regressions, and multiple linear regressions with variable selection procedures (forward selection, backward elimination, and stepwise regression) were performed.
Statistics and social work
Regression modeling was used to investigate the risk factors associated with domestic violence.
Statistics and political science
Linear mixed-effects models were used to investigate the relationship between community engagement and student learning performance.
Statistics and clinical trials
The two one-sided tests procedure was performed to test statistical hypotheses of equivalence.
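The logic of the two one-sided tests (TOST) procedure can be sketched as follows. This is a minimal illustration, not the analysis used in the project: it assumes two independent samples and uses a large-sample normal approximation in place of the t distribution so that it runs with the standard library alone. The function name tost_z, the margin delta, and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def tost_z(a, b, delta, alpha=0.05):
    """Large-sample TOST for equivalence of two means within (-delta, +delta).

    Returns (difference in means, TOST p-value, equivalence declared?).
    """
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = NormalDist()
    # Test 1, H0: diff <= -delta. Reject when (diff + delta) / se is large.
    p_lower = 1 - z.cdf((diff + delta) / se)
    # Test 2, H0: diff >= +delta. Reject when (diff - delta) / se is small.
    p_upper = z.cdf((diff - delta) / se)
    # Equivalence is declared only if both one-sided tests reject.
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha


# Hypothetical measurements, for illustration only.
group_a = [10.0, 10.2, 9.8, 10.1, 9.9] * 6
group_b = [10.0, 10.2, 9.8, 10.1, 9.9] * 6
print(tost_z(group_a, group_b, delta=0.5))
```

Unlike a conventional test of difference, the null hypothesis here is non-equivalence, so a small p-value supports the claim that the means differ by less than the margin.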
Statistics and medical research
Latent class growth modelling (LCGM) was performed using the SAS procedure PROC TRAJ to identify trajectories of hemoglobin A1C over the study period.
Statistics and econometrics
Vector autoregressive (VAR) models were fit to determine long-run and short-run relationships among variables.
Survey instruments validation
Exploratory factor analysis and confirmatory factor analysis were performed to determine underlying constructs for a set of measured variables.
Statistics and education
The academic achievement data were analyzed using two-stage least squares (2SLS) estimation for simultaneous equations.
Statistical quality control in industrial engineering
Various methods for statistical quality control were applied. High-resolution control charts were created.
Production process improvement using six sigma methodologies
Six sigma process improvement methods were utilized. High-resolution figures were created.
Meta-analysis in medical research
Mixed-effects models for meta-analysis were fit. High-resolution forest plots were created.
Repeated measures analysis
Repeated measures ANOVA, MANOVA, and mixed effects models were performed for education survey data.
Statistics and laboratory quality control
A two-sample t-test was used to compare the means of the pre- and post-laboratory readings. Levey-Jennings charts were created to investigate whether the data signal the presence of a special cause that requires immediate investigation. Correlation analysis was also performed.
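A Levey-Jennings chart plots each control reading against the mean and the mean plus or minus 1, 2, and 3 standard deviations of a baseline set. The sketch below shows only the flagging logic behind such a chart. The function name, the "warning" and "action" labels, and the readings are hypothetical, and the simple 2 SD and 3 SD thresholds are an assumption rather than the rule set used in the project.

```python
from statistics import mean, stdev


def levey_jennings_flags(baseline, readings):
    """Classify readings against limits estimated from baseline controls."""
    center = mean(baseline)
    sd = stdev(baseline)
    flags = []
    for value in readings:
        z = (value - center) / sd  # distance from the center line in SD units
        if abs(z) > 3:
            flags.append("action")       # beyond the 3 SD limit
        elif abs(z) > 2:
            flags.append("warning")      # between the 2 SD and 3 SD limits
        else:
            flags.append("in control")
    return center, sd, flags


# Hypothetical control readings, for illustration only.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
center, sd, flags = levey_jennings_flags(baseline, [101, 105, 107])
print(center, sd, flags)
```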
Moderator and mediator analysis
The procedures used in this project to determine mediation and moderation effects are described in Baron and Kenny (1986).
Time series analysis on medical data
Interrupted time series (ITS) analysis was conducted to investigate factors associated with in-hospital mortality.
Qualitative data analysis in education
Thematic analysis was conducted.
Variability assessment using the bootstrap method
Statistical analysis for data from an online survey
Factor analysis was conducted to understand the patterns and relationships among the variables of interest. The Χ² test of independence, Fisher's exact test, and the Cochran-Armitage trend test were performed.
Statistical analysis for observational data in the medical field
Binary logistic regression and proportional odds models were fit to identify factors that may influence the outcome variables of interest.
Statistics and education survey data
Group perceptions of a specific education workshop were compared. Survey data were summarized in R x C cross-tabulation tables. The Χ² test of independence and Fisher's exact test were performed. The gamma coefficient was used to evaluate the strength of dependence. Cronbach's alpha was used as a measure of the internal consistency (reliability) of the survey instrument. Cumulative logit models were fit for ordinal responses. Factor analysis was used to evaluate construct validity. At the client's request, t-tests, simple linear regression, and correlation analysis were also performed.
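For reference, Cronbach's alpha for k items is k / (k - 1) times one minus the ratio of the summed item variances to the variance of the total score. The short sketch below applies that formula. The function name and the scores are hypothetical and are not the survey data from this project.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: k lists, one per survey item, each with one score per respondent."""
    k = len(items)
    item_var_sum = sum(variance(scores) for scores in items)
    # Total score per respondent, summed across the k items.
    totals = [sum(resp) for resp in zip(*items)]
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical responses from five respondents to three items.
scores = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
print(round(cronbach_alpha(scores), 3))
```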
Statistics and biology
Generalized linear mixed-effects models were fit to test various hypotheses. High-resolution SAS plots were generated.
Statistics and pricing strategy
Generalized linear models were fit to model the relationship between prices and attendance. High-resolution SAS plots were generated.
Data transformation and regression diagnostics
In this project, regression diagnostics were performed before and after data transformation; the true relationship between two variables was uncovered.
Logistic regression models
In this project, three types of analyses were performed. A baseline-category logit model for multicategory nominal response variables was proposed to investigate the relationship between one response variable and two explanatory variables (nominal and continuous). A multiple logistic regression for dichotomous responses was conducted. A multiple regression model was used to assess the relationship between dependent and independent variables (multicollinearity was detected and variable selection was performed).
Two-way ANOVA
A two-way analysis of variance (ANOVA) was conducted to evaluate the effect of a new method of sequential care on patient clinic (a School of Nursing project).
Statistics and marketing
A simulation study was conducted to investigate the data distribution for a market segmentation study. Goodness-of-fit tests (Χ² test statistics, likelihood ratio test statistics, and Kolmogorov-Smirnov test statistics) and Monte Carlo simulation for p-values were performed.
Statistics and business decision making
Descriptive data analysis, simple linear regression, multiple regression, time series forecasting, and statistics-based project management were performed.
Pre-treatment and post-treatment design
A paired t-test and analysis of covariance (ANCOVA) were performed to compare pre-treatment and post-treatment results and to identify variables associated with post-treatment outcomes.
An assessment of training needs for the lumber manufacturing industry in the eastern United States
Denig, J., Page, S., Su, Y. and Martinson, K. (2008)
Statistical analysis (descriptive statistics, frequency counts, ANOVA, F statistics, histograms, bar charts, etc.) was conducted for the training needs assessment of the primary forest products industry in 33 eastern states.
Variable selection using regression trees
Based on the data collected, an engineering company wanted to predict whether a particular machine is more likely to fail in the field. Regression trees were used to identify factors associated with the production of faulty units. The reliability of the algorithms that predict field failures was evaluated, and a written report was provided. This statistical consulting project received the Francis G. Giesbrecht award (Department of Statistics, NCSU) in 2006.