DSI Collaborations

Do you have a challenging research question or complex dataset that you think would benefit from our data science expertise? If so, contact us at datascience@ucdavis.edu! We're keen to think about ambitious projects that push the envelope - whether that's within your discipline, across disciplines, or within the growing field of data science itself. We have human and computational resources at the DSI to allocate to compelling research challenges, ranging from data acquisition, visualization and analysis to improving computational efficiency for big data and/or intensive methodology.

We are especially interested in interdisciplinary problems that involve researchers from across UC Davis. In the true meaning of collaboration, we expect to work with you through the entire data science pipeline. We want to come in at the early stages of project formation and continue to contribute through data interpretation and the generation of final products. Our goal is to have a specific product outcome from each collaboration, such as pilot results for a grant proposal, complete results for a paper, or new custom software.

Call for Projects

We have periodic calls for proposals, which will be posted on our Announcements page and on our mailing list. Our most recent call can be found here. However, don't hesitate to contact us if you have a brilliant idea outside of the collaboration call periods.

Current Interdisciplinary Collaborative Projects:

  • Data Bodies at Play
    Dr. Gina Bloom, Department of English, ModLab
    How do our relationships with technology influence how we learn and interact? How do gamers' voices and bodies adapt to a challenging interface? By using gamer performances of Play the Knave (http://playtheknave.org), a digital game created at UC Davis by ModLab where players digitally enact scenes from Shakespeare's plays, the DSI is using machine learning to explore how players respond when faced with the "glitch" of challenging Shakespearian language and/or the visual interface.

  • Media Issue Framing
    Dr. Amber Boydstun, Department of Political Science
    How does media issue framing influence public opinion? How do issue frames tend to rise, fall and evolve in the media over time? Are the often co-varying degree of public attention and prevalence of competing issue frames correlated with the influence of a specific media frame on public opinion? The DSI is providing assistance with data exploration and visualization.

  • Predicting Length of Hospital Stays
    Dr. Ronald Fong, UC Davis Medical Center
    One of the most significant problems in emergency medical care across the country is predicting how long a patient will remain hospitalized. We are attempting to build a better predictive model by taking into account both quantitative and qualitative data from electronic medical records, with the goal of assisting clinician recommendations for patient treatment and discharge planning.

  • Learning from LCAPS
    Dr. Jacob Hibel, Department of Sociology
    How do local communities articulate goals, priorities and action plans for their school districts to foster increased access to special services? The DSI is helping to develop an automated workflow for extracting data from complex-structured school district Local Control and Accountability Plan (LCAP) PDF documents, and to code stated policies for analysis.

  • Seismic Curation, Rescue & Archival Project
    Dr. Lorraine Hwang, Computational Infrastructure for Geodynamics

  • Predicting Zoonotic Diseases
    Dr. Christine Kreuder Johnson, EpiCenter for Disease Dynamics, One Health Institute
    How can we predict and preempt outbreaks of zoonotic diseases, which pass from animals to humans? The DSI is helping to extract data on emerging zoonotic viruses, host ranges, outbreaks and emergence mechanisms to build a database and develop models to assess the drivers of zoonotic emergences.

  • Creativeness Digital Scholarship Group
    Dr. David Kyle, Department of English
    From the Civil War to World War II, diverse intellectual and institutional voices debated the nature and role of human creative power or "creativeness." A noun coined only in the 1920s, "creativity" was shaped by grafting scientific management onto the creative imagination of the 1950s, when social scientists and government agents began advancing the concept and developing research advocating for a more institutionalized version believed necessary for Cold War-era bureaucracies. Using computational methods, we are analyzing tens of thousands of newspaper documents produced from 1950-1970 to identify key actors and institutions, revealing how "creativeness" evolved into modern "creativity." The DSI is assisting with improving the optical character recognition (OCR) and with programmatically reading and reassembling the articles, which span different columns across multiple pages.

  • Genotype-Phenotype Database Design
    Dr. Julin Maloof, Department of Plant Biology
    How are genotypes related to phenotypes, and vice versa? Being able to predict those relationships with high accuracy can improve agriculture and lead to better models of how plants adapt and grow. Recent advancements in genotyping and phenotyping methods have led to exponential increases in the volume of data, from short-read RNA/DNA sequences to extensive measurements across thousands of individual plants. The DSI is helping the Maloof lab to develop a database and workflow pipeline to increase data transparency, accessibility and discovery for enhanced analytical pursuits.

  • L2 Spanish Learners
    Dr. Claudia Sánchez-Gutiérrez, Department of Spanish and Portuguese
    What specific visual and auditory input do learners receive when learning a new language, and how does it relate to language acquisition? The DSI is helping to analyze American college Spanish language textbooks and in-class lecture recordings to test the role of lexical frequency on automatic word processing across different proficiency levels.

  • Hiring in Academia
    Dr. Kimberlee Shauman, Department of Sociology
    What features drive hiring biases in academia? Are they discipline-specific, driven by a lack of diverse candidate pools, or are there other implicit biases in the hiring process? To tackle this issue, we are data mining faculty application packages. The DSI is helping to extract relevant information from applicant PDFs that feature semi-structured text (CVs, letters of recommendation, transcripts) for evaluation of applicant academic history, research, output, collaborations, and evaluations against current hiring trends.

  • Authenticating Hadiths
    Dr. Mairaj Syed, Department of Religious Studies
    Hadiths are reports attributed to the Prophet Muhammad or his students and other authorities in early Islam. There is a wealth of information within the collection of hadiths, ranging from their dates and locations to the social network between the teachers and students. A large-scale analysis of these allows a qualitatively new approach and understanding. The DSI is developing software to automate the task of collecting and collating these hadiths and the different books in which they are located.

  • Naturally Labeled
    Dr. Charlotte Biltekoff, Department of American Studies and Food Science & Technology
    What are the cultural politics of dietary health and the values and beliefs that shape American eating habits? There is often a disconnect between the legal meaning of words as defined by the Food and Drug Administration and their use by labelers and understanding by consumers. We are using various modes of scraping and text mining (sentiment analysis, source tracking, topic modeling) of online public comments to FDA proposals and decisions to better understand the complex personal, corporate, official, and legal discourses surrounding the labeling of food.


  • Data Science for the Built Environment, NSF NRT-IGE Grant
    Debbie Niemeier, Annamaria Amenta, Jonathan Eisen, Duncan Temple Lang, Megan Welsh
    This National Science Foundation Research Traineeship (NRT) award in the Innovations in Graduate Education (IGE) Track to the University of California-Davis will pilot, test, and compare modes of data science instruction. The testbed project will provide critical new information to inform the development of new learning platforms designed to cultivate robust computational, statistical, and data reasoning skills in engineering graduate students.
    The project will implement a hybrid short-course approach that 1) bridges existing code camps and semester-long classes, and 2) is coupled with a formal user group experience. A robust evaluation will be conducted to identify the individual effects of code camps, short courses, and user groups, as well as the effect of participating in combinations of experiences. In addition, learning gains, self-efficacy to engage in interdisciplinary studies that require data science principles, and career trajectories (including decisions to take additional coursework in data science and decisions to pursue interdisciplinary research and employment involving data science) will be examined. The project will generate new knowledge that addresses a particularly important gap: whether intense short-term learning experiences result in longer-term retention of skills and computational reasoning. Findings on the effectiveness of different modes of data science instruction in engineering will be broadly applicable to all data-enabled science and engineering fields.