Learning from code and non-code artifacts

Author / Creator: Henkel, Jordan Joseph, author
Available as: Online; Physical
Summary: Three things are fundamentally true about software: (i) every day that passes, we, as a society, generate more software (more code, more documentation, and more software-related artifacts of all kinds); (ii) it is easier to write new software than it is to understand and maintain existing software; and (iii) we depend on software in every area of our lives (from critical infrastructure to entertainment and everything in between). These three fundamental truths set the stage for one massive problem: if there is more software every day, and it is hard to understand and maintain, how can we ever "keep up" with this unbounded growth? How can we ever truly understand the software we depend on if we are adding to it every single day?

In this thesis, we provide new ideas and tools that help with some of these issues. More specifically, this thesis takes the position that we need tools and techniques for understanding and learning from software. To do this, we consider software to be a composite of source code and other, non-code, artifacts (build scripts, documentation, etc.). We introduce techniques for working with both code and non-code artifacts. For code, we introduce a form of code embeddings (learned from a semantic representation of code: abstracted symbol traces), create a novel specification mining technique that uses these semantic code embeddings, and explore the robustness of models of code. To address non-code artifacts, we mine tree-association rules from Dockerfiles, from which we learn best practices; we then take these learned best practices and create a human-in-the-loop technique for automated repair of Dockerfiles.

Finally, to accelerate empirical research on software and lay the groundwork for a more comprehensive solution to trusting the growing amount of software we, as a society, create, we introduce code-book. code-book is a tool for interactively querying and analyzing code, inspired by the great successes of the Data Science community. With code-book, we introduce a novel query-by-example-based query language for asking questions about code. Furthermore, we develop this query language so that users can ask questions that incorporate both code structure and "fuzzy" semantic constraints (based on code embeddings).

Subjects

Dissertations, Academic -- Computer Science.