Data Carpentry is a non-profit organization that develops and provides data skills training to researchers.

  • 11/20/17: Genomics Lesson Release

    Thanks to a Herculean effort from our Maintainers, the long-awaited Data Carpentry Genomics lessons have now been published on Zenodo! These lessons were originally developed through a hackathon in 2014 and have gone through several major rounds of revision since then. We had an Issue Bonanza in August and a Bug BBQ in September, with contributions from dozens of community members.

    These contributions were critical to transforming the initial lesson drafts into a polished version, ready for teaching by any Carpentry instructor with a Genomics background. As with our previous lesson release, every contributor is an author, so thank you to all who have been involved!

    Extra thanks to Anita Schürch, Fotis Psomopoulos, Tracy Teal, and Amanda Charbonneau for their quick response time to pull requests and issues and their eagle-eyed proofreading. Also a special thank you to all of the Genomics Maintainers, who put in a huge effort to reorganize and clean up these lessons over the past several months.

    Overview of the Genomics Data Carpentry workshop
    This workshop teaches data management and analysis for genomics research, including best practices for organizing bioinformatics projects and data, use of command-line utilities, analysis of sequence quality and variant calling with command-line tools, and connecting to and using cloud computing.

    Please note that the R Genomics Lesson has not yet been released, as the Maintainers for that lesson are working on a major overhaul and re-focusing of those materials. If you are interested in being involved with that re-design, please check out the repo.

    Thanks again to everyone for helping get these lessons ready for teaching! We’re looking forward to seeing many more Genomics workshops happening soon and hearing back from you all about your experiences teaching these materials.


  • 11/20/17: Meet the Candidates

    Eight nominations have come in so far for the 2018 Steering Committee of the new, merged Carpentries.

    The nominees so far are:

    There is still time to put your name forward. Nominations will close on 1 December.

    If you are not sure whether you are eligible to stand for election, or to vote in the election, please check out this blog post, which has all the logistics.



    Data Carpentry workshop, 1-3 November, 2017

    Background/Introduction

    Running the first Data Carpentry workshop held by members of the NWU Mafikeng campus, and teaching my (Caroline Ajilogba's) first workshop, was a big task, made possible by support from NWU and our mentor, Anelda van der Walt. The co-instructors Martin Dreyer (NWU Potchefstroom) and Amy Hodge (Stanford University) and helper Bennett Kankuzi (NWU Mafikeng) gave great support, and other helpers from the R study group at NWU's Mafikeng campus, including Olubukola Aremu and Ayansina Ayangbenro, were on hand to assist.

    This workshop was planned for 2.5 days, with spreadsheets, OpenRefine, and the Intro to R on the first day; the remainder of the R lesson on the second day; and SQL on the last half day.

    The day before the workshop, Caroline and Bennett met to make sure nothing had been overlooked, only to discover on the first morning that the venue had not been listed on the workshop website. Thanks to Martin and Amy, who salvaged the situation by quickly updating the website, while Caroline e-mailed as many of the participants who had asked about the venue as she could.

    Another issue was the attendance register, which was not immediately available but was also sorted out before the workshop started on the first day. Many thanks to the organizers, especially for making the connecting plugs, sticky notes, and badges available.

    The caterers were on time, and participants learned in a good atmosphere; some commented that it was a plus to attend a workshop that provided not only teaching but also tea breaks and lunches.

    1st November 2017

    The first day was great, as participants trooped in with enthusiasm and were helped to settle down and get their software installed. Though the installation had its hitches here and there, even that was valuable: we had to find solutions on the spot, and that, for me, was 'learning'!

    We waited nearly an hour for everyone to arrive on the first day so that we could do the data downloads and OpenRefine installation together. This took quite some time, as there were many issues with the install, including quite a few participants who had trouble with the required Java installation. Because of this, and a later fire alarm and power outage, we were very short on teaching time on the first day.

    The instructors did a great job with the spreadsheet (Martin) and OpenRefine (Amy) lessons. Students later indicated they felt it was rushed, but when asked repeatedly if they had questions or wanted things repeated, they would say all was fine. We were supposed to cover the first part of R on the first day, but did not have time for that because of the delays.

    Because of these issues, the point of the spreadsheet and OpenRefine lessons, and their importance to the overall workflow and the subsequent use of R, seemed to get lost on the learners. The modules make sense in this order, as it mirrors a researcher's own workflow, but the connections can be difficult for learners to see. More emphasis should be placed on making these connections explicit: from the spreadsheet, where the raw data can be better organized, to OpenRefine, where more cleaning is done, to R, where the data are loaded, analyzed, and combined with other data sources.

    Comments at the end of this day generally indicated that people were enjoying the workshop but that much of it was just too fast.

    2nd November 2017

    The morning started with installation of R and RStudio. More of the students arrived with this software already installed, and many came early to do the installations, so we were able to start right on time at 9am. Caroline started R on a good note, but since R had not been started on the first day, she felt she had to move quickly once the foundation was laid. When it was time to work with data in R, she saw that some were having fun while others were struggling to catch up. Since SQL was planned for the third day, Caroline felt she had to go fast to cover everything in that one day. We also had technical difficulties with the room technology: the school desktop she was using kept shutting down and wiping everything, so she had to reload RStudio, reinstall packages, recreate data frames, and so on.

    Again, comments on the second day indicated enthusiasm for R, but that it was going too fast. We never had anyone say that we were going too slowly. At this point, we still had the ggplot lesson to do and had decided that we would continue with R on the last half day and skip the SQL lesson altogether. Caroline did an excellent job responding to comments from day 2 that she slow down, and day 3 seemed to finally hit the right pace for these students.

    3rd November 2017

    The third day was better than I expected as participants were still ready to work, though some had told me they were travelling early Friday morning and were not available. The workshop concluded with enthusiasm from participants about being part of the study group.

    We did not get any engagement on the etherpad at this workshop, and we even saw some reluctance to use the sticky notes. Often only one or two people would put up a sticky, but when helpers walked around, many more people actually needed and accepted help. Several students were doing well and consistently helped the learners sitting next to them.


  • 11/27/17: My Favorite Tool: Rasterio

    Rasterio is a Python spatial data library that has changed the way I work with large spatial datasets.

    Ever struggled to do calculations on big datasets in proprietary GIS software? Or to re-project your results to analyze relationships with other data?

    Rasterio makes manipulating gridded spatial data (rasters) simple and brings these data into the Python ecosystem.

    Want to do some preliminary analysis on a low-memory machine?

    Instead of reading a massive file, read it as windowed chunks.
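
    Here is a minimal sketch of a windowed read (the filename, band, and window size are all hypothetical):

    ~~~
    import rasterio
    from rasterio.windows import Window

    # Read a 1024 x 1024 pixel chunk from the top-left corner of a large raster
    # instead of loading the whole file into memory ("scene.tif" is a placeholder).
    with rasterio.open("scene.tif") as src:
        window = Window(col_off=0, row_off=0, width=1024, height=1024)
        chunk = src.read(1, window=window)  # band 1 of the window, as a numpy array

    print(chunk.shape)  # (1024, 1024)
    ~~~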

    Need to create quick derivative products like directional gradients?

    Raster bands are read as numpy arrays so all your favorite numerical methods are available. Likewise, if you’re dealing with a set of time-referenced images, you can quickly load summary values into a Pandas dataframe for time series analysis.
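
    A minimal sketch of both ideas, with hypothetical filenames:

    ~~~
    import numpy as np
    import pandas as pd
    import rasterio

    # Bands come back as numpy arrays, so derivative products such as
    # directional gradients are one function call away ("dem.tif" is a placeholder).
    with rasterio.open("dem.tif") as src:
        elevation = src.read(1).astype(float)
        dy, dx = np.gradient(elevation)  # north-south and east-west gradients

    # Summary values from a stack of time-referenced images (placeholder paths)
    # drop neatly into pandas for time series analysis.
    paths = {"2017-01": "img_201701.tif", "2017-02": "img_201702.tif"}
    means = {}
    for date, path in paths.items():
        with rasterio.open(path) as src:
            means[date] = src.read(1).mean()
    series = pd.Series(means)
    ~~~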

    How the tool helps me in my work

    Many of us in the Earth sciences deal with large, co-registered spatial datasets.

    For example, changes in vegetation health at a volcano might be captured in a series of satellite images. This could be due to volcano degassing or a more benign environmental change. Direct information about volcanic activity, like gas emissions or earthquakes, is available in other grid formats or as point features. Meteorological data is in yet another raster format. Pre-processing data from multiple data sources can be time-consuming.

    The use of Rasterio (and other libraries like scikit-image, fiona, shapely) has greatly streamlined my workflow for loading, transforming, resampling, and correlating these kinds of data to detect and analyze changes.

    What I wish someone had told me when I first started learning this tool

    I’d say check out the convenience functions show and show_hist in rasterio.plot. They make visualizing multi-band imagery easy.
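
    For instance (a sketch; "scene.tif" is a placeholder filename):

    ~~~
    import rasterio
    from rasterio.plot import show, show_hist

    # Quick visual check of a raster and the distribution of its values.
    with rasterio.open("scene.tif") as src:
        show(src)                # render the raster via matplotlib
        show_hist(src, bins=50)  # histogram of band values
    ~~~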

    And finally …

    Lots of nice features are being added; it’s in pretty active development.

    – Robert Sare / PhD student, earth sciences / Stanford, California, USA


    Have you got a favourite tool you would like to tell us about? Please use this form to add a bit of detail and we will do the rest. You can read the background to these posts here, or see what other tools people have written about.



    On 16 November 2017 at 15:00 UTC+0, the Lesson Infrastructure Subcommittee had their 2017 November meeting. This post will cover the topics discussed and their resolutions.

    Software Carpentry and Data Carpentry merger

    With the merger in 2018, some Git repositories will be moved to a new GitHub organization. The Instructor Training course material has already been moved; you can now find it at http://carpentries.github.io/instructor-training/. The date for migrating the remaining repositories will be announced in 2018. Instructions for migrating a repository can be found here.

    Syntax Highlighting

    Thanks to naught101, the next release of our lessons will offer syntax highlighting to our readers. Lesson Maintainers might need help changing

    ~~~
    print("Hello World")
    ~~~
    {: .foobar}
    

    to

    ~~~
    print("Hello World")
    ~~~
    {: .language-foobar}
    

    for example. If you want to help, send a pull request to us.
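
    As a concrete instance, a Python snippet in a lesson would end up tagged with the language name, following the pattern above:

    ~~~
    print("Hello World")
    ~~~
    {: .language-python}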

    Exercises Throughout the Episodes

    After a short discussion, we reached the consensus that it is better to have exercises throughout each episode instead of collecting them all at the end. Lessons will migrate to the new format at a slow pace, because this change requires a good amount of work.

    Non-English Lessons

    If you have been involved with us since 2014, you might remember this post about the attempt to translate the lessons into Spanish and this other post announcing the lessons in Korean. During the meeting, we discussed the workflow for translating lessons into other languages, and there is now active interest and work on a translation.

    Some of the conversation was archived as issues here. If you want to get involved with the translation, join the Latinoamerica email list or see the updates.

    Windows Installer

    In March 2017, a discussion about our recommended text editor created a lot of buzz on the mailing list. The email thread started because nano was sometimes not installed on the learners' machines. The new version of Git Bash will include nano by default, and we have a pull request, thanks to Oliver Stueker, to adopt the new version in our workshop instructions. The pull request will be merged at the end of this year or the beginning of 2018.

    Next steps

    Version 9.3.x of the lesson template and lesson documentation has been released. Maintainers are working to release the new version of the lessons before the end of the year.

    The subcommittee will meet again in February to provide an update on some of the topics covered by this post and discuss new requests from the community.

    Acknowledgement

    Thanks to Alejandra Gonzalez Beltran, Christina Koch, David Pérez-Suárez, Erin Becker, Naupaka Zimmerman and Tracy Teal.



    Along with fellow Software Carpenters Rayna Harris and Paula Martinez, I attended OpenCon 2017 held over the weekend of 11-13 November, 2017 in Berlin. The conference was held in the Harnack Haus in Dahlem, the home of the Max Planck Society, where the friendly ghosts of Einstein, Heisenberg and other stellar scientists smiled on our endeavours to promote open access, open education and open data.

    Harnack Haus, Dahlem

    This was a conference with a difference. Most conference goers were very new to this area of work so there was a strong learning aspect to all that unfolded over the three days. Many of the speakers had eye-opening stories to tell about education’s role in transforming lives, whether it be Kholoud Ajarma’s experiences of growing up in a Palestinian refugee camp or Aliya Ibraimova’s work with remote grazing communities in the Kyrgyz mountains.

    Rayna's tweet

    While the largest cohort (50) came from the US, 47 different countries were represented at OpenCon. Of the 186 attendees listed in the attendance sheet, 132 had GitHub accounts and even more (160) used Twitter.

    Sessions were a mixture of plenary sessions and small group work. As an early icebreaker, we were put into groups called Story Circles, in which everyone had eight (uninterrupted) minutes to explain what had led them to apply for and attend OpenCon. The sheer diversity of backgrounds and experiences unearthed by this kind of session was astounding. Hearing Thomas Mboa describe teaching Nigerian students without having access to electricity certainly put some of my own workshop issues into perspective.

    My story circle

    Another eye-opener was the Diversity and Inclusion panel, where uncomfortable questions about ‘whose knowledge?’, ‘who has access?’, and ‘who is missing from the discussion?’ put paid to the idea that ‘open’ is a universal, unquestionable good. Speakers from the global south stressed that making knowledge open can seem like a replay of having that knowledge stolen from them during the colonial period. And if ‘open’ does not welcome people of all genders, sexual orientations, colors, and other forms of diversity, then how ‘open’ is it really?

    The quality and clarity of OpenCon recordings mean that these sessions can easily be watched by anyone with an interest in what was said. Footage of the Diversity and Inclusion panel also includes the post-panel discussion.

    To help build more local action post-conference, people could opt to work with groups from their own region. Since I was the only Australian there, I chose to work with an Asian group, and helped people from Armenia and Taiwan create ‘empathy maps’ to try to understand the concerns of researchers in their region who might want to work ‘open’ but who face formidable barriers, not least the kinds of behaviours outlined by Laurent Gatto’s and Corina Logan’s ‘Bullied by Bad Science’ campaign.

    The final day of OpenCon was a Do-a-Thon - what I would call a sprint or hackathon. For this day, Rayna and Paula marshalled a team from Chile, Argentina and other Spanish-speaking countries to work on the Spanish translations of Carpentry lessons.

    Spanish translation Do-A-Thon

    This was certainly a one-of-a-kind conference and for those who missed it, session recordings are available online, courtesy of the Right to Research Coalition. The conference was phenomenally well-organised, with terrific food, and people could opt to join Dine-Arounds to ensure that no one had to eat dinner all alone in a strange city. I was very interested in the organization of the conference as I was hoping to get many tips I could use to make next year’s CarpentryCon in Dublin a similar success.

    The conference’s leading sponsor was the Max Planck Gesellschaft (Max Planck Society), and the conference was jointly organised by SPARC (the Scholarly Publishing and Academic Resources Coalition) and the Right to Research Coalition. A number of other organisations and foundations were supporting sponsors.

    A floor tile at Harnack Haus was inset with Einstein’s signature - you don’t see that every day.

    Albert Einstein signature



    The Data Carpentry community has published two full workshop curricula in 2017, both targeted towards researchers in the life sciences. We had our first lesson release for our Ecology lessons in May, followed by the release of the Genomics workshop materials in November.

    Data Carpentry lessons are domain-specific, and targeted towards helping researchers in particular domains gain the skills they need to conduct their research efficiently and reproducibly. We’re excited to broaden our reach to researchers outside of the life sciences starting in mid-2018 with the release of curricula for working with Geospatial and Social Sciences data.

    Carpentry lessons are developed by and for the community. Lend a hand in developing these materials and preparing them for publication and teaching. There are many ways to get involved, ranging from helping edit the existing lesson drafts, to running pilot workshops, to serving as a Maintainer for the completed lessons. We are also setting up Curriculum Advisory Committees for the two sets of lessons. Members of these committees will help ensure the lessons stay up-to-date and continue to serve the needs of our learners.

    How you can contribute

    We are asking anyone interested in helping now (or in the future) to fill out a brief form before December 20th so that we can organize the effort:

    Contribution Form for Geospatial Lessons

    Contribution Form for Social Sciences Lessons

    If you don’t get a chance to fill out the form by December 20th, but still want to be involved, please get in touch with Erin Becker (ebecker@carpentries.org).

    While experience in geospatial or social sciences research and experience with the Carpentry community are a plus, there are many ways to contribute even if you don't have this background. Please circulate this link and post to others who might be interested. We will be following up near the end of January 2018 to organize everyone and provide more information.

    Thanks to everyone who is working to move these lessons to the next stage!



    The Assessment Network was established as a space for those working on assessment within the open source/research computing community to collaborate and share resources. During our quarterly meeting in November, we engaged one another in a conversation revolving around data science education. This meeting was organized and hosted online by Kari Jordan, and six community members attended.

    First, we discussed the definitions of data scientist, data analyst, and data engineer; second, we worked in pairs on a set of questions about assessing data science education.

    The session was exciting and fruitful, as it combined two topical efforts: on one hand, our organization’s focus on assessment and, on the other hand, our contribution to the global effort in defining, understanding, and shaping the rising field of data science.

    Kari Jordan had attended a meeting of collaborators from industry, academia, and the non-profit sector to discuss the challenges and vision for keeping data science broad. During that meeting, a brainstorming session asked attendees to come up with core competencies for data science. This proved difficult, as each sector identified competencies important to its particular interests. Kari thought it would be a good idea to bring the question to the Assessment Network.

    What is Data Science?

    So, what is data science? What are the core competencies? For a positive definition, we turn to the seminal “Data Science Venn Diagram” by Drew Conway, as reproduced by Jake VanderPlas in the preface of his Python Data Science Handbook. Data science lies at the intersection of statistics, computer science, and domain expertise (in industry-friendly terms, or traditional research, in academic terms). Data science is cross-disciplinary by definition. Hardly anyone gets formal training in all three areas. Most working data scientists are self-taught to a certain extent. Basically, it takes a growth mindset to be a data scientist!

    For a negative definition (in logician’s terms, i.e., what data science is not), we turn to industry job descriptions. It turns out that Marianne Corvellec served on a panel dedicated to the definition of these emerging occupations. This panel was held in 2016 with Québec’s Sectoral Committee for the ICT Workforce. It brought together industry professionals and HR specialists who would frame the discussion, and resulted in this report (in French; note that “architecte de(s) données” == data engineer and “scientifique de(s) données” == data scientist).

    This report is in line with academic sources (e.g., data science curricula at U.S. universities), insofar as a data scientist is not a data engineer. A data engineer takes care of data storage and warehousing; s/he builds, tests, and maintains a data pipeline, which integrates diverse data, transforms, cleans, and structures them. S/he masters big data technologies, such as Apache Hadoop, Apache Spark, and Amazon S3. Data engineers ensure the data are available (and in good shape) for data scientists to work with.

    What is a Data Scientist?

    More subtly, a data scientist is more than a data analyst. It takes an aptitude for collecting, organizing, and curating data, as well as for thinking analytically. A strong quantitative background is useful but not necessary. Principles and practices from the social sciences or digital humanities are valuable assets; data scientists should be good writers, good storytellers, and good communicators. Perhaps surprisingly, attention to detail is not a key item in a data scientist’s skillset; the ability to grasp the big picture is far more important, as data scientists will find themselves working at the interface of very different departments or fields (in an industry context, these could be engineering, marketing, or business intelligence).

    A data scientist does not master any specific technology to perfection, since s/he dabbles in everything! Unlike the traditional data (or business intelligence) analyst, s/he resorts to several different frameworks and programming languages (as opposed to a given domain-specific platform) in order to leverage data. Plus, the data scientist typically works with datasets coming from multiple sources (as opposed to the traditional data analyst who usually works with a single data source already populated by an ETL solution). Data scientists are flexible with their tools and approaches.

    Challenges Assessing Data Science Education

    In the second part of the meeting, we split into breakout pairs to discuss the challenges of assessing data science education with respect to Carpentries’ workshops. Brainstorming in parallel lets us cover more ground (breadth), while interacting one-on-one lets us explore different avenues (depth).

    One pair focused on the industry perspective, another on the education system, and the third on assessment practices. Kari offered a list of questions to frame the discussion.

    Working groups identified challenges for assessing data science education at the object level (i.e., what should this assessment consist of?) and at the meta level (i.e., what favors or hinders the application of assessment?).

    At the meta level, the following prompts were discussed (pulled from South Big Data Hub’s Data Divide workshop):

    • Vision for Assessing Data Science Education
    • Stakeholders for Data Science Education
    • What specific skills or resources are most important/lacking to address this challenge?
    • How do our challenges fit into the national landscape?
    • What is the broader impact of addressing our challenges?

    Check out the notes from our working groups to see what we came up with!

    Now is your chance to tell us what you think. We opened several issues on the Carpentries assessment repo. We’d love to engage you in a rich discussion around this topic. Comment on an issue, and tweet us your thoughts using the hashtag #carpentriesassessment.


  • 12/10/17: When Do Workshops Work?

    Author: Karen R. Word

    Contributors: Kari Jordan, Erin Becker, Jason Williams, Pamela Reynolds, Amy Hodge, Maxim Belkin, Ben Marwick, and Tracy Teal.

    “Null effects of boot camps and short-format training for PhD students in life sciences” is the provocative title of a recent article in the Proceedings of the National Academy of Sciences. Those of us who enthusiastically design and deliver short-format training promptly took note, then scratched our heads a bit. We waited a little for a response, wondering if one or more of the programs that participated in the study might step up to their own defense. Nothing happened. We thought about letting it go - we’ve got our own programs, with distinct goals, and our own assessment data, so maybe this broad-brush study isn’t so important. But … it keeps being raised. Someone will bring it up here and there, asking what we think about it. Whenever this paper comes up in conversation, its title certainly throws some weight around.

    So, do workshops work? However certain we may be about the value of our own programs, it seems important to have a little sit-down with this paper and talk about what it means to us, what it doesn’t mean and, most importantly, what it does not address at all: the question of what you can do with a short course [1] when a short course is all you’ve got.

    The premise: Spacing instruction over time is better for learning

    When given a choice between teaching a two-day short course versus stretching those same hours and content across several weeks of repeated meetings, you can expect to get a lot more learning out of the longer course. This point, described as a core premise for the PNAS study, is essentially irreproachable. There is abundant evidence that distributing instruction over time maximizes learning in comparison with the “massed practice” that occurs when teaching is concentrated into an intensive short-format course.

    The problem: Spacing instruction over time is often impractical

    Traditional courses match students and faculty on a spaced schedule over a quarter or semester time period. When this format is possible, it should be pursued and optimized, not replaced with short courses.

    But when isn’t it possible?

    When there aren’t enough instructors. If expertise in an area is scarce, the time demand for distributed training often exceeds the FTEs available to meet that need. Until that shortage can be remedied, a large number of people are left to self-teach or go without. Under these circumstances, short-format workshops are often the only practical way to deliver training to the many more who need it. This is currently the situation with regard to training in data management and analysis, and in many cases, with foundational computing skills as well.

    When learners don’t have time. A similar scenario emerges when those in need of training are fully committed to jobs or research or are otherwise unavailable for a time-distributed course. This is the case for most professional-development training. Even within academia, researchers may need training right away and can’t wait for the next semester-long course offering.

    When opportunity knocks. Even within graduate school, where long-format courses are the norm, some opportunities are concentrated in time. For example, a short course may be able to attract many faculty simultaneously, allowing students to observe them engaging with and learning from each other. Some research experiences or team-building activities may also be possible only on a concentrated schedule. Also, where traditional course curricula can be slow to change, short courses permit rapid inclusion of new and needed skills before they can be added elsewhere.

    When a little goes a long way. In many of these cases, particularly when training is truly necessary for progress, learners are already engaged in self-teaching, and conveying a large quantity of knowledge may not be as important as providing a boost of confidence and a guide to best-practices as they proceed. Embracing the limitations on learning and leveraging the flexibility and low-stakes of a workshop setting might actually confer an advantage in these areas.

    For those of us who work within the short course mandate, then, the question becomes: how can we optimize that format to best meet learners’ needs? When setting goals for impact, we tend to think in terms of how much and what type of impact we can have, and to focus our efforts accordingly.

    One reason why the paper by Feldon et al. raises concern within our community is because it frames the question as “whether”. And if the answer to “whether” we can have an impact with a short course is “no”, then we’ve clearly got a problem on our hands. However, in our experience, that simply is not the case. To the contrary, our evidence suggests that there is quite a lot you can accomplish with a workshop when you accept its constraints, focus on specific goals, and leverage the strengths of this format. In the next section, we’ll take a look at the study described in the paper, evaluate its claims, and examine its relevance to the kind of training we provide. Then we’ll circle back around to our goals, our strategies, and the kind of data that we collect to assess and inform the development of our workshops.

    The study

    There is a lot to love in this work! This was not a simple survey study. They graded papers – multiple times, with validation, for 294 students from 53 institutions. They also repeatedly administered tests and surveys over the course of two years. The dataset must be impressive; we assume there is a LOT of other interesting stuff there that relates to graduate student development and correlates of early success. However, it is hard to know since the data are not publicly available or displayed in the paper. We’re eager to see more publications and perhaps more extensively summarized data come out of this project in the future.

    That being said, in discussion with our community members, several persistent questions and concerns emerged. These are a few of the most pertinent questions:

    1. How diverse are the program goals? This study lumps together an unknown number of programs administered at the outset of life-science PhD programs as a single treatment. We know only that 53 institutions were sampled and that, of the 294 students in the study, 48 were short-course “participants”. According to Feldon et al., the unifying goal of these programs is to “accelerate the development of doctoral students’ research skills and acculturation”, with emphasis on research design, statistics, writing, and socialization. However, the specific emphasis seems likely to vary, and herein lies the concern most frequently voiced in our community: any given program might focus its efforts on any or all of the components identified (research, statistics, writing, or socialization). Indeed, the more astutely a program identifies and engages with short-format limitations, the more focused it may be. Because students were surveyed across 53 different institutions, it seems highly likely that the specific aims of different programs head in different directions. If some programs are particularly good at socializing students and preparing them to cope with the hurdles ahead, while others emphasize grant writing, otherwise ‘significant’ impacts within a sub-group of similar programs are likely to be lost when combined and assessed with the group overall. This is particularly clear if we consider the sample size of 48 students as being further split (e.g. 10, 10, 15, 13) by distinct program emphases. Lumping together successful programs with different aims is likely to show that all are ineffective in each category.

    2. How generalizable is this context? The public reading of these findings seems to be, “Too bad short courses don’t work”. However, pre-PhD short-courses are a highly specific and unusual context for a short course. In most other cases, short courses arise out of necessity or unique opportunity, such that there is no subsequent distributed content that re-teaches or even remotely overlaps with the content taught in the short course. In pre-PhD programs, specifically, any effects are potentially in direct competition with gains made via traditional course content. The extent to which the same or overlapping content is otherwise available in each program is also unclear. The authors of this paper might not have intended their work to generalize to other contexts, but the tendency of readers to generalize makes this question a vital one. Benefits of a short course are easily lost in a sea of positive outcomes resulting from graduate training, but that has little bearing on the impact such courses may have when they stand alone.

    3. Is this the right experiment to test graduate student outcomes? While we found the methods to be impressive and worthwhile in many respects, several people expressed concern about the two-year assessment regime. This included questions as to whether a graduate student is likely to have matured and, particularly, to have written substantively in their content area within the first two years of study, as well as whether a regime of continuous surveys might itself have a sizeable impact on student development. As with any study that takes volunteers, willingness to participate – both in the short course programs and in the study itself – may bias toward more motivated or engaged students overall, and this could have an impact on the interpretation of the results. These are the sorts of problems that plague any effort at assessing students at scale, and are worth noting only as a standard “grain of salt” with which any study should be (but is not always) considered when it stands alone.

    4. How do we go about making short courses more successful? This paper provides no means of evaluating variation between programs, which is really where our interests lie. This is not a criticism: it is simply not the purpose of the paper. It is the next question, the natural response to such results: if these programs really aren’t making a difference, how might we capture the opportunity, with existing funded and institutionally invested programs, to change that? Is it that short course workshops have no impact on anything, or that we need to better understand and plan for what they can accomplish?

    We have a few suggestions.

    What We Do

    Software and Data Carpentry offer short-course training for academics and professional researchers in software and data management skills. Many of our affiliates, who have also contributed to this response, offer other short courses in related subjects. We are all driven to the short-course format out of necessity. We recognize that this format places severe constraints on the quantity of information that can successfully be conveyed, but we design our curriculum and train our instructors specifically to maximize our effectiveness in this format. Here’s how we do it:

    Streamline content. We aim to teach only the most immediately useful skills that can be taught and learned quickly. We teach our instructors to resist the urge to “get through everything” or pack extra details into their explanations.

    Teach strategically. We keep learners active by using live coding (in which learners work through lessons along with the instructor) and frequent formative assessment. We teach instructors to be mindful of the limitations of short-term memory and to focus instruction and assessments to minimize cognitive load.

    Meet learners where they are. Our workshops attract a diverse population of learners, from novices to experienced IT personnel. Our learners use colored sticky notes to indicate when they are stuck. We teach instructors how to use this to adjust their pacing. We also recruit workshop “helpers” who can directly coach learners who may be struggling. The absence of performance-based grades gives us added flexibility to meet diverse needs by generating diverse learning outcomes. Some may learn about the “big picture” of a new programming language by completing a lesson, while others may come away having added “tips and tricks” to their existing skills. This is one area in which workshops may have an advantage over traditional courses, particularly when it comes to confidence- and motivation-based outcomes.

    Normalize error and demonstrate recovery. We know and expect that our learners will acquire the bulk of their skill independently. Willingness to make mistakes and awareness of problem-solving strategies are far more crucial to their success than any particular content. We coach our instructors to embrace and even delight in their own errors as an opportunity to model healthy and effective responses.

    Explicitly address motivation and self efficacy. One substantial advantage that we have is that our learners attend our workshops because they are motivated to learn precisely what we teach. However, preserving and nurturing that motivation is crucial. Perseverance results not only from embracing error as normal, but also from learners’ personal belief in their ability to succeed. Creating a workshop in which learners can be successful in both learning and in demonstrating to themselves that they have learned is one piece of this. We spend a good deal of time discussing motivation with our instructors. We explain why saying “it’s easy, anyone can do it” is often demotivating. We explore the differences between novice and expert perspectives and coach instructors to be mindful of and to respect the novice experience. We teach instructors to foster a growth mindset in their language and learner interactions. We emphasize that a relaxed, welcoming, and positive workshop experience is one of the most important things we can provide.

    Build community. The more people at all levels are able to share what they know, the more efficiently we can distribute knowledge. As a volunteer organization, we have a strong community of instructors, lesson maintainers, and others. As learners progress, they often become involved in this community. In the long range, we hope to create a community that can provide widespread support directly to learners.

    What we know about our impact

    We have conducted both short-term and long-term follow-up assessments of learners. Data Carpentry post-workshop survey results have always been positive, with 85% of learners agreeing that they would recommend our workshops to a colleague. The Carpentries’ Long-Term Impact survey (n = 530) is designed to determine whether this positive experience and self-reported increase in confidence affect long-term outcomes. This survey (full report here) measured self-reported behaviors around good data management practices, change in confidence in open source tools, and other specific program goals. It also explored other ways the workshop may have impacted learners, such as improved research productivity. While Feldon et al. rightly critique self-assessment with regard to performance metrics, many of our target outcomes are more conducive to self-evaluation, e.g. confidence, motivation, and daily work habits. Researchers report increased daily programming usage after attending our two-day coding workshops, and 65% of respondents report higher confidence in working with data and open source tools as a result of completing the workshop. Our long-term assessment data show a decline in the percentage of respondents who ‘have not been using these tools’ (-11.1%) and an increase in the percentage who now use the tools on a daily basis (+14.5%). Additional highlights from our long-term survey report include:

    • 77% of respondents reported being more confident in the tools that were covered during their workshop compared to before the workshop.
    • 54% of respondents have made their analyses more reproducible as a result of completing a workshop.
    • 65% of respondents have gained confidence in working with data as a result of completing a workshop.
    • 74% of respondents have recommended our workshops to a friend or colleague.

    We see that short-format workshops can be effective at increasing researchers’ confidence, use of coding skills, and adoption of reproducible research perspectives. As a part of the Open Source community, we make all of our survey data and analysis code available in our assessment repository. We welcome people to work with our survey data and ask new questions. Understanding impact is important, and we will continue to keep our community informed with regular releases of survey data and reports. We also have a virtual assessment network which newcomers are welcome to be part of. Please join here if you are interested in discussing assessment efforts in the area of training in research computing.

    In Closing …

    Our data suggest that we are having a positive impact, and we expect that other short-format programs can be similarly effective. However, this likely requires a focused effort on optimizing within the limitations of a short course, along with clear goals and targeted assessment to demonstrate such efficacy. It is not clear that this was the case for any of the programs surveyed by Feldon et al., and if it was, it is not clear to us that any such specific and variable successes would be discernible in their study. We agree, however, that under most circumstances, particularly where a large quantity of content needs to be taught, a short-format course should not be favored over any available time-distributed alternative.

    We applaud, encourage, and endeavor to support those who have the access and opportunity to conduct long-format training in the subjects we teach. Many members of our community are actively involved in traditional undergraduate and graduate instruction of this kind. Traditional training opportunities will begin to catch up with the demand for training in data science generally, but there will always be limitations: concepts or tools that don't clearly fit into a curriculum, or new approaches that haven't yet had a chance to be incorporated. We fill these gaps through short courses, and to achieve that mission we need to be as effective as possible.

    So far, we feel comfortable declaring that effort a success.


    [1] While the paper refers to programs as either “boot camps”, “bridge programs”, or “short-format training”,
    it has been brought to our attention that this usage of “boot camp” can cause some consternation for those with military training or under military regimes. We will therefore use the less-vivid but more-accurate “short course” label for this piece.



    Voting in the election for community governance of the Carpentries (Executive Council, formerly named Steering Committee or Board of Directors) closed last week. Of the 501 members eligible to vote, 147 ballots were cast (29% turnout).

    We are pleased to announce the four newly elected members of the Executive Council:

    Raniere and Lex received the highest number of votes and will serve two-year terms; Amy and Elizabeth will serve one-year terms.

    These four elected members will join the five appointed Council members selected from the current leadership of Software Carpentry and Data Carpentry:

    • Karen Cranston is a computational biologist at Agriculture and Agri-Food Canada working on digitisation and integration of biodiversity data. She was the lead PI of the Open Tree of Life phylogeny synthesis project, and serves on the board of the Open Bioinformatics Foundation (OBF). She has been involved with Software Carpentry since 2012, was a founding board member of Data Carpentry, and is a certified instructor trainer.

    • Kate Hertweck is an Assistant Professor at the University of Texas at Tyler. Her research and teaching focus on bioinformatics and genomics. She completed Instructor Training in fall 2014, served on the Mentoring Subcommittee in 2015, and was elected to the Software Carpentry Steering Committee in 2016 and 2017, also serving as Chair in 2017.

    • Mateusz Kuzak is Scientific Community Manager at the Dutch Tech Center for Life Sciences. He has a background in bioinformatics, live cell imaging, and research software engineering, and is passionate about Open Source, Open Science, and Reproducible Research. He is currently working on training activities and coordinating life science data and technology projects in the Netherlands. Mateusz is an Instructor Trainer and was elected to the 2017 Software Carpentry Steering Committee.

    • Sue McClatchy is a bioinformatician and research program manager at the Jackson Laboratory. She provides research training at all academic levels from high school to faculty. She mentors students and develops training materials for analysis of quantitative and high-throughput data. Her expertise in curriculum design and instruction stems from an eight-year science teaching career in schools in the U.S. and Latin America. Sue is an Instructor Trainer and was elected to the 2017 Software Carpentry Steering Committee.

    • Ethan White is an Associate Professor at the University of Florida working on computational and data-intensive ecology. He is a Moore Foundation Investigator in Data Driven Discovery and serves on the board of directors of Impactstory. He has been involved in Software Carpentry since 2009, was a founding member of the Data Carpentry steering committee, wrote the first version of the Data Carpentry Ecology SQL material, and leads the development of the semester long Data Carpentry course for biologists.

    Many thanks to all candidates who chose to stand for election. The voting was very close, which reflects the commitment you all show towards service to our community. We are fortunate to have such awesome leaders representing diverse education, careers, and geography. We look forward to continuing to work with you in the Carpentries community, and hope you will consider pursuing other opportunities for leadership.

    Also thanks to the outgoing steering committee members:

    • Software Carpentry: Rayna Harris, Christina Koch, Karin Lagesen
    • Data Carpentry: Hilmar Lapp, Aleksandra Pawlik, Karthik Ram

    Finally, thanks to all of you across the Carpentries for your continued participation and engagement!