Workshop in Brussels, 2-3 Nov 2015

November 24, 2015, 4:00 pm

Belgium has a flourishing biological research community and skills in data handling have become an essential tool for anyone conducting biodiversity science. The workshop was jointly organised by the VIB and the Botanic Garden Meise as part of a series of “Empowering Biodiversity workshops” supported by the Belgian Science Policy Office.

The thirty-five attendees of the workshop came from a wide variety of backgrounds in biology, including marine biology, bioinformatics, plant biology and ecology. They were also at many different points in their careers: some were students wanting to gain experience, while others were established scientists looking to learn new tricks. Nevertheless, this diversity was not a problem for the workshop as everyone came with enthusiasm and an interest in improving their knowledge. We were instructed by Leszek Tarkowski and Frederik Coppens with support from Dima Fishman and Christof De Bo.

The workshop was graciously hosted by the Belgian Science Policy Office, for which we are very thankful, particularly as one of the days was a holiday for federal employees. Their location in central Brussels and their well-equipped meeting rooms made the logistics much easier.

The timetable of the workshop was as follows:

Day 1 AM:	Working with spreadsheets and introduction to Open Refine (Leszek)
Day 1 PM:	Data manipulation in R (Frederik)
Day 2 AM:	Data visualisation in R (Frederik)
Day 2 PM:	Managing data with SQL (Leszek)

Conclusions

The feedback we received after the workshop was very positive. Participants were happy with the range of skills they learned and the quality of the teaching. As one of the students wrote, “Broad, good introduction to a lot of data management software Helpful teachers”. Given the diverse backgrounds of the participants, some people found the pace either too slow or too fast, but it seems that there was, in general, a good match between the expectations of participants and the positioning of the curriculum.

Still, there is room for improvement. The diversity of participants is a hard problem to solve; in the future, organizers could perhaps provide two rooms and two sets of instructors to better match attendees’ capabilities. The Data Carpentry lesson materials also need some polishing to match the quality of the Software Carpentry materials.

We would like to thank all the teachers, helpers, funders and organizers who made this event successful and we look forward to all the great data science that will be encouraged by this workshop.

↧

Hiring an Associate Director

November 29, 2015, 4:00 pm

Data Carpentry is hiring! http://www.datacarpentry.org/jobs/

With the support of the Gordon and Betty Moore Foundation, we now have the opportunity to hire an Associate Director. The Associate Director is one of the two key roles providing leadership to Data Carpentry’s core efforts and is expected to shape the organization’s operational functioning, influence training, and contribute to strategic planning. The main focus of the Associate Director’s role will be in community engagement and education as well as overseeing communications.

Mainly though, we’re looking for someone who is passionate about education and training and its ability to empower researchers to conduct data-driven research. We expect applicants to come from a wide range of backgrounds and disciplines and encourage all applications. If you have any questions about the posting, please don’t hesitate to contact us at jobs@datacarpentry.org. We will begin reviewing applications December 18, 2015 and the posting will remain open until filled.

Update: We are no longer accepting applications for this position.

For full details and application information, please see the posting:
http://www.datacarpentry.org/jobs/

↧

Data Carpentry to adopt Reproducible Research Curriculum

December 15, 2015, 4:00 pm

by Karen Cranston

Part of the mission of Data Carpentry is to encourage and enable reproducible research. The core Data Carpentry curriculum teaches researchers approaches and skills that are fundamental to reproducible research, such as scripting and data management. We are also adopting a Reproducible Research curriculum that explicitly focuses on reproducible techniques and some of the next steps, including version control and data publishing. This is an update on the efforts on this curriculum so far, and we expect to have it available soon as a Data Carpentry workshop option.

Reproducible Research Hackathon, Take 2

hashtag: #rrhack
link: hackathon repo

In fall 2014, NESCent held an initial hackathon to develop a set of materials for teaching reproducible research to computational scientists. Participants in the hackathon then taught three #rrhack workshops to ~~unsuspecting guinea pigs~~ students, postdocs, faculty and staff at Duke University, iDigBio / University of Florida and the Duke Marine Lab. A year after the initial hackathon, we re-convened people from the first event, along with a few locals from the University of Florida. This second hackathon aimed to expand / update the lessons based on feedback from the first three workshops.

Here is a summary of the feedback from our first three workshops:

Participants didn’t need to be convinced about the need for reproducibility. They could see the problems in their own work, and wanted concrete solutions. The lessons from the initial hackathon spent a fair bit of time motivating reproducibility, and it turns out we need less of that.
More participants than expected had some experience with programming / scripting, whether in R or Python. This was true across all three workshops. Compare this to Software or Data Carpentry, where we expect a significant fraction of participants to have little to no prior experience with computational science. It was still a challenge to teach literate programming without actually teaching programming.
Participants really wanted to know version control! We did not include a version control lesson in the first workshop, but our feedback cards after each session included requests for VC, so we added a short demo at the end of the second day. We experimented with a Software Carpentry style git lesson at the Florida workshop and a GitHub-only lesson at the Duke Marine Lab.
Lessons needed clear goals, and more exercises / less presentation.

Based on this feedback, we made the following changes to the workshop materials:

better overview of the workshop for potential organizers and instructors
created a workshop template and template repo for lesson maintainers / creators
improved the instructor notes for all lessons
made sure each lesson had clear goals
created new version control lessons for git-in-github and git-in-rstudio
new exercises for project organization
converting slides to common, text-based formats

A note on R-vs-Python: The materials currently use R + RStudio + knitr for all examples, and we initially had ‘translate-the-lessons-to-python’ on the agenda for this meeting. After some discussion, we decided that our time in Gainesville was better spent revising the existing lessons. This gives us a high-quality set of R-based lessons ready for teaching. But Pythonistas should not despair! We are already planning another event focused on developing a parallel set of materials on reproducibility using Python (likely using Jupyter notebooks).

Interested in using this material? Go ahead! We’ve put it in the public domain under a CC0 waiver. Have questions? Each lesson has contact information in the README, or you can contact Hilmar Lapp for general questions about the #rrhack project.

↧

Starting off Data Carpentry in 2016

January 24, 2016, 4:00 pm

In 2015 we were just starting out as an organization and ran limited workshops as we were working to scale. With many Software Carpentry instructors now certified as Data Carpentry instructors, our Program Coordinator, Maneesha Sane, coordinating workshops, a new website and an instructor database, in 2016 we’re set to do a lot more!

The website was a bit behind our 2016 activities, so we’d like to highlight some workshops that have happened already.

Workshops

Already we have run 4 workshops. We were very excited to start off teaching at two great organizations on the same dates, USDA-ARS and the London Natural History Museum. Many thanks to O.P. Perara at USDA-ARS for hosting that workshop and to John Moreau and Zhuo Fu for instructing. A full set of instructors ran the Natural History Museum workshop; thanks to Ahmad Alam, Martin Callaghan, Malcolm Penn and Consuelo Sendino.

Two other workshops were the last few pilots of the Genomics lessons, to finish preparing them for prime-time teaching this year. Many people from the Genomics and Assessment Hackathon and others have put in an incredible amount of work to get this workshop version together, and we’ll have a full post on these lessons soon. Jason Williams, Sheldon McKay and Matthew Aiello-Lammens taught one at Stony Brook University, and Amanda Charbonneau and Will Pitchers taught a one-day segment at BEACON at Michigan State.

We’re just starting to schedule more for the months ahead. You can see upcoming workshops on our Upcoming Workshops page. If you don’t see a workshop in your region and are interested in having one, become a host and request a workshop! Or contact us for more information.

↧

Software and Data Carpentry Instructor Training Comes to Africa

March 17, 2016, 5:00 pm

by Anelda van der Walt

North-West University eResearch, UCT eResearch, and Talarify are excited to announce that a Software & Data Carpentry Instructor Training event will take place in Potchefstroom, North-West Province, South Africa from 17 - 20 April 2016.

The lead trainer will be Aleksandra Pawlik, a Data Carpentry Steering Committee member. She will be joined by several of the more experienced South African instructors, who will provide additional support working with the trainee instructors.

In line with the approach taken previously by Belinda Weaver to help new instructors through the pipeline, this workshop will form part of a larger 12-month programme to help new instructors truly integrate into the community. The programme will include supporting instructor trainees to: complete the training after the workshop; run their first workshop at their home institution; and set up and run a user group or Mozilla Science Study Group to support participants from their workshop after the event.

In 2017 the hosts of the Instructor Training in Potchefstroom, led by Anelda van der Walt, aim to bring the newly qualified instructors together again, along with the two or three most active community members from their study groups, to share experiences and develop proposals for future initiatives.

The instructor training workshop will run over two and a half days. The last day will be used to introduce the concept of user groups and communities and expose participants to the Mozilla Science Lab Study Group Handbook and other useful resources that could be used to help set up and run these community events. The workshop will also include feedback from Maia Lesosky, who started the Cape R User Group, and from members of the NWU Genomics Hacky Hour Study Group to provide real-life anecdotes.

To ensure a transparent process is followed for selection of candidates the hosts have developed a rubric which will be used to score applications based on requirements set out in the original advertisement. An independent selection committee consisting of two international Software/Data Carpentry community members and four South African instructors will score the candidates. We hope to attract at least 50% women and other gender participants for the event.

For more information about the workshop please visit the NWU eResearch website.

If you’d like to learn more about the extended 12-month programme, please contact Anelda van der Walt.

↧

Hello, Spatio-temporal Data Carpentry

March 27, 2016, 5:00 pm

By: Leah Wasser
NEON Supervising Scientist, Education & Public Engagement
@leahawasser

I am sitting here in my hotel room in Oslo, Norway, floating on a high from the past few days of Carpentry workshops. We taught the NEON / Data Carpentry spatio-temporal Carpentry lessons for the first time this week at the University of Oslo in Norway. These lessons are the result of a unique collaboration between the National Ecological Observatory Network (NEON) and Data Carpentry. I thought I’d take some time to share my experiences with creating and teaching the lessons, while letting the community know that this great new resource is available for both workshops and self-paced learning!

About the Data

You may be surprised by this, but Data Carpentry workshops are about the data. I know. You are thinking, “no way”.

For this workshop, we pulled together a dataset that is optimal for following a pre-determined spatio-temporal data story. Our intended learner is interested in exploring the science theme of phenology over several study sites, using a suite of heterogeneous data, including my favorite type of data: remote sensing! The data required to explore phenology across sites include:

Micro-meteorology data derived from high-frequency sensors of temperature, precipitation, PAR, etc. (text time series format).
Landsat Remote Sensing derived time-series data (raster format).
Vector data for the creation of study area base maps and for extracting descriptive statistics from the remote sensing data.
Lidar raster data used to characterize vegetation at the study sites.

This was a cool set of data to pull together - it has so much lesson-building potential goodness packed into it. But assembling it was no trivial task: it took time to find, collate, organize, clean and subset everything. Some things we considered included:

Data that allowed for both spatial and temporal analysis.
Data that could be cleaned, adjusted, and manipulated to illustrate potential roadblocks during the workshop, and that we could work through together as a group. For instance, data that are in different coordinate reference systems that will cause problems for learners when plotting and analyzing values.
Data that were small enough to be efficient in a workshop, but large enough to demonstrate real world applied issues.
Data that allowed for self-paced challenge activities where the learner practices skills taught in the workshop. In our case, this manifested in a parallel dataset for another field site which also allowed for some cool comparisons.
Data that were heterogeneous enough to simulate real world experiences while being similar enough to focus learning/data literacy concepts.
Data that were freely available.

Teaching Nuggets - Data Management / Organization Best Practices

The teaching data subset described above is saved on figshare to provide version control as we append and modify the subset over time. The structure of the data as saved demonstrates data management and organization practices which we pointed out as we taught, such as:

Organizing data into folders with years and site locations clearly understood and human readable.
Using consistent directory structures and file naming across locations and time periods.

All of these tantalizing nuggets of data-management insight might be boring if taught on their own, but integrating them throughout the workshop in a consistent way created steady reinforcement of best practices. It’s kind of like those subtle product placement plugs that are embedded into our favorite TV shows and movies, but better, because these are useful best practices that help learners store and work with data more efficiently.

Lesson learned: make time for creating a dataset in your lesson development schedule. It will pay off in flexibility associated with lesson development.

Lessons by Hackathon

The spatio-temporal lessons were begun via a hackathon, which was a fun, creative event filled with fantastic discussion and wonderful community input. And snacks. Snacks are key.

In retrospect, however, I would adjust our workflow. Rather than a hackathon early on, I would bring together a small group (3-4 people) of experts in both the topic area and high level thinking associated with lesson development, to build the initial lesson shell and flow. I would then hold the hackathon after to both test the lessons and get explicit and focused community input on existing, structured (but not complete) lessons. This hackathon could serve a dual role of helping familiarize a group of instructors with teaching the materials.

Snacks would still be involved.

Warning: Learning Overload: Code and Data Literacy in One Workshop

Spatio-temporal data are complex to teach in a tool like R or Python. We have to couple advanced data literacy concepts like coordinate reference systems, spatial extents, data resolution, and missing data values with R programming concepts, including working with spatial objects which are heterogeneous in structure (such as slots containing text strings, data.frames, and embedded metadata). It turns out that looking at the structure of a SpatialPointsDataFrame is a lot like teaching HDF5. Proceed with caution to avoid glazed participant eyes!

The Tastes Great, Less Filling Model in Action

Data Carpentry currently relies on two-day workshops. However, 3 or 4 days would be ideal to teach the material well and keep the pace slow and the discussion rich. We received this feedback during the workshop, and I agree that another day would be ideal.

The most powerful learning moments were those fueled by a mistake or odd result in code implementation and the followup discussion. These learning moments “taste great” to both participants and instructors.

Complex Concepts Become Familiar - Not Scary

To address some of the key (complex) data literacy concepts that are often not fully understood by even seasoned users of GUI-based GIS tools (e.g., ArcGIS or QGIS), we covered key topics several times. R forces a user to understand these concepts on at least a basic level.

For instance, our data management section included a lesson on coordinate reference systems (CRS). It can optionally be taught with code, but it works better as an interactive group discussion fueled by R-generated maps of the globe in various geographic and projected CRSes. In parts two and three of the workshop, participants encountered and had to deal with different data in different coordinate reference systems. By the end of the workshop, the concept became familiar rather than foreign and scary. We hope.
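
To make the difference between geographic and projected coordinates concrete, here is a minimal sketch. It is written in Python rather than the R used in the lessons, and it is not taken from the lesson materials. It applies the standard spherical Web Mercator formula to an approximate longitude and latitude for Oslo; the function name and the example coordinates are assumptions chosen purely for illustration.

```python
import math

# Sphere radius in metres used by the spherical Web Mercator projection
# (equal to the WGS84 semi-major axis).
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project geographic coordinates (degrees) to Web Mercator (metres)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


# Approximate location of Oslo, used only as an illustration.
oslo_lon, oslo_lat = 10.75, 59.91

x, y = lonlat_to_web_mercator(oslo_lon, oslo_lat)
print(f"Geographic (degrees):  lon={oslo_lon}, lat={oslo_lat}")
print(f"Web Mercator (metres): x={x:.0f}, y={y:.0f}")
```

The same point is described by two small numbers in degrees in one system and by values of over a million metres in the other, which is why layers stored in different CRSes will not line up on a plot until one of them is reprojected.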

Timing is Everything - Make Your Own (Teaching) Adventure

During the workshop, we also were asked to cut the spatial workshop content short on both days. Luckily, I read my share of “choose your own adventure” books as a child, and thus I am well versed in the art of choosing a different path.

I instituted a “make your own adventure” approach where I read the pace of the learners and adjusted lesson content as we went, skipping sections and reintegrating key concepts that I felt were important to cover. This worked well, but may be difficult to document and thus difficult for an instructor new to these materials to implement.

Note: there were no dragons associated with this adventure’s ending but participants did learn some things about time series raster data in R, which was equally adventurous.

One approach that could account for this is to break up the material into discrete subsections that would allow instructors to mix and match topics depending upon time and audience needs or requests. The topics may look like this:

Spatial data management
Intro to vector data in R - understand, import, plot and reproject vector data
Intro to raster data in R - understand structure of, import, plot and reproject raster data in R
Raster/vector data analysis in R - geoprocessing tasks, crop rasters & extract raster values in R

The lessons might be a bit shorter and more focused than they are currently, which would make them easier to piece together.

2 Screens = Dark Chocolate of Workshops

No joke: if you have a second projector, use it. We have lots of descriptive graphics in our lessons that are carefully designed to help us explain key data literacy concepts. During the workshop, we’d leave the lesson up on the screen and refer to those graphics in between coding.

Awesomeness.

We’d then put the challenge activity up on the screen and let the learners go to town, coding on their own.

Double awesomeness. Do it.

Automated Workflow Lessons

One of the largest benefits of moving from a GUI-based GIS approach to a coding approach is automation. A few lessons demonstrating the power of loops and functions to automate large processing workflows would be extremely beneficial to many users given real-world applications. Given the make your own adventure track I was placed into early on, I demonstrated a bit of automation using loops on the fly during the workshop. I also provided a verbal outline of how a participant might move forward with automating our entire workshop workflow to support working with many sites’ worth of data (2, 3, 10, 20, 60??!) over many years.
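
As a rough illustration of that verbal outline, the sketch below loops one processing function over every site and year. It is a sketch in Python, not the R code from the workshop, and everything specific in it is an assumption made up for the example: the site codes, the directory layout, the file names, the `temp_c` column and the placeholder temperature values.

```python
import csv
import tempfile
from pathlib import Path


def data_path(root, site, year):
    """Build a file path from a consistent (hypothetical) naming scheme."""
    return Path(root) / site / str(year) / f"{site}_{year}_temperature.csv"


def mean_temperature(path):
    """Return the mean of the 'temp_c' column in one CSV file."""
    with open(path, newline="") as handle:
        values = [float(row["temp_c"]) for row in csv.DictReader(handle)]
    return sum(values) / len(values)


def summarise(root, sites, years):
    """Apply the same processing step to every site-year that has a file."""
    results = {}
    for site in sites:
        for year in years:
            path = data_path(root, site, year)
            if path.exists():  # skip site-years with no data
                results[(site, year)] = mean_temperature(path)
    return results


# Made-up demonstration data: write two placeholder files, then process them.
placeholder_files = [
    ("SITE_A", 2014, [1.0, 3.0]),
    ("SITE_B", 2015, [10.0, 20.0]),
]
with tempfile.TemporaryDirectory() as root:
    for site, year, temps in placeholder_files:
        path = data_path(root, site, year)
        path.parent.mkdir(parents=True)
        path.write_text("temp_c\n" + "\n".join(str(t) for t in temps) + "\n")
    summary = summarise(root, ["SITE_A", "SITE_B"], [2014, 2015])

print(summary)
```

Because the paths are generated from one naming scheme, going from 2 sites to 60 only means passing a longer list of sites, which is where the consistent directory structure and file naming mentioned earlier pay off.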

In my perfect reality, it would be nice to create lessons that participants could work through on their own or that could be taught on an optional day 3 or 4 or at a followup intermediate workshop. Anyone game to work on this with me?

Next Steps

I am excited to teach the spatial lessons again in April in Denver, CO for the USGS. I made some edits as we taught and have many more ideas in mind to improve the lessons! I look forward to other instructors digging in and giving us feedback or submitting PRs with updates and improvements. Please get in touch if you’re interested in joining the group that is helping with the lessons - we need more input and help.

Want to Organize a Spatio-temporal Data Workshop Near You? Or Teach One?

Please get in touch with Data Carpentry or request a workshop. All of the material can be found online at:

Raster data in R
Vector data in R
Data management section (under development): http://neon-workwithdata.github.io/NEON-R-Spatio-Temporal-Data-and-Management-Intro/

Note: we will be moving and restructuring these in the near future.

Also check out our lessons on Time Series Data in R:

- Text formatted time series data in R

A Personal Note

I’d like to note that these lessons would not be possible without the help and support of all of the hackathon participants. Thank you all for your time in pulling the lessons together and providing feedback!

One last note: I’m so happy to be a part of the Carpentry community! I think it’s a fantastically positive group and look forward to teaching with other instructors in the future.

↧

Hiring a Deputy Director of Assessment

April 27, 2016, 5:00 pm

Data Carpentry is hiring!

Data Carpentry seeks to hire a full-time staff member to direct its assessment activities. This person will design, implement, monitor, analyse, and report on a comprehensive system of metrics to help the Data Carpentry project and its sibling organization, Software Carpentry, evaluate the impact and effectiveness of the training they offer, to both learners and instructors.

As the Deputy Director of Assessment, you will have primary responsibility for developing methods and standards for the evaluation of all aspects of Data Carpentry’s training including relevance of curriculum, learning experience, long-term adoption of tools and skills and impact on productivity and reproducibility. You will also work with the Software Carpentry Foundation to build evaluation of the instructor development program, including effectiveness of instructor training and mentorship and the longer term impacts of instructor training on career development for instructors. You will also have the opportunity to collaborate with the training coordinators from related organizations to coordinate strategies and initiatives.

For details, including a full job description and the application procedure, please see the Data Carpentry jobs page.

↧

Announcing Partnerships

May 1, 2016, 5:00 pm

We’ve been hearing from organizations that are interested in building local capacity for training and in being able to run both Data and Software Carpentry workshops. We are excited to announce that Data Carpentry and Software Carpentry are now offering joint partnerships! These partnerships will give member organizations the benefits of running workshops from either the Software Carpentry or Data Carpentry community. At the Silver and above tiers there will also be instructor training and capacity building services provided.

Partnership Information

There are four tiers of Partnerships: Bronze, Silver, Gold and Platinum.

We wanted to provide opportunities for organizations that want to run multiple workshops but aren’t currently planning to train instructors (Bronze), and to help organizations build local capacity with instructor training, coordinated workshops and self-organized workshops (Silver, Gold). There is also a flexible tier for organizations that are advancing beyond capacity building and on to sustainment and wide adoption of our methods across disciplines (Platinum).

In all Partnerships, some coordinated workshops are included, so that organizations are freely able (with a small travel budget) to bring in outside instructors to help mentor new instructors and continue to encourage cross-connections of instructors across organizational boundaries. Also, all Partner organizations can run as many self-organized workshops as they like.

All currently in-place partnerships with the Software Carpentry Foundation will be grandfathered into a joint partnership consistent with their current contract until the current partnership expires, at which time they can select to have a joint partnership or a standalone Software Carpentry or Data Carpentry partnership as they choose.

Interested in a partnership or want more information? Please get in touch!

↧

A Welcoming Community

May 8, 2016, 5:00 pm

The amazing Software and Data Carpentry community of instructors and learners is the foundation of our organizations. We have more than 500 instructors from 30 countries and have had over 20,000 learners in our workshops.

Software and Data Carpentry are community-driven organizations. We value the involvement of everyone in this community - learners, instructors, hosts, developers, steering committee members, and staff. We are committed to creating a friendly and respectful place for learning, teaching and contributing. All participants in Software and Data Carpentry events or communications are expected to show respect and courtesy to others.

Core to our organizations is creating a friendly and welcoming community. Therefore, we would like to reiterate that anyone participating in Software and Data Carpentry activities must comply with our Code of Conduct. This code of conduct applies to all spaces managed by Software and Data Carpentry, including, but not limited to, workshops, email lists, and online forums.

We are so fortunate to have such a strong and supportive community of contributors, instructors, and learners and we are committed to supporting and maintaining that community!

↧

Welcoming our new Associate Director

May 9, 2016, 5:00 pm

We are excited to announce that Dr. Erin Becker has accepted the position of Associate Director for Data Carpentry! Erin did her PhD in computational genomics and her postdoc at the Center for Educational Effectiveness at UC Davis. Her postdoc focused on bringing evidence-based teaching strategies to teaching assistants, and she developed and implemented training programs, supervised and mentored instructors and assessed the effectiveness of training practices. Her computational background and experience with educational pedagogy and communities of instructors are skills and perspective she will bring to the Data Carpentry community, and we’re delighted to have her join the team!

Erin will lead Data Carpentry’s community engagement activities, sustainably growing a strong and supportive volunteer community of contributors, instructors, and learners. Her focus will be on improving communications, working with the Software Carpentry Foundation on the instructor training and mentorship program and increasing opportunities for learners.

Please join us in welcoming Erin! She is @erinsbecker on Twitter and ebecker@datacarpentry.org on email.

Message from Erin Becker

I’m very excited to be joining the Data Carpentry team as Associate Director. I come to Data Carpentry from the University of California, Davis after completing my PhD in Microbiology and a postdoc in biology education research.

My postdoctoral work focused on understanding how to help both novice and experienced instructors effectively use evidence-based teaching practices. This work has helped me to appreciate the variety of motivations that bring people to teaching and the diversity of beliefs about teaching and learning that shape instructors’ behaviors. I look forward to bringing this experience to my work with Data Carpentry as I help direct the instructor training program and serve as a mentor for workshop instructors.

Data Carpentry’s goal of helping researchers develop the ability and self-confidence to conduct computational data analyses resonates strongly with me from my own experience struggling to self-teach computational skills in graduate school.

When I first started my doctoral program, I had no intention of becoming a computational biologist and possessed barely basic computer literacy skills. A few years later, I found myself pursuing a thesis project in comparative genomics. I was lucky enough to have a supportive mentor who provided me with space to learn and believed in my ability, despite my own initial feelings of incompetence.

My desire to bring this guidance and supportive environment to others, so that they can be successful in their own long-term learning, has led me to Data Carpentry. I look forward to working with the community to help grow our ability to support researchers in becoming confident, capable data scientists.

↧

R Instructor Training

May 16, 2016, 5:00 pm

Thanks to generous sponsorship from the R Consortium, Software Carpentry is running a two-day R instructor training class in Cambridge, UK, on September 19-20, 2016. If you are active in the R and/or Software and Data Carpentry communities, and wish to take part in this training, please fill in this application form. We will select applicants, and notify everyone who applied, by June 30, 2016; those who are selected will be responsible for their own travel and accommodation. If you have any questions, please mail training@software-carpentry.org.

Please note that as a condition of taking this training:

You are required to abide by Software Carpentry’s code of conduct, which can be found at http://software-carpentry.org/conduct/.
You must complete three short tasks after the course in order to complete certification. The tasks are described at http://swcarpentry.github.io/instructor-training/checkout/, and take a total of approximately 2 hours.
You are expected to teach at a Software Carpentry or Data Carpentry workshop within 12 months of the course.

For more information on Software and Data Carpentry instructor training, please see http://swcarpentry.github.io/instructor-training/.

↧

Instructor and trainee involvement

May 18, 2016, 5:00 pm

This analysis is meant to answer questions about two issues: how many certified instructors have not yet taught, and how many trainees have not completed certification.

Analysis and Data Files

The RMarkdown that generated this document: Completion_rates.Rmd
The data: In the Data Carpentry metrics repo the files instructor_data_5_17_16_no_ids.csv and trainee_data_5_18_16_no_ids.csv

Questions

Questions to be answered include:

What percent of fully certified instructors haven’t taught?
- Overall
- Only those > 1 year past training
What percent of one-time instructors have only taught at their home institution?
What percent of trainees haven’t finished checkout within 90 days of training?
- Including online trainees
- Excluding online trainees

Data was collected from the AMY database (instructor completion) and a Google sheet (trainee completion) by Greg Wilson on 5/17/16.

Note: This data cannot answer the second question (what percent of one-time instructors have only taught at their home institution).

Definitions:
Trainee - person who has gone through instructor training but may not have completed checkout.
Instructor - person who has gone through both instructor training and checkout.

Trainee completion rate

Looking only at training sessions from at least 90 days ago:
- overall completion 54.73%
- in person completion 56.21%
- online completion 51.35%. (Two events, 63% and 31%).

Most (10/14) in-person events have a >65% completion rate. Some (Arlington, OK, Melbourne, Florida) are much lower.

Takeaway: Overall, online sessions didn’t have a noticeably lower completion rate than in-person sessions, but this appears to be due to a few in-person sessions having very low completion rates.

Some other events do not look on track to meet normal completion rates (e.g. UCDavis - 84 days, 39%; UW - 68 days, 14%). More follow-up with these participants is likely needed.

It is unclear whether these abnormal rates are due to issues with the local community, issues with how the training session went, or some other factor.

Summary of completion rates per event:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   22.22   38.00   70.72   66.73   89.28  100.00

Note that mean completion rate for individual events is not the same as mean overall completion rate, as the number of participants per event varies.
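
To make the distinction concrete, here is a minimal Python sketch (the original analysis was done in R Markdown). The event sizes and completion counts are hypothetical and chosen only for illustration; they are not taken from the trainee data.

```python
# Hypothetical (trainees, completed) counts for three events.
# These numbers are illustrative only, not the real workshop data.
events = [(10, 10), (40, 16), (50, 30)]


def per_event_rates(events):
    """Completion rate (%) of each event, ignoring event size."""
    return [100 * done / n for n, done in events]


def mean_of_event_rates(events):
    """Unweighted mean of the per-event rates."""
    rates = per_event_rates(events)
    return sum(rates) / len(rates)


def pooled_rate(events):
    """Overall rate (%): all completions divided by all trainees."""
    total = sum(n for n, _ in events)
    done = sum(d for _, d in events)
    return 100 * done / total


print(round(mean_of_event_rates(events), 2))  # small events count as much as large ones
print(round(pooled_rate(events), 2))          # large events dominate
```

In this made-up case the small event with full completion pulls the per-event mean well above the pooled rate, which is the same direction as the gap between the per-event mean and the overall completion rate reported above.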

Instructor teaching rate

What percent of instructors trained over a year ago haven’t yet taught?
Trained over one year ago: 378
Of which, haven’t taught: 70
This is 18.52%.

What percent of total instructors haven’t taught?
Total trained: 673
Haven’t taught: 222
This is 32.99%.

What percent of instructors trained within the past year haven’t yet taught?
Trained within last year: 295
Of which, haven’t taught: 152
This is 51.53%. (But many of these may have been trained very recently.)
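
The three percentages above can be checked directly from the counts. This is a small Python sketch rather than the original R code; the variable and function names are my own.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts quoted in the text above.
trained_over_year, untaught_over_year = 378, 70
trained_within_year, untaught_within_year = 295, 152

# The totals are the sums of the two cohorts.
total_trained = trained_over_year + trained_within_year
total_untaught = untaught_over_year + untaught_within_year

print(percent(untaught_over_year, trained_over_year))      # trained over a year ago
print(percent(total_untaught, total_trained))              # all instructors
print(percent(untaught_within_year, trained_within_year))  # trained within the last year
```

The two cohorts add up to the totals reported (673 trained, 222 who haven’t taught), so the overall figure is a size-weighted blend of the two cohort rates.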

What percent of instructors teach within their first year?
Took longer than one year to teach: 10
Haven’t taught (been over a year since training): 70
Total percent who didn’t teach within first year: 15.36%

What is the distribution of time to first teaching?

Note that many (204) instructors taught their first workshop before they were officially certified.

Excluding them:

Summary of time to teach first workshop (non-retroactive):

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     1.0    42.0    98.0   124.6   175.8   659.0

Are recently trained instructors on track to meet normal teaching rates?
Overall, half of instructors teach within 98 days.

Trained between 95-120 days ago: 71
Of which, haven’t taught: 33
This is 46.48%.

The recent batch of trainees appears to be on track.
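
The reasoning behind this check can be sketched in Python. The assumption, which is mine and simplifies the comparison, is that a cohort about 98 days past training (the median time to first teaching) should have roughly half of its members still untaught, so a cohort at or below 50% untaught counts as on track.

```python
# Median days to first workshop, from the summary above.
MEDIAN_DAYS_TO_FIRST_TEACH = 98


def share_untaught(cohort_size, untaught):
    """Percent of a cohort that has not yet taught, rounded to two decimals."""
    return round(100 * untaught / cohort_size, 2)


def on_track(cohort_size, untaught, expected_untaught_pct=50.0):
    """Assumed rule of thumb: at about the median time to first teaching,
    no more than half of the cohort should still be untaught."""
    return share_untaught(cohort_size, untaught) <= expected_untaught_pct


# Cohort trained 95-120 days ago, counts from the text above.
print(share_untaught(71, 33))
print(on_track(71, 33))
```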

Conclusions

Online training sessions do not appear to have lower completion rates than in-person sessions. Completion rates are quite variable between training sessions, which may indicate that greater follow-up is needed. The overall completion rate is ~55%.

Most instructors (~85%) teach within their first year. About half teach within 100 days. The current group of trainees is on track to meet that target.

↧

How to approach selecting a license for data release

June 12, 2016, 5:00 pm

A question recently came up on the SWC Discuss mailing list about how to select a license for publicly released data. Answering these questions without knowing the life story of the data in question can only be done in vast generalities, but there are some nearly universal issues that everyone should consider.

No person will be able to tell you definitively which license is the best. Your selection will be determined by the kind of data you are using, community standards, and your personal preference. There are pros and cons to permissive versus restrictive licenses, which you will need to evaluate using your own preferences.

The first step is not to select a license

The first step is to determine if you are actually able to release the data in a public repository. You need to determine if the data you are releasing is subject to copyright, contractual, or legal sensitivities. You acquired your data somehow, and that method or the content may have restrictions in place on the redistribution of it. Just because you can access the data source without paying money or logging into a website doesn’t mean that it is public data and you are free to harvest and distribute it.

Some starter questions to ask include:

Did you have permission to gather the data and/or are you abiding by any applicable Terms of Service by gathering, using, or publishing that data?
Are you including data values where entities, such as publishers or users, hold copyright?
Was your access to the original data you’ve processed under a contract that restricts or has stipulations about how derivatives are released?
Does your home institution have policies on how data products and other intellectual property content is released and licensed?

These scenarios will impact your ability to make the data public and which kind of license you can attach to it, which is why I always suggest working through this process in the hypothetical when you begin a project. Data copyright and IP control are thorny issues that, like other copyright domains, vary by country, institution, and year of creation.

Once you have determined that you can make your data public and need to select a license, here are some steps to help guide you through that process.

First, look at your community

What does your normal community of practice release under? You may also be depositing your data into a repository that has some opinions or other stock licenses to choose from. Let this group guide you, if you have one.

Looking to the larger general community of datasets published with DataCite DOIs (think FigShare and Zenodo), most of the declared rights statements are within the Creative Commons family of licenses. That doesn’t mean that Creative Commons is the best for data, but it is one of the most recognized license systems out there.

Second, select a license that you understand

When you select a license you (and your data users!) should understand the terms for release, access, reuse, and republication that the license grants. Creative Commons and Open Data Commons have a serious advantage here because they each focus on making ‘human readable’ versions to help authors make informed selections, but many other licensing schemes (particularly in the software communities) have other canonical and well-understood license types.

Selecting the individual license type you’d like to release under means that you need to carefully consider how restrictive or permissive you’d like to be. Generally, options like a public domain release or a simple attribution requirement are recommended for data. Data copyright is a complex issue that differs by country, state, and age of the content, with no brief statement doing it justice. The legal implications of data licenses are also being sorted out and differ from country to country. While it is true that facts and other such data points cannot be copyrighted, the assembly and selection of datasets can be (depending on the country). Given the international audience of this blog, I find it best to leave each reader to investigate the applicable copyright law. Be sure to read about the impact of attribution stacking if you select an attribution requirement license or anything more restrictive.

Do not be afraid of selecting a public domain license, such as CC0. This declaration does not release future users of the data from the scholarly expectation of citation; it only removes the legal requirement to attribute. You may still include a request to be cited, along with suggested citation information, with your data deposit in most data repositories.

Third, declare the license

There isn’t much to declaring the license other than indicating the information somewhere prominently where the data is located and/or even within the data files. Most data repositories have an option to select the license you’d like to release the data under, which is the most minimal way of making a declaration.
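
For data kept outside a repository, one possible way to declare the license alongside the files is a small metadata "sidecar" file. The sketch below is only an illustration under my own assumptions: the function names, the `.metadata.json` suffix, and the field names are hypothetical and not a standard any repository requires. The CC0 identifier and URL are the ones published by SPDX and Creative Commons.

```python
import json
from pathlib import Path

# CC0 details: SPDX identifier and the Creative Commons deed URL.
CC0 = {
    "id": "CC0-1.0",
    "title": "Creative Commons Zero v1.0 Universal",
    "url": "https://creativecommons.org/publicdomain/zero/1.0/",
}


def build_license_record(data_filename, license_info, citation_request=""):
    """Build a metadata record naming the data file and its license.

    The field names here are hypothetical, chosen for readability.
    """
    record = {"file": data_filename, "license": dict(license_info)}
    if citation_request:
        # A citation request is a scholarly courtesy, not a legal term.
        record["citation_request"] = citation_request
    return record


def write_license_sidecar(data_file, license_info, citation_request=""):
    """Write the record next to the data file as '<name>.metadata.json'."""
    data_path = Path(data_file)
    sidecar = data_path.with_name(data_path.name + ".metadata.json")
    record = build_license_record(data_path.name, license_info, citation_request)
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


# Preview the record without touching the filesystem.
example = build_license_record(
    "survey_results.csv", CC0, "Please cite this dataset if you use it."
)
print(json.dumps(example, indent=2))
```

If your repository offers a license selector, use that as well; a sidecar file simply keeps the declaration attached when the data is copied elsewhere.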

Where to get help

The UK’s Digital Curation Centre has a lengthy guide on selecting licenses (http://www.dcc.ac.uk/resources/how-guides/license-research-data), which you can consult for more information on specific licenses. However, I cannot suggest strongly enough that you make connections to your local data services groups at your university or institute. You may also need to speak to multiple people in order to answer all the questions posed in the previous sections, but having a point of contact for a consultation or other data services can be an important starting point. While this list is geared toward larger universities, and people in these positions may have a variety of job titles, you can look for:

a research data service unit, scholarly publishing commons, or copyright librarian within your university’s library system.
research support staff within your research unit, institute, department, or college.
your institutional review board if you are dealing with human subject data.
your office of technology transfer, if you are dealing with data that may be commercialized or the basis of a patent.

↧

A Roadmap for Lesson Development

July 18, 2016, 5:00 pm

Our Lessons Roadmap

Development of domain-specific lessons in data literacy is part of the Data Carpentry mission. Data types, language (both computer and vocabulary), the types of questions that people ask, and underlying assumptions differ between fields and communities. Therefore when learning about data analysis, it’s easiest to translate what you’ve learned to your own research when you’re being taught from the perspective of your own domain. This means that we are committed to developing lessons in new fields, with a variety of data types and on new topics.

We are excited about the tremendous interest in lesson development across a huge breadth of domains, but we need to ensure that the pace of new lesson development doesn’t exceed what we can effectively steward. As a part of our Gordon and Betty Moore Foundation grant, we have committed to development of lessons in the life and physical sciences. Over the last year we’ve expanded beyond our original domain of ecology to now have lessons on genomics and lessons on spatial data, collaboratively developed with NEON, for working with raster data and vector data. In the development of both of those curricula we’ve worked with the community to determine the core concepts, run hackathons, write lessons and pilot the first teaching workshops. As a result we’ve learned a lot about the process and used this to develop a roadmap of how we want to take on creating new lessons moving forward. This includes how to determine the lessons to develop, the resources needed and the stages involved. Although Software Carpentry isn’t focused on new domains, they, too, are developing lessons on new topics. We have therefore developed this roadmap jointly, and it is being adopted by both organizations.

We’ve posted the lessons roadmap here to provide an overview of the process and show how to be involved! Please also see this post on the Software Carpentry website.

↧

Reopening Instructor Training

July 24, 2016, 5:00 pm

For the last ten months, the Software Carpentry Foundation has worked toward three goals for its instructor training program:

Make the content more relevant.
Increase the number of people able to deliver instructor training.
Find a format that meets everyone’s needs in a sustainable way.

They have made a lot of progress on all three, and are therefore now able to offer instructor training once again to people who aren’t affiliated with our partner organizations, but would like to teach Data Carpentry, Software Carpentry, or both (as the course is shared by both organizations). If you wish to apply to take part in one of the two open-enrollment classes they will offer this fall, please fill in the form at:

https://amy.software-carpentry.org/workshops/request_training/

to tell them about yourself, what excites you about teaching, and how Software and Data Carpentry can help in your community. They will notify applicants as spaces become available. If you have any questions, please mail training@software-carpentry.org.

If you would like to accelerate the process, check out our Partnership program. Organizational partners make ongoing commitments to supporting our organization and are prioritized for instructor training. If you need help making the case at your organization, please mail us: we’d be happy to chat.

Please note that as a condition of taking this training, you must:

abide by our code of conduct, which can be found at http://datacarpentry.org/code-of-conduct/ and http://software-carpentry.org/conduct/,
agree to teach at a Data Carpentry or Software Carpentry workshop within 12 months of the course, and
complete three short tasks after the course in order to complete certification. The tasks take a total of approximately 8-10 hours, and are described at http://swcarpentry.github.io/instructor-training/checkout/.

For more information on instructor training, please see the course material at:

http://swcarpentry.github.io/instructor-training/

↧

Survey on workshops for for-profit organizations

July 24, 2016, 5:00 pm

Data Carpentry’s vision is “building communities teaching universal data literacy”. All our activities are focused on working towards this vision.

One of our signature activities is running and teaching workshops. Except for two, we have so far run such workshops only for non-profit organizations - universities, non-government non-profits and government organizations. We have also been assessing the idea of running workshops for for-profit organizations. One goal would be to raise additional revenue to subsidize workshop fees at non-profit institutions and to support long-term sustainability for Data Carpentry. Another is to provide Data Carpentry training and perspectives to people conducting research in industry. We’re still evaluating this idea, and before making any long-term policy we wanted to run pilot workshops and survey the instructor community on their thoughts about workshops for for-profit organizations.

We have completed the workshop pilots, running workshops for two for-profit companies. We charged these companies $5000 per workshop, four times the $1250 fee at that time. We conducted our standard learner surveys and solicited feedback from the instructors who taught these workshops and the hosts who ran them. Overall these workshops were well received by learners, instructors and hosts. Learners really appreciated the training, as they struggle with many of the same data challenges as those faced by their peers in not-for-profit organizations. Instructors appreciated the opportunity to visit a company and make connections outside of the academic environment. Hosts were happy with the materials and the enthusiasm and expertise the instructors brought to the training.

Software Carpentry has already made the decision to teach for for-profit organizations. While Software and Data Carpentry generally coordinate policies, teaching at for-profit organizations would represent a shift in the community that Data Carpentry has traditionally served. It is the instructors who make these workshops possible, so we need your feedback!

Instructors contribute to the community in numerous ways, including teaching workshops, serving on subcommittees, mentoring new instructors, and maintaining lessons. Another option could be volunteering their time to teach at workshops that bring in additional revenue. However, is it fair to generate this additional revenue using volunteer effort? Or is it fair only if the added revenue is tagged for specific activities, other than simply improving the prospects for Data Carpentry’s longevity (for example, by earmarking it for fee-waivers for underserved communities)? Do these workshops align with our goal of training researchers to work more effectively and reproducibly with data? Should we be running workshops for for-profit organizations at all? Are there potential benefits to instructors to teach at for-profit organizations? If so, what are they? As instructors get to choose where and when to teach, how many instructors would be interested in teaching at for-profit institutions? These are all important questions as we think about these workshops going forward.

If you currently are or anticipate being involved in Data Carpentry in any capacity, we want to learn more about your interest and support for teaching for for-profit organizations! Please fill out this survey and help guide Data Carpentry.

[-- Data Carpentry community survey about workshops for for-profit organizations --](http://tinyurl.com/datacarpentry-survey1)

↧

Announcing our new Deputy Director of Assessment

August 1, 2016, 5:00 pm

We are very happy to announce that Dr. Kari L. Jordan will be joining us as the new Deputy Director of Assessment! Kari is coming from Embry-Riddle Aeronautical University in Daytona Beach, FL where her postdoctoral research was focused on understanding the factors that influence faculty adoption of evidence-based instructional practices. She is also adjunct faculty at Embry-Riddle and teaches a Graphical Communications course. Prior to her postdoc, Kari completed a PhD in STEM Education at Ohio State. Her thesis focused on strategies for improving self-efficacy and sense of belonging in first-year engineering students. She holds a BS and MS in mechanical engineering from Michigan Tech. Kari’s experiences in educational research are closely aligned with Data Carpentry’s efforts to create a welcoming and inclusive workshop environment that enhances learners’ confidence and helps them build identities as data-driven researchers. As the Deputy Director of Assessment, Kari will be focusing on assessment of workshop learning outcomes, including both skill-based and identity-based outcomes.

In addition to her work in education research, Kari brings a depth of experience working with and promoting diversity in the science and technology community. She has served on the board of the National Society of Black Engineers and the Ohio Diversity Council. At both Michigan Tech and Ohio State she worked with campus groups focused on supporting traditionally underrepresented students in obtaining degrees in STEM. In her free time, Kari teaches Zumba!

We are excited to have Kari on our team! Kari’s first day will be August 22nd. Please join us in welcoming her! She is @drkariljordan on Twitter and can be reached by email at kjordan@datacarpentry.org.

Message from Kari L. Jordan

The opportunity to work with Data Carpentry as Deputy Director of Assessment is one that I truly value. I am excited to be a part of a community of practitioners and learners. We have work to do!

As a Post-Doctoral Research Associate I assessed the implementation of evidence-based instructional practices (EBIPs) among faculty in introductory engineering, mathematics, and physical sciences courses. My work as a post-doc has prepared me to lead assessment for Data Carpentry, in particular, the organization’s goal to motivate participants to engage in further self-directed learning of computational and data analytic skills.

My passion is inclusivity in all levels and forms of education. As such, I look forward to being a part of Data Carpentry’s continuous growth and development as we expand our workshop offerings, and working with an amazingly talented team. Thank you for having me.

↧

Code of Conduct and Call for Volunteers for Policy Subcommittee

August 7, 2016, 5:00 pm

The Carpentries are proud to share a common Code of Conduct (CoC), which outlines acceptable standards of behavior for our community members and those interacting with the Carpentries at in-person events and online spaces. Historically, however, we have not had an official process for reporting potential Code of Conduct violations or for adjudication and resolution of reported incidents. Thanks to input from our community, we recognize that defining these procedures is an important step in ensuring that any such issues are dealt with transparently in order to keep our community welcoming and safe for all.

Members of the Carpentry Steering Committees and staff have been working on defining these policies, and have put together a Reporting Guide and Enforcement Manual for handling potential CoC violations. These documents are based on valuable insights gained from previous community discussions of this issue (especially here and here). While we have made every effort to represent the views voiced in these discussions, ultimately, the CoC impacts every member of our community. To ensure that these policies meet the community’s needs, we would like your input.

The Carpentries are convening a joint Policy Subcommittee. Members of this group will be responsible for serving as advocates for the CoC, moderating Carpentry listservs, adjudicating reported CoC violations and developing and enforcing related policy as needed. If you are interested in serving the Carpentry community as a Policy Subcommittee member, please use this form to tell us about yourself, your involvement with the Carpentry community, and what valuable skills and perspectives you would bring to the Policy group. Applications will be open until Monday, August 15th at 5pm Pacific (midnight UTC at the end of Monday, i.e. 00:00 UTC on Tuesday, August 16th).

Regardless of your interest in joining the Policy Subcommittee, we invite all of our community members to give us feedback on the CoC Reporting Guide and Enforcement Manual. These documents can be found here as a Google Doc. The finalized policy will take into account community comments, so please add your voice to the discussion! If, for any reason, you would be more comfortable communicating your comments privately, please feel free to email DC’s Associate Director Erin Becker (ebecker@datacarpentry.org) and I will ensure that your voice is represented in the discussion.

The upcoming Lab Meeting will include a discussion of these issues. We encourage all community members to attend and share their thoughts. The Lab Meeting will be held Tuesday, August 16th at 1pm UTC and 10pm UTC.

We greatly appreciate the diverse insights our community members have brought to this discussion so far and look forward to hearing more from you as we continue to engage on this important topic.

↧

Resources for Running Workshops

August 7, 2016, 5:00 pm

A successful Data Carpentry workshop is the result of coordinated effort among many different types of participants, including instructors, helpers, hosts, learners and Data Carpentry staff. Data Carpentry offers two types of workshops - self-organized and centrally-organized. These workshop types differ in terms of instructor training requirements, fee structures, and participant responsibilities - with local hosts and instructors at self-organized workshops taking on administrative responsibilities normally handled by Data Carpentry staff.

Instructors (both new and experienced) and workshop hosts often have questions about their roles in workshop logistics, especially how their responsibilities differ between self-organized and centrally-organized workshops. To help clarify the roles played by the different participants, and the differences between self- and centrally-organized workshops, we’ve put together some resources to guide participants through the workshop organizational process.

These resources are available on our “Host a Workshop” and “Self-organized Workshops” pages and include:

Checklists for:
Email templates for communicating with co-instructors, helpers and learners
An accessibility checklist
A list of necessary equipment and
A troubleshooting page

We want these resources to be as useful as possible to our instructor, helper and workshop host community. If you find that anything is unclear, incomplete, or would like to suggest an additional resource, please email us (ebecker@datacarpentry.org).

↧

September Data Carpentry All-Stars!

August 31, 2016, 5:00 pm

As a community-led organization, Data Carpentry depends on our active and highly-engaged volunteer community to carry out many of our core activities. Volunteer efforts span a wide range of activities, including teaching workshops, developing and maintaining lessons, and training and mentoring new instructors, among many other efforts. Without the commitment of our volunteers, Data Carpentry would never be able to reach such a broad learner base and to make such a huge impact on our learners.

We’d like to make it a habit to actively recognize and publicly acknowledge the great work that is being done by members of our community. Because there are so many ways to contribute to Data Carpentry, we’d like to focus each month on a different way that our volunteers are making an impact. This month, we’d like to start out by showcasing the efforts of our most active instructors.

Every one of these Data Carpentry badged instructors has taught at least three workshops since January 2016, including at least one Data Carpentry workshop. Many are also actively involved in serving the Carpentry community in other ways.

A big thank you to:

Martin Callaghan
Emily Davenport
Westa Domanova
Auriel Fournier
Ivan Gonzalez
Christopher Hamm
Christina Koch
Mateusz Kuzak
Paula Andrea Martinez
François Michonneau
John Moreau
Hani Nakhoul
Lex Nederbragt
Joseph Stachelek
Sarah Stevens
Steve Van Tuyl
Lukas Weber
Jason Williams

On behalf of the Data Carpentry community, and all of our learners, we’d also like to offer a hearty thank you to all of our instructors!

Please join us again in upcoming months to recognize the great work being done by other segments of our community.

Important caveat: If you’re a Data Carpentry instructor who should be listed above, but aren’t, please let us know so we can correct our mistake! We sincerely apologize for any inadvertent omissions.

↧