Channel: Data Carpentry

Launching our New Handbook

As The Carpentries, we’re excited to announce that we have consolidated and updated many of our materials and resources so that they are easier to share online and can serve as a community resource.

Today we are launching an all-new Carpentries Handbook. We will also be tweeting regularly through a new Carpentries Twitter account.

The Carpentries Handbook

We are excited to release our new Carpentries Handbook! Historically, information and resources have been spread across various websites, Google docs, GitHub repos, and more. We now have a one-stop shop that consolidates all these resources. In one place, you can now find information on how to run a workshop, how to develop and maintain lessons, and how to participate in an instructor training event. You’ll also learn about getting the word out about Carpentries activities through our communication channels, and how to get involved in our global community. Many, many thanks to all the community members who helped develop this site.

We welcome everyone’s feedback on this Handbook. Feel free to submit issues or pull requests on this GitHub repo.

The Carpentries Twitter

From now on, we will also be tweeting regularly from our new The Carpentries Twitter account. Data and Software Carpentry-specific messages will still be tweeted from the individual Twitter accounts, and people will most likely tweet the handles of the individual Carpentries when teaching workshops. People are welcome to use the Software Carpentry, Data Carpentry, and The Carpentries Twitter handles in whatever combination suits them.

Please take a look at all our new material and let us know what you think. You can comment via Twitter, Slack, or Facebook, but since issues are less ephemeral than a Tweet, raising an issue or submitting a pull request to the Handbook repo may work best so we can have a public discussion about what still needs doing.


Building Library Carpentry Community and Development

We are excited to announce that Chris Erdmann has been hired as the Library Carpentry Community and Development Director starting May 4, 2018.

Chris has worked in libraries for more than 21 years to integrate data management and workflows in database and library systems. Through training, consulting and tool development to build programs, he has tried to empower people in research and library communities to work effectively with data. Chris received his MLIS at the University of Washington iSchool while working at the University’s Technology Transfer Office, where he helped automate workflows and develop the unit’s web presence and analytics. He spent roughly ten years working alongside astronomers at the European Southern Observatory (ESO) and Harvard-Smithsonian Center for Astrophysics to advance library data-mining and linking services, e.g. ESO Telescope Bibliography. Also during this time, he led an experimental training series called Data Scientist Training for Librarians (DST4L) geared towards teaching librarians data-savvy skills to help transform their library services to meet the needs of their research communities. He recently joined the Library Carpentry governance group.

He is a co-author, with Matt Burton, Liz Lyon, and Bonnie Tijerina, of the recent report Shifting to Data Savvy: The Future of Data Science In Libraries, which highlights Library Carpentry and The Carpentries as a necessary next step for libraries to advance their research services.

Chris will be working with the Library Carpentry community and The Carpentries to start mapping out the infrastructure for growing the community, formalizing lesson development processes, expanding its pool of instructors, and inspiring more instructor trainers to meet the demand for Library Carpentry workshops around the globe and thus reach new regions and communities.

This new position is funded by IMLS and hosted by the University of California Curation Center (UC3), the digital curation program of the California Digital Library (CDL). It is intended to support the work of the Library Carpentry governance committee on streamlining operations with The Carpentries, determining standard curriculum, growing instructor training for librarians and planning for community events like the upcoming Mozilla Sprint to update Library Carpentry materials. Chris will be helping to manage the sprint work in the northern hemisphere.

Chris is excited about advancing the profession and sees the Library Carpentry and The Carpentries communities as the perfect catalyst to do that. He is on Twitter as @libcce, on GitHub and on LinkedIn, and we’re very excited to welcome Chris to this role!

For more information on Library Carpentry, see https://librarycarpentry.github.io and follow @libcarpentry on Twitter. For more information on UC3 and the California Digital Library, see http://uc3.cdlib.org and follow @caldiglib and @UC3CDL on Twitter.

Volunteering for CarpentryCon 2018

The inaugural CarpentryCon is less than 50 days away! The taskforce is diligently working to make sure all the t’s are crossed and the i’s are dotted so that the community enjoys a wonderful un-conference. While doing so, we have realized that we need one more thing…YOU!!!

Have you had a desire to get involved with the planning of CarpentryCon and did not have the time? Or maybe, you felt as though you did not know where you could be most valued? Or maybe, you thought the taskforce had everything under control and did not need your help?

If you had any of those thoughts, I’m happy to tell you they are all misconceptions. While the planning is well underway, there are areas that could use a few additional hands, and we would LOVE for you to get involved! You may only be able to assist for an hour, a day, or possibly the entire conference. No matter how much time you can give, if you want to help, there will be something that could use your assistance!

Here are areas that could use YOU!

  • Pre-Conference Setup
  • Registration
  • Speakers and Workshops
  • Social Media
  • AV
  • Entertainment

There are awesome benefits to becoming a volunteer. Here are just a few:

  • Making an impact on The Carpentries community
  • Networking with The Carpentries community
  • Discounted items
  • Free items

CarpentryCon will be a history-making event for The Carpentries, and we would like as many of our community members as possible to be part of it. If you would like to get involved, please send an email to carpentrycon@carpentries.org for more information.

We look forward to seeing you in Dublin, Ireland!

Updates on the new Geospatial and Social Sciences lessons

In December 2017, we made a call for community members to contribute to two new sets of Data Carpentry lessons, targeted towards researchers working with geospatial data or survey data for the social sciences.

There was overwhelming interest from the community in working to develop and publish these two curricula. In the past few months, six new Maintainers for the Geospatial lessons, and five new Maintainers for the Social Sciences lessons have gone through Maintainer onboarding and begun to work with their lessons. Please join our community in welcoming Chris Prener, Geoff LaFlair, Peter Smyth, Juan Fung, Stephen Childs, Tyson Swetnam, Lauren O’Brien, Janani Selvaraj, and Lachlan Deer as new Maintainers on these lessons (Chris Prener and Juan Fung will serve as Maintainers for both the Geospatial and the Social Sciences lessons), as well as Leah Wasser and Joseph Stachelek, who will be continuing on as Maintainers for the Geospatial lessons.

In addition to new Maintainers, a set of Curriculum Advisors has also been assembled for each of these new curricula. Curriculum Advisors help to provide strategic oversight, vision, and leadership for a particular set of lessons to guide the overall development of the lessons. Please join us in welcoming Arindam Basu, Chris Prener, Geoff LaFlair, Katie Metzler, Rachel Gibson, Reka Solymosi, Peter Smyth, Scott Peterson, and Stephen Childs as the Curriculum Advisory Committee for the Social Sciences lessons and Anne Fouilloux, Arthur Endsley, Chris Prener, Jeff Hollister, Joseph Stachelek, Leah Wasser, Michael Sumner, Michele Tobias, and Stace Maples as the Curriculum Advisory Committee for the Geospatial lessons. Curriculum Advisors meet twice yearly and advise the curriculum’s Maintainers on overall strategy for the lessons. Meeting minutes for Curriculum Advisory meetings are available in the group’s GitHub repo.

Thanks to the phenomenal support from the Maintainers and Curriculum Advisors for these lessons, as well as the support of the Carpentry community during the recent Bug BBQ, these lessons are on track for publication. The Social Sciences lessons are scheduled for release at the end of April and the Geospatial lessons will be complete in June.

We still have work to do before publication! Everyone is invited to help as we enter the final stretch for preparing these lessons for their first official release. If you have a few minutes to spare, head on over to one of the lesson repositories and check out the open issues or review an existing pull request. You can also contact the lesson Maintainers on the SWC Slack channel with specific questions.

Thank you to everyone who has participated in building these lessons up to this point. It has been a fantastic community effort. We’re excited to be releasing these lessons soon so that they can benefit researchers in the social sciences and geospatial communities.

Launching The Carpentries Website

Website Launch

We are excited to announce that The Carpentries website is now live!

The new website celebrates our merged identity as The Carpentries.

The new website gives you easy access to all things ‘Carpentries’, that is, to the information common across the merged organization. There you will find our Code of Conduct, information about instructor training and assessment, a range of shared policies (including our privacy policy), details of staffing and project governance, and a whole lot more.

The existing Data and Software Carpentry sites will remain in place alongside the new site. Since Data and Software Carpentry are ongoing lesson organizations, information related to lessons belongs on those individual sites. We will gradually take down material that now more logically belongs on The Carpentries website.

You may notice that many of the links on The Carpentries website take you directly to The Carpentries Handbook, which we launched last week.

The Handbook has been enthusiastically received by our community. For those who haven’t seen it yet, find it here. The aim of the Handbook is to provide a one-stop shop for people wanting all kinds of Carpentries-related information. Information is being added and updated all the time so please let us know if there is something missing. The Handbook and the website will complement each other to cover all things Carpentries.

Please let us know if there are errors or omissions on our new website. You can raise an issue about the website at this link, or about the Handbook at this link.

The launch of the new website completes our transition to a new, merged, online identity as The Carpentries. Increasingly we will blog as The Carpentries, rather than as Software or Data Carpentry, so be sure to check out our new blog.

We also have our new merged Twitter feed. Follow The Carpentries on Twitter.

An extended Data Carpentry Workshop over 7 weeks instead of 2 days

Background

The UF R Users Group was formed in January 2017, and since then we’ve been running a weekly “UF R Meetup”: each two-hour session consists of a 30- to 60-minute presentation or tutorial followed by an “open lab” session. The meetup is meant to be a casual, informal opportunity to learn as a community and seek face-to-face advice.

By the end of the second semester of running the meetup, we had identified a couple of issues:

  • The majority of the participants were either beginners or completely new to R (and programming in general).
  • As our presentations shifted to cater to new users, it became difficult to engage and entice more advanced programmers.

In addition, our presentations on the basics of R were unstructured and constructed on-the-fly – not the best way to teach and learn R. We felt that these disconnects were making it difficult to establish a sustainable learning community.

In January 2018, we decided to run an introductory workshop series separate from the meetup. The workshop would provide structured lessons on the basics of R and allow the meetup to cover more advanced topics. Luckily for us, The Carpentries already have well-structured lessons for these materials, and we could rely on the strong pool of Carpentry instructors at the University of Florida.

The question then became: “Do we want to run a traditional two-day Carpentry workshop, or try something different?” We already knew that there was interest in regular weekly meetings, and saw potential in giving access to people who could not commit to a full two-day Carpentries workshop, or to people who might need a refresher even though they had taken the two-day workshop. So we decided to run our workshop a bit differently than normal.

Implementation

We used the Data Carpentry in Ecology curriculum as a starting point. This included Data Analysis and Visualization in R, Data Organization in Spreadsheets, and Data Cleaning with OpenRefine. The two-day workshops usually include the Data Management in SQL lesson as well, but we felt it would have been too much for learners to absorb all the SQL concepts in a single two-hour session. Instead, we opted to create some new material centered on the join features in dplyr, which cover very similar concepts. This extended naturally from the dplyr lesson, and we even titled it “Advanced Dataframe Manipulation” to reflect that.

| Week | Lesson |
|:-----|:-------|
| 1 | Intro to R |
| 2 | Data Organization |
| 3 | Starting with Data |
| 4 | Manipulating Dataframes |
| 5 | Visualizing Data |
| 6 | Advanced Dataframe Manipulation |
| 7 | OpenRefine |

Besides that it was run exactly like any other Carpentry workshop. We had different instructors for each lesson, there were helpers available, we created an Etherpad for collaborative note-taking, and used red and green sticky notes for real time feedback. You can view the workshop homepage.
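For readers who have not used joins, the idea behind the “Advanced Dataframe Manipulation” session is that a left join keeps every row of one table and attaches the matching columns from another. dplyr’s `left_join()` and SQL’s `LEFT JOIN` share this concept. The workshop taught it in R with dplyr. The sketch below only illustrates the same semantics in plain Python and is not the lesson material. The `left_join` helper and the table rows are hypothetical.

```python
from collections import defaultdict


def left_join(left, right, key):
    """Keep every row of `left`, adding columns from matching `right` rows.

    Tables are lists of dicts. A left row with several matches is repeated
    once per match; a left row with no match gets None in the columns that
    exist only in `right`. If both tables share a non-key column, the
    right-hand value wins here (dplyr would add .x/.y suffixes instead).
    """
    # Index the right-hand table by key so each lookup is a dict access.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Columns contributed only by the right-hand table.
    right_cols = {col for row in right for col in row if col != key}

    joined = []
    for lrow in left:
        matches = index.get(lrow[key])  # .get() does not create empty entries
        if matches:
            for rrow in matches:
                joined.append({**lrow, **rrow})
        else:
            joined.append({**lrow, **{col: None for col in right_cols}})
    return joined


# Hypothetical rows: one survey record has no matching species entry.
surveys = [{"record_id": 1, "species_id": "AA"}, {"record_id": 2, "species_id": "ZZ"}]
species = [{"species_id": "AA", "genus": "Genus_a"}]
for row in left_join(surveys, species, "species_id"):
    print(row)
```

An inner join would simply drop the unmatched rows. The dictionary index is one design choice among several, and it keeps each lookup cheap regardless of table size.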

How it went

We’ve been a part of many Data Carpentry and Software Carpentry workshops here at the University of Florida, and this one went as well as any of them. Anonymous feedback at the end of each lesson was universally positive, and several participants told us in person how much they enjoyed it.

[Figure: Sticky note feedback]

We capped the extended workshop at 40 participants, and it filled fairly quickly. However, no more than 18 people attended any single session, and attendance dropped over the seven weeks.

[Figure: Attendance]

Several factors likely contributed to this attendance pattern. Attendance is often low the first time a new training opportunity is offered. We also chose not to collect a registration fee, because our group is designed to be an informal alternative to other resources on our campus, including formal courses and traditional Carpentry workshops (on average there are about three Carpentry workshops each semester at UF). The lack of a financial commitment from students may have contributed to the lower attendance. We also found less interest and lower attendance in the non-R lessons, and more interest in the tidyverse-based lessons than in the base R lessons. Scheduling conflicts also arose over the course of the series, and once someone had missed a lesson, they seemed less motivated to continue.

Lastly, we were very interested in whether this schedule format improved access. Anecdotally, several participants told us they preferred a two-hour-a-week workshop over a full two days. In a post-workshop survey, two of the three respondents said they preferred this schedule over a two-day workshop.

Scheduling-wise, the majority of material fit into the two-hour time slots. The exception was the “Manipulating Dataframes” lesson where we did not have time for the very last section. Luckily this fit in nicely with the “Advanced Dataframe Manipulation” lesson to fill the full two hours in week 6.

Lessons Learned

Overall, we feel this elongated workshop was a success and we hope to run similar ones in the future. We are encouraged by the post-survey responses, as well as the anecdotal comments from the attendees. The workshop series also provided a less time- and material-intensive opportunity for newly trained instructors to gain some teaching experience.

There are a few things we may change:

  • Some participants felt confident enough to skip the first few sessions and attended only specific ones, such as Data Manipulation or Visualization. This likely helps explain why no more than 18 of the 40 registrants attended any one session. At the beginning we encouraged people to attend every lesson but did not enforce this. In the future, we would consider session-specific sign-ups, where participants can express interest in any or all of the sessions based on their needs.

  • We found it helpful to do a short recap at the beginning of each session to quickly summarize the primary lessons from the prior week.

  • We collected and re-distributed the post-it notes every week so as not to waste them, though some eventually lost their stickiness. In the end we used up roughly three-quarters of a single stack for each color.

Social Sciences Lessons Published!

We are excited to announce the initial release of a Data Carpentry Social Sciences Curriculum. This is the first Data Carpentry curriculum targeted at researchers outside the life sciences, and it provides an opportunity to reach new communities.

Peter Smyth has assembled the initial content for these lessons with the guidance of Rachel Gibson, Professor of Political Science at the Cathie Marsh Institute of Social Research, University of Manchester, UK. It was polished during the April 2018 Bug BBQ, and the finishing touches were added by the lesson Maintainers in coordination with Carpentries staff.

This curriculum aims to teach skills similar to those covered in the Ecology curriculum, focusing on best practices for working with rectangular, tidy data. It covers data organization in spreadsheets, data cleaning with OpenRefine, and data manipulation and visualization with R. Lessons on SQL and Python are also available but are not part of this initial release.
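In the usual sense of the term, “tidy” rectangular data stores one observation per row and one variable per column. As a rough illustration only, the sketch below reshapes a hypothetical “wide” household table, with one column per livestock type, into tidy “long” rows. The `wide_to_long` helper and all column names are invented for this example and are not taken from the SAFI dataset. In R, the tidyr package provides this reshaping as `pivot_longer()` (formerly `gather()`).

```python
def wide_to_long(rows, id_col, value_cols, names_to="name", values_to="value"):
    """Reshape wide rows into tidy long rows.

    Each input row yields one output row per column in `value_cols`, so every
    output row holds a single observation: (identifier, variable, value).
    """
    tidy = []
    for row in rows:
        for col in value_cols:
            tidy.append({id_col: row[id_col], names_to: col, values_to: row[col]})
    return tidy


# Hypothetical wide table: livestock counts spread across columns.
households = [
    {"household_id": 1, "cows": 2, "goats": 5},
    {"household_id": 2, "cows": 0, "goats": 3},
]
tidy_rows = wide_to_long(
    households, "household_id", ["cows", "goats"], names_to="livestock", values_to="count"
)
for row in tidy_rows:
    print(row)
```

In the long form, questions such as “total goats across households” become a filter on one column followed by a sum. In the wide form, they depend on knowing which column to read.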

As with other materials for Data Carpentry, the same dataset is used across all the lessons. Here, we use a simplified version of a research dataset generated by the SAFI (Studying African Farmer-led Irrigation) research project. This dataset is available on Figshare and is survey data relating to households and agriculture in Tanzania and Mozambique. The survey data was collected through interviews conducted between November 2016 and June 2017 and covered such things as household features (e.g., construction materials used, number of household members), agricultural practices (e.g., water usage), assets (e.g., number and types of livestock) and details about the household members.

If this curriculum were a piece of software, we would say it is in “beta”. The authors of this curriculum have taught it, and it is now ready to be taught by other members of The Carpentries community. We are interested in your feedback to improve it. We want to ensure it meets the needs and matches the skills that Social Scientists want to acquire when working with data. If you are a social scientist (or studying to become one), please review the lessons and provide us with your feedback. If you are interested in teaching one of the first Social Sciences Data Carpentry workshops, let us know by filling out this form.

Geospatial Launch

The long-awaited Data Carpentry curriculum for working with Geospatial data is now ready to teach! As with all our newly developed curricula, these lessons are now in ‘beta’. We are actively promoting workshops and collecting information from those workshops to improve these lessons as they are taught more broadly and in different contexts. We will also be onboarding Instructors to prepare them to teach these new lessons. Keep reading for more details.

So what’s in the material?

This R-based geospatial workshop will introduce project organisation and management for spatial data, cover data structures and storage and transfer formats, teach the creation of summary statistics and publication-quality graphics, and help users work with and plot vector and raster-format spatial data in R. Find more information on the workshop homepage.

Want to teach this material?

We will be onboarding Instructors to prepare them to teach these new lessons. We also want to run some pilot workshops so that we can assess what we have got right, and what might still need some tweaking.

Lesson background

These lessons were initially developed in 2016 through a hackathon held in conjunction with the National Ecological Observatory Network (NEON). Hackathon participants included the following people who became the initial authors of the lessons: Leah A. Wasser - University of Colorado, Megan A. Jones - NEON, Zack Brym - University of Florida, Kristina Riemer - University of Florida, Jason Williams - Cold Spring Harbor Lab, Jeff Hollister - US Environmental Protection Agency, Mike Smorul - SESYNC, Joseph Stachelek - Michigan State University, Marissa Guarinello - NKN/University of Idaho, Jonah Duckles - The Carpentries, Keely Roth - University of California at Davis, Mike Alonzo - NASA Goddard, Ben Best - Duke / UCSB, Matt Kwit - Duke, Tracy Teal - The Carpentries, Kaitlin Stack Whitney - University of Wisconsin-Madison, Dave Roberts - Montana State, Courtney Soderberg - Center for Open Science, Sean Barberie - University of Alaska Fairbanks. The workshop materials were piloted in March 2016, and the lesson release has been much anticipated by Carpentries’ community members. Most of the data used in the workshop has been sourced from NEON (https://www.neonscience.org/). You can see other NEON tutorials for advanced GIS topics here (https://www.neonscience.org/resources/data-tutorials).

Recent community involvement

Recent developments in these materials have been led by a highly active and engaged group of Maintainers (Lachlan Deer, Juan Fung, Lauren O’Brien, Chris Prener, Janani Selvaraj, Joseph Stachelek, Tyson Swetnam, Jane Wyngaard) and Curriculum Advisors (Anne Fouilloux - University of Oslo, Arthur Endsley - University of Michigan, Chris Prener - St Louis University, Jeff Hollister - US Environmental Protection Agency, Joseph Stachelek - Michigan State University, Leah Wasser - University of Colorado, Michael Sumner - Australian Antarctic Data Centre, Michele Tobias - University of California, Davis, Stace Maples - Stanford University).

If you are interested in the direction and decisions the Curriculum Advisors took for the lesson, you can see their minutes. The finalisation of many parts of the material was down to a big burst of work during the April 2018 Bug BBQ. Thanks to all the community members who took part.

Special thanks go to: Lauren O’Brien for re-organizing the Geospatial Project Organization and Management lesson to line up with changes to the rest of the curriculum. Lachlan Deer, Juan Fung, Joseph Stachelek, Anne Fouilloux and Justin Millar for converting all of the episodes to ggplot from base R graphics. Joseph Stachelek for transferring the lessons to the current lesson template. Chris Prener for updating the installation instructions and creating a Docker image for the lessons. Leah A. Wasser and Megan A. Jones for providing an introduction to the data used in the lesson. Michael Culshaw-Maurer, Anne Fouilloux, Michael Heeremans, Megan A. Jones, Natalie Robinson, Joseph Stachelek, Tracy Teal, Michele Tobias, and Leah A. Wasser for teaching pilot workshops. NEON for collecting and sharing the data, organizing and co-hosting the 2016 Hackathon, and providing staff time to produce these lessons.

Teach or host a Geospatial workshop!

Want to get involved with the Geospatial materials? Get badged to teach the Geospatial lessons. Sign up for onboarding using this Etherpad. Onboarding sessions also appear on the Community Calendar. Request a Geospatial beta pilot workshop at your institution using this form. Self-organise a Geospatial beta pilot workshop at your institution. Use our self-organized workshop checklist to plan your workshop.


Atmos Ocean Launch

Back in late 2012, I was a couple of years into my first job out of college. My undergraduate studies had left me somewhat underprepared for the coding associated with analyzing climate model data for a national science organization, so I was searching online for assistance with Python programming. I stumbled upon the website of an organization called Software Carpentry, which at the time was a relatively small group of volunteers running two-day scientific computing “bootcamps” for researchers. I reached out to ask if they’d be interested in running a workshop alongside the 2013 Annual Conference of the Australian Meteorological and Oceanographic Society (AMOS), and to my surprise Greg Wilson, the co-founder of the organization, flew out to Australia to teach at our event in Melbourne and another in Sydney (the first-ever bootcamps outside of North America and Europe). I trained up as an instructor soon after, and from 2014 to 2017 I hosted Software Carpentry workshops alongside the AMOS conference, as well as other ad hoc workshops in various meteorology and oceanography departments.

While these workshops were very popular and well received (Software Carpentry workshops always are), in the back of my mind I wanted to have a go at running a workshop designed specifically for atmosphere and ocean scientists. Instead of teaching generic skills in the hope that people would figure out how to apply them in their own context, I wanted to cut out the middle step and run a workshop in the atmosphere and ocean science context. This idea of discipline (or data-type) specific workshops was the driving force behind the establishment of Data Carpentry, so this year with their assistance I’ve developed lesson materials for a complete one-day workshop:
https://carpentrieslab.github.io/python-aos-lesson

The workshop centers around the task of writing a Python script that calculates and plots the seasonal rainfall climatology (i.e. the average rainfall) from the output of an arbitrary climate model. Such data is typically stored in netCDF file format and follows a strict “climate and forecasting” metadata convention. Along the way, we learn about the PyAOS stack (i.e. the ecosystem of libraries used in the atmosphere and ocean sciences), how to manage and share a software environment using conda, how to write modular/reusable code, how to write scripts that behave like other command line programs, version control, defensive programming strategies and how to capture and record the provenance of the data files and figures that we produce.
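The core calculation is simple to state: group every monthly rainfall value by season and average each group. The lesson itself works with netCDF files and the xarray library. To stay self-contained, the standard-library sketch below illustrates only three pieces: the averaging step, a defensive input check, and an `argparse` interface of the kind that makes a script behave like other command-line programs. The function names, the (month, value) input format, and the file name are hypothetical, chosen for illustration.

```python
import argparse
from collections import defaultdict
from statistics import mean

# Standard meteorological seasons, keyed by month number (1 = January).
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def seasonal_climatology(records):
    """Return the mean rainfall for each season from (month, value) pairs.

    Values from every year in the record contribute to the same season,
    which is what makes the result a climatology rather than a time series.
    """
    by_season = defaultdict(list)
    for month, value in records:
        # Defensive programming: reject impossible input loudly instead of
        # silently producing a misleading climatology.
        if month not in SEASON_OF_MONTH:
            raise ValueError(f"invalid month: {month!r}")
        if value < 0:
            raise ValueError(f"negative rainfall: {value!r}")
        by_season[SEASON_OF_MONTH[month]].append(value)
    return {season: mean(values) for season, values in by_season.items()}


def build_parser():
    """Build a command-line interface in the style of other CLI tools."""
    parser = argparse.ArgumentParser(
        description="Calculate the seasonal rainfall climatology."
    )
    parser.add_argument("infile", help="monthly rainfall file (hypothetical format)")
    parser.add_argument(
        "season", choices=sorted(set(SEASON_OF_MONTH.values())), help="season to report"
    )
    return parser


# Example with made-up data: two Januaries, a February, and a July.
clim = seasonal_climatology([(1, 3.0), (2, 5.0), (1, 4.0), (7, 12.0)])
args = build_parser().parse_args(["rainfall.csv", "DJF"])
print(args.season, clim[args.season])
```

A full script would read `args.infile`, pass the data to `seasonal_climatology`, plot the result, and end with the usual `if __name__ == "__main__":` guard calling `build_parser().parse_args()`. The guard is left out here so the sketch runs without command-line arguments.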

I’ve run the workshop twice now (at the 2018 AMOS Conference in Sydney and at Woods Hole Oceanographic Institution last month), which means I’ve completed the alpha stage of the Data Carpentry lesson development cycle. Moving from the alpha to beta stage involves having people other than me teach, which is where you come in. If you’re a qualified Carpentries instructor and would be interested in teaching the lessons (some experience with the netCDF file format and xarray Python library is useful), please get in touch with either me or Francois Michonneau (Curriculum Development Lead for Data Carpentry). You can also request a workshop at your institution by contacting us and we’ll reach out to instructors. There is no fee for a pilot workshop, but you would need to cover travel expenses for instructors. I’d also be happy to hear any general feedback about the lesson materials at the associated GitHub repository.

Genomics Workshop Pilot and BugBBQ at University of Arizona

This February, a small group of committed community members, led by Taylor Reiter at the University of California, Davis, completed a major update to the Data Carpentry Genomics curriculum. A huge thank you to everyone who contributed! This update included significant changes to both the data set and the software used for the lessons, modernizing the workshop and keeping it relevant to researchers working in the genomics field. The new version of this curriculum is already live and will be officially published in the upcoming lesson release in June.

In parallel with these updates to the core Genomics workshop curriculum, Maintainers of the Data Analysis and Visualization in R for Genomics lesson have completely revamped their lesson. While previous versions focused on analysis of a small metadata dataset and were disconnected from the narrative of the remaining Genomics lessons, this revised version picks up where the core lessons leave off in analyzing and visualizing output from a variant calling pipeline.

Jason Williams and Uwe Hilgert will be hosting a pilot workshop using this new lesson at the University of Arizona on 30-31 May, followed by a BugBBQ hacky day in which Instructors and other Carpentries community members around the globe will use feedback from the pilot to improve the Genomics lessons and prepare them for publication in June.

How can I get involved?

  1. If you’re interested in attending the workshop and hacky day, please fill out this application. Limited funding is available to support travel within the United States.
  2. If you can’t be present in person, the workshop and hacky day will be live-streamed from 8:30am - 5:00pm US MST. To convert to your local time, click this link for start time and this link for end time. Start and end times are the same each day. For live-stream links, please visit the workshop website.
  3. Opportunities to contribute aren’t limited to just these three days! Maintainers for the Genomics lessons welcome contributions and have created lists of issues that need attention. Check out the links below for each lesson to see how you can contribute.
  4. We’re planning to start actively promoting the Genomics curriculum in June and will have more opportunities for you to teach. Make sure you’re signed up on the Instructor mailing list to be informed of upcoming teaching opportunities. We’re also developing training materials to help Instructors prepare for this workshop. To be notified when materials are released, add your info to this form.
  5. No matter how you choose to get involved, you can follow the conversation during the BugBBQ on this Gitter Channel.
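US Mountain Standard Time is a fixed UTC-7 offset, and Arizona does not observe daylight saving time. Converting the 8:30am - 5:00pm live-stream window is therefore plain offset arithmetic. Below is a minimal sketch using Python’s standard `datetime` module. The `convert_from_mst` helper is hypothetical. The year in the example dates is only a placeholder, because fixed offsets make the result independent of it. Pass your own current UTC offset as the target, for example UTC+1 rather than UTC+0 in the UK during British Summer Time.

```python
from datetime import datetime, timedelta, timezone

# US Mountain Standard Time: a fixed UTC-7 offset. Arizona stays on MST
# all year, so no daylight-saving adjustment is needed on the source side.
MST = timezone(timedelta(hours=-7), "MST")


def convert_from_mst(wall_clock, target_offset_hours):
    """Convert a naive MST wall-clock datetime to a fixed target UTC offset."""
    aware = wall_clock.replace(tzinfo=MST)
    target = timezone(timedelta(hours=target_offset_hours))
    return aware.astimezone(target)


# Placeholder date (30 May); only the time of day and the offset matter.
start = datetime(2018, 5, 30, 8, 30)
end = datetime(2018, 5, 30, 17, 0)

for label, offset in [("UTC", 0), ("UTC+10", 10)]:
    s = convert_from_mst(start, offset)
    e = convert_from_mst(end, offset)
    print(label, s.strftime("%d %b %H:%M"), "to", e.strftime("%d %b %H:%M"))
```

Viewers far east of Arizona should check the printed date as well as the time, because the stream window can fall partly or wholly on the following calendar day.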

If you have any questions about how to get involved or about the Genomics curriculum, email us at team@carpentries.org. We’re excited to work with you on publishing these lessons and making them available to a larger audience!




