Minutes of the IBP Implementation Team meeting held in Wageningen on 4th June 2011
Jean-Marcel Ribaut, Elizabeth Arnaud, Scott Chapman, Alain Charcosset, Fran Clarke, Mark Dieters, Delphine Fleury, Chengzhi Liang, Graham McLaren, Clarissa Pimentel, Arllet Portugal, Abhishek Rathore, Trushar Shar, Fabio Valente, Fred van Eeuwijk, Jiankang Wang, Guoyou Ye, Lucia Gutierrez, Marcos Masoletti
Jose Crossa, Hector Sanchez
Decision support for QTL use
- Delphine and Graham to approach use cases to obtain suitable data and contextual information, which will be passed to Fred for review and QTL analysis - July 2011
- Fred, Scott, Guoyou, Pancho and Alain to use the QTL results and compare strategies for identification and weighting of favourable alleles. Results and efficiency will be compared to arrive at a recommended strategy (or strategies) - Sept 2011
- Mark and the simulation team will identify the most promising intervention points and options for use of simulation in the AP. They will also outline prospects for the generic simulation platform and highlight the resource implications of various options - July 2011
- The Simulation team will also clean up their current interface, finish the Breeding design program and add a simulation step to OptiMas (to generate next-generation lines) - Sept 2011
- Guoyou will send to Scott, Mark and Jiankang his rice data for MARS project - July 2011
- Trushar will develop an installation procedure for the GDMS - June 2011
- Arllet and Clarissa will install a version on the CIMMYT network for Chunlin to test - July 2011
- Arllet and Graham will load the germplasm data for Sorghum from David Jordan and Emma Mace to a local database and coordinate with Praveen for its upload to then central -July 2011
- Trushar will start to construct a Sorghum GDMS using data from David Jordan and Emma Mace as well as ICRISAT. Prototype - Sept 2011
- Publication of Sorghum GDMS on IBPortal - Dec 2011
- Trushar will document the GDMS schema - July 2011, and to work with Graham and Arllet on the linkage of GDMS and GMS - Sept 2011
- Delphine and Graham will locate several raw genotyping and phenotyping datasets which will be sent to Fred who will isolate the appropriate scripts for QA and develop a standard QA report from these scripts - Sept 2011
- Hector and Trushar to add QA reports in the loading stages of IB Fieldbook (phenotyping) and GDMS (genotyping). Prototype Dec 2011
- Guoyou: Set up CropForge project for R Pipeline - July 2011
- Fred: Report back on open source REML discussion with Cullis - Sept 2011
- Guoyou: Identify R scripts available for design generation for discussion with the implementation team - July 2011
- Guoyou will develop a prototype GUI with data manipulation and SSA and MET scripts from Fred - Sept 2011
- I would like to try to plan the next meeting at ICRISAT on the morning of 23rd September, during the GCP Research Meeting. For those who cannot attend, we could try to link by phone or Webex.
All the R scripts for single-site analysis, multi-environment analyses, G x E and QTL analysis with graphical output are ready for use. The experimental design tool still has to be completed.
Although the R scripts are rather slow, it was decided to bypass ASREML for the analysis so that the R tools remain independent of VSNi. Fred will be in Australia in early July and might ask Brian Cullis about the status of the new open source mixed model software.
In terms of general procedure, new scripts or information will be sent to Delphine or Graham to be added to a project on CropForge. They will send an email about updates to the implementation team.
Decision support tools for QTL
Considering the gap between QTL detection and use, it was decided that it will be necessary to work with several use-cases in a coordinated way to decide on a basic methodology which should then be implemented in the Analytical Pipeline (AP).
The key issue is the identification and weighting of favourable alleles from the QTL analysis and perhaps also the identification of traits which should not be changed, or for which diversity should be retained by the breeding process.
Several approaches are available:
- Environmental clustering and weighting
- Classical index construction
- Approaches based on simulation.
The agreed strategy is for Delphine and Graham to approach use cases to obtain suitable data and contextual information, which will be passed to Fred for review and QTL analysis. Starting from this analysis, different team members - Fred, Scott, Guoyou, Pancho and Alain - may take different analytical approaches. Results and efficiency will be compared to arrive at a recommended strategy (or strategies). These will then be integrated into the AP as appropriate.
Simulation Tools in the CWS
We returned to the discussion about deployment of simulation in routine breeding decisions. A clear trade-off was identified between further development of a generic simulation platform, preferably on the iPlant CI, and the packaging of specific stand-alone simulation tools in the CWS.
Although the service approach remains a good one, the MT believes that some presence of simulation options in the CWS is desirable for political and pedagogical reasons. Mark and the simulation team will identify the most promising intervention points and options. They will also outline prospects for the generic simulation platform and highlight the resource implications of the various options.
The Simulation team will also clean up their current interface, finish the Breeding design program and add a simulation step to OptiMas (to generate next-generation lines). They will also make templates of breeding schemes to be filled by the users. Guoyou will send to Scott, Mark and Jiankang his rice data for MARS project.
Genotyping Data Management System (GDMS)
Trushar will develop an installation procedure for the GDMS so that implementation team members can test the database. Arllet and Clarissa will install a version on the CIMMYT network for Chunlin to test and use as a repository for Marker Service (MS) datasets. This may also prove to be a suitable vehicle for distribution of MS results to users.
It was agreed that deployment of GDMS by crop makes practical and political sense, and Trushar will start to construct a Sorghum GDMS using data from David Jordan and Emma Mace as well as ICRISAT data. Arllet and Graham will load the germplasm data to a local database and coordinate with Praveen for its upload to the central database. Trushar and his group will load the genotyping and QTL data. We should aim to publish this on the Crop Information section of the IBP portal over the next six months as a proof of concept.
Trushar will also document the GDMS schema over the next month and work with Graham and Arllet on the linkage of GDMS and GMS. The question of data privacy and a publication strategy must be addressed in the near future. How can users see their local genotyping data integrated with public data, and how do they extract these data for publication in a central database without corrupting the integrity of either database? Trushar will consider these questions and suggest solutions, or a strategy for getting to the solutions.
Trushar asked about the development of the middleware functions for GMS, as he needs some of them, such as checking the existence of germplasm in a genotyping study. This is also a problem for Hector. The MT agrees that there is an urgent need for coordinated development of a Java-based thin middleware to allow application developers to move quickly on interface issues. In further discussion on this point the MT decided to coordinate this activity (see appendix); however, in the short term Trushar and Hector will have to develop simple data access functions to support their applications. These will be replaced by the 'IBP middleware' as soon as it is available, which will also provide more sophisticated functions. Candy and Arllet will help developers with the logic of some of the functions which are urgently needed.
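To illustrate the kind of isolation being asked for, application code could depend only on a narrow Java interface, with the stop-gap implementation swapped for the IBP middleware later without touching business logic. This is a sketch under assumed names (`GermplasmSource`, `InMemoryGermplasmSource` and the method signature are illustrative, not part of any existing IBP codebase):

```java
import java.util.Set;

// Hypothetical thin data-access interface. The future IBP middleware
// would supply the production implementation against the GMS database.
interface GermplasmSource {
    boolean germplasmExists(String gid);
}

// Stop-gap implementation of the kind Trushar and Hector could write now;
// backed here by an in-memory set purely for illustration.
class InMemoryGermplasmSource implements GermplasmSource {
    private final Set<String> knownGids;

    InMemoryGermplasmSource(Set<String> knownGids) {
        this.knownGids = knownGids;
    }

    @Override
    public boolean germplasmExists(String gid) {
        return knownGids.contains(gid);
    }
}
```

Because applications code against the interface rather than the database, replacing the stub with a middleware-backed implementation later requires no change to the calling code.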
Fabio will need similar functions. Since he is developing in C++, he might try to use the existing ICIS.DLL. I will give him the DLL and the functions he can call. He will try to call it from OptiMAS.
Delphine summarized the recent discussion on QA for genotyping data. It was recognized that QA is a continuous process running from design through data collection, data loading, analysis and decision making, with extra information brought into the QA process at each step. We decided to concentrate on QA for the data loading stage, when the design and marker metadata are available but not yet the model or decision context.
It was agreed that a number of QA checks are available for this stage (for both genotyping and phenotyping data) and that these should automatically generate a QA report prior to data loading. Warnings about quality issues will be reported, but it will be the user's responsibility to decide how to act on them.
Delphine and Graham will locate several raw genotyping and phenotyping datasets, which will be sent to Fred, who will isolate the appropriate scripts for QA and develop a standard QA report from these scripts. These will be added to the data loading interfaces of IB Fieldbook (phenotyping) and GDMS (genotyping). The genetic mapping tools of Genstat also include a quality-check step (removing outliers). Fred's group will generate a quality report for this step as well.
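As a sketch of one such pre-loading check, a report could flag markers with an excessive proportion of missing genotype calls and leave the loading decision to the user. The class name, threshold and report wording below are assumptions for illustration, not decisions of the meeting:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical QA check run before data loading: warn about markers whose
// fraction of missing calls (null entries) exceeds a threshold.
class MissingCallCheck {
    static List<String> report(Map<String, List<String>> callsByMarker,
                               double maxMissingRate) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : callsByMarker.entrySet()) {
            long missing = e.getValue().stream().filter(c -> c == null).count();
            double rate = (double) missing / e.getValue().size();
            if (rate > maxMissingRate) {
                warnings.add(String.format(
                    "WARNING: marker %s has %.0f%% missing calls",
                    e.getKey(), 100 * rate));
            }
        }
        return warnings; // reported to the user, who decides whether to load
    }
}
```

A real report would bundle several such checks (duplicate samples, off-scale allele calls, design mismatches) into one document presented at load time.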
The target should be to have alpha versions by December and a prototype in the CWS by June 2012.
We considered the organization of the R Pipeline development process. It is clear that many partners wish to participate in this development and we must avoid duplication and redundancy as far as possible.
We reviewed screenshots of the Tcl and Java GUIs developed by Guoyou and agreed that they can provide a very good technology for the pipeline.
While there is no harm in having alternative analyses for comparison and advanced users, it was felt that we should 'publish' a very restricted set for platform users.
Furthermore, inter-dependence of scripts across analytical stages makes it difficult to treat the pipeline development in a completely modular way. So we agreed that the development of the R scripts should proceed in the following steps:
- Guoyou will set up a project on CropForge where the R scripts, the GUI interface code, and bug and feature tracking will be managed
- The latest versions of the current R scripts will be collected by Delphine and posted to the project by Guoyou
- People who are developing R scripts that are not included in our last workplan should inform Delphine and Graham.
- When R scripts are ready for testing, Delphine and Graham will distribute them to partners for testing and comments and will collect the feedback
- Scripts will go back to the inventor for improvement and development of a final prototype.
- Delphine and Graham (in consultation with the team) will 'release' scripts for inclusion in the 'official' Breeders RAP.
- Released scripts will be reviewed by a professional R programmer.
- Guoyou will develop menu interfaces to the released scripts following mockups provided by the script developers.
- Delphine and Graham will distribute the package to testers for evaluation of the menu interface.
In order to get this pipeline started, Guoyou will quickly develop a prototype GUI, which will include some data manipulation features and the R scripts for single- and multiple-location analysis developed by Fred, which are mature enough to go through the process. It is anticipated that the scripts will have to be modified to accept data and job parameters in a flexible way. These modifications should be carried out between IRRI and Biometris, with input from a professional R programmer. A software interface designer will also be hired to improve the prototype GUI (Delphine to look for an appropriate consultant). A best-practice strategy should be developed from this experience for the modification of other scripts or the development of new scripts for inclusion in the pipeline.
It would be desirable to have the alpha version, with basic data manipulation and one script (SSA), by September for testing and discussion at the GCP research meeting.
Abhishek gave a demonstration of his R analytical tools connected to ICIS. These could be used for the web version of the CWS. Abhishek will be introduced to the iPlant team and to the webmaster who will be hired by IBP.
GCP Research Meeting
The next meeting will be a phone conference in early August to hear of progress on the action items listed for July.
The next opportunity for a face-to-face meeting of the implementation team will be at the GCP research meeting towards the end of September. The GCP MT will discuss and suggest IBP presentations and workshops for that meeting and circulate these to the implementation team for feedback.
Appendix 1 - IBP Java Middleware
For the pipeline to operate as a workflow system and for efficiency of development it is essential to have a Java based thin middleware to afford access to the databases - GMS, DMS and GDMS. It is proving difficult for IBP partners to develop this middleware in a coordinated way so the GCP will take over the coordination of this effort. This will involve the following steps:
- The teams of Hector and Trushar, who are actively developing database-dependent applications, will quickly write urgently needed access routines, well isolated from the business logic of the applications. They will also need to develop basic query facilities to export datasets in basic trait and genotype formats (e.g. Flapjack files) for access by analytical and decision support tools. These should be ready by July 30.
- GCP will commission a software development company to quickly implement the basic functions as specified in existing ICIS documentation in a first phase project. This should be ready by December.
- Hector and Trushar will deploy the IBP Middleware as soon as it is available replacing redundant functions in their applications and documenting newly required functions to a standard that allows the software development company to add them to the IBP middleware in a second phase of development to be completed by March 2012. (For GDMS this involves all data access functions and requires solving the private/public requirement and change management).
- Analytical and decision support tools will initially be developed to accept standard file input, but will be updated to allow optional database access when the middleware is available.
A table with tasks and a rough timeline follows:

- Documentation about requirements based on the existing DLL - July 15, 2011
- IBFieldbook and GDMS able to query the database and export to standard files - Hector (IBFieldbook); Trushar (may already be available for GDMS)
- Scout software companies who can implement the middleware - July 5, 2011
- Selection of the company - August 15 - Sept 15, 2011
- Development of the middleware for DMS add functions and some GMS retrieve functions needed by Fieldbook and GDMS - Dec 15, 2011
- Development of the rest of the functions from the DLL - March 15, 2012