Saturday, January 22, 2011

MURPA-Lin Wei-Week 3


Progress of the project

1. The big picture of the project.

After playing with Kepler, reading the documentation and talking with UCSD researchers, I found that there was a misunderstanding about the project. When Wilfred originally talked about integrating Opal and Nimrod/K, he actually wanted to use Nimrod/K’s meta-scheduling capability to assign jobs to the available resources regardless of their geographical location. Colin and Ilkay, on the other hand, thought that Wilfred wanted to do a parameter sweep with Nimrod/K and the Opal actor. That’s why Colin suggested that I extend the Opal actor to use GridFileTokens, and Ilkay asked me to create a work flow first. I have a strong feeling that Nimrod/K may not do meta-scheduling, and that only Nimrod itself has that capability. Even if it does, how could it make scheduling decisions when the Opal actor submits the requests to Opal? Nimrod/K simply does not know the state of the resources.
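To make that last point concrete for myself, here is a minimal sketch of what a meta-scheduler has to know before it can decide anything. It is not how Nimrod, Nimrod/K or Opal actually work; the Resource class, its fields and the greedy rule are all made up for illustration. The point is only that the decision depends on resource state (free slots, queue length), which is exactly what Nimrod/K would be missing if the Opal actor just forwards requests to Opal.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Hypothetical view of one compute resource; not an Opal or Nimrod type.
    name: str
    free_slots: int    # state the scheduler must be able to observe
    queued_jobs: int


def pick_resource(resources):
    """Return the resource with a free slot and the shortest queue, or None."""
    candidates = [r for r in resources if r.free_slots > 0]
    if not candidates:
        return None
    # Prefer short queues; break ties by the larger number of free slots.
    return min(candidates, key=lambda r: (r.queued_jobs, -r.free_slots))


def assign_jobs(jobs, resources):
    """Greedily map each job to a resource, updating the tracked state."""
    plan = {}
    for job in jobs:
        target = pick_resource(resources)
        if target is None:
            plan[job] = None  # nothing available, the job would have to wait
            continue
        target.free_slots -= 1
        target.queued_jobs += 1
        plan[job] = target.name
    return plan
```

Without the free_slots and queued_jobs values there is nothing for pick_resource to compare, which is why the location of the scheduling component matters.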

When I discussed the misunderstanding with Wilfred, we both agreed that we need to confirm it with Colin during the Skype meeting next Monday. However, there is no conflict between the parameter sweep and meta-scheduling. They are two separate components of the whole project. The picture below illustrates these two parts and the relationship between them. In the Kepler work flow environment, we can follow Colin’s idea of using GridFileTokens with the Opal actor to implement the parameter sweep. As for meta-scheduling, we might be able to extract the meta-scheduling component of Nimrod and implement it as a job manager in Opal.

Now, the question is which part I should work on for the next 5 weeks. I would recommend working on the meta-scheduling part, because Colin is already very knowledgeable about the Nimrod/K parameter sweep and another Monash student is upgrading Nimrod/K to version 2.X. The advantage of doing research at UCSD is that I can draw on Wilfred and his team’s expertise in Opal. Therefore, it might be a better idea to focus on the meta-scheduling part. But I still need to discuss this with Wilfred and Colin.

2. Fix the work flow empty output errors.

The original MEME work flow returns a web page without any DNA sequence analysis results. This problem is not caused by the design of the work flow. It is an issue with the Opal MEME web service.

This error was quickly fixed by changing the input file and the command line arguments.
New command: meme crp0.fasta -dna -nmotifs 3
New sequence file downloaded from:
http://meme.sdsc.edu/meme4_5_0/doc/examples/sample_opal_scripts/crp0.fasta

Work flow output can be accessed from: http://ws.nbcr.net/app1295647544139/meme.html

The other parts of the work flow remain the same.

3. Study sample parameter sweep workflow

Jianwu sent me some parameter sweep work flows and a paper describing their use case. Unfortunately, none of them uses Nimrod/K to do the parameter sweep. They use the parameter sweep actor in Kepler instead. These sample work flows are still a good starting point for me to understand how a parameter sweep is implemented.
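For my own notes, the basic idea of a parameter sweep can be sketched in a few lines of Python. This is only the general technique (run the same command once per combination of parameter values), not how the Kepler parameter sweep actor or Nimrod/K implement it. The sweep_commands helper and the chosen values of -nmotifs are my own assumptions; the base command is the MEME one from section 2.

```python
from itertools import product


def sweep_commands(base, grid):
    """Yield one command string per combination of parameter values.

    base: the fixed part of the command line.
    grid: dict mapping a flag to the list of values to try for it.
    """
    flags = list(grid)
    # product() enumerates every combination of the value lists.
    for values in product(*(grid[flag] for flag in flags)):
        args = " ".join(f"{flag} {value}" for flag, value in zip(flags, values))
        yield f"{base} {args}"


# Hypothetical sweep over the number of motifs for the crp0.fasta run.
grid = {"-nmotifs": [1, 2, 3]}
for command in sweep_commands("meme crp0.fasta -dna", grid):
    print(command)
```

With more than one flag in the grid, the number of generated commands is the product of the list lengths, which is why sweeps are usually distributed over many resources.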

Work flow: https://code.ecoinformatics.org/code/reap/trunk/usecases/terrestrial/workflows/HosseiniSimulationWorkflow/simhofi-withMulitipleParaSet-PN.xml

https://code.ecoinformatics.org/code/reap/trunk/usecases/terrestrial/workflows/HosseiniSimulationWorkflow/simhofi-withMulitipleParaSet-distributed-PN.xml

Paper:

http://users.sdsc.edu/~jianwu/JianwuWang_files/Accelerating%20Parameter%20Sweep%20Workflows%20by%20Utilizing%20Ad-hoc%20Network%20Computing%20Resources%20-%20an%20Ecological%20Example%20(SWF%202009).pdf

Since part of the project is to use Nimrod/K to handle the parameter sweep, I would like to see some Nimrod/K work flows. I’m going to ask Colin for some in the Skype meeting.

Plan for the weekend

San Diego Zoo and downtown
