Saturday, January 22, 2011

MURPA-Lin Wei-Week 3


Progress of the project

1. The big picture of the project.

After playing with Kepler, reading documentation and talking with UCSD researchers, I found that there was a misunderstanding about the project. When Wilfred originally talked about integrating Opal and Nimrod/K, he actually wanted to use Nimrod/K’s meta-scheduling capability to assign jobs to the available resources regardless of their geographical location. Colin and Ilkay, on the other hand, thought that Wilfred wanted to do a parameter sweep with Nimrod/K and the Opal actor. That’s why Colin suggested that I extend the Opal actor to use GridFileTokens, and Ilkay asked me to create a work flow first. I have a strong feeling that Nimrod/K may not do meta-scheduling, and that only Nimrod itself has that capability. Even if it does, how could it make scheduling decisions when the Opal actor submits the requests to Opal? Nimrod/K simply does not know the state of the resources.

When I discussed the misunderstanding with Wilfred, we agreed that we need to confirm it with Colin during the Skype meeting next Monday. However, there is no conflict between the parameter sweep and meta-scheduling. They are two separate components of the whole project. The picture below illustrates these two parts and the relationship between them. In the Kepler work flow environment, we can follow Colin’s idea and use GridFileTokens with the Opal actor to implement the parameter sweep. As for meta-scheduling, we might be able to extract the meta-scheduling component of Nimrod and implement it as a job manager in Opal.

Now, the question is which part I should work on for the next 5 weeks. I would recommend working on the meta-scheduling part, because Colin is very knowledgeable about the Nimrod/K parameter sweep and another Monash student is upgrading Nimrod/K to work with Kepler 2.X. The advantage of doing research at UCSD is that I can draw on the Opal expertise of Wilfred and his team. Therefore, it might be a better idea to focus on the meta-scheduling part. But I still need to discuss this with Wilfred and Colin.
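To make the meta-scheduling idea more concrete, here is a minimal sketch of a scheduler that places jobs on whichever resource has the most free capacity, regardless of where that resource is located. Everything in it is an assumption made for illustration: the `Resource` class, the `assign_jobs` function, the cluster names and the "most free slots" policy are hypothetical and are not taken from Nimrod, Nimrod/K or Opal. The point is only that the scheduler needs to know the state of each resource before it can decide.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical view of one compute resource and its current load."""
    name: str
    location: str
    total_slots: int
    busy_slots: int = 0

    @property
    def free_slots(self) -> int:
        return self.total_slots - self.busy_slots


def assign_jobs(jobs, resources):
    """Place each job on the resource with the most free slots.

    Location is deliberately ignored. Jobs that cannot be placed are
    mapped to None (a real scheduler would queue them instead).
    """
    placement = {}
    for job in jobs:
        candidates = [r for r in resources if r.free_slots > 0]
        if not candidates:
            placement[job] = None
            continue
        best = max(candidates, key=lambda r: r.free_slots)
        best.busy_slots += 1  # the scheduler tracks resource state itself
        placement[job] = best.name
    return placement


# Illustrative resources only; names and slot counts are made up.
resources = [
    Resource("cluster_a", "San Diego", total_slots=2),
    Resource("cluster_b", "Melbourne", total_slots=1),
]
placement = assign_jobs(["job1", "job2", "job3", "job4"], resources)
for job, target in placement.items():
    print(job, "->", target)
```

If the scheduler cannot see `busy_slots` for a resource, as would be the case when requests go straight from the Opal actor to Opal, it has nothing to base the decision on.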

2. Fix the work flow empty output errors.

The original MEME work flow returns a web page without any DNA sequence analysis results. This problem is not caused by the design of the work flow. It is an issue with the Opal MEME web service.

This error was quickly fixed by changing the input file and command line argument.
New command: meme crp0.fasta -dna -nmotifs 3
New sequence file downloaded from:
http://meme.sdsc.edu/meme4_5_0/doc/examples/sample_opal_scripts/crp0.fasta

Work flow output can be accessed from: http://ws.nbcr.net/app1295647544139/meme.html

The other parts of the work flow remain the same.

3. Study sample parameter sweep workflow

Jianwu sent me some parameter sweep work flows and a paper describing their use case. Unfortunately, none of them uses Nimrod/K to do the parameter sweep. They use the parameter sweep actor in Kepler instead. These sample work flows are a good starting point for me to understand how a parameter sweep is implemented.

Work flows: https://code.ecoinformatics.org/code/reap/trunk/usecases/terrestrial/workflows/HosseiniSimulationWorkflow/simhofi-withMulitipleParaSet-PN.xml

https://code.ecoinformatics.org/code/reap/trunk/usecases/terrestrial/workflows/HosseiniSimulationWorkflow/simhofi-withMulitipleParaSet-distributed-PN.xml

Paper:

http://users.sdsc.edu/~jianwu/JianwuWang_files/Accelerating%20Parameter%20Sweep%20Workflows%20by%20Utilizing%20Ad-hoc%20Network%20Computing%20Resources%20-%20an%20Ecological%20Example%20(SWF%202009).pdf

Since part of the project is to use Nimrod/K to handle the parameter sweep, I would like to see some Nimrod/K work flows. I’m going to ask Colin for some in the Skype meeting.

Plan for the weekend

San Diego Zoo and downtown

Wednesday, January 19, 2011

MURPA-Lin Wei-Week 2

Progress of the Project
It is challenging and interesting that my three supervisors have different ideas about the project. Wilfred is interested in using Nimrod's meta-scheduling capability. Ilkay wants to create a work flow that can be extended later to include a parameter sweep. Colin suggested that I extend the Opal actor to use GridFileTokens. While these ideas add an extra level of complexity to the project, they help me view it from different perspectives, because each supervisor's idea is based on his or her own expertise.

In the week 2 meeting, we finally agreed to follow Ilkay's suggestion and construct a work flow first. Thanks to Jianwu and Jane's assistance, I have two candidate work flows at the moment:

MEME Work Flow

This work flow submits DNA or protein sequences to MEME. MEME analyzes the sequences for similarities among them and produces a description of each pattern/motif it discovers. The analysis results are displayed in a web browser. The work flow was retrieved from https://code.kepler-project.org/code/kepler/trunk/workflows/meme-opal/. To view a sample output, visit http://ws.nbcr.net/app1295390134402/. The required parameters in this work flow are the occurrences of a single motif, the maximum and minimum width of a motif, and the maximum number of motifs. There are also many optional parameters. These parameters can be used later to implement a parameter sweep.
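As a rough sketch of what a sweep over these parameters could look like outside Kepler, the snippet below generates one MEME command line per parameter combination. It is only an illustration, not how the Kepler parameter sweep actor or Nimrod/K does it. The `build_meme_commands` helper and the chosen values are hypothetical, and the `-mod`, `-minw` and `-maxw` flags are assumed to correspond to the motif occurrence and width parameters above, so they should be checked against the MEME documentation for the installed version.

```python
from itertools import product


def build_meme_commands(sequence_file, mods, nmotifs_values, width_ranges):
    """Return one MEME command string per combination of parameters.

    mods           -- motif occurrence models (assumed -mod values)
    nmotifs_values -- maximum numbers of motifs to search for
    width_ranges   -- (minimum width, maximum width) pairs
    """
    commands = []
    for mod, nmotifs, (minw, maxw) in product(mods, nmotifs_values, width_ranges):
        commands.append(
            f"meme {sequence_file} -dna -mod {mod} "
            f"-nmotifs {nmotifs} -minw {minw} -maxw {maxw}"
        )
    return commands


# Example values are made up for illustration.
commands = build_meme_commands(
    "crp0.fasta",
    mods=["oops", "zoops"],
    nmotifs_values=[3, 5],
    width_ranges=[(6, 20)],
)
for command in commands:
    print(command)
```

Each generated command could then be submitted as a separate job, which is the part a sweep tool would normally automate.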

The original work flow uses the MEME 4.1.0 web service, which is no longer supported by NBCR. Therefore, I modified the work flow to use MEME 4.5.0. The problem is that after I save the modified work flow, it cannot find the At.fa sequence file at execution time. I plan to consult Jianwu again to fix this problem.


PDB2PQR

This work flow performs a PDB to PQR file format conversion using NBCR's remote computation capabilities. It runs smoothly. Visit http://ws.nbcr.net/app1295392505925/ to view the results. However, I'm not sure if it makes sense to do a parameter sweep with it. I need to talk to Wilfred to find out if a parameter sweep can help solve a scientific problem here.



Entertainment

We rented a car and visited Las Vegas during the long weekend. We went to pretty much every big casino. Unfortunately, Wai Keung and Geoff lost around $100 gambling. The buildings and scenery downtown were more attractive to me than the gambling.


Bad News
My dad told me that my mum had major surgery. The surgeon opened my mum's belly and removed four tumors. I hope my mum will recover soon.



Monday, January 10, 2011

MURPA-Lin Wei-Week 1

Finally, Wai Keung and I arrived in San Diego on Tuesday. The long, boring flight left me exhausted, not to mention that the long queue at Los Angeles immigration kept us waiting for an hour and made us miss our flight to San Diego. The security officer who inspected my visa was unfriendly. He did not greet visitors, and the way he looked at me made me feel that he wanted to stop everyone from entering the United States.

The weather in San Diego in winter is so pleasant, the apartment is fantastic and the people at UCSD are friendly. Everything at UCSD helped me recover quickly, and I feel I'm in the right mood to work on my project. I particularly like the supermarket here. It's still open after midnight and it's got tons of food, much better than Australian supermarkets. The plastic bag design is brilliant. There is no need to rub the bag to open it. You pull the front bag and the next bag opens automatically. This design reminds me of Don Norman's book "The Design of Everyday Things". This book is the best introduction to the importance of usability in design: much of its value comes from the fact that it is not about computers but about all kinds of other things that we suffer from every day. Australian supermarkets should definitely learn from the plastic bag design here.

In terms of my project (Integrating Opal and Nimrod/K), I had one meeting with Ilkay, Maria and Jiang Wu and another meeting with Wilfred this week. Colin says my project needs much more than two months' work. The current version of Nimrod/K does not work with Kepler 2.X. Another student is updating Nimrod/K at Monash and it's going to take 4-6 weeks. In addition, the Opal actor cannot determine the order of files. It is not worthwhile to make Nimrod/K and the Opal web service work in Kepler 1.0 and then upgrade them to Kepler 2.X. Ilkay suggests that I define a use case scenario and create a workflow in Kepler 2.X before I work on the integration part. Wilfred and Colin advise me to fix the Opal actor. I'm going to have a meeting with Ilkay and Wilfred to discuss my project plan further. I think fixing the Opal actor, creating a workflow and designing an integration plan would be achievable while I'm at UCSD. The difficult part at the moment is setting up the development environment. Wai Keung and I have spent hours on it but have made little progress. Certain parts of the installation instructions on the Kepler web site are outdated. Apart from the installation, I need to get the opal.xml workflow and the associated libraries.

Next week, we are going to have a cocktail party and promote the MURPA program. We also plan to visit San Francisco during the long weekend.