UCLA Bioinformatics
November 22, 2009, 09:39:03 PM *
Welcome, Guest. Please login or register.

Login with username, password and session length
News: Welcome to UCLA Bioinformatics Forum
 
   Home   Help Search Calendar Login Register  
Pages: [1]
  Print  
Author Topic: C14. How can I create pygr.NLMSA from UCSC axtNet pairwise alignments?  (Read 2273 times)
Namshin Kim
Administrator
Newbie
*****
Posts: 78


View Profile
« on: September 24, 2007, 11:04:12 AM »

UCSC pairwise alignment (axtNet format) is now supported (new in pygr 0.7beta3, will be released soon). User can add whatever pairwise alignment they want from UCSC axtNet files. In this example, we are using hg18_self (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsSelf/axtNet/), hg18_panTro2 (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsPanTro2/axtNet/), hg18_mm8 (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsMm8/axtNet/), hg18_rn4 (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsRn4/axtNet/), hg18_canFam2 (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsCanFam2/axtNet/) for widely used genome assemblies.

First, you have to create seqdb.BlastDB for all of the genome assemblies. Sequence name should be same as assembly name. I.e, save hg18 assembly as "hg18" and make seqdb.BlastDB.

Then, download all axtNet alignments from UCSC website (hg18_canFam2  hg18_mm8  hg18_panTro2  hg18_rn4  hg18_self) and uncompress gzip archives. Here is the simple example how to make pygr.NLMSA for hg18.

#!/usr/bin/env python

import sys, os, string, glob
from pygr import seqdb
from pygr import cnestedlist

args = sys.argv
if len(args) != 2:
    print 'Usage:', args[0], 'dummy'
    sys.exit()

genomes = {}
seqlist = ['hg18', 'panTro2', 'mm8', 'rn4', 'canFam2']

for orgstr in seqlist:
    genomes[orgstr] = seqdb.BlastDB(orgstr)

genomeUnion = seqdb.PrefixUnionDict(genomes)

msaname = 'hg18_pairwise5way'
mafdir = '/data/root2/' + msaname + '/'

axtlist = glob.glob(os.path.join(axtdir, '*/*.axt')
axtlist.sort()

msa = cnestedlist.NLMSA(msaname, 'w', genomeUnion, axtFiles = axtlist) # axtFiles for axtNet (cf. mafFiles for MAF), FOR AXTNET FORMAT, YOU *MUST* GIVE *axtFiles =*
msa.save_seq_dict() # IF YOU ARE NOT USING pygr.Data, YOU *MUST* SAVE SEQDICT.

Depending on your system, it would take several hours to finish. By default, it will use 1GB memory. If you have less memory, you have to decrease file size. You can pass arguments as follows.

cnestedlist.NLMSA(msaname, 'w', genomeUnion, axtFiles = axtlist, maxlen = 536870912, maxint = 22369620) # 500MB Version. It will use less than 750MB Memory at most.

If you are planning to save NLMSA into pygr.Data and never open directly from file, you don't have to give additional options. But, if you are planning to open NLMSA directly from file, seqDict should be saved into file and you have to do msa.save_seq_dict().

One big important differenct between MAF NLMSA and axtNet NLMSA is... if we are using hg18 referenced alignments, in MAF NLMSA, if you query using hg18, you can get all alignments of other species (same as axtNet NLMSA). But, if you query using panTro2, you can only get hg18 alignments in axtNet (MAF NLMSA gives all other hits from all other species). It is because axtNet NLMSA alignment is always stored as "pairwiseMode".



« Last Edit: September 24, 2007, 02:45:19 PM by Namshin Kim » Logged
sarahjenkins
Newbie
*
Posts: 1


View Profile WWW
« Reply #1 on: August 13, 2008, 06:52:43 AM »

Thanks for your information
Logged
Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.7 | SMF © 2006-2008, Simple Machines LLC Valid XHTML 1.0! Valid CSS!