Sunday, 25 January 2015

Python - Extract lines from text file starting with From

Source file:

Let's assume a text file containing a bunch of lines, some of which start with "From". Such a line can look like the following:

From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008

The goal is to find every line in the source file that starts with "From", split each of those lines into a list of words, print the second word (the sender's e-mail address) line by line, and at the end count how many lines matched our criterion.
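To see why the second word is the sender's address, here is a small sketch that splits the example line above with str.split(). With no argument, split() breaks on any run of whitespace, so the double space before "4" does not produce an empty item.

```python
# Split the sample "From" line into words on any run of whitespace
line = "From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008"
words = line.split()
print(words)
# ['From', 'louis@media.berkeley.edu', 'Fri', 'Jan', '4', '18:10:48', '2008']

# Index 0 is "From", index 1 is the sender's address
print(words[1])
# louis@media.berkeley.edu
```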


The Python code is as follows:

fname = input("Enter file location: ")
# source file, e.g. C:\Python34\Doc\SourceFile.txt
count = 0
lst = list()
# "with" closes the file automatically once the loop is done
with open(fname) as fh:
    for line in fh:
        line = line.rstrip("\n")
        # "From " (with a trailing space) skips header lines such as "From:"
        if line.startswith("From "):
            count = count + 1
            lst.append(line.split())
#
# Each item in lst is a list of words; print the second word (the address)
for words in lst:
    print(words[1])
#
print("There were", count, "lines in the file with From as the first word")
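The same logic can be wrapped in a small function that accepts any iterable of lines, which makes it easy to try without a file on disk. This is a Python 3 sketch; the function name `extract_from_lines` is hypothetical and not part of the original script, and the sample lines are made up for illustration.

```python
import io


def extract_from_lines(lines):
    """Hypothetical helper: return (addresses, count) for lines starting with "From "."""
    addresses = []
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        # The trailing space excludes header lines such as "From: ..."
        if line.startswith("From "):
            count += 1
            # Second word of the line is the sender's address
            addresses.append(line.split()[1])
    return addresses, count


# io.StringIO stands in for an open file; iterating it yields lines ending in "\n"
sample = io.StringIO(
    "From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\n"
    "From: louis@media.berkeley.edu\n"
    "Subject: test\n"
    "From zqian@umich.edu Fri Jan  4 16:10:39 2008\n"
)
addresses, count = extract_from_lines(sample)
for address in addresses:
    print(address)
print("There were", count, "lines in the file with From as the first word")
```

To run it on a real file, pass the open file object instead: `with open(fname) as fh: addresses, count = extract_from_lines(fh)`.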


Example output:

stephen.marquard@uct.ac.za
louis@media.berkeley.edu
zqian@umich.edu
rjlowe@iupui.edu
zqian@umich.edu
rjlowe@iupui.edu
...
There were 15 lines in the file with From as the first word

Saturday, 3 January 2015

Python - Parse XML file

# ==================================================================
# >> Short Description:
# Script searches the given root directory (recursively) for XML
# files matching the given pattern and prints the selected file as XML
#
# >> Author: Tomas Nemeth
# >> Creation Date: 12/2014
# ==================================================================
#
#-- Import Python modules -----------
import os
import fnmatch
import xml.etree.ElementTree as ET
from xml.dom import minidom
#------------------------------------
#
#
rootPath = input("Enter root path: ") # e.g. C:\Python34\Doc\ScriptTest
pattern = input("Enter file pattern: ") # including wildcards, e.g. "data*"
print('=================================================')

# Search for the files matching given directory & pattern
for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
        print(os.path.join(root, filename)) # prints the results
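The os.walk/fnmatch.filter loop above can also be written as a reusable function that returns the matching paths instead of printing them. This is a sketch; `find_files` is a hypothetical name, and the usage part builds a throwaway directory with tempfile so it runs anywhere.

```python
import fnmatch
import os
import tempfile


def find_files(root_path, pattern):
    """Hypothetical helper: paths under root_path (recursively) whose names match pattern."""
    matches = []
    for root, dirs, files in os.walk(root_path):
        # fnmatch.filter applies shell-style wildcards such as "data*" or "*.xml"
        for filename in fnmatch.filter(files, pattern):
            matches.append(os.path.join(root, filename))
    return matches


# Usage: create a small temporary tree, then search it for "data*"
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for name in ("data1.xml", os.path.join("sub", "data2.xml"), "notes.txt"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write("<a/>")
    # Paths relative to the temporary root, sorted for a stable order
    found = sorted(os.path.relpath(p, tmp) for p in find_files(tmp, "data*"))
    for path in found:
        print(path)
```

Returning a list rather than printing lets the caller decide what to do next, for example offering the matches as a numbered menu.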
#
# Ask the user to pick one file from the list above (enter its full path)
print('=========================================')
file = input("Enter the full path of one file from the list to parse: ")
print('=========================================')
#
# Parse the file with ElementTree and get its root element (not used further below)
tree = ET.parse(file)
root = tree.getroot()
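In the script, `root` is fetched but not used any further. As a sketch of what it gives you, the snippet below parses a small made-up XML string with ET.fromstring (which returns the root Element directly, the same kind of object tree.getroot() returns for a parsed file) and walks its children. The element and attribute names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, for illustration only
xml_text = """<catalog>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</catalog>"""

root = ET.fromstring(xml_text)  # root Element, like tree.getroot()
print(root.tag)  # catalog

titles = []
for book in root.findall("book"):
    # .get() reads an attribute; .findtext() returns the text of a child element
    title = book.findtext("title")
    titles.append(title)
    print(book.get("id"), title)
```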
#
# Parse the file again with minidom and print it as one XML string
xmldoc = minidom.parse(file)
print(xmldoc.toxml())
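toxml() returns the document as a single string, so a file without line breaks prints as one long line. minidom also offers toprettyxml(), which re-indents the output; it is an alternative, not what the original script uses. A sketch with a made-up XML string, using minidom.parseString instead of a file path:

```python
from xml.dom import minidom

# parseString parses XML from a string instead of a file path
xmldoc = minidom.parseString("<config><item>1</item><item>2</item></config>")

# Compact form, as printed by the script above
compact = xmldoc.toxml()
print(compact)
# <?xml version="1.0" ?><config><item>1</item><item>2</item></config>

# Indented form: one element per line, two spaces per nesting level
pretty = xmldoc.toprettyxml(indent="  ")
print(pretty)
```

Note that toprettyxml() adds whitespace text between elements, so it is meant for display rather than for writing the data back unchanged.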
#
print('==========================================')
print('')