PhaseNet

Zhu, W., & Beroza, G. C. (2019). PhaseNet: A Deep-Neural-Network-Based Seismic Arrival Time Picking Method. Geophysical Journal International, 216(1), 261–273, doi:10.1093/gji/ggy423.

New version added since original repo was downloaded.

Installation

Requirements

New version (tried on 2022-09-18):

$ git clone https://github.com/wayneweiqiang/PhaseNet.git
$ cd PhaseNet
$ conda env create -f env.yml
$ conda activate phasenet

Run

How to run in Linux/macOS.

How to run on Artemisa

Run the new version

Here we assume that we have installed PhaseNet in the directory ${HOME}/repos/PhaseNet and we want to run the predictions for the miniSEED data in ${HOME}/repos/PhaseNet/demo.

First we need to create a csv file with the list of miniSEED files to process:

fname,E,N,Z
CCC.mseed,HHE,HHN,HHZ
...

To run the test data in the directory demo:

$ cd
$ cd repos/PhaseNet
$ python phasenet/predict.py --model=model/190703-214543 --data_list=demo/fname.csv \
         --data_dir=demo/mseed --result_dir=demo/results --format=mseed --plot_figure

This will produce the output in the directory ${HOME}/repos/PhaseNet/demo/results,

Old output

Old version outputs a csv and a pkl (pickle) file. The format of the csv file is:

fname,itp,tp_prob,its,ts_prob
CA.CBEU..HH.mseed_0,[],[],[],[]
CA.AVIN..HN.mseed_0,[],[],[],[]
CA.CBRU..HH.mseed_0,[],[],[],[]
CA.BAIN..HN.mseed_0,[],[],[],[]
CA.BLAN..HN.mseed_0,[],[],[],[]
CA.BAJU..HN.mseed_0,[],[],[],[]
CA.CAVN..HH.mseed_0,[],[],[],[]
CA.CBUD..HH.mseed_0,[],[],[],[]
CA.CBEU..HH.mseed_3000,[],[],[],[]
CA.AVIN..HN.mseed_3000,[2072],[0.36653],[],[]
CA.BLAN..HN.mseed_3000,[],[],[],[]
CA.BAIN..HN.mseed_3000,[],[],[],[]
CA.BAJU..HN.mseed_3000,[],[],[],[]
CA.CAVN..HH.mseed_3000,[2284],[0.884308],[],[]
CA.CBUD..HH.mseed_3000,[251],[0.78194],[530],[0.934841]
CA.CBRU..HH.mseed_3000,[],[],[],[]
CA.BLAN..HN.mseed_6000,[863],[0.733137],[],[]
...

This format can not be used and is re-written to the so-called pick format:

CA.CMAS..HH P 2020-10-19T19:53:09.600000000 0.94159900 1
CA.CMAS..HH S 2020-10-19T19:53:19.370000000 0.69820100 1

Also markers format to plot with snuffler:

# Snuffler Markers File Version 0.2
phase: 2020-10-19 19:53:09.600000000   0   CA.CMAS..HHZ  None   None   None   P            None   False
phase: 2020-10-19 19:53:19.370000000   0   CA.CMAS..HHN  None   None   None   S            None   False

New output

New csv output file now writes times in ISO 8601 format, so there is no need for conversion.

file_name,begin_time,station_id,phase_index,phase_time,phase_score,phase_type
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,18021,2019-07-04T17:03:00.208,0.956,P
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,24742,2019-07-04T17:04:07.418,0.838,P
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,30658,2019-07-04T17:05:06.578,0.78,P
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,35298,2019-07-04T17:05:52.978,0.442,P
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,38846,2019-07-04T17:06:28.458,0.493,P
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,38900,2019-07-04T17:06:28.998,0.435,P
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,51138,2019-07-04T17:08:31.378,0.971,P
...
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,2516038,2019-07-04T23:59:20.378,0.37,P
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,2516598,2019-07-04T23:59:25.978,0.961,P
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,2519071,2019-07-04T23:59:50.708,0.838,P
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,18412,2019-07-04T17:03:04.118,0.927,S
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,25121,2019-07-04T17:04:11.208,0.629,S
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,31022,2019-07-04T17:05:10.218,0.309,S
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,31401,2019-07-04T17:05:14.008,0.815,S
CCC.mseed,2019-07-04T16:59:59.998,CCC.mseed,35681,2019-07-04T17:05:56.808,0.537,S
...

Scripts for batch processing

Data extraction

$ sfile_extract_all.sh sfile destination_directory
#!/bin/bash
#: Title       : se_sfile_wf_extract.sh
#: Purpose     : Reads a hypocenter defined in an S-file, extracts all the waveform
#+               data available in a SeisComP directory structure (SDS) and runs
#+               PhaseNet for the extracted miniSEED files.
#: Usage       : se_sfile_wf_extract.sh S-file SDS_dir
#: Date        : 2021-03-26
#: Author      : "Antonio Villasenor" <antonio.villasenor@csic.es>
#: Version     : 1.0
#: Requirements: awk
#+               GNU date command (not macOS date)
#+               dataselect (https://github.com/iris-edu/dataselect)
#+               run.py (PhaseNet prediction script)
#: IMPORTANT!! : if PhaseNet is installed using a Python virtual environment
#+               (venv or conda) it must be activated before running this script
#: Arguments   : S-file SDS-dir
#: Options     : none
set -euo pipefail

progname=${0##*/}

[[ $# -lt 2 ]] && { echo "usage: $progname sfile dest_dir"; exit 1; }

[[ ! -s $1 ]] && { echo "ERROR: S-file does not exist: $1"; exit 1; }
[[ ! -d $2 ]] && { echo "ERROR: destination directory does not exist: $2"; exit 1; }

sfile="$1"
dest_dir="$2"

if command -v date > /dev/null && date --version > /dev/null 2>&1; then
    DATE=date
elif command -v gdate > /dev/null && gdate --version > /dev/null 2>&1; then
    DATE=gdate
else
    echo "ERROR: no GNU date command in this system"
    exit 1
fi

[[ $( command -v dataselect ) ]] || { echo "ERROR: dataselect executable is missing"; exit 1; }

pre_event=30
post_event=210

origin_time=$(awk '/1$/ { \
    year =   1*substr($0,2,4); \
    month =  1*substr($0,7,2); \
    day =    1*substr($0,9,2); \
    hour =   1*substr($0,12,2); \
    minute = 1*substr($0,14,2); \
    second = 1*substr($0,17,4); \
    printf("%4d-%02d-%02dT%02d:%02d:%04.1f",year,month,day,hour,minute,second); \
}' $sfile)

year=${origin_time%%-*}
jday=$($DATE -u --date="$origin_time" +%j)
event_dir=$($DATE -u --date="$origin_time" +%Y.%j.%H%M%S)
out_dir=$dest_dir/$year/$event_dir
[[ -d $out_dir ]] && { echo "WARNING: event $sfile already processed in $out_dir"; exit 1; }

seconds=$($DATE -u --date="$origin_time" +%s)
start_time=$(bc -l <<< "$seconds - $pre_event")
end_time=$(bc -l <<< "$seconds + $post_event")

ts=$($DATE -u --date="@$start_time" +%Y,%j,%H,%M,%S,%N)
te=$($DATE -u --date="@$end_time" +%Y,%j,%H,%M,%S,%N)

echo " "
echo Processing $sfile
echo Origin time: $origin_time t_start: $ts t_end: $te

# /home/antonio/atlantico/RAW_DATA2/IBERIA 
#sds=( /tank/PYROPE /tank/IBERARRAY /tank/SISCAN /tank/MISTERIOS /home/antonio/atlantico/PROCESSED_DATA/IBERIA )
sds=( /tank/GEOMARGEN3 /home/antonio/atlantico/PROCESSED_DATA/IBERIA )

/bin/rm -f mseed.list
touch mseed.list

for datadir in "${sds[@]}"; do
    [[ -d $datadir/SDS/$year ]] && find $datadir/SDS/$year -name "*.[BEHS]?Z.?.$year.$jday" -print >> mseed.list
done

[[ ! -s mseed.list ]] && { echo "ERROR: no miniSEED files for $sfile"; /bin/rm -f mseed.list; exit 1; }

mkdir -p $out_dir
cp $sfile $out_dir/.
$DATE -u --date="@$start_time" +%Y-%m-%dT%H:%M:%S.%N > $out_dir/start_time.txt

start_string=$($DATE -u --date="@$start_time" +%Y-%m-%dT%H%M%SZ)
end_string=$($DATE -u --date="@$end_time" +%Y-%m-%dT%H%M%SZ)

while read -r zfile; do

   filename=${zfile##*/}        # miniSEED file withouth path: GS.CA06.00.HHZ.D.2019.003
   prefix=${filename::-12}      # GS.CA06.00.HH
   dum=${filename#*.}           # CA06.00.HHZ.D.2019.003
   sta=${dum%%.*}               # CA06
   zcmp=${zfile:(-14):5}        # HHZ
   ecmp=${zcmp/Z/E}             # HHE
   ncmp=${zcmp/Z/N}             # HHN
   h1=${zfile//$zcmp/$ecmp}     # /full_path/GS.CA06.00.HHE.D.2019.003
   h2=${zfile//$zcmp/$ncmp}     # /full_path/GS.CA06.00.HHN.D.2019.003

   [[ -s $zfile ]] && dataselect -ts $ts -te $te -lso -Pe -o $out_dir/${prefix}Z__${start_string}__${end_string}.mseed $zfile
   [[ -s $h1 ]]    && dataselect -ts $ts -te $te -lso -Pe -o $out_dir/${prefix}E__${start_string}__${end_string}.mseed $h1
   [[ -s $h2 ]]    && dataselect -ts $ts -te $te -lso -Pe -o $out_dir/${prefix}N__${start_string}__${end_string}.mseed $h2

done < mseed.list

/bin/mv mseed.list $out_dir/.

#nfiles=$( cat $out_dir/fname.csv | wc -l )
#[[ $nfiles -lt 2 ]] && { echo "ERROR: no data for $sfile"; /bin/rm -rf $out_dir; exit 1; }

Run phasenet for all miniSEED files in a directory

$ event_pn_pick_new.sh $event_directory $destination_directory
#!/bin/bash
#: Title       : se_sfile_wf_extract.sh
#: Purpose     : Reads a hypocenter defined in an S-file, extracts all the waveform
#+               data available in a SeisComP directory structure (SDS) and runs
#+               PhaseNet for the extracted miniSEED files.
#: Usage       : se_sfile_wf_extract.sh S-file SDS_dir
#: Date        : 2021-03-26
#: Author      : "Antonio Villasenor" <antonio.villasenor@csic.es>
#: Version     : 1.0
#: Requirements: awk
#+               GNU date command (not macOS date)
#+               dataselect (https://github.com/iris-edu/dataselect)
#+               run.py (PhaseNet prediction script)
#: IMPORTANT!! : if PhaseNet is installed using a Python virtual environment
#+               (venv or conda) it must be activated before running this script
#: Arguments   : S-file SDS-dir
#: Options     : none
set -euo pipefail

progname=${0##*/}

[[ $# -lt 2 ]] && { echo "usage: $progname event_dir dest_dir"; exit 1; }

[[ ! -d $1 ]] && { echo "ERROR: event directory does not exist: $1"; exit 1; }
[[ ! -d $2 ]] && { echo "ERROR: destination directory does not exist: $2"; exit 1; }

datadir="$1"
destdir="$2"
eventdir=${datadir##*/extract/}

if command -v date > /dev/null && date --version > /dev/null 2>&1; then
    DATE=date
elif command -v gdate > /dev/null && gdate --version > /dev/null 2>&1; then
    DATE=gdate
else
    echo "ERROR: no GNU date command in this system"
    exit 1
fi

pn_dir=${HOME}/repos/PhaseNet
model=${pn_dir}/model/190703-214543

[[ ! -d $pn_dir ]] && { echo "ERROR: invalid directory for PhaseNet repo: $pn_dir"; exit 1; }
[[ ! -d $model ]] && { echo "ERROR: PhaseNet model does not exist: $model"; exit 1; }

# ES.ELOR.00.HHZ__2020-12-26T121519Z__2020-12-26T121919Z.mseed
/bin/ls -1 $datadir/*__*Z__*Z.mseed > mseed.list.$$
nfiles=$( cat mseed.list.$$ | wc -l )

[[ $nfiles -eq 0 ]] && { echo "ERROR: no miniSEED files in $datadir"; /bin/rm -f mseed.list.$$; exit 1; }

awk -F/ '{print $NF}' mseed.list.$$ | awk -F_ '{print substr($1,1,length($1)-1)}' - | sort -u > streams.list.$$

curdir=$PWD
outdir=$destdir/$eventdir
[[ -d $outdir ]] && { echo "Output event directory already exists: $outdir"; /bin/rm -f *.list.$$; exit 1; }
mkdir -p $outdir
cp $datadir/*L.S?????? $outdir/.
echo "fname,E,N,Z" > $outdir/fname.csv

while read -r stream; do

    /bin/rm -f tmp.$$
    grep "${stream}[ENZ12]" mseed.list.$$ > tmp.$$
    n_channels=$( cat tmp.$$ | wc -l )

    if [[ $n_channels -eq 3 ]]; then

        printf "%s" $stream.mseed >> $outdir/fname.csv
        while read -r cmpfile; do

            cat $cmpfile >> $outdir/$stream.mseed

            fname=${cmpfile##*/}     # ES.ERTA..HHE__2005-02-15T014307Z__2005-02-15T014707Z.mseed ES.ERTA..HHE
            cmpname=${fname%%__*}    # ES.ERTA..HHE
            cmp=${cmpname##*.}       # HHE
            printf ",%s" $cmp >> $outdir/fname.csv

        done < tmp.$$
        printf "\n" >> $outdir/fname.csv

    elif [[ $n_channels -eq 1 ]]; then
        echo only one component for $stream $n_channels
        /bin/rm -f tmp.$$
        continue
    else
        echo unusual number of components for $stream : $n_channels
        /bin/rm -f tmp.$$
        continue
    fi
    /bin/rm -f tmp.$$

done < streams.list.$$

/bin/rm -f mseed.list.$$ streams.list.$$

nfiles=$( cat $outdir/fname.csv | wc -l )
[[ $nfiles -lt 2 ]] && { echo "ERROR: no stations for $eventdir"; /bin/rm -rf $outdir; exit 1; }

cd $pn_dir
#timeout 10m \
python phasenet/predict.py --model=model/190703-214543 \
                           --data_list=${outdir}/fname.csv \
                           --data_dir=${outdir} \
                           --result_dir=${outdir}/phasenet \
                           --format=mseed --plot_figure > $outdir/phasenet.log 2>&1

#cd $outdir
#pnout2picks.sh

cd $curdir