How to put translation corpus into different files

I want to process a one-line translation corpus between Japanese and Chinese, like this:

JST_JC_ENVI-abst-06A0281759-par1-sen1 ||| Ｃ＆Ｄ管理施設の高度化 ||| Ｃ＆Ｄ管理设施的高度化JST_JC_ENVI-abst-06A0281759-par1-sen2 ||| メーンのポートランドはＲｉｖｅｒｓｉｄｅリサイクリング施設（ＲＲＦ）を所有しているが，建設及び解体（Ｃ＆Ｄ）ごみの埋立地に立地している。 ||| 缅因州的波特兰拥有Ｒｉｖｅｒｓｉｄｅ循环使用设施（ＲＲＦ），但其却位置选定于建设及解体（Ｃ＆Ｄ）垃圾的填埋地。JST_JC_ENVI-abst-06A0281759-par1-sen3 ||| この施設はかさばる廃棄物，住民の出す葉やＣ＆Ｄごみを受け入れているが，その最近の作業状況を紹介した。 ||| 该设施接受体积大的废弃物、居民投弃的叶子或Ｃ＆Ｄ垃圾，本文介绍了该设施最近的作业情况。

Each Japanese-Chinese pair begins with a prefix string of the form JST_JC_ENVI-abstXXXXXXXX, and the fields are separated by |||.

So my question is: how can I delete all of the "JST_JC_ENVI-abstXXXXXXXX" prefix strings and write the Chinese to chinese.txt and the Japanese to japanese.txt, one sentence per line?

Thank you.

Source

2017-03-28 renzhe0009

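First, read the file and split it on whitespace, writing one token per line.

# -*- coding: utf-8 -*-
# Python 3: open() takes an explicit encoding, so the Python 2
# reload(sys) / setdefaultencoding workaround is not needed.

with open('dev.txt', 'r', encoding='utf-8') as infile, \
     open('dev-mid.txt', 'w', encoding='utf-8') as outfile1:
    # split() with no argument splits on any run of whitespace
    tokens = infile.read().split()
    for token in tokens:
        outfile1.write(token + '\n')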

Then use Word to delete the spaces and the shared prefix strings in dev-mid.txt, so that only the Japanese and Chinese lines remain.
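The manual cleanup in Word could probably also be scripted. Below is a minimal sketch, not a tested replacement for the step above. It assumes every ID looks like the three samples in the question (JST_JC_ENVI-abst-, an alphanumeric code, -parN-senN), and it also drops the ||| tokens that the whitespace split leaves on their own lines. The names ID_PATTERN, clean_tokens and clean_file are made up for illustration.

```python
import re

# Assumed ID shape, inferred only from the sample IDs in the question.
ID_PATTERN = re.compile(r"JST_JC_ENVI-abst-[0-9A-Za-z]+-par[0-9]+-sen[0-9]+")


def clean_tokens(lines):
    """Drop '|||' separators and ID prefixes, keeping only sentence text."""
    cleaned = []
    for line in lines:
        # Remove the ID wherever it occurs, including when it is glued
        # to the end of the previous sentence.
        text = ID_PATTERN.sub("", line.strip())
        if text and text != "|||":
            cleaned.append(text)
    return cleaned


def clean_file(src, dst):
    """Read src, clean its lines, and write the result to dst."""
    with open(src, "r", encoding="utf-8") as infile:
        cleaned = clean_tokens(infile.readlines())
    with open(dst, "w", encoding="utf-8") as outfile:
        for text in cleaned:
            outfile.write(text + "\n")
```

Calling clean_file('dev-mid.txt', 'dev-clean.txt') would then give a file of alternating Japanese and Chinese lines, provided no sentence contains whitespace of its own (the earlier whitespace split would break such a sentence into several lines).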

Finally,

# Python 3: send odd-numbered lines to dev-in.txt and
# even-numbered lines to dev-out.txt.
with open('dev-mid.txt', 'r', encoding='utf-8') as infile, \
     open('dev-in.txt', 'w', encoding='utf-8') as outfile1, \
     open('dev-out.txt', 'w', encoding='utf-8') as outfile2:
    # Count lines from 1 so that the first line is odd
    for i, line in enumerate(infile, start=1):
        if i % 2 == 1:
            outfile1.write(line)
        else:
            outfile2.write(line)

This separates the odd and even rows: dev-in.txt holds the Japanese and dev-out.txt holds the Chinese. :-D
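For comparison, the whole job might also be done in one pass, without the intermediate file or the Word step. This is only a sketch under the same assumption about the ID shape (taken from the sample IDs in the question). The names ID_PATTERN, split_corpus and write_corpus are hypothetical. It splits the raw text on the IDs and then splits each record on |||, so sentences that contain spaces stay intact.

```python
import re

# Assumed ID shape, inferred only from the sample IDs in the question.
ID_PATTERN = re.compile(r"JST_JC_ENVI-abst-[0-9A-Za-z]+-par[0-9]+-sen[0-9]+")


def split_corpus(text):
    """Return (japanese, chinese) sentence lists from the raw corpus text."""
    japanese, chinese = [], []
    # Splitting on the ID removes it and leaves one chunk per record.
    for record in ID_PATTERN.split(text):
        fields = [field.strip() for field in record.split("|||")]
        # A well-formed chunk looks like " ||| <japanese> ||| <chinese>",
        # so the first field is empty and the other two hold the sentences.
        if len(fields) == 3 and fields[1] and fields[2]:
            japanese.append(fields[1])
            chinese.append(fields[2])
    return japanese, chinese


def write_corpus(src, ja_path, zh_path):
    """Split the corpus in src and write one sentence per line to each file."""
    with open(src, "r", encoding="utf-8") as infile:
        japanese, chinese = split_corpus(infile.read())
    with open(ja_path, "w", encoding="utf-8") as ja_file:
        for sentence in japanese:
            ja_file.write(sentence + "\n")
    with open(zh_path, "w", encoding="utf-8") as zh_file:
        for sentence in chinese:
            zh_file.write(sentence + "\n")
```

With the file names from the question, the call would be write_corpus('dev.txt', 'japanese.txt', 'chinese.txt'). Records that do not have exactly two ||| separators are skipped silently, which may or may not be what you want for a real corpus.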

Source

2017-03-29 07:38:48 renzhe0009
