Put the following code in a file named THISfile.py and run it to see what it produces:
# myFile = input("Enter file name: ")
# line No 2: line with with double 'with'
# line No 3: double (word , word) is not a double word
myFile="THISfile.py"
lstUniqueWords = []
noOfFoundWordDoubles = 0
totalNoOfWords = 0
lineNo = 0
lstLineNumbersWithWordDoubles = []
with open(myFile, "r") as myFile:
    for line in myFile:
        lineNo+=1 # memorize current line number
        lineWords = line.split()
        if len(lineWords) > 0: # scan line only if it contains words
            currWord = lineWords[0] # remember already 'visited' word
            totalNoOfWords += 1
            if currWord not in lstUniqueWords:
                lstUniqueWords.append(currWord)
                # put 'visited' word word into lstAllWordsINmyFile (if it is not already there)
            lastWord = currWord # we are done with current, so current becomes last one
            if len(lineWords) > 1 : # proceed only if line has two or more words
                for word in lineWords[1:] : # loop over all other words
                    totalNoOfWords += 1
                    currWord = word
                    if currWord not in lstUniqueWords:
                        lstUniqueWords.append(currWord)
                        # put 'visited' word into lstAllWordsINmyFile (if it is not already there)
                    if(currWord == lastWord): # duplicate word found:
                        noOfFoundWordDoubles += 1
                        print("Found double word: ['{""}'] in line {}".format(currWord, lineNo))
                        lstLineNumbersWithWordDoubles.append(lineNo)
                    lastWord = currWord
                    # ^--- now after all all work is done, the currWord is considered lastWord
print(
    "noOfDoubles", noOfFoundWordDoubles, "\n",
    "totalNoOfWords", totalNoOfWords, "uniqueWords", len(lstUniqueWords), "\n",
    "linesWithDoubles", lstLineNumbersWithWordDoubles
)
The output looks like this:
Found double word: ['with'] in line 2
Found double word: ['word'] in line 19
Found double word: ['all'] in line 33
noOfDoubles 3
totalNoOfWords 221 uniqueWords 111
linesWithDoubles [2, 19, 33]
Check the comments in the code to understand how it works. Happy coding :)
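For comparison, the same adjacent-word check can be sketched more compactly. This is only a sketch, not a replacement for the script above: find_adjacent_doubles is a hypothetical helper name, and it accepts any iterable of lines so it can be tried without a file on disk.

```python
def find_adjacent_doubles(lines):
    """Return (line_number, word) pairs for words repeated back to back.

    Hypothetical helper, not part of the original script. Words are split on
    whitespace, as line.split() does, and line numbers start at 1.
    """
    found = []
    for line_no, line in enumerate(lines, start=1):
        words = line.split()
        # Pair every word with its successor on the same line.
        for previous, current in zip(words, words[1:]):
            if previous == current:
                found.append((line_no, current))
    return found


sample = [
    "line with with double 'with'",
    "double (word , word) is not a double word",
    "after all all work is done",
]
print(find_adjacent_doubles(sample))  # [(1, 'with'), (3, 'all')]
```

An open file object is also an iterable of lines, so the helper should work unchanged when called inside a with open("THISfile.py") as f: block as find_adjacent_doubles(f).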
Just reset lst = [] at each line iteration. –
@Jean-FrançoisFabre That detects duplicated words anywhere in the line, not only adjacent ones. – Maciek
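To make the difference raised in these comments concrete, here is a small sketch comparing the two behaviours on a single line. Both function names are hypothetical and not from the thread, and the first function is only one reading of the "reset the list for each line" suggestion.

```python
def repeats_in_line(line):
    """Words that occur more than once anywhere in the line."""
    seen = []      # starts empty for every line, as the comment suggests
    repeats = []
    for word in line.split():
        if word in seen and word not in repeats:
            repeats.append(word)
        seen.append(word)
    return repeats


def adjacent_repeats_in_line(line):
    """Words that immediately follow an identical word."""
    words = line.split()
    return [cur for prev, cur in zip(words, words[1:]) if prev == cur]


line = "double word is not a double word word"
print(repeats_in_line(line))           # ['double', 'word']
print(adjacent_repeats_in_line(line))  # ['word']
```

The first function flags "double" even though its two occurrences are far apart, which is why the per-line reset alone does not give the adjacent-only behaviour of the answer's script.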