Remove images from a PDF

I read "Create a tiff with only text and no images from a postscript file with ghostscript" and tried KenS's answer. However, that method removes only the "black" images, that is, images that contain data in the black channel only (the PDF uses a CMYK colour space). How can I remove all of the images in my case?

Source

2011-06-24 WebRacer

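This does a better job, but it is still incomplete. For example, images that use multiple data sources are not handled. It is essentially untested: I only ran it on one small file (pages.pdf), converting it to PostScript with ps2write and then back to PDF with the pdfwrite device and the PostScript program below.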
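The round trip described above could be scripted roughly as follows. This is only a sketch, not the exact commands used for the test: the function name `strip_images`, the `GS` variable and the temporary file name are made up for illustration, and it assumes the PostScript program from this answer has been saved to a file such as `noimage.ps`.

```shell
#!/bin/sh
# Sketch only: strip_images, GS and the temporary file name are illustrative.
# The prologue argument is assumed to be the PostScript program from this answer.
GS="${GS:-gs}"    # Ghostscript executable; override if it is installed under another name

strip_images() {
    in_pdf=$1
    prologue=$2
    out_pdf=$3
    tmp_ps="${out_pdf%.pdf}.tmp.ps"

    # Step 1: PDF -> PostScript, so that image drawing goes through the
    # PostScript operators that the prologue redefines.
    "$GS" -q -dNOPAUSE -dBATCH -sDEVICE=ps2write \
        -sOutputFile="$tmp_ps" "$in_pdf" || return 1

    # Step 2: run the prologue first, then the converted file, and write a new PDF.
    "$GS" -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
        -sOutputFile="$out_pdf" "$prologue" "$tmp_ps"
    status=$?

    rm -f "$tmp_ps"
    return $status
}
```

For example: `strip_images pages.pdf noimage.ps pages-noimages.pdf`. The prologue has to come before the converted file on the command line so that its definitions are in place when the page content runs.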

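The first thing you will notice is that nearly all of the text has vanished from the document. That is because the fonts in use are bitmap fonts, and the program cannot tell a bitmap that represents a character from any other kind of bitmap. In this file every character is drawn with imagemask and the other images use 'image', so you can solve the problem by removing the definition of imagemask.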

I could not get the program to format properly in this answer, so here it is as plain text:

8<------------------------------8<--------------------------8<------------------------- 
%! 

% 
% numbytes -file- ConsumeFileData - 
% 
/ConsumeFileData { 
    userdict begin 
    /DataString 256 string def 
    /DataFile exch def 
    /BytesToRead exch def 

%(BytesToRead =) print BytesToRead == 
    mark 
    {                                       % note: keeps reading until EOF on DataFile 
        DataFile DataString readstring {    % read bytes 
            pop                             % discard the 256 bytes just read 
            /BytesToRead BytesToRead 256 sub def    % not EOF, subtract 256 from required amount 
%(Read 256 bytes) == 
%(BytesToRead now =) print BytesToRead == 
        } { 
            length 
%(Read) print dup 256 string cvs print (bytes) == 
            BytesToRead exch sub /BytesToRead exch def  % reached EOF, subtract length read from required amount 
%(BytesToRead now =) print BytesToRead == 
            exit                            % and exit loop 
        } ifelse 
    } loop 

%BytesToRead == 
    BytesToRead 0 gt { 
        (Ran out of image data reading from DataSource\n) == 
    } if 
    cleartomark 
    end 
} bind def 

% 
% numbytes -proc- ConsumeProcData - 
% 
/ConsumeProcData { 
    userdict begin 
    /DataProc exch def 
    /BytesToRead exch def 

    { 
        DataProc exec                   % returns a string 
        length dup 0 eq {               % an empty string means no more data 
            pop exit 
        } if 
        BytesToRead exch sub            % subtract # bytes read 
        /BytesToRead exch def 
        BytesToRead 0 le { 
            exit                        % exit when read enough 
        } if 
    } loop 
    end 
} bind def 

/image { 
    (image) == 
    dup type /dicttype eq { 
        dup /MultipleDataSources known { 
            dup /MultipleDataSources get { 
                (Can't handle image with multiple sources!) == 
            } if 
        } if 
        dup /Width get                  % stack = -dict- width 
        exch dup /BitsPerComponent get  % stack = width -dict- bpc 
        exch dup /Decode get            % stack = width bpc -dict- decode 
        length 2 div                    % decode length = 2 * num components 
        exch 4 1 roll                   % stack = -dict- width bpc ncomps 
        mul mul                         % stack = -dict- width*bpc*ncomps 
        7 add cvi 8 idiv                % stack = -dict- rowbytes 
        exch dup /Height get            % stack = rowbytes -dict- height 
        exch /DataSource get            % stack = rowbytes height DataSource 
        3 1 roll                        % stack = DataSource rowbytes height 
        mul                             % stack = DataSource rowbytes*height 
        exch                            % stack = size DataSource 
    } { 
                                        % stack = width height bps matrix DataSource 
        exch pop                        % throw away matrix 
        4 1 roll                        % stack = DataSource width height bps 
        3 -1 roll mul                   % stack = DataSource height width*bps 
        7 add cvi 8 idiv                % rowbytes = floor((bits+7)/8), rows are byte aligned 
        mul                             % stack = DataSource rowbytes*height 
        exch                            % stack = size DataSource 
    } ifelse 

    dup type /filetype eq { 
        ConsumeFileData 
    } { 
        dup type /arraytype eq 
        1 index type /packedarraytype eq or { 
            ConsumeProcData 
        } { 
            pop pop                     % Remove DataSource and size 
        } ifelse 
    } ifelse 
} bind def 

/imagemask { 
    (imagemask) == 
    dup type /dicttype eq { 
        dup /MultipleDataSources known { 
            dup /MultipleDataSources get { 
                (Can't handle imagemask with multiple sources!) == 
            } if 
        } if 
        dup /Width get                  % stack = -dict- width 
        7 add cvi 8 idiv                % rowbytes = floor((width+7)/8) 
        exch dup /Height get            % stack = rowbytes -dict- height 
        exch /DataSource get            % stack = rowbytes height DataSource 
        3 1 roll                        % stack = DataSource rowbytes height 
        mul                             % stack = DataSource rowbytes*height 
        exch                            % stack = size DataSource 
    } { 
                                        % stack = width height polarity matrix DataSource 
        exch pop                        % throw away matrix 
        exch pop                        % throw away polarity 
        3 1 roll                        % stack = DataSource width height 
        exch 7 add cvi 8 idiv           % stack = DataSource height rowbytes 
        mul                             % stack = DataSource rowbytes*height 
        exch                            % stack = size DataSource 
    } ifelse 

    dup type /filetype eq { 
        ConsumeFileData 
    } { 
        dup type /arraytype eq 
        1 index type /packedarraytype eq or { 
            ConsumeProcData 
        } { 
            pop pop                     % Remove DataSource and size 
        } ifelse 
    } ifelse 
} bind def 

/colorimage { 
    (colorimage) == 
                                        % stack: w h bpc m d0 .. dn multi ncomp 
    2 copy 1 ne and {                   % multi is true and ncomp is not 1 
        (Can't handle colorimage with multiple sources!) == 
                                        % not handled: operands are left on the stack 
    } { 
        exch pop                        % get rid of 'multi' 
                                        % stack: w h bpc m d ncomp 
        3 -1 roll pop                   % stack: w h bpc d ncomp 
        exch 5 1 roll                   % stack: d w h bpc ncomp 
        mul                             % stack: d w h bpc*ncomp 
        3 -1 roll mul                   % stack: d h w*bpc*ncomp 
        7 add cvi 8 idiv                % stack: d h rowbytes 
        mul exch                        % stack: bytes datasource 
    } ifelse 

    dup type /filetype eq { 
        ConsumeFileData 
    } { 
        dup type /arraytype eq 
        1 index type /packedarraytype eq or { 
            ConsumeProcData 
        } { 
            pop pop                     % Remove DataSource and size 
        } ifelse 
    } ifelse 
} bind def

Source

2011-06-28 08:35:22 KenS

Thank you! I don't know why the formatting goes wrong, but I can't use the program without its formatting. – WebRacer

It is on pastebin: http://pastebin.com/j8fTWTVh – KenS

I tried this with my file and it still doesn't work, even for a ps file with no images :/ – Joe

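The image operator is used for both colour and monochrome images, so the technique should work for images of any colour, unless your file uses the obsolete level 1.5 'colorimage' operator. I can't remember whether I redefined that operator in the example. If I didn't, you can redefine it in a similar way.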

In fact, I see that I did provide redefinitions of image, colorimage and imagemask, so all image types should be removed. Perhaps you could share an example?

Source

2011-06-24 16:12:34 KenS

Thanks! Here are the files for testing: http://array02.letmeprint.ru/noimages/cover.pdf (26M) http://array02.letmeprint.ru/noimages/pages.pdf (140K) http://array02.letmeprint.ru/noimages/noimage.ps I have a sneaking suspicion that I am messing something up here :-( The images in this file... – WebRacer

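These are PDF files, whereas your question said "PostScript file". As I mentioned in the original post, the PDF interpreter effectively uses the 'system' versions of these operators, so this technique won't work on a PDF directly. Instead, first convert the PDF to PostScript using the ps2write device, then render that PostScript to TIFF with this prologue. That should work. – KenS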

Actually, there is a small bug in that code: it doesn't like inline images. I need to fix that. I'll try to post better code later. – KenS
