Sitecore: Extract Indexed Content of Media Files using MediaItemContentExtractor
This post adds to my previous post on indexing associated content.

Here is a common scenario: your custom index configuration is set up to crawl all the content for your website, and your site search (keyword search) uses that index to fetch results. In addition to your content item crawlers, you add a crawler for Media Library items. Sitecore does a great job of indexing PDF, DOCX, DOC, and similar files automatically, provided you have a valid IFilter installed, so your search now returns file items as results too.

Now consider the following scenario: one of the lookup fields on your page points to a file in the Media Library, and the new requirement is to show the page item in the search results when the search phrase matches the content of the associated file.

Solution (Lucene & Solr): Create a computed field called "related_content" that stores the crawled content of the associated file, and extend the query to now se...