Friday, January 16, 2015

Blogger Test

Diffbot, inventors of computer vision technology that sees web pages like humans do, today announced the release of its Product API, which automatically identifies and extracts product data from any shopping web page. Diffbot also announced updates to its Crawlbot spidering service, which can accurately determine which pages on a shopping site are product pages. Diffbot now offers a turnkey solution for retrieving the entire catalog from any e-commerce site, without the need for a published API or any action on the part of the retailer.

Developed over the course of two years, the Product API’s pioneering algorithm is built on Diffbot’s core vision technology, which has accurately extracted structured data from billions of web pages. The API advances Diffbot’s machine learning, natural-language processing and computer vision systems to identify and structure information regardless of a site’s design, layout, markup or even its (human) language.

The Product API automatically makes available data such as price, discount/savings, shipping cost, product description, images, SKU and manufacturer’s product number. The technology allows developers to immediately use product data from any e-commerce site in their web or mobile applications.

The Product API will enable developers to rapidly build applications that can: track and compare prices from any site; augment user bookmark or clipping data with product pricing and other information; track merchandise availability across multiple storefronts; migrate entire shopping sites to new platforms without the need for back-end integration; and deploy entire APIs on-the-fly for partner and other integrations.

“E-commerce is one of the most popular activities on the web. With 28% of US internet users shopping on a daily basis, we figured we should teach our robot how to understand products,” said Mike Tung, CEO of Diffbot.[1] “The Product API represents our latest advances in pushing the capabilities of automated page extraction. We are one step closer to the imminent goal of making the entire web machine-readable.”

Last year, Diffbot conducted a study which found that 8% of links shared on Twitter are for product pages, a total of more than eight million product links per day.[2] [3] [4] Just as with news articles, consumers and businesses alike need intelligent automation to help sift through the vast quantities of products offered and shared online.

The Product API joins Diffbot’s previous computer vision APIs, including the Frontpage API (for extracting content from home pages), the Article API (for extracting news article and blog post content), the Image API, and its Page Classifier API, which automatically determines the type of page of any web link.
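
For developers wondering what a call to the Product API might look like, here is a minimal Python sketch. It is an illustration under stated assumptions, not Diffbot's reference client: the endpoint URL, the `token` and `url` query parameters, and the response field names (`objects`, `title`, `offerPrice`, `regularPrice`, `sku`, `mpn`) are assumptions that should be checked against Diffbot's current documentation. The helper names and the sample response are hypothetical, and the sketch makes no network request.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against Diffbot's current documentation.
PRODUCT_API_ENDPOINT = "https://api.diffbot.com/v3/product"


def build_product_request(token: str, page_url: str) -> str:
    """Build the GET URL asking the Product API to analyze one page.

    The parameter names `token` and `url` are assumptions.
    """
    query = urlencode({"token": token, "url": page_url})
    return f"{PRODUCT_API_ENDPOINT}?{query}"


def summarize_product(response: dict) -> dict:
    """Pick a few product fields out of a parsed JSON response.

    Assumes the response holds a list under "objects"; fields that are
    missing from the first object are simply left out of the summary.
    """
    objects = response.get("objects") or []
    if not objects:
        return {}
    product = objects[0]
    wanted = ("title", "offerPrice", "regularPrice", "sku", "mpn")
    return {key: product[key] for key in wanted if key in product}


# Hand-written placeholder data, not real API output.
SAMPLE_RESPONSE = {
    "objects": [
        {
            "title": "Example Widget",
            "offerPrice": "$19.99",
            "regularPrice": "$24.99",
            "sku": "EX-42",
            "text": "A placeholder product description.",
        }
    ]
}

if __name__ == "__main__":
    print(build_product_request("YOUR_TOKEN", "https://shop.example.com/item/42"))
    print(summarize_product(SAMPLE_RESPONSE))
```

In a real application, the URL from `build_product_request` would be fetched with an HTTP client and the decoded JSON passed to `summarize_product`. A price tracker could then store the summarized price for each page over time and compare successive values.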

4 comments:

  1. Cool, works better than other services like this I’ve found. I tried it on Ars Technica’s review of the Xoom tablet and it found all 10 pages. It didn’t find the embedded video though. Also, all the formatting is stripped which makes it hard to differentiate section headers from content paragraphs, and all the images are in one list to the side, removed from their original context.

    What I’d really love to see is a combination of the RSS API and the article API to produce full article RSS feeds for any site.

  2. Right now, the API just returns the raw text for simplicity’s sake, but it would be possible to add an option for returning a bit of HTML structure, which would address the problem of sections, inline images, tables, etc.

    The combination of the two APIs is a great idea.

    Replies
    1. This is a reply post.

      You guys should really open up a tagging API. As a developer working on a social site, I’d love to be able to auto-tag content that users upload.

  3. Here’s a third post from someone different.
