
DietSkan

Automatic ingredient-level nutrient estimates from a single photo or short video

Manual food logging is time-consuming and systematically biased. DietSkan’s goal is to produce near real-time estimates of calories and macronutrients from a simple smartphone capture, so researchers and consumers can track dietary intake quickly and with less burden. By turning a single photo or short video into ingredient-level nutrient estimates, DietSkan aims to support diabetes management, everyday health goals, and rigorous dietary assessment without scales, barcode scans, or long food diaries.

01

Project Overview


DietSkan is a vision-first system that produces ingredient-level calorie and macronutrient estimates from a single photo or a short smartphone video. The project responds to the well-documented limitations of self-reported intake and the need for scalable, objective measurement in clinical and population research.


The pipeline follows a camera-first flow: capture an image or short multi-view clip, segment and classify food items, estimate portion size or mass, then map those measurements to standardized food composition tables. Models are trained on curated RGB datasets, including Nutrition5k, and are supplemented with in-lab collections and careful label curation so that mass and macro predictions are reliable and comparable across studies.


Outputs include per-item confidence or uncertainty scores that can trigger a quick re-capture or user prompt when needed. The system is designed for practical use on standard smartphones and for integration into clinical and research workflows, as well as consumer-facing applications.

02

Significance

Dietary self-report is persistently inaccurate, especially for mixed dishes, snacks, and eating outside the home. This inaccuracy limits both day-to-day self-management and the quality of data collected in clinical and population studies. At the same time, most existing image-based systems either assume tightly controlled capture conditions, require extra hardware, or do not expose uncertainty in a way that is useful for researchers and clinicians.


DietSkan addresses the practical problem of producing reliable, ingredient-level nutrient estimates directly from smartphone vision. The system must identify ingredients in complex, mixed dishes, estimate portion size or mass with usable accuracy, and provide uncertainty-aware nutrient estimates that remain robust to everyday lighting, occlusion, and casual capture.


By reducing user effort to a single photo by default, with an optional short video for higher accuracy, DietSkan aims to make objective dietary assessment accessible to people managing diabetes and other chronic conditions, and to researchers who need consistent, scalable intake measures across large studies.


03

Approach

Camera capture and data flow

  • One quick photo by default, with an optional short multi-view clip that provides better cues for scale.

  • A camera-first pipeline that converts RGB frames into segmented food items, estimated portion size or mass, and mapped nutrients using standardized composition tables.
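The flow above can be sketched in a few stages. The function names, class labels, and toy values below are illustrative assumptions, not DietSkan’s actual interfaces:

```python
# Illustrative sketch of the camera-first flow: segment/classify items,
# estimate portion mass, then map to a composition table. All stage
# implementations here are toy stand-ins, not the real models.
from dataclasses import dataclass, field

@dataclass
class Item:
    label: str
    mass_g: float = 0.0
    nutrients: dict = field(default_factory=dict)

def segment_and_classify(frames) -> list[Item]:
    # Real system: detector + segmenter over RGB frames.
    return [Item("white rice"), Item("grilled chicken")]

def estimate_mass(item: Item, frames) -> float:
    # Real system: portion-size / mass regression, aided by multi-view cues.
    return {"white rice": 150.0, "grilled chicken": 120.0}[item.label]

def map_nutrients(item: Item) -> dict:
    # Real system: standardized food composition tables, per 100 g.
    kcal_per_100g = {"white rice": 130, "grilled chicken": 165}
    return {"kcal": kcal_per_100g[item.label] * item.mass_g / 100.0}

def analyze(frames) -> list[Item]:
    items = segment_and_classify(frames)
    for it in items:
        it.mass_g = estimate_mass(it, frames)
        it.nutrients = map_nutrients(it)
    return items

meal = analyze(frames=["frame0"])
total_kcal = sum(it.nutrients["kcal"] for it in meal)  # 195.0 + 198.0 = 393.0
```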


AI-driven vision-first estimation engine

  • Strong detectors and segmenters, including transformer-based components where helpful, to find ingredients and small items in mixed dishes.

  • Direct estimation of portion size (mass or volume) from images, with short handheld video used to resolve scale without any extra hardware.

  • Per-item confidence or uncertainty scores that allow the system to request a quick re-capture or apply simple checks to avoid large errors when confidence is low.
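One simple way to gate on uncertainty is to measure how flat the per-item class distribution is. The entropy heuristic and threshold below are assumptions for illustration, not the project’s calibrated method:

```python
# Minimal confidence gate: flag an item for re-capture when its predicted
# class distribution is too flat. Threshold of 0.6 is an arbitrary example.
import math

def normalized_entropy(probs):
    """Entropy of a class distribution, scaled to [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def needs_recapture(probs, threshold=0.6):
    """True when the label is too uncertain to trust the estimate."""
    return normalized_entropy(probs) > threshold

confident = [0.90, 0.05, 0.05]  # peaked -> keep the estimate
ambiguous = [0.40, 0.35, 0.25]  # flat   -> prompt a quick re-capture
```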

Detection, temporal linking, and post-processing

  • Detection and segmentation of each visible item, with a lightweight tracker that links items across frames for temporal stability in short videos.

  • Ingredient-aware classes and an explicit focus on small items to reduce misses, with basic text cues used where available to correct ambiguous labels.

  • Mapping of items to food composition tables, adjustment of labels when needed, and output of per-ingredient mass and nutrient estimates.
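The composition-table step amounts to looking up a labeled item and scaling its per-100 g entry by the estimated mass. The table below uses illustrative, USDA-style values, not DietSkan’s actual data:

```python
# Toy composition table: macros per 100 g (illustrative values only).
PER_100G = {
    "white rice":  {"kcal": 130, "protein_g": 2.7, "carbs_g": 28.0, "fat_g": 0.3},
    "black beans": {"kcal": 132, "protein_g": 8.9, "carbs_g": 23.7, "fat_g": 0.5},
}

def nutrients_for(label: str, mass_g: float) -> dict:
    """Scale a per-100 g composition entry by the estimated mass."""
    scale = mass_g / 100.0
    return {k: round(v * scale, 1) for k, v in PER_100G[label].items()}

nutrients_for("white rice", 180.0)
# {'kcal': 234.0, 'protein_g': 4.9, 'carbs_g': 50.4, 'fat_g': 0.5}
```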


Practical deployment and evaluation

  • Minimal user steps: one photo by default, short video optional when the user or system wants higher accuracy. Core functionality runs on device, with an optional server step for heavier tasks.

  • Validation on Nutrition5k and an in-lab dataset, with reporting of per-ingredient error and a reproducible evaluation protocol for volume and calorie estimation that can be reused in future work.
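A per-ingredient error report of the kind described above could look like the sketch below. The metric choices (absolute error in grams plus percentage error per ingredient) are assumptions, not the project’s finalized protocol:

```python
# Per-ingredient error report: absolute mass error (g) and percentage
# error, computed against ground-truth masses.
def per_ingredient_errors(pred: dict, truth: dict) -> dict:
    report = {}
    for name, true_mass in truth.items():
        err = abs(pred.get(name, 0.0) - true_mass)
        report[name] = {"mae_g": err, "mape": err / true_mass * 100.0}
    return report

pred  = {"rice": 160.0, "chicken": 100.0}
truth = {"rice": 150.0, "chicken": 120.0}
report = per_ingredient_errors(pred, truth)
# rice: 10.0 g off (≈6.7%); chicken: 20.0 g off (≈16.7%)
```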

Publications

In progress 


Acknowledgements


Gallery

Department of Electrical and Computer Engineering

University of Washington
185 E Stevens Way NE
Seattle, WA 98195

Connect With Us

  • LinkedIn

© 2025 by ARC Lab. All Rights Reserved.
